Training Augmentation
DiffAug and ADA
DiffAug and ADA are image augmentation strategies applied during training to help a model learn effectively from smaller or limited datasets. By introducing controlled variations to the training images, they reduce overfitting and let the model learn more robustly. Each method uses its own augmentation pipeline, applying techniques such as flipping, colour shifts, or geometric changes to improve training outcomes.
It's important to note that these augmentations are not applied directly to your dataset files. Instead, they act like a lens during training. When the model "sees" your images, it sees an augmented version based on the chosen pipeline. Your original data remains unchanged.
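To illustrate the idea, here is a minimal sketch (assuming a PyTorch setup; `color_jitter` is a hypothetical stand-in for a real pipeline) showing that augmentation happens in memory on each batch, leaving the dataset files untouched:

```python
import torch

def color_jitter(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a real DiffAug/ADA pipeline: random brightness shift."""
    return x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)

# The "lens": each batch is augmented in memory at train time; the dataset
# files on disk are never modified.
real_batch = torch.randn(8, 3, 64, 64)    # stands in for images loaded from disk
seen_by_model = color_jitter(real_batch)  # what the discriminator actually sees
assert not torch.equal(real_batch, seen_by_model)  # originals stay unchanged
```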
Choosing an Augmentation Method
Choosing between DiffAug and ADA primarily depends on two factors: the size of your dataset and the training goal. It's important to consider both before beginning the training process.
| Criteria | DiffAug | ADA |
|---|---|---|
| Dataset Size | Small to medium datasets (fewer than 10k images) | Works well with datasets of all sizes |
| Training Speed | Faster | Slower |
| Variation | Produces more visual variation in the model's outputs | Outputs stay more visually consistent with the original dataset |
| Best For | Quick experiments, smaller datasets, and cases where variety is welcome | Long, stable training sessions with a focus on dependable model output quality |
Choosing an Augmentation Pipeline
As a general rule of thumb, the more data you have, the fewer augmentation techniques you need, which also means faster training. The pipeline is the combination of augmentation techniques applied during training to help the model learn; the right pipeline depends on your dataset.
DiffAug Pipeline
When choosing a pipeline for DiffAug, consider how much variation your dataset can realistically tolerate. Since DiffAug applies augmentations to every training batch, it introduces more visual variety, which can help the model generalize better but can also shift outputs away from the dataset's original look. Below are quick guidelines for choosing the right components, followed by a short usage sketch:
- Colour
  - Ideal for datasets captured under various lighting conditions or environments.
  - Avoid if your dataset is colour-sensitive, such as artwork where colour accuracy matters.
- Translation
  - Useful when the subject appears in different positions across the dataset.
  - Avoid if spatial consistency is important (e.g., faces or aligned subjects).
- Cutout
  - Helps the model learn to focus on different parts of the image by masking random regions.
  - Avoid if your dataset is very small or if the subject occupies only a small part of the image, as critical details might be lost.
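To make this concrete, here is a minimal sketch of combining these components into a policy, modeled on the reference DiffAugment PyTorch implementation from the data-efficient-gans repository (the import path and `policy` argument follow that repo; the discriminator and batches are toy placeholders):

```python
import torch
import torch.nn as nn
from DiffAugment_pytorch import DiffAugment  # from the data-efficient-gans repo (assumed import path)

policy = 'color,translation,cutout'  # pick only the components your dataset tolerates

discriminator = nn.Sequential(nn.Conv2d(3, 1, 4, 2, 1))  # toy stand-in for a real discriminator

real_batch = torch.randn(8, 3, 64, 64)  # placeholder for images loaded from the dataset
fake_batch = torch.randn(8, 3, 64, 64)  # placeholder for generator output

# The same randomized pipeline is applied to both real and fake batches before
# they reach the discriminator; the ops are differentiable, so gradients flow
# back to the generator through the augmentation.
d_real = discriminator(DiffAugment(real_batch, policy=policy))
d_fake = discriminator(DiffAugment(fake_batch, policy=policy))
```

Dropping a token from the policy string removes that component, which is how the guidelines above translate into practice.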
ADA Pipeline
When choosing an ADA pipeline, the most important factor is your dataset size, since each augmentation technique has different benefits depending on how much data you're working with. A good starting point for most cases is the default 'bgc' pipeline. For datasets with 10k images or fewer, 'bgc' helps maintain stability during training. When your dataset exceeds 100k images, it's often best to simplify the pipeline to 'bg', or even disable augmentations altogether (to be implemented). Ultimately, every dataset behaves differently, so it's always best to experiment and adjust based on your training results.
Below is a breakdown of each augmentation component used in ADA pipelines:
- Blit (b): Flips the image left/right, rotates it in 90° increments, and translates it.
- Geometric (g): Zooms the image in/out, rotates the image (up to 360°), and stretches or squishes the image.
- Colour (c): Adjusts brightness, contrast, hue, and saturation to simulate different lighting.
- Filter (f): Sharpens or blurs the image by adjusting textures.
- Noise (n): Adds noise to the image.
- Cutout (c): Covers a random part of the image using a patch.
Note: The pipeline is defined using combinations of these letters. For example, 'bgcfnc' includes Blit, Geometric, Colour, Filter, Noise, and Cutout augmentations.
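Because 'c' appears twice (Colour and Cutout), pipeline names follow the canonical ordering 'bgcfnc': the first 'c' after 'bg' denotes Colour and a trailing 'c' denotes Cutout. The helper below is purely illustrative (a hypothetical function, not part of any training script) and shows how a spec such as 'bgc' expands under that ordering:

```python
# Canonical component order used by ADA-style pipeline names.
CANONICAL_ORDER = ['blit', 'geom', 'color', 'filter', 'noise', 'cutout']

def expand_pipeline(spec: str) -> list[str]:
    """Expand e.g. 'bgc' -> ['blit', 'geom', 'color'] by matching positions in the canonical order."""
    letters = [name[0] for name in CANONICAL_ORDER]  # ['b', 'g', 'c', 'f', 'n', 'c']
    components, start = [], 0
    for ch in spec:
        idx = letters.index(ch, start)  # next unused slot with this letter
        components.append(CANONICAL_ORDER[idx])
        start = idx + 1
    return components

print(expand_pipeline('bgc'))     # ['blit', 'geom', 'color']
print(expand_pipeline('bgcfnc'))  # ['blit', 'geom', 'color', 'filter', 'noise', 'cutout']
```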
Diving Deeper: Technical Breakdown of Augmentation Methods
Both DiffAug and ADA perform augmentations during training to help models generalize better. This is especially useful when working with smaller datasets or with unbalanced ones where certain visual styles are underrepresented. However, the two methods operate in fundamentally different ways. DiffAug applies randomized augmentations at a fixed strength to every batch, maintaining the same level of variety throughout training, which can lead to greater diversity in generated outputs. In contrast, ADA adjusts augmentation strength based on the discriminator's confidence, adaptively increasing or decreasing it over time to maintain training stability. This difference in strategy affects not only the training dynamics but also the visual consistency and convergence behavior of the final model.
Difference between DiffAug and ADA
| Category | DiffAug | ADA |
|---|---|---|
| Purpose | Applies fixed, differentiable augmentations to each batch of real and fake images | Dynamically adjusts augmentation strength to prevent discriminator overfitting during training |
| Applies To | Both real and fake images. | Both real and fake images. |
| Augmentation Strength | Fixed strength with randomized augmentations applied uniformly across training | Adaptive strength which adjusts in response to the discriminator's performance during training |
| Augmentation Types | • Colour (brightness, saturation, contrast) • Translation (X, Y shift) • Cutout (random masking) | • Blit (pixel shifts, flips, 90° rotations) • Geometric (zoom, rotate, stretch) • Colour • Filter (blur/sharpen) • Noise • Cutout |
Unlike DiffAug, which applies fixed augmentations to every batch from the start of training, ADA adjusts augmentation strength dynamically over time. It begins with an augmentation probability $p = 0$, allowing the discriminator to initially learn from unaltered real and generated images. As training progresses and the discriminator becomes increasingly confident, risking overfitting, ADA gradually increases $p$, making augmentations stronger to regularize the training. This adaptive mechanism helps prolong stable training by preventing the discriminator from memorizing the dataset. However, if $p$ becomes too high, the augmentations can start to degrade the quality of training. Therefore, it's important to experiment with the generated .pkl checkpoints along the way to find the best model output for your needs.
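For intuition, here is a heavily simplified sketch of that feedback loop. The target value 0.6 matches the ADA paper's default for the overfitting heuristic $r_t = \mathbb{E}[\operatorname{sign}(D(\text{real}))]$; the adjustment speed and everything else here are illustrative assumptions, not the actual implementation:

```python
import torch

p = 0.0       # augmentation probability; ADA starts at zero
target = 0.6  # default target for the overfitting heuristic r_t in the ADA paper
speed = 1e-5  # per-image adjustment rate (illustrative assumption)

def update_p(p: float, d_real_logits: torch.Tensor) -> float:
    """One ADA-style update: raise p when the discriminator looks overconfident."""
    r_t = torch.sign(d_real_logits).mean().item()  # in [-1, 1]; near 1 => overfitting
    adjust = (1.0 if r_t > target else -1.0) * d_real_logits.numel() * speed
    return min(max(p + adjust, 0.0), 1.0)          # keep p a valid probability

# Each step: augment real/fake images with probability p, train the
# discriminator, then nudge p based on its outputs for the real batch.
logits = torch.randn(8) + 1.0  # placeholder discriminator outputs on real images
p = update_p(p, logits)
```

The real implementation measures the heuristic over an interval of steps rather than a single batch, but the feedback structure is the same: $p$ rises while the discriminator is too confident and falls otherwise.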
For more technical details, please refer to:
- DiffAug: https://arxiv.org/pdf/2006.10738
- ADA: https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/ada-paper.pdf
Augmentation Examples
DiffAug Transformation Previews
- Colour
  
  Colour transformations adjust brightness, saturation, and contrast to simulate different lighting conditions.
- Translation
  
  Translation shifts the image along the X and Y axes. Useful when subjects appear in different positions.
- Cutout
  
  Cutout masks random regions of the image, helping the model learn to focus on different parts of the image.
ADA Transformation Previews
- Blit (b)
  
  Pixel blitting includes flips, 90° interval rotations, and translation operations.
- Geometric (g)
  
  General geometric transformations include zoom, rotation (up to 360°), and stretching/squishing.
- Colour (c)
  
  Colour transformations simulate different lighting through brightness, contrast, and saturation changes.
- Filter (f)
  
  Image-space filtering includes sharpening and blurring operations to adjust textures.
- Noise (n) & Cutout (c)
  
  Image-space corruptions include adding noise and masking random regions with patches.