Mixup Deep Dive
What is Mixup Data Augmentation really?
Intro
In this post we are going to dive into what Mixup is. Mixup is a powerful data augmentation technique and a very helpful tool to have in your toolbox, especially when you don't have enough data and are overfitting.
The goal of this post is not to show you the intricacies of training a model using mixup - that will be reserved for a future post. The goal of this post is to communicate an intuitive understanding of what mixup is and why it works. If you don't know what the tool is, it's impossible to have good intuition on how and when to use it.
We will be using the Pets Dataset to demonstrate this.
Bonus Challenge: As you go through each step, think about what other kinds of data you could apply these concepts to. Could you apply these transformations to NLP embeddings? Could you apply them to tabular data?
from fastai2.data.external import *
from fastai2.vision.all import *
from PIL import Image
import matplotlib.pyplot as plt
from pylab import rcParams
from functools import partial,update_wrapper
seed = 42
# Download the dataset and get its path
path = untar_data(URLs.PETS) #Sample dataset from fastai2
path.ls()
def plot_images(imgs):
    rcParams['figure.figsize'] = 10, 20
    imgs_len = len(imgs)
    for x in range(imgs_len):
        plt.subplot(1, imgs_len, x+1)
        plt.imshow(imgs[x])
pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=seed),
    get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(min_scale=0.9, size=224)
)
dls = pets.dataloaders(path/"images")
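Before we dig in, it's worth eyeballing a batch to confirm the images and labels line up. fastai2's built-in show_batch does this for us:

dls.show_batch(max_n=4)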
Mixup Explained
So what is Mixup really? To understand that, we are first going to look at a couple of pictures and see what a neural network would normally be given, then take those same images and apply mixup to them. To say that another way, we want to understand the inputs and the outputs: our xs and our ys.
x: No Mixup
Let's use 2 images as an example. I have plotted them below.
im1 = tensor(Image.open((path/'images').ls()[8]).resize((500,371))).float()/255
im2 = tensor(Image.open((path/'images').ls()[6]).resize((500,371))).float()/255
plot_images([im1,im2])
Great, so the inputs are the pictures. What are the outputs? Well, the output for each image is its breed. Let's see what breed each one is.
(path/'images').ls()[8],(path/'images').ls()[6]
Ok, we can see in the file names that the dog is a leonberger and the cat is a Ragdoll. Now we need to translate that into the one-hot encoded vectors our model will try to predict. Looking at dls.vocab gives us all the class names.
dls.vocab
Let's define y for these 2 images. In a normal scenario, we have 1 column per class. When looking at the vocab above we saw that there were 37 classes. All of them will be 0 except the target.
Let's start by figuring out which column is the target (ie leonberger and ragdoll). Then we just need a tensor of length 37 that is all zeros except that position which will be a 1.
list(dls.vocab).index('leonberger'),list(dls.vocab).index('Ragdoll')
# 37 classes long, all 0 except position 25 which represents leonberger and is 1
y_leonberger = tensor([0]*25+[1]+[0]*(37-26))
# 37 classes long, all 0 except position 8 which represents Ragdoll and is 1
y_Ragdoll = tensor([0]*8+[1]+[0]*(37-9))
print(y_leonberger)
print(y_Ragdoll)
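As a side note, we don't have to count zeros by hand. PyTorch's F.one_hot helper builds the same vectors from the class indices we looked up above (a small sketch using the indices 25 and 8 from the vocab):

import torch.nn.functional as F
# Same one-hot targets, built from the vocab indices instead of by hand
y_leonberger = F.one_hot(tensor(25), num_classes=37)
y_Ragdoll = F.one_hot(tensor(8), num_classes=37)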
Great! We have the images that go in and the outputs we want to predict. This is what a normal neural network would try to predict. Now let's see what's different if we use these same 2 images with the Mixup data augmentation.
x: Yes Mixup
For the images, we are going to apply an augmentation. What Mixup does is really mix 2 images together; the name is exactly what it sounds like. Let's apply mixup to an image and see what I mean.
Let's take a mix of the 2 images. We will take 60% of the first image and 40% of the second image and plot the result. We do this by blending the actual pixel values.
For example, pixel 1 of image 1 times 0.6, plus pixel 1 of image 2 times 0.4, equals pixel 1 of the new image. Take a look at the third image below and you can see it really does have a bit of each image in it.
im_mixup = im1*.6+im2*.4
plot_images([im1,im2,im_mixup])
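One detail worth knowing: in practice the mixing weight isn't fixed at 0.6. The original mixup paper samples it from a Beta(alpha, alpha) distribution for every mix, so most blends end up close to one image or the other. Here is a minimal sketch of that sampling (the variable names are mine; alpha=0.4 mirrors fastai2's MixUp default):

from torch.distributions.beta import Beta
# lam ~ Beta(alpha, alpha): usually close to 0 or 1, so most mixes stay recognizable
alpha = 0.4
lam = Beta(tensor(alpha), tensor(alpha)).sample()
im_mixup_rand = lam*im1 + (1-lam)*im2
plot_images([im1, im2, im_mixup_rand])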
y: Yes Mixup
So now we have our new augmented image with mixup. Clearly it's not really fair to call it 100% of either class. In fact it's 60% of one class and 40% of the other. So how does our Y work?
Well, we already have our ys for when they are 100% of either class, so let's just take 60% of one + 40% of the other, exactly like we did for our images. That should give us an appropriate label.
# 37 classes long, all 0 except position 25 which represents leonberger and is 1
y_leonberger = tensor([0]*25+[1]+[0]*(37-26))
# 37 classes long, all 0 except position 8 which represents Ragdoll and is 1
y_Ragdoll = tensor([0]*8+[1]+[0]*(37-9))
y_mixup = y_leonberger*.6+y_Ragdoll*.4
print(y_leonberger)
print(y_Ragdoll)
print(y_mixup)
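Putting the two halves together, the whole trick is just a pair of weighted sums with the same weight. Here is a tiny sketch of that idea (the function name mixup_pair and the fixed weight of 0.6 are mine, for illustration only; this is not fastai's actual implementation):

def mixup_pair(x1, y1, x2, y2, lam):
    # Blend the inputs and the one-hot targets with the same weight lam
    return lam*x1 + (1-lam)*x2, lam*y1 + (1-lam)*y2

x_mix, y_mix = mixup_pair(im1, y_leonberger, im2, y_Ragdoll, 0.6)

In fastai2 you don't have to wire any of this up yourself: passing the MixUp callback to your learner applies this blending to every training batch.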
learn = cnn_learner(dls,resnet34,cbs=MixUp)