This post will walk through how to use soft labeling in fastai, and demonstrate how it helps improve training and your metrics when labels are noisy.

This post was inspired by a 1st place Kaggle submission (not mine), so we know it's a good idea! The repo for that solution is here, written in PyTorch Lightning. This post will use fastai.

Let's get started!

Imports

from fastai.vision.all import *
from sklearn.model_selection import StratifiedKFold
from numpy.random import default_rng

path = untar_data(URLs.IMAGEWOOF)

Get Noisy Data

I am using the noisy datasets repo, which was hugely inspired by the Noisy Imagenette repository, to get noisy labels for the Imagewoof dataset.

First we get the noisy Imagewoof CSV, then use it to build the dataloaders.

#this code is taken from the noisy imagenette github repo linked above with slight modifications
def get_dls(size, woof, pct_noise, bs, splitter=ColSplitter()):
    # `woof` is kept for compatibility with the original repo's signature; this version always loads Imagewoof
    path = untar_data(URLs.IMAGEWOOF)
    df = pd.read_csv('https://raw.githubusercontent.com/Isaac-Flath/noisy_datasets/main/noisy_imagewoof.csv')
    df = df.loc[df.is_valid==False] # keep only training rows; the splitter passed in defines the validation set
    batch_tfms = [Normalize.from_stats(*imagenet_stats)]
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=splitter,
                       get_x=ColReader('path', pref=path),
                       get_y=ColReader(f'noisy_labels_{pct_noise}'), # pick the label column for the chosen noise level
                       item_tfms=[RandomResizedCrop(size, min_scale=0.35), FlipItem(0.5)],
                       batch_tfms=batch_tfms)
    return dblock.dataloaders(df, bs=bs)

dls = get_dls(224,woof=True,pct_noise=5,bs=16)
dls.show_batch()
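
Before training, it's worth sanity-checking how noisy the labels actually are. This quick check is my own addition (not from the original post); it assumes the truth and noisy_labels_5 columns we'll see in the CSV below.

# roughly 5% of training labels should be flipped at the 5% noise level
noise_df = pd.read_csv('https://raw.githubusercontent.com/Isaac-Flath/noisy_datasets/main/noisy_imagewoof.csv')
noise_train = noise_df.loc[noise_df.is_valid==False]
print((noise_train.truth != noise_train.noisy_labels_5).mean())  # expect ~0.05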

Create Cross-Folds, Train, and Predict

The reason I am doing cross-folds is to get predicted labels on the training set. Because each model predicts only on its held-out fold, every training example gets a prediction from a model that never saw its label (out-of-fold predictions).

Note: I am doing this with 2 folds, but you may want to use 5 or more folds.
This cross-fold code was mostly supplied by Zach Mueller, with minor modifications by me for this dataset and tutorial. There is also a tutorial he wrote with more details here.
df = pd.read_csv('https://raw.githubusercontent.com/Isaac-Flath/noisy_datasets/main/noisy_imagewoof.csv')
train_df = df.loc[df.is_valid==False]
df.head(3)
  path                                  truth      noisy_labels_1  noisy_labels_5  noisy_labels_25  noisy_labels_50  is_valid
0 train/n02111889/n02111889_5826.JPEG   n02111889  n02111889       n02111889       n02111889        n02111889        False
1 train/n02111889/n02111889_1944.JPEG   n02111889  n02111889       n02111889       n02111889        n02086240        False
2 train/n02111889/n02111889_17657.JPEG  n02111889  n02111889       n02111889       n02111889        n02111889        False
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
splits, preds, targs, preds_c = [],[],[],[]
items = pd.DataFrame(columns = ['path', 'noisy_labels_1', 'noisy_labels_5', 'noisy_labels_25','noisy_labels_50', 'is_valid'])

for _, val_idx in skf.split(train_df.path,train_df.noisy_labels_5):
    # use this fold's held-out indices as the validation set
    splitter = IndexSplitter(val_idx)
    splits.append(val_idx)

    dls = get_dls(224,woof=True,pct_noise=5,bs=16,splitter=splitter)

    learn = cnn_learner(dls,resnet18,metrics=[accuracy,RocAuc()])
    learn.fine_tune(10,reset_opt=True)

    # store out-of-fold predictions, targets, and decoded predictions
    p, t, c = learn.get_preds(ds_idx=1,with_decoded=True)
    preds.append(p); targs.append(t); preds_c.append(c)
    items = pd.concat([items,dls.valid.items])
epoch train_loss valid_loss accuracy roc_auc_score time
0 1.036113 0.780265 0.841569 0.963737 00:49
epoch train_loss valid_loss accuracy roc_auc_score time
0 0.820277 0.695468 0.868823 0.967281 00:58
1 0.912045 0.729069 0.846444 0.965334 01:01
2 0.728870 0.716678 0.848659 0.964523 00:59
3 0.640925 0.717469 0.848659 0.964562 00:59
4 0.620054 0.712924 0.839575 0.963071 01:00
5 0.502611 0.703821 0.850210 0.964302 00:59
6 0.352605 0.727077 0.858631 0.965233 00:59
7 0.304586 0.729460 0.864392 0.964620 00:59
8 0.249052 0.723166 0.858631 0.965410 00:56
9 0.180612 0.732588 0.859739 0.965431 00:52
epoch train_loss valid_loss accuracy roc_auc_score time
0 1.166885 0.712075 0.841090 0.969210 00:32
epoch train_loss valid_loss accuracy roc_auc_score time
0 0.853930 0.630505 0.868351 0.971000 00:42
1 0.841145 0.655725 0.860372 0.968649 00:42
2 0.834104 0.692984 0.839539 0.967488 00:42
3 0.658782 0.686378 0.854167 0.968132 00:42
4 0.606825 0.703417 0.846853 0.966847 00:42
5 0.503965 0.687867 0.843528 0.966903 00:42
6 0.409177 0.686660 0.857713 0.967467 00:41
7 0.340657 0.690113 0.858821 0.967676 00:42
8 0.237057 0.685794 0.866356 0.967809 00:42
9 0.222322 0.681753 0.865470 0.968221 00:42

Look at Predictions

Let's throw it all in a dataframe so we can look at what we have more easily. First, let's break out our different pieces of information.

imgs = L(o for o in items.path.values)
y_true = L(o for o in items.noisy_labels_5.values) # labels from the dataset
y_targ = L(dls.vocab[o] for o in torch.cat(targs)) # labels as returned by get_preds
y_pred = L(dls.vocab[o] for o in torch.cat(preds_c)) # predicted labels, or "pseudo labels"
p_max = torch.cat(preds).max(dim=1)[0] # max model score for each row

We can double-check we are matching things up correctly by checking that the labels from the predictions line up with the original data. Throwing in some simple assert statements is nice because it takes no time, and it will let you know if you screw something up later as you are tinkering.

assert (y_true == y_targ) # test we matched these up correctly

Put it in a dataframe and see what we have.

res = pd.DataFrame({'imgs':imgs,'y_true':y_true,'y_pred':y_pred}).set_index('imgs')
print(res.shape)
print(df.shape)
res.sample(5)
(9025, 2)
(12954, 7)
                                      y_true     y_pred
imgs
train/n02086240/n02086240_6323.JPEG   n02086240  n02086240
train/n02093754/n02093754_696.JPEG    n02093754  n02093754
train/n02089973/n02089973_12157.JPEG  n02089973  n02089973
train/n02096294/n02096294_4188.JPEG   n02096294  n02096294
train/n02086240/n02086240_6595.JPEG   n02086240  n02115641
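
One more cheap check worth adding (my suggestion, not from the original post): the callback below will index this dataframe by image path, so the index should be unique and should cover the entire training set.

assert res.index.is_unique        # each training image appears in exactly one fold's validation set
assert len(res) == len(train_df)  # together the folds cover the whole training set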

Soft Labeling Setup

Now we have all the data we need to train a model with soft labels. To recap, we have:

  1. Dataloaders with noisy labels
  2. Dataframe with img path, y_true, and y_pred (pseudo labels we generated in the cross-fold above)

Now we will need to convert things to one-hot encoding, so let's do that for our dataframe.

res = pd.get_dummies(res,columns=['y_true','y_pred'])
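
As a quick, illustrative sanity check (my addition, not from the original post), the resulting frame should have one y_true_* and one y_pred_* column per class, each containing only 0s and 1s:

true_cols = [c for c in res.columns if c.startswith('y_true')]
pred_cols = [c for c in res.columns if c.startswith('y_pred')]
assert len(true_cols) == len(pred_cols) == dls.c      # one column per class
assert set(res[true_cols].values.ravel()) <= {0, 1}   # hard 0/1 labels for now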

Now, let's change the loss function and metric to support one-hot encoded targets.

class CrossEntropyLossOneHot(nn.Module):
    "Cross entropy loss that accepts one-hot (or soft) targets instead of class indices"
    def __init__(self):
        super(CrossEntropyLossOneHot, self).__init__()
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, preds, labels):
        return torch.mean(torch.sum(-labels * self.log_softmax(preds), -1))

def accuracy(inp, targ, axis=-1):
    "Compute accuracy when both `inp` and `targ` are bs * n_classes"
    pred,targ = flatten_check(inp.argmax(dim=axis), targ.argmax(dim=axis))
    return (pred == targ).float().mean()
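
As a quick sanity check (my addition, not from the original post): on hard one-hot targets this loss should agree with nn.CrossEntropyLoss, since the sum picks out exactly the log-probability of the true class.

preds = torch.randn(4, 10)         # batch of 4, 10 classes
hard = torch.randint(0, 10, (4,))  # integer class labels
onehot = torch.eye(10)[hard]       # same labels, one-hot encoded
assert torch.isclose(CrossEntropyLossOneHot()(preds, onehot),
                     nn.CrossEntropyLoss()(preds, hard))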

Soft Labeling Callback

Finally, let's write the callback that does the soft labeling.

There are a few components to this. To put in plain English what is happening below in each section:

  • before_train and before_validate: These grab the list of images for the entire dataloader. We don't need to do this every batch, so it fits well here.
  • before_batch: This filters the image list down to only the images in the current batch. From there, it looks up the one-hot encoded labels, and if it's a training batch it blends them as 0.7 * y_true + 0.3 * y_pred. We don't smooth the validation set, because we want the metrics to reflect what we would see on a separate test set. This blending is the core of soft labeling.

The intuition is that the labels the cross-fold models got wrong above have a higher chance of simply being incorrect labels, so we smooth those out to penalize incorrect classifications less. A concrete example of the blend follows below.
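
Here is an illustrative example of the blend with the default 0.7/0.3 weights (toy numbers, not from the original post):

import numpy as np
y_true = np.array([0., 1., 0.])  # one-hot noisy label: class 1
y_pred = np.array([0., 0., 1.])  # one-hot pseudo label: the cross-fold model said class 2
soft = 0.7*y_true + 0.3*y_pred   # -> [0.0, 0.7, 0.3]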

Note: You can set thresholds for soft labeling to smooth more or less based on how confident your predicted labels are. I don't have that built into this callback, but it is something you can experiment with!
This callback was a collaboration:
  • Zach Mueller got me started with the callback system in fastai, particularly around dataloader batch indexing.
  • Kevin H. and I worked on this together for a joint project; we were both running experiments to get it to work correctly and perform well.
class SoftLabelCB(Callback):
    def __init__(self, df_preds, y_true_weight=0.7):
        '''df_preds is a pandas dataframe whose index is image paths.
        It must have y_true and y_pred one-hot encoded columns (ie y_true_0, y_true_1)
        '''
        self.y_true_weight = y_true_weight
        self.y_pred_weight = 1 - y_true_weight
        self.df = df_preds

    def get_imgs_list(self):
        # grab the image paths for the whole dataloader once per train/validate phase
        if isinstance(self.dl.items, pd.DataFrame): self.imgs_list = L(o for o in self.dl.items.iloc[:,0].values)
        if is_listy(self.dl.items): self.imgs_list = L(self.dl.items)

    def before_train(self): self.get_imgs_list()

    def before_validate(self): self.get_imgs_list()

    def before_batch(self):
        # get the images' names for the current batch (uses the dataloader's internal shuffled indices)
        imgs = self.imgs_list[self.dl._DataLoader__idxs[self.iter*self.dl.bs:self.iter*self.dl.bs+self.dl.bs]]
        # get soft labels
        df = self.df
        soft_labels = df.loc[imgs,df.columns.str.startswith('y_true')].values

        if self.training:
            # blend the dataset labels with the pseudo labels; validation labels are left untouched
            soft_labels = soft_labels*self.y_true_weight + df.loc[imgs,df.columns.str.startswith('y_pred')].values*self.y_pred_weight
        # replace the batch's targets, using the dataloaders' device rather than hard-coding .cuda()
        self.learn.yb = (Tensor(soft_labels).to(self.dls.device),)

Train the Model and Results

Then we put the callback, our one-hot metric, and the loss function into a learner and fine-tune it. As you can see, we get a small bump in both accuracy and ROC AUC score.

This trains on the same split as the last of the cross-folds, so it's a fair comparison.

  • Without soft labeling: max accuracy was 86.8%, which was hit very early on and then saw no improvement for 8 more epochs.
  • With soft labeling: max accuracy was 88%, over 1% higher than without soft labeling. In addition, the last 4 epochs showed epoch-over-epoch improvements to both the metric and the loss, with the last epoch having the highest accuracy. We could almost certainly train longer to see even better results.
learn = cnn_learner(dls,resnet18,metrics=[accuracy,RocAuc()],loss_func=CrossEntropyLossOneHot(),cbs=SoftLabelCB(res))
learn.fine_tune(10,reset_opt=True)
epoch train_loss valid_loss accuracy roc_auc_score time
0 1.045338 0.761228 0.843528 0.968725 00:33
epoch train_loss valid_loss accuracy roc_auc_score time
0 0.781899 0.655288 0.869016 0.970987 00:42
1 0.777672 0.666892 0.860594 0.970698 00:43
2 0.734413 0.635256 0.852837 0.969718 00:43
3 0.611115 0.629129 0.860816 0.969812 00:43
4 0.575180 0.618686 0.863475 0.970406 00:43
5 0.464586 0.602176 0.871897 0.969895 00:43
6 0.433750 0.608785 0.867021 0.971009 00:43
7 0.414037 0.597265 0.873005 0.970817 00:45
8 0.344089 0.597751 0.875665 0.971109 00:45
9 0.314052 0.582850 0.880541 0.971416 00:45