Purpose

Many Kaggle competitions are adopting a format where you must do portions of your work (particularly inference) in a Kaggle kernel with internet disabled. When people first start with this format, there is usually some frustration figuring out how to work around the restriction. This guide is my resource and reference sheet for setting all of it up, and I hope it helps others as well.

None of what I am doing is novel or a method I created - I am simply taking pieces from a lot of different places and consolidating them into a guide formatted the way I like. I will credit sources as they come up, so please check out the original content for extra details!

There are two main things you will need to adjust due to the no-internet stipulation:

  • Downloading and using pretrained models
  • Installing new packages and libraries

Setup

In this post I will assume that you already have the Kaggle API installed. If you don't have it installed yet, no problem!

Go to this repo and follow the instructions in the README. You will need to complete the Installation and API Credentials sections.
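For reference, the short version of that setup looks something like this (a sketch: it assumes you've already downloaded kaggle.json from your Kaggle account page, and the Downloads path is a placeholder):

pip install kaggle
mkdir -p ~/.kaggle
mv ~/Downloads/kaggle.json ~/.kaggle/kaggle.json   # wherever your browser saved the token
chmod 600 ~/.kaggle/kaggle.json                    # keep your API token private
kaggle datasets list                               # quick check that the CLI is working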

Datasets

The key to all of this is to use datasets. Kaggle datasets are accessible to Kaggle kernels even with internet disabled. The most common use is exactly what you'd expect: uploading data for machine learning. But they can also hold anything else you need access to in that environment, such as model weights or libraries.
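For orientation: inside a kernel, each dataset you attach is mounted read-only under /kaggle/input. A minimal sketch for poking around (your_dataset is a placeholder name):

import os

# Each attached dataset shows up as a folder: /kaggle/input/<dataset-name>
print(os.listdir('/kaggle/input'))
print(os.listdir('/kaggle/input/your_dataset'))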

The instructions in this section are everything you need to create and update a dataset.

How do I create a dataset?

  1. Create a folder containing the files you want to upload
  2. Run kaggle datasets init -p /path/to/dataset to generate a metadata file
  3. Add your dataset's metadata to the generated file, datapackage.json (newer versions of the Kaggle API name it dataset-metadata.json)
  4. Run kaggle datasets create -p /path/to/dataset to create the dataset
    Note: Credit for these instructions goes to Meg Risdal’s post
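Put together, the whole creation flow looks roughly like this (a sketch: your_dataset, model.pkl, and your-username are all placeholders):

mkdir your_dataset
cp model.pkl your_dataset/                # whatever files you want to upload
kaggle datasets init -p ./your_dataset
# edit the generated metadata file so it has at least a title and an id, e.g.:
#   "title": "Your Dataset",
#   "id": "your-username/your-dataset"
kaggle datasets create -p ./your_dataset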

How do I update a dataset?

  1. Run kaggle datasets init -p /path/to/dataset to generate a metadata file if you don't already have one
  2. Make sure the id field in datapackage.json points to your dataset
  3. Run kaggle datasets version -p /path/to/dataset -m "Your message here"
    Note: Credit for these instructions goes to Meg Risdal’s post
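For example, pushing a new set of model weights is just (new_model.pkl and the folder name are placeholders):

cp new_model.pkl your_dataset/            # add or overwrite the files for the new version
kaggle datasets version -p ./your_dataset -m "Add retrained model weights"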

Getting and Using the Files

Now that we know how to create and update a dataset, we need to know what to put in it. How do we get a pretrained model? How do we get a library? That's what this section is for :)

Pretrained Models

Once you have your model, you can export it. Check out the docs for the specific library you are working in.

For example, an inference export in fastai looks like this:

from fastai.vision.all import *

learn = cnn_learner(dls, resnet18)        # dls is your DataLoaders
learn.fine_tune(3)
learn.export('your_dataset/model.pkl')    # path is relative to learn.path; the folder must exist

After uploading the .pkl to a dataset, it would then be loaded in the kernel using:

learn = load_learner('/kaggle/input/your_dataset/model.pkl')
dl = learn.dls.test_dl(df)                # df holds the rows you want predictions for
preds, _ = learn.get_preds(dl=dl)

Libraries

To download a library you can use pip download, which fetches the package (and its dependencies) as installable files without installing anything. If you download it to the dataset folder you created, you can update the dataset with the instructions above. We'll download fastai in this example so we can update to the latest version, but the same approach applies to libraries not installed in the kernel at all.

!pip download fastai -d ./your_dataset/

then you can install it in the Kaggle kernel like this:

!pip install -Uqq fastai --no-index --find-links=file:///kaggle/input/your_dataset/

and then import it as normal:

from fastai.vision.all import *

Note: Credit for these instructions goes to samuelepino’s Kaggle kernel
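One last sanity check I like: confirm the offline install actually picked up the version you uploaded:

import fastai

print(fastai.__version__)                 # should match the version you pip-downloaded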

Conclusion

If there's anything you do that's missing, or anything you would like added, please let me know via a comment or Twitter. My goal is for this to be a one-stop-shop guide for all the small config/setup steps Kaggle competitors need.

Follow me on Twitter if you want updates when new blog posts come out.