Kaggle Setup Guide
Getting started with Kaggle kernel submissions
Purpose
Many Kaggle competitions are adopting a format where you must do portions of your work (particularly inference) in a Kaggle kernel with internet disabled. When people first start with this format there is usually some frustration in figuring out how to work around it. This guide is my resource and reference sheet for setting all of it up, and I hope it helps others as well.
None of what I am doing is novel or a method I created - I am simply taking pieces from a lot of different places and consolidating them into a guide formatted the way I like. I will credit sources as they come up, so please check out the original content for extra details!
There are two main things you will need to adjust due to the no-internet stipulation:
- Downloading and using pretrained models
- Installing new packages and libraries
In this post I will assume that you already have the Kaggle API installed. If you don't have it installed yet, no problem!
Go to this repo and follow the instructions in the readme; you will need the Installation and API Credentials sections.
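As a quick sanity check that everything is wired up, importing the package in Python is enough - at the time of writing, the kaggle package authenticates against ~/.kaggle/kaggle.json when imported and raises an error if your credentials are missing:
# Importing the kaggle package triggers authentication immediately;
# it raises an error if ~/.kaggle/kaggle.json is missing or malformed.
import kaggle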
The key to all of this is datasets. Kaggle datasets are accessible to Kaggle kernels even without internet. The most common use is exactly what you would expect: uploading data for machine learning. But datasets can also hold anything else you need access to in that environment, such as model weights or libraries.
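As a concrete illustration, every dataset attached to a kernel is mounted read-only under /kaggle/input, in a folder named after the dataset, so you can browse it like any other directory. A minimal sketch (the dataset names are whatever you attached):
from pathlib import Path

# Each attached dataset is mounted read-only under /kaggle/input,
# in a folder named after the dataset.
for dataset in Path('/kaggle/input').iterdir():
    print(dataset.name, '->', [f.name for f in dataset.iterdir()])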
The instructions in this section are everything you need to create and update a dataset.
Creating a dataset:
- Create a folder containing the files you want to upload
- Run kaggle datasets init -p /path/to/dataset to generate a metadata file
- Add your dataset's metadata to the generated file, datapackage.json (see the sketch after these lists)
- Run kaggle datasets create -p /path/to/dataset to create the dataset
Updating an existing dataset:
- Run kaggle datasets init -p /path/to/dataset to generate a metadata file if you don't already have one
- Make sure the id field in datapackage.json points to your dataset
- Run kaggle datasets version -p /path/to/dataset -m "Your message here" to push a new version
Note: Credit for these instructions goes to Meg Risdal’s post
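For reference, here is a minimal sketch of what the metadata file can contain, written out with Python instead of edited by hand. The title and id values are placeholders (the id must be your Kaggle username plus the dataset slug), and the exact filename kaggle datasets init generates can differ between API versions, so adjust accordingly:
import json

# Minimal dataset metadata - placeholder values, replace before uploading.
metadata = {
    'title': 'Your Dataset Title',
    'id': 'your-username/your-dataset',
    'licenses': [{'name': 'CC0-1.0'}],
}

# Write it next to the files you plan to upload.
with open('/path/to/dataset/datapackage.json', 'w') as f:
    json.dump(metadata, f, indent=2)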
Pretrained Models
Once you have your model, you can export it. Check out the docs for the specific library you are working in.
For example, an inference export in fastai looks like this:
from fastai.vision.all import *

learn = cnn_learner(dls, resnet18)  # dls is your DataLoaders
learn.fine_tune(3)
learn.export('your_dataset/model.pkl')  # export into the folder you upload as a dataset
In the Kaggle kernel, the model would then be imported and used like this:
learn = load_learner('/kaggle/input/your_dataset/model.pkl')
dl = learn.dls.test_dl(df)  # df holds the test data
preds, _ = learn.get_preds(dl=dl)  # get_preds returns (predictions, targets)
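One thing that's easy to forget: the exported .pkl only becomes available to the offline kernel once it is actually in a dataset, so after exporting, push it with kaggle datasets version as described above.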
Libraries
To download a library you can use pip download. If you download it into the dataset folder you created, you can update the dataset with the instructions above. We'll download fastai in this example so we can upgrade to the latest version, but the same approach applies to libraries that aren't installed in the kernel at all:
!pip download fastai -d ./your_dataset/
Then you can install it in the Kaggle kernel like this:
!pip install -Uqq fastai --no-index --find-links=file:///kaggle/input/your_dataset/
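The two flags are doing the heavy lifting here: --no-index stops pip from trying to reach PyPI (which would fail with internet disabled), and --find-links points it at the local wheel files in your dataset instead.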
and then import it as normal:
from fastai.vision.all import *
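To confirm the kernel picked up the version you installed from the dataset rather than the preinstalled one, a quick check is:
import fastai

# Should print the version you downloaded into the dataset,
# not the version preinstalled in the Kaggle image.
print(fastai.__version__)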
Conclusion
If there's anything you do that's missing here, or anything you would like added, please let me know via a comment or Twitter. My goal is a one-stop-shop guide for all the small config/setup stuff Kaggle competitors need.
Follow me on Twitter if you want updates when new blog posts go up.