You're reading from The Kaggle Workbook: Self-learning exercises and valuable insights for Kaggle data science competitions, by Konrad Banachewicz and Luca Massaron (Packt, February 2023, ISBN-13 9781804611210).

Learning from top solutions

In this section, we gather aspects of the top solutions that could allow us to rise above the level of the baseline solution. Keep in mind that the leaderboards (both public and private) in this competition were quite tight, due to a combination of two factors:

  • Noisy data: it was easy to reach 0.89 accuracy by correctly classifying a large part of the data, after which each additional correct prediction yielded only a tiny gain on the leaderboard
  • Limited size of the data

Pretraining

The first and most obvious remedy to the issue of limited data size was pretraining: using more data. Pretraining a deep learning model on more data can be beneficial because it can help the model learn better representations of the data, which can in turn improve the performance of the model on downstream tasks. When a deep learning model is trained on a large dataset, it can learn to extract useful features from the data that are relevant to the task at hand. This can provide a strong foundation for the model, allowing it to learn more effectively when it is fine-tuned on a smaller, specific dataset.

Additionally, pretraining on a large dataset can help the model to generalize better to new, unseen data. Because the model has seen a wide range of examples during pretraining, it can better adapt to new data that may be different from the training data in some way. This can be especially important when working with deep learning models, which can have a large number of parameters and can be difficult to train effectively from scratch.

An earlier edition of the Cassava competition had been held a year before: https://www.kaggle.com/competitions/cassava-disease/overview.

With minimal adjustments, the data from the 2019 edition could be leveraged in the context of the current one, and several competitors addressed the topic.
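To make the idea concrete, here is a minimal sketch of the pretrain-then-fine-tune approach in PyTorch with the timm library; the backbone choice, data loaders, epoch counts, and learning rates are illustrative assumptions rather than any team's exact pipeline.

```python
# A minimal sketch of pretraining on the 2019 Cassava data and fine-tuning on
# the 2021 data. The data loaders (loader_2019, loader_2021) are assumed to
# exist; hyperparameters are purely illustrative.
import timm
import torch
from torch import nn, optim

def train_stage(model, loader, epochs, lr, device="cuda"):
    """A bare-bones training loop shared by the pretraining and fine-tuning stages."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

# The 2019 and 2021 editions cover the same five classes, so one head suffices.
model = timm.create_model("resnext50_32x4d", pretrained=True, num_classes=5)

# Stage 1: pretrain on the 2019 Cassava data.
# model = train_stage(model, loader_2019, epochs=5, lr=1e-4)

# Stage 2: fine-tune on the 2021 competition data with a lower learning rate.
# model = train_stage(model, loader_2021, epochs=10, lr=1e-5)
```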

Test time augmentation

The idea behind Test Time Augmentation (TTA) is to apply different transformations to the test image: rotations, flipping, and translations. This creates a few different versions of the test image, and we generate a prediction for each of them. The resulting class probabilities are then averaged to get a more confident answer. An excellent demonstration of this technique is given in a notebook by Andrew Khael: https://www.kaggle.com/code/andrewkh/test-time-augmentation-tta-worth-it.
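As an illustration, a simple TTA routine in PyTorch could look like the sketch below; the particular augmentations are assumptions and not the exact set used by the notebook or the top teams.

```python
# A minimal TTA sketch: run the model on several transformed views of a test
# image and average the resulting class probabilities.
import torch
import torchvision.transforms.functional as TF

def predict_with_tta(model, image, device="cuda"):
    """image: a normalized 3xHxW tensor, prepared exactly as during training."""
    model.eval()
    views = [
        image,                          # original view
        TF.hflip(image),                # horizontal flip
        TF.vflip(image),                # vertical flip
        torch.rot90(image, 1, [1, 2]),  # 90-degree rotation
    ]
    with torch.no_grad():
        batch = torch.stack(views).to(device)
        probs = torch.softmax(model(batch), dim=1)
    # Average the class probabilities across the augmented views.
    return probs.mean(dim=0)
```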

TTA was used extensively by the top solutions in the Cassava competition, an excellent example being the top three private leaderboard results: https://www.kaggle.com/competitions/cassava-leaf-disease-classification/discussion/221150.

Transformers

While more widely known architectures like ResNeXt and EfficientNet were used a lot in the course of the competition, it was the addition of more novel ones that provided the extra edge to many competitors yearning for progress in a tightly packed leaderboard. Transformers emerged in 2017 as a revolutionary architecture for NLP (if somehow you missed the paper that started it all, here it is: https://arxiv.org/abs/1706.03762) and were such a spectacular success that, inevitably, many people started wondering if they could be applied to other modalities as well – vision being an obvious candidate. The aptly named Vision Transformer (ViT) made one of its first appearances in a Kaggle competition in the Cassava contest.

An excellent tutorial for ViT has been made public: https://www.kaggle.com/code/abhinand05/vision-transformer-vit-tutorial-baseline.
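For instance, a ViT backbone can be slotted into the same training pipeline as the convolutional models; the snippet below is a minimal sketch assuming the timm library, with the variant and input resolution chosen for illustration.

```python
# Create a pretrained Vision Transformer with a 5-class head for Cassava.
import timm

vit = timm.create_model("vit_base_patch16_384", pretrained=True, num_classes=5)
# From here on, vit can be fine-tuned with the same loop used for the CNN
# backbones (e.g., the train_stage sketch in the Pretraining section).
```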

Ensembling

Ensembling is very popular on Kaggle (see Chapter 9 of The Kaggle Book for a more elaborate description) and the Cassava competition was no exception. As it turned out, combining diverse architectures by averaging their class probabilities was very beneficial: EfficientNet, ResNeXt, and ViT are sufficiently different from one another that their predictions complement each other. When building a machine learning ensemble, it is useful to combine models that differ from one another, because this can help improve the overall performance of the ensemble.

Ensembling is the process of combining the predictions of multiple models to create a more accurate prediction. By combining models that have different strengths and weaknesses, the ensemble can take advantage of the strengths of each individual model to make more accurate predictions.

For example, if the individual models in an ensemble are all based on the same type of algorithm, they may all make similar errors on certain types of data. By combining models that use different algorithms, the ensemble can potentially correct for the errors made by each individual model, leading to better overall performance. Additionally, by combining models that have been trained on different data or using different parameters, the ensemble can potentially capture more of the underlying variation in the data, leading to more accurate predictions.
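As a concrete illustration of the probability averaging mentioned above, the following sketch blends the outputs of several models; the variable names and weights are assumptions.

```python
# A minimal probability-averaging ensemble, assuming each model has already
# produced an (n_samples, n_classes) array of class probabilities.
import numpy as np

def blend_probabilities(prob_list, weights=None):
    """Return (predicted labels, blended probabilities) from a list of probability arrays."""
    probs = np.stack(prob_list)                     # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    blended = np.tensordot(weights, probs, axes=1)  # weighted average over models
    return blended.argmax(axis=1), blended

# Example usage with the three architecture families mentioned above:
# labels, blended = blend_probabilities(
#     [probs_efficientnet, probs_resnext, probs_vit], weights=[1, 1, 1]
# )
```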

Another important approach was stacking, i.e., using models in two stages. First, we construct predictions from multiple diverse models; these are subsequently used as input for a second-level model: https://www.kaggle.com/competitions/cassava-leaf-disease-classification/discussion/220751.
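A minimal sketch of this two-stage idea, assuming scikit-learn and out-of-fold predictions from the first-level models, could look as follows; the choice of logistic regression as the meta-model and the variable names are assumptions.

```python
# Stacking sketch: out-of-fold class probabilities from the first-level models
# become the features of a second-level (meta) model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stack(oof_preds, test_preds, y_train):
    """oof_preds/test_preds: lists of (n_samples, n_classes) probability arrays."""
    X_train = np.hstack(oof_preds)    # first-level predictions as features
    X_test = np.hstack(test_preds)
    meta = LogisticRegression(max_iter=1000)
    meta.fit(X_train, y_train)        # fit the second-level model
    return meta.predict_proba(X_test)

# final_probs = stack([oof_effnet, oof_resnext, oof_vit],
#                     [test_effnet, test_resnext, test_vit], y_train)
```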

The winning solution involved a different approach (with fewer models in the final blend), but relied on the same core logic: https://www.kaggle.com/competitions/cassava-leaf-disease-classification/discussion/221957.
