What’s your experimentation process look like when fine-tuning a pre-trained computer vision model?
How is that different from your process when doing architecture search?
Would love to hear from @OBaratz @kausthubk @richmond @lu.riera @katzav.avi @msminhas @mjcullan @salmankhaliq22 @ThomIves
@kbaheti Would love to hear your thoughts or process on how you approach experimentation, whether in CV or more general deep learning.
What’s your experimentation process look like when fine-tuning a pre-trained computer vision model?
For me, traditionally, fine-tuning meant keeping the initial pretrained layers and then training the later layers. In recent times, I have also often used it as a means to get the learned features and then do something with those features. That has proven to be a super powerful tool for me!
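To make the two approaches concrete, here's a minimal PyTorch sketch (the torchvision ResNet-50 backbone, the layer split, and the 10-class head are illustrative placeholders, not tied to any project mentioned above):

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained backbone (requires a reasonably recent torchvision)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# (a) Classic fine-tuning: freeze the early layers, train only the later ones.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):  # keep last block + head trainable
        param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # hypothetical 10-class task

# (b) Feature extraction: drop the head and reuse the pooled features elsewhere.
feature_extractor = nn.Sequential(*list(model.children())[:-1])
feature_extractor.eval()
with torch.no_grad():
    feats = feature_extractor(torch.randn(1, 3, 224, 224)).flatten(1)
print(feats.shape)  # (1, 2048) -- ready to feed into whatever comes next
```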
How is that different from your process when doing architecture search?
You have probably heard this a million times but I will say it again - start with a simple model! In my experience, most vision tasks perform decently well with a ResNet. However, in specific use-cases, I would look beyond that to see if I can find a model better suited to my application based on: (1) whether it is available for commercial use lol; (2) how that architecture fits into my application - if I have a text-to-image application but the architecture is a vision encoder, I could still use those learned image features to my benefit; (3) how complex it is, i.e., how long it would take to run for my entire application. I like to be agile in coding up my models and seeing what the results & runtime look like; production jobs usually need to run on a timely cadence, so it is preferable to keep that short.
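As one concrete way to do the runtime check in (3), here's a rough sketch of timing a candidate backbone before committing to it (the timm model name, batch size, and iteration counts are arbitrary placeholders):

```python
import time
import torch
import timm

# Hypothetical candidate backbone; swap in whatever architecture you're evaluating.
model = timm.create_model("resnet50", pretrained=True).eval()

x = torch.randn(8, 3, 224, 224)  # placeholder batch
with torch.no_grad():
    for _ in range(3):  # warm-up runs
        model(x)
    start = time.perf_counter()
    for _ in range(10):
        model(x)
    elapsed = (time.perf_counter() - start) / 10
print(f"~{elapsed * 1000:.1f} ms per batch of 8 on this machine")
```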
Such great insight, thank you for sharing @ZubiaMansoor!
For me, transfer learning is what I use most often.
When I’m in a Kaggle competition, my usual approach is to try a set of my favorite encoders and construct a task-specific model if needed.
TLDR:
- Using Hydra for experiment configuration
- Zero or minimal hard-coded values. The ultimate goal is to have all moving parts set by configuration files; this way you don’t edit .py code, you only change yaml files.
- TensorBoard to log metrics & hyper-parameters of each experiment
- Hydra sweeps to automate the grid search over model/loss/dataset sampling regimes. Early stopping to save time on unsuccessful attempts.
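For readers new to this setup, here is a minimal sketch of what such a config-driven entrypoint can look like (the config path/name and fields like `cfg.model` / `cfg.loss` are assumptions, not the actual project layout):

```python
# train.py -- minimal config-driven entrypoint sketch
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig, OmegaConf
from torch.utils.tensorboard import SummaryWriter


@hydra.main(config_path="configs", config_name="train", version_base=None)
def main(cfg: DictConfig) -> None:
    # Model, loss, dataset, etc. all come from YAML, so experiments differ
    # only by config overrides, never by edited .py code.
    model = instantiate(cfg.model)
    loss_fn = instantiate(cfg.loss)

    writer = SummaryWriter()  # point log_dir at the Hydra run directory if preferred
    writer.add_text("config", OmegaConf.to_yaml(cfg))
    # ... training loop goes here, logging metrics via writer.add_scalar(...) ...
    writer.close()


if __name__ == "__main__":
    main()
```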
A few examples from past challenges:
- A fully convolutional siamese segmentation network. In this architecture I used a pre-trained encoder from timm, and the siamese decoder was trained from scratch.
- A CenterNet-like detection model for small object detection. Again, the encoder was taken from timm, but the decoder & detection head were custom-built for the specific task (detection + multilabel classification + single-dimension size regression).
- A classification model for detecting whether an image has a hidden embedded message (steganography analysis). Departing from a classical CNN classification model, a bit of model surgery was performed on top of ImageNet-pretrained weights: stride 1 and no max-pool at the start, concatenation of max-pool & avg-pool, and a swish → mish activation swap (see the sketch below).
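As a rough illustration of that kind of model surgery, here is a sketch on a torchvision ResNet (the original solution used different building blocks, and this only covers the stride/max-pool and activation changes, not the pooling concatenation):

```python
import torch.nn as nn
from torchvision import models

# Illustrative backbone; any ImageNet-pretrained ResNet works the same way.
model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)

# Keep full spatial resolution early on: stride-1 stem, drop the initial max-pool.
model.conv1.stride = (1, 1)
model.maxpool = nn.Identity()

# Swap ReLU -> Mish everywhere while keeping the pretrained conv/bn weights.
def relu_to_mish(module: nn.Module) -> None:
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.Mish(inplace=True))
        else:
            relu_to_mish(child)

relu_to_mish(model)
```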
To me, the biggest need in such challenges is being able to quickly change any part of the model. These days I achieve this by using configuration files for each model architecture & Hydra to instantiate it:
```yaml
_target_: xview3.centernet.models.CenterNetUNetModel.from_config
config:
  activation: relu
  num_extra_blocks: 0
  num_channels: ${dataset.num_channels}
  encoder:
    _target_: pytorch_toolbelt.modules.encoders.timm.HRNetW32Encoder
    pretrained: True
    use_incre_features: False
  decoder:
    upsample_type: bilinear
    block_type: UnetBlock
    channels:
      - 128
      - 256
      - 256
      - 256
  segmentation:
    _target_: xview3.centernet.models.heads.UpsampleSegmentationHead
    num_classes: ${dataset.segmentation_num_classes}
    upsample_blocks: 1
    activation: relu
    dropout_rate: 0.2
```
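On the Python side, Hydra turns a config like this into the actual model object via the `_target_` fields; a minimal sketch (assuming the config group is called `model` inside the main config):

```python
from hydra.utils import instantiate

# cfg.model is the YAML above; Hydra recursively instantiates nested `_target_`s
# (encoder, segmentation head) and calls CenterNetUNetModel.from_config with them.
model = instantiate(cfg.model)
```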
This allows me to express the experimentation process in the form of a Hydra sweep run:
```
train.py -m model=seg_hrnetw32 model.config.activation=relu,elu,silu loss=bce,focal,dice
```
With all the training results logged to TensorBoard, I can start training over a weekend and come back to check the final results later, without being involved in manually running each experiment.
This is so dope! Thank you @EKhvedchenya! For @trust_level_1 Eugene is on our SuperGradients team and is a Kaggle genius (I forget your rank, is it master?). He’s an excellent resource to have around the community!
Check him out here: Eugene Khvedchenya | Grandmaster | Kaggle
It’s GM (Grandmaster) on both the Kaggle and Signate platforms. It was quite a long yet exciting experience to get there, but I’m happy to share what I’ve learned through this journey with folks here.