What's your experimentation process like?

What does your experimentation process look like when fine-tuning a pre-trained computer vision model?

How is that different from your process when doing architecture search?

Would love to hear from @OBaratz @kausthubk @richmond @lu.riera @katzav.avi @msminhas @mjcullan @salmankhaliq22 @ThomIves

@kbaheti Would love to hear your thoughts or process on how you approach experimentation, whether in CV or more general deep learning.

What does your experimentation process look like when fine-tuning a pre-trained computer vision model?

For me, fine-tuning traditionally meant keeping the initial pretrained layers frozen and training the later layers. More recently, I have also often used it as a way to extract the learned features and then do something with those features. That has proven to be a super powerful tool for me!
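Not my exact setup, but a minimal sketch of both variants in PyTorch/torchvision (the ResNet-50 choice, the frozen-layer split, and the 10-class head are assumptions for illustration):

import torch
import torch.nn as nn
from torchvision.models import resnet50

# Classic fine-tuning: freeze the early pretrained layers, train the later ones.
model = resnet50(weights="IMAGENET1K_V2")
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # new task-specific head

# Feature-extraction variant: drop the head and reuse the learned features.
feature_extractor = nn.Sequential(*list(model.children())[:-1])
with torch.no_grad():
    feats = feature_extractor(torch.randn(1, 3, 224, 224)).flatten(1)  # shape (1, 2048)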

How is that different from your process when doing architecture search?

You have probably heard this a million times, but I will say it again: start with a simple model! In my experience, most vision tasks perform decently well with a ResNet. For specific use cases, though, I would look beyond that to see if I can find a model better suited to my application based on:

  • whether it is available for commercial use lol
  • how that architecture fits into my application. If I have a text-to-image application but the architecture is a vision encoder, I could still use those learned image features to my benefit
  • how complex it is, i.e., how long it would take to run for my entire application. I like to be agile in coding up my models and seeing what the results & runtime look like; production jobs usually need to run on a timely cadence, so it is preferable to keep that short

1 Like

Such great insight, thank you for sharing @ZubiaMansoor!

1 Like

For me, transfer learning is what I use most often.

When I’m in a Kaggle competition, my usual approach is to try a set of my favorite encoders and construct a task-specific model on top of them if needed.

TLDR:

  • Using Hydra for experiment configuration
  • Zero or a minimal number of hard-coded values. The ultimate goal is to have all moving parts set by configuration files; this way you don’t write .py code, you only change YAML files.
  • TensorBoard to log the metrics & hyper-parameters of each experiment (a minimal logging sketch follows this list)
  • Hydra sweeps to automate the grid search over model/loss/dataset sampling regimes, plus early stopping to save time on unsuccessful attempts.
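For the TensorBoard point above, a minimal logging sketch (the run name, hyper-parameters, and metric values are placeholders, not a real experiment):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/seg_hrnetw32_relu_bce")  # placeholder run name

hparams = {"model": "seg_hrnetw32", "activation": "relu", "loss": "bce", "lr": 3e-4}
for step, loss in enumerate([0.9, 0.7, 0.55]):  # stand-in for a real training loop
    writer.add_scalar("loss/train", loss, step)

# add_hparams ties a run's hyper-parameters to its final metrics,
# so sweep results can be compared side by side in the HPARAMS tab.
writer.add_hparams(hparams, {"metric/val_iou": 0.71})
writer.close()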

A few examples from past challenges:

  • A fully convolutional siamese segmentation network. Here I used a pre-trained encoder from timm, and the siamese decoder was trained from scratch.
  • A CenterNet-like model for small object detection. Again, the encoder was taken from timm, but the decoder & detection head were custom-built for the specific task (detection + multilabel classification + single-dimension size regression).
  • A classification model for detecting whether an image has a hidden embedded message (steganography analysis). Departing from a classical CNN classification model, a bit of model surgery was performed on top of ImageNet pretrained weights: stride 1, no maxpool at the start, concatenation of maxpool & avgpool, swish → mish activation function (a rough sketch of this kind of surgery is below).
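A rough sketch of that kind of surgery, using a torchvision ResNet-34 for concreteness (my actual models used timm encoders; the layer names and the binary head here are assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet34

def swap_relu_for_mish(module: nn.Module) -> None:
    # Recursively replace every ReLU with Mish (the swish -> mish style swap).
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.Mish(inplace=True))
        else:
            swap_relu_for_mish(child)

class CatAvgMaxPool(nn.Module):
    # Concatenate global average and max pooling along the channel dimension.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.flatten(F.adaptive_avg_pool2d(x, 1), 1)
        mx = torch.flatten(F.adaptive_max_pool2d(x, 1), 1)
        return torch.cat([avg, mx], dim=1)

model = resnet34(weights="IMAGENET1K_V1")
model.conv1.stride = (1, 1)       # keep full resolution in the stem
model.maxpool = nn.Identity()     # no maxpool at the start
swap_relu_for_mish(model)
model.avgpool = CatAvgMaxPool()   # concatenation of maxpool & avgpool
model.fc = nn.Linear(512 * 2, 2)  # binary head: hidden message vs. clean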

For me, the biggest need in such challenges is being able to quickly change any part of the model. These days I achieve this by using a configuration file for each model architecture and Hydra to instantiate it:

_target_: xview3.centernet.models.CenterNetUNetModel.from_config
config:
  activation: relu
  num_extra_blocks: 0
  num_channels: ${dataset.num_channels}

  encoder:
    _target_: pytorch_toolbelt.modules.encoders.timm.HRNetW32Encoder
    pretrained: True
    use_incre_features: False

  decoder:
    upsample_type: bilinear
    block_type: UnetBlock
    channels:
      - 128
      - 256
      - 256
      - 256

  segmentation:
    _target_: xview3.centernet.models.heads.UpsampleSegmentationHead
    num_classes: ${dataset.segmentation_num_classes}
    upsample_blocks: 1
    activation: relu
    dropout_rate: 0.2
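Not the exact training script, but a minimal sketch of how a config like this gets turned into a live model with hydra.utils.instantiate (the config_path/config_name layout here is an assumption):

import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train", version_base=None)
def main(cfg: DictConfig) -> None:
    # instantiate() walks the nested _target_ fields recursively and calls the
    # referenced classes / factory methods with the remaining keys as kwargs.
    model = instantiate(cfg.model)
    print(model)

if __name__ == "__main__":
    main()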

This allows me to express the experimentation process as a Hydra sweep run:

train.py -m model=seg_hrnetw32 model.config.activation=relu,elu,silu loss=bce,focal,dice

With all training results logged to TensorBoard, I can start a sweep over the weekend and come back to check the final results later, without being much involved in manually running each experiment.

4 Likes

This is so dope! Thank you @EKhvedchenya! For @trust_level_1: Eugene is on our SuperGradients team and is a Kaggle genius (I forget your rank, is it master?). He’s an excellent resource to have around the community!

Check him out here: Eugene Khvedchenya | Grandmaster | Kaggle

It’s GM (Grandmaster) both on Kaggle and Signate platforms. It was quite a long yet exciting experience to get there, but I’m happy to share what I’ve learned through this journey with folks here :slight_smile:

4 Likes