Overview of computer vision training libraries megathread

Hey everyone,

This topic will be a megathread featuring all the notebooks created as part of the overview of computer vision training libraries.

I’ll add a short write-up and a notebook with a coding example as I finish each one.

Any questions, comments, or suggestions are 100% welcome.



Overview of torch image models (timm)


timm is a PyTorch-based collection of models, pre-trained weights, and utilities focused on the state-of-the-art (SOTA) in computer vision.

The library was created in 2019 by Ross Wightman. As of version 0.6.11, timm offers 765 models with weights pretrained on the ImageNet dataset. It’s a comprehensive training library that is beloved by the computer vision community and was recently named the top trending library of 2021 on Papers with Code!

Use cases

The timm library is primarily used for image classification.


Pros

  • Not only does this library have over 700 pre-trained SOTA image classification models, it also lets you use your own data loaders, optimizers, and schedulers.

  • You can easily load a pre-trained model, get a list of models with pre-trained weights, and search for model architectures using a regex-like wildcard syntax.

  • There are a number of training, validation, inference, and checkpoint-cleaning scripts in the GitHub repo. These make it easy for you to reproduce results and fine-tune models on custom datasets.

  • An incredibly useful feature is the ability to work on input images with varying numbers of channels. This might pose a problem in other libraries. timm is able to do this by summing the weights of the initial convolutional layer for channels fewer than 3, or intelligently replicating these weights to the desired number of channels otherwise.


Cons

  • You can only use it for classification.

  • There aren’t a tonne of example recipes for training. Most of the models don’t have their recipes available for inspection.

  • Though the library has a tonne of features, I found it difficult to figure out where to get started, particularly when applying it to custom use cases.

  • I feel like the documentation could be better. There’s a really helpful Medium post, a practitioner’s guide that goes into a lot of detail, but sometimes you just want something that gets to the point.

That being said…here’s an example of making a prediction and performing transfer learning with a timm model.



I love how you choose the hardest possible version of Pizza for your model to classify.

“Though the wasabi and ginger on the side may be a giveaway?” A giveaway… to pizza? Which normally looks like this :pizza:!?

You’re working your models wayyy too hard.




That’s how I do 'em!



That was insightful. I’m more of a TensorFlow person, but I’m starting to learn PyTorch now. Let’s see when I get to use the timm library.

An overview of Ultralytics YOLOv5

YOLOv5 is a family of deep learning architectures for object detection by Ultralytics.

The open-source repository by Ultralytics showcases vision-based AI methods and incorporates the lessons and practices that evolved during its research and development.

In this overview you will learn how to:

  • Perform inference using the Ultralytics YOLOv5
  • Perform transfer learning using Ultralytics YOLOv5


Pros

  • It’s PyTorch-based, and its models are pre-trained on the COCO dataset.
  • Plenty of scripts in the repo that make it easy to train and reproduce results on the COCO dataset.
  • You can perform training and fine-tuning on custom data.
  • The different variants of pre-trained YOLOv5 models are available for use on PyTorch Hub.
  • Inference is simple!
  • There’s also an integration with Deci AI.


Cons

  • Currently supports only the YOLOv5 object detection models, though there are plans to add classification and segmentation in the future.
  • It wasn’t easy for me to figure out how to add custom components such as optimizers and loss functions, among others.
  • It relies on some ‘magic numbers’—constant values for strides and augmentation scales that cannot be modified without affecting the results. Moreover, the rationale behind them is not documented.

How to perform inference on image and video with YOLOv5

There are at least two ways you can use the YOLOv5 model. This overview will focus on:

  1. Cloning the Ultralytics GitHub repo and using their scripts

  2. Downloading the model from the PyTorch Hub

Check out the example notebook here
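For the PyTorch Hub route, a minimal sketch looks like the following. It needs internet access on the first run (the repo and weights are fetched automatically), and the image URL is just Ultralytics’ standard example image:

```python
import torch

# Pull the small YOLOv5 variant from PyTorch Hub (clones the repo on first run)
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Inference accepts file paths, URLs, PIL images, or numpy arrays
results = model("https://ultralytics.com/images/zidane.jpg")
results.print()                        # per-class counts and timing summary
detections = results.pandas().xyxy[0]  # detections as a pandas DataFrame
print(detections[["name", "confidence"]])
```

The repo-clone route (option 1) instead runs `detect.py` / `train.py` scripts directly, which is what the notebook walks through.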


Inference with YOLOv5 is mind-numbingly simple hahaha. So extremely well thought out in terms of the torch.hub integration etc.


It’s quite nice. Dude, I’ve got an AMA planned with its creator (Glenn Jocher) coming up in January. Keep an eye out for that in the events page (though you know I’ll DM you about it :laughing:)


Yeah you mentioned it on your code-with-me session - keen to listen to that one.
