timm is a PyTorch-based collection of models, pre-trained weights, and utilities focused on the state-of-the-art (SOTA) in computer vision.
The library was created in 2019 by Ross Wightman. As of version 0.6.11, timm offers 765 models with weights pretrained on the ImageNet dataset. It’s a comprehensive training library that is beloved by the computer vision community and was named the top trending library of 2021 on Papers with Code!
Use cases
The timm library is primarily used for image classification.
Pros
Not only does this library have over 700 pre-trained SOTA image classification models, it also lets you use your own data loaders, optimizers, and schedulers.
You can easily load a pre-trained model, get a list of models with pre-trained weights, and search for model architectures using a regex-like wildcard syntax.
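For instance, here's a minimal sketch of that discovery workflow (the wildcard pattern and model name are just illustrations):

```python
import timm

# All architectures that ship with pretrained weights
pretrained_models = timm.list_models(pretrained=True)

# Wildcard search across model names, e.g. every EfficientNet variant
efficientnet_variants = timm.list_models('*efficientnet*')

# Load a model by name with its pretrained ImageNet weights
model = timm.create_model('resnet50', pretrained=True)
```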
There are a number of training, validation, inference, and checkpoint-cleaning scripts in the GitHub repo. These make it easy for you to reproduce results and fine-tune models on custom datasets.
An incredibly useful feature is the ability to work with input images that have a varying number of channels, something that poses a problem in most other libraries. timm handles this by summing the weights of the initial convolutional layer for inputs with fewer than 3 channels, or intelligently replicating those weights to the desired number of channels otherwise.
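For example, here's a sketch of requesting a single-channel (grayscale) input, assuming a standard resnet50 backbone:

```python
import timm
import torch

# in_chans tells timm to adapt the first conv layer's pretrained weights
model = timm.create_model('resnet50', pretrained=True, in_chans=1)

x = torch.randn(1, 1, 224, 224)  # a batch of one single-channel image
logits = model(x)                # (1, 1000) with the default ImageNet head
```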
Cons
You can only use it for classification.
There aren’t a ton of example training recipes. Most of the models don’t have their recipes available for inspection.
Though the library has a ton of features, I found it difficult to figure out where to get started, particularly when applying it to custom use cases.
I feel like the documentation could be better. There’s a really helpful Medium post, a practitioner’s guide that goes into a lot of detail, but sometimes you just want something that gets to the point.
That being said, here’s an example of making a prediction and performing transfer learning with a timm model.
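This is a minimal sketch, assuming timm 0.6.x, a local image file (dog.jpg is a placeholder), and a hypothetical 10-class target dataset:

```python
import timm
import torch
from PIL import Image
from timm.data import resolve_data_config, create_transform

# --- Prediction with a pretrained model ---
model = timm.create_model('resnet50', pretrained=True)
model.eval()

# Recreate the preprocessing pipeline the model was trained with
config = resolve_data_config({}, model=model)
transform = create_transform(**config)

img = Image.open('dog.jpg').convert('RGB')   # placeholder image path
x = transform(img).unsqueeze(0)              # add a batch dimension

with torch.no_grad():
    probs = model(x).softmax(dim=-1)
top5_prob, top5_idx = probs.topk(5)

# --- Transfer learning: pretrained backbone, fresh classification head ---
NUM_CLASSES = 10  # placeholder: set to your dataset's class count
model_ft = timm.create_model('resnet50', pretrained=True, num_classes=NUM_CLASSES)

optimizer = torch.optim.AdamW(model_ft.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# One training step looks like this (your own DataLoader supplies the batches):
# for images, labels in train_loader:
#     loss = criterion(model_ft(images), labels)
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```

Passing num_classes to create_model swaps the ImageNet head for a randomly initialized one of the right size, which is the usual starting point for fine-tuning on your own data.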
YOLOv5 is a family of deep learning architectures for object detection by Ultralytics.
The open-source repository by Ultralytics showcases their research into future vision AI methods and incorporates lessons learned and best practices that evolved over the course of research and development.
In this overview you will learn how to:
Perform inference using Ultralytics YOLOv5
Perform transfer learning using Ultralytics YOLOv5
Pros:
The code is PyTorch-based, and the models are pre-trained on the COCO dataset.
There are plenty of scripts in the repo that make it easy to train and reproduce results on the COCO dataset.
You can perform training and fine-tuning on custom data.
The different variants of pre-trained YOLOv5 models are available for use on PyTorch Hub.
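For example, loading a variant from PyTorch Hub takes a single call (yolov5s here is just one of the released sizes):

```python
import torch

# Variants trade speed for accuracy: yolov5n, yolov5s, yolov5m, yolov5l, yolov5x
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
```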
Cons:
It currently supports only YOLOv5 object detection models, though there are plans to add classification and segmentation support in the future.
It wasn’t easy for me to figure out how to add custom components such as optimizers and loss functions, among others.
It relies on some ‘magic numbers’—constant values for strides and augmentation scales that cannot be modified without affecting the results. Moreover, the rationale behind them is not documented.
How to perform inference on image and video with YOLOv5
There are at least two ways you can use the YOLOv5 model. This overview will focus on:
Cloning the Ultralytics GitHub repo and using their scripts
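For reference, the other route, loading from PyTorch Hub, looks like this minimal sketch (the sample image URL comes from the Ultralytics examples):

```python
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# The source can be a local path, URL, PIL image, OpenCV frame, or numpy array
results = model('https://ultralytics.com/images/zidane.jpg')

results.print()               # summary of detections to stdout
results.save()                # annotated image written under runs/detect/
detections = results.xyxy[0]  # tensor of (x1, y1, x2, y2, confidence, class)

# For video, read frames (e.g. with cv2.VideoCapture) and pass each frame
# to the model in the same way
```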
It’s quite nice. There’s an AMA planned with its creator, Glenn Jocher, coming up in January. Keep an eye out for that on the events page!