What are some pitfalls to watch out for when looking for object detection datasets?

I’m working on some projects and I’ve come across a lot of datasets.

That got me wondering…


• How do you go about judging whether a particular dataset will be a good fit for training?

• What heuristics do you use to assess the quality of a dataset?


Would love to hear from @kbaheti @lu.riera @msminhas @mjcullan, I’m sure y’all have some good insights to share.

@harpreet.sahota In my experience, being able to anticipate as much variability as possible has been useful for developing good models from training data.

This will also vary from task to task. Let’s focus on human pose estimation for this discussion. The variability I’d consider essential for a good dataset includes different camera positions/angles, lighting conditions, backgrounds, body types, and numbers of people in frame. All of these help the model generalize.

Recently I have also found that varying the intrinsic camera parameters can help cover more of the use cases that may be encountered in production settings. These include focal length, aperture, field of view, and resolution, all of which may differ among users when capturing video/images.
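To make the link between these parameters concrete, here is a minimal sketch of how focal length and sensor width determine horizontal field of view. It assumes a simple pinhole camera model with no lens distortion; the function name and the example values are illustrative only, not taken from any particular dataset or camera.

```python
import math

def horizontal_fov_degrees(focal_length_mm, sensor_width_mm):
    """Horizontal field of view under a pinhole model (assumes no lens distortion)."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Hypothetical capture setups: same sensor width, different focal lengths.
for focal_length_mm in (18, 35, 50):
    fov = horizontal_fov_degrees(focal_length_mm, sensor_width_mm=36)
    print(f"{focal_length_mm} mm lens -> {fov:.1f} degrees horizontal FOV")
```

A shorter focal length gives a wider field of view, so the same person at the same distance takes up fewer pixels. A dataset captured with a single lens may therefore cover a narrower range of apparent subject sizes than production footage does.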

The distribution of this variability should also be considered. You’ll want a decent balance across conditions; otherwise you run the risk of overfitting to a specific angle or condition, which may or may not be desirable depending on your application. For example, your production application may only involve a few cameras pointing at objects from a few fixed angles, as in a factory production setting. In this case, having different angles/positions may not be necessary, as the model may never receive data that is much different from its training dataset. This is a case where intrinsic/extrinsic parameter variability may not be useful at all.
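One rough way to check that balance, assuming your dataset ships per-image metadata, is to count how often each condition appears and flag the rare ones. This is just a sketch: the attribute names (`camera_angle`, `lighting`), the sample records, and the 15% threshold are all made up for illustration, and the right threshold depends on your application.

```python
from collections import Counter

def attribute_balance(samples, attribute):
    """Return each value's share of the samples for one metadata attribute."""
    counts = Counter(sample[attribute] for sample in samples)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def underrepresented(shares, min_share=0.1):
    """List values whose share falls below min_share (an arbitrary cutoff)."""
    return sorted(value for value, share in shares.items() if share < min_share)

# Hypothetical per-image metadata; real datasets may not provide these fields.
samples = (
    [{"camera_angle": "front", "lighting": "daylight"}] * 7
    + [{"camera_angle": "side", "lighting": "daylight"}] * 2
    + [{"camera_angle": "overhead", "lighting": "low_light"}]
)

for attribute in ("camera_angle", "lighting"):
    shares = attribute_balance(samples, attribute)
    print(attribute, shares, "rare:", underrepresented(shares, min_share=0.15))
```

If production only ever sees one angle, a skewed distribution like this may be fine. The point is to make the skew visible so it is a deliberate choice and not a surprise.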

In summary, a good class distribution in your dataset, together with variability in extrinsic/intrinsic camera parameters and environmental conditions (where applicable), can help you build a training set that yields more robust, generalizable computer vision models.


Thank you so much for this insightful response @lu.riera. This is some grade A food for thought.

@kausthubk I think you would be interested in the point about camera parameters.