How much does camera change affect model performance?


Has anyone seen how changing the visible-spectrum (RGB) camera impacts the performance of an object detector?

E.g. detectron2 or YOLOv5, but switching from an iPhone camera to a Samsung Galaxy camera, or to an off-board machine vision camera/DSLR.


Prior Experience

I’ve seen that in satellite imaging, a change in sensor means model performance is not retained, and the model has to be retrained on the new imager’s sensor characteristics. In that case it was semantic segmentation on specific spectral bands (not visual RGB).

Scale & Details

  • ~500-1000 trained models, each on a specific instance of the same problem (detectron trained on 1000 mini datasets for specific use cases).
  • High precision requirements (≥0.999) must be maintained.

My assumptions so far:

We will need to match:

  • focal length
  • field of view
  • lens warping (fisheye etc.)

We can train with augmentations to resolve:

  • luminance & colour representation mismatch
  • sensor effects (motion blur, rolling shutter, etc.)
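A photometric-jitter augmentation along those lines can be sketched in a few lines of NumPy. This is a minimal illustration, not a production pipeline, and the jitter ranges below are illustrative guesses, not tuned values:

```python
import numpy as np

def photometric_jitter(img, rng):
    """Randomly perturb brightness, contrast, and per-channel gain to
    mimic camera-to-camera luminance and colour-representation differences."""
    img = img.astype(np.float32)
    brightness = rng.uniform(-20, 20)             # additive luminance shift
    contrast = rng.uniform(0.8, 1.2)              # multiplicative contrast scale
    channel_gain = rng.uniform(0.9, 1.1, size=3)  # per-channel white-balance drift
    out = (img - 127.5) * contrast + 127.5 + brightness
    out = out * channel_gain
    return np.clip(out, 0, 255).astype(np.uint8)

# usage: jitter a dummy 64x64 RGB frame
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = photometric_jitter(frame, rng)
```

In practice you'd reach for a library like albumentations or torchvision's transforms for this, but the idea is the same: if the augmentation ranges cover more variation than the camera swap introduces, the swap should matter less.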

@kausthubk I ran this by our research team at Deci and here’s what they said:

It depends mostly on the training data and augmentations you are using.

When training on the COCO dataset, the images come from various sources and devices, so a model trained on that data is expected to perform well on images from different cameras.

Detection models are usually trained with augmentations that include, among others, changes in tint, brightness, and contrast.

These augmentations are usually larger than the slight differences between different cameras, and thus should make such differences negligible.


Hmm yeah - thanks @harpreet.sahota that was my suspicion - so more warping augmentations thrown in might help there.

The challenge here is validating the risk of changing the camera (all of our training data is from the same camera spec, not varied like larger public datasets) given the extremely high precision and recall requirements for this use case.

My thought process is that either of the following options makes sense:

  1. Do something like a k-fold cross-validation test, but where the hold-out sets are subjected to warping augmentations that haven’t been used in training. This won’t validate a specific camera, but it will validate whether our model is invariant to warping that’s representative of different lenses.

  2. A/B test in the field with the new camera for some users and see if there’s a performance drop compared to prior usage or compared to other users.
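For option 1, the held-out warping could be something as simple as a parametric barrel distortion swept over strengths the training pipeline never used. A minimal NumPy sketch, where the single-coefficient distortion model and the sweep values are assumptions rather than a calibrated lens model:

```python
import numpy as np

def barrel_warp(img, k):
    """Radial (barrel) distortion via nearest-neighbour remapping.
    k=0 is the identity; larger k mimics stronger lens curvature."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)
    xn = (xx - w / 2) / (w / 2)   # normalise coords to [-1, 1]
    yn = (yy - h / 2) / (h / 2)
    r2 = xn ** 2 + yn ** 2
    # sample each output pixel from a radially scaled source location
    xs = np.clip(xn * (1 + k * r2) * (w / 2) + w / 2, 0, w - 1).astype(int)
    ys = np.clip(yn * (1 + k * r2) * (h / 2) + h / 2, 0, h - 1).astype(int)
    return img[ys, xs]

# warp strengths deliberately outside the training augmentation range
hold_out_ks = [0.05, 0.1, 0.2, 0.3]
```

You’d then run the detector over each warped copy of the hold-out set and check that precision stays ≥0.999 across the sweep, remembering that the ground-truth boxes have to be warped with the same mapping as the images.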

That’s my current train of thought.


Tagging in @avi and @OBaratz to see if they have any thoughts here.

Interesting use case!

As I see it, changes in lenses (which can affect image sharpness), channels (you mentioned RGB to some other formats), and even camera angles can really drop the performance of a model that was trained on a very specific dataset.
If you cannot enrich the training data, maybe you should consider continuously fine-tuning your model on the production inputs. You’ll probably get insufficient predictions at the beginning, but you’ll reach a steady state that is much better.


Yeah, fair enough @OBaratz - sounds like a logistical nightmare more than anything - we’d have to close that loop and train without deploying until the replacement model meets those stringent quality requirements. It’s a regulated industry.
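For what it’s worth, the gating logic itself is simple to express. A minimal sketch, where `train_fn` and `eval_fn` are placeholders for a real training run and a fixed hold-out evaluation, and all names are hypothetical:

```python
from collections import deque

PRECISION_GATE = 0.999  # the precision requirement stated above

class GatedFineTuner:
    """Buffer labelled production frames, retrain periodically, and only
    promote the candidate model once it clears the precision gate."""

    def __init__(self, model, train_fn, eval_fn, retrain_every=1000):
        self.model = model
        self.train_fn = train_fn    # (model, samples) -> candidate model
        self.eval_fn = eval_fn      # model -> precision on a fixed hold-out set
        self.buffer = deque(maxlen=10 * retrain_every)
        self.retrain_every = retrain_every
        self._since_retrain = 0

    def add_sample(self, frame, label):
        self.buffer.append((frame, label))
        self._since_retrain += 1
        if self._since_retrain >= self.retrain_every:
            self._since_retrain = 0
            candidate = self.train_fn(self.model, list(self.buffer))
            if self.eval_fn(candidate) >= PRECISION_GATE:
                self.model = candidate  # deploy only after the gate passes
```

The deployed model is swapped only when the candidate clears the gate, so production keeps serving the old model in the meantime, which is the "train without deploying" loop described above.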
