How do you know if you should freeze BatchNorm layers during transfer learning?

If I understand correctly, in a pretrained model the BatchNorm layers carry the running statistics (mean and variance) of the dataset the network was originally trained on.

I assume that when you’re fine-tuning a model it might be desirable to normalize to the statistics of the current dataset instead.

Is this intuition correct? Or is this one of those “it depends” situations?

Any tips, insight, or advice? Would love to hear from @salmankhaliq22 @yashikajain201 @mjcullan @chris @kbaheti on this.
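
For concreteness, here’s roughly what I mean by “freezing” (a minimal PyTorch sketch, assuming a recent torchvision ResNet; adjust for whatever model you’re using):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# "Freezing" BatchNorm means two things:
#   1. keep using the pretrained running mean/variance instead of
#      re-estimating them from the new dataset's batches
#   2. stop gradient updates to the affine (gamma/beta) parameters
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        module.eval()                    # use stored running statistics
        for p in module.parameters():
            p.requires_grad = False      # don't update gamma/beta

# Note: model.train() flips BatchNorm back into training mode, so this
# has to be re-applied after every switch into train mode.
```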


I believe it depends on what preprocessing steps were done on the data. If we keep the preprocessing the same as in the pretrained model, then the pretrained model’s batch normalization should do just fine on the new data.

For example, if the original model’s data was standardized before training, then standardizing your data the same way should be enough.
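
Something like this (a sketch, assuming an ImageNet-pretrained torchvision model; the mean/std below are the standard ImageNet statistics used in that pretraining setup):

```python
from torchvision import transforms

# Reuse the exact normalization the pretrained model saw during training,
# so the BatchNorm running statistics remain a good fit for the inputs.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```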

This is my understanding, but I would love to hear your thoughts!

@harpreet.sahota, that’s a good point!
I feel your intuition is correct, and it would make sense to unfreeze the BatchNorm layers while fine-tuning. But if the dataset used for fine-tuning is quite small, the new statistics calculated from it could be noisy, and normalizing with those noisy statistics might not be desirable.
So maybe it’s only a good idea to unfreeze the BatchNorm layers if the fine-tuning dataset is large enough.
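
Something along these lines if you do decide to let BatchNorm adapt (a sketch; the smaller momentum is just one idea for softening the noisy-statistics problem, not an established rule):

```python
import torch.nn as nn

def unfreeze_batchnorm(model: nn.Module, momentum: float = 0.01) -> None:
    """Let BatchNorm re-estimate its statistics on the new dataset."""
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.train()               # update running mean/var from batches
            for p in module.parameters():
                p.requires_grad = True   # fine-tune gamma/beta too
            # A small momentum makes the running estimates move slowly,
            # which may dampen the noise from small fine-tuning batches.
            module.momentum = momentum
```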


Thanks @salmankhaliq22 and @yashikajain201 - good pointers there.



It’s an unhelpful answer, but realistically it’s “it depends”.

If it’s something you can spend iterations on, I’d say it’s worth testing both approaches (a quick toggle for that is sketched below). There’s no golden rule I’ve heard of that’s convincing enough.

If not, I’d err on the side of “leave them unfrozen” for the most part.
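
If anyone wants to run that comparison, a toggle along these lines keeps it to a one-line change between runs (a sketch; `set_bn_frozen` is just a name I made up):

```python
import torch.nn as nn

def set_bn_frozen(model: nn.Module, frozen: bool) -> None:
    """Toggle BatchNorm freezing so both settings can be A/B tested."""
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.train(not frozen)     # frozen -> eval mode (fixed stats)
            for p in module.parameters():
                p.requires_grad = not frozen

# Same fine-tuning loop, two runs, compare validation metrics:
# set_bn_frozen(model, frozen=True)    # keep pretrained statistics
# set_bn_frozen(model, frozen=False)   # re-estimate on the new data
```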