I believe it depends on what preprocessing steps were done on the data. If we keep the preprocessing the same as for the pretrained model, then the batch normalization of the pretrained model should do just fine on the new data.
For example, if I standardized my data before feeding it to my model, then standardizing your data the same way would be enough.
This is my understanding, but I would love to hear your thoughts!
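To make that concrete, here's a minimal sketch of reusing the pretraining normalization (assuming a torchvision-style image pipeline; the mean/std values below are the common ImageNet ones, which may or may not match how the original model was pretrained):

```python
import torchvision.transforms as T

# Reuse the normalization statistics from pretraining rather than
# recomputing them on the new dataset (values below are the common
# ImageNet ones -- an assumption, not a universal constant).
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```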
@harpreet.sahota, that’s a good point!
I feel your intuition is correct, and it would make sense to unfreeze the BatchNorm layers while fine-tuning. But if the dataset used for fine-tuning is quite small, the new statistics calculated from it could be noisy, and normalizing with those noisy statistics might not be desirable.
So it's probably only a good idea to unfreeze the BatchNorm layers if the fine-tuning dataset is large enough.
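For what it's worth, here's a minimal PyTorch sketch of keeping the BatchNorm layers frozen during fine-tuning (the `resnet50` / `IMAGENET1K_V1` weights are just an example, not anything specific to this thread):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")  # example pretrained model

def freeze_batchnorm(module: nn.Module) -> None:
    """Keep pretrained running stats and affine params fixed."""
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()                        # use stored running mean/var
            for p in m.parameters():
                p.requires_grad = False     # don't update gamma/beta

model.train()            # the rest of the model trains as usual
freeze_batchnorm(model)  # re-apply after every call to model.train()
```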
It’s an unhelpful answer, but realistically it’s: “it depends”.
If it’s something you can spend iterations on, I’d say it’s worth testing both approaches. There’s no golden rule I’ve heard of that’s convincing enough.
If not, I’d err on the side of “leave them unfrozen” for the most part.
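If you do have the budget to try both, a toggle like this makes the comparison straightforward (just a sketch; `run_finetune` is a hypothetical stand-in for your actual training loop):

```python
import copy
import torch.nn as nn
from torchvision import models

def set_batchnorm_trainable(model: nn.Module, trainable: bool) -> None:
    """Switch all BatchNorm layers between trainable and frozen."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train(trainable)              # update running stats or not
            for p in m.parameters():
                p.requires_grad = trainable

base = models.resnet50(weights="IMAGENET1K_V1")   # example pretrained model

for bn_trainable in (False, True):
    model = copy.deepcopy(base)      # fresh copy for each experiment
    model.train()
    set_batchnorm_trainable(model, bn_trainable)
    # run_finetune(model)            # hypothetical: fine-tune, log val metrics
```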