I’m wondering if anyone can share some insight into feature drift.
What does it look like for Classical ML vs Deep Learning?
What effects can it have on our system?
How do we safeguard against it?
I know @richmond has been thinking a lot about this lately. Would love to hear from you.
What is Feature Drift
Feature drift is a change over time in the statistical properties of the features used as input signals to a machine learning model: the distribution of feature values the model sees at inference time no longer matches the distribution it was trained on.
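To make that concrete, here is a minimal sketch that quantifies the distribution shift for a single feature with a two-sample Kolmogorov-Smirnov statistic (the max gap between the two empirical CDFs). The samples are made up for illustration; in practice you would compare a stored training sample against a recent window of production values.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Max absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    values = sorted(set(a) | set(b))

    def ecdf(sorted_sample, x):
        # Fraction of observations <= x
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in values)

# Training-time feature values vs a drifted production sample (synthetic)
train = [0.1 * i for i in range(100)]        # roughly uniform on [0, 10)
live = [0.1 * i + 3.0 for i in range(100)]   # same shape, shifted by +3

drift = ks_statistic(train, live)
print(f"KS statistic: {drift:.2f}")  # a large value signals drift
```

Identical distributions give a statistic near 0; the shifted sample here scores around 0.3, which a monitoring job could compare against an alerting threshold.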
What effect can it have on our system
The main effect of feature drift is an unexpected change in the model's performance. A notable gap between a model's performance in training (the development environment) and during live inference (the production environment) is called train-serving skew.
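The performance gap is easy to reproduce with a toy model. The sketch below is entirely synthetic (the "model" just memorises per-bin majority labels): it scores near-perfectly on data matching its training distribution, then degrades once the serving distribution drifts into a feature range it never saw.

```python
import random

random.seed(42)

def true_label(x):
    # Hypothetical ground truth: label alternates per unit interval
    return int(x) % 2 == 0

# "Training": memorise the majority label of each unit-wide feature bin
train_x = [random.uniform(0, 10) for _ in range(2000)]
seen = {}
for x in train_x:
    seen.setdefault(int(x), []).append(true_label(x))
model = {b: max(set(ys), key=ys.count) for b, ys in seen.items()}

def predict(x):
    # Unseen bins fall back to a default class, as extrapolation often does
    return model.get(int(x), False)

def accuracy(xs):
    return sum(predict(x) == true_label(x) for x in xs) / len(xs)

dev_x = [random.uniform(0, 10) for _ in range(2000)]   # matches training
prod_x = [random.uniform(5, 15) for _ in range(2000)]  # drifted serving data

print(f"dev accuracy:  {accuracy(dev_x):.2f}")   # near 1.00
print(f"prod accuracy: {accuracy(prod_x):.2f}")  # noticeably lower
```

Offline evaluation alone would never surface this, which is why the gap only shows up as train-serving skew in production.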
How do we safeguard against it
One safeguard is to use the dataset and feature monitoring tools available in platforms such as SageMaker or Azure ML's DataDriftDetector, which compare production feature distributions against a training baseline and alert on drift.
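Under the hood, checks of this kind often reduce to a statistic over binned feature histograms. Here is a minimal sketch of one common choice, the Population Stability Index (PSI); the bin count and the 0.2 alert threshold are common rules of thumb, not any vendor's defaults.

```python
import math

def psi(expected, actual, bins=10):
    """PSI between a baseline sample and a current sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def histogram(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)  # clamp above range
            counts[max(i, 0)] += 1                    # clamp below range
        # Small floor avoids log(0) for empty bins
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(1000)]        # feature at training time
current = [i / 100 + 2.0 for i in range(1000)]   # shifted in production

score = psi(baseline, current)
# Common rule of thumb: PSI > 0.2 indicates significant drift worth alerting on
print(f"PSI = {score:.2f}, drift: {score > 0.2}")
```

A scheduled job can run this per feature against the stored training baseline and page the team when any feature crosses the threshold.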
Also, using a feature store to centralise the management of features across the organisation makes it quicker to trace a problem to its source and to roll out a fix everywhere the feature is used.
Feature store solutions:
- Vertex AI Feature Store