Data drift happens when the data you send to your models for predictions no longer resembles the data the models were trained on. As the data drifts, your models' predictions become less and less accurate, so it's important to continuously monitor your models for data drift and retrain them on new data when you detect it. DataRobot MLOps is a good tool for that.
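To make the idea concrete: drift detectors compare the distribution of a feature at training time with its distribution at prediction time. One common metric behind per-feature drift scores is the Population Stability Index (PSI), where values near 0 mean the distributions match and larger values mean drift. The sketch below is a rough, library-free illustration of that idea; it is not DataRobot's exact computation.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a training sample ("expected")
    and a scoring sample ("actual"). Near 0 means no drift; larger
    values mean the distributions have diverged. Illustrative only."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1  # bin index
        # Smooth empty bins so the log below stays defined.
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

random.seed(0)
training = [random.gauss(0, 1) for _ in range(5000)]  # training-time data
similar  = [random.gauss(0, 1) for _ in range(5000)]  # same distribution
drifted  = [random.gauss(1.5, 1) for _ in range(5000)]  # mean has shifted

print(psi(training, similar))  # small: no drift
print(psi(training, drifted))  # large: substantial drift
```

A monitoring system evaluates a score like this per feature on a schedule and alerts when it crosses a threshold.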
In this guide you will learn how to detect data drift with DataRobot MLOps, and what to do if you detect it.
You can query data drift programmatically. The easiest way to do so is by querying the
deployment object with the Python SDK:
```python
from pprint import pprint

# deployment is a DataRobot Deployment instance
feature_drift_data = deployment.get_feature_drift()
for feature in feature_drift_data:
    pprint(feature.name)
    pprint(feature.drift_score)

# Example output:
# 'weight'
# 0.03335281792037
#   The best (non-drifted) scores are close to 0.
# 'model year'
# 2.886657367499116
#   This is an example of a score that has drifted substantially.
```
(See the data drift support in the latest Python Client documentation.)
To give you an idea of how data drift works in practice, we have created a set of scripts that let you cause data drift on a deployed model.
The script works on the Auto MPG example model you deployed in the Quickstart Guide.
To cause data drift, the script generates fake "drifting" cars whose parameters differ from what the model saw during training, and then makes requests to predict the MPG for these fake cars. DataRobot correctly detects substantial feature drift and reflects that in both the GUI and the API.
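The generation step can be sketched as follows. The feature names match the Auto MPG dataset, but the training-era ranges here are rough assumptions for illustration, and the actual prediction request (omitted) would go to your deployment's prediction endpoint with your API credentials:

```python
import csv
import io
import random

# Rough training-era ranges for some Auto MPG features
# (assumed for illustration; the real dataset differs in detail).
TRAINING_RANGES = {
    "cylinders": (3, 8),
    "weight": (1600, 5100),
    "horsepower": (46, 230),
    "model year": (70, 82),
}

def fake_drifting_car():
    """One 'car' whose features fall outside the training ranges,
    so the deployment's drift detection should flag them."""
    return {
        "cylinders": random.choice([10, 12, 16]),   # never seen in training
        "weight": random.uniform(6000, 9000),       # far heavier
        "horsepower": random.uniform(400, 900),     # far more powerful
        "model year": random.randint(95, 99),       # outside 70-82
    }

rows = [fake_drifting_car() for _ in range(100)]

# Serialize to CSV for submission to the prediction API.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(TRAINING_RANGES))
writer.writeheader()
writer.writerows(rows)
payload = buf.getvalue()

print(len(rows), "drifting cars generated")
```

Sending `payload` to the deployed model's prediction endpoint (for example with an authenticated HTTP POST) makes these out-of-range values show up in the deployment's drift tracking.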
Check out and run the script in this GitHub project.
Besides making API calls, you can also set up monitoring for data drift in the GUI—and get notified when your models are at risk of becoming inaccurate.