Measuring Data Drift

Learn how to understand and detect data drift in models deployed with DataRobot MLOps.

Data drift happens when the data you send to your models for predictions no longer resembles the data you used to train them. As the data continues to drift, your models become less and less accurate. It's therefore important to monitor your models for data drift continuously, so you know when to retrain them on new data. DataRobot MLOps provides this kind of monitoring.

In this guide, you will learn how to detect data drift with DataRobot MLOps and what to do when you detect it.

You can read about how it's done in the GUI in this article in DataRobot Community. (And, if needed, you can check out this article to learn a bit more about data drift.)

Checking data drift using the API

You can query data drift programmatically. The easiest way to do so is by querying the deployment object with the Python SDK:

# deployment is a DataRobot Deployment instance
feature_drift_data = deployment.get_feature_drift()

for feature in feature_drift_data:
    # name and drift_score are attributes of the Python client's FeatureDrift objects
    print(feature.name)
    print(feature.drift_score)

# Example output:
# weight
# 0.03335281792037   # The best (non-drifted) scores are close to 0.
# model year
# 2.886657367499116  # This is an example of a score that has drifted substantially.

(See the data drift support in the latest Python Client documentation.)
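Once you have the scores, a common next step is to decide which features need attention. The following is a minimal sketch, not part of the DataRobot client: the `drifted_features` helper and the `DRIFT_THRESHOLD` value are hypothetical and chosen only for illustration, and the input is a plain dictionary of feature names and drift scores that you would build from the query results.

```python
# Hypothetical cutoff for illustration only; pick one that suits your model.
DRIFT_THRESHOLD = 0.5


def drifted_features(scores, threshold=DRIFT_THRESHOLD):
    """Return (name, score) pairs whose score exceeds the threshold, most drifted first."""
    flagged = [(name, score) for name, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Scores taken from the example output shown earlier in this section.
example_scores = {"weight": 0.03335281792037, "model year": 2.886657367499116}

flagged = drifted_features(example_scores)
for name, score in flagged:
    print(f"{name}: drift score {score:.3f} exceeds {DRIFT_THRESHOLD}")
```

With the example scores, only "model year" is flagged. Keeping the threshold as a parameter makes it easy to tighten or relax the check per deployment.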

Causing data drift

To give you an idea of how data drift works in practice, we have created a set of scripts that let you cause data drift on a deployed model.

The script works on the Auto MPG example model you deployed in the Quickstart Guide.

To cause data drift, we generated fake "drifting" cars whose parameters differ from those the model was trained on, and then requested MPG predictions for these fake cars. DataRobot correctly detects substantial feature drift and reflects it in both the GUI and the API.
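To make the idea of "drifting" cars concrete, here is a rough sketch of how such records could be generated. It is not the script from the GitHub project: the `make_drifted_cars` function, the shift sizes, and the sample rows are all hypothetical, and the step that sends the records to the deployment for prediction is left out.

```python
import random


def make_drifted_cars(cars, weight_shift=1500.0, year_shift=10, seed=0):
    """Return copies of the input cars with 'weight' and 'model year' pushed away from their original values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    drifted = []
    for car in cars:
        fake = dict(car)  # copy, so the original record is not modified
        # Shift the weight well outside the original range, plus a little noise.
        fake["weight"] = car["weight"] + weight_shift + rng.uniform(-100, 100)
        # Move the model year forward by a fixed number of years.
        fake["model year"] = car["model year"] + year_shift
        drifted.append(fake)
    return drifted


# Made-up rows standing in for data that resembles the training set.
training_like_cars = [
    {"weight": 2130.0, "model year": 70},
    {"weight": 3504.0, "model year": 72},
    {"weight": 2720.0, "model year": 78},
]

drifted_cars = make_drifted_cars(training_like_cars)
for original, fake in zip(training_like_cars, drifted_cars):
    print(original, "->", fake)
```

Requesting predictions for records like `drifted_cars` is what would move the per-feature drift scores away from 0.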

Check out and run the script in this GitHub project.

Monitoring data drift

Besides making API calls, you can also set up data drift monitoring in the GUI and get notified when your models are at risk of becoming inaccurate.

Data Drift monitoring settings