1.1. ML lifecycle. What can go wrong with ML in production?

What can go wrong with data and machine learning services in production: data quality issues, data drift, and concept drift.

Video 1. ML lifecycle. What can go wrong with ML in production, by Emeli Dral

Evaluations in the ML model lifecycle

Building a successful ML model involves the following stages:

  • Data preparation,

  • Feature engineering,

  • Model training,

  • Model evaluation,

  • Model deployment.

You can perform different types of evaluations at each of these stages. For example,

  • During data preparation, exploratory data analysis (EDA) helps to understand the dataset and validate the problem statement.

  • At the experiment stage, cross-validation and holdout testing help check whether ML models are useful before deployment (see the sketch below).
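
For instance, here is a minimal sketch of offline evaluation with scikit-learn. It assumes a tabular classification task; the synthetic dataset and the random forest model are placeholders for your own data and model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder dataset standing in for your prepared features and target
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Keep a holdout set aside for a final, unbiased check
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)

# Cross-validation on the training split estimates how well the model generalizes
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"Cross-validation accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Holdout testing: train on the full training split, evaluate once on unseen data
model.fit(X_train, y_train)
print(f"Holdout accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```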

However, the work does not stop here! Once the best model is deployed to production and starts bringing business value, every erroneous prediction has a cost. It is crucial to ensure that the model functions stably and reliably. To do that, you must continuously monitor both the production ML model and the data.

What can go wrong in production?

Many things can go wrong once you deploy an ML model to the real world. Here are some examples.

Training-serving skew. The model degrades if the data it receives in production differs significantly from the data it was trained on.
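
One simple way to surface such skew is to compare summary statistics of the training data against a recent sample of production data. Below is a rough sketch with pandas; the relative tolerance and the idea of comparing only feature means are simplifying assumptions, not a complete skew check.

```python
import pandas as pd


def compare_feature_stats(training_df: pd.DataFrame, production_df: pd.DataFrame,
                          rel_tolerance: float = 0.2) -> pd.DataFrame:
    """Flag numeric features whose mean shifted by more than rel_tolerance
    between the training data and a recent production sample."""
    rows = []
    for col in training_df.select_dtypes("number").columns:
        train_mean = training_df[col].mean()
        prod_mean = production_df[col].mean()
        # Relative shift; guard against division by zero for all-zero features
        shift = abs(prod_mean - train_mean) / (abs(train_mean) or 1.0)
        rows.append({"feature": col, "train_mean": train_mean,
                     "prod_mean": prod_mean, "skewed": shift > rel_tolerance})
    return pd.DataFrame(rows)
```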

Data quality issues. In most cases, when something is wrong with the model, the root cause is a data quality or integrity issue (see the validation sketch after this list). These can be caused by:

  • Data processing issues, e.g., broken pipelines or infrastructure updates.

  • Data schema changes in the upstream system, third-party APIs, or catalogs.

  • Data loss at source when dealing with broken sensors, logging errors, database outages, etc.
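
Issues like these can often be caught with simple automated checks that run before a batch of data reaches the model. The sketch below is an illustration only: the expected columns, types, and value ranges are hypothetical and would come from your own training data.

```python
import pandas as pd

# Hypothetical expectations derived from the training data
EXPECTED_COLUMNS = {"user_id": "int64", "age": "float64", "country": "object"}
VALUE_RANGES = {"age": (0, 120)}


def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of data quality problems found in an incoming batch."""
    problems = []
    # Schema check: missing columns or unexpected types
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"unexpected type for {col}: {df[col].dtype}")
    # Missing values
    for col in df.columns:
        null_share = df[col].isna().mean()
        if null_share > 0:
            problems.append(f"{col}: {null_share:.1%} missing values")
    # Range check for selected numeric features
    for col, (low, high) in VALUE_RANGES.items():
        if col in df.columns and not df[col].dropna().between(low, high).all():
            problems.append(f"{col}: values outside [{low}, {high}]")
    return problems
```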

Broken upstream model. Often, not one model but a chain of ML models operates in production. If one model gives wrong outputs, it can affect downstream models.

Concept drift. Concept drift occurs when the relationship between the model inputs and the target changes. Gradual concept drift happens when the target function evolves slowly over time, leading to model degradation. If the change is abrupt – like the recent pandemic – you’re dealing with sudden concept drift.
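
When ground truth labels eventually arrive, one common way to surface concept drift is to track the model quality metric over time windows and flag a drop. A minimal sketch follows; the weekly window, the accuracy metric, and the allowed drop are assumptions you would tune to your own use case.

```python
import pandas as pd
from sklearn.metrics import accuracy_score


def weekly_accuracy(log: pd.DataFrame, baseline: float, max_drop: float = 0.05) -> pd.DataFrame:
    """Compute accuracy per calendar week from a prediction log with columns
    'timestamp', 'prediction', and 'label', and flag weeks that fall too far
    below the baseline measured at deployment time."""
    log = log.assign(week=pd.to_datetime(log["timestamp"]).dt.to_period("W"))
    rows = []
    for week, group in log.groupby("week"):
        acc = accuracy_score(group["label"], group["prediction"])
        rows.append({"week": str(week), "accuracy": acc,
                     "degraded": acc < baseline - max_drop})
    return pd.DataFrame(rows)
```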

Data drift. Distribution changes in the input features may signal data drift and cause ML model performance degradation. For example, a significant share of users coming from a new acquisition channel can negatively affect a model trained on historical user data: chances are that users from different channels behave differently. To get back on track, the model needs to learn the new patterns.
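
A common way to detect this kind of shift is to compare the distribution of each feature in recent production data against a reference dataset (for example, the training set) with a statistical test. Here is a minimal sketch using the two-sample Kolmogorov-Smirnov test from SciPy; the 0.05 significance level is a conventional choice, not a universal rule, and it only covers numeric features.

```python
import pandas as pd
from scipy.stats import ks_2samp


def detect_drift(reference: pd.DataFrame, current: pd.DataFrame,
                 alpha: float = 0.05) -> pd.DataFrame:
    """Run a two-sample KS test per numeric feature and flag likely drift."""
    rows = []
    for col in reference.select_dtypes("number").columns:
        statistic, p_value = ks_2samp(reference[col].dropna(), current[col].dropna())
        rows.append({"feature": col, "statistic": statistic,
                     "p_value": p_value, "drift": p_value < alpha})
    return pd.DataFrame(rows)
```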

Underperforming segments. A model might perform differently across data segments, for example, by region, device, or user group. It is crucial to monitor performance on all relevant segments, not only on average.
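
In practice, this can be as simple as computing the quality metric per segment rather than over the whole dataset. A minimal sketch, assuming a prediction log with 'prediction' and 'label' columns and a hypothetical 'channel' column that identifies the segment:

```python
import pandas as pd
from sklearn.metrics import accuracy_score


def accuracy_by_segment(log: pd.DataFrame, segment_col: str = "channel") -> pd.Series:
    """Compute accuracy separately for each segment in a prediction log."""
    return log.groupby(segment_col).apply(
        lambda group: accuracy_score(group["label"], group["prediction"])
    )
```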

Adversarial adaptation. In the era of neural networks, models might face adversarial attacks. Monitoring helps detect such issues in time.

Summing up

Many factors can impact the performance of an ML model in production. ML monitoring and observability are crucial to ensure that models perform as expected and provide value.
