
1.4. Key considerations for ML monitoring setup

Key considerations for ML monitoring setup. Service criticality, retraining cadence, reference dataset, and ML monitoring architecture.



Video 4. Key considerations for ML monitoring setup, by Emeli Dral

When designing the ML monitoring setup for a specific model, you might want to consider the following aspects:

  • Matching the ML monitoring setup to the use case

  • Model retraining cadence

  • Choice of reference dataset

  • Custom metrics

Matching ML monitoring setup and the use case

While setting up an ML monitoring system, it makes sense to align the complexity of monitoring with the complexity of the deployment and operations of the ML service. Some factors to consider:

  • ML service implementation. Is it a real-time production service, a batch Airflow DAG, or an ad hoc Python script?

  • Feedback loop and environmental stability. Both influence the cadence of metrics calculations and the choice of specific metrics.

  • Service criticality. What is the business cost of model quality drops? What risks should we monitor for? More critical models might require a more complex monitoring setup.

Model retraining cadence

ML monitoring and retraining are closely connected. Some retraining factors to keep in mind when setting up an ML monitoring system include:

  • Frequency and costs of model retraining.

  • How you implement the retraining: whether you monitor the metrics and retrain on a trigger or set up a predefined retraining schedule (for example, weekly). See the sketch after this list.

  • Issues that prevent updating the model too often, e.g., complex approval processes, regulations, or the need for manual testing.
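To make the trigger-vs-schedule distinction concrete, here is a minimal sketch of both options. It is not from the course materials: the two-sample Kolmogorov-Smirnov test on a single numeric feature stands in for whatever monitoring signal you actually use, and the retraining call itself is left as a placeholder.

```python
import datetime

import pandas as pd
from scipy.stats import ks_2samp


def drift_detected(reference: pd.Series, current: pd.Series, p_threshold: float = 0.05) -> bool:
    """Simplified drift check on one numeric feature using a two-sample KS test."""
    return ks_2samp(reference, current).pvalue < p_threshold


def should_retrain_on_schedule(today: datetime.date) -> bool:
    """Option 1: retrain on a predefined cadence, e.g., every Monday."""
    return today.weekday() == 0


def should_retrain_on_trigger(reference: pd.Series, current: pd.Series) -> bool:
    """Option 2: retrain only when monitoring detects an issue."""
    return drift_detected(reference, current)


# Example: decide whether to kick off retraining for today's batch.
reference = pd.Series([0.2, 0.4, 0.5, 0.6, 0.8])
current = pd.Series([1.2, 1.4, 1.5, 1.6, 1.8])
if should_retrain_on_trigger(reference, current):
    print("Drift detected: trigger the retraining pipeline")  # placeholder for a real pipeline call
```

In a real setup, the trigger condition would typically come from a monitoring job (for example, a test suite run inside a pipeline), and the retraining call would launch an Airflow or Prefect flow, as covered later in Module 5.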

Reference dataset

In a situation where various production models use different data types – e.g., numerical and categorical features, tabular data, text data, images, or videos – setting up data quality monitoring can be overwhelming.

Instead, you can use a reference dataset to help automatically generate different tests based on the provided example and compare the new batches of data against it.
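For example, with the Evidently library used in this course's code practices, you can pass the reference dataset to a test preset and let it derive test conditions automatically. A minimal sketch, assuming the legacy TestSuite API and placeholder file names:

```python
import pandas as pd

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

# Placeholder data: your curated reference and the latest production batch.
reference = pd.read_csv("reference.csv")
current_batch = pd.read_csv("current_batch.csv")

# Test conditions (expected column types, value ranges, share of missing values, etc.)
# are generated automatically from the reference dataset.
suite = TestSuite(tests=[DataStabilityTestPreset()])
suite.run(reference_data=reference, current_data=current_batch)

suite.save_html("data_stability_tests.html")  # or suite.as_dict() for programmatic checks
```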

If you follow this strategy, selecting and curating an appropriate reference becomes as important as choosing the right metrics. A “good” reference dataset must correctly represent the expected data patterns.

You can also utilize a reference dataset as a baseline for the distribution drift comparison. You can consider having a fixed reference dataset, a moving one, or multiple windows.

Based on the scenario, you can use different reference datasets: for example, one dataset for distribution drift detection and another to generate data quality test conditions.
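As an illustration, the sketch below runs the same drift report against a fixed reference (the validation set from training time) and against a moving reference (the previous week of production data). File names, column names, and dates are placeholders; the legacy Evidently Report API is assumed.

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Placeholder: production log with a timestamp column.
data = pd.read_csv("production_log.csv", parse_dates=["timestamp"])
current = data[data["timestamp"] >= "2024-01-22"]

# Option 1: fixed reference, e.g., the validation set used at training time.
fixed_reference = pd.read_csv("validation_set.csv")

# Option 2: moving reference, e.g., the previous week of production data.
moving_reference = data[
    (data["timestamp"] >= "2024-01-15") & (data["timestamp"] < "2024-01-22")
]

for name, reference in [("fixed", fixed_reference), ("moving", moving_reference)]:
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    report.save_html(f"drift_report_{name}.html")
```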

Custom metrics

Standard monitoring metrics like accuracy or AUC are good starting points. However, depending on the use case, you may need to introduce more comprehensive custom monitoring metrics.

Some examples of custom metrics include (the first two are sketched in code after this list):

  • Use-case specific model quality metrics (e.g., lift-10% for churn prediction in the telecom industry).

  • Heuristics that reflect quality (e.g., the share of predictions higher than a specific threshold), especially when ground truth is not available.

  • Business quality metrics and KPIs (e.g., estimated savings).

  • Custom drift detection methods beyond standard statistical tests.
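As a concrete example, the snippet below computes a lift-10%-style metric for a churn model and the share of high-confidence predictions, a heuristic you can track while ground truth is delayed. This is a generic pandas sketch rather than a built-in Evidently metric; lesson 4.6 shows how to implement custom metrics in Evidently.

```python
import pandas as pd


def lift_at_k(y_true: pd.Series, y_score: pd.Series, k: float = 0.10) -> float:
    """Lift in the top-k share of predictions vs. the overall positive rate."""
    n_top = max(1, int(len(y_score) * k))
    top_idx = y_score.sort_values(ascending=False).index[:n_top]
    return y_true.loc[top_idx].mean() / y_true.mean()


def share_above_threshold(y_score: pd.Series, threshold: float = 0.9) -> float:
    """Heuristic when labels are not yet available: share of high-confidence predictions."""
    return (y_score > threshold).mean()


# Toy example:
scores = pd.Series([0.95, 0.80, 0.40, 0.10, 0.60, 0.92])
labels = pd.Series([1, 1, 0, 0, 1, 0])
print(lift_at_k(labels, scores, k=0.5))       # lift in the top 50% of scores: ~1.33
print(share_above_threshold(scores, 0.9))     # 2 of 6 predictions above 0.9: ~0.33
```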

Summing up

While designing an ML monitoring system, tailor your approach to fit your specific requirements and challenges:

  • Ensure the monitoring setup aligns with the complexity of your use case.

  • Consider binding retraining to monitoring, if relevant.

  • Use reference datasets to simplify the monitoring process, but make sure they are carefully curated.

  • Define custom metrics that fit your problem statement and data properties.

For a deeper dive into the ML monitoring setup, head to Module 4.
