2.1. How to evaluate ML model quality

How to evaluate ML model quality directly and use early monitoring to detect potential ML model issues.

Video 1, by Emeli Dral.

Challenges of standard ML monitoring

When it comes to standard ML monitoring, we usually start by measuring ML model performance metrics:

  • Model quality and error metrics show how the ML model performs in production. For example, you can track precision, recall, and log-loss for classification models or MAE for regression models (see the sketch after this list).

  • Business or product metrics help evaluate the ML model’s impact on business performance. You might want to track such metrics as purchases, clicks, views, etc.
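As an illustration, here is a minimal sketch of computing these standard quality metrics with scikit-learn. The arrays (y_true, y_pred, y_proba, reg_true, reg_pred) are hypothetical stand-ins for labels and predictions collected from production logs:

```python
from sklearn.metrics import precision_score, recall_score, log_loss, mean_absolute_error

# Hypothetical classification labels and predictions from production logs
y_true = [0, 1, 1, 0, 1]              # ground truth labels
y_pred = [0, 1, 0, 0, 1]              # predicted labels
y_proba = [0.1, 0.8, 0.4, 0.2, 0.9]   # predicted probability of class 1

print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("Log loss:", log_loss(y_true, y_proba))

# Hypothetical regression targets and predictions
reg_true = [10.0, 12.5, 9.0, 14.2]
reg_pred = [11.0, 12.0, 8.5, 15.0]
print("MAE:", mean_absolute_error(reg_true, reg_pred))
```

In production, the same calculations run over a batch or time window of logged predictions once the true labels arrive.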

However, standard ML monitoring is not always enough. Several challenges can complicate ML performance assessment:

  • Feedback or ground truth is delayed. When ground truth is not immediately available, calculating quality metrics can be technically impossible.

  • Past performance does not guarantee future results, especially when the environment is unstable.

  • Many segments with different quality. Aggregated metrics might not provide insights for diverse user/object groups. In this case, we need to monitor quality metrics for each segment separately.

  • The target function is volatile. A volatile target function can lead to fluctuating performance metrics, making it difficult to distinguish local quality drops from major performance issues.

Early monitoring metrics

You can adopt early monitoring together with standard monitoring metrics to tackle these challenges.

Early monitoring focuses on metrics derived from consistently available data: input data and ML model output data. For example, you can track the following (see the sketch after this list):

  • Data quality to detect data integrity issues.

  • Data drift to monitor changes in the input feature distributions.

  • Output drift to observe shifts in model predictions.
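The code practices later in this module use Evidently for exactly these checks. As a preview, here is a minimal sketch, assuming the Report and metric preset API available in Evidently 0.4.x-style releases; reference_batch.csv and current_batch.csv are hypothetical files containing model inputs and predictions:

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataQualityPreset, DataDriftPreset

# Hypothetical data: a reference sample (e.g., validation data) and a recent production batch
reference_data = pd.read_csv("reference_batch.csv")
current_data = pd.read_csv("current_batch.csv")

# Data quality and drift checks need no ground truth labels,
# so they can run as soon as new inputs and predictions are logged.
report = Report(metrics=[
    DataQualityPreset(),  # missing values, value ranges, feature statistics
    DataDriftPreset(),    # distribution shift in inputs (and predictions, if present as a column)
])
report.run(reference_data=reference_data, current_data=current_data)
report.save_html("early_monitoring_report.html")
```

The resulting HTML report can be reviewed ad hoc or generated on a schedule for every new batch of data.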

Module 2 structure

This module includes both theoretical parts and code practice for each of the evaluation types. Here is the module structure:

Model quality

  • Theory: ML model quality metrics for regression, classification, and ranking problems.

  • Practice: building a sample report in Python showcasing quality metrics.

Data quality

  • Theory: data quality metrics.

  • Practice: creating a sample report in Python on data quality.

Data and prediction drift

  • Theory: an overview of data drift metrics.

  • [OPTIONAL] Theory: a deeper dive into data drift detection methods and strategies.

  • Practice: building a sample report in Python to detect data and prediction drift for various data types (see the sketch after this list).
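To build intuition before the drift lessons, here is a minimal sketch of the core idea behind drift detection: compare the distribution of a feature (or of the predictions) in a reference period against a current period using a two-sample statistical test. The Kolmogorov-Smirnov test is just one common choice, and the data below is synthetic and purely illustrative; the theory lessons cover other methods and how to choose between them:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Synthetic feature values: the current period is slightly shifted vs. the reference period
reference_values = rng.normal(loc=0.0, scale=1.0, size=1_000)
current_values = rng.normal(loc=0.3, scale=1.0, size=1_000)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the distributions differ,
# i.e. the feature (or prediction) may have drifted.
statistic, p_value = ks_2samp(reference_values, current_values)
drift_detected = p_value < 0.05

print(f"KS statistic: {statistic:.3f}, p-value: {p_value:.4f}, drift detected: {drift_detected}")
```

A p-value threshold of 0.05 is a common default, but the right test and sensitivity depend on data volume and the use case, which the optional deep-dive lesson explores.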

Summing up

Tracking ML quality metrics in production is crucial to ensure that ML models perform reliably in real-world scenarios. However, standard ML performance metrics like model quality and error are not always enough.

Adopting early monitoring and measuring data quality, data drift, and prediction drift provides insights into potential issues when standard performance metrics cannot be calculated.

Through this module, learners will gain a theoretical understanding and hands-on experience in evaluating and interpreting model quality, data quality, and data drift metrics.
