1.3. ML monitoring metrics. What exactly can you monitor?

A framework to organize ML monitoring metrics. Software system health, data quality, ML model quality, and business KPIs.

Video 3, by Emeli Dral.

An ML-based service is more than just an ML model: you need to keep tabs on every facet of the ML system's quality. When it comes to monitoring system performance, there are several distinct groups of metrics to track.

Software system health

It doesn't matter how excellent your model is if the whole ML system is down. To track overall system health, you can reuse existing monitoring setups from other production services. Standard software performance metrics include latency, error rate, memory usage, and disk usage.
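
Below is a minimal sketch of what tracking these service-level metrics could look like in Python, using the Prometheus client library. It is not part of the course materials: the metric names and the predict() stub are illustrative assumptions.

```python
# Illustrative sketch: exposing standard software health metrics (latency,
# error rate) for a model-serving function with the Prometheus Python client.
# The metric names and the predict() stub are assumptions for this example.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTION_LATENCY = Histogram(
    "prediction_latency_seconds", "Time spent serving a single prediction"
)
PREDICTION_ERRORS = Counter(
    "prediction_errors_total", "Number of failed prediction requests"
)


def predict(features):
    # Stand-in for a real model call.
    time.sleep(random.uniform(0.01, 0.05))
    return sum(features)


def serve_prediction(features):
    start = time.time()
    try:
        return predict(features)
    except Exception:
        PREDICTION_ERRORS.inc()
        raise
    finally:
        PREDICTION_LATENCY.observe(time.time() - start)


if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at http://localhost:8000/metrics
    while True:
        serve_prediction([random.random() for _ in range(5)])
```

Memory and disk usage are usually collected at the host or container level (for example, with a node exporter) rather than inside the application code.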

Data quality and data integrity

In many cases, model quality issues stem from problems with the input data. To monitor data quality and integrity, you can keep tabs on metrics like the share of missing values, type mismatches, or range violations for important features. The goal here is to ensure the stability of the data pipelines.
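
As a simple illustration (not from the course materials), here is a pandas-based sketch of these checks; the column names, expected types, and value ranges are made-up assumptions.

```python
# Illustrative sketch: basic data quality checks with pandas.
# Column names, expected types, and value ranges are assumptions for the example.
import pandas as pd

EXPECTED_TYPES = {"age": "int64", "income": "float64", "country": "object"}
EXPECTED_RANGES = {"age": (18, 100), "income": (0.0, 1_000_000.0)}


def data_quality_report(df: pd.DataFrame) -> dict:
    return {
        # Share of missing values per column.
        "missing_share": df.isna().mean().to_dict(),
        # Columns whose dtype does not match the expected schema.
        "type_mismatch": [
            col for col, dtype in EXPECTED_TYPES.items()
            if col in df.columns and str(df[col].dtype) != dtype
        ],
        # Share of values outside the expected range for key features.
        "range_violations": {
            col: float(((df[col] < lo) | (df[col] > hi)).mean())
            for col, (lo, hi) in EXPECTED_RANGES.items()
            if col in df.columns
        },
    }


current_batch = pd.DataFrame({
    "age": [25, 42, None, 17],
    "income": [52_000.0, -10.0, 88_000.0, 61_000.0],
    "country": ["US", "DE", "FR", "US"],
})
print(data_quality_report(current_batch))
```

In practice, tools like Evidently compute these and many other data quality metrics out of the box, as covered in the code practice lessons.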

ML model quality and relevance

ML model performance metrics help to ensure that ML models work as expected:

  • Standard metrics help evaluate the quality of the ML model in production. For example, you can track metrics like precision and recall for classification, MAE or RMSE for regression, or top-k accuracy for ranking.

  • You can also track use-case specific quality metrics like bias or fairness: for example, through metrics like predictive parity or equalized odds.

  • When ground truth is unavailable or delayed, use proxy metrics. Keep tabs on prediction drift, input data drift, or the share of new categories. These metrics can signal potential problems before the ML model quality is affected (see the sketch below).
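
The sketch below (synthetic data, illustrative thresholds; not from the course materials) shows both ideas: standard scikit-learn quality metrics once labels arrive, and a simple prediction drift check with a two-sample Kolmogorov-Smirnov test when they have not arrived yet.

```python
# Illustrative sketch on synthetic data: standard quality metrics when labels
# are available, and a simple prediction drift proxy when they are not.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import mean_absolute_error, precision_score, recall_score

rng = np.random.default_rng(42)

# 1. Standard quality metrics, once ground truth arrives.
y_true = rng.integers(0, 2, size=1000)
y_pred = np.where(rng.random(1000) < 0.8, y_true, 1 - y_true)  # ~80% correct
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))

y_true_reg = rng.normal(100, 15, size=1000)
y_pred_reg = y_true_reg + rng.normal(0, 5, size=1000)
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))

# 2. Proxy metric: prediction drift, comparing the distribution of current
# model scores against a reference period with a two-sample KS test.
reference_scores = rng.beta(2, 5, size=1000)
current_scores = rng.beta(2, 3, size=1000)  # shifted distribution
statistic, p_value = ks_2samp(reference_scores, current_scores)
print("KS p-value:", p_value, "-> drift" if p_value < 0.05 else "-> no drift")
```

The 0.05 threshold here is a common but arbitrary choice; Module 2 discusses drift detection methods and their trade-offs in more depth.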

Business KPIs

The ultimate measure of model quality is its impact on the business. Depending on the business needs, you may want to monitor clicks, purchases, loan approval rates, cost savings, etc. These KPIs are typically custom to the use case, and defining them might involve collaborating with product managers or business teams.

For a deeper dive into ML model quality and relevance, as well as data quality and integrity metrics, head to Module 2: ML monitoring metrics.