6.5. Connecting the dots: full-stack ML observability
A brief summary of the Open-source ML observability course learnings.
Video 5. Connecting the dots: full-stack ML observability, by Emeli Dral
This is the final lesson of the Open-source ML observability course. Let’s recap what we’ve learned during the course!
Start small and expand. Ad hoc reports are a good starting point for ML monitoring: they are easy to implement and useful for initial learning about data and model quality before you establish a comprehensive monitoring system. Don’t hesitate to start small!
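An ad hoc report really can start very small. Here is a minimal sketch of such a check, using only the Python standard library; the feature names and rows are illustrative assumptions, not data from the course.

```python
# Minimal ad hoc data-quality report: count missing values and summarize
# basic statistics per column. A starting point before any monitoring
# system exists. The rows and column names below are made up.
from statistics import mean, stdev

rows = [
    {"age": 34, "income": 52000.0},
    {"age": 41, "income": None},      # one missing value to surface
    {"age": 29, "income": 48000.0},
]

def profile(rows, column):
    values = [r[column] for r in rows if r[column] is not None]
    missing = len(rows) - len(values)
    return {
        "count": len(values),
        "missing": missing,
        "mean": round(mean(values), 2),
        "std": round(stdev(values), 2) if len(values) > 1 else 0.0,
    }

report = {col: profile(rows, col) for col in ("age", "income")}
print(report)
```

Even this much is enough to notice a missing-value problem before it reaches a model.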
As you progress and deploy multiple models in production, or if you work with mission-critical use cases, you’ll need a more extensive setup.
Jobs to be done to implement full-stack production ML observability:
Immediate monitoring flow helps detect issues and send alerts during model inference. If you run a production-critical service, it is essential to implement it.
Delayed monitoring flow allows you to evaluate model quality once labels arrive (ground truth is often not available immediately!).
Model evaluation flow is needed to test model quality during updates and retraining.
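The delayed flow above boils down to joining logged predictions with ground truth that arrives later. A minimal sketch, with hypothetical request IDs and a simple accuracy metric chosen for illustration:

```python
# Delayed monitoring flow: labels arrive after inference, so model quality
# is computed only for the requests that already have ground truth.
# Request IDs and values below are illustrative assumptions.
predictions = {"req-1": 1, "req-2": 0, "req-3": 1}   # keyed by request id
labels      = {"req-1": 1, "req-3": 0}               # req-2 not labeled yet

matched = [(predictions[k], labels[k]) for k in predictions if k in labels]
accuracy = sum(p == y for p, y in matched) / len(matched)
coverage = len(matched) / len(predictions)           # share with ground truth

print(f"accuracy={accuracy:.2f} on {coverage:.0%} of requests")
```

Tracking label coverage alongside the quality metric matters: a great score computed on 5% of traffic can be misleading.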
Observability components to keep in mind when building ML monitoring:
Logging layer. If you have a production service, implementing logging is a must to capture model inferences and collect performance metrics.
Alerting layer allows you to monitor metrics and get notifications when things go wrong.
Dashboarding and analytics help visualize performance, quickly identify root causes, and define actions for debugging and retraining.
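At its core, the alerting layer compares monitored metrics against thresholds and sends a notification when one is breached. A hedged sketch, where the metric names, thresholds, and the `notify()` stub are all assumptions standing in for a real channel such as email or Slack:

```python
# Minimal alerting-layer sketch: check each monitored metric against a
# threshold and emit a notification on breach. notify() is a stand-in
# for a real integration; the metrics and thresholds are illustrative.
alerts = []

def notify(message):
    alerts.append(message)   # replace with a real channel in production

def check_metric(name, value, threshold):
    if value > threshold:
        notify(f"ALERT: {name}={value:.3f} exceeds threshold {threshold}")

check_metric("share_of_missing_values", 0.12, threshold=0.05)
check_metric("prediction_drift_score", 0.01, threshold=0.10)
print(alerts)
```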
ML monitoring metrics. We covered which metrics to use to assess data quality, data drift, and model quality. We also discussed how to implement custom metrics for specific use cases. For example, you can integrate custom metrics related to business KPIs and specific aspects of model quality into ML monitoring.
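A custom business-oriented metric can be a small function over logged events. This sketch is purely illustrative: the metric definition (share of available value captured by accepted model suggestions) and all numbers are assumptions, not from the course.

```python
# Hypothetical custom business-KPI metric: the share of available revenue
# attributed to model suggestions that users accepted. Definition and
# data are illustrative assumptions.
def revenue_capture(events):
    """Share of total value coming from accepted model suggestions."""
    captured = sum(e["value"] for e in events if e["accepted"])
    total = sum(e["value"] for e in events)
    return captured / total if total else 0.0

events = [
    {"value": 100.0, "accepted": True},
    {"value": 250.0, "accepted": False},
    {"value": 150.0, "accepted": True},
]
print(revenue_capture(events))  # 250.0 / 500.0 = 0.5
```

Such a metric can then be tracked and alerted on alongside standard data and model quality metrics.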
ML monitoring design. We covered different aspects of ML monitoring design, including how to select and use reference datasets. We also discussed the connection between retraining cadence and ML monitoring.
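Using a reference dataset typically means comparing the current data window against it with a statistical test. A stdlib-only sketch of the two-sample Kolmogorov-Smirnov statistic (in practice you would use a library implementation with p-values; the sample values here are made up):

```python
# Drift check against a reference dataset: two-sample KS statistic,
# i.e., the maximum gap between the empirical CDFs of the reference
# and current windows. Sample values are illustrative.
def ks_statistic(reference, current):
    ref, cur = sorted(reference), sorted(current)

    def cdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(cdf(ref, x) - cdf(cur, x)) for x in sorted(set(ref + cur)))

reference = [0.1, 0.2, 0.3, 0.4, 0.5]
current   = [0.6, 0.7, 0.8, 0.9, 1.0]   # fully shifted window
print(ks_statistic(reference, current))  # 1.0 -- maximal drift
```

Choosing the reference window (training data, a stable past period, a rolling window) is itself a design decision that shapes what the test can detect.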
ML monitoring architectures. We explored different ML monitoring architectures, from ad hoc reports and test suites to batch and real-time ML monitoring, and learned how to implement them in practice.
ML monitoring for unstructured data. We also touched on how to build a monitoring system for unstructured data, such as text data and embeddings.
⭐️ Star the project on GitHub to contribute back! This helps us create free, open-source tools and content for the community.
📌 Share your feedback so we can make this course better.
💻 Join the community for more discussions and materials on ML monitoring and observability.