Challenge 6: Monitor your models
Introduction
Over time, training data can stop being representative because of changing demographics, trends, and so on. To catch skew or drift in feature distributions, or even in predictions, you need to monitor your model’s performance continuously.
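To make "drift in feature distributions" concrete, here is a minimal sketch of one common way to score the distance between a training distribution and a serving distribution of a numerical feature: the Jensen-Shannon divergence over binned values. The bin shares below are made-up illustration data, and the function names are hypothetical; this is not a reproduction of the monitoring service's internal computation.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    With log base 2 the result lies between 0 (identical) and 1 (disjoint).
    """
    if len(p) != len(q):
        raise ValueError("both distributions must use the same bins")
    # The mixture m is positive wherever p or q is positive, so the
    # KL terms above never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up shares of a numerical feature across three bins.
training_shares = [0.25, 0.50, 0.25]
serving_shares = [0.10, 0.30, 0.60]

print(f"distance: {js_divergence(training_shares, serving_shares):.3f}")
```

A score near 0 means the serving data still looks like the training data; the further it moves towards 1, the stronger the case for an alert.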
Note: We’ll be using Model Monitoring v1 for this challenge. It is configured as part of the Online Prediction Endpoint configuration for online models, and as part of the Batch Prediction run configuration for batch execution.
If you’ve chosen the online inferencing path, continue with Online Monitoring, otherwise please skip to the Batch Monitoring section.
Online Monitoring
Description
Vertex AI Endpoints provide Model Monitoring capabilities which will be configured for this challenge. Turn on Training-serving skew detection for your model and use an hourly granularity to get alerts. Create a new notification channel that uses Pub/Sub messages and configure it to use a new Pub/Sub topic.
Send at least 10K prediction requests to collect monitoring data.
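If you prefer to script the requests rather than reuse an existing tool, the following is a minimal sketch of how 10K instances could be split into batches and sent one batch at a time. The `send_batch` callable is a placeholder (an assumption, not part of the challenge): here it only records what it receives, so the sketch runs without any cloud access. The `feature_a` field is hypothetical as well.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def send_all(
    instances: Iterable[dict],
    send_batch: Callable[[List[dict]], object],
    batch_size: int = 100,
) -> int:
    """Send every instance through `send_batch`; return how many were sent."""
    sent = 0
    for batch in chunked(instances, batch_size):
        send_batch(batch)
        sent += len(batch)
    return sent


# Stand-in sender: collects the batches instead of calling an endpoint.
received: List[List[dict]] = []
total = send_all(({"feature_a": i} for i in range(10_000)), received.append)
print(f"sent {total} instances in {len(received)} requests")
```

In a real run, `send_batch` could wrap a call such as `endpoint.predict(instances=batch)` from the `google-cloud-aiplatform` SDK, where `endpoint` refers to the endpoint from the previous challenge.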
Success Criteria
- Show that Model Monitoring is running successfully for the endpoint created in the previous challenge.
- Show that there’s a new Pub/Sub topic and a Pub/Sub notification channel for the Model Monitoring job.
- By default, Model Monitoring keeps request/response data in a BigQuery dataset; find and show that data.
- No code was modified.
Tips
- You can use the sample.csv file from Challenge 1 as the baseline data.
- You can use the same tool you used for the previous challenge to generate the requests. Make sure to include some data that has a different distribution than the training data.
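One way to obtain data with a different distribution than the training data is to take baseline rows and scale or offset a numerical column. The sketch below assumes a hypothetical column named `feature_a` and uses a small inline CSV as a stand-in for sample.csv, whose real columns are not shown here.

```python
import csv
import io


def shift_numeric_column(rows, column, factor=1.0, offset=0.0):
    """Return copies of `rows` with `column` replaced by value * factor + offset."""
    shifted = []
    for row in rows:
        new_row = dict(row)  # leave the baseline rows untouched
        new_row[column] = str(float(row[column]) * factor + offset)
        shifted.append(new_row)
    return shifted


# Inline stand-in for the baseline file; column names are hypothetical.
baseline_csv = "feature_a,label\n1.5,0\n2.0,1\n4.0,0\n"
baseline_rows = list(csv.DictReader(io.StringIO(baseline_csv)))

# Doubling the feature moves its whole distribution away from the baseline.
skewed_rows = shift_numeric_column(baseline_rows, "feature_a", factor=2.0)
for row in skewed_rows:
    print(row)
```

Mixing some shifted rows in with unmodified ones gives the monitoring job something to detect without making every request an outlier.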
Learning Resources
- Introduction to Vertex AI Model Monitoring
- Creating a Pub/Sub topic
- Creating a notification channel
Batch Monitoring
Description
Vertex AI Batch Prediction jobs provide Model Monitoring capabilities as well. Create a new Batch Prediction job with monitoring turned on and with BigQuery input and output tables; use default values for the alert thresholds. Create a new notification channel that uses Pub/Sub messages and configure it to use a new Pub/Sub topic.
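To illustrate what an alert threshold is compared against, here is a minimal sketch that scores a categorical feature with the L-infinity distance (the largest change in any single category's share) and checks it against a threshold. The category values are made up, and the threshold of 0.3 is an assumption used for illustration; check the job configuration for the actual default.

```python
from collections import Counter


def category_shares(values):
    """Map each category to its share of all observed values."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in share across all categories."""
    p = category_shares(baseline)
    q = category_shares(current)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


ASSUMED_THRESHOLD = 0.3  # assumption for illustration, not read from the service

# Made-up values of a hypothetical categorical feature.
baseline_values = ["card"] * 8 + ["cash"] * 2
batch_values = ["card"] * 4 + ["cash"] * 6

distance = l_infinity_distance(baseline_values, batch_values)
print(f"distance={distance:.2f}, alert={distance > ASSUMED_THRESHOLD}")
```

A lower threshold makes the job more sensitive and more likely to alert on small shifts; a higher one only fires on large changes.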
Success Criteria
- There’s a new Batch Prediction job with monitoring turned on.
- Show that there’s a new Pub/Sub topic and a Pub/Sub notification channel for the Model Monitoring job.
- As batch inferencing will again take roughly 10 minutes, it’s sufficient to show the properly configured job.
- No code was modified.
Tips
- You can use the sample.csv file from Challenge 1 as the baseline training data.
- You can use the same data you used for the previous challenge to run the batch predictions. Make sure to include some data that has a different distribution than the training data.
Learning Resources
- Model monitoring for Batch Predictions
- Creating a Pub/Sub topic
- Creating a notification channel