Challenge 5: Make it work and make it scale
Introduction
Having a model is only the first step; with it we can now make predictions. This is typically called inferencing (or scoring) and can be done:
- In an online fashion, with an HTTP endpoint that generates predictions for incoming data in real time,
- Or in batch, by running the model on a large set of files or a database table.
From this challenge onwards you'll have the option to do either online inferencing or batch inferencing. Please choose your path:
Online Inferencing
Description
So, you’ve chosen online inferencing. In order to use the model to serve predictions in an online fashion, it has to be deployed to an endpoint. Luckily, Vertex AI provides exactly what we need: a managed service for serving predictions, called Online Prediction.
Create a new Vertex AI Endpoint and deploy the freshly trained model. Use the smallest machine type but make sure that it can scale to more than 1 node by configuring autoscaling.
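A deployment along these lines can be done with the Vertex AI Python SDK; the sketch below is just one way to do it, and the project ID, region, model ID and machine type are placeholders you'll need to adapt to your environment.

```python
# Minimal deployment sketch using the Vertex AI Python SDK.
# Project, region and model resource name are placeholders; substitute your own.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Look up the model trained in the previous challenge (placeholder model ID).
model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/YOUR_MODEL_ID"
)

# Deploy with autoscaling: min_replica_count=1 keeps the footprint small,
# max_replica_count>1 lets the endpoint scale out to more than one node
# under load. n1-standard-2 is typically the smallest prediction machine
# type; adjust if your environment offers smaller ones.
endpoint = model.deploy(
    deployed_model_display_name="challenge5-model",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
)
print(endpoint.resource_name)
```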
Note
The deployment of the model will take ~10 minutes to complete.
Warning
Note that the Qwiklab environment we’re using has a quota on the endpoint throughput (30K requests per minute), do not exceed that.
Success Criteria
- The model has been deployed to an endpoint and can serve requests.
- Show that the Endpoint has scaled to more than 1 instance under load.
- No code was modified.
Tips
- Verify first that you’re getting predictions from the endpoint before generating load (for example using cURL or the Python SDK); see the sketch after this list.
- In order to generate load you can use any tool you want, but the easiest approach would be to install apache-bench on Cloud Shell or your notebook environment. Google it if you don’t know how to use it :)
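As a rough sketch, both steps can also be done from Python with the Vertex AI SDK; the endpoint ID and feature values below are placeholders, and apache-bench against the REST endpoint is an equally valid (and simpler) way to generate load.

```python
# Sketch: verify the endpoint responds, then put it under some load.
from concurrent.futures import ThreadPoolExecutor

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/your-project-id/locations/us-central1/endpoints/YOUR_ENDPOINT_ID"
)

# 1. Sanity check with a single prediction. For a scikit-learn model each
#    instance is a plain list of feature values, in training-column order.
sample_instance = [0.5, 1.2, 3.4, 0.0]  # placeholder feature values
print(endpoint.predict(instances=[sample_instance]))

# 2. Generate load with a thread pool. Keep the total volume well below the
#    30K requests/minute Qwiklab quota mentioned above.
def fire(_):
    return endpoint.predict(instances=[sample_instance])

with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(fire, range(2000)))
```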
Learning Resources
- Documentation on Online Predictions deployment
- More info on the request data format. Remember that we’ve used the scikit-learn framework to train our model.
Batch Inferencing
Description
So, you’ve chosen the batch inferencing path. We’re going to use Vertex AI Batch Predictions to get predictions for data in a BigQuery table. First, go ahead and create a new table with at most 10K rows that’s going to be used for generating the predictions. Once the table is created, create a new Batch Prediction job with that table as the input and another BigQuery table as the output, using the previously created model. Choose a small machine type and 2 compute nodes. Don’t turn on Model Monitoring yet; that’s for the next challenge.
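One way to create such a job is through the Vertex AI Python SDK, sketched below; the project, dataset, table and model IDs are placeholders, and the same job can of course be created from the console instead.

```python
# Sketch of a Batch Prediction job reading from and writing to BigQuery.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")
model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/YOUR_MODEL_ID"
)

# Small machine type, fixed at 2 compute nodes as the challenge asks.
job = model.batch_predict(
    job_display_name="challenge5-batch-inference",
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://your-project-id.your_dataset.input_table",
    bigquery_destination_prefix="bq://your-project-id.your_dataset",
    machine_type="n1-standard-2",
    starting_replica_count=2,
    max_replica_count=2,
)
job.wait()  # blocks until the job finishes (roughly 10 minutes)
```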
Note
The batch inferencing will take roughly 10 minutes; most of that is the overhead of starting the cluster, so increasing the number of instances won’t help with the small table we’re using.
Success Criteria
- There’s a properly structured input table in BigQuery with at most 10K rows.
- There’s a successful Batch Prediction job.
- There are predictions in a new BigQuery table.
- No code was modified.
Tips
- The pipeline that we’ve used in the previous challenge contains a task to prepare the data using BigQuery; have a look at that for inspiration.
- Make sure that the input table has exactly the input columns required by the model. Remember, training needs extra data (the label) which is not an input for the model at inferencing time ;) See the sketch after this list.
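As a sketch, the input table could be created with the BigQuery Python client along these lines; the source table, dataset and feature column names are placeholders, and the key points are the 10K row cap and dropping the label column the model doesn't expect at inference time.

```python
# Sketch: build the batch-prediction input table with the BigQuery client.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

query = """
CREATE OR REPLACE TABLE `your-project-id.your_dataset.input_table` AS
SELECT
  feature_1,  -- list only the feature columns the model expects,
  feature_2,  -- leaving out the label used during training
  feature_3
FROM `your-project-id.your_dataset.training_table`
LIMIT 10000
"""
client.query(query).result()  # waits for the CREATE TABLE job to finish
```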
Learning Resources
- Creating BigQuery datasets
- Creating BigQuery tables
- BigQuery public datasets
- Vertex AI Batch Predictions