Challenge 5: Make it work and make it scale
Introduction
Having a model is only the first step; with it we can now make predictions. This is typically called inferencing (or scoring) and can be done:
- In an online fashion, with an HTTP endpoint that generates predictions for incoming data in real time,
- Or in batch, by running the model on a large set of files or a database table.
From this challenge onwards you'll have the option to do either online inferencing or batch inferencing. Please choose your path:
Online Inferencing
Description
So, you’ve chosen online inferencing. In order to use the model to serve predictions in an online fashion, it has to be deployed to an endpoint. Luckily, Vertex AI provides exactly what we need: a managed service for serving predictions, called Online Prediction.
Create a new Vertex AI Endpoint and deploy the freshly trained model. Use the smallest machine type but make sure that it can scale to more than 1 node by configuring autoscaling.
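A deployment along these lines can be done with the Vertex AI Python SDK; the sketch below is just one way to do it, and the project ID, region, model ID and machine type are placeholders you'll need to adapt to your environment.

```python
# Minimal deployment sketch using the Vertex AI Python SDK.
# Project, region and model resource name are placeholders; substitute your own.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Look up the model trained in the previous challenge (placeholder model ID).
model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/YOUR_MODEL_ID"
)

# Deploy with autoscaling: min_replica_count=1 keeps the footprint small,
# max_replica_count>1 lets the endpoint scale out to more than one node
# under load. n1-standard-2 is typically the smallest prediction machine
# type; adjust if your environment offers smaller ones.
endpoint = model.deploy(
    deployed_model_display_name="challenge5-model",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
)
print(endpoint.resource_name)
```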
Note
The deployment of the model will take ~10 minutes to complete.
Warning
Note that the Qwiklab environment we’re using has a quota on the endpoint throughput (30K requests per minute), do not exceed that.
Success Criteria
- The model has been deployed to an endpoint and can serve requests.
- Show that the Endpoint has scaled to more than 1 instance under load.
- No code was modified.
Tips
- Verify first that you’re getting predictions from the endpoint before generating load (for example using cURL or the Python SDK); see the sketch after this list.
- In order to generate load you can use any tool you want, but the easiest approach would be to install apache-bench on Cloud Shell or your notebook environment. Google it if you don’t know how to use it :)
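As a rough sketch, both steps can also be done from Python with the Vertex AI SDK; the endpoint ID and feature values below are placeholders, and apache-bench against the REST endpoint is an equally valid (and simpler) way to generate load.

```python
# Sketch: verify the endpoint responds, then put it under some load.
from concurrent.futures import ThreadPoolExecutor

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/your-project-id/locations/us-central1/endpoints/YOUR_ENDPOINT_ID"
)

# 1. Sanity check with a single prediction. For a scikit-learn model each
#    instance is a plain list of feature values, in training-column order.
sample_instance = [0.5, 1.2, 3.4, 0.0]  # placeholder feature values
print(endpoint.predict(instances=[sample_instance]))

# 2. Generate load with a thread pool. Keep the total volume well below the
#    30K requests/minute Qwiklab quota mentioned above.
def fire(_):
    return endpoint.predict(instances=[sample_instance])

with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(fire, range(2000)))
```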
Learning Resources
- Documentation on Online Predictions deployment
- More info on the request data format. Remember that we’ve used the scikit-learn framework to train our model.
Batch Inferencing
Description
So, you’ve chosen the batch inferencing path. We’re going to use Vertex AI Batch Predictions to get predictions for data in a BigQuery table. First, go ahead and create a new table with at most 10K rows that’s going to be used for generating the predictions. Once the table is created, create a new Batch Prediction job with that table as the input and another BigQuery table as the output, using the previously created model. Choose a small machine type and 2 compute nodes. Don’t turn on Model Monitoring yet; that’s for the next challenge.
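One way to create such a job is through the Vertex AI Python SDK, sketched below; the project, dataset, table and model IDs are placeholders, and the same job can of course be created from the console instead.

```python
# Sketch of a Batch Prediction job reading from and writing to BigQuery.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")
model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/YOUR_MODEL_ID"
)

# Small machine type, fixed at 2 compute nodes as the challenge asks.
job = model.batch_predict(
    job_display_name="challenge5-batch-inference",
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://your-project-id.your_dataset.input_table",
    bigquery_destination_prefix="bq://your-project-id.your_dataset",
    machine_type="n1-standard-2",
    starting_replica_count=2,
    max_replica_count=2,
)
job.wait()  # blocks until the job finishes (roughly 10 minutes)
```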
Note
The batch inferencing will take roughly 10 minutes; most of that is the overhead of starting the cluster, so increasing the number of instances won’t help with the small table we’re using.
Success Criteria
- There’s a properly structured input table in BigQuery with at most 10K rows.
- There’s a successful Batch Prediction job.
- There are predictions in a new BigQuery table.
- No code was modified.
Tips
- The pipeline that we’ve used in the previous challenge contains a task to prepare the data using BigQuery; have a look at that for inspiration.
- Make sure that the input table has exactly the input columns required by the model. Remember, training needs extra data (the label) which is not an input for the model at inferencing time ;) See the sketch after this list.
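As a sketch, the input table could be created with the BigQuery Python client along these lines; the source table, dataset and feature column names are placeholders, and the key points are the 10K row cap and dropping the label column the model doesn't expect at inference time.

```python
# Sketch: build the batch-prediction input table with the BigQuery client.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

query = """
CREATE OR REPLACE TABLE `your-project-id.your_dataset.input_table` AS
SELECT
  feature_1,  -- list only the feature columns the model expects,
  feature_2,  -- leaving out the label used during training
  feature_3
FROM `your-project-id.your_dataset.training_table`
LIMIT 10000
"""
client.query(query).result()  # waits for the CREATE TABLE job to finish
```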
Learning Resources
- Creating BigQuery datasets
- Creating BigQuery tables
- BigQuery public datasets
- Vertex AI Batch Predictions