Figure 1 shows the overall architecture presented in this blog post. We cover the components and their connections in the context of two common workflows of the MLOps system.
Components
Vertex AI is at the heart of this system, and it leverages Vertex Managed Datasets, AutoML, Predictions, and Pipelines. We can create and manage a dataset as it grows using Vertex Managed Datasets. Vertex AutoML selects the best model without requiring much modeling expertise on your part. Vertex Predictions creates an endpoint (a REST API) with which the client communicates.
This is a simple, fully managed, yet fairly complete end-to-end MLOps workflow: it moves from a dataset to a trained model that gets deployed. The workflow can be written programmatically with Vertex Pipelines, which outputs a specification for the ML pipeline, allowing you to re-run the pipeline whenever and wherever you want (a compilation sketch follows below). You specify when and how to trigger the pipeline using Cloud Functions and Cloud Storage.
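As a minimal sketch, compiling a pipeline definition into a reusable spec might look like the following; the component, pipeline name, and output path are illustrative, not taken from the project.

```python
# A minimal sketch, assuming the kfp SDK's v2 namespace; the component,
# pipeline name, and output path are illustrative.
from kfp.v2 import compiler, dsl

@dsl.component
def log_source(gcs_source: str):
    # placeholder for real steps such as dataset import and AutoML training
    print(f"Training data located at: {gcs_source}")

@dsl.pipeline(name="automl-training-pipeline")
def pipeline(gcs_source: str):
    log_source(gcs_source=gcs_source)

# The compiled JSON spec is what Vertex Pipelines executes, so it can be
# stored and re-run whenever or wherever needed.
compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path="pipeline_spec.json",
)
```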
Cloud Functions is a serverless way to deploy your code in Google Cloud. In this particular project, it triggers the pipeline by listening for changes at the specified Cloud Storage location: when a new dataset is added (for example, under a new SPAN number), the pipeline is triggered to train on the new data and deploy a new model.
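As a rough sketch, a first-generation, Python-runtime Cloud Function listening to the bucket could submit the compiled spec to Vertex Pipelines like this; the project, region, and spec path are placeholders.

```python
# A sketch of a GCS-triggered Cloud Function that submits the compiled spec
# to Vertex Pipelines. Project, region, and the spec path are placeholders.
from google.cloud import aiplatform

PROJECT_ID = "your-project"                              # placeholder
REGION = "us-central1"                                   # placeholder
PIPELINE_SPEC = "gs://your-bucket/pipeline_spec.json"    # placeholder

def trigger_pipeline(event, context):
    """Runs whenever a new object is finalized in the watched bucket."""
    gcs_path = f"gs://{event['bucket']}/{event['name']}"
    aiplatform.init(project=PROJECT_ID, location=REGION)
    job = aiplatform.PipelineJob(
        display_name="automl-training-pipeline",
        template_path=PIPELINE_SPEC,
        parameter_values={"gcs_source": gcs_path},
    )
    job.submit()  # returns immediately; the pipeline runs asynchronously
```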
Workflow
This MLOps system prepares the dataset with either Vertex Dataset's built-in user interface (UI) or any external tool, based on your preference. You upload the prepared dataset into the designated Google Cloud Storage (GCS) bucket under a new folder named SPAN-NUMBER (a scripted upload is sketched below). Cloud Functions then detects the change in the GCS bucket and triggers the Vertex Pipeline to run the jobs from AutoML training through endpoint deployment.
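If you prefer to script the upload, a minimal sketch with the google-cloud-storage client might look like this; the bucket name and local folder are assumptions.

```python
# An illustrative scripted upload; the bucket name and local folder are
# assumptions.
import pathlib
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-dataset-bucket")  # the watched GCS bucket

local_dir = pathlib.Path("span-2")  # a newly prepared data span
for path in local_dir.rglob("*"):
    if path.is_file():
        # mirror the local layout under the span-2/ prefix in the bucket
        blob = bucket.blob(f"span-2/{path.relative_to(local_dir)}")
        blob.upload_from_filename(str(path))
```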
Inside the Vertex Pipeline, the first step checks whether a Vertex Dataset was created in a previous run. If the dataset is new, the pipeline creates a new Vertex Dataset by importing the data from the GCS location and emits the corresponding artifact. Otherwise, it imports the additional data into the existing Vertex Dataset and emits an artifact.
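The existence check could be a lightweight custom KFP component along these lines; the display-name filter and the string return convention are assumptions, not the project's exact code.

```python
# A sketch of the existence check as a lightweight KFP component; the
# display-name filter and the string return convention are assumptions.
from kfp.v2.dsl import component

@component(packages_to_install=["google-cloud-aiplatform"])
def check_dataset_exists(project: str, region: str, display_name: str) -> str:
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    datasets = aiplatform.ImageDataset.list(
        filter=f'display_name="{display_name}"'
    )
    return "true" if datasets else "false"
```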
When the Vertex Pipeline recognizes the dataset as a new one, it trains a new AutoML model and deploys it by creating a new endpoint. If the dataset isn't new, the pipeline tries to retrieve the model ID from Vertex Model and uses it to decide whether a brand-new AutoML model or an updated one is needed: if no model has been created yet, the second branch trains one from scratch; otherwise it trains on top of the previous model. In both cases, the training component emits the model artifact as well.
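The model lookup might be sketched like this; the project, region, and display-name filter are assumptions.

```python
# A hedged sketch of the model lookup; the display-name filter is an
# assumption.
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")  # placeholders
models = aiplatform.Model.list(filter='display_name="cifar10-automl"')
# No match means no model has been created yet, so train from scratch;
# otherwise reuse the ID to train on top of the previous model.
model_id = models[0].name if models else None
```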
Directory structure that reflects different distributions
In this project, I have created two subsets of the CIFAR-10 dataset, SPAN-1 and SPAN-2. A more general version of this project, which shows how to build training and batch evaluation pipelines, can be found here. Those pipelines can be set up to cooperate so they evaluate the currently deployed model and trigger the retraining process.
ML Pipeline with Kubeflow Pipelines (KFP)
We chose Kubeflow Pipelines to orchestrate the pipeline, and there are a few things I would like to highlight. First, it's good to know how to create branches with conditional statements in KFP. Second, you need to explore the AutoML API specification to fully leverage its capabilities, such as training a model based on a previously trained one. Last, you need a way to emit artifacts for Vertex Dataset and Vertex Model that Vertex AI can recognize (one approach is sketched below). Let's go through these one by one.
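On that last point, one way to emit an artifact that Vertex AI can resolve back to a resource is to set its resourceName metadata, which is the convention the Google Cloud pipeline components use; the component below is an illustrative sketch, and the URI format should be treated as an assumption.

```python
# An illustrative component that emits a Vertex-recognizable artifact by
# setting the resourceName metadata key; the URI format mirrors what the
# Google Cloud pipeline components produce but is an assumption here.
from kfp.v2.dsl import Artifact, Output, component

@component
def import_existing_dataset(
    project: str, region: str, dataset_id: str, dataset: Output[Artifact]
):
    # Vertex AI resolves the artifact back to the dataset via resourceName.
    resource_name = f"projects/{project}/locations/{region}/datasets/{dataset_id}"
    dataset.metadata["resourceName"] = resource_name
    dataset.uri = f"https://{region}-aiplatform.googleapis.com/v1/{resource_name}"
```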
Branching strategy
In this project, there are two main branches, with two sub-branches inside the second main branch. The main branches split the pipeline based on whether an existing Vertex Dataset is present. The sub-branches apply within the second main branch, which is selected when there is an existing Vertex Dataset: it looks up the list of models and decides whether to train an AutoML model from scratch or from a previously trained one.
ML pipelines written in KFP can include conditions using the special kfp.dsl.Condition syntax. For instance, we can sketch the branches as follows (component names and condition values here are illustrative):
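```python
# Component names (check_dataset_exists from the earlier sketch, plus a
# hypothetical check_model_exists) and the condition values are assumptions;
# elided bodies would hold the dataset, training, and deployment ops.
from kfp.v2 import dsl

@dsl.pipeline(name="automl-branching-pipeline")
def pipeline(project: str, region: str, gcs_source: str):
    dataset_check = check_dataset_exists(
        project=project, region=region, display_name="cifar10"
    )

    # First main branch: no Vertex Dataset yet, so create one, train an
    # AutoML model, and deploy it to a new endpoint.
    with dsl.Condition(dataset_check.output == "false", name="new-dataset"):
        ...

    # Second main branch: the dataset exists; import the new span and
    # decide how to train based on whether a model already exists.
    with dsl.Condition(dataset_check.output == "true", name="existing-dataset"):
        model_check = check_model_exists(project=project, region=region)

        with dsl.Condition(model_check.output == "false", name="train-from-scratch"):
            ...

        with dsl.Condition(model_check.output == "true", name="train-on-previous"):
            ...
```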