Mar 24, 2023 • Written By Eric Liu
The BentoML team is thrilled to announce the integration between BentoML and Kubeflow in the latest Kubeflow 1.7 release. This represents the initial stride towards a streamlined machine-learning solution at scale.
“As a longtime user of Kubeflow and a very satisfied user of BentoML, this integration makes it more exciting for us to upgrade to Kubeflow 1.7! I believe our Data Scientist colleagues would appreciate being able to build, package, and deploy models easily with minimum hassle!”
- Benjamin Tan, Machine Learning Engineer, DKatalis
Kubeflow has emerged as a comprehensive and adaptable ML platform for Kubernetes, with mature components to address the critical challenges in developing and training models. BentoML allows developers to build AI applications once and deploy them on various platforms without needing to modify any code. This unique ability has made it a preferred tool in the industry.
With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML's cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture lets each model run in its own pod, on the hardware best suited to its task, and scale independently of the others.
To showcase the integration's capabilities, this tutorial walks you through building a fraud detection service using the Kaggle IEEE-CIS Fraud Detection dataset. It covers everything from training the models in Kubeflow notebooks to packaging and deploying the resulting BentoML service to a Kubernetes cluster.
This example can also be run from the notebook.ipynb included in this directory.
This guide assumes that Kubeflow is already installed in the Kubernetes cluster. See Kubeflow Manifests for installation instructions.
Install BentoML cloud-native components and custom resource definitions.
Install the required packages to run this example.
Download the Kaggle dataset.
In this demonstration, we'll train three fraud detection models using the Kaggle IEEE-CIS Fraud Detection dataset. We'll split the dataset into three equal-sized chunks and use each chunk to train a separate model. This approach has no practical benefit on its own, but it illustrates how to save and serve multiple models with Kubeflow and BentoML.
Define the preprocessor.
Define the training function, parameterized by the number of boosting rounds and the maximum tree depth.
We will divide the training data into three equal-sized chunks and treat them as independent data sets. Based on these data sets, we will train three separate fraud detection models. Each trained model will be saved to the local model store using the BentoML model saving API.
Saved models can be loaded back into memory and debugged in the notebook.
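The chunking step can be sketched in plain Python. This is a minimal illustration only, not the code from the example notebook: the helper name `split_into_chunks` and the toy `rows` list are hypothetical, and the real example works on the Kaggle data rather than a list of integers. When the row count is not divisible by the number of chunks, this sketch gives the extra rows to the first chunks.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(rows: Sequence[T], n_chunks: int = 3) -> List[List[T]]:
    """Split rows into n_chunks contiguous chunks of near-equal size.

    Hypothetical helper for illustration; not part of BentoML or Kubeflow.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    # base rows per chunk, plus one extra row for the first `extra` chunks
    base, extra = divmod(len(rows), n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks


# Toy stand-in for the training rows (assumption: any indexable sequence works)
rows = list(range(10))
chunks = split_into_chunks(rows, n_chunks=3)

# Each chunk would then be used to train one independent model
for index, chunk in enumerate(chunks):
    print(f"model {index}: {len(chunk)} training rows")
```

Each chunk is then treated as its own training set, so the three resulting models never see each other's rows.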
After the models are built and scored, let's create the service definition. You can find the service definition in the service.py module in this example. Let's break down the service.py module and explain what each section does.
First, we will create a list of preprocessors and runners from the three models we saved earlier. Runners are abstractions of the model inferences that can be scaled independently. See Using Runners for more details.
Next, we will create a service with the list of runners passed in.
Finally, we will create the API function is_fraud. We'll use the @api decorator to declare that the function is an API and specify the input and output types as pandas.DataFrame and JSON, respectively. The function is defined as async so that the inference calls to the runners can happen simultaneously without waiting for the results to return before calling the next runner. The inner function _is_fraud defines the model inference logic for each runner. All runners are called simultaneously through the asyncio.gather function, and the results are aggregated into a list. The function will return True if any of the models return True.
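The concurrency pattern described above can be sketched with the standard library alone. The sketch below is not the actual service.py: the fake runners, the `make_fake_runner` helper, and the `0.5` threshold are all assumptions standing in for real BentoML runners and the real decision rule. It only shows how `asyncio.gather` lets every runner be awaited concurrently and how the per-model results are combined with `any`.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical stand-in for a model runner: an async callable that
# takes one input row and returns a fraud score between 0 and 1.
Runner = Callable[[dict], Awaitable[float]]


def make_fake_runner(score: float, delay: float = 0.01) -> Runner:
    """Build a fake runner that sleeps briefly, then returns a fixed score."""

    async def run(_row: dict) -> float:
        await asyncio.sleep(delay)  # simulates remote inference latency
        return score

    return run


async def is_fraud(row: dict, runners: List[Runner], threshold: float = 0.5) -> bool:
    """Return True if any runner scores the row above the (assumed) threshold."""

    async def _is_fraud(runner: Runner) -> bool:
        return (await runner(row)) > threshold

    # Schedule all runner calls at once; results come back in runner order
    results = await asyncio.gather(*(_is_fraud(r) for r in runners))
    return any(results)


sample_row = {"TransactionAmt": 100.0}  # hypothetical input row

mixed_runners = [make_fake_runner(0.1), make_fake_runner(0.7), make_fake_runner(0.2)]
flagged = asyncio.run(is_fraud(sample_row, mixed_runners))

quiet_runners = [make_fake_runner(0.1), make_fake_runner(0.2), make_fake_runner(0.3)]
not_flagged = asyncio.run(is_fraud(sample_row, quiet_runners))
```

Because the calls are gathered rather than awaited one by one, the total latency is close to that of the slowest runner instead of the sum of all three.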
For more about service definitions, please see Service and APIs.
Building the service and models into a bento allows it to be distributed among collaborators, containerized into an OCI image, and deployed in the Kubernetes cluster. To build a service into a bento, we first need to define the bentofile.yaml file. See Building Bentos for more options.
Running the following command will build the service into a bento and store it in the local bento store.
Serving the bento will bring up a service endpoint in HTTP or gRPC for the service API we defined. Use --help to see more serving options.
BentoML offers three custom resource definitions (CRDs) in the Kubernetes cluster.
• BentoRequest - Describes the metadata needed for building the container image of the Bento, such as the download URL. Created by the user.
• Bento - Describes the metadata for the Bento such as the address of the image and the runners. Created by users or by the yatai-image-builder operator when reconciling BentoRequest resources.
• BentoDeployment - Describes the metadata of the deployment such as resources and autoscaling behaviors. Reconciled by the yatai-deployment operator to create Kubernetes deployments of API Servers and Runners.
Next, we will demonstrate two ways of deployment.
• Deploying using a BentoRequest resource by providing a Bento
• Deploying using a Bento resource by providing a pre-built container image from a Bento
In this workflow, we will export the Bento to remote storage. We will then leverage the yatai-image-builder operator to containerize the Bento and the yatai-deployment operator to deploy the containerized Bento image.
Push the Bento built and saved in the local Bento store to cloud storage such as AWS S3.
Apply the BentoRequest and BentoDeployment resources as defined in deployment_from_bentorequest.yaml included in this example.
Once the resources are created, the yatai-image-builder operator will reconcile the BentoRequest resource and spawn a pod to build the container image from the provided Bento defined in the resource. The yatai-image-builder operator will push the built image to the container registry specified during the installation and create a Bento resource with the same name. At the same time, the yatai-deployment operator will reconcile the BentoDeployment resource with the provided name and create Kubernetes deployments of API Servers and Runners from the container image specified in the Bento resource.
In this workflow, we will build and push the container image from the Bento. We will then leverage the yatai-deployment operator to deploy the containerized Bento image.
Containerize the Bento with the containerize sub-command.
Push the containerized Bento image to a remote repository of your choice.
Apply the Bento and BentoDeployment resources as defined in deployment_from_bento.yaml file included in this example.
Once the resources are created, the yatai-deployment operator will reconcile the BentoDeployment resource with the provided name and create Kubernetes deployments of API Servers and Runners from the container image specified in the Bento resource.
Verify the deployment of API Servers and Runners. Note that the API Server and the Runners run in separate pods and are created in separate deployments that can be scaled independently.
Port forward the Fraud Detection service to test locally. You should be able to visit the Swagger page of the service by requesting http://0.0.0.0:8080 while port forwarding.
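While the port forward is active, the service can also be called from Python. The sketch below only builds the request with the standard library; it rests on several assumptions: that the API is exposed at `/is_fraud` (matching the API function name), that the DataFrame input accepts a JSON list of records, and that `TransactionAmt` is a valid column. The helper `build_is_fraud_request` is hypothetical, and a real request needs the full set of columns the preprocessors expect.

```python
import json
import urllib.request
from typing import Dict, List


def build_is_fraud_request(
    rows: List[Dict[str, object]],
    base_url: str = "http://0.0.0.0:8080",
) -> urllib.request.Request:
    """Build a POST request carrying the rows as a JSON list of records.

    Assumption: the endpoint path is /is_fraud and accepts records-style JSON.
    """
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/is_fraud",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical single-row payload; real rows need all expected columns
request = build_is_fraud_request([{"TransactionAmt": 100.0}])
print(request.get_method(), request.full_url)

# To actually send it (requires the port forward to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The Swagger page mentioned above is the authoritative place to confirm the actual endpoint path and input schema before relying on this sketch.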
Delete the Bento and BentoDeployment resources.
The 1.7 release is just the beginning of an exciting collaboration between BentoML and Kubeflow. The integration allows developers to easily deploy BentoML services on Kubernetes for optimized hardware utilization and independent scaling. Future plans include integration with Kubeflow Pipelines for more deployment options. Whether you're new to MLOps or a current user of BentoML or Kubeflow, we invite you to try out the integration and provide feedback for further improvements.
If you enjoyed this article, please show your support by starring ⭐ our projects on GitHub (BentoML, Kubeflow) and joining both the Kubeflow and the BentoML Slack communities. Searching for a great place to run your ML services? Check out Bento Cloud for the easiest and fastest way to deploy your bento.