Build, Ship, and Scale AI Applications

BentoML is the platform for AI application developers, providing the tools and infrastructure to streamline your entire AI product development lifecycle.


Copyright © 2022 BentoML


Ship to prod faster

Learn how we can help you move your machine learning projects to production faster. Save time and resources by streamlining deployment across your development and production workflows.

Try BentoCloud: the best way to ship ML


The fastest way to build AI applications


BentoML’s open-source framework simplifies AI application development and enables you to build your idea faster.

Build AI products using any pre-trained model

BentoML natively supports all the popular ML frameworks, including PyTorch, TensorFlow, JAX, XGBoost, Hugging Face, and MLflow, as well as the latest pre-built open-source LLMs (large language models) and generative AI models.

Scale Python for AI workloads

BentoML scales your AI workloads built with Python. Multi-model graph inference, parallel model inference, and adaptive batching, among many other advanced AI features, are wrapped in easy-to-use Python primitives.
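
To make the batching idea concrete, here is a minimal standard-library sketch of request batching: individual requests are queued, then grouped into one batched call once a size limit or a wait deadline is reached. It is an illustration only, not BentoML's implementation or API. The names `MicroBatcher`, `double_all`, `max_batch_size`, and `max_latency_s` are hypothetical, and the wait window here is fixed, whereas an adaptive batcher would typically tune it from observed latency.

```python
import queue
import threading
import time
from concurrent.futures import Future


class MicroBatcher:
    """Groups single requests into batches bounded by size and wait time.

    Hypothetical helper for illustration; not part of BentoML.
    """

    def __init__(self, batch_fn, max_batch_size=8, max_latency_s=0.01):
        self.batch_fn = batch_fn              # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_latency_s = max_latency_s
        self._queue = queue.Queue()
        self.batch_sizes = []                 # recorded for inspection only
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, item):
        """Enqueue one input and return a Future for its result."""
        future = Future()
        self._queue.put((item, future))
        return future

    def _worker(self):
        while True:
            batch = [self._queue.get()]       # block until the first request arrives
            deadline = time.monotonic() + self.max_latency_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            inputs = [item for item, _ in batch]
            self.batch_sizes.append(len(inputs))
            try:
                outputs = self.batch_fn(inputs)
                for (_, future), output in zip(batch, outputs):
                    future.set_result(output)
            except Exception as exc:
                # Propagate a batch failure to every caller in that batch.
                for _, future in batch:
                    future.set_exception(exc)


def double_all(xs):
    """Stand-in for a model's batched predict call."""
    return [2 * x for x in xs]


batcher = MicroBatcher(double_all, max_batch_size=4, max_latency_s=0.05)
futures = [batcher.submit(i) for i in range(10)]
results = [f.result(timeout=5) for f in futures]
print(results)
```

Each caller still sees a single-request interface through `submit`, while the model function only ever receives lists. That separation is what lets batching be added without changing calling code.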

One unified framework for online, offline, and streaming

Develop with one unified interface that can be easily rolled out as a REST API endpoint or gRPC service, integrated into data pipelines for batch workloads, or used for real-time processing in a streaming architecture.


From prototype to production in minutes with BentoCloud

BentoCloud is a fully managed platform for deploying and operating AI applications built with BentoML, simplifying your AI team’s journey from prototype to production.

Harness GPU for inference without the headaches

Run ML inference on GPU without any code changes. BentoML automatically sets up the correct CUDA/cuDNN environment for your model and picks the optimal runtime for each ML framework.

Unlock insights and performance of your models

BentoCloud offers centralized model monitoring and lifecycle management for you to gain real-time insights into your AI applications, enabling you to proactively optimize performance.

Continuously deliver for AI-powered products

Reduce the time to market for your newly improved models. BentoCloud enables CI/CD for your AI products, allowing easy experimentation, explainability, and shadow testing with online traffic.
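
As a rough illustration of shadow testing, the sketch below answers every request from the live model while a candidate model sees the same input and any disagreement is logged for review. It is a generic pattern under stated assumptions, not BentoCloud's implementation. The names `serve_with_shadow`, `current_model`, and `candidate_model`, and the thresholds they use, are hypothetical.

```python
def serve_with_shadow(request, primary, shadow, mismatches):
    """Answer from `primary`; run `shadow` on the same input and log disagreements."""
    response = primary(request)
    try:
        shadow_response = shadow(request)
        if shadow_response != response:
            mismatches.append((request, response, shadow_response))
    except Exception as exc:
        # A failing shadow model must never affect the caller.
        mismatches.append((request, response, repr(exc)))
    return response


def current_model(score):
    """Hypothetical live classifier: flags scores at or above 0.5."""
    return score >= 0.5


def candidate_model(score):
    """Hypothetical candidate classifier with a lower threshold."""
    return score >= 0.4


mismatches = []
served = [
    serve_with_shadow(score, current_model, candidate_model, mismatches)
    for score in (0.1, 0.45, 0.9)
]
print(served)       # responses users actually received
print(mismatches)   # inputs where the candidate disagreed
```

In a real deployment the shadow call would usually run asynchronously so that it adds no latency to the live response. It is kept inline here to keep the sketch short.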


Scaling should be exciting, not stressful

BentoCloud unlocks your AI application’s full potential, giving you the flexibility to scale whenever you need it.

Serverless infrastructure built for AI

Reliable auto-scaling without compromises: BentoCloud handles traffic fluctuations, optimizes CPU/GPU compute costs, and provides high availability, with everything automatically provisioned for your needs.

Scale down to zero

You don’t want to pay for expensive GPUs that serve underutilized AI models. BentoCloud automatically scales them down to zero (at no cost to you) when there is no traffic.
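
One way to picture scale-to-zero is as a replica-count decision: size the deployment from current traffic, and release every replica once the service has been idle long enough. The function below is a simplified sketch under assumed numbers, not BentoCloud's actual scaling policy. `desired_replicas` and all of its parameters and defaults are hypothetical.

```python
import math


def desired_replicas(requests_per_s, idle_seconds, per_replica_rps=20.0,
                     idle_timeout_s=300.0, max_replicas=10):
    """Return how many replicas to run for the observed traffic.

    All thresholds are illustrative assumptions.
    """
    if requests_per_s <= 0:
        # No traffic: keep one warm replica until the idle timeout, then release it.
        return 0 if idle_seconds >= idle_timeout_s else 1
    needed = math.ceil(requests_per_s / per_replica_rps)
    return min(max_replicas, max(1, needed))


print(desired_replicas(45, idle_seconds=0))     # traffic present: sized by load
print(desired_replicas(0, idle_seconds=10))     # briefly idle: stay warm
print(desired_replicas(0, idle_seconds=600))    # idle past the timeout: scale to zero
```

The idle timeout trades cost against cold-start latency: a shorter timeout frees GPUs sooner, while a longer one avoids restarting a model for bursty traffic.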

Operate a fleet of live models with ease

Sleep better without infrastructure alarms waking you up at night. Our platform is built with industry best practices to scale your workload to hundreds or even thousands of live models in production.

Foster best practices across your organization

BentoML is the open standard for creating AI applications, bringing the consistency that empowers developers to be more agile, innovative, and productive, across all your AI product teams.

"Koo adopted BentoML as a platform of choice for model deployments and monitoring. It was clear that deploying ML models, a statistic that most companies struggle with, was now a solved problem using BentoML."

Harsh Singhal

Head of Machine Learning & AI

Use cases across industries

🏦 Financial Services

From fraud detection to credit risk scoring, financial services require production-grade ML services.

Learn more

🛒 eCommerce

Product recommendations are a proven way to increase top-line revenue.

Learn more

🏖️ Travel

Finding the best local experiences for a traveling user is key to providing value.

Learn more

✈️ Transportation

Determining the best flights or travel itinerary in real time provides a competitive advantage.

Learn more

🚗 Automotive

Determining an ideal price in real time helps maximize value while providing a good user experience.

Learn more

💬 Communications

Using NLP models in text-based applications provides a value-added service for users.

Learn more

Schedule a Demo

"BentoML enables us to deliver business value quickly by allowing us to deploy ML models to our existing infrastructure and scale the model services easily."

Shihgian Lee

Senior Machine Learning Engineer

Trusted by the best ML teams in the world

Join the 🍱 community!