BentoML is the platform for AI application developers, providing the tools and infrastructure to streamline your entire AI product development lifecycle.
Copyright © 2022 BentoML
Ship to prod faster
Learn about how we can help accelerate your machine learning projects to production. Save time and resources by streamlining deployment for your development and production workflows.
Try BentoCloud: the best way to ship ML
The fastest way to build AI applications
BUILD
BentoML’s open-source framework simplifies AI application development and enables you to build your idea faster
BentoML natively supports all the popular ML frameworks, including PyTorch, TensorFlow, JAX, XGBoost, Hugging Face, and MLflow, as well as the latest pre-built open-source LLMs (large language models) and generative AI models
BentoML scales AI workloads built with Python. Advanced features such as multi-model graph inference, parallel model inference, and adaptive batching are wrapped in easy-to-use Python primitives
Develop with one unified interface that can be easily rolled out as a REST API endpoint or gRPC service, integrated into data pipelines for batch workloads, or used for real-time processing in a streaming architecture.
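To make the batching idea mentioned above more concrete, here is a minimal sketch in plain Python. It is not BentoML's implementation or API: it uses a simpler fixed-window micro-batching rule, and the function name `form_batches` and its parameters are hypothetical, chosen only for illustration. It assumes request arrival times are given in ascending order.

```python
from typing import List, Sequence


def form_batches(
    arrivals_ms: Sequence[float],
    max_batch_size: int,
    max_wait_ms: float,
) -> List[List[int]]:
    """Group request indices into batches (hypothetical helper).

    A batch is closed when it already holds max_batch_size requests, or when
    the next request arrives more than max_wait_ms after the first request
    in the batch. Assumes arrivals_ms is sorted in ascending order.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    window_start = 0.0
    for idx, arrival in enumerate(arrivals_ms):
        batch_full = len(current) >= max_batch_size
        window_expired = arrival - window_start > max_wait_ms
        if current and (batch_full or window_expired):
            batches.append(current)
            current = []
        if not current:
            # The first request of a new batch opens the waiting window.
            window_start = arrival
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# Four requests arrive close together, then two more after a pause.
batches = form_batches([0, 1, 2, 3, 50, 51], max_batch_size=3, max_wait_ms=10)
print(batches)
```

An adaptive batcher would go one step further and tune the batch size and wait window from observed latency; the fixed limits here only show the trade-off between throughput (larger batches) and per-request delay (shorter waits).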
SHIP
From prototype to production in minutes with BentoCloud
BentoCloud is a fully-managed platform for deploying and operating AI applications built with BentoML, simplifying your AI team’s journey from prototype to production
Run ML inference on GPU without any code changes. BentoML automatically sets up the correct CUDA/cuDNN environment for your model and picks the optimal runtime for each ML framework
BentoCloud offers centralized model monitoring and lifecycle management for you to gain real-time insights into your AI applications, enabling you to proactively optimize performance.
Reduce the time-to-market for your newly improved models. BentoCloud enables CI/CD for your AI products, allowing easy experimentation, explainability, and shadow testing with online traffic.
SCALE
Scaling should be exciting, not stressful
BentoCloud unlocks your AI application’s full potential, giving you the flexibility to scale whenever you need it.
Reliable auto-scaling without compromises. BentoCloud handles traffic fluctuations, optimizes CPU/GPU compute costs, and provides high availability, with everything automatically provisioned for your needs
You don’t want to pay for expensive GPUs serving underutilized AI models. BentoCloud automatically scales them down to zero (at no cost to you) when they receive no traffic.
Sleep better without infrastructure alarms waking you up at night. Our platform was built with industry best practices to scale your workload to hundreds or even thousands of live models in production.
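As a rough illustration of the scale-to-zero behavior described above, the following sketch computes a replica count from current traffic. It is an assumption-laden toy, not BentoCloud's actual policy: the function `desired_replicas`, the idle timeout, and the per-replica capacity figure are all hypothetical.

```python
import math


def desired_replicas(
    requests_per_second: float,
    capacity_per_replica: float,
    idle_seconds: float,
    idle_timeout_seconds: float = 300.0,
    max_replicas: int = 10,
) -> int:
    """Return how many replicas to run (hypothetical policy).

    With no traffic, keep one warm replica until the service has been idle
    for idle_timeout_seconds, then scale to zero. With traffic, run enough
    replicas to cover the load, capped at max_replicas.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    if requests_per_second <= 0:
        return 0 if idle_seconds >= idle_timeout_seconds else 1
    needed = math.ceil(requests_per_second / capacity_per_replica)
    return min(max(needed, 1), max_replicas)


print(desired_replicas(0, 20, idle_seconds=600))   # idle past the timeout
print(desired_replicas(45, 20, idle_seconds=0))    # moderate traffic
```

The idle timeout is the main design choice in a policy like this: a short timeout saves more on GPU cost, while a longer one avoids cold starts for services with bursty traffic.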
BentoML is the open standard for creating AI applications, bringing the consistency that empowers developers to be more agile, innovative, and productive, across all your AI product teams.
"Koo adopted BentoML as a platform of choice for model deployments and monitoring. It was clear that deploying ML models, a statistic that most companies struggle with, was now a solved problem using BentoML."
Harsh Singhal
Head of Machine Learning & AI
Use cases across industries
From fraud detection to credit risk scoring, financial services require production-grade ML services
Product recommendations are a proven way to increase top-line revenue
Finding the best local experiences for a traveling user is key to providing value
Determining the best flights or travel itinerary in real-time provides a competitive advantage
Determining an ideal price in real-time helps maximize value while providing a good user experience
Using NLP models in text-based applications provides a value-added service for users
"BentoML enables us to deliver business value quickly by allowing us to deploy ML models to our existing infrastructure and scale the model services easily."
Shihgian Lee
Senior Machine Learning Engineer
Trusted by the best ML teams in the world