Build, Ship, Scale

Applications

BentoML is the platform for software engineers to build AI products.

Billions of predictions per day
3000+ community members
Used by 1000+ organizations

Trusted by the best AI teams

BentoML

Unified AI Application Framework

With BentoML, you can easily build AI products with any pre-trained model, ship to production in minutes, and scale with confidence.

01. Models

Manage and version all your models in an open and standardized format

Llama 2
Stable Diffusion
Flan-T5
Segment Anything
CLIP
Your Own Model
openllm.import_model('llama', model_id='meta-llama/Llama-2-7b-chat-hf')

02. Service APIs

Unifying your AI app's business logic, pre/post-processing, model inference, and multi-model graphs in one framework

import bentoml
import openllm

llama_runner = openllm.Runner('llama', model_id='meta-llama/Llama-2-7b-chat-hf')
svc = bentoml.Service(name="llama-service", runners=[llama_runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def prompt(input_text: str) -> str:
    # Consume the stream and keep only the final item
    async for req in llama_runner.generate_iterator.async_stream(input_text):
        pass
    return req[0]['text']
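
The handler above drains the stream and returns only the final item. A minimal standard-library sketch of that pattern follows. `fake_stream` and `last_item` are hypothetical stand-ins, not OpenLLM or BentoML APIs, and the item shape is only assumed from the `req[0]['text']` access above. Unlike the snippet above, this version also tolerates an empty stream.

```python
import asyncio
from typing import AsyncIterator, Optional

# Hypothetical stand-in for a streaming runner. Each item is ASSUMED to carry
# the text generated so far, mirroring the req[0]['text'] access above.
async def fake_stream(prompt_text: str) -> AsyncIterator[list]:
    text = ""
    for word in prompt_text.split():
        text = f"{text} {word}".strip()
        await asyncio.sleep(0)  # hand control back to the event loop
        yield [{"text": text}]

async def last_item(stream: AsyncIterator[list]) -> Optional[list]:
    """Consume the whole stream and return its final item, or None if empty."""
    final = None
    async for item in stream:
        final = item
    return final

async def prompt(input_text: str) -> str:
    final = await last_item(fake_stream(input_text))
    # An empty stream produces no items, so fall back to an empty string.
    return final[0]["text"] if final else ""

if __name__ == "__main__":
    print(asyncio.run(prompt("hello from a fake stream")))
```

Returning only the last item trades streaming latency for a simpler text-in, text-out API; a client that needs tokens as they arrive would have to consume the stream directly.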

03. Run

Build once and run it anywhere, any way you need

HTTP
gRPC
Batch Inference
Python API
bentoml serve imagine:latest
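
As a rough illustration of the HTTP option, the sketch below calls a served text-in, text-out API using only the Python standard library. It assumes the service is running locally on port 3000 and exposes an endpoint at `/prompt` (named after the `prompt` function in the Service APIs example); the helper names `build_prompt_request` and `call_prompt` are hypothetical. Adjust the URL and route to match your own deployment.

```python
import urllib.request

# ASSUMPTIONS: the service listens on localhost:3000 and exposes the `prompt`
# API at POST /prompt with a plain-text request body.
BASE_URL = "http://localhost:3000"

def build_prompt_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the prompt as plain text."""
    return urllib.request.Request(
        url=f"{base_url}/prompt",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

def call_prompt(text: str, timeout: float = 60.0) -> str:
    """Send the request to the running service and return the response text."""
    with urllib.request.urlopen(build_prompt_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With a service running, `call_prompt("What is a bento?")` would return the generated text; building the request separately makes the client easy to inspect without a live server.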

Use Cases

Freedom to build with state-of-the-art AI models pre-packaged and pre-optimized at your fingertips

Multimodal

Text-to-Image Generation

Large Vision-Language Model

LLM-Aided Visual Reasoning

Multimodal LLMs

NLP & LLMs

Large Language Model

Text Classification

Sentiment Analysis

Text Embedding

Computer Vision

Image Classification

Object Detection

Image Segmentation

OCR

Audio

Speech Recognition

Voice Generation

Music Generation

Tabular

Tabular Regression

Tabular Classification

Tabular Forecasting

BentoCloud

Bring your AI products to market 10x Faster

Free developers from the time-consuming process of messing with infrastructure, so they can focus on innovating with AI

bentoml deploy stable-diffusion:v2 --instance-type=gpu

Deliver AI products in a fast and repeatable way

Harness GPU for inference without the headaches

Unlock insight and performance of your models

Serverless GPUs to scale your model inference

bentoml deploy clip-api-service:latest --min 0 --max 10

Automatically scale up when traffic spikes

Scale down to zero when there is no traffic

Pay only for the compute you use
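
To make the scaling behavior concrete, here is a toy model of scale-to-zero autoscaling. It is an illustrative sketch only, not BentoCloud's actual policy: the function names and the per-replica concurrency target are assumptions, and the replica bounds of 0 and 10 simply mirror the deploy command above.

```python
import math
from typing import Iterable

def desired_replicas(in_flight: int, per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replicas needed for the current load, clamped to [min_replicas, max_replicas].

    `per_replica` is a HYPOTHETICAL concurrency target: how many in-flight
    requests a single replica is expected to handle.
    """
    if per_replica <= 0:
        raise ValueError("per_replica must be positive")
    needed = math.ceil(in_flight / per_replica)
    return max(min_replicas, min(max_replicas, needed))

def replica_seconds(load_samples: Iterable[int], per_replica: int,
                    interval_s: int = 60) -> int:
    """Total replica-seconds consumed when load is sampled every `interval_s` seconds."""
    return sum(desired_replicas(load, per_replica) * interval_s
               for load in load_samples)
```

In this model an idle interval contributes zero replica-seconds, which is the intuition behind paying only for compute actually used; a real autoscaler would also account for cold starts and scaling delays.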

What our customers say

“Koo started to adopt BentoML more than a year ago as a platform of choice for model deployments and monitoring. From our early experience it was clear that deploying ML models, a statistic that most companies struggle with, was a solved problem for Koo. The BentoML team works closely with their community of users like I've never seen before. Their AMAs, the advocacy on Slack and getting on calls with their customers, are much appreciated by early-adopters and seasoned users”

Harsh Singhal, Head of Machine Learning, Koo

“BentoML is helping us future-proof our machine learning deployment infrastructure at Mission Lane. It is enabling us to rapidly develop and test our model scoring services, and to seamlessly deploy them into our dev, staging, and production Kubernetes clusters.”

Mike Kuhlen, Data Science & Machine Learning Solutions and Strategy, Mission Lane

“BentoML enables us to deliver business value quickly by allowing us to deploy ML models to our existing infrastructure and scale the model services easily.”

Shihgian Lee, Senior Machine Learning Engineer, Porch

"BentoML is an excellent tool for saving resources and running ML at scale in production"

Woongkyu Lee, Data and ML Engineer, LINE

“BentoML has helped us scale the way we help our users package and test their models. Their framework is a core piece of our product. Really happy to be a part of the BentoML community.”

Gabriel Bayomi, CEO, OpenLayer