Billions of predictions per day
3,000+ community members
Used by 1,000+ organizations
With BentoML, you can easily build AI products with any pre-trained model, ship to production in minutes, and scale with confidence.
Manage and version all your models in an open and standardized format
import openllm

# Download the pre-trained Llama 2 chat weights and save them to the
# local BentoML model store in its open, standardized format.
llm = openllm.Llama.from_pretrained(model_id='meta-llama/Llama-2-7b-chat-hf')
openllm.serialisation.import_model(llm, trust_remote_code=False)
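Once imported, each model lives in the local model store under an immutable version tag. A minimal sketch of browsing that store with BentoML's Python API:

import bentoml

# List every model in the local store; each entry carries a name and a
# version tag that BentoML manages for you.
for model in bentoml.models.list():
    print(model.tag)

The same listing is available from the command line via bentoml models list.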
Unify your AI app's business logic, pre/post-processing, model inference, and multi-model graphs in one framework
import bentoml
import openllm

# Wrap the Llama 2 model in a Runner and mount it on a BentoML Service.
llama_runner = openllm.Runner('llama', model_id='meta-llama/Llama-2-7b-chat-hf')
svc = bentoml.Service(name="llama-service", runners=[llama_runner])

# Text-in, text-out endpoint: stream generation from the runner and
# return the final generated text.
@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def prompt(input_text: str) -> str:
    result = None
    async for result in llama_runner.generate_iterator.async_stream(input_text):
        pass
    return result[0]['text']
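With the service running (for example via bentoml serve service:svc, assuming the code above lives in service.py), it can be called from Python. A minimal sketch using BentoML's client; the endpoint name follows the prompt function above, and the local port is the default assumption:

from bentoml.client import Client

# Connect to the locally served llama-service (default port 3000).
client = Client.from_url("http://localhost:3000")
print(client.prompt("What is the best way to serve an LLM?"))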
Build once and run it anywhere, any way you need
bentoml serve imagine:latest
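The Bento served above is a self-contained build artifact. A minimal sketch of producing one programmatically, assuming the service above is defined as svc in service.py; the included files and package list are illustrative:

import bentoml

# Package the service, its code, and its Python dependencies into a Bento.
bento = bentoml.bentos.build(
    "service:svc",
    include=["*.py"],
    python=dict(packages=["openllm"]),
)
print(bento.tag)  # e.g. llama-service:<auto-generated version>

The same Bento can then be served locally as above, or turned into an OCI image with bentoml containerize llama-service:latest to run anywhere containers do.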
Free developers from time-consuming infrastructure work, so they can focus on innovating with AI
bentoml deploy stable-diffusion:v2 --instance-type=gpu
Deliver AI products in a fast and repeatable way
Harness GPUs for inference without the headaches
Unlock insights and performance from your models
bentoml deploy clip-api-service:latest --min 0 --max 10
Automatically scale up when traffic spikes
Scale down to zero when there is no traffic
Pay only for the compute you use