November 28, 2023 • Written By Sherlock Xu
The AI industry is advancing at a breathtaking rate. In July, Stability AI unveiled Stable Diffusion XL 1.0, an open-source model for image generation tasks. Only 4 months later, it announced Stable Video Diffusion (SVD), a generative AI model designed to create short video clips from a single static image. It is an image-to-video latent diffusion model, which means it uses a given image as a starting frame (conditioning frame) and then generates a sequence of video frames, resulting in a video clip.
SVD is trained to generate a video sequence of 14 frames at a resolution of 576x1024 pixels, using a same-sized image as the conditioning frame. The model also comes with a variant that swaps the fine-tuned temporal decoder for a standard frame-wise decoder. Its extension, SVD-XT, renders 25-frame sequences, paving the way for more advanced video creations, and likewise ships with a frame-wise decoder variant.
These models provide new opportunities for animation, film pre-visualization, digital art, and dynamic content generation for advertising and social media. They offer a novel approach to enriching visual presentations, educational materials, and even simulations for training purposes across various sectors.
While the use cases are broad and promising, it's important to note that the technology is only intended for research purposes at this stage. The models also have limitations, such as the short length of generated videos and potential issues with the realism of generated content. More importantly, every new open-source AI model poses the same challenge for application developers: how to serve and deploy it in production.
To help you deploy SVD models, we created BentoSVD, a sample project with all the example code and dependencies already prepared for you. By following the steps in this blog post, you can serve SVD models locally and deploy them to production.
Clone the BentoSVD repository and navigate to the project directory:
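Assuming the repository lives under the bentoml GitHub organization:

```shell
git clone https://github.com/bentoml/BentoSVD.git
cd BentoSVD
```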
Key files in this project include:
config.py: Specifies the SVD model to download and use when launching the server.
requirements.txt: Dependencies required to run this project, such as BentoML.
import_model.py: A script to import SVD models into your BentoML Model Store.
service.py: The BentoML Service definition, which specifies the input and output logic for an API endpoint and leverages a custom BentoML Runner to run inference.
bentofile.yaml: The build configurations for packaging the SVD model and all associated files into a Bento.
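As a rough sketch, a bentofile.yaml for this project might look like the following (the service entry point and include list here are assumptions; the repository's actual file is authoritative):

```yaml
service: "service:svc"   # entry point in module:object form (object name assumed)
include:
  - "service.py"
  - "config.py"
python:
  requirements_txt: "./requirements.txt"
```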
Set up a virtual environment to isolate dependencies.
Run the following command to install dependencies.
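For example, with Python's built-in venv module:

```shell
# Create and activate a virtual environment, then install dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```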
It's important to test the behavior of the BentoML Service before building the SVD Bento. The Service uses models.get to retrieve the model defined in config.py, so you need to set the desired model version in that file first. You can choose from the available versions (for example, svd_xt_image_decoder), with the default set to svd. The chosen model version will later be downloaded and used for launching the server.
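In config.py, the selection boils down to a single variable along these lines (the variable names and the exact list of options are illustrative, not the repository's literal contents):

```python
# config.py (illustrative sketch, not the repo's exact contents)

# Model versions mentioned in this post; the repository may list more.
AVAILABLE_MODELS = ("svd", "svd_xt", "svd_xt_image_decoder")

# Default model; switch to "svd_xt" for 25-frame clips.
MODEL_VERSION = "svd"

assert MODEL_VERSION in AVAILABLE_MODELS, f"Unknown model: {MODEL_VERSION}"
```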
Run the script to download the model specified.
Upon downloading, the model is registered in the BentoML Model Store, your local repository for managing models. Run bentoml models list to view it.
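The import and verification steps together:

```shell
# Download the model chosen in config.py and register it
# in the local BentoML Model Store
python import_model.py

# Confirm the model is registered
bentoml models list
```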
Start a server locally, which is accessible at http://0.0.0.0:3000. I used an NVIDIA A100 machine with 80 GB of VRAM to run this Service locally, but you should be able to start it with less than 15 GB of VRAM.
If your machine is not as powerful as an A100, you may need to set a longer timeout value so that you can successfully receive the response. In this case, apply custom configuration (timeout set to 300 seconds) by running the following command.
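The serving commands might look like the following (the service:svc target assumes the Service object in service.py is named svc; check the repository):

```shell
# Start the server on http://0.0.0.0:3000
bentoml serve service:svc

# On less powerful GPUs, raise the request timeout to 300 seconds
# via BentoML's inline configuration options:
BENTOML_CONFIG_OPTIONS='api_server.traffic.timeout=300 runners.traffic.timeout=300' \
  bentoml serve service:svc
```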
Send a request via curl after the server starts. Replace test_image.png in the following command with your own image.
The models were trained to generate 14 frames (SVD) or 25 frames (SVD-XT) at a resolution of 576x1024, given a context frame of the same size. Therefore, I recommend sending an image of the same resolution to get the desired response.
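A request along these lines should work once the server is up (the /generate endpoint path and the image field name are assumptions; check service.py for the actual route and input names):

```shell
curl -X POST http://0.0.0.0:3000/generate \
  -F 'image=@test_image.png' \
  -o output.mp4
```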
This is the image used in the curl request:
Returned output (leaves blowing and cloud moving):
Since the model is working properly, you can build a Bento for better distribution.
bentofile.yaml is already available in the project directory with some basic configurations, so you can run bentoml build directly to create a Bento.
Make sure you have already logged in to BentoCloud, then push the Bento to the serverless platform for advanced features like automatic scaling and observability.
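The full build-and-push flow (the svd-service Bento tag comes from the project's build configuration):

```shell
bentoml build
bentoml push svd-service:latest
```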
Note: Alternatively, run bentoml containerize svd-service:latest to create a Bento Docker image, and then deploy it to any Docker-compatible environment.
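If you go the Docker route, the containerize-and-run flow might look like this (GPU passthrough via --gpus all is assumed to be available on the host):

```shell
# Build a Docker image from the Bento
bentoml containerize svd-service:latest

# Run it, exposing the API server on port 3000
docker run --gpus all -p 3000:3000 svd-service:latest
```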
On the BentoCloud console, you can find the uploaded Bento on the Bentos page.
With the Bento pushed to BentoCloud, you can deploy it via the BentoCloud console. Perform the following steps:
Navigate to the Deployments page and click Create.
Select On-Demand Function, which is useful for scenarios with loose latency requirements and sparse traffic.
Specify the required fields for the Bento Deployment on the Advanced tab and click Submit. Pay attention to the following fields:
Instance: As I mentioned above, the VRAM requirement of the SVD model should be less than 15GB. I recommend you select machines like gpu.t4.xlarge, gpu.a10g.2xlarge or gpu.l4.2xlarge for the Runner. Ensure that the instance you select matches or surpasses this specification to avoid performance bottlenecks.
Timeout: This is the maximum time the server waits for a response before aborting the request. I suggest you set the timeout of both the API Server and the Runner to 300 seconds. This property defaults to 60 seconds, which may fall short if your instance has limited VRAM, as in our tests the SVD model's response time varied considerably across different hardware settings.
BentoML Configuration: Add the following in this field to set the timeout values.
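A minimal sketch of what goes into that field, following BentoML's standard configuration schema:

```yaml
api_server:
  traffic:
    timeout: 300
runners:
  traffic:
    timeout: 300
```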
For more information about other properties, see Deployment creation and update information.
Wait for the application to be up and running.
The exposed URL is displayed on the Overview tab. Similarly, you can send a request via curl. Remember to replace the endpoint URL and the image in the command below with your own.
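For example (the URL below is a placeholder for your Deployment's exposed endpoint, and the /generate route is an assumption; check service.py for the actual route name):

```shell
curl -X POST https://<your-deployment-url>/generate \
  -F 'image=@test_image.png' \
  -o output.mp4
```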
This is the image sent in this request:
View the application’s performance metrics on the Monitoring tab.
The field of AI is ever-evolving, and the step from a static image to a dynamic video with the help of a new open-source model is just a part of this journey. With each new model coming to the open-source community, we may encounter both opportunities and challenges. What remains unchanged is that we need an efficient tool to easily create AI applications around these sophisticated models, and BentoML stands out in this landscape. If you want to find out more, I invite you to dive in, experiment with the SVD models, explore the conveniences of BentoCloud, and contribute to the different example projects of BentoML.
Happy coding! ⌨️
To learn more about BentoML and its ecosystem tools, check out the following resources: