September 14, 2023 • Written By Sherlock Xu
LangChain is an open-source framework for creating applications powered by large language models (LLMs), like AI chatbots and personal assistants. When it comes to LLMs, you can also use OpenLLM, an open-source platform in the BentoML ecosystem designed specifically for serving and deploying LLMs in production. As OpenLLM provides seamless support for both LangChain and BentoML, you can harness their combined strengths, unlocking enhanced capabilities from each platform.
In this blog post, I will build a LangChain application capable of creating self-introductions for job interviews. We will start with a simple script and gradually evolve it into a more sophisticated version that exposes an API endpoint, allowing for wider external interactions.
To get started, install LangChain and OpenLLM with pip install langchain openllm. BentoML is automatically installed with OpenLLM.
If you plan to run larger models, consider a GPU-backed machine such as an AWS g4dn.12xlarge instance to avoid potential resource limitations. The resource requirements vary with the model; you can gauge the necessary vRAM using the Model Memory Calculator from Hugging Face.
LangChain’s integration with OpenLLM is straightforward. Instantiate an llm object from langchain.llms by specifying the required model details:
The databricks/dolly-v2-7b model powers this application. You can run openllm models to see all supported models. Running the script triggers OpenLLM to download the model to the BentoML local Model Store if it’s not already available.
Running the script returns the following result:
"Hi, my name is Joe, and I am here to help you. As a data analyst, my job is to make sense of data and derive business value from it. I am experienced in a wide variety of data roles and types and can help you find your way with my eyes wide open. I offer a free 30 minute consultation to discuss your data needs."
The self-introduction is okay, but ideally the application should solicit user-specific details, such as name, industry, educational background, and skill set, for a more tailored experience. Moreover, you may want to provide the model with additional contextual information to refine the self-introduction for specific scenarios, like a job interview.
LangChain’s prompt templates offer the flexibility of integrating scenario information and customized variables. These templates are essentially pre-defined recipes, such as instructions and few-shot examples, which guide the LLMs to generate content for specific tasks. As shown below, I embedded a template to generate self-introductions for software engineer job interviews:
With a ready-to-use template, you can create an LLMChain, a core LangChain object that adds functionality around the language model, and specify the required variables to generate dynamic self-introductions. Here is the entire application v2 code for your reference. Note that this time I used databricks/dolly-v2-12b as the model for higher-quality output.
Running the application yields results like the following two examples.
Example output 1:
"Hi, I'm John Smith. I'm a software engineer based in the San Francisco Bay area with over 5 years of experience working in the artificial intelligence industry. My past jobs have involved building data pipelines and distributed machine learning workflows, with a focus on Kubernetes and AWS. I also teach Python and Go to people with no background in coding. Beyond work, I am an avid cyclist and microservices fan. I'm very happy to be considered for this role, and look forward to discussing my experience and skills further with you."
Example output 2:
"I am an experienced software engineer with extensive experience in Kubernetes, BentoML and AWS. In my current role, I use my extensive experience to help our engineering team deploy our products to AWS, as well as contribute to BentoML pipelines to improve our product development and delivery process. Before taking on this role, I studied computer science at Stanford University and developed ML-based sentiment analysis tools in my final year project. I am now excited to apply my skills in this field to help develop and improve our engineering culture."
As you can see, this enhanced version is able to generate more personalized and contextually relevant introductions.
OpenLLM provides first-class support for BentoML. Integrating the application with BentoML lets you leverage the benefits of the BentoML framework and its ecosystem. For example, you can wrap the llm instance created previously in a BentoML Runner for better scaling and inference, and you can expose an API endpoint (available on a Swagger UI) through the BentoML Service containing the Runner, enabling external interactions with users.
For input validation and structure, I used pydantic to ensure the correctness of user-provided data. It is also a good idea to provide sample inputs, which are pre-populated in the Swagger UI for clarity, as shown in the application v3 code below.
I list the entire application v3 code below; it is usually stored in a service.py file in a BentoML project.
Start the application server with the bentoml serve command.
Access the application server at http://0.0.0.0:3000. In the Swagger UI, scroll down to the /generate endpoint and click Try it out. Let’s try something different this time: change the skills field and click Execute.
Example output (obviously there are some hallucinations 🤣):
"Hello, I am John Smith. After graduating from Stanford University, I have been working as a software engineer for 5 years. Throughout my career, I have used my skills in 3D graphics and computer vision to contribute taxpayers in the gaming industry. My work in this space has involved calibrating camera parameters and creating 3D assets for game characters and environments. My job has required me to use my extensive knowledge of 3D graphics and computer vision to help create realistic visuals. Additionally, I recently launched a camera calibration tool that has been adopted by other engineers in the industry. I feel very fortunate to be working in such a creative industry. I hope to provide value to your team in some way in the future."
You can then build the LangChain application into a Bento, the standardized distribution format in BentoML, and ship it to BentoCloud for production deployment. Alternatively, containerize the Bento and deploy the resulting Docker image anywhere.
Since the open-source release of OpenLLM, we have noticed growing attention and enthusiasm around the project. One of the highlighted features is its integration with popular frameworks to create AI applications. While I delved into its synergy with BentoML and LangChain in this post, OpenLLM also supports Transformers Agents. Stay tuned for a future blog post where I will unpack this integration in greater detail. As always, happy coding! ⌨️
To learn more about BentoML, OpenLLM, and other ecosystem tools, check out the following resources: