August 17, 2023 • Written By Sherlock Xu
Optical Character Recognition (OCR) involves training models to recognize and convert visual representations of text into a digital format. Recent advances in deep learning have greatly improved the accuracy and capabilities of OCR systems, enabling a wide range of applications. In this blog post, I will demonstrate how to build a simple OCR application with BentoML and EasyOCR.
Note: EasyOCR is a Python module for extracting text from images. It can read both natural scene text and dense text in documents, and supports over 80 languages.
Let’s create a virtual environment first for dependency isolation.
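For example, on macOS or Linux (the directory name `venv` is just a convention):

```shell
python -m venv venv
source venv/bin/activate
```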
Then, install the required dependencies.
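At minimum, this tutorial needs BentoML and EasyOCR; pin versions as appropriate for your project:

```shell
pip install bentoml easyocr
```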
Create a `download_model.py` file as below.
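A minimal sketch of what `download_model.py` can look like; the labels and metadata shown are illustrative placeholders:

```python
import bentoml
import easyocr

# Create a reader instance for English. EasyOCR downloads the
# detection and recognition models on first run and loads them into memory.
reader = easyocr.Reader(["en"])

# Register the model in BentoML's local Model Store under the name
# "en-reader". Labels and metadata are optional extra information.
bentoml.easyocr.save_model(
    "en-reader",
    reader,
    labels={"owner": "ocr-demo"},       # illustrative label
    metadata={"languages": ["en"]},     # illustrative metadata
)
```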
This file uses `easyocr.Reader` to create a reader instance and loads it into memory. I am using English as the language here, but you can choose a preferred language from EasyOCR’s language list. The `save_model` method registers the model into BentoML’s local Model Store. In addition to the model name (`en-reader`) and the model instance (`reader`), you can add extra information about the model, such as labels and metadata.
Run this script and the model should be saved to the Model Store.
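From the project directory:

```shell
python download_model.py
```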
View all the models in the Store.
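```shell
bentoml models list
```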
Create a BentoML Service (by convention, `service.py`) with an API endpoint to expose the model externally.
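A sketch of `service.py`, assuming the model name `en-reader` from the previous step; the Service name `ocr_service` and the endpoint name `transcript_text` are choices you can change:

```python
import numpy as np

import bentoml
from bentoml.io import Image, Text

# Retrieve the registered model from the Model Store and wrap it in a Runner.
runner = bentoml.easyocr.get("en-reader:latest").to_runner()

# Create a Service with the Runner wrapped in it.
svc = bentoml.Service("ocr_service", runners=[runner])


# Define an API endpoint that accepts an image and returns plain text.
@svc.api(input=Image(), output=Text())
async def transcript_text(image) -> str:
    # EasyOCR expects a NumPy array; each result item is
    # (bounding box, text, confidence).
    results = await runner.readtext.async_run(np.asarray(image))
    # Keep only the detected text (index 1) and join with newlines.
    return "\n".join(item[1] for item in results)
```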
Let’s look at the file in more detail.
The `bentoml.easyocr.get` method retrieves the model from the Model Store. Alternatively, you can use the generic `bentoml.models.get` method for the same purpose. Note that BentoML provides a framework-specific `get` method for each framework module. The difference is that the framework-specific methods verify that the retrieved model matches the specified framework.
With the retrieved model, `to_runner` creates a Runner instance. Runners are the units of computation in BentoML. Because BentoML uses a microservices architecture to serve AI applications, Runners allow you to combine different models, scale them independently, and even assign different resources (for example, CPU and GPU) to them. `bentoml.Service` then creates a Service with the Runner wrapped in it.
Lastly, use a decorator on the `transcript_text` function to define an API endpoint. The function expects an image input (converted to a NumPy array) and returns plain text. More specifically, the model returns the following information for a given image.
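For reference, EasyOCR returns a list with one item per detected text region; each item holds the bounding box coordinates, the detected text, and a confidence score. The values below are made up purely for illustration:

```python
# Hypothetical output of reader.readtext(image); each item is
# ([top-left, top-right, bottom-right, bottom-left], text, confidence).
results = [
    ([[24, 30], [312, 30], [312, 74], [24, 74]], "BentoML", 0.99),
    ([[24, 90], [280, 90], [280, 130], [24, 130]], "Model Serving", 0.95),
]
```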
I only need the detected text so my code here only extracts it (which is at index 1 in each item of the results) and joins them with newline characters to create a single string of all detected texts.
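The extraction step on its own, with a made-up results list of the same shape:

```python
def extract_text(results):
    """Join the detected text (index 1 of each result) with newlines."""
    return "\n".join(item[1] for item in results)


# Hypothetical results: (bounding box, text, confidence)
sample = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], "Hello", 0.98),
    ([[0, 30], [50, 30], [50, 50], [0, 50]], "World", 0.96),
]
print(extract_text(sample))  # prints "Hello" and "World" on separate lines
```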
Start the Service. Add the `--reload` flag to reload the Service when code changes are detected.
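Assuming the Service object is named `svc` in `service.py`:

```shell
bentoml serve service:svc --reload
```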
Interact with the Service by visiting http://0.0.0.0:3000, or send a request from the command line. Replace `bentoml-ocr.png` in the following example with your own file.
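A request might look like the following; the endpoint path assumes the function is named `transcript_text`, and the content type should match your image format:

```shell
curl -X POST \
  -H "Content-Type: image/png" \
  --data-binary "@bentoml-ocr.png" \
  http://0.0.0.0:3000/transcript_text
```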
This is the image used in the command:
The output by the model:
The result is not perfect, as the “+” icons are rendered on separate lines, but the model has detected most of the text. As I mentioned above, the model’s output also includes bounding boxes and confidence scores. You can customize the code as needed to change the output format.
Once the BentoML Service is ready, package the model into a Bento, the standardized distribution format in BentoML. To create a Bento, define a `bentofile.yaml` file as below. See Bento build options to learn more.
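A minimal `bentofile.yaml` might look like this, assuming the Service object is named `svc` in `service.py`:

```yaml
service: "service:svc"
include:
  - "*.py"
python:
  packages:
    - easyocr
```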
Run `bentoml build` in your project directory to build the Bento.
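```shell
bentoml build
```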
View all available Bentos:
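```shell
bentoml list
```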
Containerize the Bento with Docker:
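The Bento name below assumes the Service was named `ocr_service`; use the tag printed by `bentoml build`:

```shell
bentoml containerize ocr_service:latest
```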
The newly-created image has the same tag as the Bento:
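You can then run the image locally; replace the tag below with the one printed by `bentoml containerize`:

```shell
docker run --rm -p 3000:3000 ocr_service:latest
```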
With the Docker image, you can deploy it to any Docker-compatible environments like Kubernetes. Alternatively, push the image to BentoCloud, a serverless platform in the BentoML ecosystem that allows you to run and scale AI applications.
We’ve traversed the journey of building and deploying an OCR model using EasyOCR and BentoML, starting from setting up the environment to packaging the model into a Docker container. As always, the success of an OCR model depends not only on the technology but also on the quality and diversity of the training data. The results can always be enhanced with more specialized training, tweaking parameters, or integrating post-processing steps. The path is rich with possibilities, and I encourage you to experiment, iterate, and share your findings with BentoML.
Happy coding ⌨️, and until next time!