Using AI to Extract Text From Images

“Words are the most powerful drug used by mankind.”

Rudyard Kipling

The fact that I have been writing this blog for nearly ten years without a penny of compensation tells me that I love words. While I don’t succeed as often as I would like to, crafting well-written sentences in clever and insightful ways makes me downright giddy. On the flip side, finding typos and sloppy paragraphs after clicking Publish makes me cringe. My words are an extension of me and I don’t want to be thought of as a careless or incompetent man. Even when I am not completely sure of what I am trying to say, I still want to say it with grace and style.

Therefore, it’s a natural progression for me to go from analyzing images for their visual content to processing them for their embedded text. More than just a curiosity, retrieving text from a photograph has real value. Consider the case where you receive a screenshot containing text that you want to paste into a form or document. Since you cannot select and copy JPEG content, you are forced to read and manually type those words and phrases. Not only is this annoying and time-consuming, it’s an error-prone operation.

Making it So

This is where AI image processing steps in. Using the same concepts I applied to visual analysis, I created an application that reads image files and returns their textual content. As before, I am using the AWS Rekognition platform for analysis and Avaya Spaces to display the output. In the real world, the Spaces room would be replaced with a rules-based workflow engine, but it works fine for demonstration purposes.

To see some clever AI in action, please take a look at my latest Cheapo-Cheapo Productions video.

The Nitty-Gritty

To process an image for embedded text, I invoke the AWS Rekognition detectText() API. If successful, the call returns a TextDetections array in which each object describes an instance of found text.
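For readers who want to try the call themselves, here is a minimal sketch in Python. It assumes the boto3 library, where the equivalent method is named detect_text, along with AWS credentials already configured on the machine. The function name and the optional client parameter are my own illustrative choices, not the code behind the demo.

```python
def detect_text(image_bytes, client=None):
    """Send raw image bytes to Rekognition and return the TextDetections list."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        client = boto3.client("rekognition")
    # Rekognition accepts the image inline as bytes (or as an S3 object reference).
    response = client.detect_text(Image={"Bytes": image_bytes})
    return response["TextDetections"]
```

To use it, read a file in binary mode and pass the contents along, for example `detect_text(open("screenshot.jpg", "rb").read())`, where the file name is a placeholder.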

For this proof-of-concept application, I am only sending the DetectedText values to my Spaces room. I am also screening on the Type value. Rekognition returns both WORD (single words) and LINE (lines of text) types, and LINE best shows text extraction in this demonstration.

Lastly, note the Confidence value. This can be used to screen out DetectedText values that fall below a particular threshold. For example, I only display text with a confidence value of 85% or higher.
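The screening described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the demo's actual code: the function name is hypothetical and the sample data is invented to mimic the shape of a TextDetections array.

```python
def extract_lines(text_detections, min_confidence=85.0):
    """Return DetectedText for LINE entries at or above the confidence threshold."""
    return [
        d["DetectedText"]
        for d in text_detections
        if d.get("Type") == "LINE" and d.get("Confidence", 0.0) >= min_confidence
    ]


# Invented sample data shaped like a Rekognition TextDetections array.
sample = [
    {"DetectedText": "Words are powerful", "Type": "LINE", "Confidence": 98.7},
    {"DetectedText": "Words", "Type": "WORD", "Confidence": 99.1},
    {"DetectedText": "smudged tcxt", "Type": "LINE", "Confidence": 61.3},
]

lines = extract_lines(sample)
print(lines)
```

Only the first entry survives here: the second is a WORD rather than a LINE, and the third falls below the 85% threshold.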

Mischief Managed

You don’t have to be a lover of words to see the value in this. Optical Character Recognition (OCR) has been around for many years, but this takes it to the next level. Not only can AI be used to find words, the composable nature of cloud platforms like Avaya OneCloud and AWS Rekognition allows this functionality to be embedded inside any workflow. For example, applications can be built to search images for inappropriate, sensitive, and private data and launch alerts when it is encountered. This can be done with static images or real-time video.

As always, thank you for reading and watching. I hope you find this technology as exciting as I do.
