“A picture is worth a thousand words and a video is worth a million pictures.”
Ankala Subbarao
In my previous blog article, Real-Time AI Image Recognition, I showed you how I integrated my PC’s webcam with Amazon Rekognition. My camera application took a photo every two seconds, sent those photos to Rekognition for label detection, and posted the results to Avaya Spaces as chat messages. As pleased as I was with the outcome, I knew it was only the beginning. For me, image recognition solution building is a trilogy that consists of still images, video files, and streaming video. All three are important, and each provides its own set of powerful, sometimes unique, use cases.
I am not quite ready to tackle streaming video (the hardest one of the bunch), so I took the time to understand the fundamentals of processing video files. I then put that newfound knowledge to work building a prototype application that demonstrates how the technology can be used.
Remember the photograph that Harry Potter had of his deceased parents? Unlike the photos on my fireplace mantel, Lily and James Potter danced within the picture frame. Processing video files is a bit like processing a static photograph if you think of a video as a linear collection of Harry Potter images. This lets you divide a video into separate chunks, where each chunk covers a fixed timeframe, for example 500 milliseconds. Those chunks are then processed one after the other.
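To make that mental model concrete, here is a tiny illustration of my own (not how Rekognition exposes its internals) that slices a video’s timeline into fixed 500 millisecond chunks:

```javascript
// Conceptual illustration only: divide a video's timeline into fixed-width
// chunks, like stepping through a stack of Harry Potter photos.
const CHUNK_MS = 500;

function chunkTimeline(durationMs) {
  const chunks = [];
  for (let start = 0; start < durationMs; start += CHUNK_MS) {
    chunks.push({ startMs: start, endMs: Math.min(start + CHUNK_MS, durationMs) });
  }
  return chunks;
}

// A two-second clip yields four 500 ms chunks to analyze one after the other.
console.log(chunkTimeline(2000));
```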
As you would imagine, analyzing a video typically takes much longer than analyzing a single JPEG file. To account for that, Amazon Rekognition Video works asynchronously. You pass it a video to work on, Rekognition returns a unique JobId, and then you wait for Rekognition to notify you that the job is done. Notification can come back to the same application, or you can divide your solution into two halves: one half sends in the videos and the other half accepts the results. The latter happens to be the way I coded my prototype.
Notification can come in a number of different ways: email, HTTP/HTTPS, AWS Lambda, SMS text messages, and so on. For me, configuring an HTTP/HTTPS webhook is the easiest and most straightforward integration.
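As a rough sketch of what the receiving half can look like, here is a minimal Express endpoint for SNS-delivered notifications. The path and port are placeholder choices of mine, and note that SNS first sends a SubscriptionConfirmation message whose SubscribeURL must be fetched before notifications start flowing:

```javascript
const express = require('express');
const https = require('https');

const app = express();
// SNS posts its JSON payload with a text/plain Content-Type,
// so accept every body as raw text and parse it ourselves.
app.use(express.text({ type: '*/*' }));

app.post('/rekognition-webhook', (req, res) => {
  const body = JSON.parse(req.body);

  if (body.Type === 'SubscriptionConfirmation') {
    // Confirm the HTTPS subscription by visiting the URL SNS provides.
    https.get(body.SubscribeURL);
  } else if (body.Type === 'Notification') {
    // Rekognition's job status arrives as JSON inside the SNS Message field.
    const status = JSON.parse(body.Message);
    console.log(`Job ${status.JobId} finished with status ${status.Status}`);
    // This is where the application would call getLabelDetection().
  }
  res.sendStatus(200);
});

app.listen(3000);
```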
Perhaps it’s just me, but I found the setup of Rekognition notifications to be somewhat convoluted. The final solution involved multiple AWS services: IAM, Simple Notification Service, and Simple Queue Service. Amazon describes the process here, but even then it was a trial-and-error process to make it work.
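For anyone scripting rather than clicking through the console, here is a hedged sketch of part of that wiring using the AWS SDK for JavaScript v2. The topic name and endpoint URL are placeholders, and the IAM role that Rekognition assumes to publish to the topic still has to be created separately with a trust policy for rekognition.amazonaws.com:

```javascript
const AWS = require('aws-sdk');
const sns = new AWS.SNS({ region: 'us-east-1' });

async function setupNotificationChannel() {
  // Create (or look up) the topic Rekognition will publish job status to.
  // Prefixing the name with "AmazonRekognition" keeps it compatible with
  // the AmazonRekognitionServiceRole managed policy.
  const { TopicArn } = await sns
    .createTopic({ Name: 'AmazonRekognitionJobStatus' })
    .promise();

  // Subscribe the application's webhook; SNS will then send the
  // confirmation request handled in the webhook sketch above.
  await sns
    .subscribe({
      TopicArn,
      Protocol: 'https',
      Endpoint: 'https://my-app.example.com/rekognition-webhook', // placeholder
    })
    .promise();

  return TopicArn;
}
```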
Once I had the AWS notification configuration established, I was able to upload a video into an S3 bucket and invoke Rekognition’s startLabelDetection(). The invocation parameters include information about the uploaded S3 video along with the notification channel settings.
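Here is a minimal sketch of that invocation with the AWS SDK for JavaScript v2; the bucket name, object key, topic ARN, and role ARN are placeholder values, and MinConfidence is an optional threshold I am assuming here:

```javascript
const AWS = require('aws-sdk');
const rekognition = new AWS.Rekognition({ region: 'us-east-1' });

const params = {
  // The video previously uploaded to S3.
  Video: { S3Object: { Bucket: 'my-video-bucket', Name: 'uploads/demo.mp4' } },
  // The notification channel: the SNS topic Rekognition publishes job
  // status to, and the IAM role it assumes in order to publish there.
  NotificationChannel: {
    SNSTopicArn: 'arn:aws:sns:us-east-1:123456789012:AmazonRekognitionJobStatus',
    RoleArn: 'arn:aws:iam::123456789012:role/RekognitionSNSRole',
  },
  // Skip labels below a 50% confidence score.
  MinConfidence: 50,
};

rekognition.startLabelDetection(params, (err, data) => {
  if (err) throw err;
  // The JobId is the handle the completion notification will reference.
  console.log('Label detection started, JobId:', data.JobId);
});
```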

A successful invocation of startLabelDetection() returns a JobId. When label processing is complete, my application’s notification HTTP webhook will be invoked for that JobId. The notification webhook will then call getLabelDetection() to retrieve the video’s label analysis data in timeline order.
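A hedged sketch of that half of the flow, assuming the same SDK v2 client as above; SortBy: 'TIMESTAMP' returns the labels in timeline order, and NextToken pages through large result sets:

```javascript
const AWS = require('aws-sdk');
const rekognition = new AWS.Rekognition({ region: 'us-east-1' });

// Called by the webhook once it learns the job for `jobId` has succeeded.
async function fetchLabelTimeline(jobId) {
  const labels = [];
  let nextToken;

  do {
    const data = await rekognition
      .getLabelDetection({ JobId: jobId, SortBy: 'TIMESTAMP', NextToken: nextToken })
      .promise();
    labels.push(...data.Labels);
    nextToken = data.NextToken; // More pages remain while this is set.
  } while (nextToken);

  // Each entry pairs a millisecond offset into the video with a label.
  for (const { Timestamp, Label } of labels) {
    console.log(`${Timestamp} ms: ${Label.Name} (${Label.Confidence.toFixed(1)}%)`);
  }
  return labels;
}
```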
To see my application in action (including the gory details of label analysis), check out this Cheapo-Cheapo Productions video.
Mischief Managed
As challenging as some aspects of this endeavor were, it was a very gratifying process. Static image analysis is powerful, and applying the same AI methods to videos takes everything to a new level of power and excitement. Whether you are thinking near real-time or historic analysis, the use cases are nearly endless.
I now have two of the three image analysis scenarios coded. Streaming video feels a bit daunting right now, but as with anything new, it will all make sense one day. Stay tuned for more fun and games.