“If we can see the little things, we can see the entire universe.”
My favorite subjects are those that appear to be boundless — where the more I learn, the more I realize how little I know. Artificial Intelligence for image and video processing is one of those subjects. While my work developing AI applications these past several weeks taught me a lot, there is so much more that I still don’t appreciate, let alone comprehend. Thankfully, I can be quite tenacious when it comes to concepts that capture my attention, and I keep plugging along until I finally reach a place of comfort.
In Image Analysis from AWS Rekognition and Avaya OneCloud CPaaS, I demonstrated how AI can examine a photograph to tell you that it sees a person. I took it a little further in Facial Recognition from AWS Rekognition and Avaya OneCloud CPaaS and built an application where AI was used to tell you who that person was. This time around, I not only want to know that an image contains a person, I want to categorize that person by gender, age, and emotion. To add a little spice to the recipe, I want to do this in near real-time.
Customer experience improves as the provider learns more about who it is working with. My needs as a 60-something male are not necessarily the same as those of a 20-something female. Adding emotion to the mix personalizes the experience even more. “Seeing” that someone is confused or unhappy can potentially alleviate problems before they escalate out of control.
Seeing is Believing
As is nearly always the case, I learn best by creating the thing I am trying to understand. In this case, I developed a prototype application that processed images from my PC’s webcam in near real-time. To visualize the output, I sent the analysis to an Avaya Spaces room. In the real world, I would use the AI discoveries to personalize customer service through applications such as agent assist and/or dynamic content delivery.
To see gender, age, and emotion analysis in action, please check out my Cheapo-Cheapo Productions video.
The backend AI analysis generates far more information than I have chosen to send to the Spaces room. I ignore the following:
Eyes Open (true/false)
Mouth Open (true/false)
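To give a feel for what that looks like in practice, here is a minimal sketch of picking out just the attributes I forward and skipping the rest. The dictionary mirrors the shape of a single entry in Rekognition’s DetectFaces `FaceDetails` response, but the values themselves are made up for illustration, and `summarize` is a hypothetical helper name, not part of any API.

```python
# A made-up FaceDetails entry shaped like Rekognition's DetectFaces
# output; the confidence numbers and values are illustrative only.
sample_face = {
    "Gender": {"Value": "Male", "Confidence": 99.2},
    "AgeRange": {"Low": 57, "High": 65},
    "EyesOpen": {"Value": True, "Confidence": 98.1},
    "MouthOpen": {"Value": False, "Confidence": 96.4},
}

def summarize(face):
    """Keep gender and age range; ignore the true/false attributes."""
    return {
        "gender": face["Gender"]["Value"],
        "age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"]),
    }

print(summarize(sample_face))  # {'gender': 'Male', 'age_range': (57, 65)}
```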
The AI engine simultaneously reports confidence levels for seven different emotions — Disgusted, Happy, Surprised, Confused, Calm, Angry, and Sad. I have chosen to only present the emotion with the highest confidence level. It’s all there for the taking, though.
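Selecting that single emotion is a one-liner once you have the `Emotions` list from the response. A minimal sketch, with made-up confidence values:

```python
# A made-up Emotions list shaped like Rekognition's DetectFaces output;
# every emotion arrives with its own confidence score.
emotions = [
    {"Type": "CALM", "Confidence": 62.5},
    {"Type": "HAPPY", "Confidence": 24.1},
    {"Type": "CONFUSED", "Confidence": 6.3},
    {"Type": "SURPRISED", "Confidence": 3.0},
    {"Type": "SAD", "Confidence": 2.1},
    {"Type": "ANGRY", "Confidence": 1.2},
    {"Type": "DISGUSTED", "Confidence": 0.8},
]

def top_emotion(emotions):
    """Return the emotion with the highest confidence score."""
    best = max(emotions, key=lambda e: e["Confidence"])
    return best["Type"], best["Confidence"]

print(top_emotion(emotions))  # ('CALM', 62.5)
```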
Finally, Rekognition not only tells me what it sees, it tells me where in the image it sees it. Among many things, this information could be used to tag the digital photograph.
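The location data comes back as a `BoundingBox` expressed in ratios of the overall image size, so turning it into something you can draw or tag with means scaling by the frame’s pixel dimensions. A small sketch, with a made-up box and frame size; `to_pixels` is my own illustrative helper:

```python
def to_pixels(box, frame_width, frame_height):
    """Convert a Rekognition BoundingBox (ratios of the image
    dimensions) into pixel coordinates: (left, top, width, height)."""
    left = round(box["Left"] * frame_width)
    top = round(box["Top"] * frame_height)
    width = round(box["Width"] * frame_width)
    height = round(box["Height"] * frame_height)
    return left, top, width, height

# Made-up box on a 640x480 webcam frame.
box = {"Left": 0.25, "Top": 0.10, "Width": 0.50, "Height": 0.40}
print(to_pixels(box, 640, 480))  # (160, 48, 320, 192)
```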
Imagine standing in front of a digital display that dynamically changes its content depending on who stands before it. Younger people see items that appeal to them, and old guys like me see old guy stuff. That is the power of personalized AI technology.
I have yet to tire of my adventures with artificial intelligence and image/video recognition/analysis and expect to return to this subject again very soon. Thanks to all who have reached out to me with questions, comments, and collaboration. I can’t do this in a vacuum.