It seems that every day brings a new advancement in AI translation technology, with algorithms continually adding support for new languages or improving the accuracy of their interpretations. All of these advancements have one thing in common, though: they are limited to spoken languages. Now one startup is bringing AI translation to the realm of sign language, using computer vision and natural language processing (NLP) to interpret the gesture-based communication of American Sign Language (ASL).
SignAll, a small Hungarian company with big ambitions, is taking on the uniquely difficult task of turning ASL gestures into spoken words, as reported by TechCrunch.
“It’s multi-channel communication; it’s really not just about shapes or hand movements,” CEO Zsolt Robotka said to TechCrunch. “If you really want to translate sign language, you need to track the entire upper body and facial expressions — that makes the computer vision part very challenging.”
The complexity of this challenge is reflected in the sophistication of the hardware needed to detect the subtle motions of sign language. The SignAll system pairs a Kinect 2 sensor, the depth camera Microsoft developed for its Xbox One video game console, with three RGB cameras.
“We need this complex configuration because then we can work around the lack of resolution, both time and spatial (i.e. refresh rate and number of pixels), by having different points of view,” chief R&D officer Márton Kajtár said to TechCrunch. “You can have quite complex finger configurations, and the traditional methods of skeletonizing the hand don’t work because they occlude each other. So we’re using the side cameras to resolve occlusion.”
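The idea Kajtár describes, using side cameras to see what the front camera cannot, can be illustrated with a small sketch. This is not SignAll's code; the function name, the per-keypoint confidence scores, and the camera labels are all illustrative assumptions about how a multi-view system might pick the view in which the fingers are least occluded.

```python
# Hypothetical sketch: choosing the camera view where hand keypoints are
# most visible. Scores are assumed per-keypoint detection confidences
# (0.0 = fully occluded, 1.0 = clearly visible); not SignAll's actual code.

def least_occluded_view(views):
    """Return the name of the camera whose keypoints are most visible."""
    def visibility(scores):
        return sum(scores) / len(scores)
    return max(views, key=lambda cam: visibility(views[cam]))

views = {
    "front":      [0.9, 0.2, 0.1, 0.3],  # fingers occlude each other head-on
    "side_left":  [0.8, 0.7, 0.9, 0.6],  # side view resolves the occlusion
    "side_right": [0.5, 0.4, 0.6, 0.5],
}
print(least_occluded_view(views))  # -> side_left
```

A real system would fuse all views into one 3D hand skeleton rather than pick a single winner, but the per-view visibility comparison captures why extra points of view help.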
The ASL translation process is further complicated by the need to track facial expressions and variations in gestures that can change the meaning of signs.
“The nature of the language is continuous signing. That makes it hard to tell when one sign ends and another begins,” Robotka added. “But it’s also a very different language; you can’t translate word by word, recognizing them from a vocabulary.”
To accommodate this fluid form of communication, SignAll’s system works by translating complete sentences, rather than individual words. Translating each sign individually in a sequential fashion would generate misinterpretations or would result in interpretations that are excessively simplistic.
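A toy example helps show why sign-by-sign substitution falls short. The gloss sequence and both dictionaries below are illustrative assumptions, not SignAll's vocabulary: ASL often uses a topic-first ordering, so a literal per-sign lookup produces telegraphic English, while a sentence-level mapping can reorder and inflect the whole utterance.

```python
# Hypothetical sketch: word-by-word vs. sentence-level translation of an
# ASL gloss sequence. All entries are illustrative assumptions.

word_by_word = {"STORE": "store", "I": "I", "GO": "go"}

signs = ["STORE", "I", "GO"]  # topic-first ASL gloss order (illustrative)

# Sign-by-sign substitution keeps ASL word order and drops inflection:
literal = " ".join(word_by_word[s] for s in signs)
print(literal)  # -> store I go

# A sentence-level mapping treats the whole sequence as one unit,
# reordering and inflecting to produce fluent English:
sentence_level = {
    ("STORE", "I", "GO"): "I am going to the store.",
}
print(sentence_level[tuple(signs)])  # -> I am going to the store.
```

SignAll's actual system presumably learns this mapping from data rather than from a lookup table, but the contrast shows why whole-sentence translation avoids the simplistic output that per-sign translation would produce.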
SignAll is planning the system's initial public pilot at Gallaudet University, a Washington, D.C. school for the deaf. The company will install a translation booth at the university to facilitate communication between hearing people and hearing-impaired staff.
Nirel Marofsky is project analyst for the cognitive engine and application ecosystem at Veritone. She acts as a liaison to strategic partners, integrating developers and their capabilities into the Veritone Platform. Learn more about our platform and join the Veritone developer ecosystem today.