Google has made the speech-recognition technology that underlies Google Assistant available to anyone who wants to add transcription to their own products.
The Google Cloud Speech application programming interface (API) is now generally available for all third-party developers, as reported by Android Authority. Google noted that the Cloud Speech API has already been used for various purposes by different companies.
For example, Clarion of Japan has used the API to add voice control to its vehicle navigation and entertainment systems. InteractiveTel, a Houston-based telephony applications firm, has also used the Cloud Speech API to transcribe its dealers’ interactions with customers.
Google said the API is suited to any kind of application that needs to accept spoken commands, on devices such as phones, PCs, tablets and internet-of-things devices, including consumer electronics products.
Google Cloud Speech powers Google Assistant, the voice-controlled digital assistant built into the Google Home smart speaker. Responding to natural-language commands, Google Assistant can do everything from answering questions to ordering goods and services online to controlling smart-home features such as lights, appliances and thermostats.
Google Assistant is currently competing with Amazon’s Alexa for leadership in the smart-speaker segment. While Alexa dominates the market today, IHS Markit predicts that Google Assistant will eventually take the lead, partly because of its widespread deployment on different platforms.
The Cloud Speech API can comprehend more than 80 languages and dialects, according to Google. The algorithm is designed to improve its accuracy and features over time.
Improvements include better transcription accuracy for longer audio files, faster processing and support for more file formats.
Google said the Speech API can perform transcription in real time. The API can stream text results, returning partial recognition results as soon as they become available, so that the transcribed text appears while the user is still speaking. The Speech API can also generate text from audio files.
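To make the difference between partial and final results concrete, here is a minimal sketch of how a client might turn a stream of results into the text shown on screen. It does not call the API. The sample data is hand-written, and the dictionary keys `transcript` and `is_final` are simplified stand-ins, assumed for illustration, for the fields a streaming response carries.

```python
def render_stream(results):
    """Yield the text a UI would display after each streamed result.

    Hypothetical helper: a partial result is shown provisionally and is
    replaced by the next result; a final result is committed for good.
    """
    committed = []  # segments the recognizer has marked as final
    for result in results:
        text = result["transcript"].strip()
        if result["is_final"]:
            committed.append(text)
            yield " ".join(committed)
        else:
            # Show the partial text after the committed text, without storing it.
            yield " ".join(committed + [text])


if __name__ == "__main__":
    # Hand-written sample data, not real API output.
    sample = [
        {"transcript": "turn on", "is_final": False},
        {"transcript": "turn on the lights", "is_final": True},
        {"transcript": "and set", "is_final": False},
        {"transcript": "and set the thermostat", "is_final": True},
    ]
    for line in render_stream(sample):
        print(line)
```

The design choice worth noting is that partial text is never stored: each new result overwrites the provisional portion, which is what lets the display update while the user is still speaking.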
The API also retains a high level of accuracy in noisy locations, Google said. According to the company, this eliminates the need for signal processing or noise-cancellation electronics in speech systems. It also allows the creation of transcription systems for all kinds of purposes, including those set in places with a high volume of noise, such as outdoor venues.
To work effectively in different types of devices and apps, speech recognition can be customized for its context by using specific sets of word hints. These hints, consisting of a set of words and phrases likely to be spoken, improve accuracy for a specific use.
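As a rough illustration of word hints, the sketch below builds a JSON request body in the shape used by the Cloud Speech REST API (v1), where hints travel in a `speechContexts` field of the recognition config. The helper name, the sample phrases and the audio settings are assumptions made for illustration, not details from Google's announcement, so field names should be checked against the current API reference before use.

```python
import base64
import json


def build_hinted_request(audio_bytes, phrases, language_code="en-US"):
    """Build a recognize request body that carries phrase hints.

    Hypothetical helper: field names follow the v1 REST API, while the
    encoding and sample rate are illustrative assumptions.
    """
    return {
        "config": {
            "encoding": "LINEAR16",    # assumed: raw 16-bit PCM audio
            "sampleRateHertz": 16000,  # assumed sample rate
            "languageCode": language_code,
            # Words and phrases likely to be spoken in this context.
            "speechContexts": [{"phrases": list(phrases)}],
        },
        # Inline audio is sent as base64-encoded text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for real audio; the phrases are made up.
    body = build_hinted_request(b"\x00\x00" * 4, ["navigate home", "play radio"])
    print(json.dumps(body, indent=2))
```

A car navigation system, for instance, might pass street names and station names as hints, while a call-transcription service might pass product names.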
The Speech API can also filter inappropriate content and can work with audio files stored in Google Cloud Storage.
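A request that uses both features might look like the following sketch, which points the API at a file already held in Cloud Storage by its `gs://` URI and switches on the `profanityFilter` flag of the v1 REST config. The helper name, the bucket and object names, and the FLAC settings are made up for illustration.

```python
import json


def build_storage_request(gcs_uri, filter_profanity=True, language_code="en-US"):
    """Build a recognize request body for audio stored in Cloud Storage.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")
    return {
        "config": {
            "encoding": "FLAC",        # assumed file format
            "sampleRateHertz": 16000,  # assumed sample rate
            "languageCode": language_code,
            # Ask the service to mask inappropriate words in the transcript.
            "profanityFilter": filter_profanity,
        },
        # Reference the stored file instead of uploading audio inline.
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # The bucket and object names below are placeholders.
    request = build_storage_request("gs://example-bucket/recordings/call-001.flac")
    print(json.dumps(request, indent=2))
```

Referencing the file by URI keeps large recordings out of the request body, which matters for the longer audio files mentioned above.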
Users and prospective developers can test the API on the Google Cloud Speech API site.
Tyler Schulze is vice president, strategy & development at Veritone. He serves as general manager for developer partnerships, cognitive engine ecosystem, and media ingestion for the Veritone platform.