Microsoft has rolled out a new experimental app, dubbed Dictate, that enhances the voice transcription capabilities of the Office suite of productivity applications by using technology from the company’s Cortana artificial-intelligence assistant.
Microsoft already supports transcription with its Windows Speech Recognition application. However, Dictate leverages the advanced AI and speech-recognition technologies from Microsoft Cognitive Services, which are used in Cortana, as well as in Microsoft Translator.
The AI technology gives Dictate new features, including support for a number of spoken editing commands. Dictate understands commands that create new lines, delete text, add punctuation and otherwise format the text.
The system responds to normal spoken commands like “delete” to correct and manipulate text and punctuation.
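To make the idea of spoken editing commands concrete, here is a minimal Python sketch of how a dictation tool might separate commands from dictated text. It is not Dictate's implementation: the function names, the command vocabulary ("new line", "delete", "period", "comma", "question mark") and the assumption that "delete" removes the most recently dictated phrase are all hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical mapping of spoken punctuation commands to characters.
PUNCTUATION = {"period": ".", "comma": ",", "question mark": "?"}


def render(pieces: List[str]) -> str:
    """Join dictated pieces, attaching punctuation directly to the prior text."""
    out = ""
    for piece in pieces:
        if piece == "\n" or piece in PUNCTUATION.values():
            out += piece  # no leading space before a line break or punctuation
        elif out == "" or out.endswith("\n"):
            out += piece  # start of the document or of a new line
        else:
            out += " " + piece
    return out


def apply_utterances(utterances: List[str]) -> str:
    """Treat each utterance as either an editing command or dictated text."""
    pieces: List[str] = []
    for utterance in utterances:
        command = utterance.strip().lower()
        if command == "new line":
            pieces.append("\n")
        elif command == "delete":
            # Assumption: "delete" drops the most recent piece, if there is one.
            if pieces:
                pieces.pop()
        elif command in PUNCTUATION:
            pieces.append(PUNCTUATION[command])
        else:
            pieces.append(utterance.strip())
    return render(pieces)
```

Under these assumptions, the utterances "Hello team", "comma", "see you soon", "oops", "delete", "period", "new line", "Thanks" would produce two lines of text: "Hello team, see you soon." followed by "Thanks".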
The app also can transcribe more than 20 languages and perform real-time text translation for up to 60 different tongues. Using Dictate, a user could speak English and have it transcribed and translated into French.
Dictate acts as an add-in that works with Outlook, Word, and PowerPoint for Windows. Users can download the app from Microsoft’s site.
Dictate is just the latest entry into the fast-growing speech-recognition segment. Competitors include Nuance Communications Inc., whose Dragon software can turn spoken words into text.
The global market for natural-language processing software, hardware, and services—including voice transcription—is set to expand to $2.1 billion by 2024, up from $277.2 million in 2015, according to the market research firm Tractica.
Microsoft is distributing Dictate as part of The Garage, the company’s development community for employees. The Garage is designed to promote workers’ creativity by allowing them to work on projects outside of their normal field.
Projects developed by The Garage include the Personal Shopping Assistant, an app that allows users to organize and compare deals for products. The organization also produced Project Lively, which allows users to collaboratively make changes to Office files without having to save them in a cloud storage system.
Dictate and other Garage projects leverage technology from the Microsoft Cognitive Services organization. Cognitive Services was designed to proliferate the company’s AI application programming interfaces (APIs).
Microsoft’s Bing Speech API can recognize audio from a microphone in real time. It also can process speech from a different real-time audio source, or recognize audio from within a file. The API supports real-time streaming, allowing partial recognition results to be produced as the audio is being sent to the server.
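The streaming behavior described above, in which partial results arrive while audio is still being sent, can be illustrated with a small Python simulation. This sketch does not call the Bing Speech API or any real service. `RecognitionResult`, `chunk_audio` and `FakeStreamingRecognizer` are hypothetical names, and the "recognizer" simply emits one scripted word per audio chunk to show the shape of the interaction.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class RecognitionResult:
    text: str
    is_final: bool


def chunk_audio(audio: bytes, chunk_size: int) -> Iterator[bytes]:
    """Split an audio buffer into fixed-size chunks, as a streaming client would."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


class FakeStreamingRecognizer:
    """Stand-in for a speech service: 'recognizes' one scripted word per chunk."""

    def __init__(self, scripted_words: List[str]) -> None:
        self._words = scripted_words

    def stream(self, chunks: Iterable[bytes]) -> Iterator[RecognitionResult]:
        heard: List[str] = []
        for index, _chunk in enumerate(chunks):
            if index < len(self._words):
                heard.append(self._words[index])
            # Partial hypothesis, emitted while audio is still arriving.
            yield RecognitionResult(" ".join(heard), is_final=False)
        # Final result, emitted once the audio source is exhausted.
        yield RecognitionResult(" ".join(heard), is_final=True)
```

A caller would iterate over `stream(chunk_audio(audio, chunk_size))`, updating the on-screen text with each partial result and committing the text only when `is_final` is true. That is the general pattern a dictation add-in could use to show words as they are spoken.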
In addition to transcription and translation, the organization offers APIs for emotion and sentiment detection, vision and speech recognition, language understanding and search.
Tyler Schulze is vice president, strategy & development at Veritone. He serves as general manager for developer partnerships, cognitive engine ecosystem, and media ingestion for the Veritone platform.