Veritone Voice FAQs



What is Veritone Voice and who is it built for?

Veritone Voice is a hyper-realistic synthetic Voice as a Service (VaaS) solution that allows content creators and owners across industries to securely and ethically create, distribute, and monetize synthetic voices.

Veritone offers a custom synthetic voice cloning solution that allows users to securely create verified custom synthetic voices in many different languages.

In addition to custom voices, the Veritone Voice self-serve application enables users to create voice projects from a library of more than 300 stock voices and 70 premium voice-over artists across more than 150 languages.

What benefits will Veritone Voice deliver for customers and prospective customers?

As an end-to-end solution for synthetic voice, Veritone Voice gives users a complete suite of voice capabilities, including voice creation, management, licensing, rights and clearances, workflows, and monetization.

Here are a few industries and divisions that can benefit from Veritone Voice:

  • Advertising: Create content at speed and scale in multiple languages. With no need to schedule live recording time, you can produce new content on demand using custom voice models for celebrities, athletes, influencers, broadcasters, and sports announcers. All you need is their consent.
  • Audiobooks & Publishing: Bring great stories to life with AI voice-over. Capture and reuse the unique voices of known talent for engaging, lifelike audio, or create consistency with stock voices. Adjust pitch, tone, and speed to produce nuanced narration, with translation into 150 languages. Reach new audiences and scale production without losing authenticity.
  • Broadcasting: Connect with audiences in multiple languages and regions by synthetically reproducing broadcasts in the original announcer’s voice. Augment talent and increase variation for news and weather broadcasts or public announcements. Manage production costs without impacting audio quality.
  • Corporate Communications: Lend authority and familiarity to corporate communications by replicating the voices of company leaders. Translate speech into multiple languages with dialect and accent options to show your employees you care. Speaking their language facilitates a powerful connection and a sense of loyalty.
  • eLearning & Training: Make online learning and employee training materials more engaging with a recognizable celebrity or corporate leader’s voice. There is no need to pay for studio time or juggle schedules; just get their approval and create content on demand. Increase attentiveness and retention while minimizing production costs.
  • Film and TV: Create captivating voice-over content for film, TV, and video games—easily, and at scale. Make content more accessible and inclusive with narration and audio descriptions for the visually impaired. Reach new audiences with authentic dubbing in the original talent’s own voice. Edit easily and scale rapidly. 
  • Podcasting: Reach new markets in record time by localizing podcast content using the original host’s cloned voice. The Veritone Voice Network is a multilingual custom AI voice solution purpose-built to help podcasters expand their listener base and boost ad revenue at scale. 
  • Sports: Bring the voice of a beloved sports announcer, athlete, or sports personality to new markets at rapid speed and scale. Once you have their approval, you can create real-time audio content in multiple languages in the voice audiences already love. Our uniquely secure voice cloning is taking sports to new heights, and new places.

For more industries, please visit Veritone Voice.

What business problems does Veritone Voice solve?

Veritone Voice allows content creators to produce truly lifelike AI voice at unmatched speed and scale; create content on demand using text-to-speech or speech-to-speech input; and reach new audiences in localized languages, in real time, with branded voices.

Custom Voice Cloning:
Produce voice-over content without juggling schedules or paying for studio time. Clone voices including celebrities, sports announcers, and public figures—all you need is their consent. Create localized content on demand using text-to-speech or speech-to-speech input.

Enterprise Workflows:
Take advantage of Veritone’s proven AI expertise to optimize your voice automation output and succeed at scale. From enhancing metadata to generating dialogue, we use best-of-breed AI to deliver the best possible results from end to end.

API & Real-Time Voice:
Extend the power of true-to-life, real-time AI voice across all your products and projects. With our world-class AI voice API, you can save valuable time and automate at scale by connecting Veritone Voice directly to any app. 

Stock & Premium AI Voice:
Start creating your own text-to-speech synthetic voice projects right away. Choose from more than 300 stock voices or 70 premium options for a voice your audience will recognize. Translate into over 150 languages and customize intonation, gender, dialect, and accent.

How is Veritone Voice different from existing synthetic voice vendors?

Veritone Voice supports both text-to-speech and speech-to-speech modalities, giving clients the ability to create voices for all of their voice projects. As a VaaS solution, Veritone Voice offers a comprehensive suite of integrated voice features, including voice creation, voice management, voice licensing with rights and clearances, voice workflows, and voice monetization.

Veritone Voice is built on Veritone’s proprietary enterprise AI platform, aiWARE. For an additional fee, users can leverage aiWARE’s cognitive engines, such as translation and transcription, and combine them with advanced automated workflows to deliver transformed audio at scale.

How will Veritone protect against “deepfakes” and potentially malicious intentions?

Owning one’s voice and protecting that IP are critical. We want to make sure that we not only help our clients generate licensing opportunities but also ensure they have the necessary support to navigate rights and clearances. This will ensure their name, image, and likeness are used only by approved parties that maintain high standards.

Veritone Voice safeguards include regulated processes and checkpoints to ensure proper rights, clearances, and pricing are followed. Added IP protection includes inaudible watermarks and proprietary tools to help ensure content can only be accessed after permission is granted.

The voice creation process includes both written and verbal consent verification. Once a voice is created, the talent has the right to approve all synthetic recordings. All created recordings include an inaudible watermark that Veritone can verify.

All voice training data and voice models are stored in a highly secure, proprietary digital asset management platform, ensuring the protection of your data.

Only authorized users will have access to create new clips, and all clip creation is tracked at the user level. The voice model code only works on Veritone systems and cannot be deployed anywhere else.

If at any time the voice owner would like their voice clone deprecated, Veritone will destroy the voice model.

Will synthetic voice raise questions about authenticity?

For Veritone Voice clients, synthetic voice is a powerful tool that is used under the complete control of the voice owners. Some clients may use synthetic voice only for localization or production editing, but Veritone Voice can also be used for complete end-to-end production. The voice owner, who knows their audience best, has full control.

As a best practice, we recommend adding disclaimers so the audience is fully aware that they are hearing a synthetic voice.


What is Voice as a Service, and what capabilities will it deliver?

As Veritone’s Voice as a Service (VaaS) solution, Veritone Voice offers a comprehensive suite of integrated voice features, including voice creation, voice management, voice licensing with rights and clearances, voice workflows, and voice monetization.

What is the difference between text-to-speech and speech-to-speech processes?

Text-to-speech (TTS) is the process of producing synthetic speech from a text file.

Speech-to-speech (STS) is the process of producing synthetic speech from an audio file.

What stock AI voices and languages are immediately available?

Veritone Voice offers a rich marketplace of more than 300 stock voices that are immediately available to customers. You can choose from a broad and diverse range of genders, more than 150 languages, and numerous accents, and stylize each voice to suit your needs. Additionally, you can select from more than 70 recognizable, voice-artist-approved AI voices, available to license at additional cost.

What is required to create your own custom AI-generated voice?

Custom voice creation is supported by our managed services team. To start, the voice talent or individual whose voice will be recorded and used to create a custom voice model must explicitly consent (verbally and in writing) to the creation of their voice model. If the voice talent is deceased, the estate, as well as the IP owner if that is not the estate, must provide explicit consent.

  1. Secure Consent
    As ethical cloning pioneers, we never build a voice model without approval. The individual whose voice will be used must provide their explicit consent. If the talent is deceased or in the public domain, the estate or IP owner must sign off.  
  2. Input pre-existing or newly recorded audio content
    Next, we need about three hours of high-fidelity, isolated audio recordings, which we’ll use to train the model. We can use pre-existing audio or provide scripts to record. Content should model the desired output style, and multiple models can be built to accommodate different styles and languages.
  3. Customize voice content
    Once the model is built, you can use the self-serve app for both text-to-speech and speech-to-speech content creation in near real-time. Or work with our experts to manage your output needs. Additional models in new languages can be built in about two days. 

Can I choose the specific voice engines to create my AI-generated custom voice?

Veritone Voice currently has access to a set of market-leading voice engines that is growing daily. A member of our managed services team will help identify the right engines for your use case.

How does Veritone Voice store the voice training data and voice models?

All voice training data and voice models are stored in a highly secure, proprietary digital asset management platform, ensuring the protection of your data.

Is Veritone Voice available on both desktop and mobile?

Veritone Voice is mobile-responsive and built for any browser on desktop and mobile.

At this time, Veritone Voice does not have a mobile app.


What is the cost for subscribing to Veritone Voice?

Custom Voices
Starts at $9K USD per voice
Contact us to get started

Enterprise Workflows
Contact us for details

Stock & Premium Voices
Starts at $500/mo USD
Contact us to get started

API & Real Time Voice
Contact us for details


How will Veritone protect my AI voice model and its usage?

Our team of experts works closely with you and your team to thoroughly define a master services agreement or platform license.

The VaaS solution includes features such as inaudible watermarks (the automated inclusion of a copyright tone); traceability (the ability to track the components used to replicate your voice clips); and licensing protocols (regulated processes and checkpoints to ensure proper rights, clearances, and pricing are followed).

Once a synthetic voice model has been created, can anyone use it?

No. Veritone has built-in licensing protocols to ensure custom voices are used only by approved parties that maintain high standards.

How will Veritone ensure synthetic voice is not misappropriated and that content rights are not violated?

For custom voice models, Veritone manages model creation from end to end, along with the production of audio files that use the model. All requests for synthetic content creation come to the experienced Veritone Voice managed services team and are produced only with prior audio and written approval from the voice owner.

What if I don’t want to have a custom voice model anymore?

Your voice is made into a code, and that code only works on Veritone systems. If you decide to stop using it, we destroy the code of your voice and provide a receipt of destruction. It will be deleted and will no longer exist on our servers or be available anywhere.

How does Veritone disclose to listeners that a synthetic voice is being used?

Working with the Open Voice Network, the IAB, and other governing bodies, Veritone will adhere to best practices to protect consumers and IP (voice) owners.

Depending on the application of synthetic content, the listener may or may not know it’s synthetic. For example, if a celebrity authorizes the use of their voice model to fix a bit of audio in a movie, or uses their voice for a foreign-language rebroadcast with localized translations, the audio file might go without official notice.

It is a best practice to offer a disclaimer for consumers when synthetic voice is used for net-new content, particularly if the voice of a deceased person is generated.

Consumer disclosures, in audio and/or visual form, may be required when the voice model is being licensed and used for a paid endorsement or for government officials making public statements.

What steps is Veritone taking to ensure it’s following best practices when it comes to synthetic voice?

Veritone upholds a promise for good and is committed to working to address public concern and protect the intellectual property of the voice talent and advertising community. We will publish industry best practices and governance for synthetic content usage in public or commercial channels. In addition, Veritone is an active member of the IAB, the Open Voice Network, and other governing bodies as part of our efforts to develop global best practices for synthetic content.
