

Veritone aiWARE-Delivered Cognitive Engine Intelligence

With our operating system for AI, you can get intelligence delivered by cognitive engines in two ways: packaged within industry-specific solutions or in custom combinations via our open API.


Over 10,000 Unique Cognitive Engines Comprise The Fabric Of AI

We built a machine-learning ecosystem of third-party and native cognitive engines that give you the breadth of capability and depth of specialization needed in AI applications to meet virtually any use case.

We continuously monitor cognitive engines in the market so we can give you the best machine-learning toolkits for custom development. Learn how to apply our ecosystem’s different cognitive capabilities within these classes.


When you apply artificial intelligence to recordings of human language, it locates, captures, identifies, and categorizes the spoken word quickly, extracting insights previously hidden in unstructured files.

Discover how our ecosystem’s speech-processing cognitive engines automate transcription, language identification, speaker separation, and keyword-spotting capabilities for your audio and video content.

  • Transcription

    Transcription engines convert spoken words in audio and video recordings into readable and searchable text. They are built and trained to transcribe different languages, dialects, and slang words.

  • Language Identification

    Language identification engines pinpoint the natural human language(s) spoken in an audio recording.

  • Speaker Separation

    Speaker separation engines distinguish between multiple speakers in audio and video by partitioning recordings and streams into multiple segments according to speaker.

  • Keyword-Spotting

    Keyword-spotting engines find specific words in an audio recording without producing a transcript.


Natural language processing (NLP) provides advanced text analysis enabling systematic functions such as recognizing the emotion behind an author’s words and transforming words into a dialect someone from a different region can understand.

Learn how our language-based cognitive engines perform translation, text-to-speech, named-entity recognition, summarization, content classification, language identification, and sentiment automation for your written text files.

  • Translation

    Translation engines translate written and transcribed text from one language to another. Machine-learning algorithms specialized not only for languages but also for dialects and slang can markedly increase the accuracy of translated output.

  • Text-to-Speech

    Text-to-speech engines produce spoken words from text, with options for voice gender and some accents.

  • Named-Entity Recognition

    Named-entity recognition or entity extraction engines classify things found in text into predefined categories, such as people, organizations, and locations.

  • Summarization

    Summarization engines generate a synopsis or condensed version of written text.

  • Content Classification

    Content classification engines categorize one or multiple documents according to a pre-defined relationship framework or ontology.

  • Language Identification

    Language identification engines pinpoint the natural human language(s) used in written text.

  • Sentiment

    Sentiment engines discern the tone behind a series of written words, which helps you gain an understanding of the attitudes, opinions, and emotions expressed.


Image-processing machine-learning algorithms intelligently identify, extract details from, and segment your pictures and videos. They also tag images with explicit or inappropriate content.

Find out how our ecosystem cognitive engines enable you to employ object detection, object recognition, visual moderation, optical character recognition, and scene-break detection for your visual media.

  • Object Detection

    Object detection or image recognition engines detect the presence of multiple objects or concepts within video or still images, including vehicles, people, animals, and more.

  • Object Recognition

    Object recognition engines identify multiple objects within video or still images, including logos, vehicles, landmarks, license plates, colors, food, art, apparel, weapons, and more.

  • Visual Moderation

    Visual moderation engines tag images and video that likely contain explicit content, such as sexual or gory imagery, not appropriate for some audiences.

  • Optical Character Recognition

    Optical Character Recognition (OCR), also known as text recognition, extracts text from an image, video, or document. OCR encompasses license plates, on-screen text, and documents of various types.

  • Scene Break Detection

    Scene-break engines segment video files by identifying each instance of a scene change, such as in a movie or TV show.
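
As a rough illustration of the scene-break idea described above, the following is a minimal sketch, not the method any particular engine uses. It assumes each frame is already available as a flat list of grayscale pixel intensities (0 to 255) and flags a break wherever the mean absolute pixel difference between consecutive frames exceeds a threshold. The function name `scene_breaks` and the default threshold are hypothetical.

```python
def scene_breaks(frames, threshold=50.0):
    """Return indices of frames that start a new scene.

    `frames` is a list of equal-length lists of grayscale intensities.
    A frame starts a new scene when its mean absolute difference from
    the previous frame is greater than `threshold` (an assumed value).
    """
    breaks = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        mean_diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mean_diff > threshold:
            breaks.append(i)
    return breaks


# Two dark frames followed by two bright frames: one cut, at frame 2.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [201, 209, 191, 204],
]
print(scene_breaks(frames))  # [2]
```

Real engines typically work on decoded video and use more robust features (for example, color histograms), but the thresholded frame-to-frame comparison conveys the basic principle.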


Biometrics analyzes the unique physical identifiers that make people who they are. AI harnesses this technology to detect the presence of a face or identify a specific person within photographs and video. When provided a library of known individuals, cognitive engines can learn to pick out most instances of a celebrity, known offender, employee, or other individual within visual content.

Discover how Veritone’s ecosystem of cognitive intelligence provides face detection and face recognition services for your image-based media and evidence.

  • Face Detection

    Face detection engines detect the presence of human faces in video or still images.

  • Face Recognition

    Face recognition engines detect the presence of individuals in video or still images and can recognize specific people based on a library of known faces.


Much like the human brain learns to detect and recognize meaningful patterns in sounds, artificial intelligence can capture audio signatures in order to detect future occurrences. These signatures or fingerprints are a compact representation of music, environmental sound, advertisements, and more, all of which can be differentiated from other noise.

Learn how our ecosystem of cognitive engines performs audio fingerprinting on your audio files.

  • Audio Fingerprinting

    Audio or acoustic fingerprinting engines generate a condensed digital summary as a reference clip, which they then use to quickly locate the same item (such as an advertisement) within multiple media files.
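
To make the "condensed digital summary" idea concrete, here is a toy sketch under simplifying assumptions; it is not how production fingerprinting engines work, which generally rely on spectral features. Audio is assumed to be a list of numeric samples, the fingerprint is one bit per pair of adjacent frames (1 if frame energy rises, else 0), and the reference clip is located by sliding its fingerprint across the fingerprint of a longer file. The names `fingerprint` and `locate` are hypothetical, and the clip is assumed to start on a frame boundary.

```python
def fingerprint(samples, frame=4):
    """Condense samples into bits: 1 where energy rises between frames."""
    energies = [
        sum(s * s for s in samples[i:i + frame])
        for i in range(0, len(samples) - frame + 1, frame)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def locate(reference_fp, media_fp, max_mismatches=0):
    """Return frame offsets in media_fp where reference_fp matches."""
    n = len(reference_fp)
    hits = []
    for offset in range(len(media_fp) - n + 1):
        window = media_fp[offset:offset + n]
        mismatches = sum(a != b for a, b in zip(reference_fp, window))
        if mismatches <= max_mismatches:
            hits.append(offset)
    return hits


def tone(amplitudes, frame=4):
    """Helper for the demo: hold each amplitude for one frame."""
    return [a for a in amplitudes for _ in range(frame)]


ad = tone([1, 3, 2, 5, 4])                       # reference clip
media = tone([0, 0, 0]) + ad + tone([0, 0])      # clip embedded at frame 3
print(locate(fingerprint(ad), fingerprint(media)))  # [3]
```

Raising `max_mismatches` tolerates some noise at the price of more false matches, which is the basic trade-off any fingerprint matcher has to tune.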


Data volumes are growing exponentially, and extracting actionable insights from them requires complex and time-consuming analytics. Cognitive data-analytics engines uncover location metadata and associate common datasets to extract meaning.

Explore how Veritone’s ecosystem of cognitive engines conducts geolocation and correlation for your varying data types.

  • Geolocation

    Geolocation engines identify the geographic location of a person or object in the real world or some virtual equivalent.

  • Correlation

    Correlation engines associate two structured datasets based on some commonality, such as time or location, and produce a cohesive structured output.
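
The correlation step described above can be sketched as a nearest-in-time join. This is a minimal illustration only, assuming both datasets are lists of dictionaries that share a numeric "time" key in seconds; the function name `correlate_by_time`, the record fields, and the tolerance are hypothetical and do not represent any specific engine's interface.

```python
from bisect import bisect_left


def correlate_by_time(left, right, tolerance_s=1.0):
    """Merge each `left` record with its nearest-in-time `right` record.

    Records whose nearest partner is more than `tolerance_s` seconds
    away are dropped. The output keeps the timestamp from `left`.
    """
    right_sorted = sorted(right, key=lambda r: r["time"])
    right_times = [r["time"] for r in right_sorted]
    merged = []
    for rec in sorted(left, key=lambda r: r["time"]):
        i = bisect_left(right_times, rec["time"])
        # Only the neighbors around the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(right_sorted)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(right_times[j] - rec["time"]))
        if abs(right_times[best] - rec["time"]) <= tolerance_s:
            partner = {k: v for k, v in right_sorted[best].items() if k != "time"}
            merged.append({**rec, **partner})
    return merged


words = [{"time": 10.0, "word": "hello"}, {"time": 50.0, "word": "bye"}]
gps = [{"time": 10.4, "lat": 33.6, "lon": -117.9},
       {"time": 30.0, "lat": 1.0, "lon": 2.0}]
print(correlate_by_time(words, gps))
# [{'time': 10.0, 'word': 'hello', 'lat': 33.6, 'lon': -117.9}]
```

The same pattern applies to location as the shared attribute, with a distance function in place of the time difference.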


Data can be structured, unstructured, audio, or visual. Yet, due to the lack of harmony among different datasets, the data often must be transformed to create synergy or eliminate certain elements before you can use it.

Learn how aiWARE maps complex information and data and uses carefully orchestrated visual redaction and transcoding engines to create homogeneous datasets you can use.

  • Orchestration

    To meet your cost, accuracy, and speed requirements, Veritone Conductor™, the aiWARE orchestration engine, extracts features from data files and intelligently routes them through the most effective combination of cognitive engines across an individual cognitive class.

  • Visual Redaction

    Visual redaction engines censor or obscure defined parts of an image or video, such as an individual’s face, license plate, nudity, or sensitive images.

  • Transcoding

    Transcoding engines convert a data file from one format to another for video, audio, and text files.
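
The orchestration item above describes routing work to engines according to cost, accuracy, and speed requirements. The sketch below shows one simple way such a selection could be expressed; it is purely illustrative and is not Conductor's actual algorithm or API. The `EngineProfile` fields, the engine names, and the selection rule (highest accuracy among engines that fit the cost and speed budgets) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineProfile:
    name: str             # hypothetical engine label
    accuracy: float       # expected accuracy, 0.0 to 1.0
    cost_per_min: float   # price per minute of media
    speed_factor: float   # processing time / media duration (lower is faster)


def pick_engine(engines, max_cost_per_min, max_speed_factor):
    """Return the most accurate engine within both budgets, or None."""
    eligible = [
        e for e in engines
        if e.cost_per_min <= max_cost_per_min
        and e.speed_factor <= max_speed_factor
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda e: e.accuracy)


# Made-up profiles for engines in a single cognitive class.
engines = [
    EngineProfile("fast-cheap", accuracy=0.80, cost_per_min=0.01, speed_factor=0.2),
    EngineProfile("balanced", accuracy=0.90, cost_per_min=0.03, speed_factor=0.5),
    EngineProfile("premium", accuracy=0.96, cost_per_min=0.10, speed_factor=1.5),
]
choice = pick_engine(engines, max_cost_per_min=0.05, max_speed_factor=1.0)
print(choice.name)  # balanced
```

A real orchestrator would also weigh features extracted from each file, as the text notes, rather than relying on fixed per-engine profiles.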