12.1.17 — Veritone

Stanford Index Reveals AI Image and Speech Recognition Equaling or Exceeding Human Levels

One of the biggest open questions in technology is when artificial general intelligence (AGI) will reach the technological singularity, i.e., the point at which AI equals and then surpasses human intelligence. New findings from Stanford University, however, indicate that artificial narrow intelligence (ANI) technologies such as image and speech recognition are already reaching a singularity of their own by meeting or beating human capabilities in specific areas.

The Stanford AI100 index shows that the best object recognition systems have already surpassed human-level accuracy, while leading speech recognition algorithms have reached parity with human performance.

“In technical metrics, image and speech recognition are both approaching, if not surpassing, human-level performance,” Stanford stated in a press release. “The authors noted that AI systems have excelled in such real-world applications as object detection, the ability to understand and answer questions and classification of photographic images of skin cancer cells.”

The AI100 index estimates that object recognition in 2014 reached an accuracy rate of 95 percent, which Stanford believes is equivalent to human performance. Since then, the accuracy level has increased steadily and is closing in on 100 percent. Speech recognition hit the 95 percent level in 2017, according to the index.

Despite this progress, Stanford notes that AGI still has a long way to go before it truly achieves human-level smarts.

“AI has made truly amazing strides in the past decade, but computers still can’t exhibit the common sense or the general intelligence of even a 5-year-old,” said Yoav Shoham, professor emeritus of computer science at Stanford.

Tech giants like IBM and Google and emerging companies like Veritone have made major investments in developing singular AGI systems that can perform any intellectual task a person can. However, some experts predict that these initiatives will not produce true AGI until 2035 at the earliest.

On the other hand, hundreds of companies now offer thousands of ANI cognitive engines, each of which performs a single AI task such as image or speech recognition. When used in combination, these engines can meet or exceed human accuracy in specific tasks. More significantly, when combinations of engines are orchestrated intelligently, they can approximate the capabilities of AGI.

For example, a robot could navigate the world, interact with people and perform complex tasks by employing separate cognitive engines including machine vision, object recognition, natural language processing, sentiment analysis, face detection and motion tracking.
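To make the idea of orchestrating separate engines more concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the engine functions (`speech_to_text`, `sentiment`, `object_recognition`), the `orchestrate` helper and the dictionary fields are hypothetical stand-ins, not the Veritone aiWARE API or any real vendor's engine. Each "engine" handles one narrow task, and the orchestrator simply passes the accumulated results from one engine to the next.

```python
from typing import Callable, Dict, List

# Hypothetical type: an engine takes the results so far and returns new fields.
Engine = Callable[[Dict], Dict]


def speech_to_text(media: Dict) -> Dict:
    # Stand-in for a speech recognition engine; here the "audio" is already text.
    return {"transcript": media.get("audio_text", "")}


def sentiment(media: Dict) -> Dict:
    # Toy keyword-based sentiment engine that reads the transcript field.
    words = set(media.get("transcript", "").lower().split())
    positive = {"great", "good", "thanks"}
    negative = {"bad", "broken", "angry"}
    score = len(words & positive) - len(words & negative)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"sentiment": label}


def object_recognition(media: Dict) -> Dict:
    # Stand-in for an image engine; labels are assumed to be supplied up front.
    return {"objects": sorted(media.get("image_labels", []))}


def orchestrate(media: Dict, pipeline: List[Engine]) -> Dict:
    # Run each single-task engine in order, merging its output into the result
    # so that later engines can build on what earlier engines produced.
    result = dict(media)
    for engine in pipeline:
        result.update(engine(result))
    return result


def demo() -> Dict:
    media = {
        "audio_text": "Thanks, the robot works great",
        "image_labels": ["robot", "person"],
    }
    return orchestrate(media, [speech_to_text, sentiment, object_recognition])
```

In this sketch the sentiment engine depends on the transcript produced by the speech engine, which is the simplest form of orchestration: no single engine understands the whole scene, but the combined output covers speech, tone and objects.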

With individual engines reaching or exceeding human levels, the ANI approach could soon yield products that approximate AGI-level capabilities.

Tyler Schulze is vice president, strategy & development at Veritone. He serves as general manager for developer partnerships, cognitive engine ecosystem, and media ingestion for the Veritone aiWARE platform. Learn more about our platform and join the Veritone developer ecosystem today.
