To win at AI you have to cheat

Having the most accurate and useful machine learning is not about having the best algorithms or the biggest data set. That’s so 2017!

Artificial intelligence and machine learning are fantastic new tools, and I’ve personally seen them used to solve a myriad of problems and generate a lot of productivity for companies big and small. They enable things that weren’t previously possible, like tagging faces in millions of photos, filtering spam out of your inbox, and making vast swaths of media content more searchable.

But if you ask me to give you the most accurate machine learning model for your particular use case, I’m going to cheat.

I have aces up my machine learning sleeve

Allow me to explain.

The explosion of interest in machine learning has been partly due to the prevalence, availability, and ease of use of API services that connect users’ data with pre-trained models. Upload a photo of a celebrity, get back a tag with that celebrity’s name. Simple!

These tools worked for some generic use cases, and also made for a great demonstration of the capabilities of machine learning, but this is only the very beginning of what is possible.

As some of you are finding out the hard way, giant pre-trained machine learning models underperform on some use cases. The reason is that these big models weren’t trained on your use case; they were trained on generic datasets in an attempt to serve as many different kinds of use cases as possible. You can throw a lot of different problems at these models, but you pay for that breadth with lower accuracy on any one of them.

So what I do to improve the accuracy is cheat. I train a model on YOUR dataset, not some generic dataset I’ve gathered independent of your use case.

How can I do that too?

Machine learning is becoming ever more portable. With tools like Machine Box, one can start orchestrating thousands of machine learning models in their stack pretty easily. When you get to this kind of scale, you can start to spin up models for specific use cases on the fly, train them live, and then spin them down when they’re no longer needed or relevant.
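To make the spin up, train live, spin down idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `ModelPool` and its method names are hypothetical, and each "model" is just an in-memory lookup table standing in for a real containerized model. It is not the Machine Box API.

```python
class ModelPool:
    """Hypothetical registry that keeps one small model per use case."""

    def __init__(self):
        self._models = {}  # use case name -> {example: label}

    def spin_up(self, use_case):
        # Create the model on first use; return the existing one otherwise.
        return self._models.setdefault(use_case, {})

    def teach(self, use_case, example, label):
        # "Training live" here just means adding a labelled example.
        self.spin_up(use_case)[example] = label

    def predict(self, use_case, example):
        if use_case not in self._models:
            raise KeyError(f"no model running for {use_case!r}")
        return self._models[use_case].get(example, "unknown")

    def spin_down(self, use_case):
        # Drop a model that is no longer needed or relevant.
        self._models.pop(use_case, None)

    def active(self):
        return sorted(self._models)


pool = ModelPool()
pool.teach("ghostbusters-cast", "frame_0142", "Bill Murray")
pool.teach("spam-filter", "WIN A FREE CRUISE", "spam")

cast_label = pool.predict("ghostbusters-cast", "frame_0142")
unseen_label = pool.predict("spam-filter", "lunch on friday?")

pool.spin_down("ghostbusters-cast")
still_running = pool.active()
print(cast_label, unseen_label, still_running)
```

The point of the sketch is the lifecycle, not the model: each use case gets its own small model that only ever sees its own data.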

This puts the spotlight on the dataset and the training. My goal is to give you as many tools as possible to turn your existing data into a training dataset, use it to train a model, deploy the model, validate it, and then let you improve it live (without having to redeploy).

This is where the best accuracy comes from. Let me give you an example:

Here, I’ve used the Veritone platform to upload a photo of Bill Murray and Dan Aykroyd to use for face recognition training.

Using Library within the aiWARE platform

When I then run the trained recognition against a video clip from Ghostbusters, I end up with some of the faces not being recognized.

Instead of trying to find a better algorithm, I just take a screenshot FROM that clip, add it to the Veritone library for that person, and re-run the engine.

Taking a screenshot with COMMAND-SHIFT-4 on a Mac

Notice that there are fewer unknown faces now as they’ve been correctly associated with the people they represent.
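Why does one screenshot help so much? Here is a toy sketch of the idea in Python. It assumes a face library works roughly like a nearest-neighbour lookup with a distance cutoff, which is a simplification I'm using for illustration and not a description of how the Veritone engine works internally. `ToyFaceLibrary` is hypothetical, and the two-number "face vectors" are made up.

```python
import math


class ToyFaceLibrary:
    """In-memory stand-in for a face library: nearest neighbour with a cutoff."""

    def __init__(self, max_distance):
        self.max_distance = max_distance
        self.examples = []  # list of (name, vector) pairs

    def teach(self, name, vector):
        self.examples.append((name, vector))

    def check(self, vector):
        """Return the closest taught name, or 'unknown' if nothing is close enough."""
        best_name, best_dist = "unknown", self.max_distance
        for name, taught in self.examples:
            dist = math.dist(vector, taught)
            if dist <= best_dist:
                best_name, best_dist = name, dist
        return best_name


def count_unknown(library, frames):
    return sum(1 for _, vector in frames if library.check(vector) == "unknown")


library = ToyFaceLibrary(max_distance=1.0)

# One publicity-style photo per person (made-up vectors).
library.teach("Bill Murray", (0.0, 0.0))
library.teach("Dan Aykroyd", (4.0, 0.0))

# Frames from the clip: different lighting, angle, and age shift the vectors.
clip = [
    ("Bill Murray", (0.2, 0.3)),
    ("Bill Murray", (0.5, 3.0)),
    ("Bill Murray", (0.6, 3.2)),
    ("Dan Aykroyd", (4.2, 3.1)),
    ("Dan Aykroyd", (4.1, 2.9)),
]

before = count_unknown(library, clip)

# "Take a screenshot FROM that clip": teach one clip frame per person.
library.teach(*clip[1])
library.teach(*clip[3])

after = count_unknown(library, clip)
print(f"unknown faces before: {before}, after: {after}")
```

In this toy setup, the frames that looked nothing like the original photo sit right next to the newly taught screenshots, so they stop coming back as unknown.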

You are not going to find better training data for an algorithm than your own data. Before Machine Box and Veritone, unlocking the training potential of one’s own data was extremely difficult. But these tools have special technology to minimize the amount of training data you need, and they also solve deployment, scale, portability, and security so you can focus on making the best model for your use case.

What about errors?

Your first model won’t be perfect. In fact, no machine learning model will be correct 100% of the time. What you do with the examples it gets wrong is where you can differentiate your implementation of machine learning. At Machine Box, we encourage developers to build a feedback workflow back into the engine (the API has wonderful endpoints for this) to correct mistakes as they come up. Does it say Dan Aykroyd when it should be Bill Murray? Snap the offending frame, label it, and teach it to Facebox. You’ll be astonished at just how much this will improve the model.
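A feedback workflow like that can be sketched in a few lines of Python. This is a toy under stated assumptions: the model is a plain 1-nearest-neighbour lookup over made-up two-number vectors, and `predict` and `apply_feedback` are hypothetical helpers, not Facebox endpoints. In a real integration the "teach" step would be a call to whatever training endpoint your engine exposes.

```python
import math


def predict(examples, vector):
    """Label of the closest taught example (plain 1-nearest-neighbour)."""
    name, _ = min(examples, key=lambda ex: math.dist(ex[1], vector))
    return name


def apply_feedback(examples, reviewed):
    """Teach every human-reviewed frame the model got wrong; return how many."""
    corrections = [
        (truth, vector)
        for truth, vector in reviewed
        if predict(examples, vector) != truth
    ]
    examples.extend(corrections)
    return len(corrections)


# Initial training: one example per person (made-up vectors).
examples = [("Bill Murray", (0.0, 0.0)), ("Dan Aykroyd", (4.0, 0.0))]

# Frames a person has looked at and labelled correctly.
reviewed = [
    ("Bill Murray", (2.6, 1.0)),  # model currently says Dan Aykroyd
    ("Bill Murray", (0.4, 0.2)),
    ("Dan Aykroyd", (3.8, 0.5)),
    ("Bill Murray", (2.8, 1.2)),  # model currently says Dan Aykroyd
]

# A later frame nobody has reviewed, which really shows Bill Murray.
new_frame = (2.7, 1.1)

label_before = predict(examples, new_frame)
added = apply_feedback(examples, reviewed)
label_after = predict(examples, new_frame)
print(label_before, added, label_after)
```

Only the mistakes get taught back, so the training set grows exactly where the model was weakest, and frames nobody reviewed benefit too.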

Try it

Don’t just take my word for it. Download the Machine Box tools now, or sign up for Veritone and try it on the platform to see for yourself. It’s really easy to do; all you need is a problem to solve.

To win at AI you have to cheat was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.