Have you ever wondered how Amazon Echo understands what you’re saying, even when you phrase a question differently than your spouse does? How does it know what you mean when you say “What’s the weather going to be like tomorrow?” versus “Tell me if it’s going to rain later”?
Perhaps you’ve wished for this same search capability in your own app or software solution, and came across this post looking for a way to build it without hiring a machine learning engineer. Or perhaps you’re just curious about how this type of search is solved. The technology behind these capabilities is called Natural Language Processing (NLP).
You might think that existing solutions simply listen for key words and then search on those words. In a sense, you would be right. But traditional regular-expression search breaks down once you need relevance and context, because it only matches the exact strings it is given. For example, in a question about the weather, ‘later’ and ‘tomorrow’ might mean the same thing, yet a pattern that looks for one will never match the other. You need Machine Learning and Natural Language Processing software to understand the context of those words, how they relate to each other, and how they can be used to deliver an answer to the user.
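To make the ‘later’ versus ‘tomorrow’ problem concrete, here is a minimal Python sketch. The SYNONYMS table, the sample forecast text, and the helper function names are hypothetical, made up for illustration only; a hand-written synonym list is just a crude stand-in for the word relationships an NLP model learns from data.

```python
import re

# Hypothetical synonym table, hand-written for illustration only. An NLP model
# would learn relationships like these from data instead of a fixed list.
SYNONYMS = {
    "later": ["tomorrow", "tonight", "soon"],
    "rain": ["showers", "drizzle"],
}


def contains_word(document: str, word: str) -> bool:
    """True if `word` appears in `document` as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, document, re.IGNORECASE) is not None


def keyword_score(query_terms: list[str], document: str) -> int:
    """Plain regex search: count the query terms that appear verbatim."""
    return sum(contains_word(document, term) for term in query_terms)


def expanded_score(query_terms: list[str], document: str) -> int:
    """Count query terms that appear verbatim or via one of their listed synonyms."""
    score = 0
    for term in query_terms:
        candidates = [term] + SYNONYMS.get(term, [])
        if any(contains_word(document, c) for c in candidates):
            score += 1
    return score


document = "Forecast for tomorrow: heavy rain expected in the afternoon."
query = ["rain", "later"]  # key words from "Tell me if it's going to rain later"

print(keyword_score(query, document))   # 1: "later" is never found
print(expanded_score(query, document))  # 2: "later" matches via "tomorrow"
```

Plain keyword matching finds only ‘rain’, while the synonym-aware version also connects ‘later’ to ‘tomorrow’. Maintaining lists like this by hand does not scale to a whole language, which is why learned models are used instead.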
To produce an NLP model, you need to think about things like morphological segmentation, part-of-speech tagging, parsing, sentence breaking, terminology extraction, and so on. You also need to decide on things like word similarity. For example, searching for the word ‘frog’ should return results containing the word ‘toad’ before it returns results containing the word ‘chair’, and you essentially have to do this for the entire English language.

You then need to understand the differences between people, times, ordinals, places, and so on. A search for ‘pictures of Paris’ carries a lot of context that must be understood to return good results. First, the user wants pictures, so the search system should show only images. Second, images of a particular place can be matched on metadata such as “location=Paris” rather than “name=Paris”. Your users will see much more relevant results if Natural Language Processing is used.
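One common way to implement the word-similarity ranking described above is to represent words as vectors and compare them with cosine similarity. The following Python sketch assumes that approach; it is not how any particular product works. The three-dimensional WORD_VECTORS values and the sample documents are made up by hand for this example, whereas real word embeddings are learned from large text corpora and have hundreds of dimensions.

```python
import math

# Toy word vectors with made-up values, chosen so that "frog" and "toad" point
# in a similar direction and "chair" does not. Illustration only.
WORD_VECTORS = {
    "frog": [0.9, 0.8, 0.1],
    "toad": [0.85, 0.75, 0.2],
    "chair": [0.1, 0.05, 0.95],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def document_score(query_word: str, document: str) -> float:
    """Best similarity between the query word and any known word in the document."""
    q = WORD_VECTORS[query_word]
    sims = [
        cosine_similarity(q, WORD_VECTORS[w])
        for w in document.lower().split()
        if w in WORD_VECTORS  # words without a vector are ignored
    ]
    return max(sims, default=0.0)


documents = ["A chair made of oak", "The toad sat by the pond"]
ranked = sorted(documents, key=lambda d: document_score("frog", d), reverse=True)
print(ranked)  # ['The toad sat by the pond', 'A chair made of oak']
```

Even though neither document contains the word ‘frog’, the one mentioning ‘toad’ ranks first because its vector lies closer to the query’s vector.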
As you can imagine, there are many use cases for understanding text as it relates to search. Very few software solutions escape the need to index and search large amounts of unstructured text, and Natural Language Processing is critical for returning relevant results in a timely fashion.
There are a lot of Natural Language Processing APIs out there to try, with varying degrees of accuracy and cost. Keep in mind that you’ll be analyzing a large amount of data and making a great many API calls, so pick a service that can handle the load without breaking the bank. You might want to consider Textbox by Machine Box. It’s easy to use and requires no machine learning knowledge. It’s also built to scale to tremendous demand for processing and analyzing large volumes of text. It runs anywhere, integrates into any environment, and costs the same no matter how much you use it.