Text Analytics

Into the Fire - A somewhat less nonsense introduction to NLP

Natural Language Processing? - What is NLP?

Language is messy. In our attempts to convey meaning and emotions to each other, we have come up with some extraordinarily complex structures that take years of learning to grasp. There are countless rules and even more exceptions to those rules, but somehow we manage to communicate with each other. The name scientists have come up with for this mess is natural language.

And then there are computers, machines that require a lot of structure to work. NLP is the attempt to make those two worlds meet, to have computers parse, process, and understand the language we use in our daily (natural) lives. In the coming articles we will have a look at tools, techniques, and methods that help us deal with the chaotic complexity of natural language. We will see the many ways in which NLP makes dealing with language easier, one method at a time. Today we will start with the first:

Regular Expressions

Searching through a piece of text doesn’t sound like a task one would need a lot of fancy NLP technology for, but we recently had a case where this was actually necessary:

One of our customers asked us if we could help them search for specific terms in their documents. Their task was to deal with contracts and requirements, where missing even a small detail can potentially cost millions down the line. Additionally, these documents are usually very long, sometimes several thousand pages. So if one were to simply Ctrl + F for a specific term, one might get hundreds and hundreds of results, most of which are irrelevant.
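
To make the difference concrete, here is a minimal sketch in Python comparing a plain substring search (roughly what Ctrl + F does) with a regular expression. The contract excerpt, the search term "shall", and the rule "ignore 'shall not'" are all made up for illustration. They are assumptions, not the customer's actual documents or requirements.

```python
import re

# Hypothetical contract excerpt, invented for this example.
text = (
    "The supplier shall deliver the units within 30 days. "
    "Mr. Marshall signs on behalf of the buyer. "
    "The buyer shall not withhold payment."
)

# Plain substring search: every place where the letters "shall" occur,
# including the one hidden inside the name "Marshall".
substring_hits = [m.start() for m in re.finditer(re.escape("shall"), text)]

# Regular expression: "shall" as a whole word (\b marks a word boundary),
# and only when it is NOT followed by "not" (negative lookahead).
pattern = re.compile(r"\bshall\b(?!\s+not\b)")
regex_hits = [m.start() for m in pattern.finditer(text)]

print(len(substring_hits), "substring hits")
print(len(regex_hits), "regex hits")

# Show each regex hit with a little surrounding context.
for pos in regex_hits:
    print("...", text[max(0, pos - 15):pos + 25], "...")
```

On this toy text the substring search reports three hits, while the regular expression keeps only the one obligation we were looking for. Real documents would likely need a more carefully tuned pattern.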

New advances in text analytics make the tech news nearly every week, most prominently IBM Watson, but more recently also AI approaches such as ELMo or BERT. With the Covid-19 pandemic, text analytics has now made world news as well, with the White House requesting help via NLP.

Text Analytics and Natural Language Processing (NLP) deal with all types of automatic processing of text and are often built on top of machine learning or artificial intelligence approaches. The idea of this article is not to explain how text analytics works, but to explain what is possible.