I wrote this for the online language learning company Lingoda. You can read the full version on their blog.
Natural Language Processing is how machines understand human language. As a branch of Artificial Intelligence, the field of Natural Language Processing (NLP) plays an important part in making interactions between humans and computers easier. We’ll give you an NLP overview and explain how machines learn language in much the same way you learn a new one.
Natural Language Processing explained
Natural Language Processing, or NLP for short, is present in everyday interactions you have with all sorts of machines. When you type a question into a search engine, NLP analyses your search intent to deliver relevant results. Virtual assistants such as smart speakers or chatbots rely on Natural Language Processing to interact with you. Other NLP applications include auto-generated translations and captions, sorting of messages, checking of spelling and grammar, recognition of handwritten or printed text, and text-to-speech output.
How does Natural Language Processing relate to Artificial Intelligence?
Artificial Intelligence is a broad term for simulating or mimicking human intelligence. AI systems can have learning capabilities which follow the human process: learning by example, trial-and-error and problem solving. Machine Learning is the subset of AI that deals with applied algorithms teaching computers how to learn, often from large sets of data. Machine Learning is a process: the computer learns how to do a task and improves at it without being explicitly programmed to do the task a certain way.
Natural Language Processing uses Machine Learning to teach computers to understand and translate human language. The more they learn, the better they can make sense of text in spoken or written form, classify or rearrange it, translate it, and interact with it.
How machines learn language the same way you do
So how does Natural Language Processing work? Machine Learning is not much different from the way you learn a language, with one exception: computers can handle and review far more examples (that is, data) in a much shorter time.
Modern Machine Learning uses neural networks, whose artificial neurons pass signals to one another in a way modelled after the human brain. In simplified terms, a neural network learns by training itself to improve the accuracy of results through minimisation of errors. The learning process itself consists of reviewing large sets of examples.
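To make “minimisation of errors” a little more concrete, here is a minimal Python sketch. It is not a real neural network: it learns a single weight from three made-up examples, and the data, learning rate and function names are assumptions chosen purely for illustration. The idea it shows is the same, though: make a prediction, measure the error, and nudge the weight so the error shrinks.

```python
# Toy illustration: learn the weight w in "target = w * x" by reducing the error.
# The examples, learning rate and epoch count are made up for this sketch.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs; the true w is 2


def mean_squared_error(w, examples):
    """Average squared difference between predictions and targets."""
    return sum((w * x - target) ** 2 for x, target in examples) / len(examples)


def train(examples, learning_rate=0.05, epochs=200):
    w = 0.0  # start with an uninformed guess
    for _ in range(epochs):
        for x, target in examples:
            error = w * x - target
            # The gradient of error ** 2 with respect to w is 2 * error * x,
            # so stepping against it makes the squared error smaller.
            w -= learning_rate * 2 * error * x
    return w


w = train(examples)
print(round(w, 3))  # 2.0
```

Real networks repeat this kind of update across millions of weights and examples, which is where the difference in scale comes from.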
The individual tasks Machine Learning neural networks perform to get better and better at Natural Language Processing are very similar to what you do when learning a new language. In other words, a computer follows the same “tricks” as humans to better understand language, although on a different scale.
Syntactical analysis in NLP
Syntax is the linguistic term for the rules and principles regarding sentence structure and word order in a language. Natural Language Processing parses sentences to identify sentence structure and how words relate to each other. The following tasks are part of syntactical analysis:
- Segmentation: The separation of text into individual chunks or tokens, which is why it is also called tokenisation. These chunks can be words or sentences, and splitting text this way makes it easier to handle. In English and other languages that separate words with spaces, this task is straightforward, but written Chinese or Japanese requires additional knowledge for segmentation.
- Lemmatisation and stemming: Both processes reduce words to a base form, a lemma or a stem, through a dictionary or a set of rules. You do the same when you’re trying to recognise words you know without inflectional endings or to identify the stem or infinitive of a verb in a sentence.
- Tagging: Identifying the part of speech (POS) of each word in a sentence is called tagging. When you learn a new language, labelling words as noun, verb, adverb, adjective etc. can be helpful to better understand sentence structure and to break down complex structures.
- Word removal: So-called stop-words occur frequently and add little or no semantic value, such as “like”, “yours” or “I”. Humans also tend to ignore stop-words during learning and focus on the “meat” of difficult sentences instead.
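Three of the steps above (segmentation, word removal and stemming) can be sketched in a few lines of Python. This is a deliberately crude illustration using only the standard library: the stop-word list and the suffix rules are hypothetical and far smaller than anything a real NLP toolkit would use, and the tokeniser only works for languages that separate words with spaces.

```python
import re

# Hypothetical, tiny stop-word list; real lists are much longer.
STOP_WORDS = {"i", "the", "a", "an", "is", "are", "yours", "and", "to", "of"}

# Hypothetical suffix rules for a crude stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenise(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenise("I walked to the shops and the dogs are barking")
content = remove_stop_words(tokens)
stems = [stem(token) for token in content]
print(stems)  # ['walk', 'shop', 'dog', 'bark']
```

Rule-based stemming like this is fast but blunt. Lemmatisation, which looks words up in a dictionary, would also map irregular forms such as “went” to “go”.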
Semantical analysis in NLP
In linguistics, semantic analysis relates syntactic structures to their meaning. It begins with the relationship between individual words, but also includes common word combinations, idiomatic speech, figures of speech and meaning in context. As you might have guessed, semantic analysis is the part of Natural Language Processing which is harder to master for Artificial Intelligence. The main methods to look at meaning are:
- Lexical analysis: This is looking at the meaning of individual words in context.
- Word sense disambiguation: Most words in use in a language have more than one meaning. Through disambiguation, we choose the one which makes the most sense in a given context. The better humans know or understand a language, the more intuitive this process is.
- Relationships: Through extraction, Natural Language Processing attempts to understand the meaning of text by following relationships between entities, places, people etc. Sometimes, this can seem closely related to semantic tagging: the question “who married whom” can be solved by correctly identifying the subject and object in a sentence, but relationships can carry more complicated connotations as well.
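Word sense disambiguation can be illustrated with a very simplified variant of the classic Lesk approach: pick the sense whose dictionary definition shares the most words with the surrounding sentence. The sketch below is an assumption-laden toy, not how a production system works. The two-sense inventory for “bank” and its definitions are invented for the example.

```python
# Hypothetical mini sense inventory: each sense of "bank" has a short definition.
SENSES = {
    "bank": {
        "financial institution": "an institution that holds money deposits and gives loans",
        "river side": "the sloping land beside a river or lake with water",
    }
}


def disambiguate(word, sentence):
    """Return the sense whose definition shares the most words with the sentence."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, definition in SENSES[word].items():
        overlap = len(context & set(definition.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "she deposits money at the bank"))   # financial institution
print(disambiguate("bank", "we sat on the bank of the river"))  # river side
```

Counting shared words is a rough stand-in for context. It gives a feel for why the process becomes more intuitive the more of a language you already know.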
More use cases for Natural Language Processing
Apart from the aforementioned intelligent assistants, translation, speech recognition and grammar tools, NLP has many more use cases such as:
- Sentiment analysis: NLP can classify emotions in text as positive, negative or neutral. Facebook does that with user-generated content, but brands also use it to understand how customers feel about their products.
- Text extraction: NLP can find relevant terms in a body of text of any size and extract or further process them.
- Topic classification: A text can be separated into individual parts according to distinct topics.
- Document handling: This enables users without knowledge of programming or AI training to tell a computer what to do with a stack of digital or virtual documents, for example form processing or calculating costs, returns etc.
- Text generation: Though the art produced by Artificial Intelligence is still of questionable quality, Natural Language Processing can generate legible and meaningful text, for example a summary of sports results. With a sample size large enough, NLP can imitate the style of a specific author and rewrite text accordingly.
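As a rough illustration of the first use case, sentiment analysis, here is a minimal word-counting sketch in Python. Note that this is a simple lexicon approach rather than the Machine Learning described earlier: the positive and negative word lists are hypothetical, and a trained model would learn such signals from labelled examples instead.

```python
import re

# Hypothetical word lists; a real system would learn these signals from data.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow"}


def classify_sentiment(text):
    """Label text as positive, negative or neutral by counting listed words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(word in POSITIVE for word in words) - sum(word in NEGATIVE for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("I love this great product"))           # positive
print(classify_sentiment("Terrible support and slow delivery"))  # negative
print(classify_sentiment("The parcel arrived on Tuesday"))       # neutral
```

A word list like this misses negation (“not good”) and sarcasm, which is one reason sentiment analysis in practice relies on models trained on large sets of examples.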