Anyone who has ever tried to learn a foreign language can tell you that learning a new language is hard. Languages are layers of complex rules that often break depending on context. Even if all of the rules held all of the time, a great deal of brain power would be required to comprehend ambiguous pronouns, illogical clauses, irony, and figures of speech. Humans are remarkably good at interpreting pieces of language that are illogical or don’t follow the rules. People are able to use past experience, context, and facial expressions to fill in the holes and interpret the speaker’s intent. A classic example of the complexity of language is “When the hammer fell on the glass table, it shattered.” As a human, it is pretty obvious that it was the glass table that shattered, not the hammer. However, if I didn’t know that a hammer is made of metal and that glass shatters when sharply struck by a solid object, I might conclude that the hammer shattered.
How are we supposed to communicate this nest of tangled rules, structures, contexts, and logic to a computer? The study of teaching a computer how to read and speak “like a human” is called natural language processing (NLP). This is how tools like Siri, Alexa, and other voice-activated software work (or don’t, depending on the question you ask). Natural language processing is incredibly important in shaping how humans understand and interact with technology. In this post, I’m going to talk a little bit about what makes language difficult in general and identify some of the current challenges and strengths of natural language processing.
What is natural language processing good for?
An image of an Amazon Alexa on a table.
Why would you want to encode human language for a computer to begin with? What is the point of spending all this time teaching computers to understand human language?
Well, you’ve probably got one answer to that question sitting in your pocket, and quite possibly another on a shelf at home (39 million Americans do). Natural language processing is the key to tools like Siri and Alexa (or Google Home or Amazon Echo—everything in that camp).
But being able to talk to your phone isn’t the only, or even the most commonly experienced, application for NLP. It is also the secret to how spam filters, screen readers, and automatic language translation work. In all of these cases, a set of algorithms is used to translate spoken or written words into something a machine can understand and react to. These algorithms are the core of NLP. While some of the algorithms are very complicated, most NLP begins with a few simple functions.
Think back to elementary school. Did you ever diagram a sentence? A similar set of tasks is usually the beginning of NLP. If you need a refresher, it entails identifying all the words (parsing or tokenization), determining linguistic root terms (lemmatization and stemming), labeling parts of speech (part-of-speech tagging), and identifying the parts of the sentence, like subject and predicate (semantic relationship detection). There are many sets of rules and algorithms beyond this, but almost all NLP starts with this core.
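As a sketch of those first steps, here is a toy Python pipeline that tokenizes the hammer sentence, applies naive suffix stripping as a crude stand-in for stemming, and tags parts of speech with a small hand-made lexicon. The affix list and the `POS` lexicon are illustrative assumptions, not real resources; production systems use libraries such as NLTK or spaCy instead.

```python
import re

def tokenize(sentence):
    # Identify all the words (parsing/tokenization), dropping punctuation
    return re.findall(r"[a-z']+", sentence.lower())

def stem(word):
    # Naive suffix stripping: a crude stand-in for real stemming/lemmatization
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-lexicon standing in for a real part-of-speech tagger
POS = {"when": "ADV", "the": "DET", "hammer": "NOUN", "fell": "VERB",
       "on": "PREP", "glass": "ADJ", "table": "NOUN",
       "it": "PRON", "shatter": "VERB"}

sentence = "When the hammer fell on the glass table, it shattered."
tagged = [(tok, POS.get(stem(tok), "UNK")) for tok in tokenize(sentence)]
print(tagged)  # e.g. ('hammer', 'NOUN') ... ('shattered', 'VERB')
```

Even this toy version shows the shape of the core: break the sentence into units, normalize each unit to a root, and label the roots.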
Now that we have identified why natural language processing is valuable, we can look deeper into what makes NLP (and language itself) so challenging.
Why is learning a language hard?
Of course, some people don’t find learning a language difficult at all (usually people who have already learned a second language), and most of us don’t remember how challenging we found learning our first languages. That being said, learning a new language isn’t usually a walk in the park. There will always be challenges because languages are complicated.
When thinking about the complexity of a language, there are two major components: lexicon and grammar. A lexicon is the collection of words within a language. Words represent a unit of information, a sound, an idea, or an analogous concept within a language, though they are not the smallest “category” in linguistics. A key part of the definition of a word is that it usually cannot be divided into a smaller unit that still produces meaning. The smallest meaning-producing units are called morphemes, which include some standalone words as well as suffixes and prefixes. A single word can be a single morpheme, some words are made up of several morphemes, and some morphemes are not words on their own. The study of how morphemes carry meaning is called morphology.
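To make the idea of morphemes concrete, here is a toy Python splitter that peels one prefix and one suffix off a word. The affix lists are a hypothetical, tiny sample, nowhere near a complete inventory of English morphology.

```python
# Toy morpheme splitter; the affix lists are illustrative only
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")

def morphemes(word):
    parts = []
    for prefix in PREFIXES:
        if word.startswith(prefix):
            parts.append(prefix + "-")
            word = word[len(prefix):]
            break
    suffix = None
    for s in SUFFIXES:
        # Only strip a suffix if a plausible stem remains
        if word.endswith(s) and len(word) > len(s) + 1:
            suffix = "-" + s
            word = word[: -len(s)]
            break
    parts.append(word)
    if suffix:
        parts.append(suffix)
    return parts

print(morphemes("undoing"))  # ['un-', 'do', '-ing']: one word, three morphemes
print(morphemes("cat"))      # ['cat']: a single-morpheme word
```

The point of the sketch is that a word like “undoing” carries three units of meaning, while “cat” carries only one.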
The lexicon of a language is usually enormous. Let’s take English as an example. English is an old language that is a hybrid of several different root tongues. This means that English has collected many words over the years: the Oxford English Dictionary has more than 200,000 entries (though around 47,000 are obsolete). An average English speaker’s vocabulary typically contains around 20,000 words, which means that one person usually knows about a tenth of their own language. So, language is hard because there are a lot of words that can be used in many different ways (e.g., “run” is both a noun and a verb), and remembering that sheer volume of terms is a monumental task for any person.
Words are one of the smallest and simplest parts of the construction of language. The next step is to use rules to order words into a pattern that makes sense to other listeners. This is the role of grammar. Like the lexicon, grammar can be broken down into a few more technical concepts. These are worth briefly explaining because they’ll come in handy when we are exploring why natural language processing is challenging.
This diagram shows the major levels of a language.
First, there is syntax. Syntax is the study of how words and phrases are put in an order that communicates a concept to the listener. Together, syntax and morphological rules make up grammar. This is a fancy way of saying that the way words are constructed with prefixes and suffixes combines with the order in which they are said or written to make meaning.
An example of a morphological difference is that redo and undo have different definitions due to their prefixes (re- and un-). A syntactic difference is the difference between “The girl bit the dog” and “The dog bit the girl.” Even though the words are the same in the two sentences, they describe very different scenes. Syntax also governs the formation of phrases: “She walks.” is correct, while “She buys.” is not, because “buy” requires an object. Syntax and grammar rules can be especially difficult for learners to pick up.
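The word-order point can be sketched in a few lines of Python. This toy reader assumes the fixed pattern “The SUBJECT VERB the OBJECT.”, which is nothing like a real parser, but it shows how the same words in a different order yield a different scene:

```python
def svo(sentence):
    # Toy reader that assumes the fixed pattern "The SUBJECT VERB the OBJECT."
    words = sentence.lower().rstrip(".").split()
    return (words[1], words[2], words[4])

# Same words, different order, very different scenes:
print(svo("The girl bit the dog."))  # ('girl', 'bit', 'dog')
print(svo("The dog bit the girl."))  # ('dog', 'bit', 'girl')
```

A real syntactic parser has to recover that subject–verb–object structure from far less rigid patterns, which is exactly where the difficulty lies.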
There are also layers of meaning beyond syntax: two, actually. Semantics and pragmatics are the studies of the meaning of sentences and of the contextual information around sentences, respectively. An example of a semantic task is deciphering the meaning of “it” in the sentence “When the hammer fell on the glass table, it shattered.” Pragmatics is more focused on context: for example, distinguishing between someone saying “what a flood!” about a gallon of spilled water and someone referring to the hundred-year flood that washed away their town.
For a native language speaker, semantics and pragmatics are usually some of the least challenging parts of understanding language. Once a speaker has a grasp on the lexicon and has mastered the rules of grammar, they can usually understand what another speaker means. Computers, on the other hand, do not necessarily share human learning patterns.
Why is natural language processing hard?
Let’s go back to the example I used in the beginning: “When the hammer fell on the glass table, it shattered.” You know that the glass table shattered, not the hammer. Why? Because you understand a vast number of fairly complicated things, and in combination, they provide the context you need to decode the ambiguous “it.” For example, you know that hammers are made of metal, and metal is hard. You know what glass is, and you know it can shatter. On a more complicated level, you can imagine how thin a glass tabletop is, so you understand that it is delicate. All of this is easy to explain to a human but very challenging to teach a computer. The computer lacks the context and background information needed to understand that ambiguous “it.” Essentially, computers are not good at understanding the more complex aspects of semantics and pragmatics.
Now, is all semantics out of reach? Certainly not. Some grammar rules are easily expressed. For example, a language program could easily be coded to know that verbs in the same class as “buy” need a subject and an object, while verbs in the same class as “walk” only need a subject. However, a computer would have a much more difficult time determining the difference between the exclamation “Fire!” in a crowded theater and “Fire!” at sharpshooting practice.
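A minimal sketch of such a rule in Python, with hypothetical verb classes, might look like this:

```python
# Hypothetical verb classes: "buy"-class verbs are transitive (need a subject
# and an object), "walk"-class verbs are intransitive (subject only).
TRANSITIVE = {"buy", "bite", "hit"}
INTRANSITIVE = {"walk", "sleep", "fall"}

def is_grammatical(subject, verb, obj=None):
    if verb in TRANSITIVE:
        return subject is not None and obj is not None
    if verb in INTRANSITIVE:
        return subject is not None and obj is None
    return False  # the verb is outside this tiny grammar

print(is_grammatical("she", "walk"))          # True: "She walks."
print(is_grammatical("she", "buy"))           # False: "She buys." lacks an object
print(is_grammatical("she", "buy", "bread"))  # True: "She buys bread."
```

Rules like this are easy to encode precisely because they depend only on the sentence itself; the two “Fire!” exclamations look identical to such a rule, which is why pragmatics is so much harder.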
Left: An image of a house on fire. Right: An image of a sport shooter during a competition.
While it certainly isn’t perfect, natural language processing has some significant strengths. A lexicon of several hundred thousand words is no problem for a computer. It can easily form proper sentences if words and definitions are tagged with parts of speech and it is given a set of complete grammar rules. Using machine learning algorithms and a corpus (a large, structured collection of text) of common English, a computer can learn which terms are common and easily understood and which are too complicated for everyday usage.
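As a toy illustration of the corpus idea, here is a word-frequency count in Python over a tiny, made-up corpus (real systems use corpora with millions of words drawn from books, articles, and the web):

```python
from collections import Counter

# A tiny made-up corpus; real corpora contain millions of words
corpus = ("the hammer fell on the table and the table broke "
          "the glass broke when the hammer fell").split()

freq = Counter(corpus)
# Treat any word seen at least twice as "common" in this toy corpus
common = {word for word, count in freq.items() if count >= 2}
print(freq["the"], sorted(common))
```

Frequency counts like this are one of the simplest signals a machine learning model can use to judge which terms are everyday vocabulary and which are rare.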
Though it is a generalization, we could say that computers are good at building lexicons and at handling most syntax, morphology, and, typically, semantics. They are not so good at semantics when there is an ambiguous pronoun or when background knowledge is needed to understand the sentence, and pragmatics is very difficult to encode.
While natural language processing has grown immensely and made massive strides, a new field is forming. It has been dubbed natural language understanding (NLU), and it is even more dependent on machine learning than NLP. The goal of NLU is to advance natural language processing to a point at which it can decode the subtleties of human language: the semantics and pragmatics. To expand NLP into NLU, deep learning algorithms have to be optimized and more complete corpora of language must be used to refine the machine learning models.
For the time being, NLP is the core of many everyday tools, but more development will continue to revolutionize the way humans and technology interact.
Other useful sources: