World’s largest grammar database reveals accelerating loss of language diversity

There’s a crisis unfolding in the field of linguistics: Global language experts estimate that, without intervention, about one language will be lost every month for the next 40 years.

A study published today in Science Advances debuts a grammatical database that documents the enormous diversity of current languages on the planet, highlighting just how much humanity stands to lose and why it's worth saving.

Known as Grambank, it is now the world’s largest publicly available comparative grammatical database. The project was initiated by scholars in the Department of Linguistic and Cultural Evolution at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, and more than 100 authors from 68 institutions (including CU Boulder) contributed to the years-long, global data effort.

The analysis of more than 400,000 data points and 2,400 separate languages and dialects reveals that language loss is occurring unevenly across major linguistic regions of the world, with Indigenous languages in northeast South America, from Alaska to Oregon, and in northern Australia at highest risk.

“Grambank is showing us the importance of working on language documentation and revitalization in order to preserve this legacy of human communication, culture and cognition,” said Hannah Haynie, co-first author of the study and assistant professor in the Department of Linguistics at CU Boulder.

Hannah Haynie, assistant professor in the Department of Linguistics at CU Boulder (Credit: Hannah Haynie)

Grammar 101

Grammar is simply the rules of a language: the words and sounds used and how they are combined and interpreted. Grammatical elements of a language include word order (whether the subject goes before or after the verb), tense (present, past or future), comparatives (words that express “bigger” or “smaller”) and whether a language has gendered pronouns.

Over the past century, many researchers have studied languages, worked with their speakers and published books or other types of grammatical descriptions of languages. Grambank is built on both these research analyses and prior language databases, but it is larger in scale and more thorough than previous databases. It encodes 195 possible grammatical features for about 215 language families.

“Our understanding of grammar and what that tells us about humans is limited by what we can observe,” said Haynie. “We're putting those observations into this data set, and that allows for comparison.”

As there are currently about 4,300 languages with published grammatical descriptions—out of about 7,000 known languages in the modern world—Grambank is over halfway to encoding all possible grammar information that can be extracted from existing data sources, said Haynie.

‘Unusual’ languages

Using Grambank, the team found they could identify “unusual” languages: those that stray further from the average patterns of variation typically found across languages, and which often have no known sister languages. But they also found there’s nothing particularly unusual about endangered languages compared with those that are not endangered.

“A lot of fairly ordinary languages, in terms of their basic grammar, happen to be endangered for a variety of reasons,” said Haynie.

English, spoken around the world by 1.5 billion people, is actually “a pretty weird language” by Grambank’s standards.

“Some of the places with more ‘unusual’ languages are places like Europe and Northern Africa—languages that we, as English speakers, tend to be more familiar with,” said Haynie.

The bigger takeaway for Haynie is that almost no two languages in the data set are identical. Of all 2,400 languages and dialects in the data set, only five match exactly under the grammatical coding used to document and analyze them within Grambank. Though vocabulary may play a big role in the mutual unintelligibility that linguists rely on to determine what counts as separate languages, Grambank shows that the grammatical “fingerprints” of languages are also typically unique, she said.

“It means that every language is pretty darn special,” said Haynie.
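For readers curious how a grammatical “fingerprint” comparison might work in practice, here is a minimal Python sketch. It assumes each language is coded as a fixed-length tuple of feature values and groups languages whose codings match exactly. The language names, the four-feature codings and the function names are all invented for illustration; they are not drawn from Grambank or from the study's own methods.

```python
from collections import defaultdict

# Hypothetical toy codings: language name -> tuple of feature values.
# Names and values are invented for illustration, not taken from Grambank.
codings = {
    "lang_a": (1, 0, 1, 1),
    "lang_b": (1, 0, 1, 1),
    "lang_c": (0, 0, 1, 1),
    "lang_d": (1, 1, 0, 0),
}


def group_by_fingerprint(codings):
    """Map each distinct feature tuple to the languages that share it."""
    groups = defaultdict(list)
    for name, features in codings.items():
        groups[features].append(name)
    return dict(groups)


def shared_fingerprints(codings):
    """Return sorted groups of languages whose codings match exactly."""
    groups = group_by_fingerprint(codings)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


print(shared_fingerprints(codings))  # [['lang_a', 'lang_b']]
```

In this toy data only two of the four languages share a fingerprint. A real comparison would also have to decide how to treat features that are unknown for a given language, which this sketch ignores.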

Language loss

Language extinction has occurred throughout human history, but its speed has been accelerating due to social, political and economic pressures, said Haynie.

It’s as if, while mapping the human genome, scientists saw the genes themselves rapidly disappearing before their eyes.

“Right now we're at a critical state in terms of language endangerment,” said Haynie, noting the United Nations has declared this the International Decade of Indigenous Languages to try to promote language preservation, documentation and revitalization.

This global language loss is also not evenly distributed. Several regions are at higher risk of losing Indigenous languages, such as Aleut in Alaska and the Salish languages of the Pacific Northwest, Yagua and Tariana in South America, and Kuuk-Thayorre and Wardaman in Northern Australian communities.

“Indigenous languages here in North America, languages around us and on our continent, are some of the most endangered languages in the world,” she said.

Genealogy versus geography

One element that has been “hotly debated” within linguistics for years is the relationship between genealogy and geography in the development of language. That is: Which features in language are inherited from family and culture (genealogy) and which are more likely to be shared through contact among neighbors (geography)?

The Grambank analysis found genealogy seems to be consistently more important than geography—meaning the faithful inheritance of ancestral language plays a stronger role in shaping grammar in languages still spoken today than who someone’s geographical neighbors were and how they talked, said Haynie.

While language crossover and bilingualism are well documented throughout history, this finding shows how much we can still learn about human history, and about the ways we communicate in the present day, from the words of our ancestors.

“Language always finds a way,” said Haynie.

The Grambank database is an open-access comprehensive resource maintained by the Max Planck Society.
