Adina Williams (Facebook Artificial Intelligence Research / New York University) gave a computational linguistics talk to CU Linguistics on the association of nouns to gender classes.
Wednesday, February 19th
Title: "Is the association of nouns to gender classes truly arbitrary?"
Debate has long raged in linguistics about the nature of the relationship between nouns and grammatical gender. How a language with a robust grammatical gender system chooses to gender its nouns appears at first glance largely unrelated to the meaning of the nouns. For example, one lexeme can take different genders in different languages: e.g., ‘table’ is feminine when translated into Spanish, but masculine when translated into German. In this talk, I present two approaches for measuring the arbitrariness of gender on inanimate nouns: the first approach measures how well gender correlates with word vector representations of meaning, and the second measures how much information about grammatical gender (in bits) is shared between nouns and other words in the noun’s context. This work shows that state-of-the-art NLP systems trained on large-scale corpora from multiple languages can help us uncover new facts relevant to linguistic typology. These multilingual studies can also be viewed as an initial step towards a general methodological program that uses information-theoretic approaches to shed light on traditional cognitive scientific questions about language structure and use.
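As a rough illustration of what "information shared in bits" can mean, the sketch below computes a plug-in estimate of mutual information between a noun's gender and one word from its context. This is not the method from the talk: the function name `mutual_information_bits`, the gender labels, and the context labels (`adj_a`, `adj_b`) are all hypothetical, and the toy observations are invented purely to show the two extreme cases.

```python
import math
from collections import Counter


def mutual_information_bits(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: (noun gender, a word observed in the noun's context).
# Case 1: the context word fully determines the gender.
dependent = [("fem", "adj_a"), ("masc", "adj_b")] * 10
# Case 2: the context word carries no information about the gender.
independent = [("fem", "adj_a"), ("fem", "adj_b"),
               ("masc", "adj_a"), ("masc", "adj_b")] * 5

mi_dependent = mutual_information_bits(dependent)
mi_independent = mutual_information_bits(independent)
print(mi_dependent, mi_independent)
```

With two equally frequent genders, the estimate ranges from 0 bits (context tells us nothing about gender) to 1 bit (context fully determines it). Real corpus estimates fall in between and would need corrections for sampling bias that this sketch omits.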