Email to connect: samo9533@colorado.edu

Title: Computational Methods for Describing Endangered Morphology

Abstract:

  • As Natural Language Processing (NLP) expands to a broader range of languages, it is encountering a dearth of the annotated resources that are necessary to train state-of-the-art supervised machine learning systems. This lack of resources is a barrier to applying computational approaches to the urgent task of documenting and describing the world's 3000+ endangered languages. This talk presents three methods for computational morphological analysis that overcome the lack of resources in seven typologically diverse languages. The first method explores whether morpheme segmentation and glossing are best treated as a single joint task or as two separate, sequential steps. The second method addresses the expense of human annotation when automatically inducing morphological paradigms, and the third method augments this task with artificial and unsupervised data. This work builds a foundation for technology that could speed up and improve linguistic analysis and annotation. The methods presented here can also be applied to other areas of linguistics, to NLP, or to low-resource domains in high-resource languages.
  • ICS Program: Dual PhD
  • Advisor: Mans Hulden
  • Home degree department: Linguistics