The Department’s computational linguistics faculty are dedicated to the advancement of human language technology and the automatic production of richer and more accurate representations of utterances in English, Chinese, Hindi/Urdu, Arabic, Farsi and other languages.

Working within the Computational Language and EducAtion Research Center (CLEAR), these researchers and their affiliates apply cutting-edge techniques from computer science to challenging issues in the processing of natural language.

Prof. Mans Hulden’s research focuses on computational modeling and learning of word-level phenomena in language. Prof. Hulden is the author of Foma, a freely available finite-state toolkit designed for the production of morphological and phonological analyzers and generators. Finite-state techniques also find practical applications in large-scale language processing tasks such as shallow parsing and information extraction.

Prof. Martha Palmer’s research involves the application of supervised machine learning to linguistically annotated data in order to train natural language processing components, such as word sense disambiguation systems. These components serve as the building blocks for many different types of end-to-end systems with various applications, such as information retrieval, information extraction, question answering and machine translation. The linguistic annotation defines the depth and accuracy of the computer-generated representations, and the research offers a principled approach to developing increasingly rich layers of semantic and pragmatic annotation.

Computational linguistics is inherently interdisciplinary: it relies on the latest developments in linguistic theory, on new algorithms and machine-learning approaches from computer science, and on findings about human language processing from cognitive research.

The potential applications of human language technology are far-reaching: it can be applied to any field that has information in the form of text or speech. With increasing digitization of resources and collections (books, document archives, various types of records, etc.), there is a growing interest in applying computational linguistics across all disciplines and in all languages to facilitate information gathering, filtering and prioritizing.