NLP @ CU Boulder
"The idea of giving computers the ability to process human language is as old as the idea of computers themselves. This vibrant interdisciplinary enterprise has many names corresponding to its many facets, names like speech and language processing, human lanquage technology, natural lanquage processing and computational linquistics. The goal of this exciting field is to provide scientific insights into the nature of human language and to enable human-machine communication and improve human-human communication."
-Professor Jim Martin
Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition (2nd ed.), Prentice Hall, 2009
The NLP Process
Training computers to handle human language accurately is a complex process that weaves together linguistic insights and computational models grounded in real-world contexts. The process can begin with linguistic analysis, computational models, or a combination of the two. After it has begun, however, it usually cycles in the following manner.
The NLP Ecosystem
The NLP ecosystem comprises linguists, computer scientists, and domain experts, as well as the computational linguists who link these three groups together.
Featured Projects
Our faculty are engaged in research projects ranging from language documentation and morphological analysis to semantic analysis and biomedical informatics. We are also currently working on an autonomous conversational agent for classroom settings from junior high through college. Featured below are some of the projects we are most proud of, both past and present.
Ongoing
Jan 28th
DARPA AIDA Program
Active Interpretation of Disparate Alternatives
Our goal is to automatically analyze the content of written documents and extract key pieces of information about the events they describe, including where different news sources contradict each other.
Problem
We can’t possibly keep track of everything that is happening day to day - in the news, in medicine, in financial markets, on social media, etc.
Solution
Natural Language Processing can automatically extract key events, along with who is participating in them and the order in which they happen, to help make our job of keeping on top of things much more tractable.
Techniques Used
- Deep Learning
- Graph Embeddings
- Coreference Resolution
- Type Matching
- Entity & Event Annotation & Recognition
- Ontology Construction & Mapping
Ongoing
Jan 28th
THYME
Temporal History of Your Medical Events
Our goal is to automatically extract the timeline of a disease and its treatment from patient records. This benefits individual patients and their doctors by providing quick, accurate summaries of a patient’s history covering several years. Moreover, aggregating timelines for large numbers of patients can aid in analyzing the effectiveness of alternative treatments and in developing new treatments, benefiting all patients.
Problem
Ever-increasing amounts of electronic clinical data and medical subspecialization hinder the ability of doctors and patients to stay on top of all aspects of a patient’s medical history.
Solution
Natural Language Processing can automatically process thousands of patient records in seconds. This allows automatic identification of salient diseases, signs, symptoms, and treatments, while preserving the timeline of the patient’s medical history.
Techniques Used
- Annotation of Temporal Relations Between Events
- Annotation and Parsing of Abstract Meaning Representations
- Coreference Annotation and Resolution
- Entity & Event Annotation & Recognition
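To make the idea of preserving a timeline concrete, here is a minimal sketch of how pairwise "before" relations between events could be ordered into a single timeline. It is an illustration only, not the THYME project's method: the event names are invented, the relation set is reduced to a single BEFORE label, and the ordering uses a plain topological sort from Python's standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical BEFORE relations as (earlier_event, later_event) pairs.
# They are listed out of chronological order on purpose.
before_relations = [
    ("diagnosis", "chemotherapy"),
    ("symptoms reported", "biopsy"),
    ("chemotherapy", "follow-up scan"),
    ("biopsy", "diagnosis"),
]


def build_timeline(relations):
    """Order events so that every 'earlier' event precedes its 'later' event."""
    sorter = TopologicalSorter()
    for earlier, later in relations:
        # add(node, *predecessors): 'later' must come after 'earlier'.
        sorter.add(later, earlier)
    # static_order() raises graphlib.CycleError if the relations contradict
    # each other (for example, A before B and B before A).
    return list(sorter.static_order())


timeline = build_timeline(before_relations)
for position, event in enumerate(timeline, start=1):
    print(position, event)
```

In this toy input the relations form a single chain, so only one ordering is possible. Real clinical notes usually leave many event pairs unordered, and relations such as overlap or containment would need a richer representation than this sketch assumes.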
Ongoing
Jan 28th
Universal NLP
NLP is making immense contributions to the English- and Chinese-speaking worlds. Automated teaching that gives children access to education and machine translation that increases access to healthcare are just two examples. For the rest of the world to benefit from NLP, it needs to function in their languages too.
Problem
The majority of the world's 7000 languages have limited data available for Natural Language Processing.
Solution
When we don’t have enough data to use classical NLP, approaches that reuse knowledge from higher-resource languages and related tasks can make up for this lack.
Techniques Used
- Transfer Learning
- Pre-training
- Multi-task Training
- Meta Learning