eTASC: Empirical Evidence for a Theoretical Approach to Semantic Components

Although the field of natural language processing (NLP) has made considerable strides in the automated interpretation of typical newswire text, the flexibility and extensibility of language still cause great difficulty. Because they rely on supervised machine learning, current NLP systems can only provide accurate meaning representations for phrases that are very similar to text previously seen in training data. As a result, the performance of such systems on new genres and new domains drops dramatically. Breaking through this barrier to more robust and generalizable meaning representations will only be possible through major advances in our understanding of how individual words contribute to the overall construction of a sentence's interpretation in context. This requires meaning representations that can generalize to classes of words, such as those provided by VerbNet, and that can be further specified based on the shared semantic components of the specific words used in context, such as those posited by the Generative Lexicon.

To achieve this goal, we are partnering with James Pustejovsky to construct SemNet, a richer broad-coverage lexical resource based on VerbNet and enhanced by concepts from the Generative Lexicon. Our theoretical research into the connections between the Generative Lexicon (GL) and VerbNet (VN) will provide the foundation on which we can build novel, rich semantic representations of complex changes of state, such as material transformations, object modifications, creation events and disruption events. In addition, using GL qualia structure and analysis of corpus data, we are adding verb-specific selectional preferences to VerbNet's class-based thematic role preferences. Finally, the Rich Event Ontology will house qualia relations between object concepts and event concepts referenced in SemNet.

Communicating with Computers (CwC)

The objective of DARPA’s CwC program is to facilitate and accelerate progress toward symmetric communication between humans and computers. In partnership with the University of Illinois Urbana-Champaign, Indiana University, and Washington State University, we are exploring ways for an interactive intelligent system to achieve cognitive coherency, concentrating on robust communication with humans through incrementally adapting its understanding of the human’s language, its communication abilities and, consequently, its ability to support domain-specific reasoning. Working within a Blocks World scenario, we are using a joint building task between human and computer to experiment with methods for achieving these goals. Here at CU, we are creating flexible meaning representations for events and spatial relations that can bridge syntactic parses of human language and computer planning instructions.

Funded by the NIH

Multi-source Integrated Platform for Answering Clinical Questions

MiPACQ is a question-answering project designed to build a system through which doctors can ask a computer questions about existing medical records and information sources and get answers quickly and efficiently. This project is a joint effort of the University of Colorado at Boulder, the Mayo Clinic, and the Harvard School.

Funded by the NSF

Project EPIC applies natural language processing techniques to facilitate computer-mediated communication during times of crisis.

Funded by HHS

The SHARP project at Colorado aims to merge and standardize patient data from non-electronic forms, such as the free text of radiology and pathology notes, into an electronic health record (EHR). The project applies natural language processing techniques to extract structured information from clinical notes so that the information they contain can be searched (e.g. for a diagnosis), compared (e.g. to find common co-morbidities with a certain diagnosis), and summarized. The project will help improve patient care by reducing inconsistencies in patient data, providing physicians with more accurate and uniform information in a centralized location.

Funded by NIH

The THYME project aims to identify clinical events such as diseases, symptoms and treatments, and to recognize their ordering along a timeline. Specifically, it aims to develop an annotation schema for temporal relations in clinical free text, create an annotated corpus of clinical text following the schema, develop new algorithms for training temporal relation discovery systems on this corpus, and evaluate these systems on various use cases, including clinical notes on colon cancer and radiology reports on brain tumors.