Published: Jan. 18, 2023

Our Strand 1 team works to advance how machines process human language, gestures, and emotions in order to create an AI Partner that can understand and facilitate classroom collaboration. The team kicked off iSAT’s third year by identifying three main research themes on which to focus its efforts: (1) Content Analysis & Dialogue Management–Multimodal Interactive Agent (MMIA), led by Jim Martin (CU Boulder), Jeff Flanigan (UC Santa Cruz), and Martha Palmer (CU Boulder); (2) Speech processing and diarization, led by Jacob Whitehill (Worcester Polytechnic Institute); and (3) Situated Grounding, led by James Pustejovsky (Brandeis University) and Nikhil Krishnaswamy (Colorado State University).

Putting the Pieces Together: Jigsaw Worksheets

The Strand 1 team identified the Jigsaw worksheets—completed by middle school students as part of lesson four of the SchoolWide Labs Sensor Immersion Curriculum Unit—as a great dataset for helping our AI Partner understand how students collaborate on tasks and think through problems. While this work engages all three themes of Strand 1, the task of determining optimal interactions between the students and the AI partner is primarily a focus for the MMIA, which is tentatively referred to as the Planning Investigation Partner (PIP).

A Multimodal Interactive Agent

The MMIA team started work on turning the paper Jigsaw worksheet into a digital version that can record how students collaborate on tasks and think through problems. They analyzed the student entries in the Jigsaw worksheets and, through cross-strand planning meetings, sketched out a road map for developing an engaging MMIA that can scaffold student discussions about sensor projects.

In parallel with this planning effort, the team began annotating the Jigsaw worksheet entries with three layers, Abstract Meaning Representation (AMR), OnTask, and Academic Productive Talk (APT), and trained models for each layer. They are also analyzing how Automatic Speech Recognition (ASR) errors affect each of these downstream tasks.

Martha Palmer

Strand 1 co-lead Martha Palmer outlines the MMIA to iSAT’s cross-strand researchers.

The team also finished the first pass of the annotation scheme design for Dependency Dialog Acts, which capture speaker intentions and the threading structure of student conversations. The team is now annotating with this framework and using it to provide linguistic insight into high-level content analysis tasks such as Collaborative Problem Solving, APT, and equitable conversation.

To increase the amount of student collaboration data, the MMIA team worked with Strands 2 and 3 to analyze the limitations of breakout group conversations around the Jigsaw worksheet and to brainstorm potential improvements to the flow of lesson four. The team is also working with Strands 2 and 3 to conceptualize a set of lab experiments to further define the role of the interactive agent in the Jigsaw worksheet application.

Speech Processing and Diarization

To help our AI Partner understand students when they talk, the ASR/Diarization theme acquired a new child speech dataset for additional comparisons of off-the-shelf and lab-developed ASR systems. The team established a corpus of transcribed data, including Sensor Immersion transcripts and close-talking microphone recordings from existing corpora, for benchmarking, training, and fine-tuning speaker verification models on child speech, and it continues to achieve improvements in Word Error Rate. The team is also training an interruption detection model.
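For readers unfamiliar with the benchmarking metric mentioned above, Word Error Rate (WER) is the standard measure for comparing ASR systems: the number of word-level substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the reference length. The function below is a minimal illustrative sketch of that computation, not iSAT's actual evaluation code, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") against a four-word reference:
print(word_error_rate("plug the sensor in", "plug a sensor in"))  # 0.25
```

Child speech in noisy classrooms typically yields much higher WER than adult read speech, which is why benchmarking and fine-tuning on child speech corpora matters for this theme.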

Our AI Partner will also need to identify who is speaking and when (diarization). To this end, the team has been working on person re-identification in school classroom environments, including applying a state-of-the-art person re-identification system to a challenging real-world classroom dataset.
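At their core, both speaker verification and person re-identification typically work by embedding each utterance or image as a vector and comparing vectors by cosine similarity. The sketch below illustrates that comparison step only; the embeddings and the decision threshold are illustrative placeholders, not the team's actual models or calibrated values.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_identity(emb_a, emb_b, threshold=0.7):
    """Decide whether two embeddings belong to the same speaker/person.

    The threshold is a made-up placeholder; real systems calibrate it
    on held-out verification pairs.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy 2-D embeddings: identical vectors match, orthogonal ones do not.
print(same_identity([1.0, 0.0], [1.0, 0.0]))  # True
print(same_identity([1.0, 0.0], [0.0, 1.0]))  # False
```

Diarization then amounts to clustering these per-segment embeddings so that segments from the same student end up with the same label over the course of a lesson.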

Situated Grounding

Students and teachers establish common ground when interacting with one another through both behavioral and verbal cues, as well as prior goals, expectations, and beliefs. The Situated Grounding team is tasked with identifying this common ground through discourse and gesture. This quarter, the team finalized its initial guidelines and moved into annotation. Targeted datasets include EggNOG, a multimodal gesture dataset; the Fibonacci weights experiments out of our CU Boulder and CSU labs; and Sensor Immersion data from both the lab and classrooms.

The Brandeis research team is adjudicating the original Gesture Abstract Meaning Representation (GAMR) annotations on data from EggNOG. To help our AI Partner understand when speakers refer back to previously expressed thoughts or ideas in a conversation, the team is working with student annotators on multi-sentence AMR coreference annotation across the adjudicated EggNOG speech and gesture AMRs.

Lab data collection is underway at CSU and Brandeis, supplemented by Strand 2 data from CU Boulder. These data are being annotated for gesture semantics and object grounding and are being used to train novel gesture recognition and object grounding models for use on lab data and, eventually, classroom data. The group is also working with Strand 2 on non-verbal behavior (NVB) annotation in the classroom.