Published: Sept. 27, 2022

Strand 1 researchers are working to answer the foundational question: What AI advances are needed to understand and facilitate collaborative learning conversations? These team members enable our AI Partner to recognize the various complexities of human speech – essentially creating the “eyes” and “ears” of the Partner.

Advancing AI Components

Over the last several months, the team has been advancing the performance of the individual AI components, especially in the context of classroom conversations using the Sensor Immersion curriculum unit, for which we collected hundreds of hours of data in the past year. These components include: (1) Automatic Speech Recognition, (2) Abstract Meaning Representation parsing, fine-tuned on a small training set of classroom annotations, (3) Behavioral Engagement, in the form of on-/off-topic/task classification (On-Topic/Task) trained on classroom annotations, and (4) Academically Productive Talk classification, with no domain-specific fine-tuning.
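As a rough illustration of component (3), the on-/off-task classifier can be sketched as a binary text classifier trained on annotated utterances. The training utterances, labels, and model choice below are invented for illustration and are not the project's actual data or model:

```python
# Hypothetical sketch of on-/off-task (behavioral engagement) classification
# as a binary text classifier. All examples below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "connect the moisture sensor to pin three",   # on-task
    "what does the light sensor reading mean",    # on-task
    "plug the micro:bit into the breadboard",     # on-task
    "did you watch the game last night",          # off-task
    "i'm so hungry when is lunch",                # off-task
    "let's talk about the party this weekend",    # off-task
]
labels = ["on-task", "on-task", "on-task", "off-task", "off-task", "off-task"]

# TF-IDF features feeding a logistic regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(utterances, labels)

print(clf.predict(["attach the sensor to the board"])[0])
```

In practice such a component would be trained on the classroom annotations described above rather than a handful of toy sentences, and would likely use a stronger model than this sketch.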

The team also compared the performance of the Abstract Meaning Representation parser, as well as On-Topic/Task classification, on human transcriptions versus Automatic Speech Recognition transcriptions, with encouraging results: only a 20 percent performance drop. Additionally, they tested other components on comparable test sets, including Speaker Diarization, tested on AMI meeting corpus headset data, and Eye Gaze Detection, tested on YouTube classroom data. Strand 1 is also making good progress on two novel research areas, for which the team is defining new annotation schemes: (1) Dialogue Dependency Acts, applying standard Dialogue Act and Rhetorical Structure Theory annotation to classroom dialogues, and (2) integrating Gesture Detection and annotation with Abstract Meaning Representation parses, a multimodal integration of speech and gesture.
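The relative performance drop reported in such comparisons can be computed as a simple ratio; the scores below are placeholders, not the project's measured numbers:

```python
# Minimal illustration of the human-transcript vs. ASR-output comparison:
# the relative performance drop between the two conditions.
def relative_drop(score_human: float, score_asr: float) -> float:
    """Return the relative performance drop as a percentage."""
    return 100.0 * (score_human - score_asr) / score_human

# Placeholder scores: e.g., a parser scoring 0.75 on human transcripts
# and 0.60 on ASR transcripts shows a 20 percent relative drop.
print(f"{relative_drop(0.75, 0.60):.0f}%")
```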

Strand 1 Object Detection

This experiment supports key Strand 1 focus areas, including object detection, joint detection, and grounding.

Gesture Abstract Meaning Representation

Over the next several months, the focus will be on continuing to improve Speech Processing and Diarization by more accurately mapping the impact of background noise and by piloting Speaker Diarization in classrooms. The team will also fine-tune Abstract Meaning Representation parsing, pursue more annotation of training data, and run a more formal comparison of Abstract Meaning Representation, Academically Productive Talk, and On-Topic/Task performance on human transcripts versus Automatic Speech Recognition output. Another major focus will be developing Gesture Abstract Meaning Representation (GAMR) – Strand 1’s situated grounding system – for integrating gestures, with a focus on representing the distinction between “content-bearing gestures” and “ampliative or co-suppositional gestures” in the Gesture Abstract Meaning Representation notation.
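To make the content-bearing versus ampliative/co-suppositional distinction concrete, here is a hypothetical sketch of how a gesture annotation might carry that top-level type alongside an AMR fragment. The field names, example gestures, and PENMAN-style fragment are invented for illustration and are not the project's actual Gesture Abstract Meaning Representation notation:

```python
# Hypothetical annotation record pairing a gesture with its top-level
# type and any AMR content it contributes. Invented for illustration.
from dataclasses import dataclass

@dataclass
class GestureAnnotation:
    gesture: str        # surface description of the gesture
    gesture_type: str   # "content-bearing" or "ampliative/co-suppositional"
    amr_fragment: str   # PENMAN-style fragment the gesture grounds, if any

# A deictic point that contributes content to an utterance like "put that there":
deictic = GestureAnnotation(
    gesture="point at red block",
    gesture_type="content-bearing",
    amr_fragment="(b / block :mod (r / red))",
)

# A beat gesture accompanying "I really think so" that signals emphasis only:
beat = GestureAnnotation(
    gesture="repeated downward hand beat",
    gesture_type="ampliative/co-suppositional",
    amr_fragment="",
)

print(deictic.gesture_type, "|", beat.gesture_type)
```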

Strand 1 GAMR

Strand 1 is collecting data to support the development of Gesture Abstract Meaning Representation.

This includes distinguishing nonverbal actions that involve situated grounding through pointing or iconic gestures (e.g., pointing at “that block,” grabbing/picking up an object) from co-speech gestures that often reflect speaker sentiment, attitude, or epistemic state. When the physical gestures are nearly identical, it is crucial to know how to disambiguate the top-level type. This fall, the team has multiple sites carrying out and recording collaborative interactions, which will provide video data for modeling dialogue acts and gestures, as well as for aspects of content analysis.
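A minimal sketch of the disambiguation problem described above: two nearly identical extended-arm gestures receive different top-level types depending on the co-occurring speech. The cue words and the rule are invented placeholders, not the team's actual approach:

```python
# Toy disambiguation rule: if the accompanying utterance contains a
# demonstrative, treat an extended-arm gesture as situated grounding
# (content-bearing); otherwise treat it as a co-speech gesture.
# The cue list and rule are invented placeholders.
DEICTIC_CUES = {"that", "this", "there", "here"}

def classify_gesture(gesture: str, co_speech: str) -> str:
    """Guess the top-level gesture type from the accompanying utterance."""
    words = set(co_speech.lower().split())
    if gesture == "extended arm" and words & DEICTIC_CUES:
        # Demonstrative in the speech: likely pointing (situated grounding)
        return "content-bearing"
    # Otherwise assume it reflects sentiment, attitude, or epistemic state
    return "ampliative/co-suppositional"

print(classify_gesture("extended arm", "grab that block"))
print(classify_gesture("extended arm", "I totally agree"))
```

A real system would of course use learned multimodal models over the recorded video data rather than a keyword rule; the point here is only that the same physical gesture maps to different types given different speech context.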