Recent advances in speech representation learning
Self-supervised representation learning methods have recently achieved great success in the NLP and computer vision domains, reaching new performance levels while reducing the amount of labeled data required for many downstream scenarios. Speech representation learning is experiencing similar progress, with work primarily focused on automatic speech recognition (ASR) as the downstream task. This talk will focus on our recent work on weakly-, semi-, and self-supervised speech representation learning. Learning such high-quality speech representations enabled our research on the Generative Spoken Language Modeling (GSLM) task, in which both the acoustic and linguistic characteristics of a language are learned directly from raw audio, without any lexical or text resources.
BIO: Abdelrahman Mohamed is a research scientist at Facebook AI Research (FAIR) in Seattle. Before joining FAIR, he was a principal scientist/manager at Amazon Alexa and a researcher at Microsoft Research, Redmond. Abdelrahman received his Ph.D. from the University of Toronto, where he worked with Geoffrey Hinton and Gerald Penn as part of the team that started the Deep Learning revolution in Spoken Language Processing in 2009. He is the recipient of the IEEE Signal Processing Society Best Journal Paper Award for 2016. His research interests span Deep Learning, Spoken Language Processing, and Natural Language Understanding. Abdelrahman's recent focus has been on improving learned speech representations for downstream ASR applications through weakly-, semi-, and self-supervised learning.
Google Scholar: https://scholar.google.ca/citations?user=tJ_PrzgAAAAJ&hl=en