Language Identification and Language Specific Letter-to-Sound Rules
Stephen Lewis, Katie McGrath, and Jeffrey Reuppel
ABSTRACT. This paper describes a system that improves automatic ARPABET transcription by addressing the performance problems caused by transliterated Arabic and Russian words in English text. The system, called EAR (English, Arabic, Russian), has two components: (1) an n-gram language identifier module that classifies an incoming unknown word as Arabic, Russian, or English, and (2) language-specific letter-to-sound rules that output a pronunciation for the word based on its classification. Our results show overall system error reductions of upwards of 45% compared to a system trained only on English.
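The two-stage design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the n-gram order, the smoothing method, the training data, or the rule format, so the character bigrams, add-one smoothing, toy word lists, greedy longest-match rules, and every name below (train_models, classify, apply_rules, transcribe, TOY_TRAINING, TOY_RULES) are assumptions made purely for illustration.

```python
import math
from collections import Counter

START, END = "^", "$"  # word-boundary symbols (an assumption of this sketch)


def bigrams(word):
    """Return the character bigrams of a word, padded with boundary symbols."""
    padded = START + word.lower() + END
    return [(padded[i], padded[i + 1]) for i in range(len(padded) - 1)]


def train_models(training):
    """Build one bigram count model per language from {language: [words]}."""
    alphabet = {START, END}
    for words in training.values():
        for word in words:
            alphabet.update(word.lower())
    models = {}
    for lang, words in training.items():
        pair_counts, context_counts = Counter(), Counter()
        for word in words:
            for prev, nxt in bigrams(word):
                pair_counts[(prev, nxt)] += 1
                context_counts[prev] += 1
        models[lang] = (pair_counts, context_counts, len(alphabet))
    return models


def log_prob(word, model):
    """Add-one smoothed log score of a word under one language model."""
    pair_counts, context_counts, vocab_size = model
    return sum(
        math.log((pair_counts[pair] + 1) / (context_counts[pair[0]] + vocab_size))
        for pair in bigrams(word)
    )


def classify(word, models):
    """Component 1: pick the language whose model scores the word highest."""
    return max(models, key=lambda lang: log_prob(word, models[lang]))


def apply_rules(word, rules):
    """Component 2: greedy longest-match rewrite of letters into phone symbols."""
    word = word.lower()
    longest = max(len(grapheme) for grapheme in rules)
    phones, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:
            i += 1  # no rule covers this letter; skip it
    return phones


def transcribe(word, models, rule_tables):
    """Classify the word, then apply that language's letter-to-sound rules."""
    lang = classify(word, models)
    return lang, apply_rules(word, rule_tables[lang])


# Hypothetical toy data, far too small for real use and not from the paper.
TOY_TRAINING = {
    "english": ["table", "window", "street", "mother", "thought"],
    "arabic": ["khalid", "abdullah", "qasim", "mahmoud", "hussein"],
    "russian": ["ivanov", "petrov", "gorbachev", "vladimir", "smirnov"],
}
# Placeholder rule tables; real rules would be far richer and context-sensitive.
TOY_RULES = {
    "english": {"th": "TH", "t": "T", "h": "HH", "k": "K", "n": "N", "a": "AE"},
    "arabic": {"kh": "K", "k": "K", "h": "HH", "n": "N", "a": "AA"},
    "russian": {"zh": "ZH", "k": "K", "h": "HH", "n": "N", "a": "AA"},
}
MODELS = train_models(TOY_TRAINING)
```

Calling transcribe("khan", MODELS, TOY_RULES) returns the guessed language together with a phone list. With word lists this small the classification is not meaningful; the point is only the control flow, in which the identifier's decision selects which rule table produces the pronunciation.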
Stephen Lewis, Katie McGrath, and Jeffrey Reuppel are MA students in the Department of Linguistics at the University of Colorado.
Colorado Research in Linguistics - Volume 17, Issue 1 - June 2004
Colorado Research in Linguistics is the working papers journal of the Department of Linguistics at the University of Colorado.