Colorado Research in Linguistics

June 2004

Language Identification and Language Specific Letter-to-Sound Rules

Stephen Lewis, Katie McGrath, and Jeffrey Reuppel

full paper (PDF)

ABSTRACT. This paper describes a system that improves automatic ARPABET transcription by addressing performance problems caused by transliterated Arabic and Russian words in English text. The system, called EAR (English, Arabic, Russian), has two components: (1) an n-gram language identifier module that classifies an incoming unknown word as Arabic, Russian, or English, and (2) language-specific letter-to-sound rules that output a pronunciation for the word based on its classification. Our results show overall system error reduction rates of upwards of 45% compared to a system trained only on English.
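
To make the first component concrete, the following is a minimal sketch of how a character n-gram language identifier of the kind described in the abstract might work. It is not the authors' implementation: the bigram order, the add-one smoothing, the toy training words, and all function names (`bigrams`, `train`, `score`, `classify`) are assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy training words (assumption); the paper's data is not shown here.
TRAINING = {
    "english": ["table", "water", "light", "house", "street", "think"],
    "arabic": ["khalid", "abdullah", "qasim", "mahmoud", "hussein", "rashid"],
    "russian": ["gorbachev", "petrov", "ivanovich", "dostoevsky",
                "kuznetsov", "romanov"],
}


def bigrams(word):
    """Return character bigrams of a word, padded with start/end markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(training):
    """Count bigrams per language and compute a shared vocabulary size."""
    counts = {
        lang: Counter(g for w in words for g in bigrams(w))
        for lang, words in training.items()
    }
    vocab = set().union(*counts.values())
    # The extra 1 reserves probability mass for bigrams never seen in training.
    return counts, len(vocab) + 1


def score(word, counts, vocab_size):
    """Add-one-smoothed log-probability of the word under each language model."""
    scores = {}
    for lang, c in counts.items():
        total = sum(c.values())
        scores[lang] = sum(
            math.log((c[g] + 1) / (total + vocab_size)) for g in bigrams(word)
        )
    return scores


def classify(word, counts, vocab_size):
    """Pick the language whose bigram model gives the word the highest score."""
    scores = score(word, counts, vocab_size)
    return max(scores, key=scores.get)


counts, vocab_size = train(TRAINING)
print(classify("smirnov", counts, vocab_size))  # prints: russian
```

In a full pipeline along the lines the abstract describes, the returned label would then select which language-specific letter-to-sound rule set produces the ARPABET pronunciation; that second stage is omitted here because the rules themselves are not given on this page.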

Stephen Lewis, Katie McGrath, and Jeffrey Reuppel are MA students in the Department of Linguistics at the University of Colorado.

Colorado Research in Linguistics - Volume 17, Issue 1 - June 2004


Colorado Research in Linguistics is the working papers journal of the Department of Linguistics at the University of Colorado.
