Optimising a Language Recognition System Through Phoneme-Based Vector Representation
Abstract
This article analyzes the vector representation of phonemes as a way to improve a language identification (LID) system. It studies the CBOW (Continuous Bag-of-Words) and Skip-gram architectures proposed by Mikolov. These models predict words within a context and, in doing so, generate n-dimensional word vectors. In this work we analyze the application of these models to smaller phonetic units or n-grams.
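As a rough illustration of how the two architectures frame the prediction task over phonetic units, the sketch below builds training examples from a phoneme sequence. It is not the authors' implementation: the phoneme sequence, the underscore used to join phonemes into n-grams, and the function names `phoneme_ngrams`, `skipgram_pairs`, and `cbow_examples` are all assumptions made for this example, and the neural network that would turn these examples into n-dimensional vectors is left out.

```python
from typing import List, Tuple


def phoneme_ngrams(phonemes: List[str], n: int) -> List[str]:
    """Join each run of n consecutive phonemes into one token (hypothetical helper)."""
    return ["_".join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram view: each center token is paired with every neighbour within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW view: the neighbours within `window` jointly predict the center token."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, center))
    return examples


# Assumed phoneme sequence, for illustration only.
phones = ["h", "o", "l", "a"]
bigrams = phoneme_ngrams(phones, 2)          # phoneme 2-grams as tokens
sg = skipgram_pairs(bigrams, window=1)       # (center, context) pairs
cbow = cbow_examples(bigrams, window=1)      # (context list, center) examples
```

In this framing, Skip-gram would be trained to predict each context token from the center token, while CBOW would be trained to predict the center token from its combined context; the vectors learned for each phoneme or n-gram token are what a LID system could then use as features.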
References
E. Ambikairajah, H. Li, L. Wang, B. Yin, and V. Sethu. “Language Identification: A tutorial”. IEEE Circuits and Systems Magazine, pages 82-108, May 2011.
E. Singer, P. A. Torres-Carrasquillo, T. P. Gleason, W. M. Campbell, and D. A. Reynolds. “Acoustic, phonetic, and discriminative approaches to automatic language identification”. In Interspeech, 2003.
C. Salamea, L. F. D'Haro, R. de Córdoba, and M. A. Caraballo. “Incorporación de n-gramas discriminativos para mejorar un reconocedor de idioma fonotáctico basado en i-vectores”. Procesamiento del Lenguaje Natural, no. 51, pages 145-152, 2013.
M. A. Zissman et al. “Comparison of four approaches to automatic language identification of telephone speech”. IEEE Transactions on Speech and Audio Processing, pages 31-44, 1996.
L. J. Rodriguez-Fuentes, N. Brummer, M. Penagarikano, A. Varona, G. Bordel, and M. Diez. “The Albayzin 2012 language recognition evaluation”. In Interspeech, pages 1497-1501, 2013.
S. Lai, K. Liu, L. Xu, and J. Zhao. “How to Generate a Good Word Embedding”. National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China, July 2015.
T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. “Distributed Representations of Words and Phrases and their Compositionality”. In Proceedings of NIPS, 2013.
M. Díez, A. Varona, M. Peñagarikano, L. J. Rodríguez-Fuentes, and G. Bordel. “On the use of phone log-likelihood ratios as features in spoken language recognition”. In SLT, pages 274-279, 2012.
P. Schwarz. “Phoneme Recognition based on Long Temporal Context”. PhD thesis, Brno University of Technology, 2009.
D. A. Reynolds. “A Gaussian mixture modeling approach to text-independent speaker identification”. PhD thesis, Georgia Inst. of Technol., 1992.
This article is published by LAJC under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This means that non-exclusive copyright is transferred to the National Polytechnic School. The author(s) give their consent to the Editorial Committee to publish the article in the issue that best suits the interests of this journal.