Optimising a Language Recognition System Through Phoneme-Based Vector Representation

  • Francisco Charro Escuela Politécnica Nacional
  • Marco Herrera Escuela Politécnica Nacional
  • Nataly Pozo Escuela Politécnica Nacional
  • Andrés Rosales Escuela Politécnica Nacional
Keywords: Vector Representation, Language Recognition, Skip-gram, n-grams, embeddings.

Abstract

This article analyzes the vector representation of phonemes as a way to improve a language identification (LID) system. It studies the CBOW (Continuous Bag-of-Words) and Skip-gram architectures proposed by Mikolov. These models learn n-dimensional vectors by predicting a word from its context (CBOW) or the context from a word (Skip-gram). In this work we analyze the application of these models to smaller phonetic units, or n-grams.
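
As a rough illustration of the idea in the abstract, the sketch below groups a phoneme sequence into n-grams and lists the (target, context) pairs that a Skip-gram model would be trained on. The function names, the underscore separator, the example phoneme sequence, and the window size are all assumptions made for illustration; they are not taken from the article, and no embedding training is shown.

```python
from typing import List, Tuple


def phoneme_ngrams(phonemes: List[str], n: int) -> List[str]:
    """Join every run of n consecutive phonemes into a single token."""
    return ["_".join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """List (target, context) pairs for each token and its neighbours
    within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical phoneme sequence, as a phoneme recogniser might output it.
phones = ["o", "l", "a", "m", "u", "n", "d", "o"]
bigrams = phoneme_ngrams(phones, 2)   # phoneme 2-grams used as the "words"
pairs = skipgram_pairs(bigrams, 1)    # training pairs for a window of 1

for target, context in pairs[:4]:
    print(target, "->", context)
```

In a setup like this, each n-gram token plays the role that a word plays in the original Skip-gram model, so the learned vectors would describe phoneme n-grams rather than words.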



References

E. Ambikairajah, H. Li, L. Wang, B. Yin, and V. Sethu. “Language Identification: A tutorial”. IEEE Circuits and Systems Magazine, pages 82-108, May 2011.

E. Singer, P. A. Torres-Carrasquillo, T. P. Gleason, W. M. Campbell, and D. A. Reynolds. “Acoustic, phonetic, and discriminative approaches to automatic language identification”. In Interspeech, 2003.

C. Salamea, L. F. D'Haro, R. de Córdoba, and M. A. Caraballo. “Incorporación de n-gramas discriminativos para mejorar un reconocedor de idioma fonotáctico basado en i-vectores”. Procesamiento del Lenguaje Natural, no. 51, pages 145-152, 2013.

M. A. Zissman et al. “Comparison of four approaches to automatic language identification of telephone speech”. IEEE Transactions on Speech and Audio Processing, pages 31-44, 1996.

L. J. Rodriguez-Fuentes, N. Brummer, M. Penagarikano, A. Varona, G. Bordel, and M. Diez. “The Albayzin 2012 language recognition evaluation”. In Interspeech, pages 1497-1501, 2013.

S. Lai, K. Liu, L. Xu, and J. Zhao. “How to Generate a Good Word Embedding”. National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China, July 2015.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. “Distributed Representations of Words and Phrases and their Compositionality”. In Proceedings of NIPS, 2013.

M. Díez, A. Varona, M. Peñagarikano, L. J. Rodríguez-Fuentes, and G. Bordel. “On the use of phone log-likelihood ratios as features in spoken language recognition”. In SLT, pages 274-279, 2012.

P. Schwarz. “Phoneme Recognition based on Long Temporal Context”. PhD thesis, Brno University of Technology, 2009.

D. A. Reynolds. “A Gaussian mixture modeling approach to text-independent speaker identification”. PhD thesis, Georgia Inst. of Technol., 1992.

Published
2017-11-01
How to Cite
[1]
F. Charro, M. Herrera, N. Pozo, and A. Rosales, “Optimising a Language Recognition System Through Phoneme-Based Vector Representation”, LAJC, vol. 4, no. 3, pp. 49-54, Nov. 2017.
Section
Research Articles for the Regular Issue