Optimising a Language Recognition System Through Phoneme-Based Vector Representation
Abstract
This article analyzes the vector representation of phonemes as a way to improve a language identification (LID) system. The CBOW (Continuous Bag-of-Words) and Skip-gram architectures proposed by Mikolov are studied. These models predict words within a context and, in doing so, generate n-dimensional vectors. In this work we analyze the application of these models to smaller phonetic units or n-grams.
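To make the difference between the two architectures concrete, the following is a minimal sketch of how training examples could be derived from a phoneme sequence. It is an illustration under stated assumptions, not the pipeline used in the article: the function names, the `_` separator for n-gram tokens, and the window size are all hypothetical choices.

```python
from typing import List, Tuple


def phoneme_ngrams(phonemes: List[str], n: int) -> List[str]:
    """Join each run of n consecutive phonemes into one token (hypothetical '_' separator)."""
    return ["_".join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one (centre, context) pair per neighbour; the centre predicts the context."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((centre, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one (context list, centre) example per position; the context predicts the centre."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip positions with no neighbours (single-token input)
            examples.append((context, centre))
    return examples


if __name__ == "__main__":
    phones = ["o", "l", "a"]          # toy phoneme sequence, assumed for illustration
    print(phoneme_ngrams(phones, 2))  # bigram tokens built from the phonemes
    print(skipgram_pairs(phones, 1))
    print(cbow_examples(phones, 1))
```

In both cases the tokens can be single phonemes or the n-gram tokens returned by `phoneme_ngrams`; the embedding model then learns one n-dimensional vector per distinct token.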
References
E. Ambikairajah, H. Li, L. Wang, B. Yin, and V. Sethu. “Language Identification: A tutorial”. IEEE Circuits and Systems Magazine, pages 82-108, May 2011.
E. Singer, P. A. Torres-Carrasquillo, T. P. Gleason, W. M. Campbell, and D. A. Reynolds. “Acoustic, phonetic, and discriminative approaches to automatic language identification”. In Interspeech, 2003.
C. Salamea, L. F. D'Haro, R. de Córdoba, and M. A. Caraballo. “Incorporación de n-gramas discriminativos para mejorar un reconocedor de idioma fonotáctico basado en i-vectores”. Procesamiento del Lenguaje Natural, no. 51, pages 145-152, 2013.
M. A. Zissman et al. “Comparison of four approaches to automatic language identification of telephone speech”. IEEE Transactions on Speech and Audio Processing, pages 31-44, 1996.
L. J. Rodriguez-Fuentes, N. Brummer, M. Penagarikano, A. Varona, G. Bordel, and M. Diez. “The Albayzin 2012 language recognition evaluation”. In Interspeech, pages 1497-1501, 2013.
S. Lai, K. Liu, L. Xu, and J. Zhao. “How to Generate a Good Word Embedding”. National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China, July 2015.
T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. “Distributed Representations of Words and Phrases and their Compositionality”. In Proceedings of NIPS, 2013.
M. Díez, A. Varona, M. Peñagarikano, L. J. Rodríguez-Fuentes, and G. Bordel. “On the use of phone log-likelihood ratios as features in spoken language recognition”. In SLT, pages 274-279, 2012.
P. Schwarz, “Phoneme Recognition based on Long Temporal Context”, PhD Thesis. Brno University of Technology, 2009.
D. A. Reynolds. “A Gaussian mixture modeling approach to text independent Speaker identification”. Ph.D. thesis, Georgia Inst. of Technol., 1992.
Copyright Notice
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
Disclaimer
In no event shall LAJC be liable for any direct, indirect, incidental, punitive, or consequential copyright infringement claims related to articles that have been submitted for evaluation or published in any issue of this journal. Find out more in our Disclaimer Notice.