Exploring Topics in Information Technology Open Educational Resources through the LDA Algorithm
Abstract
This paper explores the application of machine learning and text mining techniques to discover topics in open educational resources (OER) in the context of Engineering Education. The LDA (Latent Dirichlet Allocation) algorithm is applied to extract themes from OER, and these themes can then be treated as additional metadata that enhances the description and categorization of the resources. The study also introduces a methodology for automatically identifying topics in OER. A dataset of 80 OER was obtained from the Skills Commons repository. The highest coherence value, 0.42, was obtained when the LDA model was trained with nine topics. These nine topics are closely associated with Information Technology Education.
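The model selection step described in the abstract (train LDA for several topic counts and keep the count with the highest coherence) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which library or coherence measure was used, so the sketch assumes a small collapsed Gibbs sampler and a UMass-style coherence averaged over word pairs. The function names (fit_lda, top_words, umass_coherence, select_num_topics), the hyperparameters, and the toy corpus are all hypothetical.

```python
import math
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_words(topic_word, n=5):
    """Most frequent words of each topic, skipping words with zero count."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


def umass_coherence(topics, docs):
    """UMass-style coherence, averaged over word pairs and then over topics."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    scores = []
    for words in topics:
        pairs = [(words[i], words[j]) for i in range(1, len(words)) for j in range(i)]
        if pairs:
            total = sum(
                math.log((doc_freq(wi, wj) + 1) / doc_freq(wj)) for wi, wj in pairs
            )
            scores.append(total / len(pairs))
    return sum(scores) / len(scores) if scores else float("-inf")


def select_num_topics(docs, candidates, seed=0):
    """Fit one model per candidate topic count and keep the most coherent one."""
    scores = {
        k: umass_coherence(top_words(fit_lda(docs, k, seed=seed)), docs)
        for k in candidates
    }
    return max(scores, key=scores.get), scores


# Made-up, pre-tokenized toy corpus (not taken from the paper's dataset).
docs = [
    "python code function variable loop".split(),
    "python function class code debug".split(),
    "network router protocol packet switch".split(),
    "network protocol firewall packet router".split(),
    "database query table index sql".split(),
    "database sql table query join".split(),
]

best, scores = select_num_topics(docs, [2, 3, 4])
print(best, scores)
```

On a real collection, the documents would first be tokenized and cleaned, and the candidate range would be wider. Coherence values from this sketch are not comparable to the 0.42 reported in the abstract, because the measure may differ.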
Copyright (c) 2024 Latin American Journal of Computing
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This article is published by LAJC under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This means that non-exclusive copyright is transferred to the National Polytechnic School. The author(s) consent to the Editorial Committee publishing the article in the issue that best suits the interests of the journal.
Disclaimer
In no event shall LAJC be liable for any direct, indirect, incidental, punitive, or consequential copyright infringement claims related to articles submitted for evaluation or published in any issue of this journal.