Exploring Topics in Information Technology Open Educational Resources through the LDA Algorithm

Keywords: metadata, OER, LDA, text mining, topic modeling

Abstract

This paper explores the application of machine learning and text mining techniques to discover topics in open educational resources (OER) in the context of Engineering Education. By applying the LDA (Latent Dirichlet Allocation) algorithm, themes are extracted from OER and can be treated as additional metadata, which enhances the description and categorization of the resources. Furthermore, this study introduces a methodology to automatically identify topics in open educational resources. A dataset of 80 OER was obtained from the Skills Commons repository. The LDA model reached its highest coherence value, 0.42, with nine topics, all of which are closely associated with Information Technology Education.
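The abstract describes fitting LDA and keeping the number of topics with the highest coherence. Below is a minimal, dependency-free Python sketch of that model-selection loop. It assumes already tokenised documents, uses collapsed Gibbs sampling to fit LDA, and uses the UMass coherence measure. The abstract does not state which coherence measure or LDA implementation produced the 0.42 score, so values from this sketch are not comparable to it. The function names (`train_lda`, `top_words`, `umass_coherence`, `choose_num_topics`) and hyperparameters (`alpha`, `beta`, `iters`, `seed`) are hypothetical choices made for illustration.

```python
import math
import random


def train_lda(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the paper's code).

    docs: list of token lists. Returns (topic-word counts, sorted vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * k for _ in docs]             # tokens of topic t in doc d
    topic_word = [[0] * n_vocab for _ in range(k)]  # count of word v in topic t
    topic_total = [0] * k                           # tokens assigned to topic t
    assign = []                                     # topic of every token
    for d, doc in enumerate(docs):                  # random initial assignment
        row = []
        for w in doc:
            t = rng.randrange(k)
            row.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[w]] += 1
            topic_total[t] += 1
        assign.append(row)
    topics = range(k)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = index[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][v] -= 1
                topic_total[t] -= 1
                # Unnormalised full conditional p(z = j | all other tokens).
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][v] + beta)
                    / (topic_total[j] + n_vocab * beta)
                    for j in topics
                ]
                t = rng.choices(topics, weights=weights)[0]
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][v] += 1
                topic_total[t] += 1
    return topic_word, vocab


def top_words(topic_word, vocab, n=5):
    """Return the n highest-count words of each topic, best first."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


def umass_coherence(words, docs):
    """UMass coherence: sum of log((D(w_m, w_j) + 1) / D(w_j)) over ranked pairs j < m."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*ws):
        # Number of documents containing every word in ws.
        return sum(1 for s in doc_sets if all(w in s for w in ws))

    score = 0.0
    for m in range(1, len(words)):
        for j in range(m):
            score += math.log((doc_freq(words[m], words[j]) + 1) / doc_freq(words[j]))
    return score


def choose_num_topics(docs, candidates, n_top=5, **lda_kwargs):
    """Fit one model per candidate k; return (best k, {k: mean coherence})."""
    scores = {}
    for k in candidates:
        topic_word, vocab = train_lda(docs, k, **lda_kwargs)
        tops = top_words(topic_word, vocab, n_top)
        scores[k] = sum(umass_coherence(t, docs) for t in tops) / k
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for preprocessed OER descriptions.
    corpus = [
        "network router packet switch protocol".split(),
        "sql table query index database".split(),
        "network packet protocol firewall".split(),
        "database query table schema".split(),
    ]
    best_k, scores = choose_num_topics(corpus, range(2, 5), n_top=3, iters=100)
    print(best_k, scores)
```

Averaging coherence over topics and taking the maximum mirrors the selection rule in the abstract; in practice each candidate `k` is often fitted with several seeds, because a single Gibbs run is noisy on small corpora such as 80 documents.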

Published
2024-01-08
How to Cite
V. Segarra-Faggioni, A. Romero-Pelaez, J. C. Morocho-Yunga, and R. Ludeña, “Exploring Topics in Information Technology Open Educational Resources through the LDA Algorithm,” LAJC, vol. 11, no. 1, pp. 106–115, Jan. 2024.
Section
Research Articles for the Regular Issue