Academic performance prediction model for the propedeutic course of the Escuela Politécnica Nacional and the implementation of an automated supervised learning model
In this article, a supervised machine learning model is applied to predict the probability that a student of the Escuela Politécnica Nacional (National Polytechnic School) will pass the leveling course. To this end, a statistical methodology based on gradient boosting and logistic regression is described, in which the learning problem is formulated as the minimization of an error function by gradient descent. To explain the probability of passing, the analysis considers dimensions suggested by the literature: socioeconomic, demographic, family, institutional and academic performance variables, taken from the student's application and from the leveling course the student is enrolled in. The decision tree model reaches a precision of 96% on the test data set and an area under the ROC curve of 89.1%, levels that are generally considered acceptable. The logistic regression results, in turn, suggest that factors such as the weighted grade of the first two months, the score with which the student applied, the study schedule and the geographical place of origin, among others, affect the probability that the student passes the leveling course.
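The formulation described in the abstract, fitting a logistic regression by minimizing an error function with gradient descent, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the two features are hypothetical stand-ins for standardized student variables, and the learning rate and number of epochs are arbitrary assumptions chosen only so that the example converges.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated probability of the positive class (e.g. 'passes') for one row."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def log_loss(weights, bias, X, y):
    """Mean negative log-likelihood: the error function being minimized."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(weights, bias, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi  # d(loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic data (assumption): two standardized features, labels drawn from
# a known logistic model so the fit can be sanity-checked.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if random.random() < sigmoid(2.0 * a + 1.0 * b) else 0 for a, b in X]

initial_loss = log_loss([0.0, 0.0], 0.0, X, y)
weights, bias = fit_logistic(X, y)
final_loss = log_loss(weights, bias, X, y)
print(f"log-loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The sign and size of each fitted weight indicate how the corresponding variable shifts the log-odds of passing, which is the kind of interpretation the abstract draws from the logistic regression. Gradient boosting applies the same gradient-based idea in function space, adding one small decision tree per step instead of updating a fixed weight vector.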
Latin-American Journal of Computing is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.