Academic performance prediction model for the propedeutic course of the Escuela Politécnica Nacional and the implementation of an automated supervised learning model

  • Karen Calva, EcuAnalytics
  • Miguel Flores, Escuela Politécnica Nacional
  • Hugo Porras, EcuAnalytics - INsight
  • Ana Cabezas-Martínez, Facultad Latinoamericana de Ciencias Sociales
Keywords: academic performance, logistic regression, decision trees, GBM, gradient descent method


This article applies a supervised machine learning model that predicts the probability that a student of the Escuela Politécnica Nacional (National Polytechnic School) will pass the leveling (propedeutic) course. To this end, it describes a statistical methodology based on gradient boosting and logistic regression, in which the learning problem is formulated as the minimization of an error function by the gradient descent method. To explain the probability of passing, the model considers the dimensions suggested by the literature: socioeconomic, demographic, family, institutional and academic performance variables, taken both from the student's application and from their record in the leveling course. The decision tree model reaches a precision of 96% on the test data set, with an area under the ROC curve of 89.1%, levels that are generally regarded as acceptable. The logistic regression results, in turn, suggest that factors such as the weighted grade for the first two months, the score with which the student applied, the study schedule and the student's geographical origin, among others, affect the probability of passing the leveling course.
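The abstract frames learning as the minimization of an error function by gradient descent. As a minimal sketch of that idea, and not the authors' implementation, the Python code below runs gradient boosting on the logistic (log) loss: each round fits a depth-one regression tree (a stump) to the negative gradient y - p and adds it, scaled by a learning rate, to the model's log-odds. It then scores a test set with the area under the ROC curve. Everything specific here is an assumption for illustration: the data are synthetic, the two features (a first-two-month weighted grade and an admission score) and the coefficients that generate them are hypothetical, and the 50 rounds, learning rate of 0.1 and stump depth are arbitrary. Leaf values are plain mean pseudo-residuals with fixed shrinkage rather than a line search or Newton step.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_students(n, seed=0):
    """HYPOTHETICAL data: [weighted grade (0-10), admission score (600-1000)] -> passed (0/1)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        grade = rng.uniform(0, 10)
        admission = rng.uniform(600, 1000)
        logit = 1.2 * (grade - 6) + 0.01 * (admission - 800)  # assumed true model
        X.append([grade, admission])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


def fit_stump(X, r):
    """Least-squares regression stump on residuals r (one split, mean value per leaf)."""
    n, total, best = len(X), sum(r), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            i, nxt = order[k], order[k + 1]
            left_sum += r[i]
            if X[i][j] == X[nxt][j]:
                continue  # only split between distinct feature values
            nl, nr = k + 1, n - k - 1
            right_sum = total - left_sum
            # Minimizing the squared error equals maximizing S_L^2/n_L + S_R^2/n_R.
            gain = left_sum ** 2 / nl + right_sum ** 2 / nr
            if best is None or gain > best[0]:
                threshold = (X[i][j] + X[nxt][j]) / 2
                best = (gain, j, threshold, left_sum / nl, right_sum / nr)
    if best is None:
        return lambda row: 0.0  # no valid split: contribute nothing
    _, j, t, left_val, right_val = best
    return lambda row: left_val if row[j] <= t else right_val


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting on log loss; F is the log-odds of passing."""
    p_bar = sum(y) / len(y)
    f0 = math.log(p_bar / (1 - p_bar))  # constant model minimizing log loss
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss with respect to F is y - sigmoid(F).
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + learning_rate * stump(row) for fi, row in zip(F, X)]
    return lambda row: sigmoid(f0 + learning_rate * sum(s(row) for s in stumps))


def log_loss(y, probs):
    eps = 1e-12
    return -sum(yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1 - p, eps))
                for yi, p in zip(y, probs)) / len(y)


def roc_auc(y, scores):
    """AUC as the probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X_train, y_train = make_synthetic_students(300, seed=1)
    X_test, y_test = make_synthetic_students(200, seed=2)
    model = fit_gbm(X_train, y_train)
    probs = [model(row) for row in X_test]
    print("test AUC:", round(roc_auc(y_test, probs), 3))
```

Because each stump is a least-squares fit to the negative gradient, every round is a descent step on the training log loss; a small learning rate trades more rounds for smoother, less overfit updates.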






How to Cite
K. Calva, M. Flores, H. Porras, and A. Cabezas-Martínez, “Academic performance prediction model for the propedeutic course of the Escuela Politécnica Nacional and the implementation of an automated supervised learning model”, LAJC, vol. 8, no. 2, pp. 58-71, Jul. 2021.
Research Articles for the Regular Issue