Academic performance prediction model for the propedeutic course of the Escuela Politécnica Nacional and the implementation of an automated supervised learning model

  • Karen Calva EcuAnalytics
  • Miguel Flores Escuela Politécnica Nacional
  • Hugo Porras EcuAnalytics - INsight
  • Ana Cabezas-Martínez Facultad Latinoamericana de Ciencias Sociales
Keywords: academic performance, logistic regression, decision trees, GBM, gradient descent method


This article applies a supervised machine learning model that predicts the probability that a student at the Escuela Politécnica Nacional will pass the leveling course. To this end, it describes a statistical methodology based on gradient boosting and logistic regression, in which the learning problem is formulated as the minimization of an error function by the gradient descent method. To explain the probability of passing, the model considers dimensions suggested by the literature: socioeconomic, demographic, family, and institutional variables, together with the student's academic performance in the admission application and in the leveling course. The decision tree model reaches a precision of 96% on the test data set, with an area under the ROC curve of 89.1, levels that are generally regarded as acceptable. The logistic regression results, in turn, suggest that factors such as the weighted grade for the first two months, the score with which the student applied, the study schedule, and the geographical location of origin, among others, affect the student's probability of passing the leveling course.
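As an illustration of the formulation described in the abstract, the following minimal sketch fits a logistic regression by batch gradient descent on the log-loss. It is not the authors' pipeline: the data are synthetic, the two features (`first_term` and `entry`) are hypothetical stand-ins for a scaled first-term grade and a scaled application score, the learning rate and step count are arbitrary choices, and the gradient boosting part of the methodology is not covered.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # Estimated probability of passing for one feature vector.
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def log_loss(weights, bias, X, y):
    # Mean negative log-likelihood: the error function being minimized.
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(weights, bias, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def fit_logistic_gd(X, y, lr=0.5, steps=500):
    # Batch gradient descent; lr and steps are illustrative assumptions.
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    history = []
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi  # d(loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        history.append(log_loss(weights, bias, X, y))
    return weights, bias, history


# Synthetic data only: hypothetical features scaled to [0, 1].
random.seed(0)
X, y = [], []
for _ in range(200):
    first_term = random.random()  # hypothetical scaled first-term grade
    entry = random.random()       # hypothetical scaled application score
    logit = 6.0 * first_term + 2.0 * entry - 4.0
    X.append([first_term, entry])
    y.append(1 if random.random() < sigmoid(logit) else 0)

weights, bias, history = fit_logistic_gd(X, y)
```

Calling `predict_proba(weights, bias, [0.8, 0.5])` returns the estimated pass probability for one hypothetical student, and `history` records how the log-loss decreases as the descent proceeds.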


Research Articles for the Regular Issue