ISSN:1390-9266 e-ISSN:1390-9134 LAJC 2024
DOI: 10.5281/zenodo.12192085
LATIN-AMERICAN JOURNAL OF COMPUTING (LAJC), Vol XI, Issue 2, July 2024
• Elitism enabled.
• Tournament selection (2 competing individuals).
• Single-point crossover.
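The configuration above (elitism, size-2 tournament selection, single-point crossover) can be sketched as a minimal generational loop. This is an illustrative sketch, not the authors' implementation: the function names and the absence of mutation are assumptions.

```python
import random

def tournament_select(pop, fitness, k=2):
    """Pick the fitter of k randomly drawn individuals (size-2 tournament)."""
    contenders = random.sample(pop, k)
    return max(contenders, key=fitness)

def one_point_crossover(a, b):
    """Exchange the tails of two equal-length genomes at a random cut point."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def evolve(pop, fitness, generations=50, elite=1):
    """Generational GA with elitism; returns the best individual found."""
    for _ in range(generations):
        # Elitism: carry the best individuals over unchanged.
        nxt = sorted(pop, key=fitness, reverse=True)[:elite]
        while len(nxt) < len(pop):
            p1 = tournament_select(pop, fitness)
            p2 = tournament_select(pop, fitness)
            c1, c2 = one_point_crossover(p1, p2)
            nxt.extend([c1, c2])
        pop = nxt[:len(pop)]
    return max(pop, key=fitness)
```

Because of elitism, the best fitness in the population never decreases between generations.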
With the above settings, 100 algorithm executions were
performed, and the average of the obtained results was
calculated. The results were:
• Average run time: 02:54:41.
• Average number of features used: 29.4.
• Average depth: 41.4.
• Average number of nodes: 157.
• Average training accuracy: 0.772.
• Average test accuracy: 0.755.
Fig. 5 presents the average test accuracy for each class; the standard deviation in this case was 0.2682. The predictions for classes 3 and 9 had an average accuracy below 0.2, with class 9 being the worst predicted by the model. Additionally, classes 20 and 21 averaged below 0.6, and classes 4, 11, 13, 16, and 18 achieved an average accuracy below 0.8. The remaining classes (12 in total) achieved accuracies above 0.8.
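The per-class accuracies reported in Fig. 5 can be computed from the true and predicted labels of the test set; a minimal sketch (the label values in the usage example are illustrative):

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in totals}
```

A class whose samples are all misclassified (as happened for classes 3 and 9 in some executions) yields an accuracy of exactly 0 under this definition.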
Fig. 5. Average test accuracy by class
Among the 10 tests conducted, the best-performing run reached a training accuracy of 0.804 and a test accuracy of 0.799. Analyzing Fig. 5 together with the averaged results, the average training accuracy of 0.772 and average test accuracy of 0.755 indicate moderate overall performance. Regarding the average accuracy per class, some classes showed extremely low accuracies, especially classes 3 and 9, which reached accuracies equal to 0 in some executions, meaning the resulting tree misclassified every sample of those classes. On the other hand, 12 of the 21 classes achieved accuracies above 0.8, which is a promising result.
V. CONCLUSIONS
This work aimed to present an approach for fault detection
and classification, evaluating its performance when applied to
the Tennessee Eastman Process. Decision trees induced by
genetic programming were used to build and train the
predictive classification model. The results of this application
were collected and analyzed.
It is important to highlight that approaches based on
decision trees can provide interpretable models, and the
application of such models in the Tennessee Eastman Process
has not been found in the previous literature. In this sense, this
work stands as one of the first to use interpretable approaches
in fault classification for the Tennessee Eastman Process
dataset.
Another point to be discussed concerns the reduction of input attributes. By default, the dataset has 30 such attributes, and in a few executions the proposed model managed to reduce this number, at best to 28 attributes. Although some tests showed a reduction, it was neither significant nor consistent.
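The number of input attributes a tree actually uses can be counted by walking it. The sketch below assumes a hypothetical encoding of trees as nested (feature, threshold, left, right) tuples with class-label leaves; this representation is illustrative, not the paper's.

```python
def features_used(tree):
    """Set of feature indices appearing in any internal test node."""
    if not isinstance(tree, tuple):  # leaf: a class label
        return set()
    feat, _threshold, left, right = tree
    return {feat} | features_used(left) | features_used(right)
```

A tree that tests only a subset of the 30 attributes has effectively performed feature selection as a by-product of its induction.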
Decision trees are, in principle, simple models to understand and interpret: their decisions are represented in an easily comprehensible hierarchical structure, which facilitates explanations to non-technical users. However, the interpretability of the trees produced by the model was hindered by their size. The trees had an average depth of 41.4 and an average of 157 nodes. In light of these results, the resulting trees are harder to interpret because of their size, which in turn reflects the complexity of the 21-class fault classification problem.
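Size metrics such as the reported average depth (41.4) and node count (157) can be measured recursively once a node representation is fixed; the sketch below again assumes a hypothetical nested-tuple encoding of (feature, threshold, left, right) internal nodes with class-label leaves.

```python
def node_count(tree):
    """Total number of nodes (internal tests plus leaves)."""
    if not isinstance(tree, tuple):  # leaf: a class label
        return 1
    _feat, _threshold, left, right = tree
    return 1 + node_count(left) + node_count(right)

def depth(tree):
    """Number of edges on the longest root-to-leaf path (a lone leaf has depth 0)."""
    if not isinstance(tree, tuple):
        return 0
    _feat, _threshold, left, right = tree
    return 1 + max(depth(left), depth(right))
```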
From the points above, it is concluded that, although the model did not achieve satisfactory results for all classes, most classes were predicted reasonably or adequately. Moreover, this is one of the first studies to apply an interpretable model to the Tennessee Eastman dataset. However, the model still needs changes and refinements to achieve better results.
For future work, optimization techniques should be applied to improve the algorithm's performance, aiming to reduce its execution time. Additionally, implementing functionalities and strategies that make the trees more interpretable and improve accuracy is crucial. The intention is to apply niching techniques, specifically fitness sharing based on Hamming distance, to increase population diversity, and to implement pruning techniques that reduce the size of the trees and make them more interpretable.
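The proposed fitness sharing can be sketched as dividing each individual's raw fitness by a niche count derived from pairwise Hamming distances over some binary signature (for example, the per-sample hit vector). The sharing radius `sigma`, the triangular sharing function, and the signatures used below are illustrative assumptions, not details given in this work.

```python
def hamming(a, b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(raw, signatures, sigma=3):
    """Divide raw fitness by a niche count so crowded niches are penalized."""
    out = []
    for i, si in enumerate(signatures):
        # Triangular sharing function: an individual at distance 0 contributes 1,
        # decaying linearly to 0 at distance sigma. Self-distance is 0, so the
        # niche count is always >= 1 and no division by zero can occur.
        niche = sum(max(0.0, 1 - hamming(si, sj) / sigma) for sj in signatures)
        out.append(raw[i] / niche)
    return out
```

Individuals clustered in the same niche see their fitness deflated, which pushes selection pressure toward under-represented regions and thereby increases population diversity.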