
ISSN: 1390-9266, e-ISSN: 1390-9134
LATIN-AMERICAN JOURNAL OF COMPUTING (LAJC), Vol XIII, Issue 1, January 2026
DOI: https://doi.org/10.33333/lajc.vol13n1.04
VI. CONCLUSION
This study compared feature-based machine-learning
models and convolutional neural networks (CNNs) for static
malware detection using a dataset of more than 200,000
Portable Executable files [1]. The experimental design
provided a unified benchmark in which both traditional
classifiers and visual deep-learning models were trained and
evaluated under identical conditions.
The results showed that tree-based ensembles,
particularly CatBoost [8], were the most accurate static
detectors in this benchmark when high-quality handcrafted
features were available, achieving 0.990 ± 0.001 ROC-AUC
and 0.947 ± 0.005 F1. However, CNNs trained directly on
byte-derived image representations achieved competitive
performance without manual feature engineering. Among the
visual models, entropy-based inputs consistently
outperformed greyscale and combined modalities, and deeper
networks such as ResNet-18 [5] and EfficientNet-B0 [6]
significantly exceeded the accuracy of the shallow
SimpleCNN. These findings indicate that entropy
visualization provides strong discriminative cues and that
model depth enhances representational capacity in
image-based malware analysis, consistent with [3],[4].
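The two byte-derived inputs compared above can be illustrated with a minimal sketch. It assumes a greyscale image built by reshaping raw bytes into rows of fixed width (in the spirit of [3]) and an entropy image in which each pixel is the Shannon entropy [13] of one non-overlapping byte window, scaled from 0 to 8 bits onto 0 to 255. The width, window size, and helper names (`bytes_to_greyscale`, `window_entropy`, `bytes_to_entropy_image`) are hypothetical choices for illustration, not the preprocessing used in this study.

```python
import numpy as np

def bytes_to_greyscale(data: bytes, width: int = 256) -> np.ndarray:
    """Reshape raw bytes into a 2-D uint8 image, zero-padding the final row."""
    arr = np.frombuffer(data, dtype=np.uint8)
    height = max(1, -(-arr.size // width))  # ceiling division, at least one row
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: arr.size] = arr
    return padded.reshape(height, width)

def window_entropy(chunk: np.ndarray) -> float:
    """Shannon entropy of one byte window in bits per byte (0 for constant data, 8 at most)."""
    if chunk.size == 0:
        return 0.0
    counts = np.bincount(chunk, minlength=256)
    p = counts[counts > 0] / chunk.size
    return float((p * np.log2(1.0 / p)).sum())

def bytes_to_entropy_image(data: bytes, window: int = 256, width: int = 64) -> np.ndarray:
    """One pixel per non-overlapping byte window; pixel = entropy scaled from [0, 8] to [0, 255]."""
    arr = np.frombuffer(data, dtype=np.uint8)
    n_windows = max(1, -(-arr.size // window))
    values = np.array([window_entropy(arr[i * window:(i + 1) * window])
                       for i in range(n_windows)])
    pixels = np.round(values / 8.0 * 255).astype(np.uint8)
    height = -(-n_windows // width)  # ceiling division over pixels per row
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:n_windows] = pixels
    return padded.reshape(height, width)
```

Constant regions (for example zero padding) map to dark pixels and packed or encrypted regions to bright ones. In practice both arrays would still need resizing to the fixed input resolution a CNN such as ResNet-18 expects; that step is omitted here.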
The comparative analysis revealed a gap of approximately
0.036 ROC-AUC between the best feature-based model and the
best CNN, suggesting that vision-driven approaches can serve
as scalable, automated alternatives when handcrafted
features are unavailable or costly to compute.
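Figures such as 0.990 ± 0.001 ROC-AUC are per-run scores summarized as mean ± standard deviation, and the gap is the difference between the best mean of each model family; with the reported CatBoost mean of 0.990, a gap of about 0.036 corresponds to a best CNN ROC-AUC of roughly 0.954. The sketch below shows one way to compute such a summary. It assumes scores are collected per fold or seed, a 0.5 decision threshold for F1, and a sample standard deviation (ddof=1); these choices and the helper names are illustrative assumptions, not the study's exact protocol.

```python
import numpy as np

def roc_auc(y_true, scores) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def f1_at(y_true, scores, threshold: float = 0.5) -> float:
    """F1 after thresholding scores (the 0.5 default is an assumption)."""
    y = np.asarray(y_true, dtype=bool)
    pred = np.asarray(scores, dtype=float) >= threshold
    tp = int(np.sum(pred & y))
    fp = int(np.sum(pred & ~y))
    fn = int(np.sum(~pred & y))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mean_std(values):
    """Mean and sample standard deviation (ddof=1) across folds or seeds."""
    v = np.asarray(values, dtype=float)
    std = float(v.std(ddof=1)) if v.size > 1 else 0.0
    return float(v.mean()), std

def report(fold_labels, fold_scores) -> str:
    """Format per-fold ROC-AUC and F1 as 'mean ± std'."""
    aucs = [roc_auc(y, s) for y, s in zip(fold_labels, fold_scores)]
    f1s = [f1_at(y, s) for y, s in zip(fold_labels, fold_scores)]
    (ma, sa), (mf, sf) = mean_std(aucs), mean_std(f1s)
    return f"ROC-AUC {ma:.3f} ± {sa:.3f}, F1 {mf:.3f} ± {sf:.3f}"
```

The pairwise formulation is quadratic in the number of samples and is meant only to make the definition explicit; for 200,000 files a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.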
Future work should explore hybrid static–dynamic
pipelines that combine image-based deep learning with
lightweight engineered features to further improve robustness
against obfuscation and dataset drift [14]. Benchmarking
computational efficiency on uniform hardware and
evaluating on realistic, imbalanced data distributions would
also strengthen the practical applicability of visual malware
detection methods.
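Why evaluation on imbalanced distributions matters can be seen from Bayes' rule: at a fixed true-positive rate (TPR) and false-positive rate (FPR), precision depends on the malware base rate π, since precision = TPR·π / (TPR·π + FPR·(1 − π)). The sketch below assumes an illustrative operating point (TPR = 0.95, FPR = 0.01) that is not taken from this study's results, and assumes those rates carry over unchanged to the new distribution; `precision_at_prevalence` is a hypothetical helper.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Expected precision (PPV) of a detector with fixed TPR and FPR at a given malware base rate.

    Bayes' rule: PPV = TPR*pi / (TPR*pi + FPR*(1 - pi)). Assumes TPR and FPR
    transfer unchanged from the evaluation set, which is itself an assumption.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    total = true_pos + false_pos
    return true_pos / total if total > 0 else 0.0


# Hypothetical operating point (not a result from this study): TPR 0.95, FPR 0.01.
for pi in (0.5, 0.1, 0.01):
    print(f"prevalence {pi:>5}: precision {precision_at_prevalence(0.95, 0.01, pi):.3f}")
```

Under these assumed rates, precision falls from about 0.99 at a 50% malware share to about 0.91 at 10% and about 0.49 at 1%, so a detector that looks near-perfect on a balanced split can produce roughly one false alarm per true detection at realistic base rates.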
The limited improvement from the combined greyscale–entropy
representation further highlights the challenge of naive
multimodal fusion. Although the two inputs differ visually,
their spatially correlated content may weaken the
discriminative gradient signal when both are processed
jointly. This suggests that future architectures should
employ attention-based fusion or late-stage embedding
integration to better exploit complementary features without
introducing redundancy.
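The two fusion strategies named above can be contrasted in a small sketch. It assumes each modality (greyscale, entropy) has its own CNN branch producing a fixed-length embedding, and applies modality-level additive attention: each embedding is scored with tanh(W·e)·v, the scores are softmax-normalized, and the fused vector is the weighted sum. Plain concatenation is shown as the simplest late-stage alternative. The parameters `W` and `v` would be learned end to end in practice; here they and the embeddings are random stand-ins, and none of this is the architecture evaluated in this study.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()

def attention_fuse(embeddings: np.ndarray, W: np.ndarray, v: np.ndarray):
    """Modality-level additive attention over per-branch embeddings.

    embeddings: (M, d), one row per input branch (e.g. greyscale CNN, entropy CNN)
    W: (h, d) and v: (h,) are scoring parameters that would be learned in training.
    Returns the fused (d,) vector and the (M,) attention weights.
    """
    scores = np.tanh(embeddings @ W.T) @ v  # one scalar score per modality
    weights = softmax(scores)               # non-negative, sums to 1
    fused = weights @ embeddings            # convex combination of the embeddings
    return fused, weights

def late_concat(embeddings: np.ndarray) -> np.ndarray:
    """Plain late-stage integration: concatenate branch embeddings for a downstream classifier."""
    return embeddings.reshape(-1)

# Illustrative usage with random stand-ins for trained embeddings and parameters.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 8))  # hypothetical 8-d embeddings from two branches
W = rng.normal(size=(4, 8))
v = rng.normal(size=4)
fused, weights = attention_fuse(emb, W, v)
```

Because the weights are computed from the embeddings themselves, the model can down-weight a redundant branch sample by sample, whereas fusing the raw images at the input fixes how the modalities mix before any features are learned.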
FUNDING STATEMENT
This research received no external funding. All
computational work and analysis were conducted using
personal and institutional resources without third-party
support.
ACKNOWLEDGMENT
The author thanks Dr. Shahrzad Zargari of Sheffield
Hallam University for supervision and constructive feedback
throughout the research project.
AUTHOR CONTRIBUTIONS
Harry Darton: Conceptualization, Methodology, Data
Curation, Formal Analysis, Writing – Original Draft, Writing
– Review & Editing.
REFERENCES
[1] M. Lester, “PE malware machine learning dataset,” Practical
Security Analytics, 2021. [Data set]. [Online]. Available:
https://practicalsecurityanalytics.com/pe-malware-machine-learning-dataset/
[2] H. S. Anderson and P. Roth, “EMBER: An open dataset for training
static PE malware machine learning models,” arXiv preprint, 2018.
[Online]. Available: https://arxiv.org/abs/1804.04637
[3] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, “Malware
images: Visualization and automatic classification,” in Proc. 8th Int.
Symp. Visualization for Cyber Security (VizSec 2011), pp. 1–7, ACM,
2011. doi: 10.1145/2016904.2016908
[4] K. S. Han, J. H. Lim, B. Kang, and E. G. Im, “Malware analysis using
visualized images and entropy graphs,” Int. J. Inf. Security, vol. 14,
no. 1, pp. 1–14, 2015. doi: 10.1007/s10207-014-0242-0
[5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proc. IEEE Conf. Computer Vision and Pattern
Recognition (CVPR 2016), pp. 770–778, 2016. doi:
10.1109/CVPR.2016.90
[6] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for
convolutional neural networks,” in Proc. 36th Int. Conf. Machine
Learning (ICML 2019), vol. 97, pp. 6105–6114, PMLR, 2019.
[Online]. Available: https://proceedings.mlr.press/v97/tan19a.html
[7] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,”
in Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and
Data Mining (KDD ’16), pp. 785–794, ACM, 2016. doi:
10.1145/2939672.2939785
[8] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A.
Gulin, “CatBoost: Unbiased boosting with categorical features,” in
Proc. 32nd Int. Conf. Neural Information Processing Systems (NeurIPS
2018), pp. 6639–6649, 2018. [Online]. Available:
https://proceedings.neurips.cc/paper_files/paper/2018/hash/14491b756b3a51daac41c24863285549-Abstract.html
[9] A. Bensaoud, N. Abudawaood, and J. Kalita, “Classifying malware
images with convolutional neural network models,” arXiv preprint,
2020. [Online]. Available: https://arxiv.org/abs/2010.16108
[10] AV-TEST Institute, “Malware statistics & trends report,” 2024.
[Online]. Available: https://www.av-test.org/en/statistics/malware/
[11] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol.
521, no. 7553, pp. 436–444, 2015. doi: 10.1038/nature14539
[12] M. Kalash et al., “Malware classification with deep convolutional
neural networks,” in Proc. 10th Int. Conf. New Technologies, Mobility
and Security (NTMS), pp. 1–5, IEEE, 2018. doi:
10.1109/NTMS.2018.8328749
[13] C. E. Shannon, “A mathematical theory of communication,” Bell Syst.
Tech. J., vol. 27, no. 3, pp. 379–423, 1948. doi:
10.1002/j.1538-7305.1948.tb01338.x
[14] M. Brosolo and M. Conti, “The road less travelled: Investigating
robustness and explainability in CNN malware detection,” arXiv
preprint, 2025. doi: 10.48550/arXiv.2503.01391
[15] B. Al-Masri, N. Bakir, A. El-Zaart, and K. Samrouth, “Dual
convolutional malware network (DCMN): An image-based malware
classification using dual convolutional neural networks,” Electronics,
vol. 13, no. 18, p. 3607, 2024. doi: 10.3390/electronics13183607
[16] J. Saxe and K. Berlin, “Deep neural network based malware detection
using two-dimensional binary program features,” arXiv preprint, 2015.
[Online]. Available: https://arxiv.org/abs/1508.03096
[17] E. Raff et al., “Malware detection by eating a whole EXE,” arXiv
preprint, 2017. [Online]. Available: https://arxiv.org/abs/1710.09435
Note on the Use of Artificial Intelligence (AI):
AI tools were used only for minor language editing and
reference formatting. All methodological design, analysis,
data interpretation, and writing decisions were performed by
the author.