Hybrid CNN-Transformer Model for Severity Classification of Multi-organ Damage in Long COVID Patients
Abstract
The global spread of COVID-19 has necessitated rapid and accurate diagnostic procedures to support clinical decision-making, particularly in resource-limited environments. In this work, a hybrid deep learning model combining a Convolutional Neural Network (CNN) with a Transformer architecture is proposed to classify chest X-ray images from the COVIDx CXR-3 dataset into three severity levels: Mild, Moderate, and Severe. The methodology incorporates data preprocessing steps such as resizing, normalization, augmentation, and SimpleITK-based organ segmentation. A DenseNet121-based CNN extracts local features, while a Vision Transformer captures global dependencies; the features from both branches are fused and passed to a classification head to generate predictions. Training was performed in PyTorch with a learning rate of 0.0001, a batch size of 32, and the Adam optimizer for 50 epochs. Accuracy, Precision, Recall, F1-score, and the Confusion Matrix were computed to assess performance. Results show that the hybrid CNN-Transformer model outperforms the CNN-only model, which achieved 88% accuracy. This integration demonstrates stronger severity-classification capability and considerable potential for helping clinicians prioritize care, optimize treatment plans, and allocate resources, thereby improving outcomes in COVID-19 management.
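As a rough illustration of the fusion pipeline described in the abstract, the sketch below shows one plausible PyTorch formulation: a DenseNet121 feature extractor for local patterns, a small ViT-style encoder built from nn.TransformerEncoder (a stand-in for the Vision Transformer used in the paper) for global context, concatenation of the two feature vectors, and a three-way classification head trained with Adam at a learning rate of 0.0001. The embedding size, encoder depth, and dropout rate are illustrative assumptions rather than the authors' reported configuration, and the snippet assumes a recent PyTorch/torchvision release.

```python
# Minimal sketch (not the authors' exact code) of a hybrid CNN-Transformer
# severity classifier: DenseNet121 supplies local features, a small ViT-style
# encoder supplies global context, and the fused vector feeds a 3-way head.
import torch
import torch.nn as nn
from torchvision import models


class HybridCovidSeverityNet(nn.Module):
    def __init__(self, num_classes=3, embed_dim=256, depth=4, num_heads=8,
                 img_size=224, patch_size=16):
        super().__init__()
        # CNN branch: DenseNet121 feature extractor (local patterns).
        self.cnn = models.densenet121(weights=None).features   # (B, 1024, 7, 7)
        self.cnn_pool = nn.AdaptiveAvgPool2d(1)                 # -> (B, 1024, 1, 1)

        # Transformer branch: ViT-style encoder (global dependencies).
        num_patches = (img_size // patch_size) ** 2
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                                     stride=patch_size)         # -> (B, D, 14, 14)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)

        # Fusion + classification head (Mild / Moderate / Severe).
        self.head = nn.Sequential(
            nn.LayerNorm(1024 + embed_dim),
            nn.Linear(1024 + embed_dim, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        # Local features from the CNN branch.
        local_feat = self.cnn_pool(self.cnn(x)).flatten(1)         # (B, 1024)

        # Global features from the Transformer branch (class-token output).
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, patches], dim=1) + self.pos_embed
        global_feat = self.encoder(tokens)[:, 0]                   # (B, D)

        # Fuse both views and classify.
        return self.head(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    # Training setup mirroring the reported hyperparameters:
    # Adam, learning rate 1e-4, batch size 32, cross-entropy loss (50 epochs).
    model = HybridCovidSeverityNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    dummy_batch = torch.randn(32, 3, 224, 224)   # stand-in for preprocessed CXRs
    dummy_labels = torch.randint(0, 3, (32,))    # 0=Mild, 1=Moderate, 2=Severe

    optimizer.zero_grad()
    logits = model(dummy_batch)
    loss = criterion(logits, dummy_labels)
    loss.backward()
    optimizer.step()
    print(logits.shape, loss.item())
```

Concatenation-based fusion is one simple design choice here: it keeps the local and global feature vectors intact and lets the classification head learn how to weight the two views of the same radiograph.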
Copyright Notice
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
Disclaimer
In no event shall LAJC be liable for any direct, indirect, incidental, punitive, or consequential copyright infringement claims related to articles that have been submitted for evaluation or published in any issue of this journal. Find out more in our Disclaimer Notice.