Sentiment and Linguistic Analysis of Epidemic Outbreak Data from Official and Alternative Sources

Keywords: epidemic outbreaks, sentiment analysis, text mining, epidemiological surveillance, public communication.

Abstract

Information on epidemic outbreaks is a key input for health surveillance because it supports assessment of both the spread of disease and the associated social perception. This study examines emotional and linguistic patterns in narratives disseminated by international organizations (WHO, UN, CDC) and digital platforms (Google News and Reddit) over a three-month period. The KDD process (selection, preprocessing, transformation, modeling, and evaluation) was applied in RStudio, using the Bing and NRC lexicons and a supervised Naive Bayes model to improve the detection of emotional nuances. A total of 12,340 texts (3,100 from official sources, 4,240 from Google News, and 5,000 from Reddit), retrieved with standardized English-language queries (pandemic, confinement, epidemic, and HMPV), were analyzed. Official sources showed a greater presence of positive emotions linked to cooperation and security; Google News concentrated negative narratives, with terms such as "risk" and "dangerous"; Reddit combined fear and sadness with expressions of hope. The analysis included t-tests and ANOVA with 95% confidence intervals. The work is exploratory and preliminary, and it suggests that surveillance systems should integrate the monitoring of social networks and digital media, together with public policy measures to improve communication during health crises.

DOI

Accepted
2025-07-23
How to Cite: Ordoñez Guerrero, K., Cordero Bazurto, J., Brito Casanova, G., & Samaniego Mena, E. (2026). Sentiment and Linguistic Analysis of Epidemic Outbreak Data from Official and Alternative Sources. In Latin-American Journal of Computing (Vol. 13, No. 1). Escuela Politécnica Nacional.
Section
Research Articles for the Next Issue (Early Access)