Sofija Marković
Inferring COVID-19 severity determinants by combining epidemiological modeling and machine learning
Abstract
Over the past year and a half, we have witnessed the great challenges that the
SARS-CoV-2 pandemic has brought upon health systems and economies in most parts
of the world. Despite strict disease control measures and vaccination efforts,
over 4.5 million lives have so far been lost to COVID-19. Determining the most
relevant factors that influence the severity of the disease is crucial for a
better understanding of epidemiological risks and for prioritizing resources
towards more endangered countries/regions.
countries/regions. We here propose a combination of epidemiological (non-linear dynamics) and ecological (regression based) models aimed to assess main determinants of COVID-19 severity. Instead of commonly used severity measures, fatality counts and CFR, we introduce a measure inferred from the SPEIRD disease dynamics model which we previously developed. Our measure, m/r corresponds to the ratio of mortality and recovery rates, and can be used to determine socio-demographic, environmental and health parameters related to the severity of the disease. In our research, we used Principal Component Analysis to partially decorrelate and reduce initially large number of potential risk factors, combined with regularization-based feature selection methods, Lasso and Elastic Net and two nonparametric machine learning algorithms, Random Forest and Gradient Boost. Combining multiple feature selection methods allowed us to robustly (i.e., through multiple independent regression-based techniques) obtain most relevant COVID-19 severity determinants for 51 US states and territories. Our results identify the prevalence of chronic diseases (mainly cardiovascular diseases and cancer) in the population, as well as chronic pollution exposure, as the most important predictors of COVID-19 severity. Other predictors identified to significantly affect severity were percentage of youth population, racial structure, and population density. In conclusion, without a prior bias towards known clinical determinants of COVID-19 severity, by combining epidemiological modeling and machine learning based multivariate analyses, our model successfully recovered dependencies known from previous clinical studies (e.g. prevalence of chronic diseases was known to increase severity, while higher percentage of youth population presents lower risk), while also revealing additional risk factors, such as population density and chronic exposure to air pollution. |