Study on the mortality risk and predictive model for COVID-19 inpatients with pneumonia manifestations
- Academic Editor: Faiza Farhan
- Subject Areas: Epidemiology, Public Health, COVID-19
- Keywords: COVID-19, In-patient, Mortality risk, Machine learning, Prediction model
- Copyright: © 2026 Li et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits using, remixing, and building upon the work non-commercially, as long as it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
- Cite this article: 2026. Study on the mortality risk and predictive model for COVID-19 inpatients with pneumonia manifestations. PeerJ 14:e20795 https://doi.org/10.7717/peerj.20795
Abstract
Background
In 2020, COVID-19 posed a major threat to global public health in a remarkably short period. Although the WHO declared an end to the emergency phase in May 2023, a considerable proportion of recovered cases experience medium- and long-term effects, which pose ongoing health challenges to society. Therefore, it remains necessary to conduct relevant research in the post-epidemic era to explore the risk factors for death in COVID-19 inpatients.
Methods
We determined the mortality of COVID-19 inpatients with pneumonia manifestations through one-year follow-up, utilizing real-world data from three medical centers. Clinical characteristics associated with mortality risk were analyzed by logistic regression. Then, the dataset was randomly partitioned into three sets at a ratio of 4:2:4. Three machine learning algorithms were employed to develop and validate a mortality risk predictive model for COVID-19 inpatients, and a web-based visualization tool was created.
Results
There were 100 fatalities among the 1,693 samples included in this study. Meanwhile, we identified 37 factors correlated with increased mortality risk in COVID-19 inpatients with pneumonia manifestations. Ultimately, we developed a mortality risk predictive model using the random forest algorithm, which demonstrated superior predictive performance (AUC=0.907, 95% CI=0.849-0.957).
Conclusions
This study reports a mortality rate of 5.9% for COVID-19 inpatients with pneumonia manifestations. The high-performance mortality risk prediction model obtained in this study provides important practical guidance for monitoring the mortality risks of COVID-19 inpatients with pneumonia manifestations.
Introduction
A novel coronavirus named SARS-CoV-2 (which causes COVID-19) swept the world in 2020, posing a significant global public health threat in a short period of time due to its strong transmissibility (Wang et al., 2020; Wu, Leung & Leung, 2020; Deng & Peng, 2020). As of February 4, 2024, there have been over 774 million confirmed cases of COVID-19 and more than 7 million deaths reported worldwide (WHO, 2024). In certain countries, the fatality rate may exceed 20% (Sorci, Faivre & Morand, 2020).
Epidemiological data suggest that SARS-CoV-2 is primarily transmitted through short-range airborne aerosols, respiratory droplets, and both direct and indirect contact with surfaces contaminated by infectious respiratory droplets (Chan et al., 2020; Ong et al., 2020; Lednickya et al., 2020; Toubiana et al., 2020; Cheng et al., 2020; Riddell et al., 2020). Individuals can exhibit a range of clinical manifestations following infection, which may include asymptomatic carriers, pneumonia, an exaggerated inflammatory response, respiratory failure, and acute respiratory distress syndrome (ARDS) (Yang et al., 2020; Guan et al., 2020; Grasselli et al., 2020; Mehta et al., 2020). Among these, pneumonia is the most prevalent clinical manifestation observed in hospitalized patients (George et al., 2020). At the early stages of the epidemic, COVID-19 was mainly manifested as an acute respiratory illness, which may present with varying degrees of acute upper or lower respiratory tract syndromes. Research has indicated that certain patients with COVID-19, particularly among the elderly and immunocompromised populations, are at risk of developing severe pneumonia (Jartti et al., 2011). Pneumonia represents a more serious complication following infection with the SARS-CoV-2 virus; it can result in impaired oxygen exchange, respiratory failure, and potentially multiple organ failure, ultimately leading to death (Chen et al., 2020).
Despite the World Health Organization officially declaring the cessation of the emergency phase of COVID-19 in May 2023, it is estimated that approximately 31% to 69% of individuals who have recovered from the initial infection may experience effects from a range of persistent or novel symptoms in the medium to long term (Tenforde et al., 2020; Su et al., 2022; Ballering et al., 2022). These effects can severely reduce patients’ quality of life long after they are free of the virus and pose global health challenges to society (Koc et al., 2022; Briggs & Vassall, 2021). COVID-19 has been proven to be a challenging disease, and while multiple organ failure and death are rare in mild, self-limiting respiratory presentations of COVID-19, it can present as severe progressive pneumonia. Therefore, in the post-pandemic era, it is still necessary to conduct relevant research on previous cases and explore the impact of related risk factors on the mortality risk of COVID-19 pneumonia cases.
In view of the serious consequences that pneumonia can have for COVID-19 patients, and the lack of research focusing on SARS-CoV-2 infection presenting with pneumonia manifestations, this paper focuses on cases of SARS-CoV-2 infection with inflammatory lung manifestations (also known as COVID-19 pneumonia). This study aims to explore the risk factors for death in such cases, in order to provide a scientific basis for the clinical treatment of relevant populations in the post-epidemic era, thereby reducing the mortality rate of COVID-19 pneumonia.
Materials & Methods
Study objects
Through an electronic medical record system, we collected 2,283 cases of COVID-19 (caused by the SARS-CoV-2 variant B.1.1.529) that presented with pneumonia manifestations among patients hospitalized in three hospitals in Guangdong Province between October 2022 and February 2023 (1,007 cases from Guangzhou Chest Hospital, 764 cases from Shenzhen Longhua District Central Hospital and 512 cases from Dongguan Binhaiwan Central Hospital), after obtaining their informed consent. Through the same system, we collected the basic information, symptoms (including fever, cough, shortness of breath, dyspnea), complications (including severe pneumonia and respiratory failure), and blood test results of the cases. Participants were followed up for one year to confirm their survival status.
After excluding patients who were lost to follow-up or had incomplete blood test results (see Fig. 1), 1,693 cases were eventually included for analysis. All cases were confirmed by nucleic acid testing and chest computed tomography (CT) examination.
Figure 1: Flowchart of the study.
The dataset was randomly split into a training set, validation set, and test set at a ratio of 4:2:4.
All methods of this study were carried out in accordance with the Declaration of Helsinki. Written informed consent was obtained from all study subjects, and the study was approved by the institutional review board of Guangzhou Medical University (NO. 202404023).
Sample size
The sample size was calculated according to the following formula:
$$n = \frac{Z_{1-\alpha/2}^{2}\, p\,(1-p)}{\delta^{2}}$$
where p is the estimated mortality rate of hospitalized patients with COVID-19, α is the probability of a type I error ($Z_{1-\alpha/2}$ = 1.96 when α is set to 0.05), and δ is the allowable error.
According to research reports, the global mortality rate of inpatients with COVID-19 (p) was 16% (Baptista et al., 2023), and δ was set to 0.2p. Therefore, a sample size of 505 was required for the study. However, considering a 10% loss to follow-up rate, the minimum sample size required for the study was 556. In this study, a total of 1,693 samples were included, which could meet the requirements of statistical power.
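The arithmetic behind these numbers can be checked with a short sketch. It assumes the standard single-proportion formula n = Z²·p·(1 − p)/δ², which reproduces the reported value of 505, and it assumes that the 10% allowance for loss to follow-up was applied by multiplying by 1.10, which reproduces 556. The function name is illustrative, and the code is Python rather than the R used in the study.

```python
import math

def sample_size_for_proportion(p: float, rel_error: float, z: float = 1.96) -> int:
    """Sample size for estimating a proportion p with allowable error
    delta = rel_error * p, rounded up to the next whole subject."""
    delta = rel_error * p
    n = (z ** 2) * p * (1 - p) / (delta ** 2)
    return math.ceil(n)

# p = 16% inpatient mortality, delta = 0.2p, Z = 1.96 (values from the text).
base_n = sample_size_for_proportion(p=0.16, rel_error=0.2)

# Assumed adjustment: inflate by 10% for expected loss to follow-up.
adjusted_n = math.ceil(base_n * 1.10)

print(base_n, adjusted_n)
```

Both values sit well below the 1,693 cases actually analyzed.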
Modeling by machine learning
We constructed a mortality risk prediction model using machine learning (ML) algorithms (see Fig. 1). First, multivariate logistic regression analysis was used for a preliminary screening of the model’s candidate features. Second, the dataset was randomly partitioned into training, validation, and test sets at a ratio of 4:2:4 using the caret package (Version: 7.0-1) (Zou et al., 2023; Li et al., 2025). Third, the Synthetic Minority Over-sampling Technique (SMOTE) (Yang et al., 2022; Hao, Wang & Bryant, 2014), implemented in the smotefamily package (Version 1.4.0), was applied to the training set for resampling. The oversampling multiple was set to 15, bringing the class ratio close to 1:1, which can effectively balance recall and precision and aligns with the practical needs of clinical decision-making. The number of nearest neighbors (k) was set to 3, a commonly used value, and no additional parameter adjustments were made. Fourth, the h2o package (Version: 3.44.0.3) was used to construct and evaluate the ML models: three algorithms, namely random forest (RF), gradient boosting machine (GBM), and elastic net (EN), were used to develop the mortality risk prediction model on the resampled training set (Hu & Szymczak, 2023; Natekin & Knoll, 2013; Hong, Chen & Harris, 2013; López-Blanco & Chacón, 2016). The auto-validation function of the h2o package was then used to perform five-fold cross-validation on the validation set to tune the hyperparameters of the models (the tuning protocol is shown in Table S5) and thereby improve their performance. Finally, the predictive performance of the models was assessed on the test set, and the randomForest package (Version: 4.7-1.2) was used to output variable importance.
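The partitioning and resampling steps can be illustrated with a minimal, standard-library sketch. It is not the authors' R code: `split_424` is a plain shuffled split (any stratification performed by the original tooling is not modeled), and `smote` is a simplified version of the technique that creates a fixed number of synthetic points per minority case by interpolating toward one of its k nearest minority neighbours. The function names and the toy data are hypothetical.

```python
import math
import random

def split_424(n_rows, seed=0):
    """Shuffle row indices and cut them 40% / 20% / 40%."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_train = round(0.4 * n_rows)
    n_valid = round(0.2 * n_rows)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

def smote(minority, k=3, multiple=15, seed=0):
    """Create `multiple` synthetic points per minority row by interpolating
    toward a randomly chosen one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(multiple):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Split sizes for the 1,693 analyzed cases.
train, valid, test = split_424(1693)
print(len(train), len(valid), len(test))

# Toy minority class (hypothetical feature values, not study data).
toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(toy_minority, k=3, multiple=15)
print(len(new_points))
```

Resampling only the training indices, as the text describes, keeps synthetic cases out of the validation and test sets, so the reported test metrics reflect real patients only.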
Statistical analysis
Statistical methods for variable screening and model construction are described in the preceding section. Quantitative data that conformed to a normal or approximately normal distribution were presented as mean ± standard deviation (x̄ ± S), with independent-samples t-tests used for comparisons between two groups. Otherwise, results were expressed as median (M) and percentiles (P25, P75) and compared between groups using the rank sum test. Qualitative data were presented as percentages (%) and compared between groups using the Chi-square test or Fisher’s exact test. Receiver operating characteristic (ROC) curves were plotted using the h2o package (Version: 3.44.0.3) and pROC package (Version: 1.19.0.1), and the area under the curve (AUC), sensitivity, specificity, and their respective 95% confidence intervals (95% CIs) were calculated to evaluate the performance of the models. The higher the values of these indicators, the better the model performance. The significance level was set at 0.05, and differences were deemed statistically significant when P < 0.05.
All data analyses were performed in R Project for Statistical Computing (R software, Version 4.4.1; R Core Team, 2024).
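For readers less familiar with the evaluation metrics named above, the following sketch shows how AUC, sensitivity, and specificity are defined. It is written in Python with the standard library only and is not the h2o or pROC implementation; the labels, scores, and the 0.5 threshold are hypothetical. AUC is computed as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case has the higher score; ties contribute 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """Classify as positive when score >= threshold, then compute
    sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outcomes (1 = died) and predicted risks.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.6, 0.1, 0.8]

model_auc = auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(round(model_auc, 3), round(sens, 3), round(spec, 3))
```

Unlike sensitivity and specificity, AUC does not depend on the chosen threshold, which is why it is usually reported alongside them.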
Results
Demographic characteristics
A total of 1,693 subjects were included in this study (see the flow chart in Fig. 1). As shown in Table 1, there were 100 fatalities among the included cases, corresponding to a mortality rate of 5.9% for COVID-19 inpatients with pneumonia manifestations. The mean age of the participants was 59.1 ± 22.0 years, and more than half (53.0%) were aged 60 years or older. Males comprised a larger percentage than females (58.8% vs. 41.2%), and the majority of subjects presented with non-severe pneumonia (89.0%).
| | Total (N = 1693) | G Hospital (N = 845) | S Hospital (N = 531) | D Hospital (N = 317) | P* |
|---|---|---|---|---|---|
| Age (year), mean ± SD | 59.1 ± 22.0 | 57.1 ± 21.7 | 57.7 ± 24.1 | 66.9 ± 16.8 | <0.001 |
| Age group (year) | | | | | <0.001 |
| <60 | 796 (47.0%) | 447 (52.9%) | 254 (47.8%) | 95 (30.0%) | |
| ≥60 | 897 (53.0%) | 398 (47.1%) | 277 (52.2%) | 222 (70.0%) | |
| Sex | | | | | 0.995 |
| Female | 698 (41.2%) | 349 (41.3%) | 218 (41.1%) | 131 (41.3%) | |
| Male | 995 (58.8%) | 496 (58.7%) | 313 (58.9%) | 186 (58.7%) | |
| Severe pneumonia | | | | | <0.001 |
| No | 1506 (89.0%) | 790 (93.5%) | 464 (87.4%) | 252 (79.5%) | |
| Yes | 187 (11.0%) | 55 (6.5%) | 67 (12.6%) | 65 (20.5%) | |
| Status | | | | | 0.452 |
| Alive | 1593 (94.1%) | 801 (94.8%) | 497 (93.6%) | 295 (93.1%) | |
| Dead | 100 (5.9%) | 44 (5.2%) | 34 (6.4%) | 22 (6.9%) | |
Notes:
- G Hospital: Guangzhou Chest Hospital
- S Hospital: Shenzhen Longhua District Central Hospital
- D Hospital: Dongguan Binhaiwan Central Hospital
Symptoms and comorbidities of COVID-19 such as dyspnea and severe pneumonia are associated with mortality risk
We analyzed the risk factors associated with mortality in COVID-19 inpatients with pneumonia manifestations after adjusting for age and sex, exploring the relationship between symptoms, comorbidities, and blood test indicators and patients’ mortality risk. As detailed in Tables S1–S4, we identified 37 factors significantly associated with increased mortality risk in COVID-19 inpatients (all P < 0.05). These include age ≥60 years (adjusted OR = 7.64, 95% CI [4.23–15.26]), shortness of breath (adjusted OR = 2.03, 95% CI [1.33–3.11]), dyspnea (adjusted OR = 3.70, 95% CI [2.21–6.05]), severe pneumonia (adjusted OR = 20.04, 95% CI [12.45–32.96]), respiratory failure (adjusted OR = 11.60, 95% CI [7.36–18.49]), coagulation dysfunction (adjusted OR = 5.62, 95% CI [1.87–15.37]), platelet abnormalities (PLT, adjusted OR = 1.87, 95% CI [1.21–2.87]), D-dimer abnormalities (adjusted OR = 7.72, 95% CI [4.62–13.52]), abnormal procalcitonin levels (PCT, adjusted OR = 4.75, 95% CI [2.96–7.88]), and elevated lactate dehydrogenase levels (LDH, adjusted OR = 4.65, 95% CI [2.96–7.49]).
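As background on how such estimates arise, an adjusted odds ratio from a logistic regression is the exponential of the fitted coefficient for that factor, and a Wald-type 95% interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below illustrates this conversion in Python. The coefficient and standard error are hypothetical, since the source reports only the odds ratios and intervals, and the reported intervals may have been computed with a different method (for example, profile likelihood).

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient (log odds ratio of 2.0) and standard error.
or_, ci_low, ci_high = odds_ratio_with_ci(beta=math.log(2.0), se=0.2)
print(f"OR = {or_:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is the pattern seen in the intervals reported above.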
Construction and validation of mortality risk prediction model
Using the 37 factors identified above as influencing the risk of mortality in COVID-19 inpatients with pneumonia manifestations, we employed three machine learning (ML) algorithms (RF, GBM, and EN) to construct a predictive model for mortality risk. The parameter values for these models are detailed in Table S5. As illustrated in Fig. 2 and Table 2, the AUC, sensitivity, specificity, and corresponding 95% confidence intervals (CIs) for the test set were as follows: RF: AUC = 0.907 (0.849–0.957), sensitivity = 0.875 (0.773–0.978), specificity = 0.870 (0.844–0.896); GBM: AUC = 0.898 (0.840–0.956), sensitivity = 0.850 (0.739–0.961), specificity = 0.856 (0.828–0.883); and EN: AUC = 0.896 (0.825–0.967), sensitivity = 0.725 (0.587–0.863), specificity = 0.911 (0.888–0.933).
The larger the area under the curve (AUC), sensitivity, and specificity, the better the predictive performance of the model. Based on a comprehensive assessment of these three metrics, the RF model demonstrated superior performance, with the highest AUC value and relatively high sensitivity and specificity. Consequently, it was selected as the primary mortality risk prediction model for this study. The key features contributing to this model are depicted in Fig. 3, which shows that characteristics such as severe pneumonia were of great importance to the mortality risk in COVID-19 inpatients.
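The width of the sensitivity intervals reflects the small number of deaths in the test set. As an illustration, the sketch below computes Wald (normal-approximation) intervals for a proportion in Python. The counts are assumptions, not values reported in the source: 35, 34, and 29 correctly identified deaths out of 40 are simply counts that give the reported point estimates, and the resulting intervals agree with the reported ones to within about 0.001. Whether the authors used this interval method is not stated.

```python
import math

def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval,
    clipped to the [0, 1] range."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)

# Assumed counts of correctly predicted deaths out of 40 test-set deaths.
assumed_true_positives = {"RF": 35, "GBM": 34, "EN": 29}

results = {name: wald_ci(tp, 40) for name, tp in assumed_true_positives.items()}
for name, (p, low, high) in results.items():
    print(f"{name}: sensitivity = {p:.3f} ({low:.3f}-{high:.3f})")
```

With only about 40 events, a difference of one or two correctly classified deaths shifts sensitivity by 2.5 to 5 percentage points, so the overlap between the three models' intervals is expected.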
Figure 2: (A–C) ROC curves of three machine learning mortality risk prediction models on the test set.
| Model | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
|---|---|---|---|
| RF | 0.875 (0.773–0.978) | 0.870 (0.844–0.896) | 0.907 (0.849–0.957) |
| GBM | 0.850 (0.739–0.961) | 0.856 (0.828–0.883) | 0.898 (0.840–0.956) |
| EN | 0.725 (0.587–0.863) | 0.911 (0.888–0.933) | 0.896 (0.825–0.967) |
Notes:
- RF: Random Forest
- GBM: Gradient Boosting Machine
- EN: Elastic Net
- AUC: Area under the curve
- 95% CI: 95% confidence interval
Figure 3: Top 15 variables ranked by importance in the random forest model.
Web-based model visualization
To enhance the practical application of the model, we developed a web-based visual interaction tool for the mortality risk prediction model using the Shiny package in R software, based on the key features illustrated in Fig. 3. As depicted in Fig. 4, users can access and utilize this tool for risk prediction via the website (https://outch-lee.shinyapps.io/Lung_Health/), by following the corresponding operational prompts.
Figure 4: Screenshot of the web-based mortality risk model.
Discussion
In this study, we used ML methods to analyze real-world data from three centers, reported a mortality rate of 5.9% among COVID-19 inpatients with pneumonia manifestations, identified 37 significant factors related to the mortality risk of these patients, and also built and validated a predictive model for mortality risk with superior performance.
Our study reported a mortality rate of 5.9% among COVID-19 inpatients with pneumonia manifestations, which is comparable to the 5% mortality rate documented by Li et al. (2020a). The mortality rate of these patients is influenced by various factors, including domestic socioeconomic status, development levels, income, and health care conditions (Abou Ghayda et al., 2022). Current literature indicates that the case mortality rates of COVID-19 inpatients range from 3.67% to 10% (Baptista et al., 2023; Li et al., 2020a; Verity et al., 2020; Onder, Rezza & Brusaferro, 2020; Chowdhury, Rathod & Gernsheimer, 2020), suggesting that our findings are within a moderate range. According to data from the WHO website (https://data.who.int/dashboards/covid19/cases), over 760 million cases of COVID-19 have been recorded globally since December 2019, and approximately 10–20% of these cases transition into a prolonged state in the post-emergency phase. This indicates that even in the late post-pandemic period, we still need to focus on the health risks of patients with COVID-19.
In addition to compromising the pulmonary function of patients, leading to symptoms such as cough and potentially respiratory failure, the novel coronavirus may also inflict damage on other organs and tissues, including the heart and kidneys, thereby manifesting as a systemic disease (Docherty et al., 2020; Giacca & Shah, 2022; Qi & Yu, 2020). Although the virulence of the virus has diminished with ongoing mutations—resulting in a reduced incidence of multiple organ failure and mortality—there remain instances where severe progressive pneumonia can occur. This study identifies severe pneumonia as a significant risk factor for mortality among COVID-19 inpatients. This is likely attributable to its association with various comorbidities that exacerbate the effects of pulmonary infection on overall health (Guan et al., 2020). Furthermore, patients exhibit impaired adaptive immune responses alongside abnormal elevations in inflammatory cells and cytokine storms, which may contribute to both local and systemic organ damage (Li et al., 2020b). Research indicates that abnormalities in blood markers such as lymphocytes, platelets, D-dimer, creatine kinase, and procalcitonin are closely correlated with COVID-19 severity and mortality risk (Huang et al., 2020; Gao et al., 2021; Mao et al., 2020). Our study corroborates these findings through multivariate variable screening analysis, and it suggests that excessive inflammation induced by COVID-19 significantly impacts patient prognosis. Additionally, consideration should be given to whether hospitalized individuals who have previously contracted COVID-19 will experience prolonged effects from this inflammatory response during the sequelae of COVID-19.
The medium- and long-term consequences of COVID-19 are imposing a substantial burden on the quality of life for survivors, while also generating significant health and economic challenges globally (Koc et al., 2022; Davis et al., 2023). Nevertheless, there seems to be a notable lack of awareness regarding the health impacts of COVID-19, particularly in low- and middle-income countries (The Lancet, 2023). Generally, the symptoms of sequelae associated with COVID-19 may include fatigue, myalgia, palpitations, cognitive impairment, dyspnea, anxiety, chest pain, and others. Additionally, factors such as age, comorbidities, blood coagulation abnormalities, and diabetes can predispose individuals to the development of sequelae (Tenforde et al., 2020; Ballering et al., 2022; Raveendran, Jayadevan & Sashidharan, 2021; Pretorius et al., 2022; Turner et al., 2023). Moreover, our study indicates that respiratory complications and coagulopathy significantly elevate the mortality risk of COVID-19 inpatients. This underscores the necessity for vigilance concerning this population’s health status in the post-pandemic era. It is imperative to closely monitor potential abnormalities in respiratory function and coagulation parameters while implementing timely interventions to mitigate mortality risks.
In comparison to traditional models such as logistic regression, ML algorithms are adept at managing nonlinear relationships between independent and dependent variables, demonstrating superior performance relative to conventional predictive models (Boulesteix & Schmid, 2014). Furthermore, these algorithms excel in processing high-dimensional datasets and can effectively operate even when the number of features exceeds that of samples, whereas this scenario may induce overfitting issues in logistic regression (Deo & Nallamothu, 2016). This study evaluates three distinct ML models: RF, GBM, and EN. The findings indicate that the RF model outperformed the other models in predicting patient mortality risk, particularly regarding sensitivity. Additionally, compared with the mortality risk prediction models developed in existing studies—specifically, the logistic regression model constructed by Murri et al. (2021) using blood test indicators of COVID-19 inpatients from a single hospital (AUC = 0.87, sensitivity = 0.840, specificity = 0.774), and the model built by Chen et al. (2021) via the random forest algorithm using blood test and other indicators of 6,415 patients (AUC = 0.90, sensitivity = 0.872, specificity = 0.800)—our study not only focused on blood test indicators but also prioritized the assessment of patients’ comorbidities. This inclusion of comorbidities better reflects patients’ risk of disease progression, and the resulting model demonstrates superior predictive accuracy, along with higher sensitivity and specificity (AUC = 0.907, sensitivity = 0.875, specificity = 0.870).
Nevertheless, this study is subject to certain limitations due to objective constraints. Firstly, the participants in this research are exclusively from the Guangdong region of China, which may not adequately reflect the circumstances of other regions or diverse populations. Secondly, a limited number of fatalities were recorded in this study, potentially impacting the stability of the evaluation metrics. Thirdly, due to resource limitations, important predictive factors or confounding factors, such as COVID-19 vaccination status, prior SARS-CoV-2 infection history, socioeconomic status, and residential area, were not included in the analysis, which may limit the comprehensiveness of the model and may lead to residual confounding. Finally, owing to the limitation of the number of outcome events, reserving one center for validation or conducting stratified sensitivity analysis would be susceptible to random errors, potentially leading to statistically insignificant results. Therefore, we did not consider conducting sensitivity analysis or reserving one of the hospitals for external validation. However, we have fully verified the stability and generalizability of the model through alternative validation analyses such as feature importance assessment and cross-validation.
Conclusions
In conclusion, through an analysis of multicenter hospitalized cases of COVID-19 with pneumonia manifestations, we found that the mortality rate among these patients was 5.9%. The clinical characteristics influencing mortality in COVID-19 inpatients were identified, and a novel predictive tool was developed and validated based on real-world multicenter data. This tool can accurately assess the mortality risk among COVID-19 inpatients with commendable predictive performance, which holds significant practical implications for monitoring mortality risks in these individuals.



