Development of a predictive model for severe hyperlipidemic acute pancreatitis based on LASSO regression: a retrospective study

PeerJ

Introduction

Hyperlipidemic acute pancreatitis (HLAP), defined as acute pancreatitis resulting from hyperlipidemia, is characterized by significantly increased triglyceride levels, moderately elevated amylase levels, pseudohyponatremia, and a tendency to occur at a younger age (Expert Consensus on the Diagnosis and Treatment of Acute Pancreatitis with Hypertriglyceridemia (Emergency Department) (Expert Group), 2021; Zhou et al., 2024; Fan, Li & Wu, 2025). In recent years, the incidence of HLAP has been increasing, making it the second most common cause of acute pancreatitis (Chinese Pancreatic Surgery Association et al., 2021). HLAP is associated with high morbidity and mortality rates, imposing a significant medical burden (Deng et al., 2025). It is therefore important to study the risk factors for severe HLAP and to develop a prediction model, so that severe cases can be identified early and treated actively, thereby reducing the incidence of severe disease, lowering mortality, and alleviating the disease burden.

Several studies have examined risk factors for progression to severe HLAP, such as serum calcium, D-dimer, neutrophil percentage, C-reactive protein (CRP), and blood urea nitrogen (BUN). Most of these studies relied on single-factor analysis and logistic regression analysis (Thong, Mong Trinh & Phat, 2021). Some evaluated the predictive performance of a single indicator or of combined indicators for severe HLAP (Zhu et al., 2020), while others evaluated commonly used acute pancreatitis scoring systems (Qiu et al., 2015; Yang et al., 2016). However, few studies have specifically aimed to construct a model for predicting severe HLAP (Dong et al., 2024; Lin et al., 2024), and no prediction model or scoring system has so far been recommended for clinical use by guidelines or consensus statements. Well-performing prediction models designed specifically to identify progression to severe HLAP at an early stage are therefore urgently needed. LASSO regression is an effective variable selection tool that can address collinearity and overfitting, which is particularly important in the era of big data (Emmert-Streib & Dehmer, 2019). In this study, we collected case data of HLAP patients treated at Taixing People’s Hospital Affiliated to Yangzhou University, used LASSO regression to screen and analyze risk factors, and constructed a logistic model to help clinicians accurately identify severe patients in the early stage.

Materials & Methods

Study population

Clinical data were collected for patients with HLAP admitted to Taixing People’s Hospital Affiliated to Yangzhou University from January 1, 2020, to June 30, 2023. Inclusion criteria: patients were included if they met the diagnostic criteria for HLAP set out in the guideline “Expert Consensus on the Diagnosis and Treatment of Emergency Hypertriglyceridemic Acute Pancreatitis”: (1) meeting the diagnostic criteria for acute pancreatitis (AP); (2) serum TG level ≥1,000 mg/dL (11.30 mmol/L), or serum TG level 500–1,000 mg/dL (5.65–11.30 mmol/L) with chylous serum; and (3) exclusion of other causes of AP. Exclusion criteria: (1) severe liver or kidney disease; (2) malignant tumors; (3) transfer to another hospital during treatment; (4) an interval between onset and admission of more than 48 h; and (5) incomplete data (more than 20% missing).
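The triglyceride criterion and the exclusion rules above can be expressed as a simple screening check. The following is a minimal sketch, not the authors' code: the function and field names are hypothetical, and the thresholds are the mmol/L values quoted in the inclusion criteria.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are hypothetical, chosen for illustration only.
    meets_ap_criteria: bool          # diagnostic criteria for acute pancreatitis met
    tg_mmol_l: float                 # serum triglyceride, mmol/L
    chylous_serum: bool
    other_ap_cause: bool             # another cause of AP was identified
    severe_liver_kidney_disease: bool
    malignant_tumor: bool
    transferred_out: bool
    hours_onset_to_admission: float
    fraction_missing: float          # share of variables missing, 0..1


def meets_tg_criterion(tg_mmol_l: float, chylous_serum: bool) -> bool:
    """TG >= 11.30 mmol/L, or 5.65-11.30 mmol/L together with chylous serum."""
    if tg_mmol_l >= 11.30:
        return True
    return tg_mmol_l >= 5.65 and chylous_serum


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion criteria first, then the five exclusion criteria."""
    included = (
        c.meets_ap_criteria
        and meets_tg_criterion(c.tg_mmol_l, c.chylous_serum)
        and not c.other_ap_cause
    )
    excluded = (
        c.severe_liver_kidney_disease
        or c.malignant_tumor
        or c.transferred_out
        or c.hours_onset_to_admission > 48
        or c.fraction_missing > 0.20
    )
    return included and not excluded
```

Keeping inclusion and exclusion as separate expressions mirrors how the criteria are stated and makes each rule easy to audit.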

Data collection and sorting

Data were entered using EpiData 3.1, with double entry and consistency checks. The data collected within 48 h after admission included gender, age, vital signs, history of diabetes (type 1 or type 2), recurrence, routine blood tests, liver and kidney function, electrolytes, coagulation function, blood gas analysis, blood glucose, blood lipids, amylase, and other indicators. Excel and R 4.3.1 were used to organize the data. For LASSO regression, missing values were filled by mean imputation using the makeX function in the glmnet package of R 4.3.1; for the logistic regression model, missing values were filled by simple median imputation.
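The two imputation strategies described above differ only in the statistic used to fill the gaps. The following is a minimal Python sketch of single-column mean and median imputation. It illustrates the idea only and is not the R code used in the study; the function name and the use of `None` for missing values are assumptions.

```python
from statistics import mean, median


def impute(values, strategy="median"):
    """Fill missing entries (None) with the mean or median of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical column with one missing measurement.
column = [1.0, None, 3.0, 8.0]
mean_filled = impute(column, "mean")      # missing value replaced by 4.0
median_filled = impute(column, "median")  # missing value replaced by 3.0
```

For skewed laboratory values the median is less sensitive to outliers than the mean, so the two strategies can fill noticeably different values.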

Risk factor analysis and model construction

Patients were classified into a mild group and a moderate/severe group according to the 2012 revised Atlanta classification of acute pancreatitis. Univariate analysis was conducted first. Continuous variables were tested for normality using the Shapiro–Wilk test. Normally distributed variables were described as mean ± standard deviation and compared using the t test; non-normally distributed variables were described as median (interquartile range (IQR)) and compared using the rank-sum test. Categorical variables were described as the number of cases and compared using the chi-square test or Fisher’s exact test. P < 0.05 was considered statistically significant. Variables that were statistically significant in the univariate analysis were entered into LASSO regression using the glmnet package of R 4.3.1 with 10-fold cross-validation, and lambda.1se (the largest lambda whose cross-validated error is within one standard error of the minimum) was selected as the optimal solution. The variables selected by LASSO regression were used to construct the logistic regression model using the rms package in R 4.3.1. The model’s discriminatory ability was evaluated using the ROC curve and the AUC. The calibration of the model was evaluated using the Hosmer–Lemeshow goodness-of-fit test and a calibration curve.
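The one-standard-error rule behind lambda.1se can be sketched in a few lines. The Python example below is illustrative only: the cross-validation numbers are made up, and the function name is hypothetical. It shows how lambda.min and lambda.1se are derived from a grid of cross-validated errors and their standard errors.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimises the mean cross-validated error; lambda_1se is the
    largest lambda whose mean error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_min = lambdas[i_min]
    lambda_1se = max(
        lam for lam, err in zip(lambdas, cv_mean) if err <= threshold
    )
    return lambda_min, lambda_1se


# Hypothetical cross-validation results (NOT taken from the study).
lambdas = [0.20, 0.10, 0.05, 0.02, 0.01]
cv_mean = [0.95, 0.86, 0.80, 0.77, 0.79]
cv_se = [0.04, 0.04, 0.04, 0.04, 0.04]

lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
```

Because a larger lambda means a stronger penalty, lambda.1se yields a sparser model than lambda.min at a small cost in cross-validated error.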

Ethics approval

This study was approved by the Medical Ethics Committee of Taixing People’s Hospital (No: LS2025008; No: YJ2025009), and the requirement for informed consent was waived.

Results

Patients’ demographic characteristics and univariate analysis

A total of 356 patients were enrolled in this study, including 237 males (66.6%) and 119 females (33.4%), aged 42.7 ± 10.2 years. There were 296 mild cases (83.1%) and 60 moderate/severe cases (16.9%). The mean length of hospital stay was 7.90 days, with a median of 7 days and an IQR of 5–9 days. A total of 243 patients (68.3%) had fatty liver, 147 (41.3%) had a history of diabetes (type 1 or type 2), and 111 (31.2%) had recurrent HLAP. A total of 107 indicators (variables) were collected. Univariate analysis identified 50 variables with statistically significant differences between the mild and moderate/severe groups (Table 1).

Table 1: Univariate analysis of severe hyperlipidemic acute pancreatitis.

| Variable | Mild | Moderate/severe | t/W/χ² | P |
|---|---|---|---|---|
| Diabetes [n (%)] | | | 6.1924 | 0.013 |
| No | 182 (51.27%) | 26 (7.32%) | | |
| Yes | 113 (31.83%) | 34 (9.58%) | | |
| Recurrence [n (%)] | | | 6.2933 | 0.012 |
| No | 195 (54.78%) | 50 (14.04%) | | |
| Yes | 101 (28.37%) | 10 (2.81%) | | |
| Urinary protein [n (%)] | | | 10.716 | 0.001 |
| No | 96 (48%) | 13 (6.5%) | | |
| Yes | 62 (31%) | 29 (14.5%) | | |
| Red blood cell count [M (P25, P75), 10¹²/L] | 4.80 (4.40, 5.14) | 5.02 (4.64, 5.44) | 6,816.5 | 0.022 |
| Hematocrit [x̄ ± s, %] | 41.47 ± 4.66 | 43.45 ± 6.07 | −2.3549 | 0.021 |
| Red blood cell distribution width [M (P25, P75), %] | 13.00 (12.40, 13.50) | 13.20 (12.90, 14.20) | 5,861 | 0.002 |
| White blood cell count [M (P25, P75), 10⁹/L] | 12.58 (10.09, 15.58) | 13.65 (11.60, 16.20) | 7,154 | 0.030 |
| Neutrophilic granulocyte percentage [M (P25, P75), %] | 82.20 (76.82, 86.85) | 85.10 (80.95, 87.80) | 6,884.5 | 0.015 |
| Lymphocyte percentage [M (P25, P75), %] | 10.70 (7.10, 15.60) | 7.90 (5.21, 11.55) | 10,102 | 0.002 |
| Eosinophil percentage [M (P25, P75), %] | 0.40 (0.10, 0.80) | 0.10 (0, 0.50) | 10,366 | <0.001 |
| Lymphocyte count [M (P25, P75), 10⁹/L] | 1.32 (0.94, 1.82) | 1.06 (0.79, 1.60) | 9,577 | 0.015 |
| Eosinophil count [M (P25, P75), 10⁹/L] | 0.04 (0.02, 0.09) | 0.02 (0, 0.06) | 10,272 | <0.001 |
| Plateletcrit [M (P25, P75), %] | 0.23 (0.19, 0.27) | 0.26 (0.20, 0.30) | 5,325.5 | 0.014 |
| C-reactive protein [M (P25, P75), mg/L] | 30.50 (7.48, 101.00) | 90.7 (21.20, 253.94) | 3,220 | <0.001 |
| Procalcitonin [M (P25, P75), ng/mL] | 0.07 (0.04, 0.17) | 0.33 (0.06, 1.55) | 3,490 | <0.001 |
| Prothrombin time, PT [M (P25, P75), s] | 11.30 (10.70, 11.85) | 11.80 (11.30, 12.50) | 3,223.5 | <0.001 |
| Prothrombin activity, PTA [x̄ ± s, %] | 105.69 ± 15.11 | 94.65 ± 15.69 | 4.264 | <0.001 |
| International normalized ratio, INR [M (P25, P75)] | 0.98 (0.93, 1.03) | 1.02 (0.98, 1.10) | 3,355 | <0.001 |
| Thrombin time, TT [M (P25, P75), s] | 12.90 (12.20, 13.75) | 11.90 (11.55, 12.60) | 1,122 | 0.015 |
| D-dimer [M (P25, P75), ng/mL] | 550 (343, 884) | 1,415 (726, 2,070) | 1,392.5 | <0.001 |
| Na [M (P25, P75), mmol/L] | 133.10 (129.00, 136.60) | 131.55 (125.25, 135.05) | 10,221 | 0.011 |
| Ca [M (P25, P75), mmol/L] | 2.26 (2.14, 2.39) | 2.03 (1.77, 2.26) | 11,922 | <0.001 |
| Phosphorus [M (P25, P75), mmol/L] | 1.12 (0.90, 1.39) | 1.02 (0.71, 1.31) | 9,782.5 | 0.012 |
| HCO3 [M (P25, P75), mmol/L] | 20.50 (17.85, 22.10) | 18.10 (13.10, 20.80) | 8,144.5 | <0.001 |
| Urea nitrogen [M (P25, P75), mmol/L] | 4.58 (3.68, 5.65) | 5.05 (4.30, 7.20) | 3,886 | 0.032 |
| Uric acid [M (P25, P75), μmol/L] | 352 (279, 433) | 401 (310, 504) | 5,701 | 0.005 |
| AST [M (P25, P75), U/L] | 23.00 (18.50, 32.00) | 29.50 (22.50, 39.50) | 1,399.5 | 0.047 |
| LDH [M (P25, P75), U/L] | 246 (205, 325) | 341 (271, 488) | 3,153 | <0.001 |
| Total cholesterol, TC [M (P25, P75), mmol/L] | 10.06 (7.96, 13.29) | 14.50 (9.92, 18.69) | 5,267.5 | <0.001 |
| Triglyceride, TG [M (P25, P75), mmol/L] | 19.39 (15.03, 25.85) | 25.07 (17.66, 28.57) | 6,294.5 | <0.001 |
| Low density lipoprotein, LDL [M (P25, P75), mmol/L] | 2.21 (1.19, 3.14) | 2.71 (1.36, 4.10) | 5,863 | 0.037 |
| Apolipoprotein B [M (P25, P75), g/L] | 0.53 (0.31, 0.82) | 0.68 (0.37, 1.37) | 4,201.5 | 0.023 |
| Apolipoprotein A1/B [M (P25, P75)] | 1.82 (1.10, 3.10) | 1.18 (0.61, 2.20) | 7,027 | <0.001 |
| CK-MB [M (P25, P75), U/L] | 2.88 (1.09, 8.60) | 8.00 (1.56, 11.00) | 3,468.5 | 0.005 |
| Serum amylase [M (P25, P75), IU/L] | 162 (96, 292) | 403 (215, 740) | 3,110 | <0.001 |
| pH [M (P25, P75)] | 7.39 (7.36, 7.42) | 7.34 (7.26, 7.40) | 2,797 | <0.001 |
| pCO2 [M (P25, P75), mmHg] | 40 (37, 42) | 35 (29, 39) | 2,788 | <0.001 |
| Actual bicarbonate, AB [M (P25, P75), mmol/L] | 24.70 (21.35, 26.00) | 19.55 (12.70, 25.80) | 2,664 | <0.001 |
| Standard bicarbonate, SB [M (P25, P75), mmol/L] | 24.75 (22.45, 25.80) | 21.25 (14.20, 24.7) | 2,702 | <0.001 |
| Total CO2 [M (P25, P75), mmol/L] | 26 (22.10, 27) | 20.50 (14, 27) | 2,660.5 | <0.001 |
| Standard base excess, SBE [M (P25, P75), mmol/L] | −0.20 (−3.80, 1.40) | −5.90 (−15, 0.70) | 2,684.5 | <0.001 |
| Actual base excess, ABE [M (P25, P75), mmol/L] | −0.20 (−3.20, 1.20) | −4.70 (−13.60, −0.30) | 2,713 | <0.001 |
| Lactic acid [M (P25, P75), mmol/L] | 1.70 (1.30, 2.40) | 2.30 (1.55, 2.95) | 1,447.5 | 0.006 |
| Ionized calcium [M (P25, P75), mmol/L] | 1.00 (0.94, 1.06) | 0.95 (0.79, 1.03) | 2,536 | 0.012 |
| Blood glucose [M (P25, P75), mmol/L] | 9.97 (7.34, 14.14) | 14.78 (10.55, 17.66) | 4,275 | <0.001 |
| NLR [M (P25, P75)] | 7.61 (4.93, 12.09) | 11.00 (7.33, 17.01) | 5,896.5 | 0.002 |
| PLR [M (P25, P75)] | 145.70 (102.26, 215.04) | 210.77 (124.87, 290.08) | 6,085.5 | 0.005 |
| MLR [M (P25, P75)] | 0.51 (0.35, 0.77) | 0.69 (0.32, 1.26) | 6,563 | 0.034 |
| CLR [M (P25, P75)] | 24.86 (4.89, 85.92) | 92.41 (22.04, 227.91) | 2,809 | <0.001 |
| CAR [M (P25, P75)] | 0.73 (0.17, 2.31) | 3.28 (0.43, 6.64) | 2,620.5 | <0.001 |
DOI: 10.7717/peerj.20471/table-1

Notes:

NLR, neutrophil/lymphocyte ratio; PLR, platelet/lymphocyte ratio; MLR, monocyte/lymphocyte ratio; CLR, C-reactive protein/lymphocyte ratio; CAR, C-reactive protein/albumin ratio.

The t-test was used for continuous variables with normal distribution. The Wilcoxon rank sum test was used for continuous variables with non-normal distribution. The chi-square test or Fisher’s exact test was used for categorical variables. P < 0.05 was considered statistically significant.
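The composite indices defined in the notes are simple quotients of routine laboratory values. The following is a minimal sketch of how they could be derived. The function name, argument names, and units (cell counts in 10⁹/L, CRP in mg/L, albumin in g/L) are assumptions for illustration, since the source does not state the units used for each ratio.

```python
def inflammatory_ratios(neutrophils, lymphocytes, monocytes, platelets,
                        crp, albumin):
    """Compute NLR, PLR, MLR, CLR and CAR from routine laboratory values.

    Assumed units (not stated in the source): cell counts in 10^9/L,
    CRP in mg/L, albumin in g/L.
    """
    if lymphocytes <= 0 or albumin <= 0:
        raise ValueError("lymphocyte count and albumin must be positive")
    return {
        "NLR": neutrophils / lymphocytes,  # neutrophil/lymphocyte
        "PLR": platelets / lymphocytes,    # platelet/lymphocyte
        "MLR": monocytes / lymphocytes,    # monocyte/lymphocyte
        "CLR": crp / lymphocytes,          # C-reactive protein/lymphocyte
        "CAR": crp / albumin,              # C-reactive protein/albumin
    }


# Hypothetical values for a single patient (not taken from the study data).
ratios = inflammatory_ratios(
    neutrophils=8.0, lymphocytes=1.0, monocytes=0.5,
    platelets=200.0, crp=30.0, albumin=40.0,
)
```

Four of the five ratios share the lymphocyte count as denominator, so they are likely to be correlated, which is one reason a penalized selection step is useful downstream.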

Variable selection by LASSO regression

The 50 variables identified through the univariate analysis were further analyzed using LASSO regression with 10-fold cross-validation (Figs. 1 and 2), and lambda.1se = 0.076 was selected as the optimal solution. At this lambda, six variables had nonzero coefficients: D-dimer, serum calcium, total cholesterol, SB, total carbon dioxide, and CAR (Table 2).

Figure 1: Lambda and partial regression coefficients of LASSO regression.

Figure 2: Lambda and deviation plot of LASSO regression analysis.

Table 2: Partial regression coefficients of the variables with nonzero coefficients in LASSO regression (lambda = lambda.1se).

| Variable | Partial regression coefficient |
|---|---|
| D-dimer | 0.0002 |
| Ca | −1.2003 |
| Total cholesterol, TC | 0.0409 |
| Standard bicarbonate, SB | −0.0131 |
| Total CO2 | −0.0275 |
| C-reactive protein/albumin | 0.0037 |
DOI: 10.7717/peerj.20471/table-2

Logistic regression model

The six variables selected by LASSO regression were entered into the multivariate logistic regression analysis, which identified D-dimer, serum calcium, and total cholesterol as risk factors for severe HLAP (Table 3). The model achieved an AUC of 0.8341 (95% CI [0.7724–0.8958]), indicating good discriminatory ability; the ROC curve is presented in Fig. 3. The Hosmer–Lemeshow goodness-of-fit test showed no significant lack of fit (χ² = 6.8383, P = 0.5542), and the calibration curve (Fig. 4) was close to the ideal curve, indicating that the model is well calibrated.
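The AUC reported above has a direct probabilistic reading: it is the probability that a randomly chosen moderate/severe case receives a higher predicted risk than a randomly chosen mild case, with ties counted as one half. The following is a minimal sketch of that pairwise definition in Python. The scores are made up for illustration, and the study itself computed the AUC in R.

```python
def auc(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted risks (not from the study).
severe_scores = [0.9, 0.8, 0.4]
mild_scores = [0.7, 0.3, 0.2, 0.4]
example_auc = auc(severe_scores, mild_scores)
```

The double loop is quadratic in the sample size, which is acceptable for a cohort of a few hundred patients; rank-based formulas give the same value more efficiently.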

Table 3: Logistic regression model.

| Variable | Coefficient | Standard error | Wald Z | P | OR | 95% CI |
|---|---|---|---|---|---|---|
| Constant | 2.7796 | 1.5088 | 1.84 | 0.065 | 16.1118 | 0.8372–310.0716 |
| D-dimer | 0.0009 | 0.0003 | 3.59 | <0.001 | 1.0009 | 1.0004–1.0015 |
| Ca | −3.4379 | 0.6803 | −5.05 | <0.001 | 0.0321 | 0.0085–0.1219 |
| Total cholesterol, TC | 0.1744 | 0.0326 | 5.35 | <0.001 | 1.1905 | 1.1168–1.2691 |
DOI: 10.7717/peerj.20471/table-3

Notes:

The significance of coefficients was assessed using the Wald test. P < 0.05 was considered statistically significant.
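The coefficients in Table 3 define a predicted probability through the logistic function. The following is a minimal sketch of that calculation, assuming the units used in Table 1 (D-dimer in ng/mL, calcium and total cholesterol in mmol/L). The function name is hypothetical, the coefficients are the rounded published values, and the inputs below are group medians from Table 1 combined for illustration rather than real patients. It is not a validated clinical tool.

```python
import math

# Rounded coefficients as published in Table 3.
INTERCEPT = 2.7796
COEF_D_DIMER = 0.0009   # per ng/mL (unit assumed from Table 1)
COEF_CALCIUM = -3.4379  # per mmol/L
COEF_TC = 0.1744        # per mmol/L


def predicted_probability(d_dimer_ng_ml, calcium_mmol_l, tc_mmol_l):
    """Predicted probability of moderate/severe HLAP from the fitted model."""
    logit = (
        INTERCEPT
        + COEF_D_DIMER * d_dimer_ng_ml
        + COEF_CALCIUM * calcium_mmol_l
        + COEF_TC * tc_mmol_l
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Illustrative inputs: the medians of each group from Table 1.
p_mild_profile = predicted_probability(550, 2.26, 10.06)
p_severe_profile = predicted_probability(1415, 2.03, 14.50)
```

With these inputs the mild-group profile yields a probability of roughly 0.06 and the moderate/severe-group profile roughly 0.40. Because the published D-dimer coefficient carries only one significant digit, results from this sketch may differ slightly from those of the original fitted model.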

Figure 3: ROC curve of logistic regression model.

Figure 4: Calibration curve of logistic regression.

Discussion

This study used LASSO regression for variable selection followed by logistic regression modeling, and identified elevated D-dimer levels, decreased serum calcium, and increased cholesterol as significant risk factors for severe HLAP. These findings are consistent with many previous studies (Zhou et al., 2024; Zhu et al., 2020; Dong et al., 2024; Lin et al., 2024; Wu et al., 2025; Liu et al., 2023; Xu et al., 2022). Current research suggests that the pathogenesis of HLAP primarily involves free fatty acids, impaired pancreatic microcirculation, inflammatory response, oxidative stress, and intrapancreatic calcium overload (Qiu et al., 2023). D-dimer is a degradation product of fibrin, and an elevated D-dimer level indicates hypercoagulability and secondary activation of fibrinolysis. D-dimer may be a risk factor for severe HLAP because HLAP releases large amounts of inflammatory mediators, which activate the coagulation and fibrinolysis systems, promote the formation of microvascular thrombi, and impair pancreatic microcirculation, thereby exacerbating the condition (Dong et al., 2024; Wu et al., 2025). Previous studies have shown that hypocalcemia often indicates a severe condition, which may be related to the binding of fatty acids to calcium ions and to intrapancreatic calcium overload (Lin et al., 2024). Monitoring serum calcium is therefore of great significance. This study also found that cholesterol predicts severe HLAP, which is consistent with the research of Liu et al. (2023) and may be related to intrapancreatic calcium overload and direct damage to endothelial cells in the pancreas (Xu et al., 2022).

Many previous studies have shown that the neutrophil-to-lymphocyte ratio, CRP, PCT, blood glucose, and BUN are risk factors. In our univariate analysis, these variables also differed significantly between the mild group and the moderate/severe group, but they were not retained after LASSO regression and logistic regression screening. Possible explanations are that these variables contribute less to prediction, that the sample size is small, or that there is case bias.

LASSO regression, a regularization technique, effectively screens variables and simplifies models by imposing a shrinkage penalty term, thereby mitigating overfitting issues arising from multicollinearity (Zhou, Li & Zhang, 2021). It is highly efficient and is particularly suitable for high-dimensional data processing (Emmert-Streib & Dehmer, 2019). In this study, univariate analysis, LASSO regression, and logistic regression were combined to select three variables from 107 candidates, yielding a model with good performance that can help clinicians judge the severity of HLAP. The AUC of the model built in this study is lower than that reported in some studies, but its variables are all commonly used clinical indicators and it does not rely on complex scoring systems, which makes it simple and practical. Moreover, the model does not include variables such as pleural effusion, peritoneal effusion, and pancreatic pseudocyst, which occur in the middle and late stages of the disease. It can therefore be used for early diagnosis.
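The way a shrinkage penalty removes variables can be illustrated with the soft-thresholding operator, which is the basic update used by coordinate-descent LASSO solvers such as glmnet. The sketch below is a generic Python illustration with made-up numbers, not the study's code, and the function name is an assumption.

```python
def soft_threshold(z, penalty):
    """Shrink z toward zero by `penalty`; values within the penalty become 0."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0


# Hypothetical unpenalized signals for three standardized predictors.
signals = [0.50, -0.30, 0.05]
shrunk = [soft_threshold(z, penalty=0.10) for z in signals]
# The two strong signals are shrunk by 0.10; the weak one is set to zero.
```

Coefficients whose signal does not exceed the penalty are set exactly to zero, which is why LASSO performs variable selection rather than only shrinking estimates.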

This study has limitations. First, we were unable to collect data on some potential risk factors, such as body mass index, previous surgeries, gene mutations, pain score, narcotic usage, alcohol or smoking habits, endocrine functions, and abdominal fat content and distribution. Second, although a relatively large dataset was collected, the sample size was determined by empirical judgment rather than by a quantitative approach (such as estimating learning curves). Finally, while our model demonstrated good performance, there is still room for improvement.

Conclusions

In this study, risk factors for severe HLAP were identified, and a logistic regression model was developed based on LASSO regression. The risk factors for severe HLAP were increased D-dimer, decreased serum calcium, and increased cholesterol. The logistic regression prediction model achieved an AUC of 0.8341 (95% CI [0.7724–0.8958]), indicating good discriminatory ability. Because it relies only on three commonly used clinical indicators, the model is simpler and more practical than several existing models, which may make it a valuable tool for early identification and intervention by clinicians. Future research should aim to collect more comprehensive data, including additional risk factors, and conduct multicenter external validation to enhance the model’s generalizability and accuracy.

Supplemental Information

Raw data from January 1, 2020 to June 30, 2023

DOI: 10.7717/peerj.20471/supp-1

Code: R studio history

DOI: 10.7717/peerj.20471/supp-2

Codebook for different levels of categorical variables

DOI: 10.7717/peerj.20471/supp-3

STROBE checklist

DOI: 10.7717/peerj.20471/supp-4