Current predictive models for cardiovascular disease based on points systems use the baseline values of the risk factors as independent variables; they do not take into account the variability of the risk factors over time. Predictive models for other diseases do consider the temporal variability of a single biological marker in addition to the baseline variables, but their complexity keeps them out of daily clinical practice. Bearing in mind the clinical relevance of these issues, and that cardiovascular diseases are the leading cause of death worldwide, we show the properties and viability of a new methodological alternative for constructing cardiovascular risk scores that makes predictions of cardiovascular disease using repeated measures of the risk factors while retaining the simplicity of the points systems so often used in clinical practice (construction, statistical validation by simulation and explanation of potential utilization). We have also applied the system clinically to a set of simulated data, solely to help readers understand the procedure constructed.

Given that cardiovascular diseases (CVD) are one of the main causes of death in the world, predictive models such as the Framingham risk score and the SCORE project have been developed to estimate the probability that a patient will experience a cardiovascular event.

In conjunction with the Framingham and SCORE predictive models, others have been developed that are also used in clinical practice, though to a lesser extent, such as the Reynolds risk score and the WHO/ISH score.

Given the complexity of these mathematical models, a simplified algorithm (a points system) is used so that the clinician can apply them more easily, though some precision is lost in the estimation of the probability of CVD.

Predictive models for survival in other diseases do consider the temporal variability of a single biological marker (as well as the baseline variables). These are known as Joint Models for Longitudinal and Time-to-Event Data and comprise two parts: (1) a linear mixed model describing the trajectory of a longitudinal parameter; and (2) a survival model relating the baseline variables and the longitudinal parameter to the appearance of an event. These models can be used to make more precise predictions about the development of a disease.

Here we aim to show the viability and properties of a new methodological alternative for constructing cardiovascular risk scores (construction, statistical validation by simulation and potential utilization of the new theoretical model) that deals with the temporal variability of CVD risk factors. We also apply the model to a set of simulated data, with the sole purpose of showing readers how to apply it to a real data set with repeated measures of cardiovascular risk factors. The scoring system given here therefore has no value in clinical practice; what is of value is the way the system is constructed.

The basic models used to develop the new method were the Cox model with time-dependent variables, the points system of the Framingham Heart Study, Joint Models for Longitudinal and Time-to-Event Data, and predictions of the longitudinal biomarkers using these joint models.

Let T^{∗} denote the true event time and C^{∗} the censoring time; we observe T = min(T^{∗}, C^{∗}) and the event indicator δ = I(T^{∗} ≤ C^{∗}). The Cox model with time-dependent variables specifies the hazard for subject i as h_{i}(t) = h_{0}(t) exp{β^{⊤}x_{i}(t)}, where h_{0}(t) is an unspecified baseline hazard function, x_{i}(t) is the vector of (possibly time-varying) covariates and β is the vector of regression coefficients.

The estimation of the model parameters is based on the partial likelihood function, which at each observed event time compares the covariate values of the subject experiencing the event with those of all subjects still at risk.
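For readers who want to see the mechanics, the partial likelihood can be sketched in a few lines of code. The function below evaluates the Breslow-form partial log-likelihood for the simpler case of time-fixed covariates; the function name and the data layout are purely illustrative, not part of the method described here.

```python
import numpy as np

def cox_partial_loglik(beta, times, events, X):
    """Breslow-form Cox partial log-likelihood (time-fixed covariates).

    times: event/censoring times; events: 1 if the event was observed;
    X: covariate matrix with one row per subject.
    """
    beta = np.asarray(beta, dtype=float)
    eta = X @ beta  # linear predictor for each subject
    ll = 0.0
    for i in np.where(events == 1)[0]:
        at_risk = times >= times[i]  # risk set at the i-th event time
        ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return ll
```

Maximizing this function over β (e.g., by Newton–Raphson) yields the coefficient estimates; with time-dependent covariates the same idea applies, but each subject's covariate values are evaluated at every event time at which the subject is at risk.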

The classical Cox regression model (with no time-varying covariates) deletes the dependence on time in x_{i}(t), so that only the baseline values of the covariates enter the model.

We summarize the steps of the method developed by the Framingham Heart Study to adapt a Cox regression model with estimated coefficients into a points system:

Estimate the parameters of the model:

Organize the risk factors into categories and determine reference values:

Continuous risk factor (e.g., age): set up contiguous classes and determine a reference value for each. Example for age: 18–30 [24], 30–39 [34.5], 40–49 [44.5], 50–59 [54.5], 60–69 [64.5] and ≥70 years [74.5], where the reference value is given in brackets. The Framingham Heart Study researchers recommend mid-points as acceptable reference values and, for the first and last classes, the mean between the extreme value and the 1st (first class) or 99th (last class) percentile.

Binary risk factors (e.g., gender, 0 for female and 1 for male): the reference value is again either 0 or 1.

Let W_{ij} denote the reference value for category j of risk factor i, with j = 1, …, c_{i}, where c_{i} is the total number of categories for risk factor i.

Determine the referent risk factor profile: the base category will have 0 points in the scoring system, and its reference value will be denoted as W_{iREF}.

Determine how far each category is from the base category in regression units: calculate β_{i}(W_{ij} − W_{iREF}), where β_{i} is the estimated regression coefficient of risk factor i.

Set the fixed multiplier or constant B, the number of regression units that will equal one point in the scoring system (the Framingham researchers use the coefficient associated with a 5-year increase in age).

Determine the number of points for each category of each risk factor: the closest integer to β_{i}(W_{ij} − W_{iREF})/B.

Determine the risks associated with the point totals: for each possible total score, compute the predicted probability of the event from the fitted Cox model.
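The steps above can be sketched numerically. The fragment below applies the rounding rule to the age classes of the example, using the age coefficient reported later in our results table and taking B as the risk associated with a 5-year age increase; everything else is illustrative.

```python
# Illustrative application of the points-system steps to the age classes.
beta_age = 0.0846                                # age coefficient (per year)
age_refs = [24, 34.5, 44.5, 54.5, 64.5, 74.5]    # reference value per age class
age_base = 24                                    # referent profile: 0 points

B = beta_age * 5                                 # fixed multiplier: 5 years of age

# Points per category: distance from the base category in regression units,
# divided by the constant B and rounded to the nearest integer.
age_points = [round(beta_age * (ref - age_base) / B) for ref in age_refs]
# age_points -> [0, 2, 4, 6, 8, 10]
```

Because B is itself expressed in age regression units, the coefficient cancels for age; for the other risk factors the same formula combines each factor's own coefficient with the age-based constant B.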

Using the former notation, for each subject i we have the vector of measurement times t_{i1} ≤ t_{i2} ≤ … ≤ t_{in_{i}}. We will denote by y_{i}(t) the observed value of the longitudinal outcome at time t, decomposed as

y_{i}(t) = m_{i}(t) + ε_{i}(t) = x_{i}^{⊤}(t)β + z_{i}^{⊤}(t)b_{i} + ε_{i}(t).

In the above expression, β and b_{i} denote the vectors of regression coefficients for the unknown fixed-effects parameters and the random effects respectively, x_{i}(t) and z_{i}(t) are the corresponding design vectors, m_{i}(t) is the true (error-free) value of the longitudinal outcome, and ε_{i}(t) is a measurement error term with variance σ^{2}. Finally, b_{i} follows a normal distribution with mean zero, and the survival submodel relates the hazard for subject i to m_{i}(t) through h_{i}(t) = h_{0}(t) exp{γ^{⊤}w_{i} + αm_{i}(t)}, where w_{i} is the vector of baseline covariates, γ its coefficients, and α quantifies the association between the current value of the longitudinal marker and the risk of the event.

The estimation of the parameters of the joint models is based on a maximum likelihood approach that maximizes the log-likelihood function corresponding to the joint distribution of the time-to-event and longitudinal outcomes.

Regarding the assumptions of the model, we have to assess them for both submodels (longitudinal and survival) using residual plots. For the longitudinal part, we will plot the subject-specific residuals versus the corresponding fitted values, the Q–Q plot of the subject-specific residuals, and the marginal residuals versus the fitted values. For the survival part, we will plot the subject-specific fitted values of the longitudinal outcome versus the martingale residuals, and finally we will determine graphically whether the Cox–Snell residuals behave as a censored sample from a unit exponential distribution.

Let ω_{i}(u | t) denote the expected value of the longitudinal outcome at a future time u for subject i, given that the subject was event-free at time t and given the longitudinal measurements collected up to t. Our aim is to predict this quantity for a new patient.

Rizopoulos developed a Monte Carlo approach to perform this task, based on a Bayesian formulation. He obtained the following simulation scheme:

Step 1: Draw a value of the model parameters θ from their asymptotic posterior distribution, a normal distribution centred at the maximum likelihood estimates with their estimated covariance matrix.

Step 2: Draw a value of the random effects b_{i} from their posterior distribution, given the drawn θ and the observed longitudinal measurements of the subject.

Step 3: Compute the prediction of the longitudinal outcome at time u using the drawn values of θ and b_{i}.

This scheme should be repeated a large number of times (we use 100 replications in the application below), yielding a Monte Carlo sample of predictions from which a point estimate and a confidence interval can be derived.
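A minimal sketch of this simulation scheme, with the three steps abstracted as user-supplied functions (the function names and the toy draws are our own illustration, not the notation of the original scheme):

```python
import numpy as np

def mc_prediction(draw_theta, draw_b, predict, L=100):
    """Monte Carlo scheme for one dynamic prediction (sketch)."""
    draws = []
    for _ in range(L):
        theta = draw_theta()              # Step 1: draw model parameters
        b = draw_b(theta)                 # Step 2: draw the random effects
        draws.append(predict(theta, b))   # Step 3: compute the prediction
    draws = np.asarray(draws)
    # Median as point estimate, 2.5%/97.5% percentiles as a 95% interval.
    return np.median(draws), np.percentile(draws, [2.5, 97.5])
```

In a real application, draw_theta would sample from a normal distribution centred at the maximum likelihood estimates and draw_b from the posterior of the random effects given the subject's measurements.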

We highlight that these predictions have a dynamic nature; that is, as time progresses additional information is recorded for the patient, so the predictions can be updated using this new information.

We wish to determine the probability of having CVD from a baseline situation (t = 0) onwards, using both the baseline variables and the repeated measures of the risk factors. The procedure comprises the following steps:

Adjust a Cox regression model with time-dependent variables; we use this model because we are unable to estimate a joint model with multiple longitudinal parameters.

Use the procedure of the Framingham study to adapt the coefficients of the model obtained to a points system and determine the probabilities of CVD for each score up to the chosen prediction time (2 years in our simulated example).

Adjust a joint model for longitudinal and time-to-event data for each longitudinal parameter recorded during the follow-up. This will also include all the baseline variables. These models are constructed to make predictions about the longitudinal parameters in new patients (statistical validation by simulation and potential utilization).

Once the points system has been constructed, we wish to see whether the model determines the onset of CVD accurately in a different set of subjects (validation sample). In this validation sample we know the longitudinal markers recorded up to the baseline situation and whether each patient developed CVD during follow-up. The validation proceeds as follows:

Determine, for each patient in the validation sample, a set of simulations of the longitudinal parameters at baseline, using the joint models fitted in the construction sample.

For each simulation, apply the points system to obtain a score, so that every patient has a distribution of points rather than a single value.

Calculate the median of the points distribution for each patient in the validation sample. Note that we do not use the mean, as it could contain decimals, which make no sense when applying the scoring system. Using these medians, classify each patient into a risk group and compare the rate of events predicted by the points system in each group with the actual observed rate. The test used for this comparison is Pearson's χ² test.
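The group-wise comparison can be illustrated as follows; the group sizes, predicted rates and observed counts below are invented purely for the example.

```python
import numpy as np

# Hypothetical validation summary: patients per risk group, the event rate
# predicted by the points system, and the events actually observed.
n_patients = np.array([400, 300, 200, 100])
pred_rate = np.array([0.02, 0.05, 0.10, 0.20])
obs_events = np.array([10, 17, 22, 18])

expected = n_patients * pred_rate                       # expected events
chi2 = np.sum((obs_events - expected) ** 2 / expected)  # Pearson X^2 statistic
# chi2 is referred to a chi-squared distribution to judge whether the
# predicted and observed event rates differ beyond chance.
```

A small statistic (here χ² ≈ 1.17 across the four groups) indicates good agreement between the predicted and observed event rates.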

The concordance statistic commonly used to assess discrimination has been reported to have various limitations.

Once the points system has been validated statistically, the clinician can apply it to determine the cardiovascular risk in a new patient and take any necessary measures to reduce this risk. The healthcare professional will already have historical information about the longitudinal parameters, recorded in the (nowadays usually electronic) clinical history. The steps are as follows:

Determine the value of each longitudinal parameter at the time of the consultation by simulation from the joint models, and compute the score associated with each simulated set of values, obtaining a points vector for the patient.

Determine the median and the 2.5% and 97.5% percentiles of the points vector constructed above. The median will be the estimate of the score for the new patient and the percentiles will define the 95% confidence interval.
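As a toy illustration of this step (the simulated scores below are randomly generated, not the output of any fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical points vector: 100 simulated scores for one patient.
points = rng.integers(14, 22, size=100)

score_estimate = int(np.median(points))               # estimated score
ci_low, ci_high = np.percentile(points, [2.5, 97.5])  # 95% confidence interval
```

The median is preferred over the mean because a fractional score has no meaning in a points system.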

The clinician now knows the cardiovascular risk and which parameters have a score above normal, so he or she can then design the best intervention for that patient. This presents a problem, as we need to know the value of each biological parameter at the time of the visit; these values are provided by the simulations described above.

From the previous step the clinician knows the parameters on which to act, as well as their history and the baseline situation. From these measurements the clinician can establish a realistic objective for the next patient visit (3 months later in our example).

These calculations will give the benefit of the intervention: the estimation (mean or median) of each biological parameter at the next visit and the corresponding reduction in the cardiovascular risk score.

With the sole purpose of explaining how to use the method proposed here, we have simulated a data set upon which to apply each of the steps described above. Note that we in fact simulate two data sets, one to construct the model and the other to validate it statistically via simulation. So that both sets are biologically plausible, we used estimates obtained in the Puras-GEVA cardiovascular study, published in Medicine.

Our data sets include the following biological parameters: age (years), systolic blood pressure (SBP) (mmHg), HbA1c (%), atherogenic index, gender (male or female) and smoking (yes or no). Of these, the SBP, HbA1c and the atherogenic index are present both at baseline (t = 0) and as repeated measures during follow-up.

For the main variable (time-to-CVD) we suppose that our cohort is used to predict CVD with a follow-up of 2 years. Note that the traditional cardiovascular risk scales use a time of 10 years.

The work used for our simulated data set developed and validated a predictive model of CVD (angina of any kind, myocardial infarction, stroke, peripheral arterial disease of the lower limbs, or death from CVD), enabling calculation of risk in the short, medium and long term (the risk associated with each score was calculated every 2 years up to a maximum of 14) in the general population.

The longitudinal follow-up measurements (construction sample) assume that the patient attends the physician’s office once every 3 months for measurements of SBP, HbA1c and the atherogenic index, until the end of the follow-up for each patient. The statistical validation sample supposes that there is a certain probability of having records of all the longitudinal parameters in the clinical history every 3 months for the 5 years prior to baseline.
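A measurement schedule of this kind is easy to mimic. The sketch below generates one subject's quarterly SBP records over a 2-year follow-up; the mean level, drift and visit-to-visit noise are arbitrary illustrative values, not estimates from the Puras-GEVA study.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_sbp_track(n_days, visit_gap=90, base=135.0, slope=0.01, sd_visit=8.0):
    """One SBP record every `visit_gap` days: subject mean + drift + noise."""
    times = np.arange(0, n_days + 1, visit_gap)
    sbp = base + slope * times + rng.normal(0.0, sd_visit, size=times.size)
    return times, sbp

times, sbp = simulate_sbp_track(2 * 365)  # quarterly visits for 2 years
```

The same template, run with negative times, would produce the retrospective records assumed for the validation sample.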

The Supplemental Information gives the full details of how both data sets were simulated.

One could think that with a shorter time period of just 2 years there would be no variability in the cardiovascular risk factors. However, the simulated longitudinal records show that the parameters do vary appreciably even over this period.

We decided to use a simulated data set as we did not have any real data set available. This way of explaining a new method has already been used by others working with joint models, as the only objective of the simulated data set is to show how to apply the new method.

Given the amount and extent of the results, these are given in detail in the Supplemental Information.

The parameters of the Cox model with time-dependent variables are shown in the table below.

Goodness-of-fit (likelihood ratio test): χ² = 912.3.

Variable | Coefficient (β) | p-value
---|---|---
Age (baseline) (per 1 year) | 0.0846 | <0.001
SBP (per 1 mmHg) | 0.00874 | <0.001
HbA1c (per 1%) | 0.188 | <0.001
Atherogenic index (per 1 unit) | 0.191 | <0.001
Male gender | 0.479 | 0.001
Smoker (baseline) | 0.721 | <0.001


Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin; TC, total cholesterol; HDL-c, HDL cholesterol.

The strategy for eliminating variables is to work down from the most complex terms to the simplest. Goodness-of-fit (likelihood ratio tests): (1) SBP: χ² = 371,574.1; (2) HbA1c: χ² = 210,881.1; (3) atherogenic index: χ² = 121,118.0.

Variable | SBP (mmHg) | p-value | HbA1c (%) | p-value | Atherogenic index | p-value
---|---|---|---|---|---|---
Male gender | 0.428 | <0.001 | 0.475 | <0.001 | 0.446 | <0.001
Age (per 1 year) | 0.0837 | <0.001 | 0.0840 | <0.001 | 0.0833 | <0.001
Smoker | 0.731 | <0.001 | 0.757 | <0.001 | 0.775 | <0.001
Parameter (per 1 unit) | 0.0085 | <0.001 | 0.216 | <0.001 | 0.195 | <0.001
Intercept (fixed effect) | 133.557 | <0.001 | 6.158 | <0.001 | 4.602 | <0.001
Time (fixed effect) | 0.0046 | <0.001 | 0.0001 | <0.001 | 0.0001 | <0.001
Intercept (random effect, SD) | 21.683 | N/A | 1.346 | N/A | 1.324 | N/A
Time (random effect, SD) | 0.0358 | N/A | ∗ | ∗ | 0.0013 | N/A
Residual (SD) | 8.933 | N/A | 0.357 | N/A | 0.302 | N/A

Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin; N/A, not applicable. ∗ Term eliminated due to convergence problems.

A new patient arrives at our office with the following characteristics: male, 83 years old, non-smoker, taking pharmacological medication (one antihypertensive drug and one oral antidiabetic agent) and following non-pharmacological measures (diet and exercise). His history of cardiovascular risk factors is shown in the table below.

Time has a negative value because it refers to measurements taken before the baseline situation, which was defined as t = 0.

Time (days) | SBP (mmHg) | HbA1c (%) | Atherogenic index |
---|---|---|---|

−360 | 152 | 5.1 | 3.56 |

−330 | 135 | 5.3 | 3.23 |

−270 | 164 | 4.7 | 3.45 |

−180 | 153 | 4.4 | 4.12 |

−90 | 170 | 5.0 | 4.15 |

0 | 145 | 4.9 | 5.17 |

Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin.

Application of the new model gives a histogram of the cardiovascular risk score obtained for this patient; the median of this distribution is his estimated score and the 2.5% and 97.5% percentiles give the confidence interval.

The clinician can now see that if the patient complies with a series of interventions, pharmacological (add two antihypertensive drugs → −20 mmHg; prescribe a statin → −40% atherogenic index) and non-pharmacological (reduce salt in the diet → −5 mmHg), his longitudinal parameters after 3 months would be: SBP 120 mmHg (145 − 2 × 10 − 5 = 120 mmHg), atherogenic index 3.10 (5.17 − 40% = 3.10), and HbA1c 4.9% (unchanged, because no intervention acts on it). Applying the model with this new information gives the cardiovascular risk at 2 years after the intervention.
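The arithmetic of this worked example can be checked directly; the intervention effects are those quoted above.

```python
# Post-intervention values implied by the example:
# two antihypertensive drugs (-10 mmHg each), less salt (-5 mmHg),
# and a statin (-40% atherogenic index); no intervention on HbA1c.
sbp_after = 145 - 2 * 10 - 5              # 120 mmHg
ai_after = round(5.17 * (1 - 0.40), 2)    # 3.10
hba1c_after = 4.9                         # unchanged
```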

This paper describes a method to construct predictive models for CVD that consider the variability of the cardiovascular risk factors while retaining the simplicity of points systems, which are widely used in daily clinical practice worldwide.

The cardiovascular risk scales currently available do not take into account the temporal variability of the parameters controlling the risk factors. A very positive aspect of these scales, however, is their simplicity, which allows immediate application by healthcare professionals, the people who actually have to apply these mathematical models.

Comparison between our proposed model and current cardiovascular risk scales is problematic. Our model is more suitable for making short-term predictions, though the more time that passes from the baseline situation, the greater the uncertainty of the predictions.

Obtaining simulations of the longitudinal parameters is not easy and implies a computational cost of about one minute with the statistical package R to implement a total of 100 simulations on a normal computer. On the other hand, the historical values of the longitudinal parameters are recorded in the clinical history, which nowadays is usually electronic.

As this algorithm was developed from a set of simulated data, we encourage others who have cardiovascular databases like that used here to implement a model with the characteristics described herein. Thus, if using real-life data achieves greater predictive precision, we shall be able to apply this method to obtain the best short-term prognosis and take the most appropriate decisions for the benefit of the patient. Nevertheless, we should note that the method proposed is based on the combination of mathematical models already used in medicine; in theory our model is therefore sound, as we have been extremely strict in each of the steps followed. In practice, the prediction time and the structure of each model can be chosen to suit the data available.

We developed an algorithm to construct cardiovascular risk scales based on a points system that also takes into account the variability of the risk factors. These issues are important: points systems are popular in clinical practice, and the improved predictive accuracy gained by using all the information recorded in the clinical history will improve the currently used procedure. The theoretical construction of our method is based on the combination of mathematical models already used in medicine, taking into account the characteristics of each of these models. As mentioned, the prediction time and the structure of each of the models can change in practice, and the method can be used for diseases other than CVD or even applied to other areas of knowledge. Finally, as we do not have real data available for immediate application in clinical practice, we encourage others to use our methods with their own data sets. In the case of CVD, traditional cohort studies should be carried out, recording repeated measurements of the risk factors both during the follow-up and for the period immediately prior to baseline.

The authors thank Ian Johnstone for help with the English language version of the text.

Antonio Palazón-Bru serves as an academic editor for PeerJ.

The following information was supplied regarding data availability:

Our data set is simulated and we have explained how to obtain it in the Supplemental Information.