To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
Following your appeal and the prior reviewers' feedback on your revised manuscript, thank you for your diligence in responding to the requested changes.
The two reviewers have weighed in favorably, and I personally agree that this paper deserves to be published, provided the authors can satisfactorily complete the additional revisions.
Reviewer 2 (Juan Carlos Bazo-Alvarez)
"This manuscript needs more work and it is not ready for publication. However, this new prediction tool seems to be promising, so I believe that this study merits a new opportunity for being improved and, after that, assessed for publication."
PDF file is attached
Reviewer 3 (David Prieto-Merino)
"In my view this is a very interesting manuscript that tries to tackle a methodological problem that has not been very well solved so far.
In this second version, several sections describing the methods have been extended considerably and it is now much clearer what the authors are doing and how they are doing it. I am happy with how the authors have addressed my concerns about the previous manuscript.
In summary I’m strongly in favour of the publication of this second version of the manuscript."
Dear Editorial Committee
Please find attached our manuscript entitled “A method to construct a points system to predict cardiovascular disease considering repeated measures of risk factors” for consideration for publication in PeerJ. An earlier version of this work was sent to PeerJ (Reference #2015:07:5982) but it was rightly rejected by the editorial committee. We consider that the decision was fully justified as the manuscript failed to fulfil the standards for publication in PeerJ. However, the editor and the academic reviewers kindly offered a series of very relevant comments, all of which would help to make the article acceptable. These comments mainly concerned detailing the model constructed and explaining the simulated data set, as well as various individual comments from each reviewer. After working hard on the article for the past month, we have managed to address all the comments given. Accordingly, we wish to have the study reappraised, considering that the editorial committee mentioned it was interesting but required important changes. We sincerely believe that, after addressing all the original comments, the manuscript now merits reconsideration.
Thanking you in advance, I remain
Dr Palazón-Bru (Lead and corresponding author of the manuscript)
We have obtained feedback on your manuscript from three different professionals with complementary backgrounds who could benefit from your work. Yet all of them found major weaknesses in your methods and approach. Please refer to their detailed feedback, which can help you improve your work.
A methodology to construct a point system to predict cardiovascular disease considering repeated measures of risk factors
The paper describes the development of a prediction model for cardiovascular disease that aims to take into account the variability of risk factors over time. The paper does not provide enough information for the reader to understand what has been done and why.
Although this is a methodology paper, the model is not specified and each step of the model development and validation is not shown. This information is necessary.
It is mentioned that the model predicts risk of CVD within 2 years from baseline. However, most clinical guidelines on CVD prevention use 10-year predicted risk. As the model is not specified, it is not clear why this short risk prediction period has been chosen.
Use after validation: A simulation is described in which the CVD risk of a patient is reduced by a series of interventions. It is not mentioned what these interventions are or where the expected effect sizes of the risk factor reductions are taken from.
Figure 1: It would be helpful if the information could be provided in a table instead of as screenshots, and shown with information on how the scores were assigned, possibly in combination with the corresponding table.
Abstract & Title
• Methodology is not the best word for describing your purpose. I suggest “method” or “procedure”.
• Your study proposes a new method for building a new type of CVD risk score, but not a new CVD risk score itself (or something equivalent, such as a point system). This is not clear enough until the end of the manuscript. I suggest you clarify it from the beginning (title/abstract). This helps readers keep their expectations correctly balanced.
• Validation and utilization are big words in the world of CVD risk scores. Usually, validation implies testing your new tool (the product of your new procedure, your point system) against real longitudinal data. Utilization implies the application of this “validated tool”. You do not have results for either of them. I suggest reconsidering the use of these words. Some options: internal validation, statistical validation by simulation, explanation of potential utilization. The idea is to capture your target audience from the beginning: investigators who are looking for new statistical procedures to optimize existing CVD risk scores.
• I suggest you show the strengths and weaknesses of the statistical methods behind current CVD scores, referring also to other relevant scores (e.g. Reynolds and WHO). Point out that none of them includes information on the time-dependent variability of risk factors in its estimates. After that, explain that your new method fills this gap.
• In practice, the final users of CVD risk scores (chart users) do not have problems with “the accuracy of the estimation of the probability of CVD…”, because they make their decisions using the cut-off points recommended by guidelines (e.g. people classified as high risk over 10 years). This is actually a very coarse estimate of risk. The present challenge is to ensure that people under this classification (and usually on preventive medication) are truly at risk. In other words, the current problem does not end with the accuracy of the statistical model; it ends with the improvement of guideline criteria and of final clinical decisions in the field. I suggest that you consider this idea when revising the second paragraph of your Introduction.
• At the end of the Introduction, when stating your aims, I suggest being as clear as you can about the real scope of your project: to show the viability and properties of a new methodological alternative for constructing CVD risk scores. As I said before, try not to create excessive expectations in readers, for example by promising validation of a final tool.
• Big absence: a detailed description of the dataset's features. Is it completely simulated? Which risk factors and outcomes are you taking into account? Consider that CVD risk scores do not differ only in baseline population or statistical method; they also differ in the nature of the risk factors and outcomes, and in how these have been handled. I suggest including a Dataset section in the manuscript.
• Main methodological issue: you argue that your new tool (point system) is better than traditional tools, but do not show evidence for that. In other words, the use of time-variant information is potentially an advantage of your final tool (point system) that needs to be confirmed via comparisons with other current tools. This part is extremely important, because there is no empirical justification for using your proposed procedure to create new CVD risk scores (or point systems). I suggest you include a table comparing the accuracy of your new point system against that of some equivalent CVD risk scores (SCORE or WHO maybe), at least using simulated data (in the absence of the desirable real data).
• Your simulations seem to be adjusted for a follow-up period of 2 years; however, standard CVD risk scores are adjusted for 10 years of follow-up. I suggest clarifying this or justifying why you have used 2 years instead of 10.
• This is clearly a paper with a strong methodological spirit (statistical methods). However, there is almost nothing about verification of statistical assumptions. I suggest including relevant information about it, summarized in the manuscript and more detailed in supplemental material.
• I suggest using the same standard criteria for drafting all your tables. The tables on page 18 could be omitted and only described in the manuscript.
• The figures are not referred to in the manuscript.
• About robustness: You should mention that your new procedure preserves the robustness of the classic statistical methods behind current CVD risk scores (e.g. you are still applying Cox models).
• The C-statistic has implicit limitations that have not been mentioned, especially considering the way you have used it (simulated data). I suggest you explain the implications of performing only an internal validation procedure (without real data) and how you handled these implications. I recommend reading this reference first (page 1770): http://circ.ahajournals.org/content/121/15/1768.short
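To illustrate one commonly cited limitation of the C-statistic (not necessarily the one the reviewer has in mind), the minimal Python sketch below computes it for a binary outcome and shows that it depends only on how the predicted risks are ranked. The function name, the risks and the outcomes are hypothetical, chosen purely for illustration; none of them comes from the manuscript.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Share of (case, non-case) pairs in which the case has the higher risk.

    risks    : predicted risk per subject
    outcomes : 1 if the subject had the event, 0 otherwise
    Ties in predicted risk count as half a concordant pair.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c, n in product(cases, controls)
    )
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (illustrative only).
risks = [0.05, 0.10, 0.20, 0.40, 0.60]
outcomes = [0, 0, 1, 0, 1]

# Halving every predicted risk changes the absolute risks (calibration)
# but leaves the ranking of subjects untouched.
halved = [r / 2 for r in risks]

print(c_statistic(risks, outcomes))
print(c_statistic(halved, outcomes))
```

Both calls return the same value, because the statistic sees only the ordering of subjects. A model whose absolute risks are systematically too high or too low can therefore still show good discrimination, which is one reason a C-statistic from internal validation alone says little about calibration.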
1) The use of the English language is not very good. This can sometimes make the text a bit more difficult to interpret. It needs a good revision.
2) The first problem is that the theoretical structure of the data that the authors are considering is not clear. At the beginning I thought that they were going to use, as predictor variables, biomarkers whose values change over the patient's follow-up period (from baseline onwards). The fact that they use Cox regression models with time-dependent variables suggests this data structure, as these models are intended for variables that change over the follow-up period (between time 0 and t*), as they mention. But then in Table 4 the measurements of the risk factors are taken before baseline (time 0), and there are no data on how they change after time 0. So what is the theoretical data structure of the problem that the authors are trying to tackle? A simple graph with time on the x-axis, marking the start of follow-up for outcomes and the points at which the different kinds of data are collected, would help a lot.
3) What exactly do the authors mean by the “longitudinal parameters” that they simulate and that are so crucial? What is their physical meaning? Are they just the predictions of the values of the biomarkers at time t*? Or are they some parameter of the trend of the biomarkers over the period (0, t*)? This concept is so crucial that it needs to be explicitly defined in the paper (giving the Rizopoulos reference is not enough).
4) About the validation: Once they have the simulations of the longitudinal parameters at time t*, they convert these into a score using the Cox model derived previously. But this is used to predict risk in the future, from the time at which the explanatory variables are set. So the points of the score should help to predict the risk in the next, say, K years (so from t* to t*+K)? Do the authors have, in the validation data, the outcome in those future years (who died and when)?
5) Validation uses the score system and the outcome to estimate the C-statistic. This is fine if the data contain no censored observations (subjects without the event). But if they do, would it not be better to calculate Harrell’s D-statistic rather than the C-statistic?
6) What is the data set on which they develop the model? What kind of study did they get it from? Are there any censored data in the follow-up? How did they create the validation set? Was it a random subset of the data? Why that size? Why do they drop patients without values of some of the longitudinal parameters during the follow-up? Is it not precisely the aim of this method to predict those values from the previous history of the parameters?
7) Figure 2 is confusing because it shows the “medium” risk group with higher numbers than the “high” and “very high” risk groups. Please put it on a risk scale by dividing by the appropriate denominators.
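Comment 5 turns on how censored follow-up is handled when measuring concordance. As a hedged illustration only, the Python sketch below implements Harrell's concordance index (usually written C), which counts a pair of subjects only when the shorter follow-up time ends in an observed event. It is not the D-statistic named in the comment, and it is not the authors' procedure. The function name and the data in the usage example are hypothetical, and pairs with tied follow-up times are skipped as a simplification.

```python
from itertools import combinations


def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data.

    times  : follow-up time per subject
    events : 1 if the event was observed, 0 if the subject was censored
    scores : risk score per subject (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: who failed first is unknown
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0  # earlier event had the higher score
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data (illustrative only): subjects 2 and 4 are censored.
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
scores = [5, 7, 5, 1]

print(harrell_c(times, events, scores))
```

In this toy data set only four of the six pairs are comparable, because the two pairs whose shorter follow-up time belongs to the subject censored at time 4 are dropped. A statistic that ignored censoring would treat those pairs as informative.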
I think this paper addresses a very interesting problem that will only become more important with the coming of the big data era: how to build risk prediction models using changing values of biomarkers rather than just one baseline measure of each biomarker. Unfortunately, I think the paper is not very well written, and it is very difficult to understand what the authors have done at each step (or even what they are trying to do).
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.