Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on March 4th, 2023 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on April 19th, 2023.
  • The first revision was submitted on May 26th, 2023 and was reviewed by 1 reviewer and the Academic Editor.
  • A further revision was submitted on July 25th, 2023 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on August 10th, 2023.

Version 0.3 (accepted)

· Aug 10, 2023 · Academic Editor

Accept

Thank you for your constructive responses and revisions. I’m delighted to be able to accept your manuscript and I’m looking forward to your work contributing to the important conversations about departures from protocols. Well done.

I have noted a few minor points that I think you could address as part of the proofing process.

Line 156: I wonder if “we found that 30.1% of clinical study reports (CSRs)” might be confusing, as you defined CSRs earlier (Line 68) and the expansion here happens to match the same three initial letters. Perhaps just “we found that 30.1% of CSRs showed”.

Line 161: Previously you’d mentioned 80% power (equivalent to beta = 0.20), and I suggest using this here, i.e. “beta= 0.20” rather than “beta= 0.19” even if the sample size would provide 81% power.

Line 171: I wonder if “For a SR to get a high score, no or one non-critical weakness is necessary must be found” should be something like “For a SR to get a high score, no or only one non-critical weakness should be found”?

You normally have spacing between paragraphs but not at Lines 173/174, 176/176, 179/180, 241/242, 285/286, 293/294, 379/380, 384/385, and 388/389.

Lines 180–181: From your rebuttal comment, I wonder if “in a single manner single extraction” should have been “in a single extraction”.

Reference 17 on Line 472 ends with a semicolon. You might find https://support.posit.co/hc/en-us/articles/206212048-Citing-RStudio useful for the citation.

Figure 3 appears a little fuzzy for me (Figure 2 looks fine) and it is possible that a higher-resolution version or uncompressed version will be needed for the final version of your manuscript.

**PeerJ Staff Note:** Although the Academic Editor is happy to accept your article as being scientifically sound, a final check of the manuscript shows that it would benefit from further English editing. Therefore, please identify necessary edits and address these while in proof stage.

Version 0.2

· Jun 30, 2023 · Academic Editor

Minor Revisions

Thank you for your revised manuscript and rebuttal. We don’t have additional comments from our reviewers but I will make a few below. These are often quite minor, but there are a few more than I’d like for copyediting and I hope that this will be a useful step in preparing your manuscript. Many of these comments are stylistic and I’ll leave the use of these up to you. Providing these points are all addressed (through changes or rebuttal), and no new points come to my attention, I anticipate quickly recommending acceptance of your revised manuscript.

Lines 38 and 41: Apologies for the pedantry, but could you keep the hyphen (or its absence) consistent for “non[-]CSRs”. See also Lines 47, 48, 51/52, 54, etc. where the hyphen seems consistently included.

Lines 46 and 47: Can you make it clear that the numbers in parentheses are standard deviations? Also, I couldn’t see these values from the abstract in the manuscript itself.

Line 48: I suggest “67.0%” for consistency in decimal places with “54.6%” on the previous line, but given n=97/group, since each n=1 counts for slightly more than 1%, I’d use integer percentages here and throughout the manuscript. See Lines 247, 248, 259 (note integer already on 258), 261, etc., and Table 2, etc.

Line 63: Missing period.

Line 86: Spurious space before period. See also Line 121 and potentially elsewhere.

Line 91: As I’ve also seen the “P” described as “population” (as you refer to on Line 208, albeit not directly as what is meant by the “P” in “PICOS”), and even as “problem”, would it be worth mentioning at least the first alternative here (e.g. “Patients/Population”)?

Line 148: Rather than “randomized” (which suggests random allocation), perhaps “randomly sorted” or “sorted in a random order”? See previous comment and your response regarding Line 134 in the original version of the manuscript.

Line 152: Presumably this refers to the “rnorm()” function (no hyphen, and I suggest appending the parentheses)?
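For readers unfamiliar with the trick, assigning each record a draw from a standard normal and sorting on those draws yields a random ordering, which is what I take the rnorm()-based step to be doing. A minimal sketch in Python (the record labels are purely hypothetical):

```python
import random

def randomly_sort(records, seed=None):
    """Return the records in a random order by pairing each with a
    draw from a standard normal and sorting on that draw
    (mirroring an rnorm()-based sort in R)."""
    rng = random.Random(seed)
    keys = [rng.gauss(0.0, 1.0) for _ in records]
    # Sort by the random key only, then discard the keys.
    return [rec for _, rec in sorted(zip(keys, records), key=lambda kv: kv[0])]

reviews = ["SR-001", "SR-002", "SR-003", "SR-004"]
shuffled = randomly_sort(reviews, seed=42)
```

Fixing the seed makes the random order reproducible, which is worth reporting alongside the software versions.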

Lines 160–162: Rather than powering on the observed difference, powering on the smallest important difference would be safer. Perhaps, the fact that you were interested in differences smaller than 31.5% could be mentioned here?

Line 164: Perhaps “two-sided” or “two-tailed” would be more familiar to some readers than “bilateral”?

Lines 164–166: I can reproduce the n=97/group here, but only by assuming you were using 0.4 and 0.6 as the proportions (i.e., the worst-case scenario). Slightly fewer would have been needed for other differences of 0.2 which are centred away from 0.5 (e.g., 0.35 versus 0.55, a mean of 0.45 which is closer to your observed data, requires 96/group). Could you clarify that your calculation here is conservative and/or under the worst case for a difference of 0.2?
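For reference, both figures can be reproduced with the standard normal-approximation sample-size formula for comparing two independent proportions (α = 0.05 two-sided, 80% power); I am assuming this, or something equivalent, is the formula behind your calculation. A sketch in Python:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    test comparing two independent proportions."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)          # quantile for the power
    p_bar = (p1 + p2) / 2                      # pooled proportion under H0
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

n_per_group(0.40, 0.60)  # worst case for a 0.2 difference: 97/group
n_per_group(0.35, 0.55)  # same difference centred away from 0.5: 96/group
```

This makes the conservative/worst-case nature of 0.4 versus 0.6 explicit: any other pair of proportions 0.2 apart requires slightly fewer participants per group.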

Lines 172 and 176: I wonder if “found” (as used on Line 178) would be clearer language here than “comprised”.

Lines 181–182: Is there a word missing in “in a single manner single extraction.”?

Line 215: I think I’d find the reference to using Chi-squared tests “to compare the outcomes’ means for the PICOS changes” confusing given that this test compares categorical outcomes and so distributions/proportions rather than means.

Line 251: I’m not sure the word “written” here is needed.

Lines 251–254: Perhaps make these values clear with “...with a mean (SD) duration of...” on Line 251?

Line 264: I don’t think you need the comma in “24 SRs, (24.7%)”. Similar point for Line 275 (“and major changes, 35/108 (32.4%)”—although you could add another comma after this quote to make the values a parenthetical aside) and it might be worth checking commas more generally.

Lines 280–281: Seems to repeat 279–280.

Lines 295–298: The negative wording here makes this harder to follow, I think. In any case, I wonder if a brief aside would help to make sure the reader remembers what you mean by “magnitude” (Line 295), and perhaps “not reporting” rather than “non-reporting” (Line 297).

Line 342: Perhaps “...CSRs, AND found that 38%...”?

Line 387: “confront” seems a stronger word than necessary. Wouldn’t you have queried them?

Lines 386–388: I’m not convinced that this is a limitation. Your research seems clearly intended to investigate the research question for intervention studies (consider Lines 124–125, 138, 143, 238, and 240) so I wouldn’t consider this a limitation.

Lines 396–397: While I think great caution is required in attributing causality (that non-CSRs are worse because they are non-CSR rather than because of researcher experience, methods experts in the team, funding availability, etc. being associated with the non-CSR/CSR status and causally associated with the outcomes here), at the same time, the comparison is still entirely valid as reflecting the state of affairs. I guess what I’m saying is that I think you could delete the word “great” from Line 396 without over-reaching (in my view).

Lines 403–405: You could, if you wished, point to ChatGPT as having been considered (and presently found wanting) for performing systematic reviews entirely, e.g., Qureshi R, Shaughnessy D, Gill KAR, Robinson KA, Li T, Agai E. Are ChatGPT and large language models “the answer” to bringing us closer to systematic review automation? Syst Rev. 2023;12(1):72. doi:10.1186/s13643-023-02243-z

·

Basic reporting

None

Experimental design

None

Validity of the findings

None

Additional comments

None

Version 0.1 (original submission)

· Apr 19, 2023 · Academic Editor

Major Revisions

Thank you for your interesting manuscript. We have comments from two reviewers which I invite you to respond to in a revised version of your work. Each of the reviewers’ points should be clearly addressed in your rebuttal—for each explaining either how the manuscript has been changed in response or why no changes have been made. I will ask you to do the same for the additional points I’ve included below.

Our two reviewers have collectively raised fairly specific and direct points, all of which should be responded to. I think you would be able to address these without great difficulty, although you could make larger changes in response to some of Reviewer #2’s suggestions.

Line 47: Perhaps related to Reviewer #2’s point about percentages, I wonder how useful the two here are (particularly since they add to 100, making one redundant). Perhaps the mean (or similar) changes per review would be more useful here? I think 108/227 would give 47.6% rather than 47.5 as per the text (with an equivalent discrepancy for 119/227). See also Line 231.

Line 90: While I agree that all are important, the claim that these are equally important is very specific and seems to need some justification. Perhaps a reference, or references, might help here?

Line 134: I think you might mean “randomly sampled from” or “randomly ordered” (as shown in Figure 2, with the sampling taking place later as per Lines 137–138?) rather than “randomized” (i.e., allocated to groups) here. I felt that this was a point Reviewer #2 might have been alluding to as well.

Lines 141 and 145: I think you could explain where the 20% difference hypothesised came from (and why smaller differences would not be of practical importance). Your calculation seems to be for a Chi-squared test (rather than a LR test, say) and to aid reproduction by readers, you could make this explicit.

Line 183: To be pedantic, means and SDs don’t have corresponding CIs (sample sizes would be needed for this), but perhaps you mean the differences in means and their CIs?

Line 184: A Mann-Whitney U test isn’t a test of means, and is only a test of medians in particular circumstances, so it’s unclear to me why you report means and SDs (suggesting that these are readily interpretable, which is seldom the case in my experience outside of certain distributions, e.g. Normal or Poisson) and then use non-parametric tests (suggesting that the assumptions for, say, t-tests were not sufficiently well satisfied). Giving 95% CIs around means, but not giving estimates for the difference in means, along with its own CI, could be justified, but it would be unusual. Given Reviewer #2’s point about potential confounding, quantile regression to model medians would be one possible approach here.

Lines 185–187: I think a good goal for the statistical methods is that a competent (bio)statistician could read them and, if they had access to the raw data, could expect to be able to reproduce the results with confidence. On reading the methods, I don’t think I’d necessarily know exactly what was going to be done here. Could you make this clearer?

Line 189: I’m not sure that I’d describe ARR as a “method” as such. The particular approach used to calculate the 95% CI would be helpful to note though.
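For instance, a Wald-type interval for the difference in proportions is one common option (among others, such as Newcombe’s method). A minimal sketch in Python, with purely hypothetical counts:

```python
from math import sqrt
from statistics import NormalDist

def arr_ci(e1, n1, e2, n2, conf=0.95):
    """Absolute risk reduction (difference in two independent
    proportions) with a Wald-type confidence interval."""
    p1, p2 = e1 / n1, e2 / n2
    arr = p1 - p2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return arr, arr - z * se, arr + z * se

# Illustrative counts only, not values from the manuscript.
arr, low, high = arr_ci(53, 97, 65, 97)
```

Whichever interval is used, naming it in the methods (and the software function that computed it) would let readers reproduce the result.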

Line 193: RStudio is an interface to an R installation (and does no calculations itself), so the version needed here would be the version of R, along with the names and versions of any non-base packages used.

Lines 210–211: The methods talked about means and SDs as descriptives, which would (subject to the reader understanding the nature of the distribution, and other summary statistics might be more useful if the data was not approximately Normally distributed) give the reader some idea of the data. However, CIs are inferential. Are you interested in estimating the time between publications (for which a CI would be useful) or in describing this (for which a mean and SD, or some other combination of summary statistics would be useful)?

Lines 211–213: A similar point would apply here, but a CI along with the point estimate for the difference would be much more useful than a p-value in isolation (Line 213). I appreciate the limitations from MW-U tests, but I wonder if they can be avoided by considering other methods here?

Lines 213–215: Same point again.

[# PeerJ Staff Note: Please ensure that all review and editorial comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. #]

·

Basic reporting

1. Introduction: The Cochrane handbook was also updated in 2019.

2. Methods: Why was 2018 chosen?

3. Inclusion criteria: Were methodology reviews also excluded?

4. Hypotheses and sample size calculation: there seems to be results included here.

5. Supplementary material, figure 2: the x axis is unclear.

6. Table 2: include abbreviations as footnotes.

7. There seems to be no description of the characteristics of the included reviews and also no reference to which were included.

8. No details seem to have been included on AMSTAR, only the overall risk of bias.

Experimental design

no comment

Validity of the findings

no comment

Additional comments

The title needs consideration. The work is about discrepancies between protocol and review rather than reasons for these changes. Therefore, the reasons may not be biased or introduce bias.

·

Basic reporting

- The manuscript describes a comparison of Cochrane systematic reviews (CSRs) with published non-Cochrane systematic reviews (non-CSRs) with a preregistered protocol in PROSPERO with respect to differences between protocol and publication plus reporting of these differences.
- The authors say (lines 92-97) that they want to evaluate the impact of the Cochrane requirements of reporting changes between protocol and publication (since 2008) and of motivating these changes (since 2013). It’s unclear why they didn’t compare CSRs before and after the introduction of these requirements. Please explain in your revised manuscript.
- The manuscript is well-written and the research question and its relevance are introduced adequately.
- The authors report many percentages, but in several instances fail to make fully clear what they refer to. That leaves doubt whether they refer to the total number of protocols, publications, changes between protocol and publications, changes that are reported etc. Examples of this ambiguity can be found for instance in lines 239, 244, 280, 282. Please repair.
- Another ambiguity concerns the reporting of deviations from the initial protocol:
o What does ‘reporting of the changes’ (e.g. in line 167) mean: filing an amendment to the protocol or stating it in the publication on the SR?
o What does ‘presenting a change’ (e.g. in line 245, 246, 264 and 275) mean: that a change occurred or that it occurred AND was described in an amendment or in the publication of the SR?

Experimental design

- The SRs were not allocated randomly to be conducted as CSR or non-CSR. This means that the differences found can be due to confounding, including self-selection by the author teams (similar to ‘confounding by indication’). The authors identify this correctly as the main study limitation in both the abstract and the discussion section, but they label their study design confusingly as cross-sectional instead of as observational or non-randomized. Please change this.
- Furthermore, I believe that the label cross-sectional (in the subtitle, abstract and many lines of the manuscript) is wrong for another reason as well. To me the study is longitudinal, as it focusses on the difference between study protocols and later publications and the reporting of changes that have occurred. It’s a cohort study in epidemiological jargon. Because the independent variables are contained in the study protocol, which was time-stamped in the past, a more specific label would be historical cohort study.
- The main inclusion criterion seems to be that the SRs deal with RCTs of therapeutic interventions in human health, but that’s not adequately spelled out in the methods section. Please make this more accurate and explain why diagnostic, preventive and palliative human health interventions and non-human and non-health interventions were excluded.
- Please explain why SRs of 5 years ago (2018) were selected for this study.
- Please explain why only SRs in English were included.
- Why was the methodological quality of SRs not assessed by 2 or more reviewers (line 153)? How many SRs were assessed by >1 reviewer and how good was the level of agreement (e.g. expressed in Cohen’s Kappa)?

Validity of the findings

- The authors missed the opportunity to also study publication bias by only including SRs that had both a preregistered protocol and a publication. It’s unclear why they did not start from preregistered SR protocols, with the existence of a later publication of the SR as the first outcome studied. I consider this a missed opportunity for assessing the occurrence of publication bias (which is a major form of selective reporting) and believe that this ought to be added as an important study limitation. Strangely, line 352 suggests that the authors believe that they also studied unpublished studies, which is clearly not the case.
- Minor study limitations not mentioned are that the proportion of SRs with a preregistered study protocol was not studied, nor whether these SRs have lower methodological quality.
- Maybe add to the recommendation to assign one peer reviewer to check for differences between protocol and publication (lines 353-354) that suitable software might be very helpful to detect instances of unreported changes. Furthermore, to me it seems more realistic to make this check a task for the editorial office instead of burdening unpaid peer reviewers even more.

Additional comments

none

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.