Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

  • The initial submission of this article was received on March 18th, 2020 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on May 14th, 2020.
  • The first revision was submitted on June 12th, 2020 and was reviewed by 1 reviewer and the Academic Editor.
  • A further revision was submitted on July 22nd, 2020 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on August 11th, 2020.

Version 0.3 (accepted)

· Aug 11, 2020 · Academic Editor

Accept

All comments raised by the reviewers have been addressed properly, so my decision is to accept the manuscript for publication.

[# PeerJ Staff Note - this decision was reviewed and approved by Keith Crandall, a PeerJ Section Editor covering this Section #]

Version 0.2

· Jul 16, 2020 · Academic Editor

Minor Revisions

As you can see, one reviewer still has a minor comment regarding the Bland-Altman method in the Discussion section.

[# PeerJ Staff Note: It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful #]

Reviewer 2 ·

Basic reporting

NA

Experimental design

NA

Validity of the findings

NA

Additional comments

The authors have addressed my earlier comments. However, I have a new comment about the statement on the Bland-Altman method in the Discussion section of the revised version. There has been much discussion in the literature about the Bland-Altman method. As mentioned in the review paper by Scholz et al. (2015), when two methods/techniques are being compared without a 'gold standard' or 'true' reference, which is usually the case in applications, the improved Bland-Altman approach of Liao and Capen (2011) should be the first choice of statistical procedure. The authors can find the advantages of the improved Bland-Altman method described in detail in Liao and Capen (2011). I do think it is important and necessary to mention this fact there.

Liao, J.J.Z. and Capen, R. (2011), “An Improved Bland-Altman Method for Concordance Assessment”, The International Journal of Biostatistics, Vol. 7: Iss. 1, Article 9, 1 – 17.

Scholz, et al. (2015), “Non-invasive methods for the determination of body and carcass composition in livestock: Dual-energy X-ray absorptiometry, computed tomography, magnetic resonance imaging and ultrasound: Invited review”, Animal, 9(7), 1 – 15.

Version 0.1 (original submission)

· May 14, 2020 · Academic Editor

Minor Revisions

As you can see from the attached reports, the reviewers have some minor points to address and gave very constructive feedback. I hope the comments are helpful for you to further improve your manuscript.

[# PeerJ Staff Note: It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful #]

Reviewer 1 ·

Basic reporting

I was able to reproduce nearly all the results in the paper, but the three bootstrap results differed slightly. This is easily addressed using set.seed(). In addition, I have suggestions for making reproducibility easier. The figures are very nice.

Major:
a) Bootstrapping reproducibility: call set.seed() with a fixed seed so that bootstrapping returns identical results every time. Sometimes I got no convergence errors, and other times I got a different number of errors than reported in the paper. Fortunately, this is really easy to address.

set.seed(532)
m.bfat.1 <- lcc(data = bfat, subject = "SUBJECT", resp = "BF",
                method = "MET", time = "TIME", qf = 1, qr = 1,
                components = TRUE, ci = T, nboot = 10000)
m.hue.2 ….
m.bw.2 …

Minor:
a) Suggest creating Rmd: It'll be much easier to reproduce the results using an R Markdown file that is self-contained.

b) Suggest loading all required packages and datasets at the start:
if (!require("pacman")) install.packages("pacman")       # install pacman if missing
pacman::p_load(lcc, cccrm, dplyr, reshape2, tidyverse)   # install/load all required packages

# Datasets:
load(file = "simulated_hue_block.RData")
load(file = "simulated_hue.RData")
data(bfat, package = "cccrm")
data(hue)
data(bdaw, package = "cccrm")

Ideally, the two hue datasets could be added to lcc package instead of being loaded separately. This is a minor issue, but it is an extra step to download the data and then load it.

Also, I recommend adding sessionInfo() and saving all the plots in an Rmd.

Experimental design

The research question for the proposed methods is very well defined. I was able to reproduce the results, with one exception noted above with bootstrapping.

Validity of the findings

Overall, the findings are valid and rigorous. However, there is one interpretation that I think should be restated.

On line 632: “As the p-value was 0.8935, we can conclude that there is no interaction effect and, consequently, the fitted curves for each level of method over time can be considered parallel.” Non-significance does not necessarily imply the null (no difference), see Greenland et al. (2016), point 6: https://link.springer.com/article/10.1007/s10654-016-0149-3

It would be safer to interpret this result as saying that model 5 and model 4 are not clearly distinguishable using the likelihood ratio test; see Burnham and Anderson’s (2007) book Model Selection and Multimodel Inference. However, the lower AIC and BIC values for model 4 indicate that it is the more plausible model. I think this is a great opportunity to show an example of a more cautious interpretation. It’s not uncommon to see cherry-picking of fit indices. Admittedly, it’s messy when fit statistics conflict, because the interpretation is no longer straightforward. I would argue that this messiness reflects the reality of the data and models.
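
To illustrate what this kind of cautious comparison looks like in code, here is a sketch comparing two nested mixed models with both the likelihood ratio test and AIC/BIC. It uses nlme's built-in Orthodont data as a stand-in; these are not the manuscript's models 4 and 5:

```r
library(nlme)

# Two nested mixed models fitted by ML (required for a valid likelihood ratio test)
m_small <- lme(distance ~ age, random = ~ 1 | Subject,
               data = Orthodont, method = "ML")
m_large <- lme(distance ~ age * Sex, random = ~ 1 | Subject,
               data = Orthodont, method = "ML")

# anova() reports the likelihood ratio test alongside AIC and BIC:
# a non-significant LRT means the models are not clearly distinguishable,
# while lower AIC/BIC point to the more plausible model.
anova(m_small, m_large)
```

When the LRT and the information criteria disagree, reporting both (rather than cherry-picking one) is exactly the cautious interpretation suggested above.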

Additional comments

The paper is very well written and the R package is sound. Documentation of the R package is excellent. I was able to reproduce all the results in the paper, notwithstanding the main issue raised in the Basic Reporting section, but that is an easy fix. The lcc package has great functionality, with support for common functions such as anova() and plot(). The graphs in the paper are impressive and nicely illustrate the models and model diagnostics. I'll definitely be using this package, as I have data with multiple raters for items over time.

Major Comments:

a) Limits of Agreement: I'm going to caveat this first point by saying I have limited knowledge of the biostatistics literature. Still, I interpret limits of agreement (LoA), the range of absolute agreement or exchangeability for two methods, as being different from accuracy along the 45-degree diagonal line, because the ground-truth value is often unknown. Therefore, it would be useful to mention LoA, because two measures/methods can be very highly correlated yet have a wide range of absolute exchangeability. Here's an example with different measurements of tumor size: near-perfect concordance correlations with a broad range of exchangeability: https://pubs.rsna.org/doi/pdf/10.1148/radiol.2522081593

The R MethComp package can handle repeated measures for LoA:
https://cran.r-project.org/web/packages/MethComp/index.html
Repeated-measures LoA is a specialized area; most LoA analyses are done with independent observations: https://doi.org/10.1093/bja/aem214

Even though the Bland-Altman method papers are very highly cited (one is among the top 10 most-cited papers in biostatistics), the distinction between limits of agreement and correlation-based agreement is still frequently overlooked. To add to the confusion, LoA analysis is also called method comparison, Bland-Altman analysis/method/plot, and the Tukey mean-difference plot.

Original papers:
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement, Lancet, 1986, vol 327 (pg. 307-10)
Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading, Lancet, 1995, vol 346 (pg. 1085-7)
Bland JM, Altman DG. Measuring agreement in method comparison studies, Statistical Methods in Medical Research, 1999, vol 8 (pg. 135-60)

Good tutorial paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4470095/
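
For readers unfamiliar with LoA, the classic (independent-observations) Bland-Altman computation takes only a few lines of base R. The paired values below are hypothetical, not data from the manuscript:

```r
# Hypothetical paired measurements of the same quantity by two methods
a <- c(10.1, 12.3, 9.8, 11.5, 10.9, 13.0, 11.2, 12.7)
b <- c(10.4, 12.0, 10.1, 11.9, 10.6, 13.4, 11.0, 12.3)

d    <- a - b                            # pairwise differences
bias <- mean(d)                          # mean difference (systematic bias)
loa  <- bias + c(-1, 1) * 1.96 * sd(d)   # 95% limits of agreement

# Tukey mean-difference (Bland-Altman) plot with bias and LoA lines
plot((a + b) / 2, d,
     xlab = "Mean of the two methods", ylab = "Difference (A - B)")
abline(h = c(bias, loa), lty = c(1, 2, 2))
```

Two methods can have a near-perfect concordance correlation while the interval [loa[1], loa[2]] is still too wide for the methods to be clinically exchangeable, which is the distinction raised above.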

b) Incorrect models for time-series data: To demonstrate what not to do, and also to show why lcc is valuable, consider adding examples of incorrect models.

Specifically, treating non-independent data as independent by fitting conventional ICC models. I'm not aware of any examples of this for reliability statistics, but in general it is done more often than one would think: see Aarts et al. (2014).
Aarts, E., Verhage, M., Veenvliet, J. V., Dolan, C. V., & Van Der Sluis, S. (2014). A solution to dependency: using multilevel analysis to accommodate nested data. Nature Neuroscience, 17(4).

Another incorrect possibility is to average the data and then fit a conventional ICC model.

Minor
a) Connections to multilevel modeling: Is the longitudinal correlation ever the same as the intraclass correlation coefficient (ICC) in multilevel modeling? I think this holds only for a particular multilevel model (the unconditional growth model)? I remember that Singer and Willett's (2003) book, Applied Longitudinal Data Analysis, mentions this. Unfortunately, that book is in my office right now.
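
For reference, the multilevel ICC in question comes from the intercept-only (unconditional means) model, where ICC = between-subject variance / (between-subject + residual variance). A sketch using nlme's built-in Orthodont data, not the manuscript's models:

```r
library(nlme)

# Unconditional means (intercept-only) model on the built-in Orthodont data
m0 <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)

vc      <- VarCorr(m0)                                # variance components (character matrix)
var_sub <- as.numeric(vc["(Intercept)", "Variance"])  # between-subject variance
var_res <- as.numeric(vc["Residual", "Variance"])     # within-subject (residual) variance

icc <- var_sub / (var_sub + var_res)   # proportion of variance between subjects
icc
```

Whether this coincides with lcc's longitudinal correlation at a given time point would depend on the fitted polynomial degrees, as the reviewer suspects.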

b) Performance: This is a wish-list item for the future, for the package rather than the manuscript: if possible, any performance improvements would be wonderful. Example run times on an i7-4770 running Windows 10 with BLAS:

- Approximate run time: Hours, probably? I ran the bootstrapping code overnight.
m.bfat.1 <- lcc(data = bfat, subject = "SUBJECT", resp = "BF",
method = "MET", time = "TIME", qf = 1, qr = 1,
components = TRUE, ci = T, nboot = 10000)

- Approximate run time: 9 minutes
m.bfat.2 <- update(m.bfat.1, lme.control = list(opt = "optim"))

I'm guessing this is going to be extremely difficult, especially with multilevel models, and because it looks like lcc uses nlme? I'm pretty sure nlme doesn't have multi-CPU support; lme4 has multi-CPU support for models and bootstrapping. Bootstrapping may be the best starting point for parallelization.
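
A minimal sketch of what parallelized bootstrapping could look like with the base parallel package. The statistic here is a toy stand-in, not lcc's internals; in lcc's case each replicate would refit the model. Note mclapply() forks on Unix-alikes but falls back to serial on Windows, where a parLapply() cluster would be needed:

```r
library(parallel)

# Toy bootstrap replicate: resample with replacement and compute a statistic
boot_stat <- function(i, x) mean(sample(x, length(x), replace = TRUE))

set.seed(532)
x <- rnorm(200)

# Spread 1000 replicates across 2 cores
reps <- unlist(mclapply(seq_len(1000), boot_stat, x = x, mc.cores = 2))
quantile(reps, c(0.025, 0.975))   # percentile bootstrap interval
```

Since the replicates are independent, this pattern parallelizes trivially, which is why bootstrapping is a natural first target even if the nlme fits themselves stay single-threaded.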

Reviewer 2 ·

Basic reporting

See my comments to the authors

Experimental design

See my comments to the authors

Validity of the findings

See my comments to the authors

Additional comments

The authors have developed a nice R package for a very important statistical and practical concordance/agreement topic, which has very broad application. It is very important to have computational tools ready for advanced statistical methods. There are many R packages for assessing the agreement of two measurement methods; a recent one, named "AgreementInterval", includes commonly used index approaches such as the CCC, as well as interval approaches, along with graphical tools. The current R package focuses particularly on agreement for longitudinal data.

1. The authors used the earlier-proposed concordance correlation coefficient (CCC) and the accuracy index C_{b} (Lin, 1989). However, as pointed out by Liao & Lewis (2000), there are many concerns regarding these metrics. For example, C_{b} sometimes gives unexplainable, or even totally misleading, results. To enhance Lin's CCC, Liao (2003) developed a new concordance correlation coefficient built on Lin's CCC by relating the two random paired measurements to the line of identity, which improved the inferential ability of the method and increased the assessment accuracy. These facts should be mentioned in the introduction section so that readers/practitioners can use their subject knowledge to judge the appropriateness of the derived metrics.

2. As the authors point out in the article, there are many cases where agreement is needed for curved data. The authors studied agreement for structured longitudinal data; however, the first paper in the literature on agreement for curved data without any structural assumption was Liao (2005), which used a general non-parametric approach. This information should be mentioned in the introduction section so that readers/practitioners can use their subject knowledge to judge whether their data have the defined longitudinal structure.


• Liao, J.J.Z. and Lewis, J. (2000), “A Note on Concordance Correlation Coefficient”, PDA Journal of Pharmaceutical Science and Technology, 54(1), 23 – 26.
• Liao, J.J.Z. (2003), “An Improved Concordance Correlation Coefficient”. Pharmaceutical Statistics, 2(4), 253 – 261.
• Liao, J.J.Z. (2005), “Agreement for Curved Data”, J. of Biopharmaceutical Statistics, 15, 195 – 203.
• CRAN - Package AgreementInterval:
https://cran.r-project.org/web/packages/AgreementInterval/index.html

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.