All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I think this is an important contribution that will stimulate methodological improvements in the computational treatment of NGS data and help to clarify mutational processes.
Both reviewers consider the study well designed and very interesting, and I agree with them. However, reviewer 2 raises two points that deserve further discussion: the influence of similar, although not identical, variants in the genome and, more importantly, the possibility that genetic and environmental differences may play a role in the observed heterogeneity. Please discuss these points in the revised version.
Over the past eight years, the profusion of whole-genome and whole-exome sequencing studies has led to huge datasets of somatic mutations found in cancer samples. This is making it possible to see in unprecedented detail not only the driving events of cancer, but also the common artifacts of our sequencing technology. Distinguishing between these two classes of observations has proven to be a greater challenge than anticipated.
In particular, the field has relied on observing recurrent mutations at the same genomic position in multiple samples. This is taken as prima facie evidence of positive selection in tumorigenesis, since the chance of observing exactly the same mutation in multiple samples is vanishingly small.
Studies such as Chang et al. (PMID 26619011) and many others present lists of recurrently mutated positions, seemingly without appreciating the possibility that some may actually be recurrent artifacts.
The current study points out that there are many more recurrently mutated sites in the genome than would be possible in a model without recurrent non-driver mutations. The authors show this in a straightforward, easily understood way. Furthermore, they point out that different laboratories in the world, analyzing the same kind of cancer, find many *different* recurrently mutated positions in the genome, consistent with a model of technology-specific artifacts.
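The "more recurrent sites than expected" argument can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' model; it assumes a deliberately naive null in which every sample carries the same number of mutations, placed uniformly and independently across the callable genome, with no selection and no artifacts. The function names and the example numbers are hypothetical.

```python
import math


def prob_recurrent(n_samples, p_site):
    """P(a given site is mutated in at least 2 of n_samples samples).

    Assumes the per-site count is Binomial(n_samples, p_site). Terms are
    summed in log space so that very small probabilities stay accurate.
    """
    if not 0.0 < p_site < 1.0:
        raise ValueError("p_site must be strictly between 0 and 1")
    total = 0.0
    for k in range(2, n_samples + 1):
        log_term = (
            math.lgamma(n_samples + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_samples - k + 1)
            + k * math.log(p_site)
            + (n_samples - k) * math.log1p(-p_site)
        )
        total += math.exp(log_term)
    return total


def expected_recurrent_sites(genome_size, n_samples, muts_per_sample):
    """Expected number of sites hit in >= 2 samples under the uniform null."""
    p_site = muts_per_sample / genome_size  # assumption: uniform mutation rate
    return genome_size * prob_recurrent(n_samples, p_site)


if __name__ == "__main__":
    # Illustrative values only, not taken from the manuscript.
    print(expected_recurrent_sites(genome_size=3_000_000_000,
                                   n_samples=500,
                                   muts_per_sample=5_000))
```

Comparing the observed count of recurrent sites with such an expectation is only a first check; a realistic null would also need to account for sequence context, varying mutation burden, and coverage.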
This is an elegant fundamental advance in the analysis of somatic mutations in cancer, and a crucial warning for the field.
The study is well designed.
It is, however, unclear how "uniquely mappable" is defined. If it means that the 20-mer around the mutation occurs only once in the genome, it would be more interesting to allow some wiggle room, i.e., the 20-mer is uniquely mappable only if no other 20-mer in the genome matches it at, say, 90% identity. See e.g. Bailey et al (2004) (pmid 11381028). This is particularly interesting if we assume that a called variant is caused by a mismatch between two similar regions.
In the section "Privacy of mutations" the authors write that the significant heterogeneity suggests the excess sites were predominantly errors. While this is a possible explanation, the conclusion seems somewhat overstated, as there may well be a biological explanation, such as differences between Japanese and American patients: either genetic differences or environmental differences such as diet.
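To make the suggested relaxed definition concrete, here is a minimal brute-force sketch. It assumes an ungapped comparison (Hamming identity) on a toy sequence held in memory; the function names, the `min_identity` parameter, and the handling of the reverse strand are illustrative choices, not the authors' procedure. A real genome would need an indexed aligner rather than a linear scan.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming_identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_uniquely_mappable(genome, pos, k=20, min_identity=1.0):
    """True if no other k-mer matches the k-mer around pos at >= min_identity.

    min_identity=1.0 is the strict "occurs only once" definition;
    min_identity=0.9 is the relaxed variant suggested in the review.
    """
    # Centre the window on pos, clamped so it stays inside the sequence.
    start = max(0, min(pos - k // 2, len(genome) - k))
    query = genome[start:start + k]
    for i in range(len(genome) - k + 1):
        if i == start:
            continue  # skip the query window itself
        window = genome[i:i + k]
        if (hamming_identity(query, window) >= min_identity
                or hamming_identity(query, reverse_complement(window)) >= min_identity):
            return False
    return True
```

With the strict threshold a site can pass even when a near-identical copy exists elsewhere, which is exactly the case the relaxed threshold is meant to catch.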
The authors state repeatedly that the model is rejected by a goodness-of-fit test, which means the model is not perfect. The paper would benefit from having some discussion on what the natural next step would be to improve the model.
At the bottom of page 11 it says p>0.0001; this should probably be p<0.0001.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.