To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
Thank you for addressing the reviewers' comments. I am now happy to accept your manuscript for publication at PeerJ.
As you can see, both reviewers were very positive about your work and have requested only very minor changes. Based on their feedback, I would ask you to address the following points:
- One reviewer finds the Mendelian violation rate in the trio analysis higher than expected for both BALSA and GATK. Are there any reasons why this could be the case?
- Explain in the main text whether filtering was used in the trio study.
- Describe the size distribution of indels detected by BALSA and simulated indels in the main text.
- Move a summarized description of the workflow and the SNAPSHOT format from the supplementary material to the main text.
The authors present a new whole genome and exome sequencing analysis pipeline that uses programming optimized for GPU processors to enable much faster analysis than most current methods. Importantly, they optimize all parts of the process to go from raw reads to variant calls, including homozygous reference calls. While it is much faster, one drawback is that it requires more memory than other methods.
It would be useful to describe what annotations the authors use in their random forest model in the main text of the paper.
It would be useful to describe the size distribution of indels detected by BALSA in the main text.
It would be useful to describe the size distribution of simulated indels in the main text.
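As a rough illustration of the kind of summary meant here, the sketch below tabulates signed indel lengths (positive for insertions, negative for deletions) from reference and alternate alleles. The function name and the input layout, plain `(ref, alt)` string pairs, are assumptions made for this example and are not taken from BALSA or its output format.

```python
from collections import Counter

def indel_size_distribution(variants):
    """Count indels by signed length.

    variants: iterable of (ref, alt) allele strings (hypothetical input layout).
    Returns a Counter mapping len(alt) - len(ref) to the number of indels;
    substitutions of equal length are skipped.
    """
    sizes = Counter()
    for ref, alt in variants:
        delta = len(alt) - len(ref)
        if delta != 0:  # ignore SNPs and other equal-length substitutions
            sizes[delta] += 1
    return sizes

if __name__ == "__main__":
    example = [("A", "AT"), ("ATG", "A"), ("A", "G"), ("C", "CGG"), ("A", "AT")]
    for size, count in sorted(indel_size_distribution(example).items()):
        print(size, count)
```

Running the same tabulation on the detected and the simulated indels would allow the two distributions to be compared side by side.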
For the trio study, it would be useful to state in the main text whether filtering was used. The Mendelian violation rate seems higher than I’d expect for both BALSA and GATK. I recommend that the authors manually investigate a subset of these errors to determine what might be causing them.
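For reference, a minimal sketch of how a Mendelian violation rate can be computed is given below. It assumes autosomal, diploid sites with no missing genotypes, and it counts genuine de novo mutations as violations. The function names and the genotype encoding (a pair of allele codes per individual) are assumptions for illustration, not the method used by BALSA or GATK.

```python
from itertools import product

def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal and one paternal allele.

    Each genotype is a pair of allele codes, e.g. (0, 1) for a heterozygote.
    """
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible

def mendelian_violation_rate(sites):
    """Fraction of sites whose trio genotypes are inconsistent with Mendelian inheritance.

    sites: iterable of (child, mother, father) genotype tuples (hypothetical layout).
    """
    total = 0
    violations = 0
    for child, mother, father in sites:
        total += 1
        if not is_mendelian_consistent(child, mother, father):
            violations += 1
    return violations / total if total else 0.0

if __name__ == "__main__":
    trio_sites = [
        ((0, 1), (0, 0), (1, 1)),  # consistent
        ((1, 1), (0, 0), (0, 1)),  # violation: mother cannot transmit allele 1
        ((0, 0), (0, 1), (0, 1)),  # consistent
        ((0, 1), (0, 0), (0, 0)),  # violation: neither parent carries allele 1
    ]
    print(mendelian_violation_rate(trio_sites))
```

Listing the sites where `is_mendelian_consistent` returns `False` gives a natural starting set for the manual inspection suggested above.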
Since the NA12878 sample the authors analyzed is part of the Genome in a Bottle Consortium effort, the authors may find it useful to compare their calls to the high-confidence SNP, indel, and homozygous reference genotypes from this study (see the paper http://www.nature.com/nbt/journal/v32/n3/full/nbt.2835.html and the most recent calls at http://genomeinabottle.org/blog-entry/new-high-confidence-na12878-genotypes-integrating-phased-pedigree-calls). This could help the authors estimate sensitivity and specificity in the high-confidence regions. It would also be useful for the authors to manually inspect the alignments around a subset of the discordant calls.
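A minimal sketch of such a comparison is shown below. It restricts both call sets to the high-confidence regions and reports sensitivity and precision. Precision is swapped in for specificity here because specificity also requires counting the homozygous reference sites in those regions. The sketch assumes variants are represented as `(chrom, pos, ref, alt)` tuples with 0-based positions, that regions are non-overlapping half-open `(chrom, start, end)` intervals, and that both call sets use the same variant representation. All names are hypothetical, and no variant normalization or genotype matching is attempted.

```python
from bisect import bisect_right

def build_region_index(regions):
    """Group non-overlapping half-open (chrom, start, end) intervals by chromosome."""
    index = {}
    for chrom, start, end in sorted(regions):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index

def in_regions(index, chrom, pos):
    """True if the 0-based position falls inside one of the indexed intervals."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

def compare_calls(test_calls, truth_calls, regions):
    """Compare variant calls to a truth set inside high-confidence regions."""
    index = build_region_index(regions)
    test = {v for v in test_calls if in_regions(index, v[0], v[1])}
    truth = {v for v in truth_calls if in_regions(index, v[0], v[1])}
    tp = len(test & truth)   # called and in the truth set
    fp = len(test - truth)   # called but absent from the truth set
    fn = len(truth - test)   # in the truth set but not called
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}

if __name__ == "__main__":
    regions = [("chr1", 100, 200)]
    truth = [("chr1", 110, "A", "G"), ("chr1", 150, "C", "T"), ("chr1", 300, "G", "A")]
    calls = [("chr1", 110, "A", "G"), ("chr1", 160, "T", "C"), ("chr1", 300, "G", "A")]
    print(compare_calls(calls, truth, regions))
```

The false-positive and false-negative sets computed inside `compare_calls` correspond to the discordant calls whose alignments would be worth inspecting by hand.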
The conclusions are well-written, justified, and useful.
The article is well written and meets all standards. Prior literature is appropriately referenced. I would prefer that the supplementary figures were included in the main text (unless there is a particular length restriction which prohibits this).
I would also like to see a summarized description of the workflow and the SNAPSHOT format in the main text, rather than the reader having to read this in the supplementary material.
The experimental design is adequate.
The data presented are robust and statistically sound.
This is an important piece of work, and is a genuine advance in the field.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.