A peer-reviewed article of this Preprint also exists.
1) My understanding was that the population frequency thresholds (75%, 90%, and 99%) were used to estimate the specificity of the SNP calls (that is, variants found in more individuals are less likely to be random sequencing errors). Is this correct?
If so, is there any way to confirm this with external data? For example, could you find genes in NCBI Nucleotide that have been characterized with low-throughput sequencing methods (such as Sanger sequencing), to confirm that variants in those genes are also found in multiple samples? Different variant callers may have different systematic biases, so the same bad call can appear in many samples (I have seen this, for example, with certain targeted sequencing protocols in human variant calling).
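For illustration, the two checks in point 1 could be sketched roughly as follows. This is a minimal sketch, not dDocent's actual filtering code: it assumes the 75%/90%/99% thresholds are applied as the fraction of individuals with a genotype call at a site, and the site IDs, genotype encoding, and the `validated`/`covered` sets (e.g. built from Sanger-sequenced genes in NCBI Nucleotide) are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: one genotype string per individual ("0/0", "0/1", "1/1"),
# or None when the individual has no call at that site.
Genotypes = List[Optional[str]]


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of individuals with a non-missing genotype at a site."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def filter_sites(sites: Dict[str, Genotypes], min_rate: float) -> Set[str]:
    """Keep site IDs called in at least `min_rate` of individuals (e.g. 0.75, 0.90, 0.99).

    A high call rate guards against random, sample-specific errors, but not against
    systematic errors that a caller repeats in every sample.
    """
    return {site for site, gts in sites.items() if call_rate(gts) >= min_rate}


def confirmation_rate(kept: Set[str], validated: Set[str], covered: Set[str]) -> Optional[float]:
    """Among retained sites inside independently sequenced regions (`covered`),
    return the fraction that are also variable in the independent data (`validated`).
    Returns None if no retained site falls inside a covered region."""
    in_region = kept & covered
    if not in_region:
        return None
    return len(in_region & validated) / len(in_region)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    sites = {
        "contig1:100": ["0/1", "0/0", "1/1", "0/1"],  # called in 4/4 individuals
        "contig1:250": ["0/1", None, None, None],     # called in 1/4
        "contig2:40": ["0/0", "0/1", None, "0/1"],    # called in 3/4
    }
    validated = {"contig1:100"}              # hypothetical: variable in the external reads
    covered = {"contig1:100", "contig2:40"}  # hypothetical: positions those reads span
    for t in (0.75, 0.90, 0.99):
        kept = filter_sites(sites, t)
        print(t, sorted(kept), confirmation_rate(kept, validated, covered))
```

Restricting the comparison to `covered` positions matters: a retained site outside the independently sequenced genes says nothing about accuracy, whereas a retained site inside them that the external data shows as invariant is a likely false positive, even if it was called in most individuals.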
2) Perhaps you could add a figure showing the pipeline visually (for dDocent and perhaps Stacks)? I think this would help the reader understand the pipeline more efficiently. For example, I had a hard time finding which tool was used for de novo assembly (I assume I would have to look at a different cited paper and/or the user manual).
3) Perhaps a more direct link could be provided for the user manual. It didn't take me long to find it, but the link provided goes to the main page of the blog, and I think it would be better to use http://ddocent.wordpress.com/ddocent-pipeline-user-guide/
4) I think Figure 2 is a screenshot that accidentally captured a pop-up ("Chart Area" in the upper-left corner). You may want to remake this figure.