Dear Gustavo and co-authors,
I have read this preprint with great interest.
I have just one comment that I hope will be relevant and useful. One aspect that has not been treated in this review is the notion of nuclear gene variants within an individual, referred to as intra-individual site polymorphisms (2ISPs) (Potts et al., 2014), heterozygous information (Lischer et al., 2013), or simply alleles. From the title (“heterogeneity”), I was expecting something along these lines too. I am referring to the different alleles a specific sequence can have and how to deal with them in downstream phylogenetic analyses. HTS not only allows sequencing of a massive number of individuals and markers, it also allows us to sequence the same marker over and over (number of reads and coverage), unraveling its diversity. With traditional Sanger sequencing, these 2ISPs were mainly detected visually (multiple peaks). With multiple-read coverage, however, we can analyze this issue in much more detail. How do we take allele variation into account in phylogenetic studies using HTS data? I think this would be a very important paragraph to add, possibly in the section “II. Data generation and data types”. There is still a lot of theoretical work to be done, but there are also novel ways to properly use this information in phylogenetic analyses. It has been shown that taking heterogeneity information into account during phylogenetic analyses has a significant impact (e.g. Lischer et al., 2013) on resolution, branch length estimation and, subsequently, time estimation (potentially affecting the macro-evolutionary analyses you refer to).
There are different ways to deal with 2ISPs: phasing (physical, statistical, genealogical) (e.g. PHASE; Stephens et al., 2001; Stephens & Donnelly, 2003), random selection of alleles (Lischer et al., 2013), and IUPAC coding, treated as either ambiguous or informative (e.g. Potts et al., 2014). These methods all have advantages and disadvantages. I think a quick review of these would be a great addition, in line with your current review. I provide a non-exhaustive list of key references below.
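To make two of these options concrete, here is a minimal Python sketch, assuming the two alleles of one individual are already available as aligned strings of equal length. The function names (`iupac_consensus`, `random_allele_sequence`) are hypothetical and only for illustration; they are not taken from any of the cited papers or their software, and the per-site random draw is one possible reading of "random selection of alleles", not necessarily the exact procedure of Lischer et al. (2013).

```python
import random

# Standard IUPAC ambiguity codes for two-base combinations.
IUPAC_PAIR = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_consensus(allele_a, allele_b):
    """Collapse two aligned alleles into one sequence with IUPAC codes.

    Homozygous sites keep their base; heterozygous sites (2ISPs) receive
    the matching two-base ambiguity code. Any other combination (e.g. a
    base against a gap) is written as 'N' here, which is a simplifying
    assumption of this sketch.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    coded = []
    for a, b in zip(allele_a.upper(), allele_b.upper()):
        if a == b:
            coded.append(a)
        else:
            coded.append(IUPAC_PAIR.get(frozenset((a, b)), "N"))
    return "".join(coded)


def random_allele_sequence(allele_a, allele_b, rng):
    """Build one sequence by drawing one of the two bases at each site.

    At homozygous sites both choices are identical, so only heterozygous
    sites are affected by the random draw.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    return "".join(rng.choice((a, b)) for a, b in zip(allele_a, allele_b))


if __name__ == "__main__":
    a, b = "ACGT", "ATGT"  # toy alleles, heterozygous at the second site
    print(iupac_consensus(a, b))  # AYGT
    print(random_allele_sequence(a, b, random.Random(1)))
```

Whether downstream software treats a code such as `Y` as missing-like ambiguity or as an informative state is exactly the "ambiguous or informative" distinction, and it depends on the phylogenetic program and model used.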
Thomas Couvreur (email@example.com)
Lischer, H.E., Excoffier, L., & Heckel, G. (2013) Ignoring heterozygous sites biases phylogenomic estimates of divergence times: implications for the evolutionary history of Microtus voles. Molecular Biology and Evolution, 31, 817–831.
Potts, A.J., Hedderson, T.A., & Grimm, G.W. (2014) Constructing phylogenies in the presence of intra-individual site polymorphisms (2ISPs) with a focus on the nuclear ribosomal cistron. Systematic Biology, 63, 1–16.
Stephens, M. & Donnelly, P. (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. The American Journal of Human Genetics, 73, 1162–1169.
Stephens, M., Smith, N.J., & Donnelly, P. (2001) A new statistical method for haplotype reconstruction from population data. The American Journal of Human Genetics, 68, 978–989.