Targeted NGS for species level phylogenomics: “made to measure” or “one size fits all”?

Institut für Organismische und Molekulare Evolutionsbiologie, Johannes-Gutenberg Universität Mainz, Mainz, Germany
Department of Biochemistry, University of Stellenbosch, Stellenbosch, South Africa
DOI
10.7287/peerj.preprints.2763v2
Subject Areas
Bioinformatics, Evolutionary Studies, Plant Science
Keywords
Ericaceae, hybridization enrichment, marker development, next-generation sequencing, phylogeny, targeted sequence capture, target enrichment, transcriptome
Copyright
© 2017 Kadlec et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Kadlec M, Bellstedt DU, Le Maitre NC, Pirie MD. 2017. Targeted NGS for species level phylogenomics: “made to measure” or “one size fits all”? PeerJ Preprints 5:e2763v2

Abstract

Targeted high-throughput sequencing using hybrid enrichment offers a promising source of data for inferring multiple, meaningfully resolved, independent gene trees suitable for addressing challenging phylogenetic problems in species complexes and rapid radiations. The targets in question can either be adopted directly from more or less universal tools, or custom-made for particular clades at considerably greater effort. We applied custom-made scripts to select sets of homologous sequence markers from transcriptome and WGS data for use in the flowering plant genus Erica (Ericaceae). We compared the resulting targets to those that would be selected both using different available tools (Hyb-Seq; MarkerMiner), and when optimising for broader clades of more distantly related taxa (Ericales; eudicots). Approaches comparing more divergent genomes (including MarkerMiner, irrespective of input data) delivered fewer and shorter potential markers than those targeted for Erica. The latter may nevertheless be effective for sequence capture across the wider family Ericaceae. We tested the targets delivered by our scripts by obtaining an empirical dataset. The resulting sequence variation was lower than that of standard nuclear ribosomal markers (which in Erica fail to deliver a well-resolved gene tree), confirming the importance of maximising the lengths of individual markers. We conclude that rather than searching for “one size fits all” universal markers, we should improve and make more accessible the tools necessary for developing “made to measure” ones.

Author Comment

This is a manuscript draft for submission to a peer-reviewed journal. It has been modified from the previous version by the addition of details to the bioinformatics part of the methods and of further results to the supplementary material.

Supplemental Information

Supplementary data 1: Exon sequences corresponding to the 134 markers selected for the empirical study and the complete pools of markers selected using each of the methods compared (FASTA format)

DOI: 10.7287/peerj.preprints.2763v2/supp-1

Supplementary data 2: Sequence alignments

DOI: 10.7287/peerj.preprints.2763v2/supp-2

Supplementary data 3: Gene trees inferred under RAxML

DOI: 10.7287/peerj.preprints.2763v2/supp-3

Supplementary data 4: Table documenting markers as represented in Supplementary data 1-3

DOI: 10.7287/peerj.preprints.2763v2/supp-4