This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Co-evolution between insects and their endosymbiotic bacteria can be detected by constructing and comparing their phylogenetic trees. Although taxon sampling can greatly affect phylogenetic and co-evolutionary inference, most hypotheses of endosymbiont relationships, and most estimates of the number of endosymbiont lineages within a host group, have used only a small percentage of the available bacterial sequences. Here we examined how different strategies for sampling Gammaproteobacteria sequences affect estimates of the number of endosymbiont lineages in parasitic sucking lice (Insecta: Phthiraptera: Anoplura). We estimated the number of louse endosymbiont lineages using both newly obtained and previously sequenced 16S rDNA bacterial sequences, together with more than 42,000 16S rDNA sequences from other Gammaproteobacteria. We also performed parametric and nonparametric bootstrapping experiments to examine the effects of phylogenetic error and uncertainty on these estimates. We found that the sampling of 16S rDNA sequences affected estimates of endosymbiont diversity in sucking lice until the sample reached a threshold of genetic diversity. Sampling that maximized the diversity of 16S rDNA sequences was more efficient than randomly sampling the available 16S rDNA sequences. Although the simulation results support the finding of multiple endosymbiont lineages in sucking lice, the bootstrap results suggest that estimates of the number of endosymbiont origins inferred from 16S rDNA alone remain uncertain.
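The contrast between diversity-maximizing and random sampling can be illustrated with a small sketch. The abstract does not state how diversity was maximized, so the sketch below assumes a greedy "farthest-point" (max-min) scheme on uncorrected p-distances between aligned 16S rDNA sequences: each step adds the sequence that is most distant from everything already sampled. The function names `p_distance`, `max_diversity_sample` and `random_sample` are hypothetical and are not taken from the authors' pipeline.

```python
import random
from typing import List, Optional, Sequence


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def max_diversity_sample(seqs: Sequence[str], k: int, start: int = 0) -> List[int]:
    """Assumed diversity-maximizing scheme: greedy max-min selection of k indices."""
    if not 0 < k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")
    chosen = [start]
    chosen_set = {start}
    # distance from each sequence to its nearest already-chosen sequence
    nearest = [p_distance(s, seqs[start]) for s in seqs]
    while len(chosen) < k:
        # add the unchosen sequence farthest from the current sample
        nxt = max((i for i in range(len(seqs)) if i not in chosen_set),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        # update nearest-sample distances with the newly added sequence
        nearest = [min(nearest[i], p_distance(seqs[i], seqs[nxt]))
                   for i in range(len(seqs))]
    return chosen


def random_sample(seqs: Sequence[str], k: int, seed: Optional[int] = None) -> List[int]:
    """Baseline: k distinct indices drawn uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(len(seqs)), k)


if __name__ == "__main__":
    toy = ["AAAAAAAA", "AAAAAAAT", "TTTTTTTT", "TTTTTTTA", "ACGTACGT"]
    print(max_diversity_sample(toy, 3))  # [0, 2, 4]
    print(random_sample(toy, 3, seed=1))
```

On the toy alignment, the greedy scheme skips the near-duplicates (`AAAAAAAT`, `TTTTTTTA`) and picks one representative of each divergent group, whereas a random draw of the same size may spend its budget on redundant sequences. That is one plausible reason diversity-based sampling could reach a diversity threshold with fewer sequences, though a real analysis would use model-corrected distances on a much larger alignment.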