dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

Jonathan Puritz; Christopher M. Hollenbeck; John R. Gold

doi:10.7287/peerj.preprints.314v1

Marine Genomics Laboratory, Harte Research Institute, Texas A&M Corpus Christi, Corpus Christi, TX, USA

Published: 2014-03-28
Accepted: 2014-03-28

Subject Areas: Bioinformatics, Genomics, Marine Biology, Molecular Biology
Keywords: RADseq, Population Genomics, Bioinformatics, Molecular Ecology, Next-generation Sequencing

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Puritz J, Hollenbeck CM, Gold JR. 2014. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms. PeerJ PrePrints 2:e314v1 https://doi.org/10.7287/peerj.preprints.314v1

Abstract

Restriction-site associated DNA sequencing (RADseq) has become a powerful approach for population genomics. Currently, no software uses both reads of paired-end RADseq data to efficiently produce population-informative variant calls, especially for organisms that have large effective population sizes and high levels of genetic polymorphism but no existing genomic resources. dDocent is an analysis pipeline with a user-friendly command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs and Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs that were shared across greater numbers of individuals and had higher levels of coverage. This is most likely because dDocent quality-trims reads instead of filtering them out, and because it incorporates both forward and reverse reads in assembly, mapping, and SNP calling, which enables the use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.
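
The distinction between quality trimming and quality filtering can be illustrated with a minimal Python sketch. This is not dDocent's implementation (the abstract states that the pipeline relies on stand-alone software for this step); the function names, the sliding-window rule, and every threshold below are assumptions chosen only for illustration. The sketch shows how a read with a high-quality start and a low-quality tail is discarded by a whole-read filter but retained, in shortened form, by trimming.

```python
from typing import List, Optional


def mean_quality(quals: List[int]) -> float:
    """Mean Phred score of a list of per-base qualities (0.0 if empty)."""
    return sum(quals) / len(quals) if quals else 0.0


def filter_read(seq: str, quals: List[int], min_mean: float = 20.0) -> Optional[str]:
    """Whole-read filtering (hypothetical rule): keep the read unchanged
    if its mean quality reaches min_mean, otherwise discard it."""
    return seq if mean_quality(quals) >= min_mean else None


def trim_read(seq: str, quals: List[int], window: int = 4,
              min_mean: float = 20.0, min_len: int = 8) -> Optional[str]:
    """Sliding-window trimming (hypothetical rule): cut the read at the
    start of the first window whose mean quality drops below min_mean,
    and keep what remains if it is at least min_len bases long."""
    cut = len(seq)
    for start in range(0, len(seq) - window + 1):
        if mean_quality(quals[start:start + window]) < min_mean:
            cut = start
            break
    trimmed = seq[:cut]
    return trimmed if len(trimmed) >= min_len else None


# A 20-base read: 12 high-quality bases followed by 8 low-quality bases.
seq = "ACGTACGTACGTACGTACGT"
quals = [30] * 12 + [2] * 8

print(filter_read(seq, quals))  # whole-read filter result
print(trim_read(seq, quals))    # trimmed read, if long enough
```

Under these assumed thresholds the filter drops the read entirely, while trimming keeps its high-quality prefix, so the locus can still contribute to assembly, mapping, and variant calling.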
