Evaluating a lightweight transcriptome assembly pipeline on two closely related ascidian species

Computer Science and Engineering, Quantitative Biology, Michigan State University, East Lansing, Michigan, United States
Biology Department and Friday Harbor Laboratories, University of Washington, Seattle, Washington, USA
MSU Microbiology and Molecular Genetics, Computer Science and Engineering, Quantitative Biology, Michigan State University, East Lansing, Michigan, United States
DOI
10.7287/peerj.preprints.505v1
Subject Areas
Bioinformatics, Computational Biology
Keywords
Cloud computing, Ascidians, Assembly evaluation, Next-generation Sequencing, Low memory assembly
Copyright
© 2014 Lowe et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Lowe EK, Swalla BJ, Brown CT. 2014. Evaluating a lightweight transcriptome assembly pipeline on two closely related ascidian species. PeerJ PrePrints 2:e505v1

Abstract

De novo transcriptome sequencing and assembly for non-model organisms have become prevalent in the past decade. However, most assembly approaches are computationally expensive, and little in-depth evaluation has been done to compare de novo approaches. We sequenced several developmental stages of two free-spawning marine species (Molgula occulta and Molgula oculata), assembled their transcriptomes using four different combinations of preprocessing and assembly approaches, and evaluated the quality of each resulting assembly. We present a straightforward and reproducible mRNAseq assembly protocol that combines quality filtering, digital normalization, and assembly, together with several metrics to evaluate our de novo assemblies. The use of digital normalization in the protocol reduces the time and memory needed to complete the assembly, making this pipeline accessible to labs without large computing infrastructure. Despite varying widely in basic assembly statistics, all of the assembled transcriptomes score well on metrics such as gene recovery and estimated completeness.
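
The digital normalization step named in the abstract can be illustrated with a minimal sketch. It is not the pipeline's actual implementation. It assumes the commonly described form of the technique: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cutoff. The function names, the exact in-memory dictionary counter, and the default values of `k` and `cutoff` are all assumptions made for illustration; tools built for real data sets typically use a fixed-memory probabilistic counter instead.

```python
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer in a read."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def digital_normalize(reads, k=20, cutoff=20):
    """Return the reads whose median k-mer count was below `cutoff` when seen.

    Illustrative only: `k` and `cutoff` are assumed defaults, and the exact
    dictionary counter stands in for a fixed-memory probabilistic counter.
    """
    counts = {}  # k-mer -> number of times seen in reads kept so far
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k: skipped in this sketch
        if median(counts.get(km, 0) for km in read_kmers) < cutoff:
            kept.append(read)
            # Only kept reads contribute to the running k-mer counts.
            for km in read_kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept
```

Calling `digital_normalize(reads, k=20, cutoff=20)` on a list of read sequences returns a reduced list in which reads from highly covered transcripts have been thinned while reads from low-coverage transcripts are retained. That reduction in input size is the mechanism behind the lower time and memory requirements.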

Author Comment

This is a submission to PeerJ for review.