Effects of mapping algorithms on gene selection for RNA-Seq analysis: pulmonary response to acute neonatal hyperoxia
- Subject Areas
- Bioinformatics, Genomics, Pediatrics
- Keywords
- TopHat, SHRiMP, CASAVA, qPCR, RNA-Seq, lung
- Copyright
- © 2015 Chu et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Effects of mapping algorithms on gene selection for RNA-Seq analysis: pulmonary response to acute neonatal hyperoxia. PeerJ PrePrints 3:e833v1 https://doi.org/10.7287/peerj.preprints.833v1
Abstract
Background: A major goal of RNA-Seq data analysis is to reconstruct the full set of gene transcripts expressed in a biological sample in order to quantify their expression levels. The process typically involves multiple steps including mapping short sequence reads to a reference genome, and estimating expression levels based on these mappings. Multiple algorithms and approaches for each processing step exist, and the impact of different methods on estimation of gene expression is not entirely clear.
Methods: We evaluated the impact of three common mapping algorithms on differential expression analysis in an RNA-Seq dataset describing the lung response to acute neonatal hyperoxia. RNA-Seq data generated using the Illumina platform were mapped and aligned against the mouse genome using CASAVA, TopHat, and SHRiMP. Significance Analysis of Microarrays and Cuffdiff were used to identify genes differentially expressed between hyperoxia-challenged and age-matched control mice.
Results: 1403 genes were detected as differentially expressed by at least one mapping and gene selection method. A majority of genes (>65%) were identified by all three mapping methods, regardless of the gene selection approach. Expression patterns for 52 genes were examined by quantitative polymerase chain reaction (qPCR). Importantly, we found different validation rates for genes selected by each method: 72% for CASAVA, 69% for TopHat and 63% for SHRiMP. Surprisingly, the validation rate for genes selected by all three mapping methods was no greater than that of the best single method.
Conclusion: The choice of mapping strategy impacts the reliability of gene selection for RNA-Seq data analysis.
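The two quantities reported in the Results (the share of genes selected by all three mapping methods, and a per-method qPCR validation rate) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the gene identifiers, and the choice of denominators (all genes selected by any method for the shared fraction; genes that were both selected and assayed for the validation rate) are assumptions made for illustration, since the abstract does not define them.

```python
# Illustrative sketch only; function names, gene IDs, and denominators
# are assumptions, not taken from the paper.

def overlap_summary(gene_sets):
    """Summarise agreement between methods.

    gene_sets maps a method name to the set of gene IDs it selected.
    """
    sets = list(gene_sets.values())
    union = set().union(*sets)          # selected by at least one method
    shared = set.intersection(*sets)    # selected by every method
    fraction = len(shared) / len(union) if union else 0.0
    return {"union": union, "shared_by_all": shared, "fraction_shared": fraction}


def validation_rate(selected, tested, confirmed):
    """Fraction of a method's selected genes that qPCR confirmed.

    Only genes that were both selected and assayed by qPCR are counted
    (assumed definition). Returns None if no selected gene was assayed.
    """
    assayed = selected & tested
    if not assayed:
        return None
    return len(assayed & confirmed) / len(assayed)


# Toy data with placeholder gene IDs (hypothetical, not from the study).
toy_sets = {
    "CASAVA": {"g1", "g2", "g3", "g4"},
    "TopHat": {"g1", "g2", "g3", "g5"},
    "SHRiMP": {"g1", "g2", "g6"},
}
toy_tested = {"g1", "g2", "g3", "g5"}   # genes assayed by qPCR
toy_confirmed = {"g1", "g3"}            # genes whose change qPCR confirmed

if __name__ == "__main__":
    summary = overlap_summary(toy_sets)
    print("fraction shared by all methods:", summary["fraction_shared"])
    for method, genes in toy_sets.items():
        print(method, validation_rate(genes, toy_tested, toy_confirmed))
```

Applying `validation_rate` to `overlap_summary(...)["shared_by_all"]` gives the rate for the consensus gene set, which is the comparison behind the observation that consensus selection validated no better than the best single method.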
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Supplemental Data Files
Contains Supplemental Figures 1-4 and Supplemental Table 1