RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing

Department of Plant Pathology & Microbiology, University of California, Riverside, Riverside, CA, United States
Institute for Integrative Genome Biology, University of California, Riverside, Riverside, CA, United States
Department of Botany and Plant Sciences, University of California, Riverside, Riverside, CA, United States
DOI
10.7287/peerj.preprints.2447v2
Subject Areas
Bioinformatics, Genomics, Plant Science
Keywords
annotation, diversity, parallel processing, transposons, population genomics, short read, bioinformatics, rice, resequencing
Copyright
© 2016 Chen et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Chen J, Wrightsman T, Wessler SR, Stajich JE. 2016. RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing. PeerJ Preprints 4:e2447v2

Abstract

Background Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and in generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive deep sequencing has made it affordable to apply population genetic methods to whole genomes, using tools that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools.

Methods We have developed the tool RelocaTE2 ( http://github.com/stajichlab/RelocaTE2 ) for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects the target site duplication (TSD) of each TE insertion, allowing it to report TE polymorphism loci with single base pair precision.
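The role of the TSD in single base pair precision can be illustrated with a minimal sketch. This is not RelocaTE2's implementation; the function name `infer_tsd` and its coordinate conventions are hypothetical. It assumes that the genomic portions of reads crossing the two TE junctions have already been aligned to the reference: reads crossing the upstream junction end at the last base of the target site, and reads crossing the downstream junction start at its first base, so the overlap of the two flanks is the duplicated target site.

```python
from typing import Optional, Tuple


def infer_tsd(
    reference: str, left_flank_end: int, right_flank_start: int
) -> Optional[Tuple[int, int, str]]:
    """Infer a target site duplication from two junction-read flanks.

    Coordinates are 0-based and inclusive (an assumption of this sketch).
    left_flank_end:    last reference base covered by the genomic part of
                       reads that cross the upstream TE junction.
    right_flank_start: first reference base covered by the genomic part of
                       reads that cross the downstream TE junction.
    Returns (start, end, sequence) of the overlap, or None if the two
    flanks do not overlap or fall outside the reference.
    """
    if right_flank_start < 0 or left_flank_end >= len(reference):
        return None
    if right_flank_start > left_flank_end:
        # No overlap between the flanks, so no TSD can be inferred.
        return None
    tsd = reference[right_flank_start:left_flank_end + 1]
    return right_flank_start, left_flank_end, tsd


# Toy reference in which the target site "TAG" sits at positions 4-6.
toy_reference = "ACGTTAGGCATCGA"
print(infer_tsd(toy_reference, left_flank_end=6, right_flank_start=4))
```

Because both flanks are anchored to exact reference coordinates, the overlap fixes the insertion site to the base, which is the property the Methods paragraph describes.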

Results and Discussion The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with a false positive rate below 0.1% using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution for identifying TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing.
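The kind of comparison behind these figures can be sketched as follows. This is an illustrative calculation only: the function `evaluate_calls`, the matching tolerance, and the choice to express false positives as a fraction of all called sites are assumptions of this sketch, not definitions taken from the preprint, and the toy coordinates are invented for the example.

```python
from typing import Iterable, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical representation


def evaluate_calls(
    simulated: Iterable[Site], called: Iterable[Site], tolerance: int = 0
) -> Tuple[float, float]:
    """Compare called insertion sites against simulated (true) sites.

    Returns (sensitivity, false_call_fraction), where sensitivity is the
    share of simulated sites recovered and false_call_fraction is the share
    of called sites that match no simulated site (an assumed definition).
    """
    simulated = list(simulated)
    called = list(called)

    def matches(a: Site, b: Site) -> bool:
        # Same chromosome and positions within the allowed tolerance.
        return a[0] == b[0] and abs(a[1] - b[1]) <= tolerance

    recovered = sum(1 for s in simulated if any(matches(s, c) for c in called))
    false_calls = sum(
        1 for c in called if not any(matches(c, s) for s in simulated)
    )
    sensitivity = recovered / len(simulated) if simulated else 0.0
    false_fraction = false_calls / len(called) if called else 0.0
    return sensitivity, false_fraction


# Toy data: two of four simulated sites are recovered, one call is spurious.
truth = [("chr4", 100), ("chr4", 500), ("chr4", 900), ("chr4", 1300)]
calls = [("chr4", 100), ("chr4", 502), ("chr4", 2000)]
print(evaluate_calls(truth, calls, tolerance=5))
```

Setting `tolerance=0` requires an exact coordinate match, which is the stricter test appropriate for a tool that reports insertion sites at single base pair resolution.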

Author Comment

This preprint has been updated in response to reviewer comments.