DACCOR - Detection, charACterization, and reconstruction of Repetitive regions in bacterial genomes

Center for Bioinformatics (ZBIT), Integrative Transcriptomics, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
DOI
10.7287/peerj.preprints.3480v1
Subject Areas
Bioinformatics, Computational Biology
Keywords
NGS, repeat resolution
Copyright
© 2017 Seitz et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Seitz A, Hanssen F, Nieselt K. 2017. DACCOR - Detection, charACterization, and reconstruction of Repetitive regions in bacterial genomes. PeerJ Preprints 5:e3480v1

Abstract

Reconstructing genomes by mapping short reads to a reference makes repetitive regions difficult to resolve. Reads from these regions receive low mapping qualities, which in turn cause genotypers to leave many bases unresolved. Currently, these regions are often reconstructed using modified references in which the repetitive regions are masked. However, masked genomes are not available for many references, or the masking is based on repetitive regions identified in other genomes. Our idea is to identify repetitive regions in the reference genome de novo. These regions can then be reconstructed separately from short-read sequencing data, and the reconstructed repetitive sequences can afterwards be inserted into the reconstructed genome. We present the program DACCOR, which performs these steps automatically. Our results show increased base-pair resolution of the repetitive regions in the reconstruction of Treponema pallidum samples, resulting in fewer unresolved bases.
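The two core steps of this workflow can be illustrated with a minimal sketch. The first step is de novo detection of repetitive regions in a reference. The second is splicing separately reconstructed repeat sequences back into a mapping-based consensus. The abstract does not specify DACCOR's actual detection algorithm, so this sketch substitutes a simple exact k-mer approach. In that approach, any k-mer that occurs more than once (counting reverse-complement copies) marks a repetitive window, and overlapping windows are merged into regions. The function names `find_repeat_regions` and `replace_regions`, the default `k=25`, and the `min_length` filter are hypothetical choices made for illustration. The splice step also assumes that the consensus shares the reference coordinate system.

```python
# Illustrative sketch only; NOT DACCOR's implementation. Function names and
# parameters are hypothetical, and exact k-mer matching stands in for
# whatever repeat-detection method DACCOR actually uses.
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement, so direct and
    inverted copies of a repeat map to the same key."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def find_repeat_regions(genome: str, k: int = 25, min_length: int = 0) -> list[tuple[int, int]]:
    """Return merged half-open intervals [start, end) covered by k-mers that
    occur more than once in the genome (in either orientation)."""
    genome = genome.upper()
    occurrences: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        occurrences[canonical(genome[i:i + k])].append(i)

    # Every occurrence of a k-mer seen at least twice marks a repetitive window.
    windows = sorted(
        (pos, pos + k)
        for starts in occurrences.values() if len(starts) > 1
        for pos in starts
    )

    # Merge overlapping or adjacent windows into maximal regions.
    merged: list[list[int]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= min_length]


def replace_regions(consensus: str, regions: list[tuple[int, int]], reconstructed: list[str]) -> str:
    """Splice separately reconstructed repeat sequences into a consensus that is
    assumed to share the reference coordinates. Non-overlapping regions are
    replaced right to left, so earlier coordinates stay valid even when a
    replacement changes the sequence length."""
    result = consensus
    for (start, end), seq in sorted(zip(regions, reconstructed), reverse=True):
        result = result[:start] + seq + result[end:]
    return result
```

A short usage example: `replace_regions("AAAANNNNCCCC", [(4, 8)], ["GGGG"])` returns `"AAAAGGGGCCCC"`. Canonical k-mers are used so that inverted repeats, which are common in bacterial genomes, are detected alongside direct repeats. A real tool would also need to tolerate mismatches between repeat copies, which exact k-mer matching does not.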

Author Comment

The contents of this paper were presented as a poster at GCB 2017.

Supplemental Information

SNP table for the 16S gene

DOI: 10.7287/peerj.preprints.3480v1/supp-1

Runtime comparison between DACCOR and Vmatch

DOI: 10.7287/peerj.preprints.3480v1/supp-2

SNP table for the 23S gene

DOI: 10.7287/peerj.preprints.3480v1/supp-3

Improvements in the reconstruction of the two syphilis genes

DOI: 10.7287/peerj.preprints.3480v1/supp-4

Runtime of DACCOR on the Nichols genome

DOI: 10.7287/peerj.preprints.3480v1/supp-5