gapFinisher: a reliable gap filling pipeline for SSPACE-LongRead scaffolder output
- Published
- Accepted
- Subject Areas
- Bioinformatics, Genetics, Genomics, Computational Science
- Keywords
- scaffolding, draft genomes, genome assembly, next-generation sequencing, long read technologies
- Copyright
- © 2017 Kammonen et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. gapFinisher: a reliable gap filling pipeline for SSPACE-LongRead scaffolder output. PeerJ Preprints 5:e3467v1 https://doi.org/10.7287/peerj.preprints.3467v1
Abstract
Unknown sequences, or gaps, are largely present in most published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially in large genomes. The gap filling problem is nontrivial and while many computational tools exist partially solving the problem, several have shortcomings as to the reliability and correctness of the output, i.e. the gap filled draft genome. SSPACE-LongRead is a scaffolding software that utilizes long reads from multiple third-generation sequencing platforms in finding links between contigs and combining them. The long reads potentially contain sequence information to fill the gaps, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher to process SSPACE-LongRead output to fill gaps after the actual scaffolding. gapFinisher is based on controlled use of a gap filling tool called FGAP and works on all standard Linux/UNIX command lines. We conclude that performing the workflows of SSPACE-LongRead and gapFinisher enables users to fill gaps reliably. There is no need for further scrutiny of the existing sequencing data after performing the analysis.
Author Comment
This is a submission to PeerJ for review.