Amplikyzer: Automated methylation analysis of amplicons from bisulfite flowgram sequencing

Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
DOI
10.7287/peerj.preprints.122v2
Subject Areas
Bioinformatics, Computational Biology, Genetics, Computational Science
Keywords
methylation, epigenetics, bisulfite sequencing, pyrosequencing, amplicon analysis, flowgram, homopolymer problem, sff file, alignment
Copyright
© 2013 Rahmann et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Cite this article
Rahmann S, Beygo J, Kanber D, Martin M, Horsthemke B, Buiting K. 2013. Amplikyzer: Automated methylation analysis of amplicons from bisulfite flowgram sequencing. PeerJ PrePrints 1:e122v2

Abstract

The Roche 454 GS Junior sequencing platform allows locus-specific DNA methylation analysis using deep bisulfite amplicon sequencing. However, bisulfite-converted DNA reads may contain long T homopolymers, and the main sources of errors on pyrosequencing platforms are homopolymer over- and undercalls. Furthermore, existing tools do not always meet the analysis requirements of complex assay designs with multiple regions of interest (ROIs) from multiple samples. We have developed the amplikyzer software package to address these challenges. It aligns the intensity sequences from standard flowgram files (SFF format) directly to given amplicon reference sequences, without first converting them to nucleotide FASTA format. This avoids the information loss caused by rounding flow intensities, and special measures are taken to process long homopolymers correctly. It offers a variety of options for analyzing complex multiplexed samples with several regions of interest, and it outputs useful statistics and publication-quality analysis plots without mandatory manual interaction. This allows the software to be used as part of automated pipelines as well as interactively. The underlying analysis algorithms, which use a novel hybrid flowgram-DNA sequence representation, are described in detail. We also discuss configuration options and use cases of our open-source amplikyzer software and present exemplary results. The software, including required libraries, is available at https://bitbucket.org/svenrahmann/amplikyzer/downloads.
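The two problems named in the abstract, namely long T homopolymers after bisulfite conversion and information loss when flow intensities are rounded to base calls, can be illustrated with a short Python sketch. This is not amplikyzer's code. The function names, the TACG flow cycle and the intensity values are assumptions chosen for illustration only.

```python
from typing import Sequence

# Nucleotide flow cycle assumed for this sketch (a repeating TACG order).
FLOW_ORDER = "TACG"


def bisulfite_convert(ref: str, cpg_methylated: bool = True) -> str:
    """In-silico bisulfite conversion of the top strand.

    Every C outside a CpG becomes T. A C inside a CpG stays C if the site
    is methylated (cpg_methylated=True) and becomes T otherwise.
    """
    out = []
    for i, base in enumerate(ref):
        if base == "C":
            in_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
            out.append("C" if (in_cpg and cpg_methylated) else "T")
        else:
            out.append(base)
    return "".join(out)


def longest_run(seq: str, base: str) -> int:
    """Length of the longest homopolymer of `base` in `seq`."""
    best = cur = 0
    for b in seq:
        cur = cur + 1 if b == base else 0
        best = max(best, cur)
    return best


def flowgram_to_seq(intensities: Sequence[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive base calling: round each flow intensity (half up) to a base count.

    The fractional part of each intensity is discarded here, which is the
    information an intensity-level alignment could still use.
    """
    calls = []
    for k, x in enumerate(intensities):
        n = max(0, int(x + 0.5))
        calls.append(flow_order[k % len(flow_order)] * n)
    return "".join(calls)


if __name__ == "__main__":
    ref = "GTCCACTCCTGCGA"  # hypothetical amplicon fragment
    converted = bisulfite_convert(ref)
    print(converted, longest_run(converted, "T"))
    # Hypothetical intensities for a read whose true sequence is TTTTTG:
    print(flowgram_to_seq([4.4, 0.1, 0.05, 1.02]))
```

In this example the conversion turns the reference into `GTTTATTTTTGCGA`, which contains a run of five Ts that was absent before conversion. The last call shows an undercall: an intensity of 4.4 in the T flow is rounded to four Ts, although a five-T homopolymer is nearly as plausible. A method that aligns the raw intensities against the reference can still weigh both possibilities, whereas the rounded FASTA sequence cannot.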

Author Comment

Minor corrections.