Exact pattern matching: Adapting the Boyer-Moore algorithm for DNA searches
- Subject Areas
- Bioinformatics, Algorithms and Analysis of Algorithms
- Keywords
- Boyer-Moore, Exact pattern matching, Algorithm optimization
- Copyright
- © 2016 Allmer
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. Exact pattern matching: Adapting the Boyer-Moore algorithm for DNA searches. PeerJ PrePrints 4:e1758v1 https://doi.org/10.7287/peerj.preprints.1758v1
Abstract
Exact pattern matching aims to locate all occurrences of a pattern in a text. Many algorithms have been proposed, but two, Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM), are the most widespread. Exact pattern matching is the basis of some approximate string matching algorithms such as BLAST, and in many cases exact rather than approximate matches are desired. Although several studies have included measurements with small alphabets, none has designed an algorithm specifically for nucleotide sequences. Since no application programming interfaces are available for pattern matching in nucleotide sequences either, this study set out to resolve both issues. A portion of the Chlamydomonas reinhardtii genome (30 megabases) was searched with queries ranging from 10 to 2000 nucleotides and with a varying number of matches between one and 25000. The results indicate that two of the algorithms developed in this study are sufficient to efficiently cover the complete search space of the experiment conducted here. Thus the aim of implementing an algorithm that specifically targets pattern matching in nucleotide sequences, and of making it available to the general public as an application programming interface, was achieved. All algorithms are freely available at: http://bioinformatics.iyte.edu.tr/supplements/peerj/.
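For readers unfamiliar with the Boyer-Moore family, the following minimal sketch may help show why a small alphabet such as DNA matters. It is not one of the algorithms developed in this study; it is the simpler Horspool variant, which uses only a bad-character shift table, and the function name `find_all` and the restriction to the bases A, C, G and T are assumptions made for illustration.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text.

    Illustrative Boyer-Moore-Horspool sketch, not the method from this study.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Shift table over the four nucleotides (assumed alphabet).
    # A base that is absent from pattern[:-1] allows a full shift of m.
    shift = {base: m for base in "ACGT"}
    for i, base in enumerate(pattern[:-1]):
        shift[base] = m - 1 - i  # distance from the last occurrence to the end

    hits = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, as in Boyer-Moore.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift by the table entry for the text character aligned with the
        # last pattern position; unknown symbols (e.g. N) get a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    print(find_all("ACGTACGTAC", "ACGT"))
```

With only four symbols, almost every base occurs somewhere in a query of realistic length, so the table rarely grants a full shift of the pattern length. This is one plausible reason why an adaptation aimed specifically at nucleotide sequences can be worthwhile.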
Author Comment
This is a preprint submission to PeerJ Preprints.
It is my intention to submit this work as a regular paper shortly.
Supplemental Information
Supplementary pseudocode
More comprehensive pseudocode for the algorithms designed in this study