ELM: Enhanced lowest common ancestor based method for detecting a pathogenic virus from a large sequence dataset

Research Center for Zoonosis Control, Hokkaido University, Sapporo, Japan
DOI
10.7287/peerj.preprints.385v1
Subject Areas
Bioinformatics, Virology
Keywords
Virus discovery, Virome, Next generation sequencing, Diagnostic virology, Taxonomic identification
Copyright
© 2014 Ueno et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Ueno K, Ishii A, Ito K. 2014. ELM: Enhanced lowest common ancestor based method for detecting a pathogenic virus from a large sequence dataset. PeerJ PrePrints 2:e385v1

Abstract

Emerging viral diseases, most of which are caused by the transmission of viruses from animals to humans, pose a threat to public health. Discovering pathogenic viruses through surveillance is key to preparedness for this potential threat. Next generation sequencing (NGS) makes it possible to identify viruses without designing specific PCR primers. The major task in NGS data analysis is the taxonomic identification of vast numbers of sequences. However, taxonomic identification via a BLAST search against all known sequences is a computational bottleneck. Here we propose an enhanced lowest-common-ancestor based method (ELM) to effectively identify viruses from massive sequence data. To reduce the computational cost, ELM performs the BLAST search against a customized database composed only of viral sequences. At the same time, ELM adopts a novel criterion to suppress the increase in false positive assignments caused by the small database. As a result, ELM identifies sequences more than 1,000 times faster than conventional methods without loss of accuracy. We anticipate that ELM will contribute to the direct diagnosis of viral infections. The web server and the customized viral database are freely available at http://bioinformatics.czc.hokudai.ac.jp/ELM/.
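For readers unfamiliar with lowest-common-ancestor (LCA) assignment, the following is a minimal sketch of the conventional LCA idea that ELM builds on. It is not ELM itself: the abstract does not specify ELM's novel false-positive criterion, so that step is not reproduced. The toy taxonomy, the function names, and the `min_score` and `top_percent` thresholds are all hypothetical choices for illustration. Each read's BLAST hits are given as (taxon, bit score) pairs; weak hits are dropped, hits close to the best score are kept, and the read is assigned to the deepest taxon shared by all kept hits.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: child taxon -> parent taxon ("root" has no parent).
PARENT: Dict[str, str] = {
    "Viruses": "root",
    "Orthomyxoviridae": "Viruses",
    "Influenza A virus": "Orthomyxoviridae",
    "Influenza B virus": "Orthomyxoviridae",
    "Flaviviridae": "Viruses",
    "Dengue virus": "Flaviviridae",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(parent[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa: List[str], parent: Dict[str, str]) -> str:
    """LCA of several taxa: the deepest node shared by all root-to-taxon paths."""
    paths = [lineage(t, parent) for t in taxa]
    lca = "root"
    for nodes in zip(*paths):  # walk all paths level by level from the root
        if all(n == nodes[0] for n in nodes):
            lca = nodes[0]
        else:
            break
    return lca


def assign_read(hits: List[Tuple[str, float]], parent: Dict[str, str],
                min_score: float = 50.0, top_percent: float = 10.0) -> Optional[str]:
    """Assign one read from its BLAST hits given as (taxon, bit score) pairs.

    Keeps hits scoring >= min_score and within top_percent of the best hit,
    then returns the LCA of their taxa (None if no hit passes the filter).
    """
    good = [(t, s) for t, s in hits if s >= min_score]
    if not good:
        return None
    best = max(s for _, s in good)
    cutoff = best * (1.0 - top_percent / 100.0)
    taxa = sorted({t for t, s in good if s >= cutoff})
    return lowest_common_ancestor(taxa, parent)


# A read hitting two influenza viruses about equally well is assigned to their family.
print(assign_read([("Influenza A virus", 210.0), ("Influenza B virus", 195.0),
                   ("Dengue virus", 60.0)], PARENT))
```

The minimum-score filter hints at why a virus-only database is risky: a non-viral read can still obtain weak hits to viral sequences, which a plain LCA rule would then assign to a virus. The abstract states that ELM adds its own criterion to suppress such false positives; this sketch does not attempt to model it.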

Author Comment

This manuscript was submitted to a peer-reviewed journal.

Supplemental Information

Supplementary data

Figure S1 and Figure S2.

DOI: 10.7287/peerj.preprints.385v1/supp-1