The XL-mHG test for enrichment: Algorithms, bounds, and power

Graduate Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States
Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States
DOI: 10.7287/peerj.preprints.1962v2
Subject Areas: Computational Biology, Algorithms and Analysis of Algorithms, Data Mining and Machine Learning
Keywords: enrichment, XL-mHG test, KS test, algorithms, hypothesis testing
Copyright: © 2016 Wagner
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article: Wagner F. 2016. The XL-mHG test for enrichment: Algorithms, bounds, and power. PeerJ Preprints 4:e1962v2

Abstract

The XL-mHG test is a semiparametric test for enrichment in ranked lists with Boolean (0/1-valued) entries. It is a generalization of the nonparametric mHG test, designed to provide some control over the kind of enrichment that is being tested for, and to allow a flexible trade-off between the sensitivity and robustness of the test. Here, I describe an improved algorithm to efficiently calculate the XL-mHG p-value p, and discuss upper and lower bounds for p. Furthermore, I perform simulations to show that the mHG test is a significantly more powerful alternative to the Kolmogorov-Smirnov (KS) test for detecting enrichment in scenarios that are frequently encountered in biological applications. An open-source Python/Cython implementation of the XL-mHG test is provided in the xlmhg package, which is available from PyPI and GitHub (https://github.com/flo-compbio/xlmhg) under an OSI-approved license.
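To make the description above concrete, the following Python sketch computes an XL-mHG-style statistic and its exact p-value for a small Boolean list. It assumes that the statistic is the smallest hypergeometric tail probability over all cutoffs n ≤ L at which at least X ones have been seen (taken to be 1 if no cutoff qualifies), and that under the null hypothesis all C(N, K) arrangements of the K ones are equally likely. The p-value comes from a simple dynamic program over lattice paths that counts the arrangements never reaching a cutoff as extreme as the observed one. This is an illustrative approach, not necessarily the improved algorithm described in the paper, and the function names `hg_tail`, `xlmhg_stat`, and `xlmhg_pval` are hypothetical, not part of the xlmhg package API.

```python
from fractions import Fraction
from math import comb

# Hypothetical helper names for illustration; not the xlmhg package API.


def hg_tail(k: int, N: int, K: int, n: int) -> Fraction:
    """Exact hypergeometric tail P(Y >= k) for Y ~ Hypergeometric(N, K, n)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return Fraction(hits, comb(N, n))


def xlmhg_stat(v: list[int], X: int, L: int) -> Fraction:
    """Minimum tail probability over cutoffs n <= L with at least X ones so far.

    Assumed convention: returns 1 if no cutoff qualifies.
    """
    N, K = len(v), sum(v)
    best, k = Fraction(1), 0
    for n in range(1, L + 1):
        k += v[n - 1]  # k = number of ones among the first n entries
        if k >= X:
            best = min(best, hg_tail(k, N, K, n))
    return best


def xlmhg_pval(stat: Fraction, N: int, K: int, X: int, L: int) -> Fraction:
    """Exact P(statistic <= stat), assuming all C(N, K) arrangements are equally likely."""
    if stat >= 1:
        return Fraction(1)  # the statistic never exceeds 1
    # W[k] counts length-n prefixes with k ones that have not yet passed
    # through an admissible cutoff whose tail probability is <= stat.
    W = [1] + [0] * K
    for n in range(1, L + 1):
        new = [0] * (K + 1)
        for k in range(min(n, K) + 1):
            ways = W[k] + (W[k - 1] if k > 0 else 0)  # append a 0 or a 1
            if ways and k >= X and hg_tail(k, N, K, n) <= stat:
                ways = 0  # these prefixes already reach stat; drop them
            new[k] = ways
        W = new
    # Every surviving prefix can be completed freely beyond position L.
    avoiding = sum(W[k] * comb(N - L, K - k) for k in range(K + 1))
    return 1 - Fraction(avoiding, comb(N, K))


# Toy example with X = 1 and L = N, so every cutoff is admissible.
v = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
s = xlmhg_stat(v, X=1, L=len(v))
p = xlmhg_pval(s, N=len(v), K=sum(v), X=1, L=len(v))
print(s, p)  # prints: 1/30 1/30
```

Exact `Fraction` arithmetic sidesteps floating-point ties when tail probabilities are compared against the observed statistic, at the cost of speed; for realistic list lengths, a floating-point version with a relative tolerance would likely be the more practical choice.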

Author Comment

This version fixes a few typos, including a wrong label in Figure 1.