The XL-mHG test for enrichment: Algorithms, bounds, and power
- Subject Areas
- Computational Biology, Algorithms and Analysis of Algorithms, Data Mining and Machine Learning
- Keywords
- enrichment, XL-mHG test, KS test, algorithms, hypothesis testing
- Copyright
- © 2016 Wagner
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. The XL-mHG test for enrichment: Algorithms, bounds, and power. PeerJ Preprints 4:e1962v2 https://doi.org/10.7287/peerj.preprints.1962v2
Abstract
The XL-mHG test is a semiparametric test for enrichment in ranked lists with Boolean (0/1-valued) entries. It is a generalization of the nonparametric mHG test, designed to provide some control over the kind of enrichment that is being tested for, and to allow a flexible trade-off between the sensitivity and robustness of the test. Here, I describe an improved algorithm to efficiently calculate the XL-mHG p-value p, and discuss upper and lower bounds for p. Furthermore, I perform simulations to show that the mHG test is a significantly more powerful alternative to the Kolmogorov-Smirnov (KS) test for detecting enrichment in scenarios that are frequently encountered in biological applications. An open-source Python/Cython implementation of the XL-mHG test is provided in the xlmhg package, which is available from PyPI and GitHub (https://github.com/flo-compbio/xlmhg) under an OSI-approved license.
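To make the quantity behind the test concrete, the following is a minimal pure-Python sketch of the XL-mHG test statistic only, not of the p-value algorithm described in the paper and not of the xlmhg package's implementation. It assumes the usual reading of the two parameters: X is the minimum number of 1's that must appear above a cutoff for that cutoff to be considered, and L is the lowest cutoff (largest rank) that is examined. The function names `hypergeom_tail` and `xlmhg_statistic` are hypothetical and chosen for illustration.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """Probability of seeing at least k ones among the top n entries
    when K ones are placed uniformly at random in a list of length N."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def xlmhg_statistic(v, X=1, L=None):
    """Sketch of the XL-mHG statistic for a ranked 0/1 list v.

    Assumption: a cutoff n is considered only if n <= L and at least
    X ones occur among the first n entries. Returns (statistic, cutoff);
    the statistic is 1.0 and the cutoff is None if no cutoff qualifies.
    """
    N = len(v)
    K = sum(v)
    if L is None:
        L = N  # with X=1 and L=N this reduces to the plain mHG statistic
    best, best_n = 1.0, None
    k = 0  # running count of ones among the first n entries
    for n in range(1, min(L, N) + 1):
        k += v[n - 1]
        if k >= X:
            p = hypergeom_tail(N, K, n, k)
            if p < best:
                best, best_n = p, n
    return best, best_n


if __name__ == "__main__":
    ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(xlmhg_statistic(ranked))            # unrestricted (mHG) statistic
    print(xlmhg_statistic(ranked, X=3, L=5))  # require >= 3 ones within top 5
```

The minimum taken over cutoffs is not itself a p-value, because it is the smallest of many dependent hypergeometric tail probabilities; turning it into the XL-mHG p-value p requires the dedicated algorithm the abstract refers to.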
Author Comment
This version fixes a few typos, including a wrong label in Figure 1.