The XL-mHG test for gene set enrichment

Graduate Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States
Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States
DOI
10.7287/peerj.preprints.1962v3
Subject Areas
Computational Biology, Algorithms and Analysis of Algorithms, Data Mining and Machine Learning
Keywords
gene set enrichment, nonparametric statistics, algorithms, hypothesis testing
Copyright
© 2017 Wagner
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Wagner F. 2017. The XL-mHG test for gene set enrichment. PeerJ Preprints 5:e1962v3

Abstract

The nonparametric minimum hypergeometric (mHG) test is a popular alternative to Kolmogorov-Smirnov (KS)-type tests for determining gene set enrichment. However, these approaches have not been compared quantitatively. Here, I first perform a simulation study to show that the mHG test is significantly more powerful than the one-sided KS test for detecting gene set enrichment. I then illustrate a shortcoming of the mHG test, which has motivated a semiparametric generalization of the test, termed the XL-mHG test. I describe an improved quadratic-time algorithm for the efficient calculation of exact XL-mHG p-values, as well as a linear-time algorithm for calculating a tighter upper bound for the p-value. Finally, I demonstrate that the XL-mHG test outperforms the one-sided KS test when applied to a reference gene expression study, and discuss general principles for analyzing gene set enrichment using the XL-mHG test. An efficient open-source Python/Cython implementation of the XL-mHG test is provided in the xlmhg package, available from PyPI and GitHub (https://github.com/flo-compbio/xlmhg) under an OSI-approved license.
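
For readers unfamiliar with the test, the following is a minimal pure-Python sketch of how an XL-mHG-style test statistic could be computed from a ranked binary list. It assumes the usual definition: the minimum hypergeometric tail probability over all cutoffs n = 1..L at which at least X gene set members have been seen, or 1.0 if no cutoff qualifies. The function names `hypergeom_sf` and `xlmhg_statistic` are hypothetical, the code is not taken from the xlmhg package, and it returns only the test statistic, not the exact p-value or the upper bound that the algorithms mentioned in the abstract compute.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(at least k successes) when drawing n items without replacement
    from N items, of which K are successes."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def xlmhg_statistic(v, X=1, L=None):
    """Illustrative XL-mHG-style statistic for a ranked 0/1 list v.

    v[i] == 1 means the gene at rank i+1 belongs to the gene set.
    X: minimum number of gene set members required above a cutoff.
    L: lowest cutoff (largest n) that is considered; defaults to len(v).
    With X=1 and L=len(v), this reduces to the plain mHG statistic.
    """
    N = len(v)
    K = sum(v)
    if L is None:
        L = N
    best = 1.0  # value returned if no cutoff satisfies k >= X
    k = 0       # number of gene set members among the top n genes
    for n in range(1, L + 1):
        k += v[n - 1]
        if k >= X:
            best = min(best, hypergeom_sf(k, N, K, n))
    return best
```

As a quick check, for the list `[1, 1, 0, 0]` the best cutoff is n = 2, where both gene set members are at the top, giving a statistic of 1/6. Because the statistic is a minimum over several cutoffs, it is not itself a p-value.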

Author Comment

This version features a much more comprehensive introduction section, an application to a real-world expression study, and a discussion of the new results obtained. It also fixes several typos.