Publication bias is a form of scientific misconduct. It threatens the validity of research results and the credibility of science. Although several tests for publication bias exist, no in-depth evaluations are available that indicate which test to use for a specific research problem.
In this study, four tests for publication bias, Egger’s test (FAT), p-uniform, the test of excess significance (TES), and the caliper test, were evaluated in a Monte Carlo simulation. Two types of publication bias were simulated, each at three degrees (0%, 50%, 100%). The type of publication bias was defined either as file-drawer, meaning the repeated analysis of new datasets, or as p-hacking, meaning the inclusion of covariates in order to obtain a significant result. In addition, the underlying effect (β = 0, 0.5, 1, 1.5), effect heterogeneity, the number of observations in the simulated primary studies (N = 100, 500), and the number of primary studies included in the publication bias tests (K = 100, 1000) were varied.
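The file-drawer condition described above can be illustrated with a minimal Python sketch. It is not the simulation code used in the study: the bivariate data-generating process, the standard-normal errors, the one-sided significance rule (t > 1.96), the cap on redraws, and all function names are assumptions made for illustration.

```python
import random
from math import sqrt


def simulate_study(beta, n, rng):
    """Draw one primary study and return the OLS slope and its standard error.

    Assumed data-generating process: y = beta * x + e, with x and e
    standard normal.
    """
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return b, se


def file_drawer_study(beta, n, bias, rng, max_tries=1000):
    """With probability `bias`, redraw new datasets until t > 1.96.

    `max_tries` is a safety cap introduced for this sketch only.
    """
    b, se = simulate_study(beta, n, rng)
    if rng.random() < bias:
        tries = 0
        while b / se <= 1.96 and tries < max_tries:
            b, se = simulate_study(beta, n, rng)
            tries += 1
    return b, se


def simulate_literature(k, beta, n, bias, seed=0):
    """Return k (estimate, standard error) pairs forming one simulated literature."""
    rng = random.Random(seed)
    return [file_drawer_study(beta, n, bias, rng) for _ in range(k)]
```

For example, `simulate_literature(k=100, beta=0.0, n=100, bias=0.5)` would mimic a condition with no true effect in which half of the studies are subject to file-drawer behaviour.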
All tests evaluated were able to identify publication bias in both the file-drawer and the p-hacking conditions. With the exception of the 15%- and 20%-caliper tests, the false positive rates were unbiased. The FAT had the largest statistical power in the file-drawer conditions, whereas under p-hacking the TES performed slightly better, except under effect heterogeneity. The caliper test, however, was inferior to the other tests under effect homogeneity and had decent statistical power only in conditions with 1000 primary studies.
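As a rough illustration of how the FAT operates, the following Python sketch implements Egger's regression in one common form: the standardized effect (estimate divided by its standard error) is regressed on the precision (one divided by the standard error), and an intercept that differs from zero indicates funnel asymmetry. The function name and return values are hypothetical, and the exact specification used in the study may differ.

```python
from math import sqrt


def egger_fat(estimates, std_errors):
    """Egger-type funnel asymmetry test via simple OLS.

    Regresses t_i = b_i / se_i on precision_i = 1 / se_i and returns
    (intercept, slope, t-statistic of the intercept).
    """
    t = [b / s for b, s in zip(estimates, std_errors)]
    prec = [1.0 / s for s in std_errors]
    k = len(t)
    mx, my = sum(prec) / k, sum(t) / k
    sxx = sum((x - mx) ** 2 for x in prec)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prec, t))
    slope = sxy / sxx                 # estimate of the underlying effect
    intercept = my - slope * mx       # asymmetry term
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(prec, t))
    sigma2 = rss / (k - 2)
    se_intercept = sqrt(sigma2 * (1.0 / k + mx ** 2 / sxx))
    return intercept, slope, intercept / se_intercept
```

The intercept's t-statistic would then be compared with a critical value from the t distribution with K − 2 degrees of freedom. The test relies on variation in the standard errors across primary studies.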
The FAT is recommended as a test for publication bias in standard meta-analyses with no or only small effect heterogeneity. If no clear direction of publication bias is suspected, the TES is the first alternative to the FAT. The 5%-caliper test is recommended under conditions of effect heterogeneity, which may arise when publication bias is examined in a discipline-wide setting in which primary studies cover different research problems.
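The logic of a caliper test can be sketched in a few lines of Python. The sketch assumes that the caliper is defined as a percentage band around the critical value of 1.96 and that an excess of test statistics just above the threshold is assessed with a one-sided binomial test against 0.5. The function name and these defaults are illustrative assumptions, not the study's implementation.

```python
from math import comb


def caliper_test(z_values, width=0.05, threshold=1.96):
    """Compare counts of z-values just above and just below the threshold.

    Returns (over, under, p), where p is the one-sided binomial
    probability of observing at least `over` values above the threshold
    if both sides of the caliper were equally likely.
    """
    lower = threshold * (1 - width)
    upper = threshold * (1 + width)
    over = sum(1 for z in z_values if threshold <= z < upper)
    under = sum(1 for z in z_values if lower < z < threshold)
    n = over + under
    # P(X >= over) for X ~ Binomial(n, 0.5)
    p = sum(comb(n, k) for k in range(over, n + 1)) / 2 ** n
    return over, under, p
```

Because only z-values inside the narrow band enter the test, a large number of primary studies is needed before the band contains enough observations. If selection in both directions is of interest, absolute z-values could be passed instead.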