This is more of a comment on your Hartgerink (2015), which you cite as evidence that in our paper (Head et al. 2015, PLoS Biol) "the original results may have been confounded by publication bias and tendencies to round p-values".
Firstly, as you hint on the following line of the present preprint, publication bias (i.e. the file-drawer problem for p-values above 0.05) does not affect our study, because the method we used does not utilise p-values > 0.05. Head et al. also conducted a manual analysis suggesting that when one re-calculates p-values from the reported test statistics, the reported values often erroneously end up on the 'good' side of p = 0.05. So it is a bit misleading to ignore this and cite publication bias as something that undermines the analysis in Head et al.
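(For concreteness, here is a minimal sketch of what such a re-calculation looks like. This is my own illustration in Python/scipy, not Head et al.'s actual pipeline, and the reported statistic in the example is invented.)

```python
# Minimal sketch, assuming a t test: recompute the two-tailed p-value
# from a reported t statistic and its degrees of freedom, so it can be
# checked against the p-value the authors actually reported.
from scipy import stats

def recomputed_p(t_stat: float, df: int) -> float:
    """Two-tailed p-value for a t statistic with df degrees of freedom."""
    return 2 * stats.t.sf(abs(t_stat), df)

# Invented example: a paper reports "t(28) = 2.05, p < 0.05".
print(round(recomputed_p(2.05, 28), 4))  # ~0.0499, only just below 0.05
```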
Regarding the argument in Hartgerink (2015) that Head et al.'s conclusions are confounded by rounding error: I already addressed this when your paper was reviewed and rejected by PLoS Biology, and again in a comment on the non-peer-reviewed Hartgerink (2015). If you don't agree with my assessment, I'd be interested to hear why; otherwise, perhaps you should not cite Hartgerink (2015) as evidence that Head et al.'s conclusions do not hold. Here are my comments, copied over from Hartgerink (2015):
Our original analyses removed all p-values reported to two decimal places (e.g. 0.04 and 0.05), because there is empirical evidence (and it seems likely a priori) that these data are ‘tainted’ by inconsistent reporting or rounding practices.
For example, it seems probable that p-values of 0.049 will very often be reported as p < 0.05 rather than p = 0.049 or p = 0.05, because authors want to hide the fact that 0.049 is ‘only just significant’. Conversely, authors are probably less likely to report 0.039 as an inequality (e.g. p < 0.05) or as a rounded p = 0.04, since 0.039 is a ‘good, significant-looking number’, so there is no shame in reporting it exactly. We therefore expect biased reporting practices to make the 0.05 peak smaller than the 0.04 peak, and that is exactly what our data show (see Fig 1 in Hartgerink’s manuscript).
Because we are aware of this bias, we elected to do our analysis on the p-value bins 0.04 < p < 0.045 and 0.045 < p < 0.05 (note that we excluded the problematic values of 0.04 and 0.05). By contrast, Hartgerink’s analysis includes these tainted data, and unsurprisingly finds no evidence for p-hacking, since the peak at 0.05 is clearly much lower than the peak at 0.04 (ironically, probably because of p-hacking!). An additional, more minor, problem with Hartgerink’s analysis is that it uses p-value bins that are quite far apart (e.g. comparing 0.035-0.04 with 0.045-0.05). This makes his test less sensitive: the overall p-curve displays right skew (because of evidential value), and we are trying to detect left skew, which should be most evident in the region close to 0.05, where the right skew is weakest.
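(To make that comparison concrete, here is a minimal sketch of the binning logic in Python. The data are invented, the real analysis was run on text-mined p-values and differs in detail, and in practice the two-decimal filter would be applied to the reported strings rather than to floats.)

```python
# Minimal sketch of the bin comparison described above, with invented data.
# Two-decimal reports (0.04, 0.05) are excluded, then the counts in
# (0.04, 0.045) and (0.045, 0.05) are compared with a one-sided binomial
# test: p-hacking predicts an excess in the bin nearest 0.05, whereas
# genuine effects (right skew) predict the opposite.
from scipy import stats

def bin_test(p_values):
    clean = [p for p in p_values if round(p, 2) != p]  # drop e.g. 0.04, 0.05
    lower = sum(0.040 < p < 0.045 for p in clean)
    upper = sum(0.045 < p < 0.050 for p in clean)
    return stats.binomtest(upper, n=lower + upper, p=0.5,
                           alternative="greater").pvalue

# Invented counts: 4 values just below 0.05 vs 2 just above 0.04.
print(bin_test([0.041, 0.043, 0.046, 0.047, 0.048, 0.049, 0.04, 0.05]))
```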
Finally, in the PeerJ preprint linked in the above comment [by Bishop and Thompson], the authors show that our conclusion of p-hacking holds even if one takes the comparatively drastic step of eliminating every paper containing any p-value reported to fewer than three decimal places. This is a highly conservative approach, and it further illustrates that our results are not a spurious consequence of the primary studies’ propensity to round off their p-values.