Dear Kromann, your paper is an interesting read and covers an important topic! As with many computational methods, benchmarking is hard because high-quality experimental data are scarce. I may have overlooked it (I hope not), but I did not find a section describing the reference data used to calculate the error, other than the ab initio calculations. In your opinion, where does the field currently stand? How large is the error you observe compared with, for example, the error in currently reported experimental data, and is your error against the ab initio data comparable to your error against experimental data? Following on from that, might ab initio data actually be a better target than experimental data, because of the measurement error?

Based on my reading ([http://dx.doi.org/10.1002/cmdc.200700059] and [http://dx.doi.org/10.1007%2Fs11095-013-1232-z]), the accuracy of measured pKa values is probably about 0.1 pH units. So the experimental values for these small benchmark compounds are considerably more accurate than the ab initio values and are the appropriate choice of benchmark values.

