A mathematical theory of knowledge, science, bias and pseudoscience

Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States
DOI
10.7287/peerj.preprints.1968v3
Subject Areas
Computational Biology, Evolutionary Studies, Science Policy, Statistics, Computational Science
Keywords
soft science, hard science, philosophy of science, research misconduct, questionable research practices, reproducibility, pseudo-science, positivism, falsification, relativism
Copyright
© 2017 Fanelli
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Fanelli D. 2017. A mathematical theory of knowledge, science, bias and pseudoscience. PeerJ Preprints 5:e1968v3

Abstract

This essay proposes mathematical answers to meta-scientific questions including "how much knowledge is produced by research?", "how rapidly is a field making progress?", "what is the expected reproducibility of a result?", "what do we mean by soft science?", "what demarcates a pseudoscience?", and many others. From two simple postulates (1: information is finite; 2: knowledge is information compression) we derive a function \(K(y;x\tau)=\frac{T(y)-T(y|x\tau)}{T(y)+T(x)+T(\tau)}\), in which the total information \(T()\) contained in an explanandum \(y\) is losslessly or lossily compressed via an explanans composed of an information input \(x\) and a "theory" component \(\tau\). The latter is a factor that conditions the relationship between \(y\) and \(x\), with an information "cost" equivalent to the description length of the relationship itself. This function is proposed as a simple and universal tool to understand and analyse knowledge dynamics, scientific or otherwise. Soft sciences are shown to be simply fields that yield relatively low \(K\) values. Bias turns out to be information that is concealed in methodological choices, thereby reducing \(K\). Disciplines typically classified as pseudosciences are suggested to be sciences that suffer from extreme bias: their informational input is greater than their output, yielding \(K(y;x\tau) < 0\). The essay derives numerous general results, some of which may be counter-intuitive. For example, it suggests that reproducibility failures are inevitable, and that the value of publishing negative results may vary across fields and within a field over time. Therefore, there may be conditions in which the costs of reproducible research practices, such as publishing negative results and sharing data, outweigh the benefits.
The theory makes several testable predictions concerning science and cognition in general, and it may have numerous applications that future research could develop, test and implement to foster progress on all frontiers of knowledge.
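
As a rough numerical illustration of the K function stated in the abstract, the following Python sketch evaluates \(K(y;x\tau)\) from four information quantities expressed in bits. The function name `k_value`, its parameter names and the bit counts in the examples are hypothetical and chosen only for illustration; they are not taken from the manuscript, and the sketch assumes that all four quantities are already known.

```python
def k_value(t_y, t_y_given_x_tau, t_x, t_tau):
    """Evaluate K(y; x tau) = (T(y) - T(y | x tau)) / (T(y) + T(x) + T(tau)).

    All arguments are information quantities in bits (hypothetical inputs):
      t_y              total information in the explanandum y
      t_y_given_x_tau  information left in y once x and tau are known
      t_x              information in the input x
      t_tau            description length ("cost") of the theory tau
    """
    total = t_y + t_x + t_tau
    if total <= 0:
        raise ValueError("T(y) + T(x) + T(tau) must be positive")
    return (t_y - t_y_given_x_tau) / total


# Illustrative numbers only: the same explanandum and explanans,
# but with different amounts of residual (unexplained) information.
k_high = k_value(t_y=10, t_y_given_x_tau=2, t_x=4, t_tau=2)
k_low = k_value(t_y=10, t_y_given_x_tau=8, t_x=4, t_tau=2)

print(k_high)  # 0.5
print(k_low)   # 0.125
```

With these made-up inputs, the first case compresses 8 of the 10 bits of \(y\) at a total cost of 16 bits, giving \(K = 0.5\); the second compresses only 2 bits at the same cost, giving \(K = 0.125\), which is the kind of relatively low value the abstract associates with softer fields.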

Author Comment

This is a substantially updated and expanded version of the manuscript. It includes multiple new results in addition to various adjustments and clarifications of previous ones. Many more examples are offered throughout, and there is an entirely new section illustrating how the K function may quantify all forms of knowledge and biological adaptation. This new section and most mathematical details have been placed in a Supporting Information section at the end of the manuscript.