Scientific data science and the case for Open Access
- Published
- Accepted
- Subject Areas
- Bioinformatics, Evidence Based Medicine, Health Policy, Translational Medicine, Science Policy
- Keywords
- reproducibility crisis, open access, scientific data science, meta-analysis, journal reform, altmetrics, peer review, preprints, scientific epistemology, structure of science
- Copyright
- © 2016 Sarma
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. Scientific data science and the case for Open Access. PeerJ Preprints 4:e2566v1 https://doi.org/10.7287/peerj.preprints.2566v1
Abstract
“Open access” has become a central theme of journal reform in academic publishing. In this article, I examine the consequences of an important technological loophole: publishers can claim to adhere to the principles of open access while releasing articles in proprietary or “locked” formats that cannot be processed by automated tools, in which even simple copying and pasting of text is disabled. These restrictions will prevent the development of an important infrastructural element of a modern research enterprise, namely scientific data science, the use of data analytic techniques to conduct meta-analyses and investigations into the scientific corpus. I give a brief history of the open access movement, discuss novel journalistic practices, and provide an overview of data-driven investigation of the scientific corpus. I argue that, particularly in an era when the veracity of many research studies has been called into question, scientific data science should be one of the key motivations for open access publishing. The enormous benefits of unrestricted access to the research literature should prompt scholars from all disciplines to reject publishing models in which articles are released in proprietary formats or are otherwise restricted from being processed by automated tools as part of a data science pipeline.
Author Comment
This is a preprint submission to PeerJ Preprints.