Versioned data: why it is needed and how it can be achieved (easily and cheaply)
- Subject Areas
- Computational Biology, Ecology, Computational Science, Data Science
- Keywords
- Version control, Data sharing, Semantic versioning, Meta-analysis
- Copyright
- © 2017 Falster et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Versioned data: why it is needed and how it can be achieved (easily and cheaply). PeerJ Preprints 5:e3401v1 https://doi.org/10.7287/peerj.preprints.3401v1
Abstract
The sharing and re-use of data have become a cornerstone of modern science, and multiple platforms now allow quick and easy data sharing. So far, however, data publishing models have not accommodated ongoing scientific improvements in data. For many problems, datasets continue to grow with time: more records are added, errors are fixed, and new data structures are created. In other words, datasets, like scientific knowledge, advance with time. We therefore suggest that many datasets would be usefully published as a series of versions, with a simple naming system that lets users see the type of change between versions. In this article, we argue for adopting the paradigm and processes of versioned data, analogous to software versioning. We also introduce a system called Versioned Data Delivery and present tools for creating, archiving, and distributing versioned data easily, quickly, and cheaply. These new tools allow individual research groups to shift from a static model of data curation to a dynamic, versioned model that more naturally matches the scientific process.
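To illustrate what a naming system that signals the type of change could look like, here is a minimal sketch in Python. It assumes a three-part MAJOR.MINOR.PATCH label borrowed from semantic versioning for software. The mapping used here (new data structures bump MAJOR, added records bump MINOR, error fixes bump PATCH) is an assumption made for illustration, and the names `DataVersion`, `parse_version`, and `bump` are hypothetical; none of them is taken from the Versioned Data Delivery tools.

```python
from typing import NamedTuple


class DataVersion(NamedTuple):
    """A dataset version label of the form MAJOR.MINOR.PATCH (illustrative)."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def parse_version(text: str) -> DataVersion:
    """Parse a label such as '1.2.3' into its three integer components."""
    major, minor, patch = (int(part) for part in text.split("."))
    return DataVersion(major, minor, patch)


def bump(version: DataVersion, change: str) -> DataVersion:
    """Return the next version for a given type of change.

    Assumed mapping (not specified in the abstract):
      'structure' -> new data structures, may break existing analysis code
      'records'   -> more records added, structure unchanged
      'fix'       -> errors corrected in existing records
    """
    if change == "structure":
        return DataVersion(version.major + 1, 0, 0)
    if change == "records":
        return DataVersion(version.major, version.minor + 1, 0)
    if change == "fix":
        return DataVersion(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")


current = parse_version("1.2.3")
after_fix = bump(current, "fix")
after_records = bump(current, "records")
after_structure = bump(current, "structure")
```

Under this scheme, a user comparing two labels can tell from the first component that differs whether to expect restructured data, additional records, or only corrections. Because the versions are plain tuples of integers, they also sort in release order.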
Author Comment
This is a preprint submission to PeerJ Preprints.