Ten simple rules for digital data storage

Department of Biology, University of Vermont, Burlington, Vermont, United States
Department of Physics and Astronomy, University of Western Ontario, London, Ontario, Canada
National Center for Supercomputing Applications and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States
iDigBio, Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States
Whitney Laboratory for Marine Bioscience, University of Florida, Gainesville, Florida, United States
Department of Informatics, King's College London, University of London, London, United Kingdom
San Diego Supercomputer Center, University of California, San Diego, San Diego, California, United States
Département de Sciences Biologiques, University of Montreal, Montreal, Canada
Center for Environmental Research, Education, and Outreach, Washington State University, Pullman, Washington, United States
School of Plant Sciences, University of Arizona, Tucson, Arizona, United States
Atlantic Ecology Division, United States Environmental Protection Agency, Narragansett, Rhode Island, United States
DOI
10.7287/peerj.preprints.1448v2
Subject Areas
Bioinformatics, Digital Libraries, Scientific Computing and Simulation
Keywords
Data, Informatics, Standards, Metadata, Storage
Copyright
© 2016 Hart et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Hart E, Barmby P, LeBauer D, Michonneau F, Mount S, Mulrooney P, Poisot T, Woo KH, Zimmerman N, Hollister JW. 2016. Ten simple rules for digital data storage. PeerJ Preprints 4:e1448v2

Abstract

Data is the central currency of science, but the nature of scientific data has changed dramatically with the rapid pace of technology. This change has produced a wide variety of data formats, dataset sizes, levels of data complexity, data use cases, and data-sharing practices. Improvements in high-throughput DNA sequencing, sustained institutional support for large sensor networks, and sky surveys with large-format digital cameras have created massive quantities of data. At the same time, the combination of increasingly diverse research teams and data aggregation in portals (e.g., GBIF or iDigBio for biodiversity data) necessitates increased coordination among data collectors and institutions. As a consequence, “data” can now mean anything from petabytes of information stored in professionally maintained databases, through spreadsheets on a single computer, to hand-written tables in lab notebooks on shelves. All remain important, but data curation practices must keep pace with the changes brought about by new forms and practices of data collection and storage.

Author Comment

This new version is the copy of the manuscript that has been submitted for review at PLoS Computational Biology. It incorporates feedback from comments on Twitter and from comments made on v1 of this preprint.