Data organization in spreadsheets

Karl W Broman; Kara H. Woo

doi:10.7287/peerj.preprints.3183v2

Karl W Broman¹, Kara H. Woo²

1 Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States

2 Information School, University of Washington, Seattle, Washington, United States

Published: 2018-09-11
Accepted: 2018-09-11

Subject Areas: Computational Biology, Science and Medical Education, Statistics, Computational Science, Data Science
Keywords: data management, data organization, spreadsheets, Microsoft Excel

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Broman KW, Woo KH. 2018. Data organization in spreadsheets. PeerJ Preprints 6:e3183v2 https://doi.org/10.7287/peerj.preprints.3183v2

Abstract

Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this paper offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, don't leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, don't include calculations in the raw data files, don't use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.
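As a rough illustration of several of these principles together, the following Python sketch writes a small data set as a single rectangle with one header row, ISO 8601 (YYYY-MM-DD) dates, an explicit code in place of empty cells, and plain-text CSV output. The column names, the sample values, the `to_csv` helper, and the choice of "NA" as the missing-value code are all assumptions made for this example, not taken from the paper.

```python
import csv
import io
from datetime import date

# Hypothetical records: one row per subject, one column per variable.
records = [
    {"subject_id": "S001", "visit_date": date(2017, 3, 4), "glucose_mg_dl": 98.5},
    {"subject_id": "S002", "visit_date": date(2017, 3, 5), "glucose_mg_dl": None},
]

FIELDNAMES = ["subject_id", "visit_date", "glucose_mg_dl"]


def to_csv(rows, missing="NA"):
    """Return rows as CSV text: a single rectangle with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = {}
        for name in FIELDNAMES:
            value = row.get(name)
            if value is None:
                # Never leave a cell empty: write an explicit missing-value code.
                out[name] = missing
            elif isinstance(value, date):
                # Write dates as YYYY-MM-DD.
                out[name] = value.isoformat()
            else:
                out[name] = value
        writer.writerow(out)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same text could be saved to a `.csv` file alongside a data dictionary describing each column.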

Author Comment

We fixed the URL for Woo (2014), the DOI for White et al. (2013), and a typo in the abstract, and simplified one passage of text.