Packaging data analytical work reproducibly using R (and friends)

Department of Anthropology, University of Washington, Seattle, Washington, United States
Department of Environmental Science, Policy and Management, University of California, Berkeley, Berkeley, California, United States
Department of History and Art History, George Mason University, Fairfax, Virginia, United States
DOI
10.7287/peerj.preprints.3192v1
Subject Areas
Anthropology, Computational Biology, Science and Medical Education, Computational Science, Data Science
Keywords
reproducible research, data science, social computing, computer education, statistics, Scientific Computing and Simulation, Programming Languages
Copyright
© 2017 Marwick et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Marwick B, Boettiger C, Mullen L. 2017. Packaging data analytical work reproducibly using R (and friends). PeerJ Preprints 5:e3192v1

Abstract

Computers are a central tool in the research process, enabling complex and large-scale data analysis. As computer-based research has increased in complexity, so have the challenges of ensuring that this research is reproducible. To address these challenges, we review the concept of the research compendium as a standard and easily recognisable way of organising the digital materials of a research project so that other researchers can inspect, reproduce, and extend the research. We investigate how the structure and tooling of software packages of the R programming language are being used to produce research compendia in a variety of disciplines. We also describe how software engineering tools and services are being used by researchers to streamline working with research compendia. Using real-world examples, we show how researchers can improve the reproducibility of their work using research compendia based on R packages and related tools.

Author Comment

This is part of the 'Practical Data Science for Stats' Collection, edited by Jenny Bryan and Hadley Wickham.