The Modern Research Data Portal: A design pattern for networked, data-intensive science
A peer-reviewed version of this preprint also exists.
Abstract
We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
Cite this as
2017. The Modern Research Data Portal: A design pattern for networked, data-intensive science. PeerJ Preprints 5:e3194v2 https://doi.org/10.7287/peerj.preprints.3194v2
Author comment
This new version corrects two minor typos.
Additional Information
Competing Interests
Ian Foster is an Advisor and Academic Editor for PeerJ Computer Science.
Author Contributions
Kyle Chard conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.
Eli Dart conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Ian Foster conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.
David Shifflett performed the experiments, performed the computation work.
Steven Tuecke conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.
Jason Williams performed the experiments, performed the computation work, reviewed drafts of the paper.
Data Deposition
The following information was supplied regarding data availability:
The companion web site, http://docs.globus.org/mrdp, links to the associated code on GitHub.
Name of repository: GitHub
URL: https://github.com/globus/globus-sample-data-portal for the code.
Funding
This work was supported by the United States National Science Foundation (ACI-1148484) and Department of Energy's Office of Advanced Scientific Computing Research (DE-AC02-06CH11357). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.