The health care and life sciences community profile for dataset descriptions
- Published
- Accepted
- Subject Areas
- Bioinformatics, Taxonomy, Computational Science
- Keywords
- data profiling, dataset descriptions, metadata, provenance, FAIR data
- Copyright
- © 2017 Dumontier et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. The health care and life sciences community profile for dataset descriptions. PeerJ Preprints 5:e1982v2 https://doi.org/10.7287/peerj.preprints.1982v2
Abstract
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. To provide a practical guide for producing high-quality descriptions of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. It reuses existing vocabularies and is intended to meet key functional requirements, including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
Author Comment
Revisions to address reviewers' comments, chiefly adding a summary of use cases, a complete example of a summary-level description, and more discussion of related work.