The introduction of the FAIR (Findable, Accessible, Interoperable, Reusable) principles has caused quite a stir within the scientific community. If everyone adheres to them, these principles could lead to new, revolutionary ways of performing research and fulfill the promise of open science. Furthermore, they could allow concepts such as personalized medicine and personal health monitoring to finally be implemented in daily practice.
However, to bring about these changes, data users need to rethink the way they treat scientific data. Simply passing a dataset along without extensive metadata will no longer suffice. These new ways of conducting research require a significantly different approach from the entire scientific community or, for that matter, from anyone who wants to reap the benefits of going FAIR.
Yet, how do you initiate behavioral change? One important approach is to change the software scientists use and to require data owners, or data stewards, to FAIRify their datasets. Data catalogs are a great starting point for FAIRifying data: the software is already designed to make data Findable and Accessible, its metadata is intended to be Interoperable, and it relies on users to provide sufficient metadata to ensure Reusability. In this white paper, we analyse how well the FAIR principles are implemented in several data catalogs.
To help determine how FAIR a catalog is, the GO-FAIR initiative created the FAIR metrics. These metrics help determine to what extent data can be considered FAIR. However, the metrics were developed only recently, with the first release at the end of 2017. At the moment, software does not come with a standard FAIR metrics review. Still, this insight is highly desired by the scientific community. How else can they be sure that (public) money is spent in a FAIR way?
The Hyve has evaluated three popular open source data catalogs against the FAIR metrics: CKAN, Dataverse, and Invenio. Most data stewards will be familiar with at least one of these.
In this white paper, we answer the following questions:
Which of the three data catalogs performs best in making data FAIR?
Which data catalog makes the most use of FAIR datasets?
Which one creates the most FAIR metadata?
Which catalog has the highest potential to increase its FAIRness, and how?
Which data catalog best facilitates the FAIRification process?