Fit for purpose? Identifying and resolving quality issues with marine biodiversity datasets in R
- Published
- Accepted
- Subject Areas
- Biodiversity, Marine Biology
- Keywords
- quality control, public biodiversity data, OBIS
- Copyright
- © 2018 Bosch et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. Fit for purpose? Identifying and resolving quality issues with marine biodiversity datasets in R. PeerJ Preprints 6:e26776v1 https://doi.org/10.7287/peerj.preprints.26776v1
Abstract
Millions of marine species occurrence and abundance records can be accessed through the Ocean Biogeographic Information System (OBIS). These records are often combined with data from additional sources such as GBIF, citizen science projects, scientific literature and personal communications. However, the quality of the available data is variable, so it needs to be scrutinized to obtain a dataset that is fit for purpose. To help with this process, and to increase the quality of data before they are published in OBIS, we developed the obistools R package. It allows users to identify and resolve common data errors, including taxonomic, spatial, temporal and measurement issues. The package combines and builds on existing services made available by the World Register of Marine Species as well as some new services developed by OBIS. The interactive interface provides a series of strict and fuzzy quality checks ranging from longitude/latitude checks to environmental outlier detection. These checks, in combination with pre-defined constraints based on, for instance, physiological knowledge of the species and the expected spatial extent, can then be used to evaluate whether specific records should be included in the final dataset for analysis.
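To make the distinction between strict and fuzzy checks concrete, the following is a minimal sketch in base R. It does not use the obistools API. The data frame, the function names `flag_invalid_coordinates` and `flag_environmental_outliers`, the `sst` column and the threshold `k = 3` are all hypothetical choices made for illustration. The fuzzy check uses a simple median absolute deviation rule as a stand-in for environmental outlier detection.

```r
# Hypothetical occurrence records; coordinate column names follow Darwin Core terms.
occ <- data.frame(
  scientificName   = rep("Mola mola", 6),
  decimalLongitude = c(2.9, 3.1, 181.0, 2.7, 3.0, 2.8),
  decimalLatitude  = c(51.3, 51.2, 51.4, -95.0, 51.5, 51.1),
  sst              = c(12.1, 11.8, 12.4, 12.0, 29.5, 11.9)  # assumed sea surface temperature, deg C
)

# Strict check: coordinates must be present and inside the valid ranges.
flag_invalid_coordinates <- function(df) {
  lon <- df$decimalLongitude
  lat <- df$decimalLatitude
  is.na(lon) | is.na(lat) | lon < -180 | lon > 180 | lat < -90 | lat > 90
}

# Fuzzy check: flag values more than k median absolute deviations from the median.
# This is an illustrative rule, not the method used by the package.
flag_environmental_outliers <- function(x, k = 3) {
  centre <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE)
  !is.na(x) & spread > 0 & abs(x - centre) > k * spread
}

occ$invalid_coordinates <- flag_invalid_coordinates(occ)
occ$sst_outlier         <- flag_environmental_outliers(occ$sst)

# Keep only the records that pass both checks.
fit_for_purpose <- occ[!occ$invalid_coordinates & !occ$sst_outlier, ]
```

A strict check such as the coordinate range test identifies records that cannot be correct, whereas a fuzzy check only marks records as suspicious. In practice, a flagged outlier would be reviewed against constraints such as the known physiology or range of the species before being dropped.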
Author Comment
This is an abstract which has been accepted for the WCMB.