SETLr: the semantic extract, transform, and load-r
- Subject Areas: Computational Science, Data Science
- Keywords: semantic interpretation, data curation, triplication
- Copyright: © 2018 McCusker et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article: 2018. SETLr: the semantic extract, transform, and load-r. PeerJ Preprints 6:e26476v1 https://doi.org/10.7287/peerj.preprints.26476v1
Abstract
Semantic Extract, Transform, and Load-er (SETLr) is a flexible, scalable tool for providing semantic interpretations of tabular, XML, and JSON-based data from local or web files. It has been used by diverse projects and has been shown to be scalable and flexible, allowing for the simplified creation of arbitrary RDF, including ontologies and nanopublications, from many different data formats. Semantic ETL scripts use best-practice standards for provenance (PROV-O) and support streaming conversion for RDF transformation using the JSON-LD-based templating language, JSLDT. SETLr also supports custom Python functions for remote APIs, entity resolution, external data lookup, or other tasks. We provide case studies for dynamic SETL scripts, ontology generation, and scaling to gigabytes of input, and discuss the value and impact of this approach.
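To make the idea of template-driven, streaming conversion concrete, the following is a minimal sketch using only the Python standard library. It is not SETLr code and does not use JSLDT syntax: the template format (brace placeholders), the helper names `fill` and `rows_to_jsonld`, the `example.com` identifiers, and the sample data are all assumptions made for illustration. It shows one JSON-LD node being produced per tabular row, without loading the whole table into memory.

```python
import csv
import io
import json

# Hypothetical template: brace placeholders are filled from each row.
# This is NOT JSLDT syntax, only an illustration of row-wise templating.
TEMPLATE = {
    "@id": "http://example.com/person/{id}",
    "@type": "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/name": "{name}",
}


def fill(template, row):
    """Recursively substitute row values into every string of the template."""
    if isinstance(template, dict):
        return {fill(k, row): fill(v, row) for k, v in template.items()}
    if isinstance(template, list):
        return [fill(item, row) for item in template]
    if isinstance(template, str):
        return template.format(**row)
    return template


def rows_to_jsonld(lines, template):
    """Yield one JSON-LD node per input row, reading the table lazily."""
    for row in csv.DictReader(lines):
        yield fill(template, row)


# Illustrative input; any iterable of CSV lines (such as an open file) works.
SAMPLE = "id,name\n1,Alice\n2,Bob\n"
nodes = list(rows_to_jsonld(io.StringIO(SAMPLE), TEMPLATE))
print(json.dumps(nodes[0], indent=2))
```

Because `rows_to_jsonld` is a generator, each node can be serialized or handed to an RDF library as soon as it is produced, which is the property that lets this style of conversion handle large inputs.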
Author Comment
This is a preprint submission to PeerJ Preprints. We are currently considering the best venue for peer-reviewed publication.