SETLr: the semantic extract, transform, and load-r

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, USA
Department of Cognitive Science, Rensselaer Polytechnic Institute, Troy, New York, USA
DOI
10.7287/peerj.preprints.26476v1
Subject Areas
Computational Science, Data Science
Keywords
semantic interpretation, data curation, triplication
Copyright
© 2018 McCusker et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
McCusker JP, Chastain K, Rashid S, Norris S, McGuinness DL. 2018. SETLr: the semantic extract, transform, and load-r. PeerJ Preprints 6:e26476v1

Abstract

Semantic Extract, Transform, and Load-er (SETLr) is a flexible, scalable tool for providing semantic interpretations of tabular, XML, and JSON-based data from local or web files. It has been used by diverse projects and has been shown to scale well, simplifying the creation of arbitrary RDF, including ontologies and nanopublications, from many different data formats. Semantic ETL scripts use best-practice standards for provenance (PROV-O) and support streaming conversion to RDF using JSLDT, a JSON-LD-based templating language. SETLr also supports custom Python functions for calling remote APIs, entity resolution, external data lookup, or other tasks. We provide case studies for dynamic SETL scripts, ontology generation, and scaling to gigabytes of input, and we discuss the value and impact of this approach.
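To make the idea of template-driven, streaming conversion concrete, here is a minimal Python sketch that assumes a simplified template format. The template is a JSON-LD-like dictionary whose strings may contain hypothetical `{{column}}` placeholders, and a small context maps short keys to IRIs. This is not JSLDT syntax or the SETLr API. The names `fill`, `to_ntriples`, and `transform`, the callable hook standing in for a custom Python function, and all example.org IRIs are illustrative assumptions. The sketch instantiates each CSV row and serializes it to N-Triples as soon as the row is read. That row-at-a-time behavior is what allows this style of conversion to stream over large inputs.

```python
import csv
import io
import json
import re
from typing import Any, Dict, Iterable, Iterator

# Hypothetical "{{column}}" placeholder syntax, used only for illustration;
# it is NOT the actual JSLDT template syntax.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def fill(template: Any, row: Dict[str, str]) -> Any:
    """Recursively instantiate a JSON-LD-like template for one input row."""
    if callable(template):  # hypothetical hook standing in for a custom Python function
        return template(row)
    if isinstance(template, str):
        return PLACEHOLDER.sub(lambda m: row[m.group(1)], template)
    if isinstance(template, dict):
        return {key: fill(value, row) for key, value in template.items()}
    if isinstance(template, list):
        return [fill(value, row) for value in template]
    return template


def to_ntriples(node: Dict[str, Any], context: Dict[str, str]) -> Iterator[str]:
    """Emit N-Triples lines for one flat node; short keys are expanded via `context`."""
    subject = f"<{node['@id']}>"
    for key, value in node.items():
        if key == "@id":
            continue
        if key == "@type":
            predicate, obj = RDF_TYPE, f"<{context.get(value, value)}>"
        elif isinstance(value, dict):  # {"@id": ...} marks an IRI object
            predicate, obj = context.get(key, key), f"<{value['@id']}>"
        else:  # anything else becomes a quoted, escaped string literal
            predicate, obj = context.get(key, key), json.dumps(str(value))
        yield f"{subject} <{predicate}> {obj} ."


def transform(rows: Iterable[Dict[str, str]], template: Dict[str, Any],
              context: Dict[str, str]) -> Iterator[str]:
    """Stream rows through the template one at a time; no rows are buffered."""
    for row in rows:
        yield from to_ntriples(fill(template, row), context)


# Illustrative input, template, and context (all hypothetical).
CSV_TEXT = "id,name,species\n1,Rex,dog\n2,Tom,cat\n"
TEMPLATE = {
    "@id": "http://example.org/animal/{{id}}",
    "@type": "Animal",
    "name": "{{name}}",
    "label": lambda row: row["name"].upper(),  # stand-in for a custom function
    "species": {"@id": "http://example.org/species/{{species}}"},
}
CONTEXT = {
    "Animal": "http://example.org/ns#Animal",
    "name": "http://xmlns.com/foaf/0.1/name",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "species": "http://example.org/ns#species",
}

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    for line in transform(reader, TEMPLATE, CONTEXT):
        print(line)
```

Running the script prints four triples per input row. Because `transform` is a generator fed by `csv.DictReader`, only the current row is held in memory. A real pipeline would typically write the lines to a file rather than printing them.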

Author Comment

This is a preprint submission to PeerJ Preprints. We are currently considering the best venue for peer-reviewed publication.