SETLr: the semantic extract, transform, and load-r

Jamie P McCusker; Katherine Chastain; Sabbir Rashid; Spencer Norris; Deborah L McGuinness

doi:10.7287/peerj.preprints.26476v1

Jamie P McCusker¹, Katherine Chastain¹, Sabbir Rashid¹, Spencer Norris¹, Deborah L McGuinness¹,²

1 Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, USA

2 Department of Cognitive Science, Rensselaer Polytechnic Institute, Troy, New York, USA

Published: 2018-02-02
Accepted: 2018-01-31

Subject Areas: Computational Science, Data Science
Keywords: semantic interpretation, data curation, triplification

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: McCusker JP, Chastain K, Rashid S, Norris S, McGuinness DL. 2018. SETLr: the semantic extract, transform, and load-r. PeerJ Preprints 6:e26476v1 https://doi.org/10.7287/peerj.preprints.26476v1

Abstract

Semantic Extract, Transform, and Load-er (SETLr) is a flexible, scalable tool for providing semantic interpretations to tabular, XML, and JSON-based data from local or web files. It has been used by diverse projects and has been shown to be scalable and flexible, simplifying the creation of arbitrary RDF, including ontologies and nanopublications, from many different data formats. Semantic ETL scripts use best-practice standards for provenance (PROV-O) and support streaming conversion for RDF transformation using the JSON-LD-based templating language JSLDT. SETLr also supports custom Python functions for remote APIs, entity resolution, external data lookup, or other tasks. We provide case studies for dynamic SETL scripts, ontology generation, and scaling to gigabytes of input, and we discuss the value and impact of this approach.

Author Comment

This is a preprint submission to PeerJ Preprints. We are currently considering the best venue for peer-reviewed publication.