Accessing biological data in R with semantic web technologies
- Subject Areas
- Bioinformatics, Computational Science
- Keywords
- RDF, Semantic Web, SPARQL, databases, Jena, CRAN
- Copyright
- © 2014 Willighagen
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Cite this article
- 2014. Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3 https://doi.org/10.7287/peerj.preprints.185v3
Abstract
Background. Semantic Web technologies are increasingly used in biological database systems. Their greater expressiveness makes it easier to track provenance and to annotate knowledge explicitly. To use these standards in bioinformatics workflows, a complementary set of tools is needed to handle data in these formats.
Methods. The approach proposed in this paper uses the Apache Jena library to create an environment in which semantic web technologies can be used in the statistical environment R. The code is distributed as two R packages available from the Comprehensive R Archive Network (CRAN). The rJava package and a custom convenience class bridge between R and the Jena library.
Results. We present three examples showing how the Resource Description Framework (RDF) and SPARQL query standards can be used in R. The first example takes data on BRCA1 SNPs from a BioMart and converts it into an RDF data set. The second example runs a query against an experimental remote SPARQL endpoint provided by UniProt and searches the textual annotations of proteins encoded by the BRCA1 gene. The third example shows how the package can be used to handle RDF returned by OpenTox web services.
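The first example rests on a simple idea: each row of a tabular BioMart export becomes a subject, and each column becomes a predicate with a literal value. The paper does this in R with rrdf. The snippet below is only a language-neutral sketch of that row-to-triple mapping, written in Python with the standard library and emitting N-Triples text. The namespace `http://example.org/snp/`, the predicate URIs, and the column names `id`, `chromosome` and `position` are hypothetical placeholders. They are not the vocabulary used in the paper or by BioMart.

```python
from __future__ import annotations

import csv
import io

# Hypothetical namespace and predicates, for illustration only;
# they are not the vocabulary used by rrdf, BioMart, or the paper.
BASE = "http://example.org/snp/"
PREDICATES = {
    "chromosome": "http://example.org/vocab#chromosome",
    "position": "http://example.org/vocab#position",
}


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(tsv_text: str) -> list[str]:
    """Turn tab-separated rows with an 'id' column into N-Triples lines.

    Assumes the 'id' values are already safe to embed in an IRI.
    Empty cells are skipped rather than emitted as empty literals.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        subject = f"<{BASE}{row['id']}>"
        for column, predicate in PREDICATES.items():
            if row.get(column):
                literal = escape_literal(row[column])
                triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples


if __name__ == "__main__":
    # Made-up input shaped like a BioMart export; not real SNP data.
    sample = "id\tchromosome\tposition\nsnpA\t17\t100\n"
    print("\n".join(rows_to_ntriples(sample)))
```

All values are emitted as plain string literals to keep the sketch short. A fuller conversion would attach datatypes (for example, integers for positions) and use an established vocabulary instead of the placeholder predicates.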
Discussion. The two packages bring basic semantic web technologies to R. This paper only shows examples from the biology domain, but we believe the approaches are generally applicable. The packages currently expose only a subset of key Apache Jena functionality, but the rrdf package makes it easy to expose more of it, such as shortest-path finding. The rrdf packages are freely available from CRAN under the GNU Affero General Public License version 3: http://cran.r-project.org/web/packages/rrdf/.
Author Comment
This version applies comments from my group leader: https://github.com/Chris-Evelo/rrdf-paper/commit/efbbcac0fb79edd146be299a5ad5162b852737f7