Accessing biological data in R with semantic web technologies
- Published
- Accepted
- Subject Areas
- Bioinformatics, Computational Science
- Keywords
- RDF, Semantic Web, SPARQL, databases
- Copyright
- © 2014 Willighagen
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Cite this article
- 2014. Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v2 https://doi.org/10.7287/peerj.preprints.185v2
Abstract
Background. Semantic Web technologies are increasingly used in biological database systems. Their improved expressiveness offers advantages in tracking provenance and allows knowledge to be annotated more explicitly. To use these semantic web standards in bioinformatics workflows, a complementary set of tools is needed that can handle data in those formats.
Methods. The approach proposed in this paper uses the Apache Jena library to create an environment in which semantic web technologies can be used in the statistical environment R. The code is exposed as two R packages available from the Comprehensive R Archive Network (CRAN). The rJava library and a custom convenience class are used to bridge between R and the Jena library.
Results. We present three examples showing how the Resource Description Framework (RDF) and SPARQL query standards can be employed in R. The first example takes input on BRCA1 SNPs from a BioMart and converts it into an RDF data set. The second example runs a query on an experimental remote SPARQL endpoint provided by UniProt, and searches textual annotations of proteins encoded by the BRCA1 gene. The third example shows how the package can be used to handle RDF returned by OpenTox web services.
Discussion. The two provided libraries bring basic semantic web technologies to R. While only a subset of Apache Jena is currently exposed, that subset provides key methods for dealing with RDF data and resources. The libraries are freely available from CRAN under the GNU Affero General Public License version 3: http://cran.r-project.org/web/packages/rrdf/.
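To give a sense of the workflow summarized above, the following is a minimal sketch of creating a small RDF store in R and querying it with SPARQL. It assumes the rrdf package and its rJava/Java dependency are installed. The `http://example.org/` namespace, the resource names, and the `inGene` predicate are hypothetical and chosen only for illustration; they are not taken from the article's examples.

```r
# Minimal sketch: build a tiny in-memory RDF store and query it with SPARQL.
# Assumes the rrdf package (with rJava and a Java runtime) is installed.
library(rrdf)

# Hypothetical namespace used only for this illustration.
ex <- "http://example.org/"

# Create an empty store without ontology reasoning.
store <- new.rdf(ontology = FALSE)

# Object triple: link a (made-up) SNP resource to a gene resource.
add.triple(store,
  subject   = paste0(ex, "snp/1"),
  predicate = paste0(ex, "inGene"),
  object    = paste0(ex, "gene/BRCA1"))

# Data triple: attach a literal label to the same SNP resource.
add.data.triple(store,
  subject   = paste0(ex, "snp/1"),
  predicate = "http://www.w3.org/2000/01/rdf-schema#label",
  data      = "example SNP")

# Run a SPARQL SELECT query against the local store.
results <- sparql.rdf(store, paste(
  "PREFIX ex: <http://example.org/>",
  "SELECT ?snp ?gene WHERE { ?snp ex:inGene ?gene }"))
print(results)
```

The same kind of query string can, in principle, be sent to a remote endpoint with `sparql.remote(endpoint, sparql)`; no endpoint URL is shown here because the address of the experimental service mentioned in the abstract may have changed.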
Author Comment
This version makes it clear in the title what data analysis platform is used, and adds the third example from the abstract.