Accessing biological data in R with semantic web technologies
Abstract
Background. Semantic web technologies are increasingly used in biological database systems. Their improved expressiveness offers advantages in tracking provenance and in annotating knowledge more explicitly. The list of semantic web standards needs a complementary set of tools that can handle data in those formats, so that the data can be used in bioinformatics workflows.
Methods. The approach proposed in this paper uses the Apache Jena library to create an environment where semantic web technologies can be used in the statistical environment R. The code is exposed as two R packages available from the Comprehensive R Archive Network (CRAN). The rJava library and a custom convenience class are used to bridge between R and the Jena library.
Results. We present three examples showing how the Resource Description Framework (RDF) and SPARQL query standards can be employed in R. The first example takes input on BRCA1 SNPs from a BioMart and converts it into an RDF data set. The second example runs a query on an experimental remote SPARQL endpoint provided by UniProt, and searches textual annotations of proteins encoded by the BRCA1 gene. The third example shows how the package can be used to handle RDF returned by OpenTox web services.
Discussion. The two provided libraries bring basic semantic web technologies to R. This paper only shows examples from the biology domain, but we believe the approaches are generally applicable. The rrdf package currently exposes only a subset of key Apache Jena functionality, but it makes it easy to expose more of the library's functionality, such as shortest path finding. The rrdf libraries are freely available from CRAN under the GNU Affero General Public License version 3: http://cran.r-project.org/web/packages/rrdf/.
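As a rough illustration of the kind of workflow the abstract describes, the sketch below builds a tiny in-memory RDF store with the rrdf package and queries it with SPARQL. It is a minimal sketch, not one of the paper's three examples: the example.org URI is hypothetical, and the code assumes rrdf, rJava and a Java runtime are installed.

```r
library(rrdf)  # depends on rJava and a working Java runtime

# Create an in-memory store; ontology = FALSE asks for a plain RDF model
store <- new.rdf(ontology = FALSE)

# Hypothetical subject URI, used for illustration only
gene  <- "http://example.org/gene/BRCA1"
label <- "http://www.w3.org/2000/01/rdf-schema#label"

# Add one triple whose object is a literal value
add.data.triple(store, subject = gene, predicate = label, data = "BRCA1")

# Query the local store with SPARQL for every subject that has a label
query <- paste(
  "SELECT ?gene WHERE {",
  "  ?gene <http://www.w3.org/2000/01/rdf-schema#label> ?name",
  "}"
)
result <- sparql.rdf(store, query)
print(result)
```

For a remote endpoint, the package's `sparql.remote()` function takes an endpoint URL and a query string in place of a local store.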
Cite this as
2014. Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3 https://doi.org/10.7287/peerj.preprints.185v3
Author comment
This version applies comments from my group leader: https://github.com/Chris-Evelo/rrdf-paper/commit/efbbcac0fb79edd146be299a5ad5162b852737f7
Additional Information
Competing Interests
I have no competing interests.
Author Contributions
Egon Willighagen conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Funding
The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under Grant Agreement n° [267042], from Cosmetics Europe, and from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115191, resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.