Finding melanoma drugs through a probabilistic knowledge graph

Jamie Patricia McCusker; Michel Dumontier; Rui Yan; Sylvia He; Jonathan S. Dordick; Deborah L. McGuinness

doi:10.7717/peerj-cs.106

Jamie Patricia McCusker¹, Michel Dumontier², Rui Yan¹, Sylvia He¹, Jonathan S. Dordick³,⁴, Deborah L. McGuinness¹,⁴

¹ Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

² Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA

³ Department of Chemical & Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA

⁴ Center for Biotechnology & Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA

Published: 2017-02-13
Accepted: 2016-12-27
Received: 2016-04-27

Academic Editor: Yonghong Peng

Subject Areas: Bioinformatics, Computational Biology, Data Science, World Wide Web and Web Science
Keywords: Melanoma, Knowledge graphs, Drug repositioning, Uncertainty reasoning

Copyright: © 2017 McCusker et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

Cite this article: McCusker JP, Dumontier M, Yan R, He S, Dordick JS, McGuinness DL. 2017. Finding melanoma drugs through a probabilistic knowledge graph. PeerJ Computer Science 3:e106 https://doi.org/10.7717/peerj-cs.106

The authors have chosen to make the review history of this article public.

Abstract

Metastatic cutaneous melanoma is an aggressive skin cancer with some progression-slowing treatments but no known cure. The omics data explosion has created many possible drug candidates; however, filtering criteria remain challenging, and systems biology approaches have become fragmented with many disconnected databases. Using drug, protein and disease interactions, we built an evidence-weighted knowledge graph of integrated interactions. Our knowledge graph-based system, ReDrugS, can be used via an application programming interface or web interface, and has generated 25 high-quality melanoma drug candidates. We show that probabilistic analysis of systems biology graphs increases drug candidate quality compared to non-probabilistic methods. Four of the 25 candidates are novel therapies, three of which have been tested with other cancers. All other candidates have current or completed clinical trials, or have been studied in vivo or in vitro. This approach can be used to identify candidate therapies for use in research or personalized medicine.

Introduction

Metastatic cutaneous melanoma is an aggressive cancer of the skin with low prevalence but very high mortality rate, with an estimated 5-year survival rate of 6% (Barth, Wanek & Morton, 1995). There are currently no known therapies that can consistently cure metastatic melanoma. Vemurafenib is effective against BRAF mutant melanomas (Chapman et al., 2011) but resistant cells often result in recurrence of metastases (Le et al., 2013). Melanoma itself may be best approached based on the individual genetics of the tumor, as it has been shown to involve mutations in many different genes to produce the same disease (Krauthammer et al., 2015). Because of this, an individualized approach may be necessary to find effective treatments.

Drug repurposing, or the discovery of new uses for existing approved drugs, can often lead to effective new treatments for diseases. A wide range of computational methods have been developed in support of drug repositioning. Computational approaches (Sanseau & Koehler, 2011) include topic modeling (Bisgin et al., 2012, 2014), side-effect similarity (Yang & Agarwal, 2011; Ye, Liu & Wei, 2014), drug and/or disease similarity (Chiang & Butte, 2009; Gottlieb et al., 2011), genome-wide association studies (Kingsmore et al., 2008; Grover et al., 2014), and gene expression (Lamb et al., 2006; Sirota et al., 2011). Systems biology has also provided a number of network analysis approaches (Yang & Agarwal, 2011; Wu, Wang & Chen, 2013; Cheng et al., 2012; Emig et al., 2013; Harrold, Ramanathan & Mager, 2013; Wu et al., 2013; Vogt, Prinz & Campillos, 2014) but the field has been limited by a fragmentation of databases. Most systems biology databases are not aligned with each other, and typically leave out crucial information about how other biological entities, like drugs and diseases, interact with the systems biology graph. Further, while some interaction databases provide human curation and validation of pathway interactions, and others provide experimental evidence for the recorded interactions, there has not yet been, to our knowledge, a resource that combines the two approaches and quantifies the reliability of the evidence used to assert the interactions.

A knowledge graph is a compilation of facts and figures that can be used to provide contextual meaning to searches. Google is using knowledge graphs to improve its search and to analyze the information graph of the web; Facebook is using them to analyze the social graph. We built our knowledge graph with the goal of unifying large parts of biomedical domain knowledge for both mining and interactive exploration related to drugs, diseases, and proteins. Our knowledge graph is enhanced by the provenance of each fragment of knowledge captured, which is used to compute the confidence probabilities for each of those fragments. Further, we use open standards from the world wide web consortium (W3C), including the resource description framework (RDF) (Richard, David & Markus, 2014), web ontology language (OWL) (Motik, Patel-Schneider & Cuenca Grau, 2009), and SPARQL (Harris, Seaborne & Prud’hommeaux, 2013). The representation of the knowledge in our knowledge graph is aligned with best practice vocabularies and ontologies from the W3C and the biomedical community, including the provenance ontology (PROV-O) (Lebo, Sahoo & McGuinness, 2013), the HUPO proteomics standards initiative molecular interactions (PSI-MI) ontology (Hermjakob et al., 2004), and the semanticscience integrated ontology (SIO) (Dumontier et al., 2014). Use of these standards, vocabularies, and ontologies makes it simple for ReDrugS to integrate with similar efforts in the future.

We proposed and built a novel computational drug repositioning platform, which we refer to as ReDrugS, that applies probabilistic filtering over individually supported assertions drawn from multiple databases pertaining to systems biology, pharmacology, disease association, and gene expression data. We use our platform to identify novel and known drugs for melanoma.

Results

We used ReDrugS to examine the drug–target–disease network and identify known, novel, and well supported melanoma drugs. The ReDrugS knowledge base contained 6,180 drugs, 3,820 diseases, 69,279 proteins, and 899,198 interactions. The drugs included in ReDrugS follow the distribution by the Anatomic Therapeutic Classification (ATC) categories shown in Fig. 1.

Figure 1: Percentage of approved drugs in each of the categories of the anatomic therapeutic classification (ATC) system.

DOI: 10.7717/peerj-cs.106/fig-1

We examined drug and gene connections that were three or fewer interaction steps from melanoma, and additionally retained only interactions with a joint probability greater than or equal to 0.93. We identified 25 drugs in the resulting drug–gene–disease network surrounding melanoma, as illustrated in Fig. 2.
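The filtering step described above can be sketched as a bounded backward walk from the disease node. This is a minimal illustration, not the ReDrugS implementation: the toy graph, the node names, and the `drug_` naming convention are hypothetical, and the joint probability is assumed to be the product of the edge probabilities along a path (that is, the interactions are treated as independent).

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: (source, target) -> interaction probability.
# Entities and probabilities are illustrative, not taken from ReDrugS.
EDGES: Dict[Tuple[str, str], float] = {
    ("drug_a", "protein_1"): 0.999,
    ("protein_1", "melanoma"): 0.999,
    ("drug_b", "protein_2"): 0.999,
    ("protein_2", "protein_1"): 0.95,
    ("drug_c", "protein_3"): 0.999,
    ("protein_3", "protein_2"): 0.8,
}


def paths_to(target: str, max_steps: int, min_joint_p: float,
             edges: Dict[Tuple[str, str], float] = EDGES
             ) -> List[Tuple[List[str], float]]:
    """Return (path, joint_p) pairs for every path that ends at `target`.

    Walks incoming edges backwards from `target`, multiplying edge
    probabilities (assumption: independent interactions). A branch is
    pruned once its joint probability drops below `min_joint_p` or it
    already uses `max_steps` interactions.
    """
    incoming: Dict[str, List[Tuple[str, float]]] = {}
    for (src, dst), p in edges.items():
        incoming.setdefault(dst, []).append((src, p))

    results: List[Tuple[List[str], float]] = []
    stack: List[Tuple[List[str], float]] = [([target], 1.0)]
    while stack:
        path, joint_p = stack.pop()
        if len(path) - 1 == max_steps:
            continue  # path already uses the maximum number of steps
        for src, p in incoming.get(path[0], []):
            if src in path:
                continue  # avoid cycles
            new_p = joint_p * p
            if new_p < min_joint_p:
                continue  # below the joint probability threshold
            new_path = [src] + path
            results.append((new_path, new_p))
            stack.append((new_path, new_p))
    return results


def drug_candidates(target: str, max_steps: int = 3,
                    min_joint_p: float = 0.93) -> Dict[str, float]:
    """Best joint probability per drug node (names starting with 'drug_')."""
    best: Dict[str, float] = {}
    for path, p in paths_to(target, max_steps, min_joint_p):
        node = path[0]
        if node.startswith("drug_"):
            best[node] = max(p, best.get(node, 0.0))
    return best
```

In this toy graph, `drug_candidates("melanoma")` keeps `drug_a` (two steps) and `drug_b` (three steps) and drops `drug_c`, whose only route passes through a low-probability edge and would also need a fourth step.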

Figure 2: The interaction graph of predicted melanoma drugs that have a joint probability of 0.93 or higher and three or fewer intervening interactions between drug and disease.
The “Explore” tab contains the controls to expand the network in various ways, including the filtering parameters. Node and edge detail tabs provide additional information about the selected node or edge, including the probabilities of the edges selected. Users can control the layout algorithm and related options using the “Options” tab.

DOI: 10.7717/peerj-cs.106/fig-2

We then validated the set of 25 drugs by determining their position in the drug discovery pipeline for melanoma. Table 1 shows that nearly all drugs uncovered by ReDrugS had previously been identified as potential melanoma therapies, either in clinical trials or in vivo or in vitro. Of the 25 drugs, 12 have been in Phase I, II, or III clinical trials, five have been studied in vitro, four in vivo, one was investigated as a case study, and three are novel.

Table 1: Drug discovery status for 25 drug candidates identified using ReDrugS.

Status	Drug	Pathway	Steps	Joint p
Approved	Vemurafenib (Chapman et al., 2011)	BRAF	2	0.98
Phase III	Dabrafenib (Hauschild et al., 2012)	BRAF	2	0.98
	Sorafenib (National Cancer Institute, 2005)	BRAF	2	0.98
	Vinblastine (Luikart, Kennealey & Kirkwood, 1984)	MAP kinase	3	0.93
Phase II	Zidovudine (Humer et al., 2008)	TERT	2	0.98
	Trametinib (Kim et al., 2012)	MAP kinase	2	0.98
	Regorafenib (Istituto Clinico Humanitas, 2015)	BRAF	2	0.98
	Nadroparin (Nagy, Turcsik & Blaskó, 2009)	MYC	3	0.97
	Vinorelbine (Whitehead et al., 2004)	MAP kinase	3	0.93
	Irinotecan (Fiorentini et al., 2009)	CDKN2A	3	0.93
	Topotecan (Kraut et al., 1997)	CDKN2A	3	0.93
Phase I	Sodium stibogluconate (Naing, 2011)	CDKN2A	3	0.93
Case study	Ingenol mebutate (Mansuy et al., 2014)	PRKCA/BRAF	3	0.95
In vitro	Bosutinib (Homsi et al., 2009)	MAP kinase	2	0.98
	Purvalanol (Smalley et al., 2007)	MAP kinase/TP53	3	0.97
	Ellagic acid (Kim et al., 2008)	PRKCA/BRAF	3	0.95
	Albendazole (Patel et al., 2011)	CDKN2A	3	0.93
	Colchicine (Lemontt, Azzaria & Gros, 1988)	MAP kinase	3	0.93
In vivo	Plerixafor (D’Alterio et al., 2012)	CXCR4	3	0.97
	Vincristine (Sawada et al., 2004)	MAP kinase	3	0.93
	L-Methionine (Clavo & Wahl, 1996)	CDKN2A	3	0.93
	Mebendazole (Doudican et al., 2008)	CDKN2A	3	0.93
Novel	Framycetin	CXCR4	3	0.97
	Lucanthone	CDKN2A	3	0.93
	Podofilox	MAP kinase	3	0.93

DOI: 10.7717/peerj-cs.106/table-1

Note:

“Pathway” refers to the target or pathway that the drug acts on. “Steps” is distance in number of interactions between the drug and the disease, and “Joint p” is the joint probability that all of those interactions occur.

To further evaluate our system, we examined the impact of decreasing the joint probability or increasing the number of interaction steps. Figures 3A and 3B show precision, recall, and f-measure curves while varying each parameter. Using these information retrieval performance curves, we found that a joint probability of 0.93 or greater with three or fewer interaction steps maximizes the f-measure, balancing precision and recall.

Figure 3: Precision, recall, and f-measure by (A) varying thresholds for joint probability and (B) varying number of interaction steps.
Precision is the percentage of all returned candidates that are “hits,” that is, candidates that have been validated experimentally or have been in a clinical trial. Recall is the percentage of all known validated “hits” that are returned. f-Measure is the geometric mean of precision and recall that provides a balanced evaluation of the quality and completeness of the results.

DOI: 10.7717/peerj-cs.106/fig-3

We generated the precision, recall, and f-measure curves for both cutoffs by performing a sampled literature search on hypothesis candidates with a joint probability of 0.5 or higher and six or fewer interaction steps, which led to our cutoff of 0.93 with three or fewer interaction steps. The precision, recall, and f-measure curves are shown for varying joint probability thresholds in Fig. 3A and for varying interaction step counts in Fig. 3B.
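The threshold sweep can be illustrated with a short sketch. The candidate list below is hypothetical, and recall is computed relative to the hits in the sampled pool only. The Fig. 3 caption describes the f-measure as a geometric mean, while the conventional F1 score is the harmonic mean, so the sketch reports both rather than assuming which one was used.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical sampled candidates: (name, joint probability, is a "hit").
CANDIDATES: List[Tuple[str, float, bool]] = [
    ("drug_a", 0.98, True),
    ("drug_b", 0.97, True),
    ("drug_c", 0.93, False),
    ("drug_d", 0.80, True),
    ("drug_e", 0.60, False),
]


def evaluate(candidates: List[Tuple[str, float, bool]],
             threshold: float) -> Dict[str, float]:
    """Precision, recall, and two f-measure variants at one threshold."""
    returned = [c for c in candidates if c[1] >= threshold]
    hits_returned = sum(1 for c in returned if c[2])
    all_hits = sum(1 for c in candidates if c[2])
    precision = hits_returned / len(returned) if returned else 0.0
    recall = hits_returned / all_hits if all_hits else 0.0
    if precision + recall > 0:
        f_harmonic = 2 * precision * recall / (precision + recall)
    else:
        f_harmonic = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_harmonic": f_harmonic,               # conventional F1
        "f_geometric": sqrt(precision * recall),  # as worded in the caption
    }


def best_threshold(candidates: List[Tuple[str, float, bool]],
                   thresholds: List[float],
                   key: str = "f_harmonic") -> float:
    """Threshold with the highest value of the chosen f-measure variant."""
    return max(thresholds, key=lambda t: evaluate(candidates, t)[key])
```

The same sweep applies to the interaction step cutoff by filtering on step count instead of joint probability.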

Discussion

We designed ReDrugS to quickly and automatically integrate and filter a heterogeneous biomedical knowledge graph to generate high-confidence drug repositioning candidates. Our results indicate that ReDrugS generates clinically plausible drug candidates, about half of which are in various stages of clinical trials, while others are novel or are being investigated in pre-clinical studies. By helping to consolidate the three main datatypes (drug targets, protein interactions, and disease genes), ReDrugS can amplify the ability of researchers to filter the vast amount of available information down to what is relevant for drug discovery.

Candidate significance

Three drugs were identified that have not previously been studied for melanoma treatment: Framycetin, Lucanthone, and Podofilox. Framycetin is a CXCR4 inhibitor. While it is nephrotoxic when administered orally (Greenberg, 1965), it is used topically as an antibacterial treatment. While it may not be of use for metastasis, it might serve as a simple, inexpensive prophylactic treatment after excision of primary tumors. Lucanthone and Podofilox were identified as having potential effects on melanoma through CDKN2A and MAP kinase, respectively.

One drug we identified, Vemurafenib, is approved for treatment of late-stage melanoma and has been shown to inhibit the BRAF protein in BRAF-V600 mutant melanomas (Chapman et al., 2011). However, cells can become resistant to Vemurafenib, thereby leading to metastasis (Le et al., 2013).

A number of the drugs we identified are in clinical trials for treatment of melanoma. We identified BRAF-oriented drugs, Dabrafenib (Hauschild et al., 2012), Sorafenib (National Cancer Institute, 2005), and Regorafenib (Istituto Clinico Humanitas, 2015), that have been evaluated in clinical trials but have not yet been approved. Zidovudine, or Azidothymidine, is a TERT inhibitor that has shown significant melanoma tumor reductions in mouse models (Humer et al., 2008). Three MAP kinase-related compounds, Vinblastine (Luikart, Kennealey & Kirkwood, 1984), Trametinib (Kim et al., 2012), and Vinorelbine (Whitehead et al., 2004), were identified that are in clinical trials for melanoma treatment. CDKN2A was another popular target, as Irinotecan (Fiorentini et al., 2009), Topotecan (Kraut et al., 1997), and Sodium stibogluconate (Naing, 2011) are all drugs in clinical trials that we identified as potential therapies.

Many other drugs were identified that are being studied in the lab. Additional drugs were identified that target the MAP kinase pathway, including Bosutinib (Homsi et al., 2009), Purvalanol (Smalley et al., 2007), Colchicine (Lemontt, Azzaria & Gros, 1988), and Vincristine (Sawada et al., 2004). Podofilox has not yet been investigated in melanoma treatments, but preliminary investigations have focused on treating chronic lymphocytic leukemia (Shen et al., 2013) and non-small cell lung cancer (Peng et al., 2014). Since these drugs attack MAPK2 and related proteins rather than BRAF or NRAS, they can potentially synergize with other treatments (Homsi et al., 2009). Bosutinib in particular has been investigated as a synergistic treatment for melanoma (Held et al., 2012). Another possible treatment pathway is CXCR4 inhibition. Mouse models suggest that CXCR4 inhibitors like Plerixafor can reduce tumor metastasis and primary tumor growth (D’Alterio et al., 2012). We identify both Plerixafor and Framycetin (Neomycin B) as useful CXCR4 inhibitors. Two PRKCA activators, Ingenol mebutate and Ellagic acid, were also identified. PRKCA binds with BRAF (Pardo et al., 2006), but it is mechanistically unclear how PRKCA activation would result in treatment of melanoma. A number of other therapies are also notable. Purvalanol can inhibit GSK3β, which in turn activates TP53. Some, but not all, melanomas have TP53 deactivation (Smalley et al., 2007). Nadroparin, a MYC inhibitor, may inhibit tumor progression (Nagy, Turcsik & Blaskó, 2009). More broadly, heparins can potentially inhibit the metastatic process in melanoma and other cancers (Maraveyas et al., 2010).

The approach that we present here offers a novel, mechanism-focused exploration to identify and examine drugs and targets related to cancer. This approach filters out noisy or poorly supported parts of the knowledge graph to identify more confident mechanisms between drugs, targets, and diseases. Thus, our approach can be used to explore high confidence associations that are produced as a result of large scale computational screens that use network connectivity (Yang & Agarwal, 2011; Wu, Wang & Chen, 2013; Cheng et al., 2012; Emig et al., 2013; Harrold, Ramanathan & Mager, 2013; Wu et al., 2013; Vogt, Prinz & Campillos, 2014), the complementarity in drug-disease gene expression, and the similarity of chemical fingerprints, side-effects, targets, or indications (Yang & Agarwal, 2011; Ye, Liu & Wei, 2014; Chiang & Butte, 2009; Gottlieb et al., 2011; Lamb et al., 2006; Sirota et al., 2011). Importantly, since we focus on protein networks that are strongly linked with diseases, we believe that our mechanism-focused approach will also aid in the identification of disease-modifying drug candidates, rather than solely those that would be useful for the treatment of symptomatic phenotypes or related co-morbid conditions.

Architecture

ReDrugS uses a fairly straightforward web architecture, as shown in Fig. 4. It uses the Blazegraph RDF database backend. The database layer is interchangeable, except that the full text search service needs to use Blazegraph-only properties to perform text searches, as text indexing is not yet standardized in the SPARQL query language. All other aspects are standardized and should work with other RDF databases without modification. ReDrugS currently uses the Python-based TurboGears web application framework, hosted using the web server gateway interface (WSGI) standard via an Apache HTTP server. TurboGears in turn hosts the semantic automated discovery and integration (SADI) web services that drive the application and access the database. It also serves the static HTML and supporting files.

The user interface is implemented with AngularJS and Cytoscape.js, which submit queries to the SADI web services using JSON-LD and aggregate results into the networked view. The software relies exclusively on standardized protocols (HTTP, SADI, SPARQL, RDF, and others) to make it simple to replace technologies as needed. The data itself is processed using conversion scripts as shown in Fig. 5.

Figure 5: The ReDrugS data flow.
Data is selected from external databases and converted using scripts into nanopublication graphs, which are loaded into the ReDrugS data store. These are combined in the RDF store with experimental method assessments, expressed in OWL, and public ontologies. The web service layer queries the store and produces aggregate analyses of those nanopublications, which are consumed and displayed by the rich web client. The same APIs can be used by other tools for further analysis.

DOI: 10.7717/peerj-cs.106/fig-5

We have also adapted and featured ReDrugS in an immersive visualization laboratory called the collaborative-research augmented immersive virtual environment (CRAIVE) lab at RPI, as shown in Fig. 6. The goal of the demonstration was to explore new ways to visualize, sonify, and interact with big data in large-scale virtual reality systems. We also leveraged a gesture controller (Microsoft Kinect) to interact with the visualization. With the 360° projection, multiple people can explore the visualization concurrently, which speeds up exploration and discovery.

Figure 6: The authors demonstrate the ReDrugS user interface in the collaborative-research augmented immersive virtual environment (CRAIVE) lab at RPI.

DOI: 10.7717/peerj-cs.106/fig-6

Limitations and future work

Our study has some limitations. First, our study is limited by the sources of data used. We used three databases (DrugBank, iRefIndex, and online Mendelian inheritance in man (OMIM)) to construct the initial knowledge graph. These databases are continuously changing and necessarily incomplete with respect to the total number of drugs, targets, protein interactions, diseases, and disease genes. For instance, as of 8/15/2016 DrugBank contained over 2,000 more FDA-approved drugs than the version that was initially used. Second, the focus of our work is on the potential repositioning of FDA-approved drugs, which means that tens of thousands of chemical compounds with protein binding activity cannot be considered as candidates in the current study. Third, our path expansion is currently limited to pairwise protein–protein interactions, which excludes interactions that result from protein complexes or regulatory pathways. Having a more sophisticated understanding of non-direct interactions will help identify candidate drugs that can regulate entire pathways in a more rational manner. Additionally, we aim to incorporate knowledge of the complementarity of drug and disease gene expression patterns as evidenced by the connectivity map (Lamb et al., 2006), which could suggest therapeutic and adverse interactions. Finally, as we develop new hypotheses about potential new drug effects, we plan to test them using a new three-dimensional cellular microarray to perform high-throughput drug screening (Lee et al., 2008) with reference samples. The integration of computational predictions and a high-throughput screening platform will enable the systematic evaluation of any drug or mechanism of action against any disease or adverse event.

Materials and Methods

This research project did not involve human subjects. The ReDrugS platform consists of a graphical web application, an application programming interface (API), and a knowledge base. The graphical web application enables users to initiate a search using drug, gene, and disease names and synonyms. Users can then interact with the application to expand the network to an arbitrary number of interactions away from the entity of interest, and to filter the network based on a joint probability between the source and target entities. Drug–protein, protein–protein, and gene–disease interactions were obtained from several datasets and integrated into ontology-annotated, provenance- and evidence-bearing representations called nanopublications. The web application obtains information from the knowledge base using semantic web services. Finally, we evaluated our approach by examining the mechanistic plausibility of the drug in having melanoma-specific disease modifying ability. We evaluated a large number of possible drug/disease associations with varying joint probabilities and interaction steps to determine the thresholds with the highest f-measure, resulting in our thresholds of three or fewer interactions and a joint probability of 0.93 or higher.

Using the ReDrugS application page (http://redrugs.tw.rpi.edu), we initiated our search for “melanoma” and selected the first suggestion, obtained from the experimental factor ontology (EFO) (http://www.ebi.ac.uk/efo/EFO_0000756). The application then provided the immediate neighborhood of drugs and genes that are associated with melanoma. We expanded the network by first selecting the melanoma node, expanding the link distance to |I| ≤ 3, and changing the minimum joint probability to p ≥ 0.93 in the search options. Importantly, we also limited the node type to “Drug.” Finally, we clicked the “find incoming links” button (two left-facing arrows). When finished, the network shows all drugs interacting with melanoma that meet the above criteria, as well as any intervening entities and their interactions. The resulting network can be downloaded as an image or as a summary CSV file. We used the CSV file to validate the links by searching Google Scholar and ClinicalTrials.gov for each proposed drug/disease combination. We consider a “hit” to be a pairing with a published positive experiment in vivo or in vitro or any pairing that has been tested in a clinical trial. While this level of validation does not guarantee efficacy, it does determine whether the resulting connection is a plausible hypothesis that might be tested.

Data fusion

We developed a structured knowledge base containing data pertaining to drugs, targets, interactions, and diseases. We used five data sources: iRefIndex (Razick, Magklaras & Donaldson, 2008), DrugBank (Wishart et al., 2006), UniProt gene ontology annotations (GOA) (Camon et al., 2004), the online Mendelian inheritance in man (OMIM) (Hamosh et al., 2005), and the catalogue of somatic mutations in cancer (COSMIC) gene census (Futreal et al., 2004).

iRefIndex contains protein–protein interactions and protein complexes and is an amalgam of the biomolecular interaction network database (Bader, Betel & Hogue, 2003), BioGRID (Stark et al., 2006), the comprehensive resource of mammalian protein complexes (Ruepp et al., 2010), database of interacting proteins (Xenarios et al., 2002), human protein reference database (Keshava Prasad et al., 2009), InnateDB (Lynn et al., 2008), IntAct (Kerrien et al., 2011), MatrixDB (Chautard et al., 2011), molecular interaction database (Chatr-aryamontri et al., 2008), MPact (Güldener et al., 2006), microbial protein interaction database (Goll et al., 2008), MIPS mammalian protein–protein interaction database (Pagel et al., 2005), and online predicted human interaction database (Brown & Jurisica, 2005). DrugBank provides information about experimental/approved drugs and their targets, and UniProt GOA describes proteins in terms of their biological processes, cellular locations, and molecular functions. OMIM provides associations between genes and inherited or genetically-driven diseases. The COSMIC gene census is a curated list of genes that have causal associations with one or more cancer types.

Each association (e.g., drug–target, protein–protein, disease–gene) was captured using the nanopublication (Groth, Gibson & Velterop, 2010) scheme. A nanopublication is a digital artifact that consists of an assertion, its provenance, and information about the digital publication. Our nanopublications are represented as linked data: each data item is identified using a dereferenceable HTTP uniform resource identifier (URI) and statements are represented using RDF. Each nanopublication corresponds to a single interaction assertion from one of the databases. We used a number of automated scripts to produce the nanopublications and load them into the SPARQL endpoint. An example nanopublication is shown in Fig. 7. We used the SIO (Dumontier et al., 2014) as a global schema to describe the nature and components of the associations, and coupled this with the PSI-MI ontology (Hermjakob et al., 2004) to denote the types of interactions. We used the W3C’s PROV-O (Lebo, Sahoo & McGuinness, 2013) to capture provenance of the assertion (which data source it originated from). We loaded our nanopublications into Blazegraph, an RDF database compatible with nanopublications. The web application accesses the data through Blazegraph's native SPARQL endpoint.
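The three-part structure of a nanopublication can be pictured as three named graphs of triples. The sketch below is only a plain-Python illustration of that shape: the `ex:` identifiers and the `ex:hasParticipant` and `ex:hasTarget` property names are hypothetical placeholders rather than the actual SIO terms, and the graph naming scheme is an assumption. The `prov:` properties are standard PROV-O terms, used here for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


@dataclass
class Nanopublication:
    """An assertion plus its provenance and publication information."""
    identifier: str
    assertion: List[Triple] = field(default_factory=list)
    supporting: List[Triple] = field(default_factory=list)
    attribution: List[Triple] = field(default_factory=list)

    def quads(self) -> Iterator[Quad]:
        """Yield (subject, predicate, object, graph) for all three graphs."""
        for part in ("assertion", "supporting", "attribution"):
            graph = f"{self.identifier}_{part.capitalize()}"
            for s, p, o in getattr(self, part):
                yield (s, p, o, graph)


def example_nanopublication() -> Nanopublication:
    """Build a hypothetical protein-protein interaction nanopublication."""
    np_id = "ex:NanoPub_1"
    assertion_graph = f"{np_id}_Assertion"
    return Nanopublication(
        identifier=np_id,
        assertion=[
            ("ex:interaction_1", "rdf:type", "sio:DirectInteraction"),
            ("ex:interaction_1", "ex:hasParticipant", "ex:protein_A"),
            ("ex:interaction_1", "ex:hasTarget", "ex:protein_B"),
        ],
        supporting=[
            # Which experimental method produced the assertion.
            (assertion_graph, "prov:wasGeneratedBy", "ex:pull_down_experiment_1"),
        ],
        attribution=[
            # Where the assertion was published and quoted from.
            (assertion_graph, "prov:hadPrimarySource", "ex:publication_1"),
            (assertion_graph, "prov:wasQuotedFrom", "ex:source_database"),
        ],
    )
```

Keeping the experimental method in its own graph is what lets a probability be attached to the assertion as a whole rather than to individual triples.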

Figure 7: Representation of a protein/protein interaction within a nanopublication.
Three graphs are represented. The assertion graph (NanoPub_501799_Assertion) states that an interaction (X) is of type *sio:DirectInteraction*, and has the target of SLC4A8 and a participant of CA2. The supporting graph (NanoPub_501799_Supporting) states that the assertion graph was generated by a pull down experiment (one of many encoded experiment types), a subclass of *prov:Activity*. The attribution graph (NanoPub_501799_Attribution), in turn, states that the assertion had a primary source of (Loiselle et al., 2004) and that the interaction was quoted from BioGRID.

DOI: 10.7717/peerj-cs.106/fig-7

Assertion probability

Each knowledge graph fragment, enclosed in a nanopublication, is assigned a probability based on the quality of the methods used to create the assertions in the fragment. We compute probabilities based on two different methods. Manually curated assertions, from DrugBank, OMIM, and COSMIC gene census, are directly given a probability p = 0.999. Assertions that have been derived from a specific experimental method are given probabilities appropriate for that method. These probabilities are derived from an expert-driven measure of the reliability of the experimental method used to derive the association. Factors involved in the assessment of confidence include the degree of indirection in the assay, the sensitivity and specificity of the approach, and reproducibility of results under different conditions based on the comparative analyses of techniques (Skrabanek et al., 2008; Sprinzak, Sattath & Margalit, 2003). Two expert bioinformaticians rated the reliability of each method and assigned a score of 1–3, where 1 corresponds to low confidence and 3 to high confidence. After their initial assessment, they conferred on their reasoning for each score to resolve differences where possible. The experts considered level 1 to correspond to weak evidence that needs independent verification. Level 2 methods are generally reliable, but should have additional biological evidence. Level 3 methods are high-quality methods that produce few false positives. We calculated inter-annotator agreement between the two annotators over the three categories using Scott’s Pi. Scott’s Pi is similar to Cohen’s kappa in that it improves on simple observed agreement by factoring in the extent of agreement that might be expected by chance. We determined the agreement to be 0.56 (Scott’s Pi value of 0.26) across 104 experimental methods comprising 99.9999% of interaction annotations (Scott, 1955).
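As an illustration of the agreement statistic, the following sketch computes Scott's Pi for two annotators using its standard definition: observed agreement corrected by the chance agreement expected from the pooled category proportions of both annotators. The ratings used in the usage note are invented for the example and are not the experts' actual scores.

```python
from collections import Counter
from typing import Hashable, Sequence


def scotts_pi(ratings_a: Sequence[Hashable],
              ratings_b: Sequence[Hashable]) -> float:
    """Scott's Pi for two annotators rating the same items.

    pi = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement and
    P_e is the sum of squared pooled category proportions.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Pool both annotators' ratings to estimate category proportions.
    pooled = Counter(ratings_a) + Counter(ratings_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1.0:
        raise ValueError("undefined when only one category is ever used")
    return (observed - expected) / (1 - expected)
```

For example, with hypothetical scores `[3, 3, 2, 1, 2, 3, 1, 2]` and `[3, 2, 2, 1, 3, 3, 2, 2]`, the observed agreement is 5/8 and the expected chance agreement is 94/256, giving a Pi of 11/27 (about 0.41).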

The scores of 1, 2, and 3 were then assigned provisional probabilities of p = 0.8, p = 0.95, and p = 0.99 respectively. We chose these probabilities as approximations of the conceptual levels of probability for each rating by the experts, and feel that those probabilities correspond to how often an experiment at that confidence level can be expected to be accurate. We plan to provide a more rigorous assessment of the accuracy of each method against gold standards in future work. These confidence values were encoded into an OWL ontology along with the evidence codes. The full inferences were extracted using Pellet (https://github.com/complexible/pellet) and loaded into the SPARQL endpoint, where they were used to apply the probabilities to each assertion in the knowledge graph that had experimental evidence.
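The assignment rule described in this section is small enough to state as a lookup. The probabilities are the ones given in the text; the function name and its arguments are hypothetical and only illustrate the rule, since the actual system encodes these values in an OWL ontology.

```python
from typing import Optional

# Probability for manually curated assertions (DrugBank, OMIM, COSMIC).
CURATED_PROBABILITY = 0.999

# Provisional probabilities for expert reliability scores 1 (low) to 3 (high).
SCORE_TO_PROBABILITY = {1: 0.8, 2: 0.95, 3: 0.99}


def assertion_probability(curated: bool,
                          method_score: Optional[int] = None) -> float:
    """Return the probability assigned to one assertion.

    Curated assertions get a fixed probability; experimentally derived
    assertions get the probability mapped from the method's expert score.
    """
    if curated:
        return CURATED_PROBABILITY
    if method_score not in SCORE_TO_PROBABILITY:
        raise ValueError("method_score must be 1, 2, or 3")
    return SCORE_TO_PROBABILITY[method_score]
```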

Semantic web services

We developed four SADI web services (Wilkinson, Vandervalk & McCarthy, 2009) in Python¹ to support easy access to the nanopublications in ReDrugS. The four services are enumerated in Table 2.

Table 2: ReDrugS API SADI Web Services. The API endpoint prefix is http://redrugs.tw.rpi.edu/api/.

Service name	Description	URL	Input	Output
Resource text search	Look up resources using free text search against their RDFS labels. This service is optimized for typeahead user interfaces.	search	pml:Query	pml:AnsweredQuery
Find interactions in a biological process	Find interactions whose participants or targets also participate in the input process.	process	sio:Process	sio:Process
Find upstream participants	Find interactions that the input entity is a target of and that have explicit participants.	upstream	sio:MaterialEntity	sio:Target
Find downstream targets	Find interactions that the input entity participates in and have explicit targets.	downstream	sio:MaterialEntity	sio:Agent

DOI: 10.7717/peerj-cs.106/table-2

The first service is a simple free text lookup that takes a pml:Query² (McGuinness et al., 2007) with a prov:value as a query and produces a set of entities whose labels contain the substring. This is used for interactive typeahead completion of search terms so users can look up URIs and entities without needing to know the details.
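A request to the text search service might be assembled as a small JSON-LD document. This is a sketch under stated assumptions rather than the documented request format: the endpoint path comes from Table 2 and the `pml:Query` type and `prov:value` property come from the description above, but the PML namespace IRI and the query identifier are placeholders, and the exact shape the service accepts is not confirmed here. No request is sent.

```python
import json
from typing import Any, Dict

API_PREFIX = "http://redrugs.tw.rpi.edu/api/"  # endpoint prefix from Table 2

CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    # Placeholder IRI: substitute the real PML namespace before use.
    "pml": "http://example.org/pml#",
}


def build_search_request(text: str,
                         query_id: str = "http://example.org/query/1"
                         ) -> Dict[str, Any]:
    """Build a JSON-LD description of a pml:Query carrying the search text."""
    return {
        "@context": CONTEXT,
        "@id": query_id,          # hypothetical identifier for this query
        "@type": "pml:Query",
        "prov:value": text,
    }


def search_url() -> str:
    """URL of the resource text search service."""
    return API_PREFIX + "search"


if __name__ == "__main__":
    # Print the payload a typeahead client could send for the prefix "melan".
    print(search_url())
    print(json.dumps(build_search_request("melan"), indent=2))
```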

The other three SADI services look up interactions that contain a named entity. Two of them look at the entity to find upstream and downstream connections, and the third service assumes that the entity is a biological process and finds all interactions that relate to that process. The services return only one interaction for each triple (source, interaction type, target). There are often multiple probabilities per interaction, and more than one interaction per interaction type, because the interaction may have been recorded in multiple databases, based on different experimental methods. To provide a single probability score for each interaction of a source and target, the interactions are combined. For each identified interaction, a single probability is generated by taking the geometric mean of the probabilities for that interaction. However, this method is undesirable when combining multiple interaction records of the same type. We instead combine the interaction records using a form of probabilistic voting based on composite Z-scores. This models the idea that multiple experiments that produce the same result reinforce each other, and should therefore give a higher overall probability than would be indicated by taking their mean or even by Bayes' theorem. We do this by converting each probability into a Z-score (also known as a standard score) using the quantile function (Q()), summing the values, and applying the cumulative distribution function (CDF()) to compute the corresponding probability: $P (x_{1 \dots n}) = C D F (\sum_{i = 1}^{n} Q (P (x_{i})))$
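As a minimal sketch of the two combination strategies described above, the following assumes that Q() and CDF() refer to the standard normal distribution, which the Python standard library exposes as `statistics.NormalDist`. The function names are illustrative and are not taken from the ReDrugS code base.

```python
from math import prod
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1); an assumption
# about which distribution underlies Q() and CDF() in the formula above.
_STANDARD_NORMAL = NormalDist()


def geometric_mean(probabilities):
    """Geometric mean of the probabilities attached to one interaction."""
    return prod(probabilities) ** (1.0 / len(probabilities))


def combine_by_z_score(probabilities):
    """Composite Z-score: CDF(sum of Q(p)) over all interaction records.

    Each probability must lie strictly between 0 and 1, otherwise
    NormalDist.inv_cdf raises statistics.StatisticsError.
    """
    z_total = sum(_STANDARD_NORMAL.inv_cdf(p) for p in probabilities)
    return _STANDARD_NORMAL.cdf(z_total)


# Two records of the same interaction, each with p = 0.8.
records = [0.8, 0.8]
mean_based = geometric_mean(records)      # stays at about 0.8
vote_based = combine_by_z_score(records)  # about 0.954, higher than either input
```

Under this formula a probability of 0.5 has a Z-score of zero and leaves the combined value unchanged, while records above 0.5 raise it, which matches the stated intent that agreeing experiments reinforce each other.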

These composite Z-scores, which we transform back into probabilities, are frequently used to combine multiple indicators of the same underlying phenomenon, as in Moller et al. (1998). However, the strategy has a drawback: it does not account for multiple databases recording the same non-independent experiment. This can inflate the probabilities of interactions described by experiments that are published in more than one database.

Graph expansion using joint probability

In order to compute the probability that a given entity affects another, we compute the joint probability that each of the intervening interactions is true. Joint probability is the probability that every assertion in the set is true. This is computed by taking the product of the probabilities of each interaction: $P (x_{1} \land \dots \land x_{n}) = \prod_{i = 1}^{n} P (x_{i})$

This joint probability is used as a threshold that users can set to stop graph expansion. We also provide expansion limits using the number of interaction steps that are needed to connect the two entities.
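The two stopping criteria above can be sketched as a level-by-level graph expansion. This is an illustrative sketch and not the ReDrugS implementation: the graph representation (a dictionary of directed, probability-weighted edges), the function name `expand`, the parameter names, and the node labels are all assumptions made for the example.

```python
def expand(graph, source, min_probability, max_steps):
    """Expand outward from `source`, keeping nodes whose best path has a
    joint probability (product of edge probabilities) of at least
    `min_probability` and uses at most `max_steps` interaction steps.

    `graph` maps a node to a list of (neighbor, probability) pairs.
    Returns a dict mapping each reached node to its best joint probability.
    """
    best = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_steps):
        candidates = {}
        for node, path_probability in frontier.items():
            for neighbor, edge_probability in graph.get(node, ()):
                joint = path_probability * edge_probability
                if joint < min_probability:
                    continue  # below the user-set threshold: stop expanding here
                known = max(best.get(neighbor, 0.0), candidates.get(neighbor, 0.0))
                if joint > known:
                    candidates[neighbor] = joint
        if not candidates:
            break
        best.update(candidates)
        frontier = candidates
    return {node: p for node, p in best.items() if node != source}


# Hypothetical interaction graph (labels and probabilities are illustrative).
example_graph = {
    "drug": [("protein_a", 0.95), ("protein_c", 0.6)],
    "protein_a": [("protein_b", 0.8)],
    "protein_b": [("disease", 0.99)],
}
reached = expand(example_graph, "drug", min_probability=0.7, max_steps=3)
```

In this example `protein_c` is excluded because its single step already falls below the 0.7 threshold, while `disease` is retained because the product 0.95 × 0.8 × 0.99 remains above it.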

User interface

The user interface was developed using the above SADI web services and uses Cytoscape.js (http://cytoscape.github.io/cytoscape.js), angular.js (https://angularjs.org), and Bootstrap 3 (http://getbootstrap.com). An example network is shown in Fig. 2. Users can search for biological entities and processes, which can then be autocompleted to specific entities that are in the ReDrugS graph. Users can then add those entities and processes to the displayed graph, retrieve upstream and downstream connections, and link out to more details for every entity. Cytoscape.js is the main rendering and network visualization tool; it provides node and edge rendering, layout, and network analysis capabilities, and has been integrated into a customized rich web client.

In order to evaluate this knowledge graph, we developed a demonstration web interface (http://redrugs.tw.rpi.edu) based on the Cytoscape.js (http://cytoscape.github.io/cytoscape.js) JavaScript library. The interface lets users enter biological entity names. As the user types, the text is resolved to a list of entities. The user finishes by selecting from the list and submitting the search. The search returns interactions and nodes associated with the selected entity, which are added to the Cytoscape.js graph. Users are also able to select nodes and populate upstream or downstream connections. Figure 2 is an example output of this process.

¹ For further information on developing web services in Python using SADI, see this tutorial: https://github.com/markwilkinson/SADI-Semantic-Web-Services-Core/wiki/Building-Services-in-Python

² PML 3, in development: https://github.com/timrdf/pml. This includes PML 2 constructs that are not covered in PROV-O.

[1] Bader GD, Betel D, Hogue CW. 2003. BIND: the biomolecular interaction network database. Nucleic Acids Research 31(1):248-250

[2] Barth A, Wanek L, Morton D. 1995. Prognostic factors in 1,521 melanoma patients with distant metastases. Journal of the American College of Surgeons 181(3):193-201

[3] Bisgin H, Liu Z, Fang H, Kelly R, Xu X, Tong W. 2014. A phenome-guided drug repositioning through a latent variable model. BMC Bioinformatics 15(1):267

[4] Bisgin H, Liu Z, Kelly R, Fang H, Xu X, Tong W. 2012. Investigating drug repositioning opportunities in FDA drug labels through topic modeling. BMC Bioinformatics 13(Suppl 1):S6

[5] Brown KR, Jurisica I. 2005. Online predicted human interaction database. Bioinformatics 21(9):2076-2082

[6] Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R. 2004. The gene ontology annotation (GOA) database: sharing knowledge in UniProt with gene ontology. Nucleic Acids Research 32(Suppl 1):D262-D266

[7] Chapman PB, Hauschild A, Robert C, Haanen JB, Ascierto P, Larkin J, Dummer R, Garbe C, Testori A, Maio M, Hogg D, Lorigan P, Lebbe C, Jouary T, Schadendorf D, Ribas A, O’Day SJ, Sosman JA, Kirkwood JM, Eggermont AM, Dreno B, Nolop K, Li J, Nelson B, Hou J, Lee RJ, Flaherty KT, McArthur GA. 2011. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. New England Journal of Medicine 364(26):2507-2516

[8] Chatr-aryamontri A, Zanzoni A, Ceol A, Cesareni G. 2008. Searching the protein interaction space through the MINT database. In: Thompson JD, Ueffing M, Schaeffer-Reiss C, eds. Functional Proteomics: Methods and Protocols. Totowa: Humana Press. 305-317

[9] Chautard E, Fatoux-Ardore M, Ballut L, Thierry-Mieg N, Ricard-Blum S. 2011. MatrixDB, the extracellular matrix interaction database. Nucleic Acids Research 39(Suppl 1):D235-D240

[10] Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y. 2012. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Computational Biology 8(5):e1002503

[11] Chiang AP, Butte AJ. 2009. Systematic evaluation of drug-disease relationships to identify leads for novel drug uses. Clinical Pharmacology and Therapeutics 86(5):507-510

[12] Clavo A, Wahl R. 1996. Effects of hypoxia on the uptake of tritiated thymidine, L-leucine, L-methionine and FDG in cultured cancer cells. Journal of Nuclear Medicine 37:502-506

[13] D’Alterio C, Barbieri A, Portella L, Palma G, Polimeno M, Riccio A, Ieranò C, Franco R, Scognamiglio G, Bryce J, Luciano A, Rea D, Arra C, Scala S. 2012. Inhibition of stromal CXCR4 impairs development of lung metastases. Cancer Immunology Immunotherapy 61(10):1713-1720

[14] Doudican N, Rodriguez A, Osman I, Orlow SJ. 2008. Mebendazole induces apoptosis via Bcl-2 inactivation in chemoresistant melanoma cells. Molecular Cancer Research 6(8):1308-1315

[15] Dumontier M, Baker CJ, Baran J, Callahan A, Chepelev L, Cruz-Toledo J, Del Rio NR, Duck G, Furlong LI, Keath N, Klassen D, McCusker JP, Queralt-Rosinach N, Samwald M, Villanueva-Rosales N, Wilkinson MD, Hoehndorf R. 2014. The semanticscience integrated ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics 5(1):14

[16] Emig D, Ivliev A, Pustovalova O, Lancashire L, Bureeva S, Nikolsky Y, Bessarabova M. 2013. Drug target prediction and repositioning using an integrated network-based approach. PLoS ONE 8(4):e60618

[17] Fiorentini G, Aliberti C, Del CA, Tilli M, Rossi S, Ballardini P, Turrisi G, Benea G. 2009. Intra-arterial hepatic chemoembolization (TACE) of liver metastases from ocular melanoma with slow-release irinotecan-eluting beads. Early results of a phase II clinical study. In Vivo 23(1):131-137

[18] Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR. 2004. A census of human cancer genes. Nature Reviews Cancer 4(3):177-183

[19] Goll J, Rajagopala SV, Shiau SC, Wu H, Lamb BT, Uetz P. 2008. MPIDB: the microbial protein interaction database. Bioinformatics 24(15):1743-1744

[20] Gottlieb A, Stein G, Ruppin E, Sharan R. 2011. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular Systems Biology 7:496

[21] Greenberg LH. 1965. Audiotoxicity and nephrotoxicity due to orally administered neomycin. JAMA 194(7):827-828

[22] Groth P, Gibson A, Velterop J. 2010. The anatomy of a nanopublication. Information Services and Use 30(1):51-56

[23] Grover MP, Ballouz S, Mohanasundaram KA, George RA, Sherman CDH, Crowley TM, Wouters MA. 2014. Identification of novel therapeutics for complex diseases from genome-wide association data. BMC Medical Genomics 7(Suppl 1):S8

[24] Güldener U, Münsterkötter M, Oesterheld M, Pagel P, Ruepp A, Mewes H-W, Stümpflen V. 2006. MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Research 34(Suppl 1):D436-D441

[25] Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. 2005. Online Mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research 33(Suppl 1):D514-D517

[26] Harris S, Seaborne A, Prud’hommeaux E. 2013. SPARQL 1.1 query language. W3C Recommendation 21

[27] Harrold JM, Ramanathan M, Mager DE. 2013. Network-based approaches in drug discovery and early development. Clinical Pharmacology and Therapeutics 94(6):651-658

[28] Hauschild A, Grob J-J, Demidov LV, Jouary T, Gutzmer R, Millward M, Rutkowski P, Blank CU, Miller WH, Kaempgen E, Martín-Algarra S, Karaszewska B, Mauch C, Chiarion-Sileni V, Martin A-M, Swann S, Haney P, Mirakhur B, Guckert ME, Goodman V, Chapman PB. 2012. Dabrafenib in BRAF-mutated metastatic melanoma: a multicentre open-label, phase 3 randomised controlled trial. Lancet 380(9839):358-365

[29] Held MA, Langdon CG, Platt JT, Graham-Steed T, Liu Z, Chakraborty A, Bacchiocchi A, Koo A, Haskins JW, Bosenberg MW, Stern DF. 2012. Genotype-selective combination therapies for melanoma identified by high-throughput drug screening. Cancer Discovery 3(1):52-67

[30] Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, Roechert B, Poux S, Jung E, Mersch H, Kersey P, Lappe M, Li Y, Zeng R, Rana D, Nikolski M, Husi H, Brun C, Shanker K, Grant SGN, Sander C, Bork P, Zhu W, Pandey A, Brazma A, Jacq B, Vidal M, Sherman D, Legrain P, Cesareni G, Xenarios I, Eisenberg D, Steipe B, Hogue C, Apweiler R. 2004. The HUPO PSI’s molecular interaction format—a community standard for the representation of protein interaction data. Nature Biotechnology 22(2):177-183

[31] Homsi J, Cubitt CL, Zhang S, Munster PN, Yu H, Sullivan DM, Jove R, Messina JL, Daud AI. 2009. Src activation in melanoma and Src inhibitors as therapeutic agents in melanoma. Melanoma Research 19(3):167-175

[32] Humer J, Ferko B, Waltenberger A, Rapberger R, Pehamberger H, Muster T. 2008. Azidothymidine inhibits melanoma cell growth in vitro and in vivo. Melanoma Research 18(5):314-321

[33] Istituto Clinico Humanitas. 2015. Regorafenib in patients with metastatic solid tumors who have progressed after standard therapy (RESOUND) (accessed 10 January 2016)

[34] Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, Jandrasits C, Jimenez RC, Khadake J, Mahadevan U, Masson P, Pedruzzi I, Pfeiffenberger E, Porras P, Raghunath A, Roechert B, Orchard S, Hermjakob H. 2011. The IntAct molecular interaction database in 2012. Nucleic Acids Research 40:D841-D846

[35] Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A. 2009. Human protein reference database—2009 update. Nucleic Acids Research 37(Suppl 1):D767-D772

[36] Kim KB, Kefford R, Pavlick AC, Infante JR, Ribas A, Sosman JA, Fecher LA, Millward M, McArthur GA, Hwu P, Gonzalez R, Ott PA, Long GV, Gardner OS, Ouellet D, Xu Y, DeMarini DJ, Le NT, Patel K, Lewis KD. 2012. Phase II study of the MEK1/MEK2 inhibitor trametinib in patients with metastatic BRAF-mutant cutaneous melanoma previously treated with or without a BRAF inhibitor. Journal of Clinical Oncology 31(4):482-489

[37] Kim S, Liu Y, Gaber MW, Bumgardner JD, Haggard WO, Yang Y. 2008. Development of chitosan-ellagic acid films as a local drug delivery system to induce apoptotic death of human melanoma cells. Journal of Biomedical Materials Research Part B: Applied Biomaterials 90B(1):145-155

[38] Kingsmore SF, Lindquist IE, Mudge J, Gessler DD, Beavis WD. 2008. Genome-wide association studies: progress and potential for drug discovery and development. Nature Reviews Drug Discovery 7(3):221-230

[39] Kraut EH, Walker MJ, Staubus A, Gochnour D, Balcerzak SP. 1997. Phase II trial of topotecan in malignant melanoma. Cancer Investigation 15(4):318-320

[40] Krauthammer M, Kong Y, Bacchiocchi A, Evans P, Pornputtapong N, Wu C, McCusker J, Ma S, Cheng E, Straub R, Serin M, Bosenberg M, Ariyan S, Narayan D, Sznol M, Kluger H, Mane S, Schlessinger J, Lifton R, Halaban R. 2015. Exome sequencing identifies recurrent mutations in NF1 and RASopathy genes in sun-exposed melanomas. Nature Genetics 47(9):996-1002

[41] Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet J-P, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR. 2006. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313(5795):1929-1935

[42] Le K, Blomain ES, Rodeck U, Aplin AE. 2013. Selective RAF inhibitor impairs ERK1/2 phosphorylation and growth in mutant NRAS vemurafenib-resistant melanoma cells. Pigment Cell & Melanoma Research 26(4):509-517

[43] Lebo T, Sahoo S, McGuinness D. 2013. PROV-O: the PROV ontology.

[44] Lee M-Y, Kumar RA, Sukumaran SM, Hogg MG, Clark DS, Dordick JS. 2008. Three-dimensional cellular microarray for high-throughput toxicology assays. Proceedings of the National Academy of Sciences of the United States of America 105(1):59-63

[45] Lemontt J, Azzaria M, Gros P. 1988. Increased mdr gene expression and decreased drug accumulation in multidrug-resistant human melanoma cells. Cancer Research 48(22):6348-6353

[46] Loiselle FB, Morgan PE, Alvarez BV, Casey JR. 2004. Regulation of the human NBC₃ Na⁺/HCO₃⁻ cotransporter by carbonic anhydrase II and PKA. American Journal of Physiology-Cell Physiology 286(6):C1423-C1433

[47] Luikart S, Kennealey G, Kirkwood J. 1984. Randomized phase III trial of vinblastine, bleomycin, and cis-dichlorodiammine-platinum versus dacarbazine in malignant melanoma. Journal of Clinical Oncology 2(3):164-168

[48] Lynn DJ, Winsor GL, Chan C, Richard N, Laird MR, Barsky A, Gardy JL, Roche FM, Chan THW, Shah N, Lo R, Naseer M, Que J, Yau M, Acab M, Tulpan D, Whiteside MD, Chikatamarla A, Mah B, Munzner T, Hokamp K, Hancock REW, Brinkman FSL. 2008. InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Molecular Systems Biology 4(1):218

[49] Mansuy M, Nikkels-Tassoudji N, Arrese JE, Rorive A, Nikkels AF. 2014. Recurrent in situ melanoma successfully treated with ingenol mebutate. Dermatology and Therapy 4(1):131-135

[50] Maraveyas A, Johnson MJ, Xiao YP, Noble S. 2010. Malignant melanoma as a target malignancy for the study of the anti-metastatic properties of the heparins. Cancer and Metastasis Reviews 29(4):777-784

[51] McGuinness DL, Ding L, Silva PPD, Chang C. 2007. PML 2: a modular explanation interlingua.

[52] Moller JT, Cluitmans P, Rasmussen LS, Houx P, Rasmussen H, Canet J, Rabbitt P, Jolles J, Larsen K, Hanning CD, Langeron O, Johnson T, Lauven PM, Kristensen PA, Biedler A, van Beem H, Fraidakis O, Silverstein JH, Beneken JEW, Gravenstein JS. 1998. Long-term postoperative cognitive dysfunction in the elderly: ISPOCD1 study. Lancet 351(9106):857-861

[53] Motik B, Patel-Schneider PF, Cuenca Grau B. 2009. OWL 2 Web Ontology Language: Direct Semantics.

[54] Nagy Z, Turcsik V, Blaskó G. 2009. The effect of LMWH (nadroparin) on tumor progression. Pathology & Oncology Research 15(4):689-692

[55] Naing A. 2011. Phase I dose escalation study of sodium stibogluconate (SSG) a protein tyrosine phosphatase inhibitor, combined with interferon alpha for patients with solid tumors. Journal of Cancer 2:81-89

[56] National Cancer Institute. 2005. Carboplatin and paclitaxel with or without sorafenib tosylate in treating patients with stage III or stage IV melanoma that cannot be removed by surgery. (accessed 10 January 2016)

[57] Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Mark P, Stümpflen V, Mewes H-W, Ruepp A, Frishman D. 2005. The MIPS mammalian protein-protein interaction database. Bioinformatics 21(6):832-834

[58] Pardo OE, Wellbrock C, Khanzada UK, Aubert M, Arozarena I, Davidson S, Bowen F, Parker PJ, Filonenko VV, Gout IT, Sebire N, Marais R, Downward J, Seckl MJ. 2006. FGF-2 protects small cell lung cancer cells from apoptosis through a complex involving PKCepsilon, B-Raf and S6K2. EMBO Journal 25(13):3078-3088

[59] Patel K, Doudican NA, Schiff PB, Orlow SJ. 2011. Albendazole sensitizes cancer cells to ionizing radiation. Radiation Oncology 6(1):160

[60] Peng X, Wang F, Li L, Bum-Erdene K, Xu D, Wang B, Sinn AA, Pollok KE, Sandusky GE, Li L, Turchi JJ, Jalal SI, Meroueh SO. 2014. Exploring a structural protein–drug interactome for new therapeutics in lung cancer. Molecular BioSystems 10(3):581

[61] Razick S, Magklaras G, Donaldson IM. 2008. iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9(1):405

[62] Cyganiak R, Wood D, Lanthaler M. 2014. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation

[63] Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, Mewes H-W. 2010. Corum: the comprehensive resource of mammalian protein complexes—2009. Nucleic Acids Research 38(Suppl 1):D497-D501

[64] Sanseau P, Koehler J. 2011. Editorial: computational methods for drug repurposing. Briefings in Bioinformatics 12(4):301-302

[65] Sawada N, Kataoka K, Kondo K, Arimochi H, Fujino H, Takahashi Y, Miyoshi T, Kuwahara T, Monden Y, Ohnishi Y. 2004. Betulinic acid augments the inhibitory effects of vincristine on growth and lung metastasis of B16F10 melanoma cells in mice. British Journal of Cancer 90(8):1672-1678

[66] Scott WA. 1955. Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly 19(3):321-325

[67] Shen M, Zhang Y, Saba N, Austin CP, Wiestner A, Auld DS. 2013. Identification of therapeutic candidates for chronic lymphocytic leukemia from a library of approved drugs. PLoS ONE 8(9):e75252

[68] Sirota M, Dudley JT, Kim J, Chiang AP, Morgan AA, Sweet-Cordero A, Sage J, Butte AJ. 2011. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Science Translational Medicine 3(96):96ra77

[69] Skrabanek L, Saini HK, Bader GD, Enright AJ. 2008. Computational prediction of protein-protein interactions. Molecular Biotechnology 38(1):1-17

[70] Smalley KSM, Contractor R, Haass NK, Kulp AN, Atilla-Gokcumen GE, Williams DS, Bregman H, Flaherty KT, Soengas MS, Meggers E, Herlyn M. 2007. An organometallic protein kinase inhibitor pharmacologically activates p53 and induces apoptosis in human melanoma cells. Cancer Research 67(1):209-217

[71] Sprinzak E, Sattath S, Margalit H. 2003. How reliable are experimental protein–protein interaction data? Journal of Molecular Biology 327(5):919-923

[72] Stark C, Breitkreutz B-J, Reguly T, Boucher L, Breitkreutz A, Tyers M. 2006. BioGRID: a general repository for interaction datasets. Nucleic Acids Research 34(Suppl 1):D535-D539

[73] Vogt I, Prinz J, Campillos M. 2014. Molecularly and clinically related drugs and diseases are enriched in phenotypically similar drug-disease pairs. Genome Medicine 6(7):52

[74] Whitehead RP, Moon J, McCachren SS, Hersh EM, Samlowski WE, Beck JT, Tchekmedyian NS, Sondak VK. 2004. A Phase II trial of vinorelbine tartrate in patients with disseminated malignant melanoma and one prior systemic therapy. Cancer 100(8):1699-1704

[75] Wilkinson M, Vandervalk B, McCarthy L. 2009. SADI Semantic Web Services–cause you can’t always GET what you want! In: 2009 IEEE Asia-Pacific Services Computing Conference (APSCC). Singapore: IEEE. 13-18

[76] Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. 2006. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research 34(Suppl 1):D668-D672

[77] Wu C, Gudivada RC, Aronow BJ, Jegga AG. 2013. Computational drug repositioning through heterogeneous network clustering. BMC Systems Biology 7(Suppl 5):S6

[78] Wu Z, Wang Y, Chen L. 2013. Network-based drug repositioning. Molecular BioSystems 9(6):1268-1281