HPO2GO: prediction of human phenotype ontology term associations using cross ontology annotation co-occurrences

Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
Cancer Systems Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
Subject Areas
Bioinformatics, Computational Biology
Keywords
Human Phenotype Ontology, Gene Ontology, Cross ontology mapping, Ontological term prediction
© 2018 Doğan
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Doğan T. 2018. HPO2GO: prediction of human phenotype ontology term associations using cross ontology annotation co-occurrences. PeerJ Preprints 6:e26663v1


Analysing the relationships between biomolecules and genetic diseases is a highly active area of research, where the aim is to identify the genes and gene products that cause a particular disease through functional changes originating from mutations. Biological ontologies are frequently employed in these studies, providing researchers with extensive opportunities for knowledge discovery through computational data analysis. In this study, a novel approach is proposed for identifying relationships between biomedical entities by automatically mapping Human Phenotype Ontology (HPO) terms, which define phenotypic abnormalities, to Gene Ontology (GO) terms, which define biomolecular functions. Each association indicates the occurrence of the abnormality due to the loss of the biomolecular function expressed by the corresponding GO term. The proposed HPO2GO mappings were extracted by calculating the frequency of the co-annotations of the terms on the same genes/proteins, using existing curated HPO and GO annotation sets. Unreliable mappings that could be observed by chance were then filtered out by statistical resampling of the co-occurrence similarity distributions. Furthermore, the biological relevance of the finalized mappings was discussed over selected cases, using the literature. The resulting HPO2GO mappings can be employed in different settings to predict and to analyse novel gene/protein - ontology term - disease relations. As an application of the proposed approach, HPO term - protein associations (i.e., HPO2protein) were predicted. To test the predictive performance of the method quantitatively and to compare it with the state of the art, the CAFA2 challenge HPO prediction target protein set was employed. The results of the benchmark indicated the potential of the proposed approach, as HPO2GO outperformed all models from 38 participating groups (with Fmax = 0.402), by a margin of 12.6% compared to the top performer.
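For readers unfamiliar with the Fmax measure reported above, the following is a minimal sketch of the protein-centric Fmax as it is commonly defined in the CAFA challenges: the maximum, over score thresholds, of the harmonic mean of average precision and average recall. The function name `fmax`, the input layout and the toy data are illustrative assumptions, not taken from the HPO2GO code, and the sketch omits the propagation of terms to their ontology ancestors that a full evaluation would apply.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax (sketch).

    predictions: dict protein -> dict term -> score in [0, 1]  (assumed layout)
    truth:       dict protein -> non-empty set of true terms
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []      # only proteins with at least one prediction >= t
        recall_sum = 0.0     # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue         # no protein has a prediction at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy data with hypothetical protein and term identifiers.
toy_truth = {"P1": {"HP:1", "HP:2"}, "P2": {"HP:3"}}
toy_predictions = {
    "P1": {"HP:1": 0.9, "HP:4": 0.4},
    "P2": {"HP:3": 0.6},
}
toy_fmax = fmax(toy_predictions, toy_truth)
```

Precision is averaged only over proteins that receive at least one prediction at the threshold, whereas recall is averaged over every benchmark protein, so a method that predicts for few proteins is penalised through recall.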
It is important to note that HPO2GO was not proposed to replace, but to complement, the conventional approaches used in the field of biomedical relation discovery. The automated cross ontology mapping approach developed in this work can easily be extended to other ontologies to identify unexplored relation patterns at the systemic level. The proposed approach is expected to be more effective when combined with powerful techniques such as text/literature mining. The datasets, results and source code of HPO2GO are available for download at: https://github.com/cansyl/HPO2GO.
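The co-annotation counting and resampling-based filtering described in the abstract can be illustrated with a small sketch. This is not the HPO2GO implementation (see the repository above for that). The Dice-style similarity score, the permutation scheme, the quantile cut-off and all function names and toy identifiers below are assumptions made for illustration, since the abstract does not specify them.

```python
import random
from collections import defaultdict
from itertools import product


def cooccurrence_scores(hpo_annots, go_annots):
    """Score every HPO-GO term pair co-annotated on at least one gene.

    hpo_annots, go_annots: dict gene -> set of term IDs (assumed layout).
    Returns dict (hpo_term, go_term) -> (co_count, similarity).
    """
    hpo_count = defaultdict(int)  # genes annotated with each HPO term
    go_count = defaultdict(int)   # genes annotated with each GO term
    co_count = defaultdict(int)   # genes annotated with both terms

    for terms in hpo_annots.values():
        for h in terms:
            hpo_count[h] += 1
    for terms in go_annots.values():
        for g in terms:
            go_count[g] += 1
    for gene in hpo_annots.keys() & go_annots.keys():
        for h, g in product(hpo_annots[gene], go_annots[gene]):
            co_count[(h, g)] += 1

    # Assumed similarity: Dice-style ratio of shared to total annotations.
    return {
        (h, g): (n, 2.0 * n / (hpo_count[h] + go_count[g]))
        for (h, g), n in co_count.items()
    }


def null_threshold(hpo_annots, go_annots, n_rounds=100, quantile=0.99, seed=0):
    """Estimate a chance-level similarity cut-off by permutation (assumed scheme).

    The HPO term sets are shuffled across genes, which breaks the true
    gene-phenotype links while keeping the term frequencies unchanged.
    """
    rng = random.Random(seed)
    genes = sorted(hpo_annots)
    term_sets = [hpo_annots[g] for g in genes]
    null_scores = []
    for _ in range(n_rounds):
        shuffled = term_sets[:]
        rng.shuffle(shuffled)
        permuted = dict(zip(genes, shuffled))
        scores = cooccurrence_scores(permuted, go_annots)
        null_scores.extend(sim for _, sim in scores.values())
    if not null_scores:
        return 0.0
    null_scores.sort()
    index = min(len(null_scores) - 1, int(quantile * len(null_scores)))
    return null_scores[index]


# Toy annotation sets with hypothetical gene and term identifiers.
toy_hpo = {"G1": {"HP:A"}, "G2": {"HP:A"}, "G3": {"HP:B"}}
toy_go = {"G1": {"GO:X"}, "G2": {"GO:X"}, "G3": {"GO:Y"}, "G4": {"GO:X"}}

toy_scores = cooccurrence_scores(toy_hpo, toy_go)
toy_cutoff = null_threshold(toy_hpo, toy_go, n_rounds=50)
toy_mappings = {pair for pair, (_, sim) in toy_scores.items() if sim >= toy_cutoff}
```

Under this sketch, a mapping is kept only when its observed similarity is at least the cut-off derived from the shuffled data. A real pipeline would likely also require a minimum co-annotation count, because pairs supported by a single gene can reach a high similarity by chance.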

Author Comment

This is a submission to PeerJ for review.