ProPheno 1.0: An online dataset for accelerating the complete characterization of the human protein-phenotype landscape in biomedical literature

Morteza Pourreza Shahri; Indika Kahanda

doi:10.7287/peerj.preprints.27479v2

NOT PEER-REVIEWED

"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Gianforte School of Computing, Montana State University, Bozeman, Montana, United States

Published: 2019-05-02
Accepted: 2019-05-02

Subject Areas: Bioinformatics, Natural Language and Speech
Keywords: Biomedical Natural Language Processing, Proteins/Phenotypes, Text Mining, ProPheno 1.0

Copyright: © 2019 Pourreza Shahri et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Pourreza Shahri M, Kahanda I. 2019. ProPheno 1.0: An online dataset for accelerating the complete characterization of the human protein-phenotype landscape in biomedical literature. PeerJ Preprints 7:e27479v2 https://doi.org/10.7287/peerj.preprints.27479v2

Abstract

Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. One of the best resources capturing protein-phenotype relationships is the biomedical literature. In this work, we introduce ProPheno, a comprehensive online dataset composed of human protein and phenotype mentions extracted from the complete corpora of Medline and PubMed Central Open Access. It also includes co-occurrences of protein-phenotype pairs within different spans of text, such as sentences and paragraphs. We use ProPheno to completely characterize the human protein-phenotype landscape in the biomedical literature. ProPheno, the reported findings, and the insights gained have implications for (1) biocurators, for expediting their curation efforts, (2) researchers, for quickly finding relevant articles, and (3) text mining tool developers, for training their predictive models. The RESTful API of ProPheno is freely available at http://propheno.cs.montana.edu.
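
To make the notion of co-occurrence within different text spans concrete, the following is a minimal sketch in Python. It is not the ProPheno pipeline or its API: the input structure, the example mentions, and the function name `cooccurrences` are all hypothetical, and mention extraction is assumed to have been done already. It counts a protein-phenotype pair once per sentence that contains both mentions, and once per paragraph that contains both anywhere.

```python
from collections import Counter
from itertools import product

# Hypothetical toy input (not ProPheno data): each paragraph is a list of
# sentences, and each sentence holds the mentions already extracted from it.
paragraphs = [
    [
        {"proteins": {"BRCA1"}, "phenotypes": {"breast carcinoma"}},
        {"proteins": {"TP53"}, "phenotypes": set()},
    ],
    [
        {"proteins": set(), "phenotypes": {"seizure"}},
        {"proteins": {"SCN1A"}, "phenotypes": set()},
    ],
]


def cooccurrences(paragraphs):
    """Count (protein, phenotype) pairs at sentence and paragraph level."""
    sentence_counts = Counter()
    paragraph_counts = Counter()
    for sentences in paragraphs:
        para_proteins, para_phenotypes = set(), set()
        for sentence in sentences:
            # Sentence level: both mentions must occur in the same sentence.
            sentence_counts.update(
                product(sorted(sentence["proteins"]), sorted(sentence["phenotypes"]))
            )
            para_proteins |= sentence["proteins"]
            para_phenotypes |= sentence["phenotypes"]
        # Paragraph level: mentions may occur in different sentences.
        paragraph_counts.update(
            product(sorted(para_proteins), sorted(para_phenotypes))
        )
    return sentence_counts, paragraph_counts


sentence_counts, paragraph_counts = cooccurrences(paragraphs)
```

In this toy input, the pair ("SCN1A", "seizure") is found only at the paragraph level, because the two mentions sit in different sentences. This illustrates why wider spans surface more candidate pairs than sentence-level matching.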

Author Comment

After receiving valuable comments from the reviewers, we revised the article to address them. We have also changed the title, since we plan to modify the proposed method and present another version in the future.
