Comparison of natural language processing tools for automatic gene ontology annotation of scientific literature

Lucas Beasley; Prashanti Manda

doi:10.7287/peerj.preprints.27028v1

NOT PEER-REVIEWED

"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Department of Computer Science, University of North Carolina at Greensboro, Greensboro, North Carolina, United States

Published: 2018-07-11
Accepted: 2018-07-11

Subject Areas: Bioinformatics, Natural Language and Speech
Keywords: text mining, gene ontology annotation, literature curation, NLP tools

Copyright: © 2018 Beasley et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Beasley L, Manda P. 2018. Comparison of natural language processing tools for automatic gene ontology annotation of scientific literature. PeerJ Preprints 6:e27028v1 https://doi.org/10.7287/peerj.preprints.27028v1

Abstract

Manual curation of scientific literature for ontology-based knowledge representation has proven infeasible and does not scale to the large and growing volume of scientific literature. Automated annotation solutions that leverage text mining and Natural Language Processing (NLP) have been developed to ease the burden of literature curation. These NLP approaches use parsing together with syntactic and lexical analysis of text to recognize and annotate pieces of text with ontology concepts. Here, we compare four state-of-the-art NLP tools on the task of recognizing Gene Ontology concepts in biomedical literature, using the Colorado Richly Annotated Full-Text (CRAFT) corpus as a gold standard reference. We demonstrate the use of semantic similarity metrics to compare NLP tool annotations to the gold standard.
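
To illustrate how a semantic similarity metric can give partial credit when a tool's annotation is close to, but not identical with, the gold standard concept, the following is a minimal sketch. The abstract does not name the metrics used, so the Jaccard similarity over ancestor sets shown here is an assumption. The toy ontology, its term names, and the functions `ancestors` and `jaccard_similarity` are likewise hypothetical and are not taken from the Gene Ontology or from any of the compared tools.

```python
from typing import Dict, Set

# Hypothetical toy ontology (NOT real Gene Ontology terms):
# each term maps to the set of its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "process": {"root"},
    "metabolism": {"process"},
    "lipid_metabolism": {"metabolism"},
    "protein_metabolism": {"metabolism"},
    "transport": {"process"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_similarity(predicted: str, gold: str) -> float:
    """Shared ancestors divided by all ancestors of either term (assumed metric)."""
    anc_pred, anc_gold = ancestors(predicted), ancestors(gold)
    return len(anc_pred & anc_gold) / len(anc_pred | anc_gold)


if __name__ == "__main__":
    # An exact match scores 1.0; sibling terms score higher than distant ones.
    print(jaccard_similarity("lipid_metabolism", "lipid_metabolism"))
    print(jaccard_similarity("protein_metabolism", "lipid_metabolism"))
    print(jaccard_similarity("transport", "lipid_metabolism"))
```

Under a metric of this kind, a tool that annotates a passage with a sibling of the gold standard concept is scored above one that picks a distant concept, whereas exact-match precision and recall would count both as equally wrong.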

Author Comment

This is a preprint submission to PeerJ Preprints.
