This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Motivation: In environmental risk assessment, information about the potential health risks of chemicals released into the environment is compiled and distilled to inform public policy. The U.S. Environmental Protection Agency (EPA) produces Integrated Science Assessments (ISAs) that review the literature on air pollutants, including nitrogen oxides (NOx). That review process currently requires substantial human labor to evaluate the thousands of potentially relevant documents published each year, a burden this study seeks to reduce by using automated topic classification methods.
Results: For this study, abstracts and titles of scientific documents about NOx were labeled by subject matter experts in four domains relevant to ISAs: toxicology, atmospheric science, epidemiology, and exposure science. In addition, documents not relevant to any of the four domains were included to simulate the background literature that should be filtered out of consideration. The labeled documents were used to train models using a multinomial Naive Bayes classifier via the Weka data mining platform. Separate tests were performed with multi-class and single-class models, each with and without background literature. For the multi-class models, recall (the percentage of all documents in a class that are classified correctly) for the scientific domains ranged from 74% to 94%, and precision (the percentage of documents assigned to a class that actually belong to that class) ranged from 38% to 93%; models created with background literature performed worse than models created without it. Single-class models had precision ranging from 31% to 90% and recall ranging from 84% to 98%, with better precision for models not using background literature but better overall recall for models using it. Single-class models generally performed better than multi-class models in recall, whereas multi-class models without the background screen tended to give the best precision.
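To make the recall and precision definitions used above concrete, the following minimal Python sketch computes both measures per class from paired lists of true and predicted labels. It is an illustration only: the function name `per_class_precision_recall` and the toy labels are hypothetical and are not taken from the study, which used Weka rather than this code.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label in either list.

    recall    = correctly classified documents / all documents truly in the class
    precision = correctly classified documents / all documents assigned to the class
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    true_counts = Counter(true_labels)
    predicted_counts = Counter(predicted_labels)
    # Count the documents whose predicted label matches the true label.
    correct_counts = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    scores = {}
    for label in set(true_counts) | set(predicted_counts):
        correct = correct_counts[label]
        assigned = predicted_counts[label]
        actual = true_counts[label]
        # Report 0.0 when a denominator is empty instead of dividing by zero.
        precision = correct / assigned if assigned else 0.0
        recall = correct / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical toy labels for illustration; NOT data from the study.
true_labels = [
    "toxicology", "toxicology", "epidemiology",
    "background", "background", "background",
]
predicted_labels = [
    "toxicology", "epidemiology", "epidemiology",
    "toxicology", "background", "epidemiology",
]

scores = per_class_precision_recall(true_labels, predicted_labels)
for label in sorted(scores):
    precision, recall = scores[label]
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, two of the three background documents are assigned to scientific domains, which lowers the precision of those domains without changing their recall. This is one plausible mechanism for the lower precision reported for models that include background literature, though the sketch does not reproduce the study's classifier or data.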