Accuracy of a neural net classification of closely-related species of microfossils from a sparse dataset of unedited images

Johan Renaudie; Ryan Gray; David B Lazarus

doi:10.7287/peerj.preprints.27328v1


Johan Renaudie¹, Ryan Gray², David B Lazarus¹

1 Museum für Naturkunde, Leibniz-Institut für Evolutions- und Biodiversitätsforschung, Berlin, Germany

2 Unaffiliated, Reston, Virginia, USA


Published: 2018-11-08
Accepted: 2018-11-08

Subject Areas: Paleontology, Data Mining and Machine Learning
Keywords: Micropaleontology, Convolutional neural network, Classification, Automatic identification

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Renaudie J, Gray R, Lazarus DB. 2018. Accuracy of a neural net classification of closely-related species of microfossils from a sparse dataset of unedited images. PeerJ Preprints 6:e27328v1 https://doi.org/10.7287/peerj.preprints.27328v1

Abstract

Identification of biological objects in images is a major source of biodiversity data. Currently this is done by a small number of taxonomic experts, so the resulting data are limited in scope and reproducibility. Automated identification in fields such as plankton research or micropaleontology, where enormous numbers of objects are available, would significantly improve data quantity and quality, particularly in applied studies of environmental and climate change. We describe a machine learning workflow based on the MobileNet convolutional network. The software can identify closely related species of radiolarians, a morphologically challenging group of microfossils, and does so for complete species populations (not only ideal specimens) as they are normally identified in standard transmitted light microscope preparations. Multiple partially focused, depth-of-field-limited images were obtained for each fossil specimen from multiple radiolarian microslides. Images were normalized and, in one test, also cropped to remove most systematic slide-linked image biases (e.g. type of background particles) that a classifier could use as non-taxonomic clues to species assignment. An average of 60 specimens per species for 16 species in two distinct clusters of closely related forms (9 species in the Antarctissa group and 7 species in the genus Cycladophora) were used to train and test the system. An overall average classification accuracy of ca. 73% was achieved, exceeding 85% for some species. Excluding specimens whose classifier-calculated certainty falls below a cutoff raises overall accuracy to close to 90%, but at the cost of a ca. one-third reduction in identifiable specimens. This latter accuracy is close to the reproducibility of human experts, albeit with more unidentifiable specimens.
The most important constraint on broader use is the time and effort taxonomic experts need to collect and label training images, as many species in these diverse biotas are rare and the number of available taxonomic experts is very limited.
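The certainty cutoff described in the abstract trades coverage (the fraction of specimens that receive an identification) for accuracy on the specimens that remain. The abstract does not say how certainty is computed, so the minimal sketch below assumes it is the highest per-class probability output by the classifier (e.g. a softmax score). The function name `accuracy_coverage` and the toy probabilities are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def accuracy_coverage(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (accuracy on retained specimens, fraction of specimens retained).

    Assumption: a specimen's certainty is its highest class probability;
    specimens with certainty below `cutoff` are left unidentified.
    """
    if not labels:
        raise ValueError("need at least one specimen")
    kept = correct = 0
    for p, true_label in zip(probs, labels):
        # Predicted class = index of the largest probability.
        predicted = max(range(len(p)), key=p.__getitem__)
        if p[predicted] >= cutoff:
            kept += 1
            correct += int(predicted == true_label)
    # Accuracy is undefined if every specimen was rejected.
    accuracy = correct / kept if kept else float("nan")
    return accuracy, kept / len(labels)


# Hypothetical toy data: 4 specimens, 3 species.
probs = [
    [0.90, 0.05, 0.05],  # confident, correct (true label 0)
    [0.40, 0.35, 0.25],  # uncertain, wrong (true label 1)
    [0.10, 0.80, 0.10],  # confident, correct (true label 1)
    [0.30, 0.30, 0.40],  # uncertain, wrong (true label 0)
]
labels = [0, 1, 1, 0]

print(accuracy_coverage(probs, labels, 0.0))  # (0.5, 1.0)
print(accuracy_coverage(probs, labels, 0.5))  # (1.0, 0.5)
```

Raising the cutoff removes the low-certainty specimens, which in this toy case are exactly the misclassified ones, so accuracy rises while coverage falls. Sweeping the cutoff over a range of values would trace the full accuracy/coverage curve from which a working threshold could be chosen.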

Author Comment

This is a submission to PeerJ for review.