Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms
- Subject Areas
- Human-Computer Interaction, Algorithms and Analysis of Algorithms, Computer Vision, Spatial and Geographic Information Systems
- Keywords
- computer vision, machine learning, active learning, crowdsourcing, landcover, agriculture, human-computer interaction, remote sensing
- Copyright
- © 2017 Debats et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms. PeerJ Preprints 5:e3004v1 https://doi.org/10.7287/peerj.preprints.3004v1
Abstract
Sub-Saharan Africa and other developing regions of the world are dominated by smallholder farms, which are characterized by small, heterogeneous, and often indistinct field patterns. In previous work, we developed an algorithm for mapping both smallholder and commercial agricultural fields that includes efficient extraction of a vast set of simple, highly correlated, and interdependent features, followed by a random forest classifier. In this paper, we demonstrate how active learning can be incorporated into the algorithm to create smaller, more efficient training data sets, which reduces computational requirements, minimizes the need for humans to hand-label data, and boosts performance. We designed a patch-based uncertainty metric to drive the active learning framework, based on the regular grid of a crowdsourcing platform, and demonstrated how subject matter experts can be replaced with fleets of crowdsourcing workers. Our active learning algorithm achieved performance similar to that of an algorithm trained with randomly selected data, but with 62% fewer data samples.
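The general pattern the abstract describes, training a random forest and querying the samples it is least certain about, can be sketched as follows. This is a minimal illustration of least-confidence uncertainty sampling on synthetic data using scikit-learn; the paper's actual patch-based uncertainty metric, crowdsourcing grid, and feature extraction pipeline are not reproduced here, and all sizes and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for extracted per-pixel/per-patch features and labels.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Seed the labeled set with a small random sample; the rest form the
# unlabeled pool from which the active learner queries annotations.
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=50, replace=False))
pool = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(5):  # active-learning rounds
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[labeled], y[labeled])

    # Least-confidence uncertainty: 1 minus the top class probability.
    proba = clf.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)

    # Query the 20 most uncertain pool samples (sent to annotators
    # in the real pipeline) and move them into the labeled set.
    query = np.argsort(uncertainty)[-20:]
    for q in sorted(query, reverse=True):
        labeled.append(pool.pop(q))

print(len(labeled))  # 150 labeled samples after 5 rounds of 20
```

In the paper's setting, each "sample" corresponds to a grid patch from the crowdsourcing platform rather than an individual point, and uncertainty is aggregated over the patch before deciding which patches to send to workers.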
Author Comment
This is a preprint submission to PeerJ Preprints.