Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms

Department of Civil & Environmental Engineering, Princeton University, Princeton, NJ, United States
NASA Jet Propulsion Laboratory, Pasadena, CA, USA
Department of Geography, University of California, Santa Barbara, Santa Barbara, CA, United States
Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA, United States
DOI
10.7287/peerj.preprints.3004v1
Subject Areas
Human-Computer Interaction, Algorithms and Analysis of Algorithms, Computer Vision, Spatial and Geographic Information Systems
Keywords
computer vision, machine learning, active learning, crowdsourcing, landcover, agriculture, human-computer interaction, remote sensing
Copyright
© 2017 Debats et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Debats SR, Estes LD, Thompson DR, Caylor KK. 2017. Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms. PeerJ Preprints 5:e3004v1

Abstract

Sub-Saharan Africa and other developing regions of the world are dominated by smallholder farms, which are characterized by small, heterogeneous, and often indistinct field patterns. In previous work, we developed an algorithm for mapping both smallholder and commercial agricultural fields that includes efficient extraction of a vast set of simple, highly correlated, and interdependent features, followed by a random forest classifier. In this paper, we demonstrated how active learning can be incorporated into the algorithm to create smaller, more efficient training data sets, which reduced computational resource requirements, minimized the need for humans to hand-label data, and boosted performance. We designed a patch-based uncertainty metric to drive the active learning framework, based on the regular grid of a crowdsourcing platform, and demonstrated how subject matter experts can be replaced with fleets of crowdsourcing workers. Our active learning algorithm achieved performance similar to that of an algorithm trained with randomly selected data, but with 62% fewer data samples.
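
The idea of a patch-based uncertainty metric on a regular grid can be illustrated with a minimal sketch. The abstract does not give the metric's formula, so everything here is an assumption made for illustration: the per-pixel uncertainty measure (distance of the predicted field probability from 0.5), the averaging over each grid cell, and the function names `patch_uncertainty` and `select_patches`. The sketch assumes a classifier has already produced a per-pixel probability map.

```python
import numpy as np


def patch_uncertainty(prob_map, patch_size):
    """Mean per-pixel uncertainty for each cell of a regular grid.

    prob_map: 2-D array of predicted probabilities for the "field" class.
    patch_size: side length of each square grid cell, in pixels (assumed).
    """
    rows, cols = prob_map.shape
    if rows % patch_size or cols % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    # Assumed per-pixel uncertainty: 1 at p = 0.5, 0 at p = 0 or p = 1.
    pixel_unc = 1.0 - np.abs(2.0 * prob_map - 1.0)
    # Split into (grid_rows, patch_size, grid_cols, patch_size) blocks
    # and average within each block.
    blocks = pixel_unc.reshape(
        rows // patch_size, patch_size, cols // patch_size, patch_size
    )
    return blocks.mean(axis=(1, 3))


def select_patches(prob_map, patch_size, n_select):
    """Return grid coordinates of the n_select most uncertain patches."""
    scores = patch_uncertainty(prob_map, patch_size)
    # Sort flat indices by descending score; a stable sort keeps ties in grid order.
    order = np.argsort(-scores, axis=None, kind="stable")[:n_select]
    return [
        tuple(int(i) for i in np.unravel_index(idx, scores.shape))
        for idx in order
    ]
```

In an active learning loop of this kind, the selected patches would be sent out for labeling (here, to crowdsourcing workers), the new labels added to the training set, and the classifier retrained before the next round of selection.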

Author Comment

This is a preprint submission to PeerJ Preprints.