Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms

Stephanie R Debats; Lyndon D Estes; David R Thompson; Kelly K Caylor

doi:10.7287/peerj.preprints.3004v1

Stephanie R Debats¹, Lyndon D Estes¹, David R Thompson², Kelly K Caylor¹,³,⁴

1 Department of Civil & Environmental Engineering, Princeton University, Princeton, NJ, United States

2 NASA Jet Propulsion Laboratory, Pasadena, CA, United States

3 Department of Geography, University of California, Santa Barbara, Santa Barbara, CA, United States

4 Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA, United States

Published: 2017-06-06
Accepted: 2017-06-06

Subject Areas: Human-Computer Interaction, Algorithms and Analysis of Algorithms, Computer Vision, Spatial and Geographic Information Systems
Keywords: computer vision, machine learning, active learning, crowdsourcing, landcover, agriculture, human-computer interaction, remote sensing

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Debats SR, Estes LD, Thompson DR, Caylor KK. 2017. Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms. PeerJ Preprints 5:e3004v1 https://doi.org/10.7287/peerj.preprints.3004v1

Abstract

Sub-Saharan Africa and other developing regions of the world are dominated by smallholder farms, which are characterized by small, heterogeneous, and often indistinct field patterns. In previous work, we developed an algorithm for mapping both smallholder and commercial agricultural fields that efficiently extracts a vast set of simple, highly correlated, and interdependent features and passes them to a random forest classifier. In this paper, we demonstrated how active learning can be incorporated into the algorithm to create smaller, more efficient training data sets, which reduced computational resource requirements, minimized the need for humans to hand-label data, and boosted performance. We designed a patch-based uncertainty metric to drive the active learning framework, based on the regular grid of a crowdsourcing platform, and demonstrated how subject matter experts can be replaced with fleets of crowdsourcing workers. Our active learning algorithm achieved performance similar to that of an algorithm trained with randomly selected data, but with 62% fewer data samples.
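
As an illustration only, the idea of a patch-based uncertainty metric on a regular grid can be sketched as follows. This is a minimal sketch under stated assumptions, not the metric defined in the paper: the per-pixel uncertainty formula (1 at a class probability of 0.5, 0 at 0 or 1), the mean aggregation over each grid cell, and the names `patch_uncertainty`, `select_patches`, `patch_size`, and `n_select` are all hypothetical. The input is assumed to be a map of per-pixel probabilities for the "field" class, such as a random forest classifier might produce.

```python
import numpy as np


def patch_uncertainty(prob_map: np.ndarray, patch_size: int) -> np.ndarray:
    """Mean per-pixel uncertainty for each cell of a regular grid.

    prob_map holds an assumed per-pixel probability of the "field" class,
    in [0, 1]. Pixel uncertainty is 1 at p = 0.5 and 0 at p = 0 or p = 1.
    (Assumed metric for illustration; the paper's definition may differ.)
    """
    rows, cols = prob_map.shape
    if rows % patch_size or cols % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    pixel_uncertainty = 1.0 - np.abs(2.0 * prob_map - 1.0)
    # View the image as (grid_rows, patch, grid_cols, patch) blocks,
    # then average within each block.
    blocks = pixel_uncertainty.reshape(
        rows // patch_size, patch_size, cols // patch_size, patch_size
    )
    return blocks.mean(axis=(1, 3))


def select_patches(prob_map: np.ndarray, patch_size: int, n_select: int):
    """Return (grid_row, grid_col) of the n_select most uncertain patches."""
    scores = patch_uncertainty(prob_map, patch_size)
    # Sort the flattened scores and keep the highest n_select.
    order = np.argsort(scores, axis=None)[::-1][:n_select]
    return [
        tuple(int(i) for i in np.unravel_index(k, scores.shape)) for k in order
    ]
```

In an active learning loop of this kind, the selected grid cells would be the ones sent to crowdsourcing workers for labeling, after which the classifier is retrained on the enlarged training set and the probability map is recomputed.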

Author Comment

This is a preprint submission to PeerJ Preprints.