Machine learning with remote sensing data to locate uncontacted indigenous villages in Amazonia
- Subject Areas: Data Mining and Machine Learning, Spatial and Geographic Information Systems
- Keywords: Random forest, Satellite imagery, South America, Indigenous societies
- Copyright: © 2018 Walker et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article: 2018. Machine learning with remote sensing data to locate uncontacted indigenous villages in Amazonia. PeerJ Preprints 6:e27307v1 https://doi.org/10.7287/peerj.preprints.27307v1
Abstract
Background. The world’s last uncontacted indigenous societies in Amazonia have only intermittent and often hostile interactions with the outside world. Knowledge of their locations is essential for urgent protection efforts, but their extreme isolation, small populations, and semi-nomadic lifestyles make them difficult to locate.
Methods. Remote sensing with Landsat satellite sensors is a non-invasive way to track isolated indigenous populations through time. However, the small-scale deforestation signature left by uncontacted populations clearing villages and gardens resembles that left by contacted indigenous villages. Contacted and uncontacted indigenous populations also often live in proximity to one another, making it difficult to distinguish the two in satellite imagery. Here we apply machine learning techniques to remote sensing data, using a training dataset of 500 contacted and 25 uncontacted villages.
Results. Uncontacted villages generally have smaller cleared areas, sit at higher elevations, and are farther from populated places and from satellite-detected lights at night. A random forest algorithm with an optimally tuned detection cutoff has a leave-one-out cross-validated sensitivity and specificity of over 98%. A grid search around known uncontacted villages, using predictions from the random forest model, led us to identify three previously unknown villages. Our efforts can improve policies toward isolated populations by providing better near real-time knowledge of their locations and movements in relation to encroaching loggers, settlers, and other external threats to their survival.
Author Comment
This is a submission to PeerJ Computer Science for review.
Supplemental Information
Supplementary file of raw dataset with 11 feature variables and the target variable
Each data point is a contacted or uncontacted indigenous village, with 11 feature variables derived from remote sensing.
R script
R script to run a modified random forest algorithm with a threshold that varies iteratively.
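The idea of a detection threshold that varies iteratively can be illustrated with a short sketch. This is not the authors' R script: it is a minimal Python example, assuming that a classifier has already produced one score per village (for instance, the fraction of trees voting "uncontacted") and that the "best" cutoff is the one maximizing Youden's J (sensitivity + specificity − 1). The selection criterion, the function names, and the toy scores below are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def sensitivity_specificity(scores: List[float], labels: List[int],
                            cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= cutoff means 'uncontacted'.

    labels: 1 = uncontacted (positive class), 0 = contacted (negative class).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def sweep_cutoff(scores: List[float], labels: List[int],
                 n_steps: int = 100) -> Tuple[float, float, float]:
    """Try cutoffs 0, 1/n_steps, ..., 1 and keep the first one that maximizes
    Youden's J (an assumed criterion, not taken from the paper)."""
    best_cutoff, best_sens, best_spec, best_j = 0.0, 0.0, 0.0, float("-inf")
    for i in range(n_steps + 1):
        cutoff = i / n_steps
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cutoff, best_sens, best_spec, best_j = cutoff, sens, spec, j
    return best_cutoff, best_sens, best_spec


# Hypothetical scores for seven villages; two are uncontacted (label 1).
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80]
toy_labels = [0, 0, 0, 0, 1, 0, 1]

cutoff, sens, spec = sweep_cutoff(toy_scores, toy_labels)
print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

With a heavily imbalanced training set such as 500 contacted versus 25 uncontacted villages, a default cutoff of 0.5 would tend to favor the majority class, which is one reason to tune the cutoff rather than fix it in advance.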