Soil moisture content (SMC) is an important factor affecting agricultural development in arid regions. Compared with spaceborne remote sensing systems, unmanned aerial vehicles (UAVs) have been widely used because of their greater controllability and higher resolution. They also provide a more convenient way to monitor SMC than conventional measurement methods, which include field sampling and oven-drying. However, research based on UAV hyperspectral data has not yet produced a standard procedure for arid regions, so a universal processing scheme is required. We hypothesized that combining pretreatments of UAV hyperspectral imagery under optimal indices with a set of field observations within a machine learning framework would yield a highly accurate estimate of SMC. Optimal 2D spectral indices act as indispensable variables and allow us to characterize a model's SMC performance and spatial distribution. For this purpose, we used hyperspectral imagery and a total of 70 topsoil samples (0–10 cm) from farmland (2.5 × 10^{4} m^{2}) in Fukang City, Xinjiang Uygur Autonomous Region, China. The random forest (RF) method and the extreme learning machine (ELM) were used to estimate SMC from six pretreatment methods combined with four optimal spectral indices. The validation accuracy of the estimation method clearly increased compared with that of linear models. In our assessment, the combination of pretreatments and indices effectively eliminated interference and noise. A comparison of the two machine learning algorithms showed that the RF models were superior to the ELM models, and the best model was PIR (R^{2}_{val} = 0.907, RMSEP = 1.477 and RPD = 3.396). The SMC map predicted by the best scheme was highly similar to the measured SMC map. We conclude that combining preprocessed spectral indices and machine learning algorithms allows SMC to be estimated with high accuracy (R^{2}_{val} = 0.907) from UAV hyperspectral imagery on a regional scale. Ultimately, our program might improve management and conservation strategies for agroecosystems in arid regions.
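The three validation statistics quoted above (R^{2}_{val}, RMSEP and RPD) can be computed as in the following minimal Python sketch. The abstract does not state the exact formulas, so the definitions here are assumptions: R^{2} is taken as the coefficient of determination, RMSEP as the root mean square error of prediction on the validation set, and RPD as the sample standard deviation of the measured values divided by RMSEP. The function name `validation_metrics` is hypothetical and is not taken from the supplementary MATLAB code.

```python
import numpy as np


def validation_metrics(y_true, y_pred):
    """Return R^2, RMSEP and RPD for measured vs. predicted SMC.

    Assumed definitions (not confirmed by the source):
      R^2   = 1 - SS_res / SS_tot
      RMSEP = sqrt(mean((y_true - y_pred)^2))
      RPD   = sample standard deviation of y_true / RMSEP
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))

    rmsep = float(np.sqrt(np.mean(residuals ** 2)))
    r2 = 1.0 - ss_res / ss_tot
    rpd = float(np.std(y_true, ddof=1)) / rmsep
    return {"r2": r2, "rmsep": rmsep, "rpd": rpd}


# Illustrative values only; these are not the study's measurements.
metrics = validation_metrics([10, 12, 14, 16], [11, 11, 15, 15])
print(metrics)
```

Under this assumed RPD definition, the reported RMSEP = 1.477 and RPD = 3.396 would correspond to a standard deviation of about 5.0 in the measured validation values.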

This is a submission to PeerJ for review.

(A) R^{2} maps of R_DI_{(479,619)}. (B) R^{2} maps of R_RI_{(431,446)}. (C) R^{2} maps of R_NDI_{(431,446)}. (D) R^{2} maps of R_PI_{(446,471)}. The colorbar illustrates the value of the square of the correlation coefficient (R^{2}) between SMC and spectral indices, and the x-axes and y-axes indicate the wavebands of 400–1000 nm. Dark red portrays a high R^{2} between SMC and the spectral indices.
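An R^{2} map of this kind can be sketched in Python as follows: for every pair of bands, a two-band index is computed for each sample and the squared Pearson correlation with SMC is stored at that band pair. The index formulas are the conventional ones and are assumptions here, since this text does not define them: DI = R_i − R_j, RI = R_i / R_j and NDI = (R_i − R_j) / (R_i + R_j). PI is omitted because its formula is not given. The function name `r2_map` and the synthetic data are hypothetical.

```python
import numpy as np


def r2_map(spectra, smc, kind="NDI"):
    """Squared Pearson correlation between SMC and a two-band index.

    spectra: array of shape (n_samples, n_bands)
    smc:     array of shape (n_samples,)
    Returns an (n_bands, n_bands) array; entry [i, j] uses bands i and j.
    Band pairs with a constant index (e.g. i == j) come out as NaN.
    """
    spectra = np.asarray(spectra, dtype=float)
    smc = np.asarray(smc, dtype=float)
    a = spectra[:, :, None]  # band i, broadcast along the j axis
    b = spectra[:, None, :]  # band j, broadcast along the i axis

    with np.errstate(divide="ignore", invalid="ignore"):
        if kind == "DI":
            index = a - b
        elif kind == "RI":
            index = a / b
        elif kind == "NDI":
            index = (a - b) / (a + b)
        else:
            raise ValueError(f"unsupported index kind: {kind}")

        # Pearson r for every band pair at once.
        x = index - index.mean(axis=0)
        y = smc - smc.mean()
        cov = np.einsum("s,sij->ij", y, x)
        r = cov / np.sqrt(np.sum(x ** 2, axis=0) * np.sum(y ** 2))
    return r ** 2


# Synthetic example (not the study's data): SMC driven by bands 2 and 5.
rng = np.random.default_rng(0)
spectra = rng.uniform(0.1, 0.6, size=(70, 8))
smc = 20.0 * (spectra[:, 2] - spectra[:, 5]) + rng.normal(0.0, 0.1, 70)
r2 = r2_map(spectra, smc, kind="DI")
i, j = np.unravel_index(np.nanargmax(r2), r2.shape)
print(i, j, r2[i, j])
```

The band pair with the highest R^{2} would then be the candidate for the optimal index of that type, which is how subscripts such as (479,619) can be read.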

(A) R^{2} maps of FDR_DI_{(435,746)}. (B) R^{2} maps of FDR_RI_{(702,724)}. (C) R^{2} maps of FDR_NDI_{(702,726)}. (D) R^{2} maps of FDR_PI_{(435,744)}. The colorbar illustrates the value of the square of the correlation coefficient (R^{2}) between SMC and spectral indices, and the x-axes and y-axes indicate the wavebands of 400–1000 nm. Dark red portrays a high R^{2} between SMC and the spectral indices.

(A) R^{2} maps of SDR_DI_{(710,753)}. (B) R^{2} maps of SDR_RI_{(444,895)}. (C) R^{2} maps of SDR_NDI_{(417,753)}. (D) R^{2} maps of SDR_PI_{(653,753)}. The colorbar illustrates the value of the square of the correlation coefficient (R^{2}) between SMC and spectral indices, and the x-axes and y-axes indicate the wavebands of 400–1000 nm. Dark red portrays a high R^{2} between SMC and the spectral indices.

(A) R^{2} maps of CR_DI_{(400,446)}. (B) R^{2} maps of CR_RI_{(431,446)}. (C) R^{2} maps of CR_NDI_{(431,446)}. (D) R^{2} maps of CR_PI_{(446,466)}. The colorbar illustrates the value of the square of the correlation coefficient (R^{2}) between SMC and spectral indices, and the x-axes and y-axes indicate the wavebands of 400–1000 nm. Dark red portrays a high R^{2} between SMC and the spectral indices.

(A) R^{2} maps of A_DI_{(431,446)}. (B) R^{2} maps of A_RI_{(431,619)}. (C) R^{2} maps of A_NDI_{(431,619)}. (D) R^{2} maps of A_PI_{(446,471)}. The colorbar illustrates the value of the square of the correlation coefficient (R^{2}) between SMC and spectral indices, and the x-axes and y-axes indicate the wavebands of 400–1000 nm. Dark red portrays a high R^{2} between SMC and the spectral indices.

(A) R^{2} maps of FDA_DI_{(435,744)}. (B) R^{2} maps of FDA_RI_{(420,726)}. (C) R^{2} maps of FDA_NDI_{(513,726)}. (D) R^{2} maps of FDA_PI_{(435,713)}. The colorbar illustrates the value of the square of the correlation coefficient (R^{2}) between SMC and spectral indices, and the x-axes and y-axes indicate the wavebands of 400–1000 nm. Dark red portrays a high R^{2} between SMC and the spectral indices.

(A) R^{2} maps of SDA_DI_{(579,753)}. (B) R^{2} maps of SDA_RI_{(440,446)}. (C) R^{2} maps of SDA_NDI_{(477,753)}. (D) R^{2} maps of SDA_PI_{(753,946)}. The colorbar illustrates the value of the square of the correlation coefficient (R^{2}) between SMC and spectral indices, and the x-axes and y-axes indicate the wavebands of 400–1000 nm. Dark red portrays a high R^{2} between SMC and the spectral indices.
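The prefixes in the captions above (R, FDR, SDR, CR, A, FDA, SDA) name the spectral transformation applied before the indices were computed. The following Python sketch shows one plausible reading of most of them. The expansions are assumptions, since they are not spelled out in this text: R is raw reflectance, A is absorbance taken as log10(1/R), and FDR/SDR and FDA/SDA are the first and second derivatives of R and A with respect to wavelength. CR (presumably continuum removal) is left out because it needs a convex-hull step that is not sketched here. The function name `pretreat` is hypothetical, and the authors' MATLAB processing may differ (for example, in how derivatives are smoothed).

```python
import numpy as np


def pretreat(reflectance, wavelengths):
    """Return a dict of assumed spectral pretreatments.

    reflectance: array of shape (n_samples, n_bands), values in (0, 1]
    wavelengths: array of shape (n_bands,), in nm
    """
    r = np.asarray(reflectance, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)

    # Absorbance, assumed here to be log10(1 / R).
    a = np.log10(1.0 / r)

    # Finite-difference derivatives along the wavelength axis.
    fdr = np.gradient(r, wl, axis=1)
    sdr = np.gradient(fdr, wl, axis=1)
    fda = np.gradient(a, wl, axis=1)
    sda = np.gradient(fda, wl, axis=1)

    return {"R": r, "FDR": fdr, "SDR": sdr, "A": a, "FDA": fda, "SDA": sda}


# Synthetic example: two spectra that rise linearly from 400 to 1000 nm.
wl = np.linspace(400.0, 1000.0, 7)
refl = np.vstack([0.1 + 0.0005 * (wl - 400.0), 0.2 + 0.0005 * (wl - 400.0)])
out = pretreat(refl, wl)
print({name: values.shape for name, values in out.items()})
```

Each transformed array keeps the shape of the input, so it can be passed to the same band-pair index search as the raw reflectance.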

ELM algorithm code in MATLAB

RF algorithm code in MATLAB

Reflectance of samples (n = 70)

The authors declare that they have no competing interests.

The following information was supplied regarding data availability:

The RF and ELM algorithms are provided in the supplementary files Dataset S1 and Dataset S2.

The raw spectral data are provided in the supplementary file Dataset S3.