Soil salinization, one of the most important causes of land desertification and deterioration, poses a serious threat to agricultural development and the sustainable utilization of natural resources (Shahid & Rahman, 2011; Abbas et al., 2013). Worldwide, 950 million ha of soil has become salinized (Schofield & Kirkby, 2003), and salinization is degrading arable soil at a rate of 10 ha/min (Graciela & Alfred, 2009). Soil remediation and management are very difficult in China because of complex natural factors such as climate, terrain and geology, and human factors such as unreasonable irrigation and disruption of the ecological balance. The total area of saline soil in China is 36 million ha (Li et al., 2014), accounting for 4.88% of the total area available nationwide (The National Soil Survey Office, 1998). Saline soil usually has a high concentration of salt ions, which affect plants through physiological drought, ion toxicity and metabolic disorder, thus causing “salt damage” (Munns, 2002; Tavakkoli et al., 2011). In addition, one major cause of inaccuracy in the spectral measurement of soil salinity is that pure salts seldom exist in the soil, because some trace salt ions are always fixed in soil crystals. Therefore, quick and accurate acquisition of detailed information on the contents of the various salt ions in the soil can make saline soil management more targeted and effective.
The traditional quantitative estimation of soil salt contents usually includes such steps as field soil sampling in fixed points, experiments in the laboratory and comprehensive statistical analysis (Urdanoz & Aragüés, 2011). Such a method is incapable of the dynamic monitoring of saline soil in a large area because of its high consumption of time and energy, small number of measuring points and poor representativeness (Ding & Yu, 2014). Compared with conventional laboratory analysis methods, remote sensing technology has been widely used due to its rich information, continuity, high precision and low cost (Ben-Dor, 2002; Viscarra Rossel et al., 2006; Viscarra Rossel & Behrens, 2010; Viscarra Rossel & Webster, 2012). The various soil constituents (contents of water, salt, organic matter and so forth) can be acquired conveniently from remote sensing data (Gomez, Viscarra Rossel & McBratney, 2008; Yu et al., 2010; Periasamy & Shanmugam, 2017). Hence, with the abundant spectral reflection information within the VIS-NIR intervals of soil salinity, it is feasible to improve the accuracy of soil salinization inversion (Al-Khaier, 2003; Ben-Dor et al., 2009; Abbas et al., 2013).
VIS-NIR spectral analysis has proved effective in improving the accuracy of quantitative estimation and, to some extent, in eliminating external disturbance (Dehaan & Taylor, 2002; Metternicht & Zinck, 2003; Farifteh et al., 2008). Univariate linear regression on a soil salinity index developed from continuum-removed (CR) reflectance can be used to estimate soil salt content (Weng, Gong & Zhu, 2008). Because soil electrical conductivity (EC) correlates strongly with soil salinity, EC is also an important indicator of the degree of soil salinization. A variety of approaches have been used to estimate EC in field soil, including partial least squares regression (PLSR) and multivariate adaptive regression splines (MARS) (Volkan Bilgili et al., 2010; Nawar, Buddenbaum & Hill, 2015), a logarithmic model (Xiao, Li & Feng, 2016a), a Bootstrap-BP neural network model (Wang et al., 2018d) and satellite remote sensing (Nawar et al., 2014; Bannari et al., 2018). In addition, differential transformation (Xia et al., 2017) and fractional derivatives (Wang et al., 2017; Wang et al., 2018c) can exploit latent spectral information more fully and enhance model accuracy. Spectral classification (Jin et al., 2015) and elimination of the influence of water (Chen et al., 2016; Peng et al., 2016b; Yang & Yu, 2017) also improve the accuracy of quantitative inversion of soil salinity. Therefore, remote sensing is a reliable means of quantitatively inverting soil salinity at different scales.
Quantitative analysis of VIS-NIR spectra can help evaluate the contents of some chemical elements in soil (Viscarra Rossel et al., 2006; Farifteh et al., 2008; Cécillon et al., 2009; Ji et al., 2016) because different soil chemical elements have different characteristic absorption features. In addition, some principal salt ions (Na+, Cl−) are correlated with spectral reflectance (Jiang et al., 2017). Therefore, VIS-NIR spectroscopy can be used to obtain the contents of soil salt ions to a certain extent. Mid-infrared (MIR) spectroscopy responds better than VIS-NIR spectroscopy in predicting soil salinity information; VIS-NIR spectroscopy predicts the contents of total salts, HCO3−, SO42− and Ca2+ with high accuracy, followed by Mg2+, Cl− and Na+ (Peng et al., 2016a). Spectral models also give satisfactory predictions of the sodium adsorption ratio (SAR), a soil salinization evaluation parameter calculated from the contents of Ca2+, Mg2+ and Na+ (Xiao, Li & Feng, 2016b). Qu et al. (2009) found that PLSR models built on spectral data gave higher inversion accuracy for total salt, SO42−, pH and K+ + Na+. Dai et al. (2015) built and analyzed PLSR models and found that the best spectral pretreatment differed among ions, with relatively good predictions for the contents of Ca2+, Mg2+, SO42−, Cl− and HCO3−. Overall, PLSR is a frequently used and robust linear model for quantitative research because its inference capabilities are useful for modeling a probable linear relationship between reflectance spectra and soil salt ion contents. However, non-uniform data and non-linear reflectance in the spectral information of some soil chemical elements reduce model accuracy (Viscarra Rossel & Behrens, 2010; Nawar, Buddenbaum & Hill, 2015).
In particular, support vector regression (SVR), a kernel-based learning method, can handle nonlinear problems with high model accuracy (Vapnik, 1995; Peng et al., 2016a; Hong et al., 2018b). Over the past several decades, support vector methods for classification and regression have been applied extensively in soil VIS-NIR spectroscopy (Ben-Dor, 2002; Xiao, Li & Feng, 2016b; Hong et al., 2018a). Moreover, the SVR model works well in estimating the contents of K+, Na+, Ca2+ and SO42− in soil (Wang et al., 2018a). Thus, choosing an appropriate modeling method helps to guarantee model accuracy (Farifteh et al., 2007).
Many studies have focused on the inversion of soil salinity using spectral information. Nevertheless, little research has used spectral information to explore the eight water-soluble salt ions in soil (K+, Ca2+, Na+, Mg2+, Cl−, SO42−, HCO3− and CO32−). The fit between ion contents and spectral information still needs improvement (Farifteh et al., 2008; Peng et al., 2016a). Besides choosing a suitable multivariate statistical method, which can partly improve the inversion, reducing redundant information is another recognized way to further optimize the model (Bannari et al., 2018; Stenberg et al., 2010). Many studies have demonstrated that spectral variable selection can not only reduce the complexity of calibration models but also improve their predictive performance (Hong et al., 2018a). To select the optimal subset of spectral variables, researchers have investigated methods such as gray correlation (GC) (Li et al., 2016; Wang et al., 2018b), stepwise regression (SR) (Zhang et al., 2018) and variable importance in projection (VIP) (Qi et al., 2017), with satisfactory results. All three methods have also been widely applied in other fields such as plant physiology, food engineering and mathematical statistics (Oussama et al., 2012; Maimaitiyiming et al., 2017; Liu, Yang & Wu, 2015). However, few studies have concentrated on the use of variable selection algorithms in the inversion of soil salt ions.
This study aims to: (1) build the optimal model of soil salt ions using VIS–NIR spectroscopy technique; (2) compare the models based on the sensitive spectral ranges selected using GC, SR and VIP methods for different soil ions; (3) compare the performance of PLSR and SVR models, and identify the optimal models for different ions.
Materials and Methods
Hetao Irrigation District (HID), bounded by the Yin Mountains to the north, the Yellow River to the south, the Ulanbuh Desert to the west and Baotou to the east, lies in Bayannur League, Inner Mongolia, China. It consists of the irrigation areas of Ulan Buh, Jiefangzha, Yongji, Yichang and Urat, and it is China’s largest irrigation district, with a total area of 5,740 km2 (Yu et al., 2010). HID is also an important production base for cereal and oil crops in China; its major crops are wheat, corn and sunflower. Shahaoqu Irrigation Area (SIA), a typical region of saline soil in HID, was chosen as the study area. SIA (107°05′∼107°10′E, 40°52′∼41°00′N) is located in the central east of Jiefangzha Irrigation Area. SIA has a typical continental climate, with hot summers, cold winters, scarce precipitation and strong evaporation. Its mean annual temperature, precipitation and potential evaporation are about 7.1 °C, 155 mm and 2,000 mm, respectively. Physiographically, the mean elevation and slope of SIA are about 1,030 m and 1/10,000, respectively. According to the World Reference Base for Soil Resources (WRB), the local soil texture is mainly silty clay loam, with varying degrees of salinity. Over the years, because of the gentle terrain slope, poor groundwater runoff, intense land surface evaporation and irrational farming activities, about 60% of the land within the district has been affected by various degrees of salinization, which has seriously restricted agricultural development (Wu et al., 2008; Gao et al., 2015).
Sample collection and chemical analysis
The Hetao Irrigation District administration approved the field work (No. 2017YFC0403302). To ensure the representativeness of the soil samples, samples were gathered randomly from a total of 120 sampling units on a grid of 16 m × 16 m (matching the 16 m spatial resolution of GF-1 satellite imagery) in the study area during October 12∼22, 2017 (Fig. 1). In each unit, approximately 0.5 kg of topsoil (0–5 cm) was collected at four randomly selected sampling sites and then mixed thoroughly to obtain a representative sample. In total, 120 soil samples were acquired, and each sample was stored in a plastic bag, labeled and sealed. A portable global positioning system (GPS) receiver was used to determine the coordinates of the sampling points. The soil samples were then transported to the laboratory, air-dried naturally for two weeks and passed through a 2 mm sieve to exclude small stones and other impurities. Each sample was divided into two subsamples, one for spectra collection and one for physicochemical analysis.
For each sample, 50 g of soil was placed in a flask and 250 ml of distilled water was added (water-to-soil ratio 5:1). The water-soluble ion contents were measured in the filtrate obtained after full soaking, oscillation and filtration (Aboukila & Norton, 2017). Ca2+ and Mg2+ were measured by EDTA titration, Na+ and K+ by flame photometry, CO32− and HCO3− by double indicator-neutralization titration, Cl− by silver nitrate titration, and SO42− by EDTA indirect complexometry (Bao, 2000). The content of CO32− was very low (approximately 0) in some soil samples because CO32− readily precipitates with Ca2+ and Mg2+ in a weakly alkaline solution (Table 1). The coefficient of variation (CV) reflects the degree of dispersion of the data: the larger the CV, the more dispersed the values. A high CV helps to build a robust model (Dai et al., 2015). The CV varied widely among ions: the CVs of K+, Na+ and SO42− exceeded 100%, indicating strong variability, and those of CO32−, Cl−, Ca2+, Mg2+ and HCO3− were between 10% and 100%, indicating moderate variability.
|Statistical index||Minimum/ g kg−1||Maximum/ g kg−1||Mean/ g kg−1||Standard deviation||Coefficient of variation/%|
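The CV grading described above can be illustrated with a minimal Python sketch. The function names and the example values are hypothetical, the use of the sample standard deviation is an assumption, and the "weak" class below 10% is a common convention that is not stated in the text.

```python
import numpy as np

def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

def variability_grade(cv_percent):
    """Grade variability from the CV; the 'weak' class (<10%) is an assumed convention."""
    if cv_percent > 100.0:
        return "strong"
    if cv_percent >= 10.0:
        return "moderate"
    return "weak"

# Hypothetical ion contents in g/kg (not data from this study)
example = [0.2, 0.3, 0.25, 4.0, 0.1]
cv = coefficient_of_variation(example)
print(f"CV = {cv:.1f}% -> {variability_grade(cv)} variability")
```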
Laboratory spectral measurements and pretreatments
For spectral data collection, the soil samples were placed in black vessels 10 cm in diameter and 2 cm deep, and the surfaces were smoothed with a straightedge in the laboratory. The spectra were measured with an ASD FieldSpec®3 spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) covering 350–2,500 nm. The instrument has two sensors with spectral resolutions of 1.4 nm and 2 nm for the 350–1,000 nm and 1,000–2,500 nm regions, respectively. To minimize the effects of external factors, the measurements were made in a dark room with 50 W halogen lamps as the light source, placed 50 cm from the soil surface at an incident angle of 30°. The fiber-optic probe had a field of view of 5° and was positioned 15 cm from the soil surface. The light source and spectrometer were fully preheated, and the spectrometer was calibrated with a standardized white panel (99% reflectance) before each measurement to reduce measurement error. Each soil sample was measured in four directions (three turns of 90° each), with five spectra collected in each direction, giving 20 spectral curves per sample (Hong et al., 2018b). The arithmetic mean of these curves, calculated in ViewSpecPro software version 6.0, was used as the raw spectral reflectance (Rraw). The gaps in the spectral curves near 1,000 nm and 1,800 nm were corrected using the Splice Correction function (Xiao, Li & Feng, 2016a).
Disturbances such as the external environment, instrument noise and random error during spectral data collection cause fluctuations that would affect the accuracy of subsequent modeling. In general, pretreatments such as smoothing, resampling and transformation can eliminate external noise to some degree and thus enhance the spectral characteristics (Ding et al., 2018). Rraw was therefore pretreated in the following steps. (i) The noisy marginal wavelengths (350–399 nm and 2,401–2,500 nm) of each soil sample were removed, and the remaining spectrum was smoothed with a Savitzky-Golay (SG) filter (window size 5, polynomial order 2) (Savitzky & Golay, 1964) in Origin Pro software version 2017SR2. (ii) The spectral data between 400 and 2,400 nm were resampled at a 10 nm interval to keep the spectral features while removing redundant information (Xu et al., 2016), giving a new spectral curve of 200 wave bands. (iii) The standard normal variate (SNV) transformation was applied to eliminate the effects of soil particle size, surface scattering and baseline shift on the spectra, giving Rraw−SNV (Xiao, Li & Feng, 2016b; Barnes, Dhanoa & Lister, 1989). The spectral curves of Rraw and Rraw−SNV are shown in Figs. 2A and 2B. Notably, the spectral curves in Fig. 2B were much smoother than those in Fig. 2A, which benefits the subsequent modeling.
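The three pretreatment steps can be sketched in Python with NumPy and SciPy. This is an illustration, not the Origin Pro workflow used in the study: the function name `preprocess`, the synthetic spectra, and the resampling scheme (averaging each consecutive block of ten 1 nm readings from 400 to 2,399 nm to obtain 200 bands) are assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(reflectance, wavelengths):
    """Trim noisy edges, SG-smooth, average into 10 nm bands, then apply SNV.

    reflectance: array (n_samples, n_wavelengths) sampled at 1 nm spacing
    wavelengths: array (n_wavelengths,) in nm, e.g. 350..2500
    """
    # (i) drop the noisy margins; keeping 400..2399 nm gives 2000 points (assumed)
    keep = (wavelengths >= 400) & (wavelengths < 2400)
    spec = reflectance[:, keep]
    # (i) Savitzky-Golay smoothing: 5-point window, 2nd-order polynomial
    spec = savgol_filter(spec, window_length=5, polyorder=2, axis=1)
    # (ii) resample: mean of each consecutive 10 nm block -> 200 bands (assumed scheme)
    spec = spec.reshape(spec.shape[0], -1, 10).mean(axis=2)
    # (iii) SNV: centre and scale every spectrum by its own mean and standard deviation
    mean = spec.mean(axis=1, keepdims=True)
    std = spec.std(axis=1, ddof=1, keepdims=True)
    return (spec - mean) / std

# Synthetic spectra for illustration only (not measured data)
rng = np.random.default_rng(0)
wavelengths = np.arange(350, 2501)
raw = 0.3 + 1e-4 * (wavelengths - 350) + rng.normal(scale=0.005, size=(3, wavelengths.size))
snv = preprocess(raw, wavelengths)
print(snv.shape)
```

After SNV each spectrum has zero mean and unit standard deviation, which is why the transformation removes additive baseline shifts and multiplicative scatter effects.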
Gray correlation (GC)
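The GC method, part of grey system theory, identifies the primary and secondary relations among the factors in a system and analyzes their different effects (Deng, 1982; Li et al., 2016). It is calculated as follows: the reference sequence is $X_0 = \{x_0(k) \mid k = 1, 2, \ldots, n\}$, the comparative sequence is $X_i = \{x_i(k) \mid k = 1, 2, \ldots, n\}$, and the gray correlation degree (GCD) between $X_0$ and $X_i$ is

$$\mathrm{GCD}_i = \frac{1}{n}\sum_{k=1}^{n} \xi_i(k) \tag{1}$$

where

$$\xi_i(k) = \frac{\min_i \min_k \lvert x_0(k) - x_i(k)\rvert + \rho \max_i \max_k \lvert x_0(k) - x_i(k)\rvert}{\lvert x_0(k) - x_i(k)\rvert + \rho \max_i \max_k \lvert x_0(k) - x_i(k)\rvert}$$

and ρ is the distinguishing coefficient, conventionally taken between 0 and 1. ρ was set as 0.1 in this paper.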
The spectral data and the ion contents have different dimensions (units and scales), which affects the data analysis. Normalizing the data before the analysis reduces this adverse effect (Liu, Yang & Wu, 2015; Wang et al., 2018b). In this paper, the larger the GCD of a band, the closer the relation between that band and the ion content, and vice versa.
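A minimal Python sketch of the GCD calculation is given below. It assumes Deng's standard formulation (the mean over samples of the gray correlation coefficients, with global minimum and maximum absolute differences) and min-max normalization; the exact normalization used in the study is not specified here, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def gray_correlation_degree(reference, candidates, rho=0.1):
    """Gray correlation degree of each candidate sequence with the reference.

    reference:  array (n_samples,)          e.g. content of one ion
    candidates: array (n_samples, n_bands)  e.g. reflectance in each band
    rho:        distinguishing coefficient (0.1 in this paper)
    """
    def minmax(a):
        # Min-max normalization per sequence (an assumed choice of normalization)
        a = np.asarray(a, dtype=float)
        return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

    x0 = minmax(reference)[:, None]
    xi = minmax(candidates)
    delta = np.abs(x0 - xi)                    # |x0(k) - xi(k)| for every band and sample
    d_min, d_max = delta.min(), delta.max()    # global minimum and maximum differences
    coef = (d_min + rho * d_max) / (delta + rho * d_max)
    return coef.mean(axis=0)                   # average over samples -> one GCD per band

# Synthetic example: band 0 is an exact copy of the reference sequence
rng = np.random.default_rng(1)
ion = rng.uniform(0.1, 5.0, size=40)
bands = rng.uniform(0.0, 1.0, size=(40, 6))
bands[:, 0] = ion
gcd = gray_correlation_degree(ion, bands, rho=0.1)
sensitive = np.where(gcd >= 0.40)[0]           # bands at or above a GCD threshold of 0.40
print(np.round(gcd, 3), sensitive)
```

A band identical to the reference receives a GCD of 1, and a smaller ρ spreads the remaining values more widely, which makes thresholding easier.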
Variable importance in projection (VIP)
The VIP is a variable selection method based on PLSR (Oussama et al., 2012). The VIP score measures the explanatory power of each independent variable over the dependent variable, and the independent variables are ranked according to this explanatory power (Qi et al., 2017). The VIP score for the j-th variable is given as:

$$\mathrm{VIP}_j = \sqrt{\frac{p \sum_{f=1}^{F} \mathrm{SSY}_f \, w_{jf}^{2}}{\mathrm{SSY}_{total}}} \tag{2}$$

where p is the number of independent variables; F is the total number of components; $\mathrm{SSY}_f$ is the sum of squares of explained variance for the f-th component; $\mathrm{SSY}_{total}$ is the total sum of squares explained of the dependent variable; and $w_{jf}^{2}$ gives the importance of the j-th variable in the f-th component. The higher the value of $\mathrm{VIP}_j$, the stronger the explanatory power of the independent variable over the dependent variable. A VIP score greater than 1 is a recognized criterion for identifying important wavelengths (Wold, Sjöström & Eriksson, 2001; Maimaitiyiming et al., 2017).
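The VIP calculation can be sketched with a small NIPALS implementation of single-response PLS in NumPy. This sketch assumes the standard VIP definition with unit-norm weight vectors; the function name `pls1_vip` and the synthetic data are hypothetical, and the study's software may differ in detail.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """VIP scores from a single-response PLS model fitted with NIPALS."""
    Xr = np.asarray(X, dtype=float)
    yr = np.asarray(y, dtype=float)
    Xr = Xr - Xr.mean(axis=0)
    yr = yr - yr.mean()
    p = Xr.shape[1]
    W = np.zeros((p, n_components))
    ssy = np.zeros(n_components)
    for f in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # unit-norm weight vector of component f
        t = Xr @ w                      # scores
        tt = t @ t
        q = (yr @ t) / tt               # y-loading
        load = (Xr.T @ t) / tt          # X-loading
        Xr = Xr - np.outer(t, load)     # deflate X
        yr = yr - q * t                 # deflate y
        W[:, f] = w
        ssy[f] = q ** 2 * tt            # SSY_f: sum of squares of y explained by component f
    return np.sqrt(p * (W ** 2 @ ssy) / ssy.sum())

# Synthetic data: only variables 2 and 5 carry signal
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + rng.normal(scale=0.2, size=60)
vip = pls1_vip(X, y, n_components=2)
important = np.where(vip > 1.0)[0]      # VIP > 1 flags important variables
print(np.round(vip, 2), important)
```

Because each weight vector has unit norm, the squared VIP scores average to 1 across variables, which motivates the threshold of 1.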
Model construction and validation
The samples were split with the Kennard-Stone (K-S) algorithm, which selects samples according to the Euclidean distances among them, into a modeling set of two-thirds of the samples (n = 80) and a validation set of one-third (n = 40), so that the statistical characteristics of both sets resembled those of the whole sample set (Kennard & Stone, 1969).
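A minimal sketch of a Kennard-Stone split in Python follows. It starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away. The function name and the synthetic data are hypothetical, and computing the distances on the predictor matrix is an assumption.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its closest selected sample
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)

# Synthetic example with 120 samples: 80 for modeling, 40 for validation
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 5))
cal_idx, val_idx = kennard_stone(X, n_select=80)
print(len(cal_idx), len(val_idx))
```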
The PLSR and SVR models were applied to the quantitative inversion of the contents of the different water-soluble salt ions in the saline soil. PLSR is a chemometric statistical model. Compared with traditional multivariate least squares regression (MLSR), PLSR can overcome multicollinearity among the variables, reduce the dimension, synthesize and filter the information, extract the aggregate variables with the strongest explanatory power in the system, and exclude noise with no explanatory power (Wold, Sjöström & Eriksson, 2001). The optimal fitting model was built with the optimal number of principal components determined through full cross validation. SVR is a machine learning method based on the principle of structural risk minimization from statistical learning theory. It is characterized by its ability to handle limited sample sizes, nonlinear data and pattern recognition in high-dimensional data (Vapnik, 1995). In this study, the SVR type and kernel were set to epsilon-SVR and linear function, respectively; the penalty parameter C and kernel parameter g were obtained by a grid search with leave-one-out cross validation. The optimal values of C and g were those that produced the minimum RMSECV (root mean squared error of cross validation) (Xiao, Li & Feng, 2016b). The two models were constructed and validated using the Unscrambler software version X10.4 (CAMO AS Oslo, Oslo, Norway).
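The grid search with leave-one-out cross validation can be illustrated with scikit-learn, which is swapped in here for the Unscrambler software used in the study. The data, the grid of C values and the fixed epsilon are assumptions. Because scikit-learn's linear kernel ignores the kernel parameter, only C is searched in this sketch.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, LeaveOneOut

# Synthetic calibration data for illustration only
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=30)

# Epsilon-SVR with a linear kernel; C is tuned by leave-one-out cross validation.
# Each fold holds one sample, so the mean of the fold-wise squared errors is the
# overall MSE, and its square root is the RMSECV.
grid = GridSearchCV(
    SVR(kernel="linear", epsilon=0.1),
    param_grid={"C": [0.1, 1.0, 10.0, 100.0]},   # hypothetical grid
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)
rmsecv = float(np.sqrt(-grid.best_score_))
print(grid.best_params_, round(rmsecv, 3))
```

Scoring on mean squared error and taking the square root afterwards is deliberate: with one sample per fold, averaging fold-wise RMSE values would give the mean absolute error instead of the RMSECV.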
Precision indices of determination coefficient of calibration (Rc2), determination coefficient of prediction (Rp2), root mean squared error (RMSE) and ratio of performance to deviation (RPD) were used to evaluate the performance of these models. RPD classification was adopted to facilitate the interpretation of predictive results: a model is considered as excellent when RPD ≥ 2.5, as very good when 2.0 ≤ RPD < 2.5, as good when 1.8 ≤ RPD <2.0, and as satisfactory when 1.4 ≤ RPD <1.8 and can only distinguish between high and low values when 1.0 ≤ RPD <1.4 (Viscarra Rossel, Taylor & McBratney, 2007). Generally, the most robust model would be the one with the largest Rc2, Rp2 (approach to 1) and RPD value and the lowest RMSE value.
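The evaluation indices and the RPD classification can be sketched in Python as follows. The formulas are common definitions and are assumptions here: R2 as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the measured values divided by the RMSE. The label for RPD below 1.0 is also an assumption, since that range is not classified in the text.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return (R2, RMSE, RPD) for measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = y_true.std(ddof=1) / rmse     # assumed definition: SD of measured values / RMSE
    return r2, rmse, rpd

def rpd_class(rpd):
    """Map an RPD value to the classes of Viscarra Rossel, Taylor & McBratney (2007)."""
    if rpd >= 2.5:
        return "excellent"
    if rpd >= 2.0:
        return "very good"
    if rpd >= 1.8:
        return "good"
    if rpd >= 1.4:
        return "satisfactory"
    if rpd >= 1.0:
        return "distinguishes high and low values only"
    return "not reliable"               # below 1.0 is not classified in the text (assumed label)

# Hypothetical measured and predicted values
r2, rmse, rpd = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(round(r2, 3), round(rmse, 3), rpd_class(rpd))
```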
Correlation between water-soluble salt ions content and spectral reflectance
The correlation coefficients (Pearson correlation) between each soil salt ion content and Rraw−SNV in the range of 400–2,400 nm were tested with the significance level of P < 0.01 (|r| = 0.234 or above). The curves of correlation coefficients of soil salt ions were plotted in Fig. 3 and the numbers of bands passing the significance test were counted in Table 2.
|Water-soluble salt ions||Number of significant bands||Maximum correlation coefficient||Maximum correlation band intervals/nm|
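The significance threshold can be checked with a short Python sketch, assuming a two-tailed t-test on the Pearson correlation coefficient with n = 120 samples; under that assumption the critical value is close to the |r| = 0.234 quoted above. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

def critical_r(n, alpha=0.01):
    """Smallest |r| that is significant in a two-tailed t-test with n samples."""
    df = n - 2
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    return t_crit / np.sqrt(t_crit ** 2 + df)

def band_correlations(y, spectra):
    """Pearson correlation between y (n,) and every column of spectra (n, bands)."""
    yc = y - y.mean()
    sc = spectra - spectra.mean(axis=0)
    return (sc.T @ yc) / (np.linalg.norm(sc, axis=0) * np.linalg.norm(yc))

r_crit = critical_r(120, alpha=0.01)

# Synthetic data: band 0 is an exact copy of the ion content
rng = np.random.default_rng(5)
ion = rng.normal(size=120)
spectra = rng.normal(size=(120, 200))
spectra[:, 0] = ion
r = band_correlations(ion, spectra)
n_significant = int(np.sum(np.abs(r) >= r_crit))
print(round(float(r_crit), 3), n_significant)
```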
The curve patterns of SO42−, Cl−, Ca2+, Mg2+, K+ and Na+ were similar (Fig. 3). From 400 nm to about 550 nm, the correlation coefficients rose sharply from negative to positive; they then formed a gentle depression until 1,400 nm, plummeted and surged back up by 1,560 nm (the change for Ca2+ being the sharpest), and remained relatively stable to 1,850 nm. From 1,850 to 2,400 nm, the curves oscillated strongly, alternating between rise and fall. In the intervals of 400–1,400 nm and 1,850–2,400 nm, the curve pattern of CO32− was similar to that of the other ions such as SO42−, but between 1,400 nm and about 1,850 nm it showed a unique pattern of sustained oscillating rise. The coefficient curve of HCO3− varied less, fluctuating smoothly between −0.2 and 0.2. The complex variation of the coefficient curves of the different ions revealed rich spectral information.
Selection of characteristic wavelength
Characteristic wavelength selection based on GC method
The curves of gray correlation degree for soil water-soluble salt ions content and Rraw−SNV were shown in Fig. 4. The correlation coefficient curves of the seven ions except CO32− resembled those of the GCD of the Rraw−SNV. Generally, the curves exhibited patterns of “oscillatory rise, fluctuation, rapid rise and fall, and oscillatory fluctuation”. The gray correlation curves of CO32− followed a pattern of “ascending, plummeting, and smooth transition”. The analysis of the GC curve amplitude showed the amplitudes of Cl−, Mg2+ and Ca2+ were relatively large, and those of Na+, SO42−, K+ and HCO3− were relatively small, and that of CO32− was relatively gentle.
The order of the maximal GCD was: Cl− (0.561) > Mg2+ (0.559) > Ca2+ (0.551) > Na+ (0.508) > SO42− (0.494) > K+ (0.470) > HCO3− (0.465) > CO32− (0.416). To ensure that each salt ion had sensitive bands as far as possible, the GCD threshold for wavelength selection was set at 0.40. The sensitive bands selected by the gray correlation method were counted (Table 3). The numbers of sensitive bands, from largest to smallest, were: Mg2+ (110) > HCO3− (105) > Cl− (101) > Ca2+ (53) > Na+ (36) > SO42− (21) > K+ (15) > CO32− (14). The order of the sensitive band numbers thus differed greatly from the order of the maximal GCD values. Furthermore, the band intervals corresponding to the maximum GCD were as follows: 1,740–1,750 nm (near-infrared) for CO32−, 560–570 nm (green light) for HCO3−, and 1,650–1,660 nm (near-infrared) for the remaining six ions.
|Water-soluble salt ions||Sensitive band numbers||Maximum gray correlation degree||Maximum gray correlation degree intervals/nm|
Characteristic wavelength selection based on SR method
Feature band intervals were selected by stepwise regression method in SPSS software version 23.0 (IBM, Chicago, IL, USA), and the significance levels of variables acceptance and rejection were set at 0.10 and 0.15 (Zhang et al., 2018). The parameter indexes of feature band intervals selection were shown in Table 4 by stepwise regression method at maximum adjusted R2.
|Water-soluble salt ions||Sensitive band numbers||Band intervals/nm||Adjusted R2||Standard error||Sig.|
|Ca2+||7||1,040∼1,050, 1,090∼1,100, 1,900∼1,910, 1,920∼1,930, 2,200∼2,210, 2,310∼2,320, 2,370∼2,380||0.942||0.529||<0.001|
|Cl−||8||730∼740, 910∼920, 1,890∼1,900, 1,970∼1,980, 1,990∼2,000, 2,180∼2,190, 2,200∼2,210, 2,290∼2,300||0.975||1.063||<0.001|
|CO32−||4||1,280∼1,290, 1,360∼1,370, 1,380∼1,390, 1,420∼1,430||0.836||0.012||<0.001|
|HCO3−||3||2,200∼2,210, 2,260∼2,270, 2,290∼2,300||0.934||0.085||<0.001|
|K+||6||740∼750, 810∼820, 1,160∼1,170, 1,890∼1,900, 2,210∼2,220, 2,390∼2,400||0.817||0.706||<0.001|
|Mg2+||6||1,130∼1,140, 1,930∼1,950, 1,990∼2,000, 2,100∼2,110, 2,170∼2,180||0.973||0.152||<0.001|
|Na+||6||740∼750, 820∼830, 1,860∼1,870, 2,210∼2,220, 2,260∼2,270, 2,390∼2,400||0.942||1.812||<0.001|
|SO42−||6||610∼620, 1,140∼1,150, 1,960∼1,970, 2,210∼2,220, 2,290∼2,300, 2,390∼2,400||0.947||3.255||<0.001|
Great difference existed among the optimal SR models of different ions, and the numbers of band intervals accepted by the model range from 3 to 8 (Table 4). The SR model fitted well with the adjusted R2 greater than 0.8 when the number of selected independent variables was considered. Meanwhile, SR model of each ion was statistically significant (p < 0.001). Therefore, the band intervals selected by the SR models were used as the independent variables of PLSR and SVR models.
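Stepwise selection with separate entry and removal significance levels can be sketched in Python as below. This is not the SPSS procedure itself: it assumes ordinary least squares with an intercept, uses two-sided t-test p-values for the coefficients, and takes 0.10 for entry and 0.15 for removal as in the text. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

def _p_values(X, y):
    """Two-sided t-test p-values of the OLS slopes (an intercept is fitted but not returned)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(A.T @ A)
    t = beta / np.sqrt(np.diag(cov))
    return (2.0 * stats.t.sf(np.abs(t), dof))[1:]

def stepwise_select(X, y, p_enter=0.10, p_remove=0.15, max_iter=100):
    """Forward-backward stepwise selection; returns the sorted selected column indices."""
    selected = []
    for _ in range(max_iter):
        changed = False
        # Forward step: add the candidate with the smallest p-value if it is below p_enter
        best_p, best_j = 1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p = _p_values(X[:, selected + [j]], y)[-1]
            if p < best_p:
                best_p, best_j = p, j
        if best_j is not None and best_p < p_enter:
            selected.append(best_j)
            changed = True
        # Backward step: drop the selected variable with the largest p-value above p_remove
        if selected:
            p = _p_values(X[:, selected], y)
            worst = int(np.argmax(p))
            if p[worst] > p_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Synthetic data: only columns 0 and 3 carry signal
rng = np.random.default_rng(6)
X = rng.normal(size=(80, 10))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.5, size=80)
bands = stepwise_select(X, y)
print(bands)
```

Keeping the entry level below the removal level prevents a variable from being added and then immediately dropped in the same pass.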
Characteristic wavelength selection based on VIP method
The curves of VIP scores between soil water-soluble salt ion contents and Rraw−SNV are shown in Fig. 5. The maximum VIP scores and the corresponding band intervals obtained with the VIP method are listed in Table 5.
|Water-soluble salt ions||Sensitive band numbers||Maximum VIP scores||Maximum VIP scores intervals/nm|
The curve patterns of seven ions, all except HCO3−, were similar (Fig. 5). These curves oscillated violently in the intervals of 400–800 nm and 1,900–2,400 nm, changed gently between 800 nm and around 1,400 nm, and rose with fluctuations from 1,400 to 1,900 nm. In contrast, the curve of HCO3− showed an oscillatory rise from 400 to 1,400 nm, a “U” shape from 1,400 to about 1,900 nm, and a rapid fall followed by oscillation to 2,400 nm. The numbers of sensitive bands based on the VIP method followed the sequence: Cl− (85) > Na+ (83) > HCO3− (79) > SO42− (74) > Mg2+ (69) = Ca2+ (69) = K+ (69) > CO32− (67). The sequence of the maximal VIP scores was HCO3− (2.37) > CO32− (2.01) > Ca2+ (1.97) > SO42− (1.74) > K+ (1.73) > Na+ (1.55) > Mg2+ (1.49) > Cl− (1.42). The maximal VIP score of Cl− occurred at 560–570 nm; those of Ca2+, CO32− and HCO3− were concentrated between 1,410 and 1,450 nm; and those of K+, Mg2+, Na+ and SO42− occurred at 1,870–1,890 nm.
Construction and analysis of PLSR model
The sensitive bands were obtained using different band selection methods of GC, SR and VIP to build PLSR model. The results of PLSR model were shown in Table 6.
|Wavelength selection methods||Water-soluble salt ions||Latent variables||Rc2 (calibration set)||Rp2 (validation set)||RMSE (validation set)||RPD (validation set)|
|Variable importance in projection||Ca2+||3||0.909||0.865||0.249||2.57|
The models of the six ions Ca2+, Cl−, CO32−, Mg2+, Na+ and SO42− performed well using VIP method (Rc2 is close to 1). The models based on the bands of Ca2+, Cl−, Mg2+, Na+ and SO42− selected using the SR method displayed good fitting effect, and those of Ca2+, Mg2+ and Na+ using the GC method exhibited good fitting effect.
In terms of validation accuracy, the VIP method gave excellent predictions of Ca2+, Na+ and SO42−, the SR method gave excellent predictions of Ca2+, Mg2+, Na+ and SO42− (the RPD of Ca2+ was up to 3.95), and the GC method did not show strong predictive power for any ion. In contrast, all three methods demonstrated poor predictive power for HCO3−: the RPDs of SR-HCO3− and VIP-HCO3− were 0.64 and 0.93, respectively. Therefore, in the PLSR model, the VIP method had the best modeling effect, the SR method had the best forecasting effect, and the GC method had poor modeling and forecasting effects on salt ion inversion.
Construction and analysis of SVR model
The sensitive bands were obtained by using different band selection methods of GC, SR and VIP to build SVR model. The results of SVR model were shown in Table 7.
|Wavelength selection methods||Water-soluble salt ions||Rc2 (calibration set)||Rp2 (validation set)||RMSE (validation set)||RPD (validation set)|
|Variable importance in projection||Ca2+||0.960||0.935||0.173||3.93|
The modeling accuracy of the SVR model was similar to that of the PLSR model, but the validation accuracy for individual ions differed between the two. The VIP method gave excellent predictions of Ca2+, Cl−, Mg2+ and Na+, the SR method gave excellent predictions of Ca2+, Mg2+, Na+ and SO42−, and the GC method did not show strong predictive power for any ion. The predictions of Ca2+ were the best: the RPDs of the VIP and SR models were 3.93 and 3.97, respectively. Overall, in the SVR model, the VIP method performed best in modeling and predicting the salt ion contents, the SR method was second, and the GC method was relatively poor.
Comparison of estimation results among different salt ions
The optimal band selection method varied in some degree from the optimal modeling method (Tables 6 and 7). The comparison was made between the measured value and the estimated value of all the ions concerned under the optimal model (Fig. 6). The sequence of the forecasting power of the ions was Ca2+ > Na+ > Cl− > Mg2+ > SO42− > CO32− > K+ > HCO3−, and it was the same as that of the modeling power.
The validation results showed that most data points of five ions, Ca2+, Na+, Cl−, Mg2+ and SO42−, were concentrated near the 1:1 line. The optimal models of these five ions had very strong predictive power, with RPD above 2.5 (Tables 6 and 7). In previous research, satisfactory predictions were reported for K+ and Na+ (Qu et al., 2009); Ca2+, Na+ and Mg2+ (Viscarra Rossel & Webster, 2012); HCO3−, Ca2+, Cl−, Mg2+ and SO42− (Dai et al., 2015); HCO3−, Ca2+ and SO42− (Peng et al., 2016a); and K+, Na+, Ca2+ and SO42− (Wang et al., 2018a). Although the results of this study are not exactly the same as those of previous research, they are reasonable to some extent. In addition, the results show that band selection achieved the goal of removing irrelevant information and plays a major role in improving the inversion accuracy of salt ions.
In Fig. 6, the data points of CO32− and K+ were relatively dispersed in the validation results. CO32− had relatively good predictive power (RPD = 1.80) and K+ had only satisfactory predictive power (RPD = 1.43). Notably, HCO3− had no predictive power (RPD = 0.96): the slope of its fitted line was below the 1:1 line and its data points were the most scattered (Fig. 6D). The prediction of HCO3− differed from that of Peng et al. (2016a) and Dai et al. (2015) but was similar to that of Wang et al. (2018a). The cause of this result needs further study. Overall, further effort is needed to improve the robustness and accuracy of these ion models. Xiao, Li & Feng (2016b) failed to predict Na+, Mg2+ and Ca2+, but applied the SVR model to forecast SAR after the SNV transformation with satisfactory performance (RPD = 2.13). Similarly, Xiao, Li & Feng (2016a) calculated a first derivative reflectance (FDR) index to predict SAR effectively. In addition, Viscarra Rossel & Webster (2012) forecasted the content of Na+ after logarithmic pretreatment with the VIS-NIR spectral technique (RPD = 2.10). Thus, constructing salt ion indexes and transforming variables are helpful ways to improve the correlation with the spectra and so establish satisfactory models.
The applicability of the PLSR and SVR models to the inversion of ion contents differed only slightly. Both methods produced satisfactory results, in agreement with Peng et al. (2016a). In addition, the optimal inversion model and the optimal prediction model differed for each ion: the SR-PLSR and SR-SVR models for Ca2+, the VIP-SVR and SR-PLSR models for CO32−, the SR-PLSR and VIP-PLSR models for K+, and the VIP-PLSR and GC-PLSR models for HCO3−, respectively. Among them, the performance of the optimal inversion model of Ca2+ resembled that of its prediction model. The results suggest that the ion models with poorer performance frequently showed uncertainty in the inversion process (Peng et al., 2016a). Generally, although Na+ and K+ are the major water-soluble ion components of the highly soluble sodium and potassium salts, they differ greatly in how strongly they are expressed in the spectra (Dai et al., 2015). Therefore, the spectral characteristics of water-soluble salt ions are not necessarily determined by the number of dissociated ions, and more targeted experiments and analysis should be conducted to explore the response mechanism.
Correlation analysis and inversion performance
The raw spectral reflectance curves of the soil samples had distinct shapes (Fig. 2A). One of the main reasons is that the absorption features of these samples were related to the contents and types of soil salt crystals, as well as to various chemical bonds (e.g., C-H, O-H, N-H). The results agreed with those of previous studies (Viscarra Rossel et al., 2006; Viscarra Rossel & Webster, 2012; Dai et al., 2015; Peng et al., 2016a; Wang et al., 2018a), which demonstrated that soil VIS-NIR spectra can be used to determine the contents of some soil salt ions to some degree.
Traditionally, correlation analysis helps reveal the relationships between soil salt ion content and VIS-NIR spectra, and it indicates modeling performance to some degree (Weng, Gong & Zhu, 2008). In the current research, the proportion of significant bands for the different ions ranked from largest to smallest as follows: Cl− (96%) > Ca2+ (95%) > Mg2+ (93%) > Na+ (90.5%) > K+ (89%) = SO42− (89%) > CO32− (73%) > HCO3− (0.5%). The correlation coefficients of the different ions, ranked by absolute value from largest to smallest, were: Cl− (−0.882) > Ca2+ (−0.877) > Mg2+ (−0.848) > Na+ (−0.752) > SO42− (0.749) > K+ (0.630) > CO32− (0.552) > HCO3− (0.235) (Table 2). Thus, five ions (Cl−, Ca2+, Mg2+, Na+ and SO42−) had more significant relationships with the reflectance spectra. Although the ranking by predictive power differed somewhat from the ranking by correlation, the optimal models of these five ions gave excellent predictive results (Fig. 6). In contrast, the other three ions (K+, CO32− and HCO3−) had weak correlations and unsatisfactory predictive power. In particular, HCO3− had only one significant band and the worst prediction results. However, in most cases HCO3− did not have the fewest sensitive bands among the ions under the three wavelength selection methods (Tables 3–5). We therefore conjecture that the different calculation mechanisms cause a certain inconsistency between modeling performance and sensitivity. In addition, the optimal method for finding the responsive spectral region varies from one soil ion to another. In future studies, it will be of practical value to adopt various methods to select the optimal bands for the inversion of soil ions.
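The band-wise correlation analysis described above can be sketched as follows. This is an illustration only, not the code used in the study: the function names are hypothetical, and the critical t value (which fixes the significance level) is supplied by the caller because the significance level is not stated in this section.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    # A constant series has no defined correlation; report 0 by convention.
    return 0.0 if denom == 0 else sxy / denom


def significant_band_fraction(ion_content, band_reflectance, t_crit):
    """Correlate ion content with each band and count significant bands.

    band_reflectance holds one sequence per wavelength (values per sample).
    t_crit is the two-sided critical t value for n - 2 degrees of freedom;
    it is converted to the equivalent critical |r|.
    """
    n = len(ion_content)
    r_crit = t_crit / math.sqrt(t_crit ** 2 + n - 2)
    r_values = [pearson_r(ion_content, band) for band in band_reflectance]
    n_significant = sum(abs(r) >= r_crit for r in r_values)
    return r_values, n_significant / len(r_values)
```

The returned fraction corresponds to the percentages listed for each ion, and the coefficient with the largest absolute value in `r_values` corresponds to the type of value used in the correlation ranking.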
Effects of wavelength selection on estimation models
Large, complex spectra often contain a great deal of redundant information that is irrelevant to ion content. Selecting feature spectra is hence a critical step in building a robust model. Tables 3–5 show that the number of wavelengths selected differed greatly among the three methods: the VIP method selected the most wavelengths (34.5%–42.5%), the SR method selected the fewest (1.5%–4%), and the number selected by the GC method varied widely (7%–55%).
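For reference, the VIP score that ranks wavelengths in a PLSR model is commonly written as below. This is the standard textbook form, given here as an assumption about the calculation, since the exact expression and the selection threshold used in the study are not restated in this section. Here $p$ is the number of wavelengths, $A$ the number of PLSR components, $w_{ja}$ the weight of wavelength $j$ in component $a$, $t_a$ the score vector and $q_a$ the loading of the response on component $a$.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert w_a \rVert \right)^2}{\sum_{a=1}^{A} SS_a}},
\qquad SS_a = q_a^{2}\, t_a^{\top} t_a
```

Because the squared VIP scores average to 1 over all wavelengths, a cut-off of $\mathrm{VIP}_j > 1$ is a frequent choice for retaining a band, which is consistent with the VIP method keeping a comparatively large share of the spectrum.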
Our experiment with three wavelength selection methods also indicated that different methods yielded different results. Among the three methods, the VIP method produced the best results, followed by the SR method, while the GC method performed worst. We argue that the GC method is not necessarily inappropriate, as some of its results are still acceptable. The GC method can identify the primary relationships among the factors in a system by calculating and comparing the GCD (Deng, 1982; Liu, Yang & Wu, 2015). In spectral analysis, the GC method has been applied to identify sensitive spectral indices, select sensitive bands and optimize inversion models (Li et al., 2016). For example, Wang et al. (2018b) used the GC method to extract the feature bands of soil organic matter content and constructed a model with stronger generalization capability. Therefore, soil composition has a strong impact on the performance of the spectral model. This conclusion is consistent with previous research results (Viscarra Rossel et al., 2006; Viscarra Rossel & Webster, 2012; Xiao, Li & Feng, 2016b). In the VIP method, VIP values are calculated during PLSR analysis to evaluate the significance of each wavelength for model prediction (Wold, Sjöström & Eriksson, 2001; Maimaitiyiming et al., 2017; Qi et al., 2017). The VIP method often produces the best results in the modeling set because it can distinguish useful information from unavoidable noise. Oussama et al. (2012) adopted this method to reduce the total data set by almost 75% and obtained a simplified model of high accuracy. Additionally, as a simplified linear regression model, the SR method not only preserves significant bands but also resolves multicollinearity problems effectively (Xiao, Li & Feng, 2016a; Xiao, Li & Feng, 2016b). It substantially reduces model complexity by adjusting the significance levels at which variables are entered and removed (Zhang et al., 2018). Compared with the VIP method, the SR method extracted fewer bands yet still yielded ion prediction models (except for K+, CO32− and HCO3−) with RPD above 1.80. Therefore, it is worthwhile to simplify the model further while maintaining its accuracy.
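The grey correlation degree (GCD) that the GC method compares across bands can be sketched as follows. This is a minimal illustration of Deng's grey relational formulation, not the code used in this study: the function name `grey_correlation_degrees`, the mean normalization of each series and the resolution coefficient `rho = 0.5` are assumptions.

```python
def grey_correlation_degrees(reference, bands, rho=0.5):
    """Return one grey correlation degree per band.

    reference: ion content per sample (the reference series).
    bands: one sequence of reflectance values per wavelength.
    rho: resolution coefficient; 0.5 is the customary default (assumed).
    """
    def mean_normalize(series):
        # Divide by the mean so that series on different scales are comparable.
        mean = sum(series) / len(series)
        return [value / mean for value in series]

    ref = mean_normalize(reference)
    # Absolute difference between reference and each band, sample by sample.
    deltas = [
        [abs(r - b) for r, b in zip(ref, mean_normalize(band))]
        for band in bands
    ]

    # Global minimum and maximum difference over all bands and samples.
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every band matches the reference exactly.
        return [1.0] * len(bands)

    degrees = []
    for row in deltas:
        coefficients = [
            (d_min + rho * d_max) / (d + rho * d_max) for d in row
        ]
        # The degree is the mean relational coefficient across samples.
        degrees.append(sum(coefficients) / len(coefficients))
    return degrees
```

Bands are then ranked by their degree and those above a chosen cut-off are retained. Because the degree depends on the global extremes of the differences, the number of bands passing a fixed cut-off can change considerably from one ion to another, which may contribute to the wide range (7%–55%) reported for the GC method.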
This study clearly demonstrated that the VIS-NIR spectral analysis technique is an effective method for detecting the salt ion content of saline soil in the irrigated district. In terms of extracting feature wavelengths to estimate ion content, our work provides a comprehensive comparison and evaluation of approaches. Such work is of practical importance for further improving the performance of soil salt ion models. Machine learning algorithms that are well suited to nonlinear relationships between variables, such as Ant Colony Optimization-interval Partial Least Square (ACO-iPLS), Recursive Feature Elimination based on Support Vector Machine (RF-SVM), and Random Forest (RF), have proved useful for obtaining effective information on soil organic matter (Ding et al., 2018). To further improve prediction accuracy, more machine learning algorithms should be applied to the analysis of sensitive spectral regions and the construction of stable models in future studies. In addition, the application of multi-source remote sensing platforms such as Landsat, GaoFen-5, Hyperion and unmanned aerial vehicles (UAVs) to soil salt ion estimation has not been investigated. Therefore, further research should focus on combining multiple approaches with remote sensing data at different scales to estimate soil salt ion content.
This study investigated the feasibility of estimating the content of water-soluble salt ions in soil with VIS-NIR spectral models. Different methods were applied to select response bands for constructing robust inversion models. Among them, the VIP method selected a larger number of wavebands and gave the highest accuracy, and the SR method selected the smallest number of wavebands with good accuracy, whereas the number of wavebands selected by the GC method varied greatly and its accuracy was poor. The PLSR and SVR models both performed well in modeling and predicting the content of most ions. Moreover, PLSR produced slightly more ion models with good predictive performance (RPD over 2.0) than SVR. The models of Ca2+, Na+, Cl−, Mg2+ and SO42− had the highest prediction accuracy, with RPDs of 3.97, 3.15, 2.98, 2.75 and 2.75, respectively, while those of the other ions were poor. Overall, the best wavelength selection method, model and inversion results differed among the soil salt ions. In the future, combining band selection methods with spectral models has great potential for predicting the content of some soil salt ions in salinized areas. Such an approach can help decision makers determine soil salinization levels.