Geographic population genetic structure and diversity of Sophora moorcroftiana based on genotyping-by-sequencing (GBS)

Sophora moorcroftiana is a perennial leguminous low shrub endemic to the Yarlung Zangbo River basin in Tibet with irreplaceable economic and ecological value. To determine the drivers of evolution in this species, 225 individuals belonging to 15 populations from different geographic locations were sampled, and population genetics was studied using high-throughput genotyping-by-sequencing (GBS). Based on genetic diversity analysis, phylogenetic analysis, principal component analysis, and structure analysis, 15 natural populations were clustered into the following five subgroups: subgroup I (Shigatse subgroup) was located in the upper reaches of the Yarlung Zangbo River with a relatively high level of population genetic variation (means for PIC, Shannon and PI were 0.173, 0.326 and 0.0000305, respectively), and gene flow within the subgroup was also high (mean value for Nm was 4.67). Subgroup II (including Pop 7 and Pop 8; means for PIC, Shannon and PI were 0.182, 0.345 and 0.0000321, respectively), located in the middle reaches of the Yarlung Zangbo River had relatively high levels of gene flow with the populations distributed in the upper and lower reaches. The Nm between subgroup II with subgroups I and III was 3.271 and 2.894, respectively. Considering all the genetic diversity indices Pop 8 had relatively high genetic diversity. Subgroup III (the remaining mixed subgroup of Lhasa and Shannan) was located in the middle reaches of the Yarlung Zangbo River and the means for PIC, Shannon and PI were 0.172, 0.324 and 0.0000303, respectively. Subgroup IV (Nyingchi subgroup), located in the lower reaches of the Yarlung Zangbo River basin, showed a further genetic distance from the other subgroups and the means for PIC, Shannon and PI were 0.147, 0.277 and 0.0000263, respectively. Subgroup V (Nyingchi Gongbu Jiangda subgroup), located in the upper reaches of the Niyang River, had the lowest level of genetic variation (means for PIC, Shannon and PI were 0.106, 0.198 and 0.0000187, respectively) and gene flow with other populations (mean value for Nm was 0.42). According to the comprehensive analysis, the S. moorcroftiana populations generally expanded from upstream to downstream and displayed a high level of genetic differentiation in the populations in the upper and lower reaches. There were high levels of gene exchange between the central populations with upstream and downstream populations, and wind-induced seed dispersal was an important factor in the formation of this gene exchange mode.


INTRODUCTION
Molecular markers are extremely useful in plant and animal genetics and genomics. Single nucleotide polymorphism (SNP) genetic markers are one of the key tools for population genetics and quantitative genetics. Genotyping-by-sequencing (GBS) is a high-throughput, multiplex and short-read sequencing approach that reduces genome complexity via restriction enzymes and generates high-density genome-wide markers at a low cost per sample by tagging random DNA fragments shared by different samples with unique, short DNA sequences (barcodes) and pooling samples into a single sequencing channel (Elshire et al., 2011). This approach not only greatly reduces the cost of sequencing but also makes it possible to genotype large samples. GBS is of great significance for understanding the genetic background and phylogeny of germplasm resources (Poland et al., 2012a;Poland et al., 2012b;Poland & Rife, 2012c;Glaubitz et al., 2014). The short-read sequences obtained by GBS can be assembled by a reference genome or nonreference genome to obtain highdensity SNP markers. These SNP markers or developed bin markers can be used for various processes, such as genetic map construction (Ward et al., 2013), genome-wide association studies (GWAS) (Sakiroglu & Brummer, 2017), and genome assembly. At present, GBS technology is an important tool for genotyping and is widely used in genetic linkage mapping, genetic selection (Zhang et al., 2018a;Zhang et al., 2018b), genetic diversity studies (Lu et al., 2013), germplasm identification (Wu et al., 2014), species identification (Pembleton et al., 2016) and other fields.
The Yarlung Zangbo River basin stretches over large areas of the Lhasa, Shigatse and Shannan regions of Tibet and over small areas within the Ngari, Nagqu and Qamdo regions, including 41 counties (cities). The relief of the basin is high in the west, low in the east, high in the north and south, and low in the central reaches. The Yarlung Zangbo River basin is sensitive to the evolution of the eco-environment, as most of the ground surface of the basin is below the lower one-third of the air convection layer and is therefore vulnerable to global climate change. Due to the monsoons and subtropical westerly jet over the plateau, the river valley exhibits a dry, cold and windy climate (Dong et al., 1997). As the orientation of the river valley is nearly parallel to the wind direction, the mountainous terrain greatly increases the wind velocity. Thus, the Yarlung Zangbo River valley has favorable environmental conditions for the development of aeolian sand landforms, including a sand source, a wind driving force, and deposition fields (Li et al., 1999). In total, there were 273,697.54 ha of aeolian sandy land in the Yarlung Zangbo River basin in 2008(Shen et al., 2012. Desertification in Tibet is an urgent problem that must be addressed (Zou et al., 2002). Sophora moorcroftiana is a vital species for desertification control.
S. moorcroftiana (Benth.) Benth. ex Baker (Fabaceae), a diploid (2n =2x =18; (Wang, Liu & Xu, 1995), is acknowledged as a long-lived perennial shrub species endemic to the Qinghai-Tibet Plateau (QTP). This species exists in China Tibet as well as Bhutan, N Burma, and N India (Liu et al., 2006) and is mainly distributed in the middle and upper reaches of the dry valley region of the Yarlung Zangbo River in Tibet with altitudes ranging from 2,800 m to 4,400 m (Li et al., 2017). It plays an irreplaceable role in Tibet as important forage, and the seeds are used in China as a crude drug (Chang, 1977). Furthermore, the species is currently the preferred drought-resistant afforestation tree species due to its strong adaptability to sand burial in the plateau (Zhao, Zhang & Li, 2007), leading to its irreplaceable role in vegetation reestablishment programs aiming to stabilize shifting sand (Liu et al., 1998). Meanwhile, the species has received much attention due to its medicinal properties (Zhang et al., 2018a;Zhang et al., 2018b).
Another Sophora species in the QTP, S. davidii, which is an important leguminous shrub widely distributed in southeastern China (Zhao et al., 2016), is closely related to S. moorcroftiana (Wu, 1983) and is widely distributed from the southeast of the QTP to central China. Although these two species share many characteristics, such as being diploid (2n = 18), insect-pollinated, and gravity-dispersed via propagules, a hypothesis was proposed in which S. moorcroftiana diverged from S. davidii and speciated (Shen, 1996). Many studies have found large differences in the phenotypes of S. moorcroftiana at different altitudes in the Yarlung Zangbo River basin, but few studies focus on its genetics in different locations. Understanding the genetic evolution of S. moorcroftiana has important implications for germplasm conservation and vegetation reconstruction. Even though Liu et al. (2006) has reported the genetic variation in S. moorcroftiana examined using allozyme markers, existing research in this area is still rare. To explore the relationships between the distribution and expansion of S. moorcroftiana and geographical factors, we employed GBS to call SNPs of 15 S. moorcroftiana populations originated from different geographical locations and leveraged them for further analyses.
The specific goals of this study are (1) to determine the genetic characteristics within and among populations; (2) to examine the effects of altitude and longitude on the population genetic characteristics; and (3) to find historical, life history, and/or environmental factors that might explain the patterns and levels of observed genetic variation.

Plant material collection
To determine the correlations between the evolution of S. moorcroftiana populations and geographical factors, we utilized the following sampling design: (1) we selected sampling points along a river but separated by a mountain (Pop 8 and Pop 9, Lhasa River Basin) to verify the relationship between the population expansion and water flow; (2) we selected sampling sites from the highest (Pop 3, Shigatse) altitude to the lowest (Pop 14, Nyingchi) altitude and from the lowest longitude (Pop 1, Shigatse) to the maximum longitude (Pop 15,Nyingchi) where S. moorcroftiana exists to verify the impacts of altitude and longitude on the population genetic characteristics; and (3) we selected sampling sites on different  (Table 1), and the populations were numbered according to longitude. In addition, Sophora davidii (Franch.) (individuals 226-229) was used as the outgroup to prevent ascertainment bias. For DNA extraction, young leaf tissue was collected, immediately frozen in liquid nitrogen, and stored at −80 • C.

DNA extraction and GBS
The DNA of the young leaves of S. moorcroftiana and S. davidii individuals was extracted for GBS using a modified cetyltrimethylammonium bromide (CTAB) extraction method (Tel-Zur et al., 1999). We employed a two-enzyme (MseI and TaqaI) GBS protocol modified from a previously described protocol (Poland et al., 2012a;Poland et al., 2012b) to generate a library consisting of DNA fragments with a barcode and performed sequencing with an Illumina HiSeq PE 150. The average sequencing depth for each sample was 10×.

SNP data analysis
Quality control of the FASTQ-format raw data was performed with the FastQC application (Brown, Pirrung & Mccue, 2017) to ensure that there were no hidden problems. Adapter sequences and abnormal nucleotide bases at the 5 terminus were removed from the raw sequencing reads using FastQC 0.6.0. Additionally, the low  (Wright, 1951), the value of Nm can be estimated from F ST , as where N is the effective population size of each population and m is the migration rate between populations. This method is effective for estimating gene flow indirectly.

Phylogenetic analysis
In the phylogenetic analysis of the 229 samples (

Population structure analysis
Estimated ancestries, derived from multilocus genotype data, can be used to perform statistical correction for population stratification (Alexander, Novembre & Lange, 2009). We used ADMIXTURE, which adopts the likelihood model embedded in STRUCTURE and runs considerably faster than other tools to estimate ancestry, to identify possible subgroups. When K = 5, 6, 7 and 8, the cross validation errors (CV errors) were relatively low, with K = 7 minimizing the CV error (Fig. 3). Clustering information for the 225 S. moorcroftiana individuals when K = 2 to 9 is shown in Fig. 3B. Moving from K = 2 to 9, the large samples from the upper and middle reaches of the Yarlung Zangbo River were first distinguished from the lower and middle reaches, after which the samples from the middle reaches (Shannan, Lhasa) were revealed as distinct. When all the samples were clustered into four taxa, the genetic information of   the Niyang River. Subgroup V was located at high altitudes in the eastern sampling area (Nyingchi, Gongbu Jiangda county) and only contained Pop 13, which was from the upper reaches of the Niyang River. The analyses showed that Pop 13 was obviously separated and differentiated from the other populations. The clustering of populations by PCA roughly conformed to the geographic distribution of the populations.

Population genetic diversity
Relatively low levels of genetic diversity were found in the S. moorcroftiana populations ( Table 2)

Population genetic differentiation
F ST is a measure of genetic differentiation among populations, and studying the genetic structure of a population is essential to understanding its evolutionary properties (Whitlock & Mccauley, 1999). Wright (Wright, 1965) suggested that an F ST ranging from 0 to 0.05 indicates very little genetic differentiation between populations and is thus not worth considering; F ST ranging from 0.05 to 0.15 indicates moderate genetic differentiation between populations; F ST ranging from 0.15 to 0.25 indicates large genetic differentiation between populations; and F ST greater than 0.25 indicates strong genetic differentiation between populations. As shown in Table 3

Gene flow among populations
Mutation, genetic drift due to a finite population size, and natural selection favoring adaptations to local environmental conditions will all lead to the genetic differentiation of local populations, and the movement of gametes, individuals, and even entire populationscollectively called gene flow-will oppose that differentiation (Slatkin, 1987). Slatkin indicated that gene flow may either constrain evolution by preventing adaptation to local conditions or promote evolution by spreading new genes and combinations of genes throughout a species' range. One reason for estimating Nm is that this combination of parameters indicates the relative strengths of gene flow and genetic drift. Genetic drift will result in substantial local differentiation if Nm < 1 but not if Nm > 1 (Wright, 1951). From the F ST , we obtained the gene flow among the 15 populations (

Analysis of genetic diversity and Nm for the four main subgroups
According to the comprehensive results of the phylogenetic analysis, population structure analysis and PCA analysis, 15 natural populations could be clustered into five subgroups. In order to increase the sample size and verify the analysis results, we removed subgroup V (Pop 13), which has large genetic differences with all the other populations and analyzed the genetic diversity and Nm of the following four main subgroups: subgroup I) Pop 1-6, subgroup II) Pop 7 and 8, subgroup III) Pop 9-11, and subgroup IV) Pop 12, 14 and 15. Among all the subgroups, subgroup II had higher genetic diversity (Table 5) and had high levels of gene flow with subgroup I (Nm = 3.271) and subgroup III (Nm = 2.894) ( Table 6). Subgroup IV showed large genetic differentiation with the other subgroups (Nm < 1), especially with subgroup I and subgroup II. These results are consistent with our previous analysis.

DISCUSSION
A study on the arenaceous adaptability of S. moorcroftiana (Zhao, 1998) revealed that its optimal habitats were terraces covered with sand, semi-motive dunes and the initial stages of fixed sand dunes. This species first invaded and became established in motive dunes by seed reproduction, then became abundant by establishing roots and finally declined owing to interspecific competition. A study on the soil seed bank of S. moorcroftiana (Liu, Zhao & Li, 2002) showed that 70% of all seeds were distributed on the surface. Seed dispersal was driven by wind, gravity (slope) and water flow, and the dispersal by water flow was closely related to the landforms and the carrying capacity of flowing water. Seeds dispersed by gravity or wind might have been carried away again by water. The regions alongside the Yarlung Zangbo River are covered with mobile dunes, and the degree of desertification in the Yarlung Zangbo River basin is as follows: Shigatse > Shannan > Lhasa > Nyingchi (Xu, Li & Sun, 2006). In this study, there were high levels of gene flow between populations in Shigatse, which inhibited the genetic differentiation between these populations. Most individuals of S. moorcroftiana grow in desertified areas (Zou et al., 2002), and the population density of S. moorcroftiana increased with increasing altitude (Zhao, Zhang & Li, 2007). Therefore, large seed banks are driven by ubiquitous quicksand, which may lead to this phenomenon. The middle reaches of the Yarlung Zangbo River form the center of social and economic development in Tibet and are severely affected by aeolian sands. Pop 7 and Pop 8 in the Lhasa River Basin, located between subgroup I and subgroup III, had higher levels of genetic variation among all populations and had a close genetic relationship between the two subgroups. This expansion direction is contrary to the direction of seed dispersal by water and gravity. A study of aeolian sandy land in the areas around the Lhasa Airport (Li et al., 2010) revealed the trends of wind direction changes in the areas around the Lhasa Airport from 1980-2006 (Fig. 6), showing that west wind (W), east wind (E) and east-north 22.5 • wind (ENE) were frequent around Lhasa Airport. This pattern may explain why the genetic information of subgroup I and III was shared with subgroup II and why Pop 5 and Pop 6, at higher altitudes, shared genetic information with subgroup III. The research on genetic diversity of 10 populations of S. moorcroftiana near the Yarlung Zangbo River assessed by allozymes (Zhao et al., 2003) revealed that the population in the middle area (eastern Shigatse and western Shannan) harbored the majority of alleles and had high levels of genetic diversity, which is roughly consistent with our results. In the lower reaches of the Yarlung Zangbo River, Nyingchi has better climate and soil conditions and is rich in species and vegetation resources. S. moorcroftiana, which thrives on moving dunes, followed by semifixed dunes, had a declining trend after the sand was fixed (because other species colonized the fixed sand). In addition, there is a negative effect of human pressure and habitat fragmentation. These factors may have resulted in lower population density in Nyingchi and lower gene flow with the populations in other areas. We found significant differences in the flowering period between different regions, which might also have some effect on the gene exchange between different populations. Pop 13 from Gongbu Jiangda county in Nyingchi is located on the eastern slope of Mount Mira in the upper reaches of the Niyang River, which has a temperate and moist climate and high vegetation coverage and species diversity. The environment hinders the spread of seeds and thus causes far genetic distances between Pop 13 and the other populations, except for Pop 12, which shared little genetic information with Pop 13 (potentially due to gravity). In the same basin as Pop 13, Pop 15 is located in the downstream of the Niyang River and is distantly genetically related to upstream Pop 13. The genetic distance between Pop 15 and Pop 13 was greater than the average distance between Pop 13 and the other populations. This finding also indirectly proves the suggested limitations of seed dispersal by water (Liu, Zhao & Li, 2002). Pop 9 is located in the upstream of the Lhasa River on the western slope of Mount Mira. There are clear differences in habitats between the western and eastern slopes of Mount Mira. The western slope has a cold and dry climate, and desertification is severe there. Pop 9 demonstrated extensive gene flow with the populations in adjacent regions, such as Pop 8, Pop 7, and Pop 10. These showed the importance of seed dispersal by gravity, sand and wind.
To explain these results obtained from the GBS SNP dataset, we considered the biotope and overall geographical environment and formulated 3 points on how S. moorcroftiana evolved.
(1) S. moorcroftiana originated in high-altitude areas in the west and expanded to lower altitudes in the east via seed dispersal by gravity, wind, sand drifts and water flow.
(2) Due to the effects of wind direction, the greatest number of seeds was dispersed into the central region mainly from the east, west, east-northeast and west-southwest, which was one of the greatest factors leading to the high genetic variation in Pop 7 and Pop 8, which are located in the central region.
(3) The geographical environment, including the vegetation coverage and the degree of desertification degree, had a strong influence on the expansion of S. moorcroftiana. Pressure from humans may also have a great impact on the genetic characteristics of S. moorcroftiana.
Avise (Avise & Hamrick, 1996) pointed out a lack of concern for the genetic diversity of endemic species, and in recent years, the role of population genetics in plant conservation biology has received much attention (Liu et al., 2006;Shao et al., 2009;Rayamajhi & Sharma, 2018). Research on S. moorcroftiana has focused on its medicinal value (Su et al., 2017), alkaloid production (Ma et al., 2018), forage value, drought resistance (Li et al., 2015) and desertification control value (Zhao et al., 1998;Zhao, Zhang & Li, 2007). However, there have been few studies on S. moorcroftiana population genetics. The Tibetan flora is characterized as a young flora with high endemicity (Wu, 1987). A hypothesis was proposed in which S. moorcroftiana diverged from S. davidii and speciated (Wei & Shou, 1996). Wu hypothesized that the Tibetan flora originated mostly from the paleotropical Tertiary flora in the Hengduan Mountains by adapting to the cold and arid environments associated with the strong uplift of the QTP (Wu, 1987). The results of a study (Cheng et al., 2017) using the cpDNA and ITS of S. moorcroftiana and S. davidii to explore the relationship between the two species support these two hypotheses. This study of genetic diversity indicated that both total genetic diversity and within-population diversity in S. moorcroftiana are low, and suggested that S. moorcroftiana might have undergone serious bottleneck(s) and genetic drift, which might have been caused by the uplift of the QTP. In our study, the genetic diversity of S. moorcroftiana was also generally at a low level, and the relatively high level of genetic variation in the high-altitude areas, such as in all the Shigatse populations, may have resulted from S. moorcroftiana's adapting to harsh environments over a long period of time.
During the field investigation, we found abundant S. moorcroftiana seeds in the sand, and S. moorcroftiana has the reproductive characteristic of first invading and becoming established by seed reproduction and then becoming abundant via root turions. Seed banks have a direct effect on population dynamics (Harper, 1977), and 70% of the S. moorcroftiana seed bank in the middle reach of the Yarlung Zangbo River is distributed on the soil surface without litter (Reichman, 1984). The relatively closed habitat has a dense seed bank, showing a high seed yield of S. moorcroftiana seeds in the Yarlung Zangbo River (Liu, Zhao & Li, 2002). The seeds are dispersed by wind, gravity and water flow (Fei & Ling, 1995). The Yarlung Zangbo River originates in the southwestern part of Tibet and is located in the middle of the Himalayas. Due to the plateau's monsoons and subtropical westerly jet, the river valley exhibits a dry, cold and windy climate (Dong et al., 1997). The altitude gradually decreases from the northwest to the southeast, and the orientation of the river valley is nearly parallel to the wind direction. In the winter and spring, the westerly jets are faster (maximum instantaneous wind speed: 35.2 m/s) and more frequent (Dong, Li & Dong, 1999a). These geographic and climatic conditions, such as wind direction, slope caused by the terrain, and flow direction, result in a dynamic environment and may lead to the expansion of S. moorcroftiana from high-altitude areas in the west to lower-altitude areas in the east. This expansion is consistent with the possible propagation direction of S. moorcroftiana proposed by Liu et al. (2006), but the propagation mode was slightly different. Sand activity is the dominant factor affecting vegetation that grows in sand. In particular, the vegetation on semifixed and semimovable sand dunes is most seriously affected, and the species and coverage of vegetation can reflect the stability of sand sources. Vegetation succession proceeds based on sand activity (Chang et al., 2006), and strong winds are a major force affecting sand drift. When winds are strong, the movement of the sand dunes is very rapid (Liu & Zhao, 2001). The movement of seeds caused by sand drift and wind may have led to the high gene exchange flow of subgroup II with subgroup I and subgroup III, which are located in the central region. The resources with a high level of genetic diversity should be specifically protected and sustainably exploited to enable adaptation to and improvement of extreme environments. The location and environment of Gongbu Jiangda county may have caused the S. moorcroftiana individuals in this area to be relatively isolated, leading to a low level of gene flow and high levels of differentiation with other populations. Owing to the abundant rain and suitable temperature in Nyingchi, there is a high level of species diversity and a high ratio of vegetation coverage in this area. As a result of the low interspecific competitiveness of S. moorcroftiana, it is not the dominant species in Nyingchi. A lower population density and gene flow and human impacts may have led to the lower genetic diversity in Nyingchi. A low population genetic diversity is associated with a low ability to withstand threats (Spielman, Brook & Frankham, 2004). Thus, the protection of such populations should be strengthened.
The genetic characteristics and structure of S. moorcroftiana are related not only to climate and the geographical environment but also to insect-mediated pollen flow. Pollinator activity as well as dye or pollen dispersal are positively affected by plant population size, density or both (Rossum & Triest, 2010;Nattero et al., 2011), and most pollen is dispersed over short distances to the first few flowers visited and only occasionally over long ranges (leptokurtic distribution, (Thomson & Plowright, 1980;Holmquist, Mitchell & Karron, 2012). The habitat fragmentation caused by human activities such as overexploitation (Young, Boyle & Brown, 1996;Dong & Li, 1999b), grazing, cultivation and road construction also affects the genetic traits of S. moorcroftiana. Llorens, Ayre & Whelan (2018) indicated that anthropogenic fragmentation may not alter pre-existing patterns of genetic diversity and differentiation in perennial shrubs. Therefore, the main factors affecting the evolutionary history and genetic relationship of S. moorcroftiana are climate and the geographical environment. S. moorcroftiana is of great significance to Tibet and other desertified areas, and artificial planting and protection of this species should thus be promoted.

CONCLUSIONS
Through previous studies, we knew that S. moorcroftiana diverged from S. davidii and speciated in the uplift of the QTP. The driving forces of S. moorcroftiana seed dispersal were wind, gravity (slope) and water flow. Furthermore, the unique geographical environment of the QTP caused S. moorcroftiana to expand from the western high-altitude to the eastern low-altitude areas and from the south and north areas to the Yarlung Zangbo River bank. Our research found that there was a high-level of east-west differentiation in S. moorcroftiana populations in the Yarlung Zangbo River basin due to the different geographical environment. The aeolian desertification of the Yarlung Zangbo River Basin is severe, and seed dispersal driven by wind and sand flow is one of the main factors for the large gene flow between the central groups and eastern and western groups. Thus, seed dispersal caused by gravity, wind, sand flow and geographical distance are the main driving forces contributing to the diffusion pattern of S. moorcroftiana populations. Although S. moorcroftiana seeds expansion can be driven by water flow, water flow is limited because it is closely related to the landforms and the carrying capacity of flowing water. The barriers between mountains and the competition between S. moorcroftiana and other species are disadvantages for the expansion of S. moorcroftiana. Considering that S. moorcroftiana is of great value in controlling aeolian desertification in Tibet (especially for motive dune and semi-motive dune lands), we should spare no effort to protect and utilize genetic diversity and germplasm resources.