Domestication of horses has always attracted scientific interests as the animals played an important role in the civilization of humans in ancient times. Archaeological evidence indicated that the earliest domestication occurred in the Eurasian Steppe about 5,800 years ago (Outram & Stear, 2009). However, detailed information about the whole domestication process, including the subsequent domestication of horses, and the genetic origins of modern horses are still largely debated and need to be illustrated (Vilà & Leonard, 2001; Jansen & Forster, 2002; McGahern & Bower, 2006). Attributed to rapid molecular evolution and maternal inheritance, mitochondrial DNA (mtDNA) has been widely employed in phylogenetic studies of domestic horses, which revealed some valuable clues about the origins of modern horses (Lippold & Matzke, 2011; Achilli & Olivieri, 2012).
Due to the high number of polymorphic sites in the hypervariable regions of mtDNA, the first hypervariable region (HVR-1) was regarded as a useful tool for phylogenetic and phylogeographic studies. Vilà & Leonard (2001) constructed a neighbor-joining (NJ) tree using a total of 191 contemporary horses sampled across the world, in addition with eight ancient horse samples. The phylogenetic analyses uncovered at least six genetic clusters. Based on the results of Vilá et al., Jansen et al. studied the HVR-1 sequences of 318 horses from 25 breeds, and found 17 haplotype groups (Jansen & Forster, 2002). Subsequently, McGahern & Bower (2006) identified two new haplotypes in groups A and F, and expanded the number of HVR-1 haplotypes to 19. To obtain more accurate results of phylogenetic analyses, whole genomes of horse mitochondria have been used in the evolutionary studies of horses in recent years. In addition, Lippold & Matzke (2011) investigated the whole mitochondrial genomes of 59 domestic horses from 44 breeds and a Przewalskii horse (Equus przewalskii: a wild sub-species of horse). The result uncovered at least 46 mtDNA lineages in domestic horses, of which 73% already existed prior to horse domestication (Lippold & Matzke, 2011). Achilli & Olivieri (2012) analyzed 83 mitochondrial genomes of modern horses across Asia, Europe, and America. The results revealed 18 major haplogroups (A–R) and a more accurate haplotype divergence time. The aforementioned studies supported the hypothesis of multiple origins of domestic horse maternal lineages. However, some studies (e.g., Lei & Su, 2009; Yang & Zhu, 2017) proposed that certain horse lineages may have origins outside of the Eurasian Steppe. This suggests that subsequent domestication occurred after the first domestication event, which is supported by the fact that modern horses possess rich maternal lineages, referring to multiple origins of maternal lineages of the domesticated horses. If there was only one origin of domesticated horses, the diversity of mtDNA from the domesticated horses would be rather low. Contrary to the high diversity of mtDNA, extremely low diversity of DNA sequences of horse Y chromosome was detected in modern horses (Lindgren & Backström, 2004; Ling & Ma, 2010; Wallner & Piumi, 2004; Wallner & Vogl, 2013). Very limited horse paternal lineages can be attributed to extensive selection imposed on male horses after the domestication as a previous study of ancient DNA revealed abundant Y chromosome diversity in wild horses, which suggested that both male and female horse populations harbored rich genetic diversity before domestication (Lippold & Knapp, 2011).
Most previous phylogenetic studies on horses focused on horses from western countries. Although there are more than 5 million indigenous horses in China, the investigation of the origins of the local horses is still insufficient yet. Some studies indicated that the F haplogroup of horse HVR-1 may have originated in East Asia (Lei & Su, 2009; Cai & Tang, 2009). Recently, Yang et al. conducted a phylogenetic study on Chinese indigenous horses with HVR-1 sequences and found two new haplogroups, H and I. This study leads to increase in the number of the HVR-1 haplogroups of horse to 21, and yet, suggests East Asia origin for the two new haplogroups (Yang & Zhu, 2017). However, all of the above studies were confined to the HVR-1 of mtDNA.
The short mtDNA segment of HVR-1 is often accompanied by high levels of recurrent mutations, which blur the structure of the phylogenetic tree and render the distinction between some important branches within the tree virtually impossible (Achilli & Bonfiglio, 2009; Taberlet & Valentini, 2008; Torroni & Achilli, 2006). To address this issue as well as to improve the resolution of the matrilineal genealogical analyses, we combined the whole mtDNA genomes and HVR-1 sequences. Using this approach, we established the corresponding relationship between the haplogroups of mitochondrial genome and HVR-1 sequences and further analyzed the geographic distribution of the maternal lineages of the modern horses. Our analysis shows that East Asia was a genetic pool that may have provided the maternal lineages for domestic horses.
Materials and Methods
Samples and DNA extraction
Blood samples of 31 horses were collected from 14 Chinese domestic breeds across northern, northwestern and southwestern China (Table S1). The total genomic DNA was extracted using the phenol/chloroform extraction method. The total genomic DNA samples were further diluted to 50 ng/µl for the polymerase chain reaction (PCR) and DNA sequencing. All experimental procedures and protocols for sample collections were approved by the State Key Laboratory for Agro-biotechnology of CAU (Permit Number: XK257).
PCR amplification and mitochondrial DNA sequences
To amplify the whole mitochondrial DNA sequences of the Chinese horses, primers were designed according to the published sequence X79547 (Xu & Arnason, 1994) (Table S2). All of PCR amplifications were conducted in a 20 µl volume containing 10 µl MasterMix (TIANGEN, Beijing, China), 8 µl double distilled water (ddH20), 0.5 µl each primer (20 pmol/µl), and approximately 50 ng DNA. The PCRs were carried out as follows: 10 min of initial denaturation at 95 °C, followed by 30 cycles of 30 s at 95 °C, 60 s at annealing temperature (Table S2), and 30 s at 72 °C, with a final extension of 10 min at 72 °C. PCR products were directly sequenced using ABI 3730xl Genetic analyzer (Applied Biosystems, Foster City, CA, USA) by BGI-Tech Company (Beijing, China). In cases where samples were not amplified in the first attempt, re-amplification was performed. Sequences were deposited in the GenBank and accession numbers were listed in Table S1 .
We evaluated 31 de novo mitochondrial genomes with 317 others from the GenBank plus 16 sequences belonging to 16 Przewalskii horses (Equus przewalskii). The related information of the whole mtDNA sequences, including the geographical distribution and breed/type (where applicable), was provided in Table S3.
The HVR-1 sequences (5,180 sequences) in our study include: (1) 31 HVR-1 sequences isolated from the Chinese indigenous horses sequenced in this study (Table S1); (2) 317 HVR-1 sequences truncated from the whole mtDNA sequences obtained from Genbank (Table S4); and (3) 4,832 sequences retrieved from GenBank (Table S5).
Sequencing data analysis
By using MitoToolPy (http://www.mitotool.org/mp.html) (Peng & Fan, 2015), we scored the variants for each sequence (whole mtDNA genome sequences) relative to the reference sequence JN398377 (Achilli & Olivieri, 2012). We assigned the mtDNA sequences into specific haplogroups according to DomeTree (http://www.dometree.org/trees/horse.htm) (Peng & Fan, 2015). We followed the caveats for quality control in mtDNA dataset of domestic animals (Shi & Fan, 2014).
To depict the phylogeographic patterns for whole mtDNA sequence, we assigned each of the 348 whole mtDNA sequences into haplogroups in context of updated haplogroup tree according to DomeTree (http://www.dometree.org/trees/horse.htm) (Peng & Fan, 2015). This tree encompasses 348 sequences and was rooted with donkey (Equus asinus) sequence.
NETWORK 5.0 (http://www.fluxus-engineering.com) was used for calculation of the median joining network for HVR-1 sequences. Calculations were performed under the following conditions: (1) default setting of Epsilon (0) was chosen; (2) three mutational hotspot sites 15585,15597 and 15650 were set as zero, and two mutational hotspot sites 15659,15737 were downweighted (weight :5) (Jansen & Forster, 2002; Cieslak & Pruvost, 2010); (3) the “>1 frequency” setting in MJ calculation was activated and MP Calculation option was conducted to remove excessive links and median vectors; and (4) other parameter settings were maintained as defaults.
Haplotype partitioning was quantified by DnaSP v5.0 software (Librado & Rozas, 2009). To verify the results of Network, we also assigned each of the haplotypes of HVR-1 sequences into haplogroups in context of updated mitogenome haplogroup tree according to DomeTree (http://www.dometree.org/trees/horse.htm) (Peng & Fan, 2015).
Using 348 mtDNA genomes, we inferred the time of the most recent common ancestor of the haplogroups as employed in BEAST v1.8 (Drummond & Rambaut, 2007; Drummond & Suchard, 2012). We used GTR+I+G substitution model selected by jModelTest 2.1.4 (Darriba & Taboada, 2012; Guindon & Gascuel, 2003). The mutation rate of 7.39 × 10−2 mutations/site/Mya (95% confidence interval 2.49 × 10−2–1.60 × 10−1) (Lippold & Matzke, 2011) was used under both strict and relaxed clock models (Drummond & Rambaut, 2005). Each Markov chain Monte Carlo was run for 60,000,000 generations and the first 10% of each run was discarded as burn-in, and sampled every 1,000 generations. We confirmed convergence with Effective Sample Size values >300 in Tracer v1.6 (Drummond & Rambaut, 2007). We then used TreeAnnotator v1.6.1 implemented in the BEAST for the phylogenetic tree summaries and viewed the final tree with attributes such as posterior, node age, and height median in FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).
Haplogroup tree and network
The analysis of whole mtDNA sequences revealed 16 clades (Aw-Rw) according to DomeTree (http://www.dometree.org/trees/horse.htm) (Peng & Fan, 2015) (Fig. 1). The system of nomenclature of those haplogroups was in accordance with Achilli & Olivieri (2012). The result showed that horses from the same breeds and/or regions were widely distributed within the clades. Further, the resulting matrilineal genealogy does not show evidence of strong concordance with breeds and/or regions. This therefore may support the hypothesis that horse may have multiple origins of matrilineal inheritance.
The alignment of the 5,180 HVR-1 sequences produced 397 haplotypes. We identified three strong mutational hotspots which have been previously described at nucleotide positions 15585, 15597, and 15650 (Vilà & Leonard, 2001; Jansen & Forster, 2002). These positions were not included in our phylogenetic analyses. In addition, two mutational hotspots 15659, 15737 were individually down-weighted (weight 5). Once these mutational hotspots were eliminated from the dataset, we divided the sequences into nine haplogroups from A to I based on the structure of the network (Fig. 2), of which haplogroups H and I were identified in our previous study (Yang & Zhu, 2017), and the four clusters B3, D5, D6, and G1 were newly observed in the present study.
Based on the result of Network we found a small proportion of HVR-1 haplogroups, which were out of the scope of A-I haplogroups. For completeness those we assigned each of the haplotypes of HVR-1 sequences into haplogroups in context of updated mitogenome haplogroup tree according to DomeTree (http://www.dometree.org/trees/horse.htm) (Peng & Fan, 2015), results were showed in Fig. 3 and Table S5.
Corresponding relationships between haplogroups of mtDNA HVR-1 and whole genomes
To establish the relationships between the haplotypes defined from HVR-1 and whole mtDNA sequences, we retrieved the HVR-1 sequences from the 348 whole mitochondrial genomes and conducted phylogenetic analysis (Table S4). A clear corresponding relationship was revealed when comparing the haplotypes of HVR-1 and mitochondrial genomes of all the samples in Table 1(except three Przewalskii horse and one ancient Yakut horse). A haplogroup of the genome corresponded to one or more of its HVR-1 sequence counterparts. The haplogroup A is the clade containing the largest number of haplotypes defined on the basis of HVR-1, but its sub-haplogroups clustered into 6 haplogroups of mitochondrial genomes. Lw contained three sub-haplogroups of HVR-1 and is the largest haplogroup containing most haplotypes of the mitochondrial genome.
haplogroup of whole mtDNA sequence
haplogroup of HVR-1 sequence
Geographic distributions of horse mtDNA haplogroups
The analysis of geographic distribution of the mitochondrial genome haplogroups showed that horse populations in Europe or East Asia included all haplogroups defined from the mtDNA genome sequences. The lineage Fw comprised entirely of Przewalskii horses. The two haplogroups Iw and Lw displayed frequency peaks in Europe (14.08% and 37.32%, respectively) and a decline to the east (9.33% and 8.00% in the West Asia, and 6.45% and 12.90% in East Asia, respectively), especially for Lw, which contained the largest number of European horses (Table 2). However, an opposite distribution pattern was observed for haplogroups Aw, Hw, Jw, and Rw, which were harbored by more horses from East Asia than those from other regions. The proportions of horses from East Asia for the four haplogroups were 38%, 88%, 62%, and 54%, respectively.
|Clade of mtDNA genome||East Asia||West Asiaa||Europe|
Considering the large amount of HVR-1 sequence data of horse mtDNA currently available in the literatures or deposited in GenBank, we also analyzed the geographic distributions of the HVR-1 haplogroups (Table 3). We further confirmed our geographic distributions of the whole mtDNA haplogroups by analyzing the known mitochondrial HVR-1 sequences, using the corresponding relationships between the two sets of haplogroups (Table 1). The results showed that haplogroups D1, D2, and D3, which correspond to haplotype Lw of the whole mitochondrial sequences, displayed the highest frequencies in European horses, while I(corresponding to Jw), G1 (corresponding to Rw) showed frequency peaks in East Asia and a decline from Asia to Europe (Table 3). However, the patterns of geographic distribution of Aw and Hw were not supported by HVR-1 data. It should be noted that Cw contains H and majority of haplotypes in the HVR-1 haplogroup A6, which had a frequency peak in the West Asian regions. It is in accordance with geographic distribution of haplogroup A6, which is the most ancient haplogroup defined by HVR-1 sequences and contains the highest proportion of ancient horse sequences. However, haplogroup H has the highest frequency in the horse populations of East Asia.
|HVR-1 haplotype||East Asia||West Asiaa||Europe||Ancient|
Divergence times of haplogroups Cw, Jw, and Rw
The results showed that modern horses shared a common ancestor around 99,800 years ago. The phylogenetic tree for the estimation of the divergence time had 95% credibility intervals for the main branches. The resulting tree topology (Fig. 4) was very similar to the results obtained with the haplogroup tree of mtDNA genome sequences. Almost half of the haplotypes formed before the widely regarded time of the earliest horse domestication (approximately 5,800 years ago) (Outram & Stear, 2009).
From the phylogenetic analysis of both mitochondrial HVR-1 and genome sequences, haplogroups Jw, Rw, and some maternal lineage of Cw (H haplogroup defined by HVR-1) showed high frequency in East Asia. To further investigate their possible origins, we determined their divergence time. The age of haplogroup Cw was estimated to be 7.8 thousand years (KYA; 95% highest posterior density (HPD): 1,000–18,800 years), while the age of haplogroup Jw was 13.1 thousand years (KYA; 95% highest posterior density (HPD): 1,700–32,600 years) and haplogroup Rw has an age more than 6.7 thousand years (KYA; 95% highest posterior density (HPD): 900–16,600 years)(Table S6).
Polymorphism in mtDNA sequences have been widely used in the study of evolution due to its highly variable sites, multiple copy numbers, as well as ease of obtaining mtDNA from various samples. Mitochondrial HVR-1 sequences have been used in most studies on horse phylogenetics in the past, which revealed a high level of diversity harbored by maternal lineages of domestic horses that were contrary to the extremely low polymorphisms detected in the horse Y chromosome (Ling & Ma, 2010; Lindgren & Backström, 2004). This suggested that there were multi-origins of horse maternal lineages (Vilà & Leonard, 2001; Cai & Tang, 2009). These studies also showed the geographic patterns of some haplotypes of the HVR-1 sequences, which indicated that the events of horse domestication or the introgression of wild horses occurred in several places across the Asia-Europe continents (Lei & Su, 2009; Yang & Zhu, 2017).
However, the short sequence of the HVR-1 and relatively limited number of phylogenetically informative sites lead to low statistical support for most nodes of the phylogenetic tree of the domestic horses, which remains unresolved. To overcome this, whole mtDNA sequences were used in horse maternal evolutionary studies in recent years. This has resulted in high phylogenetic resolution and estimation of precise dates of divergence (Lippold & Matzke, 2011; Achilli & Olivieri, 2012). Our phylogenetic analysis of horse mitochondrial genomes revealed 18 haplogroups. We also named the haplogroups according to the study of Achilli & Olivieri (2012).
Studies (e.g., Lei & Su, 2009; Ding & Oskarsson, 2011) have indicated that the frequencies of certain haplotypes demonstrated a gradient in the geographical distribution and that areas with the highest frequencies may be related to the original domestication of the lineages. Previous analysis of the geographic distribution of the mitochondrial genome haplogroups showed that Lw had a frequency peak in Europe, suggesting that this haplogroup may have origins in Europe (Lippold & Matzke, 2011; Achilli & Olivieri, 2012). Our results also indicated that Lw was the most frequent haplogroup in European horses, which supported the previous hypothesis of the European origins of Lw. Furthermore, we found that haplogroups Jw and Rw had frequency summits in the horses from East Asia. In our study, the three HVR-1 haplogroups D1, D2, and D3, corresponding to Lw, which were closely clustered together (Fig. 2), showed a decline of frequency from Europe to Asia, which were in accordance with the results from previous studies on mitochondrial genome and HVR-1 sequences (Achilli & Olivieri, 2012; Lei & Su, 2009; Cai & Tang, 2009), while the HVR-1 haplogroup I (corresponding to Jw), and G1 (corresponding to Rw), and H (represented by Cw), all showed a frequency peak in East Asia and declining frequencies to the west. Most horses that harbored the HVR-1 haplotypes H and I, which were represented by Cw and Jw, respectively, were indigenous horses of the South of China, which agrees with the results of our previous study (Yang & Zhu, 2017).
The analysis of divergence time showed that more than half of the maternal lineages formed before horse domestication, which supported the previous hypothesis of multi-origins of modern horses. Many haplotypes in Cw, Jw, and Rw also had divergence times before the domestication of wild horses (about 5,800 years ago), and most of those lineages belong to East Asia, especially distributing in south of China, the mountainous areas in the south of Yangtze River (Table S3), which may have served as places of refuge for wild horses during the Last Glacial Maximum (25–19.5 KYA) and the Younger Dryas (12.7–11.5 KYA) (Clutton-brock, 1990; Macfadden, 1995). These horses may have been domesticated later and contributed to the genetic pools of domestic horse maternal lineages.
Our study supports the hypothesis that the multiple origins of the maternal lineages of domestic horses and some maternal lineages of domestic horses may have originated from East Asia.