Genetic diversity and population structure of two subspecies of western honey bees (Apis mellifera L.) in the Republic of South Africa as revealed by microsatellite genotyping

Apis mellifera scutellata and Apis mellifera capensis, two native subspecies of western honey bees in the Republic of South Africa (RSA), are important to beekeepers in their native region because beekeepers use these bees for honey production and pollination purposes. Additionally, both bees are important invasive pests outside of their native ranges. Recently, whole mitogenome sequencing and single nucleotide polymorphisms were used to study their genetic diversity. To add to our knowledge of the molecular ecology of both bees, we tested the ability of microsatellites to be used as a tool to discriminate between A.m. capensis and A.m. scutellata. We analyzed the genetic variability and overall population structure of both bee subspecies and hybrids of the two by genotyping individuals collected from RSA (N = 813 bees from 75 apiaries) at 19 microsatellite DNA loci. Overall, populations averaged between 9.2 and 11.3 alleles per locus, with unbiased heterozygosity values ranging from 0.81 to 0.86 per population. Bayesian clustering analyses revealed two distinct evolutionary units, though the results did not match those of earlier morphometric and molecular analyses. This suggests that the microsatellites we tested were not sufficient for subspecies identification purposes, especially for Cape and hybrid bees. Nevertheless, the microsatellite data highlight the considerable genetic diversity within both populations and a larger-than-expected hybridization zone between the natural distributions of A.m. capensis and A.m. scutellata.


INTRODUCTION
Apis mellifera L. (western honey bees) are eusocial bees native to Eurasia and Africa. A. mellifera is one of nine honey bee (Apis spp.) species, with the other eight species being work (Eimanifar et al., 2018a(Eimanifar et al., , 2018b, highlight the high genetic diversity of the honey bee populations inhabiting the RSA.

Honey bee sampling and DNA extraction
In April/May 2013 and April 2014, a total of 60+ bees were collected from each of over 1,100 managed honey bee colonies located in 75 different apiaries grouped into 29 geographical regions (1-3 apiaries/region) across the A.m. scutellata and A.m. capensis distribution in RSA (Mortensen et al., 2016). A region was defined as a cluster of 1-3 apiaries that were located close to one another (i.e., ∼ 5-50 km between apiaries) or was otherwise isolated by itself in instances where a region was composed of only one apiary. The regions were named for the closest city, town, village, or other geographically defined, nearby area. Detailed information on the sampling regions and apiaries, the number of bees analyzed per apiary and the geographical coordinates of the sampling regions are shown in Table 1.
The bees were collected and stored by colony in a 50 ml tube containing ≥95% ethanol. The samples were transported (per USDA-APHIS regulations) to the Honey Bee Research and Extension Laboratory at the University of Florida where the identity of each colony (A.m. capensis, A.m. scutellata or hybrid) was determined using classical morphometrics conducted on at least ten bees from each colony (for procedure, Bustamante, Baiser & Ellis, in press). The results of the morphometric classification are shown in Table S1. Colony identity was cross-checked using a distribution map published by Hepburn & Radloff (1998). For later analysis, the sample tubes of bees were kept at −80 C.
For this study, 813 worker bees were sampled from 101 selected colonies across the 75 different apiaries, with 22-33 bees used per region (5-10 bees per colony). The DNA was extracted from the dissected thoraces of each individual bee using Wizard Ò Genomic DNA Purification kit (Promega, Durham, NC, USA) according to the manufacturer's instructions. DNA quality was assessed using a 1% agarose gel and quantified using Qubit Ò 3.0 Fluorometer based on manufacturer's guidelines (Thermo Scientific Inc., Waltham, MA, USA).

Microsatellite genotyping
Nineteen microsatellite loci were selected from previous studies and deemed appropriate candidates for the genotyping of the target bee populations (Shaibi, Lattorff & Moritz, 2008;Chen et al., 2005;Alburaki et al., 2013;Rowe et al., 1997;Solignac et al., 2003;Weinstock et al., 2006). The forward primers were labeled at the 5′ end with one of four fluorescent dyes (FAM, VIC, PET and NED; Applied Biosystems, Foster, CA, USA) and distributed into six multiplex PCR reactions. A list of labeled primers and associated information is included in Table S2. PCR reactions were performed in a thermal cycler (Eppendorf, Hamburg, Germany) in 20 µl volumes containing 10 µl of Master Mix Maxima Hot Start 2X Qiagen, 0.6 µl of each primer (10 pmol/µl) and one µl of DNA (>10 ng/µl). The PCR program started with a denaturation phase at 95 C for 5 min, Table 1 Summary information for honey bee samples collected in the Republic of South Africa. Apiary No = 75 total apiaries sampled. Sampling regions/apiaries = city/town closest to all apiaries in the region + apiary in/around that city (A-C apiaries). Apiary identifier = code generated from two-letter city/town abbreviation and apiary letter. N = number of honey bees examined in a given region. Geographical coordinates = GPS location of each apiary.

Apiary no.
Sampling regions/apiaries Apiary identifier N Geographical coordinates  (Catalog # 4311320; Life Technologies, Grand Island, NY, USA) was mixed with 10 µl of the GeneScan TM 600 LIZ size standard (Catalog # 1406056; Life Technologies, Grand Island, NY, USA). A volume of 10 µl of the solution was mixed with one µl of PCR product per sample and distributed in a 96 well plate. Fragment lengths were determined using a 3730 Ã 1 DNA Analyzer (Model 325-0020; Applied Biosystems, Foster City, CA, USA) and alleles were scored using GeneMarker version 2.4.0 (Applied Biosystems, Foster City, CA, USA) (Hulce et al., 2011). All electropherograms were checked manually and confirmed for further statistical analyses.

Statistical analyses
Population genetic analyses MICRO-CHECKER 2.2.3 was used to identify the likelihood of scoring errors due to null alleles, stutter bands and large allele dropout (Van Oosterhout et al., 2004).
We genotyped 10% of the samples twice to evaluate the rate of genotyping errors. Microsatellite diversity was evaluated by calculating the mean number of alleles per locus (Na), the number of effective alleles (Ne), observed (H obs ) and expected (H exp ) heterozygosities, unbiased expected heterozygosity (uHe), Shannon index (I) per sampling region, and the number of private alleles per region (NP) using GenAlEx 6.5 (Peakall & Smouse, 2012). The inbreeding coefficient (F IS ) was estimated based on Wrights F-statistics using GenAlEx 6.5 (Peakall & Smouse, 2012). Allelic richness (Ar) per sampling region was calculated with FSTAT 2.9.3 (Goudet, 2001) using the rarefaction method. Bee samples from BW, CD, PE, and WD were not included in the population diversity analyses because the microsatellites assigned them to bee groups that were different from those to which they were assigned using single nucleotide polymorphisms (SNPs) (Eimanifar et al., 2018b) and morphometrics (Bustamante, Baiser & Ellis, in press). Thus, their identities were questionable, making their assignment to a bee group for population analysis injudicious. The deviation of genetic markers from Hardy-Weinberg equilibrium was examined for each combination of locus and region based on the exact test following Markov chain parameters including dememorization = 5,000, batches = 5,000, iterations per batch = 1,000 (Guo & Thompson, 1992) as implemented in Genepop 4.2 (Raymond & Rousset, 1995). Overall population differentiation indices were calculated among regions (again, excluding bees from BW, CD, PE and WD) based on Wrights F-statistics index (F ST ) using GenAlEx 6.5 (Peakall & Smouse, 2012). We also calculated Hedrick's G' ST and Jost's D for each subspecies as genetic differentiation indicated by F ST may remain undetected for highly variable, multi-allelic markers such as microsatellites (Meirmans & Hedrick, 2011). G' ST is the original G ST as defined by Nei (1978) and standardized by the maximum value it can obtain (G ST (max) ) (Hedrick, 2005). Jost's D is calculated based on the effective number of alleles instead of heterozygosity, which is considered a more intuitive diversity estimate (Jost, 2008). All three pairwise genetic differentiation metrics were calculated and their significance was inferred based on 9,999 permutations in GenAlEx 6.5 (Peakall & Smouse, 2012). The overall genetic differentiation indices (F ST , G' ST and Jost's D) were compared statistically between subspecies using a one-way ANOVA test as implemented in SPSS Statistica (IBM Corp, 2013). A significance value of 95% for confidence intervals was applied to analyze the pairwise data set. The matrix of genetic distance was calculated based on Nei index among regions using GenAlEx 6.5 (Peakall & Smouse, 2012).
Partitioning of genetic variation between and within regions was determined by a hierarchical analysis of molecular variance in Arlequin v. 3.5 package (Excoffier & Lischer, 2010). The significance of these values was examined based on 9,999 permutations. The structure of defining regions was adjusted based on clustering patterns of honey bees distributions described by Hepburn & Radloff (1998).

Bayesian population structure analysis
Analyses of genetic clustering among regions were performed using STRUCTURE 2.3.4 (Pritchard, Stephens & Donnelly, 2000). An admixture model with correlated allele frequencies were assumed. We applied a number of clusters (K), varying from 1 to 10, with five independent runs per each K to estimate the most reliable number of genetic clusters (K) using the posterior probability (LnP (N/K)) (Falush, Stephens & Pritchard, 2007) and ad hoc quantity DK for each K partition. We conducted the analysis with no prior information about population identity with the following parameters: 100,000 pre-burn steps and 750,000 post-burn iterations of the MCMC algorithm for each run. Posterior probability changes with respect to K between different runs were assigned as a method for determining the true K value (Evanno, Regnaut & Goudet, 2005). The most likely value for K was estimated applying Evanno's ΔK method (Evanno, Regnaut & Goudet, 2005) using STRUCTURE HARVESTER (Earl & Von Holdt, 2012). We used individual Q matrices to visualize structure bar plots using STRUCTURE PLOT (Ramasamy et al., 2014).

RESULTS
The result from MICRO-CHECKER revealed no issues with scoring alleles due to stutters or allelic dropout in any of the 19 loci. We detected the occurrence of homozygote excess for three loci (UN351, HB-SEX-01 and HB-THE-03), likely indicating the occurrence of null alleles. Because of this, all datasets were rerun excluding these loci. In each case, the results were similar to those obtained using all 19 loci, so we included all loci in the analyses. In order to estimate population genetic indices, we divided all regions based on subspecies identity determined by Eimanifar et al. (2018b) and Bustamante, Baiser & Ellis (in press). The results of the population genetics indices are depicted in Table 2 for each region.
The mean number of alleles per locus was 10.01 ± 0.11 (mean ± s.e. here and hereafter) and varied from 5.8 (A24) to 13.44 (A107). The mean number of alleles for A.m. scutellata (six regions), A.m. capensis (11 regions) and hybrid populations (eight regions) was 10.23 ± 0.42, 9.94 ± 0.6 and 9.96 ± 0.8, respectively. The H exp value ranged from 0.72 at locus A24 to 0.87 at locus A14, while the H obs value ranged from 0.35 at locus HB-SEX-01 to 0.88 at locus AC6. The H exp value across all loci was highest in the KL region, with an average of 0.84. The lowest H exp values were found in the VR, CT, GE, LA and SF regions, with an average of 0.8, respectively. Observed heterozygosity ranged from 0.7 in the SW region to 0.8 in the RD region. Mean H exp and H obs values were 0.81 and 0.75, respectively. The highest and lowest unbiased expected heterozygosities were found in KL (0.86), and CT/GE (0.81), respectively. Multilocus values of F IS per region ranged from Table 2 Population genetic characteristics, determined using 19 microsatellite loci of honey bees sampled from 25 geographical regions in the Republic of South Africa (excluding bees from BW, CD, WD and PE-see text). Region abbreviations are explained in Table 1. Bee identity was determined using Eimanifar et al. (2018aEimanifar et al. ( , 2018b and (Bustamante, Baiser & Ellis, in press). Abbreviations: N a , mean number of observed allele per locus; N e , mean number of effective population size; Ar, mean allelic richness; H obs , observed heterozygosity; H exp , expected heterozygosity; uHe, mean unbiased estimate of expected heterozygosity; I, Shannon index; F IS ; Fixation index and N P (%), percentage of mean number of private alleles per region. The standard deviation is indicated in parentheses. The data are the mean with s.e. below in parentheses (Mean (s.e.)).

Pop.
Na 0.03 (PT and RD regions) to 0.14 (SW region), and the overall value was significantly positive 0.08 ± 0.008. Allelic richness among regions of A.m. scutellata and capensis varied from 7.84 (CT region) to 9.3 (OD region). In hybrid regions, it varied from 7.59 (SF region) to 9.1 (KL region) ( Table 2; Fig. 1). Hardy-Weinberg equilibrium tests were performed for each region and each locus. No regions or loci deviated from Hardy-Weinberg equilibrium (P < 0.05). The percentage of the mean number of private alleles varied from 0 (KR, BD, CT, LB, LA, OD, and GR regions) to 26.3% (PT region). The percent mean number of private alleles in A.m. scutellata, A.m. capensis, and hybrid bees was 11.4%, 4.79% and 11.2% respectively ( Table 2). The highest and the lowest value of genetic distance was observed between UP-RD and MF-ST (0.68), and BD-RD (0.30), respectively (Table S3). A.m. capensis had higher F ST , G' ST and Jost's D values than did A.m. scutellata and hybrid bees but there was no significant difference observed between them (Table 3).
The AMOVA results indicated that most of the genetic variation (78.97%) was attributed within populations (i.e., between bees within a region) with only 5.05% of the variation occurring between the 25 regional populations (Table 4). In the STRUCTURE analysis, the most likely number of clusters across regions was estimated based on likelihood and Delta K scores across regions. Two genetically heterogeneous clusters were detected by STRUCUTRE as the value of Delta K was greatest at K = 2. The average value of posterior probabilities and Delta K for each K are shown in Table 5. The results suggest that the sampled honey bees belong to two large genetic groups at the subspecies level, with evidence of admixture patterns across the regions of the two defined populations (Fig. 2). The regions were assigned with unequal probability to each genetic group (Fig. 3). When assigning all genotypes at the subspecies level for morphometrically   Table 1). The pie charts represent the composition of the two genetic clusters for each geographical region shown as blue (more A.m. scutellata) and orange (more A.m. capensis). The colors indicate the different proportion of allele frequencies assigned to bees from each region.  Table 1. Cluster analysis of subspecies are shown at K = 2, without prior information of population identity. The different regions are depicted on the X-axis and were defined using SNPs (Eimanifar et al., 2018b). Each honey bee is represented by a bar that is, segmented into two colors based on the assignment into inferred clusters given the assumption of K populations. The length of the colored segment is the estimated proportion of alleles of the individuals belonging to that cluster. The analysis was performed in five replicates for each K so the replicate with the highest likelihood is shown.  (Fig. 3). The microsatellite genotypes generated for all honey bee's analyzed are provided in Table S4.

DISCUSSION
In the present study, we present information resulting from microsatellite genotyping to evaluate the detailed genetic diversity and population structure of two subspecies (A.m. scutellata and A.m. capensis) and their hybrids from 29 regions in the RSA. Microsatellite loci have been widely used to characterize the structure of honey bee populations across different geographical regions since they have a high degree of sensitivity and provide a reasonable level of polymorphism (Franck et al., 2001;Loucif-Ayad et al., 2015;Techer et al., 2015).

Genetic diversity
In an analysis of 19 microsatellite loci, we detected a high level of genetic variability based on the mean number of alleles observed per microsatellite locus and heterozygosity within and among examined honey bee regions. The high level of genetic diversity observed for all honey bee groups was consistent with that shown in previous studies (Estoup et al., 1995;Franck et al., 1998;Jaffe et al., 2010;Eimanifar et al., 2018aEimanifar et al., , 2018b. This may be true in RSA given the high densities of wild honey bee colonies in certain areas of the country (Vaudo et al., 2012a(Vaudo et al., , 2012b. The microsatellites we used tended to classify bees from certain regions as hybrids, more so than did SNPs, which tended to separate the subspecies more predictably (Eimanifar et al., 2018b). In fact, both the A.m. capensis and A.m. scutellata regions in RSA were smaller when defined by microsatellites than they were when defined using SNPs or whole mitogenomes (Eimanifar et al., 2018a(Eimanifar et al., , 2018b. Consequently, we feel the microsatellites possibly overestimated the size of the hybrid zone. In contrast, it is possible that the microsatellites provided evidence of greater hybridization between the two subspecies than first thought. As expected, honey bees in the hybrid region in RSA contain a gene pool derived from hybridization of A.m. scutellata and A.m. capensis bees. The hybrid bees in our study had many polymorphic alleles that likely originated from recombination of alleles of the two subspecies (Alda & Doadrio, 2014). We suspect that the occurrence of new alleles in the hybrid bees could be a result of hybridization of the two subspecies, as is known to happen for populations in hybrid zones in general (Arnold et al., 1999).
The mean number of alleles observed across all loci (Na = 10.01) and the mean value of observed heterozygosity (H obs = 0.75) were greater for bee populations in our study (N = 716) than those reported in the literature for Algeria (N = 414, Na = 9.5, H obs = 0.69) (Loucif-Ayad et al., 2015), Croatia (N = 225, Na = 9.25, H obs = 0.67) (Muñoz et al., 2009) and on Rodrigues Island (N = 524, Na = 7.63, H obs = 0.64) (Techer et al., 2016). The greater mean number of alleles/locus in our study than in others likely resulted from our large sample size and generally large populations of the two subspecies we examined (Loucif-Ayad et al., 2015). African honey bees exhibit several exceptional characteristics, including pronounced migratory behavior (absconding and swarming), which possibly could explain their high level of genetic diversity with low levels of differentiation (Franck et al., 1998;Hepburn & Radloff, 1998).

Population genetic structure
The set of microsatellite markers we used yielded a distinctive population structure within the examined groups, though the microsatellite results did not align completely with those of the morphometric (Bustamante, Baiser & Ellis, in press) and other molecular results (Eimanifar et al., 2018b). Nevertheless, our STRUCTURE results support the presence of some genetic structure at the subspecies level, reinforcing other work showing that there are two different gene pools existing in A.m. capensis and A.m. scutellata honey bees from the RSA (Eimanifar et al., 2018a(Eimanifar et al., , 2018b. We found additional evidence of the allelic introgression of A.m. capensis from their native range into that of A.m. scutellata regions (Neumann & Moritz, 2002;Moritz et al., 2008). These results highlight the decades-old, continuing impact of the "capensis calamity" on A.m. scutellata colonies in the RSA. Our STRUCUTRE results (Fig. 3) suggest that the A.m. capensis/scutellata hybrid zone has moved north and east from that originally delineated by Hepburn & Radloff (1998) over two decades ago.
In conclusion, honey bees in the RSA are composed of two genetically different populations, both with significant genetic structure. Given our large sample size and analytical tools, we show two distinct evolutionary units, though the results did not match those of earlier morphometric and molecular analyses (Eimanifar et al., 2018b;Bustamante, Baiser & Ellis, in press), suggesting that the microsatellites we tested were not sufficient for subspecies identification purposes, especially for Cape and hybrid bees. Nevertheless, our data highlight the considerable genetic diversity within both populations, a possibly larger-than-expected hybridization zone between the natural distributions of A.m. capensis and A.m. scutellata, and evidence of the expansion of A.m. capensis into A.m. scutellata regions. The genetic introgression of A.m. capensis into A.m. scutellata territory continues from the days of the "capensis calamity" and highlights the invasive potential of A.m. capensis. Our data suggest that the hybrid zone has expanded in the RSA beyond that shown using morphological data (Hepburn & Radloff, 1998;Bustamante, Baiser & Ellis, in press), possibly owing to migratory beekeeping activities in the RSA.