In situ parapatric divergence across an environmental gradient is expected to result in a combination of non-concordant genetic clines among loci under selection (and loci linked to them), due to different selection coefficients on these loci, and a majority of “neutral” loci structured by a balance between genetic drift and gene flow (Coyne & Orr, 2004). Because environmental adaptation is expected to act on few loci only, genetic divergence across the genome should create “islands” of divergence rather than genome-wide divergence patterns in the presence of gene flow. However, recent genomic studies of incipient species with gene flow suggest the latter (Michel et al., 2010; Renuat et al., 2013), in contradiction to expectations.
Urophora cardui L. (Diptera: Tephritidae) is a specialist gall maker associated with creeping thistle Cirsium arvense (L.) Scop (Asteraceae). Initially attracting attention as a biological control agent of C. arvense in Canada (Zwölfer, Englert & Pattullo, 1970; Peschken & Harris, 1975), U. cardui has since been studied for a range of topics in its native Western Palearctic and introduced Canadian distribution ranges: biogeography, evolution of galls, interactions with parasitoids and for pest control (e.g., Peschken & Harris, 1975; Zwölfer, 1979; Eber & Brandl, 1997; Johannesen, Drüeke & Seitz, 2010; De Clerck-Floate & Cárcamo, 2011).
The local population structure of U. cardui in the native range is characterised by meta-population dynamics and isolation-by-distance gene flow (Zwölfer, 1979; Seitz & Komma, 1984; Eber & Brandl, 1994; Eber & Brandl, 1996; Johannesen & Seitz, 2003). However, Steinmetz, Johannesen & Seitz (2004) reported a narrow (ca. 70 km), linear genetic transition zone across contiguously distributed populations on the Jutland (Cimbrian) peninsula where genetic variance between populations on each side of the transition highly exceeded that explained by recurrent extinction-colonisation processes. The study, based on allozyme loci, identified non-concordant clines at three loci and no clines at another four. Johannesen, Drüeke & Seitz (2010) reported dominance of the same mtDNA haplotype on both sides of the transition zone that belonged to a Western European lineage experiencing population expansion. This mix of allelic distributions across the transition area suggested an in situ origin of the parapatric divergence rather than one caused by secondary contact of two allopatrically evolved populations, i.e., by vicariance. However, a precise estimate of the relative number of loci exhibiting clines as well as modes of allele frequency shifts was prohibited due to low levels of polymorphism.
Here, we report the development of 12 highly polymorphic microsatellite loci and a genotyping-by-sequencing (GBS) protocol for SNP identification with the aim of reaching first insights into genome-wide vs. island diversification among parapatric U. cardui populations, specifically, and for the study of population genetic patterns in native and introduced populations of U. cardui in general. The two marker systems are characterised by different levels of per-locus polymorphism and they differ in locus-specific recovery (amplification) per individual. Microsatellites have great applicability due to locus-specific amplification per individual, high allelic variation per locus and Mendelian inheritance. In comparison, GBS provides high-density, genome-wide (mostly) bi-allelic polymorphisms, but the recovery of specific loci (locus amplification success) may be uneven among individuals. The markers may, alone or in combination, offer helpful tools relative to the specific scientific enquiry and/or sampling strategy (Hodel et al., 2016). We quantify genetic diversity and differentiation among one population north, within and south of the transition zone, respectively, and we assess the information value of the markers systems relative to allozyme genetic divergence observed 14 years previously (Steinmetz, Johannesen & Seitz, 2004).
Material and Methods
Microsatellite development was based on an Illumina-Miseq DNA library generated from one male caught in the centre of the transition zone, Østre Løgum (55.133°N, 9.351°E). Genomic DNA was extracted with the Roche template preparation kit (Roche Diagnostics, Mannheim, Germany). 100 ng of genomic DNA was sheared, end repaired, A-tailed and ligated to TruSeq adapters. The library was amplified in 8 PCR cycles. The library was size selected for a mean of 650 bp, which corresponds to 530 bp internal sequence. The library was sequenced for 300 bp in “paired-end” modul in one Illumina-Miseq run. In total, 53.5 million paired-end reads (16.2 GB) were generated (NCBI BioProject: PRJNA352663; SRA: SRR5023787). Overlapping paired end reads were merged with FLASH v. 1.2.6 (Magoc & Salzberg, 2011), and the sequences were searched for microsatellite repeat motifs. Initially, 117 loci with repeat motifs were identified. PCR amplification for the 117 Loci was performed in 25 µl: 1 µl DNA (15 ng), 1 µl forward and 1 µl reverse primer (10 pmol/µl), 2 µl dNTPs (2.5 mM), 5 µl 5× PR-Buffer, 2 µl MgCl (25 mM), 0.1 µl GoTaq-Hotstart-polymerase (1.25 units) (Promega, Fitchburg, WI, USA), 12.9 µl H2O using Biometra thermocycler T1 (Biometra, Göttingen, Germany): 3 min at 95 °C, 34 cycles of 30 s at 95 °C, 30 s at 55 °C, 30 sec at 72 °C, final extension of 5 min at 72 °C. 53 loci produced distinct PCR products. Variability in the 53 loci was assayed by comparing PCR products from a mix of four individuals with the PCR product of the original individual. The PCR products were separated electrophoretically using the QIAxcel-System (Qiagen, Venlo, The Netherlands). Twelve loci with hexa-, penta- and tetra-nucleotide repeats were optimised for analyses (Genbank accession numbers KT923909 –KT923920) (Table 1). Alleles with such repeat units are easier to score (bin) accurately than alleles with dinucleotide repeats. The 12 loci were amplified in three QIAGEN Multiplex PCR reactions with fluorescent-labelled primer pairs (primer mix 1: Uc03, Uc05, Uc09, Uc10; primer mix 2: Uc01, Uc02, Uc08, Uc11; mix 3: Uc04, Uc06, Uc07, Uc12) (Table 1). For multiplex PCR, we used an end volume of 10 µl (8.5 µl mastermix and 1.5 µl DNA of c. 50 ng DNA per reaction). The mastermix contained a final concentration of 1× QIAGEN Multiplex PCR Master Mix, which had 3 mM MgCl2 and 0.2 µM of each primer. Cycling conditions were identical for the three multiplex reactions: 30 s at 95 °C, 30 cycles of 30 s at 94 °C, 90 s at 60 °C, 90 s at 72 °C, final extension of 10 min at 72 °C. 1 µl of the PCR product was added 11.7Rl HiDi-formamide and 0.3Rl ROX 500 standard (Applied Biosystems, Carlsbad, CA, USA). The loci were run on an ABI 3130XL capillary sequencer and genotyped with GeneMapper 5.0 (Applied Biosystems, Foster City, CA, USA).
|Locus||Genbank accession no.||Primer sequence 5′–3′||Tm (°C)||Core repeat||Size range (bp)||PM|
DNA was digested with the restriction enzyme EcoR1. Based on in silico digestion of c. 1 Mb of the tephritid Ceratitis capitata (GenBank assembly accession: GCA_000347755.2) and Bactrocera tryoni (GCA_000695345.1) genomes, we estimated that EcoR1 would have c. 15,000 fragments between 200 and 500 bp. DNA for the GBS library was extracted with the Roche template preparation kit (Roche Diagnostics, Mannheim, Germany). DNA quality and its interaction with GBS library quality (sequencing success) was assessed for extractions of (1) whole individuals cut longitudinally and treated with RNAase (Qiagen, Venlo) after DNA extraction, and (2) for DNA extracted from ultrasonically lysed (Bandelin Sonoplus HD70) tissue pellets without RNAase treatment. GBS adapter preparation followed Elshire et al. (2011). We used three pools with six barcodes (individuals) each (Appendix S1). Individual DNA was digested separately in 20 µl end volume consisting of 2 µl NEB buffer (10×), 1 µl EcoR1 (NEB) and 100 ng/µl DNA (variable µl), and filled up to 20 µl with RNase free water (variable µl). The mixture was gently stirred by flicking, centrifuged for 30 s (8,000 rpm), incubated at 37 °C for 15 min and heat inactivated at 65 °C for 20 min. Ligation was done for each individual separately in 50 µl end volume consisting of 20 µl digest (above), 5 µl NEB ligase buffer (10×), 10 µl barcoded adapter mix (=concentrated stock), 13.5 µl RNase free water, 1.5 µl NEB T4 DNA ligase. Reactions were centrifuged for 30 s (8,000 rpm) and incubated at 22 °C for 60min. The ligase was heat inactivated at 65 °C for 30 min. The ligation products of six individuals (each 10 µl) with different barcodes were pooled (=60 µl), added 250 µl binding buffer (Roche) and eluted in 30 µl RNase free water. Each pool was amplified in 6 parallel PCRs (with one specific TruSeq primer per pool). PCR purification was done on three PCR reactions simultaneously using 3 × 250 µl = 750 µl binding buffer, and eluted in 30 µl RNase free water. Finally, the two 3× purifications á 30 µl were pooled, resulting in 60 µl end-elution product. DNA concentrations were measured with Qubit 3.0 (Thermo Fisher Scientific, Waltham, MA, USA), DNA quality was assessed with the 260/280 ratio using Nanodrop (PEQLAB, Erlangen, Germany). Library validation was done with the Agilent 2,100 bioanalyzer (Santa Clara, CA, USA). Libraries were sequenced as 100 nt paired ends on Illumina Hiseq 2000 (1/5 lane, three pools on six individuals) at the Institute of Molecular Genetics, Mainz University.
GBS pools were de-multiplexed with pyRAD 3.0.3 (Eaton, 2014). The paired end reads were merged with PEAR 0.9.5 (Zhang et al., 2014) before being processed further with pyRAD 3.0.3, using standard settings for paired end reads. Data processing was done on the High Performance Computing (HPC) Mogon-cluster (Mainz University). We assessed the number of recovered loci, unlinked SNPs and parsimony informative sites by varying combinations of the parameters “minimum coverage of individuals” (mincov = 10 and 14), and “maximum number of individuals with heterozygotic sites” (maxSH = 4, 6, 8 and 10).
Genetic diversity indices were estimated in three populations situated north (Vildbjerg, 56.187°N, 8.771°E), within (Frøslev, 54.827°N, 9.331°E) and south (Neumünster 54.136°N, 9.907°E) of the transition zone. Microsatellite loci were tested in 19–20 individuals per population. GBS SNP variation was quantified for 6 individuals per population. Cross-species amplification of microsatellite loci was tested in U. stylata (Fabricius 1775) that were sampled in Christiansfeld, Denmark (55.354°N, 9.476°E) and Waldgrehweiler, Germany (49.664°N, 7.737°E). For microsatellites, we tested for deviations from Hardy–Weinberg proportions (per locus and population), linkage disequilibria (LD) within populations, genetic diversity per population (allele number and expected heterozygosity) with Genepop on the web (Raymond & Rousset, 1995; Rousset, 2008). Genetic differentiation, FST, was estimated locus-wise and among populations with Arlequin 126.96.36.199 (Excoffier & Lischer, 2010). Microsatellite null alleles, stuttering and large allele dropout were evaluated with Micro-Checker version 2.2.3 (Van Oosterhout et al., 2004). For GBS data, we converted the Structure file of unlinked SNPs generated by pyRAD (Eaton, 2014) into an Arlequin file (PGDspider 188.8.131.52, Lischer & Excoffier, 2012). For both microsatellites and GBS, we assessed population affiliations with Structure 2.3.4 (Pritchard, Stephens & Donnelly, 2000) using the admixture model, both with and without prior population information (10 × 50,000 burn-in and 100,000 iterations, K = 1–5). The highest level of genetic clustering was calculated in Clumpak (Kopelman et al., 2015) according to delta K (Evanno, Regnaut & Goudet, 2005) and probability of K, ln Pr(X|K) (Falush, Stephens & Pritchard, 2003).
All microsatellite loci amplified consistently in the 59 investigated U. cardui. The tetra, penta and hexanucleotide repeat lengths allowed precise binning of alleles. Both standard fragment length variation as well as intra-repeat variation, e.g., 3 bp shifts in hexanucleotide repeats, were easily discriminated. Because all loci were amplified with identical PCR conditions, other multiplex locus combinations could be possible in future studies.
We observed 5–21 alleles per locus across the three U. cardui populations (Table 2). The expected heterozygosity per locus per population was 0.60–0.90. Locus Uc12 was differentially sex-linked: both sexes lacked heterozygotes in Vildbjerg north of the transition zone (FIS = 0.874), predominately in females in Frøslev within the transition zone (FIS = 0.389) but in males in NMS south the transition zone (FIS = 0.418) (Table 2). FIS values of 0.50 are expected when haploid and diploid loci (or organisms) are equally represented in analysis. Loci Uc05 and Uc06 did not obey Hardy–Weinberg proportions in the population Frøslev after Bonferroni correction (P < 0.05). Linkage disequilibria were not found in the northern population but they were observed in the transition zone (Frøslev, N pairs = 6) and in the southern population (NMS, N pairs = 1) after Bonferroni correction (P < 0.05). Most LD involved the sex-linked locus Uc12. There was no evidence for stuttering or large allele drop out at any locus. The method of Evanno, Regnaut & Goudet (2005) identified K = 2 as the highest level of structure (Figs. 1A and 1B) whereas Clumpak summary results revealed an approximately equal probability for K = 2 and K = 3 (Figs. 1C and 1D), particularly when including population priors. For K = 2: mean(LnProb) = − 2855.010, mean(similarity score) = 0.997; for K = 3: mean(LnProb) = − 2694.350, mean(similarity score) = 0.989. Mean expected heterozygosity/population across loci was North He = 0.788, Frøslev He = 0.810, South He = 0.792. All 12 loci were significantly differentiated among populations, 0.063 < FST < 0.177, P < 0.001. Significant genetic differentiation was found between all population pairs, FST(north-transition) = 0.076, FST(north-south) = 0.122, FST(transition-south) = 0.128.
|U. cardui (N = 59)||U. stylata (N = 15)|
|Locus||Vildbjerg N = 20||Frøslev N = 20||NMS N = 19||Na||Size range||Na|
Cross-species amplification in U. stylata was successful for four loci (Uc04, Uc09, Uc11, Uc12), whereas three loci (Uc02, Uc03, Uc07) amplified inconsistently using the amplification protocol used for U. cardui. Fragment sizes of Uc03 partially overlapped with those of Uc09 (both PM1), making an evaluation of the less well amplifying Uc03 difficult. Two loci, Uc05 and Uc10 (both PM2), amplified identical fragment profiles. Future studies of U. stylata, or other Urophora spp., should take notice of re-evaluating amplification protocols and primer mix compositions.
Neither DNA preparation/extraction methods nor treatments with or without RNase affected GBS library quality and sequencing success. Although treatment with RNase significantly improved the mean ratio of 260/280 by approaching ≈1.80 (no RNase = 2.066, with RNase = 1.864; t = 9.59, 16 df, P < 0.001), the mean number of reads per pool was higher, though not significantly, for DNA without RNase treatment (Appendix S2). PyRAD analysis quantified 645,164–2,640,743 reads and 7,615–10,002 loci per individual. The mean depth of clusters per individual with depths greater than 7 was 31.7–88.1. Assembled GBS reads have NCBI accession numbers SAMN06241878–SAMN06241895. The parameter settings “minimum coverage of individuals” (mincov) = 10 and 14 and “maximum number of heterozygote samples” (maxSH) = 4, 6, 8 and 10 identified between 1,177 (14 mincov–4 maxSH) and 2,347 (10 mincov–10 maxSH) unlinked SNPs and 1,750 and 4,469 parsimony informative sites, respectively (Fig. 2). Regression analysis showed that the frequency of polymorphic sites, 0.0034–0.0040, did not increase significantly with the number of reads (t = − 0.61, df = 16, P = 0.55) whereas expected heterozygosity did (t = 2.46, df = 16, P = 0.03). The latter was caused by a significant positive relationship between the number of reads and the number of loci recovered for analysis, t = 3.59, df = 16, P > 0.01. Post hoc analysis showed that this result was influenced by the three individuals with the lowest number of recovered loci (ca. 200–300 loci less than the other individuals) and implied that a plateau was reached above c. 1,000,000 reads using in our protocol. The summary results are presented in Appendix S3.
Structure analyses were performed on loci recovered with the parameter settings 14–6 (i.e., mincov = 14, maxSH = 6), 14–10, 10–6 and 10–10. The parameter set 14–6 recovered 1,492 unlinked SNPs (the second lowest number of SNPs found). For 14–6, the methods of Evanno, Regnaut & Goudet (2005) and Falush, Stephens & Pritchard (2003) estimated K = 2 (Fig. 3A) and K = 4 (Fig. 3B) whereas membership proportions identified three clusters corresponding to the three populations (Fig. 3C), thus repeating the observed pattern found for microsatellites without inclusion of population prior. Increasing the number of loci (=unlinked SNPs) by reducing the number of recovered individuals/locus (e.g., 10–6) and/or allowing more heterozygotic sites across samples (e.g., 14–10) resulted in an identical number of optimal clusters, K = 3, identified by the three clustering methods (Figs. 3D, 3E, 3G, 3H, 3J and 3K). Genetic differentiation among populations (AMOVA) based on parameter set 14-6 (1,492 unlinked SNPs) included 508 loci with less than 0.05 missing data: FST = 0.197, pairwise estimates: FST(north-transition) = 0.165, FST(north-south) = 0.208, FST(transition-south) = 0.220, all P < 0.001. Locus-wise AMOVA based on 701 polymorphic loci found 99 loci (14.1%) with significant differentiation, 0.20 < FST < 0.91, P < 0.05). Expected heterozygosity/population across 1,492 loci was: North = 0.317, transition zone = 0.331, South = 0.327.
The microsatellite and GBS markers characterised in this study identified identical divisions among three parapatric U. cardui populations, thus providing evidence for genetic separation of these populations since the first observation of genetic divergence in 2001 (Steinmetz, Johannesen & Seitz, 2004). Many and highly differentiated loci in both marker systems suggest discrete genetic differences among the studied populations. Although our Miseq data did not permit making a high-resolution genome assembly of U. cardui for mapping the distribution of SNP diversification across the genome, in silco digestion of scaffolds of the tephritids C. capitata and B. tryoni with EcoR1 found similar fragment length distributions among both scaffolds and species. Hence, we predict genetic divergence among the parapatric U. cardui populations will be present in independent genome regions. Indeed, genomic studies of parapatric/sympatric populations/species have found genome-wide divergence patterns (Renuat et al., 2013; Michel et al., 2010; Feulner et al., 2015; Marques et al., 2016), which conflicted with theoretical expectations of genomic islands of divergence for local environmental adaptation in the presence of gene flow.
The two marker systems provided different locus-specific levels of genetic diversity that alone or in combination may give insights into such diverse research fields as paternity testing, hybrid status and phylogeograhy. Although the marker systems found identical population-divisions they also differed in estimates of individual affiliations, and GBS was partly sensitive to paralog filtering assumptions. The differences are partly due to different levels of locus-specific heterozygosity, where highly multi-allelic microsatellite loci increase the probability of detecting close relatives but lower the likelihood of assigning “unrelated individuals” to a population when the number of individuals is low relative to the level of polymorphism per locus. The lower estimate of mean genetic differentiation for microsatellites, FST = 0.109, being half of that observed for GBS, FST = 0.197, is related to higher per-locus variability of the former marker system. Still, FST > 0.10 is considered high for highly polymorphic microsatellites, and similar estimates have been found in host-race studies (e.g., Kempf et al., 2009; Imo, Maixner & Johannesen, 2013).
For GBS, the probability of population assignment increased with the number of unlinked SNPs, but even the relatively low number recovered with the conservative parameter set 14-6 (Fig. 2) identified individual memberships to three populations with high probabilities (Fig. 3). Considering that the individual GBS coverage in 1/5 lane was mostly >40, increasing the number of individuals per pool without quality loss should be possible. Standard DNA extraction procedures did not influence data quality, given adequate concentrations of DNA.
The likelihood for genome-wide differentiation and the observation of alternative sex linkage at microsatellite locus 12 might indicate that genomic rearrangements and/or sex determination are involved in diversification among the parapatric populations. Sex determination in Diptera can vary considerably (Vicoso & Bachtrog, 2015), and in the Tephritidae both male and female heterogamy is known (Bush, 1966). In U. cardui (2N = 12), where the sex-determination system is not known (Mainx, 1976), karyotype variation in the number of dot chromosomes has been found in southern Germany (2N = 13–14) (Ponisch & Brandl, 1992). Verifying rearrangements and/or sex linkage with GBS data will require more samples than used here due to the combined action of different sex linkage among populations and variable genome representations among individuals. Future studies of U. cardui from the transition area should focus on whether genomic rearrangements and sex determination interact with mating behaviour and influence rates of gene flow.
DNA extraction method and DNA quality
Extraction method: (1) DNA extraction from ultrasound-lysed tissue without application of RNAase, (2) DNA extraction from whole-cut insect with application of RNAase. DNA concentration (ng/µl) was measured with Qubit. DNA purity (260/280) was measured with NanoDrop.
Genotyping-by sequencing summary of coverage and loci identified by pyRAD 3.0.3 for parameters mincov = 14, maxSH = 6
Sum of variable sites = 3643, total number of loci containing parsimony sites = 2476, number of unlinked SNPs = 1492. The loci were based on the restriction enzyme EcoR1.
dpt.me = mean depth of clusters, dpt.sd = standard deviation of cluster depth, d > 7.tot = number of clusters with depth greater than 7, d > 7.me = mean depth of clusters with depth greater than 7, d > 7.sd = standard deviation of cluster depth for clusters with depth greater than 7, nloci = number of loci, f1loci = number of loci with >N depth coverage, f2loci = number of loci with >N depth and passed paralog, nsites = number of sites across f loci, npoly = number of polymorphic sites in nsites, poly = frequency of polymorphic sites, He = expected heterozygosity.