There is an ongoing debate about the importance of local recruitment and barriers to gene flow in marine species. Many marine species reproduce via planktonic larvae and strong ocean currents have the potential to carry propagules over long distances. However, a high degree of self-recruitment has been found in a range of species with planktonic larval duration being a poor predictor of genetic structure (Selkoe & Toonen, 2011). The development of cheap genome-scale genotyping is poised to open a new chapter in this discussion (Peterson et al., 2012; Toonen et al., 2013; Wang et al., 2012). American eels for example show panmixia in their central breeding ground in the North Atlantic but single nucleotide polymorphism (SNP) genotyping of adults along the Eastern seaboard revealed local differentiation (Gagnaire et al., 2012). Thus, a well-mixed pool of larvae sorted into environmental niches and so resulted in a structured adult population.
SNPs are ubiquitous throughout the genome, located in coding and non-coding regions, and each locus has a maximum of four alleles (the four bases). This is in contrast to microsatellite loci that consist of tandem repeats, in which allelic variation is determined by the number of tandem repeats and thus can be large. The limited number of alleles at each SNP locus requires a larger number of loci to be assayed to achieve the same power of detecting population genetic structure as a panel of microsatellite loci (Morin, Martien & Taylor, 2009; Ryman et al., 2006). The advent of reduced representation sequencing methods have made it possible to develop and assay a large number of SNP loci at a reasonable cost (Altshuler et al., 2000; Hoffberg et al., 2016). Recently, Genotyping by Sequencing (GBS) data including 4,764 SNPs in A cervicornis identified population structure within the Florida Reef tract (Willing, Dreyer & Van Oosterhout, 2012) where microsatellite markers did not (Baums et al., 2010). Other flavors of reduced representation sequencing methods (Drury et al., 2016; Toonen et al., 2013; Wang et al., 2012) have yielded information on population structure, and genetic diversity in reef building corals (Drury et al., 2016; Howells et al., 2016).
Genome-scale genotyping can provide insights into genetic diversity within functional regions of the genome that may be under selection (those genomic regions that code for proteins or regulate transcription of genes). These regions are not commonly surveyed even though they are of interest to conservation managers who want to understand how much capacity there is in a species to adapt to changing conditions (Becks et al., 2010). Statistical methods have been developed that allow scanning of SNP loci for signatures of selection. Despite the risk of generating false positive results (Vilas, PÉRez-Figueroa & Caballero, 2012), these methods yield candidate loci that should be substantiated by further testing (Renaut et al., 2011; Sork et al., 2016). The same methods can be used to scan microsatellite loci for signatures of selection (Nielsen, Hansen & Meldrup, 2006; Vasemägi, Nilsson & Primmer, 2005), however, power is often limited by the small number of assayed loci.
Acropora palmata is one of a few Caribbean coral species whose population genetic structure has been thoroughly investigated on local and range-wide scales (Baums, Devlin-Durante & LaJeunesse, 2014b; Baums, Miller & Hellberg, 2005b; Baums, Miller & Hellberg, 2006a). A range-wide survey of A. palmata population genetic structure using five coral specific polymorphic microsatellite markers showed that A. palmata stands are structured into two long-separated populations (Baums, Hughes & Hellberg, 2005a). While most reefs are self-recruiting, A. palmata stands are not inbred and harbor high genetic diversity at these microsatellite loci (Baums, Miller & Hellberg, 2005b). Bio-physical modeling identified a transient feature in the Mona Passage important in restricting present-day gene flow between the eastern and western population (Baums, Paris & Cherubin, 2006b). However, it is unclear whether the eastern and western populations differentiated initially due to selection. Subsequent denser sampling of A. palmata along the Antilles Island Arc raised the possibility of a hybrid zone across Puerto Rico rather than a clear-cut break between the eastern and western Caribbean at the Mona Passage (Fig. 1, Mège et al., 2014).
We sought to refine the location of the east–west population divide and test for the presence of finer scale population differentiation in A. palmata by developing a large number of SNP markers. We assayed genome-wide SNPs in archived samples from two geographic regions in the western A. palmata population (Bahamas and Florida) and two geographic regions in the eastern population (Puerto Rico and the US Virgin Islands (USVI)). We then compared the results to population structure derived from eleven microsatellite loci. We further aimed to produce a more comprehensive estimate of genetic diversity across the genome using SNPs and screened loci for signatures of selection.
Materials & Methods
Colonies of A. palmata were collected between 2002 and 2010 and previously genotyped (Baums, Devlin-Durante & LaJeunesse, 2014b; Baums, Miller & Hellberg, 2005b). Unique genets were selected from our database for a total of 24 samples from each of four geographic regions; the Bahamas, Florida, Puerto Rico and the US Virgin Islands (USVI). The goal was to have eight samples from three different reefs within each geographic region, however this was not always possible either due to small sample sizes from a particular reef or low clonal diversity of a reef. In those cases, we selected additional unique genets from nearby reefs. See Table 1A for detailed sample information.
We used an extended set of samples to compare the population genetic structure ascertained via microsatellite genotyping to the SNP results. This extended set of samples included 260 samples from six geographic regions; Belize, Florida, Puerto Rico, the USVI, and Curacao (Table 1B). Note that not all SNP-genotyped samples were included in the microsatellite dataset.
Coral tissue samples were extracted from ethanol preserved samples using DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany) with the following modifications. Time of incubation in the extraction buffer was increased to 16–20 h and two 100 µl elutions were performed, the second of which was kept for library production as this fraction contained the high molecular weight DNA. Extracted DNA was then treated with 0.01 mg of RNase A (10 mg/ml; Amresco Solon, Solon, OH, USA). Extraction concentrations ranging from 500 ng to 6 µg were double-digested with 10 units of each of the restriction enzymes MluCI (AATT) and NlaIII (CATG) (New England Biolabs, Ipswich, MA) following the protocol described by Peterson et al. (2012). Digestions were purified using 1.5× Ampure beads (Beckman Coulter Inc, Brea, CA, USA) and quantified on a Qubit® fluorometer (Life Technologies, Carlsbad, CA, USA). Digested DNA was standardized to 100 ng for each sample before adaptor ligation. Samples were identified with eight 6-bp indices on the NlaIII (rare-cutter) P1 adapter (Table S1). Samples were pooled into 12 libraries and then size selected in the range of 200–800 bp on a Pippin-Prep (Sage Science, Beverly, MA, USA). Next, Illumina flow-cell annealing sequences, unique multiplexing indices and sequencing primer annealing regions were added through PCR amplification to the MluCL cut end (See Peterson et al., 2012, Protocol S1, Fig. 1). The libraries were enriched with 12 amplification cycles in four separate PCR reactions for each library containing 10 µl of Phusion High-Fidelity PCR Master Mix with HF Buffer (New England Biolabs, Ipswich, MA, USA), 2 µl of each amplification primer, 1 µl of library DNA and 5 µl of water (total 20 µl). Samples were pooled into four libraries each containing 24 samples (Table 2, Table S1). Each library was sequenced on one lane of Illumina HiSeq 2000 sequencer (paired-end, 2 ×150 bp) at the Pennsylvania State Genomics Core Facility. There were two libraries sequenced on each chip. See Peterson et al. (2012) Supplementary Information for a detailed protocol. Radseq methods have been used successfully in scleractinian corals (Combosch & Vollmer, 2015; Dimond, Gamblewood & Roberts, 2017; Forsman et al., 2017) and other marine invertebrates (Lal et al., 2016; Reitzel et al., 2013).
|Region||Reef||Count of samples||Latitude||Longitude|
|Bahamas||Little Ragged Island||1||22.15375||−75.687208|
|Puerto Rico||San Cristobal||8||17.56493||−67.04515|
|US Virgin Islands||Tague Bay||8||17.763867||−64.613397|
|Little Ragged Island||1||22.1538||−75.687|
|Puerto Rico||San Cristobal||14||17.5649||−67.045|
|Laughing Bird Caye||4||16.4367||−88.199|
|Region||Pool||Coral colonies||Lane||Total reads||Retained Reads after processing||Average number of retained sequence reads per sample||Standard deviation|
Raw sequence filtering
Raw sequence reads were filtered using the process_radtags in the pipeline Stacks 1.21 (Catchen et al., 2013; Catchen et al., 2011). Barcodes and the RAD-Tag cut sites were identified to de-multiplex the pooled data into individual samples (Table S2). Reads were discarded that had low quality (with an average raw phred score <10 within a 15-base pair sliding-window), adapter contamination, and uncalled bases. Since all indices differed by at least 2 bp, it was possible to correct and retain any index that differed by a single bp from an expected index.
Processed sequences were then aligned to the Acropora digitifera genome (V1.0) (Shinzato et al., 2011) with Bowtie2 (Langmead & Salzberg, 2012) within the Galaxy (Bedoya-Reina et al., 2013; Blankenberg et al., 2014) framework using end-end read alignment settings in order to remove symbiont and other associated microorganisms. After alignment, paired-end sequencing BAM files were assembled in the ref_map.pl pipeline in Stacks 1.30 with the following parameters. Each paired-end sequencing set was run separately through Stacks to compare results (designated Read1 and Read2) in a one-way ANOVA. The ANOVA used each paired-end read as a technical replicate of the same genomic region. We did this to assess whether we would retrieve similar estimates of FIS and heterozygosity from both reads, as expected.
The number of raw reads required to report a stack was m = 5. The number of mismatches allowed between loci when building the catalog was n = 4. SNPs with a log-likelihood of less than −10 were removed as reads with poor log-likelihoods tend to have sequencing error and/or low coverage. Two of the barcodes (TCGAT and CGATC) had few sequence reads across all four geographic regions with all Illumina lanes being affected, and samples with these barcodes were removed before assembly in Stacks.
Sequencing reads are available under NCBI BioProject ID PRJNA407327.
Bedtools (Quinlan & Hall, 2010) was used to create a histogram of genome coverage for each sample from the Bowtie2 BAM format alignment files. All positions with a depth of coverage greater to or equal to 20 were combined into a single bin in the histogram. Data from all geographic regions were averaged (excluding samples with barcodes TCGAT and CGATC) and a cumulative distribution of sequencing coverage was then plotted in SigmaPlot v12.
Population genetic statistics
We explored values for several parameters relevant to population genetic analyses. In the Populations module in Stacks 1.30 we required a locus to be present in all regions for all analyses (option— p = 4). For each locus, we then set the minimum percentage of individuals in a region required to have data for that locus to 40% or 60% (option—r). Further, we set the minimum minor allele frequency (MAF) required to process a nucleotide site at a locus (option—min_maf) to 0.025, 0.05 and 0.075. A p-value correction was applied to FST scores, so that if a FST score was not significantly different from 0 (according to Fisher’s Exact Test) the value was set to 0. Additionally, only one random SNP from any RAD locus was written to the STRUCTURE export file in order to prevent linked loci from being processed. Read 1 and Read 2 STRUCTURE export files were combined and duplicate loci removed randomly between reads. FST (p-value < 0.05) was calculated in Stacks. FIS and FST distributions are included in the Figs. S1 and S2.
Clustering analyses for the SNP and microsatellite analysis were performed in the program STRUCTURE 2.3.4 (Falush, Stephens & Pritchard, 2003; Hubisz et al., 2009) using the admixture model with correlated allele frequencies. The analysis included the following parameters: 100,000 burn-in iterations and 1,000,000 Markov chain Monte Carlo repetitions, with and without a population prior, for a total of three replicates for each value of K. K values ranged from 2 to 5. The most likely value for K was determined by CLumpak (Kopelman et al., 2015) Best K which uses either the Evanno method (Evanno, Regnaut & Goudet, 2005) or LN(PR(X|K) values to identify the K for which Pr(K = k) is the highest as described in STRUCTURE’s manual section 5.1. Results of the three structure runs were merged with Clumpak (Kopelman et al., 2015). Based on our exploration of minor allele frequency (MAF) cut off values and the percent of individuals per geographic region allowed to miss a locus (%M), we report results for MAF = 0.05 and for a %M = 60% in the main text (Fig. 2). STRUCTURE clustering analyses for minor allele frequencies cutoffs of 0.025 and 0.075 are included in the Fig. S3. STRUCTURE clustering analysis when the minor allele frequency cutoff was 0.05 and when outlier loci were removed, is also included in Fig. S4. STRUCTURE clustering analysis when the minor allele frequency cutoff was 0.05 and when a locus must be present in at least 40% of individuals in a geographic region, is included in the Fig. S5. PCA clustering analysis, for SNPs and microsatellites, using adegenet (Jombart, 2008) is included in the Fig. S6.
Previously genotyped samples (n = 260) at 10 and 11 microsatellite markers (181, 182, 192, 207, 0,585, 0513, 2,637, 007, 9,253, 5,047, with and without locus 166) (Baums et al., 2009; Baums, Hughes & Hellberg, 2005a) were also analyzed with STRUCTURE 2.3.4 (Falush, Stephens & Pritchard, 2003; Hubisz et al., 2009) using the admixture model with correlated allele frequencies (See Table 1 for sample information). The analysis included the following parameters; 100,000 burn-in iterations and 1,000,000 MCMC repetitions, with and without a population prior, for a total of 3 replicates for each value of K. K values tested ranged from 2 to 7.
Data on temperature, salinity, dissolved oxygen (ml/l), and phosphates was downloaded from the World Ocean Atlas 2013 (WOA13 V2, Table S3). Silicates and nitrates were not used as there was not sufficient data for all locations. For the Bahamas, Puerto Rico, and the USVI the geographic center point among several sampling sites was used because reefs were further apart than in Florida. For all data, the statistical mean of the annual average of years 1955–2012 and depths of 0–10 m was used. Grid sizes were 1/4° for temperature and salinity, and 1° for dissolved oxygen (ml/l), and phosphates (µmol/l) (Table S2). SPSS V22 was used to calculate a dissimilarity matrix expressed as the Euclidean distances between geographic regions based on the above environmental data. To obtain a single GPS location for each geographic region we had to average the latitude and longitude coordinates for all samples in each region. Then GenAlEx v6.501 (Peakall & Smouse, 2006) was used to calculate a pairwise geographic distance matrix between the four geographic regions. GenAlEx v6.501 (Peakall & Smouse, 2006) was used to calculate Mantel multi-comparison tests between the geographic distance matrix, FST pairwise matrix between geographic regions from Stacks, and the environmental dissimilarity matrix.
Two independent methods were applied to identify putative loci under selection. The first program used was LOSITAN (Antao et al., 2008) which utilizes the method of Beaumont & Nichols (1996a) to identify loci under selection based on the joint distributions of expected heterozygosity and FST under an island model of migration. The following settings were used for the SNP and the microsatellite datasets. The neutral mean setting was selected in which during an initial run (100,000 simulations), a candidate subset of selected loci (outside the 95% confidence interval) were identified and removed. Then the distribution of neutral FST was computed using 100,000 simulations and a bisection approximation algorithm (Antao et al., 2008), with the following options, force mean FST, infinite alleles mutation model, and a confidence interval 0.99. A FDR <0.1 correction for multiple testing was applied. Loci outside the upper and lower confidence areas were identified as candidates affected by positive and balancing selection, respectively (Table S4). All geographic regions were analyzed together. Outliers identified as being under balancing selection were not considered as these are more likely to be false positives (Lotterhos & Whitlock, 2014). The positive outlier loci (p < 0.01) were blasted against the NCBI nr, UniProt, and Trembl databases with parameters of expected value = 0.00001, gap opening penalty = 11, gap extension penalty = 1, length of initial exact match (word size) = 6 and scoring matrix = BLOSUM62 using BLASTX 2.2.32 + (Altschul et al., 1997).
The Stacks exported Genepop dataset was also reformatted with PGDSpider version 126.96.36.199 (Lischer & Excoffier, 2012) to a GESTE file. The method of Foll & Gaggiotti (2008) was performed using Bayescan 2.0 (http://www-leca.ujf-grenoble.fr/logiciels.html). For each locus, the probability of it being under selection was inferred using the Bayes factor (BF). Based on Jeffreys’ (1961) scale of evidence, a log10 BF of 1.5–2.0 is interpreted as “strong evidence” for departure from neutrality at that locus and corresponds to a posterior probability between 0.97–0.99. For our analysis, the estimation of model parameters was set as 20 pilot runs of 5,000 iterations each, followed by 50,000 iterations.
Illumina sequencing of the RAD libraries generated 49.3 million reads per pool of eight samples, averaging 6.2 million 150 bp reads per sample prior to quality filtering. After quality filtering, 4.99 million reads per sample (81%) were retained on average (Table 2). Pools had similar numbers of reads after processing (mean = 39.9 million per pool, SD = 4.95 million, one-way ANOVA, F = 2.638, p > 0.1). The average % GC content for Read 1 and 2 was 41.7 and 39.6, respectively. The percentage of polymorphic sites per genomic region varied little among geographic regions, from 0.150 to 0.173% (Table 3). The average observed heterozygosity in variant sites was 22%. Overall FIS values, when considering all sites with a minor allele frequency cutoff a ≥0.05, were close to 0 and hence provided no evidence of inbreeding (Table 3). However, when only considering variant positions within the region of Florida, FIS values were negative (Fis = − 0.0086), indicating an excess of heterozygosity. Using the two paired-end read sets as replicates, a one-way ANOVA was performed for each variable (Table 4). Summary statistics for all geographic regions were found to be similar. Alignment of A. palmata SNPs to the published A. digitifera genome indicated that on average, 2.5% percent of the A. digitifera genome had sequence coverage at a stack depth of 5 (Fig. S7). All four geographic regions produced similar sequence coverage.
|All positions: variant and fixed||Total Sites||200425||200425||200425||200425|
|Nucleotide diversity (π)||0.0004||0.0004||0.0004||0.0004|
|Variant positions only||Obs Hom||0.7728||0.7874||0.7791||0.7815|
|Nucleotide diversity (π)||0.2254||0.2034||0.2174||0.21705|
- % PL
percent polymorphic loci
- Obs Hom
- Obs Het
Bold values are significantly different from 0 (P < 0.05).
A total of 390 SNPS were identified after filtering and including a minor allele frequency cutoff ≥0.05 (Table 3). This included 219 for Read 1 and 176 for Read 2 from the paired-end sequencing (5 SNPs were identical between reads and were only considered once). Analysis of Molecular Variance (AMOVA) revealed patterns of significant genetic differentiation among geographic regions (Table 4). This was also evident when the 307 SNPs (analysis included only one SNP per 150 bp locus) were subjected to a multi-locus clustering analysis in STRUCTURE. Samples from Florida clustered first, followed by the Bahamas at K = 3. Puerto Rico and the USVI were not distinguishable until K = 4, (Fig. 2). Clumpak Best K (Kopelman et al., 2015) indicated that K = 3 was the most likely K-value, after both the Evanno method and LN(PR(X|K) values, regardless of whether the geographic region was used as a prior.
To compare to the SNP analysis, microsatellite data from samples collected in six regions were analyzed in STRUCTURE using the geographic region as a prior (Table S5). At K = 2, a western (including Belize, Florida, Bahamas and Puerto Rico) and an eastern cluster (including the USVI and Curacao) was evident (Fig. 3A). At K = 3, an isolation-by-distance like pattern was apparent in the western cluster (Fig. 3B). K = 4 was the most likely K-value, after both the Evanno method and LN(PR(X|K) values, based on 11 microsatellite markers (Kopelman et al., 2015). Florida and Belize grouped as one cluster, and Puerto Rico and the Bahamas as the second, with the USVI as the third and Curacao as an admixed fourth cluster (Fig. 3C).
According to the outlier analysis in LOSITAN, microsatellite locus 166 was identified as a potential outlier and thus possibly under selection. It was therefore excluded from the analysis in STRUCTURE. This resulted in more comparable results to the SNP analysis with the most likely K-value being 3, after both the Evanno method and LN(PR(X|K) values (Kopelman et al., 2015). Again, the first separation was between a western and an eastern cluster, however this time Puerto Rico assigned to the eastern cluster with an isolation-by-distance like pattern appearing between the west and east (Fig. 3D). At the most likely K of 3, Curacao now formed a separate cluster. At K = 4, the Bahamas started to separate from the remainder of the western region similar to what was observed in the SNP clustering analysis (Fig. 3E).
Environmental drivers of population structure
A Mantel test showed a significant positive relationship in the SNP dataset between pairwise FST values and geographic distance (R2 = 0.65, p = 0.05) consistent with the microsatellite results (10 loci) from the Florida, Bahamas, Puerto Rico, and Curacao samples only (Figs. 4C and 4D). Correlations between environmental factors including average temperature, salinity, dissolved oxygen, and pairwise FST values or geographic distance were not significant (Figs. 4A and 4B).
Loci under selection
BayeScan and LOSITAN identified 2 and 12 SNPs (Table S4) that showed signs of positive selection when including all four geographic regions, one of which was identified by both programs (a total of 13 unique loci identified between both programs). Outliers accounted for 3.3% of the total SNPs, consistent with other studies in which FST outlier loci have represented a substantial fraction of the total loci investigated (2–10%) (Nosil, Funk & Ortiz-Barrientos, 2009). Annotation of the candidate loci proved difficult as only 23% produced significant hits when queried against the NCBI nr database, Uniprot, and Trembl; with two of the hits being annotated as unconventional myosin-IXb isoform X7 and tyrosine-protein kinase transmembrane receptor ROR1-like. Screening of the microsatellite loci identified locus 166 as an outlier under positive selection, yet no annotation information of this locus is currently available.
Comparison with previous Acropora gene flow studies
The previous range-wide survey of A. palmata population genetic structure using five, presumed neutrally evolving microsatellite markers showed that while most reefs are self-recruiting, A. palmata stands are not inbred and harbor high microsatellite genetic diversity (Baums, Miller & Hellberg, 2005b). Furthermore, A. palmata stands were structured into two long-separated populations, one in the eastern and one in the western Caribbean (Baums, Miller & Hellberg, 2005b). Here, we report that genome-wide SNPs (MAF ≥0.05) resolved further population structure in the endangered reef-building coral, A. palmata from Florida to the USVI compared to previous microsatellite-based analyses.
It was recently suggested that the East-West divide of A. palmata lies not in the Mona Passage (Baums, Miller & Hellberg, 2005b; Baums, Paris & Cherubin, 2006b) but rather to the east of Puerto Rico (Fig. 1, Mège et al., 2014). The 307 SNPs analyzed here confirm earlier findings that Puerto Rico and the USVI regions are more similar to each other than Puerto Rico is to either the Bahamas or Florida without imposing any priors in a STRUCTURE analysis (MAF ≥ 0.05). However, it is not always possible to determine, with confidence, the correct clustering solution that accurately reflects genetic population structure when there is an underlying isolation by distance pattern (Frantz et al., 2009). We show here that there is significant isolation by geographic distance from Florida to the USVI when using presumably neutrally evolving SNP and microsatellite loci. Interestingly, inclusion of microsatellite locus 166, flagged as being an outlier locus, obscured this isolation by distance pattern (Fig. S8). Therefore, locus 166 is a strong candidate for a locus under selection (or it is linked to a locus under selection) and its functional significance might prove a fruitful subject for future studies (Nielsen, Hansen & Meldrup, 2006).
An east–west Caribbean divide was also evident in the corals Orbicella annularis (Foster et al., 2012) and Acropora cervicornis (Vollmer & Palumbi, 2007). An additional barrier to gene flow in A. palmata was reported by Porto-Hannes et al. (2015) between Venezuela and the Mesoamerican Barrier Reef System utilizing four of the microsatellites markers.
The total number of SNPs (n = 307) retained for population genetic analysis was lower than expected. This was due to more than a 10-fold increase in the number of fragments retrieved from the genome digest using the enzymes MluCI (AATT) and NlaIII (CATG) compared to what was predicted from an in-silico restriction of an incomplete draft genome of A. palmata (Baums, unpublished). The in-silico restriction predicted 19,067 to the actual 322,425 (read 1) and 276,753 (read 2) fragments retrieved. This under-prediction was most likely due to an early, incomplete genome draft and unknown genome size at the time of this study (Herrera, Reyes-Herrera & Shank, 2015). A larger set of SNP loci may reveal additional finer scale structure in A. palmata across the Caribbean. However, this may not necessarily be the case. In a study that used three orders of magnitude more loci (905,561 SNPs) failed to reveal population structure in A. digitifera collected from the Ryukyu Archipelago of Japan using Bayesian clustering based methods (Shinzato et al., 2015). Low coverage, 5× in this study, is also a concern however this depth of coverage has been used in other non-model species (Babbucci et al., 2016; Blanco-Bercial & Bucklin, 2016; Laporte et al., 2016). Yet in the coral Platygyra daedalea, 5× coverage was sufficient to assign samples to two distinct clusters based on their geographic origin, the Persian Gulf or Sea of Oman and was consistent with their 20× coverage data set (Howells et al., 2016).
A Mantel test showed a significant positive relationship between the SNP-derived pairwise FST values and geographic distance (r2 = 0.65, p = 0.05) consistent with the microsatellite results (10 loci) from the Florida, Bahamas, Puerto Rico, and Curacao samples (Figs. 4C and 4D). This may be due to Wright’s Isolation-By-Distance (IBD) process however Mantel tests are prone to false positives as the test assumes spatial independence of the data (Meirmans, 2012). Nevertheless, genetic variability is structured in geographic space.
Correlations between environmental factors including average temperature, salinity, dissolved oxygen, and pairwise FST values or geographic distance were not significant (Figs. 4A, 4B). It should be noted that the environmental data had a resolution of to 1 degree latitude, an equivalent of about 28–111 km, whereas the genetic data was collected on much smaller spatial scales. For example, in Florida, sampled reefs were often less than 10 km apart, and the distance between Sand Island Reef and French Reef is only 2.6 km. (Table S2). Here, reefs often harbor just one or a few A. palmata genets (albeit represented by many colonies) making it challenging to obtain the needed >25 genets per population recommended for FST analyses on a scale of a few km. Thus, genets were pooled over geographic regions to match the scale of the environmental data and yield sample sizes of at least 25 per location. Yet significant micro-environmental differences among colonies growing on the same reef have been documented (Drury, Manzello & Lirman, 2017; Gorospe & Karl, 2010). Therefore, landscape genetic approaches that may reveal environmental drivers of population differentiation (Manel et al., 2003) must await higher resolution environmental data and, perhaps, a greater number of SNP loci.
Genetic diversity indices in A. palmata
Several factors could account for negative FIS values including negative assortative mating, if a species is outcrossed and lacks selfed progeny or there is a selection pressure that favors the most heterozygous genets. Of our samples, 49 out of 96 were ramets of larger genets. A. palmata colonies fragment frequently; the branches regrow into new colonies resulting in stands of genetically identical colonies (Baums, Miller & Hellberg, 2006a). (Note that samples included here all represented distinct genets). Asexual reproduction could explain the excess of heterozygosity in A. palmata within the Florida region (see Balloux, Lehmann & De Meeus, 2003; Carlon, 1999; Delmotte et al., 2002). Excess hetereozygosity has been observed in other clonal organisms. For example, significant negative FIS values in a partially clonal but self-incompatible wild cherry tree was explained in part by asexual reproduction (Stoeckel et al., 2006).
Nucleotide diversity is a measure of a species’ genetic diversity and varies predictably with life history (Hamrick & Godt, 1996; Romiguier et al., 2014). Because A. palmata populations experienced dramatic losses in the 1980s and therefore may now have reduced genetic diversity we compared A. palmata’s nucleotide diversity to the diversity found in other species. The nucleotide diversity π, describes the degree of nucleotide polymorphism in populations and can be calculated based on variant sites only or on variant and non-variant sites combined. In acroporids, estimates range from 0.007–0.022 (Macdonald, Schleyer & Lamb, 2011) in A. austere to 0.09 in A. cervicornis (Drury et al., 2016). In other Cnidaria, estimates range from 0.00403 in Aiptasia (Bellis, Howe & Denver, 2016) to 0.0065 in Nematostella (Putnam et al., 2007). Synonymous nucleotide diversity ranged from 0.012–0.020 in transcriptomes from three gorgonian species (Romiguier et al., 2014). Average pairwise nucleotide diversity in other metazoans include Drosophila pseudoobscura (0.0024–0.0179, Kulathinal, Stevison & Noor, 2009) and Homo sapiens (0.000751, Sachidanandam et al., 2001). Our estimates of nucleotide diversity (including variant and non-variant sites) was 0.0004 for all geographic regions, an order of magnitude lower than in other cnidarians. Further, based on a survey of 374 individual transcriptome-derived SNPs from 76 non-model animal species, the level of nucleotide diversity found in A. palmata is well below that predicted for a long-lived species, with small propagule size and large adult size (Romiguier et al., 2014). This low nucleotide diversity could be due to either a relatively small long-term effective population size, a severe bottleneck associated with a selective sweep (Ellegren & Galtier, 2016), the small number of SNPs included in this study (Fischer et al., 2017) or the RAD-tag method (Arnold et al., 2013). In addition, we find that Florida is the least genetically diverse geographic region when comparing nucleotide diversity in variant sites only (0.203, Table 3), as would be expected in a marginal environment (Baums, 2008; Baums et al., 2014a; Cahill & Levinton, 2016; Eckert, Samis & Lougheed, 2008). This is in contrast to Drury et al. (2016), which found samples of the congener A. cervicornis from Florida to be higher in SNP nucleotide diversity than those from the Dominican Republic. Increased sampling of the genome as well as analysis of historical samples may shed light on whether the low nucleotide diversity in A. palmata is due to technical issues, the recent population bottleneck or unrelated causes.
Allelic richness of microsatellite data correlates better with genome-wide estimates of genetic diversity based on SNPs than heterozygosity (Fischer et al., 2017) and allelic richness is more sensitive to recent population bottlenecks than heterozygosity (Allendorf, 1986). Average microsatellite-based allelic richness in 14 Indo-Pacific Acropora corals was 4.96 overall and 6.21 in the five geographically widespread species (calculated based on Table 6 in Richards & Oppen, 2012) which compares favorably with an average allelic richness of 8.49 in A. palmata found here. Thus, allelic richness of microsatellite loci remains high in Caribbean A. palmata despite recent population declines and the documented loss of alleles in Florida (Williams, Miller & Baums, 2014).
To resolve the contradictory findings with respect to genetic diversity based on micorsatellites and SNPs, future studies should include several thousand SNPs assayed in samples from across the species range. This approach may provide more conclusive data on the impact of recent population declines on overall genetic diversity in A. palmata.
Genes under positive selection
Thirteen loci out of 395 were identified as being under positive selection in Acropora palmata. Detecting regions of the genome under selection is difficult, and statistical detection methods are prone to different rates of type 1 and type 2 errors. Further, LOSITAN and BayeScan often identify different loci as being under selection (Narum & Hess, 2011). LOSITAN identifies outliers based on the joint distributions of FST and expected heterozygosity under an island model of migration (Beaumont & Nichols, 1996b). Whereas, BayeScan uses a hierarchical Bayesian method of Foll & Gaggiotti (2008), which has been modified based on the approach proposed by Beaumont & Balding (2004). Lotterhos & Whitlock (2014) claim that many of the published FST outliers based on FDIST2 and BayeScan are probably false positives; however, their results show that these false positives are mostly in balancing selection and we did not include outliers identified as being under balancing selection for this reason. In a comparison of FST outlier tests, FDIST2 and BayeScan appeared to provide the most power, depending on the scenario, and BayeScan had fewest false positives (Narum & Hess, 2011). Here, one locus 80994_17 (Digitifera scaffold gi|342271542|dbj|BACK01025553.1|, basepair = 5,143) out of 13 was identified by both programs, therefore we consider this locus to be a strong candidate for being under selection and the other loci as possible candidates. However, Stacks locus 80994_17 was not annotated, a common occurrence even for transcribed loci in corals, where typically a third or less of genes have annotation (Meyer, Aglyamova & Matz, 2011; Polato et al., 2010).
One of the SNP loci identified as being under positive selection was annotated as a tyrosine-protein kinase transmembrane receptor ROR1-like. ROR receptor protein is associated with the nervous system in the fruit fly Drosophila (Wilson, Goberdhan & Steller, 1993), nematode C. elegans (Francis et al., 2005), and sea slug Aplysia californica (McKay et al., 2001). Functional analysis of cam-1, a gene that encodes for a ROR kinase in C. elegans, demonstrated roles in both the orientation of polarity in asymmetric cell division and axon outgrowth, and the ability to guide migrating cells (Forrester et al., 1999). The role of ROR1 receptors in Cnidaria is unknown although studies in Hydra suggest a function in regulating cell specification and tissue morphogenesis (Bertrand, Iwema & Escriva, 2014; Krishnapati & Ghaskadbi, 2014; Lange et al., 2014).
Another SNP identified as being under positive selection was located in the gene annotated as unconventional myosin-IXb isoform X7, a Rho GTPase-activating protein (RhoGAP) that is essential for coordinating the activity of Rho GTPases. Invertebrates are thought to contain a single myosin class IX gene (the exception is Drosphilia which has none) whereas most vertebrates have two with fishes having four (Liao, Elfrink & Bähler, 2010). In general, Rho GTPases control the assembly and organization of the actin cytoskeleton which includes many functions such as cell adhesion, contraction and spreading, migration, morphogenesis, and phagocytosis. Little is known about the function of myosin-IX in invertebrates. However, a recent study in which Orbicella faveolata were exposed to immune challenges identified Unconventional myosin-IXb as a transcript that was significantly correlated with melanin protein activity (Fuess, Weil & Mydlarz, 2016). In humans, Myosin-IXb is highly expressed in tissues of the immune system such as the lymph nodes, thymus, and spleen and also in immune cells like dendritic cells, macrophages and CD4 + T cells (Wirth et al., 1996). Myosin-IXb knockout mice showed impaired recruitment of monocytes and macrophages when exposed to a chemoattractant demonstrating that Myosin-IXb has an important function in innate immune responses in vivo (Hanley et al., 2010). Because statistical screens for loci under selection carry a high rate of false positive results, further experimental evidence is necessary before these loci can be considered targets of selection.
Restoration efforts should proceed under the assumption that A. palmata harbors a significant amount of population structure requiring close matches of collection and outplant sites. Hybridization of A. palmata from different geographic regions may or may not result in heterosis depending on sexual compatibility, but would be worth pursuing in an ex situ setting to enable close monitoring of offspring performance under elevated temperatures (Van Oppen et al., 2015). With respect to the sharply declining Florida colonies, these findings underline the need to manage and restore Florida’s A. palmata as an isolated, genotypically depleted geographic region (Williams, Miller & Baums, 2014).
SNP-derived FIS distribution of Acropora palmata in four geographic regions
SNP-derived FST distributions of Acropora palmata in pairwise geographic region comparisons
(A) Bahamas vs Florida. (B) Bahamas vs Puerto Rico. (C) Bahamas vs USVI. (D) Florida vs Puerto Rico. (E) Florida vs USVI. (F) Puerto Rico vs USVI.
Minor allele frequency (MAF) cut of values of 0.025 (A–C) and 0.075 (D–F) yield similar Bayesian population clustering solutions to MAF = 0.05 (Fig. 2 main text)
Shown is the probability of membership (y-axis) in a given cluster for each sample (x-axis) assuming values of K = 2 (A, D), K = 3 (B, E), and K = 4 (C, F). For each K, three replicate runs were performed. The most probable K was always 3 based on the mean estimated log probability of the data at a given K and the Evanno method. Setting the minor allele frequency to 0.025 (A–C) yields 632 SNPs (analysis included only one SNP per locus) after combining Read 1 and Read 2. Setting the minor allele frequency to 0.075 (D–F) yields 213 SNPs (analysis included only one SNP per locus) after combining Read 1 and Read 2. Reefs within geographic regions sorted by latitude.
Excluding FST outliers from Acropora palmata SNP data does not change the choice of best K based on Bayesian cluster analysis with Structure
Shown is the probability of membership (y-axis) in a given cluster for each sample (x-axis) assuming values of K = 2 (A), K = 3 (B), and K = 4 (C). Analysis of 299 SNPs (analysis included only one SNP per locus) after combining Read 1 and Read 2. The most probable K was 3 (B). Minor allele frequency cut of value was 0.05 and percentage of missing data per geographic region was 40%. Reefs within geographic regions are sorted by latitude.
Decreasing the minimum percentage of individuals in a region required to have data for that locus from 60% to 40% does not change the choice of best K based on Bayesian cluster analysis
Figure shows results of a Structure analysis of 4793 SNPs (analysis included only one SNP per locus) after combining Read 1 and Read 2 and assuming a K of 2 (A), 3 (B) or 4 (C). The minor allele frequency was set to ≥0.05. The most probable K was 3 (B) based on the mean estimated log probability of the data at a given K and the Evanno method (3 replicate runs per K, ±1 standard deviation). Reefs within geographic regions are sorted by latitude.
Principal coordinate analysis (PCA) of single nucleotide polymorphism (SNP, a) and microsatellite data (b) from Acropora palmata
Red, Florida; blue, Bahamas; yellow, Puerto Rico; green, USVI; purple, Belize and orange, Curacao.
Cumulative distribution of sequencing coverage of Bowtie2 aligned Acropora palmata reads to the A. digitifera genome using Bedtools
On average 2.5% percent of the A. digitifera genome had sequence coverage at a stack depth of 5.
Mantel matrix correlation test between genetic and geographic distances
Acropora palmata samples from four geographic regions (Florida, Bahamas, Puerto Rico and USVI) were genotyped with 11 microsatellite markers, including the outlier locus, 166. y = 5∗10−6 × + 0.0137. R2 = 0.1147, p-value =0.1.
There were 12 pools with 8 unique barcodes in each. The Database ID is a unique identifier for each coral specimen. Given is also the total number of ramets for each genet in the Baumslab database. The indices are short DNA sequences that uniquely identify products in the final libraries.
GPS coordinates in decimal degrees (WGS84) for the World Ocean Atlas 2013 (WOA13 V2) environmental data, averaged for a geographic region
Outlier SNPs identified by programs Lositan and Bayescan
Stacks locus_bp is the Stacks program locus ID with the SNP location basepair after the underscore. Read category indicates whether the outlier SNP was found in read 1 or read 2 or the paired-end sequencing run. A. dig scaffold identifies the A. digitifera scaffold where the A. palmata Stacks locus aligned to, followed by the basepair location in the next column. S start, sequence start; S end, sequence end; Lositan P, Lositan P-values (Simulated FST < sample FST). Bayescan q, Bayescan q-values; Ns, not significant.
Microsatellite genotypes at five loci for the samples included in the SNP dataset
The Database ID is a unique identifier for each coral specimen. Given is also is the geographic region and reef the samples were collected from, microsatellite allele call 1 and 2 for loci 166, 181, 182, 192, and 207 (in basebairs), and the coral genet ID. The last four columns show how the radseq samples were pooled and gives the inline barcode, the Illumina index, the Illumina sequence lane and chip.
DD-Rad Tag reads
DD-Rad Tag reads of Acropora palmata used in the population genetic analyses.
Microsatellite allele calls at eleven microsatellite loci
Microsatellite allele calls at eleven microsatellite loci of Acropora palmata. File is formatted for Structure.