Application of High Resolution Melt analysis (HRM) for screening haplotype variation in non-model plants: a case study of Honeybush (Cyclopia Vent.)

Aim This study has three broad aims: a) to develop genus-specific primers for High Resolution Melt analysis (HRM) of members of Cyclopia Vent., b) test the haplotype discrimination of HRM compared to Sanger sequencing, and c) provide a case study using HRM to detect novel haplotype variation in wild C. subternata Vogel. populations. Location The Cape Floristic Region (CFR), located along the southern Cape of South Africa. Methods Polymorphic loci were detected through a screening process of sequencing 12 non-coding chloroplast DNA regions across 14 Cyclopia species. Twelve genus-specific primer combinations were designed around variable cpDNA loci, four of which failed to amplify under PCR, the eight remaining were applied to test the specificity, sensitivity and accuracy of HRM. The three top performing HRM regions were then applied to detect novel haplotypes in wild C. subternata populations, and phylogeographic patterns of C. subternata were explored. Results We present a framework for applying HRM to non-model systems. HRM accuracy varied across the regions screened using the genus-specific primers developed, ranging between 56 and 100 %. The nucleotide variation failing to produce distinct melt curves is discussed. The top three performing regions, having 100 % specificity (i.e. different haplotypes were never grouped into the same cluster, no false negatives), were able to detect novel haplotypes in wild C. subternata populations with high accuracy (96%). Sensitivity below 100 % (i.e. a single haplotype being clustered into multiple unique groups during HRM curve analysis, false positives) was resolved through sequence confirmation of each cluster resulting in a final accuracy of 100 %. Phylogeographic analyses revealed that wild C. subternata populations tend to exhibit phylogeographic structuring across mountain ranges (accounting for 73.8 % of genetic variation base on an AMOVA), and genetic differentiation between populations increases with distance (p < 0.05 for IBD analyses). Conclusions After screening for regions with high HRM clustering specificity — akin to the screening process associated with most PCR based markers — the technology was found to be a high throughput tool for detecting genetic variation in non-model plants.

213 Specificity, or true negative rate, is the measure of HRM's ability to correctly discern between 214 haplotypes, grouping them into different HRM clusters. 215 Specificity =TN/( TN+FP ) 216 TN=TrueNegative FP=FalsePositive 217 The accuracy of HRM refers to how close haplotype clustering reflects the true identities of the 218 haplotypes and was measured as: 219 Accuracy=( TP+TN ) /( TP+FP+TN+FN ) 220 Since sensitivity below 100 % will be accounted for during HRM cluster (i.e. putative haplotype) 221 confirmation by sequencing (with a subset of samples from each unique HRM cluster 222 sequenced), all regions with 100 % specificity were included for the detection of novel 223 haplotypes in wild C. subternata populations.

224
The potential for HRM to detect haplotype variation in wild populations 225 Only three regions (MLT S1 -MLT S2, MLT S3 -MLT S4, and MLT U1 -MLT U2) were found to 226 have an HRM clustering specificity of 100%. Thus these regions were screened for haplotype 227 variation across 142 accessions from eight wild C. subternata populations. 228 The same approach as Dang et al. (2012) was employed, with each sample run in duplicate and 229 haplotype clustering performed on a single population basis with the intention of reducing errors 230 resulting from variation of PCR product concentration and quality across samples from different 231 population extractions. This was achieved by using the built in well group function in the CFX 232 Manager™ Software (Bio-Rad Laboratories, Hercules, California, U.S.A.), thus multiple 233 populations could be included in a run, but analyses separately for HRM clustering. 234 The cpDNA regions that were used to design the primers used for HRM haplotype detection 235 were amplified and sequenced (following the same protocols as before) to confirm the haplotype 236 identity of HRM clusters. The loci amplified by MLT S1-MLT S2 and MLT S3 -MLT S4 are 237 adjacent to one another and by sequencing the full atpI-atpH intergenic spacer, the sequence 238 identity of both loci could be confirmed with reduced sequencing and alignment effort. Moreover, 239 the position of the loci amplified by the HRM primers occurred near the center of their respective 240 parent regions and unidirectional sequencing using the reverse primers of Shaw et al. (2007) 241 proved sufficient for verifying the sequence motifs under HRM analysis. A minimum of three 242 accessions representing each HRM cluster (i.e. putative haplotype) in each population were 243 sequenced for haplotype verification. Samples whose replicates were classified as two different 244 clusters, thus having uncertain haplotype identify, were also sequenced to ensure they were 245 assigned the correct haplotype identity. A total of 46 and 38 accessions were sequenced for the 246 atpI-atpH intergenic spacer and ndhA intron respectively. Haplotype discrimination by HRM was 247 calculated using the C. subternata samples sequenced for haplotype confirmation, following the 248 same formula as before.
249 Phylogeographic analysis of C. subternata 250 The haplotypes detected via HRM clustering and confirmed by sequencing were assembled 251 following the same procedure described under 'Developing Cyclopia specific HRM primers'. All 252 wild C. subternata samples that underwent HRM analysis were then assigned the haplotype 253 identity of the HRM cluster they belonged using a custom R script written by A.J.P (provided 254 elsewhere, S1). 255 The genealogical relationships among haplotypes were determined from a Statistical Parsimony 256 (SP) network (Fig 4)  . In order to account for the possibility of 270 non linear population expansion, relationship between population differentiation measures and 271 the natural logarithm of geographic distance was tested following the same approach (Rousset, 272 1997). Finally, genetic differentiation across the mountain ranges that populations were sampled 273 from was tested via an Analysis of Molecular Variance (AMOVA) (Excoffier et al. 1992 Table 2 and 282 summarized in Fig 5). 283 Nucleotide differences between haplotypes failing to produce distinct melt curves, and thus 284 undifferentiated by HRM clustering, are summarized in Table 5. Of the haplotypes not 285 differentiated by HRM: two haplotypes differ by indels, while the remaining 15 comparisons 286 differ by at least one transversion, and two comparisons differed by a transversion and 287 transition. The haplotypes that did produce distinct melt curves differed by at least a transition 288 (26 cases), or multiple SNPs (16 cases), one haplotype differed by a 19 bp indel, and another 289 by a 6 bp indel. All haplotype sequence variation is summarized in Table 2. As previously 290 stated, the three HRM primer combinations with specificity of 100% (MLT S1 -S2, MLT S3 -291 MLT S4, MLT U1 -U2) were selected for haplotype discovery in wild C. subternata populations.  haplotypes were verified by sequencing a subset of samples (ranging from three to eight 297 individuals per population) from each HRM cluster for the remaining two primer combinations. 298 Of the 142 samples less than 29 % were required to be sequenced for haplotype confirmation. 299 Both loci were found to have 100 % specificity, i.e. HRM successfully discriminated amongst all 300 haplotypes detected in wild C. subternata populations. However, haplotype richness was 301 overestimated by HRM (sensitivity of 87.6 % and 95.5 % for MLT S1 -MLT S2 and MLT U1 -302 MLT U2 respectively), both regions had accuracies of 96 %. However, as these additional 303 clusters were sequenced for haplotype confirmation, samples were assigned the true identity of 304 haplotypes resolving any potential issues of low sensitivity. 305 The final cpDNA dataset comprised 561 bp, 217 bp from the atp1-atpH region (MLT S1 -MLT 306 S2) and 344 bp from the ndhAx1-ndhAx2 region (MLT U1 -MLT U2), with a GC content of 29 307 %. An additional 310 base pairs (bp) were amplified by MLT S3 -MLT S4, revealing no 308 nucleotide variation. The dataset contained 5 polymorphic sites; 4 transversions, 1 transition, 309 and a 7 bp indel (nucleotide differences summarised in Table S3).
310 Cyclopia subternata phylogeography 311 The SP network revealed a radiation from a central ancestral haplotype, with few mutations 312 separating haplotypes (Fig 4). The ancestral haplotype was present in all populations, except 313 the western most Garcia's Pass population located in the Langeberg Mountains. This population 314 contains a single, unique haplotype. An additional two populations (Kareedow Pass and 315 Bloukranz Bridge) were also found to contain rare, localized haplotypes and a low frequency 316 haplotype was detected in two populations located in the Tsitsikamma and Outeniqua 317 mountains (Fig 4). Population genetic differentiation measures increased with geographic 318 distance (R^2 = 0.77, 0.74, 0.70, and 0.76 for Gst, G''st, Jost's D and Provesti's dist 319 respectively, p < 0.05 for all measures), with significance increasing when tested against log 320 transformed geographic distance (R^2=0.64, 0.67, 0.61, and 0.65 for Gst, G''st, Jost's D and 321 Provesti's dist as before, p < 0.05 for all measures). The AMOVA revealed significant (p <0.05) 322 structuring across mountain ranges, accounting for 73.8 % of genetic variation. 323 Discussion 324 A nested framework (Fig 3) was developed to test and apply HRM to non-model organisms, 325 members of the Cape endemic plant genus Cyclopia. Polymorphic sites were identified via 326 sequencing 12 non-coding cpDNA regions across 14 Cyclopia species. PCR primers for HRM 327 analysis were designed to flank these variable sites, producing 11 HRM primer pairs across 7 328 regions. Eight of these pairs successfully amplified PCR products and were subsequently 329 analysed via HRM. Specificity of 100% was detected for three of the primer pairs, which were 330 then used to detect haplotype variation in wild C. subternata populations with a haplotypes 331 detection accuracy of 96 %. Haplotype detection errors were due to false negatives reducing 332 HRM sensitivity. False negatives occur when HRM incorrectly assigns a single haplotype to 333 multiple clustering groups, an issue that is resolved when the haplotype identity of HRM clusters 334 is confirmed by sequencing. Optimized HRM was demonstrated to be a powerful tool for 335 detecting genetic variation in non-model organisms, providing immediate insights into within 336 population genetic variation via automated melt curve clustering and substantially reduced PeerJ reviewing PDF | (2019:12:43978:0:1:NEW 27 Jan 2020) 337 sequencing efforts. The framework provided here offers a straightforward approach to develop 338 and test the potential application of HRM to non-model systems.

HRM discrimination of sequenced haplotypes
340 Differences in DNA melt curves, as detected by HRM, stem from the effects nucleotide 341 sequence chemistry has on melt peak intensity and curve shape. While HRM is reported to be 342 capable of discriminating between any SNP type, the approach may be constrained by physical 343 and chemical properties of the DNA fragment under melt analysis (Gundry et al. 2008). Some 344 nucleotide variations, namely class 3 (C ↔ G) and class 4 (A ↔ T) SNPS , tend to produce 345 negligible changes in melt behaviour (curve shape and melt peak) and are often poorly 353 Many of these observations are supported by the findings of this study, however some important 354 deviations were detected. Haplotypes that were successfully discriminated by HRM tended to 355 have a class 1 SNP (transitions, C ↔ T and A ↔ G) or multiple SNPs differentiating them. 356 However, seven haplotypes differing by multiple SNPs did not produce distinct melt curves 357 (Table 5), suggesting that some SNPs may potentially counteract one anothers impact on the 358 melt curve. Furthermore, haplotypes that differed by a class 2 (transversions, C ↔ A, G ↔ T) 359 and, as predicted, class 4 SNPs do not appear to have detectable melt curve differences. It is, 360 however, uncertain why in this study some class 2 SNPs produced distinct melt curves in some 361 cases (MLT M1 -MLT M2 and MLT S3 -MLT S4), but not in others (MLT C1-C4 and MLT C3 -362 C4). Nearest neighbor chemistry does not appear to be provide insights into this as the SNPs 363 had the same neighbouring base pairs across PCR products. Furthermore, a class 2 SNP was 364 differentiated by HRM in a larger PCR product (527 bp) and not in the smaller products (386 bp 365 and 236 bp), indicating that shorter DNA fragments do not necessarily produce more distinct 366 melt curves than larger fragments with the same mutation. 367 The primer design choices in this study were largely based on the suggestions that nucleotide 368 variation in shorter DNA strands will have a more pronounced impact on melt curve shape and 369 intensity. This appears to have not been the case and larger PCR products performed as well, if 370 not better, than smaller regions, as detected elsewhere (Dang et  378 High Resolution Melt analysis using the two best performing primer pairs that amplified variable 379 regions proved to be a highly accurate (96 % for both regions screened) means of detecting 380 haplotypes variation in wild Cyclopia populations with no cases of different haplotypes occurring 381 in the same cluster (specificity = 100 %). 382 A remarkable feature of HRM is its high and rapid throughput. Running samples in duplicate on 383 a 96 well plate allowed for 48 samples to be screened every three hours. As such, all 142 wild 384 C. subternata samples were screened across the two cpDNA regions in two days, with 385 immediate insights into the underlying levels of genetic variation (based on HRM clusterings). 386 This rapid data production comes at a minimal cost per sample, which in this study amounted to 387 $ 11.09 including all PCR amplification and sequencing for the phylogeographic analysis of C. 388 subternata. A costing analysis (Table S4)  396 Distribution of C. subternata genetic diversity 397 Despite the relatively low genetic differentiation and variation detected across wild C. subternata 398 populations, with a widespread haplotype detected in all populations sampled in the 399 Tsitsikamma and Outeniqua mountains, genetic diversity does appear to be spatially structured. 400 Geographically isolated haplotypes were detected in populations in the Tsitsikamma mountains, 401 and complete haplotype turnover was detected in Garcia's Pass population from the Langeberg; 402 possibly a consequence of a genetic bottleneck resulting from a small founding population, 403 facilitating rapid fixation of rare alleles (Klopfstein et al. 2006). These, and an additional low 404 frequency haplotype shared between Langekloof and Outeniqua populations, provided sufficient 405 divergence across mountain ranges to be detected by an AMOVA and roughly coincide with NJ 406 clustering of populations (Fig S1). The transition between mountain ranges represents steps of 407 increased genetic differentiation between populations (supported by significant IBD, Slatkin 408 1993), and the movement of seed and seedlings across these isolating barriers for Honeybush 409 cultivation should be avoided. 410 The population divergence described above is in contrast to that reported for the nuclear 411 genome of C. subternata (Niemandt et  421 slowly. Additionally, pollen dispersal by carpenter bees (Xylocopa spp) may reduce population 422 divergence through rare long distance dispersal events. Seed, in contrast, is dispersed locally 423 by ants and dehiscent seed pods and long distance dispersal is extremely unlikely, unless 424 anthropogenically mediated; this has likely been the case with ARC seed used to establish 425 cultivated populations across the CFR (Joubert et al. 2011). 426 The geographic distribution of C. subternata genetic diversity, as described here, indicates that: 427 a) unique haplotypes occur within populations, and b) these unique haplotypes are spatially 428 structured. These patterns of genetic diversity need to be acknowledged in the management of 429 this economically important species, with seed and seedling not translocated outside of the 430 mountain range that they were sourced from.

Conclusions
432 This study demonstrates that HRM is capable of discerning between cpDNA haplotypes, with 433 variable levels of success. Despite some haplotypes producing undifferentiated melt curves, 434 haplotypes screened using the top performing HRM regions were consistently differentiated by 435 HRM. When these top performing HRM regions were applied to screening genetic variation in 436 wild populations of the non-model organism, C. subternata, all haplotypes were differentiated. 437 The framework described here provides a clear guideline on generating the tools required for 438 applying HRM to non-model systems. This approach reduced overall project costs by avoiding 439 redundant sequencing of haplotypes. The high throughput of HRM offers the molecular 440 ecologist the opportunity to increase intrapopulation sample numbers, while the automated 441 clustering provides real time insights into the underlying levels of genetic variation. Furthermore, 442 this technology may be particularly well suited to the study of conserved and slow mutating 443 nuclear regions and the chloroplast genome of plants ) where low 444 intrapopulation genetic variation is predicted and redundant sequencing of the same nucleotide 445 motifs is likely. 446 The Cyclopia specific primers developed here provide a starting point for assessing potential 447 issues of genetic pollution associated with the transition to commercial Honeybush cultivation 448 (Potts 2017). However, further resolution may be required for more in depth population studies 449 and additional cpDNA regions as well as low copy nuclear loci should be explored for HRM 450 primer development.     Nucleotide differences and HRM clustering of Cyclopia accessions <!--?xml version="1.0" encoding="UTF-8"?--> LyX Document Sample ID of the accessions that were PCR amplified in replicates of 16, the number of replicates that successfully amplified during PCR and underwent HRM analysis (N), HRM haplotype discrimination (sensitivity, specificity and accuracy), the clustering results for each haplotype, and the nucleotide differences between haplotypes.  <!--?xml version="1.0" encoding="UTF-8"?--> LyX Document Protocol for PCR amplification and subsequent HRM curve generation. Primer Tm given in Table 1