Plastid genomes of the North American Rhus integrifolia-ovata complex and phylogenomic implications of inverted repeat structural evolution in Rhus L.

Plastid genomes (plastomes) represent rich sources of information for phylogenomics, from higher-level studies to below the species level. The genus Rhus (sumac) has received a significant amount of study from phylogenetic and biogeographic perspectives, but genomic studies in this genus are lacking. Rhus integrifolia and R. ovata are two shrubby species of high ecological importance in the southwestern USA and Mexico, where they occupy coastal scrub and chaparral habitats. They hybridize frequently, representing a fascinating system in which to investigate the opposing effects of hybridization and divergent selection, yet are poorly characterized from a genomic perspective. In this study, complete plastid genomes were sequenced for one accession of R. integrifolia and one each of R. ovata from California and Arizona. Sequence variation among these three accessions was characterized, and PCR primers potentially useful in phylogeographic studies were designed. Phylogenomic analyses were conducted based on a robustly supported phylogenetic framework based on 52 complete plastomes across the order Sapindales. Repeat content, rather than the size of the inverted repeat, had a stronger relative association with total plastome length across Sapindales when analyzed with phylogenetic least squares regression. Variation at the inverted repeat boundary within Rhus was striking, resulting in major shifts and independent gene losses. Specifically, rps19 was lost independently in the R. integrifolia-ovata complex and in R. chinensis, with a further loss of rps22 and a major contraction of the inverted repeat in two accessions of the latter. Rhus represents a promising novel system to study plastome structural variation of photosynthetic angiosperms at and below the species level.


INTRODUCTION
The plastid genomes (plastomes) of green plants have been used as phylogenetic tools and studied from an evolutionary standpoint for decades (Soltis, Soltis & Doyle, 1992;Chase et al., 1993;Moore et al., 2007;Ruhfel et al., 2014;Gitzendanner et al., 2018;Li et al., 2019a). Across angiosperms, these genomes are thought to be highly conserved in structure, with two single copy regions (the large single copy, or LSC; and small single copy introgression, patterns of genomic and morphological variation, and ecological niche space of these two species and their hybrid introgressants remain poorly studied despite their ecological and anthropogenic importance. Thus, there is need for future phylogeographic study in this complex, especially in the context of the disjunct distribution of R. ovata.
In this study, complete plastid genomes for three accessions from the R. integrifoliaovata complex were sequenced and annotated in order to generate genomic resources for phylogeographic study and to quantify the degree of plastid genomic variation among them. Further, a phylogenomic dataset was constructed and analyzed, consisting of 52 complete, annotated plastid genomes of order Sapindales, emphasizing the family Anacardiaceae and genus Rhus. Analysis of these plastomes revealed the extent of sequence variation within the R. integrifolia-ovata complex, support for phylogenetic relationships among major clades of Sapindales, dynamic structural changes at the IR-LSC boundary, and independent instances of gene loss within the genus Rhus.

MATERIALS AND METHODS
Sampling, DNA extraction, Illumina sequencing, and read quality control Voucher specimens were collected for one individual of Rhus integrifolia (accession CFB 320c CA), and two individuals of R. ovata (accessions 290b CA and 371a AZ; Table 1). Permission to collect samples was provided by California State Parks (DRP065), and the USDA Forest Service (FS-2400-8). Because R. integrifolia and R. ovata are known to introgress frequently in California, individuals were chosen from localities showing no visible evidence of intermediate morphologies. DNA was extracted from leaf material using a CTAB extraction protocol (Doyle & Doyle, 1987), modified to 1/5 volume. Total genomic DNA was visualized on a 1% agarose gel stained with ApexSafe loading Dye (Genesee Scientific, San Diego, CA, USA), and quality/quantity of total DNA was verified via NanoDrop spectrophotometry (ThermoFisher, Waltham, MA, USA) and Qubit fluorometry (dsDNA Broad Range Assay, ThermoFisher, Waltham, MA, USA). Library preparation (Nextera DNA Flex Library Prep Kit) and Illumina sequencing on a NextSeq500 were conducted at Global Biologics, LLC (Columbia, MO, USA). Accessions were pooled at equimolar concentrations and sequenced with nine samples from another project. Paired-end reads (2 × 150 bp) were trimmed from 3′ and 5′ ends via a 3-bp sliding window specifying a minimum PHRED score of 20, and Illumina adapters were removed with Trimmomatic v.0.32 (Bolger, Lohse & Usadel, 2014). After processing, a total of 21,723,472 paired-end reads remained for R. ovata accession 371a AZ, 10,979,554 reads remained for R. ovata accession 320c CA, and 19,342,934 reads remained for R. integrifolia accession 320c CA (Table 1). FASTQC (http://www. bioinformatics.babraham.ac.uk/projects/fastqc/) was used to assess read quality before and after trimming and filtering.  . Information content per locus (here termed "IC") for each spacer, intron, or protein-coding region yielding variation was calculated as: IC = ((# SNP + # Indels)/locus length) × 100. These were ranked by locus length, and length vs. IC were plotted for each locus to identify highly variable regions within the typical target range for PCR amplification and Sanger sequencing with a single primer pair (e.g., 1,000-500 bp). Primers were designed using the Primer3 (Untergasser et al., 2012) plugin for Geneious. Additional information on sequence variation among Rhus accessions was calculated in DnaSP v.6. (Rozas et al., 2017).

Plastid phylogenomic analysis
A representative sample of species was chosen from complete, annotated plastid genomes of Sapindales to investigate plastid genomic structural variation across the order. This included 52 completely sequenced and annotated plastomes (49 from NCBI GenBank plus the three newly generated Rhus plastomes), and two outgroup taxa: Bretschneidera sinensis Hemsl. GenBank # NC_037753 (Brassicales Bromhead, Akaniaceae Stapf) and Shorea pachyphylla Ridl. ex Symington MK841940 (Malvales Juss., Dipterocarpaceae Blume). In order to make the analyses computationally tractable, a single member of each available genus was included representing seven of the nine families of Sapindales (Biebersteiniaceae and Kirkiaceae had no publicly available complete plastomes). Multiple species of some genera were included due to notable plastome length variation, including: Commiphora Jacq. (Burseraceae Kunth); Rhus (Anacardiaceae); Acer L. and Dipteronia Oliv.
(Sapindaceae Juss.); and Entandrophragma C. DC. (Meliaceae Juss.). IR annotations were added to plastomes missing this information in GENEIOUS via the self-dotplot function, and the second IR copy was removed from all accessions (IRa). The fifty-four complete plastomes (minus IRa) were aligned with progressiveMAUVE as above. The number of syntenic, locally collinear blocks (LCB) was calculated and subjected to a pairwise "double cut and join" (DCJ) analysis in MAUVE to estimate and visualize the number of genomic rearrangements and major inversions. In order to interpret plastome evolution in a comparative context, a phylogenetic tree was constructed for a concatenated matrix of the seven largest LCB identified in MAUVE (ranging from 11,663 to 52,267 bp aligned length) for a total of 209,579 bp in aligned length, including both coding and non-coding regions. RAxML (Stamatakis, 2014) was used to generate a tree under the GTR-Γ model, with the default number of rate categories (25), and base frequencies estimated empirically. Ten replicate tree searches were conducted, starting from parsimony trees, to check for convergence in topology. Then, 1,000 standard bootstrap replicates were conducted in RAxML under the same search parameters and displayed on the maximum likelihood tree estimate. All analyses were run across 30 cores, each using of 16 GB RAM (specifying RAxMLHPC-PTHREADS) on a Thinkmate VSX R5 760V3 server-class workstation (Thinkmate, Waltham, MA, USA).
To investigate the relationship between and total plastome length, IR length, and total tandem repeat content, all accessions were compared via phylogenetic least-squares regression (PGLS), which accounts for phylogenetic signal in trait covariance among species (Felsenstein, 1985;Garland, Harvey & Ives, 1992;Grafen, 1992), using the R packages "ape" (Paradis & Schliep, 2019), "phytools" (Revell, 2012), and "caper" (Orme et al., 2013). Tandem repeat content for all 54 complete plastomes (excluding IRa) was calculated in TandemRepeatsFinder (Benson, 1999) specifying a minimum alignment score of 50, and alignment parameters of 2, 7, and 7 for matches, mismatches, and indels, respectively. The consensus size of each repeat (bp) was multiplied by the number of each repeat to calculate the total tandem repeat content in bp for each plastome. Phylogenetic signal for IR length, repeat content, and total plastome length was calculated as Pagel's λ and its significance compared to a model of λ = 0 (Pagel, 1999). PGLS analysis was run under a Brownian Motion model, with λ estimated. Lastly, a closer investigation of expansion and contraction of IR boundaries was conducted using IRscope (Amiryousefi, Hyvönen & Poczai, 2018), an R 'Shiny' application that allows analysis of up to ten annotated plastomes simultaneously (https://irscope.shinyapps.io/irapp/), and outputs a JPEG illustration of IR boundary dynamics.

RESULTS
Plastome structure, repeat content, and sequence variation Assembled plastome sequences for Rhus ranged from 160,141 bp (R. integrifolia) to 160,262 bp (R. ovata from CA) ( Table 1; File S1). All three plastomes contained a total of 110 genes: 76 protein coding genes (CDS), 30 transfer RNA genes, and four ribosomal RNA genes. The LSC region ranged from 87,980 bp in R. integrifolia to 88,086 in Arizonan R. ovata; the IR from 26,602 in R. integrifolia to 26,635 bp in Californian R. ovata; and the SSC from 18,880 bp in Arizonan R. ovata to 18,957 bp in R. integrifolia. There was a total of 111 segregating sites (excluding sites with gaps), and an average pairwise nucleotide diversity (π) of 0.00055 ± 0.00016. The number of nucleotide substitutions in pairwise comparisons ranged from 127 between Californian and Arizonan R. ovata to 208 between Californian R. ovata and R. integrifolia (Table 2). All three accessions had similar numbers of dispersed and tandem repeats (Table 3; File S2). Forward-Forward dispersed repeats were the most common, followed by palindromic, forward-reverse, and forward-compliment based on our search criteria. Tandem repeats were abundant, with pentanucleotide repeats being most common, while dinucleotide repeats were the rarest, considering repeat motifs from 2 to 20 bp (Table 3). Of particular note was a 22 bp minisatellite repeat within the ndhC-trnV UAC spacer that varied in copy number between the three Rhus accessions (ATT TTT TTT ATT ATT AAT TAT T), with one unit in R. integrifolia, two units in the Arizonan accession of R. ovata, and three in the Californian accession of R. ovata.
Overall plastome structure and synteny did not differ among all three newly sequenced Rhus accessions (File S1). Interestingly, the plastomes of two accessions of Rhus chinensis (GenBank accessions MF351625 and KX447140) displayed a major contraction of the IR relative to R. integrifolia, R. ovata, R. typhina, and another accession of R. chinensis (MG267385), with the IR being some 10 KB smaller in the former R. chinensis accessions.
In the two accessions of R. chinensis with a reduced IR, four genes normally found in the IR are instead found in the LSC region: rpl2, rpl23, trnI CAU , and ycf2. The majority of SNPs, single-base indels, and multi-base indels occur in intergenic spacers within R. integrifolia and R. ovata, and these are predominantly found in the LSC region (Fig. 2). SNPs and indels within introns are present mainly in the LSC, followed by the SSC, with none in the IR. Exonic SNPs and multi-base indels are slightly more prevalent in the SSC than in the LSC. Considering information content (IC), several regions display relatively high variation within the range of commonly used PCR and sequencing capabilities with two primers for a single amplicon (i.e., 500-1,000 bp; Fig. 3). These include: the trnF GAA -ndhJ spacer, matK-trnK UUU spacer, accD-psaI spacer, clpP intron #2, trnH GUG -psbA spacer, psaA-ycf3 spacer, trnE UUC -trnT GGU spacer, and ndhC UAC -trnV spacer. Oligonucleotides for amplification of these eight regions are listed in File S3.

Plastid phylogeonomics of Rhus and order Sapindales
The concatenated matrix of 54 taxa and 209,573 aligned bp had 26,989 variable positions, excluding gaps, with 16,421 parsimony informative sites, 10,568 singleton sites, and 87,070 sites including at least one inferred gap (File S4). Fig. 4 shows the estimated phylogenetic relationships among the 52 complete plastomes of Sapindales, plus two non-sapindalean outgroups. The Maximum likelihood analysis recovered 100% bootstrap support for all "deep" relationships among families within Sapindales (excluding Biebersteiniaceae Schnitzlein and Kirkiaceae Takhtajan, for which no complete plastomes were available). Rutaceae Juss. were placed as sister to Simaroubaceae DC., followed by Meliaceae. This clade was placed as sister to Sapindaceae, and collectively these four families were placed as sister to a clade of Anacardiaceae + Burseraceae. Nitrariaceae Lindl. grouped as sister to all other families of Sapindales included in this analysis. Within Anacardiaceae, three accessions of the Asian Rhus chinensis grouped as sister to an accession of the North American R. typhina L. The two accessions of Rhus ovata (CA and AZ) were weakly supported as sister to one another (BS = 72), and together sister to R. integrifolia. The Rhus integrifolia-ovata clade grouped as sister to all other Rhus accessions. Rhus was grouped as sister to Pistacia, Toxicodendron, Anacardium L. + Mangifera L., Sclerocarya Hochst., and Spondias L., respectively.
There was evidence for phylogenetic signal in total plastome length (Pagel's λ = 0.802, p = 0.0017), while there was no signal in IR length (Pagel's λ = 0.017, p = 0.8885) or repeat content (Pagel's λ = 6.68 × 10 −5 , p = 1.0) based on analysis of 54 plastomes. Phylogenetic least squares regression revealed no association between IR length and total plastome length (adjusted R 2 = 0.0079, F = 1.421, p = 0.2387), but a significant, albeit weak association between repeat content and total plastome length (adjusted R 2 = 0.2175, F = 15.73, p = 0.0002). A model including both IR length and repeat content as explanatory variables yielded similar results (adjusted R 2 = 0.2256, F = 9.722, p = 0.0006), wherein repeat content was significantly associated with total plastome length (p = 0.0002) and IR length was not (p = 0.2189). Many members of Sapindaceae display relatively smaller total  plastome lengths (e.g., Acer, Dipteronia, Aesculus L.; Fig. 5), but this is not reflected by smaller IR lengths for those species. The plastome of Anacardium occidentale was the longest in the dataset, and had the longest IR region, but otherwise there were no clear patterns of variation in IR length driving total length of the plastome. Synteny analysis in MAUVE of the 54 complete plastomes revealed a total of 11 locally collinear, syntenic blocks (LCB; File S5). There were a few notable genomic rearrangements among members of Sapindales, but these were mostly specific to individual species or genera and were not phylogenetically informative at "higher" taxonomic levels, for example, at the family level or above (File S5). Mangifera indica has a large inversion in the LSC region, and Anacardium occidentale contains a~6,700 insertion of mitochondrial DNA within the IR, as observed previously (Rabah et al., 2017). The other striking example of plastome modification comes from two of three accessions of the Asian species Rhus chinensis: GenBank accessions KX447140 and MF351625 have IR lengths of 16,741 bp and 16,602 bp, respectively, relative to accession MG267385, which has an IR length of 27,375 bp ( Fig. 5; File S5).
A closer analysis of IR boundaries within Rhus reveals dynamic expansions and contractions of the IR, primarily at the IR-LSC junctions (Fig. 6). The most notable finding is the independent loss of the rps19 gene from the plastomes of R. chinensis and the R. integrifolia-ovata complex, with further loss of rpl22 in R. chinensis (Fig. 6). The IR is expanded at the LSC-IR boundary in one accession of R. chinensis relative to R. integrifolia, R. ovata, and R. typhina, to include a portion of psbA, which is typically found at the start of the LSC region (Fig. 6). This is followed by an approximately 9.8 KB contraction of the IR boundary in two accessions of R. chinensis. The rps19 gene spans the LSC-IRb boundary in Rhus typhina, while in R. ovata and R. integrifolia, the gene is missing. Based on the phylogenetic pattern of (R.integrifolia-ovata, (R. typhina, R. chinensis)), it can be inferred that rps19 was lost at least twice within Rhus, based on the current sampling.

DISCUSSION
Three complete plastid genomes of the Rhus integrifolia-R. ovata species complex were sequenced, annotated, and analyzed in a comparative context with several other complete plastomes from the order Sapindales. Patterns of nucleotide polymorphism and indel variation were characterized among the three newly generated Rhus plastomes, providing genomic resources for population-level and phylogeographic study of these ecologically and economically important species. Additionally, phylogenetic analysis of 52 complete plastomes in order Sapindales provides resolution and robust branch support among most families within the order, allowing a well-supported basis for phylogenomic comparison of plastid genome evolution. The most striking finding was the dynamic evolution of the  These findings suggest that the genus Rhus may have experienced several such events, providing a potentially novel, dynamic model system in which to study plastome structural changes in angiosperms at and below the species level.
Plastome structure, repeat content, and sequence variation in R. integrifolia and R. ovata Plastome structure and synteny in the three newly sequenced Rhus accessions are somewhat typical for angiosperms (Wicke et al., 2011;. Schltr. spp. (Li et al., 2019b). Repetitive DNA is common in the three Rhus accessions, with 65-69 dispersed repeats >20 bp in length) and 613-621 tandem repeats of motif length 2-20 bp (File S2). These regions are candidates for primer design and the study of length variation in a population-genetic or phylogeographic context. In particular, a 22 bp minisatellite repeat within the ndhC-trnV UAC spacer varied between all three accessions and holds potential as an informative marker. The vast majority of SNP and indel variation is found in the LSC region, particularly within spacer or intron regions. Some of these intron and spacer regions correspond to those shown in previous studies to contain relatively high levels of variation (e.g., ndhC-trnV UAC ; Shaw et al., 2005Shaw et al., , 2007, and are likely to be useful both above and below the species level as variable markers. Rhus-specific rimers for PCR amplification were designed for eight of these regions (File S3). Furthermore, the complete plastid genome sequences for R. integrifolia and R. ovata can be used to design specific target-capture or long-range PCR probes (Cronn et al., 2012;Mariac et al., 2014;Peñalba et al., 2014;Bethune et al., 2019).

Plastid phylogeonomics of Rhus and order Sapindales
Several previous studies have addressed phylogenetic relationships among families of the order Sapindales, or have included representatives of Sapindales in larger studies of angiosperm relationships (Muellner, Vassiliades & Renner, 2007;Weeks et al., 2014;Magallón et al., 2015;Muellner-Riehl et al., 2016;Li et al. 2019b). However, these studies differ in the relative placement of some of the nine families of Sapindales, with many lacking support for key relationships due to the reliance upon either limited genomic information (e.g., few-gene analyses), or by limited taxon sampling. Order Sapindales contains nine families, 479 genera, and approximately 6,570 species, or approximately 2.15% of angiosperm species diversity (Angiosperm Phylogeny Group, 2016; The Plant List, 2010). While the sampling of 52 plastomes in the current study does not capture the species richness of this diverse order, it does provide support for higher-level relationships based on the plastomes included, and further enables a robust framework for comparative plastid genomics in the order. In fact, the relationships among sapindalean families recovered in the current study are corroborated by another recent study based on complete plastome sequences in Wang et al. (2020), here with the addition of two plastomes from Nitrariaceae (Nitraria L. and Peganum L.), which were not included in the aforementioned study.
Missing from the current study are representatives of two sapindalean families, Biebersteiniaceae and Kirkiaceae. Muellner-Riehl et al. (2016) sampled 207 species representing all families of Sapindales, and conducted a phylogenetic analysis based on three plastid loci: rbcL, atpB, and the trnL-trnF spacer. While most of the deep branch support values in that study were moderate to high, the placement of Biebersteiniaceae and Nitrariaceae among other members of Sapindales received no support. Additionally, there was moderate support for Meliaceae as sister to Simaroubaceae, whereas in the current study and in that of Wang et al. (2020), both based on complete plastome data, Meliaceae is supported as sister to Simaroubaceae + Rutaceae. The topology from the current study among families is also identical to that in (Li et al., 2019b) based on coding regions of the plastome. However, in that study Biebersteiniaceae (not included in the current study) are sister to Nitrariaceae with <50% bootstrap support, collectively sister to the rest of Sapindales. Thus, future phylogenomic studies of Sapindales should focus on dense, strategic plastome sampling representative of the species richness of this clade, as well as on broad representation of nuclear genomic information (Collins, Gostel & Weeks, 2016;Eiserhardt et al., 2018;Johnson et al., 2019;Dodsworth et al., 2019).
Boundary shifts of the IR in plastid genomes are hypothesized to be one of the primary drivers of overall plastid genome size variation (Wicke et al., 2011;. However, this hypothesis was not supported across 52 representative plastomes of Sapindales in a comparative framework via phylogenetic least squares regression. While a few shifts in plastome size were visibly associated with expansions or contractions of the IR (e.g., Anacardium occidentale, Rhus chinensis; Fig. 5), shifts in the IR boundary alone are not sufficient to explain overall plastome size variation in Sapindales among phylogenetically independent lineages, based on the current sampling. Variation in repeat content appears to be a stronger determinant of plastome length variation in Sapindales (e.g., as in Asarum; Sinn et al., 2018).
Perhaps the most striking finding in the current study is the dynamic nature of evolution of the IR boundary in Rhus, based on comparison of complete plastomes across the order Sapindales.
The inverted repeat (IR) region of the plastome is the most conserved in terms of sequence variation (Palmer & Thompson, 1982) relative to the LSC and SSC. However, numerous studies have demonstrated the dynamic nature of boundary shifts in the IR, which often has implications for the evolution of gene content, structural stability, and substitution rates (Zhu et al., 2016). By the inclusion of only seven available accessions of four species in Rhus, it is evident that IR boundaries are variable at and below the species levels (Fig. 6). Thus, increased representation of plastomes in this genus and related genera will likely reveal additional variation in plastome structure, making Rhus a promising system in which to test hypotheses associated with IR expansion/contraction and the overall structural/substitutional dynamics of the plastome in a phylogenetic, comparative context. IR boundary shifts often lead to gene duplication, loss, and large-scale syntenic rearrangement, all of which have implications for substitution rates of the genes captured or "released" from the IR (Zhu et al., 2016). In some cases, genes can be lost from the plastome as a result of IR boundary shifts (Wicke et al., 2014;Downie & Jansen, 2015), as is the case in R. integrifolia-ovata (rps19) and R. chinensis (rpl22 and rps19; Fig. 6). It is possible that one or both of these genes were transferred to another genomic compartment, either to the mitochondrial genome or the nuclear genome, where they may still be expressed (Martin & Herrmann, 1998;Selosse, Albert & Godelle, 2001). Alternatively, they may have been lost from the organism altogether, with little to no fitness implications for plastid function. Numerous examples of the loss of such "housekeeping" genes are evident in mycoheterotrophic or parasitic plants which contain plastids that presumably retain function at some level (Wicke et al., 2011(Wicke et al., , 2013(Wicke et al., , 2016. The IR is hypothesized to play a major role in the structural stability of the plastome (Palmer & Thompson, 1982). The IR has been lost several times independently among plant lineages, for example in: conifers (Raubeson & Jansen, 1992;Wu et al., 2011;Wu & Chaw, 2014), legumes (Palmer, Osorio & Thompson, 1988;Lavin, Doyle & Palmer, 1990; but also regained de novo, Choi, Jansen & Ruhlman, 2019), parasitic plants (Wicke et al., 2016), mycoheterotrophic plants (Schelkunov et al., 2015;Yuan et al., 2018;Barrett, Sinn & Kennedy, 2019), members of family Geraniaceae Juss. (Guisinger et al., 2011;Blazier et al., 2016), cacti (Sanderson et al., 2015), and a palm (Tahina spectabilis J. Dransf. & Rakotoarinivo; Barrett et al., 2016). However, there is mixed evidence that IR loss leads to syntenic instability, including a lack of evidence from Erodium L'Hér. (Geraniaceae;Blazier et al., 2016) and weak evidence within the "IR-lacking clade" of papilionoid legumes that plastome rearrangements are prevalent following the loss of the IR (Sabir et al., 2014).

CONCLUSION
Three complete plastid genomes were generated for two ecologically important North American shrub species in the genus Rhus. Information from these plastomes was used to design PCR primers for amplifying variable regions and regions containing phylogeographically informative repeats in dense population-level samples (e.g., ndhC-trnV). Plastome evolution was investigated in a robustly supported phylogenomic framework, revealing striking variation within Rhus in terms of dynamics of the LSC-IR boundary, with multiple, independent instances of gene loss. Rhus presents a promising novel system in which to study the dynamics of plastome structural variation in photosynthetic angiosperms at or below the species level, and continued representation of plastomes in this genus is likely to reveal additional changes.