Extremely low nucleotide diversity among thirty-six new chloroplast genome sequences from Aldama (Heliantheae, Asteraceae) and comparative chloroplast genomics analyses with closely related genera

Aldama (Heliantheae, Asteraceae) is a diverse genus in the sunflower family. To date, nearly 200 Asteraceae chloroplast genomes have been sequenced, but the plastomes of Aldama remain undescribed. Plastomes in Asteraceae usually show little sequence divergence, consequently, our hypothesis is that species of Aldama will be overall conserved. In this study, we newly sequenced 36 plastomes of Aldama and of five species belonging to other Heliantheae genera selected as outgroups (i.e., Dimerostemma asperatum, Helianthus tuberosus, Iostephane heterophylla, Pappobolus lanatus var. lanatus, and Tithonia diversifolia). We analyzed the structure and gene content of the assembled plastomes and performed comparative analyses within Aldama and with other closely related genera. As expected, Aldama plastomes are very conserved, with the overall gene content and orientation being similar in all studied species. The length of the plastome is also consistent and the junction between regions usually contain the same genes and have similar lengths. A large ∼20 kb and a small ∼3 kb inversion were detected in the Large Single Copy (LSC) regions of all assembled plastomes, similarly to other Asteraceae species. The nucleotide diversity is very low, with only 1,509 variable sites in 127,466 bp (i.e., 1.18% of the sites in the alignment of 36 Aldama plastomes, with one of the IRs removed, is variable). Only one gene, rbcL, shows signatures of positive selection. The plastomes of the selected outgroups feature a similar gene content and structure compared to Aldama and also present the two inversions in the LSC region. Deletions of different lengths were observed in the gene ycf2. Multiple SSRs were identified for the sequenced Aldama and outgroups. The phylogenetic analysis shows that Aldama is not monophyletic due to the position of the Mexican species A. dentata. All Brazilian species form a strongly supported clade. Our results bring new understandings into the evolution and diversity of plastomes at the species level.

Chloroplast genome size varies from 120 to 170 kb across angiosperms, coding for 110-130 genes (∼80 proteins, ∼30 tRNAs and four rRNAs). Chloroplast organization is especially maintained throughout land plant evolution, with two copies of an inverted repeat (IR) region that separate two regions of single-copy genes: the large and small single copy regions (LSC and SSC, respectively). Despite the conserved gene content and organization shared among plastomes in higher plant lineages, some structural changes can occur (e.g., gene and intron losses, inversions, deletions, and duplications), as well as point mutations, especially in the non-protein coding regions that account for ∼50% of the genome (Raubeson & Jansen, 2005;Ruhlman & Jansen, 2014). Therefore, this level of variability makes chloroplast sequences suitable for phylogenetic analyses (Gitzendanner et al., 2018). They have been used as phylogenetic markers due to a series of advantages: small genome size, high copy number per cell, conserved gene order, uniparental inheritance (usually maternal, rarely biparental) and lack of recombination (Raubeson & Jansen, 2005).
The inference of evolutionary relationships among land plants has relied mainly on chloroplast sequences at diverse taxonomic levels (Freitas et al., 2018;Gitzendanner et al., 2018). NGS has greatly simplified the acquisition of whole chloroplast genome sequences (Parks, Cronn & Liston, 2009;Stull et al., 2013), with more than 4,500 plastomes sequenced to date (Genbank: https://www.ncbi.nlm.nih.gov/refseq/, accessed on July 2020), and recent publication of numerous plastome phylogenies (e.g., Sun et al., 2016;Givnish et al., 2018;Kuo et al., 2018;Saarela et al., 2018;Li et al., 2019). Structural changes in the chloroplast genome can also serve as powerful phylogenetic evidence, with one of the most notable examples found in the family Asteraceae. Jansen & Palmer (1987a) and Jansen & Palmer (1987b), through comparative restriction site and gene mapping studies, found a 22.8 kb inversion in the LSC region shared by all studied members of the family, except for the early-diverging lineage Barnadesieae, which was later confirmed as the sister-group to the rest of Asteraceae (Kim, Choi & Jansen, 2005;Funk et al., 2009;Mandel et al., 2019).
Asteraceae is one of the largest families of flowering plants, with 25,000-35,000 species, occurring in all continents and in nearly all types of vegetations (Mandel et al., 2019). This high species diversity is presumably linked to a series of whole genome duplications and paleopolyploidization events (Barker et al., 2008;Barker et al., 2016;Huang et al., 2016). One of these events was previously identified at the stem leading to the Heliantheae Alliance clade (Huang et al., 2016), being shortly followed by an acceleration in diversification in the phytomelanic fruit clade (i.e., all Heliantheae Alliance except the tribe Helenieae), which resulted in ∼457 genera and ∼5,500 species (Funk et al., 2009;Panero & Crozier, 2016;Mandel et al., 2019).
Aldama is a genus of the Heliantheae Alliance (Heliantheae: Helianthinae) with ∼120 species (Schilling & Panero, 2011;Magenta & Pirani, 2014), occurring in the southwestern USA and in most of South America, in mountainous regions and open vegetations. They are perennial herbs, with heads containing neutral ray florets, pappus composed by two awns and several short, deciduous or persistent squamellae (Magenta, Loeuille & Pirani, 2017). Previously to the phylogenetic analysis by Schilling & Panero (2011), the polyphyletic genus Viguiera contained most of the species currently placed in Aldama, including the 36 Brazilian species that are now placed in the latter. However, this new circumscription remains uncertain due to the small sampling of South American taxa in Schilling & Panero's (2011) study and the absence of an unambiguous morphological synapomorphy to define Aldama. The marked incongruences among subtribal level phylogenies, either based on nuclear regions (Schilling & Panero, 2011) or those based on a chloroplast restriction site dataset (Schilling & Jansen, 1989;Schilling & Panero, 1996a), indicate that introgression has played a significant role in the evolutionary history of this group (Schilling & Panero, 1996b). Nonetheless, the low level of DNA sequence divergence, either in ribosomal or chloroplast data (Schilling et al., 2000;Schilling & Panero, 2011), has been an obstacle to a clear understanding of evolutionary relationships in Helianthinae.
The chloroplast genome of Asteraceae, except for the Barnadesieae lineage, is characterized by two inversions: the previously mentioned 22.8 kb inversion in the LSC region and a second, 3.3 kb inversion, nested within the larger one (Kim, Choi & Jansen, 2005;Timme et al., 2007). The plastome size in the family varies from 149,5 to 153,7 kb (Choi & Park, 2015;Lu et al., 2016) and the gene content is relatively conserved: from 111 to 115 different genes, including 79 to 83 protein-coding genes, four rRNAs genes, and 29-30 distinct tRNAs (Wang et al., 2015;Salih et al., 2017;Lin et al., 2019;. Some variations have been found in the gene structure and tRNA abundance, and in some regions the nucleotide diversity is higher than 5% (Wang et al., 2015). These more diverse regions can be used as phylogenetic markers and even to identify cryptic lineages in some cases (Wang et al., 2015;Salih et al., 2017). Several plastome phylogenies have been published recently (Pouchon et al., 2018;Cho et al., 2019;Zhang et al., 2019;Knope et al., 2020), two of them involving lineages of the Heliantheae Alliance (the Espeletia complex, Pouchon et al., 2018, andthe genus Bidens Knope et al., 2020). These recent studies also reveal incongruence between plastid and nuclear phylogenies, likely due to the exchange of plastids between geographically close species, regardless of their morphological similarities or nuclear phylogenetic distance (Pouchon et al., 2018), or to hybridization events and/or incomplete lineage sorting (Knope et al., 2020).
With the aim to increase our comprehension of chloroplast genome characteristics, structural diversity, and evolution at low taxonomic levels in the tribe Heliantheae, we sequenced the plastome of 33 species of Aldama (Heliantheae, Asteraceae) representing a wide range of the morphological diversity in the genus. Additionally, the plastome of five species belonging to other Heliantheae genera, selected as outgroups, were also sequenced (i.e., Dimerostemma asperatum S.F. Blake, Helianthus tuberosus L., Iostephane heterophylla (Cav.) Benth., Pappobolus lanatus (Heiser) Panero var. lanatus, and Tithonia diversifolia (Hemsl.) A. Gray). We analyzed the structure and content of the assembled plastomes, performed comparative analyses within Aldama and with other closely related genera, registered selection patterns within Aldama chloroplast genes, identified putative repeated regions, and reconstructed phylogenetic relationships.

Sampling, DNA preparation, and sequencing
We sampled 36 individuals of Aldama, representing 33 species, and one individual each of five species belonging to other Heliantheae genera, four of them from subtribe Helianthinae (Helianthus tuberosus, Iostephane heterophylla, Pappobolus lanatus var. lanatus, and Tithonia diversifolia) and one from subtribe Ecliptinae (Dimerostemma asperatum), which were selected as outgroups based on other studies (Panero, 2007;Schilling & Panero, 2011) (Table 1, Data S1). Total genomic DNA was extracted from silica-gel-dried leaves collected in the field using the commercial kit E.Z.N.A. R SQ Plant DNA Kit (Omega Bio-Tek Inc., Norcross, GA, USA) following the manufacturer's instructions except by the addition of ascorbic acid and polyvinylpyrrolidone in the SQ1 buffer. The integrity of genomic DNA was assessed by performing electrophoresis in a 1% agarose gel and DNA concentrations were measured with spectrophotometric analysis with Epoch Micro-Volume Spectrophotometer System (BioTek). DNA of each sample was diluted to a final volume of 20 µL at 2.5 ng/µL, quantified with Qubit fluorometer (Invitrogen) and used to construct Illumina Nextera libraries (Illumina, San Diego, CA, USA) following the manufacturer's instructions. Plastome capture was carried out with a customized SureSelectXT Custom 1kb-499kb library R (Agilent Technologies) target enrichment panel, using biotinylated RNA baits developed based on the H. annuus plastome sequence (Data S2). This customized panel includes both the complete plastome and several low copy nuclear loci (unpublished data), thus ensuring high sequencing coverage. DNA library preparation, target enrichment and sequencing were carried out at Laboratório Multiusuários Centralizado (ESALQ-USP, Piracicaba -SP, Brazil). Sequencing was conducted using an Illumina HiSeq platform in paired-end mode, with 300 bp read lenght.

Plastome assembly and annotation
Sequence quality was initially evaluated with FastQC 0.11.9 (Andrews, 2010) and trimmed for quality and contaminants with Trimmomatic (Bolger, Lohse & Usadel, 2014), using the SLIDINGWINDOW mode with a 5-base wide window and quality cut off of 20, dropping reads shorter than 36 bp. Due to the availability of a published chloroplast genome from a closely related taxon, Helianthus. annuus (GenBank accession NC_007977;Timme et al., 2007), we opted for using reference assembly. This strategy also allows the assembly of off-target reads containing parts of the plastome sequence. The Bowtie2 plugin in Geneious (Langmead & Salzberg, 2012) was used to index the reference genome and to match the reads obtained from sequencing back to the reference genome. BAM files generated by Bowtie2 were used to calculate coverage of the sequencing with Samtools, using the ''samtools depth '' tool (Li et al., 2009), which gives the sequencing depth for each position in the final assembly. Basic statistics were used to summarize the coverage value for each taxon. Consensus sequences generated by the assembly were used for annotation and analysis. Chloroplast genome annotations were carried out using Geneious 9.1.5 (Kearse et al., 2012), DOGMA (Wyman, Jansen & Boore, 2004), and BLAST (Altschul, Gish & Miller, 1990;Altschul et al., 1997). Open Reading Frames (ORFs) were confirmed by manually searching start and stop codons. The plastome of Aldama trichophylla was the first to be annotated, using H. annuus (GenBank accession NC_007977;Timme et al., 2007) as reference. This Aldama reference was subsequently used to annotate the remaining plastomes. The graphical illustration of the A. trichophylla annotated plastome was built in OGDRAW (Lohse et al., 2013). The junction sites between the LSC/IRa/SSC/IRb regions were manually determined in Geneious, with complete annotations for adjacent genes. The boundaries between the plastomes of one species representative of each genus sequenced in this study were plotted in IRscope (Amiryousefi, Hyvönen & Poczai, 2018; https://irscope.shinyapps.io/irapp/). All 41 newly-sequenced chloroplast genomes were submitted to GenBank (Table 1).

Comparative analyses of chloroplast genomes
We performed comparative analyses among Aldama species and between Aldama and the five Heliantheae outgroup taxa. To avoid data duplication, one copy of the IRs of all chloroplast genomes was manually removed in all analyses.
Synteny and possible rearrangements were determined and identified through the comparison of the A. trichophylla plastome sequence with those from the five other Heliantheae genera. This analysis was completed in Mauve 2.4.0 (Darling, Mau & Perna, 2010, http://wolfe.gen.tcd.ie/GenomeVx), using progressiveMauve and MUSCLE 3.6 (Edgar, 2004) as alignment algorithm and internal aligner, respectively, with minimum locally collinear block (LCB) score and full alignment calculated automatically. We did not consider the plastomes to be collinear.

Selection on plastid genes
To investigate the presence positive selection signatures on the 79 plastid coding regions, Selecton 2.2 (Stern et al., 2007; http://selecton.tau.ac.il/index.html) was used to analyze the ratio of synonymous (Ks; silent) and non-synonymous (Ka; amino-acid altering) substitutions in the codon alignments for each region described above. We used the M8 (positive selection enabled; Yang et al., 2000) and M8a (null model;Swanson, Nielsen & Yang, 2003) models and likelihood scores were compared by a chi-square test with one degree of freedom. Results were considered significant when the probability was lower than 0.01.

Simple sequence repeat analyses
Identification of microsatellites or Simple Sequence Repeats (SSRs; i.e., tandem repeats of short, 1-6 bp-long DNA motifs) were performed using MISA (Beier et al., 2017) in the plastomes of the ten most divergent Aldama species (listed above) as representatives of the genus and of the five other Heliantheae genera sequenced in this study. Search for SSRs followed the following criteria: motif length between one and six nucleotides, with a minimum number of repetitions set as ten, five, and four units for mono-, di-, and trinucleotide SSRs, respectively, and three units each for tetra-, penta-, and hexanucleotide SSRs.

Phylogenetic reconstruction
We reconstructed phylogenetic relationships among plastomes of 36 Aldama and using the five other Heliantheae species assembled here as outgroup. The FFT-NS-2 method (Katoh et al., 2002) in MAFFT 7 (Katoh & Standley, 2013) was used to align all plastomes with one of the IRs removed to avoid data duplication. The ML analysis was conducted in RAxML 8.2.9 (Stamakis, 2014) using the GTR+G model with node support assessed by fast-bootstrap (-f a) using 1,000 non-parametric bootstrap pseudo-replicates.

Assembly and characteristics of the chloroplast genomes
The average read depth ranged between 1,431×and 5,037×(A. nudicaulis and A. tuberosa, respectively; Table S1). The 36 Aldama plastomes range in length from 151,221 (A. vernonioides) to 151,399 bp (A. squalida) ( Table 1 and Fig. 1). The assembled plastomes include some missing data and very few degenerate bases. However, very few of these sites were found within coding regions, not significantly impacting other analyses performed here (see Tables S2 and S3 All assembled plastomes feature a large ∼20 kb and a small, nested, ∼3 kb inversion in the LSC region, shared by other Asteraceae species (Kim, Choi & Jansen, 2005). The large inversion within the Aldama plastomes ranges from 22,350 to 22,396 bp, including 16 genes from trnS-GCU -trnC-GCA to trnG-UCC-trnT-GGU. The small inversion, with length between 3,298 and 3,328 bp, includes six genes located between trnS-GCU-trnC-GCA and trnE-UUC (Table 1 and Fig.  1). The GC content in all assembled plastomes was 37.6%. The 41 sequenced plastomes encode 113 unique genes, including 79 protein-coding genes (CDS), 30 tRNA genes, and four rRNA genes ( Table 2). The IRs of all plastomes present six CDS, seven tRNA genes, and four rRNA genes, duplicated. The plastomes assembled in this study include 17 intron-containing genes, of which 14 contain one intron, while three genes contain two introns (i.e., clpP, rps12, ycf3) ( Table 2 and Fig. 1). The rps12 gene is trans-spliced, with the 5 end located in the LSC region and the duplicated 3 end in the IR regions. The LSC/IRs/SSC boundaries are very conserved within Aldama species and among other Heliantheae genera sampled in this study. The LSC/IRb boundary is within rps19, the IRb/SSC boundary is within ycf1, the SSC/IRa is between ndhF and a partial ycf1 (ψycf1), and the IRa/LSC is between a truncated rps19 ( †rps19) and trnH (Figs. 1 and 2).

Identification of variable regions
A single synteny block was retrieved from the structural analysis performed in Mauve (Fig. S1). The plastomes of A. trichophylla and five other Heliantheae genera show the same structure and linear order. Levels of sequence diversity were investigated through the calculation of nucleotide variability (π) values within the 36 Aldama plastomes (Fig. 3A), among three Aldama species     (Fig. 3B). The π values within 800 bp across the plastomes range from 0 to 0.00754 (mean value of 0.00118) within Aldama species and from 0 to 0.0305 (mean value of 0.00860) among three Aldama species and the other genera, indicating that these sequences are very conserved, especially at the intrageneric level. The most variable sites within Aldama plastomes are within the petA-psbJ intergenic region (π between 0.00754 and 0.00613) and only three sites present π >0.004, which are ndhE-psaC, ycf1, and petN-psbM (Fig. 3A). The most variable sites among three Aldama species and the other genera are within the trnL-UAG-rpl32 and rpl32-ndhF intergenic regions (π between 0.0305 and 0.02936). Further variable sites with π >0.020 are located within the ycf1 gene and in other intergenic regions, such as trnE-UUC-rpoB, trnT-GGU-psbD, ycf3-trnS-GCU and petA-psbJ (Fig. 3B). In both datasets, the IRs are the most conserved regions (Fig. 3). The alignment of the 36 complete Aldama plastomes (with one of the IRs removed) has 127,466 bp with 1,508 variable sites (i.e., 1.18% of variable sites). In addition, the noncoding regions are more variable (i.e., 1.66% of the intergenic regions) than the coding regions (0.73% of the protein-coding genes) (Table 3).
Among the 79 protein-coding genes, the genes with the highest percentage of variable sites are presented in Fig. 3C and Table S4. Regarding absolute numbers, the genes with more than 10 variable sites are presented in Fig. 3D and Table S4.
Pairwise comparisons of divergent regions within the 10 selected Aldama plastomes (i.e., A. canescens, A. dentata 2, A. excelsa, A. filifolia, A. grandiflora, A. linearis, A. macrorhiza, A. trichophylla, A. rubra and A. veredensis) and among three species of Aldama (i.e., A. anchusifolia, A. dentata 2 and A. trichophylla) and five plastomes from other Heliantheae genera sequenced in this study were performed using mVISTA, with A. trichophylla as a reference ( Fig. 4 and Fig. S2). Overall, the alignments reveal very low intra-and inter-generic (Fig. 4) sequence divergence across the plastomes, suggesting high degree of conservation. Noncoding regions and introns are generally more divergent than coding regions ( Fig. 4 and Fig. S2).

Selection on plastid genes
The analyses performed in Selecton to explore selection pressure on the 79 protein-coding genes within Aldama plastomes showed that only one gene, rbcL, has signatures of positive selection (adaptive selection), with only two sites out of 486 with omega values (ω) lower than 1, at a significance level of 0.01 for two positions (Data S3).

SSR analyses
Six kinds of repeat patterns were screened and identified using MISA in the plastomes of the 10 selected Aldama species listed above and the five other Heliantheae genera sequenced in this study. In Aldama plastomes, the total number of SSRs ranges from 47 to 57 SSRs, while in the other genera it varies from 38 (D. asperatum) to 56 (I. heterophylla) ( Fig. 5A and Table S5). The most frequent SSRs are A or T mononucleotide repeats, accounting for 68-73.2% of the total SSRs in Aldama plastomes and for 60.5-75.5% in the other genera; G or C repeats, on the other hand, are rare ( Fig. 5C and Table S5). The total number of SSR motifs in Aldama range between 34-44 (70-77.2%) mononucleotide  (i.e., A. anchusifolia, A. dentata, and A. trichophylla) and the five plastomes from other Heliantheae genera sequenced in this study performed in mVISTA. The plastome of A. trichophylla was used as reference. Dark blue blocks indicate conserved genes (CNS), light blue blocks indicate conserved introns (UTR), and red blocks indicate conserved noncoding sequences (CNS). White blocks represent regions with sequence variation among the plastomes. The vertical axis indicates sequence alignment similarity of 50-100%.
In the other genera sequenced here, 28-45 of the SSRs are situated in the LSC, 0-2 in the IRs, and 8-9 in the SSC region (Figs. 5B-5C and Table S5). The density of SSRs in the LSC is somewhat similar to that found in the SSC when considering the size of each plastome region.

Phylogenetic reconstruction
The final alignment with the 41 taxa sequenced here (i.e., 36 Aldama species and 5 Helianthinae representatives as outgroups) is 129,218 bp long, of which 4,544 are variable sites and 1,089 are parsimony informative sites. In the ML tree (Fig. 6, bootstrap values (BS) depicted in the tree) Aldama is not monophyletic due to the position of the Mexican A. dentata as sister-group of T. diversifolia (BS: 98%), while all other Aldama species form a well-supported clade (BS: 99%). In this latter clade, Mexican and Andean species are the first to diverge (A. linearis, A. excelsa, A. canescens and A. revoluta). All Brazilian species form a strongly supported clade (BS: 99%), but internal relationships are poorly resolved due to the low nucleotide diversity found among Aldama plastomes, resulting in very short branches.

Plastome features
In this study, we sequenced and assembled 36 complete plastomes of 33 Aldama species and the plastome of five other genera from the tribe Heliantheae. Even though the sequences present a number of gaps and degenerate bases, this did not affect the genetic diversity and selection analyses, as they account for a small proportion of sites in relation to the total length of each genome. In addition, coding regions were less affected by these issues, with 0.006% of missing data and zero ambiguous sites (Tables S2 and S3). The basic features of these plastomes are highly similar to those found in other Asteraceae species, including the LSC inversions that are present in nearly all members of the family. Our results show that the plastomes are highly conserved in size, GC content, number of genes and gene order (Wang et al., 2015;Salih et al., 2017). Size variation between the smallest (A. vernonioides) and largest (A. squalida) Aldama plastomes is very low (178 bp) (Table 1). This is similar to the variation found between the plastomes of different species of Atractylodes ( (Table 1 and Fig. 1). Expansions of the IRs and contractions of the LSC are frequent in Asterids (e.g., Downie & Jansen, 2015;Firetti et al., 2017;Thode & Lohmann, 2019), but were not observed at the generic level in our results or in other studies in Asteraceae (Zhang et al., 2017;Cho et al., 2019;Gichira et al., 2019;Shahzadi et al., 2019;Shen et al., 2020;Zhang et al., 2019), except for a recent study in the genus Aster (Tyagi et al., 2020). The IR/SC boundaries are highly conserved in all Aldama and Heliantheae genera sequenced in this study (Fig. 2). Positions of these boundaries are very constant in Asteraceae (Wang et al., 2015). Some variation has been found at the IRb/SSC boundary in the tribe Cardueae and in Aster (within ycf1 or between the genes ycf1 and ndhF ) (Zhang et al., 2019;Tyagi et al., 2020), but we observed conserved borders within ycf1 in Aldama and the Heliantheae genera. In all plastomes studied here (33 species of Aldama and five Heliantheae genera), 18 genes are duplicated. Like in nearly all Asteraceae, there are seven tRNA genes and four rRNA genes located in the IR regions (Wang et al., 2015). Similar to all Angiosperms, the rps12 gene was found to be trans-spliced (one of its exons located in the LSC region and the other duplicated in the IRs) (Howe et al., 2003). Duplication of the trnF-GAA gene has probably occurred several times in the evolutionary history of Asteraceae, in different subfamilies (Carduoideae, Cichorioideae and Asteroideae) (Salih et al., 2017). We did not detect the trnF-GAA gene duplication in the Aldama species and Heliantheae genera assembled here, as previously seen in other taxa of Heliantheae, such as H. annuus (Timme et al., 2007) and Echinacea (Zhang et al., 2017). Nonetheless, this duplication was reported for Lasthenia burkeyi (Heliantheae, subtribe Madieae) (Walker, Zanis & Emery, 2014). Three other frequent duplications in Asteraceae (ndhB, rpl2 and rpl23; all located in the IR regions) (Timme et al., 2007;Salih et al., 2017;Zhang et al., 2017;Cho et al., Tyagi et al., 2020; were observed in our results. Another frequently duplicated gene in Asteraceae, ycf15, was detected in Aldama and the Heliantheae genera sequenced here, as observed in H. annuus (Timme et al., 2007), Echinacea (Zhang et al., 2017) and Aster (Tyagi et al., 2020), but its complete absence was reported in some species of Chrysanthemum (tribe Anthemideae) (Wang et al., 2015) and Guizotia abyssinica (tribe Millerieae) (Dempewolf et al., 2010) ( Table 2). The largest plastid gene in the Angiosperms, ycf2, displays a ∼456 bp deletion in Helianthus species, which does not seem to affect its functionality (Timme et al., 2007;Zhang et al., 2019), even though its vital function remains unknown (Drescher et al., 2000). Such partial deletion in ycf2 has also been observed in Poaceae (Millen et al., 2001;Matsuoka et al., 2002). We found a slightly smaller deletion in all Aldama species, with size varying from 441 bp (most species) to 447 bp (in two species), similar to the values found in species of Helianthus (Zhang et al., 2019). The same 441 bp deletion in ycf2 was found in the three taxa of subtribe Helianthinae (I. heterophylla, P. lanatus var. lanatus and T. diversifolia) and in Dimerostemma, from subtribe Ecliptinae, even though this deletion is absent in the plastome of another member of this subtribe, Eclipta prostata (Park et al., 2016). The deletion in ycf2 is absent in species of Echinacea, from subtribe Zinniinae (Zhang et al., 2017). These results indicate that the deletion in ycf2 is not an unique event in the evolutionary history of Heliantheae, as it is not restricted to the derived taxa of subtribe Helianthinae (Schilling & Jansen, 1989), also occurring in subtribe Ecliptinae. However, the current knowledge about phylogenetic relationships among Heliantheae subtribes (Panero, 2007) is insufficient to understand the evolution of ycf2 within the tribe.
The conservation of plastome features has been recorded in several Angiosperm groups (Cai et al., 2015;Reginato et al., 2016;Sun et al., 2019;Yan et al., 2019) and our results corroborate that plastomes are highly conserved in Asteraceae, even at the generic or subtribal level, in size, GC content, number of genes and gene order (Wang et al., 2015;Salih et al., 2017;Shen et al., 2020).

Variable regions
The plastomes of Aldama and other genera of Heliantheae studied here display very low intrageneric and intergeneric sequence divergence. Noncoding regions and introns are slightly more divergent than coding regions and the IRs are the most conserved regions (Figs. 3 and 4 and Fig. S1-Fig. S2). Among Aldama plastomes, the petA-psbJ intergenic region is the most variable locus and only three other regions have some significant variability: ndhE-psaC, petN-psbM and ycf1 (Fig. 4A). The intergenic region petA-psbJ is rarely used in phylogenetic inferences (e.g., Huang et al., 2004;Techaprasan et al., 2010), and has never been used in Asteraceae. Shaw et al. (2007) and Shaw et al. (2014) pointed out this region as one of the most variable within the chloroplast genome across angiosperm lineages, including among closely related species of Aster (Tyagi et al., 2020). Dong et al. (2012) suggested it as a potential candidate for DNA barcoding and phylogenies in lower taxonomic levels. The noncoding region ndhE-psaC has never been used in phylogenetic inference and has only been cited as a hypervariable region in monocots (Lu, Li & Qiu, 2017;Santana Lopes et al., 2018;Cui et al., 2019). The petN-psbM intergenic spacer was reported as a variable region in Asteraceae once, in Echinacea, another genus of Heliantheae (Zhang et al., 2017), and Shaw et al. (2014) described it as a potential variable region in specific groups (e.g., Fonseca & Lohmann, 2017). In a similar way to other intergenic regions that evolve rapidly, minute inversions have already been noted in petN-psbM, requiring some care when aligning it (Kim & Lee, 2005;Dong et al., 2012).
While coding regions normally accumulate genetic differences more slowly than noncoding regions (Fazekas et al., 2008;Lahaye et al., 2008), one exception is the gene ycf1, which evolves at a faster pace (Dong et al., 2012;Shaw et al., 2014). This region is frequently cited as one of the most variable in Asteraceae plastomes (Wang et al., 2015;Zhang et al., 2017;Cho et al., 2019;Zhang et al., 2019;Shen et al., 2020; but seems to have been used as a marker only once, in a phylogenetic study within the tribe Astereae (Costa, 2014). Nonetheless, ycf1 was successfully used in phylogenetic inferences of other angiosperm groups, such as Fabaceae (e.g., Dastpak et al., 2018) and orchids (e.g., Neubig et al., 2009). Dong et al. (2015 concluded that ycf1 can serve as a core barcode for land plants. Regarding the percentage of variable sites (in proportion to sequence length), ycf1 is the most variable coding region, followed by rpl32 ( Fig. 3C and Table S4). When considering absolute numbers, rpoC2 has the highest number of sites, followed by ycf1 ( Fig. 3D and Table S4). The rpl32 gene is rarely cited in Asteraceae as a variable region (Timme et al., 2007) and, due to its small size (165 bp), the region has never been used in phylogenetic inferences. The rpo genes, which code for RNA polymerase subunits, are relatively rapidly evolving genes (Krawczyk & Sawicki, 2013). Other studies in Asteraceae (Salih et al., 2017;Cho et al., 2019) also pointed out rpoC2 as a variable region and Wang et al. (2015) found some species-specific editing sites in Lactuca sativa, suggesting lineage specific evolutionary features of RNA editing for that region. While rpoC2 has not been used for phylogenetic inference in Asteraceae, its utility was demonstrated in other Angiosperm families like Poaceae (Peterson, Romaschenko & Arrieta, 2016) and Verbenaceae (Marx et al., 2010). In addition, Logacheva et al. (2017) concluded that rpo genes might be very useful for phylogenetic reconstructions. As a whole, the nucleotide composition among Aldama species is very conserved, with a percentage of variable sites (1.18%) somewhat lower than that found in the Espeletia species complex (2.46%; Pouchon et al., 2018), among species of Saussurea (4. 07%;Zhang et al., 2019) and in Hawaiian species of Bidens (4.8%; Knope et al., 2020), but similar to Echinacea (<1%; Zhang et al., 2017).

Signature of positive selection in plastid genes
The present study indicates that among the 79 protein-coding genes within Aldama, only the gene rbcL is significantly under positive selection (ω >1). It encodes for the larger subunit of the enzyme rubisco, which catalyzes the first step in photosynthetic assimilation of inorganic carbon and photorespiration (Spreitzer & Salvucci, 2002;Feller, Anders & Mae, 2008;Bathellier et al., 2018;Erb & Zarzycki, 2018;Pottier, Gilis & Boutry, 2018). Purifying selection has been frequently reported in Asteraceae for most of the plastidial proteincoding genes (Shahzadi et al., 2019;Zhang et al., 2019). The gene ndhF was shown to be under positive selection in Aster (Tyagi et al., 2020) and in Dolomiaea (Shen et al., 2020), along with atpA and rbcL in the latter. Zhang et al. (2019) found positive selection for ycf2 in Helianthus, but contrary to the expectation, we did not detect a positive selection signature in Aldama. The extremely low nucleotide diversity observed among chloroplast sequences might compromise our capacity to detect that signal.

SSRs in Aldama plastomes
Single Sequence Repeats (SSRs) are often observed in chloroplast genomes, being useful markers for evolutionary studies in lower taxonomic levels, such as population genetics (Avise, 1994;Ebert & Peakall, 2009;Qi et al., 2016;Yu et al., 2017). The SSRs most frequently found in chloroplast genomes are poly-A tails, usually also being the ones showing more variability. As NGS methods became cheaper and more popular, the in silico mining of microsatellite has become more common, with the advantage of easily separating organellar repeats from nuclear repeats when mapping against a reference genome. Recent papers dealing with the development of SSR markers in Asteraceae from chloroplast genomes have shown very little variability in the tested loci, especially when excluding mononucleotide repeats (Siniscalchi et al., 2019;Thapa, Bayer & Mandel, 2019).
The SSRs regions presented here can be used as basis for future population genetic studies involving Aldama, although their variability needs to be validated in different populations (Figs. 6A-6C). The fact that several repeat regions are present in more than one species is a good indication that they could be easily transferable across the genus, being useful for multi-species studies. While several Brazilian species complexes within Aldama have been the focus of detailed anatomical and phytochemical studies (Bombo et al., 2012;Bombo et al., 2014;Silva & Appezzato-da Glória, 2014;Bombo, Filartiga & Appezzato-Da-Glória, 2016;Filartiga et al., 2016;Filartiga et al., 2017), populational studies based on these SSRs would greatly improve the understanding of population dynamics and microevolutionary processes in these species.

Phylogenetic hypothesis
Our phylogenetic analysis indicates that in the current circumscription, Aldama is not monophyletic, as A. dentata is more closely related to other Mexican and Andean genera (Pappobolus and Tithonia) than to the rest of the Aldama species (Fig. 6). This result is in agreement with plastidial restriction site analysis (Schilling & Panero, 1996a) but not with ribosomal ITS and ETS phylogenetic analyses (Schilling & Panero, 2011). More studies are necessary to understand if this incongruence between plastome and nuclear data is a consequence of an evolutionary history of hybridization and chloroplast capture in the Helianthinae (Schilling & Panero, 1996b) or an artefact of the low taxonomic sampling of Mexican Helianthinae in our analysis.
The Mexican Aldama species are early diverging lineages, while all South American members of Aldama form a strongly supported clade. Nonetheless, the relationships between the Brazilian and Andean species are poorly resolved due to the extremely low nucleotide diversity (Fig. 6).

CONCLUSIONS
The analyses comparing 36 Aldama plastomes and the plastomes of five other genera belonging to Heliantheae provided significant new understanding about Asteraceae plastome evolution and structure. Within Aldama and among the other genera compared here, plastomes show very similar lengths, number of genes, and boundaries between the regions. The nucleotide variability was extremely low at the intrageneric level. Our results show that plastomes may be extremely conserved and not suited for phylogenetic analysis at the intrageneric level. Our results also shed light on a more complex evolution of the large deletion in the ycf2 gene in Heliantheae, which was probably gradual and involved multiple events. Signals of positive selection detected in the rbcL gene in Aldama raise an interesting question on the adaptative radiation of the genus that instigates further investigation. Finally, the obtained data bring powerful information for further studies aiming at a robust phylogenetic circumscription of Aldama, indicating regions that are more variable and SSR markers useful for population genetics studies.