First complete mitogenomes of Diamesinae, Orthocladiinae, Prodiamesinae, Tanypodinae (Diptera: Chironomidae) and their implication in phylogenetics

Background The mitochondrial genome (mitogenome) has been extensively used for phylogenetic and evolutionary analysis in Diptera, but the study of mitogenome is still scarce in the family Chironomidae. Methods Here, the first complete mitochondrial genomes of four Chironomid species representing Diamesinae, Orthocladiinae, Prodiamesinae and Tanypodinae are presented. Coupled with published mitogenomes of two, a comparative mitochondrial genomic analysis between six subfamilies of Chironomidae was carried out. Results Mitogenomes of Chironomidae are conserved in structure, each contains 37 typical genes and a control region, and all genes arrange the same gene order as the ancestral insect mitogenome. Nucleotide composition is highly biased, the control region displayed the highest A + T content. All protein coding genes are under purifying selection, and the ATP8 evolves at the fastest rate. In addition, the phylogenetic analysis covering six subfamilies within Chironomidae was conducted. The monophyly of Chironomidae is strongly supported. However, the topology of six subfamilies based on mitogenomes in this study is inconsistent with previous morphological and molecular studies. This may be due to the high mutation rate of the mitochondrial genetic markers within Chironomidae. Our results indicate that mitogenomes showed poor signals in phylogenetic reconstructions at the subfamily level of Chironomidae.


INTRODUCTION
The typical mitochondrial genome (mitogenome) of insects is a double-strand circular molecule ranging from 14kb to 20kb in size, which encodes 37 genes (13 protein-coding genes, two ribosomal RNA genes, and 22 transfer RNA genes) and a control region (Boore, 1999;Cameron, 2014;Wolstenholme, 1992). Due to its small genome size, maternal inheritance, low sequence recombination, and fast evolutionary rates (Brown, George & Wilson, 1979;Curole & Kocher, 1999), the mitogenome is considered as powerful marker for phylogenetic and evolutionary analysis (Condamine et al., 2018;Jacobsen et al., 2012;Stokkan et al., 2018;Tang et al., 2019b). Due to high-throughput sequencing technology, an increasing number of complete mitogenomes have been sequenced among the Diptera, covering most families (Kang, Li & Yang, 2016;Li et al., 2020;Miao et al., 2020;Ramakodi et al., 2015;Tang et al., 2019a). Mitogenomes have been widely used for mitochondrial structure comparison and phylogenetic analysis at different taxonomic level of the Diptera (Chen et al., 2018;De Oliveira Aragão et al., 2019;Miao et al., 2020;Yan et al., 2019;Zhang et al., 2016;Zhang et al., 2019b). However, complete mitogenomes are still scarce for the family Chironomidae, which limits our understanding of their mitochondrial structure and phylogenetic pattern. In addition, it is still unknown whether mitogenomes can effectively resolve phylogenetic relationships at the subfamily level within Chironomidae.
The dipteran family Chironomidae is a diverse aquatic insect group, and are important bioindicators for freshwater ecosystem monitoring. Within Chironomidae, several phylogenetic studies have been conducted based on morphological characters or combining genetic markers to reconstruct the evolutionary history of subfamilies (Cranston, Hardy & Morse, 2012;Saether, 1977), but no one has attempted to use mitogenomes. Prior to this study, only five mitogenomes of Chironomidae were available (Beckenbach, 2012;Deviatiiarov, Kikawada & Gusev, 2017;Kim, Kim & Shin, 2016;Park et al., 2020;Zhang et al., 2019a), representing species from three subfamilies: Chironominae, Podonominae, and Prodiamesinae. However, comparative analysis of mitogenome structure, base composition, substitution and evolutionary rates among subfamilies has not been carried out. In addition, the monophyly of Chironomidae has not been supported by a recent study using mitogenomes of Culicomorpha (Zhang et al., 2019b).
In the present study, we provide complete mitogenomes for four species representing the subfamilies Diamesinae, Orthocladiinae, Prodiamesinae, and Tanypodinae. Along with the published mitogenomes of subfamilies Chironominae and Podonominae, the first comparative analysis of the genome structure, base composition, substitution and evolutionary rates among six chironomid subfamilies is presented. In addition, a phylogenomic analysis covering six chironomid subfamilies was carried out.

DNA extraction, sequencing and assembling
For the newly sequenced species, total genomic DNA was extracted from the body, (except abdomen and genitalia) using a General AllGen Kit (Qiagen, Germany). The entire mitogenome of each species were sequenced using the Illumina NovaSeq 6000 platform with an insert size of 350-bp and a paired-end 150-bp sequencing strategy by the Allwegene Company and Novogene Co., Ltd. at Beijing, China. About 2 Gb clean data were obtained from each library after trimming using Trimmomatic (Bolger, Lohse & Usadel, 2014).
To ensure the accuracy of the mitogenome sequences, three frequently used assembly methods were applied to each sample. A de novo assembly was performed using IDBA-UD (Peng et al., 2012) with minimum and maximum k values of 40 and 120 bp, respectively. Two reference based assemblies were performed using Geneious 2020.2.1 (Kearse et al., 2012) with default setting and MITObim 1.9 (Hahn, Bachmann & Chevreux, 2013). The mitogenome sequences obtained by the three methods were aligned, manually compared, and finally compiled into a single sequence in Geneious 2020.2.1 (Kearse et al., 2012).

Genome annotation, composition and substitution rate
Genome annotation was conducted as previously described in Zheng et al. (2020). Specifically, the transfer RNA (tRNA) genes and their secondary structures were detected by MITOS2 webserver (http://mitos2.bioinf.uni-leipzig.de/index.py) (Bernt et al., 2013) with invertebrate mitochondrial genetic code. The ribosomal RNA (rRNA) genes were predicted by alignment with homologous regions of mitogenome from closely related species. Protein coding genes (PCGs) were initially annotated using the Open Reading Frame Finder (ORF Finder) as implemented at the NCBI website (https://www.ncbi.nlm.nih.gov/orffinder/) and then compared with published mitogenomes of insects using the program BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Newly sequenced mitogenomes were submitted to GenBank (accession numbers: MW373523-MW373526).
CGView Server V 1.0 (Grant & Stothard, 2008) was used to draw mitogenome maps. Codon usage of PCGs and nucleotide composition were calculated in MEGA X (Kumar et al., 2018). The bias of the nucleotide composition was measured according to the formulas: AT-skew = (A−T)/(A+T) and GC-skew = (G−C)/(G+C). Rates of non-synonymous substitution rate (Ka) and synonymous substitution rate (Ks) were calculated in DnaSP 6.12.03 (Rozas et al., 2017).

Phylogenetic analyses
Phylogenetic analyses were conducted using the sequences of 13 PCGs and two rRNAs. The PCGs were aligned based on amino acid sequences using Muscle implemented in MEGA X (Kumar et al., 2018). The rRNAs were aligned using MAFFT 7.402 (Katoh & Standley, 2013) with algorithm G-INS-i strategy. Alignments of individual genes were then concatenated using SequenceMatrix v1.7.8 (Vaidya, Lohman & Meier, 2011) to generate five datasets: PCG123 (all three codon positions of the 13 PCGs), PCG123R (all three codon positions of the 13 PCGs and two rRNAs), PCG12 (the first and second codon positions of the 13 PCGs), PCG12R (the first and second codon positions of the 13 PCGs and two rRNAs), and AA (amino acid sequences of the 13 PCGs). To test substitution saturation, transition and transversion rates were evaluated by DAMBE 5.6.14 (Xia, 2013). The program PartitionFinder 2.0 (Lanfear et al., 2017) was used to infer the best substitution model (Table S1). The analysis of Bayesian inference (BI) and maximum likelihood (ML) were conducted for each dataset. The BI analyses were performed under the program MrBayes 3.2.7a (Ronquist et al., 2012) with partitioned models (Table S1). Two simultaneous Markov chain Monte Carlo (MCMC) runs of 10,000,000 generations were conducted, trees were sampled every 1000 generations with a burn-in of 25%. The program Tracer 1.7 (Rambaut et al., 2018) was used to assess the convergence of runs. The ML analyses were conducted using the program RAxML 8.0.12 (Stamatakis, 2014) under the substitution model GTR +GAMMA +I. The nodal support values were calculated with 1,000 bootstrap replicates.
Nucleotide composition (Table 2) of the six Chironomidae species is similar, with a high A+T bias (72.4%-76.8%), the control region has the highest A+T content while the first and the second codon positions of PCGs have the lowest A+T content. All six Chironomidae species exhibited negative AT-skew and GC-skew. All three codon positions of PCGs had negative AT-skew, the GC-skew of the first codon position was positive, while the 2nd and the 3rd codon position were negative. Some gene regions exhibited different nucleotide skew among the six Chironomidae species. For example, in 12S rRNA, the AT-skew of Chironomus tepperi and Clinotanypus yani are −0.01 and 0.00 respectively, while the AT-skew are positive (0.01-0.05) in the remaining four species.

Figure 1 Mitogenome maps of Chironomus tepperi (A), Potthastia sp. (B), Rheocricotopus villiculus (C), Rheocricotopus villiculus (D), Prodiamesa olivacea (E), Clinotanypus yani (F).
The names of PCGs and rRNAs are indicated by standard abbreviations, while names of tRNAs are represented by a single letter abbreviation. The first circle shows the gene map and arrows indicate the orientation of gene transcription. Blue arrows refer to PCGs, purple arrows refer to rRNAs, red arrows refer to tRNAs and grey arrow refers to control region. The second circle shows the GC content, which is plotted as the deviation from the average GC content of the entire sequence. The third circle shows the GC-skew, which is plotted as the deviation from the average GC-skew of the entire sequence. The innermost circle shows the sequence length.
Full-size DOI: 10.7717/peerj.11294/ fig-1 in all six Chironomidae species (Fig. 2). The relative synonymous codon usage (RSCU) patterns among the six Chironomidae species are similar. The RSCU values are showed in Fig. 3. All synonymous codons of 20 amino acids are present. The most frequent used codons are NNU and NNA for each amino acid (Fig. 3).  The Ka/Ks value (ω) was used to test for signatures of natural selection (Cheng et al., 2018;Hu & Banzhaf, 2008). The ω value of all PCGs are less than 0.6. Among the 13 PCGs, ATP8 has the largest ω value, indicating that ATP8 evolves at the fastest rate. The animal DNA barcoding gene COI has the lowest ω value (Fig. 4).

tRNAs, rRNAs and control region
The typical set of 22 tRNA genes were identified in the mitogenomes of all six Chironomidae species, ranging from 63 to 72 bp in length. The tRNA genes exhibit high A+T content (73.2%-79.5%), positive AT-skew, and negative GC-skew (Table 2). All the predicted tRNAs can be folded into the typical clover-leaf secondary structure except trnS (GCU), which lacks the dihydrouridine (DHU) arm. The non-Watson-crick base pair G-U is common in tRNA genes from all Chironomidae species (Fig. S1-S6). Both 12S and 16S rRNA genes exhibit similar position and size across the Chironomidae mitogenomes. The A+T content of 12S and 16S rRNA genes ranges from 76.4% to 82.6% and 80.1% to 84.4%, respectively. Both genes exhibit positive AT-skew and negative GC-skew in all Chironomidae species except Chironomus tepperi: the AT-skew of 12S rRNA and 16S rRNA in Chironomus tepperi is −0.01 and 0.00, respectively ( Table 2).

Mitogenome features
The entire mitogenome length of the six Chironomidae species differs considerably (15,652-16,803 bp), mainly due to the variation in control region size. All Chironomidae mitogenomes contain 37 typical genes and a control region, the order and arrangement of these genes are completely accordant with the ancestral insect gene arrangement (Clary & Wolstenholme, 1985). The whole mitogenome of Chironomidae has high A+T content and similar AT/GC-skew, consistent with the similar base composition biases of insect mitochondrial DNA (Wei et al., 2010). This type of nucleotide bias may be related to the asymmetric mutation processes during replication (Hassanin, Leger & Deutsch, 2005). Among the Chironomidae mitogenomes, most PCGs have complete termination codons, while the COII gene in Parochlus steinenii and Clinotanypus yani has an incomplete termination codon (T-) that probably completed by post-transcriptional polyadenylation (Ojala, Montoya & Attardi, 1981). The patterns of codon usage among the Chironomidae mitogenomes are nearly the same. The most frequent used codons were NNU and NNA for each amino acid, reflecting the AT bias of nucleotide composition. For most amino acids, the most frequently used codon is not the anti-codon that strictly correspond to tRNA. The low ω value for each PCG indicates that they are all under strong purifying selection. The animal DNA barcoding gene COI has the lowest evolutionary rate, which is consistent with the results observed from other insect groups (Li et al., 2020;Yang, Yu & Du, 2013;Zhang & Ye, 2017).
All six Chironomidae mitogenomes contain the 22 typical tRNA genes, and secondary structure across species is similar. Unlike other tRNA genes, trnS (GCU) lacks the dihydrouridine (DHU) arm. This could be commonly found in published insect mitogenomes (Li et al., 2012;Lu, Huang & Deng, 2020;Zhang et al., 2018). The A+T contents of 12S rRNA, 16S rRNA, and control region are much higher than that in the whole genome in Chironomidae mitogenomes, indicating a strong A+T bias in these regions.

CONCLUSIONS
In this study, we sequenced four complete mitogenomes representing four subfamilies of Chironomidae by whole genome sequencing technologies and did the first comparative analysis of mitogenome base composition and evolutionary history in Chironomidae. The study shows that mitogenomes of Chironomidae are conserved in structure, gene order and nucleotide composition. Our results revealed that mitogenomes have poor phylogenetic signals for subfamily level relationships in Chironomidae.