The draft genome of strain cCpun from biting midges establishes Cardinium as a paraphyletic group, and reveals a novel gene family expansion in a symbiont
- Subject Areas
- Evolutionary Studies, Genomics, Microbiology
- Keywords
- Cardinium hertigii, Culicoides biting midges, genome sequence, phylogenomic analysis, gene family expansion, heritable symbionts
- Copyright
- © 2018 Siozios et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. The draft genome of strain cCpun from biting midges establishes Cardinium as a paraphyletic group, and reveals a novel gene family expansion in a symbiont. PeerJ Preprints 6:e27242v1 https://doi.org/10.7287/peerj.preprints.27242v1
Abstract
Background: It is estimated that 13% of arthropod species carry the heritable symbiont Cardinium hertigii. 16S rRNA and gyrB sequences divide this species into three clades, with the A group infecting a range of arthropods, the B group infecting nematode worms, and the C group infecting Culicoides biting midges. To date, genome sequences have only been available for strains from clades A and B, impeding general understanding of the evolutionary history of the radiation. We present a draft genome sequence for a C group Cardinium, motivated both by the paucity of genomic information outside of the A group, and the importance of Culicoides biting midge hosts as arbovirus vectors.
Methods: We reconstructed the genome of cCpun, a Cardinium strain from group C that naturally infects Culicoides punctatus, through Illumina sequencing of infected host specimens.
Results: The draft genome presented has high completeness, with BUSCO scores comparable to closed group A Cardinium genomes. Phylogenomic analysis based on concatenated single copy core proteins revealed that Cardinium, as currently considered, is paraphyletic, with strains of Ca. Paenicardinium endonii from nematodes nested within the two groups infecting arthropod hosts. Analysis of the genome of cCpun revealed expansion of a variety of gene families classically considered important in symbiosis (e.g. ankyrin domain containing genes), and one set, characterized by DUF1703 domains, not previously associated with a symbiotic lifestyle. This protein group encodes putative secreted nucleases, and the cCpun genome carried at least 25 widely divergent paralogs, of which 24 shared a common ancestor in the C group lineage. The genome revealed no evidence in support of B vitamin provisioning to its haematophagous host, and indeed suggests Cardinium may be a net importer of biotin.
Discussion: These data indicate Cardinium, as currently conceived, to be paraphyletic. The draft genome further produces new hypotheses as to the interaction of the symbiont with the midge host, in particular the biological role of DUF1703 nuclease proteins that are predicted as being secreted by cCpun, but in contrast provides no support for a role for the symbiont in provisioning the host with B vitamins.
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Taxon annotated GC-coverage plot of the primary genome assembly of Culicoides punctatus
Circular representation of cCpun genome using Circos v0.69 (Krzywinski et al. 2009)
To enhance visualization, the scaffolds were size sorted and concatenated into a pseudomolecule. The alternating grey and white strips highlight the scaffold borders. Moving inwards, the first, second and third circles are colour coded according to COG functional categories and represent a) the complete set of cCpun protein coding genes, b) the core genes shared between the four Amoebophilaceae genomes presented in Figure 1A, and c) the genes unique to cCpun. The fourth circle shows the genomic location of the genes coding for the Afp-like (magenta) and type IX (red) secretion systems as well as the DUF1703 gene paralogs. Finally, the two line plots represent genome coverage and GC% content, respectively, across the cCpun genome (1 kb sliding window). An orange line indicates the mean coverage (90X) of the draft assembly.
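As an illustration of the GC% track described above, the following is a minimal sketch of a windowed GC calculation. It is not the authors' pipeline: the function name `gc_percent_windows` and the `step` parameter are hypothetical, and the caption states only the 1 kb window size, so a step equal to the window is an assumption.

```python
def gc_percent_windows(sequence, window=1000, step=1000):
    """Return a list of (start, GC%) tuples for each full window.

    Assumption: the step size is not given in the caption, so it
    defaults to the window size (non-overlapping windows).
    """
    seq = sequence.upper()
    results = []
    # Only full-length windows are reported; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / window))
    return results


if __name__ == "__main__":
    # Toy pseudomolecule: 1 kb of pure GC followed by 1 kb of pure AT.
    toy = "GC" * 500 + "AT" * 500
    for start, gc in gc_percent_windows(toy, window=1000, step=500):
        print(start, gc)
```

Passing a `step` smaller than `window` gives overlapping windows, which smooths the resulting line plot at the cost of more data points.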
cCpun is a near complete genome
BUSCO completeness assessment results for the cCpun draft genome in comparison to the other Cardinium genomes and A. asiaticus. The results are based on the presence or absence of 148 single-copy universal bacterial markers.
Repeat-content comparison across the five Amoebophilaceae genomes
MUMmer self-plots representing sequence repeat density in the five Amoebophilaceae genomes. Each dot represents a repeat (blue = direct, red = inverted) of at least 200 bp and 95% similarity.
Pairwise similarity (lower right) and identity (upper left) matrix of the 24 DUF1703 protein paralogs identified in the genome of cCpun
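To make the identity half of such a matrix concrete, here is a minimal sketch of pairwise percent identity computed from pre-aligned protein sequences. The function names and toy sequences are hypothetical. The denominator (columns where neither sequence has a gap) is one of several common conventions and is an assumption, since the caption does not state which was used. Similarity would additionally need a substitution matrix and is not shown.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap ('-') in either sequence are ignored,
    so the denominator is the number of ungapped aligned columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def identity_matrix(aligned):
    """Return {(name1, name2): identity%} for every ordered pair of sequences."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p in names for q in names}


if __name__ == "__main__":
    # Toy aligned fragments, not real DUF1703 sequences.
    toy = {"paralog_1": "MKT-A", "paralog_2": "MKSLA", "paralog_3": "MKTLA"}
    for pair, value in identity_matrix(toy).items():
        print(pair, round(value, 1))
```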
Cardinium carbonic anhydrase homologs
Maximum likelihood phylogenetic placement of cCpun and Ca. Paenicardinium endonii (cHgTN10) carbonic anhydrase (CA) protein sequences compared with their closest homologs in the GenBank database. Members from the four clades forming the beta-class of CAs are presented. The positions of the Cardinium homologs and the CA homolog identified in Rickettsia (RiCNE) endosymbionts in biting midges are indicated in purple and red respectively. Phylogenetic relationships were inferred using IQ-TREE v1.6.6 (method: automated best model selection).