The draft genome of strain cCpun from biting midges establishes Cardinium as a paraphyletic group, and reveals a novel gene family expansion in a symbiont
A peer-reviewed article of this Preprint also exists.
Abstract
Background: It is estimated that 13% of arthropod species carry the heritable symbiont Cardinium hertigii. 16S rRNA and gyrB sequences divide this species into three clades, with the A group infecting a range of arthropods, the B group infecting nematode worms, and the C group infecting Culicoides biting midges. To date, genome sequences have only been available for strains from clades A and B, impeding general understanding of the evolutionary history of the radiation. We present a draft genome sequence for a C group Cardinium, motivated both by the paucity of genomic information outside of the A group, and the importance of Culicoides biting midge hosts as arbovirus vectors.
Methods: We reconstructed the genome of cCpun, a Cardinium strain from group C that naturally infects Culicoides punctatus, through Illumina sequencing of infected host specimens.
Results: The draft genome presented has high completeness, with BUSCO scores comparable to closed group A Cardinium genomes. Phylogenomic analysis based on concatenated single copy core proteins revealed that Cardinium, as currently considered, is paraphyletic, with strains of Ca. Paenicardinium endonii from nematodes nested within the two groups infecting arthropod hosts. Analysis of the genome of cCpun revealed expansion of a variety of gene families classically considered important in symbiosis (e.g. ankyrin domain-containing genes), and one set, characterized by DUF1703 domains, not previously associated with a symbiotic lifestyle. This protein group encodes putative secreted nucleases, and the cCpun genome carried at least 25 widely divergent paralogs, 24 of which derive from a common ancestral copy in the ancestor of the C group. The genome revealed no evidence in support of B vitamin provisioning to its haematophagous host, and indeed suggests Cardinium may be a net importer of biotin.
Discussion: These data indicate Cardinium, as currently conceived, to be paraphyletic. The draft genome further produces new hypotheses as to the interaction of the symbiont with the midge host, in particular the biological role of DUF1703 nuclease proteins that are predicted as being secreted by cCpun, but in contrast provides no support for a role for the symbiont in provisioning the host with B vitamins.
Cite this as
2018. The draft genome of strain cCpun from biting midges establishes Cardinium as a paraphyletic group, and reveals a novel gene family expansion in a symbiont. PeerJ Preprints 6:e27242v1 https://doi.org/10.7287/peerj.preprints.27242v1
Author comment
This is a submission to PeerJ for review.
Supplemental Information
Taxon annotated GC-coverage plot of the primary genome assembly of Culicoides punctatus
Circular representation of cCpun genome using Circos v0.69 (Krzywinski et al. 2009)
To enhance visualization, the scaffolds were sorted by size and concatenated into a pseudomolecule. The alternating grey and white strips highlight the scaffold borders. Moving inwards, the first, second and third circles are colour coded according to COG functional categories and represent a) the complete set of cCpun protein coding genes, b) the core genes shared between the four Amoebophilaceae genomes presented in Figure 1A, and c) the genes unique to cCpun. The fourth circle shows the genomic location of the genes coding for the Afp-like (magenta) and type IX (red) secretion systems, as well as the DUF1703 gene paralogs. Finally, the two line plots represent genome coverage and GC content (%) across the cCpun genome (1 kb sliding window), respectively. An orange line indicates the mean coverage (90X) of the draft assembly.
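The GC content track described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name gc_content_windows is hypothetical, the step between windows is not stated in the caption and is assumed here to equal the 1 kb window size, and ambiguous bases are simply counted in the denominator.

```python
def gc_content_windows(sequence, window=1000, step=1000):
    """Return a list of (start, gc_percent) for each full window.

    Assumptions (not from the source): the step defaults to the window
    size, trailing partial windows are dropped, and any non-G/C
    character (including N) counts towards the window length.
    """
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / window))
    return results


# Toy pseudomolecule: 1 kb of pure GC followed by 1 kb of pure AT.
toy = "GGCC" * 250 + "AATT" * 250
for start, gc in gc_content_windows(toy):
    print(start, gc)
```

Passing a smaller step (for example step=500) gives overlapping windows, which produces a smoother line plot at the cost of correlated neighbouring values.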
cCpun is a near complete genome
BUSCO completeness assessment results for the cCpun draft genome in comparison to the other Cardinium genomes and A. asiaticus. The results are based on the presence or absence of 148 single-copy universal bacterial markers.
Repeat-content comparison across the five Amoebophilaceae genomes
MUMmer self-plots representing sequence repeat density in the five Amoebophilaceae genomes. Each dot represents a repeat (blue = direct, red = inverted) of at least 200 bp and 95% similarity.
Pairwise similarity (lower right) and identity (upper left) matrix of the 24 DUF1703 protein paralogs identified in the genome of cCpun
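As an illustration of how the identity half of such a matrix can be derived, the sketch below computes pairwise percent identity from pre-aligned protein sequences. It is a hedged example rather than the method used for the table: the function names and toy sequences are hypothetical, and identity is assumed to be scored only over columns where neither sequence has a gap. The similarity half would additionally need a substitution scoring scheme, which is not specified here.

```python
def pairwise_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Map every ordered pair of names to its percent identity."""
    names = list(aligned)
    return {
        (p, q): pairwise_identity(aligned[p], aligned[q])
        for p in names
        for q in names
    }


# Hypothetical aligned fragments, used only to exercise the functions.
toy_alignment = {"p1": "MKT-LV", "p2": "MKTALV", "p3": "MRT-LI"}
matrix = identity_matrix(toy_alignment)
print(matrix[("p1", "p3")])
```

Other identity definitions (for example dividing by the full alignment length) give lower values for gappy pairs, so the denominator should always be reported alongside the matrix.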
Cardinium carbonic anhydrase homologs
Maximum likelihood phylogenetic placement of cCpun and Ca. Paenicardinium endonii (cHgTN10) carbonic anhydrase (CA) protein sequences compared with their closest homologs in the GenBank database. Members from the four clades forming the beta-class of CAs are presented. The positions of the Cardinium homologs and the CA homolog identified in Rickettsia (RiCNE) endosymbionts in biting midges are indicated in purple and red, respectively. Phylogenetic relationships were inferred using IQ-TREE v1.6.6 (method: automated best model selection).
Functional annotation of cCpun draft genome including Pfam domains and eggNOG results
Type IX secretion system (T9SS) components in cCpun genome
Evidence of recombination between the cCpun DUF1703 paralogs as determined with the RDP4 software
Signal peptide prediction in the 25 intact DUF1703 protein paralogs of cCpun using the SignalP 4.1 server
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Stefanos Siozios conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Jack Pilgrim performed the experiments, authored or reviewed drafts of the paper, approved the final draft.
Alistair C Darby analyzed the data, authored or reviewed drafts of the paper, approved the final draft.
Matthew Baylis conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.
Gregory DD Hurst conceived and designed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.
DNA Deposition
The following information was supplied regarding the deposition of DNA sequences:
The raw reads and the cCpun draft genome assembly have been submitted to the DDBJ/EMBL/GenBank database under the BioProject accession number PRJNA487198 (WGS project QWJI00000000).
Reviewers can provisionally access the data using the following link: https://www.dropbox.com/sh/g4li15pve04l5hu/AACOSSDpJhu1b7tYAdHUpsMca?dl=0
Data Deposition
The following information was supplied regarding data availability:
The supermatrix file used for the phylogenomic analysis (Fig. 1) and the alignment files used for the phylogenetic analyses of the DUF1703 and the Carbonic Anhydrase gene families (Fig. 3 and Fig. S6 respectively) are provided as supplementary files. Both trimmed and untrimmed versions of the alignment files are provided.
Funding
This work was supported by a Marie Curie Individual Fellowship (H2020-MSCA-IF-2014) grant 657135 “MIDGESYM” to Stefanos Siozios and a BBSRC DTP studentship to Jack Pilgrim. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.