The draft genome of strain cCpun from biting midges establishes Cardinium as a paraphyletic group, and reveals a novel gene family expansion in a symbiont

Institute of Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, United Kingdom
Institute of Infection and Global Health, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, United Kingdom
NIHR Health Protection Research Unit in Emerging and Zoonotic Infections (HPRU-EZI), University of Liverpool, Liverpool, United Kingdom
DOI
10.7287/peerj.preprints.27242v1
Subject Areas
Evolutionary Studies, Genomics, Microbiology
Keywords
Cardinium hertigii, Culicoides biting midges, genome sequence, phylogenomic analysis, gene family expansion, heritable symbionts
Copyright
© 2018 Siozios et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Siozios S, Pilgrim J, Darby AC, Baylis M, Hurst GD. 2018. The draft genome of strain cCpun from biting midges establishes Cardinium as a paraphyletic group, and reveals a novel gene family expansion in a symbiont. PeerJ Preprints 6:e27242v1

Abstract

Background: It is estimated that 13% of arthropod species carry the heritable symbiont Cardinium hertigii. 16S rRNA and gyrB sequences divide this species into three clades, with the A group infecting a range of arthropods, the B group infecting nematode worms, and the C group infecting Culicoides biting midges. To date, genome sequences have been available only for strains from clades A and B, impeding general understanding of the evolutionary history of the radiation. We present a draft genome sequence for a C group Cardinium, motivated both by the paucity of genomic information outside of the A group, and the importance of Culicoides biting midge hosts as arbovirus vectors.

Methods: We reconstructed the genome of cCpun, a Cardinium strain from group C that naturally infects Culicoides punctatus, through Illumina sequencing of infected host specimens.

Results: The draft genome presented has high completeness, with BUSCO scores comparable to closed group A Cardinium genomes. Phylogenomic analysis based on concatenated single copy core proteins revealed that Cardinium, as currently considered, is paraphyletic, with strains of Ca. Paenicardinium endonii from nematodes nested within the two groups infecting arthropod hosts. Analysis of the genome of cCpun revealed expansion of a variety of gene families classically considered important in symbiosis (e.g. ankyrin domain containing genes), and one set, characterized by DUF1703 domains, not previously associated with a symbiotic lifestyle. This protein group encodes putative secreted nucleases, and the cCpun genome carried at least 25 widely divergent paralogs, of which 24 share a common ancestor dating to the C group ancestor. The genome revealed no evidence in support of B vitamin provisioning to its haematophagous host, and indeed suggests Cardinium may be a net importer of biotin.

Discussion: These data indicate Cardinium, as currently conceived, to be paraphyletic. The draft genome further produces new hypotheses as to the interaction of the symbiont with the midge host, in particular the biological role of DUF1703 nuclease proteins that are predicted as being secreted by cCpun, but in contrast provides no support for a role for the symbiont in provisioning the host with B vitamins.

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Taxon annotated GC-coverage plot of the primary genome assembly of Culicoides punctatus

DOI: 10.7287/peerj.preprints.27242v1/supp-1

Circular representation of the cCpun genome using Circos v0.69 (Krzywinski et al. 2009)

To enhance visualization, the scaffolds were size sorted and concatenated into a pseudomolecule. The alternating grey and white strips highlight the scaffold borders. Moving inwards, the first, second and third circles are colour coded according to COG functional categories and represent a) the complete set of cCpun protein coding genes, b) the core genes between the four Amoebophilaceae genomes presented in Figure 1A, and c) the cCpun unique genes. The fourth circle shows the genomic location of the genes coding for the Afp-like (magenta) and type IX (red) secretion systems as well as the DUF1703 gene paralogs. Finally, the two line plots represent genome coverage and GC% content, respectively, across the cCpun genome (1 kb sliding window). An orange line indicates the mean coverage (90X) of the draft assembly.

DOI: 10.7287/peerj.preprints.27242v1/supp-2
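
The pseudomolecule construction and the 1 kb GC% track described in the caption above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `build_pseudomolecule` and `gc_percent_windows` are hypothetical, the scaffolds are assumed to be already loaded as a name-to-sequence dictionary, and the window step (here equal to the window size, i.e. non-overlapping windows) is an assumption, since the caption states only the window length.

```python
def build_pseudomolecule(scaffolds):
    """Concatenate scaffolds from longest to shortest.

    `scaffolds` maps scaffold name -> nucleotide sequence (assumed input format).
    Returns the joined sequence and a list of (name, end_offset) scaffold borders.
    """
    ordered = sorted(scaffolds.items(), key=lambda item: len(item[1]), reverse=True)
    sequence = "".join(seq for _, seq in ordered)
    borders = []
    offset = 0
    for name, seq in ordered:
        offset += len(seq)
        borders.append((name, offset))
    return sequence, borders


def gc_percent_windows(sequence, window=1000, step=1000):
    """Return (window_start, GC%) pairs along the sequence.

    Only A, C, G and T are counted, so ambiguous bases such as N do not
    dilute the percentage. The step size is an assumption of this sketch.
    """
    results = []
    for start in range(0, len(sequence), step):
        chunk = sequence[start:start + window].upper()
        counted = sum(chunk.count(base) for base in "ACGT")
        if counted == 0:
            continue  # window holds no unambiguous bases
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / counted))
    return results


if __name__ == "__main__":
    toy = {"scaffold_1": "ATAT", "scaffold_2": "GCGCGCGC"}  # toy data, not cCpun
    joined, borders = build_pseudomolecule(toy)
    print(borders)
    print(gc_percent_windows(joined, window=4, step=4))
```

The border offsets correspond to the positions where the alternating grey and white strips would change in the plot; overlapping windows can be obtained by passing a `step` smaller than `window`.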

cCpun is a near complete genome

BUSCO completeness assessment results for the cCpun draft genome in comparison to the other Cardinium genomes and A. asiaticus. The results are based on the presence or absence of 148 single-copy universal bacterial markers.

DOI: 10.7287/peerj.preprints.27242v1/supp-3

Repeat-content comparison across the five Amoebophilaceae genomes

MUMmer self-plots representing sequence repeat density in the five Amoebophilaceae genomes. Each dot represents a repeat (blue = direct, red = inverted) of at least 200 bp and 95% similarity.

DOI: 10.7287/peerj.preprints.27242v1/supp-4

Pairwise similarity (lower right) and identity (upper left) matrix of the 24 DUF1703 protein paralogs identified in the genome of cCpun

DOI: 10.7287/peerj.preprints.27242v1/supp-5
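
As an illustration of how the identity half of such a matrix can be computed, the following Python sketch derives percent identity for every pair of pre-aligned protein sequences. The helper names `pairwise_identity` and `identity_matrix` are hypothetical, and the convention used (identical residues divided by the number of columns where neither sequence has a gap) is an assumption; the tool and denominator used for the supplemental table are not stated here. Similarity scores additionally require a residue substitution scheme and are not covered.

```python
def pairwise_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped
    (an assumed convention; other denominators are also in common use).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Map every ordered pair of sequence names to its percent identity.

    `aligned` maps sequence name -> aligned sequence (assumed input format).
    """
    names = list(aligned)
    return {
        (first, second): pairwise_identity(aligned[first], aligned[second])
        for first in names
        for second in names
    }


if __name__ == "__main__":
    toy_alignment = {  # toy sequences, not real DUF1703 paralogs
        "paralog_1": "MKT-A",
        "paralog_2": "MRTLA",
    }
    for pair, identity in identity_matrix(toy_alignment).items():
        print(pair, round(identity, 1))
```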

Cardinium carbonic anhydrase homologs

Maximum likelihood phylogenetic placement of cCpun and Ca. Paenicardinium endonii (cHgTN10) carbonic anhydrase (CA) protein sequences compared with their closest homologs in the GenBank database. Members from the four clades forming the beta-class of CAs are presented. The positions of the Cardinium homologs and the CA homolog identified in Rickettsia (RiCNE) endosymbionts in biting midges are indicated in purple and red respectively. Phylogenetic relationships were inferred using IQ-TREE v1.6.6 (method: automated best model selection).

DOI: 10.7287/peerj.preprints.27242v1/supp-6

Functional annotation of the cCpun draft genome including Pfam domains and eggNOG results

DOI: 10.7287/peerj.preprints.27242v1/supp-7

Type IX secretion system (T9SS) components in the cCpun genome

DOI: 10.7287/peerj.preprints.27242v1/supp-9

Evidence of recombination between the cCpun DUF1703 paralogs as determined with RDP4 software

DOI: 10.7287/peerj.preprints.27242v1/supp-10

Signal peptide prediction in the 25 intact DUF1703 protein paralogs of cCpun using the SignalP 4.1 server

DOI: 10.7287/peerj.preprints.27242v1/supp-11

Alignment files used in this study

DOI: 10.7287/peerj.preprints.27242v1/supp-12