Supplemental Information

Taxon annotated GC-coverage plot of the primary genome assembly of Culicoides punctatus

DOI: 10.7287/peerj.preprints.27242v1/supp-1

Circular representation of the cCpun genome generated with Circos v0.69 (Krzywinski et al. 2009)

To enhance visualization, the scaffolds were sorted by size and concatenated into a pseudomolecule. Alternating grey and white strips mark the scaffold borders. Moving inwards, the first, second and third circles are colour-coded according to COG functional categories and represent a) the complete set of cCpun protein-coding genes, b) the core genes shared by the four Amoebophilaceae genomes presented in Figure 1A, and c) the cCpun-unique genes. The fourth circle shows the genomic locations of the genes encoding the Afp-like (magenta) and type IX (red) secretion systems, as well as the DUF1703 gene paralogs. Finally, the two line plots show genome coverage and GC content (%), respectively, across the cCpun genome in 1 kb sliding windows. An orange line indicates the mean coverage (90X) of the draft assembly.

DOI: 10.7287/peerj.preprints.27242v1/supp-2
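The pseudomolecule and GC track described above can be sketched as follows. This is an illustrative sketch, not the pipeline used to draw the figure: the function names are hypothetical, the window step is assumed equal to the 1 kb window size (the caption does not state the step), and ambiguous bases (e.g. N) are assumed to be excluded from the GC denominator.

```python
from typing import Dict, List, Tuple


def build_pseudomolecule(scaffolds: Dict[str, str]) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Concatenate scaffolds longest-first (hypothetical helper).

    Returns the concatenated sequence and a list of (name, start, end)
    borders in 0-based, end-exclusive pseudomolecule coordinates.
    """
    ordered = sorted(scaffolds.items(), key=lambda kv: len(kv[1]), reverse=True)
    borders: List[Tuple[str, int, int]] = []
    parts: List[str] = []
    pos = 0
    for name, seq in ordered:
        borders.append((name, pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), borders


def gc_windows(seq: str, window: int = 1000, step: int = 1000) -> List[Tuple[int, float]]:
    """GC% per window as (start, gc_percent); only A/C/G/T count toward the denominator."""
    seq = seq.upper()
    out: List[Tuple[int, float]] = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        out.append((start, 100.0 * gc / called if called else float("nan")))
    return out
```

The scaffold borders returned by `build_pseudomolecule` are what the alternating grey and white strips would be drawn from; windows spanning a border mix sequence from two scaffolds, which is an inherent property of plotting a concatenated pseudomolecule.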

cCpun is a near-complete genome

BUSCO completeness assessment of the cCpun draft genome compared with the other Cardinium genomes and A. asiaticus. Results are based on the presence or absence of 148 single-copy universal bacterial markers.

DOI: 10.7287/peerj.preprints.27242v1/supp-3
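BUSCO reports completeness as percentages of the marker set: complete markers (split into single-copy and duplicated), fragmented markers, and missing markers. The sketch below shows that arithmetic for the 148-marker set mentioned above; the function name is hypothetical, the output string only imitates BUSCO's short-summary style, and any counts passed in are placeholders rather than the cCpun results.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, n: int = 148) -> str:
    """Format BUSCO-style percentages from marker counts (hypothetical helper).

    Complete = single-copy + duplicated; Missing = n - complete - fragmented.
    """
    if min(single, duplicated, fragmented) < 0:
        raise ValueError("counts must be non-negative")
    if single + duplicated + fragmented > n:
        raise ValueError("counts exceed the number of markers")
    complete = single + duplicated
    missing = n - complete - fragmented

    def pct(x: int) -> float:
        return 100.0 * x / n

    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )
```

Because every percentage shares the same denominator, a single missing marker shifts completeness by about 0.7 percentage points (1/148).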

Repeat-content comparison across the five Amoebophilaceae genomes

MUMmer self-plots showing sequence repeat density in the five Amoebophilaceae genomes. Each dot represents a repeat of at least 200 bp with at least 95% similarity (blue, direct; red, inverted).

DOI: 10.7287/peerj.preprints.27242v1/supp-4
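The filtering behind such a self-plot can be sketched on generic alignment records. This sketch assumes hits are available as (reference start, reference end, query start, query end, percent identity) records and that reverse-strand (inverted) hits have query start greater than query end, a convention common to alignment coordinate tables; the `Hit` class and `classify_repeats` function are hypothetical and do not parse any MUMmer output format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    """One self-alignment hit (hypothetical record layout, 1-based inclusive)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    identity: float  # percent identity of the aligned region

    @property
    def length(self) -> int:
        # Aligned length measured on the reference coordinates.
        return abs(self.ref_end - self.ref_start) + 1


def classify_repeats(
    hits: List[Hit], min_len: int = 200, min_identity: float = 95.0
) -> Tuple[List[Hit], List[Hit]]:
    """Split qualifying hits into (direct, inverted) repeats."""
    direct: List[Hit] = []
    inverted: List[Hit] = []
    for h in hits:
        # Skip the trivial match of each region to itself (main diagonal).
        if (h.ref_start, h.ref_end) == (h.qry_start, h.qry_end):
            continue
        if h.length < min_len or h.identity < min_identity:
            continue
        # Ascending query coordinates = same orientation = direct repeat.
        (direct if h.qry_start <= h.qry_end else inverted).append(h)
    return direct, inverted
```

In a self-comparison each repeat pair is usually reported twice (once on each side of the diagonal), so counts from this kind of filter would need halving if repeats rather than dots are being counted.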

Pairwise similarity (lower right) and identity (upper left) matrix of the 24 DUF1703 protein paralogs identified in the genome of cCpun

DOI: 10.7287/peerj.preprints.27242v1/supp-5
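Pairwise percent identity can be computed from a multiple sequence alignment as sketched below. The caption does not state which tool produced the matrix or how gaps were treated; this sketch assumes identity is counted over columns where neither sequence has a gap, and the function names are hypothetical. Similarity scores additionally require a rule for which non-identical residues count as similar (e.g. a substitution matrix), which is not modelled here.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Identical residues / columns where both sequences are ungapped, as a percentage."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else float("nan")


def identity_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise identity matrix keyed by (name, name)."""
    m: Dict[Tuple[str, str], float] = {(name, name): 100.0 for name in aln}
    for p, q in combinations(aln, 2):
        value = percent_identity(aln[p], aln[q])
        m[(p, q)] = m[(q, p)] = value
    return m
```

Because the denominator changes with gap placement, identity values from different tools or gap conventions are not directly comparable.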

Cardinium carbonic anhydrase homologs

Maximum likelihood phylogenetic placement of the cCpun and Ca. Paenicardinium endonii (cHgTN10) carbonic anhydrase (CA) protein sequences compared with their closest homologs in the GenBank database. Members of the four clades forming the beta class of CAs are shown. The positions of the Cardinium homologs and of the CA homolog identified in Rickettsia (RiCNE) endosymbionts of biting midges are indicated in purple and red, respectively. Phylogenetic relationships were inferred using IQ-TREE v1.6.6 with automated best-fit model selection.

DOI: 10.7287/peerj.preprints.27242v1/supp-6
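Automated model selection ranks candidate substitution models by an information criterion computed from each model's maximized log-likelihood. The sketch below assumes the Bayesian information criterion, BIC = k ln(n) - 2 ln L, with k free parameters and n alignment sites; it is not IQ-TREE's implementation, and any model names, likelihoods and parameter counts supplied to it are hypothetical placeholders.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Return the model name with the lowest BIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))
```

The k ln(n) term penalizes parameter-rich models, so a model with a better log-likelihood can still lose if the improvement does not outweigh its extra parameters.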

Functional annotation of cCpun draft genome including Pfam domains and eggNOG results

DOI: 10.7287/peerj.preprints.27242v1/supp-7

Type IX secretion system (T9SS) components in the cCpun genome

DOI: 10.7287/peerj.preprints.27242v1/supp-9

Evidence of recombination between the cCpun DUF1703 paralogs, as determined with RDP4

DOI: 10.7287/peerj.preprints.27242v1/supp-10

Signal peptide prediction in the 25 intact DUF1703 protein paralogs of cCpun using the SignalP 4.1 server

DOI: 10.7287/peerj.preprints.27242v1/supp-11

Alignment files used in this study

DOI: 10.7287/peerj.preprints.27242v1/supp-12

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Stefanos Siozios conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.

Jack Pilgrim performed the experiments, authored or reviewed drafts of the paper, approved the final draft.

Alistair C Darby analyzed the data, authored or reviewed drafts of the paper, approved the final draft.

Matthew Baylis conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.

Gregory DD Hurst conceived and designed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

The raw reads and the cCpun draft genome assembly have been submitted to the DDBJ/EMBL/GenBank database under the BioProject accession number PRJNA487198 (WGS project QWJI00000000).


Data Deposition

The following information was supplied regarding data availability:

The supermatrix file used for the phylogenomic analysis (Fig. 1) and the alignment files used for the phylogenetic analyses of the DUF1703 and carbonic anhydrase gene families (Fig. 3 and Fig. S6, respectively) are provided as supplementary files. Both trimmed and untrimmed versions of the alignment files are provided.


Funding

This work was supported by a Marie Curie Individual Fellowship (H2020-MSCA-IF-2014) grant 657135 “MIDGESYM” to Stefanos Siozios and a BBSRC DTP studentship to Jack Pilgrim. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

