This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cyanobacteria are major contributors to global biogeochemical cycles. The genetic diversity among Cyanobacteria enables them to thrive across many habitats. However, the taxonomy of Cyanobacteria remains unstable because of an inadequate taxonomic classification system. Taxa within Cyanobacteria have historically been classified using morphological traits, which are inadequate to categorize the diversity within this phylum. The aim of this study was to propose a new taxonomic framework for Cyanobacteria using whole-genome-based taxonomic analysis including in silico Genome-to-Genome hybridization (GGH), Average Amino acid Identity (AAI), Average Nucleotide Identity (ANI), phylogenetic reconstruction using a set of conserved marker genes (MLST), and 16S rRNA gene sequences. Applying these genomic signatures to a set of 100 cyanobacterial genomes allowed 86 species and 43 genera to be identified, among which 32 species and 19 genera were found to be novel. By exploring changes in the relative abundances of the analyzed genomes throughout diverse marine and freshwater ecosystems, we determined the ecological niches occupied by these taxa, adding another level to our proposed taxonomic scheme.
This is a preprint submission to PeerJ Preprints.
Figure S1. Multilocus sequence analysis (MLSA) phylogenetic reconstruction of the Cyanobacteria phylum
The tree was constructed by maximum likelihood (ML) using the Dayhoff+G model in RAxML. It was inferred from a set of conserved marker genes from 100 genomes. The numbers at the nodes are bootstrap values, given as percentages; only values greater than 50% are shown. Bootstrap tests were conducted with 1,000 replicates. The scale bar represents the number of nucleotide substitutions per site. The Gloeobacter violaceus PCC 7421 sequence was designated as the outgroup.
Figure S3. Non-metric multidimensional scaling (NMDS) analysis of the freshwater metagenomes and environmental parameters
Ordination plot of physicochemical parameters and the community structure of metagenomes from all stations. Distances between samples were used to generate the NMDS. The lengths of the lines represent the strength of the correlation. Dots indicate the metagenome samples.
Figure S4. Non-metric multidimensional scaling (NMDS) analysis of the freshwater metagenomes and Eco clusters
Ordination plot of Eco clusters and the community structure of metagenomes from all stations. Distances between samples were used to generate the NMDS. The lengths of the lines represent the strength of the correlation. Dots indicate the metagenome samples.
Table S1. Estimates of genome relatedness of cyanobacterial strains
Values in the matrix indicate the intergenomic distances (i.e., evolutionary divergence between sequences), shown as the number of base substitutions per site between sequences. Analyses were conducted according to the method of Tamura et al. (2004). The analysis involved 110 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 759 positions in the final dataset. Evolutionary analyses were conducted in MEGA6.
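The gap-handling step described above (eliminating every position that contains a gap or missing data before computing pairwise distances) can be illustrated with a minimal Python sketch. This is not the MEGA6 procedure: for simplicity it substitutes an uncorrected p-distance (proportion of differing sites) for the Tamura et al. (2004) method, and the set of symbols treated as gaps or missing data, the function names, and the toy alignment are all assumptions made for illustration.

```python
from itertools import combinations

# Assumption: symbols treated as gaps or missing data in this sketch.
GAP_CHARS = set("-?NnXx")


def complete_deletion(seqs):
    """Drop every alignment column in which any sequence has a gap or missing symbol."""
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to the same length")
    keep = [
        i for i in range(length)
        if all(seqs[n][i] not in GAP_CHARS for n in names)
    ]
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


def p_distance(a, b):
    """Uncorrected proportion of differing sites (a stand-in, not Tamura et al. 2004)."""
    if not a:
        raise ValueError("no positions left after deletion")
    diffs = sum(1 for x, y in zip(a, b) if x.upper() != y.upper())
    return diffs / len(a)


def distance_matrix(seqs):
    """Pairwise distances over the columns that survive complete deletion."""
    cleaned = complete_deletion(seqs)
    return {
        (m, n): p_distance(cleaned[m], cleaned[n])
        for m, n in combinations(cleaned, 2)
    }


# Toy alignment (hypothetical data, not from the study).
toy = {
    "A": "ACGT-ACGTA",
    "B": "ACGTTACGTT",
    "C": "ACCTTACNTA",
}
for pair, dist in distance_matrix(toy).items():
    print(pair, dist)
```

In the toy alignment, two of the ten columns are discarded (one with a gap, one with an ambiguous base), so each distance is computed over the remaining eight positions. Because columns are removed for all sequences at once, every pair is compared over the same set of sites.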
Values of GGD, AAI, and 16S rRNA % similarity are listed for each pair (query – closest strain). After the cutoff values were checked, the proposed new names are given in the highlighted column. The classification and corresponding genera according to Kózlov et al. (2016) are given for comparison. Type strains or type species are indicated by a T in parentheses at the end of each name. Strains represented only by a 16S rRNA sequence are indicated by “(16s)” at the end of each name.
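The cutoff check on a query and its closest strain can be sketched as follows. This is only an illustration under stated assumptions: the cutoff numbers are placeholders rather than the thresholds used in the study, only the two percent-similarity measures (AAI and 16S rRNA) are included, and the function names and example values are hypothetical.

```python
def failed_cutoffs(values, cutoffs):
    """Return the measures whose similarity value falls below its cutoff."""
    missing = set(cutoffs) - set(values)
    if missing:
        raise KeyError(f"no value supplied for: {sorted(missing)}")
    return sorted(name for name, cutoff in cutoffs.items() if values[name] < cutoff)


def meets_all_cutoffs(values, cutoffs):
    """True when the query matches its closest strain on every measure."""
    return not failed_cutoffs(values, cutoffs)


# Placeholder cutoffs (percent similarity); NOT the values used in the study.
PLACEHOLDER_CUTOFFS = {"AAI": 90.0, "16S": 97.0}

# Hypothetical query vs. closest-strain similarities.
pair = {"AAI": 92.5, "16S": 96.0}

print(failed_cutoffs(pair, PLACEHOLDER_CUTOFFS))
print(meets_all_cutoffs(pair, PLACEHOLDER_CUTOFFS))
```

Returning the list of failed measures, rather than a single yes or no, makes it easy to see which line of evidence separates a query from its closest strain.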