Figure S1. Multilocus sequence analysis (MLSA) phylogenetic reconstruction of the Cyanobacteria phylum
The tree was constructed by maximum likelihood (ML) with the Dayhoff+G model in RAxML. It was inferred from a set of conserved marker genes from 100 genomes. Numbers at the nodes are bootstrap percentages; only values greater than 50% are shown. Bootstrap tests were conducted with 1,000 replicates. The scale bars are in units of nucleotide substitutions per site. The Gloeobacter violaceus PCC 7421 sequence was designated as the outgroup.
Figure S2. Abundance and distribution of Eco clusters across freshwater metagenomes
Relative abundance of Eco-A, Eco-B and Eco-C in metagenomes from the Caatinga biome (N = 8).
Figure S3. Non-metric multidimensional scaling (NMDS) analysis of the freshwater metagenomes and environmental parameters
Ordination plot of physicochemical parameters and the community structure of metagenomes from all stations. Distances between samples were used to generate the NMDS. The lengths of the lines represent the strength of the correlation. Dots indicate the metagenome samples.
Figure S4. Non-metric multidimensional scaling (NMDS) analysis of the freshwater metagenomes and Eco clusters
Ordination plot of Eco clusters and the community structure of metagenomes from all stations. Distances between samples were used to generate the NMDS. The lengths of the lines represent the strength of the correlation. Dots indicate the metagenome samples.
Table S1. Estimates of genome relatedness of cyanobacterial strains
Values in the matrix indicate the intergenomic distances (i.e., evolutionary divergence between sequences), shown as the number of base substitutions per site between sequences. Analyses were conducted according to the method of Tamura et al. (2004). The analysis involved 110 nucleotide sequences. All positions containing gaps and missing data were eliminated, leaving a total of 759 positions in the final dataset. Evolutionary analyses were conducted in MEGA6.
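The complete-deletion step and the resulting pairwise matrix can be illustrated with a minimal Python sketch. It is not the MEGA6 procedure: for simplicity it computes the uncorrected p-distance (proportion of differing sites) rather than the Tamura et al. (2004) estimate, and the function names, the set of missing-data symbols, and the toy sequences are assumptions made for illustration only.

```python
from itertools import combinations

# Assumption: symbols treated as gaps or missing data in a nucleotide alignment.
MISSING = set("-?Nn")


def complete_deletion(seqs):
    """Drop every alignment column that has a gap/missing symbol in any sequence."""
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length)
            if all(seqs[n][i] not in MISSING for n in names)]
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


def p_distance(a, b):
    """Uncorrected proportion of sites that differ between two aligned sequences."""
    if not a:
        raise ValueError("no positions left after deletion")
    diffs = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return diffs / len(a)


def distance_matrix(seqs):
    """Pairwise p-distances after complete deletion, keyed by name pairs."""
    cleaned = complete_deletion(seqs)
    return {(m, n): p_distance(cleaned[m], cleaned[n])
            for m, n in combinations(cleaned, 2)}


# Toy alignment (hypothetical data): column 5 contains a gap and is removed.
toy = {"a": "ACGT-A", "b": "ACGTTA", "c": "ATGTTC"}
matrix = distance_matrix(toy)
```

Because columns are removed across the whole alignment before any pair is compared, every distance in the matrix is computed over the same set of positions, which is what the single "positions in the final dataset" count in the caption reflects.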
Table S2. Comparison of all genomic metrics
Values of GGD, AAI, and 16S rRNA % similarity are listed for each pair (query – closest strain). After the cutoff values were checked, the proposed new names are given in the highlighted column. The classification and corresponding genera according to Kózlov et al. (2016) are detailed for comparison. Type strains or type species are indicated by a T in parentheses at the end of each name. Strains represented only by a 16S rRNA sequence are indicated by “(16s)” at the end of each name.