Robustness analysis of metabolic predictions in algal microbial communities based on different annotation pipelines

Animals, plants, and algae rely on symbiotic microorganisms for their development and functioning. Genome sequencing and genomic analyses of these microorganisms provide opportunities to construct metabolic networks and to analyze the metabolism of the symbiotic communities they constitute. Genome-scale metabolic network reconstructions rest on information gained from genome annotation. As there are multiple annotation pipelines available, the question arises to what extent differences in annotation pipelines impact outcomes of these analyses. Here, we compare five commonly used pipelines (Prokka, MaGe, IMG, DFAST, RAST) from predicted annotation features (coding sequences, Enzyme Commission numbers, hypothetical proteins) to the metabolic network-based analysis of symbiotic communities (biochemical reactions, producible compounds, and selection of minimal complementary bacterial communities). While Prokka and IMG produced the most extensive networks, RAST and DFAST networks produced the fewest false positives and the most connected networks with the fewest dead-end metabolites. Our results underline differences between the outputs of the tested pipelines at all examined levels, with small differences in the draft metabolic networks resulting in the selection of different microbial consortia to expand the metabolic capabilities of the algal host. However, the consortia generated yielded similar predicted producible compounds and could therefore be considered functionally interchangeable. This contrast between selected communities and community functions depending on the annotation pipeline needs to be taken into consideration when interpreting the results of metabolic complementarity analyses. In the future, experimental validation of bioinformatic predictions will likely be crucial to both evaluate and refine the pipelines and needs to be coupled with increased efforts to expand and improve annotations in reference databases.


Genome annotation
Eighty-one publicly available genomes were used for this study (Table S1), all of them corresponding to strains that have been isolated from the brown alga Ectocarpus subulatus. The genomes were structurally and functionally annotated using five popular annotation pipelines: the Magnifying Genomes (MaGe) platform (Vallenet et al., 2019), the Rapid Annotations using Subsystems Technology (RAST) toolkit (Aziz et al., 2008), the Integrated Microbial Genomes (IMG) server (Markowitz et al., 2009), the Prokka tool v1.13 (Seemann, 2014) and the DDBJ Fast Annotation and Submission Tool (DFAST) (Tanizawa, Fujisawa & Nakamura, 2017). Each pipeline uses one or more unique databases to predict and search for genes and gene functions. MaGe searches for functional features using UniProtKB/Swiss-Prot, Interpro, FIGFAM, COG, ENZYME, and Diamond as a search tool (Vallenet et al., 2019); RAST predicts gene functions using the SEED database (Aziz et al., 2008); IMG predicts features for genes based on COGs, Pfams, TIGRFAMs, as well as the KEGG and MetaCyc; Prokka is a command-line tool using UniProt, Pfam, and TIGRFAMs; DFAST uses ortholog searches with reciprocal BLAST searches, HMM searches against TIGRFAMs, and COG assignments for functional annotations (Tanizawa, Fujisawa & Nakamura, 2017). All of the pipelines were run with default parameters (Table  S2). For Prokka, we also performed tests with more stringent e-value settings (1e−15 vs 1e−6 (default)); as these results were similar to those obtained with default parameters they are included only in the supplementary tables.

Draft metabolic network reconstruction
The draft metabolic network of the host alga Ectocarpus subulatus was already available (Dittami et al., 2020a) and had been reconstructed using Pathway Tools version 23 (Karp et al., 2019) and the AuReMe pipeline (Aite et al., 2018). To be able to use this metabolic network of the host for complementarity analyses, the same version of Pathway Tools was also used to reconstruct the bacterial draft metabolic networks. For each annotated bacterial genome (i.e., 5 pipelines × 81 strains), a draft metabolic network was automatically reconstructed using Mpwt, the PathoLogic multiprocessing wrapper Python package (Belcour et al., 2020) taking into account the protein description, GO terms, and complete EC numbers. The output files created by Pathway Tools were then converted to the SBML format with the Padmet software. During this process the reactions without gene association were removed. Then, the list of reactions as well as network statistics were extracted using the report_network.py script from the Padmet software (Aite et al., 2018). Consequently, for each of the five annotation pipelines, a set of 81 draft metabolic networks was obtained and used for metabolic complementarity analyses. Furthermore, one additional set was produced for each strain by merging the metabolic networks obtained for each annotation pipeline.

Predicting metabolic complementarity
The SBML files were utilized to predict potential metabolic complementarity between the algal host and the bacteria using the MiSCoTo tool . Based on a list of compounds present in the culture medium, i.e., Provasoli-enriched seawater as defined by Prigent et al. (2014), MiSCoTo predicts the production of compounds (i.e., the scope) in combined metabolic networks based on a discrete abstraction, which assumes recursively that a compound is producible either if it belongs to the culture medium if it is the product of a reaction in the combined metabolic networks whose substrates are themselves producible, as stated in the network expansion algorithm (Ebenhöh, Handorf & Heinrich, 2004). It thus does not consider the kinetics, or stoichiometry of the reactions, nor the relative abundance of the organism corresponding to the metabolic networks. MisCoTo furthermore allows for the selection of minimal symbiont communities that enable the production of a given set of compounds. In our analyses of the data obtained with the five different pipelines individually plus one analysis merging all genomes from all pipelines, the set of host metabolites that became producible through symbiotic cooperation was computed with miscoto_scopes. This set was then used as a target for community selection with miscoto_mincom using ''--soup'' as an option, i.e. ignoring the putative cost of exchanges associated with ensuring the producibility of these novel metabolites by the host. In each experiment, the minimal community size ensuring the producibility of these algal compounds was computed, together with the set of symbionts that appears in at least one of the minimal communities (''--union'' option). Both the list of producible metabolites and the selected microbial communities were compared for all tested annotation pipelines. The detailed description of the code and data used for the above steps is available on GitHub: https://github.com/ElhamKarimi/Metabolic-predictions_different-Pipelines.

Indirect measures of network quality and manual examination of pipeline-specific reactions reconstructed metabolic networks
To gain insight into the quality of metabolic networks generated based on the different pipelines, dead-end metabolites were computed for each pipeline using Memote (Lieven et al., 2020). Furthermore, sizes of the largest strongly connected components were counted in each of the networks to obtain a measure of its overall connectivity. First, directed metabolite graphs were constructed from the SBML files using the Networkx package (Hagberg, Swart & Chult, 2008). Then, 23 ubiquitous metabolites were removed (water, proton, ATP, ADP, 3-5-ADP, AMP, NADP, NADPH, NAD, NADH, FAD, UDP, GTP, GDP, oxygen molecule, inorganic phosphate, diphosphate, carbon dioxide, ammonia, hydrogen peroxide, coenzyme A, H 2 -acceptor, H 2 -donor). Finally, for each graph, strongly connected components were found with the Networkx strongly_connected_components function. Then, for each metabolic network, the ratio between the size of the largest strongly connected component and the total number of metabolites was computed.
In addition to these indirect measures, we randomly chose 100 reactions that were exclusively found in one pipeline (20 per pipeline) for manual curation. Protein sequences of the bacterial genes that led to the prediction of these reactions were aligned with the Swissprot/Uniprot database using blastP and were then classified as high confidence (>30% ID; >70% coverage with a protein with the same metabolic function, functional domains present), low confidence (lower similarity with a protein with the same function, or similarity with a protein with a distinct but similar function), or false (>30% ID; >70% coverage with proteins known to carry out a different function).

Statistical analyses
All statistical analyses were carried out using RStudio (v.1.1.463). Differences in the number of annotation features per pipeline were determined using an ANOVA followed by a Tukey's HSD (honestly significant difference) test, considering each genome as a replicate. Nonmetric multidimensional scaling (NMDS) was performed on a presence/absence matrix comprising all reactions for each genome/annotation pipeline. The Bray-Curtis index was used as a measure of quantification of compositional dissimilarity between genomes annotated by different pipelines. To verify if the annotation pipeline had an impact on the reaction, an ANOSIM test (number of permutations = 9999) was performed using the Vegan package (Oksanen et al., 2020). UpSet diagrams were generated using the UpSetR package (Conway, Lex & Gehlenborg, 2017). Comparisons of the overall numbers of predicted EC-numbers per category were tested using a Binomial test and a Bonferroni correction for multiple testing.

RESULTS
81 bacterial genomes were annotated using the Prokka, RAST, IMG, MaGe, and DFAST pipelines, and the number of predicted features (Coding Sequences (CDSs), Enzyme Commission (EC) numbers, hypothetical proteins) are shown for each genome and pipeline in Table S1. The number of predicted CDSs across all genomes and pipelines ranged from 2180 to 6755; the number of predicted EC-numbers ranged from 443 to 2106. The number of hypothetical proteins ranged from 56 to 3566. Finally, the average number of predicted metabolic reactions, after automatic metabolic network reconstruction with Pathway Tools, ranged from 1034 to 2052. Figure 1 shows the impact of the tested annotation pipelines on these metrics. At the level of CDS prediction, we observed divergence with MaGe, for instance, predicting more CDSs per genome than all other pipelines. Also, MaGe annotation contained fewer unknown hypothetical proteins.
Prokka predicted more EC numbers than the other pipelines, regardless of the e-value cutoff used. The precision of the predictions was similar in Prokka, IMG, MaGe, and DFAST with 85, 88, 84, and 85% of all EC numbers being complete. The only outlier was RAST, which provided only complete EC numbers (Table S3). We did, however, observe significant differences between the pipelines regarding certain categories of EC numbers. For instance, DFAST predicted most oxidoreductases acting on NADH/NADPH, IMG predicted most Ligases forming phosphoric ester bonds, MaGe was the only pipeline to predict translocases (E.C. 7.-.-.-), Prokka predicted most hydrolases acting on phosphorusnitrogen bonds, and RAST predicted most hydrolases acting on ether bonds. A detailed list of overrepresented EC numbers per annotation pipeline is provided in Table 1.
In terms of predicted metabolic reactions, IMG predicted the highest numbers (4.2% more than average) closely followed by Prokka with default settings (3.1%), while DFAST annotations resulted in the fewest reactions (7.7% fewer than average). Similar patterns were also obtained when running the analyses separately for the different bacterial phyla (Fig.

Figure 1 Boxplot of the number of (A) CDSs, (B) EC-numbers (considers all EC-number), (C) Hypothetical proteins, and (D) Metabolic reactions of all annotated bacterial species through the five pipelines.
The graphs show the deviation of each genome annotated with every one pipeline from the mean across all pipelines in %. The thick horizontal line represents the median of the distribution across all genomes for that pipeline, and circles represent outlier data. Letters above the box-plots indicate statistically significant differences (Tukey's HSD test). Pipelines share the same letter, the differences between them are not significant (p ≥ 0.05). (See Table S10 for details).
Full-size DOI: 10.7717/peerj.11344/fig-1 S1), except that EC-number predictions for Actinobacteria by DFAST were lower compared to other pipelines, and CDS prediction for Firmicutes by IMG were slightly higher. Finally, we examined three indirect measures of quality for the network reconstructions: the percentage of dead-ends, i.e., metabolites which can be produced but are not consumed; orphan metabolites, i.e., metabolites that are consumed but not produced in the metabolic networks; and the ratio between the size of the largest strongly connected components and the total number of metabolites, a direct estimate of the ''connectedness'' of the network. The percentage of dead-end metabolites was similar across all pipelines: 30.35% for Prokka, 29.66% for IMG, 29.17 for DFAST, 29.02% for MaGe, and 28.4% for RAST. The percentage of orphan metabolites was also similar: 30.63% for Prokka, 30.49% for IMG, 29.81% for DFAST, 30.37% for MaGe, and 29.48% for RAST. Lastly, this was also true for the ratio between the size of the largest of strongly connected components and the total number of metabolites: 26% for Prokka, 28% for IMG, 29% for DFAST, 27% for MaGe, and 29% for RAST.
To determine how similar networks based on different annotation pipelines were, the Bray-Curtis dissimilarity, a unidimensional distance measure between two matrices or networks, was calculated for all network comparisons. Non-Metric Multidimensional Scaling (NMDS) was then used to display the resulting distance matrix in the twodimensional space. In the resulting graph ( Fig. 2 and Table S4) the distance between two points represents the dissimilarity of the underlying networks. It shows that the similarity Notes.
The ''%over'' indicates how many more EC numbers were found in a specific pipeline compared to all pipelines; '' ∞'' indicates that the EC category was predicted only in this pipeline. p-values correspond to a binomial test after Bonferroni correction. between networks is determined by the bacterial phyla, and less so by the annotation pipeline, but within each phylum, there was neither a clear clustering according to strain nor according to annotation pipeline. ANOSIM analyses were then used to determine if these patterns were statistically significant. They show that both phylum and annotation pipeline had a significant effect on the similarity of the networks (p < 0.0001 in both cases), but confirm that the effect was stronger for phylum (R = 0.93) than for annotation pipeline (R = 0.41).
To further explore the differences in the metabolic network reconstructions and to highlight the specificities of each annotation pipeline we aimed to determine the conserved and unique content between the genome annotations from these different pipelines. This was done using the UpSetR package. Besides generating lists of reactions specific to different subsets of pipelines, this package also generates an UpSet diagram (Fig. 3), which displays the size of these sets. For instance, 2535 reactions were predicted in at least one genome with each of the 5 pipelines. Prokka and MaGe annotations resulted in the most pipeline-specific reactions: 390 (325 with more stringent e-value) and 256, respectively. These two tools also shared a high number of pairwise exclusive metabolic reactions, i.e., reactions not found by any other pipeline (307 vs 15-65 for all other pairwise comparisons). The lists of reactions exclusively found in each pipeline as well as lists of the metabolic pathways they belong to are provided in Tables S5 and S6.
To evaluate the accuracy of the annotation pipelines, manual curation of 20 randomly selected reactions that were unique to each pipeline was carried out. DFAST has the highest number of high confidence assignations (10) followed by Prokka and MaGe (6), IMG(4), and RAST(1). RAST, on the other hand, did not produce any annotations that could be clearly identified as false, followed by Prokka (1), DFAST (2), IMG (3), and MaGe (3). In Prokka, increasing the stringency of the e-value cutoff did not affect the proportions of high confidence, low confidence or false reactions. In DFAST, although the correct reaction was associated with the gene, we observed discrepancies between the product name of genes and the EC number associated in 3 cases. In MaGe and IMG we found 1 and 3 predicted proteins without any similarity in the database and classified them as low confidence (Table S7).
To assess how these differences in the draft metabolic networks impacted the function of the predicted metabolism of algal-bacterial holobionts, we next examined the list of metabolites that could be produced by the algal metabolic network when combined with the 81 draft bacterial networks for each annotation pipeline (i.e., the predicted producible compounds). As shown in Table 2, based on Prokka, RAST, IMG, MaGe, and DFAST 506, 492, 484, 549, and 504 producible compounds were identified, respectively (see Table S8 for a list of compounds). 448 (i.e., between 81.6% and 92.6% of the producible compounds) were shared among all tools, but MaGe, RAST, and Prokka also had 34, 11, and 10 exclusive compounds (Fig. 3B). Furthermore, 12 additional compounds became producible by merging the networks of all strains and annotation pipelines. Next, we computed how many strains were needed to enable the production of the same metabolic compounds as the entire bacterial communities for each pipeline ( Table 2). As

Nocardioides sp. AX2bis
Actinobacteria (1); Actinobacteria(1); Actinomycetales(1); Nocardioidaceae(1); Nocardioides(1); Imperialibacter sp. 89 Bacteroidetes (1)  expected, the annotation pipeline affected the number of strains involved in a minimal community, which increased with the number of metabolites that could be produced by the entire bacterial community. The number of selected strains ranged from seven strains needed to produce the 484 metabolites producible by the metabolic networks reconstructed with the IMG pipeline to 13 strains needed to produce the 549 metabolites producible by the MaGe metabolic networks. Fourteen strains were required to produce the full 590 compounds producible by the combination of all networks from all pipelines. We also examined which bacteria were involved in these minimal communities, taking into consideration the union of all minimal communities proposed by MisCoTo. Here, the annotation pipeline had a strong effect on these communities, with strains being selected from sub-sets of bacteria ranging from 15 (Prokka) to 61 (RAST) strains ( Table 2). The number of bacteria in each of the sub-sets was not related to the number of producible compounds nor the size of the minimal communities. Fifteen strains were selected in only one of the five pipelines, 14 of which exclusively in the RAST-based community, and one, Pseudomonas 9AG, was included in minimal communities only when they were calculated from merged data from all annotation pipelines. Most strains were jointly selected by two or more pipelines, but even Prokka and IMG, for which minimal communities were based on the smallest subset of bacteria (15 and 17 selected strains, respectively) had only six strains in common. However, we also found commonalities between the selected communities: all communities comprised at least one representative of the Actinomycetales (Actinobacteria), Bacillales (Firmicutes), Rhizobiales (Alphaproteobacteria), Sphingomonadales (Alphaproteobacteria), and Pseudomonadales (Gammaproteobacteria). Lastly, one strain, Maribacter strain 151 (Flavobacteriaceae, Bacteroidetes), was consistently selected in all data sets regardless of the annotation tool (Table 2). Finally, given the high variability between the selected communities, we tested how specific these communities were to the dataset used to select them. To this means, we selected the strains comprised in the minimal microbial communities predicted based on the Prokka annotations, i.e., 15 bacteria, and examined the metabolites predicted to be produced by this community together with the host when applied to the datasets based on the other pipelines. Briefly, the DFAST, IMG, MaGe, and RAST metabolic networks with the Prokka community predicted similar scopes as the larger communities specifically selected for the dataset: 477 compounds vs 492 for RAST, 469 vs 484 for IMG, 488 vs 549 for MaGe, and 448 vs 504 producible compounds for DFAST, respectively (Table S9). This suggests that differences in the community size and composition between the pipelines can be explained mainly by the need to complement each community with bacteria enabling the production of less than 10% of the metabolites.

DISCUSSION & CONCLUSIONS
In this study, we sought to assess the impact of selected annotation pipelines on draft metabolic network reconstructions and downstream analyses of metabolic complementarity. Our results highlight significant differences between the output of the five tested standard pipelines at all examined levels, from the prediction of coding sequences to the selection of microbial communities, especially for EC annotations and hypothetical proteins, with levels of variability similar to those previously reported by Griesemer et al. (2018). Overall, the number of reactions predicted in the final network mirrored the number of EC numbers predicted, with Prokka and IMG yielding both the highest number of EC annotations and reactions. This underlines the importance of this type of annotation for metabolic reconstructions with Pathway Tools -a link that is not surprising as EC numbers are directly referenced in MetaCyc (Caspi et al., 2017), the database used by Pathway Tools for draft metabolic network reconstruction. Each complete EC number can therefore be translated directly into one or several corresponding metabolic reactions.
Our analyses also highlight a number of reactions that were predicted specifically for some pipelines, but our analysis of the corresponding pathways did not clearly show any pipelines to favor the annotation of specific biological processes over others. Furthermore, the fact that we observed little differences in the relative performance of the pipelines when comparing them separately for different bacterial phyla, suggests that there is no strong phylum-specificity of the tested pipelines. Our manual examination of a subset of pipeline-specific reactions showed that the reliability of these pipeline-specific predictions is generally low, with a tendency for those pipelines that predict the fewest pipeline-specific reactions (DFAST, RAST) to produce the highest proportion of high-confidence prediction, as well as the most connected networks in terms of dead-end metabolites. The overall high proportion of dead-end metabolites can be explained by the facts that (i) during the reconstruction process we discarded reactions without genetic support (gene association), and that (ii) metabolic networks underwent no or very little curation. Furthermore, the genomes analyzed here belong to non-model organisms, and hence their metabolism is more difficult to reconstruct than in well-established models.
Given these differences, a key question is how reliable functional analyses based on these draft metabolic networks are, and how much their results change according to the annotation pipeline employed. Here our results may seem contradictory at first: on one hand, variability at the level of producible metabolites was low; on the other hand, during the selection of microbial communities based on metabolic complementarities, even these small differences in the draft metabolic networks resulted in largely different consortia. These small differences, whether they are erroneous or missing annotations, accumulate across the 81 annotated genomes and are particularly likely to impact the overall set of producible metabolites and hence community selection, as any possible cooperation with an added value to the host is selected. Indeed, although small, differences in the producible metabolites require complementing the bacterial community with strains that specifically enable the production of these metabolites. Overall, however, the low number of pipelinespecific metabolites indicated that all of the selected consortia were functionally similar. Furthermore, we have shown that a given minimal bacterial community (in our case one generated with Prokka data), also yields similar producible metabolites regardless of the annotation pipeline used for the generation of the metabolic networks. This means that the different selected bacterial consortia are likely able to fulfill the same or similar metabolic roles in our metabolic model. This phenomenon has previously also been described in natural alga-associated bacterial consortia (Burke et al., 2011) and was explained by the competitive lottery theory (Sale, 1979): several, but not all species/strains can occupy a given niche, and among them, random processes govern which species prevails (lottery model). In our case, such ''random'' processes could be generated by the 'noise' in the genome annotations and the specificities of each pipeline. Consequently, the niche would correspond to the provision of specific metabolic functions. In line with this analogy, the composition of the bacterial communities in our analyses was not entirely random: although only one strain was present in all communities, each community also contained at least one member of several major taxonomic groups. These groups likely possess specific metabolic capacities absent from other groups making their presence indispensable in all selected consortia. These requirements were detected and met by MiSCoTo regardless of the annotation pipeline.
Regarding the use of metabolic complementarity as a criterion to select microbial communities, this implies that, as also confirmed experimentally by Burgunter-Delamare et al. (2020), metabolic complementarity can be used to select microbial functions important for a symbiotic community. However, we need to expect high variability in the composition of the selected communities if several strains contain similar metabolic capacities. Just like in nature, there may simply not be one ideal solution that clearly excludes all others. In this sense, our data does not provide any evidence that one or another pipeline is more suitable for metabolic network reconstruction-we can only state that for our dataset Prokka and IMG, on average, produced more EC numbers and larger draft metabolic networks with Pathway Tools, while DFAST and RAST are likely to produce fewer false-positive reactions and dead-end metabolites, on average. Furthermore, MaGe produced the largest global scope for the community. One approach to avoid biases introduced by annotation pipelines is to merge results from different pipelines thus maximizing the number of annotations (Kalkatawi, Alam & Bajic, 2015). Our data obtained for the merged metabolic networks of all pipelines suggest that this approach would also further extend our metabolic networks and the global scope of the community. However, this approach also comes with a risk, as every additional annotation pipeline may introduce additional errors (Poptsova & Gogarten, 2010), especially if the pipeline is not regularly updated (Salzberg, 2019). Based on our manual curation of pipeline-specific reactions we may even consider the opposite i.e., basing metabolic complementarity analysis exclusively on reactions that have been predicted independently by two or more pipelines. This will reduce the scope of the networks but likely also result in a reduction of noise and false-positive associations. An additional option may be to increase the stringency of the pipeline settings. Here moderate modification of the e-value in Prokka had little effect on the final results, but further adjustments are possible and their efficiency would likely need to be adapted for each dataset.
Another application of metabolic complementarity using different annotation pipeline could also be in the context of metagenome analysis, on Metagenome Assembled Genomes (MAGs) of uncultured bacteria. Belcour et al. (2020) have shown that reconstructed metabolic networks from MAGs are similar to the reference genomes. Also, the stability of the producible metabolites and selected minimal symbionts was demonstrated for degraded genomes with 2% of genes randomly removed. It, therefore, seems probable that the overall stability observed in our study in terms of the predicted metabolic contributions of the symbionts to the algal metabolism regardless of the annotation pipeline will also hold true for metagenomic communities.
In the long run, continued and extensive experimental validation of bioinformatic predictions in both a culture and a metagenomic context will be key to evaluate and refine the pipelines (Poptsova & Gogarten, 2010) and needs to be coupled with increased efforts to expand and improve annotations in reference databases (Carr & Borenstein, 2014). In the meantime, one approach to overcome the variability brought by the annotation pipelines is to carefully curate the resulting metabolic networks. This is an indispensable step in obtaining high-quality metabolic networks as stated by Thiele & Palsson (2010). However, this step is costly as it usually requires human expertise and thorough literature exploration. As more and more genomes and metagenomes are available, there is a need for curation-free and reliable metabolic networks to surmount this bottleneck. Non-curated metabolic networks can be informative but are likely to contain false-positive functions brought by the annotation, these functions deserve to be examined closely when used for selecting communities of interest.