Evidencing strain-dependency of metabolic pathways within 1,494 lactic bacteria genomes with the in silico screening Prolipipe pipeline
Abstract
Genomes from bacteria of interest to the food industry exhibit significant functional variability, yet evaluating this characteristic remains challenging. As public repositories accumulate more genomes, large-scale assessment of metabolic potential emerges as a promising way to highlight this functional variability. The primary challenge lies in automating a workflow that constructs metabolic networks from genomes on a massive scale. Here, we present Prolipipe, a pipeline designed for the large-scale assessment of metabolic potential in bacteria, focusing on specific pathways. Given a dataset of hundreds to thousands of bacterial genomes with known taxonomy and a list of targeted pathways, Prolipipe identifies gene functions through a comprehensive annotation step using three different tools, then builds a genome-scale metabolic network for each genome. These networks are parsed to document the presence or absence of each reaction across all processed genomes. The pipeline evaluates the metabolic potential of each genome to carry out each targeted pathway according to its gene content and highlights the best candidates among the set of genomes. In this study, Prolipipe was applied to 1,494 genomes of lactic acid bacteria, assessing the completeness ratio of 761 pathways. We classified pathways according to their maximum completeness ratio, revealing that 137 pathways can be operated by at least one strain in our dataset. By mapping the identifiers of these pathways onto the pathway ontology graph of the MetaCyc database, we highlighted that none of the pathways within four functional classes of MetaCyc can be entirely recovered in the strain dataset. We then investigated infraspecific variability, a strong indicator of functional variability, and compared the species in our genome dataset based on their tendency to exhibit infraspecific variability. This analysis revealed the potential of species for strain-dependency, in which phenotypes differ among strains of the same species.
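To make the scoring step concrete, the following minimal Python sketch computes a pathway completeness ratio from reaction presence/absence data. It is an illustration under stated assumptions, not Prolipipe's implementation: the abstract does not define the ratio, so it is taken here to be the fraction of a pathway's reactions found in a genome's network. The max-minus-min spread used as a proxy for infraspecific variability is likewise an assumption. All function names, pathway and reaction identifiers, and strain and species labels are hypothetical.

```python
from collections import defaultdict


def pathway_completeness(pathway_reactions, genome_reactions):
    """Fraction of a pathway's reactions present in one genome (assumed definition)."""
    if not pathway_reactions:
        return 0.0
    return len(pathway_reactions & genome_reactions) / len(pathway_reactions)


def completeness_table(pathways, genomes):
    """Map each genome to its completeness ratio for every targeted pathway."""
    return {
        genome: {
            pathway: pathway_completeness(reactions, present)
            for pathway, reactions in pathways.items()
        }
        for genome, present in genomes.items()
    }


def max_completeness(table):
    """Best completeness ratio reached by any genome, per pathway."""
    best = defaultdict(float)
    for scores in table.values():
        for pathway, score in scores.items():
            best[pathway] = max(best[pathway], score)
    return dict(best)


def infraspecific_spread(table, species_of):
    """Per species and pathway, max minus min completeness among its strains.

    This spread is an illustrative proxy for infraspecific variability,
    not the metric used in the study.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for genome, scores in table.items():
        for pathway, score in scores.items():
            grouped[species_of[genome]][pathway].append(score)
    return {
        species: {pathway: max(vals) - min(vals) for pathway, vals in by_pathway.items()}
        for species, by_pathway in grouped.items()
    }


# Hypothetical toy data: two pathways, three strains, two species.
pathways = {"PWY-A": {"R1", "R2", "R3", "R4"}, "PWY-B": {"R5", "R6"}}
genomes = {
    "strain_1": {"R1", "R2", "R3", "R4", "R5"},
    "strain_2": {"R1", "R2"},
    "strain_3": {"R5", "R6"},
}
species_of = {"strain_1": "sp_X", "strain_2": "sp_X", "strain_3": "sp_Y"}

table = completeness_table(pathways, genomes)
best = max_completeness(table)
spread = infraspecific_spread(table, species_of)
```

In this toy example, `strain_1` and `strain_2` belong to the same hypothetical species but differ in their completeness for `PWY-A`, which is the kind of strain-level difference the analysis looks for.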