This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Computational models in biology encode molecular and cell biological processes. Many of them can be represented as biochemical reaction networks. When studying such networks, one is often interested in systems that share similar reactions and mechanisms. Typical goals are to understand the parts of a model, to identify reoccurring patterns, and to find biologically relevant motifs. The large number of models available for such a search, as well as the large size of individual models, calls for automated methods. Specifically, the generic problem of finding patterns in large networks is computationally hard. As a consequence, only partial solutions for a structural analysis of models exist. Here we introduce a tool chain that identifies reoccurring patterns in biochemical reaction networks. We started this work with an evaluation of algorithms for the identification of frequent subgraphs. Then, we created graph representations of existing SBML models and ran the most suitable algorithm on the data. The result was a list of reaction patterns together with statistics about the occurrence of each pattern in the data set. The approach was validated with 575 SBML models from the curated branch of BioModels. We analysed how the resulting patterns conform with expectations from the literature and from previous model statistics. In the future, the identified patterns can serve as a tool to measure the similarity of models.
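To make the idea of a graph representation with per-pattern occurrence statistics more concrete, the following minimal Python sketch may help. It is not the tool chain or the frequent subgraph mining algorithm described above. It assumes each model is already available as a plain dictionary of reactions (no SBML parsing is shown), and it counts a much simpler pattern than a general subgraph: the "shape" of a reaction, meaning its number of reactant and product species. All names (`build_bipartite_graph`, `reaction_shapes`, `pattern_support`, the toy models) are hypothetical.

```python
from collections import Counter


def build_bipartite_graph(reactions):
    """Build a directed bipartite graph from a reaction dictionary.

    `reactions` maps a reaction id to a (reactants, products) pair of lists.
    Assumption: species ids and reaction ids do not collide.
    Note: repeated species (stoichiometry > 1) collapse into a single edge.
    """
    nodes = {}     # node id -> "species" or "reaction"
    edges = set()  # directed edges (source, target)
    for rid, (reactants, products) in reactions.items():
        nodes[rid] = "reaction"
        for s in reactants:
            nodes[s] = "species"
            edges.add((s, rid))   # reactant -> reaction
        for s in products:
            nodes[s] = "species"
            edges.add((rid, s))   # reaction -> product
    return nodes, edges


def reaction_shapes(nodes, edges):
    """Return the set of (n_reactants, n_products) shapes found in one graph."""
    shapes = set()
    for node, kind in nodes.items():
        if kind != "reaction":
            continue
        n_in = sum(1 for _, dst in edges if dst == node)
        n_out = sum(1 for src, _ in edges if src == node)
        shapes.add((n_in, n_out))
    return shapes


def pattern_support(models):
    """Count, for each shape, the number of models containing it at least once."""
    support = Counter()
    for reactions in models.values():
        nodes, edges = build_bipartite_graph(reactions)
        support.update(reaction_shapes(nodes, edges))
    return support


# Hypothetical toy data set with two tiny models.
models = {
    "model_a": {"r1": (["A", "B"], ["C"]), "r2": (["C"], ["D"])},
    "model_b": {"r1": (["X"], ["Y"])},
}
support = pattern_support(models)
for shape, count in sorted(support.items()):
    print(shape, count)
```

Counting each pattern once per model, rather than once per occurrence, mirrors the usual notion of support in frequent subgraph mining over a set of graphs. A real analysis would replace `reaction_shapes` with a subgraph mining step.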
A closer connection to postulated motifs in reaction networks was added.
Identified Patterns for dataset R1
A species is depicted by a yellow ellipse, a reaction by a green square.
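The visual encoding in this legend can be reproduced with a short sketch that emits Graphviz DOT text for a small pattern. This is only an illustration under stated assumptions: the function name `to_dot`, the input format (a dictionary mapping a reaction id to its reactant and product lists), and the example pattern are hypothetical and are not taken from the original figures.

```python
def to_dot(reactions):
    """Render a reaction pattern as Graphviz DOT text.

    Species become yellow ellipses, reactions become green squares,
    following the legend. `reactions` maps a reaction id to a
    (reactants, products) pair of lists (hypothetical input format).
    """
    species = sorted({s for r, p in reactions.values() for s in r + p})
    lines = ["digraph pattern {"]
    for s in species:
        lines.append(f'  "{s}" [shape=ellipse, style=filled, fillcolor=yellow];')
    for rid in sorted(reactions):
        lines.append(f'  "{rid}" [shape=square, style=filled, fillcolor=green];')
    for rid in sorted(reactions):
        reactants, products = reactions[rid]
        for s in reactants:
            lines.append(f'  "{s}" -> "{rid}";')  # reactant feeds the reaction
        for s in products:
            lines.append(f'  "{rid}" -> "{s}";')  # reaction yields the product
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example pattern: A + B -> C
dot = to_dot({"r1": (["A", "B"], ["C"])})
print(dot)
```

The resulting text can be passed to a Graphviz layout program such as `dot` to obtain an image.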