Enumerating common molecular substructures

Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Princeton University, Princeton, New Jersey, USA
VU University, Amsterdam, The Netherlands
University of Queensland, Brisbane, Australia
Algorithmic Bioinformatics, Heinrich-Heine Universität, Düsseldorf, Germany
DOI
10.7287/peerj.preprints.3250v1
Subject Areas
Bioinformatics, Computational Science
Keywords
molecular dynamics simulations, subgraph enumeration, molecular graphs
Copyright
© 2017 Engler et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Engler MS, El-Kebir M, Mulder J, Mark AE, Geerke DP, Klau GW. 2017. Enumerating common molecular substructures. PeerJ Preprints 5:e3250v1

Abstract

Finding and enumerating common molecular substructures is an important task in cheminformatics, where small molecules are often modeled as molecular graphs. We introduce the problem of enumerating all maximal k-common molecular fragments of a pair of molecular graphs. A k-common fragment is a common connected induced subgraph that consists of a common core and a common k-neighborhood. It is thus a generalization of the NP-hard task to enumerate all maximal common connected induced subgraphs (MCCIS) of two graphs, which corresponds to the k = 0 case.

We extend the MCCIS enumeration algorithm by Ina Koch and apply algorithm engineering techniques to solve practical instances fast for the general k > 0 case, which is relevant, for example, for automatically generating force field topologies for molecular dynamics (MD) simulations. We find that our methods achieve good performance on a real-world benchmark of all-against-all comparisons of 255 molecules.

Our software is available under the LGPL open source license at https://github.com/enitram/mogli .

Author Comment

This is an article which has been accepted for the GCB 2017 Conference