Greedy motif-based approach to parsing large and diverge coiled-coil proteins into domains

Faculty of Science, Institute for Computational Science, University of Zurich, Zurich, Switzerland
Service and Support for Science IT (S3IT), University of Zurich, Zurich, Switzerland
Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
DOI
10.7287/peerj.preprints.3118v2
Subject Areas
Bioinformatics, Infectious Diseases, Data Science
Keywords
Host-pathogen interaction, Protein domains, Conserved motifs, S. pyogenes
Copyright
© 2017 Khakzad et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Khakzad H, Malmström J, Malmström L. 2017. Greedy motif-based approach to parsing large and diverge coiled-coil proteins into domains. PeerJ Preprints 5:e3118v2

Abstract

Bacterial surfaces are complex, built from membranes, peptidoglycans and, importantly, proteins. These proteins play crucial roles as key regulators of how the bacterium interacts with its environment. A full catalog of the motifs in coiled-coil proteins, together with their relative degree of conservation, is a prerequisite for targeting the protein-protein interactions that bacterial surface proteins make with host proteins. Here, we present a greedy approach that iteratively identifies conserved motifs in large sequence collections, locates all occurrences of these motifs and masks them. The remaining unmasked sequence is subjected to further rounds of motif detection until no more significant motifs can be found or all protein segments have been assigned to a motif. We present the results for the S. pyogenes M protein. Given the speed and flexibility of our approach, we believe it will be useful for analyzing surface proteins of pathogens, as these proteins are under high selective pressure and therefore cannot be analyzed using more traditional approaches such as multiple-sequence alignments. Preliminary data indicate that many of the newly discovered motifs are not always present together with adjacent motifs, suggesting that they might have different and independent functions.
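
The detect-and-mask loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the motif finder or the significance test, so the sketch substitutes a toy stand-in in which a "motif" is simply the exact k-mer found in the most sequences. The function names (best_kmer, greedy_motifs), the parameters k and min_support, and the use of "X" as the mask character are all assumptions made for illustration.

```python
from collections import defaultdict

MASK = "X"  # assumed mask character; masked positions are excluded from later rounds


def best_kmer(seqs, k):
    """Return (kmer, support) for the unmasked k-mer present in the most sequences.

    Toy stand-in for a real motif finder: it only considers exact k-mers.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if MASK not in kmer:
                support[kmer].add(idx)
    if not support:
        return None, 0
    # Highest support first; ties broken alphabetically so results are deterministic.
    kmer = min(support, key=lambda m: (-len(support[m]), m))
    return kmer, len(support[kmer])


def greedy_motifs(seqs, k=4, min_support=2):
    """Greedily pick the best-supported k-mer, mask every occurrence, and repeat.

    Stops when no unmasked k-mer remains or the best one falls below
    min_support (a hypothetical proxy for "no more significant motifs").
    """
    seqs = list(seqs)
    motifs = []
    while True:
        kmer, n = best_kmer(seqs, k)
        if kmer is None or n < min_support:
            break
        motifs.append((kmer, n))
        # Mask all occurrences so the next round only sees unassigned segments.
        seqs = [s.replace(kmer, MASK * k) for s in seqs]
    return motifs, seqs


if __name__ == "__main__":
    motifs, masked = greedy_motifs(["AAKLEEQ", "GGKLEEH", "TTKLEEW"])
    print(motifs)  # [('KLEE', 3)]
    print(masked)  # ['AAXXXXQ', 'GGXXXXH', 'TTXXXXW']
```

In the toy run, the loop ends after one round because no unmasked stretch of length 4 remains, which corresponds to the "all protein segments have been assigned" stopping condition. A realistic pipeline would replace best_kmer with a statistical motif-discovery tool and a proper significance threshold.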

Author Comment

In this version, the conclusion has been changed and the acknowledgements have been modified.

This is an abstract which has been accepted for the NETTAB 2017 Workshop.