Amino-acid site variability among natural and designed proteins

Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
BEACON Center for the Study of Evolution in Action, East Lansing, MI, USA
Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
DOI: 10.7287/peerj.preprints.74v1
Subject Areas: Biochemistry, Computational Biology, Computational Science
Keywords: protein design, sequence alignments, relative solvent accessibility, site variability, fixed-backbone design, flexible-backbone design
Copyright: © 2013 Jackson et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Cite this article: Jackson EL, Ollikainen N, Covert III AW, Kortemme T, Wilke CO. 2013. Amino-acid site variability among natural and designed proteins. PeerJ PrePrints 1:e74v1

Abstract

Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate the patterns of sequence variation found in natural proteins. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and flexible-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to show less site variability than is observed in natural proteins, whereas proteins designed with an intermediate amount of backbone flexibility show more realistic site variability. Further, the correlation between solvent exposure and site variability is lower in designed proteins than in natural proteins. This finding suggests that site variability in designed proteins is too uniform across solvent exposure states (i.e., buried residues are too variable or exposed residues are too conserved). When comparing amino-acid frequencies in the designed proteins with those in natural proteins, we find that hydrophobic residues are underrepresented in the core of the designed proteins. From these results we conclude that intermediate backbone flexibility during design yields more accurate designs and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
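
As an illustration of the kind of analysis the abstract describes, the sketch below computes a per-site variability score for an alignment and correlates it with solvent exposure. The abstract does not state which variability measure or correlation statistic was used, so the per-column Shannon entropy and the Pearson coefficient are assumptions made here for illustration only. The function names, the toy alignment, and the relative solvent accessibility (RSA) values are likewise hypothetical and are not data from the study.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in nats) of the residue frequencies in one column.

    Gap characters ('-') are ignored. Entropy is an assumed stand-in for
    'site variability'; the abstract does not name the measure it uses.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return abs(sum((c / n) * math.log(c / n) for c in counts.values()))


def alignment_entropies(alignment):
    """Per-site entropies for a list of equal-length aligned sequences."""
    return [site_entropy(column) for column in zip(*alignment)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical toy alignment (4 sequences, 4 sites) and made-up RSA values,
# ordered from a buried site to an exposed site.
toy_alignment = ["ACDE", "ACEK", "ACDR", "AVEQ"]
toy_rsa = [0.05, 0.10, 0.45, 0.80]

entropies = alignment_entropies(toy_alignment)
r = pearson(toy_rsa, entropies)
print("per-site entropy:", [round(h, 3) for h in entropies])
print("correlation with RSA:", round(r, 3))
```

In this toy example the buried first site is fully conserved and the exposed last site is maximally variable, so the correlation is positive. Running the same calculation on a natural alignment and on a designed alignment of the same structure would allow the two correlations to be compared in the manner the abstract describes.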

Supplemental Information

Supporting Figures

Single PDF file containing Supporting Figures S1-S9.

DOI: 10.7287/peerj.preprints.74v1/supp-1