This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Classification of antibody Complementarity-Determining Region (CDR) conformations is an important step that drives antibody modelling and engineering, prediction from sequence, directed mutagenesis and induced-fit studies, and allows inferences on sequence-to-structure relations. Most previous studies performed conformational clustering on a reduced set of structures or after applying various structure pre-filtering criteria. In this study, it was judged that clustering every available CDR conformation would produce a complete and redundant repertoire, increase the number of sequence examples and allow better decisions on structure validity in the future. To cope with the potential increase in data noise, a first-level statistical clustering was performed using the Root-Mean-Square Deviation (RMSD) after structure superposition as the distance criterion, coupled with second- and third-level clustering that employed Ramachandran regions for a deeper qualitative classification. The classification of a total of 12712 CDR conformations is thus presented, along with rich annotation and cluster descriptions, and the results are compared to previous major studies. The present repertoire provides an improved picture of the current CDR knowledge base, with a novel nesting of conformational sensitivity and specificity that can serve as a systematic framework for improved prediction from sequence, as well as for a number of future studies that would aid knowledge-based antibody engineering such as humanisation.
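As an illustration of the first-level distance criterion, the following is a minimal sketch of an RMSD computed after optimal rigid-body superposition. It assumes the Kabsch algorithm for the superposition and equal-length coordinate arrays (for example, backbone atoms of two CDRs of the same length); the function name `superposed_rmsd` and the toy coordinates are hypothetical, and the abstract does not state which superposition procedure or atom selection was actually used.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """RMSD between two N x 3 coordinate sets after optimal superposition.

    Assumption: the Kabsch algorithm is used for the superposition; the
    source only states that RMSD after structure superposition is the
    distance criterion.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("both inputs must be N x 3 arrays of equal shape")

    # Remove translation by centring both sets on their centroids.
    a_centred = a - a.mean(axis=0)
    b_centred = b - b.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    covariance = a_centred.T @ b_centred
    u, _, vt = np.linalg.svd(covariance)

    # Guard against an improper rotation (reflection).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    # Rotate set A onto set B and measure the residual deviation.
    a_rotated = a_centred @ rotation.T
    squared = ((a_rotated - b_centred) ** 2).sum()
    return float(np.sqrt(squared / len(a)))


# Toy coordinates (hypothetical): B is A rotated 90 degrees about z
# and then translated, so the superposed RMSD should be close to zero.
toy_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
toy_b = toy_a @ rot_z.T + np.array([1.0, 2.0, 3.0])
rmsd_same_shape = superposed_rmsd(toy_a, toy_b)
```

A pairwise matrix of such values over all CDRs of a given type and length could then serve as input to a distance-based clustering step.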
A paper on methods for predicting CDR conformation from sequence, which uses the present clustering result for training/updating, is conjointly submitted to the PeerJ PrePrints server, titled: "Disjoint combinations profiling (DCP): a new method for the prediction of antibody CDR conformation from sequence." Version 2: added DOI of conjoint paper. Version 3: minor amendments.
Comparison of level-1 conformational clusters obtained in CDR-H3 with North et al., 2011
The cluster medoid/median or representative of each external set was used to identify correspondences. Only level-1 clusters with a correspondence are shown here, in order to keep the table at a readable size (CDR-H3 has 213 level-1 clusters in total). In brackets, next to each correspondence, is the full, level-3 classification in this work of the representative of the external set. The entire correspondence is enclosed in square brackets and set in italics because the CDR-H3 definition used in North et al., 2011, was longer by 2 residues (i.e. 93-102).
Collection of heatmaps for all CDR/length combinations, showing the minimum number of amino acid differences, position-by-position, between any two sequences of different clusters. mSD heatmaps allow a quick visual appreciation of the degree of sequence dissimilarity between clusters.
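The quantity plotted in these heatmaps can be sketched as follows. This is only one reading of the description, in which the value for a pair of clusters is the smallest number of mismatched positions between any sequence of one cluster and any sequence of the other (all sequences having the same length). The function names, cluster labels and sequences below are hypothetical and are not taken from the supplemental data.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(res_a != res_b for res_a, res_b in zip(seq_a, seq_b))


def min_differences_between_clusters(clusters):
    """Map each pair of cluster labels to the minimum number of
    amino acid differences between any two of their sequences.

    `clusters` is a dict: label -> list of same-length sequences.
    """
    result = {}
    for (label_a, seqs_a), (label_b, seqs_b) in combinations(
        sorted(clusters.items()), 2
    ):
        result[(label_a, label_b)] = min(
            count_differences(a, b) for a in seqs_a for b in seqs_b
        )
    return result


# Toy data (hypothetical labels and sequences).
toy_clusters = {
    "A": ["ARDY", "ARDF"],
    "B": ["GRDY", "GKEF"],
    "C": ["ARDY"],
}
toy_matrix = min_differences_between_clusters(toy_clusters)
```

A value of zero for a pair of clusters, as for "A" and "C" in the toy data, means that an identical sequence occurs in both, so sequence alone cannot separate them.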
Detailed membership assignments (sorted by PDB code)
CSV-formatted lists in which every CDR is shown in alphabetical PDB order with all available clustering and data-mined information (cis/trans peptides, structure resolution, crystal space group, sequence, Ramachandran logos, cluster core label).
Detailed membership assignments (sorted by cluster)
CSV-formatted lists in which every CDR is shown in cluster order with all available clustering and data-mined information (cis/trans peptides, structure resolution, crystal space group, sequence, Ramachandran logos, cluster core label).