The evolution of logic circuits for the purpose of protein contact map prediction
- Subject Areas
- Bioinformatics, Computational Biology
- Keywords
- protein contact map prediction, evolutionary computation, markov networks, machine learning, feature selection
- Copyright
- © 2016 Chapman et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. The evolution of logic circuits for the purpose of protein contact map prediction. PeerJ Preprints 4:e2197v1 https://doi.org/10.7287/peerj.preprints.2197v1
Abstract
Predicting protein structure from sequence remains a major open problem in protein biochemistry. One component of predicting complete structures is the prediction of inter-residue contact patterns (contact maps). Here, we discuss protein contact map prediction by machine learning. We describe a novel method for contact map prediction that uses the evolution of logic circuits. These logic circuits operate on feature data and output whether or not two amino acids in a protein are in contact. We show that such a method is feasible, and that evolution allows the logic circuits to be trained on the dataset in an unbiased manner, so that they can be used both for contact map prediction and for selecting the relevant features in a dataset.
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Results for the split of 10 with 10 bits per feature (6880 total bits)
The highest Fmax at 60 committee members is at 75k, with an Fmax of 0.098.
Results for the split of 16 with 16 bits per feature (11008 total bits)
The highest Fmax at 60 committee members is at 75k, with an Fmax of 0.097.
Results for the split of 4 with 4 bits per feature (2752 total bits)
The highest Fmax at 60 committee members is at 75k, with an Fmax of 0.097.
Results for the split of 16 with 4 bits per feature (2752 total bits)
The highest Fmax at 60 committee members is at 75k, with an Fmax of 0.102.
Results for the split of 4 with 2 bits per feature (1376 total bits)
The highest Fmax at 60 committee members is at 75k, with an Fmax of 0.102.
Results at 75k updates for all five split treatments
The highest Fmax is achieved by the split of 16, 4 bits per feature encoding, with an Fmax of 0.103.
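The bit totals quoted for the five treatments are all consistent with a single fixed feature count: dividing each total by its bits per feature gives the same number. The short check below makes that arithmetic explicit. The feature count itself is not stated in the captions; it is inferred here from the quoted totals, and the function name is illustrative only.

```python
# Treatments as listed in the captions: (split, bits per feature, total bits).
TREATMENTS = [
    (10, 10, 6880),
    (16, 16, 11008),
    (4, 4, 2752),
    (16, 4, 2752),
    (4, 2, 1376),
]


def implied_feature_count(bits_per_feature, total_bits):
    """Return total_bits / bits_per_feature, requiring an exact division."""
    count, remainder = divmod(total_bits, bits_per_feature)
    if remainder:
        raise ValueError("total bits is not a multiple of bits per feature")
    return count


# Inferred, not stated in the source: every treatment implies the same count.
feature_counts = {implied_feature_count(b, t) for _, b, t in TREATMENTS}
print(feature_counts)  # {688}
```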
Specificity and sensitivity results at 75k updates for the split of 4 with 2 bits per feature treatment
Specificity at 60 committee members was 0.14, and sensitivity was 0.35.
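For readers unfamiliar with these measures, the following is a minimal sketch of how sensitivity, specificity, and an F-measure could be computed from committee votes. It assumes, without confirmation from the captions, that a residue pair is predicted to be in contact when at least a threshold number of committee members vote for contact, and that Fmax is the largest F1 score over all such thresholds. The vote data and function names are hypothetical.

```python
def confusion(votes, labels, threshold):
    """Count TP, FP, TN, FN when 'contact' means votes >= threshold."""
    tp = fp = tn = fn = 0
    for vote_count, in_contact in zip(votes, labels):
        predicted = vote_count >= threshold
        if predicted and in_contact:
            tp += 1
        elif predicted and not in_contact:
            fp += 1
        elif not predicted and in_contact:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, F1) for one confusion matrix."""
    sensitivity = safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)  # true negative rate
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return sensitivity, specificity, f1


def f_max(votes, labels, committee_size):
    """Assumed definition: best F1 over every possible vote threshold."""
    return max(
        metrics(*confusion(votes, labels, t))[2]
        for t in range(1, committee_size + 1)
    )


# Hypothetical toy data: votes from a 5-member committee for 6 residue pairs.
votes = [5, 1, 4, 0, 3, 2]
labels = [True, False, True, False, False, True]
print(f_max(votes, labels, committee_size=5))
```

On real contact maps, non-contacts far outnumber contacts, which is one reason an F-measure is often reported alongside sensitivity and specificity.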
A sample network diagram taken from the treatment of a split of 4 with 2 bits/feature at 75k updates
Out of a possible 1376 bits, the network has evolved to recognize only 118. Input bits are green, gates are red, and outputs are blue. The inputs and gates are unordered. Note that two outputs have evolved to represent a positive contact answer (two is the maximum), whereas only one output has evolved to represent the negative contact answer.
A histogram showing number of features recognized by each number of networks
The encoding used is split of 4 with 2 bits/feature. A network only has to have input from one bit of a feature to recognize it.
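The recognition rule in this caption can be sketched as follows. The sketch assumes that the bits of each feature are stored contiguously, so that bit index b belongs to feature b // 2 under the 2 bits/feature encoding; that layout, the toy networks, and the function names are assumptions for illustration and are not taken from the article.

```python
from collections import Counter

BITS_PER_FEATURE = 2  # split of 4 with 2 bits/feature


def recognized_features(input_bits, bits_per_feature=BITS_PER_FEATURE):
    """A feature is recognized if the network reads at least one of its bits.

    Assumption: bits of one feature are contiguous in the input vector.
    """
    return {bit // bits_per_feature for bit in input_bits}


def recognition_counts(networks, n_features):
    """For each feature, count how many networks recognize it."""
    counts = [0] * n_features
    for input_bits in networks:
        for feature in recognized_features(input_bits):
            counts[feature] += 1
    return counts


def recognition_histogram(counts):
    """Map 'number of networks' to 'number of features recognized by that many'."""
    return Counter(counts)


# Hypothetical toy example: 3 networks, 4 features (8 input bits).
networks = [{0, 1, 4}, {1, 5}, {6}]
counts = recognition_counts(networks, n_features=4)
print(counts)  # [2, 0, 2, 1]
print(sorted(recognition_histogram(counts).items()))  # [(0, 1), (1, 1), (2, 2)]
```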
Number of networks out of the 60 that evolved to recognize each kind of secondary structure along the two size-9 sliding windows
Encoding was split of 4, 2 bits/feature.
Number of networks out of the 60 that evolved to recognize the amino acid pair separation features
Encoding was split of 4, 2 bits/feature. Each tick shown is a different contact separation feature.
Fmax comparison of the runs for all features and the reduced features
Comparison of the Fmax of the original split of 4, 2 bits per feature encoding using all features with that of the same kind of run using a reduced feature set, which contained only the features recognized by at least 6 of the networks from the first run.
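The reduction step described in this caption amounts to a threshold filter on per-feature recognition counts. The sketch below shows one way to express it. The counts are hypothetical, the helper names are illustrative, and the mapping from a feature to its bit indices again assumes a contiguous 2 bits/feature layout, which the captions do not confirm.

```python
def reduced_feature_set(counts, min_networks=6):
    """Keep the indices of features recognized by at least min_networks networks."""
    return [feature for feature, n in enumerate(counts) if n >= min_networks]


def reduced_bit_indices(features, bits_per_feature=2):
    """List the input bits of the kept features (assumed contiguous layout)."""
    return [
        feature * bits_per_feature + offset
        for feature in features
        for offset in range(bits_per_feature)
    ]


# Hypothetical recognition counts for 5 features across a 60-network committee.
example_counts = [12, 0, 6, 5, 60]
kept = reduced_feature_set(example_counts)
print(kept)                       # [0, 2, 4]
print(reduced_bit_indices(kept))  # [0, 1, 4, 5, 8, 9]
```

The threshold is inclusive here, matching the phrase "at least 6" in the caption.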