The evolution of logic circuits for the purpose of protein contact map prediction

Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, USA
Michigan State University, East Lansing, USA
Department of Integrative Biology, The University of Texas at Austin, Austin, USA
DOI
10.7287/peerj.preprints.2197v1
Subject Areas
Bioinformatics, Computational Biology
Keywords
protein contact map prediction, evolutionary computation, Markov networks, machine learning, feature selection
Copyright
© 2016 Chapman et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Chapman SD, Adami C, Wilke CO, KC DB. 2016. The evolution of logic circuits for the purpose of protein contact map prediction. PeerJ Preprints 4:e2197v1

Abstract

Predicting protein structure from sequence remains a major open problem in protein biochemistry. One component of predicting complete structures is the prediction of inter-residue contact patterns (contact maps). Here, we discuss protein contact map prediction by machine learning. We describe a novel method for contact map prediction that uses the evolution of logic circuits. These logic circuits operate on feature data and output whether or not two amino acids in a protein are in contact. We show that such a method is feasible and, in addition, that evolution allows the logic circuits to be trained on the dataset in an unbiased manner, so that they can be used both for contact map prediction and for selecting the relevant features in a dataset.

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Results for the split of 10 with 10 bits per feature (6880 total bits)

The highest Fmax at 60 committee members is at 75k updates, with an Fmax of 0.098.

DOI: 10.7287/peerj.preprints.2197v1/supp-1

Results for the split of 16 with 16 bits per feature (11008 total bits)

The highest Fmax at 60 committee members is at 75k updates, with an Fmax of 0.097.

DOI: 10.7287/peerj.preprints.2197v1/supp-2

Results for the split of 4 with 4 bits per feature (2752 total bits)

The highest Fmax at 60 committee members is at 75k updates, with an Fmax of 0.097.

DOI: 10.7287/peerj.preprints.2197v1/supp-3

Results for the split of 16 with 4 bits per feature (2752 total bits)

The highest Fmax at 60 committee members is at 75k updates, with an Fmax of 0.102.

DOI: 10.7287/peerj.preprints.2197v1/supp-4

Results for the split of 4 with 2 bits per feature (1376 total bits)

The highest Fmax at 60 committee members is at 75k updates, with an Fmax of 0.102.

DOI: 10.7287/peerj.preprints.2197v1/supp-5

Results at 75k updates for all five split treatments

The highest Fmax is achieved by the split of 16, 4 bits per feature encoding, with an Fmax of 0.103.

DOI: 10.7287/peerj.preprints.2197v1/supp-6
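
As a quick consistency check on the five treatments listed above, the total bit count of each one can be divided by its bits per feature. The sketch below does only that arithmetic on the numbers quoted in the captions; the function name is a hypothetical helper, and the resulting feature count is an inference from those numbers rather than a figure stated in the captions.

```python
# (split, bits per feature, total bits) exactly as quoted in the captions
treatments = [
    (10, 10, 6880),
    (16, 16, 11008),
    (4, 4, 2752),
    (16, 4, 2752),
    (4, 2, 1376),
]


def implied_feature_count(bits_per_feature, total_bits):
    """Return total_bits / bits_per_feature, insisting on exact division.

    Hypothetical helper for illustration only.
    """
    count, remainder = divmod(total_bits, bits_per_feature)
    if remainder:
        raise ValueError("total bits is not a multiple of bits per feature")
    return count


# Collect the distinct feature counts implied by the five treatments.
feature_counts = {implied_feature_count(bits, total) for _, bits, total in treatments}
print(feature_counts)
```

All five treatments give the same quotient, 688, which suggests that the treatments differ only in how each feature is encoded and not in which features are supplied.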

Specificity and sensitivity results at 75k updates for the split of 4 with 2 bits per feature treatment

Specificity at 60 committee members was 0.14, and sensitivity was 0.35.

DOI: 10.7287/peerj.preprints.2197v1/supp-7

A sample network diagram taken from the treatment of a split of 4 with 2 bits/feature at 75k updates

Out of a possible 1376 bits, the network has evolved to recognize only 118. Input bits are green, gates are red, and outputs are blue. The inputs and gates are unordered. Note that two outputs (the maximum) have evolved to represent a positive contact answer, whereas only one output has evolved to represent a negative contact answer.

DOI: 10.7287/peerj.preprints.2197v1/supp-8

A histogram showing the number of features recognized by each number of networks

The encoding used is split of 4 with 2 bits/feature. A network only has to have input from one bit of a feature to recognize it.

DOI: 10.7287/peerj.preprints.2197v1/supp-9
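
The recognition rule in the caption above (one connected bit is enough) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the bits of each feature are stored contiguously, so that bit index `b` belongs to feature `b // 2` under the 2 bits/feature encoding, and all function names and the toy networks are hypothetical.

```python
from collections import Counter

BITS_PER_FEATURE = 2  # split of 4, 2 bits/feature treatment


def recognized_features(input_bits, bits_per_feature=BITS_PER_FEATURE):
    """Map the input bit indices a network reads to the features it recognizes.

    Assumption: the bits of one feature are contiguous, so integer division
    of the bit index by bits_per_feature yields the feature index.
    A single connected bit is enough for the feature to count as recognized.
    """
    return {bit // bits_per_feature for bit in input_bits}


def networks_per_feature(networks):
    """Count, for each feature, how many networks recognize it."""
    counts = Counter()
    for input_bits in networks:
        counts.update(recognized_features(input_bits))
    return counts


def histogram(counts):
    """Count how many features are recognized by exactly k networks."""
    return Counter(counts.values())


# Toy data: each set holds the input bit indices one network is wired to.
toy_networks = [{0, 1, 4}, {1, 5}, {4, 9}]
per_feature = networks_per_feature(toy_networks)
print(per_feature)
print(histogram(per_feature))
```

Features that no network reads never appear in `per_feature`; a full histogram over all features would need a zero bin added for them.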

Number of networks out of the 60 that evolved to recognize each kind of secondary structure along the two size-9 sliding windows

Encoding was split of 4, 2 bits/feature.

DOI: 10.7287/peerj.preprints.2197v1/supp-10

Number of networks out of the 60 that evolved to recognize the amino acid pair separation features

Encoding was split of 4, 2 bits/feature. Each tick shown is a different contact separation feature.

DOI: 10.7287/peerj.preprints.2197v1/supp-11

Fmax comparison of the runs for all features and the reduced features

Fmax of the original split-of-4, 2 bits per feature encoding with all features, compared with the same kind of run on the reduced feature set, which used only features recognized by at least 6 of the networks from the first run.

DOI: 10.7287/peerj.preprints.2197v1/supp-12
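
The feature reduction described in the caption above amounts to a threshold on how many networks recognize each feature. The sketch below shows that filtering step alone, under the assumption that per-feature network counts are already available; the function name, the feature names, and the toy counts are all hypothetical and do not come from the article's data.

```python
MIN_NETWORKS = 6  # threshold quoted in the caption above


def select_features(network_counts, min_networks=MIN_NETWORKS):
    """Return the features recognized by at least min_networks networks.

    network_counts maps a feature name to the number of evolved networks
    that recognize it. Hypothetical helper for illustration only.
    """
    return sorted(
        feature
        for feature, n_networks in network_counts.items()
        if n_networks >= min_networks
    )


# Toy counts, invented for this example.
toy_counts = {"feature_a": 12, "feature_b": 5, "feature_c": 6, "feature_d": 0}
reduced = select_features(toy_counts)
print(reduced)
```

The comparison is inclusive, so a feature recognized by exactly 6 networks is kept, matching the "at least 6" wording.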