Prediction of protein function using a deep convolutional neural network ensemble
A peer-reviewed article of this Preprint also exists.
Abstract
Background. The availability of large databases containing high-resolution three-dimensional (3D) models of proteins together with functional annotation allows advanced supervised machine learning techniques to be applied to automatic protein function prediction.
Methods. In this work, novel shape features are extracted that represent protein structure as local (per amino acid) distributions of angles and of amino acid distances. Each of the resulting multi-channel feature maps is fed into a deep convolutional neural network (CNN) for function prediction, and the outputs are fused through Support Vector Machines (SVM) or a correlation-based k-nearest neighbor classifier. Two architectures are investigated, employing either one CNN per multi-channel feature set or one CNN per image channel.
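As a rough illustration of what a per-amino-acid distance feature could look like, the sketch below builds one normalized distance histogram per amino acid type from residue coordinates. It is a simplified stand-in under stated assumptions, not the feature extraction used in the study: the function name `distance_feature_map`, the use of one 3D coordinate per residue, the bin count, and the distance cutoff are all hypothetical choices made for the example.

```python
import numpy as np


def distance_feature_map(coords, residue_types, n_types=20, n_bins=8, max_dist=40.0):
    """Build a (n_types, n_bins) map of per-amino-acid distance histograms.

    coords        : (N, 3) array, one coordinate per residue (assumed input).
    residue_types : (N,) integer labels in [0, n_types).
    n_bins, max_dist are illustrative assumptions, not values from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    residue_types = np.asarray(residue_types)

    # Pairwise Euclidean distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Distances beyond the cutoff are clipped into the last bin.
    dist = np.clip(dist, 0.0, max_dist)
    edges = np.linspace(0.0, max_dist, n_bins + 1)

    # Exclude each residue's zero distance to itself.
    off_diagonal = ~np.eye(len(coords), dtype=bool)

    fmap = np.zeros((n_types, n_bins))
    for t in range(n_types):
        # Distances from residues of type t to every other residue.
        mask = (residue_types == t)[:, None] & off_diagonal
        hist, _ = np.histogram(dist[mask], bins=edges)
        total = hist.sum()
        if total > 0:
            fmap[t] = hist / total  # normalize to a distribution
    return fmap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_coords = rng.uniform(0.0, 30.0, size=(50, 3))  # synthetic coordinates
    toy_types = rng.integers(0, 20, size=50)           # synthetic residue labels
    print(distance_feature_map(toy_coords, toy_types).shape)
```

Each row of the returned array is a distribution over distance bins for one amino acid type, so the rows could be stacked as channels or image rows before being passed to a CNN. How the study actually arranges its channels is not specified here.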
Results. Cross-validation experiments on enzymes (n = 44,661) from the PDB database achieved 90.1% correct classification, demonstrating the effectiveness of the proposed method for automatic function annotation of protein structures.
Discussion. The automatic prediction of protein function can provide quick annotations on extensive datasets, opening the path to relevant applications such as pharmacological target identification.
Cite this as
2017. Prediction of protein function using a deep convolutional neural network ensemble. PeerJ Preprints 5:e2778v1 https://doi.org/10.7287/peerj.preprints.2778v1
Author comment
This is a submission to PeerJ Computer Science for review.
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Evangelia I Zacharaki conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, performed the computation work, and reviewed drafts of the paper.
Data Deposition
The following information was supplied regarding data availability:
The raw data can be found in the PDB public database (www.rcsb.org/pdb/). The code can be downloaded from https://github.com/ezachar/PeerJ. The code can run either on PDB entries downloaded locally or by retrieving the entries from the PDB at run time.
Funding
This research was partially supported by European Research Council Grant Diocles (ERC-STG-259112). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.