Prediction of protein function using a deep convolutional neural network ensemble

Center for Visual Computing, CentraleSupélec and GALEN Team, INRIA Saclay, Palaiseau, France
DOI
10.7287/peerj.preprints.2778v1
Subject Areas
Bioinformatics, Computational Biology, Data Mining and Machine Learning
Keywords
enzyme classification, CNN, function prediction, deep learning, structural genomics
Copyright
© 2017 Zacharaki
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Zacharaki EI. 2017. Prediction of protein function using a deep convolutional neural network ensemble. PeerJ Preprints 5:e2778v1

Abstract

Background. The availability of large databases containing high resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction.

Methods. In this work, novel shape features are extracted that represent protein structure as local (per amino acid) distributions of angles and of amino acid distances. Each multi-channel feature map is fed into a deep convolutional neural network (CNN) for function prediction, and the outputs are fused through Support Vector Machines (SVM) or a correlation-based k-nearest neighbor classifier. Two architectures are investigated, employing either one CNN per multi-channel feature set or one CNN per image channel.
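
As an illustration of the fusion step named in the Methods paragraph, the following minimal Python sketch shows one way a correlation-based k-nearest neighbor classifier could combine CNN outputs. It is not the author's implementation: the function names (pearson, fuse_knn), the choice of concatenating per-CNN class-probability vectors, the use of Pearson correlation as the similarity, the value k = 3, and the toy numbers are all assumptions made for the example.

```python
import math
from collections import Counter


def pearson(u, v):
    """Pearson correlation between two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den > 0 else 0.0


def fuse_knn(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors most correlated with the query.

    Each vector is assumed to be the concatenation of the class-probability
    outputs of the individual CNNs (hypothetical layout for this sketch).
    """
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: pearson(pair[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: two CNNs, three classes each, outputs concatenated into 6 values.
train_vectors = [
    [0.8, 0.1, 0.1, 0.7, 0.2, 0.1],
    [0.7, 0.2, 0.1, 0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1, 0.2, 0.7, 0.1],
    [0.2, 0.7, 0.1, 0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8, 0.1, 0.2, 0.7],
]
train_labels = [1, 1, 2, 2, 3]  # made-up class labels

query = [0.6, 0.3, 0.1, 0.75, 0.15, 0.1]
predicted = fuse_knn(train_vectors, train_labels, query, k=3)
print(predicted)
```

Using correlation rather than Euclidean distance makes the vote depend on the shape of the output profile across classes, not on its absolute scale, which is one plausible reason to prefer it when fusing the outputs of several networks.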

Results. Cross-validation experiments on enzymes (n = 44,661) from the PDB database achieved 90.1% correct classification, demonstrating the effectiveness of the proposed method for automatic function annotation of protein structures.

Discussion. The automatic prediction of protein function can provide quick annotations on extensive datasets, opening the path for relevant applications such as pharmacological target identification.

Author Comment

This is a submission to PeerJ Computer Science for review.