Prediction of protein function using a deep convolutional neural network ensemble

Evangelia I Zacharaki

doi:10.7287/peerj.preprints.2778v1

February 5, 2017

A peer-reviewed article of this Preprint also exists.

Abstract

Background. The availability of large databases containing high-resolution three-dimensional (3D) protein structures together with functional annotations enables the use of advanced supervised machine learning techniques for automatic protein function prediction.

Methods. In this work, novel shape features are extracted that represent protein structure as local (per amino acid) distributions of angles and of amino acid distances. Each of the resulting multi-channel feature maps is input to a deep convolutional neural network (CNN) for function prediction, and the CNN outputs are fused through Support Vector Machines (SVM) or a correlation-based k-nearest neighbor classifier. Two architectures are investigated, employing either one CNN per multi-channel feature set or one CNN per image channel.
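The Methods paragraph combines per-residue distance distributions fed to CNNs with a correlation-based k-nearest neighbor fusion of the CNN outputs. The sketch below illustrates both ideas under stated assumptions; it is not the author's implementation. Using C-alpha atoms, one channel per neighbor amino acid type, the bin count `N_BINS`, the cutoff `MAX_DIST`, and Pearson correlation as the similarity measure are all hypothetical choices, and the function names `distance_feature_map` and `correlation_knn` are illustrative.

```python
import numpy as np

# Hypothetical binning choices; the preprint abstract does not specify them.
N_BINS = 8
MAX_DIST = 40.0  # Angstroms; distances beyond this are ignored


def distance_feature_map(ca_coords, residue_types, n_types=20,
                         n_bins=N_BINS, max_dist=MAX_DIST):
    """Per-residue histograms of C-alpha distances, one channel per
    neighbor residue type. Returns an array of shape (n_types, L, n_bins)."""
    coords = np.asarray(ca_coords, dtype=float)
    types = np.asarray(residue_types, dtype=int)
    n_res = coords.shape[0]
    # Pairwise Euclidean distances between all C-alpha atoms (L x L).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    fmap = np.zeros((n_types, n_res, n_bins))
    for i in range(n_res):
        for t in range(n_types):
            mask = types == t
            mask[i] = False  # exclude the residue itself
            # np.histogram drops values outside [0, max_dist].
            hist, _ = np.histogram(dist[i, mask], bins=edges)
            fmap[t, i] = hist
    return fmap


def correlation_knn(train_scores, train_labels, query_scores, k=3):
    """Label a query by majority vote among the k training samples whose
    score vectors (e.g. CNN outputs) are most Pearson-correlated with it.
    Assumes no score vector is constant (zero variance)."""
    X = np.asarray(train_scores, dtype=float)
    q = np.asarray(query_scores, dtype=float)
    # Centre each vector so the normalised dot product is Pearson's r.
    Xc = X - X.mean(axis=1, keepdims=True)
    qc = q - q.mean()
    corr = (Xc @ qc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(qc))
    nearest = np.argsort(-corr)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]  # ties go to the smallest label
```

With `k = 1` the fusion simply copies the label of the most correlated training sample; a larger `k` trades sensitivity to individual samples for a smoother majority vote.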

Results. Cross-validation experiments on enzymes (n = 44,661) from the Protein Data Bank (PDB) achieved 90.1% correct classification, demonstrating the effectiveness of the proposed method for automatic function annotation of protein structures.

Discussion. Automatic prediction of protein function can provide fast annotation of large datasets, opening the path to applications such as pharmacological target identification.

Cite this as

Zacharaki EI. 2017. Prediction of protein function using a deep convolutional neural network ensemble. PeerJ Preprints 5:e2778v1 https://doi.org/10.7287/peerj.preprints.2778v1

This preprint is not peer-reviewed. You may wish to reference the subsequent peer-reviewed version of this article.

Author comment

This is a submission to PeerJ Computer Science for review.

Additional Information

Competing Interests

The author declares that there are no competing interests.

Author Contributions

Evangelia I Zacharaki conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.

Data Deposition

The following information was supplied regarding data availability:

The raw data can be found in the PDB public database [www.rcsb.org/pdb/]. The code can be downloaded from [https://github.com/ezachar/PeerJ]. The code can run either on locally downloaded PDB entries or by retrieving the entries from the PDB at run time.
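The Data Deposition statement notes that the code can work from locally downloaded PDB entries. As a minimal sketch of that first step, assuming the legacy fixed-column PDB format and that only C-alpha atoms of the 20 standard amino acids are needed, the hypothetical helper `read_ca_atoms` below extracts coordinates and residue-type indices from the text of an entry. It is not the code in the linked repository; the amino-acid index order (`AMINO_ACIDS`), keeping only alternate location "A", and stopping at the first model are assumptions.

```python
# Hypothetical index order: the 20 standard residues sorted by 3-letter code.
AMINO_ACIDS = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY",
               "HIS", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER",
               "THR", "TRP", "TYR", "VAL"]
AA_INDEX = {name: i for i, name in enumerate(AMINO_ACIDS)}


def read_ca_atoms(pdb_text, chain=None):
    """Return (coords, types) for C-alpha atoms in legacy PDB-format text.
    Column slices follow the fixed-width ATOM record layout."""
    coords, types = [], []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model (e.g. of NMR ensembles)
        if not line.startswith("ATOM"):
            continue  # skips HETATM, headers, etc.
        atom_name = line[12:16].strip()   # columns 13-16
        alt_loc = line[16]                # column 17
        res_name = line[17:20].strip()    # columns 18-20
        chain_id = line[21]               # column 22
        if atom_name != "CA" or res_name not in AA_INDEX:
            continue
        if alt_loc not in (" ", "A"):
            continue  # keep one alternate conformation only
        if chain is not None and chain_id != chain:
            continue
        # Orthogonal coordinates in columns 31-38, 39-46, 47-54.
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        coords.append((x, y, z))
        types.append(AA_INDEX[res_name])
    return coords, types
```

For example, `read_ca_atoms(open(path).read(), chain="A")`, with `path` a hypothetical local file, returns lists that a per-residue feature extractor could consume. Entries distributed only in mmCIF format would need a different parser.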

Funding

This research was partially supported by European Research Council Grant Diocles (ERC-STG-259112). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.