Prediction of amyloidogenicity based on the n-gram analysis

Michał Jan Burdukiewicz; Piotr Sobczyk; Stefan Rödiger; Anna Duda-Madej; Paweł Mackiewicz; Małgorzata Kotulska

doi:10.7287/peerj.preprints.2390v1


NOT PEER-REVIEWED

"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.


Highlighted in German Conference on Bioinformatics 2016 Collection


Michał Jan Burdukiewicz¹, Piotr Sobczyk², Stefan Rödiger³, Anna Duda-Madej⁴, Paweł Mackiewicz¹, Małgorzata Kotulska⁵

1 Department of Genomics, University of Wrocław, Wrocław, Poland

2 Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology, Wrocław, Poland

3 Institute of Biotechnology, Brandenburg University of Technology Cottbus-Senftenberg, Senftenberg, Germany

4 Department of Microbiology, Wrocław Medical University, Wrocław, Poland

5 Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland


Published: 2016-08-24
Accepted: 2016-08-24

Subject Areas: Bioinformatics, Computational Biology
Keywords: n-gram, amyloid, random forest, prediction, feature selection

Copyright: © 2016 Burdukiewicz et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Burdukiewicz MJ, Sobczyk P, Rödiger S, Duda-Madej A, Mackiewicz P, Kotulska M. 2016. Prediction of amyloidogenicity based on the n-gram analysis. PeerJ Preprints 4:e2390v1 https://doi.org/10.7287/peerj.preprints.2390v1

Abstract

Amyloids are proteins associated with a number of clinical disorders (e.g., Alzheimer's, Creutzfeldt-Jakob's and Huntington's diseases). Despite their diversity, all amyloid proteins can undergo aggregation initiated by 6- to 15-residue segments called hot spots. To find the patterns defining hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers on data collected in the AmyLoad database. Only the most informative n-grams, selected by our Quick Permutation Test, were considered. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet providing the best performance in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked on an external data set against the most popular tools for the detection of amyloid peptides and obtained the highest values of the performance measures (AUC: 0.90, MCC: 0.63). Our results revealed sequential patterns in amyloids that are strongly correlated with hydrophobicity, a tendency to form β-sheets, and rigidity of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that had already been confirmed experimentally. AmyloGram is available as a web server: www.smorfland.uni.wroc.pl/amylogram/. The code and results are publicly available at: www.github.com/michbur/prediction_amyloidogenicity_ngram/.
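To make the feature construction described in the abstract more concrete, the following is a minimal Python sketch of encoding a peptide with a reduced amino acid alphabet and counting n-grams in the encoded sequence. It is not the AmyloGram implementation: the four-group alphabet `EXAMPLE_GROUPS`, the function names, and the restriction to contiguous n-grams are all assumptions made for illustration, and the alphabet shown is not the one selected in the study.

```python
from collections import Counter

# Hypothetical reduced alphabet with four groups covering the 20 standard
# amino acids. Illustrative only; NOT the alphabet selected for AmyloGram.
EXAMPLE_GROUPS = {
    "a": "GPST",
    "b": "AVLIMFWYC",
    "c": "DEKRH",
    "d": "NQ",
}


def build_encoder(groups):
    """Map each amino acid letter to the label of its group."""
    return {aa: label for label, members in groups.items() for aa in members}


def encode(sequence, encoder):
    """Rewrite a peptide in the reduced alphabet (raises KeyError on unknown letters)."""
    return "".join(encoder[aa] for aa in sequence.upper())


def count_ngrams(encoded, n):
    """Count contiguous n-grams in an already encoded sequence."""
    return Counter(encoded[i:i + n] for i in range(len(encoded) - n + 1))


ENCODER = build_encoder(EXAMPLE_GROUPS)
```

For example, `encode("VQIVYK", ENCODER)` yields `"bdbbbc"` under this hypothetical grouping, and `count_ngrams("bdbbbc", 2)` counts the bigram `"bb"` twice. Such counts (or their presence/absence) could then serve as input features for a classifier such as a random forest, after a feature selection step.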

Author Comment

This article has been accepted for the "GCB 2016 Conference".
