100 most important features

The top 100 most important features. Features are ranked by permutation importance; results are shown for the first fold of the five-fold cross-validation (the other folds gave very similar results). The four amino acid index accessions that occur (see references 41-43 for details) are: (i) The Kerr effect of amino acids in water: The Kerr-constant increments [KHAG800101]; (ii) Characterization of multiple bends in proteins: Normalized relative frequency of double bend [ISOY800107]; (iii) Shape and surface features of globular proteins: Correlation coefficient in regression analysis [PRAM820103]; and (iv) Protein secondary structure: Normalized frequency of beta-sheet in alpha+beta class [PALJ810111]. Furthermore, every k-mer feature listed also includes, and counts, its respective reverse complement.
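
To illustrate what it means for a k-mer feature to include its reverse complement, the following minimal Python sketch counts k-mers so that each k-mer and its reverse complement share one feature. It is not the implementation used for this work: the function names are hypothetical, and the choices to represent each pair by its lexicographically smaller member and to skip windows containing ambiguous bases are assumptions made for this example only.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer_counts(seq, k):
    """Count k-mers, merging each k-mer with its reverse complement.

    Assumption for this sketch: the lexicographically smaller of the
    two strings is used as the shared feature name.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows with ambiguous bases such as N (assumed policy).
        if set(kmer) - set("ACGT"):
            continue
        counts[min(kmer, reverse_complement(kmer))] += 1
    return counts
```

For example, in the sequence ACGT with k = 2, the k-mers AC and GT are reverse complements of each other and are counted under the single feature AC, while CG is its own reverse complement.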

DOI: 10.7287/peerj.preprints.2379v1/supp-1

The authors declare that they have no competing interests.

Author Contributions

Carlus Deneke conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper, implemented the software.

Robert Rentzsch conceived and designed the experiments, contributed reagents/materials/analysis tools, wrote the paper, reviewed drafts of the paper.

Bernhard Y Renard conceived and designed the experiments, wrote the paper, reviewed drafts of the paper.

Database: GitHub

name 1: paprbag

url 1:

name 2: data4paprbag

url 2:


This work was supported by the German Federal Ministry of Health [IIA5-2512-FSB-725 to B.Y.R.]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

