The top 100 most important features. Features are ranked by permutation importance; results are shown for the first fold of the five-fold cross-validation (the other folds give very similar results). The four amino acid index accessions that occur (see references 41-43 for details) are (i) The Kerr effect of amino acids in water: The Kerr-constant increments [KHAG800101], (ii) Characterization of multiple bends in proteins: Normalized relative frequency of double bend [ISOY800107], (iii) Shape and surface features of globular proteins: Correlation coefficient in regression analysis [PRAM820103] and (iv) Protein secondary structure: Normalized frequency of beta-sheet in alpha+beta class [PALJ810111]. Furthermore, every k-mer feature is listed together with its reverse complement, which the feature also takes into account.
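To illustrate what it can mean for a k-mer feature to take its reverse complement into account, the following minimal sketch counts each k-mer together with its reverse complement under a single key. This is an assumption about one common way of doing so, not a description of the implementation used in this study; the function names (`reverse_complement`, `canonical`, `canonical_kmer_counts`) and the choice of the lexicographically smaller string as the shared key are hypothetical.

```python
from collections import Counter

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement.

    Assumption: the lexicographically smaller of the two is used as the key.
    """
    return min(kmer, reverse_complement(kmer))


def canonical_kmer_counts(seq: str, k: int) -> Counter:
    """Count k-mers in seq, merging each k-mer with its reverse complement."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    # AC and GT are reverse complements of each other and share one key.
    print(canonical_kmer_counts("ACGTT", 2))
```

With this convention, a feature labelled `AC` stands for occurrences of both `AC` and `GT`, so the feature vector does not depend on which strand of a read was sequenced.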

The authors declare that they have no competing interests.

Carlus Deneke conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper, and implemented the software.

Robert Rentzsch conceived and designed the experiments, contributed reagents/materials/analysis tools, wrote the paper, and reviewed drafts of the paper.

Bernhard Y Renard conceived and designed the experiments, wrote the paper, and reviewed drafts of the paper.

This work was supported by the German Federal Ministry of Health [IIA5-2512-FSB-725 to B.Y.R.]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

