This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Horsch S, Kopczynski D, Baumbach JI, Rahnenführer J, Rahmann S. 2015. From raw ion mobility measurements to disease classification: a comparison of analysis processes. PeerJ PrePrints 3:e1294v1 https://doi.org/10.7287/peerj.preprints.1294v1
Ion mobility spectrometry (IMS) is a technology for detecting volatile compounds in exhaled breath, and it is increasingly used in medical applications. One major goal is to classify patients into disease groups (for example, diseased versus healthy) from simple breath samples. Raw IMS measurements are data matrices in which the peak regions representing the compounds have to be identified and quantified. A typical analysis process consists of pre-processing and peak detection in single experiments, peak clustering to obtain consensus peaks across several experiments, and classification of samples based on the resulting multivariate peak intensities. Recently, several automated algorithms for peak detection and peak clustering have been introduced to overcome the current reliance on human-based analysis, which is slow, subjective and sometimes not reproducible. We present an unbiased comparison of many combinations of peak processing and multivariate classification algorithms on a disease dataset. The classification accuracy depends on the specific combination of algorithms used for the different analysis steps, with the encouraging result that certain fully automated combinations perform even better than current manual approaches.
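How the steps of this analysis process feed into one another can be illustrated with a deliberately simplified Python sketch. It does not reproduce any of the algorithms compared in the article. The local-maximum peak detector, the greedy clustering with a fixed tolerance `tol`, and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity. Pre-processing is omitted, grid indices stand in for the real measurement coordinates, and the toy matrices are invented data rather than IMS measurements.

```python
import math
from collections import defaultdict

# Hypothetical, simplified stand-ins for the analysis steps named above.
# Grid indices play the role of real IMS coordinates; not the authors' code.


def detect_peaks(matrix, threshold):
    """Peak detection: cells above `threshold` that are >= all 8 neighbours."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            v = matrix[r][c]
            if v <= threshold:
                continue
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v >= nb for nb in neighbours):
                peaks.append((r, c, v))
    return peaks


def near(r, c, centre, tol):
    """Chebyshev distance test on grid indices."""
    return max(abs(r - centre[0]), abs(c - centre[1])) <= tol


def cluster_peaks(peak_lists, tol):
    """Peak clustering: open a consensus peak for every peak that is not
    within `tol` of an existing one (centres are never re-estimated)."""
    centres = []
    for peaks in peak_lists:
        for r, c, _ in peaks:
            if not any(near(r, c, ctr, tol) for ctr in centres):
                centres.append((r, c))
    return centres


def peak_features(peaks, centres, tol):
    """One intensity per consensus peak: the largest matching peak, else 0."""
    row = [0.0] * len(centres)
    for r, c, v in peaks:
        for i, ctr in enumerate(centres):
            if near(r, c, ctr, tol):
                row[i] = max(row[i], v)
                break  # assign each peak to the first matching centre only
    return row


def train_nearest_centroid(features, labels):
    """Classification: mean feature vector per class (nearest-centroid rule)."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        sums.setdefault(y, [0.0] * len(x))
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def toy_matrix(peak_at, value, shape=(4, 5)):
    """Invented toy measurement: all zeros except a single peak."""
    m = [[0] * shape[1] for _ in range(shape[0])]
    m[peak_at[0]][peak_at[1]] = value
    return m


if __name__ == "__main__":
    samples = [toy_matrix((1, 1), 9), toy_matrix((1, 2), 8),
               toy_matrix((3, 4), 6), toy_matrix((3, 3), 7)]
    labels = ["healthy", "healthy", "diseased", "diseased"]
    peak_lists = [detect_peaks(m, threshold=1) for m in samples]
    centres = cluster_peaks(peak_lists, tol=1)
    X = [peak_features(p, centres, tol=1) for p in peak_lists]
    model = train_nearest_centroid(X, labels)
    new = peak_features(detect_peaks(toy_matrix((1, 1), 7), 1), centres, 1)
    print(centres, predict(model, new))  # [(1, 1), (3, 4)] healthy
```

Because each function only consumes the output of the previous step, any one of them could be replaced by a different algorithm without touching the others. Combinations of exactly this kind are what the comparison evaluates, with real algorithms in place of these stand-ins. `math.dist` requires Python 3.8 or later.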
The last two authors, Jörg Rahnenführer and Sven Rahmann, contributed equally. This work was presented at the German Conference on Bioinformatics 2015.