This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Pattern recognition is commonly used to identify an unknown entity by comparison against a set of known objects curated in a database, and finds use in applications such as fingerprint matching and microbial identification. Mass spectrometry is increasingly used to identify microbes in research and clinical settings via species- or strain-specific mass spectrum signatures. Although the existence of unique biomarkers (such as ribosomal proteins) underpins mass spectrometry-based microbial identification, the absence of corresponding genome or proteome information in public databases for a large fraction of extant microbes significantly hampers biomarker (and species) assignment. However, the reproducible generation of species-specific mass spectra across different growth and environmental conditions opens up the possibility of identifying unknown microbes, without knowledge of biomarker identities, by comparing peak positions between mass spectra. Thus, the mass spectrum fingerprinting (pattern recognition) approach circumvents the need for biomarker information: identification rests on aligning as many mass peaks as possible (particularly those of phylogenetic significance) between spectra. In contrast, because gene expression and metabolism vary with environmental and nutritional factors, alignment of peak intensities, though desirable, is not a strict requirement for species annotation. Given the large diversity of biomolecules present in each microbial species, mass spectrometry-based microbial identification is inherently data-intensive and requires statistical tools and computers for implementation. However, relegating algorithmic details to the software backend obscures the approach’s conceptual underpinnings and hinders understanding.
More importantly, mathematics-centric explanations of the conceptual basis of pattern recognition, though useful, are generally less accessible to students than visual illustrations. This short primer describes a simple graphical illustration that explains the conceptual underpinnings of mass spectrum fingerprinting and highlights caveats for avoiding misidentification. It may find use as a supplement in a microbiology or bioinformatics course for introducing the conceptual basis of pattern recognition-based microbial identification by mass spectrometric analysis.
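As a complement to the graphical illustration, the peak-position matching idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the algorithm used by any particular instrument software: the function names, the fixed tolerance of 2 m/z units, the Jaccard-style score, and the peak lists for `species_A` and `species_B` are all hypothetical and chosen only for illustration. Peak intensities are deliberately ignored, in line with the point that intensity alignment is not a strict requirement.

```python
from bisect import bisect_left


def count_matched_peaks(query, reference, tol=2.0):
    """Count query peak positions (m/z) within +/- tol of a reference peak."""
    ref = sorted(reference)
    matched = 0
    for mz in query:
        i = bisect_left(ref, mz)
        # The nearest reference peak sits at index i or i - 1.
        neighbours = ref[max(i - 1, 0):i + 1]
        if any(abs(mz - r) <= tol for r in neighbours):
            matched += 1
    return matched


def similarity(query, reference, tol=2.0):
    """Jaccard-style score: matched peaks over all distinct peaks.

    Simplification: each query peak is tested independently, so two
    query peaks may match the same reference peak.
    """
    matched = count_matched_peaks(query, reference, tol)
    union = len(query) + len(reference) - matched
    return matched / union if union else 0.0


def identify(query, library, tol=2.0):
    """Return the best-scoring library entry and all scores."""
    scores = {name: similarity(query, peaks, tol)
              for name, peaks in library.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical peak lists (m/z), invented for illustration only.
library = {
    "species_A": [4365.0, 5096.0, 6255.0, 7274.0, 9536.0],
    "species_B": [4450.0, 5380.0, 6410.0, 7870.0, 9060.0],
}
query = [4364.2, 5097.1, 6254.5, 7300.0, 9536.8]

best, scores = identify(query, library)
print(best, scores)
```

The best-scoring entry is only a candidate identification. A practical workflow would also need a minimum score threshold (an assumption here, not specified in the text) so that a query from a species absent from the library is reported as unidentified rather than assigned to its nearest neighbour.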
This version updates content, language and logic flow.