This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Multilocus sequence typing (MLST) is a standard tool in population genetics and bacterial epidemiology that assesses the genetic variation present in a small number of housekeeping genes (typically seven) distributed along the genome. This methodology assigns arbitrary integer identifiers to the genetic variants found at these loci, allowing bacterial isolates to be compared efficiently using allele-based methods. The increasing availability of whole-genome sequences for hundreds to thousands of strains of the same bacterial species has motivated efforts to increase the resolution of traditional MLST schemes by using larger gene sets or even the core genome (cgMLST). The PubMLST database is the most comprehensive resource of described MLST and cgMLST schemes available for a wide variety of species. Here we present MLSTar, the first R package that allows users to i) connect to the PubMLST database to select a target scheme, ii) screen a desired set of genomes to assign alleles and sequence types, and iii) interact with other widely used R packages to analyze the data and produce graphical representations. We applied MLSTar to a set of 400 Campylobacter coli genomes, showing high accuracy and performance comparable to previously published command-line tools. MLSTar can be freely downloaded from http://github.org/iferres/MLSTar.
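The allele-based logic described above can be illustrated with a minimal Python sketch. It is not MLSTar's implementation or API, and it does not use PubMLST data: the three-locus scheme, the short sequences, the profile table and all function names are hypothetical, chosen only to show how integer allele identifiers lead to a sequence type and to a simple allelic distance between isolates. Real schemes typically use seven loci and much longer sequences.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy scheme: locus -> {allele sequence -> integer allele ID}.
SCHEME: Dict[str, Dict[str, int]] = {
    "locusA": {"ATGAAA": 1, "ATGAAG": 2},
    "locusB": {"ATGCCC": 1, "ATGCCT": 2},
    "locusC": {"ATGGGG": 1},
}
LOCI = list(SCHEME)  # fixed locus order used for every profile

# Hypothetical profile table: allele IDs (in LOCI order) -> sequence type (ST).
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (2, 1, 1): 2,
    (2, 2, 1): 3,
}

Profile = Tuple[Optional[int], ...]


def assign_alleles(isolate: Dict[str, str]) -> Profile:
    """Map each locus sequence to its allele ID; None marks an unknown allele."""
    return tuple(SCHEME[locus].get(isolate.get(locus, "")) for locus in LOCI)


def assign_st(alleles: Profile) -> Optional[int]:
    """Return the ST for a complete, known allelic profile, else None."""
    return PROFILES.get(alleles)


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count the loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


# Example isolates (hypothetical sequences extracted from two genomes).
iso1 = {"locusA": "ATGAAA", "locusB": "ATGCCC", "locusC": "ATGGGG"}
iso2 = {"locusA": "ATGAAG", "locusB": "ATGCCT", "locusC": "ATGGGG"}

p1, p2 = assign_alleles(iso1), assign_alleles(iso2)
print(p1, assign_st(p1))         # (1, 1, 1) 1
print(p2, assign_st(p2))         # (2, 2, 1) 3
print(allelic_distance(p1, p2))  # 2
```

Because isolates are compared through short integer profiles rather than full sequences, the comparison stays cheap as loci are added, which is one reason the same approach can be extended to larger gene sets such as cgMLST.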
This is a preprint submission to PeerJ Preprints. The manuscript has also been submitted for peer review to the PeerJ journal.