This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Chromatograms are difficult to process quickly because many intermediate steps are needed to translate a detected peak into the concentration of a specific compound. The problem is compounded by the different output file formats in which instrument manufacturers present chromatographic data. Converting a detected peak into a concentration requires three steps. First, the compound is identified by its retention time. Next, the corresponding peak area is extracted. Finally, the compound's concentration is calculated from a calibration curve. This work sought to develop MATLAB software that automatically extracts peak areas from chromatographic readouts saved as PDF files and calculates the corresponding concentrations. Because PDF files contain manufacturer-specific formatting, the software can only read and handle PDF files of HPLC readouts generated by Shimadzu's LabSolutions software. For each analyzed sample, the entire content of the PDF file was first read as a character string. Specific delimiters were then used to extract the retention time and detected peak area of each compound. This information was used to identify each target compound of interest, and the extracted peak area was converted to a concentration using a calibration plot. The program generates a database comprising the filename, the raw retention time and peak area data, and the concentration of each target compound in an easy-to-read format. Finally, for ease of access and permanent storage, the program outputs this database as an Excel file saved to the hard drive. An important advantage of the software is that it can process multiple PDF files in a single run, with no upper limit on the number of PDF files (or samples) that can be processed.
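The described software is written in MATLAB, and its code is not reproduced here. The following Python sketch only illustrates the general pipeline: delimiter-based extraction of retention time and peak area, retention-time matching, and inversion of a linear calibration line. It assumes the PDF text has already been extracted to a string. Several details are hypothetical and not taken from LabSolutions or from the authors' program:

- the delimiters "Peak Table" and "Total";
- the row layout (peak number, retention time, area);
- the 0.1 min matching tolerance;
- the linear calibration form;
- the function names `parse_peak_table`, `quantify`, `build_database` and `write_csv`;
- the use of CSV as a stand-in for the Excel export.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Target compound with an assumed linear calibration: area = slope * conc + intercept."""
    name: str
    rt: float          # expected retention time (min)
    slope: float       # peak area per unit concentration
    intercept: float   # peak area at zero concentration


def parse_peak_table(text: str, start: str = "Peak Table", end: str = "Total") -> list[tuple[float, float]]:
    """Return (retention time, area) pairs found between two hypothetical text delimiters."""
    i = text.find(start)
    if i < 0:
        return []
    j = text.find(end, i + len(start))
    if j < 0:
        return []
    peaks = []
    for line in text[i + len(start):j].splitlines():
        fields = line.split()
        # Assumed row layout: peak number, retention time, area, ...
        if len(fields) >= 3 and fields[0].isdigit():
            peaks.append((float(fields[1]), float(fields[2])))
    return peaks


def quantify(peaks, targets, tol=0.1) -> dict[str, Optional[float]]:
    """Match peaks to targets by retention time and invert each calibration line."""
    conc: dict[str, Optional[float]] = {}
    for t in targets:
        areas = [area for rt, area in peaks if abs(rt - t.rt) <= tol]
        # If several peaks fall inside the window, take the largest (an assumption).
        conc[t.name] = (max(areas) - t.intercept) / t.slope if areas else None
    return conc


def build_database(texts: dict[str, str], targets, tol=0.1) -> list[dict]:
    """One record per file: filename, raw peaks, and concentration of each target."""
    rows = []
    for filename, text in texts.items():
        peaks = parse_peak_table(text)
        rows.append({"file": filename, "peaks": peaks, **quantify(peaks, targets, tol)})
    return rows


def write_csv(rows, targets, fh) -> None:
    """Write filename and concentrations as CSV (stand-in for an Excel export)."""
    writer = csv.writer(fh)
    writer.writerow(["file"] + [t.name for t in targets])
    for r in rows:
        writer.writerow([r["file"]] + [r[t.name] for t in targets])


# Made-up example text; not copied from a real LabSolutions report.
sample = """Sample Name: S1
Peak Table
Peak# Ret. Time Area Height
1 2.512 10500 1200
2 4.803 52000 4300
Total 62500 5500
"""
targets = [
    Target("A", 2.50, 1000.0, 500.0),
    Target("B", 4.80, 2000.0, 0.0),
    Target("C", 7.00, 1500.0, 0.0),  # no matching peak, so its concentration is left empty
]

rows = build_database({"S1.pdf": sample}, targets)
out = io.StringIO()
write_csv(rows, targets, out)
print(out.getvalue())
```

Because `build_database` takes a dictionary of filename-to-text entries, any number of files can be processed in one call. This mirrors the batch behaviour described above, though not necessarily the way the MATLAB program achieves it.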
Overall, MATLAB software that automatically extracts peak areas and calculates the concentrations of different compounds would save significant time when handling the large number of PDF files produced in a typical chromatographic run on a Shimadzu HPLC instrument.
This version corrects a software bug and describes a more efficient version of the software.