From trash to treasure: detecting unexpected contamination in unmapped NGS data
Abstract
Standard procedures for NGS data analysis involve a pre-processing step of read quality assessment, followed by alignment of the filtered reads to a reference genome. Typically, the fraction of reads that maps correctly to the reference genome ranges between 70% and 90%, leaving in some cases a considerable fraction of unmapped sequences. Investigating the reasons for this discrepancy may provide relevant information about the source of the so-called unmapped reads. It is not unusual for genetic material from microorganisms to be present in biological samples undergoing sequencing. These exogenous sequences can derive from the microbiome of normal or altered tissues (upstream contamination) or from contamination occurring during sample processing (downstream contamination).
Here we propose DecontaMiner, a tool to unravel the presence of contaminating sequences among the unmapped reads. It uses a subtraction approach in which the sequences are first filtered according to quality parameters and then mapped to ribosomal, mitochondrial and foreign organism databases. The reads that do not map to the human genome are then aligned, through a local alignment algorithm (MegaBlast), to bacterial, fungal and viral genomes. DecontaMiner generates several output files to track all the processed reads and to provide a complete report of their characteristics. The good-quality matches to microorganism genomes are counted and compared among samples. The main novelty of DecontaMiner is its versatility, combined with a complete, easy-to-use, and automatic pipeline.
DecontaMiner has mainly been used to detect contamination in human RNA-seq data, but the pipeline can easily be tailored, through its configuration files and flags, to process DNA-seq data and unmapped data from non-human species.
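The counting step described in the abstract (keeping good-quality matches on microorganism genomes and comparing them among samples) can be illustrated with a minimal Python sketch. It is not DecontaMiner's own code: it assumes the alignments are available in the standard 12-column BLAST tabular format (`-outfmt 6`), and the function names `count_good_matches` and `compare_samples`, as well as the identity and length thresholds, are hypothetical choices made only for illustration.

```python
import csv
import io
from collections import Counter

# Column order of the standard BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def count_good_matches(handle, min_identity=95.0, min_length=50):
    """Count reads per subject sequence, using each read's best passing hit.

    The thresholds are illustrative assumptions, not DecontaMiner defaults.
    """
    best = {}  # read id -> (bit score, subject id) of the best passing hit
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # Discard hits that fail the identity or alignment-length filter.
        if float(row["pident"]) < min_identity:
            continue
        if int(row["length"]) < min_length:
            continue
        score = float(row["bitscore"])
        read = row["qseqid"]
        if read not in best or score > best[read][0]:
            best[read] = (score, row["sseqid"])
    return Counter(subject for _, subject in best.values())


def compare_samples(per_sample_counts):
    """Return {subject: {sample: count}}, with 0 where a sample has no hits."""
    subjects = sorted({s for c in per_sample_counts.values() for s in c})
    return {s: {name: counts.get(s, 0)
                for name, counts in per_sample_counts.items()}
            for s in subjects}


if __name__ == "__main__":
    # Tiny made-up example: two hits for read1, one passing hit for read2.
    demo = ("read1\tbactA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n"
            "read1\tbactB\t96.0\t100\t4\t0\t1\t100\t1\t100\t1e-40\t150\n"
            "read2\tbactA\t97.0\t75\t2\t0\t1\t75\t1\t75\t1e-30\t120\n")
    print(compare_samples({"sample1": count_good_matches(io.StringIO(demo))}))
```

Assigning each read to its single best-scoring hit is one simple way to avoid counting a read several times; other policies, such as keeping all hits or resolving ties by taxonomy, are equally plausible.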
Cite this as
2017. From trash to treasure: detecting unexpected contamination in unmapped NGS data. PeerJ Preprints 5:e3230v1 https://doi.org/10.7287/peerj.preprints.3230v1
Author comment
This is an abstract that has been accepted for the NETTAB 2017 Workshop.
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Ilaria Granata conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Mara Sangiovanni conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Amarinder Singh Thind prepared figures and/or tables, developed the code for the graphic interface.
Mario Rosario Guarracino contributed reagents/materials/analysis tools, reviewed drafts of the paper.
Data Deposition
The following information was supplied regarding data availability:
DecontaMiner code, user guide, and example files are available at
www-labgtp.na.icar.cnr.it/decontaminer/index.php
Funding
This work has been funded by MIUR PON02-00619, Interomics Italian Flagship Project and COFIND INCIPIT Project. Mario Rosario Guarracino's work has been conducted at the National Research University Higher School of Economics (HSE) and has been supported by the RSF grant n. 14-41-00039. The publication costs are funded by MIUR PON02-00619. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.