Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks

Microbiology

Introduction

Materials & Methods

Results

Quality metrics in raw and healed reads

Read healing and quality metrics impacting SNP analysis

Read healing and quality metrics relationships to assemblies

Discussion

Conclusions

Supplemental Information

NCBI SRA identifiers, divided by cluster

DOI: 10.7717/peerj.12446/supp-1

Scripts used to estimate quality metrics and downsample WGS data files

DOI: 10.7717/peerj.12446/supp-2
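Downsampling paired-end data has to keep R1 and R2 mates together. The sketch below is a hypothetical illustration, not the supplied script. It assumes reads are already parsed into (header, sequence, quality) tuples; the names `downsample_pairs` and `fraction_for_coverage`, the fixed seed, and the coverage-based target are all assumptions.

```python
import random
from typing import List, Tuple

# Simplified FASTQ record: (header, sequence, quality).
Record = Tuple[str, str, str]


def fraction_for_coverage(total_bases: int, genome_size: int,
                          target_coverage: float) -> float:
    """Fraction of pairs to keep to reach roughly `target_coverage` (capped at 1)."""
    current_coverage = total_bases / genome_size
    return min(1.0, target_coverage / current_coverage)


def downsample_pairs(r1: List[Record], r2: List[Record], fraction: float,
                     seed: int = 42) -> Tuple[List[Record], List[Record]]:
    """Keep each read pair with probability `fraction`, so mates stay matched."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept_r1: List[Record] = []
    kept_r2: List[Record] = []
    for rec1, rec2 in zip(r1, r2):
        # One random draw per pair decides the fate of both mates.
        if rng.random() < fraction:
            kept_r1.append(rec1)
            kept_r2.append(rec2)
    return kept_r1, kept_r2
```

For example, 500 Mb of reads from a 5 Mb genome downsampled to a hypothetical 40× target gives a keep fraction of 0.4. Drawing one random number per pair, rather than per file, is what keeps mates in sync.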

Scripts used to implement read healing pipelines

All pipelines were tested on isolates from Clusters 1 through 4. Ambiguous nucleotide content was evaluated for all raw and healed reads using the script countReadsWithAmbig.py. Custom scripts are available for download from https://github.com/darlenewagner/NGS_Multi_Heal

DOI: 10.7717/peerj.12446/supp-3
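The idea behind counting reads with ambiguous bases can be sketched as below. This is a hypothetical re-implementation, not countReadsWithAmbig.py itself: it assumes uncompressed four-line FASTQ records, and treating every non-ACGT IUPAC code as ambiguous (while reporting the N percentage separately) is an assumption.

```python
from typing import Iterable, Iterator, Tuple

# Non-ACGT IUPAC codes; counting all of them as "ambiguous" is an assumption.
AMBIGUOUS = set("NRYSWKMBDHV")


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        seq = next(it).strip()
        next(it)  # '+' separator line, not needed
        qual = next(it).strip()
        yield header, seq, qual


def ambiguity_stats(records: Iterable[Tuple[str, str, str]]) -> Tuple[int, int, float]:
    """Return (total reads, reads with >= 1 ambiguous base, percent N bases)."""
    total_reads = reads_with_ambig = n_bases = total_bases = 0
    for _, seq, _ in records:
        seq = seq.upper()
        total_reads += 1
        total_bases += len(seq)
        n_bases += seq.count("N")
        if any(base in AMBIGUOUS for base in seq):
            reads_with_ambig += 1
    pct_n = 100.0 * n_bases / total_bases if total_bases else 0.0
    return total_reads, reads_with_ambig, pct_n
```

For gzipped files, `gzip.open(path, "rt")` yields text lines that can be passed to `read_fastq` directly.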

Chromosomal coordinates masked for prophages prior to Lyve-SET mapping

No coordinate masking was employed for the S. enterica ser. Reading draft assembly, CVM_N17S1020.

DOI: 10.7717/peerj.12446/supp-4
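Masking excludes variant positions that fall inside prophage regions. A minimal sketch of that idea, assuming 1-based inclusive (start, end) intervals on a single chromosome; the function names are hypothetical, and this does not reproduce how Lyve-SET applies masks internally.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); convention assumed


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort and merge overlapping or adjacent masked intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_masked(pos: int, merged: List[Interval], starts: List[int]) -> bool:
    """Binary-search the last interval starting at or before `pos`."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos <= merged[i][1]


def filter_masked_snps(positions: Iterable[int],
                       intervals: Iterable[Interval]) -> List[int]:
    """Drop SNP positions that fall inside any masked (e.g. prophage) interval."""
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    return [p for p in positions if not is_masked(p, merged, starts)]
```

For a multi-contig draft assembly, the same logic would be applied per contig, keyed by contig name.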

Quality metrics and variation across read trimming/healing methods

R1/R2 PHRED quality, median insert lengths, and percentage of Ns in R1 + R2 reads across trimming/healing methods are reported as ranges of values. Dunn's post hoc test p-values are given in parentheses for each healing method’s comparison with raw reads. P-values significant at α = 0.05 are in boldface.

DOI: 10.7717/peerj.12446/supp-5
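Per-read PHRED quality and median insert length can be derived as sketched below. This assumes Phred+33 (Sanger/Illumina 1.8+) quality encoding and SAM TLEN values as input; the function names are hypothetical, and the supplied scripts may compute these metrics differently.

```python
from statistics import mean, median
from typing import Iterable, List

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding; assumed here


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode an ASCII quality string into integer PHRED scores."""
    return [ord(ch) - offset for ch in quality]


def mean_read_quality(quality: str) -> float:
    """Arithmetic mean of the PHRED scores of one read."""
    return mean(phred_scores(quality))


def median_insert_length(tlens: Iterable[int]) -> float:
    """Median insert length from SAM TLEN values.

    Each pair reports +TLEN on one mate and -TLEN on the other, so only
    positive values are kept to count every fragment once.
    """
    positives = [t for t in tlens if t > 0]
    if not positives:
        raise ValueError("no positive template lengths")
    return median(positives)
```

Averaging PHRED scores arithmetically is a common shortcut; it is not the same as averaging per-base error probabilities, which would weight low-quality bases more heavily.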

Variation in Lyve-SET read-mapping quality metrics

Total proportions of reads mapped to reference genomes and proportions of reads mapping with proper pairing. Dunn's post hoc test p-values are given in parentheses for each healing method’s comparison with raw reads. P-values significant at α = 0.05 are in boldface.

DOI: 10.7717/peerj.12446/supp-6
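Both mapping metrics can be read off SAM FLAG bits (0x4 unmapped, 0x2 properly paired). A minimal sketch, assuming SAM text input and counting each read once by its primary alignment; whether the proportions reported from Lyve-SET use the same denominator is an assumption.

```python
from typing import Iterable, Tuple

FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapping_proportions(sam_lines: Iterable[str]) -> Tuple[float, float]:
    """Return (fraction of reads mapped, fraction mapped as proper pairs)."""
    total = mapped = proper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read once, by its primary record
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
            if flag & FLAG_PAIRED and flag & FLAG_PROPER_PAIR:
                proper += 1
    if total == 0:
        return 0.0, 0.0
    return mapped / total, proper / total
```

Skipping secondary (0x100) and supplementary (0x800) records prevents a read with several alignments from being counted more than once.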

Average assembly quality metrics for SKESA

DOI: 10.7717/peerj.12446/supp-7
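The specific assembly metrics are not listed here. Assuming the usual ones (contig count, total length, largest contig, N50, L50), they can be computed from contig lengths as sketched below; the function names and the optional minimum contig length filter are hypothetical.

```python
from typing import Dict, Iterable, List


def contig_lengths_from_fasta(lines: Iterable[str]) -> List[int]:
    """Sum sequence-line lengths for each '>' record in a FASTA file."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)
        elif lengths:
            lengths[-1] += len(line)
    return lengths


def assembly_metrics(contig_lengths: Iterable[int], min_length: int = 0) -> Dict[str, int]:
    """Contig count, total length, largest contig, N50 and L50."""
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    total = sum(lengths)
    n50 = l50 = running = 0
    for i, length in enumerate(lengths, start=1):
        running += length
        if running * 2 >= total:  # first contig reaching half the assembly
            n50, l50 = length, i
            break
    return {"contigs": len(lengths), "total_length": total,
            "largest_contig": lengths[0] if lengths else 0,
            "N50": n50, "L50": l50}
```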

Escherichia coli O26 (cluster 1) with quality metrics

DOI: 10.7717/peerj.12446/supp-8

Salmonella enterica ser. Reading (cluster 2) with quality metrics

DOI: 10.7717/peerj.12446/supp-9

Salmonella enterica ser. Pomona (cluster 3) with quality metrics

DOI: 10.7717/peerj.12446/supp-10

Shigella sonnei (cluster 4) with quality metrics

DOI: 10.7717/peerj.12446/supp-11

Detailed SNP information

DOI: 10.7717/peerj.12446/supp-12

E. coli O26 (Cluster 1) Read Healing and Read Mapping

(A) Total proportions of reads mapped to reference genome AP010953 (Data S1, column M). Kruskal–Wallis p = 6.203 × 10⁻¹² (df = 7); prinseq, prinseq-5pr3pr, prinseq-3pr, and bayesHammer read mappings differ from raw reads at p < 0.05 in pairwise post hoc comparisons (Table S5). (B) Proportions of reads mapping with proper pairing (Data S1, column N) against reference genome AP010953. Kruskal–Wallis p = 2.20 × 10⁻¹⁶ (df = 7); all healed read sets except noNmin100 show improved proper-pair mapping at p < 0.05 in pairwise post hoc comparisons.

DOI: 10.7717/peerj.12446/supp-13
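The Kruskal–Wallis test followed by Dunn's comparisons against raw reads, as reported in these captions, can be sketched in pure Python as below. This is an illustration under stated assumptions, not the authors' analysis code: group names are hypothetical, H and Dunn's z use the standard tie corrections, the chi-square tail uses closed-form series valid for integer df, and the Bonferroni adjustment is an assumption because the correction method is not stated here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _rank_with_ties(values: Sequence[float]) -> Tuple[List[float], float]:
    """Return 1-based average ranks and the tie term sum(t**3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return total
    s = math.sqrt(x)
    total = math.erfc(s / math.sqrt(2))  # 2 * upper normal tail at s
    term = 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * s  # 2 * pdf(s) * s
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def kruskal_dunn(groups: Dict[str, Sequence[float]], control: str):
    """Kruskal-Wallis H test plus Dunn's z-tests of each group against `control`."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n = len(pooled)
    ranks, tie_term = _rank_with_ties(pooled)
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are tied")
    mean_rank: Dict[str, float] = {}
    start = 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    # H = [12 / (N(N+1)) * sum(n_i * mean_rank_i^2) - 3(N+1)] / tie correction
    h = 12 / (n * (n + 1)) * sum(len(groups[g]) * mean_rank[g] ** 2 for g in names)
    h = (h - 3 * (n + 1)) / (1 - tie_term / (n ** 3 - n))
    df = len(names) - 1
    others = [g for g in names if g != control]
    base_var = n * (n + 1) / 12 - tie_term / (12 * (n - 1))
    dunn: Dict[str, Tuple[float, float, float]] = {}
    for g in others:
        se = math.sqrt(base_var * (1 / len(groups[g]) + 1 / len(groups[control])))
        z = (mean_rank[g] - mean_rank[control]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        dunn[g] = (z, p, min(1.0, p * len(others)))  # (z, raw p, Bonferroni p)
    return h, df, chi2_sf(h, df), dunn
```

With raw reads plus seven healing pipelines there are eight groups, so df = k − 1 = 7, matching the captions.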

S. enterica subsp. enterica ser. Reading (Cluster 2) Read Healing and Read Mapping

(A) Total proportions of reads mapped (Data S2, column M) to the assembly of strain CVM_N17S1020. Kruskal–Wallis p = 0.000302 (df = 7); none of the healing pipelines significantly improves total read mapping. (B) Proportions of reads mapping with proper pairing (Data S2, column N) against the assembly of strain CVM_N17S1020. Kruskal–Wallis p = 7.956 × 10⁻¹² (df = 7); all healed read sets except noNmin100 and bayesHammer show improved proper-pair mapping over raw reads at p < 0.05 in pairwise post hoc comparisons.

DOI: 10.7717/peerj.12446/supp-14

S. enterica subsp. enterica ser. Pomona (Cluster 3) Read Healing and Read Mapping

(A) Total proportions of reads mapped (Data S3, column M) to the genome of strain 2012K-0678. Kruskal–Wallis p = 2.20 × 10⁻¹⁶ (df = 7); only prinseq, prinseq-5pr3pr, and prinseq-3pr read mappings differ from raw reads at p < 0.05 in pairwise post hoc comparisons. (B) Proportions of reads mapping with proper pairing (Data S3, column N) against the genome of strain 2012K-0678. Kruskal–Wallis p = 2.20 × 10⁻¹⁶ (df = 7); only prinseq, prinseq-5pr3pr, and prinseq-3pr proper-pairing rates differ from raw reads at p < 0.05 in pairwise post hoc comparisons.

DOI: 10.7717/peerj.12446/supp-15

Shigella sonnei (Cluster 4) Read Healing and Read Mapping

(A) Total proportions of reads mapped (Data S4, column M) to the genome of strain 2015C-3794. Kruskal–Wallis p = 0.3180 (df = 7) indicates that none of the healing pipelines significantly improves total read mapping. (B) Proportions of reads mapping with proper pairing (Data S4, column N) against the genome of strain 2015C-3794. Kruskal–Wallis p = 2.7530 × 10⁻¹³ (df = 7); fastxOnly-3pr, noNmin100-3pr, prinseq-5pr3pr, and prinseq-3pr have proper pairing rates above those of raw reads at p < 0.05 in pairwise post hoc comparisons.

DOI: 10.7717/peerj.12446/supp-16

Read Healing Effects on SNPs Identification

ROC-like plots of unique CFSAN SNPs (estimated false discovery rate; Data S5) versus detected concordant SNPs, or true positive rate (estimated sensitivity; Data S5). (A) E. coli O26 (Cluster 1). (B) S. enterica Reading (Cluster 2). (C) S. enterica Pomona (Cluster 3). (D) Shigella sonnei (Cluster 4).

DOI: 10.7717/peerj.12446/supp-17
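Each point on a ROC-like plot pairs a sensitivity (true positive rate) with a false discovery rate. A minimal sketch using set operations, assuming SNPs are keyed as (contig, position, alt allele) tuples; which call set serves as the reference is an assumption, and the function name is hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def snp_concordance(detected: Iterable[Hashable],
                    reference: Iterable[Hashable]) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate) of `detected` vs `reference`."""
    detected_set, reference_set = set(detected), set(reference)
    tp = len(detected_set & reference_set)   # concordant SNPs
    fp = len(detected_set - reference_set)   # SNPs unique to the detected set
    fn = len(reference_set - detected_set)   # reference SNPs that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sensitivity, fdr
```

Computing this once per healing method, against the same reference set, would give one (FDR, sensitivity) point per method for each cluster's panel.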

Additional Information and Declarations

Competing Interests

Darlene D. Wagner is employed by Eagle Medical Services, LLC.

Author Contributions

Darlene D. Wagner conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Heather A. Carleton and Eija Trees conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Lee S. Katz conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

The sequences are publicly available at NCBI (Table S1).

Data Availability

The following information was supplied regarding data availability:

GitHub: https://github.com/darlenewagner/NGS_Multi_Heal.

Funding

The authors received no funding for this work.