URMAP, an ultra-fast read mapper

Bioinformatics and Genomics

Introduction

Background

Prior work

URMAP algorithm

Performance testing

Urbench performance test

Methods

URMAP index

Hash table

Hash table row

Pins

Singletons

Linked lists

Over-abundant slots

Absent slots

Collisions

Word length

URMAP search algorithm

Query word search order

First pass: brace search

Second pass: low-abundance slot search

Third pass: high-abundance slot search

HSP construction

Gapped alignments

MAPQ calculation

URMAPv algorithm

Tested methods

Urbench benchmark

Reference sequence

Variant genome

Simulated read pairs

Sequencing error

Accuracy metrics

Pairwise method comparison

Speed

Accuracy of MAPQ

Wgsim validation

Variant calling test

Unmappable regions

Results

Speed and mapping accuracy on Urbench

MAPQ accuracy on Urbench

Mapping accuracy on wgsim test

GIAB variant calling test

Unmappable regions

Discussion

GIAB variant call accuracy as a benchmark of mapping accuracy

GIAB bias against regions which are challenging to mappers

Number of aligned reads as a test of mapping accuracy

Conclusions

Supplemental Information

Speed on Urbench

Speed is measured relative to BWA with file i/o overhead minimized.

DOI: 10.7717/peerj.9338/supp-1

Accuracy on Urbench

Accuracy metrics are S (sensitivity) and E (error rate) with MAPQ ≥ 10, expressed as percentages. Superscript is var for the variant genome (NA12878) or ref for the reference (GRCh38), subscript is r for per-read or l for per-locus.

DOI: 10.7717/peerj.9338/supp-2

Mapping accuracy on wgsim test

Accuracy metrics are sensitivity and error rate with MAPQ ≥ 10, expressed as percentages. Species are Drosophila melanogaster (dm), Arabidopsis thaliana (at) and Homo sapiens (hs).

DOI: 10.7717/peerj.9338/supp-3

Number of mapped reads on GIAB test

For each method, the table shows the number of unmapped reads (i.e., reads with no reported alignment), the numbers of reads aligned with MAPQ=0, MAPQ=1 and MAPQ ≥ 10, and the total number of mapped reads (Total Mapped), i.e., aligned reads with MAPQ ≥ 0.

DOI: 10.7717/peerj.9338/supp-4
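
The read categories in the GIAB table caption can be illustrated with a small tally. This is a minimal sketch and not the author's script: it assumes each read is represented by one (FLAG, MAPQ) pair taken from a SAM record, that a read counts as unmapped when the standard SAM unmapped bit (0x4) is set, and that secondary or supplementary records have already been filtered out. The function name tally_mapq and the dictionary keys are hypothetical.

```python
from collections import Counter

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def tally_mapq(records):
    """Count reads per category from (flag, mapq) pairs, one pair per read.

    Assumption: secondary/supplementary records were removed beforehand,
    so each read contributes exactly one pair.
    """
    counts = Counter()
    for flag, mapq in records:
        if flag & UNMAPPED_FLAG:
            counts["unmapped"] += 1
            continue
        # Any aligned read has MAPQ >= 0, so it counts toward the total.
        counts["total_mapped"] += 1
        if mapq == 0:
            counts["mapq_0"] += 1
        if mapq == 1:
            counts["mapq_1"] += 1
        if mapq >= 10:
            counts["mapq_ge_10"] += 1
    return counts


# Toy input: one unmapped read and five aligned reads.
example = [(4, 0), (0, 0), (0, 1), (0, 10), (0, 60), (16, 5)]
result = tally_mapq(example)
```

The MAPQ categories do not partition the mapped reads (a read with MAPQ between 2 and 9 falls only under the total), so the per-category counts need not sum to Total Mapped.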

Additional Information and Declarations

Competing Interests

The author declares that he receives income from the sale of scientific software through his personal web site at https://drive5.com.

Author Contributions

Robert Edgar conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Source code is available at https://github.com/rcedgar/urmap/.

Benchmark data are available at OSF: Edgar, R. (2020). “Urbench”. OSF. Dataset. https://osf.io/th4qv/.

Funding

The author received no funding for this work.
