Assessing alignment-based taxonomic classification of ancient microbial DNA

Australian Centre for Ancient DNA, University of Adelaide, Adelaide, South Australia, Australia
Centre of Excellence for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, South Australia, Australia
DOI
10.7287/peerj.preprints.27166v1
Subject Areas
Bioinformatics, Evolutionary Studies, Microbiology
Keywords
Microbiome, Paleomicrobiology, Ancient DNA, Bioinformatics, Alignment, Taxonomic classification, Shotgun metagenomics, Microbiology
Copyright
© 2018 Eisenhofer et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Eisenhofer R, Weyrich LS. 2018. Assessing alignment-based taxonomic classification of ancient microbial DNA. PeerJ Preprints 6:e27166v1

Abstract

The field of paleomicrobiology, the study of ancient microorganisms, is growing rapidly owing to recent methodological and technological advances. It is now possible to obtain vast quantities of DNA sequence data from ancient specimens in a high-throughput manner and to use this information to investigate the dynamics and evolution of past microbial communities. However, we still know very little about how the characteristics of ancient DNA influence our ability to accurately assign microbial taxonomies (i.e., identify species) within ancient metagenomic samples. Here, we use both simulated and published metagenomic data sets to investigate how ancient DNA characteristics affect alignment-based taxonomic classification. We find that nucleotide-to-nucleotide, rather than nucleotide-to-protein, alignments are preferable when assigning taxonomies to the short DNA fragments (<60 bp) routinely found in ancient specimens. We also find that deamination (a form of ancient DNA damage) and random sequence substitutions corresponding to ~100,000 years of genomic divergence have minimal impact on alignment-based classification. Testing four different reference databases, we show that database choice can significantly bias the results of alignment-based taxonomic classification in ancient metagenomic studies. Finally, we reanalyse previously published ancient dental calculus data, increasing the number of microbial DNA sequences assigned taxonomically by an average of 64.2-fold and identifying microbial species not reported in the original study. Overall, this study improves our understanding of how ancient DNA characteristics influence alignment-based taxonomic classification of ancient microorganisms and provides recommendations for future paleomicrobiological studies.
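
The abstract mentions two read-level perturbations tested in the study: deamination damage and random substitutions standing in for genomic divergence. As a rough illustration only, the Python sketch below shows one simple way such perturbations could be applied to a short read. It is not the simulation pipeline used in the study. The exponential decay of damage away from the read ends, the function names `deaminate` and `diverge`, and the parameters `p_end`, `decay`, and `p_sub` (with their default values) are all hypothetical assumptions.

```python
import math
import random
from typing import Optional

BASES = "ACGT"


def deaminate(read: str, p_end: float = 0.3, decay: float = 0.5,
              rng: Optional[random.Random] = None) -> str:
    """Add hypothetical deamination damage to a read.

    Models a double-stranded library: C->T substitutions near the 5' end and
    G->A substitutions near the 3' end, each with probability
    p_end * exp(-decay * distance_from_that_end). The decay model and the
    default parameter values are illustrative assumptions only.
    """
    rng = rng or random.Random()
    bases = list(read)
    n = len(bases)
    for i, base in enumerate(bases):
        dist_5p = i            # 0 at the first (5') base
        dist_3p = n - 1 - i    # 0 at the last (3') base
        if base == "C" and rng.random() < p_end * math.exp(-decay * dist_5p):
            bases[i] = "T"
        elif base == "G" and rng.random() < p_end * math.exp(-decay * dist_3p):
            bases[i] = "A"
    return "".join(bases)


def diverge(read: str, p_sub: float,
            rng: Optional[random.Random] = None) -> str:
    """Apply uniform random substitutions with per-site probability p_sub.

    Each A/C/G/T base is replaced, with probability p_sub, by one of the
    other three bases chosen uniformly. Non-ACGT characters (e.g. N) are kept.
    """
    rng = rng or random.Random()
    out = []
    for base in read:
        if base in BASES and rng.random() < p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    read = "CCGATCGTACGGATCCGTAGCTAGCATCGG"  # made-up 30 bp fragment
    damaged = deaminate(read, rng=rng)
    diverged = diverge(damaged, p_sub=0.01, rng=rng)  # hypothetical p_sub
    print(read)
    print(damaged)
    print(diverged)
```

Translating a divergence time such as ~100,000 years into `p_sub` would require an assumed substitution rate, which is not specified here. The sketch therefore takes the per-site probability directly.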

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Read length distribution of the simulated metagenome mimicking the fragment length distribution commonly observed in ancient DNA

DOI: 10.7287/peerj.preprints.27166v1/supp-1

Genus-level taxonomic assignments of simulated metagenomes

Taxa coloured black were not used as input for constructing the simulated metagenomes and represent misclassifications.

DOI: 10.7287/peerj.preprints.27166v1/supp-2

Species-level taxonomic assignments of simulated metagenomes

Taxa coloured black were not used as input for constructing the simulated metagenomes and represent misclassifications.

DOI: 10.7287/peerj.preprints.27166v1/supp-3

Influence of heavy deamination on species-level taxonomic assignment using the metagenome with an empirical ancient DNA fragment length distribution

Taxa coloured black were not used as input for constructing the simulated metagenomes and represent misclassifications.

DOI: 10.7287/peerj.preprints.27166v1/supp-4

Influence of deamination on genus-level taxonomic assignment for all read-length metagenomes

Taxa coloured black were not used as input for constructing the simulated metagenomes and represent misclassifications.

DOI: 10.7287/peerj.preprints.27166v1/supp-5

Influence of deamination on species-level taxonomic assignment for all read-length metagenomes

Taxa coloured black were not used as input for constructing the simulated metagenomes and represent misclassifications.

DOI: 10.7287/peerj.preprints.27166v1/supp-6

Influence of divergence and heavy deamination on genus-level taxonomic classification of the metagenome with an empirical ancient DNA fragment length distribution

Taxa coloured black were not used as input for constructing the simulated metagenomes and represent misclassifications.

DOI: 10.7287/peerj.preprints.27166v1/supp-7

Influence of divergence and heavy deamination on species-level taxonomic classification of the metagenome with an empirical ancient DNA fragment length distribution

Taxa coloured black were not used as input for constructing the simulated metagenomes and represent misclassifications.

DOI: 10.7287/peerj.preprints.27166v1/supp-8

Read length distributions of the simulated metagenome, MALTn-genome-aligned reads, and unaligned reads for the 1,000 ky divergence simulation

DOI: 10.7287/peerj.preprints.27166v1/supp-9

Species-level classification of the chimpanzee sample using different reference databases

DOI: 10.7287/peerj.preprints.27166v1/supp-10

Species-level classification of the El Sidrón 1 Neanderthal using different reference databases

DOI: 10.7287/peerj.preprints.27166v1/supp-11

Species-level classification of the modern dental calculus sample using different reference databases

DOI: 10.7287/peerj.preprints.27166v1/supp-12

Species-level classification of the Spy II Neanderthal using different reference databases

DOI: 10.7287/peerj.preprints.27166v1/supp-13

Details and composition of simulated metagenome

Plaque community based on Mark Welch et al. 2016

DOI: 10.7287/peerj.preprints.27166v1/supp-14

Overview and characteristics of simulated metagenomes used in this study

DOI: 10.7287/peerj.preprints.27166v1/supp-15

Taxonomic misclassifications for each MALT database used

DOI: 10.7287/peerj.preprints.27166v1/supp-16

Influence of deamination on the percentage of alignments against the MALTn-genome database

DOI: 10.7287/peerj.preprints.27166v1/supp-17

Influence of deamination on the percentage of alignments against the MALTn-CDS database

DOI: 10.7287/peerj.preprints.27166v1/supp-18

Influence of deamination on the percentage of alignments against the MALTx database

DOI: 10.7287/peerj.preprints.27166v1/supp-19

Species-level classifications unique to each MALT database

DOI: 10.7287/peerj.preprints.27166v1/supp-20

Alignment statistics from reanalysis of previously published dental calculus sample

DOI: 10.7287/peerj.preprints.27166v1/supp-21