Possibility of contamination in some of the older pinned specimens in Sirthenea?

I emailed the authors on Apr. 3rd about possible contamination in the molecular dataset of this article. I have not yet received a response, so I am posting my concerns here.

Dear Dr. Chlond,

I am emailing about your article on the phylogeny of Sirthenea published in PeerJ. I have tried PCR on old museum specimens myself, with very limited success, so I was curious about the data associated with this paper. From a preliminary investigation, it seems possible that some of the older specimens (collected before the 1970s) yielded sequences more similar to human DNA than to other confirmed reduviid DNA, mostly in the 3' end of the COI sequence. These specimens all group together in one clade, and the nine oldest specimens all fall within this suspicious clade. The clade is also recovered in the phylogeny in the paper and underlies the dating analysis. Below is a basic neighbor-joining tree from the COI alignment that demonstrates the issue: https://raw.githubusercontent.com/erg55/Various/master/screenshots/SirtheneaCOI.png

To check whether this subset of Peiratinae sequences might simply resemble Homo sapiens DNA by chance, I queried the BLAST nt database with one sequence from the paper, excluding Homo sapiens hits (which otherwise make up almost all of the results). The only remaining hits were other sequences associated with this paper, followed by other primates at decreasing identity scores: https://raw.githubusercontent.com/erg55/Various/master/screenshots/SnitidaBLASTnresultsexcludingHomosapiens.png

The 18S rRNA gene in this dataset may have a similar problem, though probably not human contamination (more likely other reduviid DNA), because the divergence within the short stretch of 18S rRNA is higher than would be expected. From a preliminary look at an alignment of these sequences, several outliers may represent contaminants, but this is harder to determine than for COI. PCR on old specimens with highly fragmented DNA is challenging because the chosen primers can preferentially bind and amplify any similar DNA template that is present, especially if a contaminant has become airborne.

The first (5') part of the COI sequence does not seem to be contaminated by human DNA, so it may genuinely originate from the specimens (perhaps only one of the sequencing primers bound preferentially to the contaminant DNA?). However, this part does not seem to contain any phylogenetically informative mutations, and its variation could represent random error.

Sometimes older pinned specimens are the only material available for reconstructing phylogenies, and I respect the commitment to trying them, because more recently collected specimens are very often impossible to obtain. Indeed, much of the data does not seem to suffer from these problems and is likely to be valuable for reconstructing the correct phylogeny, but only after these likely problematic sequences are removed or at least extensively trimmed.

Many thanks for listening to my concerns! If you don't find these results persuasive, I would be happy to listen to other arguments that might explain these patterns in the sequence data.
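For anyone who wants to try the kind of region-by-region comparison described in the email (5' versus 3' end of COI), here is a minimal sketch. It assumes the query and reference sequences are already aligned to the same length. The reference names, the 20 bp window, and the toy sequences are hypothetical placeholders, not data from the paper. For each window, it reports which reference has the lowest uncorrected p-distance to the query, so a chimeric sequence shows up as a switch in the closest reference partway along the alignment.

```python
import math

# Gap and ambiguity symbols that are skipped when counting differences.
SKIP = set("-N?")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of compared sites that differ.

    Sites where either sequence has a gap or ambiguous base are ignored.
    Returns NaN if no sites can be compared.
    """
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in SKIP or y in SKIP:
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else math.nan


def closest_reference_by_window(query, references, window=20, step=20):
    """For each alignment window, name the reference with the lowest p-distance to the query."""
    for name, seq in references.items():
        if len(seq) != len(query):
            raise ValueError(f"{name} is not aligned to the query length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        dists = {
            name: p_distance(query[start:end], seq[start:end])
            for name, seq in references.items()
        }
        # Drop references with no comparable sites in this window.
        usable = {n: d for n, d in dists.items() if not math.isnan(d)}
        if usable:
            best = min(usable, key=usable.get)
            results.append((start, end, best, usable[best]))
        else:
            results.append((start, end, None, math.nan))
    return results


# Hypothetical toy alignment (NOT real COI data): the query copies the
# "reduviid" reference in its first half and the "human" reference in its second.
reduviid_ref = "AAAAACCCCCGGGGGTTTTT" + "ACGTACGTACGTACGTACGT"
human_ref = "TTTTTGGGGGCCCCCAAAAA" + "TGCATGCATGCATGCATGCA"
query = reduviid_ref[:20] + human_ref[20:]
refs = {"reduviid_ref": reduviid_ref, "human_ref": human_ref}

if __name__ == "__main__":
    for start, end, best, dist in closest_reference_by_window(query, refs):
        print(f"{start}-{end}: closest to {best} (p = {dist:.2f})")
```

On the toy data this prints `0-20: closest to reduviid_ref (p = 0.00)` and `20-40: closest to human_ref (p = 0.00)`. With real data the distances would not be zero, and window size would need tuning to the alignment length and amount of missing data.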
