Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer

Emma L. Doughty; Martin J. Sergeant; Ifedayo Adetifa; Martin Antonio; Mark J. Pallen

doi:10.7717/peerj.585

Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer

Emma L. Doughty¹, Martin J. Sergeant¹, Ifedayo Adetifa², Martin Antonio^1,2, Mark J. Pallen ¹

1Microbiology and Infection Unit, Warwick Medical School, University of Warwick, Coventry, United Kingdom

2Medical Research Council Unit, Fajara, The Gambia

DOI: 10.7717/peerj.585

Published: 2014-09-23
Accepted: 2014-08-28
Received: 2014-08-14

Academic Editor: Jonathan Eisen

Subject Areas: Bioinformatics, Genomics, Microbiology, Infectious Diseases, Respiratory Medicine
Keywords: Tuberculosis, Sputum, Diagnosis, Metagenomics, Mycobacterium tuberculosis

Copyright: © 2014 Doughty et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Doughty EL, Sergeant MJ, Adetifa I, Antonio M, Pallen MJ. 2014. Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer. PeerJ 2:e585 https://doi.org/10.7717/peerj.585

The authors have chosen to make the review history of this article public.

Abstract

Tuberculosis remains a major global health problem. Laboratory diagnostic methods that allow effective, early detection of cases are central to management of tuberculosis in the individual patient and in the community. Since the 1880s, laboratory diagnosis of tuberculosis has relied primarily on microscopy and culture. However, microscopy fails to provide species- or lineage-level identification and culture-based workflows for diagnosis of tuberculosis remain complex, expensive, slow, technically demanding and poorly able to handle mixed infections. We therefore explored the potential of shotgun metagenomics, sequencing of DNA from samples without culture or target-specific amplification or capture, to detect and characterise strains from the Mycobacterium tuberculosis complex in smear-positive sputum samples obtained from The Gambia in West Africa. Eight smear- and culture-positive sputum samples were investigated using a differential-lysis protocol followed by a kit-based DNA extraction method, with sequencing performed on a benchtop sequencing instrument, the Illumina MiSeq. The number of sequence reads in each sputum-derived metagenome ranged from 989,442 to 2,818,238. The proportion of reads in each metagenome mapping against the human genome ranged from 20% to 99%. We were able to detect sequences from the M. tuberculosis complex in all eight samples, with coverage of the H37Rv reference genome ranging from 0.002X to 0.7X. By analysing the distribution of large sequence polymorphisms (deletions and the locations of the insertion element IS6110) and single nucleotide polymorphisms (SNPs), we were able to assign seven of eight metagenome-derived genomes to a species and lineage within the M. tuberculosis complex. Two metagenome-derived mycobacterial genomes were assigned to M. africanum, a species largely confined to West Africa; the others that could be assigned belonged to lineages T, H or LAM within the clade of “modern” M. tuberculosis strains. We have provided proof of principle that shotgun metagenomics can be used to detect and characterise M. tuberculosis sequences from sputum samples without culture or target-specific amplification or capture, using an accessible benchtop-sequencing platform, the Illumina MiSeq, and relatively simple DNA extraction, sequencing and bioinformatics protocols. In our hands, sputum metagenomics does not yet deliver sufficient depth of coverage to allow sequence-based sensitivity testing; it remains to be determined whether improvements in DNA extraction protocols alone can deliver this or whether culture, capture or amplification steps will be required. Nonetheless, we can foresee a tipping point when a unified automated metagenomics-based workflow might start to compete with the plethora of methods currently in use in the diagnostic microbiology laboratory.

Introduction

Tuberculosis (TB) is an infection, primarily of the lungs, caused by Mycobacterium tuberculosis and related species within the M. tuberculosis complex. TB remains a major global health problem, second only to HIV/AIDS in terms of global deaths from a single infectious agent—according to estimates from the World Health Organisation (WHO), 8.6 million people developed TB in 2012 and 1.3 million died from the disease, including 320,000 deaths among HIV-positive individuals (WHO, 2013).

Central to management of TB in the individual patient and in the community are laboratory diagnostic methods that allow effective, early detection of cases. Since the pioneering work of Koch and Ehrlich in the 1880s, laboratory diagnosis of pulmonary TB has largely relied on acid-fast staining of sputum samples and culture on selective laboratory media for the isolation of mycobacteria (Ehrlich, 1882; Koch, 1882). Microscopy is still generally used as a first-line diagnostic approach and as the only laboratory approach in resource-poor settings (Drobniewski et al., 2012) Smear-positivity is also used as a guide to infectivity and responsiveness to treatment. However, microscopy fails to provide species-level identification of acid-fast bacilli (Maiga et al., 2012). Such identification is important in guiding treatment, because pathogenic mycobacteria from outside the M. tuberculosis complex often fail to respond to conventional anti-TB treatment (Maiga et al., 2012). Furthermore, there are important differences in response to treatment even within the M. tuberculosis complex. M. bovis and M. canettii fail to respond to the first-line anti-tuberculous agent pyrazinamide—as a result, failure to recognise M. bovis as a cause of TB can have fatal consequences (Allix-Beguec et al., 2010). In addition, M. canettii appears to show decreased susceptibility to a promising new anti-TB drug candidate, PA-824 (Feuerriegel et al., 2011; Feuerriegel et al., 2013).

There is also increasing recognition of lineage- or species-specific differences in pathogen biology within the M. tuberculosis complex. M. africanum, which is largely restricted to West Africa, where it causes up to half of human pulmonary TB, is associated with less transmissible and less severe infection than typical strains of the “modern” M. tuberculosis clade (de Jong, Antonio & Gagneux, 2010). Similarly, M. canettii, restricted to the horn of Africa, and M. bovis, both usually a spillover from animals, transmit relatively poorly from human to human (Fabre et al., 2010; Gonzalo-Asensio et al., 2014). By contrast, the Beijing-W lineage of M. tuberculosis sensu stricto, which has spread around the world in recent decades, appears to cause more aggressive disease and is more likely to become drug-resistant (Nicol & Wilkinson, 2008; Borgdorff & van Soolingen, 2013).

Owing to the slow growth rate of the M. tuberculosis complex, traditional culture-based diagnosis of TB typically takes several weeks or even months. Similarly, conventional phenotypic mycobacterial sensitivity testing remains slow and may not be reliable for all classes of anti-tuberculous agent. In recent decades, automated detection of growth in liquid culture, through e.g. the mycobacteria growth indicator tube (MGIT), has led to improvements in the speed and ease of diagnosis, so that diagnosis by culture is now often possible within a fortnight (Pfyffer et al., 1997).

However, in comparison to most other laboratory procedures, culture-based diagnostic workflows for TB remain complex, expensive, slow, technically demanding and require expensive biocontainment facilities. Furthermore, as isolation of mycobacteria in pure culture and sensitivity testing remain onerous, in resource-poor settings these steps are omitted and, even in well-resourced laboratories, typically only one or a few single-colony subcultures are followed up from each sample. This leads to under-recognition of mixed infections, where more than one strain from the M. tuberculosis complex is present or where TB co-occurs with infection by other mycobacteria (Shamputa et al., 2004; Warren et al., 2004; Cohen et al., 2011; Wang et al., 2011). This can lead to difficulties in treatment when strains or species susceptible to conventional anti-tuberculous treatment co-exist with resistant strains or species within the same patient (Hingley-Wilson et al., 2013).

As an alternative to culture and phenotypic sensitivity testing, the WHO has recently recommended a new, rapid, automated, real-time amplification-based TB diagnostic test, the Xpert MTB/RIF assay (WHO, 2011). This system allows simultaneous detection of M. tuberculosis and rifampicin-resistance mutations in a closed system, suitable for use in a simple laboratory setting, while providing a result in less than two hours directly from sputum samples (Helb et al., 2010). However, this approach performs suboptimally on mixed infections, fails to provide the full range of clinically relevant information (e.g., speciation, susceptibility to other agents) and, in sampling only a small fraction of the genome, affords no insight into pathogen biology, evolution, and epidemiology (Zetola et al., 2014).

Epidemiological investigation of clinical isolates from the M. tuberculosis complex plays an important role in the management and control of TB. A range of molecular typing schemes have been developed, including IS6110 fingerprinting, mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) and spoligotyping (Jagielski et al., 2014). These approaches can be valuable in distinguishing relapse from re-infection and in recognising mixed infections within the individual patient, as well as identifying sources of infection, detecting outbreaks and tracking spread of lineages within a community. However, as these approaches usually require isolation of the pathogen in pure culture, clinically relevant typing data is typically not available until 1–2 months after collection of a sputum sample.

Over the past fifteen years, whole-genome sequencing has been applied to a steadily wider range of isolates from M. tuberculosis and related species (Cole et al., 1998; Brosch et al., 2002; Gutierrez, Supply & Brosch, 2009). These efforts have shed light on the evolution and population structure of this group of pathogens, showing that members of the M. tuberculosis complex are reproductively isolated, engaging in almost no horizontal gene transfer and showing a clonal population structure in which lineages diverge through a limited set of genetic changes, including point mutations, deletions, movement of insertion elements and rearrangements within repetitive regions. Whole-genome analyses allow isolates to be assigned to a range of species, global lineages and sub-lineages on the basis of single nucleotide polymorphisms (SNPs) and large sequence polymorphisms (typically deletions, which are often termed “regions of difference” or RDs, and insertion of the transposable element IS6110).

In recent years, the availability of rapid, cheap high-throughput sequencing and, particularly, the arrival of user-friendly benchtop sequencing platforms, such as the Illumina MiSeq (Loman et al., 2012a; Loman et al., 2012b), have led to the widespread use of whole-genome sequencing in TB sensitivity testing and epidemiology, with adoption of whole-genome sequencing for routine use in some TB reference laboratories (Gardy et al., 2011; Koser et al., 2012; Roetzer et al., 2013; Walker et al., 2013; Walker et al., 2014). However, high-throughput sequencing has not yet been used as a diagnostic tool for TB, because it has been assumed that one needs to subject clinical samples to prolonged culture before sufficient mycobacterial DNA can be obtained for whole-genome sequencing and analysis. Some researchers (Koser et al., 2013) have recently challenged this assumption by obtaining mycobacterial genome sequences from DNA extracted directly from a three-day MGIT culture of a sputum sample. However, this begs the questions: why bother with culture; why not obtain mycobacterial genome sequences directly from a sputum sample, without culture?

Shotgun metagenomics—that is the unbiased sequencing en masse of DNA extracted from a sample without target-specific amplification or capture—has provided a powerful assumption-free approach to the recovery of bacterial pathogen genomes from contemporary and historical material (Pallen, 2014). This approach allowed an outbreak strain genome to be reconstructed from stool samples from the 2011 Escherichia coli O104:H4 outbreak and has proven successful in obtaining genome-wide sequence data for Borrelia burgdorferi, M. leprae, M. tuberculosis and Brucella melitensis from long-dead human remains (Keller et al., 2012; Chan et al., 2013; Loman et al., 2013; Schuenemann et al., 2013; Kay et al., 2014). Metagenomics has recently provided clinically useful information in cases of chlamydial pneumonia and neuroleptospirosis (Fischer et al., 2014; Wilson et al., 2014).

Here, we explore the potential of metagenomics in detecting and characterising Mycobacterium tuberculosis and M. africanum strains in smear-positive sputum samples from patients from The Gambia in West Africa.

Materials and Methods

Microbiological analysis and sample selection

Eight smear- and culture-positive sputum samples were selected for metagenomic analysis from specimens collected in May 2014 under the auspices of the Enhanced Case Finding project (http://clinicaltrials.gov/show/NCT01660646). The joint Gambia Government/MRC Ethics Committee approved this investigation under reference SCC 1232 and informed written consent was obtained for all participants. The sputum samples were collected by expectoration into a sterile cup and transported on ice to the TB laboratory at the MRC Gambia unit within 24 h of collection.

Prior to selection for metagenomic investigation, an aliquot of each sample was subjected to microbiological analysis. These specimens were decontaminated by the sodium hydroxide and N-acetyl-l-cysteine (NaOH/NALC) method, with final concentrations of 1% for NaOH, 1.45% sodium citrate and 0.25% for NALC. Sputum smears were prepared by centrifuging 3–10 mL decontaminated sputum and then resuspending pellets in 2 mL buffer. Smears were stained with auramine-O and then examined by fluorescence microscopy. Positive smears were confirmed by Ziehl-Neelsen staining. 20–100 fields were examined at 1000X magnification and smear-positive samples were scored quantitatively as 1 +, 2 + or 3 + (Kent & Kubica, 1985).The presence of M. tuberculosis complex in samples was confirmed by culture in the BACTEC MGIT 960 Mycobacterial Detection System and on slopes of Löwenstein–Jensen medium. Cultured isolates were subjected to spoligotyping as previously described (Kamerbeek et al., 1997; de Jong et al., 2009).

DNA extraction using differential lysis

DNA extraction was performed in the TB laboratory in the MRC Unit in The Gambia. Aliquots of unprocessed sputum were subjected to a differential lysis protocol, modified from a published method for metagenomic analysis of sputum from cystic fibrosis patients (Lim et al., 2013). In this method, human cells are subjected to osmotic lysis and then the liberated human DNA is removed by DNase treatment. To monitor contamination within the laboratory, we processed two negative-control samples containing only sterile water via the same method.

At the start of the differential lysis protocol, a 1 mL aliquot of whole sputum was mixed with 1 mL decongestant solution (0.25 g N-acetyl L-cysteine, 25 mL 2.9% sodium citrate, 25 mL water) until liquefied and incubated for 15 min at room temperature. 48 mL phosphate-buffered solution (pH 7) was added and mixed thoroughly, before centrifugation at 3,220 × g for 20 min. The pellet was resuspended in 10 mL sterile deionised water and incubated at room temperature for 15 min, so that human cells undergo osmotic lysis, while mycobacterial cells remain intact. The centrifugation and resuspension-in-water steps were repeated before a final round of centrifugation. The pellet was then treated with the RNase-Free DNase Set (Qiagen), adding 25 µL DNase I (2.73 Kunitz units per µL), 100 µL RDD buffer and 875 µL sterile water. The sample was then incubated at room temperature for 2 h, with repeated inversion of the tubes. The sample underwent two rounds of centrifugation and resuspension of the pellet in 10 mL TE buffer (0.01 M Tris–HCl, 0.001 M EDTA, pH 8.0). Finally, before DNA extraction began, the sample was centrifuged and the pellet was resuspended in 500 µL TE buffer. On completion of the differential lysis protocol, samples underwent heat treatment at 75 °C for 10 min, followed by DNA extraction using a commercial kit, the NucleoSpin Tissue-Kit (Macherey-Nagel, Duren, Germany), according to the manufacturer’s protocol for hard-to-lyse bacteria.

Library preparation and sequencing

DNA samples were sent to Warwick Medical School, Coventry, UK, where all further laboratory and bioinformatics analyses were performed. The concentration of DNA present in each extract was determined using the Qubit 2.0 fluorometer and Qubit^® dsDNA Assay Kits according to the manufacturer’s protocol (Invitrogen Ltd., Paisley, United Kingdom), using the HS (high-sensitivity) or BR (broad-range) kits, depending on the DNA concentration. There was no detectable DNA in the negative control samples with the HS kit, which is sensitive down to 10 pg/µL. DNA extracts were diluted to 0.2 ng/µL and were then converted into sequencing libraries, using the Illumina Nextera XT sample preparation kit according to the manufacturer’s instructions (Illumina UK, Little Chesterford, United Kingdom). The libraries were sequenced on the Illumina MiSeq at the University of Warwick.

Identification of human and mycobacterial sequences

Sequence reads were mapped against the genome of Mycobacterium tuberculosis H37Rv (GenBank accession numbers AL123456) and the human reference genome hg19 (GenBank Assembly ID: GCA_000001405.1), using Bowtie2 version 2.1.0 (Langmead & Salzberg, 2012), using relaxed and stringent protocols. The relaxed protocol exploited the option --very-sensitive-local. The stringent protocol allowed only limited mismatches (3 per 100 base pairs) and soft clipping of poor quality ends, by exploiting the options --ignore-quals --mp 10,10 --score-min L,0,0.725 --local --ma 1. A custom-built script was used to convert coverage data from the BAM files into a tab-delineated format that was then entered into Microsoft Excel, which was then used to generate coverage plots. Metagenomic sequence reads from this study (excluding those that mapped to the human genome) have been deposited in the European Nucleotide Archive under the following accession numbers: ERS542292, ERS542293, ERS542294, ERS542295, ERS542296, ERS542297, ERS542298, ERS542299.

Species and lineage assignment using low-coverage SNPs

For the phylogenetic analysis using SNPs, we selected representative genomes from each of the species and major lineages within the M. tuberculosis complex that infect humans, drawing on lineage designations reported by PolyTB (Coll et al., 2014). Genome sequences were taken from entries in the short read archive ERP000276 and ERP000124 (http://www.ncbi.nlm.nih.gov/Traces/sra/). We then mapped these genomes against M. tuberculosis H37Rv with Bowtie2 under default settings and then called SNPs using VarScan2 (Koboldt et al., 2012). Any SNPs that fell within a set of previously published repetitive genes were excluded from further analysis (Comas et al., 2010). SNPs were used to construct a tree with RAxML version 7 (Stamatakis, 2014), using default parameters with the GTR-gamma model. Reads from the metagenome from each sample were mapped against the reference strain M. tuberculosis H37Rv using the default settings in Bowtie2 and the majority base called from each SNP position with no quality filtering. If no base was present at the position, a gap was used. The pplacer suite of programs (Matsen, Kodner & Armbrust, 2010) was then used to assign the sequence to a species and lineage on the mycobacterial tree.

Lineage assignment using IS6110-insertion-site profiles

We mapped each metagenome against the sequence of IS6110 (Genbank accession number: AJ242908) using Bowtie’s --local option, which performs a softclipping of the mapped sequences. We then extracted IS6110-flanking sequences by retrieving all sequences >30 bp that had that had been softclipped from the ends of the element. These sequences were then mapped against the H37Rv genome using Bowtie2 and the coordinates of the IS6110 insertion points determined.

Results

Detection of the M. tuberculosis complex in sputum samples

We obtained metagenomic sequences from eight smear- and culture-positive sputum samples. The number of sequence reads in each sputum-derived metagenome ranged from 989,442 to 2,818,238 (Table 1). The proportion of reads from each sample mapping to the human reference genome hg19 varied from 20% to 99%.

Table 1:

Sample characteristics and sequencing results.

Sample	ZN grade	DNA concentration in extract (µg/mL)	Total no. reads	% reads aligning to human genome
K1	3+	27.8	989,442	73.71
K2	3+	2.28	2,170,640	78.46
K3	2+	71	1,617,808	99.3
K4	2+	250	1,204,408	97.22
K5	2+	7.7	1,537,676	74.17
K6	2+	48.8	2,411,708	97.47
K7	1+	25	2,818,238	50.59
K8	1+	0.63	1,851,892	20.29

DOI: 10.7717/peerj.585/table-1

Coverage from reads mapping to the genome of the M. tuberculosis reference strain H37Rv under relaxed settings ranged from 0.009X to 1.3X (Table 2). However, we suspected that many of the matches represented false-positives. To confirm our suspicion, we calculated the average read depth at the positions where reads matched.

Table 2:

Mapping to M. tuberculosis H37Rv reference genome.

Sample	Under relaxed mapping conditions			Under stringent mapping conditions
	Bases aligning to H37Rv	Coverage of H37Rv	Average read depth	Bases aligning to H37Rv	Coverage of H37Rv	Average read depth
K1	410,228	0.093	2.2	141,906	0.032	1.3
K2	5,685,901	1.289	2.3	3,057,187	0.693	1.9
K3	99,643	0.023	1.3	54,413	0.012	1.2
K4	40,019	0.009	1.9	10,840	0.002	1.3
K5	732,623	0.166	2.5	238,451	0.054	1.3
K6	94,023	0.021	2.3	34,704	0.008	1.7
K7	1,366,309	0.310	11.4	50,873	0.012	1.5
K8	1,725,816	0.391	7.7	109,514	0.025	1.3

DOI: 10.7717/peerj.585/table-2

If the matches occurred because of sequence identity with conserved genes from other species, one would expect there to be multiple reads matching each mapped position, whereas for a shotgun library where the coverage is less than 1X, one would expect the average read depth to be around 1. However, as we created our sequence libraries using a paired-end protocol, there will be variable overlap between reads originating from the same DNA fragment, so one would expect the average read depth for a genuine random shotgun under these conditions to sit between 1 and 2. However, when mapping was performed under relaxed conditions, the average read depth was >2 in six of the eight samples and in two cases was >7 (Table 2), indicating a major contribution from spurious matches to conserved genes.

To restrict matches to the H37Rv genome to genuine on-target alignments, we then mapped each metagenome against the reference strain under high-stringency conditions (≤ 3 mismatches per 100 base pairs, with soft clipping of poor quality ends). This led to a decrease in reads mapping to H37Rv in all samples, with coverage of the H37Rv under stringent settings ranging from 0.002X to 0.7X. Nonetheless, we recovered between ∼11,000 and 3 million base pairs of M. tuberculosis sequence from our samples under such stringent conditions (Table 2). The average read depth in the samples fell to between 1.2 and 1.9, consistent with expectations for a random shotgun (Table 2).

Phylogenetic placement of M. tuberculosis strains using SNPs

Conventional phylogenetic methods based on identification of trusted SNPs cannot be applied to the kinds of low-coverage genome sequences we have obtained here. However, the technique of “phylogenetic placement” provides an alternative solution (Matsen, Kodner & Armbrust, 2010; Kay et al., 2014). Here, one draws on a fixed reference tree, computed from high-coverage genomes, and places the unknown query sequence on to the tree using programs such as pplacer (Matsen, Kodner & Armbrust, 2010). To perform phylogenetic placements on our samples, we derived a set of phylogenetically informative SNPs from representatives of the major lineages within the M. tuberculosis complex. We then analysed reads from each of the sputum metagenomes that aligned to equivalent positions in the H37Rv genome.

Using this approach, despite the low coverage, we could confidently assign (with a posterior probability of >0.97), all but one of the metagenome-derived mycobacterial genomes to a species and lineage within the M. tuberculosis complex (Fig. 1). In all these cases, the conclusions from metagenomics matched those from spoligotyping of cultured isolates (Table 3). For two of the samples (K3, K5), the metagenome-derived genome was assigned to M. africanum clade 2, which is consistent with the known high-prevalence of this lineage in The Gambia (de Jong et al., 2010). Five samples were assigned to the Euro-American lineage (also termed Lineage 4), which sits within the clade of modern M. tuberculosis strains and which is known to be highly prevalent in The Gambia (de Jong et al., 2010). Phylogenetic placement allowed three of these samples to be assigned to sub-lineage H, one to the T-clade and one to the LAM clade.

Maximum likelihood tree showing placement of mycobacterial metagenome-derived genomes amongst the major lineages and clades within the M. tuberculosis complex. — Figure 1: Maximum likelihood tree showing placement of mycobacterial metagenome-derived genomes amongst the major lineages and clades within the *M. tuberculosis* complex.
Detection and characterisation of *Mycobacterium tuberculosis* in sputum samples using shotgun metagenomics. Two representatives from each lineage/clades are shown. Tree calculated using RaXML and rooted with *M. canetti* (not shown).

Download full-size image

DOI: 10.7717/peerj.585/fig-1

Table 3:

Species and lineage assignments by phylogenetic placement and spoligotyping.

Sample	Phylogenetic placement by pplacer		Spoligotyping
	Species, lineage, clade	Posterior probability	Lineage	Spoligotype
K1	M. tuberculosis Euro-American / Lineage 4 LAM clade	1	Euro-American	1101111111110111111100001111111100001111011
K2	M. tuberculosis Euro-American / Lineage 4 H clade	1	Euro-American	1111111111111111111111111111110100001111111
K3	M. africanum Lineage 6 M. africanum clade 2	1	West African 2	1111110001111111111000001000011111111101111
K4	M. tuberculosis Euro-American / Lineage 4 H clade	0.99	Euro-American	1111111111111111111111111111110100001111111
K5	M. africanum Lineage 6 M. africanum clade 2	1	West African 2	1111110001111111111111111111111111111101111
K6	Not determined		West African 2	1111110001111111111111111111111111111101111
K7	M. tuberculosis Euro-American / Lineage 4 H clade	0.97	Euro-American	1111111111111111111111111111110100001111111
K8	M. tuberculosis Euro-American / Lineage 4 T clade	1	Euro-American	1111110000000000000000000111111100001111111

DOI: 10.7717/peerj.585/table-3

Species and lineage assignment using IS6110 insertion sites

From four samples, we were able to retrieve information on IS6110 insertion sites (Table 4). In two of the three samples (K2, K4) assigned to the H clade by phylogenetic placement, we discovered IS6110 insertion sites that had previously been reported as specific to the Haarlem or H clade (HSI1, HSI2, HSI3), thereby confirming the SNP-based lineage assignment (Cubillos-Ruiz et al., 2010). In the sample assigned to the LAM clade, we retrieved information on a single IS6110 insertion site, which disrupts the coding sequence Rv3113. This insertion has been reported as specific to the LAM clade (Lanzas et al., 2013), again confirming the SNP-based lineage assignment. In one of the two samples assigned to M. africanum, we retrieved information on a single IS6110 insertion site. However, this insertion appeared to be absent from all other available genome-sequenced strains from the M. tuberculosis complex, so was phylogenetically uninformative.

Table 4:

IS6110 profiles.

Sample	No. reads mapping to IS6110	No. reads spanning IS6110 insertion site	IS6110 insertion site coordinates	Comments
K1	11	1	3480371	Specific to LAM clade
K2	199	22	2610861 (HSI1), 1075947–1075950 (HSI2), 1715974 (HSI3). 212132–212135, 483295–483298, 888787, 1695606, 1986622–1986625, 3120523	HSI1, HSI2, HSI3 specific to H clade:
K3	2	0	Not determined
K4	6	2	2610861–2610864 (HSI1)	HSI1 specific to H clade
K5	4	1	2631765	Unique so uninformative
K6	0	0	Not determined
K7	2	0	Not determined
K8	5	0	Not determined

DOI: 10.7717/peerj.585/table-4

Discussion

Here, we have provided proof of principle that shotgun metagenomics can be used to detect and characterise M. tuberculosis sequences from sputum samples without culture or target-specific amplification or capture, using an accessible benchtop-sequencing platform, the Illumina MiSeq, and relatively simple DNA extraction, sequencing and bioinformatics protocols.

There are several proven or potential advantages to metagenomics as a diagnostic approach for pulmonary TB. By circumventing the need for culture, it could provide information more quickly than conventional approaches. Even in this proof-of-principle study, for most samples it has provided more detailed information than conventional approaches, including spoligotyping. In addition, it represents an open-ended one-size-fits-all approach that could allow the reunification of TB microbiology with other sputum microbiology, particularly as metagenomics has already been shown to work on other respiratory tract pathogens, including bacteria and viruses (Lysholm et al., 2012; Fischer et al., 2014). It also aids in the detection of mixed infections (Chan et al., 2013; Koser et al., 2013), which are clinically important, but hard to recognise (Shamputa et al., 2004; Warren et al., 2004; Cohen et al., 2011; Wang et al., 2011; Hingley-Wilson et al., 2013).

However, as things stand, there are several important limitations to metagenomics as a diagnostic approach. Our study has been limited to the investigation of smear-positive sputum samples, where a diagnosis can already be obtained quickly and easily by microscopy; considerable improvements in sensitivity are likely to be needed before metagenomics can be made to work on smear-negative culture-positive samples. However, it is worth stressing that smear-positive cases are the most important TB cases in terms of infectivity and severity of disease and rapid, accurate diagnosis and epidemiological investigation of such samples is likely to aid TB control (Shaw & Wynn-Williams, 1954; Colebunders & Bastian, 2000; Wang et al., 2008). Plus, for all our samples, metagenomics goes beyond mere detection of acid-fast bacilli to deliver clinically important information at the level of species and lineage within the M. tuberculosis complex.

Surprisingly, metagenomics has not proven quite so informative when applied to contemporary sputum samples as when applied to historical samples, from which we have gained much higher coverage of pathogen genomes, which allowed recognition of phylogenetically informative large sequence polymorphisms (Chan et al., 2013; Kay et al., 2014). Furthermore, in our hands, sputum metagenomics does not yet deliver sufficient depth of coverage of TB genomes to allow the accurate SNP calling necessary for sequence-based sensitivity testing. It remains unclear whether increased depth of coverage can be achieved by refinements in DNA extraction protocols alone—or whether one might need to sacrifice the speed, simplicity and open-endedness of shotgun metagenomics by incorporating amplification of mycobacterial DNA or cells (i.e., by culture in MGIT tubes (Koser et al., 2013)) or by capture of mycobacterial cells or DNA (Sweeney et al., 2006; Bouwman et al., 2012; Schuenemann et al., 2013).

Some have argued that metagenomics is too expensive for routine use (Köser, Ellington & Peacock, 2014). However, the same was true of whole-genome sequencing a few years ago; in this study, reagent costs amounted to <£50 per sample. Plus, with minor modifications, we anticipate that DNA extraction could be completed in a few hours of receipt of a sputum sample and sequencing and analysis within a few days. In addition, now that cultured TB isolates are being routinely genome sequenced in many laboratories (Koser et al., 2012; Kohl et al., 2014), a catalogue of local TB genomes will be available for comparison with the metagenome-derived genomes, facilitating epidemiological analyses

With likely future improvements in the ease, throughput and cost-effectiveness of sequencing, twinned with commoditisation of laboratory and informatics workflows, one can foresee a tipping point when a unified automated metagenomics-based workflow might start to compete with the plethora of methods currently in use in the diagnostic microbiology laboratory, while also delivering additional useful information on epidemiology, antimicrobial resistance and pathogen biology.

Supplemental Information

Detection and characterisation of Mycobacterium tuberculosis in sputum samples using shotgun metagenomics. Detailed phylogenetic placement of metagenome-derived genomes

For each sample, the majority bases at each reference SNP position (or gaps if there was no coverage at that position) were concatenated and the sequence was placed in the reference tree using pplacer (see Methods). The output from pplacer (jplace file) was parsed and a new file produced, such that each alternative placement for a sample could be displayed in the tree along with the posterior probability of that placement. Trees were generated from the modified place file, using guppy from the pplacer suite of programs. Only the top 3 placements for each sample are shown. The combined pp values for each alternative placement of sample in a clade were used to ascertain the likelihood of that sample belonging to that clade.

DOI: 10.7717/peerj.585/supp-1

Download

SNP matrix used to generate tree shown in Fig. 1

DOI: 10.7717/peerj.585/supp-2

Download

Repetitive genes excluded from SNP calling

DOI: 10.7717/peerj.585/supp-3

Download

[1] Allix-Beguec C, Fauville-Dufaux M, Stoffels K, Ommeslag D, Walravens K, Saegerman C, Supply P. 2010. Importance of identifying Mycobacterium bovis as a causative agent of human tuberculosis. European Respiratory Journal 35(3):692-694

[2] Borgdorff MW, Van Soolingen D. 2013. The re-emergence of tuberculosis: what have we learnt from molecular epidemiology? Clinical Microbiology and Infection 19(10):889-901

[3] Bouwman AS, Kennedy SL, Muller R, Stephens RH, Holst M, Caffell AC, Roberts CA, Brown TA. 2012. Genotype of a historic strain of Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences of the United States of America 109(45):18511-18516

[4] Brosch R, Gordon SV, Marmiesse M, Brodin P, Buchrieser C, Eiglmeier K, Garnier T, Gutierrez C, Hewinson G, Kremer K, Parsons LM, Pym AS, Samper S, Van Soolingen D, Cole ST. 2002. A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proceedings of the National Academy of Sciences of the United States of America 99(6):3684-3689

[5] Chan JZ, Sergeant MJ, Lee OY, Minnikin DE, Besra GS, Pap I, Spigelman M, Donoghue HD, Pallen MJ. 2013. Metagenomic analysis of tuberculosis in a mummy. New England Journal of Medicine 369:289-290

[6] Cohen T, Wilson D, Wallengren K, Samuel EY, Murray M. 2011. Mixed-strain Mycobacterium tuberculosis infections among patients dying in a hospital in KwaZulu-Natal, South Africa. Journal of Clinical Microbiology 49(1):385-388

[7] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Krogh A, McLean J, Moule S, Murphy L, Oliver K, Osborne J, Quail MA, Rajandream MA, Rogers J, Rutter S, Seeger K, Skelton J, Squares R, Squares S, Sulston JE, Taylor K, Whitehead S, Barrell BG. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393(6685):537-544

[8] Colebunders R, Bastian I. 2000. A review of the diagnosis and treatment of smear-negative pulmonary tuberculosis. International Journal of Tuberculosis and Lung Disease 4(2):97-107

[9] Coll F, Preston M, Guerra-Assuncao JA, Hill-Cawthorn G, Harris D, Perdigao J, Viveiros M, Portugal I, Drobniewski F, Gagneux S, Glynn JR, Pain A, Parkhill J, McNerney R, Martin N, Clark TG. 2014. PolyTB: a genomic variation map for Mycobacterium tuberculosis. Tuberculosis 94(3):346-354

[10] Comas I, Chakravartti J, Small PM, Galagan J, Niemann S, Kremer K, Ernst JD, Gagneux S. 2010. Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nature Genetics 42(6):498-503

[11] Cubillos-Ruiz A, Sandoval A, Ritacco V, Lopez B, Robledo J, Correa N, Hernandez-Neuta I, Zambrano MM, Del Portillo P. 2010. Genomic signatures of the haarlem lineage of Mycobacterium tuberculosis: implications of strain genetic variation in drug and vaccine development. Journal of Clinical Microbiology 48(10):3614-3623

[12] de Jong BC, Adetifa I, Walther B, Hill PC, Antonio M, Ota M, Adegbola RA. 2010. Differences between tuberculosis cases infected with Mycobacterium africanum, West African type 2, relative to Euro-American Mycobacterium tuberculosis: an update. FEMS Immunology and Medical Microbiology 58(1):102-105

[13] de Jong BC, Antonio M, Awine T, Ogungbemi K, de Jong YP, Gagneux S, DeRiemer K, Zozio T, Rastogi N, Borgdorff M, Hill PC, Adegbola RA. 2009. Use of spoligotyping and large sequence polymorphisms to study the population structure of the Mycobacterium tuberculosis complex in a cohort study of consecutive smear-positive tuberculosis cases in The Gambia. Journal of Clinical Microbiology 47(4):994-1001

[14] de Jong BC, Antonio M, Gagneux S. 2010. Mycobacterium africanum–review of an important cause of human tuberculosis in West Africa. PLoS Neglected Tropical Diseases 4(9):e744

[15] Drobniewski F, Nikolayevskyy V, Balabanova Y, Bang D, Papaventsis D. 2012. Diagnosis of tuberculosis and drug resistance: what can new tools bring us? International Journal of Tuberculosis and Lung Disease 16(7):860-870

[16] Ehrlich P. 1882. Referate aus den Verein fur innere Medicin zu Berlin. Deutsche Medizinische Wochenschrift 9:246-249

[17] Fabre M, Hauck Y, Soler C, Koeck JL, Van Ingen J, Van Soolingen D, Vergnaud G, Pourcel C. 2010. Molecular characteristics of “Mycobacterium canettii” the smooth Mycobacterium tuberculosis bacilli. Infection, Genetics and Evolution 10(8):1165-1173

[18] Feuerriegel S, Koser CU, Bau D, Rusch-Gerdes S, Summers DK, Archer JA, Marti-Renom MA, Niemann S. 2011. Impact of Fgd1 and ddn diversity in Mycobacterium tuberculosis complex on in vitro susceptibility to PA-824. Antimicrobial Agents and Chemotherapy 55(12):5718-5722

[19] Feuerriegel S, Koser CU, Richter E, Niemann S. 2013. Mycobacterium canettii is intrinsically resistant to both pyrazinamide and pyrazinoic acid. Journal of Antimicrobial Chemotherapy 68(6):1439-1440

[20] Fischer N, Rohde H, Indenbirken D, Gunther T, Reumann K, Lutgehetmann M, Meyer T, Kluge S, Aepfelbacher M, Alawi M, Grundhoff A. 2014. Rapid metagenomic diagnostics for suspected outbreak of severe pneumonia. Emerging Infectious Diseases 20(6):1072-1075

[21] Gardy JL, Johnston JC, Ho Sui SJ, Cook VJ, Shah L, Brodkin E, Rempel S, Moore R, Zhao Y, Holt R, Varhol R, Birol I, Lem M, Sharma MK, Elwood K, Jones SJ, Brinkman FS, Brunham RC, Tang P. 2011. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. New England Journal of Medicine 364(8):730-739

[22] Gonzalo-Asensio J, Malaga W, Pawlik A, Astarie-Dequeker C, Passemar C, Moreau F, Laval F, Daffe M, Martin C, Brosch R, Guilhot C. 2014. Evolutionary history of tuberculosis shaped by conserved mutations in the PhoPR virulence regulator. Proceedings of the National Academy of Sciences of the United States of America 111(31):11491-11496

[23] Gutierrez MC, Supply P, Brosch R. 2009. Pathogenomics of mycobacteria. Genome Dynamics 6:198-210

[24] Helb D, Jones M, Story E, Boehme C, Wallace E, Ho K, Kop J, Owens MR, Rodgers R, Banada P, Safi H, Blakemore R, Lan NT, Jones-Lopez EC, Levi M, Burday M, Ayakaka I, Mugerwa RD, McMillan B, Winn-Deen E, Christel L, Dailey P, Perkins MD, Persing DH, Alland D. 2010. Rapid detection of Mycobacterium tuberculosis and rifampin resistance by use of on-demand, near-patient technology. Journal of Clinical Microbiology 48(1):229-237

[25] Hingley-Wilson SM, Casey R, Connell D, Bremang S, Evans JT, Hawkey PM, Smith GE, Jepson A, Philip S, Kon OM, Lalvani A. 2013. Undetected multidrug-resistant tuberculosis amplified by first-line therapy in mixed infection. Emerging Infectious Diseases 19(7):1138-1141

[26] Jagielski T, Van Ingen J, Rastogi N, Dziadek J, Mazur PK, Bielecki J. 2014. Current methods in the molecular typing of Mycobacterium tuberculosis and other mycobacteria. BioMed Research International Article 645802

[27] Kamerbeek J, Schouls L, Kolk A, Van Agterveld M, Van Soolingen D, Kuijper S, Bunschoten A, Molhuizen H, Shaw R, Goyal M, Van Embden J. 1997. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. Journal of Clinical Microbiology 35(4):907-914

[28] Kay GL, Sergeant MJ, Giuffra V, Bandiera P, Milanese M, Bramanti B, Bianucci R, Pallen MJ. 2014. Recovery of a medieval Brucella melitensis genome using shotgun metagenomics. MBio 5(4):e01337-14

[29] Keller A, Graefen A, Ball M, Matzas M, Boisguerin V, Maixner F, Leidinger P, Backes C, Khairat R, Forster M, Stade B, Franke A, Mayer J, Spangler J, McLaughlin S, Shah M, Lee C, Harkins TT, Sartori A, Moreno-Estrada A, Henn B, Sikora M, Semino O, Chiaroni J, Rootsi S, Myres NM, Cabrera VM, Underhill PA, Bustamante CD, Vigl EE, Samadelli M, Cipollini G, Haas J, Katus H, O’Connor BD, Carlson MR, Meder B, Blin N, Meese E, Pusch CM, Zink A. 2012. New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. Nature Communications 3 Article 698

[30] Kent PT, Kubica GP. 1985. Public health mycobacteriology: a guide for the level III laboratory. Atlanta, GA: Centers for Disease Control and Prevention.

[31] Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. 2012. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research 22(3):568-576

[32] Koch R. 1882. Die Aetiologie der Tuberkulose. Berliner Klinische Wochenschrift 19:221-230

[33] Kohl TA, Diel R, Harmsen D, Rothganger J, Walter KM, Merker M, Weniger T, Niemann S. 2014. Whole-Genome-Based Mycobacterium tuberculosis Surveillance: a Standardized, Portable, and Expandable Approach. Journal of Clinical Microbiology 52(7):2479-2486

[34] Köser CU, Ellington MJ, Peacock SJ. 2014. Whole-genome sequencing to control antimicrobial resistance. Trends in Genetics 30(9):401-407

[35] Koser CU, Bryant JM, Becq J, Torok ME, Ellington MJ, Marti-Renom MA, Carmichael AJ, Parkhill J, Smith GP, Peacock SJ. 2013. Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. New England Journal of Medicine 369(3):290-292

[36] Koser CU, Ellington MJ, Cartwright EJ, Gillespie SH, Brown NM, Farrington M, Holden MT, Dougan G, Bentley SD, Parkhill J, Peacock SJ. 2012. Routine use of microbial whole genome sequencing in diagnostic and public health microbiology. PLoS Pathogens 8(8):e1002824

[37] Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9(4):357-359

[38] Lanzas F, Karakousis PC, Sacchettini JC, Ioerger TR. 2013. Multidrug-resistant tuberculosis in panama is driven by clonal expansion of a multidrug-resistant Mycobacterium tuberculosis strain related to the KZN extensively drug-resistant M. tuberculosis strain from South Africa. Journal of Clinical Microbiology 51(10):3277-3285

[39] Lim YW, Schmieder R, Haynes M, Willner D, Furlan M, Youle M, Abbott K, Edwards R, Evangelista J, Conrad D, Rohwer F. 2013. Metagenomics and metatranscriptomics: windows on CF-associated viral and microbial communities. Journal of Cystic Fibrosis 12(2):154-164

[40] Loman NJ, Constantinidou C, Chan JZ, Halachev M, Sergeant M, Penn CW, Robinson ER, Pallen MJ. 2012a. High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Reviews Microbiology 10(9):599-606

[41] Loman NJ, Constantinidou C, Christner M, Rohde H, Chan JZ, Quick J, Weir JC, Quince C, Smith GP, Betley JR, Aepfelbacher M, Pallen MJ. 2013. A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4. Journal of the American Medical Association 309(14):1502-1510

[42] Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ. 2012b. Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology 30(5):434-439

[43] Lysholm F, Wetterbom A, Lindau C, Darban H, Bjerkner A, Fahlander K, Lindberg AM, Persson B, Allander T, Andersson B. 2012. Characterization of the viral microbiome in patients with severe lower respiratory tract infections, using metagenomic sequencing. PLoS ONE 7(2):e30875

[44] Maiga M, Siddiqui S, Diallo S, Diarra B, Traore B, Shea YR, Zelazny AM, Dembele BP, Goita D, Kassambara H, Hammond AS, Polis MA, Tounkara A. 2012. Failure to recognize nontuberculous mycobacteria leads to misdiagnosis of chronic pulmonary tuberculosis. PLoS ONE 7(5):e36902

[45] Matsen FA, Kodner RB, Armbrust EV. 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11:538

[46] Nicol MP, Wilkinson RJ. 2008. The clinical consequences of strain diversity in Mycobacterium tuberculosis. Transactions of the Royal Society of Tropical Medicine and Hygiene 102(10):955-965

[47] Pallen MJ. 2014. Diagnostic metagenomics: potential applications to bacterial, viral and parasitic infections. Parasitology 1-7

[48] Pfyffer GE, Welscher HM, Kissling P, Cieslak C, Casal MJ, Gutierrez J, Rusch-Gerdes S. 1997. Comparison of the Mycobacteria Growth Indicator Tube (MGIT) with radiometric and solid culture for recovery of acid-fast bacilli. Journal of Clinical Microbiology 35(2):364-368

[49] Roetzer A, Diel R, Kohl TA, Ruckert C, Nubel U, Blom J, Wirth T, Jaenicke S, Schuback S, Rusch-Gerdes S, Supply P, Kalinowski J, Niemann S. 2013. Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: a longitudinal molecular epidemiological study. PLoS Medicine 10(2):e1001387

[50] Schuenemann VJ, Singh P, Mendum TA, Krause-Kyora B, Jager G, Bos KI, Herbig A, Economou C, Benjak A, Busso P, Nebel A, Boldsen JL, Kjellstrom A, Wu H, Stewart GR, Taylor GM, Bauer P, Lee OY, Wu HH, Minnikin DE, Besra GS, Tucker K, Roffey S, Sow SO, Cole ST, Nieselt K, Krause J. 2013. Genome-wide comparison of medieval and modern Mycobacterium leprae. Science 341(6142):179-183

[51] Shamputa IC, Rigouts L, Eyongeta LA, El Aila NA, Van Deun A, Salim AH, Willery E, Locht C, Supply P, Portaels F. 2004. Genotypic and phenotypic heterogeneity among Mycobacterium tuberculosis isolates from pulmonary tuberculosis patients. Journal of Clinical Microbiology 42(12):5528-5536

[52] Shaw JB, Wynn-Williams N. 1954. Infectivity of pulmonary tuberculosis in relation to sputum status. American Review of Tuberculosis 69(5):724-732

[53] Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312-1313

[54] Sweeney FP, Courtenay O, Ul-Hassan A, Hibberd V, Reilly LA, Wellington EM. 2006. Immunomagnetic recovery of Mycobacterium bovis from naturally infected environmental samples. Letters in Applied Microbiology 43(4):364-369

[55] Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, Eyre DW, Wilson DJ, Hawkey PM, Crook DW, Parkhill J, Harris D, Walker AS, Bowden R, Monk P, Smith EG, Peto TE. 2013. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. The Lancet Infectious Diseases 13(2):137-146

[56] Walker TM, Lalor MK, Broda A, Saldana Ortega L, Morgan M, Parker L, Churchill S, Bennett K, Golubchik T, Giess AP, Del Ojo Elias C, Jeffery KJ, Bowler IC, Laurenson IF, Barrett A, Drobniewski F, McCarthy ND, Anderson LF, Abubakar I, Thomas HL, Monk P, Smith EG, Walker AS, Crook DW, Peto TE, Conlon CP. 2014. Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007-12, with whole pathogen genome sequences: an observational study. The Lancet Respiratory Medicine 2(4):285-292

[57] Wang CS, Chen HC, Chong IW, Hwang JJ, Huang MS. 2008. Predictors for identifying the most infectious pulmonary tuberculosis patient. Journal of the Formosan Medical Association 107(1):13-20

[58] Wang JY, Hsu HL, Yu MC, Chiang CY, Yu FL, Yu CJ, Lee LN, Yang PC. 2011. Mixed infection with Beijing and non-Beijing strains in pulmonary tuberculosis in Taiwan: prevalence, risk factors, and dominant strain. Clinical Microbiology and Infection 17(8):1239-1245

[59] Warren RM, Victor TC, Streicher EM, Richardson M, Beyers N, Gey van Pittius NC, Van Helden PD. 2004. Patients with active tuberculosis often have different strains in the same sputum specimen. American Journal of Respiratory and Critical Care Medicine 169(5):610-614