All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear authors,
Your manuscript is now accepted. Congrats!
Best
Mike Thiv
[# PeerJ Staff Note - this decision was reviewed and approved by Julin Maloof, a PeerJ Section Editor covering this Section #]
The authors have adequately answered my comments on the previous version of the manuscript and deposited their sequence data in public repositories.
The authors properly acknowledge the limitation of their approach (read filtering followed by assembly) in the revised manuscript. Although more sophisticated methods are available which may yield better results, their limited access to high performance computing infrastructure prevented them from doing so. I am nevertheless satisfied that the main claim of the paper is well supported, and that the reader is sufficiently made aware of potential pitfalls.
The authors adequately answered my comments on the previous version of the manuscript.
The revised manuscript is clearer and the authors took great care in answering the major criticisms from the reviewers. I have no additional comment.
Dear authors,
Two reviewers are generally positive. But they still recommend some changes, especially concerning the methods as indicated by rev. #1.
Please address all their points.
Best wishes
Mike Thiv
[# PeerJ Staff Note: It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful #]
[# PeerJ Staff Note: Please ensure that all review and editorial comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. #]
According to the manuscript, “Our study is based on the data of Ly et al. (2020)” (L264, L272), but Ly et al. (2020) did not publish any raw data.
L287: There are more recent studies on the topic to be cited:
• Georgiou, A., Sieber, S., Hsiao, CC. et al. Leaf nodule endosymbiotic Burkholderia confer targeted allelopathy to their Psychotria hosts. Sci Rep 11, 22465 (2021). https://doi.org/10.1038/s41598-021-01867-2
• Bacterial leaf symbiosis–Origin, function, evolutionary gain, and transmission mode of endophytes in bacteriophilous Rubiaceae (https://lirias.kuleuven.be/2909132?limo=0)
Thick and thin lines in Fig. 2 are hard to distinguish; I suggest using dashed or colored lines. The text in Fig. 1 is unreadable; the Kaiju report archives would be better published in the Supplementary materials or on Figshare/Zenodo.
The choice of assembly software appears arbitrary. I recommend stating the criteria and requirements for the assembler based on the data. Details of the read quality filtering procedure are missing (L134). Some review articles may help to form these criteria: https://doi.org/10.1016/j.mimet.2018.06.007 , https://www.mdpi.com/2076-2607/10/12/2416
Which reads were discarded after FASTQC analysis?
Which particular database type and version were used for Kaiju (L130) “to identify and classify reads not belonging to the plant genome with the NCBI non-redundant RefSeq protein database”? The “nr_euk”, “nr” and “refseq” types all fit the description above.
The authors used some outdated software: BUSCO v.2.0 was released in 2019, Kaiju 1.8.2 in 2021.
If three of the four assembled genomes have high BUSCO scores (Table 3), why not annotate them and submit them to NCBI GenBank? A Venn diagram for these annotated genomes might show the shared and distinct genes among closely related bacterial species.
To extract 16S rRNA genes from metagenome data, the MATAM software could be used; it contains a reference database based on SILVA.
I performed read classification for SRR21547707 and SRR21547709 with Kraken2 and the PlusPFP database (https://benlangmead.github.io/aws-indexes/k2). The percentages of reads belonging to particular organisms are as follows:
SRR21547707 (T. semidecidua): 31.56 – Bacteria; 25.45 – Burkholderiales
SRR21547709 (T. hensii): 14.89 – Bacteria; 10.59 – Burkholderiales
These percentages fit the authors’ results, but the PlusPFP database allows each read to be evaluated across all kingdoms of life. As the genomes of endophytic bacteria are assembled, they might be used to construct a phylogenetic tree using PhyloPhlAn.
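For readers who wish to reproduce this kind of check, the clade-level percentages can be read directly from a Kraken2 report. The following is a minimal sketch, assuming the default six-column, tab-separated report layout (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name). The function name `clade_percentages` and the toy report are illustrative only: the two percentages are those quoted above for SRR21547707, while the unclassified row and all read counts are invented placeholders.

```python
from typing import Dict, Iterable


def clade_percentages(report_lines: Iterable[str], taxa: Iterable[str]) -> Dict[str, float]:
    """Return the clade-level read percentage for each requested taxon name.

    Assumes the default six-column Kraken2 report layout (an assumption;
    reports written with extra options carry additional columns).
    """
    wanted = set(taxa)
    found: Dict[str, float] = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[5].strip()  # names are indented by taxonomic depth
        if name in wanted:
            found[name] = float(fields[0])
    return found


# Toy report: read counts and the unclassified row are hypothetical placeholders.
toy_report = [
    "20.00\t2000\t2000\tU\t0\tunclassified",
    "31.56\t3156\t12\tD\t2\t    Bacteria",
    "25.45\t2545\t40\tO\t80840\t          Burkholderiales",
]

result = clade_percentages(toy_report, ["Bacteria", "Burkholderiales"])
print(result)
```

Matching on the name column keeps the sketch short; matching on the taxid column would be more robust where names are ambiguous.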
The manuscript by Verstraete et al. entitled « Metagenomics of African Empogona and Tricalysia (Rubiaceae) reveals the presence of leaf endophytes » describes the detection of putative leaf symbionts of the genus Burkholderia s.l. in leaf samples of several Rubiaceae plants. The authors cleverly use previously generated data to expand the range of known leaf symbioses in the Rubiaceae, adding 2 genera belonging to the Coffeeae tribe.
The manuscript is well written, and the main conclusion that endophytic Burkholderia were detected in the four datasets examined is well supported by the evidence. Aside from minor issues with legends (see "additional comments"), figures, tables, and citations are up to a professional standard.
The sequence data are deposited in a general-purpose repository (Zenodo), but should be deposited in a sequence repository such as Genbank. I truly appreciate the authors making their data available through a public repository, but I believe that this is insufficient: Sequence data on Zenodo are not included in any of the major sequence databases, are not searchable, and do not include standardized metadata. Given the importance of this study in describing new leaf symbioses, I suggest submitting their assemblies to a public sequence repository.
The highly fragmented nature of the assembled genomes and low BUSCO scores indicate that MAGs may be incomplete. This is not completely unusual for metagenome assemblies, but the fact that the authors chose to select reads based on nucleotide composition and coverage prior to assembly could introduce bias. This would result in incomplete bins, as the composition and kmer coverage of individual (bacterial) reads may not be homogenous enough to pass the filter.
It seems also that contigs were selected for analysis based on similarity to Burkholderia s.l. genomes, discarding contigs that do not yield significant hits. This might create some taxonomic bias, masking the presence of genome information not represented in the database (e.g. other taxa or genes acquired via horizontal transfer).
Instead, the standard procedure in the field is to assemble reads with software adapted to metagenome assembly (e.g. MetaSPAdes or similar), and bin the contigs according to coverage and nucleotide composition. Nearly fully automated pipelines are available that perform metagenome assembly and binning (e.g. nfcore/mag, AutoMETA and many others). Resulting MAGs, once evaluated for completeness and contamination, may be used to get a more accurate genome size, and consequently a better picture of endophyte diversity and abundance in the samples. Better MAG accuracy would allow genome comparison with Caballeronia type species and available genomes of other leaf symbionts. Crucially, this would help determine the make-up of endophytic communities, i.e. whether only one taxon or several taxa are present.
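To make concrete what "nucleotide composition" means as a binning signal, the sketch below computes a tetranucleotide frequency profile for a single contig. This is an illustrative assumption about how composition can be encoded, not the method of any pipeline named above; the names `KMERS` and `tetranucleotide_profile` are hypothetical, and the sketch does not merge reverse-complement 4-mers, which dedicated tools may do.

```python
from collections import Counter
from itertools import product
from typing import List

# All 256 possible 4-mers over A, C, G, T, in a fixed order.
KMERS: List[str] = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq: str) -> List[float]:
    """Return the relative frequency of each 4-mer in a contig.

    Windows containing ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


# Toy contig (hypothetical): 5 overlapping 4-mers, of which ACGT occurs twice.
profile = tetranucleotide_profile("ACGTACGT")
print(profile[KMERS.index("ACGT")])
```

Contigs from the same genome tend to have similar profiles and similar read coverage, which is why the two signals are typically combined when grouping contigs into bins.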
The conclusion that Burkholderia s.l. are detected in the samples is well demonstrated, but because of methodological shortcomings (see comments in "Experimental design"), it is not clear whether these taxa occur alone or in complex communities within leaves.
Moreover, I feel that the authors do not fully exploit the potential of their data to shed light on Burkholderia leaf symbiosis in these new taxa. In particular, taxonomic analysis based on comparison with the genomes of Caballeronia type strains should be undertaken to clarify the taxonomic status of the endophytic bacteria.
Although the whole genomes of the 4 potential symbionts are assembled, I find it a pity that no attempt is made at annotating the genomes. This would allow the reporting of basic properties such as number of genes, repeats and maybe some functional analysis. I understand that this type of analysis may not be within the immediate scope of the study, but this is fairly easy to do nowadays with automated, publicly available pipelines and I see little reason not to provide this information to the reader.
I have a few additional minor comments listed below :
L68 : « Every plant species seems to have its own unique bacterial species ». This is not necessarily true (see Danneels et al 2022), please rephrase to something along the lines of « most plant species harbor unique bacterial lineages ».
L129. Was the Kaiju classification done on raw reads, or on reads that had already been labeled as « contaminants » ?
L140 : The GC content values given most probably are counts per 27-mer and not %. Please correct if this is the case.
L145. If I understand this correctly, contigs were filtered based on similarity with Burkholderia s.l. genomes, discarding contigs that do not yield significant hits. This might create some taxonomic bias, masking the presence of taxa not represented in the database.
L172 : Please explain what you mean by « possible incongruence between datasets ».
Table 2 : Please include average coverage per genome assembly.
L195, 197, 199. I suggest changing the phrasing « be assigned a taxon name » to « assigned to a bacterial taxon », to be less ambiguous.
L208. GC content of 15-20% strikes me as very low for Burkholderia s.l. (usually > 60%). The authors possibly confuse the « GC count » of the KAT tool, which is the number of G or C in 27-mers, with GC content. Please update the text as appropriate throughout the paragraph.
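The arithmetic behind this comment can be checked with a short sketch. It assumes, as suggested above, that the reported values are counts of G or C bases per 27-mer rather than percentages; the helper names `gc_count` and `gc_percent` and the example 27-mer are hypothetical.

```python
def gc_count(kmer: str) -> int:
    """Number of G or C bases in a k-mer."""
    return sum(1 for base in kmer.upper() if base in "GC")


def gc_percent(count: int, k: int = 27) -> float:
    """Convert a per-k-mer GC count into a GC percentage."""
    return 100.0 * count / k


# A hypothetical 27-mer containing 20 G/C bases and 7 A bases.
example = "GC" * 10 + "A" * 7
assert len(example) == 27

low, high = gc_percent(15), gc_percent(20)
print(gc_count(example), round(low, 1), round(high, 1))
```

Under this assumption, a GC count of 15 to 20 per 27-mer corresponds to a GC content of roughly 56% to 74%, a range that includes the > 60% usually seen in Burkholderia s.l.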
L214. These are rather low N50 values, which may suggest incomplete read sets, the presence of polymorphisms or repeats. Most likely, the strategy used by the authors is responsible to some extent. Filtering reads before assembly is likely to result in gaps in the assembly. The criteria used for filtering the reads are not clear to me : Were reads that contained at least 1 kmer below the threshold value entirely discarded ?
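As a reminder of why gaps depress this statistic, the sketch below computes N50 for two toy assemblies of equal total length. The contig lengths are invented for illustration and the function name `n50` is a placeholder, not taken from the manuscript.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


# Two hypothetical assemblies, both totalling 300 units of sequence.
contiguous = [100, 80, 50, 30, 20, 10, 10]
fragmented = [50, 50] + [20] * 10

print(n50(contiguous), n50(fragmented))
```

The total assembled length is identical in both cases; only the fragmentation differs, which is the pattern expected if read filtering breaks contigs at low-coverage or atypical-composition regions.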
Table 3. Except for the genome assembled from T. hensii, the BUSCO scores are low to very low, indicating that the genomes are not complete. See comment about assembly strategy.
Fig1. I believe the Y-axis on panel B represents GC count per kmer, not GC content as it is commonly interpreted. I suggest updating the legend to avoid confusion.
Figure 2. Type strains of Caballeronia species should be included to place the endophytes relative to validly described taxa.
Figure 2. The legend indicates that thick and thin lines represent different branch support thresholds, but on my screen all lines seem to have the same thickness. It is then impossible to say if branch support is high or low.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.