All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I think that issues regarding the differential expression analysis are now fixed in the revised version of the manuscript.
[# PeerJ Staff Note - this decision was reviewed and approved by Julin Maloof, a PeerJ Section Editor covering this Section #]
I acknowledge that, in your revised manuscript, you fixed the minor issues raised by the reviewer. However, we are seeing that there are still some minor issues to fix.
In particular, it seems that the differential expression analysis was done incorrectly. The authors write "Quantification of gene expression levels were estimated by fragments per kilobase of transcript per million fragments mapped (FPKM) method. Differential expression analysis of two conditions/groups was performed using the DESeq R package (1.10.1)." However, FPKM cannot be used as input to DESeq. DESeq requires raw counts (number of reads per transcript). The DESeq manual states "The count values must be raw counts of sequencing reads. This is important for DESeq’s statistical model to hold, as only the actual counts allow assessing the measurement precision correctly. Hence, please do not supply other quantities, such as (rounded) normalized counts, or counts of covered base pairs – this will only lead to nonsensical results."
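A minimal Python sketch may help illustrate the point about precision. It assumes the usual FPKM definition (fragments per kilobase of transcript per million mapped fragments); the gene names, counts, and lengths are invented for illustration and are not taken from the manuscript.

```python
def fpkm(counts, lengths_bp):
    """Convert raw fragment counts to FPKM.

    counts:     dict mapping gene -> raw fragment count (integer)
    lengths_bp: dict mapping gene -> transcript length in base pairs
    """
    total = sum(counts.values())  # total mapped fragments in the library
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total)
        for gene in counts
    }

# Hypothetical data: geneA has 10 fragments, geneB has 100.
raw_counts = {"geneA": 10, "geneB": 100, "geneC": 890}
lengths = {"geneA": 1000, "geneB": 10000, "geneC": 2000}

values = fpkm(raw_counts, lengths)
for gene in sorted(values):
    print(gene, raw_counts[gene], values[gene])
```

In this toy example geneA and geneB receive the same FPKM although one is supported by ten times as many fragments as the other. The count magnitude, which a count-based model relies on to judge measurement precision, is no longer recoverable from the normalized value.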
Thank you for submitting the revised version of your manuscript, in which most of the remarks raised by the reviewers have been taken into account. As you can see from the reviewer's report, there are still a few minor issues that need to be fixed before the paper can be accepted for publication.
No comments
No comments
No comments
The authors addressed my concerns properly, but there are a few things they need to pay attention to in the updated version of the manuscript.
1> The expression “log2 fold change” is confusing; you could say either “absolute log2 fold change” or “|log2 fold change|”.
2> Is the P value reported by DESeq and used for statistical significance the original p value or the adjusted p value?
3> For the GO analysis, you should also report the GO accession for each term.
4> L148, L154: what does the "f" in "average length f 300bp" mean?
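Points 1 and 2 can be made concrete with a short Python sketch. It assumes a Benjamini-Hochberg adjustment and example cutoffs (adjusted p < 0.05, |log2 fold change| >= 1). Both the cutoffs and the gene table are hypothetical and are not the authors' settings.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, alpha=0.05, min_abs_log2fc=1.0):
    """Select genes by adjusted p-value AND absolute log2 fold change.

    genes: list of (name, log2_fold_change, raw_p_value) tuples.
    alpha and min_abs_log2fc are illustrative defaults only.
    """
    padj = benjamini_hochberg([p for _, _, p in genes])
    return [
        name
        for (name, lfc, _), q in zip(genes, padj)
        if q < alpha and abs(lfc) >= min_abs_log2fc
    ]


# Hypothetical table: (gene, log2 fold change, raw p-value).
table = [("g1", 2.5, 0.005), ("g2", 0.3, 0.01), ("g3", -1.8, 0.03), ("g4", 1.2, 0.04)]
print(call_degs(table))
```

Here g3 is down-regulated but passes the filter because the absolute value of its log2 fold change is used, while g2 is dropped despite a small p-value. Whichever p-value and cutoffs are actually applied, the manuscript should state them explicitly.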
The reviewers provided very detailed comments on your manuscript. Although they recognize that the paper is prepared according to a proper workflow, they mentioned several major flaws that need to be addressed before the paper can be further considered. For example, Reviewer 1 noted that the RNA-seq analysis should be described with more precise statistical thresholds. Reviewer 3 also warns that there is no quantitative validation of the RNA sequencing. At least one other method has to be used for candidate genes.
In the submitted manuscript, the authors describe a transcriptomic analysis of genes that are differentially expressed in cucumber between resistant and susceptible cultivated varieties under control conditions and after inoculation with Sphaerotheca fuliginea, a plant pathogen that causes powdery mildew (a very destructive disease) on cucurbits. The introduction and background of the article are well written. Proper breeding technologies for selection of PM-resistant cucumber varieties are an effective way to control the disease but there is still a lack of knowledge about the molecular mechanism underlying the infection.
The authors conducted extensive in silico analyses of the studied genes, their annotation, and their assignment to KOG/COG, KEGG and GO terms. The strong points of the manuscript are a very interesting topic and the application of NGS technology. The description of the identified mRNA transcripts is very interesting, especially with regard to hormone signaling pathway connections. I commend the authors for their extensive data set. The workflow of data processing is proper. In addition, the manuscript is clearly written in professional, unambiguous language. The figures are relevant and of high quality. Figures 2 and 4 are not properly labeled and their legends lack full descriptions (see general comments). Raw data are deposited in the NCBI SRA database and are available, but there are inaccuracies in the stated method of obtaining the raw reads. The authors should be more precise. The overall structure of the manuscript is proper. References are relevant.
The aim of the study is clearly stated and it was to identify genes and clarify the correlation mechanism of plant response after pathogen infection in resistant and susceptible cucumber.
There are several weak points of the article.
It is good and valued practice to combine RNA-seq data with experimental validation of gene expression at the transcript level by quantitative real-time RT-PCR. This confirmation is lacking in the article, which is a major shortcoming of the work.
The statistical analysis (as I have noted above) should be improved. The authors do not present threshold values for the statistical analyses; they give them only in some cases (see general comments). The statistical analysis presented herein seems to be appropriate, but there is no clear description of the thresholds used in the QC of reads, the mapping to the reference genome, and the DEG analysis. The lack of this description is a problem and should be remedied. Describing DEGs using only the p-value may not be enough; statistical approaches often also use fold change.
Authors provide the raw data, and supplementary tables, however, your supplemental files need more descriptive metadata identifiers to be useful to future readers. Although your results are compelling, the data analysis should be improved in the following ways: gene id, functional description, ontology assignment, expression value, p-value, fold change etc…
The impact and novelty of the findings rely on pointing out differentially regulated genes and nothing more. The authors overemphasize the importance of the results. There is a set of data that is not confirmed experimentally. There is a lack of statistical description and experimental validation (see experimental design). Conclusions are linked to the scientific question.
Line 84 – lack of space: CsERF004expression
Line 134 – “The four libraries were used”: were they pooled or sequenced separately?
Line 136 – “Raw reads were generated in a paired-end format”: what about the read length?
Line 137 – NCBI Short Read Archive with accession number SRP212890: the SRA record lists a different sequencing technology (Illumina HiSeq 4000) than the one given in the manuscript.
Line 142 – “clean data(clean reads)” – lack of space
Lines 141 – 145 – You should provide more information about thresholds. The statement (line 144) “All the downstream analyses were based on clean data with high quality (Yu et al. 2017).” is definitely not enough for a scientific paper.
Line 148: “Raw sequences were transformed into clean reads after data processing.” – this step should be more descriptive.
Line 149 – you should specify what was your reference, by adding the Accession Number.
Line 150 – “Only reads with a perfect match or one mismatch were further analyzed and annotated based on the reference genome” – you should give the specification for this task
Line 151 – “soft were” – bad writing
Line 152 – Gene function: you should give the web sources for the programs and databases that were used for this task. Additionally, the scheme of the functional analysis is not clear. Did you perform functional annotation for all sets of genes? What about the annotation from the reference genome? Why did you not use it?
Line 156 – lack of space
Line 160 – please provide the standard abbreviation for this: FPKM
Line 187 – what does the number of 280 million reads refer to? You should specify.
Line 189 – could be mapped or were mapped?
Line 189 – again specify the reference genome
Line 192 – Please specify more clearly how many genes you detected after mapping, and describe better how the number 21 408 was obtained.
Line 194 – “The largest term” - does it refer to the KOG term?
Line 194-198 - Are genes counted once or are together?
Line 205 – additional mark ‘
Line 215 – DGEs-?
Line 223 – Fig 2c: describe the axes (x, y) in the drawing
Line 244 – “a large number” – what percentage?
Line 248 – “1” should be written as “one”, – “2” should be written as “two” like in other parts of the paragraph.
Line 282 – you write about time points, but your study does not mention any time-point experiment
Line 286 – tiny variation – could you specify this?
Line 293-294 – “Our data provide comprehensive information on hormone signaling pathway genes under S. fuliginea infection in cucumber”. This statement is too far-reaching. It is now known that the hormone-related gene group is changing, but this is not fully experimentally justified. It would be worth highlighting this in the discussion.
Line 300 – “including 1 AUX1 gene, 1 AUX/IAA gene, 1 GH3 gene” see comments to Line248
Line 306 – nomenclature of gene – the gene name should be written in italics
Line 306 – Is this a gene name? Could you specify this?
Line 306 – Not Cas but Csa
Line 314 – see comments to line 306
Line 329 – 330 – “In our study, some peroxidase genes (Class I) were only induced in the susceptible cultivated variety, while some other peroxidase genes (Class II)…” – the word “some” is not appropriate for a scientific paper.
Fig 4 – what is the grey box? You should add this to the legend. What is the rich factor? This should be explained.
Table S5 – what does the number 4219 refer to?
This manuscript describes a study identifying candidate differentially expressed genes in cucumber in response to infection by Sphaerotheca fuliginea. The authors mainly used an RNA-seq experimental design to answer this question. The main findings of this study could potentially be meaningful, but there are many places where the authors should improve the quality of the manuscript.
1. The overall English writing could still be improved. There are a few flaws that could easily be caught. For example, L71 “to control” should be “to be controlled”; L109-L110 “one when… one when…” should be restructured; L150, the word “perfect” in “perfect match” is imprecise, and you should use a more scientific term; L160, you should give FPKM as an abbreviation.
2. Figure 1, the left border seems to be trimmed a little; please modify your figure and update it.
3. Figure 2, “Heatmap”, not “heat map”; you should label the unit for the scale bar; what is the y scale and y scale unit for the cluster subpanel?
4. Figure 3, you should also list p-value for each enriched term.
5. Figure 4, what does the rich factor mean? What do the different sizes of the circles mean? What does the red arrow mean? What does the color of the circles mean?
6. Figure 5, what does the phylogeny in each heatmap mean? You didn’t explain Class I and Class II in the figure, although I understand you have mentioned them in the main text.
1. From the authors’ description, I think there were 3 replicates for each treatment per cultivar. However, the wording used in the manuscript is confusing; please restructure the sentences to be concise and easier to understand.
2. The “processing of RNA-seq data” section should be expanded considerably, for example with the detailed parameters used and the filtering parameters employed in the in-house Perl scripts.
3. Which version of the reference genome did you use? Detailed information should be given.
4. For the gene function annotation section, I am totally lost. You listed a number of software tools for the annotation, but why did you use them? What are the differences among them? All of this information should be given, rather than just a list of software and method names.
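As an illustration of the level of detail requested in point 2, the sketch below shows, in Python, a read filter whose parameters are all explicit. The cutoffs (mean Phred quality of at least 20, at most 10% N bases) and the Phred+33 encoding are assumptions for the example only; the authors' in-house scripts may apply entirely different rules.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(seq, quality, min_mean_q=20, max_n_fraction=0.1):
    """Return True if a read passes both illustrative filters.

    min_mean_q:     minimum mean Phred quality (hypothetical cutoff)
    max_n_fraction: maximum allowed fraction of N bases (hypothetical cutoff)
    """
    if not seq or len(seq) != len(quality):
        return False  # empty or malformed record
    n_fraction = seq.upper().count("N") / len(seq)
    return mean_phred(quality) >= min_mean_q and n_fraction <= max_n_fraction


# Hypothetical reads as (sequence, quality string) pairs.
reads = [("ACGT", "IIII"), ("ACGT", "!!!!"), ("NNNT", "IIII")]
clean = [seq for seq, qual in reads if keep_read(seq, qual)]
print(clean)
```

Reporting the equivalent of `min_mean_q` and `max_n_fraction` for the actual pipeline would let readers reproduce the step from raw to clean reads.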
1. Relying only on the p value to define differentially expressed genes is not accurate; you should also consider the absolute log2 fold change to avoid instability at low read counts per gene.
2. For the GO analysis, rather than looking only at which gene is associated with which GO term, it is more informative to look at which GO terms are significantly enriched in different gene populations (such as DEGs). Although the authors mention GO enrichment in the methods section, I couldn’t find any in the results. I don’t think the current GO analysis result could be used as a main result.
3. Please also specify the background gene population in your analysis (GO, if you are going to do GO enrichment; and KEGG). Also, how did you define the p value in this GO analysis (one-tailed or two-tailed)?
4. The methods of the clustering analysis should be given in a separate paragraph in the methods section. Also, you should clarify how you decided on the number of clusters.
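To make points 2 and 3 concrete, here is a minimal Python sketch of a one-tailed (upper-tail) hypergeometric enrichment test with an explicit background population. The hypergeometric test is one common choice, not necessarily the one the authors used, and all numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(background, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes in the background population
    annotated:  background genes annotated with the GO term
    selected:   genes in the tested list (for example, the DEGs)
    overlap:    tested genes annotated with the GO term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, x) * comb(background - annotated, selected - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 4 carry the term,
# 3 genes were selected and 2 of them carry the term.
p = enrichment_pvalue(background=10, annotated=4, selected=3, overlap=2)
print(round(p, 4))
```

The result depends directly on `background`, which is why the background gene population needs to be stated. The p-values across all tested terms would then normally be corrected for multiple testing.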
Two cultivars were used in this study. Apart from looking at DEGs, it would be very interesting to see or discuss how orthologous genes are regulated differentially between these two cultivars, as in the paper “Zhang et al. (2017) Differentially Regulated Orthologs in Sorghum and the Subgenomes of Maize”. Please discuss this point; I think it will help to enhance the significance of the manuscript.
In this manuscript, Zhang et al. attempt to identify genes involved in resistance to Sphaerotheca fuliginea infection using a comparative transcriptome approach. However, the authors simply present their bioinformatics data with typical analyses, while the manuscript seems to have no hypothesis-driven scientific question. Additionally, it raises a number of concerns that need to be thoroughly addressed before further consideration.
The introduction lacks specific information regarding the molecular mechanisms of PM-resistant cucumber varieties. How many PM-resistant cucumber varieties have been obtained so far? What are the obstacles that need to be further investigated? Very scarce information on Sphaerotheca fuliginea is given, especially on its pathogenesis, epidemiology, life cycle, molecular biology, etc.
Additional information has to be included: What are the morphologies and phenotypic characteristics of BK2 and H136? What are their genotypes and sources? Where were they obtained for this study? Etc.
At least 2 biological replications should be performed for each condition of RNA sequencing.
Several time points should be sampled to represent the infection. One time point is insufficient. Also, morphological results of the infection have to be included to confirm that the gene expression changes were caused by the infection.
How was the threshold for calling genes differentially expressed determined? Log2? How large a difference is considered “significant”?
There is no quantitative validation of the RNA sequencing. At least one other method has to be used for the candidate genes obtained from RNA sequencing; otherwise, the results are meaningless.
The use of English in this manuscript is poor and contains typos, grammatical errors, etc.; it would be greatly improved by having a native speaker review it.
No comment
No comment
1. Some of the sentences used are too general, for example in the abstract, lines 42 and 47-48: ‘…a few differences..’, ‘the differential expression of prdx genes might contribute..’. The authors should highlight the main findings and at least mention a few genes that may have contributed to the resistance of the plant.
2. In line 68, the author mentioned ‘..massive foundational data..’ what does this mean?
3. In line 106-111: this paragraph should be in the method section. However, it does not tally with the method section, where it was stated that ‘..Three independent cDNA libraries for each sample group were constructed..’. Does this mean that the authors pooled the 3 cDNA samples in each of the groups? The justification given in the last paragraph is weak. The authors could explain more on the need for identifying potential PM resistance genes and its application.
4. The way the data is presented may be improved.
The method section needs to be improved.
Line 145: Yu et al is quoted to address which method? Is it the parameters used to trim the data?
Line 148: this has been mentioned in the previous paragraph.
Line 150: what is the reference genome that was referred to? State the ref genome
Line 151: ‘..tophat2 tools soft were…’. Please revise this sentence.
In the differential expression analysis section, the authors did not elaborate on the fold change (FC) cutoff for DEGs. Does this mean that no FC cutoff was applied? If so, why?
Line 175: need to add a citation/reference for KOBAS software.
Statistical analysis: which experiment used ANOVA in Excel? Some researchers do not recommend using Excel for ANOVA because the analysis is highly error prone: the built-in analysis requires a specific data format. I suggest using other software such as GraphPad Prism or SPSS.
What about the validation of RNAseq expression by qPCR?
How do you compare the DEGs between resistant and susceptible cucumber varieties infected with S. fuliginea (ST vs RT)? Do you directly compare the ST data to the RT data? According to the results, there were 97 DEGs in the control groups (SC vs RC). The method of comparison is not clear in the methodology section.
Results were presented thoroughly. However, there are a few queries:
1. Line 188: Table S1 shows a total of 280.92 million trimmed reads. But in Table S2, the reads mapped to the reference genome total 561.8 million. Assuming that these are the untrimmed reads, how do you explain this?
2. Line 192: Please elaborate in the method section on why you chose to annotate the DEGs against the ~7 mentioned platforms.
3. Line 218-223: the details of clustering were not explained in the methodology.
4. Line 233-224: some of the results were only explained briefly. For example, in line 233-224, I suggest stating at least a few of the gene names for ‘flavonoid biosynthesis’.
Discussion could be improved:
1. For example, in line 288-294, the discussion was very brief and too general. Instead of stating the pathways alone, I suggest the authors mention a few genes that are worth highlighting in the study. Please rephrase the sentences to link the previous studies mentioned to your results.
2. Line 342-346: too brief. How do the PR genes ‘play a role in pathogen resistance’? Please elaborate on the mechanism and again mention a few genes from your study.
Conclusion: no need to mention the method in conclusion. Please summarize the findings without being too general. In the last sentences (352-353), it is good to state the importance of ‘analyzing potential genes related to PM resistance.’
In this study, Zhang et al. have reported the differential gene expression of resistant BK2 and susceptible H136 cultivated cucumbers under S. fuliginea infection. The abstract is too brief, especially regarding the methodology and results. The authors need to state specifically the main highlight of the study. Furthermore, some of the sections in the body of the manuscript are not explained thoroughly and are confusing to read. This manuscript needs to be proofread before submission. Above are the major points that need to be addressed to improve the quality of the study.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.