Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on March 10th, 2021 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on March 21st, 2021.
  • The first revision was submitted on June 22nd, 2021 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on August 7th, 2021 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on August 7th, 2021.

Version 0.3 (accepted)

· Aug 7, 2021 · Academic Editor

Accept

Your re-revised manuscript has been reviewed by the original reviewer who commented on its previous version. As you can see from their comments below, the reviewer is satisfied with your revision, understanding that one of the points was hard to answer clearly. Thus, I am happy to inform you that I will recommend its acceptance to the Editor-in-Chief.

[# PeerJ Staff Note - this decision was reviewed and approved by Keith Crandall, a PeerJ Section Editor covering this Section #]

Reviewer 2 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

The authors have resolved all of my concerns, and I now recommend that the manuscript be accepted for publication.

I understand it is not easy to clarify the reason behind comment 2, and I hope the authors will find the reason in future research.

Version 0.2

· Jun 29, 2021 · Academic Editor

Major Revisions

Your revised manuscript has been reviewed by two of the three original reviewers (the remaining one has not responded to our invitations). As you can see from their comments below, one of them is now satisfied with the revision, while the other requests further revision. Since I basically support this reviewer's opinion, please re-revise the manuscript as requested. Thank you in advance for your patience.

Reviewer 1 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

The authors have addressed all of my comments; I have no further comments.

Reviewer 2 ·

Basic reporting

see comments

Experimental design

see comments

Validity of the findings

see comments

Additional comments

The authors tried to resolve all of my concerns, but I still have a few concerns.

1. I cannot follow why the authors used the Monaco and CellBench datasets only in Fig. 2. In my opinion, these datasets would also be useful for other analyses (such as Fig. 3).
2. The ARI values of “salmon and kallisto” are lower than those of “kallisto” or “salmon” alone for the Pollen dataset. In my opinion, it is important to analyze and discuss this case to clarify the usefulness of “joint” NMF.
3. It is difficult to see and compare the values of the different methods in Fig. 5. I would like the authors to revise Fig. 5 so that the boxplots are arranged by dataset and subsample rate, as in Fig. 3.

Version 0.1 (original submission)

· Mar 21, 2021 · Academic Editor

Major Revisions

Your manuscript has been reviewed by three experts in the field. As you can see from their comments below, all of them raise rather fundamental criticisms; one of them even recommends rejection. Thus, please read their comments carefully and revise the manuscript accordingly. All of the reviewers point out concerns about (the derivation of) the presented equations, as well as about the basic design of the experiments. I hope that your next revision will satisfy the reviewers.

[# PeerJ Staff Note: It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful #]

[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.  It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter.  Directions on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]

Reviewer 1 ·

Basic reporting

This paper develops a new method named SC-JNMF for integration of multiple quantification methods to perform stable clustering of scRNA-seq data.

Experimental design

no comment

Validity of the findings

no comment

Additional comments

Please see my comments:
1. The authors point out that there are many gene expression quantification pipelines, such as Cufflinks and RSEM, and that they provide inconsistent results. But those quantification methods were designed for bulk RNA-seq data, not for single-cell data. For scRNA-seq, most studies simply use read counts instead of FPKM, RPKM, or TPM. It is not clear what the benefit of using those quantification methods on single-cell data is.
2. NMF is widely used in single-cell clustering analysis. Important citations are missing from the introduction:
A. Shao, Chunxuan, and Thomas Höfer. "Robust classification of single-cell transcriptome data by nonnegative matrix factorization." Bioinformatics 33.2 (2017): 235-242.
B. Duren, Zhana, et al. "Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations." Proceedings of the National Academy of Sciences 115.30 (2018): 7723-7728.
C. Jin, Suoqin, Lihua Zhang, and Qing Nie. "scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles." Genome biology 21.1 (2020): 1-19.
3. The update rules in Eqs. (7-9) look incorrect to me. Please check the equations carefully. For example, D1 and H cannot be directly multiplied, since their dimensions do not match.
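For concreteness, the dimension bookkeeping the reviewer is questioning can be illustrated with the standard multiplicative updates for a two-view joint NMF (V1 ≈ W1·H, V2 ≈ W2·H with a shared H). The following is a generic numpy sketch of that scheme, not the manuscript's Eqs. (7-9); all sizes and variable names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
g1, g2, cells, k = 30, 40, 20, 5            # genes per view, cells, rank
V1, V2 = rng.random((g1, cells)), rng.random((g2, cells))
W1, W2 = rng.random((g1, k)), rng.random((g2, k))
H = rng.random((k, cells))                  # shared across both views
eps = 1e-12                                 # guard against division by zero

def loss():
    return np.linalg.norm(V1 - W1 @ H) ** 2 + np.linalg.norm(V2 - W2 @ H) ** 2

losses = [loss()]
for _ in range(50):
    # per-view basis updates: every factor in the ratio has shape (genes, k)
    W1 *= (V1 @ H.T) / (W1 @ H @ H.T + eps)
    W2 *= (V2 @ H.T) / (W2 @ H @ H.T + eps)
    # shared-coefficient update pools both views: shapes are (k, cells)
    H *= (W1.T @ V1 + W2.T @ V2) / (W1.T @ W1 @ H + W2.T @ W2 @ H + eps)
    losses.append(loss())
```

Each update leaves the factor shapes fixed ((genes, k) for the W matrices, (k, cells) for H), which is exactly the consistency check the reviewer suggests applying to the terms involving D1 and H.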

Reviewer 2 ·

Basic reporting

see General comments

Experimental design

see General comments

Validity of the findings

see General comments

Additional comments

The authors propose a joint-NMF-based method to integrate expression matrices produced by different quantification methods and apply joint NMF to cell clustering.

Although the idea of a shared “modules x samples” matrix has already been applied to multimodal data (https://www.nature.com/articles/s41467-020-20430-7, for example), it has not been applied to multiple expression matrices from different quantification methods.

Although the integration of such expression data is an interesting topic, I have several concerns regarding publication.

Major comments
1. In my opinion, the experimental design is not appropriate for clustering validation. I think the cell-type labels of the Segerstolpe dataset, for example, were determined from the original expression data. In such a case, it is obvious and not interesting that the ARI values with the original expression data (or data from similar quantification methods) are better than those with Salmon and kallisto. Therefore, the conclusion that “ARI tended to be low only when using Salmon as the quantification method” (line 205) is inadequate. Overall, I think this bias makes it difficult to interpret the effects of different quantification methods and different clustering methods.

2. To overcome the above problem, I recommend that the authors use bulk RNA-seq datasets of sorted cell types, such as https://dice-database.org/downloads#info_anchor and https://www.sciencedirect.com/science/article/pii/S2211124719300592, for clustering evaluation.

3. The scRNA-seq datasets in this article are not the latest. I recommend that the authors add an evaluation on recent scRNA-seq data, because the quality and properties of the data are improving.

4. I have a concern about the merit of joint NMF. From the current results, I cannot isolate the effect of the “joint” factorization, because the ARI values result from the full pipeline. I would like to see the impact of “joint” NMF by comparing ARI values for settings such as:
- combining the two expression matrices into one matrix, applying ordinary NMF, and using the same clustering method;
- using only one expression matrix, applying ordinary NMF, and using the same clustering method;
- averaging the expression matrices, applying ordinary NMF, and using the same clustering method.
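All of the suggested comparisons reduce to computing the Adjusted Rand Index between predicted clusters and reference labels; scikit-learn's `sklearn.metrics.adjusted_rand_score` does this directly. For illustration only, a minimal pure-numpy equivalent (a sketch, not the authors' evaluation code):

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the pair-counting contingency table."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    classes, class_idx = np.unique(labels_true, return_inverse=True)
    clusters, cluster_idx = np.unique(labels_pred, return_inverse=True)
    # contingency[i, j] = number of cells with true class i and predicted cluster j
    contingency = np.zeros((classes.size, clusters.size))
    np.add.at(contingency, (class_idx, cluster_idx), 1)

    comb2 = lambda x: x * (x - 1) / 2.0   # pairs within a group
    index = comb2(contingency).sum()
    sum_rows = comb2(contingency.sum(axis=1)).sum()
    sum_cols = comb2(contingency.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / comb2(labels_true.size)
    max_index = 0.5 * (sum_rows + sum_cols)
    return (index - expected) / (max_index - expected)
```

Note that ARI is invariant to label permutation, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0; this is what makes it suitable for comparing clusterings across the different pipeline variants listed above.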

Minor comments
Line 57: It is inappropriate to say “without mapping” about Salmon and kallisto, because these methods roughly map reads to transcripts using k-mers. It would be more appropriate to say “without alignment”.

Line 132: The authors say they “log2 transformed the data”, but I wonder whether this produces negative values.
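This concern is easy to verify: if the data are L1-normalized first (as reviewer 3 notes regarding line 133), every value is at most 1, so a plain log2 is non-positive; adding a pseudocount (log2(x + 1)) is the usual remedy. Whether the manuscript adds a pseudocount is not stated here. A quick numpy check with made-up values:

```python
import numpy as np

x = np.array([0.5, 0.25, 0.125])   # L1-normalized expression values (all <= 1)
plain = np.log2(x)                 # all negative -> violates NMF non-negativity
shifted = np.log2(x + 1)           # non-negative, safe for NMF
```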

Line 137: Are the dimensions of D2 and W1 wrong?

Line 142: Notations such as W_{1k} are confusing, because the authors use H^{kj} in the following description. I would like the authors to unify the notation.

Line 143: I think lambda_1 is not related to “column vector sparsity regularization”.

Line 180: The reference to the table does not work.

Line 188: I cannot follow the detailed clustering settings. For example, I think the authors used FindNeighbors() and FindClusters() from Seurat's clustering, and I would like to know the setting and effect of the “dims” parameter. In addition, the other clustering method is described in the Results section, which makes the results difficult to follow. Please separate the methods and results into different sections.

Reviewer 3 ·

Basic reporting

The language needs to be improved. For example, on page 5, line 159, what is meant by “the ratio of the total of”?

Some terminology is also non-standard. The authors use “biased” in many places where “variation” would be more appropriate; for example, on page 2, line 70, “we need ... that reflect individual cell biases”, where the authors mean the gene expression variation among individual cells. To refer to an RNA-seq dataset, “RNA-seq library” is more common than “RNA-seq reads”. In the factor analysis literature, “latent factor” is more common than the authors' “latent feature”. On page 3, lines 108-110, “features” is used several times in one sentence with different meanings; the authors could use “cell-derived factors” instead. “Rank” is more common than “rank number”. On page 6, line 228, the authors could use “factor loadings” instead of “factor values”.

Experimental design

On page 4, line 133, the authors normalize gene expression by the L1 norm instead of the L2 norm. They could explain the rationale for that choice.
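For reference, the two choices differ as follows: L1 normalization scales each cell's expression vector to sum to one (proportions), while L2 normalization gives it unit Euclidean length. A small numpy illustration, assuming per-cell (column-wise) normalization of a genes × cells matrix with made-up counts:

```python
import numpy as np

counts = np.array([[10.0, 2.0],
                   [30.0, 8.0]])                   # genes x cells

l1 = counts / np.abs(counts).sum(axis=0)           # each column sums to 1
l2 = counts / np.linalg.norm(counts, axis=0)       # each column has unit length
```

L1 normalization preserves the relative proportions of genes within a cell (akin to CPM/TPM-style scaling), which may be the rationale the reviewer is asking the authors to state.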

In Equation (3), the authors should clarify which matrix norm is used for W and H. Also, in their experiments, \lambda_2 and \lambda_3 are always set to zero, so these two terms are not actually needed; if they are included, the authors need to show how to select these two hyperparameters.

It is unclear how Equations 7-9 are derived from Equations 4-6.

Since quantification by STAR performs better than the others, \lambda_1 should play an important role in the performance of joint NMF. The authors fixed \lambda_1 to match the number of genes, but it would be interesting to show how performance varies with \lambda_1.

The authors mention that the sample size can affect clustering performance (page 7, line 240), but it is unclear from the manuscript in which cases their method is more beneficial. The authors could subsample the dataset and compare their joint NMF with single NMF across data sizes.

The method of choosing the rank is interesting but lacks justification: why is the rank at which the sparseness hits the elbow point a good choice?
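The sparseness measure behind such an elbow criterion is commonly Hoyer's (2004) L1/L2 ratio, which is 1 for a one-hot vector and 0 for a constant one; whether the manuscript uses exactly this definition is not specified here. A sketch:

```python
import numpy as np

def hoyer_sparseness(x):
    """Hoyer (2004) sparseness: 1 for a one-hot vector, 0 for a constant vector."""
    x = np.abs(np.asarray(x, dtype=float))
    n = x.size
    return (np.sqrt(n) - x.sum() / np.linalg.norm(x)) / (np.sqrt(n) - 1)
```

A rank-selection procedure of the kind described would compute the average sparseness of the factor columns at each candidate rank and pick the rank where the curve bends.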

Validity of the findings

The improvement of joint NMF on two quantification methods over using STAR alone is not clearly shown. Fig. 3 only has ARIs for combinations of two methods; it would be better to also include results using each quantification method alone and using all three methods together.

The authors state that using similar gene expression profiles is worse than combining different gene expression profiles. However, the different gene expression profiles perform differently on their own. The reason that combining with STAR performs better is likely that STAR is a better quantification method. Instead of comparing different combinations directly, it makes more sense to compare the performance of each combination with the performance of each individual method. Also, line 248 on page 7 is quite confusing.

In Figure 5C, several genes have quite different weights under the two quantification methods. It would be interesting to provide several examples with further diagnosis.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.