All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for addressing the remaining concerns and congratulations again.
The only question/revision is with regard to PyClone. As noted by a reviewer, it was indicated that comparison was made to results from PyClone in the supplementary materials, but it is not obvious where this comparison is. Please clarify this.
The authors have addressed my concerns.
The authors have included competing approaches with similar formulations, which strengthens their manuscript.
No concerns.
The edits and alterations have improved the manuscript and I have no further comment.
The presentation improved.
No comment.
No comment.
1- "we included, in the Supplementary Material, the results for PyClone"
I could not find PyClone results in the supp file that I attached. Am I looking at the right version?
2- The authors explained why they called their approach "sequential" in their reply. Did they include this explanation in the revised manuscript?
no comment
no comment
no comment
The authors have addressed all of my comments and concerns in the revised version. I have no additional comments. Overall, the proposed method will benefit other researchers in the field. I recommend accepting the paper.
Please consider and address the various reviewer comments. In particular, improvements to the writing/language should be incorporated. Comparison to other methods is also an important aspect.
The English could be improved - some sentences are difficult to follow.
Some examples include:
In the abstract, the authors say “..and the parameters of our proposed state-state model.” I assume they mean “state-space model.”
Lines 52-54: Use of both “Although” and “however” makes this sentence confusing.
Line 82: authors should define “IBP” (Indian Buffet Process) in the main text.
Line 93: authors introduce matrices Z and W with no context or definition
Line 112: “SOME LATEX EXAMPLES”; I’m not sure why this is here.
Also, the citation of Wersto et al. does not make sense (Line 41). The context involves studies that investigated tumor heterogeneity using NGS technology, whereas Wersto et al. looked at tumor DNA content using flow cytometry. Additionally, the depth of sequencing necessary to characterize somatic variation is quite high, making it rather dubious to claim such studies have been conducted for the “past few decades”.
There are other similar approaches that were not addressed. The most obvious comparator would be the MAD-Bayes approach proposed by Xu et al. (JASA 2015), which similarly touts computational benefits over MCMC-based inference and applies IBP priors for feature allocation. It would seem important to compare against this approach. However, it is not mentioned in the manuscript and it is unclear why the authors did not investigate it.
The methods appear well-outlined and the modes of evaluating error in the simulations seem reasonable.
Some typos must be fixed. Some sentences are too long.
Will the code to reproduce your results be publicly available? How can other research teams apply your algorithm?
The findings on the synthetic data show some improvement over MCMC, but there are many alternative approaches to compare with. Also, there is no validation on the real data.
=== Summary:
The manuscript describes an approach to estimate the clonal structure of a tumor using multiple samples obtained from a single cancer patient that vary spatially or temporally. This work may make some methodological contribution to the field. In particular, the authors claim to use the spatial and temporal information on the samples in their inference. However, while they used the Chinese Restaurant Process to include the temporal information in the analysis, it is not clear where and how the spatial information is used in their model. Also, comparison with the current approaches is very limited.
=== Major:
- How can your approach be used on multiple samples that vary only spatially? Do you assume that all samples are "sequential" in the sense that they are taken along a line? Discuss if and how your approach can be extended to 2 or 3 dimensions. Also, what if both time and space vary together?
- In the real dataset, how do you compare your results vs. MCMC? Did you use any gold-standard to determine which approach was actually better on the real dataset?
- There are several other approaches that have been developed to do a similar task including: Clomial, cloneHD, PhyloWGS, PyClone, Cloe, phyC, Canopy, TargetClone, ddClone, PASTRI, GLClone, TRaIT, WSCUnmix, and B-SCITE. A fair comparison with some of these alternative approaches is desirable.
=== Minor:
- The abstract is somewhat too technical. Please consider rewriting it for a broader audience, especially biologists who are not necessarily interested in the technical and methodological details. For example, explain why and how your sequential algorithm could be better.
- Some sentences are too long, e.g., lines 30-34, 40-44, etc.
- "These approaches have several limitations that prevent their wider usage in examining and quantifying the level of heterogeneity in a given sample." Briefly mention these limitations.
- Typo: "112 SOME LATEX EXAMPLES"
- When referring to the literature, please consider citing the original work. For example, a version of Eq. (2) was mentioned and used in (Zare et al., 2014) before (Lee et al., 2016).
- Typo: "(Griffiths and Ghahramani, 2011; ?)"
- Typo: "then such a binary matrix is a draw from the distribution in (3) (?)"
- Line 154: Remove "due to limited space".
- Line 136: Remove "(discussed below)".
- Typo: 194 Reference to Figure ??.
- Typo in the supplement: "The CLL datasets in [3] for the three patients and the prostate cancer dataset are presented in Tables 21 - ??." Also, did you analyze and explain a prostate cancer dataset?!
- In the caption of Tables S1-3, explain how they are different from the tables in the main text.
In general, there are numerous grammar errors and typos throughout the manuscript. For example, the sentence on page 9, line 230 has a grammar error.
Page 4 before equation (3), “(Griffiths and Ghahramani, 2011; ?)” and Page 4, line 150, “(?)”. There are many unexpected question marks all over the manuscript. Please correct them.
Page 3, line 112 and page 7, line 194 should be deleted.
The English language should be improved and partially rewritten.
Please add some instructions and examples for running the source code. I did not find any instructions on the GitHub page: https://github.com/moyanre/tumor_haplotypes
The authors claimed that the proposed SMC algorithm benefits from a larger number of loci. But the haplotype error (ez) in Table 2 increased from 0 to 0.008 for 1000 and 2000 loci, respectively, which is exactly the opposite of this conclusion. It would be better to explain this phenomenon.
In the CLL003 data analysis, the proposed SMC algorithm identifies 7 latent haplotypes, of which C2 and C6 are the dominant haplotypes. This is not consistent with the Manual and Phylosub methods, which yield 5 haplotypes with only one haplotype dominating the others (with ~0.8 proportion). How should this difference be interpreted? The same inconsistency in the estimated haplotypes occurs for CLL006 and CLL077 as well. The CLL data analysis part should be improved.
It would be great if the authors could compare the SMC method with some well-known subclonal reconstruction methods such as PyClone.
This paper presents a novel sequential Monte Carlo (SMC) algorithm to solve the feature allocation model, which characterizes tumor heterogeneity by latent haplotypes. Based on the simulated data, the proposed SMC algorithm provides more accurate estimates than the state-of-the-art Markov chain Monte Carlo (MCMC) approach. An additional feature of the proposed algorithm is that newly observed VAF data from next-generation sequencing (NGS) can be analyzed to improve existing estimates without re-analyzing the previous datasets, which improves efficiency.
The effort to develop a more accurate and efficient algorithm to infer latent haplotypes is valuable.
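The sequential-updating property mentioned in this report (new VAF data refine existing estimates without re-analysis of earlier datasets) can be illustrated with a toy sketch. The code below is not the authors' algorithm and does not implement the feature allocation model. It is plain sequential importance sampling, the reweighting step at the core of SMC, applied to a single variant allele fraction under an assumed binomial read-count model with a uniform prior. The class name `SequentialVafEstimator`, its methods, and the read counts are hypothetical and chosen only for illustration.

```python
import math
import random


def binom_loglik(k, n, theta):
    """Log-likelihood of k variant reads out of n, up to a constant in theta."""
    theta = min(max(theta, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


class SequentialVafEstimator:
    """Toy sequential importance sampler for one allele fraction (hypothetical)."""

    def __init__(self, num_particles=2000, seed=0):
        rng = random.Random(seed)
        # Particles are draws from a uniform prior on the allele fraction.
        self.particles = [rng.random() for _ in range(num_particles)]
        self.log_weights = [0.0] * num_particles

    def update(self, k, n):
        """Fold in one new sample; earlier samples are never revisited."""
        self.log_weights = [
            lw + binom_loglik(k, n, theta)
            for lw, theta in zip(self.log_weights, self.particles)
        ]

    def posterior_mean(self):
        """Weighted mean of the particles under the normalized weights."""
        top = max(self.log_weights)  # subtract the max for numerical stability
        weights = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(weights)
        return sum(w * t for w, t in zip(weights, self.particles)) / total


# Usage: two samples arrive one after the other (made-up read counts).
sequential = SequentialVafEstimator()
sequential.update(30, 100)   # first sample: 30 variant reads out of 100
sequential.update(25, 100)   # second sample arrives later
print(sequential.posterior_mean())

# Reference: the same data analyzed in a single pass.
batch = SequentialVafEstimator()
batch.update(55, 200)
print(batch.posterior_mean())
```

Because the log-likelihood is additive across samples, the sequential and single-pass estimates agree up to floating-point error. A full SMC scheme would also include resampling (and typically move steps) to limit weight degeneracy, which this sketch omits.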
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.