All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The author has addressed all of the concerns of the reviewers.
[# PeerJ Staff Note - this decision was reviewed and approved by Paula Soares, a PeerJ Section Editor covering this Section #]
Please address the concern of Reviewer 1, who suggests that you discuss the advantage of the phylogenetic profile method with respect to alternative methods, if possible.
The problems with the initial submission have been largely addressed and the manuscript is considerably improved. In particular, the edits have greatly improved the clarity and logic.
My only remaining comment is that I would suggest that the author explain, directly and clearly in a paragraph of the introduction, why the phylogenetic profile clustering approach is superior to traditional phylostratigraphy in at least some respects. It is now mentioned that traditional phylostratigraphy uses arbitrary thresholds that the clustering method avoids, but why is this a problem? Why might it lead to errors in biological inferences, or to missed insights that the clustering approach might find? Without this, the reader will eventually understand the point of this manuscript implicitly through the analyses done, but this should not be required.
All issues have been addressed satisfactorily.
All issues have been addressed satisfactorily.
Please address the concerns of the Reviewers, in particular "explain and justify more fully the motivation and goals of both the general approach—explaining why someone might want to use it over others—and of each step in the analysis", as suggested by Reviewer 1. I agree with Reviewer 1 that many methodological choices (such as the comparison with "ideal profiles") should be better justified, the advantages and disadvantages of phylogenetic profiles with respect to other more widely used phylogenetic methods should be addressed more clearly, and that the phylogenetic profiles also reflect the evolutionary rates, which could bias the analysis of rapidly evolving proteins such as intrinsically disordered proteins.
The two main figures, 4 and 5, in which colour represents age, are rather difficult to interpret visually, and I do not understand exactly how they are obtained (why are there different points for the same proportion?). I suggest that the author adopt a more readable representation, such as the median fraction of prion or disordered proteins as a function of origination time, or something similar. It may also be interesting to compare the results on the evolution of disordered proteins with those of the paper https://pubmed.ncbi.nlm.nih.gov/22076659/, which found that proteins and exons that appeared more recently in the evolution of the human lineage are characterized by a larger disorder content than more ancient ones.
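As an illustration of the kind of summary suggested above, the following is a minimal sketch that groups proteins by origination time and reports the median disordered fraction per group. The function name `median_by_age`, the record layout, and the data values are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import median


def median_by_age(records):
    """Return {age_bin: median fraction} for (age_bin, fraction) pairs.

    Hypothetical helper: 'age_bin' stands for an origination-time
    category and 'fraction' for a per-protein disordered fraction.
    """
    groups = defaultdict(list)
    for age_bin, fraction in records:
        groups[age_bin].append(fraction)
    # Sort by age bin so the summary reads from oldest to youngest
    # (or the reverse, depending on how bins are numbered).
    return {age: median(values) for age, values in sorted(groups.items())}


# Made-up example data, for illustration only.
records = [
    (1, 0.10), (1, 0.20), (1, 0.30),
    (2, 0.25), (2, 0.35),
    (3, 0.50),
]
summary = median_by_age(records)
for age, value in summary.items():
    print(age, round(value, 3))
```

One value per origination-time bin can then be plotted as a simple line or bar chart, which may be easier to read than a scatter of individual points coloured by age.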
Overall, the idea of using phylogenetic profiles to investigate evolutionary dynamics is intriguing. However, there are a few issues that make it challenging to evaluate this approach as presented in this manuscript.
Though it is mostly understandable what is being done in each step, it is difficult to follow the logical reasoning of the author through the progression of the manuscript. It is not stated why each choice is being made—what problem it is trying to solve or question it is trying to answer. For example, one of the first steps in the manuscript (shown in Fig 1) is to compare each profile to “ideal profiles” and “randomly generated profiles.” But it is not said why. What is each comparison specifically intended to tell us about evolution? Why is this approach a reasonable choice? Without understanding exactly what problem the author is trying to solve it is difficult to know whether it has been accomplished.
The manuscript is dense in terminology, abbreviations, and technical concepts, several of which originate in this manuscript. This is not a problem in itself. But often these technical concepts are related to each other, and it is not clear why these relationships are being investigated or what the biological implication is. This contributes to difficulty in following the logic of the argument.
This issue extends to the overall motivation of the phylogenetic profile approach. What is the reason to use phylogenetic profile clustering over other methods of phylostratigraphy? What are its strengths and weaknesses? The citations themselves are fine; sufficient background is provided; but the place of this work within the field is not sufficiently explained.
My suggestion is to explain and justify more fully the motivation and goals of both the general approach—explaining why someone might want to use it over others—and of each step in the analysis.
The research question is defined only in a very general way--no knowledge gap is identified.
The methods are mostly written in a clear way, but I do not understand the section “Removing cluster redundancy” (line 108). Perhaps more detail or an explanation of the reasoning behind the steps in this section would help clarify for the reader.
The stated goal is to understand patterns of gene origination, and this is mostly how the profiles are interpreted. But as the author briefly acknowledges in the discussion of intrinsically disordered proteins, phylogenetic profiles do not just reflect patterns of gene origination but also evolutionary rate, since fast-evolving genes can diverge beyond recognition by BLASTP. No argument is given for why we should interpret phylogenetic profiles as primarily reflective of patterns of gene origination rather than patterns of evolutionary rate.
No additional comments.
The manuscript is well written and clear.
The research question is well defined, and the approaches are robust and well documented.
The findings are in general robust. I was, however, skeptical of the correlations reported in Figure 2. These correlations appear to be mostly driven by a small number of outliers. I would expect that more robust correlation metrics (such as Spearman rank correlation) will show much lower correlation. The conclusions should be adjusted based on this.
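The concern about outliers can be illustrated with a minimal sketch. The data below are made up and are not from Figure 2; the helpers `pearson`, `ranks`, and `spearman` are written here only for illustration, using the standard library. A single extreme point pushes the Pearson coefficient close to 1, while the rank-based Spearman coefficient stays modest.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: eight weakly related points plus one extreme outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 100]
y = [5, 3, 6, 2, 7, 1, 8, 4, 100]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

With these made-up values the Pearson coefficient exceeds 0.99 and the Spearman coefficient is about 0.37, which is the pattern the reviewer expects if a few outliers dominate the reported correlations.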
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.