Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on July 11th, 2018 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on September 6th, 2018.
  • The first revision was submitted on December 5th, 2018 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on December 7th, 2018.

Version 0.2 (accepted)

· Dec 7, 2018 · Academic Editor

Accept

Dear Dr. Glover and colleagues:

Thanks for re-submitting your manuscript to PeerJ, and for addressing the concerns raised by the reviewers. I now believe that your manuscript is suitable for publication. Congratulations! I look forward to seeing this work in print, and I anticipate it being an important resource for genomics researchers. Thanks again for choosing PeerJ to publish such important work.

-joe

# PeerJ Staff Note - this decision was reviewed and approved by Keith Crandall, a PeerJ Section Editor covering this Section #

Version 0.1 (original submission)

· Sep 6, 2018 · Academic Editor

Minor Revisions

Dear Dr. Glover and colleagues:

Thanks for submitting your manuscript to PeerJ. I apologize for the lengthy time in review, as we had trouble finding reviewers and one reviewer took a little longer than anticipated. I have now received two independent reviews of your work, and as you will see, both are rather favorable. Well done! Nonetheless, both reviewers raised some concerns about the research, and areas where the manuscript can be improved. I agree with the reviewers, and thus feel that their concerns should be adequately addressed before moving forward.

Please consider a definitions table or figure for terms that are jargon or introduced in this work. This will ensure that a broader audience can appreciate your writing.

I am recommending that you revise your manuscript accordingly, taking into account all of the issues raised by the reviewers. I do believe that your manuscript will be ready for publication once these issues are addressed.

Good luck with your revision,

-joe

Reviewer 1 ·

Basic reporting

Overall, the language is clear, well written, and wholly appropriate for a scientific publication.

The background is generally well described, although the use of confidence scores in other orthology/paralogy methods (e.g. InParanoid, but possibly others) could have been discussed briefly, along with the approaches they use. Similarly, the authors could discuss any relevant literature showing that synteny, genetic distance or gene copy number are useful variables for assessing whether genes are homoeologs.

The layout of the material is clear, and the figures are similarly clear and well labelled. (Figure 4B should include the units for distance, i.e. PAM units.)

The manuscript is coherent and self-contained.

Experimental design

The paper defines a meaningful question (how to assign confidence scores to putative homoeologs), and the study contributes an answer to it.

The work appears to have been carried out rigorously, and the methods are clear.

Validity of the findings

I have some doubts about the language used to describe the findings. These are relatively minor, affecting one statement in the abstract and a couple of expansions on this statement in the main text. This doesn't have any impact on the validity of the paper, but I'd suggest removing the claims, as detailed below:

The method is developed and the results (the confidence score) compared to an 'independent metric' (line 258), the total number of orthologs for the putative homoeolog pair. The authors reason: "Homoeolog pairs with few orthologs are either lineage-specific or dubious, whereas pairs with many orthologs represent those likely to be true". However, they find the correlation between their confidence score and this measure to be R=0.006. In the abstract the authors say the metric corroborates their confidence scores. In my view, such a low correlation shows that either the confidence scores are a very poor measure or the independent metric is. I don't believe it corroborates them.

The authors also do a manual evaluation (Supplemental Table 1 and lines 264-270). I thought this was a lot better for assessing the described confidence score and was useful to the reader. The comparison with the old scores from 271-281 is also useful.

My suggestion is that the reason for the very poor correlation is that the independent metric is actually poor, not their confidence measure. I think the independent measure can be included in the paper, but I don't think the claims made from it are valid. Figure 8 is used to suggest that, despite the almost zero correlation, there is a general trend. There aren't any statistical tests to confirm this (one possible test is sketched below). By eye, the figure seems to suggest two classes, 'low' and 'high', like their old measure. As such, I think the authors should remove the claims based on this independent measure. I'd suggest:

  • not saying in the abstract that the independent metric corroborates the confidence score;
  • not calling it a 'benchmark' on line 77 (it isn't used as a benchmark);
  • not saying on line 317 that "the resulting scores proved meaningful in how they correlate with the number of orthologs".
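To illustrate the kind of test that could back up (or refute) the claimed trend, here is a minimal Python sketch using synthetic stand-in data (not the paper's actual values); a rank-based test such as Spearman's would capture a general monotonic trend better than the near-zero Pearson R reported:

    import numpy as np
    from scipy import stats

    # Synthetic stand-in data (not the authors' actual numbers): per-pair
    # confidence scores and ortholog counts.
    rng = np.random.default_rng(0)
    confidence = rng.uniform(0, 100, size=500)
    orthologs = rng.poisson(10, size=500)

    # Spearman's rank correlation tests for a monotonic trend and returns a
    # p-value, which would make the "general trend" claim checkable.
    rho, p = stats.spearmanr(confidence, orthologs)
    print(f"Spearman rho = {rho:.3f}, p = {p:.3g}")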

Additional comments

Overall I thought the paper was clearly written and addressed the research question posed. The methodology was sound. There are a couple of minor points that could be addressed:

Figure 5: The synteny membership function for 'high' decreases as the synteny score increases above ~0.7 for all species. This means that a score of 1.0 belongs to the 'high synteny' class less strongly than a synteny score of 0.8. I don't believe this is correct. I think the membership function should be redefined to avoid this, or the authors should justify in the text why it should be this way. This problem is avoided for the 'distance' and 'total copy-number' variables.
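To make the suggestion concrete, here is a small Python sketch with hypothetical breakpoints (not the authors' actual parameters): a triangular 'high' membership that peaks near 0.7 gives a perfect synteny score of 1.0 lower membership than a score of 0.8, whereas a right-shoulder (trapezoidal) membership is non-decreasing and saturates at full membership:

    import numpy as np

    def tri(x, a, b, c):
        # Triangular membership: rises from a, peaks at b, falls back to 0 at c.
        return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

    def shoulder(x, a, b):
        # Right-shoulder (trapezoidal) membership: 0 below a, 1 at and above b.
        return np.clip((x - a) / (b - a), 0.0, 1.0)

    synteny = np.array([0.7, 0.8, 0.9, 1.0])
    print(tri(synteny, 0.4, 0.7, 1.0))   # ~[1.0, 0.67, 0.33, 0.0] -- penalises a score of 1.0
    print(shoulder(synteny, 0.4, 0.7))   # [1.0, 1.0, 1.0, 1.0] -- saturates once synteny is high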

Line 162 should also reference Figure 5, I believe.

167-168. The text here is unclear/confusing: "Confidence scores were then scaled between the minimum score and 100." This needs to be stated more clearly. It sounds like 0 was mapped to 'the minimum score' (but then what the 'minimum score' is would be undefined). My best guess is that the opposite was done: the crisp scores were linearly mapped, with the minimum crisp score mapped to zero and 100 mapped to 100. Can this part of the method be stated more clearly, please?
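For concreteness, this best-guess reading of the rescaling corresponds to the following linear map (a guess at the procedure, not necessarily what the authors did):

    # Guess at the rescaling described on lines 167-168 (not necessarily the
    # authors' procedure): map crisp scores linearly so that the minimum
    # observed crisp score becomes 0 while 100 stays 100.
    def rescale(crisp_scores):
        s_min = min(crisp_scores)
        return [100.0 * (s - s_min) / (100.0 - s_min) for s in crisp_scores]

    print(rescale([40.0, 70.0, 100.0]))  # [0.0, 50.0, 100.0]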

167-168. A more serious question related to this: why were the confidence scores produced by the fuzzy logic rules subsequently re-scaled at all? This seems to go against the point of defining the fuzzy logic system. The rules, as written, already provided confidence scores in the range 0-100. As such, what OMA reports is no longer the fuzzy logic confidence scores from their system but a re-scaled version of them. It seems as though the authors were not happy with the confidence scores returned and so re-scaled them; this should be explained.

Figure 7: are these the fuzzy logic confidence scores or the re-scaled ones?

220. It seems we don't know that errors in assignment will occur only at the high end of the range of sequence distances, and not also at the low end (e.g. from duplications more recent than the divergence in question). The text says "we expect there to generally be a low distance between homoeologs", but this is achieved by scaling the membership function to the general range of observed divergence times (Table 2). So the highest confidence for divergence at the initial speciation would be for pairs in the middle of the range, and lower confidence would apply both to those older and to those younger than expected, I would have thought. Perhaps the authors could address this?

227: "would could"

Reviewer 2 ·

Basic reporting

Glover, Altenhoff, and Dessimoz present a very interesting approach to the issue of homoeolog identification. The use of fuzzy logic to assign confidence scores sheds light on the fact that the identity of these genes is not always known. This is an important point that would benefit the polyploid community and beyond.

The article has some issues with accessibility in the writing. As an evolutionary biologist who studies polyploidy and does quite a bit of bioinformatics, I found aspects of the fuzzy logic description difficult to follow. Some of this could be easily alleviated by explaining terms or acronyms (like PAM units, line 150). Another example would be line 158: clarification of "universe" here would be helpful. Is it the range of the input variables? The output variables? Both? The use of "fuzzification" and "defuzzification" throws me off a bit, though I am sure they are perfectly reasonable terms when discussing fuzzy logic. Here, again, a quick definition would be very helpful.
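For readers in the same position, a rough gloss with a purely illustrative sketch (not the authors' implementation or parameters): in fuzzy logic generally, the 'universe' (of discourse) is the numeric range over which a variable's membership functions are defined, and each input or output variable has its own; 'fuzzification' maps a crisp input value to degrees of membership in the fuzzy sets; 'defuzzification' collapses the aggregated output set back into a single crisp number, e.g. via its centroid:

    import numpy as np

    # Illustrative only; not the authors' implementation or parameters.
    universe = np.linspace(0, 100, 101)  # range over which the memberships are defined

    def low(x):  return np.clip((50 - x) / 50, 0, 1)   # membership in "low confidence"
    def high(x): return np.clip((x - 50) / 50, 0, 1)   # membership in "high confidence"

    # Fuzzification: a crisp input value becomes degrees of membership.
    crisp_in = 70
    degrees = {"low": low(crisp_in), "high": high(crisp_in)}   # low: 0.0, high: 0.4

    # Defuzzification: clip each output set by its degree, aggregate, and take
    # the centroid to recover a single crisp confidence value.
    aggregated = np.maximum(np.minimum(low(universe), degrees["low"]),
                            np.minimum(high(universe), degrees["high"]))
    crisp_out = np.sum(universe * aggregated) / np.sum(aggregated)
    print(crisp_out)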

Experimental design

The design is fine, but it would be useful to address the limitation that these are all younger polyploids (mesopolyploids). Since there is a syntenic factor in the confidence score, older events might have fewer identified homoeologs. This potentially limits the scope of use for this approach, though this exact limitation is neither tested nor mentioned. It would be useful for the authors (probably in the Discussion) to speculate on these limits.

Validity of the findings

The data/results seem to clearly support what the authors claim. I especially like that they say this is not necessarily the best method or schema but that, based on their manual curation, it is an improvement.

The approach is limited to genomes where the subgenomes are assigned. This is not always going to be possible. The authors acknowledge this.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.