To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
Thanks for fixing the typos so quickly.
Please fix the remaining typos the reviewers pointed out. Once done, I will accept your revised manuscript without further review.
The updated methods are detailed and explicit; meets all of our criteria.
Thank you for addressing all of our concerns.
A few typos/errors remain --
p4, line 113, "assembly" => "assemble" reads
p8, line 198, "assemblies ... were accomplished" (Fix was)
p9, line 225, "are expressed _in_ all tissue types."
The authors have sufficiently addressed the concerns raised in the previous reviews and I believe it will be ready for publication once the very minor grammatical corrections (detailed below) are addressed.
Line 113: ‘assembly’ should probably be changed to ‘assemble’
Line 118: ‘that’ should be changed to ‘than’
Line 128: ‘that’ should be changed to ‘than’
Line 189: should read “…available as part of BioProject…”
Line 198: ‘was’ should be ‘were’
Line 300: ‘These’ should be changed to ‘The’
Line 304: I think ‘test’ should be plural or ‘were’ should be changed to ‘was’
Please address the various points made by the reviewers. While they are numerous, none seem particularly major, so I consider this a minor revision. Please pay special attention to the comment of reviewer 1 about having the various parts of the analysis in a repository. PeerJ generally expects authors to make all data and analysis methods available to all readers.
The authors aim to identify genetic mechanisms behind osmoregulation using a species of desert rodent. This species was chosen because of its adaptation to extreme water deprivation, even to the point of living its life without water. This makes it a good model for identifying genes under selection for osmoregulation. Studies have been done previously in model organisms such as mouse, rat and human, but only on a gene-by-gene basis. This manuscript looks at transcriptome-wide differential expression, selection and nucleotide polymorphism using next-generation sequencing in an effort to advance this area of research.
1. The authors do a good job of framing their work, showing why the study is needed, its limitations, and how the work will/can lead to future research.
2. The assembly and annotation steps were well thought out. Assemblies were error corrected and quality filtered, and several steps were implemented for annotation using closely related species, the Pfam database and extraction of putative coding sequences. The only thing I wonder is why the authors didn’t pool the samples when assembling. This would not change their downstream pipeline much; however, it would help to recover lowly expressed transcripts. (Are there any citations for this?) Also, I do not understand whether, or why, the additional reads for kidney were not used for assembly.
3. The authors mention in the results (line 185): “The kidney appears to [be] an outlier in the number of unique sequences, though this could […] result [from] the recovery of more lowly expressed transcripts [caused by] deeper sequencing.” Why would this not also be the case for liver, which has only 3M (5%) fewer sequences?
4. I am trying to understand the filtering process for the assembled reads. From my understanding, sequences were filtered using Blastn (Page 4, lines 103:109) and then annotated using Blastn, HMMER3 and Transdecoder (Page 4, lines 113:120). Is my understanding correct? If so, why were the assembled sequences filtered with Blastn before being annotated with Blastn and HMMER3? I thought the point of HMMER3 was to retain divergent sequences not detected by Blastn.
5. For the natural selection results, I think it would be of interest to include more than two genes, perhaps the top and bottom 10 genes from the Tajima’s D analysis.
6. It would also be nice to have the various parts of the analysis in a repository, for reviewing and open science purposes.
Overall I believe it is a good paper with interesting analysis, and cool results.
Elijah K Lowe and C. Titus Brown
Some more explicit details to enable replication would be welcome, as described above.
Major comments in basic reporting section:
1. Citation format should match the "name, year" format described for the journal, currently it is in a different, numbered format
2. Introduction, lines 46-47: In discussing that P. eremicus does not drink water, is there a study or citation that gives their lifespan and/or drinking habits? Are the authors referring back to the species account cited in the previous section?
3. In reporting the individuals captured the authors should provide some metadata such as age (juvenile vs. adult) and sex (were there equal numbers of each sex, or more of one sex than the other)?
4. In the methods lines 86-89, the specific multiplexing and number of lanes of sequencing should be reported (how many individuals were sequenced on each lane, etc.) Perhaps this information could be included in table 1.
5. The figure legend for figure 1 needs to be more descriptive and informative.
Line 106: The abbreviation for transcripts per million (TPM) is used here, but the full term is not stated until line 189; TPM should be defined here first.
Lines 164-168. Should assembly be plural in these two sentences? As it reads, it seems that the authors are referring to one combined assembly of all 4 reference tissues, but given the numbers and the subsequent text this meaning does not seem to be correct and it should instead be 'assemblies'.
Lines 167-168: The use of 'tissue-specific' terminology is somewhat confusing as this denotes that the transcripts are unique to the tissue but this is clearly not the meaning here given lines 183-185 and figure 1.
Why aren't gene symbols provided for each of the genes in tables 3 and 4? If you are going to report gene symbols for some of the genes, why not do so for all of them?
Line 248 and Line 250: Were the p-values truly equal to 0 and 1, or are these rounded estimates? Would p<.05 or p>.05 be more appropriate? This may be a matter of personal preference.
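For reference on the TPM comment above (line 106), here is a minimal sketch of how transcripts per million is conventionally computed from read counts and transcript lengths. The function name and inputs are hypothetical, and the sketch uses raw transcript length rather than an effective length; it is not a description of the software or settings the authors used.

```python
def tpm(counts, lengths_bp):
    """Convert per-transcript read counts to transcripts per million (TPM).

    counts     -- reads assigned to each transcript
    lengths_bp -- length of each transcript in base pairs (assumed > 0)
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: normalise each count by transcript length (reads per base).
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: rescale so that the values sum to one million.
    return [r / total * 1e6 for r in rates]
```

Because the values are rescaled within each sample, TPM values always sum to one million, which is what makes them comparable across libraries of different depth.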
Very Minor/Grammatical revisions:
Line 48: The beginning of the sentence should probably read "These rodents have a distinct...". This is one of several minor grammatical changes/typos that should be addressed, but I will not belabor this as it is a very minor point.
1. Can the authors provide an explanation for the choice of male reproductive tissue for the reference tissues while leaving out the female reproductive tissue? Presumably one of the other sampled individuals was a female and tissue could have been harvested, yet only the testes were included in the reference transcriptome sequencing.
2. For the sentence on lines 136-140, the authors later reference a paper for this, but the citation should probably be included here as well, and it should address heteromyid rodents, not just Dipodomys.
3. Lines 138-142 Did the alignments produced contain insertions/deletions or internal stop codons? If so how were these treated for the PAML analysis? The results of the branch-sites test can be sensitive to alignment errors and with a small number of comparisons the alignments can be inspected manually to ensure this does not occur.
4. Line 143 Clustal-Omega is usually used as a multiple sequence aligner, can the authors provide details on what method it uses for producing a tree and whether branch lengths were provided to PAML or estimated in PAML?
5. This may be something planned in subsequent work but could the authors have provided kidney expression data for the 15 additional individuals or tested for differences in expression between the individuals of different sex?
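The alignment check raised in point 3 above can be partly automated. The following is a minimal sketch, assuming each alignment row is an in-frame coding sequence using the standard genetic code, with `-` as the gap character; the function name is hypothetical and this is not the authors' pipeline. It flags internal stop codons and gaps that are not whole codons, both of which would warrant manual inspection before a branch-site test.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def check_codon_alignment(aligned_cds):
    """Return a list of problems found in one aligned coding sequence."""
    seq = aligned_cds.upper().replace("U", "T")
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon is only acceptable as the last non-gap codon.
    coding = [i for i, c in enumerate(codons) if c != "---"]
    last_coding = coding[-1] if coding else -1
    for i, codon in enumerate(codons):
        if codon == "---":
            continue  # whole-codon gap keeps the reading frame intact
        if "-" in codon:
            problems.append(f"codon {i + 1}: partial gap {codon}")
        elif codon in STOP_CODONS and i != last_coding:
            problems.append(f"codon {i + 1}: internal stop {codon}")
    return problems
```

For example, `check_codon_alignment("ATGTAAAAA")` reports an internal stop at codon 2, while `check_codon_alignment("ATG---TAA---")` returns an empty list because the only stop is the final non-gap codon.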
Line 95: Do the authors mean PHRED < 2 or PHRED < 20?
Line 109: Should this say "using default settings"?
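To illustrate why the threshold on line 95 matters, the sketch below converts PHRED scores to base-call error probabilities using the standard definition Q = -10 log10(P). The helper names are hypothetical, and the Phred+33 ASCII offset is an assumption about the FASTQ encoding, not something stated in the manuscript.

```python
def phred_to_error_prob(q):
    """Probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


if __name__ == "__main__":
    for q in (2, 20):
        print(f"PHRED {q}: error probability {phred_to_error_prob(q):.3f}")
```

A cutoff of 2 only removes bases with an error probability of roughly 0.63, whereas a cutoff of 20 removes every base with an error probability above 0.01, so the two readings imply very different trimming stringency.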
1. The authors mention calculating the site frequency spectrum for their data in line 133, did anything come of this analysis?
2. Is anything known about the demographic history of this population? As the authors acknowledge, the patterns used to infer selection according to Tajima's D can also be produced by demographic events, and the authors should provide any data that exist on the population history. If these data do not exist, the authors should state this and include demography as a possible explanation of their data in their conclusions about tables 3 and 4.
3. Do the authors have data on Tajima's D for either of the genes tested in the branch sites test? If so they should report these values.
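As background for the Tajima's D comments above, here is a minimal sketch of the statistic as defined by Tajima (1989), computed from a set of aligned haplotypes. The function name and input format are hypothetical; it assumes complete, equal-length sequences with no missing data and is not the software the authors used.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for aligned, equal-length haplotype strings.

    Returns None when there are no segregating sites.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero below that)")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if s == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants gives a negative D and an excess of intermediate-frequency variants gives a positive D. Both patterns can arise from selection or from demography (for example, population expansion or contraction), which is why the population history matters for interpreting tables 3 and 4.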
Line 200: Do the authors mean complete coding sequence or open reading frame rather than 'complete exons'?
In this study the authors characterized the transcriptome of four separate tissues for an individual Peromyscus eremicus and conducted RNA-seq of kidney tissue from an additional 15 individuals to provide transcriptomic resources for the study of osmoregulation in this desert rodent. The study is sound in its methodology and provides a significant resource; however, it requires some modifications and/or clarifications of the methodology and data before it can be published. In the above sections I provide suggestions and/or questions that should be addressed.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.