First of all, great package! I look forward to using it. Thanks too for using PeerJ PrePrints. I'm not usually motivated to post comments elsewhere, but I like the StackOverflow-style points system here, so it motivates me to bother.
My comments here solely concern the written paper, not the actual software. (I will try to play with the software later if I get the time...)
Line 35: I think a reference to Magee et al.'s paper (1) would fit well here. It's very much in the same vein as the Stoltzfus et al. and Drew et al. papers.
Lines 50-55: I'm surprised to see no love/citation for phangorn (2). Perhaps we live in different parallel worlds of phylogenetic R packages, but that would be high on my list to mention in such a section of relevant packages! I do note you reference a more comprehensive list later.
Line 60: I'm calling bullshit on R as "...an ideal platform for reproducible research in phylogenetics and comparative biology...".
I looked up the definition of 'ideal' just to make sure I wasn't wrong and here's what I found:
A) "satisfying one's conception of what is perfect; most suitable."
B) "a person or thing regarded as perfect."
At the very least, describing R as 'ideal' for phylogenetics shouldn't pass peer review without more supporting evidence/discussion. It's also fairly trivial to state that use of R may "allow a complete record of the steps taken in gathering, processing, and analyzing a given data set to be produced". That's true of any open-source, well-documented workflow; it's not special to R.
In my experience, R is good for handling small, relatively 'downstream' phylogenetic data. For instance, objectively speaking, R is not, and likely never will be, the key language/programme for computing phylogenies from raw data. It simply isn't 'speedy' enough. At best it can provide a wrapper around the specialised programs that do the actual hard work. Not one of PAUP, TNT, RAxML, MrBayes, RevBayes, BEAST, POY, MEGA, or anything else most people use to infer phylogenies is written in R.
In terms of sheer computational performance, perhaps Julia (3) might in future be a more promising language than R for phylogenetics? Phylogenetics packages are currently being built for Julia (4), and performance-wise it wouldn't surprise me if they were clearly superior to R equivalents. Admittedly Julia hasn't got a mature set of phylogenetics packages at the moment, but you used the word 'ideal', and my point is that in terms of performance R is demonstrably far from 'ideal' relative to implementations in other languages! As I understand it, Julia is better for parallel computing and thus should perform faster permutations and bootstraps, which are key types of calculation in phylogenetics.
I'd encourage the authors to back away from the word 'ideal' here. R sure is popular with biologists at the moment, but it is not perfect or ideal. It could potentially be surpassed in popularity, utility AND performance in future by other languages like Julia.
References
1. Magee, A. F., May, M. R., and Moore, B. R. 2014. The dawn of open access to phylogenetic data. PLoS ONE 9(10): e110268. doi:10.1371/journal.pone.0110268
2. Schliep, K. P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics 27:592-593. http://dx.doi.org/10.1093/bioinformatics/btq706
3. Bezanson, J., Karpinski, S., Shah, V. B., and Edelman, A. 2012. Julia: A fast dynamic language for technical computing. http://arxiv.org/abs/1209.5145
4. https://github.com/Ward9250/Phylogenetics.jl
See also https://willeerd.wordpress.com/2014/02/10/phylogenetics-in-julia-not-r-sorry/
Thanks for your comments Ross!
Re: Magee et al. 2014, I'll look up the paper you mention and see if it can be integrated here.
Re: phangorn, I have personally never used it (and maybe I should...) and so it wasn't on my radar when I came up with a list of representative R packages.
Re: "an ideal platform for reproducible research in phylogenetics and comparative biology..."
Your point is well taken, and I (not speaking on behalf of the co-authors here) would never argue that R is ideal for estimating phylogenies. It is also true that other programming languages allow users to record the steps taken in their analyses. However, in the last couple of years, several developments in the R ecosystem are really facilitating reproducible research in R: in particular, the advent of literate programming (pandoc support via rmarkdown), as well as several efforts to improve reproducibility (packrat https://github.com/rstudio/packrat/ and remake https://github.com/richfitz/remake, to name just two). In addition, it is easy to call external programs directly from R. For phylogenetics, this means that I can go from unaligned sequence data to the published figure while using, at each step along the way, the software that performs best for sequence alignment, tree building, comparative methods, etc. At each step of the process, I can manipulate the input and output of these programs in R using the rich suite of packages that are available, and then perform statistical analyses that are not available elsewhere. I have a couple of papers in the making that demonstrate how to do that. So I agree that the word "ideal" might be too strong, but when it comes to "reproducible research in phylogenetics and comparative biology", R performs better than the other solutions I am familiar with.
Thanks for these useful comments Ross,
I actually have used phangorn to estimate branch lengths in R. I think one of our vignettes suggests using it, and we could certainly add it to the MS.
I agree entirely with François about the strengths of R as a platform for phylogeny and comparative biology. The sentence you highlight is probably too "cut and dried" a statement, but I think we can clarify exactly where R's strengths lie (as François has above) and improve the MS. So thanks for pointing this out.
Cheers for the replies and the lively twitter debate too. I would have carried on if my phone battery hadn't died :)
One more thing I'd like to add, beyond my quibble about the word 'ideal', is that I think it would be better to scope your discussion of rotl in the context of 'comparative phylogenetics' rather than just phylogenetics. This sidesteps the whole issue of R not being a good platform for tree-building. It's an important distinction, I think, because phylogenetics is a very broad term and encompasses many, many different things. In terms of volume of papers, tree-building papers (with no comparative phylo element) probably outnumber 'comparative phylogenetic methods' papers by 10 to 1 or more. Hence my objection to describing R as ideal for phylogenetics: tree-building is dominant within phylogenetics (broadly defined). If you rescope it to 'comparative phylogenetic methods' or some such phrase, it makes it clearer which subfield you're getting at, and I certainly wouldn't dispute that R is the go-to language for this.
For the avoidance of doubt: I'm certainly not arguing that R + knitr isn't good for literate programming and reproducible research :)