All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for addressing all the reviewer concerns. I believe you have adequately addressed these comments and there is no need for another round of revisions.
[# PeerJ Staff Note - this decision was reviewed and approved by Dezene Huber, a PeerJ Section Editor covering this Section #]
I agree with the reviewer's assessment that this manuscript will be an important contribution to the field. The authors should consider revising according to the limited, and relatively minor, set of comments from the reviewers.
Figure 2: I appreciate the compact and clear format of this figure for communicating the structure of this relatively complex hierarchical model. However, definitions of most of the variables are missing, which makes it impossible to interpret. These should be added to the figure caption (ideally with the intuition in the context of this system as well as the literal statistic).
Line 43: Did you mean nine rodent species? (It sounds like nine individual animals as worded here)
L199-201: Needs a citation
L214: Why minimum temperatures? And from what spatial areas?
L242: Citations and an empirical or theoretical justification for the time windows used for environmental covariates would be helpful here
No comment
This is an outstanding contribution to the field of ecological forecasting and sets a gold standard for how hierarchical models can be designed to incorporate ecological mechanisms (here, at species and community scales) and test hypotheses in appropriate statistical frameworks while also yielding increased predictive skill. The writing is extremely clear, the motivation and case study application are powerful, and I believe that the implications for population modeling are profound.
Overall, the manuscript is written in correct, professional English, and the figures are relevant and easy to interpret. I have a few minor points that I think would improve the clarity of the text.
Minor points:
(1)
Line 76: 'This is problematic because key applications of population dynamics forecasts, including changes in ecosystem function and biodiversity loss, are rarely single-species issues […]. '
It is not clear why ‘this is problematic’ unless it is more explicitly linked to the points made a few lines below. I suggest replacing it with simply: ‘However, key applications […]’.
(2)
Line 88: 'These associations between the dynamics of different species has resulted in extensive research into multivariate population dynamics models […]. '
Although the name ‘multivariate population dynamics models’ is to some extent self-explanatory, I think a brief definition here would be helpful for readers who are less familiar with these models.
(3)
Line 99: 'The rarity of multispecies population dynamic forecasting is likely due in part to the increased computational complexity and statistical knowledge needed to formulate multivariate population dynamic models that incorporate real world complexities in ecological data (Karp et al. 2023). '
One could argue that in many cases it is not the statistical knowledge per se, but the fact that these models can be tedious to code and debug, convergence is sometimes difficult to achieve, and out-of-the-box solutions such as R packages are relatively new and not suitable for all. I would therefore replace it with 'higher computational costs and statistical complexity'.
(4)
Line 194: 'These include observation errors due to imperfect detection, missing samples due to weather or other issues (e.g., global pandemics), and overdispersed discrete counts for many species (20 rodent species) that include large numbers of zeros and upper bounds set by the number of traps. '
Is ‘overdispersion’ a property of the data, or a property of the data given an assumed distribution, typically without a scale parameter (e.g., Poisson and binomial)?
(5)
Line 205: 'Each observation was a vector of total captures on long-term control plots for the nine remaining species (Figure 1). '
I do not understand how a single observation is represented as a vector. Do the authors mean that observations for each species were vectors?
(6)
Line 231: ' [...] we needed to model these dynamics [...]. '
Line 232: 'Second, we needed to laverage community information [...]. '
Line 288: 'To adequately evaluate competing forecast models, it is also necessary to perform [...]. '
The authors could leave these types of justifications for the introduction and discussion, and simply introduce the model components and their rationale here, as they do a few lines below:
Line 234: 'Because species’ responses to environmental change in this system are expected to be nonlinear (Brown & Ernest 2002), we used splines to model these responses. '
(7)
Line 232: 'Second, we needed to laverage [...]. '
I think the correct spelling is leverage.
(8)
Line 273: 'The first benchmark model used the same HGAM linear predictor as the GAM-VAR'
This is the first (and only) time the acronym ‘HGAM’ appears in the text, so a definition is needed.
(9)
Line 313: 'Stan’s diagnostics guided us to a model that could be reliably estimated, which included deviation functions for the four most frequently captured species (D. ordii, D. merriami, Onychomys torridus and C. penicillatus). '
The authors could be more specific about what ‘deviation functions’ refer to.
(10)
Line 349: 'For example, Ord’s kangaroo rat (D. ordii) and silky pocket mouse (P. flavus) had negative cross-dependencies in the GAM-VAR, providing structure that the model used to make predictions (Figure 4). '
The wording of this sentence is poor. It is the user who makes predictions using the estimated model, not the model itself, and it is not clear what the authors mean by ‘providing structure’.
(11)
Line 384: 'For these species, the model expected increases in ~70% of simulations and decreases in ~30% (Figure 6). '
The wording of this sentence can be improved: the 'model' is the whole posterior distribution, which indicates a ~70% probability of increasing abundances. I suggest replacing it with: 'For these species, the model shows a ~70% chance of increasing abundance (Figure 6).'
(12)
Line 285: 'While primary conclusions were generally similar when using the GAM-AR no pooling model, which did not leverage multi-species learning, the estimates of these contrasts were much more variable (Figure S15). '
I cannot see how this is related to Figure S15.
(13)
Line 398: 'The five species that relied solely on the global function (O. leucogaster, C. baileyi, P. eremicus, P. flavus and R. megalotis) were expected to show tighter spring peaks and autumn troughs.'
Related to (12), I think this needs to be explained in more detail.
(14)
Line 402: 'There was not enough information to learn nonlinear distributed lag functions for these five species, with the model instead estimating flat functions centred on zero for all five species (Figure S17). '
Did the authors intend to refer to Figure S16?
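The distinction raised in point (4) can be illustrated with a minimal sketch, assuming a Poisson reference distribution, under which the variance equals the mean. The function name `dispersion_index` and the counts are hypothetical and are not taken from the manuscript or the Portal data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean.

    Under a Poisson assumption this ratio is expected to be close to 1,
    so values well above 1 indicate overdispersion *relative to Poisson*.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when all counts are zero")
    return variance(counts) / m


# Hypothetical monthly capture counts with many zeros (illustration only).
counts = [0, 0, 0, 1, 0, 7, 0, 2, 0, 12]
print(dispersion_index(counts) > 1)
```

The same counts could be unremarkable under a distribution with a separate scale parameter, such as the negative binomial, which is why overdispersion is usually stated relative to an assumed distribution.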
Comments:
(1)
There is explicit mention of some general ‘data challenges’ in the Introduction and in the Materials and Methods section, some of which are present in the analysed dataset. For instance:
- Line 108: 'Finally, because monitoring wildlife is challenging, data complexities (e.g., irregular sampling intervals, observation errors, missing samples, and overdispersed discrete counts with meaningful lower and/or upper bounds) bring additional challenges into an already complicated modelling environment (Clark & Wells 2023). '
- Line 141: 'Moreover, most multispecies time series models fail to incorporate one or more of the many important real-world complexities observation errors, missing values, non-linear responses to environmental drivers, and latent temporal dynamics that plague real-world forecasting applications (Clark & Wells 2023; Daugaard et al. 2022; Holmes et al. 2014; Royle & Nichols 2003). '
- Line 193: 'The Portal dataset exhibits many of the complexities that confront population forecasting. These include observation errors due to imperfect detection, missing samples due to weather or other issues (e.g., global pandemics), and overdispersed discrete counts for many species (20 rodent species) that include large numbers of zeros and upper bounds set by the number of traps. '
However, it is not clear how, or whether, some of these issues are dealt with in the proposed model. For instance, I cannot see from Figure 2 how the authors address zero inflation, overdispersion, or censoring in the data. I understand the proposed model is already very complex and including these elements might be unfeasible. If this is the case, I would suggest disclosing these limitations in the methods section and adding a brief discussion in the discussion section. For the data challenges that have been addressed with the current model, the link to the specific model components (e.g., latent state model definition to deal with imperfect detection) could be more explicitly stated in the Materials and Methods section.
(2)
The authors investigate whether the inclusion of two model components, the partial pooling of (lagged) environmental responses and the temporal covariance of species abundances, improves predictions. For this, they build three models: a full model (GAM-VAR), a model without species dependencies (GAM-AR), and a model without species dependencies or partial pooling of environmental responses (GAM-AR no pooling). It is not clear why the authors chose this combination of model components and did not consider a model with species dependencies but without partial pooling (GAM-VAR no pooling). For completeness and given that the ‘GAM-VAR no pooling’ model is relatively common in the literature, I would suggest including it in the analysis.
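The gap described in comment (2) can be seen by enumerating the two model components as a 2 x 2 design. This is a minimal sketch: the helper `model_name` is hypothetical, and the labels simply follow the names used in this review.

```python
from itertools import product


def model_name(species_dependencies, partial_pooling):
    """Build a model label from the two components discussed in the review."""
    base = "GAM-VAR" if species_dependencies else "GAM-AR"
    return base if partial_pooling else f"{base} no pooling"


# All four combinations of (species dependencies, partial pooling).
design = {
    (dep, pool): model_name(dep, pool)
    for dep, pool in product([True, False], repeat=2)
}

# The three models the review says were fitted.
fitted = {"GAM-VAR", "GAM-AR", "GAM-AR no pooling"}

# Combinations in the full design that were not fitted.
missing = sorted(set(design.values()) - fitted)
print(missing)
```

Without the fourth cell, the contribution of species dependencies cannot be separated from that of partial pooling in a fully crossed comparison.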
Minor points:
(1)
Line 221: 'Measurements for both covariates were converted to monthly averages. '
Please specify how the monthly average for 'minimum temperature' was calculated. Is it the average of the daily minimum temperatures?
(2)
Line 252: 'The sum of these effects allowed each species to show a different temperature response from the wider community, but only if there was information in the data to support such a deviation.'
What did the authors consider to be enough 'information in the data' to support such a deviation?
(3)
Line 258: 'Off-diagonals represented cross-dependencies that could provide useful biological insights into interspecific interactions. '
When describing the ‘A matrix’ of species dependencies, the authors should also specify whether the matrix was constrained to be symmetric, and why, as the ecological interpretation of the coefficients might differ.
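The symmetry question in point (3) can be made concrete with a generic first-order vector autoregression. Apart from A, the notation below is an assumption chosen for illustration and is not taken from the manuscript.

```latex
\mathbf{x}_t = A\,\mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_t,
\qquad
\boldsymbol{\varepsilon}_t \sim \mathcal{N}(\mathbf{0}, \Sigma)
```

Here the entry A_{ij} is the effect of species j at time t-1 on species i at time t. If A is unconstrained, A_{ij} and A_{ji} can differ, so the off-diagonals can describe directional effects. If A is forced to be symmetric, every pair of species is assumed to have equal reciprocal effects, which supports a narrower ecological interpretation.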
No comment
This is an interesting manuscript that addresses the relevant issue of building sound and theoretically grounded models for near-term forecasting of multi-species assemblages, a very active area in ecology. Specifically, the authors leverage the explicit inclusion of species’ temporal covariance, shared environmental responses, and lagged effects to improve near-term predictions. The proposed approach could also be useful for inferring ecological processes from data, so it is interesting from a theoretical and an applied perspective.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.