A critical issue in modelbased inference for studying traitbased community assembly and a solution
 Published
 Accepted
 Received
 Academic Editor
 Maria Dornelas
 Subject Areas
 Ecology, Statistics
 Keywords
 Generalized linear models, Poisson regression, Community composition, Fourthcorner problem, Compositional count data, Traitenvironment association, Loglinear model, Negativebinomial response
 Copyright
 © 2017 Ter Braak et al.
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
 Cite this article
 2017) A critical issue in modelbased inference for studying traitbased community assembly and a solution. PeerJ 5:e2885 https://doi.org/10.7717/peerj.2885 (
Abstract
Statistical testing of traitenvironment association from data is a challenge as there is no common unit of observation: the trait is observed on species, the environment on sites and the mediating abundance on speciessite combinations. A number of correlationbased methods, such as the community weighted trait means method (CWM), the fourthcorner correlation method and the multivariate method RLQ, have been proposed to estimate such traitenvironment associations. In these methods, valid statistical testing proceeds by performing two separate resampling tests, one sitebased and the other speciesbased and by assessing significance by the largest of the two pvalues (the p_{max} test). Recently, regressionbased methods using generalized linear models (GLM) have been proposed as a promising alternative with statistical inference via sitebased resampling. We investigated the performance of this new approach along with approaches that mimicked the p_{max} test using GLM instead of fourthcorner. By simulation using models with additional random variation in the species response to the environment, the sitebased resampling tests using GLM are shown to have severely inflated type I error, of up to 90%, when the nominal level is set as 5%. In addition, predictive modelling of such data using sitebased crossvalidation very often identified traitenvironment interactions that had no predictive value. The problem that we identify is not an “omitted variable bias” problem as it occurs even when the additional random variation is independent of the observed trait and environment data. Instead, it is a problem of ignoring a random effect. In the same simulations, the GLMbased p_{max} test controlled the type I error in all models proposed so far in this context, but still gave slightly inflated error in more complex models that included both missing (but important) traits and missing (but important) environmental variables. For screening the importance of single traitenvironment combinations, the fourthcorner test is shown to give almost the same results as the GLMbased tests in far less computing time.
Introduction
According to the habitat templet theory (Southwood, 1977; Townsend & Hildrew, 1994), evolution selects for species characteristics (i.e., traits) appropriate to their environment. Such traits influence community assembly (Ackerly & Cornwell, 2007) and a major goal in contemporary ecology has become to identify among a set of traits which ones interact with the environment and which do not. Although most traits may influence the way species are distributed in space, not all environmental features necessarily select for these traits, hence the need to test for traitenvironmental interactions and the need to select only the relevant traitenvironment combinations in predictive models.
The first statistical methods to uncover and describe traitenvironment associations were correlationbased with as prime examples the Community Weighted trait Means (CWM) approach (Lavorel et al., 2008) that correlates community weighted means with environmental features, the fourthcorner correlation (Dray & Legendre, 2008; Legendre, Galzin & HarmelinVivien, 1997) and the multivariate method RLQ (Dolédec et al., 1996; Dray et al., 2014). See Kleyer et al. (2012) for a review of methods. Notwithstanding the availability and wide use of these methods, it took some time to understand the behaviour of these methods and to develop valid statistical tests to assess traitenvironment associations. Dray & Legendre (2008) showed that randomization tests based on either site or species permutations lead to increased type I error rates. The issue of increased type I error rate was solved by Ter Braak, Cormont & Dray (2012) and PeresNeto, Dray & ter Braak (2016) who showed that regardless of the method used to assess traitenvironment relationships, valid statistical testing requires both a sitebased and a speciesbased analysis, each resulting in a pvalue. They showed that correct rates are achieved by assessing significance by the largest of the two pvalues (the p_{max} permutation test). If it is desired to account for phylogenetic relationships among the species and/or spatial and temporal correlations among the sites, the random permutations should be replaced by restricted permutation or bootstrap (Lapointe & Garland, 2001; Wagner & Dray, 2015), but the principle of the p_{max} test remains unchanged.
More recently, regressionbased methods have been proposed for studying traitenvironment relations (Brown et al., 2014; Cormont et al., 2011; Jamil et al., 2013; Pollock, Morris & Vesk, 2012; Warton et al., 2015; Warton, Shipley & Hastie, 2015). These methods model the abundance (or presence–absence) of multiple species across sites (communities) as a function (linear or nonlinear) of species traits and environmental variables. If these are generalized linear (mixed) models, main effects of traits and environmental variables and their interactions are specified on a linkscale, for count data usually the logscale giving a loglinear model. Interaction terms represent traitenvironment associations, each being a product of a single trait with a single environmental variable. The associated (standardized) regression coefficients provide insights regarding the strength and direction of traitenvironment associations (positive and negative associations, e.g., largebodied species tend to occur more often in lowtemperature environments than in hightemperature ones, known as Bergmann’s rule (Bergmann, 1847).
The main effects in the GLM model represent the separate effects of environmental variables and traits on species distributions. For example, traits might be used to predict the distribution of species in an ‘average’ environment; by adding the environmental effects and their interaction with the traits, the distribution of a given species in a specific environment may be predicted. Also, by setting main effects to be polynomial terms of quantitative trait and environment variables (or, simply, to factors for species and site) the model includes the simplest model for ecological niches, which shows Gaussian species response to the environmental variable and has equal niche breadths (Jamil et al., 2013; Jamil & ter Braak, 2013; Ter Braak & Looman, 1986). If the environmental optima of species in this model are related to their traits, the traitenvironment relationships are exactly represented by the interaction terms, all being a product of a given environmental variable and a given trait. In regressionbased analyses of traitenvironment relations, neither the trait nor the environment takes the role of response variables; instead they are predictor variables in a model in which the response variable is the abundance (or presence–absence) of species in sites.
In this paper, we focus on the performance of statistical tests to assess the significance of interaction terms in GLMbased methods as these represent traitenvironment relations. Different testing procedures have been used so far. Cormont et al. (2011) fit a linear traitenvironment model (LTE) with interactions to logtransformed species data and used the p_{max} permutation test of Ter Braak, Cormont & Dray (2012) for statistical testing of the interaction. Jamil et al. (2012) implemented a mixedmodel approach with fixed traitenvironment coefficients and random speciesbyenvironment coefficients; interaction terms were selected via a tiered forwardselection approach and their significance was assessed by likelihood ratio tests. Pollock, Morris & Vesk (2012) and Jamil et al. (2013) moved away from ‘leastsquares after data transformation’ by using generalized linear mixed models (GLMM) for count or presence–absence data. Brown et al. (2014) and Warton, Shipley & Hastie (2015) developed the same model under a simpler GLM framework without the extra random terms for species, and complemented it with resampling techniques, namely crossvalidation for selection and bootstrapping of sites for statistical testing. Their method is thus essentially a communitylevel analysis in that the site (community) is the statistical unit on which the statistical inference is based. Their method of statistical testing is implemented via the functions ‘traitglm’ and ‘anova.traitglm’ in the R package ‘mvabund’ (Wang et al., 2012). These GLMbased methods are quite promising because they use mainstream statistical methods, allowing model selection, prediction and standard diagnostics for goodnessoffit as well as greater flexibility in accounting for additional structure in the data, compared to previous correlationbased approaches.
Conceptually, the most important difference between these GLM and GLMM approaches is that Brown et al. (2014) and Warton, Shipley & Hastie (2015) solve the problem of residual correlation among species in their GLM by resampling sites and ignore any specieslevel random effects. Conversely, Pollock, Morris & Vesk (2012) and Jamil et al. (2013) account for randomness at the specieslevel in their GLMM, but have either no, or only a very simplistic, method to account for residual correlation among species, respectively. In practice, GLM is much quicker and numerically more stable than GLMM, so that using GLMM with bootstrapping is not yet very appealing, particularly when many environmental variables need to be analyzed simultaneously.
Whereas the performance of statistical tests associated with the fourthcorner correlation has been well evaluated in the literature (Dray & Legendre, 2008; PeresNeto, Dray & ter Braak, 2016; Ter Braak, Cormont & Dray, 2012), no study has been designed to evaluate and compare the type I error rate and power of statistical tests on traitenvironment interaction in the GLM framework. One obvious question is how the simpler GLM model performs when the data actually follow a loglinear model with random effects, i.e., a GLMM model (Jamil et al., 2013). Is it then sufficient to simply resample (bootstrap or permute) sites or residuals associated with sites, which is similar to a communitylevel analysis, or is there also a need for a specieslevel analysis (resampling species or residuals associated with species) as found for correlationbased approaches, and, if so, how can we then explain the difference in outcome between the communitylevel and specieslevel analyses?
In this paper we investigate these questions by simulating data according to models with and without traitenvironment interaction and reanalysing a literature data set as an illustrative case study. We apply four statistical tests which differ in the way resampling is performed (resampling sites, species or both) and how the test statistic is calculated (assuming a negativebinomial or a Poisson distribution). We report on the observed type I error rates of these procedures in the data sets simulated without traitenvironment interactions and their power in simulated data sets with this interaction. We also apply the predictive modelling approach of Brown et al. (2014) to see how many times traitenvironment terms were falsely judged predictive when there was in fact no interaction between the observed traits and the environment. The simulated data were based on Gaussian response models (Dray & Legendre, 2008; PeresNeto, Dray & ter Braak, 2016) and loglinear models with random effects (i.e., GLMM models).
Theory and Methods
Statistical models for traitenvironment interaction
The data required for the statistical analyses in this paper are abundance data of m species across n sites, together with trait data on the m species, and environmental data on the n sites. In its simplest form, there is one quantitative trait t (with m values {t_{j}}, j = 1, …, m) and one quantitative environmental variable e (with n values {e_{i}}, i =1, …, n). The abundance of species j in site i, denoted by y_{ij}, is assumed to be (similar to) a count that follows a negativebinomial distribution with mean μ_{ij} and unknown overdispersion. The statistical model that we use for detecting the traitenvironment interaction is the GLM model (1)$log\left({\mu}_{ij}\right)={R}_{i}+{C}_{j}+{b}_{te}{t}_{j}{e}_{i},$ where b_{te} is the coefficient measuring direction and strength of the traitenvironment (te) interaction and R_{i} and C_{j} are the main effects for site i and species j, respectively. The main effects are thus formed by factors for site and species and can thus approximate any nonlinear function of e and t, respectively. In total, the model has n + m +2 unknown parameters (including the unknown overdispersion).
An alternative for the GLM model is the GLMM model, which we present here for later reference, (2)$log\left({\mu}_{ij}\right)={R}_{i}+{C}_{j}+{\beta}_{j}{e}_{i}\phantom{\rule{1em}{0ex}}\text{with}\phantom{\rule{1em}{0ex}}{\beta}_{j}={b}_{0}+{b}_{te}{t}_{j}+{\epsilon}_{\beta j}$ with β_{j} a speciesspecific slope with respect to e, modelled as a linear model of trait t, with intercept b_{0} and slope b_{te}, and with ε_{βj} a normally distributed error term with mean 0 and variance ${\sigma}_{\beta}^{2}$. By inserting the model for β_{j} in the loglinear model, we see that b_{te} is indeed the coefficient of the interaction t_{j}e_{i}. The term b_{0}e_{i} can be absorbed in the row main effect R_{i}, and ε_{βj}e_{i} represents additional speciesspecific random variation that interacts with the observed environment.
Statistical tests on traitenvironment interaction
The traitenvironment interaction in Eq. (1) can be tested by fitting the model with and without the interaction term, the latter by setting b_{te} = 0, calculating the likelihood ratio (LR) statistic of the two models for the data and assessing its significance via resampling (Warton, Shipley & Hastie, 2015). We consider four ways of carrying out the test. The first test uses the LR based on a negativebinomial response distribution and is therefore rather slow. To investigate whether we could improve on speed without sacrificing the type I error rate and loosing (too much) power, we set the response distribution to Poisson in the other three tests, as any issue due to overdispersion is accounted for by the resampling procedure. Moreover, theory tells that Poisson likelihood is the only likelihood for nonnegative data and models that gives consistent estimates under misspecification of the distribution; the normal likelihood/leastsquares has this feature for unbounded data (Gourieroux, Monfort & Trognon, 1984a; Gourieroux, Monfort & Trognon, 1984b; Wooldridge, 1999). In detail, the four tests are:

1.
anova.traitglm. The first test uses the ‘anova.traitglm’ function of version 3.11.5 of the R package mvabund (Wang et al., 2012) with sitebased resampling and calculation of the LR assuming a negativebinomial distribution (as in the datagenerating models of ‘Statistical models for traitenvironment interaction’ and ‘Simulation models’). This function resamples sites by bootstrapping probability integral transform (PIT) residuals with all residuals across species of the same site kept together. The code for obtaining the significance of the interaction is simply:
model1 < traitglm(Y,E,T, composition=TRUE)
anova.traitglm(model1, nBoot=nBoot)
where Y, E and T contain the abundance data y_{ij}, the environmental values e = e_{i} and the trait values t =t_{j}, respectively. In the manytraits case, T contains the q simulated traits. The number of bootstrap samples is set to 39. With the observed sample this gives 40 samples and a minimum pvalue of 1/40 = 0.025. This number is sufficient at the nominal level of 5% of the test that we used, as Monte Carlo significance tests are unbiased for any number of resamples (Hope, 1968). We could increase the number of bootstraps to obtain a small increase in power.

2.
sites. The second test also resamples sites, but differs from the first test in that it permutes sites (instead of bootstrapping them) and calculates the Poisson LR (instead of the negativebinomial LR). For statistical testing (rather than estimation), we prefer permutation to bootstrapping. Therefore, we wrote special purpose functions in R using glm on vectorised abundance data Y for calculating the LR statistic assuming Poisson distributed abundance data. When permuting sites, the values in e are permuted instead of the rows of the abundance table Y (Appendix S2). We used 39 permutations to make the test most similar to the anova.traitglm test.

3.
species. This test is similar to the previous one but permutes species (instead of sites) and calculates the Poisson LR.

4.
max r/c. The fourth test applies both tests 2 and 3 and takes the maximum of their pvalues. It thus a GLMbased p_{max} test. The ‘r/c’ is a mnemonic for rows/columns (corresponding to sites/species).
Simulation models
Abundance data on m = 30 species in n = 30 sites were generated by two simulation models, a loglinear simulation model and a onedimensional Gaussian response model, detailed in Appendix S1. In summary, two traits t and z (both m values) and two environmental variables e and x (both n values) were drawn independently from the standard normal distribution; t and e are taken as the observed trait and the observed environment, respectively, and z and x as unobserved. As such, variables z and x are latent variables that are unrelated to (i.e., independent of) the observed ones; alternatively, z and x represent simply noise, more specifically, variability among species in their response to the environment, as in GLMM models (Jamil et al., 2013; Pollock, Morris & Vesk, 2012), and unobserved variability among sites, respectively. Either way, unobserved variation is likely the case in most ecological data and needs to be considered more often in simulation studies.
The statistical test procedure seeks to detect whether the observed trait t and the observed environment e are associated (i.e., interact), without knowledge about the two latent variables z and x, as these are unobserved/unmeasured. In the null models, abundance data are generated without any interaction between the observed t and e, but with an interaction between, for example, the unobserved trait z and the observed environmental variable e.
In the Gaussian response model, this is achieved by generating the expected abundance μ_{ij} of species j at site i as a Gaussian response function of e and z: (3)${\mu}_{ij}={h}_{j}exp\left[\frac{{\left({e}_{i}{z}_{j}\right)}^{2}}{2{\sigma}_{j}^{2}}\right],$ where h_{j} is the maximum value and σ_{j} is the tolerance or niche breadth of species j that are both constants or random, with σ_{j} independent of t. As z and σ_{j} are independent of the observed trait t, this model by definition contains no association between t and e. This is the ‘environmentally structured but trait random’ case, in short the ‘trait random’ case. Similarly, we can define a ‘trait structured but environment random’ case, in short the ‘environment random’ case, by replacing e_{i} by x_{i} and z_{j} by t_{j}, and a ‘both random’ case by replacing e_{i} by x_{i} in Eq. (3).
The corresponding loglinear simulation model has free main effects (R_{i} and C_{j}) and one interaction, namely either z_{j}e_{i} or t_{j}x_{i} respectively, e.g., for the ‘trait random’ case (4)$log\left({\mu}_{ij}\right)={R}_{i}+{C}_{j}+{b}_{ze}{z}_{j}{e}_{i}$ with b_{ze} the coefficient measuring the direction and strength of the ze interaction, and similarly for other interactions later on. The interaction t_{j}e_{i} is missing, so there is no association between t and e in the model. Equation (4) is like Eq. (3), the model of a GLMM with random speciesspecific slopes with respect to the environment e (Jamil et al., 2013), but without t–e interaction (b_{te} = 0).
The Gaussian model of Eq. (3) and loglinear model of Eq. (4) are closely related, at least when the tolerance is constant (σ_{j} = σ). First, the maximum h_{j} does not depend on the trait and thus log(h_{j}) can be absorbed in the free coefficient C_{j} in Eq. (4). Working out the square in Eq. (3) gives squared terms in e_{i} and z_{j} that can be absorbed in the free main effects of Eq. (4) and a product z_{j}e_{i} with associated coefficient1∕σ^{2}, which is then equal to b_{ze} in Eq. (4). With variable niche breadths, the Gaussian response model is not an easy GLM.
The loglinear model allows one case that is not available in the onedimensional Gaussian model. This case has two interaction terms: t interacts with x and e interacts with z, but t does not interact with e, (5)$log\left({\mu}_{ij}\right)={R}_{i}+{C}_{j}+{b}_{ze}{z}_{j}{e}_{i}+{b}_{tx}{t}_{j}{x}_{i}.$ These are simple models. In the reported simulations, we also included structured and unstructured noise in the expected abundances, respectively (Appendix S1); these are important to make the data more realistic, but are not essential for our main results.
Abundance data y_{ij} (i = 1, …, n; j = 1, …, m) were drawn from the negativebinomial distribution with mean μ_{ij} and overdispersion parameter 1, giving variance function ${\mu}_{ij}+{\mu}_{ij}^{2}$.
To mimic a situation with many traits and a single environmental variable, t_{j} in the above equations was taken as the sum of q independent traits, normalized to a theoretical variance of 1.
To study the power of different methods to assess parameter significance, once the type I error had been controlled, the interaction between the observed trait and environment, i.e., the term b_{te}t_{j}e_{i}, was added to the loglinear model with nonzero regression coefficient b_{te}, and z_{j} was replaced by t_{j} in the Gaussian response model. Each model was simulated 1,000 times.
Predictive modelling by lasso
Brown et al. (2014) proposed selecting traitenvironment interactions on the basis of their predictive power using GLM with the least absolute shrinkage and selection operator (lasso). This approach uses penalized regression and a key point is the selection of the penalty parameter; if it is set very high, no predictor variable enters the model, if it is set to 0 all variables enter. The penalty is usually selected by crossvalidation. Brown et al. (2014) used a sitebased crossvalidation approach, treating species as fixed factors. To evaluate this approach, we simulated 1000 data sets as in the ‘trait random’ case of the Gaussian model with n = m = 30 and 10 traits and a single environmental variable. We applied the function traitglm of the R package mvabund with arguments method = “cv.glm1path” and composition = TRUE, then counted the number of simulated data sets in which the best model, i.e., that gave the “best predictive performance” (Wang et al., 2012), contained any nonzero traitenvironment coefficients. As a control experiment, we analyzed data from a model without any interaction of a latent trait with the observed variables. We also applied speciesbased crossvalidation to the same data sets to determine whether this led to similar results.
Case study
As an illustration, we reanalysed the data of Choler (2005) who investigated the shift in Alpine plant traits along a snowmelt gradient. This is the aravo data set in the R package ade4 (Dray & Dufour, 2007). It has 82 species and 75 sites. Here we considered only one trait and one environmental variable, namely Spread (maximum lateral spread of clonal plants) and Snow (mean snowmelt date in Julian day averaged over 1997–1999). Because the distribution of Spread is rightskewed, it was logarithmically transformed before analysis.
First, we performed statistical tests based on the squared fourthcorner correlation, using the R package ade4 (Dray & Dufour, 2007). Second, we performed regressionbased analyses as described in Brown et al. (2014) and Warton, Shipley & Hastie (2015). In such analyses, the abundance of the species is the response variable for which we must specify a distribution. We used the default distribution of the R package mvabund (Wang et al., 2012), the negativebinomial distribution, and checked whether this distribution is tenable for these data. Following Warton, Shipley & Hastie (2015), we did this by plotting the DunnSmith residuals of the model ‘site+species+species:Snow’ against the fitted values. The plot (Fig. S1) does not show any clear pattern that would suggest an illspecified mean–variance relation. We therefore continued analysis with this default distribution. We then applied the anova.traitglm test and the other three tests in ‘Statistical tests on traitenvironment interaction.’ We used 999 resamples in each test. Third, as an alternative to statistical testing, we applied predictive modelling as in ‘Predictive modelling by lasso.’ With a single traitenvironment interaction, the choice is between a model with and a model without the interaction.
Results
Statistical testing results
In the Gaussian model simulations, the ‘trait random’ case is the only one that generates a significantly inflated type I error rate (>0.40) in the anova.traitglm test, whereas the max r/c test provides appropriate rates (Table 1). With six species and five or ten random traits, the type I error rate of anova.traitglm was even more inflated (>0.75), and for twenty species with 10–30 traits the type I error rate was even greater than 0.90.
Test procedure  Scenario  

Trait random (z and e)  Environment random (t and x)  Both random (z and x)  
anova.traitglm  0.417  0.055  0.045 
max r/c  0.047  0.055  0.019 
In the loglinear simulation model, similar inflation of type I error rates is found. The type I error rate of the anova.traitglm test increases with the size of the coefficient b_{ze} in Eqs. (4) and (5) (Fig. 1, grey lines) up to values around 0.50. The sites test results in similar pvalues. The species test results in much lower pvalues than the sites test when b_{tx} = 0, but similar pvalues if b_{tx} = b_{ze} (solid and dotted blue lines in Fig. 1, respectively). The latter is expected as, when b_{tx} = b_{ze}, the model is perfectly symmetric in species and sites. The max r/c test, which takes the maximum of the pvalues per simulated dataset, results in pvalues around or below 0.05 if b_{tx} = 0 but has a slightly inflated type I error rate if b_{tx} = b_{ze} > 0.4 (solid and dotted black lines in Fig. 1, respectively). The highest type I error rate of the p_{max} test is 0.08 (Fig. 1).
Figure 2 shows the power of max r/c test. Recall that power here is estimated at a higher than 0.05 nominal level because the test does not fully control the type I error. The actual size of the test is 0.08 (the maximum observed type I error rate for this test in Fig. 1). As expected, power increases with the traitenvironment coefficient b_{te}, but decreases with increasing b_{ze}, which represents either the interaction of e with an unmeasured trait or noise in the model for the speciesspecific slopes in Eq. (2). The effect of an additional noise component (modelled by b_{tx}) on power is not very large, except near b_{te} = 0, where power is in fact type I error rate, which was inflated when b_{ze} > 0.4, as in Fig. 1.
Predictive modelling results
In the ‘trait random’ case, sitebased crossvalidation to select the penalty parameter of the lasso often resulted in models with traitenvironment terms that have, in fact, no predictive value; the number of data sets with such terms was 851 (in 1,000 simulated data sets) with 463 data sets giving one interaction coefficient that was at least 0.1 in absolute value. In the control experiment using a model without any interaction of a latent trait with the observed variables, we found low numbers of false positive (<20 out of 1,000). With m = 10 (instead of 30), there were 855 false positives numbers with 710 and 403 data sets with a coefficient larger than 0.1 and 0.3, respectively. Applying speciesbased crossvalidation resulted for m = 30 in two data sets with nonzero traitenvironment coefficients and for m = 10 in ten such data sets.
Case study results
Statistical tests on the interaction between Spread and Snow using the fourthcorner correlation resulted in pvalues of 0.002 and 0.361 for the site and speciesbased permutation tests, respectively (999 permutations). In the p_{max} test, these pvalues are combined by taking their maximum and the final pvalue was thus 0.361. The conclusion from the fourthcorner analysis is that there is no evidence that Spread and Snow are associated.
Using GLMmodels, the sitebased approaches found evidence for interaction between Spread and Snow (pvalue of 0.001 for both the anova.traitglm and sites tests). By contrast, the species and max r/c tests did not detect evidence for this interaction (pvalue of 0.376 for both).
Sitebased crossvalidation in lassobased predictive modelling selected the model with interaction, with a fourthcorner regression coefficient of 0.066 with respect to the standardized trait and the standardized environment variable. Speciesbased crossvalidation selected the model without interaction.
Discussion
Our simulations showed that resampling of sites is simply not sufficient to generate a valid statistical test on traitenvironment association under a GLM framework. This holds true not only for advanced bootstrap methods, such as is implemented in the anova.traitglm function of the R package mvabund (Wang et al., 2012), but also for permutation tests using either GLM (Fig. 1) or simple methods such as CWM and fourthcorner (PeresNeto, Dray & ter Braak, 2016). The reason is that sites are not the only random statistical units; species are random units as well (Ives & Helmus, 2011), particularly when we want to determine which traits interact with particular environmental features in conditioning species distributions. Species and sites form two crossed random factors and both need to be considered in models. A GLM without resampling would account only for the residual variation (technically, the nm independent response distributions of the nm abundance values). A GLMM model is able to account for more levels of randomness. A GLMM model for traitenvironment association accounts for the randomness of species responses to the environment and for how traits modulate these responses. These levels of randomness are not considered in GLMs; therefore, these will not provide valid statistical inference (see Appendix S3 for an explicit explanation). An escape is to mimic the randomness and discover the relevant variation by resampling. Therefore, we propose in this paper to resample not only sites but also species and to combine the two tests in the p_{max} procedure. Note that a test using simultaneous resampling of species and sites, as originally proposed for RLQ (Dolédec et al., 1996), does not control the type I error (Dray & Legendre, 2008); instead resampling needs to be performed independently for sites, which fixing species, and viceversa for species.
All sitebased analyses of the case study data found evidence for SpreadSnow interaction, but the speciesbased analysis, and thus the p_{max} test, did not. Why do sitebased and speciesbased analyses differ with these data? An explanation is that there is an ‘unobserved’ trait, namely Specific Leaf Area (SLA), that is almost uncorrelated with Spread (correlation 0.02) and that strongly interacts with Snow as both the sitebased and speciesbased tests are highly significant using either the fourthcorner correlation or the GLM tests using Poisson loglikelihood (all p < 0.001, yielding p_{max} < 0.001). From this, we thus conclude that there is evidence that SLA interacts with Snow and no evidence that Spread interacts with Snow. The sitebased resampling tests are not able to detect the large variation among the speciesspecific regression coefficients in Eq. (2). In the GLM fit, Spread explains an amount of variation that is easily explainable by chance. The conclusion is that the GLMbased tests require not only sitebased resampling but also speciesbased resampling.
One might think that the problem with sitebased resampling is just an ‘omitted variable bias’ problem (Robins, Rotnitzky & Zhao, 1994), also known as a ‘confounding variable’ problem (McDonald, 2014), which leads to biased regression coefficients and thus biased inference. This problem identifies a wellknown fundamental limitation of all regression models and occurs when omitting a variable that is both correlated to the response and to the predictor. Our case is different in that the omitted variable is uncorrelated to the variables under investigation, the trait and environmental variables. In addition, the problem we identify is not bias in the regression coefficient but a wrong (too low) standard error of estimate. Instead of viewing it as a case of “omitted variable bias”, we consider it as a case of “ignoring random effects,” which here means ignoring species as a random factor.
The p_{max} procedure was shown via simulations to control the type I error in simple models for traitenvironment association, such as the ones used in the context of fourthcorner correlation (Dray et al., 2014; PeresNeto, Dray & ter Braak, 2016) and GLMM (Jamil et al., 2013; Pollock, Morris & Vesk, 2012). However, the present study assessed these models also under more complex (possibly more realistic) scenarios including multiple latent factors. We show that if both the trait and the environmental variables interact with an important latent environmental variable and an important latent trait, respectively, the control over type I error is faulted. That said, in all simulations we carried out (with up to 100 species and sites), we never observe a type I error rate above 0.10. In even more complex scenarios, however, the inflation could possibly be more severe. At this point, as long as nothing better is proposed, our advice is to divide the nominal level in the p_{max} test by two (e.g., 0.05/2 = 0.025) for species and site numbers between 20 and 100, so as to control empirically the type I error at a level of 0.05. This is akin to what happens in Bonferronicorrection for multiple testing (Verhoeven, Simonsen & McIntyre, 2005). With numbers of both species and sites below 20, based on our results, correction is not necessary. A more precise correction factor for any specific number of species and sites could be determined via simulation of data using models in which the interactions between the traits and the environment are set to zero.
The fact that the p_{max} test does not fully control the type I error in Eq. (5) should not come as a surprise. The test considers only whether there is a link between the species distributions and the environment(Y↔e) and a link between the species distributions and the trait (Y↔t) at the nominal significance level. What these links entail is not part of the theory behind the test (Ter Braak, Cormont & Dray, 2012). For example, by taking a test statistic that is sensitive to main effects, one could easily establish these links. For detecting traitenvironment association, the test statistic should not be sensitive to main effects and be able to detect interactions only. There is a more fundamental issue, however. In the GLM, traitenvironment association is an interaction, but interaction is linkdependent. An extreme example of this is that main effects in a loglinear model are products (i.e., interactions) in a linear model (with identity link). With binary data, a GLM is often used with a logistic link function, generating implicitly another definition of interaction. Seldom is a logarithmic link used as it leads to numerical issues. The complementary log–log link function is the one that is closest to the count scale and is perhaps the most appropriate for traitenvironment modelling. In Eq. (5), trait t and the environmental variable e are both important in the speciessite interaction space of the loglinear model. One could argue that this model shows some form of t–e association, namely when analyzed with any other link function than the logarithmic one. The same would happen the other way round, i.e., when the logarithm in the generating model is replaced by a slightly different function (reality being not precisely loglinear), but the analysis uses the loglink. Hence, model misspecification could lead to different (and possibly wrong) interpretations of traitenvironment interactions.
We made the computation quicker by changing the negativebinomial likelihood to the Poisson likelihood. This did not appear to affect the power of the test much, as observed in the comparison of the anova.glm and sites tests in Fig. 1. We used random permutations in our tests. In future applications, this may need further adaptation as sites may be structured in space (spatial autocorrelation) and time (temporal autocorrelation) and species form a phylogeny (phylogenetic autocorrelation), so that neither sites nor species are really completely independent or exchangeable units. The net effect will be that the effective number of units (i.e., degrees of freedom) is actually smaller than the number observed in the data, likely generating a liberal test when random permutations are used. Possible alternatives for random permutations are restricted permutations (Lapointe & Garland, 2001) or data simulation that keeps the original spatial or phylogenetic structure in data (Wagner & Dray, 2015). Note also that the need of resampling for statistical inference also implies that the standard errors of the coefficients representing traitenvironment interaction as estimated by GLM are likely far too small.
Along with the GLMbased tests for association between a single trait and a single environmental variable, we also carried out the tests using the fourthcorner correlation. They resulted in almost identical pvalues per data set. This finding has led to a companion paper that shows that the squared fourthcorner correlation times the abundance total is a Rao score test statistic on the interaction term in a Poisson loglinear model (Ter Braak, in press). This finding was not totally unexpected because Appendix S1 of Brown et al. (2014) contains a proof of the equivalence between the fourthcorner correlation and GLM when trait and environment are factors (although the test statistics are still different, chisquare in fourthcorner and deviance in GLM). The nearequivalence with quantitative trait and environment can be understood mathematically. Under resampling, the term b_{te}t_{j}e_{i} is small, so that the expected abundance in the loglinear model (6)${\mu}_{ij}={R}_{i}^{\ast}{C}_{j}^{\ast}exp\left({b}_{te}{t}_{j}{e}_{i}\right)\approx {R}_{i}^{\ast}{C}_{\mathrm{j}}^{\ast}\left(1+{b}_{te}{t}_{j}{e}_{i}\right)$ where the latter can be recognised as the reconstitution formula of a (doubly constrained) correspondence analysis (Ter Braak, 2014), which, for a quantitative trait and a quantitative environment variable, is in fact equivalent to the fourthcorner analysis. The test using the fourthcorner correlation is 140 times quicker to compute than the GLMbased test in our R implementation and 1,400 times quicker than the anova.traitglm test. PeresNeto, Dray & ter Braak (2016) already showed that the fourthcorner test is not sensitive to loglinear main effects, thus having power to detect interactions in loglinear models. In the case of single correlations (i.e., pairwise combinations between each trait and each environmental factor), the fourthcorner p_{max} test is convenient and reliable (PeresNeto, Dray & ter Braak, 2016). The GLMbased models were designed for multiple traits and multiple environmental variables. Their advantage and disadvantage relative to rivals such as RLQ, doubly constrained correspondence analysis (Kleyer et al., 2012; Lavorel et al., 1998) and the combination of sitebased and speciesbased redundancy analyses (Kleyer et al., 2012; PeresNeto & Kembel, 2015; Šmilauer & Lepš, 2014) need to be investigated.
Supplemental Information
The simulation models
Mathematical description of the Gaussian response model and loglinear model used in the simulation.
Annotated Rfunctions and scripts
Description of the R functions and script with example output.
Diagnostic plot
DunnSmith residuals of the model ‘site+species+species:Snow’ against the fitted values in the aravo data set.
Rfiles
The zip contains three Rfiles. The file trait_env_Type_I_error_examples.R is the driver script, the other contain functions as explained in Article S2.