Confirmatory path analysis allows researchers to evaluate and compare causal models using observational data. This tool has great value for comparative biologists, who are often unable to gather experimental data on macro-evolutionary hypotheses, but it is cumbersome and error-prone to perform. I introduce

The comparative method is a critical tool to answer macro-evolutionary questions and has been since the start of evolutionary biology itself (

Consider a minimal example, where A causes B and B causes C, i.e., A → B → C. Since there is no direct causal link between A and C, only through B, this causal model predicts that A and C are independent, given B. This prediction can be tested with the regression model
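As a minimal sketch of this logic, the base R simulation below generates data from A → B → C and then checks the predicted conditional independence by regressing C on both B and A. The sample size, effect sizes and seed are arbitrary assumptions chosen for illustration, and ordinary least squares is used only because the simulated observations are independent.

```r
# Simulate the causal chain A -> B -> C (all values are illustrative assumptions).
set.seed(1)
n <- 2000
A <- rnorm(n)
B <- 0.8 * A + rnorm(n)  # B is caused by A
C <- 0.8 * B + rnorm(n)  # C is caused by B only

# Marginally, A and C are correlated because the effect of A flows through B.
marginal_cor <- cor(A, C)

# Conditional on B, the model predicts that A carries no extra information on C.
fit <- lm(C ~ B + A)
coefs <- summary(fit)$coefficients
a_given_b <- coefs["A", "Estimate"]    # expected to be close to zero
p_a_given_b <- coefs["A", "Pr(>|t|)"]  # p-value of the independence claim

print(round(c(marginal_cor = marginal_cor, a_given_b = a_given_b), 3))
```

If the causal model were wrong, for example if A also affected C directly, the coefficient of A in this regression would tend to differ from zero.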

In comparative biology, ordinary regression models cannot be used for path analysis since the assumption of independence of observations is violated, as closely related species are expected to be more similar (

By its nature, PPA is complicated, time consuming and error prone. For the worked exercise in the book chapter outlining the method (

I will illustrate the use of the package by recreating a small part of the analysis by

The data used in the study is included in the package as

Variable | Description |
---|---|
Br | Brain size |
B | Body size |
P | Population density |
L | Litter size |
G | Gestation period |
W | Weaning age |
Status | Vulnerability to extinction, as Red List status |

I start out by defining various relationships common to all causal models. I assume that brain size is caused by body size (a result of allometry), that gestation length is a causal parent of both litter size and weaning age, and that body size is a causal parent of population density, since these are all well-established relationships in the literature. I want to control for allometric effects of body size, and therefore include a direct effect of body size on status and an indirect effect through litter size. I also assume that the population density and life history variables all affect the vulnerability to extinction (which I will refer to as

Since I am interested in testing for direct and indirect effects of brain size, I will vary those effects. Following the original authors, when considering indirect effects, brain size is a causal parent of litter size, gestation period and weaning age. When looking at direct effects, brain size is directly causally linked to status. This then leaves me with four causal hypotheses: a null model where brain size is irrelevant, a model with a direct effect, a model with indirect effects and a model with both.
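To make the four hypotheses concrete, the sketch below writes them down as plain lists of R formulas, where each formula reads as `child ~ parents`. This is bookkeeping in base R only, independent of the package's own model-definition function; the object names `common`, `models` and `n_paths` are hypothetical, and the formulas are my reading of the relationships described above.

```r
# Paths shared by every model, using the variable codes from the table.
common <- c(
  Br ~ B,                     # allometry: body size -> brain size
  P ~ B,                      # body size -> population density
  L ~ B + G,                  # body size and gestation -> litter size
  W ~ G,                      # gestation -> weaning age
  Status ~ P + L + G + W + B  # density, life history and body size -> status
)

# The four causal hypotheses differ only in the effects of brain size.
models <- list(
  null     = common,
  direct   = c(common, Status ~ Br),
  indirect = c(common, L ~ Br, G ~ Br, W ~ Br),
  both     = c(common, Status ~ Br, L ~ Br, G ~ Br, W ~ Br)
)

# Count the directed paths per model as a quick sanity check for typos.
n_paths <- vapply(models, function(fs) {
  sum(vapply(fs, function(f) length(all.vars(f[[3]])), integer(1)))
}, integer(1))
print(n_paths)
```

Counting paths in this way is a cheap complement to plotting: the direct model should have exactly one path more than the null model, and the indirect model three more.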

I define these models using the

It is easy to forget a path, or to make a typo. It is therefore good to make a quick plot to check. You can either plot a single model with, e.g.,

The model set is laid out algorithmically (A) and manually (B).

The nodes are laid out algorithmically. I mimic the lay-out used in the paper by manually defining the coordinates in a `positions <- `

Defining your model set is perhaps the most crucial part of PPA. Since the method is confirmatory and not exploratory, you want to strike a good balance between complexity and interpretability.

`p <- phylo_path(m, red_list, red_list_tree)`

Printing the result gives us some basic information:

```
p
## A phylogenetic path analysis, on the variables:
## Continuous: G W B L P Status Br
## Binary:
##
## Evaluated for these models: null direct indirect both
##
## Containing 36 phylogenetic regressions, of which 18 unique
```

More importantly, asking for its `s <- `

The summary reports the results table as used by

In this example, there is strong support for the indirect pathway. The addition of the direct path in the

So what is the best causal model? Firstly, the null and direct models are not supported since they have significant
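In path analysis of this kind, the fit of a causal model is commonly assessed with Fisher's C statistic, which combines the p-values of the model's conditional independence tests; a significant C suggests that the data contradict the model. The base R sketch below assumes this standard definition, C = -2 Σ ln(p_i) with 2k degrees of freedom for k tests. The helper `fisher_c()` and the example p-values are hypothetical and are not part of the package.

```r
# Combine k independence-test p-values into Fisher's C and its p-value.
fisher_c <- function(p_values) {
  stopifnot(all(p_values > 0 & p_values <= 1))
  k <- length(p_values)
  C <- -2 * sum(log(p_values))
  list(
    C = C,
    df = 2 * k,
    p_value = pchisq(C, df = 2 * k, lower.tail = FALSE)
  )
}

# Two hypothetical independence claims, neither of them rejected on its own.
res <- fisher_c(c(0.5, 0.5))
print(res)

# A single strongly violated claim is enough to make C significant.
res_bad <- fisher_c(c(0.5, 0.0001))
print(res_bad$p_value)
```

Note that the interpretation is the reverse of most significance tests: a small p-value here is evidence against the causal model, not for it.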

After I have found my final model, I can estimate the relative importance of each of the paths. To estimate the paths in the highest ranked model, use the `b <- `

This will return both the standardized regression coefficients, as well as their standard errors. The resulting plot is shown in

A visualization of the best supported causal model, and the standardized path coefficients.

A second way to look at a fitted model is to more directly look at the standardized coefficients and errors of the paths using

Standardized path coefficients and their standard errors, for the best supported model (A) and the average of the top two models (B).

`coef_plot(b, error_bar = "se", order_by = "strength", to = "Status") + ggplot2::coord_flip()`

In many cases it may not be obvious or correct to choose one model. While in this case the two top competing models were nested, they do not have to be. In cases like these, it may be useful to perform model averaging instead, as discussed and used in the original paper (

In this case, I could choose to average the two competing models. I use full averaging, as I would like uncertain paths to experience shrinkage, and re-evaluate the strength of the coefficients toward `avg <- `
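The difference between full and conditional averaging can be sketched in a few lines of base R. Under full averaging, a path that is absent from a model counts as a coefficient of zero, so paths that appear only in weakly supported models are shrunk toward zero; under conditional averaging, only the models that contain the path are averaged. The model weights and the coefficient below are hypothetical values chosen for illustration, not results from the analysis.

```r
# Hypothetical model weights for the two competing models (summing to 1).
weights <- c(indirect = 0.7, both = 0.3)

# Hypothetical standardized coefficient of the direct path Br -> Status,
# which exists only in the "both" model (NA marks an absent path).
coef_direct <- c(indirect = NA, both = 0.10)

# Full averaging: absent paths contribute a coefficient of zero.
coef_full <- ifelse(is.na(coef_direct), 0, coef_direct)
full_avg <- sum(weights * coef_full) / sum(weights)

# Conditional averaging: average only over models that contain the path.
conditional_avg <- weighted.mean(coef_direct, weights, na.rm = TRUE)

print(c(full = full_avg, conditional = conditional_avg))
```

With these numbers the full average is 0.03 and the conditional average is 0.10, which shows how full averaging pulls a poorly supported path toward zero.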

The

A clear rejection of the null model indicates that brain size is related to the vulnerability to extinction of mammals, where large-brained animals have a higher vulnerability. This effect is mediated through life history, where the weaning and gestation periods are more important than litter size. There is no strong evidence in support of a direct effect of brain size on vulnerability to extinction that is independent of life history. The original analysis came to the same conclusion.

Both continuous and binary data can be included in path analyses performed with

For example, suppose that instead of having actual body sizes, I only knew whether the animals are small or large. Below I make this new variable, and again run the same `red_list2 <- `

Printing now shows:
```
## A phylogenetic path analysis, on the variables:
## Continuous: G W L P Status Br
## Binary: B
##
## Evaluated for these models: null direct indirect both
##
## Containing 36 phylogenetic regressions, of which 18 unique
```

This confirms that body size is now modeled as a binary variable. All following analyses will take this into account automatically. Note that path estimates toward binary variables are on a logit scale.
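To help with reading a coefficient on the logit scale, the base R sketch below converts a hypothetical path estimate into an odds ratio and into a change in probability. The coefficient of 1.2 and the intercept of 0 are assumptions made for illustration only, not estimates from the analysis.

```r
# Hypothetical path coefficient toward a binary variable, on the logit scale.
logit_coef <- 1.2

# On the odds scale, a one-unit increase in the predictor multiplies the odds.
odds_ratio <- exp(logit_coef)

# On the probability scale, the effect depends on the starting point.
# Assuming an intercept of 0, the baseline probability is plogis(0) = 0.5.
p_baseline <- plogis(0)
p_plus_one <- plogis(0 + logit_coef)

print(round(c(odds_ratio = odds_ratio, p_baseline = p_baseline,
              p_plus_one = p_plus_one), 3))
```

Because the logit transformation is non-linear, the same coefficient implies a smaller change in probability when the baseline probability is close to 0 or 1.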

Using a variable with more than two levels is not supported and will result in an error.

The estimated phylogenetic parameter can be found in the `p`

In addition to the functions outlined above, several lower level functions are also available to the user, specifically

Furthermore, the

I have presented

I thank Alejandro Gonzalez-Voyer and Achaz von Hardenberg for their help during the development of the package. I thank Niclas Kolm and Alejandro Gonzalez-Voyer for their helpful comments on the manuscript and their support.

The author declares that he has no competing interests.

The following information was supplied regarding data availability:

All code and data are available in the R package: