0
How meaningful can such a parsimony strict consensus tree be?

Strict consensus trees based on a sample of equally parsimonious trees are still the standard in palaeophylogenetics. However, given the results of this analysis (an unknown number of inferred, 12,000+ steps long MPTs; a CI of approximately 0, meaning essentially all scored traits are homoplasious and synapomorphies very rare; and an RI of 0.59), the taxon-pruned (for hand-selected rogues) strict consensus tree cannot come close to depicting the actual differentiation signal in the matrix.

Alternative ways to summarise MPT samples include the Adams consensus tree and the consensus network (Holland & Moulton, in Algorithms in Bioinformatics: Third International Workshop, WABI, Budapest, Hungary, Proceedings, p. 165–176; implemented in the free Java program SplitsTree: www.splitstree.org). Why not give them a try?

Neighbour-nets (Bryant & Moulton, Mol. Biol. Evol. 21: 255–265; also implemented in SplitsTree), focussing on certain evolutionary lineages or phylogenetic neighbourhoods, may help to assess how much of the tree structure relates to overall similarity/dissimilarity and allow a quick assessment of the coherence of assumed clades and their topological alternatives. Note that at this point the complete matrix is much too gappy to infer a sensible all-inclusive distance matrix (404 out of 505 OTUs have a proportion of missing data >50%, and 120 lack more than 90% of the scored characters), which may also explain the extremely weak standard parsimony result. The focus taxon has only 62% missing data; its 266 defined characters should be enough to infer its phylogenetic position.
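As a quick screen for which OTUs could sensibly enter a distance-based analysis, the missing-data proportions mentioned above can be tabulated in a few lines. This is only a sketch: the matrix format (a plain dict of OTU-to-state strings with '?' or '-' marking missing cells) and all taxon names are assumptions for illustration, not the format of the actual NEXUS file.

```python
# Sketch: per-OTU missing-data proportions for a simple character matrix.

def missing_proportions(matrix):
    """Return {OTU: fraction of cells scored as missing} for a dict of
    OTU -> string of character states, where '?' and '-' mark missing."""
    return {otu: sum(c in "?-" for c in row) / len(row)
            for otu, row in matrix.items()}

# Toy matrix, three OTUs x ten characters (names hypothetical):
toy = {
    "Taxon_A": "0110??----",   # 6 of 10 cells missing
    "Taxon_B": "0110011010",   # complete
    "Taxon_C": "?????????0",   # 9 of 10 missing
}
props = missing_proportions(toy)
# OTUs too gappy for a distance matrix, by the >50% criterion above:
print(sorted(otu for otu, p in props.items() if p > 0.5))
```

The same loop, run over a real matrix, would reproduce the 404-of-505 count quoted above.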

Any phylogenetic matrix used to infer trees, especially one with these dimensions and properties, should be subjected to some branch support assessment (a Bayesian analysis or ML or LS/NJ bootstrapping would have been quick to do, or quick-and-dirty parsimony bootstrapping; Müller, BMC Evol Biol 5:58, 2005). Evolution cannot be overly parsimonious, noting the apparent scarcity of clade-unique traits (hence a CI ~0). Establishing branch and character support is a common standard in neontological phylogenetics; it needs to be applied in palaeontology, too (even when we cannot hope to get high values).

Another underexplored angle would be to divide the taxon sets into time slices and first establish relationships within a time slice before trying to connect the time slices.

The signal from morphological matrices, especially extremely gappy ones like this one, is rarely very tree-like and surely not trivial. The matrix is great, the fossil as well, but what information is provided by a naked parsimony strict consensus cladogram (one lacking branch support, and without branch lengths reflecting the inferred number of changes) with virtually no data-signal consistency?

2 Answers
3
Accepted answer

In addition to David's answer, I'd like to say that I agree a strict consensus tree is a poor representation of the actual phylogenetic signal. This is why over one hundred alternative topologies were listed in the discussion with the number of additional steps necessary to constrain them. My goal was to help the reader understand which hypotheses were likely given the dataset and which ones would require a significant number of newly recognized characters to support. In this way, our topology is better seen as a hierarchy of more and less probable alternatives instead of a single 'new correct cladogram'. A Bootstrap or Bremer diagram would have been relatively useless with so many fragmentary OTUs able to move a number of nodes in a single step. Using constraint tree lengths bypassed the issue. That being said, it would be interesting to perform a Bayesian analysis on the matrix, and such is planned for the next iteration.

0

I really recommend you try out the consensus network then, too. They also work for summarising Bayesian tree samples: http://dx.doi.org/10.1016/j.revpalbo.2009.08.007 (see also: https://phylonetworks.blogspot.com/2013/01/we-should-present-bayesian-phylogenetic.html and https://phylonetworks.blogspot.com/2018/01/summarizing-non-trivial-bayesian-tree.html), as well as the supertree/-network approaches for tree samples with different sets of tips (see my answer to David above regarding the point about "arbitrary filtering"). The matrix may be a Swiss cheese, but since it has 700 characters, even smallish subsets with good taxon coverage, or taxon sets relying on proposed relationships, may give quite resolved trees, e.g. including only taxa of clade xx or clade xy, taxa close to the root, the oldest taxa, the most derived taxa, body-part-filtered subsets, etc.

Furthermore, I wonder if, for a matrix like this, the evolutionary placement algorithm...

Berger SA, Stamatakis A. 2010. Accuracy of morphology-based phylogenetic fossil placement under Maximum Likelihood. IEEE/ACS International Conference on Computer Systems and Applications (AICCSA). Hammamet: IEEE. p. 1-9; an application (just one fossil): http://dx.doi.org/10.1186/s12862-015-0400-7,

... would not be worth a try. Why not infer a backbone tree using only well-covered taxa, then optimise the position of all others using the EPA, and compare this result with the all-inclusive reconstruction?

During my time I re-analysed quite a bunch of (partly nasty, signal-wise) morphological datasets out of curiosity, hence I know that what is true for molecular datasets (PP (strongly) overestimate branch support while BS underestimate it) doesn't apply to the same degree to non-molecular data sets: one finds branches with PP < ML-BS, e.g. when A is (morphologically) the ancestor of B and C, you may get split support for both. See also this paper by Zander using binary data:

Zander RH. 2004. Minimal values of reliability of Bootstrap and Jackknife proportions, Decay index, and Bayesian posterior probability. PhyloInformatics 2:1-13.

Hence, I would always run an ML/BS analysis in addition to the Bayesian analysis. For the full data set, classic RAxML needs (using three CPU cores) about 160–320 s per BS pseudoreplicate under Lewis' standard Mk + Gamma model (RAxML-NG may be faster).

You read in an (extended) PHYLIP export (I did this with Mesquite using your NEXUS file); the command line is:

raxmlHPC-PTHREADS-SSE3 -T 3 -s Hesperonithoides.epf -m MULTIGAMMA -K MK -n Hesper.MK -f a -x 12321 -# autoMRE -p 11111

To speed up, you can choose -m MULTICAT (-x and -p are arbitrary seed numbers; -# autoMRE invokes the extended majority-rule bootstopping criterion).

Some technical notes in case needed: I'm not sure about MrBayes (I haven't used it for some time), but polymorphisms used to/had to be treated as missing data, as they are in RAxML for categorical data. There are relatively few polymorphic cells in your matrix, so I would not expect this to distort the analysis too much.

Regarding consensus networks to sum up the Bayesian tree sample: SplitsTree is limited to a maximum of 4 GB of RAM, so make sure you don't oversample the MCMCMC chain; up to 10,000 sampled Bayesian trees with 500 tips usually still work for thresholds such as 0.20 or 0.33.
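One simple way to stay under such a memory limit is to thin the tree sample to a fixed size before importing it into SplitsTree. The sketch below assumes one NEWICK tree per line (roughly what a MrBayes .t file yields once the NEXUS wrapper is stripped); the function name, the 10,000-tree target and the stand-in tree strings are all illustrative.

```python
# Sketch: thin an oversampled Bayesian tree sample before import.

def thin_trees(lines, target=10000):
    """Keep at most `target` trees by taking every n-th line."""
    step = max(1, len(lines) // target)
    return lines[::step][:target]

# Stand-in for a 50,000-tree MCMC sample:
trees = [f"(A,(B,C)t{i});" for i in range(50000)]
subsample = thin_trees(trees)
print(len(subsample))
```

Thinning evenly across the chain (rather than taking the first N trees) keeps the subsample representative of the whole post-burn-in sample.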

I also got the feeling that, to better sample the tree space with non-trivial non-molecular data, it's better to run e.g. 10 independent runs with no heated chains instead of using the standard set-up, as recommended in an early manual of MrBayes for multistate data (one observation I made back then is that plateaus are reached quickly but are quite rough).

0

The performance of maximum likelihood with paleontological datasets is probably horrible, judging from how it deals with missing data otherwise:

Simmons MP. 2011a (printed 2012). Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics 28(2):208–222. https://doi.org/10.1111/j.1096-0031.2011.00375.x

Simmons MP. 2011b (also printed 2012). Radical instability and spurious branch support by likelihood when applied to matrices with non-random distributions of missing data. Molecular Phylogenetics and Evolution 62(1):472–484. https://doi.org/10.1016/j.ympev.2011.10.017

MrBayes can deal with polymorphism and partial uncertainty now (though I don't know if it just treats them as missing data, which would be the wrong thing to do). The only thing it can't deal with is step matrices.

"Why not infer a backbone tree using only well-covered taxa, and then optimise the position of all others using EPA, and compare this result with the all-inclusive reconstruction."

I don't understand what for – in effect this would be an unsystematic test of the EPA. It is well known, in the literature and my own experience, that every taxon in a (paleontological) matrix can have an unpredictable influence on the position(s) of every other taxon. A tree made only from the well-covered taxa would exclude known information and therefore almost certainly be wronger than the best we can do. This is why the matrix of this paper includes practically every taxon and unnamed specimen that doesn't fall victim to Safe Taxonomic Reduction, i.e. scores identically to another terminal taxon except for the distribution of missing data, and why the authors deleted the rogue taxa from the trees after the analysis to make the trees legible, rather than before.
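The Safe Taxonomic Reduction criterion described here (identical scores except for the distribution of missing data) is easy to state as code: taxon A is a candidate for safe deletion if some taxon B agrees with A in every character for which A is scored, while A has strictly more missing data. This is a simplified sketch on a toy dict-based matrix, not the procedure actually run on the paper's matrix.

```python
# Sketch of the Safe Taxonomic Reduction check described above.

def str_redundant(matrix):
    """Return pairs (a, b) where a agrees with b wherever a is scored
    and a has strictly more missing data, so a can be safely deleted."""
    pairs = []
    for a, row_a in matrix.items():
        for b, row_b in matrix.items():
            if a == b:
                continue
            agrees = all(ca == "?" or ca == cb
                         for ca, cb in zip(row_a, row_b))
            if agrees and row_a.count("?") > row_b.count("?"):
                pairs.append((a, b))
    return pairs

# Toy matrix (names hypothetical):
toy = {
    "Complete":    "0110011",
    "Fragment":    "01?00??",   # differs from Complete only in missing data
    "Conflicting": "11?00??",   # disagrees in character 1: must be kept
}
print(str_redundant(toy))
```

Only "Fragment" is flagged; "Conflicting", despite being just as gappy, carries a real conflicting scoring and so survives the reduction.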

0

Well, I only read one of the Simmons papers a long while back (no access to paywalled literature anymore), but I can't remember that he actually demonstrated that maximum parsimony performs any less horribly.

The argument about "spurious branch support" is funny, since very few parsimony palaeontological trees show any branch support at all. The data sets I looked at had many branches (in the strict consensus trees) with no support whatsoever, and some matrices provided signal for topological alternatives that received ample support from both MP and ML but were not present in the originally published naked strict consensus trees.

Leaving aside Simmons' fight for parsimony: from the comparison with molecular data we have learned two things: a) evolution is simply not parsimonious; b) branch lengths, and the probability to change, play an important role (effectively this is what TNT's iterative weighting procedure makes use of): a tip that is far away from the root (temporally or phylogenetically) has a higher probability of accumulating derived traits (shared apomorphies, homoiologies) than one very close to the root.

Because the processes are pretty much unpredictable for morphological traits, where we also usually have no model of evolution (and may lack neutrally evolved traits), it's impossible to set up a simulation of multistate or binary data that matches real-world palaeontological data sets.

Just do it for whatever dataset you have: do the parsimony bootstrapping and the ML bootstrapping, and correlate both (this can be done with e.g. RAxML). See how both support the branches in the consensus tree, but also which alternatives they have. My guess is that there are situations (especially in the root-proximal region of the tree) where MP will actually outperform ML (or Bayesian inference) and others (the further we go towards the leaves and the larger our clades become) where it is the opposite: it all depends on the signal, and what Felsenstein (2004) wrote in Inferring Phylogenies still holds: parsimony is only statistically consistent when the rate of change is low. Can we assume that for morphological data sets in general, and simply wave away the application of any method other than parsimony to infer phylogenetic relationships?

For MrBayes (and RAxML), in case you want to preserve the polymorphisms although they are not considered during the run (for molecular data, polymorphisms were treated as missing data until MrBayes 3.2, launched in 2012), binarising may be an option. Without counterweighting it may amount to overweighting, but if there is good signal in the polymorphisms, it's the proper way to do it. I don't think the overweighting really matters a lot (again talking about BS support of competing topologies, not necessarily the consensus tree). Binarisation also allows implementing step matrices.
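The binarisation idea can be sketched as follows: each multistate character becomes one presence/absence column per state, so a polymorphic cell contributes a 1 to every state it contains instead of being discarded as missing. The input conventions here ('&' separating polymorphic states, '?' for missing) are assumptions for illustration and do not match any particular program's required format.

```python
# Sketch: recode one multistate character into 0/1 presence columns,
# keeping the information in polymorphic cells such as "0&2".

def binarise(column, states):
    """One 0/1 presence column per state; '?' stays missing in every column."""
    out = {s: [] for s in states}
    for cell in column:
        observed = None if cell == "?" else set(cell.split("&"))
        for s in states:
            if observed is None:
                out[s].append("?")
            else:
                out[s].append("1" if s in observed else "0")
    return out

# A character scored 0, 2, polymorphic 0&2, unknown, and 1:
col = ["0", "2", "0&2", "?", "1"]
print(binarise(col, states=["0", "1", "2"]))
```

Note the overweighting concern mentioned above: one three-state character becomes three binary ones, so some counterweighting scheme may be desirable.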

The EPA will a) already give you an idea of how roguish a taxon is (rogues will have many insertion points with low probability); and b) avoid the problem of "unpredictable influence on the positions of every other taxon". It works with any input tree you fancy. In the old version, one could also do a parsimony mapping, but the authors found that this is usually a bad idea (since one step never equals one step elsewhere), so they got rid of it.

The standard Mk is not that different from the logic behind parsimony, but the good thing is that it is not branch-length ignorant like parsimony.

"Unpredictable influence on the positions of every other taxon" is indeed a problem and, as far as I can tell, quite ignored in the palaeontological literature (although papers spend a lot of time saying what happens if). I see the same basic problem here as for taxon subsampling in general: if adding or removing a taxon (or changing the outgroup) triggers massive topological changes, the matrix's signal has a first-level problem that should not just be ignored but explored. Most rogues, however, just collapse the (strict) consensus tree but hardly affect the core structure of the tree, the branch support patterns, provided your data matrix provides some coherent tree-like signal.

3
Accepted answer

All else being equal, the CI is inversely proportional to the size of the matrix. The CI found here is not surprising at all. This is why the RI and the RC were invented.
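For readers unfamiliar with the indices, the relationship can be made concrete with the textbook formulas CI = M/S and RI = (G - S)/(G - M), where M is the minimum conceivable number of steps, S the observed number on the tree, and G the maximum (the steps on a completely unresolved bush). The numbers below are purely illustrative, not taken from the matrix under discussion:

```python
# Worked example of the ensemble indices, textbook definitions only.

def ci(m, s):
    """Consistency index: minimum conceivable steps / observed steps."""
    return m / s

def ri(m, s, g):
    """Retention index: fraction of possible extra steps (homoplasy) avoided."""
    return (g - s) / (g - m)

def rc(m, s, g):
    """Rescaled consistency index."""
    return ci(m, s) * ri(m, s, g)

# Hypothetical values: as S grows with matrix size while M stays fixed,
# the CI shrinks towards 0 even though the RI can stay moderate.
M, S, G = 300, 12000, 19000
print(ci(M, S))      # small: many extra steps on the tree
print(ri(M, S, G))   # moderate: not all shared similarity is homoplastic
```

Doubling S (a bigger matrix, same minimum) halves the CI while the RI moves far less, which is exactly the size dependence noted above.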

A low CI does not at all mean that synapomorphies are rare! It only means that <b>unique</b> synapomorphies are rare – that characters which change states less often than their number of states are rare. Given the size of the taxon sample, that had to be expected. If the CI were much higher, I would have to seriously question the quality of the coding and scoring.

However, the fact that the total number of most parsimonious trees remains unknown does mean that the strict consensus may be inaccurate. (It is not necessarily.)

The Adams consensus is great for identifying wildcard taxa (whether terminal taxa or larger clades), which is its originally intended purpose as far as I know. It is not meant to represent any kind of stable subtree; indeed, it routinely contains "clades" that don't occur in any MPT. I have used Adams consensus trees to great effect for identifying wildcards in my own work, but have not published them, because they simply aren't phylogenetic trees.

The behavior of Bayesian inference with matrices whose missing data are distributed neither randomly nor by rate of evolution has never been tested in a quantified way and continues to be poorly understood. Anecdotally, it seems to me that it takes some of the mistakes parsimony makes and doubles or triples down on them till they get posterior probabilities > 0.9. For a more detailed discussion with references, please see my paper: https://peerj.com/articles/5565

"which also can explain the extremely weak standard parsimony result"

Well, sort of. The more characters are scored for a taxon, the higher the probability that enough important apomorphies will be scored to place it; but wildcard behavior is not well correlated to the proportion of missing data. In the paper mentioned above, some complete skeletons had unstable positions, while even partial isolated lower jaws usually had fully resolved ones.

"Another underexplored angle would be to divide the taxon sets into time slices and first establish relationships <i>within</i> a time slice before trying to connect the time slices."

...Why would anybody do that? As far as I can see, this amounts to nothing but an arbitrary exclusion of taxa, repeated for a few dozen time slices.

"tree without branch-length reflecting the inferred number of changes"

I don't know about TNT, but if you load the matrix and the tree in PAUP* and tell it to display the tree with branch lengths, it'll do so. With a tree this size it might take a few seconds.

0

It's been some time since I read Hennig's book, but isn't uniqueness a necessary property of a synapomorphy? I learned that a synapomorphy is a uniquely derived trait shared by all members of a monophylum (hence quite rare, especially when we combine taxa of different ages). When you have a character that changes once from 0 -> 1 and then, within the state-1 subtree, to 2 (i.e. two Dollo-like mutations), you have one symplesiomorphy and one synapomorphy, not two synapomorphies. Values like the CI were invented to see how well the data fit the tree. Expected or not, if it converges to zero, it means there's a poor fit.

Adams vs. strict: I think we can agree that the high number of MPTs relates directly to rogue taxa (I suppose that is the same as wildcard taxa); so, given that taxa are excluded because they mess up the tree, the Adams consensus tree is clearly the better choice.

But what is your argument against summarising the MPT sample using consensus networks? (Noting that a strict consensus network would possibly hit the memory constraints of SplitsTree in this case.)

Note that just because a branch is found in all MPTs, this doesn't mean it best reflects the differentiation signal in the matrix. Strict consensus trees, too, can show wrong branching patterns, e.g. if there are long-branch attraction issues (or, more likely in palaeomorphological datasets, short-branch culling due to a lack of discriminating signal in the matrix). If a branch in a strict consensus tree has virtually no character/branch support, it should not be viewed in the same way as one that has, e.g., a BS > 40 (which can be a lot for a matrix with 72% entirely undefined or polymorphic cells).

time-slice subsets – Why is this an "arbitrary exclusion"? All molecular trees (those not including ancient DNA samples) represent a time-slice taxon set.

I know that many palaeontologists reject the notion of ancestors in the fossil record, but if you have an old primitive form together with its (much) more derived modern counterparts, it may act as a rogue, since it is equally related to all its descendants (whether they are direct or just form-wise descendants) but even more so to early, equally underived sister lineages. Using the strict consensus tree of all MPTs, you collapse inferred one-ancestor-to-several-descendants relationships into polytomies. Plus, you invite branch-attraction effects: primitive members of different lineages are drawn together, and derived ones may trigger long-branch attraction. By limiting your taxon set to a single time-slice, you eliminate such time-related topological effects. In addition, there may be convergent developments at different times (just take our Osmundaceae dataset here on PeerJ, https://peerj.com/articles/3433/, which has far fewer characters but equally complex signals).

Let's say you split the taxon set into arbitrary time frames and infer a tree for each. Using superconsensus tree or network approaches, one can then summarise those time-restricted trees. Why not just give it a try? I think the gappiness of the data is a strong argument for inferring more than one tree and summing them up.

In general, I don't follow the argument against filtering. If the (let's say) strict consensus tree only has true branches, i.e. branches reflecting the actual lineage splitting over space and time, then each taxon-subset tree should show those branches, or at least not show conflicting alternatives with higher branch support. If it does, the analysis (and the strict consensus tree) has a problem.

"the tree with branch lengths, it'll do so. With a tree this size it might take a few seconds." – So why not just show it in the papers? Why use a cladogram if you can show a phylogram? Wouldn't it support the placement hypothesis if the reader could directly inspect the branch lengths of the focal subtree?

2

At least as commonly used in vertebrate paleontology, a synapomorphy doesn't need to be unique nor shared by all members of a clade. So flight could be a synapomorphy of a group of theropods, even if also found in bats and if lost in taxa like ostriches. What counts as a symplesiomorphy versus a synapomorphy depends on your point of reference. So within the crown clade of living birds, flight is a symplesiomorphy, but if you go back far enough in the Mesozoic flight is a synapomorphy of a certain clade of theropods.

Regarding Adams consensus trees, they place unstable taxa at the base of the largest clade they can belong to. The problem is that you can't tell if a taxon is actually resolved as being at that basal level or whether it could go in multiple more derived positions but is being placed basally due to the Adams consensus rules.

0

Well, we relax a lot of concepts to match the complexity of what we (can) observe.

I do understand the problem with the Adams consensus (this would need to be considered in the interpretation), but I don't see what you gain by using the strict consensus. E.g. when your data clearly show A + (B + C) but D is unstable in that subtree, the strict consensus shows an A-B-C-D polytomy; hence, you lose the information about how A, B and C relate to each other. The Adams consensus will retain this information.

Anyway, this is a further argument for changing to consensus networks in general, because they not only provide this information but also inform you about the interactions between different taxa when it comes to the preferred topology or its equally good alternatives.

1

"It's some time I read Hennig's book, but isn't uniqueness a necessary property of a synapomorphy?"

Few people these days have read Hennig's book, for the same reason that few have read any of Darwin's: they're only interesting as part of the history of science. Except for a few quotes found elsewhere, I haven't read them myself.

It is easily possible that Hennig defined synapomorphies by uniqueness. After all, he believed that homoplasy does not exist – that if you get any in your tree, you need to redo your entire matrix from scratch, because you've mistaken something merely similar as fully identical. In the molecular realm, we now know that this is complete nonsense, and it follows that the same holds for morphology.

Personally, I do follow a terminological distinction Hennig made: one clade has autapomorphies (auto- = self), two sister-groups have synapomorphies (syn- = together). But this level of precision is hardly ever needed; I think Hennig had a background of German Idealism and liked to invent terminology for the fun of it.

"what is your argument against summarising the MPT sample using consensus networks?"

I, for one, don't have any; I'm not familiar with consensus networks and need to look into them.

That said, I don't think the fossil record of the taxa sampled in this paper is dense enough to contain any traces of reticulate evolution. If all Paleo- and Eocene birds were sampled, that could be different.

"long-branch attraction issues"

Absolutely, these are real; parsimony is more sensitive to them than model-based methods; and while they're much more common with molecular than with morphological data, they do exist in the latter, too. The best that can be done about this problem is dense sampling of taxa and characters to break up long branches.

"time-slice subsets – Why is this an "arbitrary exclusion"? All molecular trees (those not including ancient DNA samples) represent a time-slice taxon set."

...But that's a bug, not a feature! Whenever molecular phylogeneticists have ancient DNA at their disposal, they use it. They just almost never have any, so they do the best they can and sample the extant taxa.

In principle, it might be interesting to do a phylogenetic analysis of a time slice in the past in order to test how hopeless the task of the molécularistes is. But the fossil data aren't good enough for such a test.

"many palaeontologist reject the notion of ancestors in the fossil record"

This is a misunderstanding.

1) Compared to the diversity of life today, our sample of any other time slice is so small that, statistically, we should expect to find few if any ancestors of any known taxon except in exceptional cases (say large mammals of the Late Pleistocene, or Pliocene midocean diatoms).

2) Outside such exceptional cases, every time a fossil has been proposed to be an ancestor, even if it was explicitly claimed to lack autapomorphies, it was later found to have autapomorphies after all. (Archaeopteryx is the most prominent example.) Obviously, autapomorphies aren't proof that a taxon died without issue, but they do make that hypothesis more parsimonious.

3) If we don't find autapomorphies – which, again, is actually really rare outside the mentioned kinds of exceptional cases –, we can always blame preservation: perhaps all the autapomorphies were in the soft anatomy or just the DNA and aren't preserved.

Zero-length terminal branches are not common in morphological phylogenetic analyses, even though most matrices are made for parsimony and therefore don't code known unique autapomorphies of terminal taxa.

"By limiting your taxon set to same time-slices, you eliminate such time-related topological effects."

But you invite long-branch attraction by refusing to break up the long branches you have.

"If the (let's say) strict consensus tree only has true branches, i.e. reflecting the actual lineage splitting over space and time, then each taxon subset tree would need to show those branches. Or, at least, don't show conflicting alternatives with higher branch support. If it does, the analysis (and the strict consensus tree) has a problem."

The problem is character conflict, which is real. Trees made from subsets of the taxa will not show us which signal is phylogenetic and which are homoplastic; to the contrary, if the phylogenetic signal isn't much stronger than all the noise, they will suffer from accidental sampling bias and end up preferring one of the homoplastic signals over the phylogenetic one just by chance. I've expanded on this in the introduction to my paper cited above.

The solution to noise is to add more noise. In the immortal words of Thomas Holtz, the signal adds up, the noise cancels itself out – if and only if there is enough noise to do statistics with. If your matrix is too small and the phylogenetic signal in it is not unrealistically strong, you run a serious risk of accidental sampling bias.
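Holtz's dictum can be illustrated with an exact binomial calculation: if each character independently favours the true split with some probability just above chance, the probability that a simple character majority recovers that split grows with the number of characters. The value p = 0.55 and the character counts below are arbitrary illustrations, not estimates for any real matrix.

```python
# Exact binomial version of "the signal adds up, the noise cancels out":
# the chance that a strict majority of n independent characters favours
# the true split, when each favours it with probability p.

from math import comb

def majority_correct(n, p):
    """P(strictly more than half of n independent characters are 'right')."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (11, 101, 701):
    print(n, round(majority_correct(n, 0.55), 3))
```

With p = 0.5 (pure noise) the majority stays at coin-flip accuracy no matter how many characters are added; the accumulation only works when some real signal is present, which is the "if and only if" caveat above.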

"Why using a cladogram if you can show a phylogram?"

Probably just out of tradition. I'm not sure if parsimony branch lengths have ever been published in paleontology.

The tradition does have a reason, though: we know how small our matrices are. Add a few characters, and the lengths would be noticeably different. We also know how gappy our matrices are: if we had all the data we lack, almost all branches would be longer, but not all by the same or any other predictable amount. Given that the branch lengths are (generally) more sensitive to these problems than the topology, it makes less sense to publish them.

Model-based approaches try to compensate for all this. They put out branch lengths in expected changes per character, not in actually observed ones. Therefore, when vertebrate paleontologists run Bayesian analyses, we do publish these calculated branch lengths (not only because that's how Bayesian trees are published elsewhere). How well this attempted compensation really works with paleontological datasets has not been tested.

0

"Eg. when your data clearly shows A + (B + C) but D is unstable in that subtree, the strict consensus shows and A-B-C-D polytomy, hence, you lost the information about how A, B and C relate to each other. The Adams will have this information."

Well, sort of: it will show A, (B + C) and D as a trichotomy, but it won't tell you whether D can be inside (B + C), and it won't tell you if D can be the sister-group of (A + (B + C)) among other things.

This is why I published neither strict nor Adams consensus trees in my cited paper, but actually looked at a representative sample of MPTs and made tree figures that show exactly where each taxon can and cannot go. However, that was a lot of work, even though I never got beyond 158 taxa or 52,000 MPTs.

1

Then you will love consensus networks, because this is what they do.

You read in the MPT sample; SplitsTree can read NEXUS and NEWICK-formatted files. NEWICK causes fewer problems; NEXUS needs to be in old-NEXUS format with no annotations or comments, otherwise there may be import glitches. Using PAUP*'s "savetrees" has worked fine so far as export (as long as the taxon names contain no special characters, just standard Latin letters, digits, and underscores).

When you choose a threshold of 0, you get a graph that shows all topological alternatives found in the MPT sample. When you increase the threshold, the consensus network only shows the bipartitions that have a corresponding frequency in the MPT sample. This may be necessary for large tree samples with many leaves and very little topological consistency (or many rogues), because you otherwise hit the RAM restrictions of the software.
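The thresholding described above boils down to counting bipartition (split) frequencies across the tree sample. In the sketch below each tree is represented directly as a set of splits; extracting splits from the actual NEWICK files, as SplitsTree does internally, is omitted, and all names are illustrative.

```python
# Sketch: the split-frequency filter behind a consensus network threshold.

from collections import Counter

def consensus_splits(trees, threshold=0.0):
    """Return {split: frequency} for splits occurring in more than
    `threshold` of the input trees (each tree = a set of splits)."""
    counts = Counter(split for tree in trees for split in tree)
    n = len(trees)
    return {split: c / n for split, c in counts.items() if c / n > threshold}

# Three toy trees on taxa A-D, each reduced to its one non-trivial split:
t1 = {frozenset("AB")}   # ((A,B),(C,D))
t2 = {frozenset("AB")}
t3 = {frozenset("AC")}   # ((A,C),(B,D))
print(consensus_splits([t1, t2, t3], threshold=0.0))   # both alternatives kept
print(consensus_splits([t1, t2, t3], threshold=0.5))   # only the A|B split
```

At threshold 0 both conflicting splits survive and would be drawn as a box in the network; at 0.5 only the majority split remains, which is exactly how raising the threshold prunes the graph.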

Examples of strict consensus networks based on MPT samples can be found here:

http://phylonetworks.blogspot.com/2018/04/the-curious-cases-of-tree-like-matrices.html

https://phylonetworks.blogspot.com/2017/10/networks-not-trees-identify-weak-spots.html

http://phylonetworks.blogspot.com/2017/08/more-non-treelike-data-forced-into.html

A very theoretical example is included here: https://phylonetworks.blogspot.com/2019/02/should-we-bother-about-character.html

A simple example with one rogue illustrating the difference between strict consensus tree and network can be found in fig. 2 in this post: http://phylonetworks.blogspot.com/2017/10/clades-cladograms-cladistics-and-why.html

0

A few more points:

Probability of finding ancestors: because the sample is very patchy and, especially with large animals, often singular (one fully preserved skeleton), you will always have a high likelihood of finding an autapomorphy, simply because one has no idea about, and cannot test, whether this trait was typical for the entire species (we have stable and unstable morphs within a single biological species), for its sister species, sister genera, or related higher taxa at the time, before, and after.

From the perspective of somebody who worked with plants and foraminifers (the latter have very continuous fossil records reflecting evolutionary change in situ over time, including unique traits of ancestors, i.e. their autapomorphies as a taxon, getting lost in all their descendants), I find it hard to believe that the fossil record should have preserved only the dead ends, not the lineages that led to past and modern diversity.

Reconstruction-wise, I see no reason to object to what Felsenstein wrote in his book: we should not only consider that some fossils represent ancestral forms, we should be encouraged to do so.

Fossil A should not be placed as sister of B + C if its morphology is the primitive form from which B and C can be derived, but as their ancestor, because this reflects the evolutionary pathway/process; at that point it is irrelevant whether fossil A actually represents the source species/population from which B and C evolved, or a sister species/population.

Time-slices: long-branch attraction is only a major issue for parsimony (and distance-based trees), less so for ML, but the idea of time-slicing is a different one. By taking away true or false ancestor-descendant groupings, you can see how the lineages that actually coexisted relate to each other, and, having the background of the overall relationships, you can easily see at which point LBA steps in. This could eventually allow you to define the trait sets that force wrong clades.

Also, LBA has rarely been studied for all-inclusive matrices. Temporal convergences are also a reality when your data set covers a substantial amount of time (as in the Osmundaceae, where an earlier standard phylogenetic study relying on a strict consensus tree included some LBA-triggered clades). If one plays around with different taxon sets, one can get a grip on them, as we did for the Osmundaceae here on PeerJ.

"The problem is character conflict, which is real." is why I very early stopped using trees and turned to networks. Even dichotomous evolution, with no reticulation involved, can produce data with non-treelike signal riddled with internal incompatibility. Trees cannot handle or depict data incompatibility, as they have only one dimension. The neighbour-net is a simple (maybe even crude) method to handle incompatible data; this is what it was designed for. Consensus networks depict competing topological alternatives, alternatives that may be supported by parts of the data and rejected by others. A parsimony tree is not designed to cope with such data properties, it assumes that there are always more character splits compatible with the true tree than not (in the latter context, see also Scotland & Steel, 2015, Circumstances in which parsimony but not compatibility will be provably misleading. Systematic Biology 64:492–504).
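To make the compatibility notion concrete for readers who haven't met it: two splits (bipartitions) of the same taxon set can occur on one and the same tree if and only if at least one of the four pairwise side-intersections is empty. A minimal sketch (toy taxon set, my own illustration, not code from any of the cited papers):

```python
# Two splits (bipartitions) of one taxon set fit on a single tree iff
# at least one of the four pairwise side-intersections is empty.

def compatible(side1, side2, taxa):
    a, b = set(side1), set(taxa) - set(side1)
    c, d = set(side2), set(taxa) - set(side2)
    return any(not (x & y) for x in (a, b) for y in (c, d))

taxa = {"A", "B", "C", "D", "E"}
print(compatible({"A", "B"}, {"A", "B", "C"}, taxa))  # True (nested splits)
print(compatible({"A", "B"}, {"B", "C"}, taxa))       # False (conflict)
```

Consensus networks simply display all splits above some frequency threshold, compatible or not, which is why they can show the conflict a strict consensus tree collapses into polytomies.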

The logic that "... the signal adds up, the noise cancels itself out – if and only if there is enough noise to do statistics with" (a mathematician once told me that as soon as you need more than simple statistics to make sense of your data, you have already lost) has failed in neontology quite often (in very different fields). And we "molecularists" have much better data in the sense that we have many more (with NGS, vastly more) characters, which underlie very trivial evolutionary processes, and mostly, though not always, our mutations can be considered neutral. When you have positive selection, its signal adds up and will eventually outcompete the more noisy phylogenetic signal in the rest of the data.

-

"large animals, often singular (one fully preserved skeleton)" – ha, we wish! Most Mesozoic dinosaur genera are known from a total amount of material like this one (a partial skeleton) or less.

Seriously, the situation with forams is not comparable. We're dealing with terrestrial sediments, mostly rivers. We're very lucky to find anything at all. The chance that anything we find is an ancestor of anything we've already found is minuscule (it doesn't need to be a dead end if we simply haven't found its descendants). This is not merely expected, but confirmed by the continued scarcity of remains that clearly lack autapomorphies.

For forams, or ammonites or conodonts or diatoms or radiolarians or whatnot, my expectations are completely different.

"Fossil A should not be placed as sister of B + C if its morphology is the primitive form from which B and C can be derived but as the ancestor."

Yes, and that is practically never the case in vertebrate paleontology. The morphology of Archaeopteryx is not the primitive form from which birds, or deinonychosaurs, or just dromaeosaurids can be derived. The morphology of Anchiornis is not the primitive form from which Archaeopteryx can be derived, and so on and so forth.

"By taking away true or false ancestor-descendant groupings you can see how the lineages relate to each other that actually coexisted"

Not only are ancestor-descendant groupings sadly a non-issue because the record is too incomplete; the record is incomplete enough that time slices will each equate to a single locality unless they're made very wide (say, 30 million years, and even then you'll miss whole continents). In this paper's fig. 18, the 8 lineages shown as known just from the beginning of the Late Jurassic are from a single deposit; the 14 shown as known just from close to the middle of the Early Cretaceous are from another single deposit that lies geographically right next to the one I just mentioned; this EK deposit also furnishes the only contemporary records of at least 6 other lineages that extend beyond this time in fig. 18. Before the LJ deposit was discovered, the entire Jurassic record of the whole clade shown in that figure consisted of just Archaeopteryx and Compsognathus, which famously come from a single deposit again.

"A parsimony tree is not designed to cope with such data properties, it assumes that there are always more character splits compatible with the true tree than not"

...No, it only assumes that more splits are compatible with the true tree than are compatible with any single other tree. This is why consistency indices aren't above, or anywhere near, 0.5 unless a matrix is very small or blatantly cherry-picked (which hasn't occurred in 20 years).

"has failed in neontology quite often (in very different fields)"

In morphology it's hardly been tried. The neontological morphological datasets are almost all tiny. Molecular data have genome-wide correlations like GC-content biases to deal with.

"When you have positive selection, its signal adds up and will eventually outcompete the more noisy phylogenetic signal in the rest of the data."

Ah, but you have selection for a hundred different things in a dataset of this size. That amounts to noise. (It's also why datasets should be this big.)

-

In short, your arguments boil down to: vertebrate palaeozoological data matrices are useless for phylogenetic reconstructions.

They are just too gappy (fossil-record-wise, data-wise, concept-wise). Ancestors are purely hypothetical, and character direction can only be based on the inferred tree which relies on the same data that one maps. Which makes it 100% circular reasoning. Plus, the topology you use is highly fragile 'cause it changes easily by adding/eliminating characters or taxa.

I do get that vertebrate matrices are very suboptimal for phylogenetic reconstructions, but are they really so bad that one has to skip all the safeguards of phylogenetic analysis?

Example for "This is why consistency indices aren't above, or anywhere near, 0.5 unless a matrix is very small or blatantly cherry-picked (which hasn't occurred in 20 years)."

When you eliminate all taxa with more than 15% (or 35%) of missing data from this matrix you end up with a single most parsimonious tree (in the 15% case, it's the actual MPT, inferred using branch-and-bound) with CI = 0.59/RI = 0.57, and all but two (terminal) branches with ML(!)-BS > 70 (ML and MP trees topologically identical; the NJ tree only moves Archaeopteryx by one node). The only competing relationships are Struthiomimus as sister to Gallimimus (BS = 66) or to Dromiceiomimus (BS = 33). MP-BS values are naturally lower and the BS consensus network less treelike, but still > 35 for the branches in the tree (incompatible topologies relate mostly to IGM10042 and Archaeopteryx).
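For anyone wanting to replicate that kind of pruning before re-running a tree search, the filtering step is trivial – a minimal sketch (toy matrix and taxon names; a real matrix would be read from a NEXUS/TNT file):

```python
# Drop every taxon whose proportion of missing entries ('?') exceeds a
# threshold, before re-running the tree search on the reduced matrix.
# (Toy matrix for illustration only.)

def prune_gappy_taxa(matrix, max_missing=0.15):
    return {taxon: row for taxon, row in matrix.items()
            if row.count("?") / len(row) <= max_missing}

matrix = {
    "Gallimimus":    "0102110100",
    "Struthiomimus": "01021?0100",   # 10% missing -> kept
    "IGM10042":      "0?0??1??0?",   # 60% missing -> dropped at 15%
}
print(sorted(prune_gappy_taxa(matrix)))  # ['Gallimimus', 'Struthiomimus']
```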

When we follow your argument that bigger matrices are better even when we mostly add missing data, this means the 700 characters used to infer the tree are cherry-picked, and this tree shows only dubious clades that need to be discarded when challenged by any competing branch in the full-505 tree, although we cannot even measure its branch support.

I may be too much of a molecularist (more data, lots of randomness) and palaeobotanist (many fewer characters, many more fossils) to follow such logic.

Any data can and should be explored, and when your data don't give a relatively stable, testable tree (or a few alternatives), then inferring a tree is simply not the right thing to do.

-

"In short, your arguments boil down to: vertebrate palaeozoological data matrices are useless for phylogenetic reconstructions."

I appreciate that it may look like that to someone who's not used to missing data; and indeed, the reason the four largest problems in vertebrate phylogeny haven't been solved (the phylogeny of early gnathostomes, mostly the mono- or paraphyly of the placoderms; the phylogeny of early tetrapods, with the origins of the extant amphibian groups; the phylogeny of early amniotes, with the origin of turtles; and the phylogeny of squamates, with the origin of snakes) is that only paleontological data can solve them, and such data are difficult enough to compile into a useful matrix (e.g. without redundant characters) that this hasn't yet been done to everyone's satisfaction.

But we're working on it, and we're actually making progress.

"character direction can only be based on the inferred tree which relies on the same data that one maps. Which makes it 100% circular reasoning."

What... no. All the usual methods of phylogenetics produce unrooted trees; these are then rooted on the outgroup, which had to be specified in advance. This is no different with molecular data, of course. Before an analysis you have to be sure that your ingroup is monophyletic with respect to your outgroup; that is not tested by your analysis, and it is the reason why it is so hard to root the entire tree of life (mono- or paraphyly of bacteria).

In other words, the plesiomorphic state of each character is read from the tree once you have it; this information does not go into the tree search, so there is no circularity.

Asymmetric stepmatrices force trees to be rooted. I've had that situation and was forced to include the outgroup (a single terminal taxon) as an ancestor in the analysis, because otherwise PAUP* strangely insisted on rooting the trees on the terminal taxon with the largest amount of missing data even though the outgroup was specified. I think the first two preprints of my 2019 paper with Michel Laurin contain this situation – and quite probably they are its only published occurrence, because nobody uses stepmatrices.

"Plus, the topology you use is highly fragile 'cause it changes easily by adding/eliminating characters or taxa."

Yes. This is why we need larger matrices. (501 terminal taxa is impressive; 700 characters with up to 8 states is impressive in absolute terms, but relative to the 501 terminal taxa I find this number discomfortingly low. Alas, adding characters is a lot more work than adding taxa.) Our matrices need to get large enough that small changes no longer matter. Unfortunately, this is so much work that hardly anyone does it (the mentioned paper took 11 years in total) and nobody funds it. It is simply not widely understood how important and how time-consuming morphological phylogenetics is.

"When you eliminate all taxa with more than 15% (or 35%) of missing data from this matrix you end up with a single most parsimonious tree"

Don't do that. You lose actual, important information that way. The number of MPTs is not a quality measure.

In my experience, including the mentioned paper where this is spelled out, how well resolved the position of a terminal taxon is – in other words how many MPTs there are just due to that one taxon alone – does not correlate at all well with its proportion of missing data. Some have little missing data but such an unexpected combination of character states that they jump around anyway. Some have lots of missing data, but their combination of character states fits their most parsimonious position so well that adding these taxa improves the resolution of the tree.

"NJ tree"

What is the point? NJ is even more sensitive to long-branch attraction than parsimony is, and doesn't even tell you if several trees are equally optimal. Its performance with missing data is completely unknown. The only advantage I can think of is that it's faster!
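NJ's speed comes from its purely agglomerative recipe; for anyone who wants to experiment with its behaviour, a bare-bones pure-Python sketch of the joining loop (Q-criterion only; no branch lengths, no tie-breaking rules – my own illustration, not any package's implementation):

```python
from itertools import combinations

# Bare-bones neighbour joining: repeatedly fuse the pair of nodes
# minimising the Q-criterion and shrink the distance matrix until
# only three nodes remain.

def neighbour_joining(dist):
    d = {k: dict(v) for k, v in dist.items()}      # work on a copy
    joins = []
    while len(d) > 3:
        n = len(d)
        r = {i: sum(d[i].values()) for i in d}
        i, j = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = (i, j)                                 # new internal node
        row = {k: (d[i][k] + d[j][k] - d[i][j]) / 2
               for k in d if k not in (i, j)}
        for x in (i, j):
            del d[x]
        for k in d:
            d[k].pop(i, None)
            d[k].pop(j, None)
            d[k][u] = row[k]
        d[u] = row
        joins.append((i, j))
    return joins

# Additive distances from the tree ((A,B),(C,D)):
dist = {"A": {"B": 2, "C": 4, "D": 4},
        "B": {"A": 2, "C": 4, "D": 4},
        "C": {"A": 4, "B": 4, "D": 2},
        "D": {"A": 4, "B": 4, "C": 2}}
print(neighbour_joining(dist))  # a single join, recovering a true cherry
```

On perfectly additive distances like these NJ is guaranteed to recover the tree; the trouble starts when rate heterogeneity (long branches) or missing-data-distorted distances break additivity.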

"when your data don't give a relatively stable, testable tree (or a few alternatives)"

It is testable: scrutinize the dataset and/or enlarge it, and see what happens.

"then inferring a tree is simply not the right thing to do."

You can make a case that it's premature because our matrices are too small. That equates to "glass half empty", of course. I prefer to do it anyway, lay out the problems, and then present this situation to potential funders so I can eventually do it right.

-

Rooting is not the issue. When you optimise characters on a tree inferred from the very same characters to draw conclusions about evolutionary pathways (the change of morphs through time, which you model with the tree), it's circular.

To de-circularise you need to make a reconstruction without the traits you want to map. Which, in your set-up, is impossible because then you expect a worse tree. Also, you end up anyway with a huge range of equally parsimonious trees because your matrix is indecisive for most of the included OTUs. Your approach has an extreme risk of getting lost in tree space, and, although I'm new to vertebrate morphological matrices, this seems to be exactly the issue.

Adding more and more taxa with less and less information, you end up with more and more pseudo-monophyla: wrong internodes, branches that are just artefacts, such as clades based on positively selected convergences.

If the shared apomorphy (what you call a "synapomorphy"; PS: even plant lineages occasionally have synapomorphies in the strict sense, and molecular datasets have quite a lot of them at various levels) is missing and you have a convergence (or, more likely, a parallelism), then the corresponding taxon will be wrongly placed in the tree (subtree). Adding more characters won't help, because you just add more "?" for that taxon. Adding more taxa won't help, because you have no control over whether the new potential shared apomorphy reflects a common origin or just another convergence. Worst case scenario is you add a shared apomorphy and two parallelisms, enforcing the wrong clade and adding root-tip distance, because your taxon is already in the wrong clade due to the other phylogenetically poorly sorted traits you added to the matrix.
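The arithmetic behind that worst case can be checked directly with Fitch small parsimony. A minimal sketch (toy four-taxon example of my own; rooted binary trees as nested tuples): one genuine synapomorphy versus two parallelisms is enough to make the wrong topology shorter.

```python
# Fitch small parsimony: post-order pass over a rooted binary tree
# (nested tuples); intersect the children's state sets where possible
# (no step), otherwise take the union and count one step.

def fitch(tree, states):
    if isinstance(tree, str):                    # leaf: fixed state
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)

def tree_length(tree, characters):
    return sum(fitch(tree, char)[1] for char in characters)

true_tree  = (("A", "B"), ("C", "D"))
wrong_tree = (("A", "C"), ("B", "D"))
# One genuine shared apomorphy of A + B, plus two characters in which
# A and C acquired state 1 independently (parallelisms):
characters = [{"A": 1, "B": 1, "C": 0, "D": 0},
              {"A": 1, "B": 0, "C": 1, "D": 0},
              {"A": 1, "B": 0, "C": 1, "D": 0}]
print(tree_length(true_tree, characters),
      tree_length(wrong_tree, characters))  # 5 4: the wrong tree wins
```

Two compatible parallelisms outvote one synapomorphy under parsimony; scale that up to hundreds of characters and the "redundant characters" problem mentioned below becomes obvious.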

Even if you can't accept it, a tree with measurable BS support, inferred with three different optimality criteria and a method-independent stable topology, is – from a methodological tree-inference point of view – better than a tree cloud of 99,999 equally parsimonious solutions or 100 hand-picked topologies (which effectively is the approach you advocate). It shows what your data prefers as a backbone tree, and the BS shows how much support the matrix gives to each branch. Any wrong branch in that tree has a high chance of remaining one when adding taxa with (much) less discriminating signal due to increasing proportions of missing data, or of leading to further branching artefacts when adding incompatible data. It's simply not true that more data means better inferences regardless of the (evolutionary) quality of the data. Not even in the case of genes; a classic paper: Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics 6:361–375. https://hal.archives-ouvertes.fr/halsde-00193293/document

PS There is no reason to assume a molecular clade must be wrong, because morphology tells a different story. And it still does tell different stories; we recently covered a vertebrate example on the Genealogical World of Phylogenetic Networks: https://phylonetworks.blogspot.com/2019/03/has-homoiology-been-neglected-in.html

We can discuss this endlessly, but having looked at the matrix's principal properties (and those of other vertebrate palaeozoological matrices; see the examples including fossils that we covered on the Genealogical World of Phylogenetic Networks), all-inclusive parsimony trees of increasingly large and gappy matrices will keep the rouble rolling, 'cause you always get a new tree option, but will not get you any closer to the true tree you are looking for. You may correct one error in a smaller tree but, at the same time, add new ones (note: most topological changes you observe occur only because one relies on strict consensus trees of MPTs; had you started with a network, you would have seen that there are just two or three alternatives, and the MPTs picked one of them in the first analysis and the other in the next, with a different taxon set).

To explore the space of potential trees one needs to understand which portion of the data supports which alternative, and which taxa provide stable signals (method-independently, to minimise method-inherent shortcomings), which don't, and why. That's easy for 15 or 20 characters (which is what we molecularists do when mapping traits onto molecular trees to assess how non-parsimoniously they evolved in the modern world), but for 700 characters across 99,999 MPTs it is not.

I stand by my original suggestions:

Start with trees that receive ample support (can be low, as long as there are no competing alternatives) under different optimality criteria, place the lesser-known OTUs in those trees (I'll upload my results to my morphomatrix-dedicated figshare project, https://doi.org/10.6084/m9.figshare.7067369, and write a post), and then hack-and-cut the supermatrix: reduce the matrix to a group of interest and explore that subtree in more detail, using smaller but less gappy matrices and going networks. Use neighbour-nets (extremely fast, yet not as vulnerable to topological artefacts as MPTs and NJ trees; they work very well for placing individual fossils) to analyse incompatible signals, and consensus networks to summarise the topological alternatives, to filter out the ones that make sense to proceed with and to investigate what is behind the ambiguous support of each branch in your MPT sample or preferred tree.
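A technical note on the distance input such neighbour-nets need: for gappy matrices the usual workaround is the mean pairwise character distance with pairwise deletion, i.e. each pair of taxa is compared only across the characters scored in both. A minimal sketch (toy rows; this is the general principle, not any particular program's exact formula):

```python
# Mean pairwise character distance with pairwise deletion: each pair of
# taxa is compared only over the characters scored ('?'-free) in both,
# which keeps gappy taxa usable in a distance matrix for a neighbour-net.

def pdist(row1, row2):
    shared = [(a, b) for a, b in zip(row1, row2) if "?" not in (a, b)]
    if not shared:
        return None        # no overlapping characters: distance undefined
    return sum(a != b for a, b in shared) / len(shared)

print(pdist("0102?10", "0112?1?"))  # 0.2 – one mismatch in five comparable sites
```

The `None` case is exactly why an all-inclusive distance matrix for this dataset is problematic: taxon pairs with little or no character overlap yield undefined or wildly unreliable distances.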

Final step: when you have a series of sensible hypotheses for the parts of the tree, at best reflecting aspects of the true tree, patch all those mosaic trees together in the form of a supernetwork.

Or a supertree, if you prefer trees despite several fundamental natural properties of morphological traits that will cloud any true-tree signal, which are:

— being sometimes hard to normalise, and not rarely involving an interpretation by the person scoring the matrix (systematic bias);

— being not the product of a strictly parsimonious process (a requirement for applying NJ or MP) or of an easy-to-model one (a requirement for applying ML or Bayesian inference);

— being not only the product of inheritance (a distinct, fixed genetic code) but also of e.g. environmental parameters (epigenetic phenomena) or gene expression (maybe more of an issue for plants, which can't run away and hence have a natural incentive to de-activate or re-activate parts of their lineage-specific genetic code);

— being, in the case of very detailed matrices, potentially not representative of their evolutionary lineage at all (I visualised this particular problem, which one has to live with, here: https://researchinpeace.blogspot.com/2018/04/digging-deeper-population-dynamics-and.html).

When dealing with morphology you have to deal with signal incompatibility to a degree that is (much) higher than what we molecularists face at the coal-face of evolution (intra-generic differentiation/speciation, where we have actual reticulation, i.e. already a worst-case scenario for tree inference), and not that different from what linguists have to deal with – who, quite often, use networks for phylogenetics (in contrast to palaeontologists).

Just give it a try!

99,999 MPTs without any measurable branch support (there are about a dozen branches with BS > 15 for the full matrix – again, method-independently, so much for Simmons's old claim that ML-BS is inferior to MP when facing missing data) are clearly not the product one can aspire to from a 505 × 700 matrix.

- edited

"When you optimise characters on a tree inferred from the very same characters to draw conclusions about evolutionary pathways (the change of morphs through time, which you model with the tree), it's circular."

Oh, sorry! I agree with this, of course. I didn't understand that you were thinking three steps ahead! I was only thinking of making apomorphy lists or change lists – restatements of the tree, not uses of the tree for further research.

"To de-circularise you need to make a reconstruction without the traits you want to map. Which, in your set-up, is impossible because then you expect a worse tree."

That would be the case if all known characters which are parsimony-informative for the taxon sample were included in the matrix. That's an ideal no analysis has probably reached. In the present case, a lot of known parsimony-informative characters have yet to be added, as you can see from the first "figure" in this post: https://theropoddatabase.blogspot.com/2019/07/phylogeny-of-lori-analysis-1-philosophy.html – maybe the fact that the characters number exactly 700 is not an accident, but a deliberate restriction to avoid even more delays to publication? The most drastic example I found is that 104 such characters from Brusatte et al. (2014) are not yet included.

"Adding more and more taxa with less and less information, you end up with more and more pseudo-monophyla: wrong internodes, branches that are just artefacts, such as clades based on positively selected convergences."

Yes! You have to add characters, too!

"Adding more characters won't help, because you just add more "?" for that taxon."

It is a very important fact that adding characters can change what the most parsimonious trees for the taxa that can be scored for it are, and that in turn can change what the most parsimonious trees for all taxa are, and that in turn can change the optimizations for other characters which another taxon may be scored for and by which those taxa may be placed. There are many papers that have analyzed mixed matrices and found that the addition of molecular data to an otherwise morphological matrix changes the positions of fossil taxa for which molecular data are wholly unknown.

"Worst case scenario is you add a shared apomorphy and two parallelisms, enforcing the wrong clade"

That can happen if the two parallelisms are compatible with each other and if the other 500 parallelisms don't cancel them out.

That, in turn, happens when there are redundant (or less extremely correlated) characters in the matrix. And that's a widespread problem that there's literature about; it is a lot of work to tackle.

"It shows what your data prefers as a backbone tree"

No, because the selection criterion (percentage of missing data) is arbitrary and meaningless.

It might be interesting to instead remove the taxa with the highest percentages of character conflict. Those might be the ones that have undergone the most homoplasy. Actually, now I'm curious: has that ever been done?

"It's simply not true that more data means better inferences regardless of the (evolutionary) quality of the data. Not even in the case of genes"

I know; that's an issue of character correlation, not to mention things like heterotachy in the case of model-based approaches.

"There is no reason to assume a molecular clade must be wrong, because morphology tells a different story."

I never claimed there was! And frankly you won't find many people (anymore, as opposed to 20 years ago) who do make that claim.

BTW, in your blog post you claim that "there has (to my knowledge) not been a single morphology-based tree that was fully congruent to a molecular tree with sufficient taxon and gene sampling". There are lots, they just aren't mentioned much because they're expected. The cases where there are discrepancies get a lot of attention, and they are invariably those where the morphology is really difficult. Roland picked an example, crocodylian phylogeny, where it looks like paedomorphosis in gharials (and possibly elsewhere) distorts the morphological tree by creating redundant characters.

"but will not get you any closer to the true tree you are looking for."

Morphological sarcopterygian phylogeny has steadily approached the molecular tree. 25 years ago, the morphological consensus joined the molecular one in placing lungfish closer to us than Latimeria. 9 years ago, the morphological consensus joined the molecular one on lissamphibian monophyly with respect to Amniota, and the molecular consensus settled on the existing morphological one on batrachian monophyly (frogs + salamanders) excluding the caecilians. And while such an agreement is still lacking on the position of the turtles, their morphological position is now much, much closer to the molecular one than either of the two morphological positions that were competing 10 years ago, while of the two molecular positions that competed back then, the morphologically absurd-looking one (inside Archosauria) is no longer found.

"note: most topological changes you observe occur only because one relies on strict consensus trees of MPTs; had you started with a network, you would have seen that there are just two or three alternatives, and the MPTs picked one of them in the first analysis and the other in the next, with a different taxon set"

This does indeed seem to happen a lot, in my unquantified experience, with trees made from successive updates of the same matrix: there are several nearly equally strong signals in the data, and all but the largest updates just bring out one or another. I agree that it is not enough to present the MPTs (or their consensus) as "the result"; there also needs to be – as there is in the present case – some presentation and assessment of alternatives that are nearly as strongly supported.

"Just give it a try!"

I, for one, will definitely look into it!

"so much for Simmons's old claim that ML-BS is inferior to MP when facing missing data"

Simmons in his two papers found that ML and Bayesian inference are inferior to parsimony when missing data has a very specific distribution. That distribution can occur in molecular or mixed supermatrices (where different taxa are sequenced for different genes), but hardly in morphological ones. When missing data is instead distributed by rate of evolution (definitely not gonna happen in morphology), Bayesian inference outperforms all. I've discussed this in my big paper.

Anyway, I'm looking forward to your post! (And I still owe you answers to your answers to my comments on a post of yours from back in January. I'll try to get to that this weekend... by now we've probably rehashed half of that here...)

-