Reproducibility in phylogenetics: reevaluation of the largest published morphological data matrix for phylogenetic analysis of Paleozoic limbed vertebrates
- Subject Areas
- Biodiversity, Ecology, Evolutionary Studies, Paleontology
- Keywords
- phylogenetics, data matrix, morphology, Lissamphibia, Amphibia, Temnospondyli, Lepospondyli, Anthracosauria, reversal, phylogeny
- Copyright
- © 2018 Marjanović et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. Reproducibility in phylogenetics: reevaluation of the largest published morphological data matrix for phylogenetic analysis of Paleozoic limbed vertebrates. PeerJ Preprints 6:e1596v3 https://doi.org/10.7287/peerj.preprints.1596v3
Abstract
The largest published phylogenetic analysis of early limbed vertebrates (Ruta M, Coates MI. 2007. Journal of Systematic Palaeontology 5:69–122) recovered e.g. Seymouriamorpha, Diadectomorpha and (in some trees) Caudata as paraphyletic and found the “temnospondyl hypothesis” on the origin of Lissamphibia (TH) to be more parsimonious than the “lepospondyl hypothesis” (LH) – though only, as we show, by one step.
We report thousands of misscored cells, most of them due to typographic and similar accidental errors. Further, some characters are duplicated; some have only one described state; for some, most taxa were scored after presumed relatives. Even continuous characters were unordered, the effects of ontogeny were not sufficiently taken into account, and data published after 2001 were mostly excluded.
After these issues are addressed (we document and justify all changes to the matrix) but no characters are added, we find (Analysis R1) much longer trees with e.g. monophyletic Caudata, Diadectomorpha and (in some trees) Seymouriamorpha; Ichthyostega either crownward or rootward of Acanthostega; Anthracosauria either crownward or rootward of Temnospondyli; the LH is 9 steps shorter than the TH (R2; constrained) and 12 steps shorter than the “polyphyly hypothesis” (PH – R3; constrained). Brachydectes (Lysorophia) is not found next to Lissamphibia; instead, the sister-group of Lissamphibia is a large clade that includes adelogyrinids, urocordylid “nectrideans” and aïstopods.
Adding 56 OTUs to the original 102 increases the resolution. The added taxa range in completeness from complete articulated skeletons to an incomplete lower jaw. Even though the lissamphibian-like temnospondyls Gerobatrachus, Micropholis and Tungussogyrinus and the extremely peramorphic salamander Chelotriton are added, the difference between LH (R4) and TH (R5) rises to 10 steps, that between LH and PH (R6) to 15; the TH also requires several more regains of lost bones than the LH.
Most bootstrap values are low, and plummet when taxa are added. Statistically, the TH (R2, R5) is not distinguishable from the LH or the PH; the LH (R1) and the PH (R3) may be distinguishable from each other under the original taxon sample at p ≥ 0.04. A test for the upper bound of the p value is not available.
Bayesian inference (Analysis EB, same settings as R4) mostly agrees with R4. High posterior probabilities are found for Lissamphibia (1.00) and for the LH (0.92); however, many branches remain weakly supported, and most are short, as expected from the small character sample.
We discuss phylogeny, approaches to coding, methods of phylogenetics (Bayesian inference vs. equally weighted vs. reweighted parsimony), some character complexes, and prospects for further improvement of this matrix. In its present state, even after our changes, the matrix cannot provide a robust assessment of the phylogeny of early limbed vertebrates; sufficient improvement, however, will be laborious but not difficult.
Author Comment
This is the third preprint version, identical to the fourth revision which is about to be resubmitted to PeerJ for peer review, except that the text predates the last round of editing: in particular, some of the links don't go where they claim to go – in all these cases, the links are correct, and the text is not – and the reference to ancestors on p. 60 has become meaningless and will be deleted.
We did not turn the second or the third revisions into preprints, so a lot has changed since the second preprint version. Apart from further corrections to the matrix and far-reaching clarifications in the text (including but not limited to the Introduction, the Conclusions and Appendix 2 – the matrix in human-readable format), we have added:
– a Bayesian analysis;
– a discussion of the suitability of Bayesian inference and of parsimony with implied weights for our dataset and others like it;
– and more taxa (Casineria, Llistrofus, Bystrowiella, Coloraderpeton, Pseudophlegethontia, Perittodus, Diploradus, Aytonerpeton – the first two were previously included as parts of other OTUs, but are now separate).
The anthracosaur tail CM 34638, previously used to score one character of Archeria, is now removed.
We have also removed the analyses with irreversible bone losses to make the manuscript shorter and less complex.
An important addition to the acknowledgments is José Grau (Museum für Naturkunde, Berlin), who performed our bootstrap and Bayesian analyses much faster than we could have: in less than a day each instead of a week each.
Supplemental Information
Revised data matrix for parsimony analyses
NEXUS file (plain text) containing our revised data matrix (the machine-readable version of Appendix 2), one MPT from each of Analyses R1–R6, the bootstrap trees from B1 and B2, and the settings used for these analyses. Executing the file in PAUP* repeats Analyses R1–R6 in this order and then performs the statistical comparisons (Kishino-Hasegawa test, Templeton test, winning-sites test) of the trees that are already stored in the file.
Excel file containing our measurements and their ratios relevant to characters PREMAX 7 and SKU TAB 1
On the sheet “Data”, the OTUs are listed such that the line numbers are the same as the numbers the OTUs have in the matrix; OTUs that cannot be measured for any of the parameters are represented by blank lines. Calculations are underlain in yellow or blue. The raw measurements, in cm, will mostly be difficult to reproduce: they were taken from illustrations (we preferred reconstruction drawings to avoid the effects of diagenetic distortion) on paper or on a screen, in the latter cases usually but not always at a magnification such as 150%, 200% or 300%. The ratios, however, should be fairly well reproducible. Column B is the distance (at a right angle to the sagittal plane) between the lateral extremities of the premaxillae, measured in ventral view when the premaxillae are insufficiently exposed in dorsal view. Column C is the maximum width of the dermatocranium in dorsal view. Column D is the maximum width of the skull table; when sharp edges between the table and the temporal regions are absent or unknown, this can be measured across the “tabular horns”, across the supratemporals across the rostral ends of the temporal notches, or across the intertemporals, whichever is widest. When possible, we have consulted lateral views to determine where the dorsal and the lateral surfaces of the skull roof meet. Column E cites our sources (all of them are also cited in the main text and/or Appendix 1 and therefore listed in the References section). Column G is the ratio of premaxillary width to skull roof width (B divided by C), which we decided to use as the raw data for PREMAX 7 (Appendix-Table 1). Column H is the ratio of premaxillary width to skull table width (B/D). Column I is the postorbital skull table length, in other words the rostrocaudal distance between the caudal margins of the orbits or orbitotemporal fenestrae (averaged if necessary) and the caudal end of the skull table in the midline. 
In salientians, the rostral margin of the otic capsule was assumed to lie at the caudal margin of the orbitotemporal fenestra, not at that of the lateral process of the parietal which covers only the caudal or caudomedial part of the otic capsule. Column J is the postorbital skull roof length, in other words the rostrocaudal distance between the caudal margins of the orbits (averaged if necessary) and the caudalmost extent of the dermatocranium, which may be the caudal end of the skull table in the midline, the tips of “tabular horns” (averaged if necessary), or the caudal ends of the suspensoria excluding the quadrates (averaged if necessary). Column K is the ratio of skull roof width to postorbital skull roof length (C/J). Column L is the ratio of skull table width to postorbital skull table length (D/I), which we decided to use as the raw data for SKU TAB 1 (Appendix-Table 3). Column M is the ratio of skull roof width to postorbital skull table length (C/I), and column N is the ratio of skull table width to postorbital skull roof length (D/J). OTUs are represented by their most complete and skeletally most mature known members, except that Sauropleura is represented by S. scalaris rather than the morphological outlier S. pectinata (which is measured in line 153); Dendrerpetidae is represented by Dendrysekos, Albanerpetidae by Celtedens, *Caseasauria by Eothyris.
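The derived columns described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (G = B/C, H = B/D, K = C/J, L = D/I, M = C/I, N = D/J); the `Measurements` class, its field names and the example values are hypothetical and are not taken from the Excel file.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Measurements:
    """Raw measurements for one OTU, in cm (hypothetical container)."""
    premax_width: Optional[float]   # column B: width across the premaxillae
    roof_width: Optional[float]     # column C: maximum width of the dermatocranium
    table_width: Optional[float]    # column D: maximum width of the skull table
    table_length: Optional[float]   # column I: postorbital skull table length
    roof_length: Optional[float]    # column J: postorbital skull roof length


def ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Return numerator/denominator, or None if either value is missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def derived_ratios(m: Measurements) -> Dict[str, Optional[float]]:
    """Compute the ratio columns G, H, K, L, M and N as defined in the text."""
    return {
        "G": ratio(m.premax_width, m.roof_width),    # B/C, raw data for PREMAX 7
        "H": ratio(m.premax_width, m.table_width),   # B/D
        "K": ratio(m.roof_width, m.roof_length),     # C/J
        "L": ratio(m.table_width, m.table_length),   # D/I, raw data for SKU TAB 1
        "M": ratio(m.roof_width, m.table_length),    # C/I
        "N": ratio(m.table_width, m.roof_length),    # D/J
    }


# Invented example values, for illustration only.
example = derived_ratios(Measurements(2.0, 8.0, 4.0, 5.0, 10.0))
```

Because only ratios enter the characters, the magnification at which an illustration was measured cancels out, which is consistent with the remark that the ratios should be more reproducible than the raw measurements.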
On all other sheets, OTUs scored 0 by RC07 are underlain in blue (for PREMAX 7 on the sheets “pmx-roof” and “pmx-table”, for SKU TAB 1 on the others), OTUs scored 1 by RC07 are underlain in yellow, and OTUs scored as unknown by RC07 as well as those that we have added retain a white background. On each sheet, the values from one of the calculated columns on “Data” are ordered by size in column B (from highest to lowest for PREMAX 7, from lowest to highest for SKU TAB 1, in agreement with the original state definitions) and plotted; the line between the data points is of course meaningless, but we included it in order to see more easily where morphological gaps would lie. Column C on the sheets “pmx-roof” (for PREMAX 7) and “po table” (for SKU TAB 1) shows the state we have assigned to each OTU. Sheet “pmx-roof” is column G of the sheet “Data”, “pmx-table” is H, “po roof” is K, “po table” is L, “roof width, table length” is M, “table width, roof length” is N.
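The ordering step on these sheets can be approximated in code. The sketch below sorts a list of ratios and reports the widest gaps between neighbouring values. It is not the procedure we used to assign states, which relied on inspecting the plots; the function name `largest_gaps` and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def largest_gaps(values: Sequence[float],
                 descending: bool = True,
                 top: int = 3) -> List[Tuple[float, float, float]]:
    """Sort the values and return the `top` widest gaps between neighbours.

    Each result is (gap size, value before the gap, value after the gap),
    with "before" and "after" referring to the chosen sort order.
    """
    ordered = sorted(values, reverse=descending)
    gaps = [(abs(a - b), a, b) for a, b in zip(ordered, ordered[1:])]
    # Widest gap first; ties keep their position in the sorted sequence.
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


# Invented ratios, for illustration only.
widest = largest_gaps([0.10, 0.12, 0.30, 0.33], descending=True, top=1)
```

Setting `descending=True` mirrors the ordering used for PREMAX 7 (highest to lowest), and `descending=False` mirrors that used for SKU TAB 1.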
Revised data matrix for Bayesian analysis
NEXUS file (plain text) containing our revised data matrix, a tree from Analysis EB in two formats (one with the posterior probabilities as node labels), and the settings used for this analysis. The stepmatrix characters (32 and 134 in both Appendices and Data S1) are split into two or three characters each, as explained in the text and Appendix 1. Executing the file in MrBayes repeats Analysis EB.
Original data matrix of RC07
NEXUS file (plain text) containing the original data matrix of RC07, one tree each from Analyses O1 and O2, and the settings for these analyses. Executing the file in PAUP* repeats these analyses in this order and then performs the statistical comparisons of the trees that are already stored in the file.
Homoplasy per character in our revised matrix given an MPT from Analysis R1 (unconstrained, original taxon sample) and one from R4 (unconstrained, expanded taxon sample)
The line numbers are the numbers of the characters (1–277). Column A shows the minimum number of steps necessary per character (its number of states minus one); column B shows the steps actually found in R1 or R4 on the trees included in Data S1; column C is the difference between A and B, sorted from lowest to highest in column E, counted in columns G and H and plotted in the file as well as in Fig. 22. The line between the data points in the plots is meaningless, but makes it easier to compare the distributions to an exponential curve. Compare Goloboff, Torres & Arias (2017: fig. 1(a)).
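The per-character calculation can be sketched as follows: the minimum is the number of states minus one, the extra steps are the observed steps minus that minimum, and the extra steps are then tallied. The function names and the example input are hypothetical; the actual step counts come from the trees in Data S1.

```python
from collections import Counter
from typing import Iterable, Tuple


def extra_steps(n_states: int, observed_steps: int) -> int:
    """Observed steps minus the minimum necessary (number of states minus one)."""
    minimum = n_states - 1
    return observed_steps - minimum


def homoplasy_distribution(characters: Iterable[Tuple[int, int]]) -> Counter:
    """Tally how many characters show each number of extra steps.

    `characters` yields (number of states, observed steps on the tree) pairs.
    """
    return Counter(extra_steps(n, s) for n, s in characters)


# Invented example: four characters, for illustration only.
dist = homoplasy_distribution([(2, 1), (2, 4), (3, 2), (3, 5)])
for extra in sorted(dist):
    print(extra, dist[extra])
```

Sorting the tallied values by the number of extra steps gives the kind of distribution that is plotted and compared to an exponential curve.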