Strict consensus trees based on a sample of equally parsimonious trees are still the standard in palaeophylogenetics. However, this analysis yielded an unknown number of most parsimonious trees (MPTs), each more than 12,000 steps long, with a CI of approximately 0 (i.e. essentially all scored traits are homoplasious and synapomorphies are very rare) and an RI of 0.59. Given this result, the strict consensus tree, pruned of hand-selected rogue taxa, cannot come close to depicting the actual differentiation signal in the matrix.
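For reference, both indices relate the observed tree length to the minimum and maximum possible lengths. A minimal sketch in Python, assuming the usual ensemble definitions CI = M/S and RI = (G - S)/(G - M); the function name and the per-character step counts are toy values made up for illustration, not taken from the matrix:

```python
def ensemble_indices(min_steps, obs_steps, max_steps):
    """Return (CI, RI) summed over all characters.

    min_steps: minimum possible steps per character (M)
    obs_steps: steps observed on the tree per character (S)
    max_steps: maximum possible steps per character (G)
    """
    m, s, g = sum(min_steps), sum(obs_steps), sum(max_steps)
    ci = m / s
    # RI is undefined when no character can vary in length (G == M).
    ri = (g - s) / (g - m) if g != m else float("nan")
    return ci, ri


# Hypothetical toy values for three characters.
min_steps = [1, 1, 2]
obs_steps = [4, 5, 6]
max_steps = [8, 9, 10]

ci, ri = ensemble_indices(min_steps, obs_steps, max_steps)
print(f"CI = {ci:.3f}, RI = {ri:.3f}")
```

A CI near 0 combined with a moderate RI indicates that the trees require far more changes than the minimum, while some of the shared derived states still act as grouping evidence.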
Alternative ways to summarise MPT samples include the Adams consensus tree and the consensus network (Holland & Moulton in Algorithms in Bioinformatics: Third International Workshop, WABI, Budapest, Hungary Proceedings. p. 165–176; implemented in the free Java programme SplitsTree: www.splitstree.org). Why not give them a try?
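To illustrate why a threshold-based summary can retain signal that a strict consensus discards, here is a minimal Python sketch. It is not the SplitsTree implementation: as a simplifying assumption, each tree is represented only by its set of clades (rooted groupings) rather than by unrooted splits, and the taxa, trees and function names are hypothetical.

```python
from collections import Counter


def split_frequencies(trees):
    """Fraction of trees in the sample that contain each clade."""
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))
    return {clade: n / len(trees) for clade, n in counts.items()}


def splits_above(trees, threshold):
    """Clades found in at least `threshold` (0..1) of the trees."""
    return {c for c, f in split_frequencies(trees).items() if f >= threshold}


# Three hypothetical MPTs, each given as a set of clades.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]

strict = splits_above(trees, 1.0)   # only groupings present in every tree
network = splits_above(trees, 0.3)  # also keeps conflicting alternatives

print("strict consensus clades:", strict)
print("clades kept at 30% threshold:", sorted("".join(sorted(c)) for c in network))
```

In this toy sample the strict consensus is completely unresolved, whereas the thresholded set keeps the competing groupings AB and AC side by side, which is the kind of conflict a consensus network displays.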
Neighbour-nets (Bryant & Moulton, Mol. Biol. Evol. 21: 255–265; also implemented in SplitsTree) that focus on particular evolutionary lineages or phylogenetic neighbourhoods may help to assess how much of the tree structure relates to overall similarity or dissimilarity, and allow a quick assessment of the coherence of assumed clades and their topological alternatives. Note that, at this point, the complete matrix is much too gappy to infer a sensible all-inclusive distance matrix (404 out of 505 OTUs have more than 50% missing data, and 120 lack more than 90% of the scored characters), which may also explain the extremely weak standard parsimony result. The focus taxon has only 62% missing data; its 266 defined characters should be enough to infer its phylogenetic position.
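The proportions of missing data quoted above are easy to tabulate before choosing a taxon subset for a distance-based analysis. A minimal Python sketch, assuming the matrix is available as one string of character states per OTU and that both "?" and "-" count as missing (an assumption; inapplicable cells may need different treatment). The taxon names and states are invented for illustration.

```python
def missing_proportion(states, missing_symbols=("?", "-")):
    """Fraction of cells in one OTU's row that are scored as missing."""
    return sum(ch in missing_symbols for ch in states) / len(states)


# Hypothetical toy matrix: three OTUs, 20 characters each.
matrix = {
    "taxon_A": "01?10?110101?10?1101",
    "taxon_B": "???????????1????????",
    "taxon_C": "0?1??1??0?0?1??1??0?",
}

props = {taxon: missing_proportion(row) for taxon, row in matrix.items()}
over_half = [t for t, p in props.items() if p > 0.5]
over_90 = [t for t, p in props.items() if p > 0.9]

for taxon, p in props.items():
    print(f"{taxon}: {p:.0%} missing")
print("more than 50% missing:", over_half)
print("more than 90% missing:", over_90)
```

Filtering on such a threshold gives a candidate set of OTUs with enough overlapping characters for pairwise distances to be defined at all.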
Any phylogenetic matrix used to infer trees, especially one with these dimensions and properties, should be subjected to some assessment of branch support (a Bayesian analysis, or ML or LS/NJ bootstrapping, would have been quick to do, as would quick-and-dirty parsimony bootstrapping; Müller BMC Evol Biol 5:58 2006). Evolution cannot have been overly parsimonious here, given the apparent scarcity of clade-unique traits and hence a CI of approximately 0. Establishing branch and character support is a common standard in neophylogenetics; it needs to be applied in palaeontology, too (even when we cannot hope to get high values).
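For readers unfamiliar with the procedure, the resampling step behind non-parametric bootstrapping can be sketched in a few lines of Python. This shows only how pseudoreplicate matrices are drawn (characters sampled with replacement); the tree search on each replicate and the tallying of clade frequencies are left to the phylogenetic software of choice. The matrix, function name and replicate count are hypothetical.

```python
import random


def bootstrap_replicate(matrix, rng):
    """Draw one pseudoreplicate by sampling characters with replacement."""
    n_chars = len(next(iter(matrix.values())))
    # Pick as many column indices as there are characters, allowing repeats.
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(row[i] for i in cols) for taxon, row in matrix.items()}


# Hypothetical toy matrix: rows are OTUs, columns are characters.
matrix = {
    "taxon_A": "0110?101",
    "taxon_B": "01?0110?",
    "taxon_C": "1?100?11",
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(matrix, rng) for _ in range(100)]

print(len(replicates), "pseudoreplicate matrices generated")
```

The support value of a branch is then the proportion of replicate analyses in which that branch is recovered.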
Another underexplored angle would be to divide the taxon sets into time slices and first establish relationships within a time slice before trying to connect the time slices.
The signal from morphological matrices, especially extremely gappy ones like this one, is rarely very tree-like and certainly not trivial. The matrix is great, and so is the fossil, but what information is provided by a naked parsimony strict consensus cladogram (one lacking branch support, and without branch lengths reflecting the inferred number of changes) with virtually no data-signal consistency?