Data from: Phylogenetic sampling affects evolutionary patterns of morphological disparity



Cladistic character matrices are routinely repurposed in analyses of morphological disparity. Unfortunately, the sampling of taxa and characters within such datasets reflects their intended application - to resolve phylogeny, rather than distinguish between phenotypes - resulting in tree shapes that often misrepresent broader taxonomic and morphological diversity. Here we use tree shape as a proxy to explore how sampling can affect perceptions of evolving morphological disparity. Through analyses of simulated and empirical data, we demonstrate that sampling can introduce biases in trait space occupation between clades that are predicted by differences in tree symmetry and branch length distribution. Symmetrical trees with relatively long internal branches predict more expansive patterns of trait space occupation. Conversely, asymmetrical trees with relatively short internal branches predict more compact distributions. Additionally, we find that long external branches predict greater phenotypic divergence by peripheral morphotypes. Taken together, our results caution against the uncritical repurposing of cladistic datasets in disparity analyses. However, they also demonstrate that when morphological diversity is proportionately sampled, differences in tree shape between clades can speak to genuine differences in morphospace occupation. While cladistic datasets may serve as a useful starting point, disparity datasets must attempt to achieve uniformity of lineage sampling across time and topology. Only when all potential sources of bias are accounted for can genuine evolutionary phenomena be distinguished from artefactual signals. It must be accepted that the non-uniformity of the fossil record may preclude representative sampling and, therefore, a faithful characterization of the evolution of morphological disparity.

This record contains: (1) R scripts employed for simulating and analysing discrete character data; (2) four empirical datasets comprising discrete character matrices, time-calibrated trees, and first and last occurrence dates (i.e. FADs, LADs); and (3) R scripts employed for the analysis of these empirical data.

Users should install the latest version of R and the packages listed in the Methods section of the main manuscript. Please pay attention to the package versions specified, as some packages (e.g. Claddis) have since undergone major changes to the names of the functions employed.
Date made available: 23 Jul 2021
