Data from: Probabilistic methods outperform parsimony in the phylogenetic analysis of data simulated without a probabilistic model



In order to understand patterns and processes of the diversification of life, we require an accurate understanding of taxon interrelationships. Recent studies have suggested that analyses of morphological character data using the Bayesian and maximum likelihood Mk model yield more accurate phylogenies than parsimony methods. These studies have proved controversial, particularly because they simulated morphological data under Markov models that assume shared branch lengths across characters. It has been claimed that this biases comparisons in favour of the Bayesian and maximum likelihood Mk model over parsimony methods, which do not explicitly make this assumption. We avoid these potential issues by employing a simulation protocol in which character states are randomly assigned to tips, but datasets are constrained to an empirically realistic distribution of homoplasy as measured by the Consistency Index. Datasets were analysed with equal-weights and implied-weights parsimony, and with the maximum likelihood and Bayesian Mk model. We find that method choice is largely irrelevant for consistent (low-homoplasy) datasets, as all methods perform well, whereas the largest discrepancies in accuracy occur with low-consistency (high-homoplasy) datasets. In such cases, the Bayesian Mk model is significantly more accurate than the alternative methods, and implied-weights parsimony never significantly outperforms the Bayesian Mk model. When poorly supported branches are collapsed, the Bayesian Mk model recovers trees with higher resolution than the other methods. Since it is not possible to assess homoplasy independently of a tree estimate, the Bayesian Mk model emerges as the most reliable method for categorical morphological analyses.

README file for Puttick et al. 2018

supplementalCode: R code to simulate data matrices
empirical.CI: data from the survey of Consistency Indices from empirical matrices of morphological data
RCode_example: example of how data were simulated in the manuscript
simulateBinary: R code to simulate all binary matrices
simulateMultiState: R code to simulate all binary + multistate matrices
Puttick_et_al_supp_figures: all supplementary figures referenced in Puttick et al. 2018
Puttick_et_al_supp_tables: all supplementary tables referenced in Puttick et al. 2018
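The simulation protocol described in the abstract (random tip states, filtered by Consistency Index) can be illustrated with a short Python sketch. It is a hypothetical illustration, not the authors' R code in simulateBinary or simulateMultiState. The 6-taxon tree, the function names (fitch_length, consistency_index, simulate_matrix) and the use of whole-matrix rejection sampling against a fixed CI range are all assumptions. The ensemble CI is computed as the summed minimum number of changes (k - 1 for an unordered character with k observed states) divided by the summed number of changes counted on the tree with Fitch parsimony.

```python
import random

# Hypothetical 6-taxon rooted binary tree as nested tuples; leaves are taxon
# names. It is an arbitrary example, not a tree used in the study.
TREE = ((("A", "B"), "C"), (("D", "E"), "F"))


def leaves(tree):
    """Return the taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def fitch_length(tree, char):
    """Fitch parsimony for one character on a rooted binary tree.

    `char` maps taxon name -> state. Returns (state set at this node, steps).
    """
    if isinstance(tree, str):
        return {char[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_length(left, char)
    right_set, right_steps = fitch_length(right, char)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # Disjoint child sets: take the union and count one extra change.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, matrix):
    """Ensemble CI: summed minimum changes / summed observed changes.

    An unordered character with k observed states needs at least k - 1
    changes; the observed number is its Fitch length on `tree`.
    """
    min_steps = sum(len(set(char.values())) - 1 for char in matrix)
    obs_steps = sum(fitch_length(tree, char)[1] for char in matrix)
    return min_steps / obs_steps if obs_steps else 1.0


def random_character(taxa, n_states, rng):
    """Assign states to tips uniformly at random, rejecting invariant characters."""
    while True:
        char = {taxon: rng.randrange(n_states) for taxon in taxa}
        if len(set(char.values())) > 1:
            return char


def simulate_matrix(tree, n_chars, ci_range, n_states=2, seed=None,
                    max_tries=10_000):
    """Rejection-sample a random matrix whose CI on `tree` lies in ci_range."""
    if n_states < 2:
        raise ValueError("n_states must be at least 2 for variable characters")
    rng = random.Random(seed)
    taxa = leaves(tree)
    lo, hi = ci_range
    for _ in range(max_tries):
        matrix = [random_character(taxa, n_states, rng) for _ in range(n_chars)]
        ci = consistency_index(tree, matrix)
        if lo <= ci <= hi:
            return matrix, ci
    raise RuntimeError("no matrix found in ci_range; widen the range")


if __name__ == "__main__":
    matrix, ci = simulate_matrix(TREE, n_chars=50, ci_range=(0.3, 0.6), seed=42)
    print(f"accepted a {len(matrix)}-character matrix with CI = {ci:.3f}")
```

Whole-matrix rejection is the simplest design, but it becomes slow when the target range is narrow or far from the CI that purely random data typically produce. Accepting or rejecting characters one at a time, or drawing targets from empirical values such as those in empirical.CI, are possible alternatives; the authors' actual procedure is in their R scripts and may differ from this sketch.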
Date made available: 29 Jun 2019
