OBJECTIVE: To empirically explore the level of agreement between the treatment hierarchies produced by different ranking metrics in network meta-analysis (NMA) and to investigate how network characteristics influence that agreement.
DESIGN: Empirical evaluation based on re-analysis of published NMAs.
DATA: 232 networks of four or more interventions from randomised controlled trials, published between 1999 and 2015.
METHODS: We calculated treatment hierarchies from several ranking metrics: relative treatment effects, the probability of producing the best value and the surface under the cumulative ranking curve (SUCRA). We estimated the level of agreement between the treatment hierarchies using different measures: Kendall's τ and Spearman's ρ correlations, and Yilmaz's τ_AP and Average Overlap, which give more weight to the top of the rankings. Finally, we assessed how the amount of information present in a network affects the agreement between treatment hierarchies, using the average variance, the relative range of variance and the total sample size divided by the number of interventions in a network.
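As an illustration of the quantities named in the methods, the following is a minimal sketch and not the authors' implementation. It assumes that each treatment's effect against a common reference can be approximated by independent normal draws and that lower values are better. The function names, the example means and standard deviations, and the use of the mean of the draws as the relative treatment effect are all hypothetical choices made for the example. Yilmaz's τ_AP is omitted for brevity.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr


def rank_probabilities(draws):
    """draws: (n_samples, n_treatments) array of simulated effects, lower is better.
    Returns P where P[t, r] is the share of draws in which treatment t has rank r (0 = best)."""
    n_samples, n_treat = draws.shape
    ranks = draws.argsort(axis=1).argsort(axis=1)  # rank of each treatment within each draw
    probs = np.zeros((n_treat, n_treat))
    for t in range(n_treat):
        probs[t] = np.bincount(ranks[:, t], minlength=n_treat) / n_samples
    return probs


def sucra(probs):
    """SUCRA: mean of the cumulative rank probabilities over the first T-1 ranks."""
    return np.cumsum(probs, axis=1)[:, :-1].mean(axis=1)


def prob_best(probs):
    """Probability of producing the best value: probability of occupying rank 0."""
    return probs[:, 0]


def average_overlap(order_a, order_b):
    """Top-weighted agreement: mean over depths d of |top-d(a) & top-d(b)| / d."""
    k = len(order_a)
    return sum(len(set(order_a[:d]) & set(order_b[:d])) / d for d in range(1, k + 1)) / k


# Hypothetical network of four treatments; the third has a much less precise estimate.
rng = np.random.default_rng(0)
means = np.array([-0.8, -0.5, -0.4, 0.0])
sds = np.array([0.1, 0.1, 1.5, 0.1])
draws = rng.normal(means, sds, size=(20000, 4))

probs = rank_probabilities(draws)
scores = {
    "effect": -draws.mean(axis=0),  # negated so that higher is better for every metric
    "sucra": sucra(probs),
    "p_best": prob_best(probs),
}

# Pairwise agreement between the hierarchies implied by each pair of metrics.
names = list(scores)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        tau, _ = kendalltau(scores[a], scores[b])
        rho, _ = spearmanr(scores[a], scores[b])
        ao = average_overlap(np.argsort(-scores[a]).tolist(), np.argsort(-scores[b]).tolist())
        print(f"{a} vs {b}: tau={tau:.2f}, rho={rho:.2f}, AO={ao:.2f}")
```

Changing the entries of `sds` in this sketch is one way to see how imbalances in precision can affect the hierarchy based on the probability of producing the best value.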
RESULTS: Overall, the pairwise agreement was high for all treatment hierarchies obtained by the different ranking metrics. The highest agreement was observed between SUCRA and the relative treatment effect, for both the correlation and the top-weighted measures, with all medians equal to 1. The agreement between rankings decreased for networks with less precise estimates, and the hierarchies obtained from the probability of producing the best value appeared to be the most sensitive to large differences in the variance estimates. However, such large differences were rare.
CONCLUSIONS: Different ranking metrics address different treatment hierarchy problems; however, they produced similar rankings in the published networks. Researchers reporting NMA results can use the ranking metric they prefer, unless there are imprecise estimates or large imbalances in the variance estimates. In such cases, treatment hierarchies based on both probabilistic and non-probabilistic ranking metrics should be presented.