Methods for standard meta-analysis of diagnostic test accuracy studies are well established and understood. For the more complex case in which studies report test accuracy across multiple thresholds, several approaches have recently been proposed. These are based on similar ideas but make different assumptions. In this article, we apply four such approaches to data from a recent systematic review in nephrology and compare the results. The four approaches use, respectively, a linear mixed effects model, a Bayesian multinomial random effects model, a time-to-event model, and a nonparametric model. In the case study data, the accuracy of neutrophil gelatinase-associated lipocalin for the diagnosis of acute kidney injury was assessed in different scenarios, with sensitivity and specificity estimates available for three thresholds in each primary study. All approaches led to plausible and mostly similar summary results. However, we found considerable differences in results for some scenarios, for example differences in the area under the ROC curve (AUC) of up to 0.13. The Bayesian approach tended to yield the highest AUC values and the nonparametric approach the lowest across the different scenarios. Although we recommend using these approaches, our findings motivate the need for a simulation study to explore the optimal choice of method in various scenarios.
- bivariate endpoint
- multiple thresholds