Guideline development requires the synthesis of evidence on several treatments of interest, typically by using network meta-analysis (NMA). Because treatment effects may be estimated imprecisely or be based on evidence lacking internal or external validity, guideline developers must assess the robustness of recommendations made on the basis of the NMA to potential limitations in the evidence. Such limitations arise because the observed estimates differ from the true effects of interest, for example, because of study biases, sampling variation, or issues of relevance. The widely used GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework aims to assess the quality of evidence supporting a recommendation by using a structured series of qualitative judgments. This article argues that GRADE approaches proposed for NMA are insufficient for the purposes of guideline development, because the influence of the evidence on the final recommendation is not taken into account. It outlines threshold analysis as an alternative approach, demonstrating the method with 2 examples of clinical guidelines from the National Institute for Health and Care Excellence (NICE) in the United Kingdom. Threshold analysis quantifies precisely how much the evidence could change (for any reason, such as potential biases, or simply sampling variation) before the recommendation changes, and what the revised recommendation would be. If it is judged that the evidence could not plausibly change by more than this amount, then the recommendation is considered robust; otherwise, it is sensitive to plausible changes in the evidence. In this manner, threshold analysis directly informs decision makers and guideline developers of the robustness of treatment recommendations.
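The core idea of threshold analysis can be illustrated with a deliberately simplified sketch. It assumes the recommendation is simply the treatment with the largest pooled effect estimate, and that each estimate is perturbed directly and one at a time. Analyses of the kind described above typically consider changes to individual study or contrast data propagated through the NMA, which this toy version does not attempt. The function names, the treatment labels, and the numbers are hypothetical and chosen only for illustration.

```python
def threshold_analysis(estimates):
    """Toy threshold analysis on treatment-level estimates.

    estimates: dict mapping treatment name -> pooled effect estimate.
    Assumption: larger is better, and the recommended treatment is the
    one with the largest estimate (ties are not handled).

    Returns (recommended, thresholds), where thresholds maps each
    treatment to (shift, new_recommendation): the smallest change in
    that treatment's estimate that would alter the recommendation, and
    the treatment that would then be recommended.
    """
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]

    thresholds = {}
    for treatment, estimate in estimates.items():
        if treatment == best:
            # Lowering the best estimate to the runner-up's level
            # hands the recommendation to the runner-up.
            thresholds[treatment] = (estimates[runner_up] - estimate, runner_up)
        else:
            # Raising any other estimate to the best's level makes
            # that treatment the new recommendation.
            thresholds[treatment] = (estimates[best] - estimate, treatment)
    return best, thresholds


def is_robust(shift, plausible_change):
    """The recommendation is robust to a given estimate if the shift
    needed to change it exceeds what is judged plausible."""
    return abs(shift) > plausible_change


# Hypothetical effect estimates relative to a reference treatment A.
estimates = {"A": 0.0, "B": 0.5, "C": 1.0}
recommended, thresholds = threshold_analysis(estimates)

for treatment, (shift, new_rec) in thresholds.items():
    # 0.25 is an illustrative judgment of how far an estimate could
    # plausibly move because of bias or sampling variation.
    verdict = "robust" if is_robust(shift, 0.25) else "sensitive"
    print(f"{treatment}: shift {shift:+.2f} -> recommend {new_rec} ({verdict})")
```

In this sketch the judgment about what counts as a plausible change is supplied by the analyst as a single number. In practice it might instead be informed by a credible interval or by an assessment of likely bias for the evidence in question.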