Problems with binary pattern measures for flood model evaluation

Elisabeth Stephens*, Guy Schumann, Paul Bates

*Corresponding author for this work

Research output: Contribution to journal; Article (Academic Journal); peer-review

76 Citations (Scopus)


As the calibration and evaluation of flood inundation models are a prerequisite for their successful application, there is a clear need to ensure that the performance measures that quantify how well models match the available observations are fit for purpose. This paper evaluates the binary pattern performance measures that are frequently used to compare flood inundation models with observations of flood extent. This evaluation considers whether these measures are able to calibrate and evaluate model predictions in a credible and consistent way, i.e. identifying the underlying model behaviour for a number of different purposes such as comparing models of floods of different magnitudes or on different catchments. Through theoretical examples, it is shown that the binary pattern measures are not consistent for floods of different sizes, such that for the same vertical error in water level, a model of a flood of large magnitude appears to perform better than a model of a smaller magnitude flood. Further, the commonly used Critical Success Index (usually referred to as F<2>) is biased in favour of overprediction of the flood extent, and is also biased towards correctly predicting areas of the domain with smaller topographic gradients. Consequently, it is recommended that future studies consider carefully the implications of reporting conclusions using these performance measures. Additionally, future research should consider whether a more robust and consistent analysis could be achieved by using elevation comparison methods instead.
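The two effects described above can be illustrated with a minimal sketch. It uses the standard definition of the Critical Success Index, hits / (hits + false alarms + misses), on an idealised one-dimensional V-shaped valley cross-section. The valley geometry, the water levels, the 0.5 m level error and all function names are assumptions chosen for illustration; they are not the theoretical examples used in the paper.

```python
def critical_success_index(modelled, observed):
    """CSI = hits / (hits + false alarms + misses) for two wet/dry sequences."""
    if len(modelled) != len(observed):
        raise ValueError("modelled and observed maps must have the same length")
    pairs = list(zip(modelled, observed))
    hits = sum(1 for m, o in pairs if m and o)
    false_alarms = sum(1 for m, o in pairs if m and not o)
    misses = sum(1 for m, o in pairs if o and not m)
    denominator = hits + false_alarms + misses
    if denominator == 0:
        raise ValueError("CSI is undefined when no cell is wet in either map")
    return hits / denominator


def v_valley_elevations(n_cells=2001, half_width=100.0, side_slope=0.1):
    """Hypothetical symmetric V-shaped cross-section: elevation = slope * |x|."""
    step = 2.0 * half_width / (n_cells - 1)
    return [side_slope * abs(-half_width + i * step) for i in range(n_cells)]


def wet_mask(elevations, water_level):
    """A cell is wet when its ground elevation lies below the water level."""
    return [z < water_level for z in elevations]


def csi_for_level_error(elevations, observed_level, level_error):
    """CSI of a model whose water level is offset vertically from the observed one."""
    observed = wet_mask(elevations, observed_level)
    modelled = wet_mask(elevations, observed_level + level_error)
    return critical_success_index(modelled, observed)


valley = v_valley_elevations()

# Same vertical error (+0.5 m, assumed) applied to a small and a large flood.
small_over = csi_for_level_error(valley, observed_level=1.0, level_error=0.5)
large_over = csi_for_level_error(valley, observed_level=4.0, level_error=0.5)

# Same magnitude of error on the small flood, but as an underprediction.
small_under = csi_for_level_error(valley, observed_level=1.0, level_error=-0.5)

print(f"small flood, +0.5 m: CSI = {small_over:.3f}")
print(f"large flood, +0.5 m: CSI = {large_over:.3f}")
print(f"small flood, -0.5 m: CSI = {small_under:.3f}")
```

In this idealised geometry the larger flood scores higher for the same vertical error, and overprediction scores higher than underprediction of the same magnitude, which is the qualitative behaviour the abstract reports. A flatter valley side would widen the extent error for the same level error, so the numbers depend strongly on the assumed slope.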

Original language: English
Pages (from-to): 4928-4937
Number of pages: 10
Journal: Hydrological Processes
Issue number: 18
Publication status: Published - 30 Aug 2014


Keywords

  • Calibration
  • Evaluation
  • Flood modelling
  • Performance measures
  • Remote sensing

