Abstract
Design flood estimation is a fundamental task in hydrology. In this research, we propose a machine learning based approach to estimate design floods globally. This approach involves three stages: (i) estimating at-site flood frequency curves for global gauging stations using the Anderson-Darling test and a Bayesian MCMC method; (ii) clustering these stations into subgroups with a K-means model based on twelve globally available catchment descriptors; and (iii) developing a regression model in each subgroup for regional design flood estimation using the same descriptors. A total of 11793 stations globally were selected for model development, and three widely used regression models were compared for design flood estimation. The results showed that: (1) the proposed approach achieved the highest accuracy for design flood estimation when using all twelve descriptors for clustering, and the performance of the regression was improved by considering more descriptors during training and validation; (2) a support vector machine regression provided the highest prediction performance amongst all
regression models tested, with a root mean square normalised error (RMSNE) of 0.708 for 100-year return period flood estimation; (3) 100-year design floods in tropical, arid, temperate, cold and polar climate zones could be reliably estimated, with relative mean biases (RBIAS) of -0.199, -0.233, -0.169, 0.179 and -0.091 respectively (i.e. within ±25% error); (4) the machine learning based approach developed in this paper showed considerable improvement over the index-flood based method introduced by Smith et al. (2015, https://doi.org/10.1002/2014WR015814) for design flood estimation at global scales; and (5) the average RBIAS in estimation is less than 18% for 10-, 20-, 50- and 100-year design floods. We conclude that the proposed approach is a valid method to estimate design floods anywhere on the global river network, improving our prediction of the flood hazard, especially in ungauged areas.
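The two accuracy measures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the formulas, so the definitions below are assumptions based on common usage: RBIAS is taken as the mean of (predicted - observed) / observed, and the root mean square normalised error (RMSNE) as the square root of the mean squared relative error. The function names and the sample discharges are hypothetical and serve only as illustration.

```python
import math


def rbias(predicted, observed):
    """Mean relative bias (assumed definition): mean of (pred - obs) / obs."""
    ratios = [(p - o) / o for p, o in zip(predicted, observed)]
    return sum(ratios) / len(ratios)


def rmsne(predicted, observed):
    """Root mean square normalised error (assumed definition):
    square root of the mean squared relative error."""
    squared = [((p - o) / o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 100-year design floods (m^3/s) at three stations; not data from the paper.
observed = [100.0, 200.0, 400.0]
predicted = [80.0, 220.0, 300.0]

print("RBIAS:", rbias(predicted, observed))
print("RMSNE:", rmsne(predicted, observed))
```

Under these assumed definitions, a negative RBIAS indicates underestimation on average, while RMSNE penalises large relative errors regardless of sign, which is why the two values can differ substantially for the same set of stations.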
Original language | English |
---|---|
Pages | 5981–5999 |
Number of pages | 19 |
Journal | Hydrology and Earth System Sciences |
Volume | 25 |
Early online date | 22 Nov 2021 |
Publication status | E-pub ahead of print - 22 Nov 2021 |