Hydrologic model performance evaluation depends on streamflow observations that are accurately positioned in the landscape. For distributed hydrologic models, this means that each streamflow observation needs to be mapped to a location along the model streamflow network that represents the location of the observation station in the hydrologic system. However, the gridded representation of the modelled area causes a spatial mismatch between the hydrologic system and the hydrologic model. In this study we aimed to develop a machine learning-based method to improve the matching between streamflow observations and streamflow simulations. The method was set up in two steps: (1) a dataset was created consisting of streamflow characteristics of simulations and observations and (2) a machine learning algorithm was trained on this dataset. Three data sources were used to create the dataset: (1) 595 streamflow observations retrieved from the Global Runoff Data Centre (GRDC), (2) streamflow simulations extracted from the European Flood Alert System (EFAS) and (3) a manually created and checked dataset, provided by the European Centre for Medium-Range Weather Forecasts, linking each GRDC observation to the correct EFAS grid cell. For 60% of the observations in the dataset, linking to the correct grid cell required moving the observation away from the cell corresponding to its geolocation. The method developed in this study accounted for this by creating a search window around the initial location of each observation. The streamflow simulations were extracted from the grid cells in the search window and compared with the streamflow observation. The algorithm aimed to select the streamflow simulation that best reflected the characteristics of the streamflow observation. These characteristics were described with streamflow signatures.
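The search-window idea can be illustrated with a minimal Python sketch. It is not the method developed in this study: it replaces the trained classifier with a simple nearest-signature rule, and the function names (`search_window`, `signatures`, `best_cell`), the window radius, the `(time, rows, cols)` array layout and the four signatures (mean, standard deviation, 5th and 95th percentile) are all assumptions made for illustration.

```python
import numpy as np

def search_window(sim, row, col, radius=1):
    """Collect simulated series from all grid cells within `radius` cells
    of (row, col). `sim` is assumed to have shape (time, rows, cols)."""
    _, n_rows, n_cols = sim.shape
    cells = {}
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            cells[(r, c)] = sim[:, r, c]
    return cells

def signatures(q):
    """Illustrative signature vector (an assumed choice, not the study's):
    mean flow, standard deviation, low flow (Q5) and high flow (Q95)."""
    q = np.asarray(q, dtype=float)
    return np.array([q.mean(), q.std(), np.percentile(q, 5), np.percentile(q, 95)])

def best_cell(obs, sim, row, col, radius=1):
    """Return the cell in the search window whose signatures are closest
    to those of the observation (relative Euclidean distance)."""
    target = signatures(obs)
    # Avoid division by zero for signatures that happen to be exactly zero.
    scale = np.where(target == 0, 1.0, np.abs(target))
    cells = search_window(sim, row, col, radius)

    def distance(series):
        return np.linalg.norm((signatures(series) - target) / scale)

    return min(cells, key=lambda rc: distance(cells[rc]))
```

Calling `best_cell(obs, sim, row, col)` with the grid cell that contains the station's geolocation returns the `(row, col)` index of the selected cell, which may differ from the initial one.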
Four machine learning algorithms (Logistic Regression, Random Forest, Support Vector Machine and k-Nearest Neighbours) were trained with a k-fold cross-validation procedure to match streamflow simulations with streamflow observations based on streamflow signatures. Their performance was compared with four benchmarks: a Center Cell benchmark, which leaves each observation at its initial location, and the Root Mean Squared Error, Kling-Gupta Efficiency and Nash-Sutcliffe Efficiency benchmarks, which compare the streamflow observation directly with the streamflow simulations. We identified Logistic Regression and Random Forest as the best-performing algorithms. However, neither outperformed all benchmarks. Despite these results, this study shows the potential of a machine learning-based approach to automate the matching between streamflow observations and streamflow simulations.
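The three metric-based benchmarks can be sketched as follows. This is a hedged illustration only: the abstract does not state which formulation of the Kling-Gupta Efficiency was used, so the sketch assumes the original three-component form (correlation, variability ratio, bias ratio), and the helper `benchmark_pick` and its arguments are hypothetical names. The Center Cell benchmark needs no computation, since it simply keeps the initial cell.

```python
import numpy as np

def rmse(obs, sim):
    """Root Mean Squared Error (lower is better)."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency (1 is a perfect match)."""
    return float(1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge(obs, sim):
    """Kling-Gupta Efficiency, assumed three-component form
    (1 is a perfect match)."""
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return float(1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

def benchmark_pick(obs, candidates, metric, higher_is_better):
    """Score every candidate cell in `candidates` ({cell: series}) against
    the observation and return the best-scoring cell."""
    scores = {cell: metric(obs, series) for cell, series in candidates.items()}
    chooser = max if higher_is_better else min
    return chooser(scores, key=scores.get)
```

For example, `benchmark_pick(obs, candidates, kge, higher_is_better=True)` selects the candidate cell with the highest Kling-Gupta Efficiency, while the Root Mean Squared Error benchmark would be called with `higher_is_better=False`.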