Evaluating the generalizability and transferability of water distribution deterioration models
Abstract
Small utilities often lack enough data to train machine learning-based models to predict pipe failures, and hence are unable to harness the predictive power of machine learning. This study evaluates the generalizability and transferability of a machine learning model to assess whether small utilities can benefit from the data and models of other utilities. Using datasets from nine Norwegian utilities, we trained nine global models (by merging multiple datasets) and nine local models (each using a single utility's dataset) with random survival forest. Several pre-processing techniques, including methods for handling left-truncated break data and break-data scarcity, are also presented. The global models and three of the local models were tested on predicting pipe failures for utilities that were not included in their training datasets. The results indicate that the global models can predict other utilities' pipe failures with sufficient accuracy, while the local models have some limitations. However, if a representative utility with a sufficiently large and information-rich dataset is selected, its model can predict the other utilities' pipe breaks as accurately as the global models. Furthermore, survival curves for defined cohorts, used as proxies for uncertainty, together with variable importance, show that pipes with and without previous breaks behave very differently. With an understanding of the models' generalizability and transferability, small utilities can benefit from the data and models of other utilities.
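The abstract mentions handling left-truncated break data and survival curves for defined cohorts. The following is a minimal sketch, not the authors' pipeline (which uses random survival forest): a generic Kaplan-Meier estimator with delayed entry, showing why left truncation matters. A pipe laid before break records began should join the risk set only from the age at which it came under observation. The `Pipe` record, the `km_survival` function and the toy ages below are hypothetical illustrations.

```python
from collections import namedtuple

# Hypothetical record: a pipe is observed from age `entry` (years) until age
# `exit`; `event` is True if the pipe broke at age `exit`, False if censored.
Pipe = namedtuple("Pipe", ["entry", "exit", "event"])


def km_survival(pipes, left_truncated=True):
    """Kaplan-Meier survival curve as a list of (age, survival) pairs.

    With left_truncated=True a pipe is in the risk set at age t only if
    entry < t <= exit (delayed entry). With left_truncated=False every pipe
    is treated as observed from age 0, which is the naive approach.
    """
    event_ages = sorted({p.exit for p in pipes if p.event})
    survival = 1.0
    curve = []
    for t in event_ages:
        at_risk = 0
        for p in pipes:
            start = p.entry if left_truncated else 0.0
            if start < t <= p.exit:
                at_risk += 1
        breaks = sum(1 for p in pipes if p.event and p.exit == t)
        if at_risk == 0:
            continue  # no pipes under observation at this age
        survival *= 1.0 - breaks / at_risk
        curve.append((t, survival))
    return curve


# Toy data: two pipes observed from installation, two that only came under
# observation at age 15 when (hypothetically) break records began.
pipes = [
    Pipe(entry=0, exit=10, event=True),
    Pipe(entry=0, exit=20, event=False),
    Pipe(entry=15, exit=25, event=True),
    Pipe(entry=15, exit=30, event=False),
]

print(km_survival(pipes))                        # [(10, 0.5), (25, 0.25)]
print(km_survival(pipes, left_truncated=False))  # [(10, 0.75), (25, 0.375)]
```

Ignoring delayed entry counts the two late-observed pipes as at risk at age 10, inflating the risk set and overstating survival. The same function could be applied separately to cohorts such as pipes with and without previous breaks.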