Sewer pipes are commonly inspected in situ with CCTV equipment. Human operators then review the footage to classify defects in the pipes and recommend possible interventions, a process that is both labor-intensive and error-prone. Other researchers have suggested machine learning techniques to (partially) automate this review, but the automated classifiers are often validated in artificial testing setups, leading to biased results that do not translate directly to operational impact. In this work, we discuss suitable evaluation metrics for this specific classification task, most notably ‘specificity at sensitivity’ and ‘precision at recall’, and the importance of a validation setup that uses a sufficiently large dataset with a realistic ratio of images with defects to images without defects. We also introduce ‘leave-two-inspections-out’ cross-validation, designed to eliminate a data leakage bias that would otherwise cause classifier performance to be overestimated. We designed a convolutional neural network (CNN) and applied this validation methodology to automatically detect the twelve most common defect types in a dataset of over 2 million CCTV images. With this dataset and our validation methodology, our CNN outperforms the state of the art. Classification performance was highest for intruding and defective connections and lowest for porous pipes. While the CNN is not capable of fully automated classification at sufficient performance levels, we determined that augmenting the human operator with the CNN may reduce the required human labor by up to 60.5%.
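
To make the two named metrics concrete, the following is a minimal sketch of one common reading of ‘specificity at sensitivity’ and ‘precision at recall’: pick the highest score threshold at which the classifier still reaches a target sensitivity (recall), then report specificity or precision at that operating point. The function names, the thresholding rule, and the toy scores are assumptions made for illustration; they are not taken from the paper.

```python
def confusion_at_threshold(labels, scores, threshold):
    """Count TP, FP, TN, FN when every score >= threshold is called a defect."""
    tp = fp = tn = fn = 0
    for is_defect, score in zip(labels, scores):
        predicted_defect = score >= threshold
        if is_defect and predicted_defect:
            tp += 1
        elif is_defect:
            fn += 1
        elif predicted_defect:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def operating_point(labels, scores, min_sensitivity):
    """Return the confusion counts at the highest threshold whose
    sensitivity (recall) is at least min_sensitivity."""
    if not any(labels):
        raise ValueError("at least one positive (defect) example is required")
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_at_threshold(labels, scores, threshold)
        if tp / (tp + fn) >= min_sensitivity:
            return tp, fp, tn, fn
    raise ValueError("target sensitivity cannot be reached")


def specificity_at_sensitivity(labels, scores, min_sensitivity):
    """Fraction of defect-free images correctly passed at the operating point."""
    tp, fp, tn, fn = operating_point(labels, scores, min_sensitivity)
    if tn + fp == 0:
        raise ValueError("at least one negative example is required")
    return tn / (tn + fp)


def precision_at_recall(labels, scores, min_recall):
    """Fraction of flagged images that truly contain a defect at the operating point."""
    tp, fp, tn, fn = operating_point(labels, scores, min_recall)
    return tp / (tp + fp)


# Hypothetical toy data: 3 images with a defect, 7 without.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(specificity_at_sensitivity(labels, scores, 0.6))
print(precision_at_recall(labels, scores, 0.6))
```

Specificity is computed over the defect-free images only, so it does not change with the ratio of defective to defect-free images. Precision does depend on that ratio, which is one reason a validation set with an unrealistic ratio can give a misleading picture of operational performance.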