TrustNet

Learning from Trusted Data Against (A)symmetric Label Noise


Abstract

Big Data systems allow collecting massive datasets to feed data-hungry deep learning models. Labelling these ever-bigger datasets is increasingly challenging, and label errors affect even highly curated sets. This makes robustness to label noise a critical property for weakly-supervised classifiers. Related work on noise-resilient deep networks tends to focus on a limited set of synthetic noise patterns, with disparate views on their impact, e.g., robustness against symmetric vs. asymmetric noise. In this paper, we first extend the theoretical analysis of test accuracy to arbitrary noise patterns. Based on these insights, we design TrustNet, which first learns the pattern of noise corruption, be it symmetric or asymmetric, from a small set of trusted data. TrustNet is then trained via a robust loss function that weights the given labels against the labels inferred from the learned noise pattern, with the weight adjusted according to model uncertainty across training epochs. We evaluate TrustNet on synthetic label noise for CIFAR-10 and CIFAR-100, and on large real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods, demonstrating the strong robustness of TrustNet under a diverse set of noise patterns.
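
As a rough illustration of the two-stage idea summarized above (not the authors' implementation), the sketch below first estimates a noise transition matrix from a small trusted subset where both noisy and clean labels are known, then trains with a loss that weights the given noisy labels against labels inferred from the model's own predictions. The function names and the fixed mixing weight `alpha`, which stands in for the uncertainty-based schedule adjusted across epochs, are hypothetical placeholders.

```python
# Hypothetical sketch, assuming a PyTorch setup. The transition matrix
# T encodes the learned (a)symmetric noise pattern; the loss is a
# convex combination of the given labels and model-inferred labels.
import torch
import torch.nn.functional as F


def estimate_transition_matrix(clean_labels, noisy_labels, num_classes):
    """T[i, j] ~ P(observed label j | true label i), counted on the
    small trusted set where both clean and noisy labels are available."""
    T = torch.zeros(num_classes, num_classes)
    for clean, noisy in zip(clean_labels, noisy_labels):
        T[clean, noisy] += 1.0
    # Row-normalize; clamp avoids division by zero for unseen classes.
    return T / T.sum(dim=1, keepdim=True).clamp(min=1.0)


def robust_loss(logits, noisy_labels, T, alpha):
    """Weights the given (noisy) labels against inferred labels.

    - Given-label term: forward-corrects the prediction through T so it
      is compared in the space of *observed* labels.
    - Inferred-label term: cross-entropy against the model's own
      current prediction, used as a proxy for the clean label.
    `alpha` plays the role of the uncertainty-adjusted weight.
    """
    p = F.softmax(logits, dim=1)
    p_observed = p @ T  # distribution over observed (noisy) labels
    loss_given = F.nll_loss(torch.log(p_observed + 1e-8), noisy_labels)
    inferred = p.argmax(dim=1).detach()  # model-inferred labels
    loss_inferred = F.cross_entropy(logits, inferred)
    return alpha * loss_given + (1.0 - alpha) * loss_inferred
```

In a training loop, `alpha` would start high (trust the given labels early, when the model is uncertain) and be annealed as the model's uncertainty decreases over epochs, in line with the uncertainty-based weighting described in the abstract.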