Label Alchemy: Transforming Noisy Data into Precious Insights in Deep Learning

Abstract

Labels are essential for training Deep Neural Networks (DNNs), providing the ground truth that guides learning. Label quality directly affects a DNN's performance and generalization: accurate labels foster robust predictions and help training converge toward a faithful representation of the data distribution, while noisy labels introduce errors that hinder learning and degrade real-world performance. Ensuring label accuracy is therefore vital, yet demanding, often entailing considerable time and cost. As datasets grow in scale, methods such as crowdsourcing have gained traction to expedite the labeling process. However, this approach comes with its own challenges, most notably an inherent susceptibility to errors and inaccuracies. For example, the accuracy of AlexNet in classifying CIFAR-10 images was observed to plummet from 77% to a mere 10% when labels were subjected to random flips. This stark drop exemplifies the magnitude of influence that corrupted or erroneous labels can exert on DNN performance, and underscores the critical relationship between accurate labels and a network's ability to understand and effectively leverage data.

Ensuring DNN robustness against such corruption involves strategies including noisy-label identification, filtering, and the integration of noise patterns into training to produce resilient models. Architectural and loss-function design also combat label-related challenges, enhancing DNN adaptability across applications.

This thesis investigates the pivotal role of labels in DNN training and the impact of label quality on model performance. Strategies spanning noise recovery, robust learning frameworks, and multi-label solutions contribute to DNN resilience against noisy labels, advancing both understanding and practical applications. Chapter 1 introduces and explains the crucial elements involved in training DNNs (data, DNN models, and expert participation), highlights the complexity introduced by label noise, and sets the stage for the diverse methods designed in subsequent chapters to address these aspects comprehensively.
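The random-flip experiment cited above corresponds to injecting symmetric label noise, a standard protocol in the noisy-label literature. The following is a minimal sketch of that corruption step, not the exact procedure used in the cited AlexNet/CIFAR-10 experiment; the function name and parameters are illustrative.

import numpy as np

def flip_labels(labels: np.ndarray, noise_rate: float,
                num_classes: int = 10, seed: int = 0) -> np.ndarray:
    """Replace a fraction `noise_rate` of labels with a different,
    uniformly chosen class (symmetric label noise)."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n_flip = int(noise_rate * len(labels))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    # Add a random non-zero offset modulo num_classes so every
    # flipped label is guaranteed to differ from the original.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[idx] = (labels[idx] + offsets) % num_classes
    return noisy

# Example: corrupt 40% of a hypothetical 10-class label set.
clean = np.random.default_rng(1).integers(0, 10, size=50_000)
noisy = flip_labels(clean, noise_rate=0.4)
print((noisy != clean).mean())  # approximately 0.4

Training a classifier on `noisy` while evaluating on clean test labels reproduces the kind of accuracy collapse described above.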