Label Alchemy: Transforming Noisy Data into Precious Insights in Deep Learning

Doctoral thesis (2024)

Authors

S. Ghiassi, Data-Intensive Systems

Contributors

D.H.J. Epema, Data-Intensive Systems (supervisor)

Lydia Y. Chen, Data-Intensive Systems (supervisor)

Y. Chen, Data-Intensive Systems (supervisor)

Research Group

Data-Intensive Systems (TU Delft)

Keywords

Deep Neural Networks, Robustness, Active learning, Noisy labels, Multi-label learning, Trusted data, Noise transition matrix, Noise resilient loss

To reference this document use:

http://resolver.tudelft.nl/uuid:9330d15b-c063-4908-9c9b-5bf510ecbba9

Published Date

2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Software Technology

Abstract

Labels are essential for training Deep Neural Networks (DNNs), guiding learning with fundamental ground truth. Label quality directly impacts DNN performance and generalization, with accurate labels fostering robust predictions. Noisy labels introduce errors and hinder learning, affecting performance adversely. High-quality labels aid convergence, optimizing DNN training towards accurate data distribution representation. Ensuring label accuracy is vital for DNNs’ effective learning, generalization, and real-world performance. Undoubtedly, ensuring the quality of labels is not only critical but also demanding, often entailing considerable resources in terms of time and cost. As the scale of datasets grows, methods such as crowdsourcing have gained traction to expedite the labeling process. However, this approach comes with its own set of challenges, most notably the inherent susceptibility to errors and inaccuracies. For example, it was observed that the accuracy of AlexNet in classifying CIFAR-10 images plummeted from 77% to a mere 10% when labels were subjected to random flips. This stark drop in accuracy exemplifies the magnitude of influence that corrupted or erroneous labels can exert on the performance of DNNs. Such instances underscore the critical relationship between accurate labels and the efficacy of DNNs in understanding and effectively leveraging data. Ensuring DNN robustness is vital, involving strategies like noise label identification, filtering, and integrating noise patterns into training for resilient models. Architectural and loss function design also combats label-related challenges, enhancing DNN adaptability across applications. This thesis investigates the pivotal role of labels in DNN training and their quality impact on model performance. Strategies spanning noise recovery, robust learning frameworks, and multi-label solutions contribute to DNN resilience against noisy labels, advancing both understanding and practical applications.
Chapter 1 of this thesis introduces and explains the crucial elements involved in training DNNs, which include data, DNN models, and expert participation. It highlights the complexity introduced by label noise and sets the stage for the diverse methods designed in subsequent chapters to address these aspects comprehensively.

Files

Ghiassi_thesis_final_version.p... (pdf)

(pdf | 12 Mb)

Unknown license