Robust (Deep) learning framework against dirty labels and beyond

Conference paper (2019)

Authors

S. Ghiassi Data-Intensive Systems, Université Grenoble Alpes

T. Younesian Université Grenoble Alpes, Data-Intensive Systems

Zhilong Zhao ABB Future Labs

Robert Birke University of Neuchâtel

Valerio Schiavoni University of Neuchâtel

Lydia Y. Chen Université Grenoble Alpes, Data-Intensive Systems

Y. Chen Université Grenoble Alpes, Data-Intensive Systems

Research Group

Data-Intensive Systems (TU Delft)

Deep neural networks, Adversarial learning, Active learning, Data filtering, Dirty labels, Trusted execution

To reference this document use:

http://resolver.tudelft.nl/uuid:21a69f0d-b26c-4202-bbeb-792809141f54

More Info

Published Date

2019

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Software Technology

Abstract

Data is generated at unprecedented speed, driven by the flourishing of social media and open platforms. However, because such data is rarely scrutinized, clean and dirty data alike spread widely. For instance, a significant portion of images are tagged with corrupted (dirty) class labels. Such dirty data sets are not only detrimental to learning outcomes, e.g., images misclassified into the wrong classes, but also costly: it has been pointed out that bad data can cost the U.S. up to a daunting 3 trillion dollars per year. In this paper, we address the following question: how can prevailing (deep) machine learning models be trained robustly given a non-negligible presence of corrupted labeled data? Dirty labels significantly increase the complexity of existing learning problems, as the ground truth of label quality is not easily assessed. Here, we advocate rigorously incorporating human experts into one learning framework in which artificial and human intelligence collaborate. To this end, we combine three strategies to enhance the robustness of deep and regular machine learning algorithms, namely, (i) data filtering through an additional quality model, (ii) data selection via active learning from an expert, and (iii) imitating the expert's correction process. We demonstrate the three strategies sequentially with examples and apply them to widely used benchmarks, such as CIFAR10 and CIFAR100. Our initial results show the effectiveness of the proposed strategies in combating dirty labels, e.g., the resulting classification accuracy can be up to 50% higher than that of state-of-the-art AI-only solutions. Finally, we extend the discussion of robust learning from trusted data to the trusted execution environment.
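
As a rough illustration of how strategies (i) and (ii) might fit together, the following minimal sketch filters suspicious labels with a stand-in quality score and sends a small budget of the most doubtful samples to an expert. Everything in it is an assumption made for illustration and is not taken from the paper: the two-cluster toy data, the symmetric label noise, the nearest-neighbour agreement score used in place of a learned quality model, the threshold and budget values, and the simulated expert that simply returns the true label. Strategy (iii), imitating the expert's corrections, is not covered.

```python
import random


def corrupt_labels(labels, num_classes, noise_rate, rng):
    """Flip each label to a different class with probability noise_rate."""
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy


def neighbour_agreement(points, labels, k):
    """Stand-in quality score: share of the k nearest neighbours with the same label."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        scores.append(sum(labels[j] == labels[i] for j in neighbours) / k)
    return scores


def split_by_quality(scores, keep_threshold, query_budget):
    """Keep samples scoring at or above the threshold; queue the worst of the rest for an expert."""
    kept = [i for i, s in enumerate(scores) if s >= keep_threshold]
    suspects = sorted(
        (i for i, s in enumerate(scores) if s < keep_threshold),
        key=lambda i: scores[i],
    )
    return kept, suspects[:query_budget]


def accuracy(indices, observed, truth):
    """Fraction of the given samples whose observed label matches the true label."""
    return sum(observed[i] == truth[i] for i in indices) / len(indices)


# Toy data (assumption): two well-separated 2-D clusters, 20% symmetric label noise.
rng = random.Random(0)
centres = [(0.0, 0.0), (10.0, 10.0)]
clean = [i % 2 for i in range(200)]
points = [(rng.gauss(centres[y][0], 1.0), rng.gauss(centres[y][1], 1.0)) for y in clean]
noisy = corrupt_labels(clean, num_classes=2, noise_rate=0.2, rng=rng)

# Strategy (i): filter with the quality score. Strategy (ii): pick samples to ask about.
scores = neighbour_agreement(points, noisy, k=7)
kept, queried = split_by_quality(scores, keep_threshold=0.5, query_budget=20)

# Simulated expert (assumption): answers every query with the true label.
repaired = list(noisy)
for i in queried:
    repaired[i] = clean[i]

everything = list(range(len(clean)))
print("label accuracy, all samples:  ", accuracy(everything, noisy, clean))
print("label accuracy, kept samples: ", accuracy(kept, noisy, clean))
print("label accuracy, after queries:", accuracy(everything, repaired, clean))
```

On well-separated toy data like this, the kept subset should carry noticeably cleaner labels than the raw set; on real image benchmarks a learned quality model would be needed, and the numbers here say nothing about the results reported in the paper.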

Files

Robust_Deep_Learning_Framework... (pdf)

(pdf | 0.775 Mb)

Unknown license