Multi-AL: Robust Active learning for Multi-label Classifier

Bachelor thesis (2021)

Authors

M.J. Basting Electrical Engineering, Mathematics and Computer Science

Contributors

Y. Chen Data-Intensive Systems - (graduation committee member)

Lydia Y. Chen Data-Intensive Systems - (graduation committee member)

T. Younesian Data-Intensive Systems - (mentor)

S. Ghiassi Data-Intensive Systems - (mentor)

F.A. Kuipers Embedded Systems - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:93fba5b3-f31a-452a-9af6-5c372a00abda

Published Date

02-07-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-label learning is becoming increasingly important because real-world data often carries multiple labels. The dataset used to train such a classifier is of great importance, yet acquiring a correctly labelled dataset is a difficult task. Active learning is a method that can, given a noisy dataset, identify important instances for an expert to label. This greatly reduces the number of instances needed to train an accurate classifier, and thus reduces the cost of cleaning a noisy dataset. This paper therefore presents an active learning algorithm, focused on wrongly labelled data, combined with a deep neural network for multi-label image classification. The proposed active learning solution is built on two measures, a mislabelling likelihood and an informativeness measure, together with an option to identify and use highly probable clean instances in the dataset. Experiments performed on the real-world dataset Microsoft COCO, with 20, 40 and 60% injected label noise, show that Multi-AL outperforms the current state-of-the-art multi-label learning algorithm, ASL, by 28% while using only 600 labelled instances in total and 250 extracted 'clean' instances. Multi-AL additionally outperforms random sampling by 3% on average for 20 and 40% random label noise when sampling from a wrongly labelled dataset of 23k instances.

Files

Final_report_mark_basting.pdf

(pdf | 1.69 Mb)

Unknown license