A Generative Neural Network Model for Speech Enhancement

Master thesis (2019)

Authors

H.S. Kapadia Electrical Engineering, Mathematics and Computer Science

Contributors

W.B. Kleijn Signal Processing Systems - (mentor)

Richard C. Hendriks Signal Processing Systems - (coach)

Bert de Vries (coach)

Anne Hendrikse (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:b76d4c1f-108b-4110-bced-ea6b320eda56

Published Date

20-09-2019

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Listening in noise is a challenging problem that affects not only normal-hearing listeners but especially hearing-impaired people. Over the last four decades, the task of enhancing the quality and intelligibility of noise-corrupted speech by reducing the effect of noise has been addressed using statistical signal processing techniques as well as neural networks. However, the fundamental idea behind these methods is the same, i.e., to achieve the best possible estimate of a single target speech waveform. This thesis explores a different route using generative modeling with deep neural networks, where speech is artificially generated by conditioning the model on previously predicted samples and on features extracted from noisy speech. The proposed system consists of a U-Net model for enhancing the noisy features and a WaveRNN synthesizer (originally proposed for text-to-speech synthesis) redesigned for synthesizing clean-sounding speech from noisy features. Subjective results indicate that listeners prefer speech generated by the proposed system over the noisy speech; however, the improvement in intelligibility is not significant.

Files

MSc_Thesis_A_Generative_Neural... (pdf)

(pdf | 20.7 Mb)

- Embargo expired on 01-09-2020

Unknown license