A Generative Neural Network Model for Speech Enhancement
Abstract
Listening in noise is a challenging problem that degrades the hearing capability of not only normal-hearing but especially hearing-impaired people. Over the last four decades, enhancing the quality and intelligibility of noise-corrupted speech by reducing the effect of noise has been addressed using statistical signal processing techniques as well as neural networks. However, the fundamental idea behind these methods is the same: to achieve the best possible estimate of a single target speech waveform. This thesis explores a different route using generative modeling with deep neural networks, where speech is artificially generated by conditioning the model on previously predicted samples and on features extracted from noisy speech. The proposed system consists of a U-Net model for enhancing the noisy features and the WaveRNN synthesizer (originally proposed for text-to-speech synthesis), re-designed to synthesize clean-sounding speech from noisy features. Subjective results indicate that speech generated by the proposed system is preferred over listening to noisy speech; however, the improvement in intelligibility is not significant.
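The generation scheme described above can be illustrated with a minimal sketch. This is not the thesis code: it is a toy autoregressive loop, assuming a stand-in linear "model" and random stub features, that only shows how each output sample is predicted from the previously generated sample plus a per-frame conditioning feature (standing in for the U-Net-enhanced features fed to the WaveRNN-style synthesizer).

```python
import numpy as np

def synthesize(features, hop=4, w_prev=0.9, w_feat=0.1, seed=0):
    """Toy autoregressive synthesis: hop samples per conditioning frame.

    features: (n_frames, feat_dim) conditioning features
              (hypothetical stand-in for enhanced spectral features).
    Returns a waveform of length n_frames * hop.
    """
    rng = np.random.default_rng(seed)
    # Stub "model": a fixed linear projection of each conditioning vector.
    proj = rng.standard_normal(features.shape[1]) / features.shape[1]
    wav = []
    prev = 0.0  # previously predicted sample (the autoregressive feedback)
    for frame in features:
        cond = float(frame @ proj)  # frame-level conditioning signal
        for _ in range(hop):
            # Next sample depends on the previous prediction and the
            # conditioning feature; tanh keeps the output in (-1, 1).
            prev = np.tanh(w_prev * prev + w_feat * cond)
            wav.append(prev)
    return np.asarray(wav)

feats = np.random.default_rng(1).standard_normal((3, 8))
y = synthesize(feats)
print(y.shape)  # (12,)
```

In the actual system, the linear stub is replaced by a trained recurrent network, but the loop structure, generating one sample at a time from the previous sample and upsampled frame features, is the defining property of this generative approach.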