Jun Du | TU Delft Repository

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge

Audio-Visual Diarization and Recognition

Conference paper (2023) - Zhe Wang (author), Shilong Wu (author), Diyuan Liu (author), More Authors..., Hang Chen (author), Mao-Kui He (author), Jun Du (author), Chin-Hui Lee (author), Jingdong Chen (author), Shinji Watanabe (author), Sabato Marco Siniscalchi (author), O.E. Scharenborg (author)

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting the research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 ch ...

The First Multimodal Information Based Speech Processing (MISP) Challenge

Data, Tasks, Baselines and Results

Conference paper (2022) - Hang Chen (author), Hengshun Zhou (author), Jun Du (author), Chin-Hui Lee (author), Jingdong Chen (author), Shinji Watanabe (author), Sabato Marco Siniscalchi (author), O.E. Scharenborg (author), Di-Yuan Liu (author), More Authors...

In this paper we discuss the rationale of the Multi-modal Information based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluat ...

Audio-Visual Wake Word Spotting in MISP2021 Challenge

Dataset Release and Deep Analysis

Journal article (2022) - Hengshun Zhou (author), Jun Du (author), Guang Zou (author), Zhaoxu Nian (author), Chin-Hui Lee (author), Sabato Marco Siniscalchi (author), Shinji Watanabe (author), O.E. Scharenborg (author), Jingdong Chen (author), More Authors...

In this paper, we describe and publicly release the audio-visual wake word spotting (WWS) database in the MISP2021 Challenge, which covers a range of scenarios of audio and video data collected by near-, mid-, and far-field microphone arrays, and cameras, to create a shared and p ...