A known limitation of current automatic speech recognition (ASR) systems is their handling of out-of-vocabulary words. One way to overcome this limitation is to use automatic phoneme recognition (APR) systems. Previous research on Dutch APR systems identified the Time Delayed Bidirectional Long-Short Term Memory Neural Network (TDNN-BLSTM) as one of the best-performing state-of-the-art neural network (NN) architectures for phoneme recognition (PR). The goal of this research is to evaluate the performance of the TDNN-BLSTM architecture for phoneme recognition on Mandarin read and spontaneous speech, to analyze the differences in performance between the two speech styles, and to compare the results with previous research on Dutch PR.
To achieve this goal, four NN models based on the TDNN-BLSTM architecture were built and trained on Mandarin read and spontaneous speech. The test results of these models were used to calculate the phoneme error rate (PER), the decomposed PER, and the contribution of individual phonemes to the overall PER. Based on these findings, conclusions are drawn regarding the impact of language, speech style, and architectural changes on the performance of the TDNN-BLSTM architecture.
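As an illustration of the evaluation metrics named above, the following is a minimal sketch of how a PER and its decomposition could be computed from a reference and a recognized phoneme sequence. It assumes the conventional definition PER = (S + D + I) / N, where S, D and I are the numbers of substitutions, deletions and insertions in a minimum edit-distance alignment and N is the number of reference phonemes. It also assumes that "decomposed PER" refers to this split into error types, which the text does not state explicitly. The function names, the example phoneme labels, and the choice to attribute an insertion to the inserted phoneme are hypothetical and are not taken from the study.

```python
from collections import Counter


def align_counts(ref, hyp):
    """Align two phoneme sequences with Levenshtein distance.

    Returns (substitutions, deletions, insertions, per_phone), where
    per_phone counts the errors attributed to each phoneme.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace one optimal alignment; ties prefer match/substitution.
    subs = dels = ins = 0
    per_phone = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                subs += 1
                per_phone[ref[i - 1]] += 1  # blame the reference phoneme
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            per_phone[ref[i - 1]] += 1  # deleted reference phoneme
            i -= 1
        else:
            ins += 1
            per_phone[hyp[j - 1]] += 1  # assumption: blame the inserted phoneme
            j -= 1
    return subs, dels, ins, per_phone


def phoneme_error_rate(ref, hyp):
    """Return the overall PER, its decomposition, and per-phoneme error counts."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    subs, dels, ins, per_phone = align_counts(ref, hyp)
    n = len(ref)
    return {
        "PER": (subs + dels + ins) / n,
        "sub": subs / n,
        "del": dels / n,
        "ins": ins / n,
        "per_phone": per_phone,
    }


if __name__ == "__main__":
    # Hypothetical phoneme labels, for illustration only.
    result = phoneme_error_rate(["n", "i", "h", "ao"], ["n", "i", "ao"])
    print(result)
```

Several optimal alignments can exist for the same pair of sequences, so the split into substitutions, deletions and insertions depends on the tie-breaking rule, while the overall PER does not. Dividing each phoneme's error count by the total number of errors gives one possible measure of its contribution to the overall PER.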