Journal of Shanghai University (Natural Science Edition) ›› 2024, Vol. 30 ›› Issue (3): 476-490. doi: 10.12066/j.issn.1007-2861.2449


Anti-noise speech recognition system based on generative adversarial network data enhancement

FENG Tianyu, ZHU Yonghua   

  1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
  • Online: 2024-06-30 Published: 2024-07-09

Abstract: Speech recognition research is persistently constrained by the limitations of available datasets. Data enhancement increases the scale and diversity of training data and thereby improves recognition accuracy. This paper proposes a speech data generation method based on generative adversarial networks (GANs) to improve speech recognition in noisy environments. First, a basic GAN structure is used to generate speech samples frame by frame at the spectral feature level. Because true labels are lacking for training, an unsupervised learning framework is proposed for acoustic modeling with untranscribed data. Within this framework, a conditional GAN structure is used to explore two conditions: the acoustic state of each speech frame, and the original clean speech corresponding to the speech in the dataset. GANs that incorporate this conditional information can directly provide true labels for acoustic modeling. The method was evaluated on the noisy Aurora-4 task and the AMI meeting transcription task. Experimental results show that it significantly improves performance under various noise conditions (additive noise, channel distortion, and reverberation). With an advanced very deep convolutional neural network (VDCNN) acoustic model, the data generated by the GAN reduced the word error rate (WER) by 6% to 14%.

Key words: generative adversarial network, acoustic model, data enhancement, noise, speech recognition
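
Note on the metric: the abstract reports its results as word error rate (WER). For readers unfamiliar with it, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a recognizer's output and the reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that standard definition. It is not the scoring tool used in the paper, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with
    no text normalization (an assumption, not the paper's scoring setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deletion ("the") and one insertion ("please") over 4 reference words.
    print(word_error_rate("turn the volume up", "turn volume up please"))  # 0.5
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.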
