Journal of Shanghai University (Natural Science Edition), 2022, Vol. 28, Issue (2): 226-237. doi: 10.12066/j.issn.1007-2861.2239


An algorithm for solving the permutation indeterminacy problem of frequency-domain ICA based on speech energy ratio

ANG Zhiqiang¹, WANG Tao¹, JIN Zhiwen²

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
  2. Unit 96216 of PLA, Beijing 100085, China
  • Received: 2020-04-02  Online: 2022-04-30  Published: 2022-04-28
  • Contact: WANG Tao, E-mail: twang@shu.edu.cn

Abstract:

With the development of artificial intelligence and the Internet of Things (AIoT) and rapid advances in hardware, smart speakers are becoming an increasingly common part of daily life, and human-computer interaction is shifting from remote control to voice control. However, the audio recorded by a device's microphones usually contains considerable noise and interfering speech, so the recorded signals must first be separated. Frequency-domain independent component analysis (ICA) is a commonly used separation technique, but it suffers from the permutation indeterminacy problem: because ICA is applied independently in each frequency bin, the order of the separated outputs can differ from bin to bin, so that components of Source 1 are assigned to the output channel of Source 2 and components of Source 2 to the channel of Source 1. This mismatch severely degrades separation performance. To address this issue, we propose an algorithm based on the speech energy ratio that effectively improves separation performance. The algorithm was evaluated on the Signal Separation Evaluation Campaign (SiSEC) and Computational Hearing in Multisource Environments (CHiME) datasets. The results show that it outperforms existing algorithms and maintains good separation performance for mixed signals even in strongly reverberant environments.
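The abstract does not spell out the authors' exact formulation, so the following is only a minimal sketch of the general idea under our own assumptions: a generic energy-ratio (power-ratio) permutation alignment, not necessarily the method proposed in this paper. For each frequency bin f and output k it computes the energy ratio r_k(f, t) = |Y_k(f, t)|² / Σ_j |Y_j(f, t)|², then, for every bin, picks the output permutation whose ratio sequences correlate best with per-source centroids averaged over all bins. The helper names (`power_ratio`, `align_permutations`, `make_toy_bins`), the centroid-clustering strategy, the iteration limit and the synthetic two-talker data are all hypothetical illustrations.

```python
import cmath
import itertools
import math
import random

EPS = 1e-12  # guards against division by zero in silent frames


def power_ratio(bin_outputs):
    """Energy ratio of each ICA output in one frequency bin (hypothetical helper).

    bin_outputs[k][t] is the complex output k at frame t; returns
    ratio[k][t] = |Y_k|^2 / sum_j |Y_j|^2.
    """
    n_frames = len(bin_outputs[0])
    powers = [[abs(y) ** 2 for y in out] for out in bin_outputs]
    totals = [sum(p[t] for p in powers) + EPS for t in range(n_frames)]
    return [[p[t] / totals[t] for t in range(n_frames)] for p in powers]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb + EPS)


def align_permutations(ratios, n_iter=10):
    """Centroid-based permutation alignment (hypothetical helper).

    ratios[f][k][t] is the energy ratio of output k in bin f. Returns
    perms, where perms[f][k] is the output of bin f assigned to source k.
    """
    n_bins, n_src, n_frames = len(ratios), len(ratios[0]), len(ratios[0][0])
    perms = [tuple(range(n_src))] * n_bins  # start from the raw ICA order
    for _ in range(n_iter):
        # Centroid of each source: mean energy-ratio pattern over all bins.
        centroids = [
            [sum(ratios[f][perms[f][k]][t] for f in range(n_bins)) / n_bins
             for t in range(n_frames)]
            for k in range(n_src)
        ]
        # Per bin, choose the permutation whose ratios best match the centroids.
        new_perms = [
            max(itertools.permutations(range(n_src)),
                key=lambda p: sum(correlation(ratios[f][p[k]], centroids[k])
                                  for k in range(n_src)))
            for f in range(n_bins)
        ]
        if new_perms == perms:  # assignments stable: converged
            break
        perms = new_perms
    return perms


def make_toy_bins(n_bins=8, n_frames=20, swapped=(1, 4), seed=0):
    """Synthetic bin-wise ICA outputs; bins in `swapped` have their outputs exchanged."""
    rng = random.Random(seed)
    bins = []
    for f in range(n_bins):
        a, b = [], []
        for t in range(n_frames):
            a_on = (t // 5) % 2 == 0  # the two talkers alternate every 5 frames
            mag_a = (1.0 if a_on else 0.1) * rng.uniform(0.5, 1.5)
            mag_b = (0.1 if a_on else 1.0) * rng.uniform(0.5, 1.5)
            a.append(mag_a * cmath.exp(1j * rng.uniform(0, 2 * math.pi)))
            b.append(mag_b * cmath.exp(1j * rng.uniform(0, 2 * math.pi)))
        bins.append([b, a] if f in swapped else [a, b])
    return bins


bins = make_toy_bins()
perms = align_permutations([power_ratio(outs) for outs in bins])
# Reorder each bin so that position k holds the output assigned to source k.
aligned = [[outs[k] for k in p] for outs, p in zip(bins, perms)]
print(perms)
# [(0, 1), (1, 0), (0, 1), (0, 1), (1, 0), (0, 1), (0, 1), (0, 1)]
```

Because the two talkers are active at different times, each source's energy ratio follows the same on/off pattern in every bin, which is what makes the ratio usable as a cross-bin alignment cue. In this toy run, the outputs of bins 1 and 4 were deliberately swapped, and the alignment maps them back to a consistent source order.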

Key words: blind source separation, speech separation, frequency-domain independent component analysis, permutation indeterminacy, energy ratio
