Research Articles

An algorithm for solving the permutation indeterminacy problem of frequency-domain ICA based on speech energy ratio

1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
2. Unit 96216 of PLA, Beijing 100085, China

Received date: 2020-04-02

Published online: 2022-04-28

Abstract

With the development of artificial intelligence and the internet of things (AIoT) and the rapid advancement of hardware technology, smart speakers are becoming a part of people's lives, and human-computer interaction is shifting from remote control to voice control. However, the audio signals recorded by a device's microphones usually contain considerable noise and interfering voices, so the recorded signals must be separated before use. Frequency-domain independent component analysis (ICA) is a commonly used separation technique, but it faces the permutation indeterminacy problem: at some frequency bins, the separated components of Source 1 are assigned to the output channel of Source 2 and the separated components of Source 2 are assigned to the output channel of Source 1, which greatly degrades the separation performance. To address this issue, we propose an algorithm based on the speech energy ratio. The separation performance was tested on the Signal Separation Evaluation Campaign (SiSEC) and Computational Hearing in Multisource Environments (CHiME) datasets. The results showed that the proposed algorithm outperformed existing algorithms and maintained good separation performance for mixed signals even in strongly reverberant environments.
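The permutation indeterminacy described above can be illustrated with a minimal sketch. It only reproduces the problem, not the proposed energy-ratio algorithm, whose details are not given in this abstract. All names (`S`, `A`, `W`, `swapped`) and the toy dimensions are assumptions made for illustration: each frequency bin is given an ideal demixing matrix, and in some bins that matrix is multiplied by a permutation, which an ICA contrast function cannot tell apart from the unpermuted solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_frames = 8, 200  # toy sizes, chosen for illustration only

# Hypothetical source spectra, shape (bins, sources, frames).
S = (rng.standard_normal((n_bins, 2, n_frames))
     + 1j * rng.standard_normal((n_bins, 2, n_frames)))

# One instantaneous complex mixing matrix per frequency bin.
G = (rng.standard_normal((n_bins, 2, 2))
     + 1j * rng.standard_normal((n_bins, 2, 2)))
A = np.eye(2) + 0.3 * G
X = A @ S  # observed mixtures, shape (bins, mics, frames)

# Ideal demixing matrices. Independence is unchanged by reordering the
# outputs, so P @ W is as valid a solution as W in any single bin.
P = np.array([[0, 1], [1, 0]])
W = np.linalg.inv(A)
swapped = np.arange(n_bins) % 2 == 1  # pretend ICA swapped the odd bins
W[swapped] = P @ W[swapped]

Y = W @ X  # separated outputs, shape (bins, channels, frames)

# Channel 0 now holds Source 1 in even bins but Source 2 in odd bins.
for f in range(n_bins):
    holds_source_1 = np.allclose(Y[f, 0], S[f, 0])
    print(f"bin {f}: channel 0 holds Source {'1' if holds_source_1 else '2'}")
```

Resynthesizing channel 0 from `Y` as it stands would interleave the two sources across frequency, which is why a permutation-alignment step is needed after per-bin separation.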

Cite this article

ANG Zhiqiang, WANG Tao, JIN Zhiwen. An algorithm for solving the permutation indeterminacy problem of frequency-domain ICA based on speech energy ratio[J]. Journal of Shanghai University, 2022, 28(2): 226-237. DOI: 10.12066/j.issn.1007-2861.2239
