Detection Methods for Sound Quality Defects in Film and Television Dialogue

doi:10.12066/j.issn.1007-2861.2070

Abstract:

Dialogue is an important component of film and television sound. Whether it is captured as production (on-set) sound or re-recorded in post-production through automated dialogue replacement (ADR, i.e. voice dubbing), equipment, environmental, and human factors can all introduce sound quality defects of various kinds. In traditional post-production, these defects are located manually and then repaired, which is inefficient. This paper analyzes the types of sound quality defects found in film and television dialogue and their causes, and compares feasible detection methods, with the aim of providing ideas for the automatic detection of dialogue defects.

Key words: dialogue, sound quality defects, detection, sound event

CLC number: TP391.42

WU Hao, ZHANG Ying, MAO Runkun, DONG Xueting. Detecting method of defects in movie conversation quality[J]. Journal of Shanghai University (Natural Science Edition), 2018, 24(4): 545-552.

Figures/Tables: 5

References: 17