Detecting method of defects in movie conversation quality

WU Hao, ZHANG Ying, MAO Runkun, DONG Xueting

doi:10.12066/j.issn.1007-2861.2070

Journal of Shanghai University, 2018, Vol. 24, Issue 4: 545-552

Digital Film and Television Technology

Shanghai Film Academy, Shanghai University, Shanghai 200072, China

Received date: 2018-06-27

Online published: 2018-08-31

Abstract

Dialogue is an important part of film and television sound. Whether it is recorded on set during shooting or later during ADR (automated dialogue replacement, i.e. dubbing), sound quality defects of various kinds are inevitable because of equipment, environmental, and human factors. Traditional post-production, in which defects are located by manual search, is inefficient. This paper examines the types of sound defects that occur in film and television dialogue and compares feasible detection methods, with the aim of providing ideas for the automatic detection of dialogue defects.

Key words: dialogue; sound quality defects; detection; sound event

Cite this article

WU Hao, ZHANG Ying, MAO Runkun, DONG Xueting. Detecting method of defects in movie conversation quality[J]. Journal of Shanghai University, 2018, 24(4): 545-552. DOI: 10.12066/j.issn.1007-2861.2070

