Detecting method of defects in movie conversation quality
Received date: 2018-06-27
Online published: 2018-08-31
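Wu Hao, Zhang Ying, Mao Runkun, Dong Xueting. Detecting method of defects in movie conversation quality[J]. Journal of Shanghai University (Natural Science Edition), 2018, 24(4): 545-552. DOI: 10.12066/j.issn.1007-2861.2070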
Dialogue is an important part of film and television sound, but whether it is recorded synchronously on set or added afterwards through ADR (dubbing), sound quality defects of various kinds are inevitable because of equipment, environment, and human factors. Traditional post-production, which relies on manually searching for defects, is inefficient. This paper surveys the types of sound quality defects found in film and television dialogue and compares feasible detection methods, in order to provide ideas for the automatic detection of dialogue defects.
Key words: dialogue; sound quality defects; detection; sound event