Journal of Systems Engineering and Electronics, 2024, Vol. 35, Issue (2): 294-301. doi: 10.23919/JSEE.2023.000110
• ELECTRONICS TECHNOLOGY •
Dada ZHAO1,2, Kai DING2, Xiaogang QI1,*, Yu CHEN2, Hailin FENG1
Received: 2021-06-06
Accepted: 2023-02-24
Online: 2024-04-18
Published: 2024-04-18
Contact: Xiaogang QI
E-mail: ddzhao@stu.xidian.edu.cn; winfast113@sina.com; xgqi@xidian.edu.cn; cy0520tool@sohu.com; hlfeng@xidian.edu.cn
Dada ZHAO, Kai DING, Xiaogang QI, Yu CHEN, Hailin FENG. Sound event localization and detection based on deep learning[J]. Journal of Systems Engineering and Electronics, 2024, 35(2): 294-301.
Table 1 Comparison of the proposed algorithm with the baselines

| Algorithm | Evaluation metric | CANSYN OV1 | CANSYN OV2 | CANSYN OV3 | CRESYN OV1 | CRESYN OV2 | CRESYN OV3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SELDnet | ER | 0.11 | 0.18 | 0.19 | 0.13 | 0.22 | 0.30 |
| SELDnet | F score | 93.0 | 86.6 | 85.3 | 90.4 | 82.2 | 78.0 |
| SELDnet | DOA error | 29.5 | 31.3 | 34.3 | 28.4 | 33.7 | 41.0 |
| SELDnet | Frame recall | 97.9 | 78.8 | 67.0 | 96.4 | 75.7 | 60.7 |
| HIRnet | ER | 0.41 | 0.45 | 0.62 | 0.43 | 0.46 | 0.50 |
| HIRnet | F score | 60.0 | 54.9 | 58.8 | 59.3 | 60.2 | 58.6 |
| HIRnet | DOA error | 5.2 | 16.3 | 33.0 | 7.4 | 18.6 | 43.3 |
| HIRnet | Frame recall | 60.2 | 35.9 | 18.4 | 56.9 | 20.5 | 10.7 |
| MUSIC | DOA error | 26.4 | 28.9 | 31.1 | 38.6 | 49.5 | 61.9 |
| Two-stage | ER | 0.07 | 0.17 | 0.20 | 0.12 | 0.21 | 0.28 |
| Two-stage | F score | 95.9 | 91.0 | 84.7 | 92.4 | 84.2 | 81.0 |
| Two-stage | DOA error | 27.6 | 31.3 | 36.2 | 27.0 | 33.5 | 39.8 |
| Two-stage | Frame recall | 98.0 | 82.1 | 65.0 | 96.3 | 78.1 | 62.7 |
| The proposed algorithm | ER | 0.08 | 0.16 | 0.18 | 0.12 | 0.18 | 0.23 |
| The proposed algorithm | F score | 95.8 | 91.0 | 89.5 | 93.7 | 89.5 | 86.2 |
| The proposed algorithm | DOA error | 28.5 | 31.0 | 32.6 | 27.2 | 32.4 | 38.4 |
| The proposed algorithm | Frame recall | 97.8 | 85.5 | 72.1 | 96.6 | 82.4 | 68.5 |
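
Table 1 reports the error rate (ER), F score, and frame recall without restating how they are computed. The sketch below illustrates the definitions commonly used for polyphonic sound event detection (segment-based ER and F score) and for frame recall (the fraction of frames in which the predicted number of active sources matches the reference). It is an illustration under assumptions, not the authors' evaluation code: the function names `sed_scores` and `frame_recall`, the binary frame-by-class input layout, and the toy data are all hypothetical, and the paper's exact segment length and DOA error computation are not shown on this page.

```python
# Illustrative sketch only: assumes binary activity per frame and class,
# and that each active class contributes exactly one source (one DOA).

def sed_scores(ref, pred):
    """Return (error rate, F score) for binary activity matrices.

    ref, pred: lists of frames; each frame is a list of 0/1 class activities.
    """
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for r, p in zip(ref, pred):
        tp_k = sum(1 for a, b in zip(r, p) if a and b)
        fp_k = sum(1 for a, b in zip(r, p) if b and not a)
        fn_k = sum(1 for a, b in zip(r, p) if a and not b)
        s_k = min(fp_k, fn_k)      # a miss paired with a false alarm counts as a substitution
        subs += s_k
        dels += fn_k - s_k         # remaining misses are deletions
        ins += fp_k - s_k          # remaining false alarms are insertions
        tp, fp, fn = tp + tp_k, fp + fp_k, fn + fn_k
        n_ref += sum(r)            # number of active reference events
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    denom = 2 * tp + fp + fn
    f = 2 * tp / denom if denom else 0.0
    return er, f


def frame_recall(ref, pred):
    """Fraction of frames where the predicted source count equals the reference count."""
    hits = sum(1 for r, p in zip(ref, pred) if sum(r) == sum(p))
    return hits / len(ref)


# Toy data (hypothetical): 4 frames, 3 classes.
ref = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
pred = [[1, 0, 0], [1, 0, 1], [0, 0, 1], [0, 1, 0]]

er, f = sed_scores(ref, pred)
fr = frame_recall(ref, pred)
print(er, round(f, 3), fr)  # 0.5 0.667 0.75
```

In this toy case one substitution and one insertion over four reference events give ER = 0.5, while the F score is 6/9. The F score and frame recall values in Table 1 appear to be percentages, so the fractions returned here would be multiplied by 100 for comparison. A lower ER and a higher F score and frame recall indicate better performance.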