Building on the YOLOv7 model, this paper proposes YOLOv7-GAM, which introduces a global attention mechanism (GAM) to strengthen the model's focus on key regions. A multi-scale training scheme is introduced to improve the model's perception of small targets, and a two-stage enhanced detection algorithm is designed to mitigate the degradation in detection performance caused by occlusion, overlap, and small targets. With an input image resolution of $640\times 640$, the detection speed of the scheme meets the real-time requirements of actual production environments, and its performance exceeds that of related algorithms.
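The abstract does not spell out how the multi-scale training scheme chooses its input sizes. One common approach, shown here only as a minimal sketch, is to draw a new square input side length for each training batch from the multiples of the network stride around the base resolution. The function name `sample_train_size`, the stride of 32, and the 0.5 to 1.5 scale range are illustrative assumptions, not values taken from the paper.

```python
import random


def sample_train_size(base=640, stride=32, low=0.5, high=1.5, rng=random):
    """Pick a square input side length for one training batch.

    All defaults are assumptions for illustration: `base` matches the
    640x640 resolution named in the abstract, while `stride`, `low`,
    and `high` are hypothetical choices.
    """
    # Round both bounds down to a multiple of the stride so that the
    # downsampled feature maps keep integer spatial dimensions.
    min_size = max(stride, int(base * low) // stride * stride)
    max_size = max(min_size, int(base * high) // stride * stride)
    # Choose uniformly among the stride multiples in [min_size, max_size].
    steps = (max_size - min_size) // stride
    return min_size + stride * rng.randint(0, steps)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print([sample_train_size(rng=rng) for _ in range(5)])
```

In a training loop, each batch would be resized to the sampled side length before the forward pass, so the detector sees the same objects at several apparent sizes. Evaluation would still use the fixed $640\times 640$ input.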