Improved approach to simultaneous left- and right-hand segmentation from a single depth image

doi:10.12066/j.issn.1007-2861.2247

Journal of Shanghai University (Natural Science Edition), 2021, Vol. 27, Issue 3: 454-465. doi: 10.12066/j.issn.1007-2861.2247

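XU Zhengze¹,², ZHANG Wenjun¹

1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
2. School of Communication, East China Normal University, Shanghai 200241, China

Received: 2020-02-03  Issue date: 2021-06-30  Release date: 2021-06-27
Corresponding author: ZHANG Wenjun, E-mail: wjzhang@shu.edu.cn
About the author: ZHANG Wenjun (born 1959), male, professor, doctoral supervisor, Ph.D.; research interests include digital image processing and digital multimedia technology. E-mail: wjzhang@shu.edu.cn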

Abstract:

Hand gesture recognition based on depth images is a primary means of interaction for next-generation digital media devices, and it depends on accurately locating a "clean" hand region in the captured depth image. We propose an improved approach to simultaneous left- and right-hand segmentation that extends the traditional SegNet algorithm with class weights, transposed convolution, hybrid dilated convolution, and concatenation-based skip connections between the encoder and decoder. Compared with the baseline, the F2-Score improves by 7.6% for the left hand and by 5.9% for the right hand. Inference on the GPU takes 20.5 ms per frame, which allows depth image sequences to be processed in real time. The experimental results show that the approach yields more accurate segmentation when the left and right hands are segmented simultaneously from a single depth image.

Key words: depth image, hand segmentation, improved approach
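
The abstract reports its results as F2-Score but does not define the metric on this page. As a minimal sketch, assuming the standard F-beta definition with beta = 2 (recall weighted more heavily than precision), the per-hand score could be computed as below. The label encoding (0 = background, 1 = left hand, 2 = right hand) and the toy pixel sequences are hypothetical and are not taken from the paper.

```python
def fbeta_score(pred, truth, label, beta=2.0):
    """F-beta score of one class over two equally long flat label sequences."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == label and t == label)  # true positives
    fp = sum(1 for p, t in pairs if p == label and t != label)  # false positives
    fn = sum(1 for p, t in pairs if p != label and t == label)  # false negatives
    if tp == 0:
        return 0.0  # no correct pixel of this class: score is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical label encoding, assumed for illustration only.
BACKGROUND, LEFT, RIGHT = 0, 1, 2

# Toy flattened masks standing in for per-pixel labels of a depth image.
truth = [0, 0, 1, 1, 1, 1, 2, 2, 2, 0]
pred = [0, 1, 1, 1, 1, 0, 2, 2, 0, 0]

f2_left = fbeta_score(pred, truth, LEFT)             # precision 0.75, recall 0.75
f2_right = fbeta_score(pred, truth, RIGHT)           # precision 1.0, recall 2/3
f1_right = fbeta_score(pred, truth, RIGHT, beta=1)   # balanced F1 for comparison

print(f"F2 left:  {f2_left:.3f}")
print(f"F2 right: {f2_right:.3f}")
print(f"F1 right: {f1_right:.3f}")
```

In this toy case the right-hand prediction has perfect precision but misses one pixel, so its F2 value is lower than its F1 value: with beta = 2, missed hand pixels cost more than false alarms.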

CLC number: TP37

XU Zhengze, ZHANG Wenjun. Improved approach to simultaneous left- and right-hand segmentation from a single depth image[J]. Journal of Shanghai University (Natural Science Edition), 2021, 27(3): 454-465.
