Hand segmentation from a single depth image based on histogram threshold selection and shallow CNN
Received date: 2018-06-15
Online published: 2018-10-26
XU Zhengze, ZHANG Wenjun. Hand segmentation from a single depth image based on histogram threshold selection and shallow CNN[J]. Journal of Shanghai University (Natural Science Edition), 2018, 24(5): 675-685. DOI: 10.12066/j.issn.1007-2861.2073
Real-time hand gesture recognition significantly improves the user experience in virtual reality/augmented reality (VR/AR) applications, and it relies on identifying the orientation of the hand in captured images or videos. A new three-stage pipeline for fast and accurate hand segmentation from a single depth image is proposed. First, a depth frame is segmented into several regions by a histogram-based threshold selection algorithm, followed by tracing the exterior boundaries of objects after thresholding. Second, each segmentation proposal is evaluated by a three-layer shallow convolutional neural network (CNN) to determine whether or not the boundary is associated with the hand. Finally, all hand components are merged to form the hand segmentation result. Compared with algorithms based on random decision forests (RDF), the experimental results demonstrate that the approach achieves better performance, with high accuracy (88.34% mean intersection over union, mIoU) and a shorter processing time ($\le 8$ ms).
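The three stages can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation, and several parts are assumptions made only for illustration: Otsu's method stands in for the histogram-based threshold selection (the abstract does not name the exact rule), a flood fill stands in for exterior boundary tracing, and the `is_hand` callable is a hypothetical placeholder for the three-layer CNN. All function names and the toy depth frame are likewise invented here.

```python
from collections import deque


def otsu_threshold(values, levels):
    """Stand-in for histogram threshold selection: maximize between-class variance."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t  # pixels with depth <= best_t form the near class


def connected_regions(mask):
    """Stage 1 proposals: 4-connected foreground regions (flood fill, not boundary tracing)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            region, queue = set(), deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.add((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def segment_hand(depth, levels, is_hand):
    """Threshold, propose regions, classify each proposal, then merge the accepted ones."""
    flat = [v for row in depth for v in row]
    t = otsu_threshold(flat, levels)
    mask = [[v <= t for v in row] for row in depth]
    proposals = connected_regions(mask)
    hand = set()
    for region in proposals:
        if is_hand(region):  # stage 2: hypothetical classifier in place of the CNN
            hand |= region   # stage 3: merge all hand components
    return t, proposals, hand


def iou(pred, truth):
    """Intersection over union of two pixel sets."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0


# Toy depth frame with quantized depth levels 0..15 (smaller means nearer).
depth = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [12,  3,  3, 12, 12, 12,  4, 12],
    [12,  3,  3, 12, 12, 12,  4, 12],
    [12,  3,  3, 12, 12, 12, 12, 12],
    [12, 12, 12, 12, 12, 12, 12, 12],
]
# Toy acceptance rule (assumption): keep proposals of at least 3 pixels.
threshold, proposals, hand = segment_hand(depth, 16, lambda region: len(region) >= 3)
truth = {(y, x) for y in (1, 2, 3) for x in (1, 2)} | {(1, 6), (2, 6)}
score = iou(hand, truth)
```

Averaging `iou` over the frames of a test set would give a mean IoU in the sense used in the abstract; the toy frame and ground truth here say nothing about the reported 88.34% figure or the processing time.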