An improved method for hand gesture estimation based on depth image pre-rotation

XU Zhengze, ZHANG Wenjun

Citation: XU Zhengze, ZHANG Wenjun. An improved method for hand gesture estimation based on depth image pre-rotation[J]. Journal of East China Normal University (Natural Sciences), 2020, (4): 124-133. doi: 10.3969/j.issn.1000-5641.201921004

doi: 10.3969/j.issn.1000-5641.201921004
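Funding: East China Normal University Experimental Technology Research Project (20190704)
    About the author:

    XU Zhengze, male, doctoral candidate; research interest: digital media technology. E-mail: zzxu@comm.ecnu.edu.cn

    Corresponding author:

    ZHANG Wenjun, male, professor and doctoral supervisor; research interest: digital image processing. E-mail: wjzhang@shu.edu.cn

  • CLC number: TP391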

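  • Abstract: Hand gesture estimation from depth images is more difficult than human body pose estimation, partly because algorithms cannot reliably recognize the different appearances that the same gesture takes on after rotation. This paper proposes an improved hand pose estimation method that uses a convolutional neural network (CNN) to infer a pre-rotation angle. First, the CNN is trained on optimal rotation angles labeled by an automatic algorithm. Then, before gesture recognition, the trained CNN model regresses the angle by which the image should be pre-rotated, and the hand depth image is rotated accordingly. Finally, a random decision forest (RDF) classifies the hand pixels, and clustering yields the hand joint positions. Experiments show that the method reduces the error between the predicted and ground-truth hand joint positions, and that the accuracy of hand pose estimation increases by about 4.69% on average.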
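
The pre-rotation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rotate_depth_image`, the counter-clockwise convention, rotation about the image center, nearest-neighbour sampling, and the `background` fill value are all assumptions made for illustration. In the method itself the angle would come from the trained CNN; here it is simply passed in as an argument.

```python
import math


def rotate_depth_image(depth, angle_deg, background=0):
    """Rotate a 2D depth map counter-clockwise (as displayed) about its center.

    Hypothetical helper: uses inverse mapping with nearest-neighbour sampling,
    so depth values are copied rather than blended across the hand boundary.
    Pixels whose source falls outside the image get `background`.
    """
    h, w = len(depth), len(depth[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    out = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse rotation: locate the source pixel that lands on (x, y).
            src_x = cos_t * dx - sin_t * dy + cx
            src_y = sin_t * dx + cos_t * dy + cy
            sx, sy = int(round(src_x)), int(round(src_y))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = depth[sy][sx]
    return out


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    # In the described pipeline this angle would be the CNN's regression output.
    for row in rotate_depth_image(image, 90):
        print(row)
```

Nearest-neighbour sampling is chosen here because interpolating between a hand pixel and a background pixel would produce depth values that exist in neither; whether the paper makes the same choice is not stated in the abstract.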
  • Fig. 1  Flow diagram of the proposed method

    Fig. 2  Illustration of rotation angle mapping

    Fig. 3  Framework of VGGNet

    Fig. 4  Determining the optimal angle θbest by iteratively rotating the image

    Fig. 5  Image rotation with and without translation

    Fig. 6  Examples of data augmentation

    Fig. 7  Average Euclidean distance between the estimated and ground-truth fingertip joint positions
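
The error measure named in the Fig. 7 caption, the average Euclidean distance between estimated and ground-truth joint positions, can be sketched as follows. The function name `mean_joint_error` and the representation of joints as coordinate tuples are assumptions for illustration; the result is in whatever unit the input coordinates use.

```python
import math


def mean_joint_error(predicted, ground_truth):
    """Average Euclidean distance between paired joint positions.

    Hypothetical helper: `predicted` and `ground_truth` are equal-length
    sequences of coordinate tuples, one entry per joint.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


if __name__ == "__main__":
    estimated = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    reference = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
    print(mean_joint_error(estimated, reference))
```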

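    Tab. 1  Depth image datasets commonly used for hand gesture estimation

    Dataset        Generation method                   Size        Viewpoint      Number of views   Number of subjects
    ICL[27]        Real data + manual annotation       331,000     Third-person   1                 10
    NYU[2]         Real data + automatic annotation    72,757      Third-person   3                 1
    Libhand[28]    Synthetic data                      Unlimited   Unlimited      Unlimited         Unlimited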

    Tab. 2  Results of classification based on depth pixels

                    Rotation angle: 0°               Rotation angle: CNN regression   Rotation angle: optimum from training data
    Finger          Precision  Recall     F1         Precision  Recall     F1         Precision  Recall     F1
    Thumb           0.6739     0.6957     0.6846     0.6851     0.7023     0.6936     0.6962     0.7263     0.7109
    Index finger    0.5788     0.6637     0.6183     0.6035     0.6910     0.6443     0.6412     0.7319     0.6836
    Middle finger   0.5668     0.6547     0.6076     0.5801     0.6665     0.6203     0.6043     0.6968     0.6473
    Ring finger     0.4942     0.5996     0.5418     0.4932     0.6043     0.5431     0.5367     0.6517     0.5886
    Little finger   0.6387     0.6278     0.6332     0.6474     0.6346     0.6409     0.6773     0.6583     0.6677
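
As a reading aid for Tab. 2, the F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 for the unrotated (0°) column and compares it with the reported values. The names `f1_score`, `BASELINE`, and `max_f1_deviation` are illustrative only; the numbers are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) from the 0-degree column of Tab. 2.
BASELINE = {
    "thumb": (0.6739, 0.6957, 0.6846),
    "index": (0.5788, 0.6637, 0.6183),
    "middle": (0.5668, 0.6547, 0.6076),
    "ring": (0.4942, 0.5996, 0.5418),
    "little": (0.6387, 0.6278, 0.6332),
}


def max_f1_deviation(rows):
    """Largest gap between recomputed and reported F1 over all rows."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for finger, (p, r, reported) in BASELINE.items():
        print(f"{finger:7s} recomputed={f1_score(p, r):.4f} reported={reported:.4f}")
```

The recomputed values agree with the reported ones up to rounding of the four-decimal inputs.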
  • [1] REN Z, YUAN J S, ZHANG Z Y. Robust hand gesture recognition based on finger-earth mover’s distance with a commodity depth camera [C]// Proceedings of the 19th International Conference on Multimedia. ACM, 2011: 1093-1096. DOI:  10.1145/2072298.2071946.
    [2] TOMPSON J, STEIN M, LECUN Y, et al. Real-time continuous pose recovery of human hands using convolutional networks [J]. ACM Transactions on Graphics, 2014, 33(5): Article number 169. DOI:  10.1145/2629500.
    [3] SINHA A, CHOI C, RAMANI K. Deephand: Robust hand pose estimation by completing a matrix imputed with deep features [J]. Computer Vision and Pattern Recognition, 2016(1): 4150-4158.
    [4] KHAN R, HANBURY A, STÖTTINGER J, et al. Color based skin classification [J]. Pattern Recognition Letters, 2012, 33(2): 157-163. DOI:  10.1016/j.patrec.2011.09.032.
    [5] GE L H, LIANG H, YUAN J S, et al. 3D convolutional neural networks for efficient and robust hand pose estimation from single depth images [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 5679-5688. DOI:  10.1109/CVPR.2017.602.
    [6] YUAN S X, YE Q, STENGER B, et al. BigHand2.2M benchmark: Hand pose dataset and state of the art analysis [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2605-2613. DOI:  10.1109/CVPR.2017.279.
    [7] SHOTTON J, GIRSHICK R, FITZGIBBON A, et al. Efficient human pose estimation from single depth images [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2821-2840. DOI:  10.1109/TPAMI.2012.241.
    [8] QIAN C, SUN X, WEI Y C, et al. Realtime and robust hand tracking from depth [C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2014: 1106-1113. DOI:  10.1109/CVPR.2014.145.
    [9] XU C, CHENG L. Efficient hand pose estimation from a single depth image [C]// 2013 IEEE International Conference on Computer Vision. IEEE, 2013: 3456-3462. DOI:  10.1109/ICCV.2013.429.
    [10] CAMPBELL L W, BECKER D A, AZARBAYEJANI A, et al. Invariant features for 3-D gesture recognition [C]// Proceedings of the Second International Conference on Automatic Face and Gesture Recognition. IEEE, 1996: 157-162. DOI:  10.1109/AFGR.1996.557258.
    [11] JOONGROCK K, SUNJIN Y, DONGCHUL K, et al. An adaptive local binary pattern for 3D hand tracking [J]. Pattern Recognition, 2017, 61: 139-152. DOI:  10.1016/j.patcog.2016.07.039.
    [12] KESKIN C, KIRAÇ F, KARA Y E, et al. Real time hand pose estimation using depth sensors [C]// 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011: 1228-1234. DOI:  10.1109/ICCVW.2011.6130391.
    [13] LAPTEV D, SAVINOV N, BUHMANN J M, et al. TI-POOLING: Transformation-invariant pooling for feature learning in convolutional neural networks [C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 289-297. DOI:  10.1109/CVPR.2016.38.
    [14] BOUREAU Y L, PONCE J, LECUN Y. A theoretical analysis of feature pooling in visual recognition [C]// Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010: 111-118.
    [15] LEPETIT V, LAGGER P, FUA P. Randomized trees for real-time keypoint recognition [C]// 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE, 2005: 775-781. DOI:  10.1109/CVPR.2005.288.
    [16] CHENG G, ZHOU P C, HAN J W. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images [J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7405-7415. DOI:  10.1109/TGRS.2016.2601622.
    [17] MELAX S, KESELMAN L, ORSTEN S. Dynamics based 3D skeletal hand tracking [C]// Proceedings of the 2013 Graphics Interface Conference. ACM, 2013: 63-70. DOI: 10.1145/2448196.2448232.
    [18] SRIDHAR S, OULASVIRTA A, THEOBALT C. Interactive markerless articulated hand motion tracking using RGB and depth data [C]// 2013 IEEE International Conference on Computer Vision. IEEE, 2013: 2456-2463. DOI:  10.1109/ICCV.2013.305.
    [19] OIKONOMIDIS I, KYRIAZIS N, ARGYROS A. Efficient model-based 3D tracking of hand articulations using kinect [C]// Proceedings of the British Machine Vision Conference. BMVC, 2011: 101.1-101.11. DOI:  10.5244/C.25.101.
    [20] ROMERO J, KJELLSTROM H, KRAGIC D. Monocular real-time 3D articulated hand pose estimation [C]// 2009 9th IEEE-RAS International Conference on Humanoid Robots. IEEE, 2009: 87-92. DOI:  10.1109/ICHR.2009.5379596.
    [21] SUN X, WEI Y C, LIANG S, et al. Cascaded hand pose regression [C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015: 824-832. DOI:  10.1109/CVPR.2015.7298683.
    [22] TOMPSON J, JAIN A, LECUN Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation [EB/OL]. (2014-09-17)[2019-03-01]. https://arxiv.org/pdf/1406.2984.pdf.
    [23] JHINN W L, GOH K O M, HOE L S, et al. A contactless rotation-invariant palm vein recognition system [J]. Advanced Science Letters, 2018, 24(2): 1143-1148. DOI:  10.1166/asl.2018.10704.
    [24] CHENG G, HAN J W, ZHOU P C, et al. Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection [J]. IEEE Transactions on Image Processing, 2019, 28(1): 265-278. DOI:  10.1109/TIP.2018.2867198.
    [25] SHOTTON J, SHARP T, KIPMAN A, et al. Realtime human pose recognition in parts from single depth images [J]. Communications of the ACM, 2013, 56(1): 116-124. DOI:  10.1145/2398356.2398381.
    [26] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. (2015-04-10)[2019-03-01]. https://arxiv.org/pdf/1409.1556.pdf.
    [27] TANG D H, CHANG H J, TEJANI A, et al. Latent regression forest: structured estimation of 3D articulated hand posture [C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014: 3786-3793. DOI:  10.1109/CVPR.2014.490.
    [28] ŠARIĆ M. Libhand: A library for hand articulation [EB/OL]. [2019-03-01]. http://www.libhand.org/.
    [29] KINGMA D P, BA J L. Adam: A method for stochastic optimization [EB/OL]. (2017-01-30)[2019-03-01]. https://arxiv.org/pdf/1412.6980v9.pdf.
    [30] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks [J]. Journal of Machine Learning Research, 2010, 9: 249-256.
Publication history
  • Received: 2019-04-28
  • Published online: 2020-07-20
  • Issue date: 2020-07-25
