
Funding

Shanghai Talent Development Funding Project (2021016); Science and Technology Project of the National Archives Administration of China (2023-X-036)

Three-dimensional facial reconstruction based on emotional consistency

  • HUANG Dongjin ,
  • YU Leyang ,
  • SHI Yongsheng ,
  • ZHENG Chu ,
  • QIAN Jiyu
  • 1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China;
    2. Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072, China

Received date: 2023-11-03

  Online published: 2025-07-22


Cite this article

HUANG Dongjin, YU Leyang, SHI Yongsheng, ZHENG Chu, QIAN Jiyu. Three-dimensional facial reconstruction based on emotional consistency[J]. Journal of Shanghai University (Natural Science Edition), 2025, 31(3): 475-486. DOI: 10.12066/j.issn.1007-2861.2574

Abstract

Reconstructing three-dimensional (3D) faces from monocular RGB images is a challenging computer-vision task. Owing to the dearth of datasets with facial-expression labels, most 3D facial reconstruction schemes lack supervision of facial expressions, resulting in inaccurate reconstruction of the input expressions. Therefore, this study proposes a 3D facial reconstruction method based on emotional consistency. An emotion-perception consistency loss is introduced during training to self-supervise facial emotions, implicitly encouraging the reconstructed face to exhibit the same facial expression as the input face. Additionally, this study proposes a lightweight framework that replaces the deep network ResNet50 with MobileNetV2 to regress face parameters, improving the inference speed of the model on the CPU. Experimental results show that the proposed method can reconstruct a high-quality 3D face model from a single face image and outperforms several mainstream face-reconstruction methods in both facial-expression capture accuracy and 3D reconstruction accuracy. Moreover, the lightweight framework significantly improves CPU-side inference speed, expanding the model's application prospects in computing-resource-constrained scenarios.
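The emotion-consistency idea described in the abstract can be sketched in code. The snippet below is a hypothetical illustration, not the authors' implementation: it assumes that emotion feature vectors have already been extracted from the input photograph and from the rendered reconstruction by a frozen emotion-recognition network, and it penalizes their cosine disagreement, so that minimizing the loss implicitly pushes the reconstructed face toward the input expression without any expression labels.

```python
import numpy as np

def emotion_consistency_loss(feat_input, feat_render):
    """Self-supervised emotion-consistency loss (illustrative sketch).

    feat_input / feat_render: emotion feature vectors produced by a frozen
    emotion-recognition network for the input image and the differentiably
    rendered reconstruction. The loss is 1 - cosine similarity: it is 0 when
    the two feature directions agree and grows as the perceived emotions
    diverge, so no expression labels are needed.
    """
    a = np.asarray(feat_input, dtype=np.float64)
    b = np.asarray(feat_render, dtype=np.float64)
    cos = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return 1.0 - cos
```

In a full training loop this term would be added, with a weighting coefficient, to the usual photometric and landmark losses, and gradients would flow through the renderer into the regressed face parameters while the emotion network stays frozen.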
