Three-dimensional facial reconstruction based on emotional consistency

HUANG Dongjin; YU Leyang; SHI Yongsheng; ZHENG Chu; QIAN Jiyu

doi:10.12066/j.issn.1007-2861.2574

Journal of Shanghai University, 2025, Vol. 31, Issue 3: 475-486

Computer Science


1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China;
2. Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072, China

Received date: 2023-11-03

Online published: 2025-07-22

Abstract

Reconstructing three-dimensional (3D) faces from monocular RGB images is a challenging computer-vision task. Because few datasets carry facial-expression labels, most 3D facial reconstruction schemes are trained without expression supervision and therefore reconstruct the expression of the input face inaccurately. This study proposes a 3D facial reconstruction method based on emotional consistency. An emotion-perception consistency loss is introduced during training to self-supervise facial emotions, so that the reconstructed face exhibits the same facial expression as the input face. Additionally, this study proposes a lightweight framework that replaces the deep network ResNet50 with MobileNetV2 for regressing the face parameters, which improves the inference speed of the model on the CPU. Experimental results show that the proposed method can reconstruct a high-quality 3D face model from a single face image. The proposed method is superior to some mainstream face-reconstruction methods in terms of both facial-expression capture and 3D face-reconstruction accuracy. The lightweight face-reconstruction framework also significantly improves inference speed on the CPU, which broadens the application prospects of the model in scenarios with constrained computing resources.

Keywords: three-dimensional facial reconstruction; emotional consistency; self-supervised; MobileNetV2
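
The abstract does not give the exact form of the emotion-perception consistency loss, so the following is only a minimal sketch under stated assumptions: emotion features of the input image and of the rendered reconstruction are taken to be plain vectors produced by some pretrained emotion network (not shown), and the loss is assumed to be their squared L2 distance. The function names and the `emotion_weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def emotion_consistency_loss(input_feat: Sequence[float],
                             rendered_feat: Sequence[float]) -> float:
    """Squared L2 distance between two emotion feature vectors.

    Assumed form of the loss; the paper's actual definition may differ.
    """
    if len(input_feat) != len(rendered_feat):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(input_feat, rendered_feat))


def total_loss(recon_loss: float,
               input_feat: Sequence[float],
               rendered_feat: Sequence[float],
               emotion_weight: float = 1.0) -> float:
    """Add the weighted emotion term to an existing reconstruction loss.

    `emotion_weight` is a hypothetical hyperparameter for illustration.
    """
    emotion_term = emotion_consistency_loss(input_feat, rendered_feat)
    return recon_loss + emotion_weight * emotion_term
```

Because the emotion term compares features of the input with features of its own reconstruction, it needs no expression labels, which is what makes the supervision self-supervised in the sense the abstract describes.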

Cite this article

HUANG Dongjin, YU Leyang, SHI Yongsheng, ZHENG Chu, QIAN Jiyu. Three-dimensional facial reconstruction based on emotional consistency[J]. Journal of Shanghai University, 2025, 31(3): 475-486. DOI: 10.12066/j.issn.1007-2861.2574

