Journal of Shanghai University (Natural Science Edition) ›› 2025, Vol. 31 ›› Issue (3): 475-486. doi: 10.12066/j.issn.1007-2861.2574

• Computer Science •

基于情绪一致性的三维人脸重建

黄东晋1,2, 俞乐洋1,2, 石永生1,2, 郑楚1,2, 钱纪宇1,2   

  1. 上海大学 上海电影学院, 上海 200072;
  2. 上海电影特效工程技术研究中心, 上海 200072
  • 收稿日期:2023-11-03 出版日期:2025-06-30 发布日期:2025-07-22
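  • Corresponding author: HUANG Dongjin (1982-), male, associate professor, doctoral supervisor, Ph.D.; research interests include computer graphics, virtual reality, and intelligent imaging technology. E-mail: djhuang@shu.edu.cn
  • Funding:
    Shanghai Talent Development Funding Project (2021016); Science and Technology Project of the National Archives Administration of China (2023-X-036)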

Three-dimensional facial reconstruction based on emotional consistency

HUANG Dongjin1,2, YU Leyang1,2, SHI Yongsheng1,2, ZHENG Chu1,2, QIAN Jiyu1,2   

  1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China;
  2. Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072, China
  • Received:2023-11-03 Online:2025-06-30 Published:2025-07-22

摘要: 从单目RGB图片中重建三维人脸是一项非常具有挑战性的计算机视觉任务.由于缺乏带有人脸表情标签的数据集,目前大多数三维人脸重建方案都缺乏对人脸表情的有效监督,导致不能准确还原输入人脸的表情信息.因此,提出一种基于情绪一致性的三维人脸重建方法,在训练时引入情绪感知一致性损失,对人脸情绪形成自监督,从而隐式地激励重建人脸与输入人脸具有一致的面部表情.此外,还提出一种轻量级人脸重建框架,用MobileNetV2代替深度网络ResNet50来回归人脸参数,以提升模型在CPU端的推理速度.实验结果表明,本方法能够基于单张人脸图像高质量地重建三维人脸模型;在人脸表情捕捉准确度和三维人脸重建精度两方面均优于一些主流的人脸重建方法.同时,所采用的轻量级人脸重建框架显著提升了模型在CPU端的推理速度,拓展了模型在计算资源受限场景下的应用前景.

关键词: 三维人脸重建, 情绪一致性, 自监督, MobileNetV2

Abstract: Reconstructing three-dimensional (3D) faces from monocular RGB images is a challenging computer-vision task. Owing to the lack of datasets with facial-expression labels, most current 3D face reconstruction methods lack effective supervision of facial expressions and therefore cannot accurately recover the expression of the input face. This study therefore proposes a 3D face reconstruction method based on emotional consistency. An emotion-aware consistency loss is introduced during training to self-supervise facial emotion, implicitly encouraging the reconstructed face to show the same facial expression as the input face. In addition, a lightweight face reconstruction framework is proposed that uses MobileNetV2 in place of the deep network ResNet50 to regress the face parameters, which improves the model's inference speed on the CPU. Experimental results show that the proposed method reconstructs high-quality 3D face models from a single face image and outperforms several mainstream face reconstruction methods in both expression-capture accuracy and 3D reconstruction accuracy. The lightweight framework also significantly improves CPU inference speed, broadening the model's applicability in scenarios with limited computing resources.

Key words: three-dimensional facial reconstruction, emotional consistency, self-supervised, MobileNetV2
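
The emotion-consistency idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which emotion network or distance measure the authors use, so everything below is an assumption made for illustration: the feature vectors stand in for the outputs of some hypothetical emotion-recognition network applied to the input image and to the rendered reconstruction, the squared L2 distance is one plausible choice of penalty, and the names `emotion_consistency_loss`, `total_loss`, and `weight` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def emotion_consistency_loss(input_feat: Vector, recon_feat: Vector) -> float:
    """Squared L2 distance between two emotion feature vectors.

    ASSUMPTION: the distance form is illustrative; the paper's actual
    loss definition is not given in the abstract.
    """
    if len(input_feat) != len(recon_feat):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(input_feat, recon_feat))


def total_loss(base_loss: float, input_feat: Vector, recon_feat: Vector,
               weight: float = 1.0) -> float:
    """Add the weighted emotion term to an existing reconstruction loss.

    ASSUMPTION: `base_loss` stands for whatever other training losses are
    used, and `weight` is a hypothetical balancing coefficient.
    """
    return base_loss + weight * emotion_consistency_loss(input_feat, recon_feat)


if __name__ == "__main__":
    # Hypothetical emotion features of the input face and of the rendered
    # reconstruction; in practice both would come from the same network.
    feat_input = [0.2, 0.7, 0.1]
    feat_recon = [0.3, 0.5, 0.2]
    print(total_loss(1.0, feat_input, feat_recon, weight=0.5))
```

Because both feature vectors would come from the same network, no expression labels are needed: the input image supervises its own reconstruction, which is what makes the term self-supervised.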

CLC number: