Journal of Shanghai University (Natural Science Edition) ›› 2022, Vol. 28 ›› Issue (2): 261-269. doi: 10.12066/j.issn.1007-2861.2287

• Research Article •

Image matting based on deep learning

WANG Rongrong1,2, XU Shugong2, HUANG Jianbo1,3

  1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
  2. Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China
  3. Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai University, Shanghai 200072, China
  • Received: 2020-03-13 Online: 2022-04-30 Published: 2022-04-28
  • Contact: HUANG Jianbo E-mail: huangjianbo110@shu.edu.cn
  • About the author: HUANG Jianbo (1980-), male, associate professor, doctoral supervisor, Ph.D.; research interests include art theory and image processing.
  • Funding:
    Supported by the Shanghai University Film Studies Peak Discipline programme and the research project of the Shanghai Engineering Research Center of Motion Picture Special Effects (16dz2251300)

Abstract:

Image matting is a foundation of image editing technology and is widely used in film and television post-production and in daily life. Deep-learning-based image matting networks take the original image and a trimap as input and estimate the $\alpha$ value of each pixel. This study builds on the original down- and up-sampling matting network and makes two changes. First, because images in matting datasets differ widely, which tends to slow network convergence, a batch normalisation (BN) layer is added after each convolution layer. Normalising the input data speeds up the convergence of the model and makes the update direction of the parameters more consistent with the overall characteristics of the dataset. Second, because the matting task must pay particular attention to object edges, the ordinary convolution layers are replaced with deformable convolution layers. A deformable convolution layer adaptively learns the shape of its convolution kernel from the input data, which effectively enlarges the receptive field and gives better predictions in detailed image regions.

Key words: deep learning, image matting, semantic segmentation, prediction
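
As a minimal illustration of the normalisation step described in the abstract, the sketch below normalises each feature across a batch in plain Python. It is not the authors' network: the function name `batch_norm`, the scalar `gamma` and `beta`, and the toy data are assumptions made for illustration. A real BN layer learns a scale and shift per channel and keeps running statistics for inference.

```python
import math


def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalise each feature over the batch dimension (training-mode statistics).

    batch: list of samples, each a list of feature values of equal length.
    gamma, beta: scalar scale and shift (hypothetical simplification; a real
    layer learns one pair per channel).
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        # Biased (population) variance over the batch
        var = sum((v - mean) ** 2 for v in column) / n
        scale = gamma / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (column[i] - mean) * scale + beta
    return out


# Toy batch: two features with very different value ranges
features = [[0.0, 100.0], [1.0, 300.0], [2.0, 500.0]]
normalised = batch_norm(features)
```

After normalisation both features have roughly zero mean and unit variance over the batch, which is the property the abstract relies on when images in the dataset differ widely.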

CLC Number: