Journal of Shanghai University (Natural Science Edition), 2024, Vol. 30, Issue (4): 669-681. doi: 10.12066/j.issn.1007-2861.2546


基于多特征融合和几何感知网络的光场图像编码

刘发国1, 白晓琦2, 张 倩1, 王 斌1, 司 文3   

  1. 上海师范大学 信息与机电工程学院, 上海 201418; 2. 杨浦区教育学院, 上海 200092; 3. 上海商学院, 上海 201400
  • 收稿日期:2023-09-24 出版日期:2024-08-30 发布日期:2024-09-13
  • Corresponding author: ZHANG Qian (born 1983), female, associate professor, Ph.D.; research interests: video and image information processing. E-mail: qianzhang@shnu.edu.cn
  • Funding: National Natural Science Foundation of China (62301320)

Light field image coding based on multi-feature fusion and geometry-aware networks

LIU Faguo1, BAI Xiaoqi2, ZHANG Qian1, WANG Bin1, SI Wen3   

  1. College of Information and Electromechanical Engineering, Shanghai Normal University, Shanghai 201418, China; 2. Yangpu District Education Institute, Shanghai 200092, China; 3. Shanghai Business School, Shanghai 201400, China
  • Received: 2023-09-24 Online: 2024-08-30 Published: 2024-09-13

摘要: 为了克服基于单一视差合成的光场图像编码方法在遮挡区域无法恢复纹理细节的问题, 提出一种基于多特征融合和几何感知网络的光场图像编码方法, 以进一步提升遮挡场景下光场图像的压缩性能. 首先, 对密集光场稀疏采样, 使用通用视频编码器 (versatile video coding, VVC) 对稀疏光场进行压缩; 然后, 在解码端使用2 个关键分支模块, 即视差估计模块和空间角度联合卷积模块, 以获取光场图像全局的几何信息, 确保在密集纹理和遮挡区域能够更充分地恢复特征; 最后, 为了挖掘 2 个分支融合特征的结构信息, 构建了双向视图的堆栈结构, 并运用几何感知的细化网络以重建高质量的密集光场. 实验结果表明, 与已有国际上流行的光场图像编码方法相比, 所提出的方法具有显著优势.

关键词: 光场图像编码, 遮挡区域, 多特征融合, 几何感知

Abstract: Light field image coding methods based on single-disparity view synthesis cannot recover texture details in occluded regions. To overcome this limitation, a light field image coding method based on multi-feature fusion and geometry-aware networks is proposed, with the aim of further improving the compression performance of light field images in occluded scenes. First, the dense light field is sparsely sampled, and the resulting sparse light field is compressed using versatile video coding (VVC). Then, two key branch modules, a disparity estimation module and a spatial-angular joint convolution module, are used at the decoder to capture the global geometric information of the light field image, so that features are recovered more fully in regions with dense textures and occlusions. Finally, to exploit the structural information in the fused features of the two branches, a stack structure of bidirectional views is constructed, and a geometry-aware refinement network is used to reconstruct a high-quality dense light field. Experimental results show that the proposed method has significant advantages over existing popular light field image coding methods.

Key words: light field image coding, occlusion region, multi-feature fusion, geometry-aware
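
The sparse-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the sampling pattern, the array layout, or the view-grid size, so everything here is an assumption: the function name `sparse_sample` is hypothetical, the light field is assumed to be stored as a NumPy array of shape (U, V, H, W, C), and a uniform angular stride is used only as one plausible pattern. The VVC compression and the network-based reconstruction are not modeled.

```python
import numpy as np

def sparse_sample(light_field, step=2):
    """Keep every `step`-th view along both angular axes (assumed pattern).

    light_field: array of shape (U, V, H, W, C), where (U, V) index the
    angular view grid and (H, W, C) are the pixels of a single view.
    Returns the retained views and a boolean mask over the view grid.
    """
    u_count, v_count = light_field.shape[:2]
    keep = np.zeros((u_count, v_count), dtype=bool)
    keep[::step, ::step] = True           # views that would be sent to the encoder
    sparse = light_field[::step, ::step]  # shape (ceil(U/step), ceil(V/step), H, W, C)
    return sparse, keep

# Toy example: a 7x7 grid of 8x8 RGB views, keeping every third view.
dense = np.zeros((7, 7, 8, 8, 3), dtype=np.uint8)
sparse, keep = sparse_sample(dense, step=3)
missing = int((~keep).sum())  # views the decoder would have to reconstruct
```

In this toy setting the retained 3x3 views stand in for the sparse light field that would be compressed, and the views where `keep` is false are the ones a decoder-side reconstruction network would need to synthesize.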
