Journal of Shanghai University (Natural Science Edition), 2022, Vol. 28, Issue 3: 386-398. doi: 10.12066/j.issn.1007-2861.2380

• Data Acquisition, Databases, and Data Processing •

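  • About the author: WEI Xiao (b. 1973), male, associate professor, doctoral supervisor, Ph.D.; research interests: natural language understanding and machine learning. E-mail: xwei@shu.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202002AB080001-2); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3); Key Research Project of Zhejiang Lab (2021PE0AC02); Major Project of the Special Development Fund of Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)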

Constructing a material-domain knowledge graph based on natural language processing

WEI Xiao¹, WANG Xiaoxin¹, CHEN Yongqi¹, ZHANG Huiran¹,²,³

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  2. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
  3. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received: 2022-03-28; Online: 2022-05-27; Published: 2022-06-30
  • Corresponding author: WEI Xiao, E-mail: xwei@shu.edu.cn

Abstract:

How to combine material-domain knowledge with machine learning is a pressing problem in materials intelligence. As an efficient model for organizing knowledge, knowledge graphs (KGs) can effectively represent, organize, and reason over material-domain knowledge, thereby improving the intelligence of machine-learning algorithms for materials. In this paper, we study natural language processing (NLP)-based methods for automatically acquiring material-domain knowledge. We propose a joint extraction method for material entities and relations based on a bidirectional gated recurrent unit-graph neural network-conditional random field (Bi-GRU-GNN-CRF) model, and a material-processing knowledge-extraction method based on an improved TextRank algorithm. With these methods, we automatically acquire material-domain knowledge such as material entities, relations, and technological processes from patents, papers, and other materials literature. The experimental results show that the proposed methods achieve good precision and recall and can effectively improve the knowledge coverage of material KGs. The knowledge coverage of the material KGs constructed with the proposed methods reaches 80%, providing more comprehensive knowledge support for intelligent materials research and development. We also construct domain KGs for three materials, special non-modulated steel, aluminum matrix composites, and thermal-barrier ceramic coatings, and explore their applications; the results further verify the potential of KGs to provide knowledge support for materials research and development.

Key words: materials intelligence, natural language processing, knowledge graphs

Chinese Library Classification (CLC) number: