Data Acquisition, Databases, and Data Processing

Constructing a material-domain knowledge graph based on natural language processing

  • 1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
    2. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
    3. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
Wei Xiao (魏晓) (born 1973), male, associate professor, doctoral supervisor, Ph.D.; research interests: natural language understanding and machine learning. E-mail: xwei@shu.edu.cn

Received date: 2022-03-28

  Online published: 2022-05-27

Funding

National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202002AB080001-2); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3); Key Research Project of Zhejiang Laboratory (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)

Cite this article

Wei Xiao, Wang Xiaoxin, Chen Yongqi, Zhang Huiran. Constructing a material-domain knowledge graph based on natural language processing[J]. Journal of Shanghai University (Natural Science Edition), 2022, 28(3): 386-398. DOI: 10.12066/j.issn.1007-2861.2380

Abstract

How to combine material-domain knowledge with machine learning is a pressing problem in materials intelligence research. As an efficient model for organizing knowledge, knowledge graphs (KGs) can represent, organize, and reason over material-domain knowledge, and thereby raise the intelligence level of machine-learning algorithms for materials. In this paper, we study natural language processing (NLP)-based methods for automatically acquiring material-domain knowledge. We propose a joint extraction method for material entities and relations based on a bidirectional gated recurrent unit-graph neural network-conditional random field (Bi-GRU-GNN-CRF) model, and a material-processing knowledge-extraction method based on an improved TextRank algorithm. With these methods, we automatically acquire material-domain knowledge such as material entities, relations, and process flows from patents, papers, and other materials literature. The experimental results show that the proposed methods achieve good precision and recall and can effectively improve the knowledge coverage of material KGs. The knowledge coverage of the material KGs constructed with the proposed method reaches 80%, which provides more comprehensive knowledge support for intelligent materials research and development. We also construct three material-domain KGs, for special non-modulated steel, aluminum matrix composites, and thermal-barrier ceramic coatings, and explore their applications; the results further verify the potential of KGs to provide knowledge support for materials research and development.
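The abstract names an improved TextRank algorithm for extracting process knowledge but does not describe the improvement on this page. As a rough point of reference only, the sketch below implements standard, unmodified TextRank over a word co-occurrence graph. It is not the authors' method: the function name `textrank_keywords`, its parameters (`window`, `damping`, `iterations`), and the sample tokens are all illustrative assumptions.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens with plain TextRank (baseline, not the paper's improved variant)."""
    # Build an undirected, weighted co-occurrence graph: two distinct tokens
    # are linked each time they appear within `window` positions of each other.
    weights = defaultdict(lambda: defaultdict(float))
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                weights[left][right] += 1.0
                weights[right][left] += 1.0

    nodes = list(weights)
    scores = {node: 1.0 for node in nodes}
    # Total edge weight leaving each node, used to normalise its votes.
    out_weight = {node: sum(weights[node].values()) for node in nodes}

    # Power iteration of the weighted PageRank update used by TextRank.
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            incoming = sum(
                weights[nb][node] / out_weight[nb] * scores[nb]
                for nb in weights[node]
            )
            new_scores[node] = (1.0 - damping) + damping * incoming
        scores = new_scores

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical, pre-tokenised process description (assumed for illustration).
sample = ["alloy", "melting", "alloy", "casting", "alloy", "rolling", "annealing"]
ranked = textrank_keywords(sample)
```

In this toy input, tokens that co-occur with many others accumulate the highest scores. A domain-specific variant would presumably change how the graph is built or how edges are weighted, but that detail is not stated here.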
