Journal of Shanghai University (Natural Science Edition) ›› 2022, Vol. 28 ›› Issue (3): 386-398. doi: 10.12066/j.issn.1007-2861.2380

• Data Collection, Database and Data Processing •

Constructing a material-domain knowledge graph based on natural language processing

WEI Xiao1, WANG Xiaoxin1, CHEN Yongqi1, ZHANG Huiran1,2,3

    1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
    2. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
    3. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received: 2022-03-28 Online: 2022-06-30 Published: 2022-05-27
  • Contact: WEI Xiao E-mail: xwei@shu.edu.cn

Abstract:

Determining how to combine material-domain knowledge with machine-learning methods is an urgent problem in materials intelligence. As an efficient knowledge-organization method, knowledge graphs (KGs) can effectively represent, organize, and reason over material-domain knowledge, thereby improving the intelligence level of machine-learning algorithms for materials. In this paper, we study natural language processing (NLP)-based knowledge-acquisition methods for materials and propose a joint extraction method for material entities and relationships based on a bidirectional gated recurrent unit-graph neural network-conditional random field (Bi-GRU-GNN-CRF) model, together with a material-processing knowledge-extraction method based on an improved TextRank algorithm. Using the proposed knowledge-acquisition methods, we acquire material-domain knowledge such as material entities, relationships, and technological processes from patents, papers, and other types of texts. The experimental results show that the proposed knowledge-acquisition methods achieve good accuracy and recall and can effectively improve the knowledge coverage of material KGs. The knowledge coverage of the material KGs constructed with the proposed methods reaches 80%, providing more comprehensive knowledge support for materials research and development. We also construct domain KGs for special non-modulated steel, an aluminum matrix composite material, and a thermal-barrier ceramic-coating material, and the results further verify the potential of using material KGs in materials research and development.
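
The abstract does not say how TextRank is improved, so the following is only a point of reference: a minimal sketch of baseline TextRank sentence ranking, not the authors' method. The function names (`tokenize`, `similarity`, `textrank`), the use of Jaccard word overlap as the edge weight, and the damping factor of 0.85 are all assumptions made for illustration.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap of two token sets (an assumed, simple edge weight)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric weight matrix: one weighted edge per sentence pair.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # proportional to the edge weight between them.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


# Hypothetical process description; the last sentence is off-topic.
sentences = [
    "The steel billet is heated to 1200 C before forging.",
    "After forging, the steel billet is cooled in air.",
    "The cooled steel billet is then machined.",
    "Funding was provided by the laboratory.",
]
for score, sentence in sorted(zip(textrank(sentences), sentences), reverse=True):
    print(f"{score:.3f}  {sentence}")
```

In this toy input the off-topic sentence shares almost no vocabulary with the process steps, so it receives the lowest score. A domain-specific variant would presumably change the similarity function or the graph construction, but the abstract gives no details on that.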

Key words: materials intelligence, natural language processing, knowledge graphs
