Journal of Shanghai University (Natural Science Edition), 2022, Vol. 28, Issue (3): 372-385. doi: 10.12066/j.issn.1007-2861.2377

• Data Collection, Database and Data Processing •

Material data named entity recognition based on matching contextual lexical words and graph convolution

CHEN Qian1, WU Xing1,2,3

    1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
    2. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
    3. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received:2022-03-15 Online:2022-06-30 Published:2022-05-27
  • Contact: WU Xing E-mail:xingwu@shu.edu.cn

Abstract:

The materials literature contains abundant information, and mining it with machine learning and natural language processing is currently being investigated extensively. Named entity recognition (NER) is the first step in mining and extracting information from such data so that it can be used efficiently. Because conventional vector representations cannot resolve words with multiple meanings, and because models often extract contextual features while disregarding global features, a named entity recognition method based on matching contextual lexical words and graph convolution is proposed herein. First, the contextual dynamic features of the text are obtained using XLNet. Second, contextual and global features are obtained using a long short-term memory (LSTM) network and a graph convolutional network (GCN) combined with the contextual lexical words of the text, respectively. Finally, a sequence of labels is output via a conditional random field (CRF). The model is validated on two different datasets. Experimental results on the material data show a precision, recall, and F1 score of 90.05%, 88.67%, and 89.36%, respectively, indicating that the method effectively improves named entity recognition accuracy.
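
As a quick consistency check on the reported metrics, the following minimal sketch recomputes the F1 score as the harmonic mean of precision and recall. This is the standard definition, but the abstract does not state how the scores were aggregated (for example, micro- versus macro-averaging over entity types), so applying the formula directly to the headline numbers is an assumption. The function name `f1_from_precision_recall` is illustrative and does not come from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Inputs and output share the same unit (here: percent).
    Illustrative helper; not taken from the paper.
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the material dataset, in percent.
precision, recall = 90.05, 88.67
f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1:.2f}%")
```

The recomputed value is about 89.35%, which agrees with the reported 89.36% to within the rounding of the published precision and recall.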

Key words: named entity recognition (NER), XLNet, graph convolutional network (GCN)
