Journal of Shanghai University (Natural Science Edition) ›› 2022, Vol. 28 ›› Issue (2): 270-280. doi: 10.12066/j.issn.1007-2861.2283

• Research Article •

Chinese nested named entity recognition based on hierarchical tagging

JIN Yanliang, XIE Jinfei, WU Dijia

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
  • Received: 2020-07-26 Online: 2022-04-30 Published: 2022-04-28
  • Contact: JIN Yanliang E-mail: wuhaide@shu.edu.cn
  • About the author: JIN Yanliang (b. 1973), male, associate professor, Ph.D.; research interests include wireless sensor networks, wireless broadband access, and artificial intelligence. E-mail: wuhaide@shu.edu.cn
  • Funding:
    Key Project of the Science and Technology Commission of Shanghai Municipality (19511102803)

Abstract:

Chinese named entity recognition plays a critical role in Chinese information processing. In Chinese text, many named entities contain other entities nested inside them. However, most existing studies focus on flat (non-nested) entities and therefore cannot fully capture the boundary information between nested entities. This study uses a hierarchical tagging scheme for nested named entity recognition (NNER): entity recognition at each layer is treated as a separate task, and a gated filtering mechanism promotes information exchange between layers. Several groups of experiments on the public 1998 People's Daily NNER corpus verify the effectiveness of the model. The results show that, without using external dictionary resources, the method reaches an F1 score of 91.41% on the People's Daily dataset, effectively improving the recognition of Chinese nested named entities.

Key words: Chinese information processing, hierarchical tagging, nested named entity recognition (NNER), gated filtering mechanism
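To make the idea of hierarchical tagging concrete, the following minimal sketch turns a set of nested entity spans into one BIO tag sequence per layer, so that each layer can be handled as its own flat tagging task. It is only an illustration under stated assumptions, not the authors' implementation: the function names `assign_layers` and `layered_bio`, the innermost-first layer order, the BIO scheme, and the example annotation are all hypothetical, and the gated filtering mechanism is not modeled here.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type); character offsets into the sentence.
Span = Tuple[int, int, str]


def assign_layers(spans: List[Span]) -> List[int]:
    """Assign layer 0 to innermost spans and k+1 to a span containing a layer-k span.

    Assumption: layers are numbered from the innermost entity outward.
    """
    # Process shorter spans first so contained spans already have a layer.
    order = sorted(range(len(spans)), key=lambda i: spans[i][1] - spans[i][0])
    layers = [0] * len(spans)
    for pos, i in enumerate(order):
        start, end, _ = spans[i]
        for j in order[:pos]:
            inner_start, inner_end, _ = spans[j]
            if start <= inner_start and inner_end <= end:
                layers[i] = max(layers[i], layers[j] + 1)
    return layers


def layered_bio(n_chars: int, spans: List[Span]) -> List[List[str]]:
    """Return one BIO tag sequence per layer for a sentence of n_chars characters."""
    layers = assign_layers(spans)
    depth = max(layers, default=-1) + 1
    tags = [["O"] * n_chars for _ in range(depth)]
    for (start, end, etype), layer in zip(spans, layers):
        tags[layer][start] = f"B-{etype}"
        for p in range(start + 1, end):
            tags[layer][p] = f"I-{etype}"
    return tags


if __name__ == "__main__":
    # Hypothetical annotation: a location inside an organization inside
    # a larger organization.
    text = "上海大学通信与信息工程学院"
    spans = [(0, len(text), "ORG"), (0, 2, "LOC"), (0, 4, "ORG")]
    for layer, row in enumerate(layered_bio(len(text), spans)):
        print(layer, list(zip(text, row)))
```

In this sketch the three nested spans end up on three separate layers, and each layer is an ordinary flat BIO sequence. Whether the published model orders layers from the inside out, and which tag scheme it uses, would need to be checked against the full paper.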

CLC number: