Funding
Key Project of the Science and Technology Commission of Shanghai Municipality (19511102803)
Chinese nested named entity recognition based on hierarchical tagging
Received date: 2020-07-26
Online published: 2020-12-15
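Jin Yanliang, Xie Jinfei, Wu Dijia. Chinese nested named entity recognition based on hierarchical tagging[J]. Journal of Shanghai University (Natural Science Edition), 2022, 28(2): 270-280. DOI: 10.12066/j.issn.1007-2861.2283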
Chinese named entity recognition plays a critical role in Chinese information processing. In Chinese text, many named entities contain other entities nested inside them. However, most existing studies have focused on flat (non-nested) entities and therefore cannot fully capture the boundary information between nested entities. In this study, a hierarchical tagging method is used for nested named entity recognition (NNER): entity recognition at each layer is treated as a separate task, and a gated filtering mechanism promotes information exchange between layers. Several groups of experiments on the public 1998 People's Daily NNER corpus verify the effectiveness of the model. The results show that, without using any external dictionary resources, the method reaches an F1 score of 91.41% on the People's Daily dataset, effectively improving the recognition of Chinese nested named entities.
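To make the idea of hierarchical tagging concrete, the following is a minimal sketch of how nested entity spans could be split into one flat BIO tag sequence per layer. It is an illustration under stated assumptions, not the authors' implementation: the function names (`assign_layers`, `to_bio`, `hierarchical_tags`), the BIO scheme, the inner-to-outer layer order, and the example entity are all hypothetical, and the gated filtering mechanism is not modeled here.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type); character offsets into the sentence
Span = Tuple[int, int, str]


def assign_layers(spans: List[Span]) -> List[List[Span]]:
    """Group spans into layers so that no two spans in a layer overlap.

    Assumption: shorter spans are placed first, so inner entities end up
    on lower layers and enclosing entities on higher layers.
    """
    layers: List[List[Span]] = []
    for span in sorted(spans, key=lambda s: (s[1] - s[0], s[0])):
        for layer in layers:
            # The span fits this layer if it is disjoint from every span in it.
            if all(span[1] <= other[0] or other[1] <= span[0] for other in layer):
                layer.append(span)
                break
        else:
            # No existing layer can hold the span, so open a new one.
            layers.append([span])
    return layers


def to_bio(n_chars: int, layer: List[Span]) -> List[str]:
    """Turn one layer of non-overlapping spans into a character-level BIO sequence."""
    tags = ["O"] * n_chars
    for start, end, label in layer:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def hierarchical_tags(n_chars: int, spans: List[Span]) -> List[List[str]]:
    """Return one BIO tag sequence per layer, innermost layer first."""
    return [to_bio(n_chars, layer) for layer in assign_layers(spans)]


if __name__ == "__main__":
    # Hypothetical example: 北京 (LOC) inside 北京大学 (ORG) inside 北京大学附属中学 (ORG).
    sentence = "北京大学附属中学"
    spans = [(0, 8, "ORG"), (0, 2, "LOC"), (0, 4, "ORG")]
    for depth, tags in enumerate(hierarchical_tags(len(sentence), spans), start=1):
        print(depth, list(zip(sentence, tags)))
```

In this example the three mutually nested spans land on three separate layers, so each layer can be handled as an ordinary flat sequence-labeling task, which is the property the hierarchical tagging scheme relies on.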