Research Paper

Construction of event-oriented Chinese coreference corpus

1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
2. Shanghai Precision Metrology and Test Research Institute, Shanghai 201109, China

Received date: 2017-02-05

Online published: 2018-12-24

Abstract

Coreference resolution is a fundamental research topic in natural language processing. This paper describes how an event-oriented Chinese coreference corpus was built on the Chinese emergency corpus (CEC) through automatic generation and manual annotation. Unlike traditional coreference corpora, this corpus targets text whose knowledge representation unit is the event, and it annotates coreference among both elements and events. Constructing the corpus is key to research on event-oriented Chinese coreference resolution, because it provides additional resources to support that research. The coreference relations among elements and events are counted and analyzed to provide a basis for future research.

Key words: Chinese; event; coreference; corpus

Cite this article

ZHANG Yajun, LIU Zongtian, LI Qiang, ZHOU Wen. Construction of event-oriented Chinese coreference corpus[J]. Journal of Shanghai University, 2018, 24(6): 900-911. DOI: 10.12066/j.issn.1007-2861.1888
