Journal of Shanghai University (Natural Science Edition), 2021, Vol. 27, Issue (3): 544-552. doi: 10.12066/j.issn.1007-2861.2158

• Research Article •

基于关键 $n$-grams 和门控循环神经网络的文本分类模型

赵倩, 吴悦, 刘宗田

  1. 上海大学 计算机工程与科学学院, 上海 200444
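  • Received: 2019-03-27 Publication date: 2021-06-30 Release date: 2021-06-27
  • Corresponding author: WU Yue E-mail: ywu@mail.shu.edu.cn
  • About the author: WU Yue (born 1960), female, professor, doctoral supervisor, Ph.D.; research interest: intelligent information processing. E-mail: ywu@mail.shu.edu.cn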

Text classification model based on essential $n$-grams and gated recurrent neural network

ZHAO Qian, WU Yue, LIU Zongtian

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  • Received: 2019-03-27 Published: 2021-06-30 Online: 2021-06-27
  • Contact: WU Yue E-mail: ywu@mail.shu.edu.cn

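Abstract (translated from the Chinese):

This paper proposes a text classification model based on essential $n$-grams and a gated recurrent neural network. The model uses a simpler and more efficient pooling layer in place of the traditional convolutional layer to extract essential $n$-grams as important semantic features, and builds a bidirectional gated recurrent unit (GRU) to capture the global dependency features of the input text. The fusion of the two kinds of features is then applied to the text classification task. The model is evaluated on several public datasets covering sentiment classification and topic classification. Experimental comparison with traditional models shows that the proposed model effectively improves text classification performance: accuracy increases by about 1.95% on the 20newsgroup corpus and by about 1.55% on the Rotten Tomatoes corpus.

Key words: text classification, gated recurrent unit (GRU), $n$-grams, natural language processing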

Abstract:

This paper proposes a text classification model based on essential $n$-grams and a gated recurrent neural network. First, a simpler and more efficient pooling layer replaces the traditional convolutional layer to extract the essential $n$-grams as important semantic features. Second, a bidirectional gated recurrent unit (GRU) is constructed to capture the global dependency features of the input text. Finally, the fusion of the two kinds of features is applied to the text classification task. We evaluate the model on sentiment and topic categorization tasks over multiple public datasets. Experimental results show that the proposed method improves text classification performance compared with traditional models: accuracy increases by approximately 1.95% on the 20newsgroup corpus and by approximately 1.55% on the Rotten Tomatoes corpus.

Key words: text classification, gated recurrent unit (GRU), $n$-grams, natural language processing
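
The architecture summarized in the abstract can be illustrated with a minimal, untrained sketch in plain Python. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`ngram_pool`, `bigru`, `classify`, and so on) are hypothetical, the $n$-gram pooling is modelled as an average over each window of $n$ word vectors followed by a per-dimension maximum over windows, the window sizes are fixed to 1, 2 and 3, the two feature sets are fused by simple concatenation, and the weights are random rather than learned.

```python
import math
import random

def ngram_pool(embs, n):
    """Average each window of n word vectors, then max-pool over all windows."""
    dim = len(embs[0])
    windows = [embs[i:i + n] for i in range(max(1, len(embs) - n + 1))]
    grams = [[sum(w[d] for w in win) / len(win) for d in range(dim)] for win in windows]
    return [max(g[d] for g in grams) for d in range(dim)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_gru(in_dim, hid, rng):
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # One (W, U, b) triple per gate: update (z), reset (r), candidate (h).
    return {g: (mat(hid, in_dim), mat(hid, hid), [0.0] * hid) for g in "zrh"}

def gru_step(p, x, h):
    def pre(gate, state):
        W, U, b = p[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, state), b)]
    z = [sigmoid(v) for v in pre("z", h)]
    r = [sigmoid(v) for v in pre("r", h)]
    cand = [math.tanh(v) for v in pre("h", [ri * hi for ri, hi in zip(r, h)])]
    # Interpolate between the previous state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(p, seq, hid):
    h = [0.0] * hid
    for x in seq:
        h = gru_step(p, x, h)
    return h

def bigru(p_fwd, p_bwd, seq, hid):
    """Concatenate the final states of a forward and a backward pass."""
    return run_gru(p_fwd, seq, hid) + run_gru(p_bwd, seq[::-1], hid)

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def init_params(dim, hid, n_classes, ns=(1, 2, 3), seed=0):
    rng = random.Random(seed)
    fused_dim = dim * len(ns) + 2 * hid
    return {
        "hid": hid,
        "fwd": init_gru(dim, hid, rng),
        "bwd": init_gru(dim, hid, rng),
        "W_out": [[rng.uniform(-0.1, 0.1) for _ in range(fused_dim)]
                  for _ in range(n_classes)],
        "b_out": [0.0] * n_classes,
    }

def classify(embs, params, ns=(1, 2, 3)):
    local = [f for n in ns for f in ngram_pool(embs, n)]           # n-gram features
    glob = bigru(params["fwd"], params["bwd"], embs, params["hid"])  # global features
    fused = local + glob                                           # fusion by concatenation
    logits = [a + b for a, b in zip(matvec(params["W_out"], fused), params["b_out"])]
    return softmax(logits)

if __name__ == "__main__":
    rng = random.Random(1)
    sentence = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # 6 words, dim 4
    params = init_params(dim=4, hid=3, n_classes=2)
    print(classify(sentence, params))  # class probabilities from untrained weights
```

Because the pooling branch has no trainable filters, it adds no parameters beyond the word embeddings, which is one plausible reading of the abstract's claim that pooling is simpler than convolution. A real implementation would learn the embeddings, GRU weights and output layer by gradient descent in a deep learning framework.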

CLC number: