Digital Film and Television Technology

Sentiment analysis of Chinese movie reviews based on deep learning

  • 1. School of Software Engineering, Suzhou Institute for Advanced Study, University of Science and Technology of China, Suzhou 215123, Jiangsu, China
    2. Shanghai Film Academy, Shanghai University, Shanghai 200072, China

Received: 2018-07-02

Published online: 2018-10-26

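Cite this article

Zhou Jingyi, Guo Yan, Ding Youdong. Sentiment analysis of Chinese movie reviews based on deep learning[J]. Journal of Shanghai University (Natural Science Edition), 2018, 24(5): 703-712. DOI: 10.12066/j.issn.1007-2861.2075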

Abstract

With the rise of social networks, more people are posting their opinions on films and television works online, which gives film and television investors an easier way to learn how audiences respond to a film. Douban movie reviews, for example, contain a vast number of positive and negative user opinions, and analyzing their sentiment can help investors make decisions and improve the quality of their productions. Analyzing data at this scale requires computational methods. Sentiment analysis, also called sentiment orientation analysis, is a branch of natural language processing (NLP) that is commonly used to determine the type of emotion a text expresses. To improve the accuracy of sentiment classification for movie reviews, several groups of comparative experiments are run to select the best parameters, and the classification accuracy of a bidirectional long short-term memory (Bi-LSTM) model and a convolutional neural network (CNN) model is compared when Chinese character vectors and word vectors are used as the input matrix. A Bagging algorithm that uses CNN models as weak classifiers is proposed: multiple CNN models are trained, and the final classification is decided by voting. This ensemble approach reduces the classification bias of any single model; its classification accuracy is 5.10% higher than that of a single Bi-LSTM model and 1.34% higher than that of a single CNN model.
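The Bagging-and-voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the CNN weak classifier is replaced by a hypothetical character-count stub (`train_stub`) so that the example stays self-contained, and all function names, the toy reviews, and the choice of five models are assumptions made for illustration. Only the bootstrap resampling and the majority vote correspond to what the abstract describes.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, int]           # (review text, label): 1 = positive, 0 = negative
Classifier = Callable[[str], int]  # maps a review to a predicted label


def bootstrap(data: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Draw len(data) samples with replacement (one Bagging resample)."""
    return [rng.choice(data) for _ in data]


def train_stub(data: Sequence[Sample]) -> Classifier:
    """Hypothetical stand-in for training one CNN.

    It only counts how often each character appears in each class,
    which is far weaker than a CNN but keeps the sketch runnable.
    """
    counts = {0: Counter(), 1: Counter()}
    for text, label in data:
        counts[label].update(text)

    def predict(text: str) -> int:
        pos = sum(counts[1][ch] for ch in text)
        neg = sum(counts[0][ch] for ch in text)
        return 1 if pos >= neg else 0

    return predict


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most common label among the weak classifiers' votes."""
    return Counter(votes).most_common(1)[0][0]


def bagging_fit(data: Sequence[Sample], n_models: int = 5,
                seed: int = 0) -> List[Classifier]:
    """Train n_models weak classifiers, each on its own bootstrap resample."""
    rng = random.Random(seed)
    return [train_stub(bootstrap(data, rng)) for _ in range(n_models)]


def bagging_predict(models: Sequence[Classifier], text: str) -> int:
    """Classify one review by majority vote over all weak classifiers."""
    return majority_vote([model(text) for model in models])


if __name__ == "__main__":
    # Toy character-level data, invented for this example.
    toy_reviews = [
        ("很好看", 1), ("非常精彩", 1), ("值得一看", 1),
        ("太难看", 0), ("浪费时间", 0), ("非常失望", 0),
    ]
    ensemble = bagging_fit(toy_reviews, n_models=5, seed=0)
    print(bagging_predict(ensemble, "精彩好看"))
```

Using an odd number of weak classifiers avoids ties in a two-class vote. In a real setting, each call to `train_stub` would be replaced by training a separate CNN on its bootstrap resample.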
