Journal of Shanghai University (Natural Science Edition), 2018, Vol. 24, Issue (5): 703-712. doi: 10.12066/j.issn.1007-2861.2075

• Digital Film and Television Technology •

Sentiment analysis of Chinese movie reviews based on deep learning

ZHOU Jingyi¹, GUO Yan¹, DING Youdong²

  1. School of Software Engineering, Suzhou Institute for Advanced Study, University of Science and Technology of China, Suzhou 215123, Jiangsu, China
  2. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
  • Received: 2018-07-02 Online: 2018-10-30 Published: 2018-10-26
  • Contact: GUO Yan, E-mail: guoyan@ustc.edu.cn

Abstract:

With the rise of social networks, more people post their opinions on films and television works online, which gives investors a more convenient way to learn how audiences respond to a film. Douban movie reviews, for example, contain a vast number of positive and negative opinions from users, and analyzing their sentiment can support investment decisions and help improve the quality of productions. Analyzing data at this scale requires computational methods. Sentiment analysis, also called sentiment orientation analysis, is a branch of natural language processing (NLP) commonly used to determine the type of emotion a text expresses. To improve the accuracy of sentiment classification for movie reviews, several groups of comparative experiments are run to select the optimal parameters, and the classification accuracy of the bidirectional long short-term memory (Bi-LSTM) model and the convolutional neural network (CNN) model is compared when Chinese character vectors and word vectors are used as the input matrix. A Bagging algorithm that uses CNN models as weak classifiers is proposed: multiple CNN models are trained, and the final classification is decided by voting. This ensemble method reduces the classification bias of any single model; its classification accuracy is 5.10% higher than that of a single Bi-LSTM model and 1.34% higher than that of a single CNN model.

Key words: bidirectional long short-term memory (Bi-LSTM) model, convolutional neural network (CNN) model, Bagging algorithm, word embedding vector, sentiment analysis of movie reviews
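The Bagging-with-voting scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `train_keyword_stub` is a hypothetical stand-in for training one CNN weak classifier, the toy reviews are invented, and all function names are chosen here for illustration. Only the resampling and majority-voting logic reflects the technique named in the abstract.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, int]            # (review text, label): 1 = positive, 0 = negative
Classifier = Callable[[str], int]   # maps a review text to a predicted label


def bootstrap_sample(data: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Draw len(data) samples with replacement (one Bagging resample)."""
    return [rng.choice(data) for _ in range(len(data))]


def train_keyword_stub(data: Sequence[Sample]) -> Classifier:
    """Hypothetical stand-in for training one CNN weak classifier.

    It only scores whitespace-separated tokens by the labels they appear with.
    """
    scores: Counter = Counter()
    for text, label in data:
        for token in text.split():
            scores[token] += 1 if label == 1 else -1

    def predict(text: str) -> int:
        total = sum(scores[token] for token in text.split())
        return 1 if total >= 0 else 0

    return predict


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most common label. With two labels, an odd number of votes cannot tie."""
    return Counter(votes).most_common(1)[0][0]


def train_bagging(
    data: Sequence[Sample],
    n_models: int,
    train_fn: Callable[[Sequence[Sample]], Classifier],
    seed: int = 0,
) -> List[Classifier]:
    """Train n_models weak classifiers, each on its own bootstrap resample."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models: Sequence[Classifier], text: str) -> int:
    """Let every weak classifier vote and return the majority label."""
    return majority_vote([model(text) for model in models])


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    toy_reviews = [
        ("great plot great cast", 1),
        ("boring plot", 0),
        ("great soundtrack", 1),
        ("boring and slow", 0),
    ]
    models = train_bagging(toy_reviews, n_models=5, train_fn=train_keyword_stub)
    print(bagging_predict(models, "great cast"))
```

In a setup closer to the paper, `train_fn` would presumably train a CNN on character or word vectors; the voting step would stay the same.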

CLC number: