Emotional analysis model of financial text based on the BERT

Journal of Shanghai University (Natural Science Edition), 2023, Vol. 29, Issue (1): 118-128. doi: 10.12066/j.issn.1007-2861.2308

ZHU He, LU Xiaofeng, XUE Lei

School of Communication & Information Engineering, Shanghai University, Shanghai 200444, China

Received: 2020-12-22 Online: 2023-02-28 Published: 2023-03-28
Corresponding author: XUE Lei, E-mail: xuelei@shu.edu.cn
About the author: XUE Lei (1963-), male, associate professor, Ph.D., research interest: pattern recognition. E-mail: xuelei@shu.edu.cn
Funding: Project funded by the Science and Technology Commission of Shanghai Municipality (19511105503)
Abstract:

In the financial sector, more and more investors choose to post their opinions on internet platforms. As carriers of public opinion, these comment texts can fully reflect investor sentiment and influence investment decisions and market trends. Sentiment analysis, an important branch of natural language processing (NLP), provides an effective means of analyzing the sentiment types of massive volumes of financial text. However, because domain-specific text is highly specialized and large labeled datasets are not applicable to it, sentiment analysis of financial text poses a great challenge to traditional sentiment analysis models. When a general sentiment analysis model is applied to a specific field such as finance, its accuracy and recall are poor. To overcome these challenges, a BERT (bidirectional encoder representations from Transformers) preprocessing model with full word coverage and feature enhancement for the financial domain is proposed for the sentiment analysis of financial text, starting from the word representation model.

Key words: sentiment analysis, word embedding vector, BERT, bag-of-POS (part of speech), named entity recognition
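
The abstract evaluates models by accuracy and recall. As a minimal sketch of how these two metrics are computed for a sentiment classifier, the Python example below uses a hypothetical three-class label set ("pos", "neg", "neu") and toy predictions. Neither the labels nor the data come from the paper; they are assumptions chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_per_class(y_true, y_pred):
    """Recall of each label: correctly predicted samples / samples with that true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    support = Counter(y_true)  # number of true samples per label
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per label
    return {label: hits[label] / support[label] for label in support}


# Hypothetical toy data (not from the paper): true vs. predicted sentiment labels.
y_true = ["pos", "pos", "neg", "neg", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "neu"]

acc = accuracy(y_true, y_pred)
rec = recall_per_class(y_true, y_pred)
print(f"accuracy = {acc:.3f}")
for label, value in sorted(rec.items()):
    print(f"recall[{label}] = {value:.3f}")
```

Reporting recall per class alongside accuracy can matter when sentiment classes are imbalanced, since a model may score well on accuracy while missing most samples of a minority class.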

CLC number: TP 391.1

ZHU He, LU Xiaofeng, XUE Lei. Emotional analysis model of financial text based on the BERT[J]. Journal of Shanghai University (Natural Science Edition), 2023, 29(1): 118-128.
