Received date: 2020-12-22
Online published: 2023-03-28
Funding
Shanghai Science and Technology Commission Foundation Project (19511105503)
Emotional analysis model of financial text based on BERT
In the financial domain, more and more investors choose to publish their views on internet platforms. As carriers of public opinion, these comment texts can fully reflect investor sentiment and influence investment decisions and market trends. Sentiment analysis, an important branch of natural language processing (NLP), provides an effective research means for analyzing the sentiment types of massive volumes of financial text. Owing to the specialized nature of domain-specific text and the unsuitability of large labeled data sets, sentiment analysis of financial text poses a great challenge to traditional sentiment analysis models, which perform poorly in precision and recall. To overcome these challenges, for the sentiment analysis task on financial text and starting from the word representation model, a BERT (bidirectional encoder representations from Transformers) preprocessing model based on full word coverage and feature enhancement in the financial domain is proposed.
Zhu H, Lu X F, Xue L. Emotional analysis model of financial text based on BERT[J]. Journal of Shanghai University (Natural Science Edition), 2023, 29(1): 118-128. DOI: 10.12066/j.issn.1007-2861.2308
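The "full word coverage" the abstract refers to is commonly known as whole-word masking: when one word is split into several WordPiece tokens, all of its pieces are masked together during pre-training instead of masking pieces independently. The following is a minimal illustrative sketch of that idea, not the paper's code; the token list and the `group_whole_words` / `whole_word_mask` helpers are hypothetical names introduced here for demonstration.

```python
def group_whole_words(tokens):
    """Group WordPiece tokens into whole words.

    A token starting with '##' is a continuation piece and is attached
    to the word begun by the preceding token.
    """
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1].append(tok)
        else:
            words.append([tok])
    return words


def whole_word_mask(tokens, word_index, mask_token="[MASK]"):
    """Mask every WordPiece of the word at `word_index`.

    This is the whole-word ("full word coverage") variant: if a word was
    split into several pieces, all of them are replaced by the mask token,
    so the model cannot recover a masked piece from its sibling pieces.
    """
    masked = []
    for i, pieces in enumerate(group_whole_words(tokens)):
        if i == word_index:
            masked.extend([mask_token] * len(pieces))
        else:
            masked.extend(pieces)
    return masked


# "investors" is split into two pieces; whole-word masking hides both.
tokens = ["invest", "##ors", "express", "opinions", "online"]
print(whole_word_mask(tokens, 0))
# → ['[MASK]', '[MASK]', 'express', 'opinions', 'online']
```

In an actual pre-training pipeline the words to mask are sampled at random and a tokenizer produces the pieces; this sketch fixes the masked word index only to keep the behavior deterministic and easy to inspect.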