A public opinion analysis model based on Danmu data monitoring and sentiment classification
Abstract: With the rapid development of online video platforms, Danmu (real-time overlaid comments) has gradually become an important way for people to express their opinions, and it is particularly popular among young people. Unlike conventional texts, Danmu texts are generally short and casually worded, contain a great deal of Internet slang, and even use some conventional stop words to express emotion. This paper proposes a public opinion analysis model based on Danmu data. Tailored to the way Danmu data are generated and stored, a hotspot-detection loop-adaptive algorithm is proposed for Danmu data collection. A sentiment dictionary is then expanded with the Internet slang common in Danmu, so that emotionally charged Danmu can be separated from neutral ones. Finally, a sentiment polarity classification model based on a convolutional neural network (CNN) is built to distinguish positive from negative Danmu, and the public opinion analysis result is derived from its output. Experiments show that the proposed model can effectively characterize public opinion in news-related Danmu data.
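As a minimal illustration of the dictionary-based step described above, emotionally charged Danmu can be separated from neutral ones by matching against an expanded lexicon. The lexicon entries and helper below are hypothetical examples (drawn from the slang style shown in Table 1), not the paper's actual expanded dictionary:

```python
# Sketch of lexicon-based filtering: a Danmu containing any entry of an
# expanded sentiment lexicon (including Internet slang) is treated as
# emotionally charged; all others are treated as neutral.
SENTIMENT_LEXICON = {
    "超厉害": "positive",   # "awesome" (Internet slang)
    "哈哈": "positive",     # laughter
    "衰": "negative",       # "unlucky" (slang)
    "抓狂": "negative",     # "going crazy"
}

def filter_emotional(danmu_list):
    """Split a list of Danmu texts into emotional and neutral subsets."""
    emotional, neutral = [], []
    for text in danmu_list:
        if any(word in text for word in SENTIMENT_LEXICON):
            emotional.append(text)
        else:
            neutral.append(text)
    return emotional, neutral
```

Only the emotional subset is then passed to the CNN polarity classifier; the neutral subset is excluded from the polarity statistics.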
Key words:
- Danmu emotion
- Internet sentiment
- emotion classification
- deep learning
- web crawler
Algorithm 1: Hotspot-detection loop-adaptive data acquisition algorithm
FUNCTION hotspot_cycle()
Input: none; Output: none
1:  WHILE true DO
2:      x = 1                                      // initialize the acquisition count to 1
3:      span_time = span(x)                        // initialize the acquisition interval
4:      WHILE true DO
5:          sleep(span_time)                       // wait for span_time
6:          danmus = spider_get_danmus()           // fetch all Danmu currently in the pool
7:          pool_size = spider_get_pool_size()     // get the Danmu pool size n
8:          num_of_danmu = compare_and_store(danmus)   // deduplicate and count the new Danmu
9:          IF num_of_danmu > ∂(n, span_time) OR x > 20 THEN   // when the number of new Danmu exceeds the threshold, or the loop has run more than 20 times, exit the inner loop and return to the outer loop
10:             break
11:         END
12:         x = x + 1
13:         span_time = span(x)
14:     END
15: END

Algorithm 2: Crawl-interval function
FUNCTION span(x)
Input: crawl count; Output: interval between crawls
1:  IF x ≤ 6 THEN
2:      return 2^(x−1)
3:  END
4:  return span(6) + 2(x − 6)                      // recursive call

Algorithm 3: Threshold function
FUNCTION ∂(n, span)
Input: Danmu pool size, current crawl interval; Output: Danmu growth threshold
1:  return n × span / 60
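Algorithms 2 and 3 translate directly into Python. The sketch below mirrors the pseudocode (the spider calls of Algorithm 1 are omitted):

```python
def span(x):
    """Crawl-interval function (Algorithm 2): the interval doubles over the
    first six crawls (1, 2, 4, 8, 16, 32), then grows linearly by 2 per crawl."""
    if x <= 6:
        return 2 ** (x - 1)
    return span(6) + 2 * (x - 6)  # recursive call, as in the pseudocode

def growth_threshold(n, span_time):
    """Threshold function (Algorithm 3): the expected Danmu growth scales with
    the pool size n and the current interval, normalized by 60."""
    return n * span_time / 60
```

With these definitions, the inner loop of Algorithm 1 exits once the number of newly observed Danmu exceeds `growth_threshold(pool_size, span_time)` (a hotspot is detected) or the crawl count passes 20, after which the interval resets and the cycle restarts.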
Tab. 1 Weibo comment tagged data
| Label    | Comment |
|----------|---------|
| Positive | 鼓掌 超厉害 ("applause, awesome") |
| Positive | 更多选择, 更多欢笑 嘻嘻 ("more choices, more laughter, hee-hee") |
| Positive | 这个可以有~~哈哈 ("this works~~ haha") |
| Negative | 挤公交挤死 衰 ("crushed on the bus, ugh") |
| Negative | 呵呵, 头像撞了 抓狂 ("heh, someone took my avatar, going crazy") |
| Negative | 去你妈得周末 怒 ("to hell with this weekend, furious") |
Tab. 2 Three sets of random Danmu data
| Group   | Total texts | Positive | Negative |
|---------|-------------|----------|----------|
| Group 1 | 300         | 150      | 150      |
| Group 2 | 300         | 150      | 150      |
| Group 3 | 300         | 150      | 150      |
Tab. 3 Evaluation of a Weibo data classification model
| Model | Accuracy/% | Precision/% | Recall/% | F1-score/% |
|-------|------------|-------------|----------|------------|
| NB    | 79.4       | 76.3        | 72.8     | 74.5       |
| SVM   | 84.3       | 85.3        | 83.6     | 84.4       |
| CNN   | 89.7       | 89.8        | 89.6     | 89.7       |
Tab. 4 Evaluation of different classification models in Danmu sentiment classification
| Model | Group   | Accuracy/% | Precision/% | Recall/% | F1-score/% |
|-------|---------|------------|-------------|----------|------------|
| NB    | Group 1 | 62.3       | 66.7        | 61.3     | 63.9       |
| NB    | Group 2 | 68.7       | 71.3        | 67.7     | 69.5       |
| NB    | Group 3 | 59.6       | 64.0        | 58.9     | 61.3       |
| NB    | Average | 63.5       | 67.3        | 62.6     | 64.9       |
| SVM   | Group 1 | 75.3       | 80.0        | 73.2     | 76.4       |
| SVM   | Group 2 | 74.7       | 76.0        | 74.0     | 75.0       |
| SVM   | Group 3 | 72.7       | 75.4        | 71.5     | 73.4       |
| SVM   | Average | 74.2       | 77.1        | 72.9     | 74.9       |
| CNN   | Group 1 | 82.3       | 84.7        | 81.4     | 83.0       |
| CNN   | Group 2 | 87.7       | 89.3        | 86.5     | 87.9       |
| CNN   | Group 3 | 84.3       | 81.3        | 86.5     | 83.8       |
| CNN   | Average | 84.8       | 85.5        | 84.8     | 84.9       |
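The F1-scores reported in Tables 3 and 4 follow from precision P and recall R in the usual way, F1 = 2PR/(P + R), and the per-model averages in Table 4 are the means over the three groups. A quick consistency check, using only figures taken from the tables:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, as reported in Tables 3 and 4."""
    return 2 * precision * recall / (precision + recall)

# CNN row of Table 3: precision 89.8 %, recall 89.6 % reproduce the listed F1.
cnn_f1 = round(f1_score(89.8, 89.6), 1)

# CNN average accuracy in Table 4: mean of the three per-group accuracies.
cnn_avg_accuracy = round((82.3 + 87.7 + 84.3) / 3, 1)
```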