Study on the extraction of Chinese microblog subjective sentences based on lexicon and corpus
Abstract: This paper proposes a method for extracting subjective sentences from Chinese microblogs that combines a lexicon with a corpus. A sentence is classified as subjective if it contains a sentiment expression, and as objective otherwise. First, a highly credible sentiment lexicon is built by selecting, from existing sentiment lexicons, the words whose sentiment orientation is relatively fixed; extracting sentiment expressions with this lexicon ensures the accuracy of the extraction. Then an N-POSW model is proposed, and the 2-POSW model is used with a corpus-based learning method to extract the remaining sentiment expressions in a sentence fairly accurately, which ensures the recall of the extraction. Experimental results show that, compared with the traditional method based on a large-scale sentiment lexicon, the F-measure of subjective sentence extraction improves by 7%.
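The two-stage decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy lexicon, and the `corpus_extractor` callable are all hypothetical, and because the abstract does not define the N-POSW model, the second stage is left as a pluggable placeholder. The `f_measure` helper assumes the reported F value is the balanced F1 score.

```python
from typing import Callable, List, Optional, Set

# Hypothetical type for the second stage: takes the tokens of a sentence and
# returns any sentiment expressions it finds. The abstract does not define
# the N-POSW model, so its internals are not reproduced here.
Extractor = Callable[[List[str]], List[str]]


def lexicon_matches(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Stage 1: return the tokens found in the high-credibility lexicon."""
    return [tok for tok in tokens if tok in lexicon]


def is_subjective(
    tokens: List[str],
    lexicon: Set[str],
    corpus_extractor: Optional[Extractor] = None,
) -> bool:
    """A sentence is subjective if either stage finds a sentiment expression."""
    if lexicon_matches(tokens, lexicon):
        return True
    # Stage 2 (placeholder): a corpus-learned extractor for the expressions
    # that the lexicon misses.
    if corpus_extractor is not None:
        return bool(corpus_extractor(tokens))
    return False


def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score (assumed to be the F value the abstract reports)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy lexicon for illustration only; the paper's lexicon is not shown here.
toy_lexicon = {"喜欢", "讨厌"}
print(is_subjective(["我", "喜欢", "这", "部", "电影"], toy_lexicon))
print(is_subjective(["今天", "是", "星期一"], toy_lexicon))
```

Keeping the lexicon stage first reflects the ordering in the abstract: the high-credibility lexicon handles the cases it can decide reliably, and the corpus-learned stage is consulted only for sentences the lexicon leaves undecided.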
Key words:
- sentiment lexicon
- highly credible lexicon
- N-POSW model
- subjective sentence