Improved LDA model for microblog topic mining

XIE Hao, JIANG Hong

Citation: XIE Hao, JIANG Hong. Improved LDA model for microblog topic mining[J]. Journal of East China Normal University (Natural Sciences), 2013, (6): 93-101.

Details
  • CLC number: TP39

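  • Abstract: As the number of Sina Weibo users keeps growing, microblogging sites have become a platform where many people get their information. Microblogs are an unusual kind of text, however: their length is strictly limited, so traditional topic models cannot analyze their content well. This paper proposes RT-LDA, an LDA-based generative model for microblogs, to address the length limit. The model is inferred with Gibbs sampling. It can accurately mine the topic of each microblog and can also summarize the distribution of topics that a user follows. Experiments on a real dataset show that RT-LDA performs well at microblog topic mining.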
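The abstract says the model is inferred with Gibbs sampling, but this page does not give the RT-LDA sampling equations. As a rough illustration of the general technique only, the sketch below runs collapsed Gibbs sampling for standard LDA, not RT-LDA. The function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are assumptions made for this example and do not come from the paper.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative, not RT-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi, assignments): per-document topic distributions,
    per-topic word distributions, and the final topic of every token.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_{k,w}
    topic_total = [0] * num_topics                               # n_k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimates from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, assignments


# Toy corpus (hypothetical): four very short "posts" over a 4-word vocabulary.
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1], [3, 2, 2]]
theta, phi, z = lda_gibbs(toy_docs, num_topics=2, vocab_size=4, iters=50)
for d, row in enumerate(theta):
    print(d, [round(p, 3) for p in row])
```

With documents as short as these, each per-document count vector rests on only a few tokens, which is the length problem the abstract attributes to microblogs.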
Publication history
  • Received: 2012-11-01
  • Revised: 2013-02-01
  • Published: 2013-11-25
