An Improved LDA Model for Microblog Topic Mining

XIE Hao; JIANG Hong

[1]

［1］ ZHAO W X, HE J, YAN H F, et al. Comparing Twitter and traditional media using topic models［J］. Advances in Information Retrieval, Proceedings, 2011, 6611: 338-349.

［2］ NOORDHUIS P, HEIJKOOP M, LAZOVIK A. Mining Twitter in the cloud: a case study［C］//2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), 2010: 107-114.

［3］ KANG J H, LERMAN K, PLANGPRASOPCHOK A. Analyzing microblogs with affinity propagation［C］//Proc of the 1st KDD Workshop on Social Media Analytics. New York: ACM, 2010: 67-70.

［4］ BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet Allocation［J］. Journal of Machine Learning Research, 2003, 3:993-1022.

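［5］ ZHANG Chenyi, SUN Jianling, DING Yiqun. Topic mining for microblogs based on the MB-LDA model［J］. Journal of Computer Research and Development, 2011, 48(10): 1795-1802. (in Chinese)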

［6］ RAMAGE D, DUMAIS S, LIEBLING D. Characterizing microblogs with topic models［C］//ICWSM, 2010: 130-137.

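［7］ LIAN Jie, ZHOU Xin, CAO Wei, LIU Yun. A data mining scheme for Sina Weibo［J］. Journal of Tsinghua University (Science and Technology), 2011, 51(10): 1300-1305. (in Chinese)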

［8］ ZHANG H P, YU H K, XIONG D Y, et al. HHMM-based Chinese lexical analyzer ICTCLAS［C］//Proc of the 2nd SigHan Workshop, 2003: 184-187.

［9］ DEERWESTER S, DUMAIS S, LANDAUER T. Indexing by latent semantic analysis［J］. Journal of the American Society for Information Science, 1990, 41(6): 391-407.

［10］ HOFMANN T. Probabilistic latent semantic indexing［C］//Proc of the 22nd Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 1999: 50-57.

［11］ BLEI D M. Probabilistic topic models［J］. Communications of the ACM, 2012, 55(4): 77-84.

［12］ BISHOP C M. Pattern Recognition and Machine Learning［M］. Germany: Springer, 2007.

［13］ RESNIK P, HARDISTY E. Gibbs sampling for the uninitiated［R］. Technical Reports from UMIACS, 2010, 6.

［14］ STEYVERS M, GRIFFITHS T. Probabilistic topic models［J］. Handbook of Latent Semantic Analysis, 2007, 427(7):424-440.

［15］ WENG J S, LIM E P, JIANG J, et al. TwitterRank: finding topic-sensitive influential Twitterers［C］//Proceedings of the third ACM WSDM, 2010.

［16］ GRIFFITHS T L, STEYVERS M. Finding scientific topics［J］. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101: 5228-5235.

［17］ DAGAN I, LEE L, PEREIRA F. Similarity-based methods for word sense disambiguation［C］//Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, 1997: 56-63.

［18］ KULLBACK S, LEIBLER R A. On information and sufficiency［J］. Annals of Mathematical Statistics, 1951, 22(1): 79-86.

［19］ HONG L, DAVISON B D. Empirical study of topic modeling in Twitter［C］//Proceedings of the SIGKDD Workshop on Social Media Analytics, 2010.