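Chinese Core Journal of Comprehensive Science and Technology (Peking University Core Journals)

Source journal of the Chinese Science Citation Database (CSCD)

Indexed in Chemical Abstracts (CA), USA

Indexed in Mathematical Reviews (MR), USA

Indexed in Referativnyi Zhurnal (Abstract Journal), Russia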

Issue 3
Jul.  2013
XU Bo-xi, HU Ning, CHEN Wen-bin, GAO Wei-guo, CHENG Jin. Efficient implementation for LDA in Mahout[J]. Journal of East China Normal University (Natural Sciences), 2013, (3): 118-130.

Efficient implementation for LDA in Mahout

  • Received Date: 2013-03-01
  • Revised Date: 2013-04-01
  • Publish Date: 2013-05-25
  • Abstract: Based on a careful study of Latent Dirichlet Allocation (LDA) with Gibbs sampling and of the MapReduce framework, an efficient implementation of LDA in Mahout was developed. Experiments showed the high performance of this distributed parallel LDA program, and several issues concerning further performance improvement are discussed.
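For readers unfamiliar with the technique named in the abstract, the following is a minimal single-machine sketch of collapsed Gibbs sampling for LDA. It is not the authors' Mahout implementation and does not show the MapReduce distribution step; the function name `gibbs_lda`, its parameters, and the default hyperparameters `alpha` and `beta` are assumptions chosen for illustration only.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids.

    Illustrative sketch only (hypothetical helper, not taken from the paper).
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics

    # Assign every token a random initial topic and record the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Draw a new topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw


if __name__ == "__main__":
    toy_docs = [[0, 1, 2, 0], [3, 4, 3], [0, 3, 1, 4, 2]]
    assignments, doc_topic, topic_word = gibbs_lda(toy_docs, num_topics=2, vocab_size=5)
    print(doc_topic)
```

In a distributed setting, one common approach is to let each worker sample its own partition of documents against a local copy of the topic-word counts and merge the count changes after each pass; whether the paper's implementation follows that scheme is not stated in the abstract.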
