Chinese core journal in comprehensive science and technology (Peking University Core Journals)

Source journal of the Chinese Science Citation Database (CSCD)

Indexed by Chemical Abstracts (CA), USA

Indexed by Mathematical Reviews (MR), USA

Indexed by the Russian Abstract Journal (Referativnyi Zhurnal)

Issue 5, Sep. 2017
Citation: YU Ke-ren, FU Yun-bin, DONG Qi-wen. Survey on distributed word embeddings based on neural network language models[J]. Journal of East China Normal University (Natural Sciences), 2017, (5): 52-65, 79. doi: 10.3969/j.issn.1000-5641.2017.05.006

Survey on distributed word embeddings based on neural network language models

doi: 10.3969/j.issn.1000-5641.2017.05.006
  • Received Date: 2017-05-01
  • Publish Date: 2017-09-25
  • Distributed word embedding is one of the most important research topics in the field of natural language processing; its core idea is to represent the words of a text as low-dimensional vectors. Among the many ways to generate such vectors, the methods based on neural network language models perform best, the representative example being Word2vec, an open-source tool released by Google in 2013. Distributed word embeddings can be used to solve many natural language processing tasks, such as text clustering, named entity recognition, and part-of-speech tagging. Their quality depends heavily on the neural network language model they are based on and on the specific task they are applied to. This paper surveys distributed word embeddings based on neural networks from three aspects: the construction of classical neural network language models, optimization methods for the multi-class (softmax) classification problem in language models, and the use of auxiliary structures to train word embeddings.
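  • To make the idea concrete, the following is a minimal sketch, not taken from the paper, of training skip-gram embeddings with negative sampling using the open-source gensim implementation of Word2vec (assuming gensim 4.x); the toy corpus, hyper-parameter values, and query word are illustrative assumptions only.

```python
# Minimal illustrative sketch (assumes gensim 4.x); corpus, hyper-parameters,
# and the query word are made up for demonstration only.
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["distributed", "word", "embeddings", "represent", "words", "as", "vectors"],
    ["neural", "network", "language", "models", "learn", "word", "vectors"],
    ["word", "vectors", "capture", "semantic", "similarity", "between", "words"],
]

# sg=1 selects the skip-gram architecture; negative=5 replaces the full softmax
# over the vocabulary with negative sampling, one of the optimization methods
# for the multi-class problem that the survey discusses.
model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the embedding space
    window=2,        # context window size
    min_count=1,     # keep every token in this tiny corpus
    sg=1,
    negative=5,
    epochs=50,
    seed=42,
)

# Each word is now a dense, low-dimensional vector; words that occur in similar
# contexts tend to receive nearby vectors.
print(model.wv["word"].shape)                  # (50,)
print(model.wv.most_similar("word", topn=3))
```

    Setting hs=1 (with negative=0) would switch gensim to hierarchical softmax, the other large-vocabulary optimization covered in the survey.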
