Citation: YU Ke-ren, FU Yun-bin, DONG Qi-wen. Survey on distributed word embeddings based on neural network language models[J]. Journal of East China Normal University (Natural Sciences), 2017(5): 52-65, 79. doi: 10.3969/j.issn.1000-5641.2017.05.006
[1] HARRIS Z S. Distributional structure[J]. Word, 1954, 10(2/3): 146-162. doi: 10.1080/00437956.1954.11659520
[2] FIRTH J R. A synopsis of linguistic theory, 1930-1955[J]. Studies in Linguistic Analysis, 1957(S): 1-31.
[3] LAI S W. Research on semantic vector representations of words and documents based on neural networks[D]. Beijing: University of Chinese Academy of Sciences, 2016. (in Chinese)
[4] TURIAN J, RATINOV L, BENGIO Y. Word representations: A simple and general method for semi-supervised learning[C]//Proceedings of the Meeting of the Association for Computational Linguistics, Uppsala, Sweden. 2010: 384-394.
[5] DEERWESTER S, DUMAIS S T, FURNAS G W, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
[6] PENNINGTON J, SOCHER R, MANNING C. GloVe: Global vectors for word representation[C]//Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[7] BROWN P F, DESOUZA P V, MERCER R L, et al. Class-based n-gram models of natural language[J]. Computational Linguistics, 1992, 18(4): 467-479.
[8] GUO J, CHE W, WANG H, et al. Revisiting embedding features for simple semi-supervised learning[C]//Conference on Empirical Methods in Natural Language Processing. 2014: 110-120.
[9] CHEN X, XU L, LIU Z, et al. Joint learning of character and word embeddings[C]//International Conference on Artificial Intelligence. AAAI Press, 2015: 1236-1242.
[10] HINTON G E. Learning distributed representations of concepts[C]//Proceedings of the Eighth Annual Conference of the Cognitive Science Society. 1986: 12.
[11] MIIKKULAINEN R, DYER M G. Natural language processing with modular neural networks and distributed lexicon[J]. Cognitive Science, 1991, 15(3): 343-399.
[12] XU W, RUDNICKY A. Can artificial neural networks learn language models?[C]//International Conference on Spoken Language Processing. 2000: 202-205.
[13] BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[14] MNIH A, HINTON G. Three new graphical models for statistical language modelling[C]//Proceedings of the 24th International Conference on Machine Learning. 2007: 641-648.
[15] SUTSKEVER I, HINTON G E. Learning multilevel distributed representations for high-dimensional sequences[J]. Journal of Machine Learning Research, 2007(2): 548-555.
[16] MNIH A, HINTON G. A scalable hierarchical distributed language model[C]//Conference on Neural Information Processing Systems, Vancouver, Canada. 2008: 1081-1088.
[17] MNIH A, KAVUKCUOGLU K. Learning word embeddings efficiently with noise-contrastive estimation[C]//Advances in Neural Information Processing Systems. 2013: 2265-2273.
[18] MIKOLOV T, KARAFIÁT M, BURGET L, et al. Recurrent neural network based language model[C]//INTERSPEECH 2010, Conference of the International Speech Communication Association, Makuhari, Chiba, Japan. 2010: 1045-1048.
[19] MIKOLOV T, KOMBRINK S, DEORAS A, et al. RNNLM - recurrent neural network language modeling toolkit[C]//Proceedings of the 2011 ASRU Workshop. 2011: 196-201.
[20] BENGIO Y, SIMARD P, FRASCONI P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[21] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[22] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Conference on Empirical Methods in Natural Language Processing. 2014: 1724-1734.
[23] CHO K, VAN MERRIËNBOER B, BAHDANAU D, et al. On the properties of neural machine translation: Encoder-decoder approaches[J]. arXiv preprint arXiv:1409.1259, 2014.
[24] CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint arXiv:1412.3555, 2014.
[25] GREFF K, SRIVASTAVA R K, KOUTNÍK J, et al. LSTM: A search space odyssey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015(99): 1-11.
[26] JOZEFOWICZ R, ZAREMBA W, SUTSKEVER I, et al. An empirical exploration of recurrent network architectures[C]//International Conference on Machine Learning. 2015: 2342-2350.
[27] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
[28] MORIN F, BENGIO Y. Hierarchical probabilistic neural network language model[C]//AISTATS. 2005: 246-252.
[29] GOODMAN J. Classes for fast maximum entropy training[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2001: 561-564.
[30] FELLBAUM C, MILLER G. WordNet: An Electronic Lexical Database[M]. Cambridge, MA: MIT Press, 1998.
[31] MNIH A, HINTON G. A scalable hierarchical distributed language model[C]//International Conference on Neural Information Processing Systems. Curran Associates Inc., 2008: 1081-1088.
[32] LE H S, OPARIN I, ALLAUZEN A, et al. Structured output layer neural network language model[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011: 5524-5527.
[33] MIKOLOV T, KOMBRINK S, BURGET L, et al. Extensions of recurrent neural network language model[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011: 5528-5531.
[34] COLLOBERT R, WESTON J. A unified architecture for natural language processing: Deep neural networks with multitask learning[C]//Proceedings of the 25th International Conference on Machine Learning. 2008: 160-167.
[35] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12: 2493-2537.
[36] GUTMANN M, HYVÄRINEN A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models[J]. Journal of Machine Learning Research, 2010(9): 297-304.
[37] GUTMANN M U, HYVÄRINEN A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics[J]. Journal of Machine Learning Research, 2012, 13(1): 307-361.
[38] MNIH A, TEH Y W. A fast and simple algorithm for training neural probabilistic language models[C]//International Conference on Machine Learning. 2012: 1751-1758.
[39] BENGIO Y, SENÉCAL J S. Quick training of probabilistic neural nets by importance sampling[C]//AISTATS. 2003: 1-9.
[40] ZOPH B, VASWANI A, MAY J, et al. Simple, fast noise-contrastive estimation for large RNN vocabularies[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016: 1217-1222.
[41] DYER C. Notes on noise contrastive estimation and negative sampling[J]. arXiv preprint arXiv:1410.8251, 2014.
[42] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26: 3111-3119.
[43] CHEN W, GRANGIER D, AULI M, et al. Strategies for training large vocabulary neural language models[C]//Meeting of the Association for Computational Linguistics. 2015: 1975-1985.
[44] DEVLIN J, ZBIB R, HUANG Z, et al. Fast and robust neural network joint models for statistical machine translation[C]//Meeting of the Association for Computational Linguistics. 2014: 1370-1380.
[45] ANDREAS J, KLEIN D. When and why are log-linear models self-normalizing?[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 244-249.
[46] MIKOLOV T, KOPECKY J, BURGET L, et al. Neural network based language models for highly inflective languages[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009: 4725-4728.
[47] SANTOS C D, ZADROZNY B. Learning character-level representations for part-of-speech tagging[C]//Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014: 1818-1826.
[48] COTTERELL R, SCHÜTZE H. Morphological word-embeddings[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 1287-1292.
[49] BOJANOWSKI P, GRAVE E, JOULIN A, et al. Enriching word vectors with subword information[J]. arXiv preprint arXiv:1607.04606, 2016.
[50] LI Y, LI W, SUN F, et al. Component-enhanced Chinese character embeddings[C]//Empirical Methods in Natural Language Processing. 2015: 829-834.
[51] YU M, DREDZE M. Improving lexical embeddings with semantic knowledge[C]//Meeting of the Association for Computational Linguistics. 2014: 545-550.
[52] WANG Z, ZHANG J, FENG J, et al. Knowledge graph and text jointly embedding[C]//Conference on Empirical Methods in Natural Language Processing. 2014: 1591-1601.
[53] REISINGER J, MOONEY R J. Multi-prototype vector-space models of word meaning[C]//Human Language Technologies: The 2010 Conference of the North American Chapter of the Association for Computational Linguistics. 2010: 109-117.
[54] HUANG E H, SOCHER R, MANNING C D, et al. Improving word representations via global context and multiple word prototypes[C]//Meeting of the Association for Computational Linguistics: Long Papers. 2012: 873-882.
[55] VILNIS L, MCCALLUM A. Word representations via Gaussian embedding[R]. University of Massachusetts Amherst, 2014.
[56] HILL F, REICHART R, KORHONEN A. SimLex-999: Evaluating semantic models with (genuine) similarity estimation[J]. Computational Linguistics, 2015, 41(4): 665-695. doi: 10.1162/COLI_a_00237
[57] FINKELSTEIN L, GABRILOVICH E, MATIAS Y, et al. Placing search in context: The concept revisited[J]. ACM Transactions on Information Systems, 2002, 20(1): 116-131. doi: 10.1145/503104.503110
[58] ZWEIG G, BURGES C J C. The Microsoft Research sentence completion challenge[R]. Technical Report MSR-TR-2011-129, Microsoft, 2011.
[59] GLADKOVA A, DROZD A, MATSUOKA S. Analogy-based detection of morphological and semantic relations with word embeddings: What works and what doesn't[C]//HLT-NAACL. 2016: 8-15.
[60] MIKOLOV T, YIH W, ZWEIG G. Linguistic regularities in continuous space word representations[C]//HLT-NAACL. 2013: 746-751.
[61] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[C]//ICLR. 2015: 1-15.
[62] GROVER A, LESKOVEC J. node2vec: Scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 855-864.