Citation: | HAN Chengcheng, LI Lei, LIU Tingting, GAO Ming. Approaches for semantic textual similarity[J]. Journal of East China Normal University (Natural Sciences), 2020, (5): 95-112. doi: 10.3969/j.issn.1000-5641.202091011 |
[1] |
BLOEHDORN S, BASILI R, CAMMISA M, et al. Semantic kernels for text classification based on topological measures of feature similarity [C]//Proceeding of the Sixth International Conference on Data Mining (ICDM’06). 2006: 808-812.
|
[2] |
TONG Y, GU L. A news text clustering method based on similarity of text labels [C]//International Conference on Advanced Hybrid Information Processing. 2018: 496-503.
|
[3] |
ATTARDI G, SIMI M, DEI R S. TANL-1: Coreference resolution by parse analysis and similarity clustering [C]//Proceedings of the 5th International Workshop on Semantic Evaluation. 2010: 108-111.
|
[4] |
DAS A, MANDAL J, DANIAL Z, et al. A novel approach for automatic bengali question answering system using semantic similarity analysis[EB/OL]. (2019-10-23)[2020-07-01]. https://arxiv.org/ftp/arxiv/papers/1910/1910.10758.pdf.
|
[5] |
AMIR S, TANASESCU A, ZIGHED D A. Sentence similarity based on semantic kernels for intelligent text retrieval [J]. Journal of Intelligent Information Systems, 2017, 48(3): 675-689.
|
[6] |
SOORI H, PRILEPOK M, PLATOS J, et al. Semantic and similarity measure methods for plagiarism detection of students’ assignments [C]//Proceedings of the Second International Afro-European Conference for Industrial Advancement AECIA 2015. 2016: 117-125.
|
[7] |
VADAPALLI R, KURISINKEL L J, GUPTA M, et al. SSAS: Semantic similarity for abstractive summarization [C]//Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2017: 198-203.
|
[8] |
QIAN M, LIU J, LI C, et al. A comparative study of English-Chinese translations of court texts by machine and human translators and the Word2Vec based similarity measure’s ability to gauge human evaluation biases [C]//Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks. 2019: 95-100.
|
[9] |
MAJUMDER G, PAKRAY P, GELBUKH A, et al. Semantic textual similarity methods, tools, and applications: A survey [J]. Computación y Sistemas, 2016, 20(4): 647-665.
|
[10] |
王春柳, 杨永辉, 邓霏, 等. 文本相似度计算方法研究综述 [J]. 情报科学, 2019, 37(3): 158-168.
|
[11] |
RISTAD, ERIC S, YIANILOS, et al. Learning string-edit distance[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(5): 522-532.
|
[12] |
XU X, CHEN L, HE P. Fast sequence similarity computing with LCS on LARPBS [C]//International Symposium on Parallel and Distributed Processing and Applications. 2005: 168-175.
|
[13] |
KONDRAK G. N-gram similarity and distance [C]// String Processing and Information Retrieval. 2005: 115-126.
|
[14] |
NIWATTANAKUL S, SINGTHONGCHAI J, NAENUDORN E, et al. Using of Jaccard Coefficient for Keywords Similarity [J]. Lecture Notes in Engineering and Computer Science, 2013, 1(3): 13-15.
|
[15] |
车万翔, 刘挺, 秦兵, 等. 基于改进编辑距离的中文相似句子检索 [J]. 高技术通讯, 2004, 14(7): 15-19.
|
[16] |
SLANEY M, CASEY M. Locality-sensitive hashing for finding nearest neighbors [J]. IEEE Signal processing magazine, 2008, 25(2): 128-131.
|
[17] |
SALTON G, WONG A, YANG C S, et al. A vector space model for automatic indexing [J]. Communications of The ACM, 1975, 18(11): 613-620.
|
[18] |
LANDAUER T K, DUMAIS S T. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge [J]. Psychological Review, 1997, 104(2): 211-240.
|
[19] |
HOFMANN T. Probabilistic latent semantic analysis [J]. Uncertainty in Artificial Intelligence, 1999, 15(6): 289-296.
|
[20] |
BLEI D M, NG A Y, JORDAN M I, et al. Latent dirichlet allocation [J]. Journal of Machine Learning Research, 2012(3): 993-1022.
|
[21] |
GUO Q L, LI Y M, TANG Q. Similarity computing of documents based on VSM [J]. Application Research of Computers, 2008, 25(11): 3256-3258.
|
[22] |
LI L. Research and implementation of an improved VSM-based text similarity algorithm [J]. Computer Applications and Software, 2012, 29(2): 282-284.
|
[23] |
TASI C, HUANG Y, LIU C, et al. Applying VSM and LCS to develop an integrated text retrieval mechanism [J]. Expert Systems With Applications, 2012, 39(4): 3974-3982.
|
[24] |
王振振, 何明, 杜永萍. 基于LDA主题模型的文本相似度计算 [J]. 计算机科学, 2013, 40(12): 229-232.
|
[25] |
XIONG D P, WANG J, LIN H F. An LDA-based approach to finding similar questions for community question answer [J]. Journal of Chinese Information Processing, 2012, 26(5): 40-45.
|
[26] |
ZHANG C, CHEN L, LI X, et al. Chinese text similarity algorithm based on PST_LDA [J]. Application Research of Computers, 2016, 33(2): 375-377.
|
[27] |
MIAO Y, YU L, BLUNSOM P, et al. Neural variational inference for text processing [EB/OL]. (2016-01-04)[2020-07-01]. https://arxiv.org/pdf/1511.06038.pdf.
|
[28] |
LAU J H, BALDWIN T, COHN T, et al. Topically Driven Neural Language Model [C]// Meeting of the Association for Computational Linguistics. 2017: 355-365.
|
[29] |
MILLER, GEORGE A. WordNet: A lexical database for English [J]. Communications of the Acm, 1995, 38(11): 39-41.
|
[30] |
梅家驹, 竺一鸣, 高蕴琦, 等. 同义词词林 [M]. 上海: 上海辞书出版社, 1983.
|
[31] |
董振东. 语义关系的表达和知识系统的建造 [J]. 语言文字应用, 1998(3): 76-82.
|
[32] |
RADA R, MILI H, BICKNELL E J, et al. Development and application of a metric on semantic nets [J]. IEEE Transaction on System Man & Cybernetics, 1989, 19(1):17-30.
|
[33] |
RICHARDSON R, SMEATON A F. Using WordNet in a knowledge-based approach to information retrieval [EB/OL]. (1995-02-01)[2020-07-01].http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=0DDA60E11D37A7DA2777BF162C86760F?doi=10.1.1.48.9324&rep=rep1&type=pdf.
|
[34] |
LEACOCK C, CHODOROW M. Combining local context and WordNet similarity for word sense identification [M]// FELLBAUM C. WordNet: An Electronic Lexical Database. Massachusetts: MIT Press, 1998.
|
[35] |
WU Z B. Verb semantics and lexical selection[C]// Acl Proceedings of Annual Meeting on Association for Computational Linguistics. 1994: 133-138.
|
[36] |
HIRST G, STONGE D. Lexical chains as representations of context for the detection and correction of malapropisms[M]// FELLBAUM C. WordNet: An Electronic Lexical Database. Massachusetts: MIT Press, 1998, 305: 305-332.
|
[37] |
YANG D, POWERS D M W. Measuring semantic similarity in the taxonomy of WordNet [C]// ACSC’05: Proceedings of the Twenty-eighth Australasian conference on Computer Science. 2005, 38: 315-322.
|
[38] |
RESNIK P. Using information content to evaluate semantic similarity in a taxonomy [C]// IJCAI’95: Proceedings of the 14th International Joint Conference on Artificial Intelligence. 1995(1): 448-453.
|
[39] |
JIANG J J, CONRATH D W. Semantic similarity based on corpus statistics and lexical taxonomy [EB/OL]. (1997-10-01)[2020-07-01]. https://arxiv.org/pdf/cmp-lg/9709008.pdf.
|
[40] |
LIN D. An information-theoretic definition of similarity [C]//ICML’98: Proceedings of the Fifteenth International Conference on Machine Learning. 1998(7): 296-304.
|
[41] |
LESK M. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone [C]//Proceedings of the 5th Annual International Conference on Systems Documentation. 1986: 24-26.
|
[42] |
BANERJEE S, PEDERSEN T. An adapted lesk algorithm for word sense disambiguation using WordNet [C]//International Conference on Intelligent Text Processing and Computational Linguistics. 2002: 136-145.
|
[43] |
PEDERSEN T, PATWARDHAN S, MICHELIZZI J. WordNet : Similarity-Measuring the relatedness of concepts [C]//Demonstrations’04: Demonstration Papers at HLT-NAACL 2004. 2004(5): 38-41.
|
[44] |
LI Y, BANDAR Z A, MCLEAN D. An approach for measuring semantic similarity between words using multiple information sources [J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(4): 871-882.
|
[45] |
SHI B, FANG L Y, YAN J Z, et al. Ontology-based measure of semantic similarity between concepts [C]//WCSE '09: Proceedings of the 2009 WRI World Congress on Software Engineering. 2009(2): 109-112.
|
[46] |
郑志蕴, 阮春阳, 李伦, 等. 本体语义相似度自适应综合加权算法研究 [J]. 计算机科学, 2016, 43: 242-247.
|
[47] |
刘群, 李素建. 基于《知网》 的词汇语义相似度计算 [J]. 中文计算语言学, 2002, 7(2): 59-76.
|
[48] |
李峰, 李芳. 中文词语语义相似度计算——基于《知网》2000 [J]. 中文信息学报, 2007, 21(3): 99-105.
|
[49] |
江敏, 肖诗斌, 王弘蔚, 等. 一种改进的基于《知网》的词语语义相似度计算 [J]. 中文信息学报, 2008, 22(5): 84-89.
|
[50] |
STRUBE M, PONZETTO S P. WikiRelate! Computing semantic relatedness using Wikipedia [C]//AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence. 2006(2): 1419-1424.
|
[51] |
GABRILOVICH E, MARKOVITCH S. Computing semantic relatedness using wikipedia-based explicit semantic analysis [C]//IJCAI’07: Proceedings of the 20th International Joint Conference on Artifical Intelligence. 2007(1): 1606-1611.
|
[52] |
WITTEN I, MILNE D N. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links [C]//Proceedings of AAAI’2008. 2008: 25-30.
|
[53] |
YEH E, RAMAGE D, MANNING C D, et al. WikiWalk: Random walks on Wikipedia for semantic relatedness [C]//Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing. 2009: 41-49.
|
[54] |
CAMACHO-COLLADOS J, PILEHVAR M T, NAVIGLI R. Nasari: A novel approach to a semantically-aware representation of items [C]//Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 567-577.
|
[55] |
MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [EB/OL]. (2013-09-07)[2020-07-01]. https://arxiv.org/pdf/1301.3781.pdf.
|
[56] |
PENNINGTON J, SOCHER R, MANNING C D. Glove: Global vectors for word representation [C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.
|
[57] |
JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification [EB/OL]. (2016-08-09)[2020-07-01]. https://arxiv.org/pdf/1607.01759.pdf.
|
[58] |
PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations [EB/OL]. (2018-03-22)[2020-07-01]. https://arxiv.org/pdf/1802.05365.pdf.
|
[59] |
RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training [EB/OL]. (2018-11-05)[2020-07-01]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/languageunsupervised/language understanding paper.pdf.
|
[60] |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]//NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 5998-6008.
|
[61] |
DEVLIN J, CHANG M W, LEE K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2019-05-24)[2020-07-01]. https://arxiv.org/pdf/1810.04805.pdf.
|
[62] |
LE Q, MIKOLOV T. Distributed representations of sentences and documents [C]//ICML’14: Proceedings of the 31st International Conference on International Conference on Machine Learning. 2014, 32: 1188-1196.
|
[63] |
PAGLIARDINI M, GUPTA P, JAGGI M. Unsupervised learning of sentence embeddings using compositional n-gram features [EB/OL]. (2018-12-28)[2020-07-01]. https://arxiv.org/pdf/1703.02507.pdf.
|
[64] |
KIROS R, ZHU Y, SALAKHUTDINOV R R, et al. Skip-thought vectors [C]//Advances in neural information processing systems. 2015: 3294-3302.
|
[65] |
LOGESWARAN L, LEE H. An efficient framework for learning sentence representations [EB/OL]. (2018-03-07)[2020-07-01]. https://arxiv.org/pdf/1803.02893.pdf.
|
[66] |
HILL F, CHO K, KORHONEN A. Learning distributed representations of sentences from unlabelled data [EB/OL]. (2016-02-10)[2020-07-01]. https://arxiv.org/pdf/1602.03483.pdf.
|
[67] |
KUSNER M, SUN Y, KOLKIN N, et al. From word embeddings to document distances [C]//International Conference on Machine Learning. 2015: 957-966.
|
[68] |
ARORA S, LIANG Y, MA T. A simple but tough-to-beat baseline for sentence embeddings [EB/OL]. (2017-02-04)[2020-07-01]. https://openreview.net/pdf?id=SyK00v5xx.
|
[69] |
RÜCKLÉ A, EGER S, PEYRARD M, et al. Concatenated power mean word embeddings as universal cross-lingual sentence representations [EB/OL]. (2018-09-12)[2020-07-01]. https://arxiv.org/pdf/1803.01400.pdf.
|
[70] |
HUANG P S, HE X, GAO J, et al. Learning deep structured semantic models for web search using clickthrough data [C]//Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013: 2333-2338.
|
[71] |
SHEN Y, HE X, GAO J, et al. A latent semantic model with convolutional-pooling structure for information retrieval[C]//Proceedings of the 23rd ACM international conference on conference on information and knowledge management. 2014: 101-110.
|
[72] |
KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks [C]//Advances in nNeural Information Processing Systems. 2012: 1097-1105.
|
[73] |
PALANGI H, DENG L, SHEN Y, et al. Semantic modelling with long-short-term memory for information retrieval [EB/OL]. (2015-02-27)[2020-07-01]. https://arxiv.org/pdf/1412.6629.pdf.
|
[74] |
GERS F. Long short-term memory in recurrent neural networks [D]. Lausanne: EPFL, 2001.
|
[75] |
PONTES E L, HUET S, LINHARES A C, et al. Predicting the semantic textual similarity with siamese CNN and LSTM [EB/OL]. (2018-10-24)[2020-07-01]. https://arxiv.org/pdf/1810.10641.pdf.
|
[76] |
MUELLER J, THYAGARAJAN A. Siamese recurrent architectures for learning sentence similarity [C]//AAAI’16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. 2016(2): 2786-2792 .
|
[77] |
LIN Z, FENG M, SANTOS C N, et al. A structured self-attentive sentence embedding [EB/OL]. (2017-03-09)[2020-07-01]. https://arxiv.org/pdf/1703.03130.pdf.
|
[78] |
CONNEAU A, KIELA D, SCHWENK H, et al. Supervised learning of universal sentence representations from natural language inference data [EB/OL]. (2017-07-21)[2020-07-01]. https://arxiv.org/pdf/1705.02364v4.pdf.
|
[79] |
YIN W, SCHÜTZE H, XIANG B, et al. Abcnn: Attention-based convolutional neural network for modeling sentence pairs [J]. Transactions of the Association for Computational Linguistics, 2016(4): 259-272.
|
[80] |
HE H, LIN J. Pairwise word interaction modeling with deep neural networks for semantic similarity measurement [C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016: 937-948.
|
[81] |
WANG Z, HAMZA W, FLORIAN R. Bilateral multi-perspective matching for natural language sentences [C]//Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence Main track. 2017: 4144-4150.
|
[82] |
GONG Y, LUO H, ZHANG J. Natural language inference over interaction space [EB/OL]. (2018-05-26)[2020-07-01]. https://arxiv.org/pdf/1709.04348.pdf.
|
[83] |
KIM S, KANG I, KWAK N. Semantic sentence matching with densely-connected recurrent and co-attentive information [C]//Proceedings of the AAAI conference on artificial intelligence. 2019, 33: 6586-6593.
|
[84] |
HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
|
[85] |
YANG Y, YUAN S, CER D, et al. Learning semantic textual similarity from conversations [EB/OL]. (2018-04-20)[2020-07-01]. https://arxiv.org/pdf/1804.07754.pdf.
|
[86] |
CER D, YANG Y, KONG S, et al. Universal sentence encoder [EB/OL]. (2018-04-12)[2020-07-01]. https://arxiv.org/pdf/1803.11175.pdf.
|
[87] |
CHEN G, SHI X, CHEN M, et al. Text similarity semantic calculation based on deep reinforcement learning [J]. International Journal of Security and Networks, 2020, 15(1): 59-66.
|