A review of machine reading comprehension for automatic QA
-
Abstract: Artificial Intelligence (AI) is profoundly transforming every industry. Applying AI to education accelerates the structural reform of education and is turning traditional education into intelligent adaptive education. An automatic Question Answering (QA) system based on deep learning not only helps students resolve doubts and acquire knowledge in real time, but can also quickly gather student behavioral data, accelerating the personalization and intelligentization of education. Machine reading comprehension is the core module of an automatic QA system, and an important technology for understanding student questions, understanding document content, and acquiring knowledge quickly. With the revival of deep learning and the release of large-scale reading comprehension datasets, a wide variety of neural network-based machine reading models have been proposed over the past few years. The purpose of this review is three-fold: to introduce the definition of machine reading comprehension and review its development; to compare and analyze the advantages and disadvantages of various neural machine reading models; and to summarize the public datasets and evaluation methods in the field of machine reading.
-
Tab. 1 An idiom story: "Chengmen Lixue" (standing in the snow at Cheng's gate)
In the Song dynasty, Yang Shi was fond of scholarly study. Early on he studied under Cheng Hao in Yingchang and learned a great deal. After Cheng Hao died, Yang Shi went to Luoyang to seek instruction from another Neo-Confucian scholar, Cheng Yi (Cheng Hao's younger brother). When he arrived at Cheng Yi's house, Cheng Yi was sleeping indoors. So as not to disturb him, Yang Shi stood waiting at the door. When Cheng Yi awoke, he found that the snow outside the door had piled up more than a foot deep. This is the origin of the idiom "Chengmen Lixue."
Answer the following questions based on the passage above:
1. What did Yang Shi like to do? Scholarly study.
2. Who was Yang Shi's teacher early on? Cheng Hao.
3. Who stood waiting at Cheng Yi's door? Yang Shi.
Tab. 2 An example from the Daily Mail dataset
Context: The BBC producer allegedly struck by Jeremy Clarkson will not press charges against the "Top Gear" host, his lawyer said Friday. Clarkson, who hosted one of the most-watched television shows in the world, was dropped by the BBC Wednesday after an internal investigation by the British broadcaster found he had subjected producer Oisin Tymon "to an unprovoked physical and verbal attack."...
Query: Producer X will not press charges against Jeremy Clarkson, his lawyer says.
Answer: Oisin Tymon
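Table 2's query is built by the entity-anonymization cloze procedure of Hermann et al. [24]: named entities in the news summary are replaced with abstract markers, and one marker is blanked out as the placeholder X. A minimal sketch of that idea (the function `make_cloze` and the `@entityN` marker format are illustrative simplifications, not the exact pipeline):

```python
def make_cloze(summary_sentence, entities):
    """Build a CNN/Daily Mail-style cloze query: anonymize the listed
    entities, then blank out the first one as the answer placeholder X."""
    # Map each entity string to an anonymized marker (@entity0, @entity1, ...)
    mapping = {ent: f"@entity{i}" for i, ent in enumerate(entities)}
    anonymized = summary_sentence
    for ent, marker in mapping.items():
        anonymized = anonymized.replace(ent, marker)
    # The blanked-out marker becomes the answer the model must fill in
    answer_marker = mapping[entities[0]]
    query = anonymized.replace(answer_marker, "X", 1)
    return query, answer_marker

query, answer = make_cloze(
    "Producer Oisin Tymon will not press charges against Jeremy Clarkson.",
    ["Oisin Tymon", "Jeremy Clarkson"],
)
print(query)   # Producer X will not press charges against @entity1.
print(answer)  # @entity0
```

Anonymization forces models to answer from the given context rather than from world knowledge about the named entities.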
Tab. 3 An example from the SQuAD dataset
In 1870, Tesla moved to Karlovac, to attend school at the Higher Real Gymnasium, where he was profoundly influenced by a math teacher Martin Sekulic. The classes were held in German, as it was a school within the Austro-Hungarian Military Frontier. Tesla was able to perform integral calculus in his head, which prompted his teachers to believe that he was cheating. He finished a four-year term in three years, graduating in 1873.
1. In what language were the classes given? German
2. Who was Tesla's main influence in Karlovac? Martin Sekulic
3. Why did Tesla go to Karlovac? attend school at the Higher Real Gymnasium
Tab. 4 Summary of machine reading datasets
Task | Dataset | Language | Size | Question source | Document source | Answer
Cloze-style | MCTest [5] | EN | 2K/500 | Crowdsourced | Fictional stories | Multi. choices
Cloze-style | CNN/DM [24] | EN | 1.4M/300K | Synthetic cloze | News | Fill in entity
Cloze-style | RACE [19] | EN | 870K/50K | English exam | English exam | Multi. choices
Cloze-style | HLF_RC [28] | ZH | 100K/28K | Synthetic cloze | Fairy/News | Fill in word
Cloze-style | CBT [50] | EN | 688K/108 | Synthetic cloze | Project Gutenberg | Multi. choices
Span extraction | SQuAD [14] | EN | 100K/536 | Crowdsourced | Wiki | Span of words
Span extraction | TriviaQA [16] | EN | 40K/660K | Trivia websites | Wiki/Web doc | Span of words
Span extraction | NewsQA [17] | EN | 100K/10K | Crowdsourced | CNN | Span of words
Span extraction | SearchQA [20] | EN | 140K/6.9M | QA site | Web doc | Span of words
Span extraction | NarrativeQA [18] | EN | 46K/1.5K | Crowdsourced | Book & Movie | Manual summary
Span extraction | MS MARCO [15] | EN | 100K/200K | User logs | Web doc | Manual summary
Span extraction | DuReader [21] | ZH | 200K/1M | Web doc | Web doc/CQA | Manual summary
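Most of the span-extraction datasets in Table 4 are distributed in the nested JSON layout introduced by SQuAD (`data` → `paragraphs` → `qas` → `answers`, with each answer carrying the span text and its character offset). A minimal reader, assuming that layout (`iter_examples` is an illustrative helper name):

```python
import json

# A minimal SQuAD v1.1-style record, following the published nested layout
raw = json.dumps({
    "data": [{
        "title": "Nikola_Tesla",
        "paragraphs": [{
            "context": "The classes were held in German.",
            "qas": [{
                "id": "q1",
                "question": "In what language were the classes given?",
                "answers": [{"text": "German", "answer_start": 25}],
            }],
        }],
    }]
})

def iter_examples(squad_json):
    """Yield (question, context, answer_text) triples from SQuAD-format JSON."""
    for article in json.loads(squad_json)["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    yield qa["question"], para["context"], ans["text"]

for question, context, answer in iter_examples(raw):
    # answer_start indexes the answer span inside the context string
    print(question, "->", answer)
```

Because answers are spans of the context, models trained on such data predict a start and end position rather than generating free text.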
Tab. 5 Performance comparison of models on CNN/Daily Mail
Model | CNN Valid | CNN Test | Daily Mail Valid | Daily Mail Test
Sukhbaatar et al. (End-to-End Memory Network) [37] | 63.4 | 66.8 | NA | NA
Hermann et al. (Attentive Reader) [24] | 61.6 | 63.0 | 70.5 | 69.0
Hermann et al. (Impatient Reader) [24] | 61.8 | 63.8 | 69.0 | 68.0
Chen et al. (Stanford Attentive Reader) [25] | 72.4 | 72.4 | 76.9 | 75.8
Kadlec et al. (AS Reader) [26] | 68.6 | 69.6 | 75.0 | 73.9
Cui et al. (CAS Reader) [28] | 68.2 | 70.0 | NA | NA
Cui et al. (AoA Reader) [29] | 73.1 | 74.4 | NA | NA
Sordoni et al. (Iterative Attention) [39] | 72.6 | 73.3 | NA | NA
Seo et al. (BiDAF) [31] | 76.3 | 76.9 | NA | NA
Shen et al. (ReasoNet) [40] | 72.9 | 72.4 | NA | NA
Tab. 6 Performance comparison of models on the SQuAD dataset
Model | EM | F1
Wang et al. (Match-LSTM) [30] | 60.474 | 70.695
Seo et al. (BiDAF) [31] | 67.974 | 77.323
Shen et al. (ReasoNet) [40] | 70.555 | 79.364
Liu et al. (SAN) [35] | 76.828 | 84.396
Huang et al. (FusionNet) [34] | 75.968 | 83.900
Wu et al. (GLDR) [42] | 69.325 | 77.886
Wang et al. (R-Net) [33] | 81.391 | 88.170
Yu et al. (QANet) [43] | 82.471 | 89.306
Devlin et al. (BERT) [49] | 85.083 | 91.835
-
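The EM (exact match) and F1 columns in Table 6 follow the normalization and scoring logic of the official SQuAD evaluation script [14]: answers are lowercased, stripped of punctuation and articles, then compared exactly (EM) or by token overlap (F1). A sketch under those assumptions (function names are ours):

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the German", "German"))  # 1.0 after normalization
print(round(f1_score("Martin Sekulic", "a math teacher Martin Sekulic"), 3))  # 0.667
```

On the full dataset, both scores are averaged over questions, taking the maximum over the reference answers for each question.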
[1] CHEN D Q. Neural reading comprehension and beyond[D]. Stanford, CA: Stanford University, 2018.
[2] LEHNERT W G. The process of question answering[R]. New Haven, CT: Yale University, 1977.
[3] HIRSCHMAN L, LIGHT M, BRECK E, et al. Deep read: A reading comprehension system[C]//Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1999: 325-332.
[4] RILOFF E, THELEN M. A rule-based question answering system for reading comprehension tests[C]//Proceedings of the 2000 ANLP/NAACL Workshop on Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems-Volume 6. Association for Computational Linguistics, 2000: 13-19.
[5] RICHARDSON M, BURGES C J C, RENSHAW E. MCTest: A challenge dataset for the open-domain machine comprehension of text[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013: 193-203.
[6] SACHAN M, DUBEY K, XING E, et al. Learning answer-entailing structures for machine comprehension[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015: 239-249.
[7] NARASIMHAN K, BARZILAY R. Machine comprehension with discourse relations[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015: 1253-1262.
[8] WANG H, BANSAL M, GIMPEL K, et al. Machine comprehension with syntax, frames, and semantics[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015: 700-706.
[9] BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003(3): 1137-1155.
[10] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems, 2013: 3111-3119.
[11] PENNINGTON J, SOCHER R, MANNING C. GloVe: Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1532-1543.
[12] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[13] CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint, arXiv: 1412.3555, 2014.
[14] RAJPURKAR P, ZHANG J, LOPYREV K, et al. SQuAD: 100,000+ questions for machine comprehension of text[J]. arXiv preprint, arXiv: 1606.05250, 2016.
[15] NGUYEN T, ROSENBERG M, SONG X, et al. MS MARCO: A human-generated machine reading comprehension dataset[C]//Neural Information Processing Systems, 2016.
[16] JOSHI M, CHOI E, WELD D S, et al. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension[J]. arXiv preprint, arXiv: 1705.03551, 2017.
[17] TRISCHLER A, WANG T, YUAN X, et al. NewsQA: A machine comprehension dataset[J]. arXiv preprint, arXiv: 1611.09830, 2016.
[18] KOČISKÝ T, SCHWARZ J, BLUNSOM P, et al. The NarrativeQA reading comprehension challenge[J]. Transactions of the Association for Computational Linguistics, 2018(6): 317-328.
[19] LAI G, XIE Q, LIU H, et al. RACE: Large-scale reading comprehension dataset from examinations[J]. arXiv preprint, arXiv: 1704.04683, 2017.
[20] DUNN M, SAGUN L, HIGGINS M, et al. SearchQA: A new Q&A dataset augmented with context from a search engine[J]. arXiv preprint, arXiv: 1704.05179, 2017.
[21] HE W, LIU K, LIU J, et al. DuReader: A Chinese machine reading comprehension dataset from real-world applications[J]. arXiv preprint, arXiv: 1711.05073, 2017.
[22] TAYLOR W L. "Cloze procedure": A new tool for measuring readability[J]. Journalism Bulletin, 1953, 30(4): 415-433. doi: 10.1177/107769905303000401
[23] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint, arXiv: 1409.0473, 2014.
[24] HERMANN K M, KOČISKÝ T, GREFENSTETTE E, et al. Teaching machines to read and comprehend[C]//Advances in Neural Information Processing Systems, 2015: 1693-1701.
[25] CHEN D, BOLTON J, MANNING C D. A thorough examination of the CNN/Daily Mail reading comprehension task[J]. arXiv preprint, arXiv: 1606.02858, 2016.
[26] KADLEC R, SCHMID M, BAJGAR O, et al. Text understanding with the attention sum reader network[J]. arXiv preprint, arXiv: 1603.01547, 2016.
[27] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[C]//Advances in Neural Information Processing Systems, 2015: 2692-2700.
[28] CUI Y, LIU T, CHEN Z, et al. Consensus attention-based neural networks for Chinese reading comprehension[J]. arXiv preprint, arXiv: 1607.02250, 2016.
[29] CUI Y, CHEN Z, WEI S, et al. Attention-over-attention neural networks for reading comprehension[J]. arXiv preprint, arXiv: 1607.04423, 2016.
[30] WANG S, JIANG J. Machine comprehension using Match-LSTM and answer pointer[J]. arXiv preprint, arXiv: 1608.07905, 2016.
[31] SEO M, KEMBHAVI A, FARHADI A, et al. Bidirectional attention flow for machine comprehension[J]. arXiv preprint, arXiv: 1611.01603, 2016.
[32] KIM Y. Convolutional neural networks for sentence classification[J]. arXiv preprint, arXiv: 1408.5882, 2014.
[33] WANG W, YANG N, WEI F, et al. Gated self-matching networks for reading comprehension and question answering[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017: 189-198.
[34] HUANG H Y, ZHU C, SHEN Y, et al. FusionNet: Fusing via fully-aware attention with application to machine comprehension[J]. arXiv preprint, arXiv: 1711.07341, 2017.
[35] LIU X, SHEN Y, DUH K, et al. Stochastic answer networks for machine reading comprehension[J]. arXiv preprint, arXiv: 1712.03556, 2017.
[36] WESTON J, CHOPRA S, BORDES A. Memory networks[J]. arXiv preprint, arXiv: 1410.3916, 2014.
[37] SUKHBAATAR S, WESTON J, FERGUS R. End-to-end memory networks[C]//Advances in Neural Information Processing Systems, 2015: 2440-2448.
[38] DHINGRA B, LIU H, YANG Z, et al. Gated-attention readers for text comprehension[J]. arXiv preprint, arXiv: 1606.01549, 2016.
[39] SORDONI A, BACHMAN P, TRISCHLER A, et al. Iterative alternating neural attention for machine reading[J]. arXiv preprint, arXiv: 1606.02245, 2016.
[40] SHEN Y, HUANG P S, GAO J, et al. ReasoNet: Learning to stop reading in machine comprehension[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017: 1047-1055.
[41] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256. doi: 10.1023/A:1022672621406
[42] WU F, LAO N, BLITZER J, et al. Fast reading comprehension with convnets[J]. arXiv preprint, arXiv: 1711.04352, 2017.
[43] YU A W, DOHAN D, LUONG M T, et al. QANet: Combining local convolution with global self-attention for reading comprehension[J]. arXiv preprint, arXiv: 1804.09541, 2018.
[44] DAUPHIN Y N, FAN A, AULI M, et al. Language modeling with gated convolutional networks[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017: 933-941.
[45] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017: 1243-1252.
[46] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1251-1258.
[47] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5998-6008.
[48] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding with unsupervised learning[R/OL]. Technical report, OpenAI, 2018. [2019-08-01]. https://s3-us-west-2.amazonaws.com/openaiassets/research-covers/language-unsupervised/languageunderstandingpaper.pdf.
[49] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint, arXiv: 1810.04805, 2018.
[50] HILL F, BORDES A, CHOPRA S, et al. The Goldilocks principle: Reading children's books with explicit memory representations[J]. arXiv preprint, arXiv: 1511.02301, 2015.