Relation extraction via distant supervision technology

WANG Jianing; HE Yi; ZHU Renyu; LIU Tingting; GAO Ming

doi:10.3969/j.issn.1000-5641.202091006

Issue 5

Sep. 2020


WANG Jianing, HE Yi, ZHU Renyu, LIU Tingting, GAO Ming. Relation extraction via distant supervision technology[J]. Journal of East China Normal University (Natural Sciences), 2020, (5): 113-130. doi: 10.3969/j.issn.1000-5641.202091006


1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
2. Shanghai Municipal Big Data Center, Shanghai 200072, China

Received Date: 2020-08-07
Available Online: 2020-09-24
Publish Date: 2020-09-24

Abstract

Relation extraction is a classic natural language processing task that is widely used in knowledge graph construction and completion, knowledge base question answering, and text summarization. It aims to extract the semantic relation between a target entity pair mentioned in text. To construct a large-scale supervised corpus efficiently, distant supervision was proposed: it annotates text automatically by aligning it with an existing knowledge base. However, the overly strong assumptions behind this alignment raise a series of challenges, which have attracted considerable attention from researchers. First, this paper introduces the theory of distantly supervised relation extraction and its formal description. Second, we systematically analyze related methods and their respective pros and cons from three perspectives: noisy data, insufficient information, and data imbalance. Next, we explain and compare several benchmark corpora and evaluation metrics. Lastly, we highlight new challenges for distantly supervised relation extraction and discuss trends and directions for future research before concluding.
Keywords: relation extraction, distant supervision, natural language processing, knowledge graph, noise processing
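
The automatic annotation step described in the abstract can be illustrated with a minimal sketch. It assumes a toy knowledge base and plain substring matching for entity alignment; the names `KB`, `SENTENCES`, and `distant_label` are hypothetical and do not come from the paper. The second sentence shows how the strong alignment assumption produces a noisy label.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Hawaii"): "place_of_birth",
}

# Hypothetical unlabeled corpus.
SENTENCES: List[str] = [
    "Barack Obama was born in Hawaii.",         # really expresses the relation
    "Barack Obama visited Hawaii last week.",   # mentions both entities only
    "Hawaii is a state of the United States.",  # mentions one entity only
]


def distant_label(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both entities of a KB triple.

    This encodes the distant supervision assumption: any sentence that
    contains an entity pair is taken to express the KB relation of that
    pair. Entity matching here is naive substring search (an assumption
    made for brevity).
    """
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


labeled = distant_label(SENTENCES, KB)
for sentence, head, tail, relation in labeled:
    print(f"{relation}({head}, {tail}) <- {sentence}")
```

Both of the first two sentences receive the label `place_of_birth`, although only the first one expresses it. This kind of mislabeling is the noisy-data problem that the abstract lists as the first of its three perspectives.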
