Automatic extraction of corporate bankruptcy-related events from ruling documents

杨佳乐; 王俊豪; 钱卫宁; 罗轶凤

doi:10.3969/j.issn.1000-5641.201921015

School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Funding: National Key Research and Development Program of China (2018YFC0831900)

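Corresponding author:
LUO Yifeng (罗轶凤), male, associate professor, master's supervisor; research interests: text data mining and knowledge graphs. E-mail: yfluo@dase.ecnu.edu.cn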

Chinese Library Classification (CLC) number: TP399
Publication history
- Received: 2019-08-26
- Published online: 2020-07-20
- Issue date: 2020-07-25

Abstract

This paper proposes a framework for extracting corporate bankruptcy-related events. The framework detects legal events in case-file materials such as court rulings and extracts the structured arguments associated with each event. It works in three steps. First, it applies distant supervision to rulings and other case files obtained from courts to build training data. Second, it uses named entity recognition to tag each sentence of a document as a labeled sequence. Third, it combines a self-defined list of event trigger words with an event dictionary to recognize events in the documents and to output structured information for each one. In the experiments, event extraction reaches an average precision of 0.829, recall of 0.859, and F1 of 0.845 over labor and loan disputes, which indicates that the framework is effective for extracting corporate bankruptcy-related events.

Keywords: enterprise bankruptcy; named entity recognition; event extraction

Fig. 1 Overview of the legal event key-argument extraction framework

Fig. 2 The labeled data generation process
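
The abstract states that training data are built by distant supervision, but the procedure itself is not reproduced on this page. The following is a minimal sketch of one common form of the idea, not the authors' implementation: tokens are tagged automatically by looking up known entity names. The `KNOWN_ENTITIES` dictionary, the `distant_label` function, and the example sentence are all hypothetical.

```python
# Hypothetical knowledge base: surface form -> entity type.
KNOWN_ENTITIES = {
    "Acme Trading Co.": "ORG",
    "Shanghai": "LOC",
}


def distant_label(tokens, knowledge_base):
    """Tag tokens with BIO labels by longest-match lookup in the knowledge base."""
    # Try longer names first so that a long name wins over a shorter overlap.
    entries = sorted(
        ((name.split(), etype) for name, etype in knowledge_base.items()),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for words, etype in entries:
            n = len(words)
            if tokens[i:i + n] == words:
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                break
        else:
            # No known name starts at this token; leave it tagged "O".
            i += 1
    return tags


tokens = "The court accepted the petition against Acme Trading Co. in Shanghai".split()
tags = distant_label(tokens, KNOWN_ENTITIES)
```

Labels produced this way are noisy, since any name missing from the lookup table is silently tagged `O`; that trade-off is what makes the supervision "distant".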

Fig. 3 The Bi-LSTM + CRF Model
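
In a Bi-LSTM + CRF tagger, the Bi-LSTM typically produces a score for every tag at every token, and the CRF layer selects the best-scoring tag sequence as a whole, usually by Viterbi decoding. The sketch below illustrates only that decoding step in plain Python. The tag set and all scores are made-up illustrative values, not parameters from the paper.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][k]   : score of tag k at position t (e.g. from a Bi-LSTM)
    transitions[j][k] : score of moving from tag j to tag k (CRF parameters)
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best score of a path ending in each tag at t = 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_tags):
            best_prev = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k] + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return [tags[k] for k in reversed(path)]


TAGS = ["O", "B-ORG", "I-ORG"]
# Illustrative scores: the large negative value discourages O -> I-ORG.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from B-ORG
    [0.0, -1.0, 1.0],   # from I-ORG
]
EMISSIONS = [
    [2.0, 0.0, 0.0],  # token 1: emissions prefer O
    [0.0, 1.0, 1.5],  # token 2: emissions alone prefer I-ORG
    [0.0, 0.0, 2.0],  # token 3: emissions prefer I-ORG
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

With these values, picking the best tag per token independently would give `O, I-ORG, I-ORG`, which is not a valid BIO sequence, whereas decoding with the transition scores gives `O, B-ORG, I-ORG`.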

Fig. 4 The architecture of event extraction
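
The abstract describes event recognition driven by a self-defined trigger-word list and an event dictionary, neither of which is reproduced on this page. A minimal sketch of the idea follows, under stated assumptions: the trigger phrases are hypothetical English stand-ins mapped to three of the labor-dispute labels of Tab. 3 (LB1, LB4, LB9), and the `entities` list stands in for the output of the NER step. `Event` and `extract_events` are illustrative names.

```python
from dataclasses import dataclass, field

# Hypothetical trigger phrases -> sub-event labels (label codes follow Tab. 3).
TRIGGERS = {
    "terminate the labor relationship": "LB1",
    "wage arrears": "LB4",
    "did not sign a labor contract": "LB9",
}


@dataclass
class Event:
    label: str
    trigger: str
    arguments: dict = field(default_factory=dict)


def extract_events(sentence, entities, triggers=TRIGGERS):
    """Detect events by trigger matching and attach entities as arguments.

    entities: list of (text, type) pairs, assumed to come from an NER step.
    """
    lowered = sentence.lower()
    events = []
    for phrase, label in triggers.items():
        if phrase in lowered:
            arguments = {}
            for text, etype in entities:
                if text in sentence:
                    arguments.setdefault(etype, []).append(text)
            events.append(Event(label, phrase, arguments))
    return events


sentence = "In March 2018 Acme Trading Co. did not sign a labor contract with Zhang San."
entities = [("March 2018", "TIME"), ("Acme Trading Co.", "ORG"), ("Zhang San", "PER")]
events = extract_events(sentence, entities)
```

Here the single detected event carries the label `LB9` together with the time, organization, and person found in the same sentence. A real system would also need to decide which entities belong to which event when a sentence contains several triggers; this sketch attaches every entity in the sentence to every event.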

Fig. 5 Tagging example of named entity recognition
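
The tagging example in Fig. 5 is not reproduced in this text. To illustrate how per-token tags turn into entities, here is a minimal sketch that assumes the common BIO scheme (an assumption; this page does not state which scheme the paper uses) and the entity types listed in Tab. 2. The tokens are invented.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


tokens = ["Zhang", "San", "sued", "Acme", "Trading", "in", "Shanghai"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
spans = bio_to_spans(tokens, tags)
```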

Tab. 1 Statistics of automatically labeled data via distant supervision

Dataset	Distant supervision	Positive examples	Negative examples
Loan disputes	1 956	1 980	3 144
Labor disputes	1 523	1 565	4 142
Total	3 507	3 545	7 286

Tab. 2 Precision, recall, and F1 score of named entity recognition

Entity type	Precision	Recall	F1
LOC	0.557	0.796	0.655
TIME	0.599	0.917	0.725
PER	0.825	0.803	0.814
ORG	0.839	0.797	0.818
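
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the precision and recall values of Tab. 2. It uses only numbers from the table; `f1_score` and `TABLE_2` are illustrative names.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Tab. 2.
TABLE_2 = {
    "LOC": (0.557, 0.796, 0.655),
    "TIME": (0.599, 0.917, 0.725),
    "PER": (0.825, 0.803, 0.814),
    "ORG": (0.839, 0.797, 0.818),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in TABLE_2.items()}
max_gap = max(abs(recomputed[name] - row[2]) for name, row in TABLE_2.items())
```

The recomputed values agree with the reported column to within 0.001, a gap consistent with the inputs themselves being rounded to three decimals.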

Tab. 3 The 20 sub-events of labor disputes

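Labor dispute sub-event	Label	Labor dispute sub-event	Label
Termination of the labor relationship	LB1	Court dismissal of the arbitration application	LB11
Payment of wages and remuneration	LB2	Release of guarantee	LB12
Economic compensation	LB3	Production suspension, layoffs	LB13
Arrears of wages and remuneration	LB4	Bonus	LB14
Existence/confirmation of a labor relationship	LB5	Work-uniform fee	LB15
Double-wage difference	LB6	Job position	LB16
Term of the signed labor contract	LB7	Death pension benefits	LB17
Paid leave, overtime	LB8	Contract termination notice	LB18
No labor contract signed	LB9	Revocation of business license	LB19
Work injury, subsidies	LB10	Reaching a settlement/mediation agreement	LB20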

Tab. 4 Precision, recall, and F1 score of event extraction

Event type	Precision	Recall	F1
Labor disputes	0.843	0.872	0.859
Loan disputes	0.815	0.847	0.831
Average	0.829	0.859	0.845

