Fraudulent medical behavior detection based on a hybrid approach

PAN Songsong; ZHANG Weijia

doi: 10.3969/j.issn.1000-5641.2017.05.012

School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China

Funding:

National Key Research and Development Program of China, grant 2016YFB1000904

About the author:
PAN Songsong, female, master's student; research interest: data mining. E-mail: pss_ahnu@163.com

CLC number: TP311
Publication history
- Received: 2017-06-20
- Published: 2017-09-25

Abstract: As the medical insurance system improves and its coverage expands, the normal operation of medical insurance funds has become closely tied to the vital interests of the public. However, fraudulent behaviors such as frequent hospitalization, hospitalization decomposition, and abnormal fees occur frequently and seriously threaten the normal operation of these funds. This paper first uses the random forest method to select features separately for each disease type, and then detects abnormal settlement fees with a method based on the Clustering-Based Local Outlier Factor (CBLOF) and with an improved CBLOF method. In addition, a rule-based method is used to identify frequent hospitalization and hospitalization decomposition. Experiments on real medical insurance settlement data demonstrate the feasibility and effectiveness of the approach. Finally, the paper presents the system framework of a medical insurance fund supervisory platform, which visualizes the results of pivot analysis with the help of ECharts.

Keywords: outlier detection; local outlier factor; CBLOF; hospitalization decomposition

Fig. 1 Framework of the medical insurance fund supervisory system

Fig. 2 Factors influencing hospitalization fees for lumbar disc herniation

Fig. 3 Comparison of time cost

Fig. 4 Interface of the abnormal fee module

Fig. 5 Frequency distribution of detected abnormal fees by hospital level

Algorithm 1. CBLOF-based outlier detection

Input: hospitalization records $D$; cluster-size threshold $\alpha$; cluster set cluster_set;
       threshold max_sim on the maximum similarity between a record and a cluster
Output: the list of records in each cluster; the lof value of each record

1. initialize cluster_set
2. for record in $D$ do
3.     if record is the first record then
4.         addNewCluster(record)
5.     else
6.         for cluster in cluster_set do
7.             computeSimilarity(record, cluster)
8.             keep the largest similarity in cur_similarity and the index of that cluster in index
9.         if cur_similarity $>$ max_sim then
10.            addToCluster(record, index)
11.        else
12.            addNewCluster(record)
13. for cluster in cluster_set do
14.     if the size of cluster exceeds $\alpha \cdot |D|$ then
15.         label cluster as $L$
16.     else
17.         label cluster as $S$
18. for record in $D$ do
19.     if record belongs to $c_i$ and $c_i$ is labeled $S$ then
20.         lof(record) = computeLOF_S(record, cluster_set)
21.     else
22.         lof(record) = computeLOF_L(record, $c_i$)
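The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. Several pieces are assumptions, not taken from the paper: records are tuples of categorical values, the similarity is the Squeezer-style sum of per-attribute value frequencies, and the score is a plain distance (to the record's own cluster for large clusters, to the nearest large cluster for small ones). The paper's computeLOF_S and computeLOF_L may be defined or weighted differently; the names `Cluster`, `squeezer`, and `cblof_scores` are hypothetical.

```python
from collections import Counter


class Cluster:
    """A cluster of categorical records with per-attribute value counts."""

    def __init__(self, record):
        self.size = 1
        self.counts = [Counter([value]) for value in record]

    def add(self, record):
        self.size += 1
        for counter, value in zip(self.counts, record):
            counter[value] += 1

    def similarity(self, record):
        # Assumed similarity: for each attribute, the fraction of cluster
        # members that share the record's value, summed over attributes.
        return sum(c[v] / self.size for c, v in zip(self.counts, record))


def squeezer(records, max_sim):
    """One-pass clustering (steps 1-12): join the most similar cluster
    if its similarity exceeds max_sim, otherwise open a new cluster."""
    clusters, assignment = [], []
    for record in records:
        best_idx, best_sim = -1, -1.0
        for idx, cluster in enumerate(clusters):
            sim = cluster.similarity(record)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim > max_sim:
            clusters[best_idx].add(record)
            assignment.append(best_idx)
        else:
            clusters.append(Cluster(record))
            assignment.append(len(clusters) - 1)
    return clusters, assignment


def cblof_scores(records, alpha, max_sim):
    """Label clusters L/S by size (steps 13-17) and score every record
    (steps 18-22). Higher score means more anomalous in this sketch."""
    clusters, assignment = squeezer(records, max_sim)
    n_attrs = len(records[0])
    large = [i for i, c in enumerate(clusters) if c.size > alpha * len(records)]

    def dist(record, idx):
        return 1.0 - clusters[idx].similarity(record) / n_attrs

    scores = []
    for record, idx in zip(records, assignment):
        if idx in large or not large:
            scores.append(dist(record, idx))                     # large cluster
        else:
            scores.append(min(dist(record, j) for j in large))   # small cluster
    return scores, assignment, large


# Hypothetical toy data: eight identical records and one that differs.
records = [("A", "x")] * 8 + [("B", "y")]
scores, assignment, large = cblof_scores(records, alpha=0.3, max_sim=1.0)
```

In this toy run the lone `("B", "y")` record forms its own small cluster and receives the highest score, while the eight identical records share one large cluster.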

Algorithm 2. Outlier detection based on the improved CBLOF

Input: hospitalization records ${D}'$; LOF-update threshold $\beta$ for large clusters; cluster set cluster_set
Output: the lof value of each new record

1. for record in ${D}'$ do
2.     cluster record with the Squeezer method; the resulting cluster is $c_i$, with cluster index index
3.     if $c_i$ is labeled $S$ then
4.         addToCluster(record, index)
5.         for line in $c_i$ do
6.             lof(line) = updateLOF_S(line, index)
7.     else
8.         cnt = cnt + 1
9.         lof(record) = computeLOF_L(record, index)
10.        if $\dfrac{cnt}{|c_i|} > \beta$ then
11.            for line in $c_i$ do
12.                lof(line) = updateLOF_L(line, index)
13.            cnt = 0
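The key idea in Algorithm 2 is that a large cluster's lof values are recomputed only after enough new records have accumulated, as governed by $\beta$. The following Python sketch shows only that deferred-update bookkeeping. It assumes one counter per large cluster, initialized to zero; the listing itself uses a single `cnt` and does not state its initial value, so both are assumptions. The class name `LargeClusterUpdater` is hypothetical.

```python
class LargeClusterUpdater:
    """Decides when a large cluster's lof values should be refreshed."""

    def __init__(self, beta):
        self.beta = beta
        self.pending = {}  # cluster index -> records added since last refresh

    def record_added(self, index, cluster_size):
        """Register one new record in large cluster `index`.

        Returns True when cnt / |c_i| exceeds beta, meaning every record of
        the cluster should be rescored now; the counter is then reset.
        """
        cnt = self.pending.get(index, 0) + 1
        if cnt / cluster_size > self.beta:
            self.pending[index] = 0
            return True
        self.pending[index] = cnt
        return False


# Hypothetical run: a cluster of fixed size 10 and beta = 0.2.
updater = LargeClusterUpdater(beta=0.2)
decisions = [updater.record_added(index=0, cluster_size=10) for _ in range(4)]
```

With these values the third insertion triggers a refresh (3/10 exceeds 0.2) and the counter starts over, which is what keeps the per-record cost low for large clusters.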

Tab. 1 Examples of hospitalization decomposition

Person ID	Hospital ID	Admission diagnosis	Admission date	Discharge date
057FCF30903	018FF7841008E	Lumbar disc herniation	20150611	20150617
057FCF30903	018FF7841008E	Lumbar disc herniation	20150617	20150624
057FCF30903	018FF7841008E	Lumbar disc herniation	20150624	20150701
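The pattern in Tab. 1 (same person, same hospital, same diagnosis, each admission starting on the day of the previous discharge) suggests a simple rule. The sketch below is one possible formulation, not the paper's actual rule: the function name `find_split_stays` and the `max_gap_days` parameter are hypothetical, and the allowed gap between discharge and readmission is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def parse(day):
    """Parse a YYYYMMDD string as used in Tab. 1."""
    return datetime.strptime(day, "%Y%m%d")


def find_split_stays(stays, max_gap_days=0):
    """Return chains of consecutive stays that look like one split stay.

    Each stay is (person_id, hospital_id, diagnosis, admission, discharge).
    Two stays are chained when they share person, hospital, and diagnosis
    and the readmission follows the discharge by at most max_gap_days.
    """
    groups = defaultdict(list)
    for stay in stays:
        groups[stay[:3]].append(stay)

    chains = []
    for group in groups.values():
        group.sort(key=lambda s: s[3])  # YYYYMMDD strings sort by date
        chain = [group[0]]
        for prev, cur in zip(group, group[1:]):
            gap = (parse(cur[3]) - parse(prev[4])).days
            if gap <= max_gap_days:
                chain.append(cur)
            else:
                if len(chain) > 1:
                    chains.append(chain)
                chain = [cur]
        if len(chain) > 1:
            chains.append(chain)
    return chains


# The three rows of Tab. 1 plus one hypothetical later stay.
stays = [
    ("057FCF30903", "018FF7841008E", "Lumbar disc herniation", "20150611", "20150617"),
    ("057FCF30903", "018FF7841008E", "Lumbar disc herniation", "20150617", "20150624"),
    ("057FCF30903", "018FF7841008E", "Lumbar disc herniation", "20150624", "20150701"),
    ("057FCF30903", "018FF7841008E", "Lumbar disc herniation", "20151101", "20151108"),
]
chains = find_split_stays(stays)
```

Applied to the rows of Tab. 1, the three back-to-back stays are grouped into a single chain, while the hypothetical stay several months later is left alone.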

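Tab. 2 Statistics of datasets

Data type	Number of records	Key fields
Payment details	63700007	district/county code, person ID, payment year and month, annual salary, contribution base salary, payment amount, employer contribution, individual contribution
Personal information	429421	district/county code, person ID, employment or retirement status, employer type, date of birth, gender
Hospitalization records	722883	district/county code, person ID, visit serial number, medical institution code, hospital level, total fee, Class A fee, Class B fee, non-basic fee, drug fee, deductible, reimbursement ratio, personal account payment, pooled fund payment, out-of-pocket payment, admission date, discharge date, admission diagnosis name, discharge diagnosis name
Outpatient records	1255139	district/county code, person ID, visit serial number, medical institution code, total fee, Class A fee, Class B fee, non-basic fee, deductible, reimbursement ratio, personal account payment, pooled fund payment, supplementary insurance payment, out-of-pocket payment, settlement time
Hospitalization line items	167254167	visit serial number, three-catalog id, drug/treatment name, unit price, quantity, price cap, total fee
Outpatient line items	2952968	visit serial number, three-catalog id, drug/treatment name, unit price, quantity, price cap, total fee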

Tab. 3 Comparison of coverage (detection rate)

Data examined (number of records)	CBLOF	Improved CBLOF
1% (4)	2 (5%)	4 (10%)
2% (10)	6 (15%)	9 (23%)
4% (19)	15 (38%)	19 (48%)
6% (29)	22 (56%)	23 (59%)
8% (38)	28 (71%)	29 (74%)
10% (48)	35 (89.7%)	35 (89.7%)
12% (57)	37 (94.8%)	37 (94.8%)
14% (67)	38 (97.4%)	39 (100%)
15% (72)	39 (100%)	39 (100%)
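The percentages in Tab. 3 read as coverage, that is, the number of anomalies found divided by the 39 anomalies at which the table reports 100%. The short Python sketch below recomputes coverage from the counts in the table under that reading; the helper names `coverage` and `first_full_coverage` are hypothetical.

```python
TOTAL_ANOMALIES = 39  # the count at which Tab. 3 reports 100% coverage

# (share of data examined in %, found by CBLOF, found by improved CBLOF)
ROWS = [
    (1, 2, 4), (2, 6, 9), (4, 15, 19), (6, 22, 23), (8, 28, 29),
    (10, 35, 35), (12, 37, 37), (14, 38, 39), (15, 39, 39),
]


def coverage(found, total=TOTAL_ANOMALIES):
    """Coverage in percent: anomalies found out of all known anomalies."""
    return 100.0 * found / total


def first_full_coverage(rows, column):
    """Smallest share of data at which the given method finds everything."""
    for row in rows:
        if row[column] == TOTAL_ANOMALIES:
            return row[0]
    return None


for share, found_cblof, found_improved in ROWS:
    print(f"{share:>2}%  CBLOF {coverage(found_cblof):5.1f}%  "
          f"improved {coverage(found_improved):5.1f}%")
```

Under this reading, the improved method reaches full coverage after examining 14% of the data, whereas the baseline needs 15%, and its advantage is largest in the first few rows.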

Tab. 4 Results of frequent hospitalization and hospitalization decomposition detection

Detection type	Number of records	Proportion
Frequent hospitalization	675	0.10%
Hospitalization decomposition	621	0.15%
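A rule for frequent hospitalization can be sketched as a sliding-window count of visits per person. This is only an illustration: the section does not state the paper's actual rule, so the function name `frequent_visitors`, the parameters `window_days` and `min_visits`, and their default values are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def frequent_visitors(visits, window_days=30, min_visits=5):
    """Return the IDs of people with at least min_visits visits whose dates
    all fall within a span shorter than window_days.

    `visits` is an iterable of (person_id, visit_date) pairs.
    """
    by_person = defaultdict(list)
    for person_id, visit_date in visits:
        by_person[person_id].append(visit_date)

    flagged = set()
    for person_id, dates in by_person.items():
        dates.sort()
        start = 0
        for end in range(len(dates)):
            # Shrink the window until it spans fewer than window_days.
            while (dates[end] - dates[start]).days >= window_days:
                start += 1
            if end - start + 1 >= min_visits:
                flagged.add(person_id)
                break
    return flagged


# Hypothetical visits: p1 comes every other day, p2 once a month.
visits = [("p1", date(2015, 1, d)) for d in (1, 3, 5, 7, 9)]
visits += [("p2", date(2015, m, 1)) for m in (1, 2, 3, 4, 5)]
flagged = frequent_visitors(visits)
```

Here only `p1` is flagged, since `p2` never has more than one visit inside any 30-day window.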

Tab. 5 Results of hospitalization decomposition detection

Medical institution code	Hospital level	Occurrences	Proportion
8C78D9FC2943A	Level 2	1287	30.19%
018FF7841008E	Level 3	1261	29.5%
9E9A032EDE596	Level 2	226	5.3%
D30AFB8296CF7	Level 3	217	5.2%
