中国综合性科技类核心期刊(北大核心)

中国科学引文数据库来源期刊(CSCD)

美国《化学文摘》(CA)收录

美国《数学评论》(MR)收录

俄罗斯《文摘杂志》收录

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

单图中的近似频繁子图挖掘算法

窦建凯 林欣 胡文心

窦建凯, 林欣, 胡文心. 单图中的近似频繁子图挖掘算法[J]. 华东师范大学学报(自然科学版), 2019, (6): 73-87. doi: 10.3969/j.issn.1000-5641.2019.06.008
引用本文: 窦建凯, 林欣, 胡文心. 单图中的近似频繁子图挖掘算法[J]. 华东师范大学学报(自然科学版), 2019, (6): 73-87. doi: 10.3969/j.issn.1000-5641.2019.06.008
DOU Jian-kai, LIN Xin, HU Wen-xin. Algorithm for mining approximate frequent subgraphs in a single graph[J]. Journal of East China Normal University (Natural Sciences), 2019, (6): 73-87. doi: 10.3969/j.issn.1000-5641.2019.06.008
Citation: DOU Jian-kai, LIN Xin, HU Wen-xin. Algorithm for mining approximate frequent subgraphs in a single graph[J]. Journal of East China Normal University (Natural Sciences), 2019, (6): 73-87. doi: 10.3969/j.issn.1000-5641.2019.06.008

单图中的近似频繁子图挖掘算法

doi: 10.3969/j.issn.1000-5641.2019.06.008
详细信息
    作者简介:

    窦建凯, 男, 硕士研究生, 研究方向为频繁图挖掘.E-mail:yirandjk@163.com

    通讯作者:

    胡文心, 女, 高级工程师, 硕士生导师, 研究方向为计算机科学.E-mail:wxhu@cc.ecnu.edu.cn

  • 中图分类号: TP391.4

Algorithm for mining approximate frequent subgraphs in a single graph

  • 摘要: 图数据的挖掘工作是数据挖掘工作中的重要组成部分,已经有许多人在这个领域进行了深入的研究.由于数据获取不可避免噪音数据,故在挖掘频繁图时考虑近似十分重要.然而许多此前的工作只考虑了子图间编辑距离(Graph Edit Distance,GED)的绝对值,而没有考虑子图间编辑距离与子图大小的相对关系.提出了一种在单图中进行近似频繁子图挖掘的新算法,并在计算近似程度时考虑当前子图的大小.该算法通过对近似频繁子图的大小上限进行预测,并通过局部反单调性进行剪枝,提高了算法的效率.实验表明,该算法能够挖掘出传统算法无法发现的近似频繁子图,且相比对比算法具有更好的时间性能.
  • 图  1  近似匹配例子

    Fig.  1  Example of approximate matches

    图  2  图$G$的近似频繁子图

    Fig.  2  Approximate frequent subgraphs from graph $G$

    图  3  算法主要框架

    Fig.  3  Framework of our algorithm

    图  4  候选子图扩展例图

    Fig.  4  Example graph of candidate graph generation

    图  5  反单调性无法保持示意图

    Fig.  5  Example when anti-monotonicity does not hold

    图  6  候选子图生成流程示意

    Fig.  6  Process of candidate subgraph generation

    图  7  近似子图匹配流程示意图

    Fig.  7  Process of approximate subgraph matching

    图  8  结果展示例图

    Fig.  8  Sample result illustration

    图  9  算法时间消耗随支持度要求的变化

    Fig.  9  Time required varies with support requirements

    图  10  算法时间消耗随近似程度要求的变化

    Fig.  10  Time required varies with approximation

    Algorithm 1 SimilarGraphGene
    Input: $V_g, E_g, T, d= |g|\times(1-\theta)$
    Output: All Similar Graph of g
    1: $S\leftarrow V_g, E_g, $
    2:for each vertex or edge $x$ in $S$ do
    3:    Deleted $\leftarrow x$
    4:    if $x$ is a vertex then
    5:      Deleted $\leftarrow$ every edge connected to $x$
    6:    end if
    7:    $m$=isconnected(Deleted, $V_g$, $E_g$)
    8:    if $m$=true then
    9:       $k= |T|$
    10:      $V_{g'} = V_g$ Deleted$_ V$
    11:        $E_{g' }= E_g$ Deleted$_ E$
    12:        if $|$Deleted$|$ $ < d$ then
    13:          SimilarGraphGene($V_{g'}$, $E_{g'}$, $T, d-|$Deleted$|$)
    14:          if $k= |T|$ then
    15:            $T\leftarrow g$
    16:          end if
    17:        end if
    18:        if $|$Deleted$| = d$ then
    19:          $T\leftarrow g'$
    20:        end if
    21:      end if
    22:      undo line 3-5
    23:end for
    24:return $T$
    下载: 导出CSV

    表  1  样本图上挖掘的时间消耗对比

    Tab.  1  Comparison of time required for mining of sample graph

    算法$t$/ms
    $G7$$G9$$G11$$G13$
    本文算法535547601 347
    AGraP1 04110 226346 91846 301 253
    下载: 导出CSV

    表  2  发现子图能力结果

    Tab.  2  Experimental results showing the ability to discover subgraphs

    近似匹配
    本文算法对比算法
    点0, 4, 7点0, 4, 7
    点1, 3, 8点1, 3, 8
    点2, 6, 10点2, 6, 10
    (0, 1), (3-4), (7, 8)(0, 1), (3-4), (7, 8)
    (1, 0, 2), (4, 3), (8, 7, 10)(1, 0, 2), (8, 7, 10)
    下载: 导出CSV
  • [1] YAN X F, HAN J W. gSpan: Graph-based substructure pattern mining[C]//Proceedings of the 2002 IEEE International Conference on Data Mining. 2002: 721-724.
    [2] ELSEIDY M, ABDELHAMID E, SKIADOPOULOS S, et al. GraMi:Frequent subgraph and pattern mining in a single large graph[J]. Proceedings of the VLDB Endowment, 2014, 7:517-528. doi:  10.14778/2732286.2732289
    [3] CHEN C, YAN X F, ZHU F D, et al. gApprox: Mining frequent approximate patterns from a massive network[C]//7th IEEE International Conference on Data Mining. 2007: 445-450.
    [4] FLORES-GARRIDO M, CARRASCO-OCHOA J A, MARTÍNEZ-TRINIDAD J F. AGraP:An algorithm for mining frequent patterns in a single graph using inexact matching[J]. Knowledge and Information Systems, 2015, 2(44):385-406. http://d.old.wanfangdata.com.cn/NSTLQK/NSTL_QKJJ0234951655/
    [5] KURAMOCHI M, KARYPIS G. Finding frequent patterns in a large sparse graph[J]. Data Mining and Knowledge Discovery, 2005, 11(3):243-271. doi:  10.1007/s10618-005-0003-9
    [6] NIJSSEN S, KOK J N. A quickstart in frequent structure mining can make a difference[C]//Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004: 647-652.
    [7] ALONSO A G, PAGOLA J E M, CARRASCO-OCHOA J A, et al. Mining frequent connected subgraphs reducing the number of candidates[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2008: 365-376.
    [8] RANU S, SINGH A K. Graphsig: A scalable approach to mining significant subgraphs in large graph databases[C]//2009 IEEE 25th International Conference on Data Engineering. 2009: 844-855.
    [9] CHENG H, YAN X F, HAN J W. Mining graph patterns[C]//Frequent Pattern Mining, Berlin: Springer, 2014: 307-338.
    [10] KRISHNA V, SURI N R, ATHITHAN G. A comparative survey of algorithms for frequent subgraph discovery[J]. Current Science, 2011, :190-198. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=Open J-Gate000000074023
    [11] CHOUDHURY S, PUROHIT S, LIN P, et al. Percolator: Scalable pattern discovery in dynamic graphs[C]//Proceedings of the 11th ACM International Conference on Web Search and Data Mining. 2018: 759-762.
    [12] INGALALLI V, IENCO D, PONCELET P. Mining frequent subgraphs in multigraphs[J]. Information Sciences, 2018, 451/452:50-66. doi:  10.1016/j.ins.2018.04.001
    [13] ALGULIEV R M, ALIGULIYEV R M, GANJALIYEV F S. Extracting a heterogeneous social network of academic researchers on the Web based on information retrieved from multiple sources[J]. American Journal of Operations Research, 2011, 1(2):33. http://d.old.wanfangdata.com.cn/OAPaper/oai_doaj-articles_d1a43f69b12775f8e61ed6f03a30c453
    [14] LIMA JR D P, GIACOMINI H C, TAKEMOTO R M, et al. Patterns of interactions of a large fish-parasite network in a tropical floodplain[J]. Journal of Animal Ecology, 2012, 81(4):905-913.. doi:  10.1111/j.1365-2656.2012.01967.x
    [15] GAO X B, XIAO B, TAO D C, et al. A survey of graph edit distance[J]. Pattern Analysis and Applications, 2010, 13(1):113-129. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=23f1c6b1111f176881d00ede280bb4eb
    [16] HIDOVIĆ D, PELILLO M. Metrics for attributed graphs based on the maximal similarity common subgraph[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2004, 18(3):299-313. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=26e34f94bfb8adc8f71be64ac36fc57c
    [17] DEHMER M, EMMERT-STREIB F. Comparing large graphs efficiently by margins of feature vectors[J]. Applied Mathematics and Computation, 2007, 188(2):1699-1710. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=bb5b4f11bfe4a7efbf2a637396f786e0
    [18] HOLDER L B, COOK D J, DJOKO S. Substucture discovery in the SUBDUE system[C]//KDD Workshop, 1994: 169-180.
    [19] JIA Y, ZHANG J T, HUAN J. An efficient graph-mining method for complicated and noisy data with real-world applications[J]. Knowledge and Information Systems, 2011, 28(2):423-447. doi:  10.1007/s10115-010-0376-y
    [20] FLORES-GARRIDO M, CARRASCO-OCHOA J A, MARTÍNEZ-TRINIDAD J F. Extensions to AGraP algorithm for finding a reduced set of inexact graph patterns[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2018, 32(1):1860012. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=6062c7c25d7e6f97c25cd19da9333d5c
    [21] ACOSTA-MENDOZA N, GAGO-ALONSO A, CARRASCO-OCHOA J A, et al. Extension of canonical adjacency matrices for frequent approximate subgraph mining on multi-graph collections[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2017, 31(8):1750025. doi:  10.1142/S0218001417500252
    [22] ACOSTA-MENDOZA N, MORALES-GONZáLEZ A, GAGO-ALONSO A, et al. Image classification using frequent approximate subgraphs[C]//Iberoamerican Congress on Pattern Recognition. 2012: 292-299.
    [23] ACOSTA-MENDOZA N, CARRASCO-OCHOA J A, MARTÍNEZ-TRINIDAD J F, et al. Image clustering based on frequent approximate subgraph mining[C]//Mexican Conference on Pattern Recognition. 2018: 189-198.
  • 加载中
图(10) / 表(3)
计量
  • 文章访问数:  163
  • HTML全文浏览量:  82
  • PDF下载量:  1
  • 被引次数: 0
出版历程
  • 收稿日期:  2018-12-21
  • 刊出日期:  2019-11-25

目录

    /

    返回文章
    返回