中国综合性科技类核心期刊(北大核心)

中国科学引文数据库来源期刊(CSCD)

美国《化学文摘》(CA)收录

美国《数学评论》(MR)收录

俄罗斯《文摘杂志》收录

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

两种新的非确定数据库上的Top-k查询

邱鑫 林欣

邱鑫, 林欣. 两种新的非确定数据库上的Top-k查询[J]. 华东师范大学学报(自然科学版), 2017, (1): 52-63. doi: 10.3969/j.issn.1000-5641.2017.01.007
引用本文: 邱鑫, 林欣. 两种新的非确定数据库上的Top-k查询[J]. 华东师范大学学报(自然科学版), 2017, (1): 52-63. doi: 10.3969/j.issn.1000-5641.2017.01.007
QIU Xin, LIN Xin. Two new Top-k queries in uncertain database[J]. Journal of East China Normal University (Natural Sciences), 2017, (1): 52-63. doi: 10.3969/j.issn.1000-5641.2017.01.007
Citation: QIU Xin, LIN Xin. Two new Top-k queries in uncertain database[J]. Journal of East China Normal University (Natural Sciences), 2017, (1): 52-63. doi: 10.3969/j.issn.1000-5641.2017.01.007

两种新的非确定数据库上的Top-k查询

doi: 10.3969/j.issn.1000-5641.2017.01.007
基金项目: 

国家自然科学基金 61572193

上海张江国家自主创新示范区专项发展基金 201411-JA-B108-002

上海市科学技术委员会项目 14DZ2260800

详细信息
    作者简介:

    邱鑫:邱 鑫, 男, 硕士研究生, 研究方向为非确定数据库. E-mail:51131201029@ecnu.cn

    通讯作者:

    林 欣, 男,副教授, 研究方向为新型数据处理. E-mail: xlin@cs.ecnu.edu.cn

  • 中图分类号: TP392

Two new Top-k queries in uncertain database

  • 摘要: 由于当前已有的在非确定数据库上的Top-k查询普遍基于元组层面,使得应用受限, 为了让查询结果更符合直觉,提出了两种新的非确定数据库上的基于x-元组层面的Top-k查询及其执行算法.这两种新的查询综合x元组中各元组的评分和置信度,获得在返回结果中最具实际意义的位置. 查询的执行算法经过优化,执行效率明显改善.
  • 图  1  k=100时程序访问元组数量

    Fig.  1  The number of accessed tuples when k=100

    图  2  k=100时程序获得结果时维护状态数量

    Fig.  2  The number of maintained states after obtaining results when k=100

    图  3  x-元组数量为1 000时的查询执行平均运行时间

    Fig.  3  The average running time of query processing procedure when the number of x-tuple is 1 000

    表  1  数据库udb1

    Tab.  1  The database udb1

    x-tupletuplescoreconfidence
    T1t1580.4
    t2380.1
    t3220.5
    T2t1470.4
    t2420.2
    t3330.4
    T3t1530.7
    t2260.1
    t3150.2
    下载: 导出CSV

    表  2  Global Top-k和U-k Ranks的计算示例(概率)

    Tab.  2  The example of computation of Global Top-k and U-kRanks (probability)

    T1.t1T2.t1T3.t1T2.t2T1.t2T2.t3
    Rank-10.40.0720.420.0360.0120.06
    Rank-200.2160.280.1080.0460.2
    Top-20.40.2880.70.1440.0580.26
    下载: 导出CSV

    表  3  温度传感器非确定性数据库示例

    Tab.  3  The example of uncertain database under temperature sensor

    传感器时间地点温度/℃置信度
    C110: 00L1001580.4
    C210: 00L1001380.1
    C310: 00L1001220.5
    C410: 00L1002470.4
    C510: 00L1002420.2
    C610: 00L1002330.4
    C710: 00L1003530.7
    C810: 00L1003260.1
    C910: 00L1003150.2
    下载: 导出CSV

    表  4  预处理后的udb1

    Tab.  4  The udb1 after preprocessing

    x-tupletuplescoreconfidence
    T1t1950.4
    T3t1920.7
    T2t1890.4
    T2t2850.2
    T1t2810.1
    T2t3800.4
    T3t2750.1
    T1t3730.5
    T3t3700.2
    下载: 导出CSV

    表  5  OS-State [T3,T1]的分组在查询执行程序 访问到T1.t2时的情况

    Tab.  5  The situation of the groups of OS-State [T3,T1] when query processing procedure accessing T1.t2

    计算表达式
    Sj[0]T3.t1.prob×(1−P(T2.xid))×T1.t2.prob0.7×(1−0.4−0.2)×0.1
    Sj[1]T3.t1.prob×(1−P(T1.xid))×(1−P(T2.xid))0.7×(1−0.4−0.1)×(1−0.4−0.2)0.7×(1−0.4−0.1)×(1−0.4−0.2)
    Sj[2](1−P(T3.xid))×(1−P(T1.xid))×(1−P(T2.xid))(1−0.7)×(1−0.4−0.1)×(1−0.4−0.2)
    Sj.ub0.228
    下载: 导出CSV

    表  6  OS-State 置信度上界、下界维护方法

    Tab.  6  The method of maintaining OS-State's upper bound and lower bound of confidence

    匹配Sj[i](1≤ik)匹配Sj[k]Sj[k] 不匹配
    Sj[0]保持DFS保持
    Sj[1]…Sj[ki−1]保持×fu(t.xid)×fu(t.xid)
    Sj[ki]Sj[ki]DFS×fu(t.xid)×fu(t.xid)
    Sj[ki]…Sj[k]×fu(t.xid)×fu(t.xid)×fu(t.xid)
    下载: 导出CSV

    表  7  OI-State置信度上界、下界维护方法(t表示最新访问的元组)

    Tab.  7  The method of maintaining OI-State's upper bound and lower bound of confidence (t means the accessing tuple)

    匹配$S_j[i](1\leqslant i\leqslant k)$不匹配
    lbDFS保持
    ub保持(ub--lb)$\times f_u(t.$xid)+lb
    下载: 导出CSV
    Procedure 1 OS-UTop-k!OI-UTop-k
    Require: Pre-sorted Uncertain Database udb, k
    1: N←0, P[·]0.0
    2: SL←empty state list
    3: while source is not exhausted do
    4: t←the next tuple of udb
    5: P[t.xid]←P[t.xid] + t.prob
    6: if P[t.xid] = 1.0 then
    7: NN + 1
    8: end if
    9: if N = k then
    10: quit the while loop
    11: end if
    12: Update states in SL by guidance as TABLE 6 or TABLE 7
    13: Prune the states with upper bound less than lower bound of another
    14: Generate States by t such as Procedure 2 and push into SL
    15: end while
    16: get the state with highest confidence as result from the SL
    下载: 导出CSV
    Procedure 2 generate OS-States by tuple t
    Require: p is the position of t in Pre-sorted Uncertain Database udb
    1: used←boolean array of false
    2: tmp←array storing xid
    3: used[t.xid]←TRUE
    4: DFS(0,0,p)
    procedure DFS(i, j, p)
    1: if jp then
    2: return
    3: end if
    4: if i = k − 1 then
    5: generate OS-States Sv by xids in tmp and push into SL
    6: calculate the (k + 1) groups of Sv as TABLE 5
    7: end if
    8: if used [udb[j].xid]=TRUE then
    9: tmp[i]←udb[j].xid
    10: used udb[j].xid←TRUE
    11: DFS(i + k, j + 1)
    12: used udb[j].xid −FALSE
    13: end if
    14: DFS(i, j + 1)
    下载: 导出CSV
    Procedure 3 generate OI-States by tuple t
    Require: p is the position of t in Pre-sorted Uncertain Database D
    1: if P[t.xid]>0.0 then
    2: quit the procedure
    3: end if
    4: tmp←array storing xid
    5: S←the list of seen objects' xid except t.xid
    6: DFS(0,0, p)
    procedure DFS(i, j, p)
    1: if i >p then
    2: return
    3: end if
    4: if j = k − 1 then
    5: generate OI-States Sv by xids in tmp and push into SL
    6: return
    7: end if
    8: tmp[j]←S[i]
    9: DFS(i + 1, j + 1, p)
    10:DFS(i + 1, j, p)
    下载: 导出CSV
  • [1] Data clustering: algorithms and applications[M]. CRC Press, 2013.
    [2] ZÖFLE A, EMRICH T, SCHMID K A, et al. Representative clustering of uncertain data [C]//Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014: 243-252.
    [3] LI L, ZHANG X, YU Z, et al. USOM: Mining and visualizing uncertain data based on self-organizing maps [C]//Machine Learning and Cybernetics (ICMLC), 2011 International Conference on. IEEE, 2011, 2: 804-809.
    [4] CONINX A, BONNEAU G P, DROULEZ J, et al. Visualization of uncertain scalar data fields using color scales and perceptually adapted noise [C]//Proceedings of the ACM SIGGRAPH Symposium on Applied Perception in Graphics and Visualization. ACM, 2011: 59-66.
    [5] AGGARWAL C C. Trio A System for Data Uncertainty and Lineage [M]//Managing and Mining Uncertain Data. Springer US, 2009: 1-35.
    [6] FUXMAN, A, FAZLI, E, AND MILLER, R. J. 2005. ConQuer: Efficient management of inconsistentdatabases. In Proceedings of the ACM SIGMOD International Conference on Management ofData (SIGMOD'05). ACM, New York, 155--166.
    [7] SINGH S, MAYFIELD C, MITTAL S, et al. Orion 2.0: native support for uncertain data [C]//Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008: 1239-1242.
    [8] BENJELLOUN O, SARMA A D, HAYWORTH C, et al. An introduction to ULDBs and the Trio system [J]. IEEE Data Engineering Bulletin, March 2006, 2006.
    [9] ABITEBOUL S, KANELLAKIS P, GRAHNE G. On the representation and querying of sets of possible worlds [M]. ACM, 1987.
    [10] BENJELLOUN O, SARMA A D, HALEVY A, et al. Databases with uncertainty and lineage [J]. The VLDB Journal, 2008, 17(2): 243-264.
    [11] SISTLA A P, WOLFSON O, CHAMBERLAIN S, et al. Querying the uncertain position of moving objects [M]//Temporal databases: research and practice. Springer Berlin Heidelberg, 1998: 310-337.
    [12] CHENG R, KALASHNIKOV D V, PRABHAKAR S. Querying imprecise data in moving object environments [J]. Knowledge and Data Engineering, IEEE Transactions on, 2004, 16(9): 1112-1127.
    [13] DE ALMEIDA V T, G\"{U]TING R H. Supporting uncertainty in moving objects in network databases [C]//Proceedings of the 13th annual ACM international workshop on Geographic information systems. ACM, 2005: 31-40.
    [14] SILBERSTEIN A S, BRAYNARD R, ELLIS C, et al. A sampling-based approach to optimizing top-k queries in sensor networks [C]//Data Engineering, 2006. ICDE'06. Proceedings of the 22nd International Conference on. IEEE, 2006: 68-68.
    [15] CONSIDINE J, LI F, KOLLIOS G, et al. Approximate aggregation techniques for sensor databases [C]//Data Engineering, 2004. Proceedings. 20th International Conference on. IEEE, 2004: 449-460.
    [16] JAYRAM T S, KRISHNAMURTHY R, RAGHAVAN S, et al. Avatar Information Extraction System [J]. IEEE Data Eng. Bull., 2006, 29(1): 40-48.
    [17] GUPTA R, SARAWAGI S. Creating probabilistic databases from information extraction models [C]//Proceedings of the international conference on Very large data bases. 2006, 32(2): 965.
    [18] ILYAS I F, BESKALES G, SOLIMAN M A. A survey of Top-$k$ query processing techniques in relational database systems [J]. ACM Computing Surveys (CSUR), 2008, 40(4): 11-11.
    [19] GETOOR L, DIEHL C P. Link mining: a survey [J]. ACM SIGKDD Explorations Newsletter, 2005, 7(2): 3-12.
    [20] HRISTIDIS V, KOUDAS N, PAPAKONSTANTINOU Y. PREFER: A system for the efficient execution of multi-parametric ranked queries [C]//ACM SIGMOD Record. ACM, 2001, 30(2): 259-270.
    [21] NATSEV A, CHANG Y C, SMITH J R, et al. Supporting incremental join queries on ranked inputs [C]//VLDB. 2001, 1: 281-290.
    [22] ILYAS I F, SHAH R, AREF W G, et al. Rank-aware query optimization [C]//Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM, 2004: 203-214.
    [23] LI C, CHANG K C C, ILYAS I F, et al. RankSQL: query algebra and optimization for relational top-k queries [C]//Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005: 131-142.
    [24] HUA M, PEI J, ZHANG W, et al. Ranking queries on uncertain data: a probabilistic threshold approach[C]//Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008: 673-686.
    [25] ZHANG X, CHOMICKI J. Semantics and evaluation of Top-$k $ queries in probabilistic databases[J]. Distributed and parallel databases, 2009, 26(1): 67-126.
  • 加载中
图(3) / 表(10)
计量
  • 文章访问数:  236
  • HTML全文浏览量:  69
  • PDF下载量:  523
  • 被引次数: 0
出版历程
  • 收稿日期:  2015-11-01
  • 刊出日期:  2017-01-25

目录

    /

    返回文章
    返回