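Core Chinese comprehensive science and technology journal (Peking University Core Journals list)

Source journal of the Chinese Science Citation Database (CSCD)

Indexed by Chemical Abstracts (CA), USA

Indexed by Mathematical Reviews (MR), USA

Indexed by the Russian Abstract Journal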

Issue 1
Jan.  2018
WANG Zhen, LIN Xin. Data cleaning on probabilistic RDF database[J]. Journal of East China Normal University (Natural Sciences), 2018, (1): 76-90. doi: 10.3969/j.issn.1000-5641.2018.01.008

Data cleaning on probabilistic RDF database

doi: 10.3969/j.issn.1000-5641.2018.01.008
  • Received Date: 2016-12-03
  • Publish Date: 2018-01-25
  • Errors and noise introduced while acquiring and analyzing data produce uncertain data in many domains, and this uncertainty has become an important issue affecting data quality. Uncertain data can be stored in probabilistic databases, where queries return answers annotated with confidence values. However, as uncertainty accumulates and propagates, the query results may become less usable, so it is desirable to reduce the uncertainty of the data. This paper addresses the problem of increasing the certainty of answers to RDF (Resource Description Framework) graph queries via crowdsourcing. The basic idea is to ask the crowd to decide whether the relationships represented by certain edges are correct. We introduce three algorithms for selecting the edge that maximizes the uncertainty reduction. Finally, we evaluate these algorithms experimentally and show that the unstable pruning algorithm and the stable pruning algorithm perform better in terms of efficiency.
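
The edge-selection idea in the abstract can be illustrated with a small brute-force sketch. It is not one of the paper's three algorithms, whose details are not given on this page. It assumes that edges exist independently, that each query answer is supported by a single set of edges, that uncertainty is measured as the sum of the binary entropies of the answer probabilities, and that the crowd always answers correctly. All function and variable names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a yes/no event that holds with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def answer_probability(edges, edge_prob):
    """Probability that every supporting edge exists (independence assumed)."""
    prob = 1.0
    for e in edges:
        prob *= edge_prob[e]
    return prob


def total_uncertainty(answers, edge_prob):
    """Assumed uncertainty measure: sum of binary entropies over all answers."""
    return sum(
        binary_entropy(answer_probability(edges, edge_prob))
        for edges in answers.values()
    )


def expected_reduction(edge, answers, edge_prob):
    """Expected drop in uncertainty if a perfect crowd verifies `edge`."""
    p = edge_prob[edge]
    before = total_uncertainty(answers, edge_prob)
    confirmed = {**edge_prob, edge: 1.0}  # crowd says the edge is correct
    rejected = {**edge_prob, edge: 0.0}   # crowd says the edge is wrong
    after = (p * total_uncertainty(answers, confirmed)
             + (1 - p) * total_uncertainty(answers, rejected))
    return before - after


def select_edge(answers, edge_prob):
    """Pick the edge whose verification maximizes the expected reduction."""
    return max(
        sorted(edge_prob),
        key=lambda e: expected_reduction(e, answers, edge_prob),
    )


# Hypothetical toy data: three uncertain edges and two query answers.
edge_prob = {"e1": 0.5, "e2": 0.9, "e3": 0.99}
answers = {"a1": ["e1", "e2"], "a2": ["e2", "e3"]}
best = select_edge(answers, edge_prob)
print(best, round(expected_reduction(best, answers, edge_prob), 3))
```

This sketch re-evaluates every answer for every candidate edge, so its cost grows with the product of the number of edges and the number of answers. The pruning algorithms named in the abstract presumably avoid part of that work, but this page does not say how.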
