中国综合性科技类核心期刊(北大核心)

中国科学引文数据库来源期刊(CSCD)

美国《化学文摘》(CA)收录

美国《数学评论》(MR)收录

俄罗斯《文摘杂志》收录

Message Board

Respected readers, authors and reviewers, you can add comments to this page on any questions about the contribution, review, editing and publication of this journal. We will give you an answer as soon as possible. Thank you for your support!

Name
E-mail
Phone
Title
Content
Verification Code
Issue 5
Sep.  2017
Turn off MathJax
Article Contents
CHEN Ming-zhu, WANG Xiao-tong, FANG Jun-hua, ZHANG Rong. Distributed stream processing system for join operations[J]. Journal of East China Normal University (Natural Sciences), 2017, (5): 11-19. doi: 10.3969/j.issn.1000-5641.2017.05.002
Citation: CHEN Ming-zhu, WANG Xiao-tong, FANG Jun-hua, ZHANG Rong. Distributed stream processing system for join operations[J]. Journal of East China Normal University (Natural Sciences), 2017, (5): 11-19. doi: 10.3969/j.issn.1000-5641.2017.05.002

Distributed stream processing system for join operations

doi: 10.3969/j.issn.1000-5641.2017.05.002
  • Received Date: 2017-06-28
  • Publish Date: 2017-09-25
  • Real-time stream processing system plays an increasingly important role in practical applications. Stream Join constitutes one of the most important and expensive operation in big data analysis. However, skewed data distribution in real-world applications and inherent features of streaming data, such as infinity and unpredictability, put great pressure on the join processing in distributed stream systems. Mainstream industrial stream systems have low versatility on join processing, providing no programming interface; though several academic stream prototype systems solve such a problem to a certain extent, they support equi-join processing only, or results in high resource utilization and severe load imbalance. In this paper, after analyzing three typical distributed stream systems, we integrate the techniques based on Join-Matrix into Storm, design and implement a general stream processing system which supports arbitrary theta joins. Experiments demonstrate that the system proposed in this paper outperforms the static-of-the-art strategies.
  • loading
  • [1]
    ANKIT T, SIDDARTH T, AMIT S, et al. Storm@Twitter[C]//Proceedings of SIGMOD International Conference on Management of Data. ACM, 2014:147-156.
    [2]
    LEONARDO N, BRUCE R, ANISH N, et al. S4:Distributed stream computing platform[C]//Proceedings of the International Conference on Data Mining Workshops, 2010:170-177.
    [3]
    CHEN G J, WIENER J L, IYER S, et al. Realtime data processing at Facebook[C]//Proceedings of SIGMOD International Conference on Management of Data. ACM, 2016:1087-1098.
    [4]
    WILSCHUT A N, APERS P M G. Dataflow query execution in a parallel main-memory environment[J]. Distributed and Parallel Databases, 1993(1):103-123. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=183069&contentType=Conference+Publications&sortType%3Dasc_p_Sequence%26filter%3DAND%28p_IS_Number%3A4715%29
    [5]
    URHAN T, FRANKLINM J. Dynamic pipeline scheduling for improving interactive query performance[C]//Proceedings of International Conference on Very Large Data Bases. 2001:501-510.
    [6]
    IVES Z G, FLORESCU D, FRIEDMAN M, et al. An adaptive query execution system for data integration[C]//Proceedings of SIGMOD International Conference on Management of Data. ACM, 1999:299-310.
    [7]
    TAO Y F, YIU M L, PAPADIAS D, et al. RPJ:Producing fast join results on streams through rate-based optimization[C]//Proceedings of SIGMOD International Conference on Management of Data. ACM, 2005:371-382.
    [8]
    MOKBEL M F, LU M, AREF W G. Hash-merge join:A non-blocking join algorithm for producing fast and early join results[C]//Proceedings of the 20th International Conference on Data Engineering. 2004:251-262.
    [9]
    ANANTHANARAYANAN R, BASKER V, DAS S, et al. Photon:Fault-tolerant and scalable joining of continuous data streams[C]//Proceedings of SIGMOD International Conference on Management of Data. ACM, 2013:577-588.
    [10]
    ZAHARIA M, DAS T, LI H Y, et al. Discretized streams:Fault-tolerant streaming computation at scale[C]//Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013:423-438.
    [11]
    QIAN Z P, HE Y, SU C Z, et al. TimeStream:Reliable stream computation in the cloud[C]//Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013:1-14.
    [12]
    ELSEIDY M, ELGUINDY A, VITOROVIC A, et al. Scalable and adaptive online joins[C]//Proceedings of International Conference on Very Large Data Bases, 2014(7):441-452.
    [13]
    LIN Q, OOI B C, WANG Z K, et al. Scalable distributed stream join processing[C]//Proceedings of ACM SIGMOD International Conference on Management of Data. ACM, 2015:811-825.
    [14]
    GOODHOPE K, KOSHY J, KREPS J, et al. Building linkedin's real-time activity data pipeline[J]. IEEE Data Eng Bull, 2012, 35(2):33-45. http://sites.computer.org/debull/A12june/pipeline.pdf
    [15]
    REDIS.[DB/OL].[2017-06-01]. https://redis.io/.
    [16]
    ANGULAR JS.[EB/OL].[2017/06-01]. https://angularjs.org/.
    [17]
    FANG J H, ZHANG R, WANG X T, et al. Distributed stream join under workload variance[J]. World Wide Web Journal, 2017:1-22. doi:  10.1007%2Fs11280-017-0431-7.pdf
    [18]
    FANG J H, WANG X T, ZHANG R, et al. Flexible and adaptive stream join algorithm[C]//Proceedings of International Conference on Asia-Pacific Web, 2016:3-16.
    [19]
    FANG J H, ZHANG R, WANG X T, et al. Cost-effective stream join algorithm on cloud system[C]//Proceedings of CIKM International Conference on Information and Knowledge Management. ACM, 2016:1773-1782.
    [20]
    TPC-H BENCHMARK.[EB/OL].[2017-06-01]. http://www.tpc.org/tpch.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(5)  / Tables(1)

    Article views (165) PDF downloads(381) Cited by()
    Proportional views

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return