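Chinese Core Journal of Comprehensive Science and Technology (Peking University Core Journal)

Source Journal of the Chinese Science Citation Database (CSCD)

Indexed by Chemical Abstracts (CA), USA

Indexed by Mathematical Reviews (MR), USA

Indexed by Referativnyi Zhurnal (Abstract Journal), Russia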

Issue 5, Dec. 2019
HUANG Hao, LI Zhi-fang, WANG Jia-lun, WENG Chu-liang. Implementation and optimization of GPU-based relational streaming processing systems[J]. Journal of East China Normal University (Natural Sciences), 2019, (5): 178-189. doi: 10.3969/j.issn.1000-5641.2019.05.015

Implementation and optimization of GPU-based relational streaming processing systems

doi: 10.3969/j.issn.1000-5641.2019.05.015
  • Received Date: 2019-07-29
  • Publish Date: 2019-09-25
  • State-of-the-art CPU-based stream processing systems support complex queries on large-scale datasets. However, because they are limited by the computational capability of the CPU, these systems must trade throughput against response time and cannot optimize both at once. In this paper, we propose Serval, a GPU-based stream processing system that co-utilizes CPU and GPU resources and processes streaming queries efficiently through micro-batching. Serval adopts a pipeline model and uses a streaming execution cache to optimize throughput and response time on large-scale datasets. To meet the demands of different scenarios, Serval implements multiple tuning policies that scale the micro-batch size dynamically. Experiments show that Serval on a single server achieves, on average, 3.87x the throughput of a 3-server distributed Spark Streaming deployment with 91% of its response time, demonstrating the effectiveness of these optimizations.
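The abstract does not specify Serval's tuning policies. As a minimal, hypothetical sketch of the general idea of micro-batching with a dynamically scaled batch size, the Python code below groups an incoming stream into micro-batches and, after each batch, doubles the batch size while the observed response time stays under a target and halves it otherwise. The names `LatencyTargetPolicy` and `micro_batches`, the grow/shrink factors, and the linear latency model are illustrative assumptions, not Serval's API or its actual policy.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


@dataclass
class LatencyTargetPolicy:
    """Hypothetical tuning policy (not Serval's): grow the micro-batch while
    batches finish within a response-time target, shrink it otherwise."""
    target_latency_ms: float
    min_size: int = 1
    max_size: int = 1 << 20
    grow_factor: float = 2.0
    shrink_factor: float = 0.5

    def next_size(self, current: int, observed_latency_ms: float) -> int:
        if observed_latency_ms <= self.target_latency_ms:
            # Headroom left: larger batches amortize per-batch overhead
            # (e.g. kernel launches, host-to-device copies), raising throughput.
            proposed = int(current * self.grow_factor)
        else:
            # Target missed: smaller batches shorten the response time.
            proposed = int(current * self.shrink_factor)
        # Clamp the proposal to the configured bounds.
        return max(self.min_size, min(self.max_size, proposed))


def micro_batches(
    stream: Iterable[T],
    policy: LatencyTargetPolicy,
    initial_size: int,
    measure_latency_ms: Callable[[int], float],
) -> Iterator[List[T]]:
    """Yield consecutive micro-batches from `stream`, resizing after each one.

    `measure_latency_ms` is an assumed stand-in for timing the real batch
    execution; in this sketch it receives only the batch length.
    """
    it = iter(stream)
    size = initial_size
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch
        latency = measure_latency_ms(len(batch))
        size = policy.next_size(size, latency)


if __name__ == "__main__":
    # Assumed cost model: 5 ms fixed overhead plus 0.01 ms per tuple.
    policy = LatencyTargetPolicy(target_latency_ms=50.0)
    sizes = [
        len(b)
        for b in micro_batches(range(20000), policy, 100, lambda n: 5.0 + n / 100.0)
    ]
    print(sizes)  # prints [100, 200, 400, 800, 1600, 3200, 6400, 3200, 4100]
```

Raising `target_latency_ms` lets batches grow larger and favors throughput; lowering it keeps batches small and favors response time, which is the tradeoff the abstract describes. A real CPU/GPU system would time the actual batch execution instead of using an assumed linear cost model.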

    Corresponding author: CHEN Bin, bchen63@163.com

      School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142
