

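Implementation and optimization of GPU-based relational streaming processing systems

HUANG Hao, LI Zhi-fang, WANG Jia-lun, WENG Chu-liang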

Citation: HUANG Hao, LI Zhi-fang, WANG Jia-lun, WENG Chu-liang. Implementation and optimization of GPU-based relational streaming processing systems[J]. Journal of East China Normal University (Natural Sciences), 2019, (5): 178-189. doi: 10.3969/j.issn.1000-5641.2019.05.015


doi: 10.3969/j.issn.1000-5641.2019.05.015
Funding: National Key Research and Development Program of China (2018YFB1003400)

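Details
    Author biography:

    HUANG Hao, male, master's student; research interest: in-memory databases. E-mail: haohuang@stu.ecnu.edu.cn

    Corresponding author:

    WENG Chu-liang, male, professor and doctoral supervisor; research interest: parallel and distributed systems. E-mail: clweng@dase.ecnu.edu.cn

  • CLC number: TP315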

Implementation and optimization of GPU-based relational streaming processing systems

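  • Abstract: Existing CPU-based stream processing systems already support complex analytical queries over large-scale datasets, but the limits of CPU compute capability and architecture prevent them from delivering high throughput and low response time at the same time. This paper proposes Serval, a GPU-based stream processing framework that makes full use of heterogeneous CPU-GPU resources to process relational streaming queries efficiently. Serval adopts a pipeline model and a stream execution cache to optimize throughput and response time, and implements several tuning strategies to suit different scenarios. Experiments show that single-node Serval outperforms both the existing GPU database MapD and Spark Streaming running on a three-node distributed cluster in throughput and response time.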
  • Fig. 1  Architecture overview of Serval

    Fig. 2  Pipeline model

    Fig. 3  Comparison of different streaming processing systems: throughput

    Fig. 4  Comparison of different streaming processing systems: latency

    Fig. 5  Q14 (SF100) execution analysis

    Fig. 6  Execution time cost analysis

Publication history
  • Received: 2019-07-29
  • Published: 2019-09-25
