中国综合性科技类核心期刊(北大核心)

中国科学引文数据库来源期刊(CSCD)

美国《化学文摘》(CA)收录

美国《数学评论》(MR)收录

俄罗斯《文摘杂志》收录

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

非阻塞事务型实时数据注入技术研究与实现

余楷 李志方 周敏奇 周傲英

余楷, 李志方, 周敏奇, 周傲英. 非阻塞事务型实时数据注入技术研究与实现[J]. 华东师范大学学报(自然科学版), 2016, (5): 131-143. doi: 10.3969/j.issn.1000-5641.2016.05.015
引用本文: 余楷, 李志方, 周敏奇, 周傲英. 非阻塞事务型实时数据注入技术研究与实现[J]. 华东师范大学学报(自然科学版), 2016, (5): 131-143. doi: 10.3969/j.issn.1000-5641.2016.05.015
YU Kai, LI Zhi-fang, ZHOU Min-qi, ZHOU Ao-ying. Research and implementation of transactional real-time data ingestion technology without blocking[J]. Journal of East China Normal University (Natural Sciences), 2016, (5): 131-143. doi: 10.3969/j.issn.1000-5641.2016.05.015
Citation: YU Kai, LI Zhi-fang, ZHOU Min-qi, ZHOU Ao-ying. Research and implementation of transactional real-time data ingestion technology without blocking[J]. Journal of East China Normal University (Natural Sciences), 2016, (5): 131-143. doi: 10.3969/j.issn.1000-5641.2016.05.015

非阻塞事务型实时数据注入技术研究与实现

doi: 10.3969/j.issn.1000-5641.2016.05.015
基金项目: 

国家自然科学基金重点项目(61332006), 上海市基金(13ZR1413200)

详细信息
    通讯作者:

    周敏奇, 男, 副教授. 研究方向为对等计算、云计算、分布式数据管理、内存数据管理系统. E-mail: mqzhou@sei.ecnu.edu.cn.

Research and implementation of transactional real-time data ingestion technology without blocking

  • 摘要: 伴随着大数据时代来临, 传统数据库系统已逐渐无法应对海量数据处理带来的挑战, 而分布式数据库系统得到了越来越多的部署和应用. 分布式数据库系统部署数据于多台机器上, 利用大规模并行计算技术实现了对海量数据的存储、管理和分析. 但针对金融领域严苛的事务型实时数据注入需求, 现有分布式数据库系统对其支持有限, 其主要原因在于利用锁和两阶段提交等方式实现分布式事务处理, 无法做到非阻塞式数据注入, 极大地影响了数据注入的性能. 华东师范大学数据科学与工程研究院自主研发的分布式内存数据库系统-----CLAIMS, 已能提供面向关系型数据集的实时数据分析服务, 但尚不能支持实时数据注入. 针对上述实时数据注入的问题, 本文重点分析了现有数据注入技术和基于分布式事务处理的实现方式, 设计了面向元数据的集中式事务处理策略, 利用无锁编程技术, 实现了支持分布式事务的高性能实时数据注入框架, 并通过热备机制实现系统的高可用性. 上述框架在 CLAIMS 系统中的实现, 经充分实验表明: 该框架能够实现高通量的事务型实时数据注入, 同时支持低延时的实时数据查询.
  • [1]

    [ 1 ] DEAN J, GHEMAWAT S. MapReduce: Simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113.
    [ 2 ] ZAHARIA M, CHOWDHURY M, FRANKLIN M J, et al. Spark: Cluster computing with working sets [C]//Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. Berkeley: USENIX Association, 2010: 10.
    [ 3 ] SHVACHKO K, KUANG H, RADIA S, et al. The hadoop distributed file system [C]//Proceedings of IEEE Conference on MSST. 2010: 1-10.
    [ 4 ] 胡健, 和轶东. SAP内存计算------HANA [M]. 北京: 清华大学出版社, 2013.
    [ 5 ] FARBER F, CHA S K, PRIMSCH J, et al. SAP HANA database: Data management for modern business applications [J]. ACM Sigmod Record, 2012, 40(4): 45-51.
    [ 6 ] GLIGOR G, TEODORU S. Oracle exalytics: Engineered for speed-of-thought analytics [J]. Database Systems Journal, 2011, 2(4): 3-8.
    [ 7 ] WANG L, ZHOU M Q, ZHANG Z J, et al. Elastic pipelining in in-memory DataBase cluster [R]. 2016.
    [ 8 ] TRAVERSO M. Presto: Interacting with petabytes of data at Facebook [EB/OL].(2013-11-07)[2016-06-10]. http://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920.
    [ 9 ] ARMBRUST M, XIN R S, LIAN C, et al. Spark SQL: Relational data processing in spark [C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015: 1383-1394.
    [10] YANG F, TSCHETTER E, LEAUTE X, et al. Druid: A real-time analytical data store [C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014.
    [11] GARCIA-MOLINA H, ULLMAN J D, WIDOM J. Database System Implementation [M]. Upper Saddle River, NJ: Prentice Hall, 2000.
    [12] NAGA P N. Real-time Analytics at Massive Scale with Pinot [EB/OL]. [2016-06-10]. https://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot.
    [13] KREPS J, NARKHEDE N, RAO J, et al. Kafka: A distributed messaging system for log processing [C]//Proceedings of the NetDB. 2011: 1-7.
    [14] LAMB A, FULLER M, VARADARAJAN R, et al. The vertica analytic database: C-store 7 years later [C]//Proceedings of the VLDB Endowment. 2012: 1790-1801.
    [15] CHANG L, WANG Z, MA T, et al. Hawq: A massively parallel processing sql engine in hadoop [C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014.
    [16] STONEBRAKER M, WEISBERG A. The VoltDB main memory DBMS [J]. IEEE Data Eng Bull, 2013: 21-27.
    [17] BRYANT R E, O′HALLARON D R. 深入理解计算机系统[M], 北京:机械工业出版社,2013.
    [18] ESWARAN K P, GRAY J N, LORIE R A, et al. The notions of consistency and predicate locks in a database system [J]. Communications of the ACM, 1976, 19(11): 624-633.
    [19] STONEBRAKER M. One Size Fits None-(Everything You Learned in Your DBMS Class is Wrong) [R/OL]. (2013-05-30)[2016-07-01]. http://slideshot.epfl.ch/talks/166.
    [20] WEIKUM G, VOSSEN G. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery [M]. San Francisco: Morgan Kaufmann Publishers, 2002.
    [21] DIACONU C, FREEDMAN C, ISMERT E, et al. Hekaton: SQL server’s memory-optimized OLTP engine [C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013.
    [22] MICHAEL M M. High performance dynamic lock-free hash tables and list-based sets [C]//Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures. 2002: 73-82.
    [23] LAMPSON BW, STURGIS H E. Crash Recovery in a Distributed Data Storage System [R]. Palo Alto, California: Xerox Palo Alto Research Center, 1979.
    [24] SKEEN D. Nonblocking commit protocols [C]//Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data. 1981.
    [25] HAN J, HAIHONG E, LE G, et al. Survey on NoSQL database [C]//Proceedings of the 2011 6th International Conference on Pervasive Computing and Applications. 2011: 363-366.
    [26] O’NEIL E J, O’NEIL P E, WEIKUM G. The LRU-K page replacement algorithm for database disk buffering [C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. 1993: 297-306.

  • 加载中
计量
  • 文章访问数:  307
  • HTML全文浏览量:  9
  • PDF下载量:  532
  • 被引次数: 0
出版历程
  • 收稿日期:  2016-06-27
  • 刊出日期:  2016-09-25

目录

    /

    返回文章
    返回