中国综合性科技类核心期刊(北大核心)

中国科学引文数据库来源期刊(CSCD)

美国《化学文摘》(CA)收录

美国《数学评论》(MR)收录

俄罗斯《文摘杂志》收录

Message Board

Respected readers, authors and reviewers, you can add comments to this page on any questions about the contribution, review, editing and publication of this journal. We will give you an answer as soon as possible. Thank you for your support!

Name
E-mail
Phone
Title
Content
Verification Code
Issue 5
Nov.  2016
Turn off MathJax
Article Contents
YU Kai, LI Zhi-fang, ZHOU Min-qi, ZHOU Ao-ying. Research and implementation of transactional real-time data ingestion technology without blocking[J]. Journal of East China Normal University (Natural Sciences), 2016, (5): 131-143. doi: 10.3969/j.issn.1000-5641.2016.05.015
Citation: YU Kai, LI Zhi-fang, ZHOU Min-qi, ZHOU Ao-ying. Research and implementation of transactional real-time data ingestion technology without blocking[J]. Journal of East China Normal University (Natural Sciences), 2016, (5): 131-143. doi: 10.3969/j.issn.1000-5641.2016.05.015

Research and implementation of transactional real-time data ingestion technology without blocking

doi: 10.3969/j.issn.1000-5641.2016.05.015
  • Received Date: 2016-06-27
  • Publish Date: 2016-09-25
  • With the advent of big data era, traditional database systems are facing difficulties in satisfying the new challenges brought by massive data processing, while distributed database systems have been deployed widely in real applications. Distributed database systems partitioned and the dispatched the data across machines under a designed scheme and analyzed all the massive data in massive parallel manner. In facing of the requirements of the transactional real-time data ingestion from financial field, distributed database systems are ineffective and inefficient due to their implementation ofthe distributed transaction processing based on the lock and two-phase commit, which lead to the impossibility of non-blocking data ingestion. CLAIMS is a distributed in-memory database system designed and implemented by Institute for Data Science and Engineering of ECNU. It supports real-time data analysis towards relational data set but is incapable of real-time data ingestion. To address these problems, we analyzed data ingestion technology and distributed transaction processing algorithms first, and proposed to mimic the transactional data ingestion in the distributed environment with the centralized transaction processing based on meta data, and eventually achieved the real-time data ingestion with high availability and without blocking. The experiment results with the implementation of the proposed algorithms in CLAIMS proved that the proposed framework could achieve high throughput transactional real-time data ingestion as well as low latency real-time query processing.
  • loading
  • [1]

    [ 1 ] DEAN J, GHEMAWAT S. MapReduce: Simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113.
    [ 2 ] ZAHARIA M, CHOWDHURY M, FRANKLIN M J, et al. Spark: Cluster computing with working sets [C]//Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. Berkeley: USENIX Association, 2010: 10.
    [ 3 ] SHVACHKO K, KUANG H, RADIA S, et al. The hadoop distributed file system [C]//Proceedings of IEEE Conference on MSST. 2010: 1-10.
    [ 4 ] 胡健, 和轶东. SAP内存计算------HANA [M]. 北京: 清华大学出版社, 2013.
    [ 5 ] FARBER F, CHA S K, PRIMSCH J, et al. SAP HANA database: Data management for modern business applications [J]. ACM Sigmod Record, 2012, 40(4): 45-51.
    [ 6 ] GLIGOR G, TEODORU S. Oracle exalytics: Engineered for speed-of-thought analytics [J]. Database Systems Journal, 2011, 2(4): 3-8.
    [ 7 ] WANG L, ZHOU M Q, ZHANG Z J, et al. Elastic pipelining in in-memory DataBase cluster [R]. 2016.
    [ 8 ] TRAVERSO M. Presto: Interacting with petabytes of data at Facebook [EB/OL].(2013-11-07)[2016-06-10]. http://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920.
    [ 9 ] ARMBRUST M, XIN R S, LIAN C, et al. Spark SQL: Relational data processing in spark [C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015: 1383-1394.
    [10] YANG F, TSCHETTER E, LEAUTE X, et al. Druid: A real-time analytical data store [C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014.
    [11] GARCIA-MOLINA H, ULLMAN J D, WIDOM J. Database System Implementation [M]. Upper Saddle River, NJ: Prentice Hall, 2000.
    [12] NAGA P N. Real-time Analytics at Massive Scale with Pinot [EB/OL]. [2016-06-10]. https://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot.
    [13] KREPS J, NARKHEDE N, RAO J, et al. Kafka: A distributed messaging system for log processing [C]//Proceedings of the NetDB. 2011: 1-7.
    [14] LAMB A, FULLER M, VARADARAJAN R, et al. The vertica analytic database: C-store 7 years later [C]//Proceedings of the VLDB Endowment. 2012: 1790-1801.
    [15] CHANG L, WANG Z, MA T, et al. Hawq: A massively parallel processing sql engine in hadoop [C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014.
    [16] STONEBRAKER M, WEISBERG A. The VoltDB main memory DBMS [J]. IEEE Data Eng Bull, 2013: 21-27.
    [17] BRYANT R E, O′HALLARON D R. 深入理解计算机系统[M], 北京:机械工业出版社,2013.
    [18] ESWARAN K P, GRAY J N, LORIE R A, et al. The notions of consistency and predicate locks in a database system [J]. Communications of the ACM, 1976, 19(11): 624-633.
    [19] STONEBRAKER M. One Size Fits None-(Everything You Learned in Your DBMS Class is Wrong) [R/OL]. (2013-05-30)[2016-07-01]. http://slideshot.epfl.ch/talks/166.
    [20] WEIKUM G, VOSSEN G. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery [M]. San Francisco: Morgan Kaufmann Publishers, 2002.
    [21] DIACONU C, FREEDMAN C, ISMERT E, et al. Hekaton: SQL server’s memory-optimized OLTP engine [C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013.
    [22] MICHAEL M M. High performance dynamic lock-free hash tables and list-based sets [C]//Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures. 2002: 73-82.
    [23] LAMPSON BW, STURGIS H E. Crash Recovery in a Distributed Data Storage System [R]. Palo Alto, California: Xerox Palo Alto Research Center, 1979.
    [24] SKEEN D. Nonblocking commit protocols [C]//Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data. 1981.
    [25] HAN J, HAIHONG E, LE G, et al. Survey on NoSQL database [C]//Proceedings of the 2011 6th International Conference on Pervasive Computing and Applications. 2011: 363-366.
    [26] O’NEIL E J, O’NEIL P E, WEIKUM G. The LRU-K page replacement algorithm for database disk buffering [C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. 1993: 297-306.

  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索
    Article views (308) PDF downloads(532) Cited by()
    Proportional views

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return