Distributed secondary index based on LSM Tree
-
摘要: 近年来 Log-Structured-Merge(LSM) Tree 在 NoSQL 系统中得到了广泛地应用. 主要是因为 LSM Tree 架构提出了延迟更新和批量写入的算法, 将随机写转换为批量写, 减少了磁盘臂的移动开销, 从而大大地提升了数据库的写入性能. 然而, 读性能却也因此受到影响. LSM Tree 和 B Tree 之间的本质区别使得 NoSQL 系统不适宜直接引用 B Tree 作为辅助索引结构. 本文实现了 LSM Tree 下的一种分布式辅助索引结构, 提出针对这种读写分离架构的索引批量加载策略, 并对 LSM Tree 的查询计划树进行了缓冲优化, 避免了重复的查询解析, 使得索引读的性能得到了相应的提升.Abstract: In recent years, Log-Structured-Merge Tree has been widely used in NoSQL systems. This is mainly because it has proposed two algorithms: update delayed and batch write, convert random write to batch write, reducing the cost of moving the disk arm therefore the write performance of database has been enhanced greatly. However, the read performance of database has also been affected negatively. The essential difference between LSM Tree and B Tree makes NoSQL not suitable for using B Tree as index structure directly. This paper implements a distributed secondary index based on LSM Tree, and proposes a bulk loading method in this read and write separation architecture. We also do lots of works on the optimization of index query plan to avoid repeatly query parsing IO so that the performance of index read has been greatly improved.
-
Key words:
- Secondary Index LSM Tree NoSQL /
-
[1] [1] APACHE ORG. Apache HBase[EB/OL]. [2016-07-07]. https://hbase.apache.org/.
[2] LAKSHMAN A, MALIK P. Cassandra: A decentralized structured storage system[J]. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35-40.
[3] O’NEIL P, CHENG E, GAWLICK D, et al. The log-structured merge-tree (LSM-tree)[J]. Acta Informatica, 1996, 33(4): 351-385.
[4] HUAWEI. Secondary index in HBase[EB/OL]. [2016-07-07]. https://github.com/Huawei-Hadoop/hindex.
[5] CORBETT J C, DEAN J, EPSTEIN M, ET A L. Spanner: Google’s globally distributed database[J]. ACM Transactions on Computer Systems (TOCS), 2013, 31(3): 8.
[6] CHEN G, VO H T, WU S, et al. A framework for supporting DBMS-like indexes in the cloud[J]. Proceedings of The Vldb Endowment, 2011, 4(11): 702-713.
[7] 翁海星, 宫学庆, 朱燕超,等. 集群环境下分布式索引的实现[J]. 计算机应用,2016, 36(1): 1-7.
[8] CHANG F, DEAN J, GHEMAWAT S, et al. Bigtable: A distributed storage system for structured data[J]. ACM Transactions on Computer Systems, 2008, 26(2): 4.
[9] TAN W, TATA S, TANG Y, et al. Diff-index: differentiated index in distributed log-structured data stores[C]. Extending Database Technology, 2014: 700-711.
[10] 阳振坤. OceanBase~关系数据库架构[J]. 华东师范大学学报(自然科学版), 2014(5): 141-148.
[11] 孟必平, 王腾蛟, 李红燕, 等. 分片位图索引:一种适用于云数据管理的辅助索引机制[J]. 计算机学报, 2012, 35(11):2306-2316.
[12] 黄贵, 庄明强. OceanBase分布式存储引擎[J]. 华东师范大学学报(自然科学版),2014 (5): 164-172.
[13] ALIBABA INC. OceanBase[Z/OL].[2016-07-07]. https://github.com/alibaba/oceanbase/tree/master/oceanbase 0.4.
[14] 杨传辉.大规模分布式存储系统: 原理解析与架构实战[M].北京:机械工业出版社, 2013.
[15] COOPER B F, SILBERSTEIN A, TAM E, et al. Benchmarking cloud serving systems with YCSB[C]// Proceedings of the 1st ACM Symposium on Cloud Computing. ACM, 2010: 143-154.
计量
- 文章访问数: 376
- HTML全文浏览量: 18
- PDF下载量: 900
- 被引次数: 0