Implementation of LevelDB-based secondary index on two-dimensional data
-
摘要: 随着科学研究中产生的空间数据尤其是二维数据量级的增长和NoSQL型数据库技术的发展,越来越多的空间数据被存储到NoSQL数据库中.LevelDB是一款开源的Key-Value型NoSQL数据库,由于它基于LSM架构并拥有较好的写入性能而被广泛应用.但是Key-Value结构的局限性使其无法有效地索引空间数据,对于这个问题本文提出了一种基于LevelDB和R-tree的二级索引,使其可以支持二维数据的索引和近邻查询.实验结果表明该结构有较好的可用性.
-
关键词:
- Key-Value数据库 /
- 二级索引 /
- R-tree
Abstract: With the growth of spatial data generated by scientific research, especially two-dimensional data and the ongoing development of NoSQL-type database technology, more and more spatial data is now stored in NoSQL databases. LevelDB is an open source Key-Value NoSQL database that is widely used because it offers excellent write performance based on LSM architecture. Given the limitations of the Key-Value structure, it is impossible to index spatial data effectively. For this problem, a LevelDB and R-treebased secondary index was proposed to support spatial two-dimensional data indexing and neighbor queries. Experimental results show that the structure has good usability.-
Key words:
- Key-Value database /
- Seconday index /
- R-tree
-
算法1 二级索引插入算法 输入:待插入的坐标cord和对应的key 输出:二级索引数据文件 1:解析和封装输入的key和value//value必须是长度为2的double数组 2:生成SequenceNumber 3:将输入的key和value保存到Write Ahead Log中 4:将key和value保存到MemTable中 5:读取内存二级索引的R-tree 6:根据value的坐标cord找到待插入的节点 7:将cord插入到索引树中, 封装key, cord和SequenceNumber插入到二级索引的数据块中 算法2 二级索引查询算法 输入:待查询的坐标cord 输出: K个和cord最相邻的数据 1:从内存到磁盘依次查询所有的二级索引文件, 在每个索引中分别进行K-NN查询 2:统计所有的K-NN查询结果, 按照距离和先后顺序排序保存在结果队列queue中 3: for queue中所有的数据do 4: 检查SequenceNumber是否过期 5: 在主索引中查询该key校验value是否过期 6: if value未过期then 7: 输出该记录, count加1 8: else 9: 跳过该记录 10: end if 11: if count等于10或queue已空then 12: 结束循环 13: end if 14: end for 算法3 内存二级索引更新算法 输入:内存二级索引文件 输出:磁盘二级索引文件 1: LevelDB的MemTable达到上限并转为Immutable MemTable 2:锁定当前内存二级索引并生成新的内存索引文件 3:LevelDB开始将Immutable MemTable转存为Level0的磁盘SSTable 4:为新SSTable生成对应的磁盘二级索引文件IDX中 5:遍历锁定的内存二级索引, 将数据依次插入新的磁盘索引IDX中 算法4 磁盘二级索引更新算法 输入:待更新的二级索引文件 输出:更新后的磁盘二级索引文件 1: LevelDB触发SSTable的合并 2:查找合并过程中删除和更新的key, value放入delete和update队列 3: if是level0的SSTable合并then 4: 去除删除和更新的key, value, 剩余的数据放入insert队列 5: 删除该二级索引文件 6: end if 7:对磁盘大二级索引分别使用ins -
[1] Google Inc. LevelDB[EB/OL].[2019-06-20]. https://githbu.com/google/leveldb. [2] BECKMANN N, KRIEGEL H P, SCHNEIDER R, et al. The R*-tree: An efficient and robust access method for points and rectangles[C]//ACM Sigmod Record. ACM, 1990, 19(2): 322-331. [3] LUO C, CAREY M J. LSM-based storage techniques: A survey[J/OL]. arXiv preprint, arXiv: 1812.07527, 2018. [4] WU L, LIN W, XIAO X, et al. LSⅡ: An indexing structure for exact real-time search on microblogs[C]//2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 2013: 482-493. [5] KHODAEI A, SHAHABI C, LI C, et al. Hybrid indexing and seamless ranking of spatial and textual features of web documents[C]//International Conference on Database and Expert Systems Applications. Berlin: Springer, 2010: 450-466. [6] QADER M A, CHENG S, HRISTIDIS V. A comparative study of secondary indexing techniques in LSM-based NoSQL databases[C]//Proceedings of the 2018 International Conference on Management of Data. ACM, 2018: 551-566. [7] TAN W, TATA S, TANG Y, et al. Diff-Index: Differentiated index in distributed log-structured data stores[C]//EDBT. 2014: 700-711. [8] DSILVA J V, RUIZCARRILLO R, YU C, et al. Secondary indexing techniques for key-value stores: Two rings to rule them all[C]//Proceedings of the 20th International Conference on Extending Database Technology and 20th International Conference on Database Theory 2017. 2017: 21-24.