面向Cedar的列存储设计与实现

俞文谦; 胡爽; 胡卉芪

doi:10.3969/j.issn.1000-5641.2018.05.006

面向Cedar的列存储设计与实现

doi: 10.3969/j.issn.1000-5641.2018.05.006

华东师范大学数据科学与工程学院, 上海 200062

基金项目:

国家自然科学基金 61702189

上海市青年扬帆计划 17YF1427800

详细信息

作者简介:
俞文谦, 男, 硕士研究生, 研究方向为分布式数据库系统.E-mail:wqyu_cs@163.com

通讯作者:
胡卉芪, 男, 助理研究员, 研究方向为数据库.E-mail:hqhu@dase.ecnu.edu.cn

中图分类号: TP392
计量
- 文章访问数: 191
- HTML全文浏览量: 120
- PDF下载量: 253
- 被引次数: 0
出版历程
- 收稿日期: 2018-07-09
- 刊出日期: 2018-09-25

The designs and implementations of columnar storage in Cedar

School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

摘要

摘要: 随着数据规模和分析需求的日益增长，数据库面向联机分析处理（On-Line Analytical Processing，OLAP）应用的查询性能变得愈发重要.Cedar是一款基于读写分离架构的分布式关系数据库，由于它主要面向联机事务处理（On-Line Transaction Processing，OLTP）业务，在面对分析处理负载时性能表现不足.对于这个问题，很多研究表明列存储技术能够有效地提高I/O（Input/Output）效率，进而提升分析处理的性能.在Cedar上提出了一种列存储机制，分析了其适用场景并针对这种机制改进了Cedar的数据扫描和批量更新方法.实验结果表明，该机制能大幅度地提升Cedar分析处理性能，并且对事务处理性能的影响控制在10%以内.
- 分布式数据库 /
- 列存储 /
- 联机分析处理(OLAP)
Abstract: With the growing size of data and analytical needs, the query performance of databases for OLAP (On-Line Analytical Processing) applications has become increasingly important. Cedar is a distributed relational database based on read-write decoupled architecture. Since Cedar is mainly oriented to the needs of OLTP (On-Line Transaction Processing) applications, it has insufficient performance for handling analytical processing workloads. To address this issue, many studies have shown that column storage technology can effectively improve the efficiency of I/O (Input/Output) and enhance the performance of analytical processing. This paper presents a column-based storage mechanism in Cedar. The study analyzes applicable scenarios and improves Cedar's data query and batch update methods for this mechanism. The results of an experiment demonstrate that the proposed mechanism can enhance the performance of analytical processing substantially, while limiting the negative impacts on transaction processing performance to within 10%.
- distributed database /
- column-based storage /
- OLAP

HTML全文

图 1 Cedar系统架构

Fig. 1 The architecture of Cedar

下载: 全尺寸图片幻灯片

图 2 CS上列存储机制架构

Fig. 2 The architecture of columnar storage on the CS

下载: 全尺寸图片幻灯片

图 3 Parquet数据结构

Fig. 3 The data structure of Parquet

下载: 全尺寸图片幻灯片

图 4 聚合函数性能

Fig. 4 The performance of the aggregate functions

下载: 全尺寸图片幻灯片

图 5 事务处理性能

Fig. 5 Comparison transaction processing performance

下载: 全尺寸图片幻灯片

图 6 查询涉及的列数对性能的影响

Fig. 6 The performance impact of the number of columns in a query

下载: 全尺寸图片幻灯片

图 7 数据重复率对查询性能的影响

Fig. 7 The effect of data repetition on performance

下载: 全尺寸图片幻灯片

算法 1 列存储数据扫描算法
输入: 关系表 $T$, 投影属性列集合 $\{a_1, a_2, \cdots, a_i\}$, 查询条件 $c$
输出: 查询结果集
1: 通过文件索引 file_index 中定位属性列集合所在的存储节点和 parquet_file;
2: 读入 parquet_file的行组元数据;
3: /若查询条件含有主键, 则使用二分法过滤部分行组/
4: if 查询条件含有主键 then
5: scan_row_groups为二分法过滤后的行组集合;
6: else
7: scan_row_groups为parquet_file的全部行组;
8: end if
9: for all scan_row_groups do
10: 读入行组元数据对应的列值范围 [min, max];
11: 在非主键列的查询条件下对行组进行查询;
12: if 得到的结果集与行组对应列值范围交集为空 then
13: 过滤该行组;
14: continue;
15: else
16: 将行组中投影需要的列块读入内存;
17: end if
18: end for

下载: 导出CSV

算法2 列存储数据批量更新算法
输入:关系表$T$的所有数据记录$\{r_1, r_2, \cdots, r_n\}$
输出:列存储数据文件
1:初始化parquet_file;
2: for all关系表$T$中的属性列$\{a_1, a_2, \cdots, a_i\}$ do
3: 初始化该属性列$a_i$对应的数据写入器colum_writer$_{i}$;
4: end for
5: while基线数据存取器获取一个需要写入的记录$r_i\in\{r_1, r_2, \cdots, r_n\}$ do
6: /将行组写入文件/
7: if当前处理的记录数超过parquet_file的行组限制then
8: if当前数据文件大于阈值then
9: 创建新的parquet_file;
10: end if
11: for all $i\in\{1, 2, \cdots, p\}$ do
12: 将列块colum_chunk$_{i}$进行Snappy格式压缩后写入parquet_file;
13: end for
14: end if
15: /数据写入器将记录的每个列数据迭代填入各自数据块缓存/
16: for all属性列的数据项do
17: if当前该列的数据块缓存大于阈值then
18: 将该列数据块缓存中的列数据写入列块colum_chunk$_{i}$;
19: 创建该列新的数据块缓存;
20: end if
21: 对数据项进行RLE编码后写入该列对应的数据块缓存;
22: end for
23: end while

下载: 导出CSV

参考文献(16)

[1]	CODD E F, CODD S B, SALLEY C T. Providing OLAP (On-Line Analytical Processing) to User-Analysts:An IT mandate[J]. Codd and Date, 1993, 32:3-5. http://ci.nii.ac.jp/naid/10020853884
[2]	华东师范大学.面向大型银行应用的高通量可伸缩分布式数据库系统Cedar[DB/OL].[2018-05-16]. https://github.com/daseECNU/Cedar.
[3]	COPELAND G P, KHOSHAFIAN S N. A decomposition storage model[C]//ACM SIGMOD International Conference on Management of Data. New York: ACM, 1985: 268-279. http://users.csc.calpoly.edu/~dekhtyar/560-Fall2012/papers/DSM-columns.pdf
[4]	ABADI D, MADDEN S, HACHEM N, Column-stores vs. row-stores: How different are they really?//Proceedings of the 2008 ACM SIGMOD international conference on Management of data. New York: ACM, 2008: 967-980.
[5]	RAMAN V, ATTALURI G, BARBER R, et al. DB2 with BLU acceleration:So much more than just a column store[J]. Proceedings of the VLDB Endowment, 2013, 11:1080-1091. http://dl.acm.org/citation.cfm?id=2536233
[6]	PETRAKI E, IDREOS S, MANEGOLD S. Holistic Indexing in main-memory column-stores[C]//ACM SIGMOD International Conference on Management of Data. New York: ACM, 2015: 1153-1166. http://stratos.seas.harvard.edu/files/stratos/files/holisticindexing.pdf
[7]	LANG H, FUNKE F, BONCZ P A, et al. Data blocks: Hybrid OLTP and OLAP on compressed storage using both vectorization and compilation[C]//International Conference on Management of Data. New York: ACM, 2016: 311-326. https://15721.courses.cs.cmu.edu/spring2018/papers/22-vectorization2/p311-lang.pdf
[8]	RAMNARAYAN J, MOZAFARI B, WALE S, et al. SnappyData: A hybrid transactional analytical store built on spark[C]//International Conference on Management of Data. New York: ACM, 2016: 2153-2156. http://web.eecs.umich.edu/~mozafari/php/data/uploads/sigmod_2016_demo.pdf
[9]	LEE J, HAN W S, NA H J, et al. Parallel replication across formats for scaling out mixed OLTP/OLAP workloads in main-memory databases[J]. The VLDB Journal, 2018, 27(3):421-444. doi: 10.1007/s00778-018-0503-z
[10]	SQream. SQream SQream DB[DB/OL].[2018-06-16]. https://sqream.com/.
[11]	ROOT C, MOSTAK T. MapD: A GPU-powered big data analytics and visualization platform[C]//Proceeding of the SIGGRAPH'16 ACM SIGGRAPH 2016 Talks. New York: ACM, 2016: Article No 73. DOI: 10.1145/2897839.2927468.
[12]	阳振坤. OceanBase关系数据库架构[J].华东师范大学学报(自然科学版), 2014(5):141-148, 163. doi: 10.3969/j.issn.1000-5641.2014.05.012
[13]	黄贵, 庄明强. OceanBase分布式存储引擎[J].华东师范大学学报(自然科学版), 2014(5):164-172. doi: 10.3969/j.issn.1000-5641.2014.05.014
[14]	GOOGLE. Google Snappy[DB/OL].[2018-06-16]. https://github.com/google/snappy.
[15]	SALOMON D. Data Compression:The Complete Reference[M]. New York:Springer-Verlag Inc, 2000.
[16]	APACHE. Apache parquet[DB/OL].[2018-06-16]. http://parquet.apache.org/.

施引文献

资源附件(0)

访问统计

点击查看大图

图(7) / 表(2)

计量

文章访问数: 191
HTML全文浏览量: 120
PDF下载量: 253
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

面向Cedar的列存储设计与实现

doi: 10.3969/j.issn.1000-5641.2018.05.006

作者简介:
俞文谦, 男, 硕士研究生, 研究方向为分布式数据库系统.E-mail:wqyu_cs@163.com

通讯作者:
胡卉芪, 男, 助理研究员, 研究方向为数据库.E-mail:hqhu@dase.ecnu.edu.cn

计量

The designs and implementations of columnar storage in Cedar

计量

目录

留言板

面向Cedar的列存储设计与实现

doi: 10.3969/j.issn.1000-5641.2018.05.006

作者简介: 俞文谦, 男, 硕士研究生, 研究方向为分布式数据库系统.E-mail:wqyu_cs@163.com

通讯作者: 胡卉芪, 男, 助理研究员, 研究方向为数据库.E-mail:hqhu@dase.ecnu.edu.cn

计量

出版历程

The designs and implementations of columnar storage in Cedar

计量

出版历程

目录

作者简介:
俞文谦, 男, 硕士研究生, 研究方向为分布式数据库系统.E-mail:wqyu_cs@163.com

通讯作者:
胡卉芪, 男, 助理研究员, 研究方向为数据库.E-mail:hqhu@dase.ecnu.edu.cn