Storage and load balancing design for large-scale comment data on heterogeneous Redis clusters

ZHANG Jingwei; DING Zhijun; YANG Qing; ZHANG Huibing; ZHANG Haitao; ZHOU Ya

doi:10.3969/j.issn.1000-5641.2017.05.003

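1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
2. Guangxi Key Laboratory of Automatic Detection Technology and Instrument, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China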

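Funding:
- National Natural Science Foundation of China: 61363005, 61462017, U1501252
- Guangxi Natural Science Foundation: 2014GXNSFAA118353, 2014GXNSFAA118390
- Foundation of Guangxi Key Laboratory of Automatic Detection Technology and Instrument: YQ15110
- Basic Ability Improvement Project for Young and Middle-aged Teachers in Guangxi Universities: ky2016YB156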

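About the author:
ZHANG Jingwei, male, Ph.D., associate professor; research interest: massive data management. E-mail: gtzjw@hotmail.com

Corresponding author:
YANG Qing, female, associate professor; research interest: intelligent information processing. E-mail: gtyqing@hotmail.com

CLC number: TP315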
Publication history
- Received: 2017-06-30
- Published: 2017-09-25

Abstract

The storage and query performance of large-scale comment data strongly affects the response time of the applications built on that data. In a heterogeneous computing environment, nodes differ in storage and computing performance, so a key challenge is to exploit the capability of every node in order to optimize storage and query performance for large-scale comment data. Building on the data management strengths of Redis Cluster, we first propose a storage model for large-scale comment data in a homogeneous environment that balances storage across Redis slots. We then analyze the relationship between the number of slots on a node and its query efficiency, allocate slots on the principle of "balancing load against access performance", and design a method for computing node load and allocating storage in heterogeneous Redis clusters, so that the performance of each node is fully used. Experimental results show that the proposed storage model balances storage well and improves the overall query efficiency of the cluster.

Keywords:
- large-scale comment data
- storage and load balancing
- query optimization
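
The slots discussed in the abstract are the 16 384 hash slots of Redis Cluster. A key is assigned to a slot by taking the CRC16 (XMODEM variant) of the key modulo 16 384; when the key contains a non-empty `{...}` hash tag, only the text inside the first such tag is hashed. The following is a minimal Python sketch of that mapping. The function names and the example keys such as `item42:1` are illustrative assumptions and are not taken from the paper's key scheme.

```python
NUM_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster slot for a key, honouring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # the tag must contain at least one byte
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % NUM_SLOTS


if __name__ == "__main__":
    # Hypothetical keys: the two tagged keys land in the same slot.
    for key in ("foo", "item42:1", "{item42}:1", "{item42}:2"):
        print(key, key_slot(key))
```

Because slot membership is a pure function of the key, moving a slot between nodes moves every key that hashes to it, which is what makes slot counts a natural unit for balancing storage.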

Fig. 1 Access examples under different storage loads

Fig. 2 Test key table for the 16 384 slots

Fig. 3 Query load test process

Fig. 4 Node query performance test

Fig. 5 Stored data volume before and after slot migration

Tab. 1 Two-level index for comment data

Key	Value: sort value	Value: content
ItemID	StartTime	ItemID: Number

Tab. 2 Storage structure for comment data

Key	Value: sort value	Value: content
ItemID: Number	Timestamp	UserID: comment

Tab. 3 Secondary index on UserID

Key	Value
UserID	ItemID: Number: Timestamp, ItemID: Number: Timestamp, $\cdots$
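
Tabs. 1 to 3 suggest a layout in which the comments of one item are split into numbered buckets (`ItemID: Number`), a first-level index maps each item to its buckets ordered by start time, and a per-user index points back to individual comments. The sketch below imitates that layout with in-memory sorted lists instead of Redis sorted sets. It rests on several assumptions that this excerpt does not confirm: that `LineNum` is the maximum number of comments per bucket, that comments for an item arrive in timestamp order, and that the class and method names (`CommentStore`, `add`, `range_query`) are purely illustrative.

```python
import bisect
from collections import defaultdict


class CommentStore:
    def __init__(self, line_num: int):
        self.line_num = line_num          # assumed: max comments per bucket
        self.index = defaultdict(list)    # ItemID -> [(StartTime, "ItemID:Number")]
        self.buckets = defaultdict(list)  # "ItemID:Number" -> [(Timestamp, "UserID:comment")]
        self.by_user = defaultdict(list)  # UserID -> ["ItemID:Number:Timestamp"]

    def add(self, item_id, user_id, timestamp, comment):
        """Append a comment; assumes non-decreasing timestamps per item."""
        entries = self.index[item_id]
        if not entries or len(self.buckets[entries[-1][1]]) >= self.line_num:
            # Open a new bucket whose start time is this comment's timestamp.
            entries.append((timestamp, f"{item_id}:{len(entries) + 1}"))
        bucket_key = entries[-1][1]
        bisect.insort(self.buckets[bucket_key], (timestamp, f"{user_id}:{comment}"))
        self.by_user[user_id].append(f"{bucket_key}:{timestamp}")

    def range_query(self, item_id, start, end):
        """Return comments of an item with start <= Timestamp <= end."""
        entries = self.index.get(item_id, [])
        starts = [s for s, _ in entries]
        # Begin at the last bucket that starts strictly before `start`.
        first = max(bisect.bisect_left(starts, start) - 1, 0)
        results = []
        for bucket_start, bucket_key in entries[first:]:
            if bucket_start > end:
                break
            rows = self.buckets[bucket_key]
            stamps = [t for t, _ in rows]
            lo = bisect.bisect_left(stamps, start)
            hi = bisect.bisect_right(stamps, end)
            results.extend(rows[lo:hi])
        return results


if __name__ == "__main__":
    store = CommentStore(line_num=2)
    for ts in range(1, 6):
        store.add("A", f"u{ts}", ts, f"c{ts}")
    print(store.index["A"])
    print(store.range_query("A", 2, 4))
```

The first-level index lets a range query skip buckets that start after the requested interval, so only the buckets overlapping the interval are scanned.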

Tab. 4 Experimental results for the storage partition parameter (LineNum)

Partition parameter (LineNum)/10^4	Node 1/MB	Node 2/MB	Node 3/MB	Standard deviation
1	502.40	501.26	460.20	19.63
2	495.12	489.26	471.28	10.14
3	509.81	486.51	421.12	37.54
4	486.44	526.01	433.86	37.74
5	557.57	414.99	450.43	60.61
6	644.51	362.38	410.35	123.26
7	664.95	376.70	384.30	134.13
8	661.56	400.65	373.29	129.92
9	648.79	469.65	339.60	126.76
10	602.16	471.74	341.24	106.52

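
The standard deviation column of Tab. 4 agrees with the population standard deviation of the three per-node volumes. The sketch below recomputes it from the table rows and picks the `LineNum` with the smallest spread; the names `VOLUMES`, `imbalance`, and `best_line_num` are illustrative, and using the minimum standard deviation as the selection rule is an assumption about how the table is meant to be read.

```python
from statistics import pstdev

# LineNum (units of 10^4) -> data volume on nodes 1-3, copied from Tab. 4
VOLUMES = {
    1: (502.40, 501.26, 460.20),
    2: (495.12, 489.26, 471.28),
    3: (509.81, 486.51, 421.12),
    4: (486.44, 526.01, 433.86),
    5: (557.57, 414.99, 450.43),
    6: (644.51, 362.38, 410.35),
    7: (664.95, 376.70, 384.30),
    8: (661.56, 400.65, 373.29),
    9: (648.79, 469.65, 339.60),
    10: (602.16, 471.74, 341.24),
}


def imbalance(volumes):
    """Population standard deviation of per-node data volumes."""
    return pstdev(volumes)


def best_line_num(table):
    """Return the LineNum whose node volumes are most evenly spread."""
    return min(table, key=lambda line_num: imbalance(table[line_num]))


if __name__ == "__main__":
    for line_num, volumes in VOLUMES.items():
        print(line_num, round(imbalance(volumes), 2))
    print("most balanced LineNum:", best_line_num(VOLUMES))
```

On these rows the smallest spread occurs at LineNum = 2 (in units of 10^4), and the imbalance grows sharply once LineNum exceeds 5.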

Tab. 5 Ratios after slot migration

Item	Nodes (1-core, 2-core, 4-core)	Ratio
Key value	235, 299, 620	1: 1.272: 2.638
Storage after slot migration	301.92 MB, 382.43 MB, 777.58 MB	1: 1.266: 2.639
Number of slots	3 449, 4 000, 8 935	1: 1.159: 2.590
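
The slot counts in Tab. 5 (3 449, 4 000, and 8 935) add up to the full 16 384 slots of the cluster. Assuming, purely for illustration, that the three nodes started from a near-even split (the starting layout is not given in this excerpt), the number of slots each node gives up or receives can be sketched as follows. `even_split` and `slot_deltas` are hypothetical helper names.

```python
NUM_SLOTS = 16384  # total hash slots in a Redis cluster


def even_split(num_nodes, num_slots=NUM_SLOTS):
    """Split slots as evenly as possible; early nodes take the remainder."""
    base, extra = divmod(num_slots, num_nodes)
    return [base + (1 if i < extra else 0) for i in range(num_nodes)]


def slot_deltas(current, target):
    """Slots each node must gain (positive) or give up (negative)."""
    if sum(current) != sum(target):
        raise ValueError("current and target must cover the same slots")
    return [t - c for c, t in zip(current, target)]


if __name__ == "__main__":
    target = [3449, 4000, 8935]  # 1-core, 2-core, 4-core nodes (Tab. 5)
    current = even_split(len(target))  # assumed starting layout
    print(current, slot_deltas(current, target))
```

Under that assumed starting point, the 1-core and 2-core nodes hand slots to the 4-core node, and the deltas sum to zero because no slot is created or lost.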

Tab. 6 Data used for query tests

Item ID	Number of comments	Size/MB
1	117 908	27.1
2	58 182	15.1
3	212 708	40.5
4	85 236	11.4
5	104 352	9.07
6	93 431	8.73
7	53 937	4.65
8	26 439	2.60
9	1 691	1.12
10	985	0.508


Tab. 7 Experimental results for range queries

Query range	Before slot migration/s	After slot migration/s	Speed improvement/%
1 day	0.022 64	0.018 35	23.4
1 month	0.278 66	0.223 21	24.8
Half a year	1.112 17	1.032 61	7.71
1 year	1.944 21	1.833 71	6.00
1.5 years	2.404 70	2.168 68	10.9
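
The improvement rates in Tab. 7 appear to be measured relative to the time after slot migration, that is (before - after) / after. This reading is inferred from the numbers rather than stated in this excerpt. The sketch below recomputes the rates under that assumption; it reproduces the table to one decimal place, with small differences in the last digit for the half-year and one-year rows that are presumably due to rounding of the published times.

```python
# (query range, seconds before migration, seconds after migration) from Tab. 7
ROWS = [
    ("1 day", 0.02264, 0.01835),
    ("1 month", 0.27866, 0.22321),
    ("half a year", 1.11217, 1.03261),
    ("1 year", 1.94421, 1.83371),
    ("1.5 years", 2.40470, 2.16868),
]


def improvement_pct(before: float, after: float) -> float:
    """Speed improvement in percent, relative to the post-migration time."""
    return (before - after) / after * 100.0


if __name__ == "__main__":
    for label, before, after in ROWS:
        print(f"{label}: {improvement_pct(before, after):.1f}%")
```

The gains are largest for the short ranges (one day and one month) and smaller for ranges of half a year or more.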
