Woodpecker+: Customized workload performance evaluation based on data characteristics
-
Abstract: As applications grow more complex and diverse, database performance evaluation becomes increasingly important, yet researchers and developers are often limited by the lack of suitable workloads. General-purpose testing frameworks such as Sysbench and OLTP-Bench improve testing efficiency to some extent, but because standard benchmark workloads are fixed and poorly representative of users' diverse application scenarios, they cannot accurately characterize system performance for a given application. Moreover, most frameworks require users to implement a test workload in a high-level programming language separately for each application, which adds testing burden, introduces a substantial amount of repetitive work, and makes testing inefficient. To address these issues, this paper designs and implements Woodpecker+, a tool for user-defined performance test workloads. Its main contributions are as follows: it is easy to use and extensible; it provides a test definition language (TDL) for efficiently constructing test cases and scheduling tests; it offers flexible control over mixed transaction execution and data access distributions; it implements lightweight, fine-grained collection and analysis of statistics; and it supports multiple mainstream DBMSs as well as other databases that provide a database access interface. We demonstrate these features through a detailed set of customized workload experiments running on mainstream DBMSs.
-
Key words:
- database /
- performance test /
- customized workload
-
Tab. 1 Data generation and import keywords

TABLE[tbl_name; tbl_size; col_name col_type, ⋯; PK(col_name, ⋯, auto_increment); FK(col_name, ref_tbl_name, ref_col_name); ⋯; INDEX(index_name, col_name, ⋯); ⋯]
    Defines a table schema. PK is the primary key (composite primary keys are supported), FK is a foreign key, and INDEX is an index on the table; a table may declare multiple foreign keys and indexes.
COLUMN[tbl_name; col_name; null_ratio; cardinality; range1; range2]
    Specifies the data characteristics of a column (null ratio, number of distinct values, and value range).
IMPORT_TBL[tbl_name; ⋯]
    Creates the tables and imports data according to the specified data characteristics.
CLEAR_TBL[tbl_name; ⋯]
    Drops the created tables and the imported data.
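To make the COLUMN semantics concrete, the following is a minimal Python sketch of generating one column's data under a null ratio, a cardinality bound, and a value range. `generate_column` is a hypothetical helper written for illustration, not Woodpecker+'s actual generator.

```python
import random

def generate_column(size, null_ratio, cardinality, lo, hi, seed=42):
    """Generate `size` integer values for one column: roughly a fraction
    null_ratio of the values are None, and the non-null values are drawn
    from a pool of at most `cardinality` distinct values inside [lo, hi]."""
    rng = random.Random(seed)
    # Pick the pool of distinct values first, so cardinality is bounded.
    pool = rng.sample(range(lo, hi + 1), min(cardinality, hi - lo + 1))
    values = []
    for _ in range(size):
        if rng.random() < null_ratio:
            values.append(None)
        else:
            values.append(rng.choice(pool))
    return values

# Mirrors COLUMN[student; STU_AGE; 0.0; 20; 18; 38]: 100 ages,
# no nulls, at most 20 distinct values between 18 and 38.
ages = generate_column(100, 0.0, 20, 18, 38)
```

The generated list can then be bulk-inserted into the target table during IMPORT_TBL.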
Tab. 2 Workload operation keywords

TXN[ratio; txnName] transaction body; END_TXN
    Defines a transaction; ratio is the proportion of this transaction type in the workload, and txnName is the transaction name.
TXN_LOADING[thread_number; thread_run_times; load_machine_number; sync_or_async]
    Loads the workload; the parameters are the total number of database connections, the number of executions per connection, the number of load machines, and the execution mode (synchronous or asynchronous).
MULTIPLE[min, max] operations; END_MULTIPLE
    Simulates real-business while-loop logic; the body executes a random number of times between min and max.
BRANCH[ratio; ⋯] operations; END_BRANCH
    Simulates real-business branching logic; each ratio is the execution proportion of one branch.
BRANCH_DELIMITER
    Separator between the branches of a BRANCH.
INSERT/REPLACE[tbl_name; is_prepared; distribution_type]
    Insert (replace) operation; the parameters are the table name, whether the statement is prepared, and the access distribution of the operation.
SELECT[tbl_name, ⋯; is_prepared; distribution_type; select_expression, ⋯; FILTER; APPEND(expression)]
    Select operation; FILTER is the where clause, and APPEND holds having, order by, group by, and similar clauses.
DELETE[tbl_name, ⋯; is_prepared; distribution_type; FILTER]
    Delete operation.
UPDATE[tbl_name, ⋯; is_prepared; distribution_type; col_name operator updated_value, ⋯; FILTER]
    Update operation; parameter example: c1 += 7 => c1 = c1 + 7; operator may be =, +=, ++, etc.
SELECT_FOR_UPDATE[tbl_name, ⋯; is_prepared; distribution_type; select_expression, ⋯; FILTER; APPEND(statement)]
    Query operation; it locks the rows involved but is otherwise functionally the same as SELECT.
FILTER(col_name operator, &/|, ⋯)
    Filter condition of the where clause; example: (c1 =, &, c2 =) means the equality conditions on c1 and c2 must both hold, with the compared values generated by the specified access distribution; operator may be =, >, <, !=, etc.
APPEND(statement)
    group by, order by, limit, and having clauses.
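The TXN ratio and MULTIPLE keywords above can be sketched as ordinary weighted selection and a random repeat count. This is an illustrative Python model of those semantics, not Woodpecker+'s scheduler; `pick_transaction` and `multiple_times` are hypothetical names.

```python
import random

def pick_transaction(mix, rng):
    """Choose a transaction name according to its TXN ratio
    (the ratios in `mix` are assumed to sum to 1)."""
    r = rng.random()
    acc = 0.0
    for name, ratio in mix:
        acc += ratio
        if r < acc:
            return name
    return mix[-1][0]  # guard against floating-point rounding

def multiple_times(lo, hi, rng):
    """MULTIPLE[lo, hi]: the body runs a random number of times in [lo, hi]."""
    return rng.randint(lo, hi)

# A 40/60 mix, as in TXN[0.4; "AddScores"] / TXN[0.6; "OperateScores"].
rng = random.Random(7)
mix = [("AddScores", 0.4), ("OperateScores", 0.6)]
counts = {"AddScores": 0, "OperateScores": 0}
for _ in range(10000):
    counts[pick_transaction(mix, rng)] += 1
```

Over many iterations the observed mix converges to the declared ratios.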
Tab. 3 Customized table schema and data import example

TABLE[student; 100; STU_ID int, STU_NAME varchar(20), STU_AGE int, STU_DEPT int; PK(STU_ID)];
COLUMN[student; STU_NAME; 0.0; 100; 4; 16];
COLUMN[student; STU_AGE; 0.0; 20; 18; 38];
COLUMN[student; STU_DEPT; 0.0; 10; 1; 10];
TABLE[score; 2000; STU_ID int, SC_AVG Decimal(4, 2), SC_GRADE char; PK(STU_ID)];
COLUMN[score; SC_AVG; 0.0; 30; 50; 100];
IMPORT_TBL[student; score];
Tab. 4 Customized data characteristics transactions example

TXN[0.4; "AddScores"];
MULTIPLE[1, 100];
REPLACE[score; true; NULL];
END_MULTIPLE;
SELECT_FOR_UPDATE[score; true; NORMAL(1, 100, 1);
SC_AVG, SC_GRADE; FILTER(STU_ID =); APPEND(ORDER BY SC_AVG)];
END_TXN;
TXN[0.6; "OperateScores"];
UPDATE[score; true; UNIQUE(1, 2000); SC_GRADE = 'A'; FILTER(SC_AVG > 90)];
BRANCH[0.8; 0.2];
SELECT[score; true; ZIPFIAN(1, 100, 10, 3); "*"; FILTER(STU_ID =); APPEND(GROUP BY SC_GRADE)];
BRANCH_DELIMITER;
DELETE[score; true; UNIFORM(1, 2000); FILTER(SC_AVG = 0)];
END_BRANCH;
END_TXN;
TXN_LOADING[10; 100; 1; async];
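The access distributions used above (UNIFORM, UNIQUE, ZIPFIAN, NORMAL) control which keys an operation touches. Below is a minimal Python sketch of a uniform and a Zipf-like key generator; the weighting scheme (weight 1/rank^s) is a standard Zipf approximation chosen for illustration, and may differ from Woodpecker+'s internal implementation.

```python
import random

def uniform_key(lo, hi, rng):
    """UNIFORM(lo, hi): every key in the range is equally likely."""
    return rng.randint(lo, hi)

def zipfian_key(lo, hi, s, rng):
    """Zipf-like pick: rank r is chosen with weight 1/r**s, so low
    ranks (hot keys) dominate; the rank is mapped onto [lo, hi]."""
    n = hi - lo + 1
    weights = [1.0 / (r ** s) for r in range(1, n + 1)]
    x = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if x < acc:
            return lo + i
    return hi

# With skew s = 1.5 over keys 1..100, most accesses hit the first few keys.
rng = random.Random(1)
hot = sum(1 for _ in range(1000) if zipfian_key(1, 100, 1.5, rng) <= 10)
```

Such skewed generators are what let a TDL workload concentrate contention on a small set of hot rows.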
Tab. 5 Comparison of the cost of constructing a test case with TDL

Benchmark   Tables  Transactions  Java LOC  TDL LOC
TPC-C          9         5          6223       84
SmallBank      3         6          4575       39
SEATS         10         6          7364       48
Voter          3         3          4278       13
YCSB           1         6          4368       12
Tab. 6 Read-intensive workload organization of SmallBank with TDL

// Create three tables, 1 000 000 rows each
TABLE[ACCOUNTS; 1000000; custid int, name varchar(64); PK(custid)];
TABLE[SAVINGS; 1000000; custid int, bal float; PK(custid)];
TABLE[CHECKING; 1000000; custid int, bal float; PK(custid)];
// Import the three created tables
IMPORT_TBL[ACCOUNTS; SAVINGS; CHECKING];
// Transaction group 1, 15% of the workload
TXN[0.15; "txn1"];
SELECT[ACCOUNTS; true; unique(0, 999999); "*"; filter(custid =)];
SELECT[SAVINGS; true; unique(0, 999999); "bal"; filter(custid =)];
SELECT[CHECKING; true; unique(0, 999999); "bal"; filter(custid =)];
END_TXN;
// Transaction group 2, 15% of the workload
TXN[0.15; "txn2"];
SELECT[ACCOUNTS; true; unique(0, 999999); "*"; filter(custid =)];
SELECT[SAVINGS; true; unique(0, 999999); "bal"; filter(custid =)];
SELECT[CHECKING; true; unique(0, 999999); "bal"; filter(custid =)];
END_TXN;
// Transaction group 3, 15% of the workload
TXN[0.15; "txn3"];
SELECT[ACCOUNTS; true; unique(0, 999999); "*"; filter(custid =)];
END_TXN;
// Transaction group 4, 25% of the workload
TXN[0.25; "txn4"];
SELECT[ACCOUNTS; true; unique(0, 999999); "*"; filter(custid =)];
SELECT[CHECKING; true; unique(0, 999999); "bal"; filter(custid =)];
END_TXN;
// Transaction group 5, 15% of the workload
TXN[0.15; "txn5"];
SELECT[ACCOUNTS; true; unique(0, 999999); "*"; filter(custid =)];
SELECT[SAVINGS; true; unique(0, 999999); "bal"; filter(custid =)];
END_TXN;
// Transaction group 6, 15% of the workload
TXN[0.15; "txn6"];
SELECT[ACCOUNTS; true; unique(0, 999999); "*"; filter(custid =)];
SELECT[SAVINGS; true; unique(0, 999999); "bal"; filter(custid =)];
SELECT[CHECKING; true; unique(0, 999999); "bal"; filter(custid =)];
END_TXN;
// Load the workload: 100 connections, 10 000 executions each, 1 load machine, synchronous
TXN_LOADING[100; 10000; 1; sync];
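For comparison with the TDL listing, this is what txn1 (the three point selects in one transaction) boils down to in ordinary driver code. The sketch uses SQLite as an illustrative stand-in, with tables shrunk to 100 rows; it is not Woodpecker+'s execution engine.

```python
import random
import sqlite3

# In-memory stand-in for the three SmallBank tables (tiny sizes for the demo).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ACCOUNTS (custid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE SAVINGS  (custid INTEGER PRIMARY KEY, bal REAL);
CREATE TABLE CHECKING (custid INTEGER PRIMARY KEY, bal REAL);
""")
for i in range(100):
    con.execute("INSERT INTO ACCOUNTS VALUES (?, ?)", (i, f"user{i}"))
    con.execute("INSERT INTO SAVINGS  VALUES (?, ?)", (i, 100.0))
    con.execute("INSERT INTO CHECKING VALUES (?, ?)", (i, 50.0))
con.commit()

def txn1(custid):
    """txn1: point-select the account row plus both balances in one transaction,
    using parameterized (prepared-style) statements."""
    with con:
        acct = con.execute("SELECT * FROM ACCOUNTS WHERE custid = ?", (custid,)).fetchone()
        sav  = con.execute("SELECT bal FROM SAVINGS  WHERE custid = ?", (custid,)).fetchone()
        chk  = con.execute("SELECT bal FROM CHECKING WHERE custid = ?", (custid,)).fetchone()
    return acct, sav, chk

acct, sav, chk = txn1(random.Random(3).randint(0, 99))
```

Every transaction group in Table 6 is a variation of this pattern, which is why a declarative TDL line per statement suffices.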
Tab. 7 Read-intensive workload organization of SmallBank with Sysbench

-- Function that creates and populates the account table
function create_account_table(table_num)
   local query
   query = string.format("CREATE TABLE ACCOUNT%d (custid INTEGER NOT NULL, " ..
      "name VARCHAR(64) DEFAULT '0' NOT NULL, PRIMARY KEY (custid))", table_num)
   con:query(query)
   if sysbench.opt.auto_inc then
      query = "INSERT INTO ACCOUNT" .. table_num .. " (name) VALUES"
   else
      query = "INSERT INTO ACCOUNT" .. table_num .. " (custid, name) VALUES"
   end
   con:bulk_insert_init(query)
   for i = 1, sysbench.opt.table_size do
      name_val = get_name_value()
      if sysbench.opt.auto_inc then
         query = string.format("('%s')", name_val)
      else
         query = string.format("(%d, '%s')", i, name_val)
      end
      con:bulk_insert_next(query)
   end
   con:bulk_insert_done()
end

-- Function that executes the first transaction group
function execute_trx1()
   if not sysbench.opt.skip_trx then
      begin()
   end
   account_point_selects()
   savings_point_selects()
   checking_point_selects()
   if not sysbench.opt.skip_trx then
      commit()
   end
   check_reconnect()
end
Tab. 8 Distributed transaction workload over two Paxos groups

CREATE_CLIENT clt1;
// For three Paxos groups, insert one more record: ('par1#2', 2, 2)
REPLACE INTO __all_all_group(group_name, start_version, paxos_id) VALUES ('par1#0', 2, 0), ('par1#1', 2, 1);
// For three Paxos groups, change the hash function 'x%2' to 'x%3'
INSERT INTO __all_partition_rules(rule_name, rule_par_num, rule_par_list, rule_body, type) VALUES ('func1', 1, 'x', 'x%2', 0);
// For three Paxos groups, import 3 000 000 rows instead
TABLE[USERTABLE; 2000000; YCSB_KEY int, FIELD1 varchar(100), FIELD2 varchar(100), FIELD3 varchar(100);
PARTITION_BY(par1, func1, YCSB_KEY)];
IMPORT_TBL[USERTABLE];
// For three Paxos groups, the unique access ranges of UPDATE and REPLACE become 1 000 000–2 000 000 and 2 000 000–3 000 000
TXN[1.0; "distributed_txn"];
SELECT[USERTABLE; true; unique(0, 1999999); "*"; filter(YCSB_KEY =)];
SELECT[USERTABLE; true; unique(0, 1999999); "*"; filter(FIELD1 >, &, FIELD1 <)];
UPDATE[USERTABLE; true; unique(0, 999999); FIELD2 = "tee"; filter(YCSB_KEY =)];
REPLACE[USERTABLE; true; unique(1000000, 1999999); FIELD2 = "tee"; filter(YCSB_KEY =)];
END_TXN;
TXN_LOADING[100; 12000; 1; sync];
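The partition rule 'x%2' above determines which Paxos group stores each key, which is what makes the transaction distributed: its UPDATE and REPLACE ranges fall on different groups. A one-line Python model of the routing:

```python
def route(key, num_groups):
    """Hash rule 'x % num_groups': maps a YCSB_KEY to its Paxos group.
    num_groups = 2 matches the rule 'x%2' in Table 8; 3 matches 'x%3'."""
    return key % num_groups

# Two groups par1#0 and par1#1 versus three groups:
groups2 = [route(k, 2) for k in (0, 1, 2, 3)]
groups3 = [route(k, 3) for k in (0, 1, 2, 3)]
```

Since the UPDATE touches keys 0–999 999 and the REPLACE touches keys 1 000 000–1 999 999, a single transaction spans rows routed to both groups and must go through distributed commit.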
Tab. 9 Extensible workload test case

TABLE[USERTABLE; 1200000; YCSB_KEY int, FIELD1 varchar(100), FIELD2 varchar(100),
FIELD3 varchar(100), FIELD4 varchar(100), FIELD5 varchar(100), FIELD6 varchar(100),
FIELD7 varchar(100), FIELD8 varchar(100), FIELD9 varchar(100), FIELD10 varchar(100);
PK(YCSB_KEY)];
IMPORT_TBL[USERTABLE];
TXN[0.8; "read_txn"];
SELECT[USERTABLE; true; zipfian(0, 1199999, 10, 3); "*"; filter(YCSB_KEY =)];
END_TXN;
TXN[0.2; "write_txn"];
DELETE[USERTABLE; true; zipfian(0, 1199999, 10, 3); filter(YCSB_KEY =)];
END_TXN;
// The parameters are: maximum thread count, executions per thread, number of load machines, and execution mode.
// The total number of executions can be controlled by adjusting the maximum thread count and the number of load machines.
TXN_LOADING[120; 10000; 3; sync];
-
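How the TXN_LOADING parameters scale the workload can be sketched as simple arithmetic, assuming each load machine starts the full thread count (the reading suggested by the comments in Table 9; `total_operations` is an illustrative helper, not part of the tool):

```python
def total_operations(threads, runs_per_thread, machines):
    """TXN_LOADING[threads; runs; machines; mode]: assuming each load machine
    starts `threads` connections and each connection executes
    `runs_per_thread` transactions, the total transaction count is their product."""
    return threads * runs_per_thread * machines

# TXN_LOADING[120; 10000; 3; sync] from Table 9:
total = total_operations(120, 10000, 3)
```

Under this reading, doubling either the thread count or the number of load machines doubles the executed workload, which is the extensibility knob the comments describe.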