Planet MySQL

Updating InnoDB Table Statistics Manually

In this post, we will discuss how to fix cardinality for InnoDB tables manually.

As a support engineer, I often see situations where the cardinality of a table is not correct. When InnoDB calculates the cardinality of an index, it does not scan the full table by default. Instead, it looks at random pages, as determined by the options innodb_stats_sample_pages, innodb_stats_transient_sample_pages and innodb_stats_persistent_sample_pages, or by the CREATE TABLE option STATS_SAMPLE_PAGES. The default value for persistent statistics is 20 pages. This approach works fine when the number of unique values in your secondary key grows in step with the size of the table. But what if you have a column with a comparatively small number of unique values? This could be a common service table or a many-to-many relationship table, for example, or just a table containing a list of sales orders that belong to one of a dozen shops owned by the company. Such tables can grow to billions of rows with a small (fewer than 100) number of unique shop IDs.
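To see which sampling settings are in effect on a server, you can query the variables directly; a quick sketch (the LIKE pattern matches all three variables mentioned above):

SHOW GLOBAL VARIABLES LIKE 'innodb_stats%sample_pages';
-- and check whether persistent statistics are enabled at all
SHOW GLOBAL VARIABLES LIKE 'innodb_stats_persistent';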

At some point, InnoDB will report wrong values for such indexes. Really! If 20 sampled pages contain 100 unique shop IDs, how many unique shop IDs would 20,000 pages have? 100 times 1,000? To the sampling algorithm this extrapolation seems logical, and after a certain number of rows such indexes end up with extraordinarily large cardinality values.

ANALYZE TABLE will not help, because it uses the same algorithm. Increasing the number of “stats” sample pages would help, but it has its own downside: the more pages you have to examine, the slower ANALYZE TABLE runs. While this command is not blocking, it still creates side effects as described in this blog post. And the longer it runs, the less control you have.
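For completeness, raising the sampling for a single table and re-analyzing looks like this (a sketch with a hypothetical table name and an arbitrary page count; the higher the value, the slower ANALYZE TABLE becomes):

ALTER TABLE orders STATS_SAMPLE_PAGES = 2000;
ANALYZE TABLE orders;
-- optionally restore the server-wide default afterwards
ALTER TABLE orders STATS_SAMPLE_PAGES = DEFAULT;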

Another issue with InnoDB statistics: even if they are persistent and STATS_AUTO_RECALC is set to 0, InnoDB still updates values for secondary indexes, as shown in lp:1538765. Eventually, after you insert millions of rows, your statistics become badly inaccurate. ANALYZE TABLE can fix this only if you specify a very large number of “stats” sample pages. Can we do anything about it?

InnoDB stores statistics in the “mysql” database, in the tables innodb_table_stats and innodb_index_stats. Since they are regular MySQL tables, privileged users can access them. We can update them and modify statistics as we like. And these statistics are used by the Optimizer!

I created a small example showing how to do this trick. I used Percona Server for MySQL version 5.7.19, but the trick will work on any supported MySQL and Percona Server for MySQL version.

First, let’s create the test tables. The first table, shops, holds shop profiles with the shop ID and name:

create table shops(
    shop_id int not null auto_increment primary key,
    name varchar(32)
) engine=innodb;

The second table refers to the “shops” table:

create table goods(
    id int not null auto_increment primary key,
    shop_id int not null,
    name varchar(32),
    create_date datetime DEFAULT NULL,
    key (shop_id, create_date)
) engine=innodb;
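The original post does not show how the test data was generated. One possible way to load 100 shops and a few million goods rows is a throwaway stored procedure along these lines (a sketch only; the procedure name, row counts and random values are illustrative, and a real load would batch the inserts for speed):

DELIMITER //
CREATE PROCEDURE fill_test_data(IN num_goods INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    -- 100 shops
    WHILE i < 100 DO
        INSERT INTO shops(name) VALUES (CONCAT('shop-', i));
        SET i = i + 1;
    END WHILE;
    -- goods rows spread randomly over the shops and a few years of dates
    SET i = 0;
    WHILE i < num_goods DO
        INSERT INTO goods(shop_id, name, create_date)
        VALUES (FLOOR(1 + RAND() * 100),
                MD5(RAND()),
                NOW() - INTERVAL FLOOR(RAND() * 1500) DAY);
        SET i = i + 1;
    END WHILE;
END//
DELIMITER ;

CALL fill_test_data(8000000);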

Let’s check how many unique shops we have:

mysql> select count(distinct shop_id) from shops;
+-------------------------+
| count(distinct shop_id) |
+-------------------------+
|                     100 |
+-------------------------+
1 row in set (0.02 sec)

With 100 distinct shops, and a key on (shop_id, create_date), we expect the cardinality in table goods to be not much different from this query result:

mysql> select count(distinct id) as `Cardinality for PRIMARY`,
    -> count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`,
    -> count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id`
    -> from goods
*************************** 1. row ***************************
Cardinality for PRIMARY: 8000000
Cardinality for shop_id column in index shop_id: 100
Cardinality for create_date column in index shop_id: 169861
1 row in set (2 min 8.74 sec)

However, 

SHOW INDEX returns dramatically different values for the column shop_id:mysql> show index from goods; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | goods | 0 | PRIMARY | 1 | id | A | 7289724 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 1 | shop_id | A | 13587 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 2 | create_date | A | 178787 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.09 sec)

ANALYZE TABLE does not help:mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.88 sec) mysql> show index from goods; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | goods | 0 | PRIMARY | 1 | id | A | 7765796 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 1 | shop_id | A | 14523 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 2 | create_date | A | 168168 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.00 sec)

As a result, if we join the two tables, Optimizer chooses the wrong

JOIN order and query execution plan:mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | 1 | SIMPLE | shops | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 100 | 100.00 | Using where; Using index | | 1 | SIMPLE | goods | NULL | ref | shop_id | shop_id | 4 | test.shops.shop_id | 534 | 11.11 | Using index condition | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.13 sec) mysql> P md5sum PAGER set to 'md5sum' mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); 4a94dabc4bfbfb7dd225bcb50278055b - 31896 rows in set (43.32 sec)

If compared to 

STRAIGHT_JOIN order:mysql> explain select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | 1 | SIMPLE | goods | NULL | range | shop_id | shop_id | 10 | NULL | 31997 | 100.00 | Using index condition | | 1 | SIMPLE | shops | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.goods.shop_id | 1 | 100.00 | Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ 2 rows in set, 1 warning (0.14 sec) mysql> P md5sum PAGER set to 'md5sum' mysql> select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); 4a94dabc4bfbfb7dd225bcb50278055b - 31896 rows in set (7.94 sec)

The time difference for a small 8M row table is around six times! For a big table with many columns, it would be even larger.

Is STRAIGHT_JOIN the only solution for this case?

No! It’s also not a great solution, because if the query is complicated and involves more than two tables, the hard-coded join order may be affected by bug fixes and improvements in the Optimizer code: the order that is optimal today might not be optimal after an upgrade. Therefore, you’d need to re-test such queries at each upgrade, including minor ones.

So why does ANALYZE TABLE not work? Because the default number of pages it uses to calculate statistics is too small to capture the real distribution. You can increase the table option STATS_SAMPLE_PAGES until you find a value that works. The drawback is that the greater you set STATS_SAMPLE_PAGES, the longer ANALYZE TABLE takes to finish. Also, if you update a large portion of the table, you are often affected by lp:1538765, and at some point the statistics become inaccurate again. Now let’s try our manual statistics update trick.

InnoDB stores its persistent statistics in the tables

mysql.innodb_table_stats  and mysql.innodb_index_stats:mysql> alter table goods stats_persistent=1, stats_auto_recalc=0; Query OK, 0 rows affected (0.11 sec) Records: 0 Duplicates: 0 Warnings: 0 +---------------+------------+---------------------+---------+----------------------+--------------------------+ | database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes | +---------------+------------+---------------------+---------+----------------------+--------------------------+ | test | goods | 2017-09-05 00:21:12 | 7765796 | 34624 | 17600 | +---------------+------------+---------------------+---------+----------------------+--------------------------+ 1 row in set (0.00 sec) mysql> select * from mysql.innodb_index_stats where table_name='goods'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | test | goods | PRIMARY | 2017-09-05 00:21:12 | n_diff_pfx01 | 7765796 | 20 | id | | test | goods | PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages | 34484 | NULL | Number of leaf pages in the index | | test | goods | PRIMARY | 2017-09-05 00:21:12 | size | 34624 | NULL | Number of pages in the index | | test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx01 | 14523 | 20 | shop_id | | test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx02 | 168168 | 20 | shop_id,create_date | | test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx03 | 8045310 | 20 | shop_id,create_date,id | | test | goods | shop_id | 2017-09-05 00:21:12 | n_leaf_pages | 15288 | NULL | Number of leaf pages in the index | | test | goods | shop_id | 2017-09-05 00:21:12 | size | 17600 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ 8 rows in set (0.00 sec)

And we can update these tables directly:

mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.18 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.08 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.08 sec)
Rows matched: 1  Changed: 1  Warnings: 0

I took index values from earlier, as calculated by this query:

select count(distinct id) as `Cardinality for PRIMARY`,
       count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`,
       count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id`
from goods;

mysql> select * from mysql.innodb_table_stats where table_name='goods'; +---------------+------------+---------------------+---------+----------------------+--------------------------+ | database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes | +---------------+------------+---------------------+---------+----------------------+--------------------------+ | test | goods | 2017-09-05 00:47:45 | 8000000 | 34624 | 17600 | +---------------+------------+---------------------+---------+----------------------+--------------------------+ 1 row in set (0.00 sec) mysql> select * from mysql.innodb_index_stats where table_name='goods'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | test | goods | PRIMARY | 2017-09-05 00:48:32 | n_diff_pfx01 | 8000000 | 20 | id | | test | goods | PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages | 34484 | NULL | Number of leaf pages in the index | | test | goods | PRIMARY | 2017-09-05 00:21:12 | size | 34624 | NULL | Number of pages in the index | | test | goods | shop_id | 2017-09-05 00:49:13 | n_diff_pfx01 | 100 | 20 | shop_id | | test | goods | shop_id | 2017-09-05 00:49:26 | n_diff_pfx02 | 169861 | 20 | shop_id,create_date | | test | goods | shop_id | 2017-09-05 00:48:32 | n_diff_pfx03 | 8000000 | 20 | shop_id,create_date,id | | test | goods | shop_id | 2017-09-05 00:21:12 | n_leaf_pages | 15288 | NULL | Number of leaf pages in the index | | test | goods | shop_id | 2017-09-05 00:21:12 | size | 17600 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ 8 rows in set (0.00 sec)

Now the statistics are up to date, but not yet used by the Optimizer:

mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | 1 | SIMPLE | shops | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 100 | 100.00 | Using where; Using index | | 1 | SIMPLE | goods | NULL | ref | shop_id | shop_id | 4 | test.shops.shop_id | 534 | 11.11 | Using index condition | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.04 sec)

To finalize the changes, we need to run

FLUSH TABLE goods:mysql> FLUSH TABLE goods; Query OK, 0 rows affected (0.00 sec) mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | 1 | SIMPLE | goods | NULL | range | shop_id | shop_id | 10 | NULL | 31997 | 100.00 | Using index condition | | 1 | SIMPLE | shops | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.goods.shop_id | 1 | 100.00 | Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ 2 rows in set, 1 warning (0.28 sec) mysql> P md5sum PAGER set to 'md5sum' mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); 4a94dabc4bfbfb7dd225bcb50278055b - 31896 rows in set (7.79 sec)

Now everything is good.

But FLUSH TABLE is a blocking operation, right? Won’t it block queries and create a worse scenario than described for ANALYZE TABLE in this post?

At first glance this is true. But we can use the same trick Percona Toolkit uses: set lock_wait_timeout to 1 and call FLUSH in a loop. To demonstrate how it works, I use a similar scenario as described in the ANALYZE TABLE blog post.

First, let’s reset the statistics to ensure our

FLUSH works as expected:mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.38 sec) mysql> show indexes from goods; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | goods | 0 | PRIMARY | 1 | id | A | 7765796 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 1 | shop_id | A | 14523 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 2 | create_date | A | 168168 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.00 sec)

And then update

mysql.innodb_*_stats tables manually. Then check that Optimizer still sees outdated statistics:mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods'; Query OK, 1 row affected (0.09 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods'; Query OK, 2 rows affected (0.09 sec) Rows matched: 2 Changed: 2 Warnings: 0 mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods'; Query OK, 1 row affected (0.11 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods'; Query OK, 1 row affected (0.10 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> show indexes from goods; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | goods | 0 | PRIMARY | 1 | id | A | 7765796 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 1 | shop_id | A | 14523 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 2 | create_date | A | 168168 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.00 sec)

Now let’s start a long running query in one session that blocks our FLUSH TABLE command:

mysql> select sleep(1) from goods limit 1000, 300;

And let’s run FLUSH TABLE in a loop:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ until (`mysqlmtr -P13001 -e "set lock_wait_timeout=1; flush table goods;" test`); do sleep 1; done
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
...

Now let’s ensure we can access the table:

mysql> select * from goods order by id limit 10; ^C

We cannot! We cannot even connect to the database where the table is stored:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysqlmtr -P13001 test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A ^C

The reason for this is that although the FLUSH TABLE command was killed due to the metadata lock wait timeout, it had already requested a table lock for flushing, and that request blocked other incoming queries.

But we can enclose FLUSH TABLE in LOCK TABLE ... WRITE; ... UNLOCK TABLES; operations. In this case, the LOCK TABLE command is blocked until all queries release their metadata locks on the table. Then it locks the table exclusively, FLUSH TABLE runs, and the script immediately unlocks the table. Since closing the session causes an implicit unlock, I used a PHP one-liner to keep everything in a single session:

$ php -r '
> $link = new mysqli("127.0.0.1", "root", "", "test", 13001);
> $link->query("set lock_wait_timeout=1");
> while(!$link->query("lock table goods write")) {sleep(1);}
> $link->query("flush table goods");
> $link->query("unlock tables");'
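The same sequence, expressed as the plain SQL statements that the PHP loop keeps retrying, is simply:

SET lock_wait_timeout = 1;
LOCK TABLE goods WRITE;   -- retried in the loop until it succeeds
FLUSH TABLE goods;
UNLOCK TABLES;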

We can confirm that a parallel session can access the table:

mysql> select * from goods order by id limit 10; +----+---------+----------------------------------+---------------------+ | id | shop_id | name | create_date | +----+---------+----------------------------------+---------------------+ | 1 | 58 | 5K0z2sHTgjWKKdryTaniQdZmjGjA9wls | 2015-09-19 00:00:00 | | 2 | 17 | xNll02kgUTWAFURj6j5lL1zXAubG0THG | 2013-10-19 00:00:00 | | 3 | 30 | clHX7uQopKmoTtEFH5LYBgQncsxRtTIB | 2017-08-01 00:00:00 | | 4 | 93 | bAzoQTN98AmFjPOZs7PGfbiGfaf9Ye4b | 2013-02-24 00:00:00 | | 5 | 20 | rQuTO5GHjP60kDbN6WoPpE2S8TtMbrVL | 2017-08-05 00:00:00 | | 6 | 37 | WxqxA5tBHxikaKbuvbIF84H9QuaCnqQ3 | 2013-10-18 00:00:00 | | 7 | 13 | DoYnFpQZSVV8UswBsWklgGBUc8zW9mVW | 2017-02-06 00:00:00 | | 8 | 81 | dkNxMQyZNZuTrONEX4gxRLa0DOedatIs | 2015-07-05 00:00:00 | | 9 | 12 | Z0t2uQ9itexpPf01KUpa7qBWlT5fBmXR | 2014-06-25 00:00:00 | | 10 | 90 | 6urABBQyaUVVyxljvd11D3kUxbdDRPRV | 2013-10-23 00:00:00 | +----+---------+----------------------------------+---------------------+ 10 rows in set (0.00 sec) mysql> update goods set name='test' where id=100; Query OK, 1 row affected (0.08 sec) Rows matched: 1 Changed: 1 Warnings: 0

After the PHP script finishes its job, statistics are corrected:

mysql> show index from goods; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | goods | 0 | PRIMARY | 1 | id | A | 8000000 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 1 | shop_id | A | 100 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 2 | create_date | A | 169861 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.00 sec)

Conclusion

We can manually update persistent InnoDB statistics to fix Optimizer plans for our queries, with almost no impact on a live server.

MySQL Team speaking at Percona Live Dublin 2017


From September 25th to 27th 2017, the MySQL engineers and community team will be speaking at Percona Live Europe in Dublin.

MySQL is also one of the contributing sponsors.

For this conference, we tried to bring new faces to the Community. Usually it’s the same well-known team leaders speaking at conferences; this time it’s not. Many key developers will present their own work. We have so many talented people in MySQL whom we want to introduce to our community.

Of course, once again we will focus our talks on MySQL 8.0. This is the list of sessions the MySQL Team will deliver:

Monday, Sep 25th

On Monday, I will deliver a tutorial with my friend Kenny from Percona. We will focus on the migration from a Master/Slave environment to a MySQL InnoDB Cluster running on MySQL 8.0. We will also highlight the latest changes in Group Replication. Bring your laptop with VirtualBox 5.1 if you plan to attend.

Tuesday, Sep 26th and Wednesday, Sep 27th

Geir will be part of the keynote this time to tell us about the current state of Sakila. I’ll also have the pleasure of delivering a session and a demo with Jan.

I’m really looking forward to this now-traditional European conference, this time in a new location: after London and Amsterdam, Dublin has been chosen to host it.

Don’t forget to also come and meet the MySQL Team during the Community Dinner!

Upcoming Webinar Tuesday September 12: Differences between MariaDB® and MySQL®

Join Percona’s Chief Evangelist, Colin Charles (@bytebot) as he presents Differences Between MariaDB and MySQL on Tuesday, September 12, 2017, at 7:00 am PDT / 10:00 am EDT (UTC-7).

Reserve Your Spot


Are they syntactically similar? Where do these two query languages differ? Why would I use one over the other?

MariaDB is on the path to gradually diverge from MySQL. One obvious example is the internal data dictionary currently under development for MySQL 8. This is a major change to the way metadata is stored and used within the server, and MariaDB doesn’t have an equivalent feature. Implementing this feature could mark the end of datafile-level compatibility between MySQL and MariaDB.

There are also non-technical differences between MySQL and MariaDB, including:

  • Licensing: MySQL offers their code as open-source under the GPL, and provides the option of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB can only use the GPL because their work is derived from the MySQL source code under the terms of that license.
  • Support services: Oracle provides technical support, training, certification and consulting for MySQL, while MariaDB has their own support services. Some people will prefer working with smaller companies, as traditionally it affords them more leverage as a customer.
  • Community contributions: MariaDB touts the fact that they accept more community contributions than Oracle. Part of the reason for this disparity is that developers like to contribute features, bug fixes and other code without a lot of paperwork overhead (and they complain about the Oracle Contributor Agreement). However, MariaDB has its own MariaDB Contributor Agreement — which more or less serves the same purpose.

Colin will take a look at some of the differences between MariaDB and MySQL and help answer some of the common questions our Database Performance Experts get about the two databases.

You can register for the webinar here.

Colin Charles, Percona Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, worked at MySQL from 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He’s well known within open source communities in APAC and has spoken at many conferences.

Building a Web App with AdonisJS

AdonisJS is a Node.js MVC framework. It offers a stable ecosystem for writing web servers, so that you can focus on business needs rather than on deciding which package to use. In this tutorial, I’ll show you how to build a web app with AdonisJS.

What We'll Be Building

In order to see how to build applications with AdonisJS, we’ll build a simple task list (todo) application. We’ll be using AdonisJS 4.0 in this tutorial. Below is a demo of what the final application will look like:

Requirements

This tutorial assumes you have the following installed on your computer:

  • Node.js 8.0 or greater
  • Npm 3.0 or greater
Installing Adonis CLI

We need to first install the Adonis CLI which will help us in creating new AdonisJS applications and also comes with some useful commands:

npm i -g @adonisjs/cli

Create new project

We'll start by creating a new AdonisJS application. We'll make use of the adonis CLI.

adonis new adonis-tasks

The command above will create a new AdonisJS application with the name adonis-tasks using the fullstack app boilerplate. To make sure everything is working as expected, let’s run the newly created app. First, we cd into adonis-tasks and run the command below:

adonis serve --dev

Then visit http://127.0.0.1:3333 in your browser, and you should get something similar to the image below:

Good! Let’s now start fleshing out the application.

Database and Migration

We’ll start by structuring the application database. We’ll be using the AdonisJS migration schema to define our application’s database schema. Before we dive into the migration, let’s quickly take time to set up our database. For the purpose of this tutorial, we’ll be using MySQL, so we need to install the Node.js driver for MySQL:

npm install mysql --save

Next, we need to let AdonisJS know we are using MySQL. Taking a look at config/database.js, you’ll see config settings for different databases, including MySQL. Though we could easily enter our MySQL settings directly in the config file, that would mean changing these settings every time we change our application environment (development, staging, production, etc.), which is bad practice. Instead, we’ll make use of environment variables, and depending on the environment our application is running in, it will pull the settings for that environment. AdonisJS has us covered here. All we have to do is enter our config settings in the .env file.

So, open .env and add the snippet below to it:

// .env
DB_CONNECTION=mysql
DB_HOST=localhost
DB_DATABASE=adonis-tasks
DB_USER=root
DB_PASSWORD=

Remember to update the database name, username and password accordingly with your own database settings.
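The tutorial assumes the adonis-tasks database already exists on your MySQL server; if it doesn’t, you can create it first (a plain SQL sketch; the character set shown is just a sensible default, not something the tutorial mandates):

CREATE DATABASE IF NOT EXISTS `adonis-tasks`
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;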

For simplicity, our application will have only one database table, which we’ll call tasks. The tasks table will contain four fields: id, title, created_at and updated_at. We’ll make use of the adonis make:migration command to create the migration file:

adonis make:migration tasks

At the prompt, choose the Create table option and press Enter. This will create a new file within the database/migrations directory. The file name is a timestamp followed by the name of the schema (in my case 1504289855390_tasks_schema.js). Open this file and update the up() method as below:

// database/migrations/1504289855390_tasks_schema.js
up () {
  this.create('tasks', (table) => {
    table.increments()
    table.string('title')
    table.timestamps()
  })
}

The increments() will create an id field with auto-increment and set it as the primary key. The timestamps() will create the created_at and updated_at fields respectively. With that done, we can run the migration:

adonis migration:run
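If you peek at the database after running the migration, the generated table should look roughly like the sketch below (illustrative only; the exact column types depend on the AdonisJS/Knex version in use):

SHOW CREATE TABLE tasks;
-- roughly equivalent to:
-- CREATE TABLE tasks (
--   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
--   title VARCHAR(255),
--   created_at DATETIME,
--   updated_at DATETIME
-- );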

With our database and table set up, let’s now create a model. We’ll call it Task. Though we won’t be making extensive use of the model in this tutorial, we’ll use models over plain database queries because they bring ease of use, provide an expressive API to drive the data flow, and allow us to use Lucid (the AdonisJS ORM). To make a model, we use the adonis CLI make:model command:

adonis make:model Task

This will create a Task.js within the app/Models directory.

Creating Application Routes

Open start/routes.js and update with the snippet below:

// start/routes.js
Route.get('/', 'TaskController.index')
Route.post('tasks', 'TaskController.store')
Route.delete('tasks/:id', 'TaskController.destroy')

We define three routes for our task list application. The first route will serve as our application landing page. It is bound to the index() method of the TaskController (which we’ll create shortly). The second route is a POST request which will handle adding a new task to the task list. It is bound to the store() method of the TaskController. Lastly, we have a route with a DELETE request which will handle deleting a task. It takes the ID of the task to be deleted as a parameter. It is bound to the destroy() method of the TaskController.

Creating The Task Controller

Having defined our application’s routes and bound them to methods on the TaskController, it’s time to create the TaskController itself. Again, we’ll use the adonis CLI command:

adonis make:controller Task

At the prompt, choose the For HTTP requests option and press Enter. Now we have a TaskController.js file within the app/Controllers/Http directory.

Note: Before we ran the make:controller command, the app/Controllers/Http directory wasn’t present. It was created after running the command.

As we have seen from the routes above, the TaskController will have 3 methods (index(), store(), and destroy()). Open TaskController.js and add the following code into it:

// app/Controllers/Http/TaskController.js
// remember to reference the Task model at the top
const Task = use('App/Models/Task')

async index ({ view }) {
  const tasks = await Task.all()
  return view.render('tasks.index', { tasks: tasks.toJSON() })
}

The index() method simply fetches all the tasks that have been created from the database and renders a view. AdonisJS uses ES7 async/await and ES6 object destructuring. The fetched tasks are then passed to a view file, tasks.index (which we’ll create shortly).

Creating Master Layout

AdonisJS makes use of Edge as its templating engine which has support for layouts. We are going to create a master layout for our application. All view files must be within the resources/views directory. So within the directory, let’s create a new view file and name it master.edge. Edge files have the .edge extension. Open the newly created file and paste the following code in it:

<!-- resources/views/master.edge --> <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>Task List</title> {{ css('https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css') }} {{ css('https://cdnjs.cloudflare.com/ajax/libs/bulma/0.5.1/css/bulma.min.css') }} </head> <body> <section class="section"> <div class="container"> <div class="columns"> <div class="column is-8 is-offset-2"> @!section('content') </div> </div> </div> </section> </body> </html>

We are using the Bulma CSS framework, and we use the AdonisJS view’s css() global to reference our CSS files on a CDN. The layout is simple: it contains only one section, content.

Tip: The ! within @!section() indicates that it is a self-closing section.

Creating The Task View

For simplicity, our task list application will have just one view file, and everything view-specific will be done within it. We are going to place this view within a tasks directory. Create a new directory named tasks within the resources/views directory, then within the tasks directory, create a new view file and name it index.edge. Now, open the index.edge file and paste the following code into it:

<!-- resources/views/tasks/index.edge --> @layout('master') @section('content') <div class="box"> <h1 class="title">Task List</h1> <table class="table is-bordered is-striped is-narrow is-fullwidth"> <thead> <tr> <th>SN</th> <th>Title</th> <th>Action</th> </tr> </thead> <tbody> @each(task in tasks) <tr> <td> {{ ($loop.index + 1) }} </td> <td> {{ task.title }} </td> <td> <button class="button is-danger is-outlined"> <span>DELETE</span> <span class="icon is-small"> <i class="fa fa-times" aria-hidden="true"></i> </span> </button> </td> </tr> @else <tr> <td colspan="3" class="has-text-centered">No task created yet!</td> </tr> @endeach </tbody> </table> </div> @endsection

First, we indicate that we are using the master layout we created above. We simply display the tasks in a table; if there are no tasks, we display an appropriate message. For the SN of the tasks, we use the index property of Edge’s $loop variable. The index property holds the iteration index, which starts from 0, hence the addition of 1. Lastly, we have a delete button which does nothing for now.

If we visit the application in the browser, since we haven’t added any tasks yet, we should get something similar to the image below:

Adding New Task

Let’s update the index.edge file to include a form for adding a new task. Add the following code immediately after @section('content'):

<!-- resources/views/tasks/index.edge --> <div class="box"> <h2 class="title">New Task</h2> <form action="/tasks" method="POST"> {{ csrfField() }} <div class="field has-addons"> <div class="control is-expanded"> <input class="input" type="text" name="title" value="{{ old('title', '') }}" placeholder="Task title"> </div> <div class="control"> <button type="submit" class="button is-primary"> Add Task </button> </div> </div> {{ elIf('<p class="help is-danger">$self</p>', getErrorFor('title'), hasErrorFor('title')) }} </form> </div>

It’s a simple form with one field for the title of the task. We also add a CSRF field, since AdonisJS protects us against CSRF attacks by default. Lastly, we display a validation error message if the form fails validation.

Next, we need to create the store() method that will handle adding a new task to the database. Before we create this method, let’s quickly set up the Adonis validator, which will be used for validating our form. The validator is not installed by default, so we need to install it first:

adonis install @adonisjs/validator

Next, we need to register the provider inside start/app.js:

const providers = [
  ...
  '@adonisjs/validator/providers/ValidatorProvider'
]

Now, let’s create the store() in TaskController.js. Paste the snippet below just after the index():

// app/Controllers/Http/TaskController.js
// remember to reference the Validator at the top
const { validate } = use('Validator')

async store ({ request, response, session }) {
  // validate form input
  const validation = await validate(request.all(), {
    title: 'required|min:3|max:255'
  })

  // show error messages upon validation fail
  if (validation.fails()) {
    session.withErrors(validation.messages()).flashAll()
    return response.redirect('back')
  }

  // persist to database
  const task = new Task()
  task.title = request.input('title')
  await task.save()

  // Flash success message to session
  session.flash({ notification: 'Task added!' })

  return response.redirect('back')
}

First, we validate the request input coming from the form against some rules. If the validation fails, we simply save the validation messages to the session and return to the form with the error messages. If everything goes well, we persist the new task to the database, flash a notification message indicating that the task was added successfully, and redirect back to the form.

With that done, we can now add tasks to the task list. You should get something similar to the image below:

Deleting a Task

The last functionality our task list application will have is “deleting tasks”. To achieve this, we need to update the dummy delete button created earlier to include an actual form for deleting a specific task. Replace the delete button entirely with the code below:

<!-- resources/views/tasks/index.edge --> <form action="{{ 'tasks/' + task.id + '?_method=DELETE' }}" method="POST"> {{ csrfField() }} <button type="submit" class="button is-danger is-outlined"> <span>DELETE</span> <span class="icon is-small"> <i class="fa fa-times" aria-hidden="true"></i> </span> </button> </form>

Remember that the route handling task deletion accepts the ID of the task as a parameter, so we attach the task ID to the form action. Also, we pass the request method (DELETE in this case) as a query string. This is the AdonisJS way of doing method spoofing, since HTML forms aren’t capable of making requests other than GET and POST.

Next, we add the destroy() to TaskController.js. Paste the code below into it just after the store():

// app/Controllers/Http/TaskController.js
async destroy ({ params, session, response }) {
  const task = await Task.find(params.id)
  await task.delete()

  // Flash success message to session
  session.flash({ notification: 'Task deleted!' })

  return response.redirect('back')
}

We first get the ID of the task from the params object and use it to retrieve the task from the database, then delete the task. Lastly, we flash an appropriate message and redirect back to the page.

Below is what we get when we delete the task added above:

Conclusion

That’s it. We have been able to build a simple application with AdonisJS. Though this tutorial only covered the basics of AdonisJS, it should get you started in building your application with AdonisJS. I hope you find this tutorial helpful. If you have any questions, suggestions, comments, kindly leave them below.

Monitoring MySQL Problematic Queries

This blog describes how to identify queries that cause a sudden spike in system resources as well as the user and host who executed the culprit query using the Monyog MySQL Monitor and Advisor.

How many times have you seen a system go live and perform much worse than it did in testing? There could be several reasons behind bad performance. For instance, a slow running query in MySQL can be caused by a poor database design or may be due to higher-than-normal latency in network communication. Other issues such as using too few or too many indexes may also be a factor. This blog will identify the types of poorly performing queries and outline some concrete strategies for identifying them using monitoring. Finally, some tips for improving performance will be presented.

The Effects of Misbehaving Queries

Typically, misbehaving queries will result in two possible outcomes: high CPU usage and/or slow execution. The two issues tend to be related to some degree, because one will lead to or exacerbate the other. Depending on the root cause, the problem may be a database configuration or query issue. For instance, as MySQL databases grow in size, tables get fragmented over time. This contributes to MySQL load spikes. Protecting a server from MySQL high CPU issues requires close monitoring and periodic optimization of the database. Meanwhile, a query that requires a high degree of type conversion will also place a burden on the CPU.
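As an aside, checking for and reclaiming fragmented space is straightforward from SQL; a generic sketch (the schema and table names are placeholders):

-- data_free gives a rough idea of reclaimable space per table
SELECT table_name, data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = 'mydb'
ORDER BY data_free DESC;

-- for InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE, which rebuilds the table
OPTIMIZE TABLE mydb.orders;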

Identifying the culprit requires a different approach based on the dominant outcome: high CPU usage and/or slow execution. In the next sections, we will examine how to track down both causes.

Some Well-known Causes of Slow-running Queries

Without knowing the root cause of a slow-running query, it’s difficult for a DBA to troubleshoot the problem. Therefore, the first step should be to check the health of all the database components before turning to a query monitor or optimizer. This check helps establish whether the root cause of the problem is related to a query or to something else. Here are a few potential causes to consider:

  • Network latency: Is the slowness limited to a specific query, batch process, database, or are other network resources suffering as well?
  • Poor index design, or queries that access the wrong index, can also be behind an SQL performance issue.
  • Choosing a slow execution plan may degrade performance.
  • Running a single query at a time may go smoothly, but check if running multiple queries at the same time hampers server performance.
  • If someone is experiencing a bad performance issue with database components, then a System Monitor can be quite helpful. By employing a System Monitor, performance of both database and non-database components can be monitored.
  • Ad hoc SQL queries that are run outside of a stored procedure: stored procedures almost always offer better performance because MySQL can cache their execution plans; ad hoc queries should, whenever feasible, be converted to stored procedures.
  • Long-running or CPU-heavy queries in execution plans. Table scan operations indicate the lack of a suitable index, and putting an index in place to eliminate the table scan can have an immediate and positive effect on performance. (A quick way to find such statements is sketched just after this list.)
  • Queries that include a large number of joins. Joins take time, and while MySQL Server is obviously designed to handle them, a large number of joins can really slow things down. A good general rule of thumb is to limit the number of joins to seven; if you have more than that, you may have to start looking at ways to cut back.
  • A slow-running query that always runs slowly. This is a query that could perhaps be rewritten to perform better. A query that runs slowly some of the time is one that’s likely being affected by outside factors, such as locks or resource contention.
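On MySQL 5.7 and later with the sys schema installed, one quick way to find the statements doing full table scans mentioned in the list above (independent of any monitoring tool) is:

SELECT query, db, exec_count, no_index_used_count
FROM sys.statements_with_full_table_scans
ORDER BY no_index_used_count DESC
LIMIT 10;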
Employing Monyog Tools

Using a monitoring and profiling tool such as Monyog helps in tracking down performance issues that are related to queries. Monyog can display long-running queries (queries that hold a large amount of resources) in MySQL, as well as a host of other potential issues, such as hanging threads and improper index usage – i.e. over- or under-utilization.

The Overview Page

As soon as you’ve logged into Monyog, the Overview page provides a high level picture of all the selected servers registered with Monyog. Below the count of the total servers registered with Monyog, total number of disconnected servers, and servers having critical alerts and warnings, you’ll find the top 10 queries across the selected servers in Monyog, based on total execution time:

Monyog Overview – Top 10 MySQL queries

Queries with the longest execution times are positioned at the top of the list, giving you an immediate place to start looking at bottlenecks. You can click on a query to get more details; these include the list of server names on which the particular query was executed. Clicking on the server names will open the sniffer for that server with the time-range selected as the first and last seen of the query.

Index Usage

Beyond the Overview page, the Index Usage monitor group is a good starting point for identifying the presence of table scans. You should always try to keep these as low as possible by building indexes on searchable fields.

Although this screen does not relay information about specific queries, the monitor groups in Monyog that read the MySQL Slow Query Log do provide that information (covered in the next section). Once you have identified the problematic queries, you can create the appropriate indexes or rewrite the queries to use indexes.

Monitors tab – Index usage

Examining the MySQL Slow Query Log

The MySQL slow query log is a log that MySQL sends slow, potentially problematic queries to. Generally the queries that are logged are those that take longer than a specified amount of time to execute or queries that do not properly hit indexes.  Queries that do not use an index may not be slow if there are only a few hundred or few thousand records in the table(s) involved. But they are ‘potentially slow’ and should be identified if they access tables, which will continue to grow.

This logging functionality comes with MySQL but is turned off by default.  You can check whether or not it’s turned on from the MySQL Logs screen.

Monitors tab – MySQL logs
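If the log turns out to be disabled, it can also be enabled at runtime from any SQL session; a minimal sketch (these are standard MySQL variables; the one-second threshold is only an example):

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;                     -- seconds
SET GLOBAL log_queries_not_using_indexes = 'ON';
SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file';   -- where the entries go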

The No. of Slow Queries relays how many queries are taking longer than the Min. execution time for a query to be considered slow threshold (based on the MySQL long_query_time variable).  On the right, there is a chart icon that, when clicked, opens the TREND VALUES chart.

Selecting “History” from the TIMEFRAME dropdown at the top of the Monitors page allows us to group trend data by minutes, hours, days, weeks, months, or years.  More precise timeframes help to more accurately tie the offending query to the slowdown event:

Monitors tab – No. of Slow Queries

Thread Monitors

Accessible from the left-hand button bar (refer to the highlighted icon below), the Threads page shows you the number of threads currently being executed by MySQL fetched using the query SHOW FULL PROCESSLIST. Each query sent to MySQL is executed in a thread. The Threads feature should be used to check which queries are being executed presently. It gives you a general sense of what is keeping your server busy at that moment.

While not the best feature for monitoring queries executed over a period of time (that would be the Query Analyzer presented in the next section), it can nonetheless be employed to locate runaway processes.  Under the Actions heading on the far-right, there are buttons to kill each thread:

Threads
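The command-line equivalent, if you are not in the Monyog UI, is the familiar pair below (the thread id is just an example taken from the Id column of the process list):

SHOW FULL PROCESSLIST;
KILL QUERY 12345;        -- stop only the running statement
KILL CONNECTION 12345;   -- or terminate the whole connection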

Real-time Query Monitoring

Monyog also gives us the ability to monitor MySQL servers via Real-time monitoring.  It may also provide invaluable information on poorly performing queries. To show performance metrics:

  1. Click the (Real-time) Clock icon on the left-hand side of the screen.
  2. On the next screen:
    1. Select a server to monitor.
    2. You may then choose to start a new session or load a saved one. 

Realtime MySQL Monitor

Like the Overview page, Monyog’s Query Analyzer screen also displays the Average Latency and the Total Time taken by each query to execute. In addition, you’ll find the user and host who executed the query.

You can delve deeper into a query’s mechanics via the EXPLAIN command by clicking on the query and selecting the Explain tab on the Query Details screen:

Query Analyzer

The Explain Result’s Type column describes how tables are joined.  In the above Explain Result, we see a type of “ALL” in combination with an absence of keys.  That indicates that a full table scan is being done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked “const”, and usually very bad in all other cases. You can usually avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.

MySQL queries details screen
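
As a sketch of that workflow (the table and column names below are hypothetical, and the EXPLAIN output is abridged):

mysql> EXPLAIN SELECT * FROM orders WHERE customer_id = 42\G
         type: ALL              -- full table scan, no usable index
possible_keys: NULL
          key: NULL

mysql> ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

mysql> EXPLAIN SELECT * FROM orders WHERE customer_id = 42\G
         type: ref              -- the optimizer can now use the new index
          key: idx_customer_id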

“Performance Schema” Mode for Data Collection

The latest update of Monyog brings new, easier ways to find problem SQL in real time, including a “sniffer”-based Query Analyzer using Performance Schema.  It allows you to view the success/failure status of every query executed, as well as the number of ‘full table scans’ per query, at a single glance.

Monyog’s query sniffer is a functionality that records a ‘pseudo server log’ and stores it in the Monyog embedded database.

The Sniffing Mode is set on the ADVANCED tab of the server properties dialog.

Server properties dialog

With ‘Performance Schema Sniffer’ enabled on the Query Analyzer screen, you can add many additional columns to the analysis, including a count of Full Table Scans.

Query Analyzer screen with manage columns panel

Here is the Query Analyzer screen in Sniffer mode with the Full Table Scan column added:

Query analyzer – sniffer
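
Monyog gathers this data from Performance Schema for you, but the underlying table can also be queried directly. A rough sketch of the kind of query involved (not necessarily what Monyog itself runs):

SELECT DIGEST_TEXT,
       COUNT_STAR AS exec_count,
       SUM_NO_INDEX_USED + SUM_NO_GOOD_INDEX_USED AS full_table_scans,
       ROUND(AVG_TIMER_WAIT / 1000000000000, 4) AS avg_latency_sec   -- timer values are in picoseconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY full_table_scans DESC
LIMIT 10;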

A Few More Tips…

Some other ideas that may help fix CPU problems:

  • Run SHOW FULL PROCESSLIST; while the CPU load is high.  This will show you any queries that are currently running or in the queue to run, and what each one is doing.
  • Keep an eye on things like your buffer sizes, table cache, query cache and innodb_buffer_pool_size (if you’re using InnoDB tables), as all of these memory allocations can adversely affect query performance, which can cause MySQL to consume CPU cycles.
  • If you are using WordPress, its plugins are notorious for doing monstrous queries.
  • Try modifying the innodb_buffer_pool_size parameter.  It should be set to at least the size of the file ibdata1, which is located in /var/lib/mysql.  InnoDB works much more efficiently when it can stay resident in memory. This may be impractical in some situations because ibdata1 can be quite large.  The innodb_log_buffer_size parameter should be 25% of the size of innodb_buffer_pool_size.  See the query sketch after this list for a quick way to check the current sizes.
  • Give MySQL at least half of available server memory if possible.
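
As a quick way to compare the current buffer pool size against the total InnoDB data and index size (only a sketch; the right setting depends on your workload and available RAM):

SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024) AS innodb_data_mb
FROM information_schema.tables
WHERE engine = 'InnoDB';
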
Conclusion

By using the Monyog features described in this blog, you should be able to identify the queries and/or processes that are causing system bottlenecks.  You can then make changes to improve performance by modifying the queries, indexes and database design, based on Monyog’s Advice text.

SQL query-performance tuning is as much art as science, and is thought by some to belong to the realm of application development rather than Database Administration.  Under that assumption, the goal of DBAs would be to identify those slow-running or CPU-intensive queries, gather evidence and then work with developers to find ways of improving them.

Monyog is an agentless MySQL monitoring tool that can uncover key MySQL performance insights. You can download a 14-day free trial here.

The post Monitoring MySQL Problematic Queries appeared first on Webyog Blog.

Native ProxySQL Clustering now available

ProxySQL 1.4.2 now supports Native Clustering!!!

ProxySQL joins the club of software associated with the buzzword "Cluster".
This blog post is the first in a series describing how to setup ProxySQL Cluster.
For details on the implementation, please refer to the documentation in the wiki.

Disclaimer
Features described are EXPERIMENTAL and subject to change.

Preface

ProxySQL is a decentralized proxy, recommended to be deployed close to the application. This approach scales pretty well even up to hundreds of nodes, as it was designed to be easily reconfigurable at runtime.
This makes it simple to configure a farm of ProxySQL instances using software like Ansible/Chef/Puppet/Salt (in alphabetical order), or service discovery tools like Etcd/Consul/ZooKeeper.
Furthermore, ProxySQL is highly customizable, and can be adopted in any setup which makes use of these technologies, or even home-made tools.

This solution, however, has its drawbacks:

  • External software is required for its configuration (i.e. configuration management software)
  • Multiple instances can't sync their configuration natively
  • Convergence time is not predictable
  • There is no protection against split brain (network partition)

In order to address the above, new features have been introduced in ProxySQL 1.4.2 to support clustering natively.
As we've already pointed out, these features are EXPERIMENTAL and subject to change, especially because not all the features in the roadmap have been implemented yet.

Setting up a 3 node ProxySQL Cluster

We will start by setting up a cluster with just 3 nodes.
We will use the following bootstrap config file for /etc/proxysql.cnf.
Admin variables will mostly be set to their default values; we specify them anyway to enumerate the new cluster variables. What is important to note is that the cluster_username and cluster_password variables should specify credentials also listed in admin_credentials. In fact, admin_credentials can contain multiple sets of credentials.

datadir="/var/lib/proxysql" admin_variables = { admin_credentials="admin:admin;cluster1:secret1pass" mysql_ifaces="0.0.0.0:6032" cluster_username="cluster1" cluster_password="secret1pass" cluster_check_interval_ms=200 cluster_check_status_frequency=100 cluster_mysql_query_rules_save_to_disk=true cluster_mysql_servers_save_to_disk=true cluster_mysql_users_save_to_disk=true cluster_proxysql_servers_save_to_disk=true cluster_mysql_query_rules_diffs_before_sync=3 cluster_mysql_servers_diffs_before_sync=3 cluster_mysql_users_diffs_before_sync=3 cluster_proxysql_servers_diffs_before_sync=3 } proxysql_servers = ( { hostname="172.16.3.130" port=6032 comment="proxysql130" }, { hostname="172.16.3.131" port=6032 comment="proxysql131" }, { hostname="172.16.3.132" port=6032 comment="proxysql132" } )

At this stage, we can start proxysql on all 3 nodes. As pointed out in the documentation, when clustering is active proxysql generates checksums for the configuration of each active module.
Let's connect to any of the proxysql admin interfaces and check the current configuration checksums:

Admin130> SELECT * FROM runtime_checksums_values ORDER BY name;
+-------------------+---------+------------+--------------------+
| name              | version | epoch      | checksum           |
+-------------------+---------+------------+--------------------+
| admin_variables   |       0 |          0 |                    |
| mysql_query_rules |       1 | 1504615779 | 0x0000000000000000 |
| mysql_servers     |       1 | 1504615779 | 0x0000000000000000 |
| mysql_users       |       1 | 1504615779 | 0x0000000000000000 |
| mysql_variables   |       0 |          0 |                    |
| proxysql_servers  |       1 | 1504615779 | 0x474020F334F98128 |
+-------------------+---------+------------+--------------------+
6 rows in set (0.00 sec)

Note that only 4 modules are active right now:

  • mysql_query_rules
  • mysql_servers
  • mysql_users
  • proxysql_servers

Modules not active have version=0.
Modules with empty configuration have a zero valued checksum.
Epoch is a unix timestamp.

Admin130> SELECT name,FROM_UNIXTIME(epoch), checksum FROM runtime_checksums_values WHERE version ORDER BY name;
+-------------------+----------------------+--------------------+
| name              | FROM_UNIXTIME(epoch) | checksum           |
+-------------------+----------------------+--------------------+
| mysql_query_rules | 2017-09-05 12:49:39  | 0x0000000000000000 |
| mysql_servers     | 2017-09-05 12:49:39  | 0x0000000000000000 |
| mysql_users       | 2017-09-05 12:49:39  | 0x0000000000000000 |
| proxysql_servers  | 2017-09-05 12:49:39  | 0x474020F334F98128 |
+-------------------+----------------------+--------------------+
4 rows in set (0.00 sec)

To view the status of the whole cluster, we need to check table stats_proxysql_servers_checksums:

Admin130> SELECT * FROM stats_proxysql_servers_checksums; +--------------+------+-------------------+---------+------------+--------------------+------------+------------+------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | +--------------+------+-------------------+---------+------------+--------------------+------------+------------+------------+ | 172.16.3.132 | 6032 | admin_variables | 0 | 0 | | 0 | 1504618520 | 0 | | 172.16.3.132 | 6032 | mysql_query_rules | 1 | 1504615782 | 0x0000000000000000 | 1504615782 | 1504618520 | 0 | | 172.16.3.132 | 6032 | mysql_servers | 1 | 1504615782 | 0x0000000000000000 | 1504615782 | 1504618520 | 0 | | 172.16.3.132 | 6032 | mysql_users | 1 | 1504615782 | 0x0000000000000000 | 1504615782 | 1504618520 | 0 | | 172.16.3.132 | 6032 | mysql_variables | 0 | 0 | | 0 | 1504618520 | 0 | | 172.16.3.132 | 6032 | proxysql_servers | 1 | 1504615782 | 0x474020F334F98128 | 1504615782 | 1504618520 | 0 | | 172.16.3.131 | 6032 | admin_variables | 0 | 0 | | 0 | 1504618520 | 0 | | 172.16.3.131 | 6032 | mysql_query_rules | 1 | 1504615780 | 0x0000000000000000 | 1504615781 | 1504618520 | 0 | | 172.16.3.131 | 6032 | mysql_servers | 1 | 1504615780 | 0x0000000000000000 | 1504615781 | 1504618520 | 0 | | 172.16.3.131 | 6032 | mysql_users | 1 | 1504615780 | 0x0000000000000000 | 1504615781 | 1504618520 | 0 | | 172.16.3.131 | 6032 | mysql_variables | 0 | 0 | | 0 | 1504618520 | 0 | | 172.16.3.131 | 6032 | proxysql_servers | 1 | 1504615780 | 0x474020F334F98128 | 1504615781 | 1504618520 | 0 | | 172.16.3.130 | 6032 | admin_variables | 0 | 0 | | 0 | 1504618520 | 0 | | 172.16.3.130 | 6032 | mysql_query_rules | 1 | 1504615779 | 0x0000000000000000 | 1504615780 | 1504618520 | 0 | | 172.16.3.130 | 6032 | mysql_servers | 1 | 1504615779 | 0x0000000000000000 | 1504615780 | 1504618520 | 0 | | 172.16.3.130 | 6032 | mysql_users | 1 | 1504615779 | 0x0000000000000000 | 1504615780 | 1504618520 | 0 | | 172.16.3.130 | 6032 | mysql_variables | 0 | 0 | | 0 | 1504618520 | 0 | | 172.16.3.130 | 6032 | proxysql_servers | 1 | 1504615779 | 0x474020F334F98128 | 1504615780 | 1504618520 | 0 | +--------------+------+-------------------+---------+------------+--------------------+------------+------------+------------+ 18 rows in set (0.00 sec)

As before, modules that are not active have version=0 and empty configurations have zero-valued checksums. The same table is easier to read if we filter those out and convert the Unix timestamps:

Admin130> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE version > 0 ORDER BY hostname, name; +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_query_rules | 1 | 2017-09-05 12:49:39 | 0x0000000000000000 | 2017-09-05 12:49:40 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.130 | 6032 | mysql_servers | 1 | 2017-09-05 12:49:39 | 0x0000000000000000 | 2017-09-05 12:49:40 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.130 | 6032 | mysql_users | 1 | 2017-09-05 12:49:39 | 0x0000000000000000 | 2017-09-05 12:49:40 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.130 | 6032 | proxysql_servers | 1 | 2017-09-05 12:49:39 | 0x474020F334F98128 | 2017-09-05 12:49:40 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.131 | 6032 | mysql_query_rules | 1 | 2017-09-05 12:49:40 | 0x0000000000000000 | 2017-09-05 12:49:41 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.131 | 6032 | mysql_servers | 1 | 2017-09-05 12:49:40 | 0x0000000000000000 | 2017-09-05 12:49:41 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.131 | 6032 | mysql_users | 1 | 2017-09-05 12:49:40 | 0x0000000000000000 | 2017-09-05 12:49:41 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.131 | 6032 | proxysql_servers | 1 | 2017-09-05 12:49:40 | 0x474020F334F98128 | 2017-09-05 12:49:41 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.132 | 6032 | mysql_query_rules | 1 | 2017-09-05 12:49:42 | 0x0000000000000000 | 2017-09-05 12:49:42 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.132 | 6032 | mysql_servers | 1 | 2017-09-05 12:49:42 | 0x0000000000000000 | 2017-09-05 12:49:42 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.132 | 6032 | mysql_users | 1 | 2017-09-05 12:49:42 | 0x0000000000000000 | 2017-09-05 12:49:42 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | | 172.16.3.132 | 6032 | proxysql_servers | 1 | 2017-09-05 12:49:42 | 0x474020F334F98128 | 2017-09-05 12:49:42 | 2017-09-05 13:44:48 | 0 | 2017-09-05 13:44:48 | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 12 rows in set (0.00 sec)

To clarify some fields:

  • epoch is the timestamp generated by the remote proxy when executing LOAD ... TO RUNTIME
  • changed_at is the timestamp recorded when the local proxy detected a change in the remote proxy
  • updated_at is the last timestamp for which the local proxy retrieved the checksum from the remote proxy
Add new users

Now that we have ProxySQL Cluster up and running, let's see how configuration is propagated from a single node to all other nodes.
Let's start by adding a new user on one of the nodes.

Admin130> SELECT * FROM mysql_users; Empty set (0.00 sec) Admin130> INSERT INTO mysql_users(username,password) VALUES ('sbtest','sbtest'); Query OK, 1 row affected (0.00 sec) Admin130> SELECT * FROM mysql_users; +----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+ | username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | +----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+ | sbtest | sbtest | 1 | 0 | 0 | NULL | 0 | 1 | 0 | 1 | 1 | 10000 | +----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+ 1 row in set (0.00 sec) Admin130> SELECT * FROM runtime_mysql_users; Empty set (0.00 sec)

As we can see from the above example, there are no users loaded at runtime; we have just created a new user in memory.
Time to load it into runtime and see what happens next:

Admin130> LOAD MYSQL USERS TO RUNTIME; Query OK, 0 rows affected (0.00 sec) Admin> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE version > 0 ORDER BY hostname, name; +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_query_rules | 1 | 2017-09-05 12:49:39 | 0x0000000000000000 | 2017-09-05 12:49:40 | 2017-09-05 13:47:52 | 0 | 2017-09-05 13:47:53 | | 172.16.3.130 | 6032 | mysql_servers | 1 | 2017-09-05 12:49:39 | 0x0000000000000000 | 2017-09-05 12:49:40 | 2017-09-05 13:47:52 | 0 | 2017-09-05 13:47:53 | | 172.16.3.130 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:48 | 2017-09-05 13:47:52 | 0 | 2017-09-05 13:47:53 | | 172.16.3.130 | 6032 | proxysql_servers | 1 | 2017-09-05 12:49:39 | 0x474020F334F98128 | 2017-09-05 12:49:40 | 2017-09-05 13:47:52 | 0 | 2017-09-05 13:47:53 | | 172.16.3.131 | 6032 | mysql_query_rules | 1 | 2017-09-05 12:49:40 | 0x0000000000000000 | 2017-09-05 12:49:41 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | | 172.16.3.131 | 6032 | mysql_servers | 1 | 2017-09-05 12:49:40 | 0x0000000000000000 | 2017-09-05 12:49:41 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | | 172.16.3.131 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:49 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | | 172.16.3.131 | 6032 | proxysql_servers | 1 | 2017-09-05 12:49:40 | 0x474020F334F98128 | 2017-09-05 12:49:41 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | | 172.16.3.132 | 6032 | mysql_query_rules | 1 | 2017-09-05 12:49:42 | 0x0000000000000000 | 2017-09-05 12:49:42 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | | 172.16.3.132 | 6032 | mysql_servers | 1 | 2017-09-05 12:49:42 | 0x0000000000000000 | 2017-09-05 12:49:42 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | | 172.16.3.132 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:49 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | | 172.16.3.132 | 6032 | proxysql_servers | 1 | 2017-09-05 12:49:42 | 0x474020F334F98128 | 2017-09-05 12:49:42 | 2017-09-05 13:47:53 | 0 | 2017-09-05 13:47:53 | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 12 rows in set (0.00 sec) Admin130> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname, name; +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | 
+--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:48 | 2017-09-05 13:48:21 | 0 | 2017-09-05 13:48:21 | | 172.16.3.131 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:49 | 2017-09-05 13:48:21 | 0 | 2017-09-05 13:48:21 | | 172.16.3.132 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:49 | 2017-09-05 13:48:21 | 0 | 2017-09-05 13:48:21 | +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 3 rows in set (0.00 sec)

We applied the change on 172.16.3.130 and the change was propagated everywhere else almost instantly.
Checking the log on 172.16.3.131 we can see the log entries generated during the sync:

2017-09-05 14:47:48 [INFO] Cluster: detected a new checksum for mysql_users from peer 172.16.3.130:6032, version 2, epoch 1504619268, checksum 0x7917CD487C11478F . Not syncing yet ... 2017-09-05 14:47:48 [INFO] Cluster: detected a peer 172.16.3.130:6032 with mysql_users version 2, epoch 1504619268, diff_check 3. Own version: 1, epoch: 1504615780. Proceeding with remote sync 2017-09-05 14:47:48 [INFO] Cluster: detected a peer 172.16.3.130:6032 with mysql_users version 2, epoch 1504619268, diff_check 4. Own version: 1, epoch: 1504615780. Proceeding with remote sync 2017-09-05 14:47:48 [INFO] Cluster: detected peer 172.16.3.130:6032 with mysql_users version 2, epoch 1504619268 2017-09-05 14:47:48 [INFO] Cluster: Fetching MySQL Users from peer 172.16.3.130:6032 started 2017-09-05 14:47:48 [INFO] Cluster: Fetching MySQL Users from peer 172.16.3.130:6032 completed 2017-09-05 14:47:48 [INFO] Cluster: Loading to runtime MySQL Users from peer 172.16.3.130:6032 2017-09-05 14:47:48 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 172.16.3.130:6032 2017-09-05 14:47:48 [INFO] Cluster: detected a new checksum for mysql_users from peer 172.16.3.132:6032, version 2, epoch 1504619268, checksum 0x7917CD487C11478F . Not syncing yet ... 2017-09-05 14:47:48 [INFO] Cluster: checksum for mysql_users from peer 172.16.3.132:6032 matches with local checksum 0x7917CD487C11478F , we won't sync. 2017-09-05 14:47:49 [INFO] Cluster: detected a new checksum for mysql_users from peer 172.16.3.131:6032, version 2, epoch 1504619268, checksum 0x7917CD487C11478F . Not syncing yet ... 2017-09-05 14:47:49 [INFO] Cluster: checksum for mysql_users from peer 172.16.3.131:6032 matches with local checksum 0x7917CD487C11478F , we won't sync.

What happened?

  • ProxySQL detected a change in the checksum for table mysql_users
  • When diff_check reaches the threshold, the sync process is initialized
  • MySQL Users are fetched from 172.16.3.130:6032
  • MySQL Users are loaded to runtime
  • MySQL Users are also saved to disk because cluster_mysql_users_save_to_disk=true
  • A new checksum is also detected from 172.16.3.132:6032 , but it matches the local one so there is no need to sync
  • A new checksum is also detected from 172.16.3.131:6032 (that is the same proxysql instance), but it matches the local one so there is again no need to sync

Let's now add a new user directly on 172.16.3.131:6032, a different node.
We first verify what user(s) is/are present, then we add a new one:

Admin131> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname, name; +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:48 | 2017-09-05 14:52:32 | 0 | 2017-09-05 14:52:32 | | 172.16.3.131 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:49 | 2017-09-05 14:52:32 | 0 | 2017-09-05 14:52:32 | | 172.16.3.132 | 6032 | mysql_users | 2 | 2017-09-05 13:47:48 | 0x7917CD487C11478F | 2017-09-05 13:47:48 | 2017-09-05 14:52:32 | 0 | 2017-09-05 14:52:32 | +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 3 rows in set (0.00 sec) Admin131> SELECT * FROM mysql_users; +----------+-------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+ | username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | +----------+-------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+ | sbtest | *2AFD99E79E4AA23DE141540F4179F64FFB3AC521 | 1 | 0 | 0 | | 0 | 1 | 0 | 0 | 1 | 10000 | | sbtest | *2AFD99E79E4AA23DE141540F4179F64FFB3AC521 | 1 | 0 | 0 | | 0 | 1 | 0 | 1 | 0 | 10000 | +----------+-------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+ 2 rows in set (0.00 sec) Admin131> INSERT INTO mysql_users (username,password) VALUES ('user1','password1'); Query OK, 1 row affected (0.00 sec) Admin131> LOAD MYSQL USERS TO RUNTIME; Query OK, 0 rows affected (0.00 sec) Admin131> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname, name; +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_users | 3 | 2017-09-05 14:53:45 | 0x7C60129E5360AC28 | 2017-09-05 14:53:45 | 2017-09-05 14:53:48 | 0 | 2017-09-05 14:53:48 | | 172.16.3.131 | 6032 | 
mysql_users | 3 | 2017-09-05 14:53:44 | 0x7C60129E5360AC28 | 2017-09-05 14:53:45 | 2017-09-05 14:53:48 | 0 | 2017-09-05 14:53:48 | | 172.16.3.132 | 6032 | mysql_users | 3 | 2017-09-05 14:53:45 | 0x7C60129E5360AC28 | 2017-09-05 14:53:45 | 2017-09-05 14:53:47 | 0 | 2017-09-05 14:53:48 | +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 3 rows in set (0.00 sec)

Also in this case, configuration is immediately propagated to all the nodes.

It is interesting to note what happens when running LOAD TO RUNTIME without performing any changes.

Admin131> LOAD MYSQL USERS TO RUNTIME; Query OK, 0 rows affected (0.00 sec) Admin131> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE name='mysql_users' ORDER BY hostname, name; +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_users | 3 | 2017-09-05 14:53:45 | 0x7C60129E5360AC28 | 2017-09-05 14:53:45 | 2017-09-05 14:55:45 | 0 | 2017-09-05 14:55:45 | | 172.16.3.131 | 6032 | mysql_users | 4 | 2017-09-05 14:55:44 | 0x7C60129E5360AC28 | 2017-09-05 14:53:45 | 2017-09-05 14:55:45 | 0 | 2017-09-05 14:55:45 | | 172.16.3.132 | 6032 | mysql_users | 3 | 2017-09-05 14:53:45 | 0x7C60129E5360AC28 | 2017-09-05 14:53:45 | 2017-09-05 14:55:45 | 0 | 2017-09-05 14:55:45 | +--------------+------+-------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 3 rows in set (0.00 sec)

On proxysql131, version is increased by 1, but the checksum doesn't change. This means that no sync is triggered, as there is no need for it.
Similarly, the version on other nodes does not change, because they don't re-execute LOAD ... TO RUNTIME.

Create new query rules

We just saw how new users can be created on just one ProxySQL node and instantly detected by the other nodes.
It is now time to show how to synchronize MySQL Query Rules.

Let's start by creating some new query rules on one node (the rules themselves don't implement any particular behaviour).

Admin130> SELECT * FROM mysql_query_rules;
Empty set (0.00 sec)

Admin130> INSERT INTO mysql_query_rules (schemaname, destination_hostgroup) VALUES ('shard001',1);
Query OK, 1 row affected (0.00 sec)

Admin130> INSERT INTO mysql_query_rules (schemaname, destination_hostgroup) VALUES ('shard002',1);
Query OK, 1 row affected (0.00 sec)

Admin130> INSERT INTO mysql_query_rules (schemaname, destination_hostgroup) VALUES ('shard003',1);
Query OK, 1 row affected (0.00 sec)

Admin130> INSERT INTO mysql_query_rules (schemaname, destination_hostgroup) VALUES ('shard004',1);
Query OK, 1 row affected (0.00 sec)

Admin130> INSERT INTO mysql_query_rules (schemaname, destination_hostgroup) VALUES ('shard005',2), ('shard006',2), ('shard007',2), ('shard008',2);
Query OK, 4 rows affected (0.00 sec)

Admin130> UPDATE mysql_query_rules SET active=1, apply=1 WHERE destination_hostgroup IN (1,2);
Query OK, 8 rows affected (0.00 sec)

Let's check the current query rules:

Admin130> SELECT rule_id,schemaname,destination_hostgroup,apply FROM mysql_query_rules WHERE active=1;
+---------+------------+-----------------------+-------+
| rule_id | schemaname | destination_hostgroup | apply |
+---------+------------+-----------------------+-------+
|       1 | shard001   |                     1 |     1 |
|       2 | shard002   |                     1 |     1 |
|       3 | shard003   |                     1 |     1 |
|       4 | shard004   |                     1 |     1 |
|       5 | shard005   |                     2 |     1 |
|       6 | shard006   |                     2 |     1 |
|       7 | shard007   |                     2 |     1 |
|       8 | shard008   |                     2 |     1 |
+---------+------------+-----------------------+-------+
8 rows in set (0.00 sec)

On another node, say proxysql131, there are no rules:

Admin131> SELECT rule_id,schemaname,destination_hostgroup,apply FROM mysql_query_rules WHERE active=1;
Empty set (0.00 sec)

Admin131> SELECT rule_id,schemaname,destination_hostgroup,apply FROM runtime_mysql_query_rules WHERE active=1;
Empty set (0.00 sec)

Now it is time to load the rules into runtime on proxysql130:

Admin130> SELECT name,FROM_UNIXTIME(epoch), checksum FROM runtime_checksums_values WHERE name='mysql_query_rules'; +-------------------+----------------------+--------------------+ | name | FROM_UNIXTIME(epoch) | checksum | +-------------------+----------------------+--------------------+ | mysql_query_rules | 2017-09-05 23:03:53 | 0x0000000000000000 | +-------------------+----------------------+--------------------+ 1 row in set (0.00 sec) Admin130> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE name='mysql_query_rules' ORDER BY hostname, name; +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_query_rules | 1 | 2017-09-05 23:03:53 | 0x0000000000000000 | 2017-09-05 23:03:53 | 2017-09-05 23:04:47 | 0 | 2017-09-05 23:04:47 | | 172.16.3.131 | 6032 | mysql_query_rules | 1 | 2017-09-05 23:03:53 | 0x0000000000000000 | 2017-09-05 23:03:54 | 2017-09-05 23:04:47 | 0 | 2017-09-05 23:04:47 | | 172.16.3.132 | 6032 | mysql_query_rules | 1 | 2017-09-05 23:03:54 | 0x0000000000000000 | 2017-09-05 23:03:55 | 2017-09-05 23:04:47 | 0 | 2017-09-05 23:04:47 | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 3 rows in set (0.00 sec) Admin130> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 0 rows affected (0.00 sec) Admin130> SELECT name,FROM_UNIXTIME(epoch), checksum FROM runtime_checksums_values WHERE name='mysql_query_rules'; +-------------------+----------------------+--------------------+ | name | FROM_UNIXTIME(epoch) | checksum | +-------------------+----------------------+--------------------+ | mysql_query_rules | 2017-09-05 23:04:51 | 0xE2F5A21142C799C0 | +-------------------+----------------------+--------------------+ 1 row in set (0.00 sec) Admin130> SELECT hostname, port, name, version, FROM_UNIXTIME(epoch) epoch, checksum, FROM_UNIXTIME(changed_at) changed_at, FROM_UNIXTIME(updated_at) updated_at, diff_check, DATETIME('NOW') FROM stats_proxysql_servers_checksums WHERE name='mysql_query_rules' ORDER BY hostname, name; +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | hostname | port | name | version | epoch | checksum | changed_at | updated_at | diff_check | DATETIME('NOW') | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ | 172.16.3.130 | 6032 | mysql_query_rules | 2 | 2017-09-05 23:04:51 | 0xE2F5A21142C799C0 | 2017-09-05 23:04:51 | 2017-09-05 23:06:22 | 0 | 2017-09-05 23:06:23 | | 172.16.3.131 | 6032 | mysql_query_rules | 2 | 2017-09-05 23:04:52 | 0xE2F5A21142C799C0 | 2017-09-05 23:04:52 | 2017-09-05 23:06:23 | 0 | 2017-09-05 23:06:23 | | 172.16.3.132 | 6032 | mysql_query_rules | 2 | 2017-09-05 23:04:52 | 
0xE2F5A21142C799C0 | 2017-09-05 23:04:52 | 2017-09-05 23:06:23 | 0 | 2017-09-05 23:06:23 | +--------------+------+-------------------+---------+---------------------+--------------------+---------------------+---------------------+------------+---------------------+ 3 rows in set (0.00 sec)

As we can see from the above, the change is detected immediately by the other nodes, and they sync with the node that has the most recent configuration.

Let's check the logs on proxysql131:

2017-09-06 00:04:51 ProxySQL_Cluster.cpp:325:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_query_rules from peer 172.16.3.130:6032, version 2, epoch 1504652691, checksum 0xE2F5A21142C799C0 . Not syncing yet ... 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:473:set_checksums(): [INFO] Cluster: detected a peer 172.16.3.130:6032 with mysql_query_rules version 2, epoch 1504652691, diff_check 3. Own version: 1, epoch: 1504652633. Proceeding with remote sync 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:473:set_checksums(): [INFO] Cluster: detected a peer 172.16.3.130:6032 with mysql_query_rules version 2, epoch 1504652691, diff_check 4. Own version: 1, epoch: 1504652633. Proceeding with remote sync 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:1106:get_peer_to_sync_mysql_query_rules(): [INFO] Cluster: detected peer 172.16.3.130:6032 with mysql_query_rules version 2, epoch 1504652691 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:572:pull_mysql_query_rules_from_peer(): [INFO] Cluster: Fetching MySQL Query Rules from peer 172.16.3.130:6032 started 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:622:pull_mysql_query_rules_from_peer(): [INFO] Cluster: Fetching MySQL Query Rules from peer 172.16.3.130:6032 completed 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:623:pull_mysql_query_rules_from_peer(): [INFO] Cluster: Loading to runtime MySQL Servers from peer 172.16.3.130:6032 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:626:pull_mysql_query_rules_from_peer(): [INFO] Cluster: Saving to disk MySQL Query Rules from peer 172.16.3.130:6032 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:325:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_query_rules from peer 172.16.3.132:6032, version 2, epoch 1504652692, checksum 0xE2F5A21142C799C0 . Not syncing yet ... 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:327:set_checksums(): [INFO] Cluster: checksum for mysql_query_rules from peer 172.16.3.132:6032 matches with local checksum 0xE2F5A21142C799C0 , we won't sync. 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:325:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_query_rules from peer 172.16.3.131:6032, version 2, epoch 1504652692, checksum 0xE2F5A21142C799C0 . Not syncing yet ... 2017-09-06 00:04:52 ProxySQL_Cluster.cpp:327:set_checksums(): [INFO] Cluster: checksum for mysql_query_rules from peer 172.16.3.131:6032 matches with local checksum 0xE2F5A21142C799C0 , we won't sync.

Also in this case, we can see that a change is detected, and after a short period of time all nodes are synced.

Conclusion

In this first blog post about ProxySQL Cluster we showed how multiple ProxySQL instances are aware of each other, how we can add new users and new query rules on any of the proxysql instances, and how this information is immediately propagated to all the other nodes.
Stay tuned for future blog posts on ProxySQL Cluster.

This Week in Data with Colin Charles #5: db tech showcase and Percona Live Europe

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Europe 2017 in Dublin

Have you registered for Percona Live Europe Dublin? We have announced some awesome keynotes, and our sponsor list is growing (and we’re always looking for more!).

There will also be a community dinner (Tuesday, September 26, 2017), so definitely watch the announcement that will be on the blog, and I’m sure on Twitter. Besides being fun, the Lightning Talks will happen during that time.

db tech showcase Tokyo, Japan

The annual db tech showcase Tokyo 2017 took place this week from 5-7 September. It was a fun event as always, with capacity for 800 people per day. The event grows larger each year, and reminds me of the heyday of the MySQL Conference & Expo.

The db tech showcase is a five-parallel-track show, with each talk approximately 50 minutes. The event started with a keynote by Richard Hipp, creator of SQLite (if you were a Percona Live Santa Clara 2017 attendee, you’d have also seen him there). The rest of the event is a mix between Japanese language content and English language content. The sponsor list is lengthy, and if you walk the floor you could collect a lot of datasheets.

One thing I really liked? At some talks, you’d get a clear folder with a contact form as well as the printed slide deck. This is a great way to let the speaker’s company contact you. It’s a common issue that I (and others) speak to large numbers of people and have no idea who’s in the talk. I can only imagine our marketing and sales teams being much happier if they could get access to an attendee list! I wonder if this will work in other markets?

It’s interesting to see that there is a Japan MariaDB User Group now. It’s clear the MySQL user group needs a revival! I saw a talk from Toshiba on performance tests using MariaDB Server, but not with MySQL (a little odd?). The MongoDB content was pretty light, which is unsurprising because we don’t see a huge MongoDB uptake or community in Japan (or South Korea for that matter).

Will I go back? Absolutely. I’ve been going for a few years, and it’s a great place for people who are crazy about database technology. You really get a spectrum of database presentations, and I expect most people go back with many ideas of what they might want to evaluate for production.

I spoke about the Engineering that goes into Percona Server for MySQL 5.6 and 5.7, with a hint of MongoDB. The slides are in a mix of Japanese and English. The Japanese translation: Percona ServerをMySQL 5.6と5.7用に作るエンジニアリング(そしてMongoDBのヒント).

Upcoming Appearances

Percona’s website keeps track of community events, so check there to see where to listen to a Perconian speak. My upcoming appearances are:

Feedback

Did you try replication-manager last week? Guillaume Lefranc, the lead developer, writes in to talk about the new features such as support for MySQL 5.7, Binlog Flashback, multi-cluster mode and various stability fixes.

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Releasing ProxySQL 1.4.2

ProxySQL is a high performance, high availability, protocol aware proxy for MySQL, with a GPL license!

Today I am excited to announce the release of ProxySQL 1.4.2, the second stable release of series 1.4.x .

Binaries are available here.

This release introduces several bug fixes and enhancements compared to the previous version:

  • introduced experimental native Cluster solution
  • fixes random failure in handling PREPARE #1169
  • don't use connections from connection pool for fast_forward users #1157
  • reset SQL_MODE and TIME_ZONE when a connection is reset #1160
  • added reload command in init script #1154
  • several improvements to Galera checker #1158 and #981
  • adds option autocommit_false_not_reusable #1144
  • does not return ERR for MYSQL_OPTION_MULTI_STATEMENTS_ON #1122
  • fix compiling issue in FreeBSD
  • reintroduced binaries for Ubuntu12
  • fixed memory leaks in auxiliary threads
  • several performance improvements

The most important news is the clustering solution built directly into ProxySQL.
A series of blog posts will follow with further details.
Stay tuned!!

Once again, a special thanks to all the people who report bugs and submit pull requests: this makes each version of ProxySQL better than the previous one.
Please report any bugs or feature requests on the GitHub issue tracker.

Thanks

Linkbench - in-memory, low-concurrency

After a few weeks of sysbench it is now time for more complex workloads and the first one is Linkbench with a cached database and low-concurrency. I prefer to start with cached & low-concurrency configurations before trying IO-bound & high-concurrency.

tl;dr:
  • InnoDB from MySQL 5.6 had the best throughput
  • CPU efficiency is similar for MyRocks and InnoDB
  • There is a CPU regression from MySQL 5.6 to 5.7 to 8.x
  • Write efficiency was similar for all engines on the load test but much better for MyRocks and TokuDB on the transaction test.

Configuration

I used my Linkbench repo and helper scripts to run Linkbench with maxid1=10M, loaders=1 and requestors=1, so there will be 2 concurrent connections doing the load and 1 connection running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload and results with that change are labeled new config while results without the change are labeled old config. I share both. The test pattern is 1) load and 2) transactions. The transactions were run in 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 256gb of RAM.

Tests were run for MyRocks, InnoDB from upstream MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log (see the sketch after this list). All engines used jemalloc. Mostly accurate my.cnf files are here, but the database cache was made large enough to cache the ~10gb database.
  • MyRocks was compiled on August 15 with git hash 0d76ae. Compression was not used.
  • InnoDB was from upstream 5.6.35, 5.7.17 and 8.0.2.  The performance schema was enabled.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.
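
For reference, "sync on commit disabled" for the binlog and the InnoDB log typically corresponds to settings like the following (an assumption for illustration only; the my.cnf files linked above are the authoritative source):

SET GLOBAL sync_binlog = 0;                     -- do not fsync the binlog on every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- write the InnoDB log at commit, fsync roughly once per second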

Load Results

All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. The results for the old config are similar.
  • InnoDB 5.6 has the best insert rate but there is a regression from InnoDB in 5.6 to 5.7 to 8.0.2
  • Write efficiency (wkb/i) is similar for all engines
  • CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB

ips     wkb/i   Mcpu/i  size    wMB/s   cpu     engine
54283   1.60     83     14      86.7    4.5     myrocks
64402   1.02     72     16      65.6    4.6     inno5635
56414   1.03     77     16      58.2    4.3     inno5717
42954   1.02     97     16      45.1    4.2     inno802
21611   1.42    179     14      30.7    3.9     toku5717
legend:
* ips - inserts/second
* wkb/i - iostat KB written per insert
* Mcpu/i - normalized CPU time per insert
* wMB/s - iostat write MB/s, average
* size - database size in GB at test end
* cpu - average value of vmstat us + sy columns

Transaction Results
These are results from the 12th 1-hour loop of the transaction phase. All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. I will explain them. The results for the old config are similar.
  • InnoDB 5.6 has the best transaction rate but there is a regression from 5.6 to 5.7 to 8.0.2
  • Write efficiency (wkb/t) is much better for TokuDB and MyRocks than for InnoDB
  • CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
  • Response times are similar between MyRocks and InnoDB

tps     wkb/t   Mcpu/t  size  un    gn    ul    gl    wMB/s  engine
5489    0.78     642    15    0.3   0.1   0.5   0.5    4.3   myrocks
7239    5.15     524    26    0.3   0.1   0.4   0.2   37.3   inno5635
6463    5.17     580    26    0.3   0.1   0.5   0.3   33.4   inno5717
5855    5.25     623    25    0.3   0.1   0.6   0.3   30.7   inno802
3333    0.08    1043    18    0.6   0.2   1.0   0.8   10.0   toku5717
legend:
* tps - transactions/second
* wkb/t - iostat KB written per transaction
* Mcpu/t - normalized CPU time per transaction
* size - database size in GB at test end
* un, gn, ul, gl - 99th percentile response time in millisecs for UpdateNode,
                   GetNode, UpdateList and GetLinkedList transactions
* wMB/s - iostat write MB/s, average

Charts
Charts for load and transaction throughput using the ips and tps columns from the tables above.

Percona Live from the Emerald Isle: Containerised Dolphins, MySQL Load Balancers, MongoDB Management & more - for plenty of 9s

Yes, it’s that time of year again: the Percona Live Europe Conference is just around the corner.

And we’ll be broadcasting live from the Emerald Isle!

Quite literally, since we’re planning to be on our Twitter and Facebook channels during the course of the conference, so do make sure to tune in (if you’re not attending in person). And this year’s location is indeed Dublin, Ireland, so expect lots of banter and a bit of craic.

More specifically though, the Severalnines team will be well represented with three talks and a booth in the exhibition area. So do come and meet us there. If you haven’t registered yet, there’s still time to do so on the conference website: https://www.percona.com/live/e17/

Our talks at the conference:

Ashraf Sharif, Senior Support Engineer will be talking about MySQL on Docker and containerised dolphins (which only sounds like a good idea in the database world).

Krzysztof Książek, Senior Support Engineer, will share his knowledge and experience on all things MySQL load balancing.

And Ruairí Newman, also Senior Support Engineer, will be discussing what some of the main considerations are to think through when looking at automating and managing MongoDB - including a closer look at MongoDB Ops Manager and ClusterControl.

Related resources:
  • MySQL on Docker - Understanding the Basics
  • ClusterControl for MongoDB
  • MySQL Load Balancing

And since we don’t solely employ Senior Support Engineers at Severalnines, you’ll be pleased to know that Andrada Enache, Sales Manager, and myself will also be present to talk open source database management with ClusterControl at our booth.

This year’s conference agenda looks pretty exciting overall with a wide enough range of topics, so there’ll be something of interest for every open source database aficionado out there.

See you in Dublin ;-)


Heads Up: The List of Replication Defaults That Have Changed in 8.0.2

In development milestone release (DMR) version 8.0.2 we are changing several replication options. The motivation behind this is simply that we want our users to enjoy default installations with the best efficient setup, configuration and performance. We also don’t want users to struggle getting them to work optimally.…

Always Verify Examples When Comparing DB Products (PostgreSQL and MySQL)

In this blog post, I’ll look at a comparison of PostgreSQL and MySQL.

I came across a post from Hans-Juergen Schoenig, a Postgres consultant at Cybertec. In it, he dismissed MySQL and showed Postgres as better. While his post ignores most of the reasons why MySQL is better, I will focus on where his post is less than accurate. Testing for MySQL was done with Percona Server 5.7, defaults.

Mr. Schoenig complains that MySQL changes data types automatically. He claims inserting 1234.5678 into a numeric(4, 2) column on Postgres produces an error, and that MySQL just rounds the number to fit. In my testing I found this to be a false claim:

mysql> CREATE TABLE data (
    -> id integer NOT NULL,
    -> data numeric(4, 2));
Query OK, 0 rows affected (0.07 sec)

mysql> INSERT INTO data VALUES (1, 1234.5678);
ERROR 1264 (22003): Out of range value for column 'data' at row 1

His next claim is that MySQL allows updating a key column to NULL and silently changes it to 0. This is also false:

mysql> INSERT INTO data VALUES (1, 12);
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE data SET id = NULL WHERE id = 1;
ERROR 1048 (23000): Column 'id' cannot be null

In the original post, we never see the warnings and so don’t have the full details of his environment. Since he didn’t specify which version he was testing on, I will point out that MySQL 5.7 does a far better job out-of-the-box handling your data than 5.6 does, and SQL Mode has existed in MySQL for ages. Any user could set it to STRICT_ALL_TABLES or STRICT_TRANS_TABLES and get the behavior that is now the default in 5.7.
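
For example, on an older version you could enable strict behaviour yourself (a sketch; the exact sql_mode string you want depends on the version and the application):

mysql> SET GLOBAL sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';
mysql> SET SESSION sql_mode = 'STRICT_ALL_TABLES';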

The author is also focusing on a narrow issue, using it to say Postgres is better. I feel this is misleading. I could point out factors in MySQL that are better than in Postgres as well.

This is another case of “don’t necessarily take our word for it”. A simple test of what you see on a blog can help you understand how things work in your environment and why.

IO-bound table scan performance for MyRocks, InnoDB and TokuDB

I used sysbench to compare IO-bound table scan performance for MyRocks, InnoDB and TokuDB. Tests were run on a large server with fast SSD, Intel NUCs with SSD and an Intel NUC with disk. I call this IO-bound because for all tests the table was larger than RAM.

tl;dr
  • MyRocks can be more than 2X slower than InnoDB.
  • InnoDB in 5.7 does better than in 5.6
  • TokuDB without compression is comparable to InnoDB without compression and does much better than InnoDB when prefetching is enabled.
  • Compression usually has a small impact on scan performance for MyRocks with zstd and a much larger impact for TokuDB with zlib. I wonder how much of this is a measure of zstd vs zlib.
  • Scans were usually slower for all engines after fragmentation but the impact was larger for MyRocks and TokuDB than for InnoDB.

Configuration

I used my sysbench helper scripts with my sysbench branch. For tests with X tables there was 1 connection per table doing a full scan and when X > 1 the scans were concurrent. The scan time was measured twice -- first immediately after the load and index step and then after many updates have been applied. The second measurement was done to show the impact of fragmentation on scan performance.

I repeated tests on different hardware:
  • 48core.ssd - server has 48 HW threads, fast SSD and 50gb of RAM. Tests were done with 8 tables and 100M rows/table and then 1 table with 800M rows.
  • i3.ssd - a core i3 Intel NUC with Samsung 850 SSD, 8gb of RAM and 4 HW threads. The test used 2 tables and 80M rows/table.
  • i3.disk - a core i3 Intel NUC with 1 disk, 8gb of RAM and 4 HW threads. The test used 1 table and 160M rows/table.
I repeated tests for MyRocks, InnoDB and TokuDB:
  • I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression (myrocks.none, myrocks.zstd). I did one test for MyRocks with a binary that did not use special instructions to make crc32 faster (myrocks.none.slowcrc) and learned that fast crc doesn't make a difference on this test. It would be a bigger deal for an IO-bound test doing point queries.
  • I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF (toku5717.none, toku5717.none.prefetch), but I have been setting this to ON for my OLTP benchmarks because enabling it ruined some OLTP results.
  • I used InnoDB from upstream 5.6.35 and 5.7.17. The performance_schema was enabled. The InnoDB tests did not use compression. 

Results

The results below list the number of seconds to scan the table(s) and the time relative to InnoDB from MySQL 5.6.35. For the relative time a value greater than 1 means the engine is slower than InnoDB. These values are reported for pre and post where pre is the measurement taken immediately after loading the table and creating the secondary index and post is the measurement taken after applying random updates to the table(s).

See tl;dr above for what I learned from these results.

Large server


These are results from 8 tables with 100M rows/table and then 1 table and 800M rows/table on the large server.

48core.ssd - 8t x 100m
pre     pre     post    post    engine
secs    ratio   secs    ratio
221     2.302   246     2.256   myrocks.none
201     2.093   211     1.935   myrocks.zstd
 96     1.000   109     1.000   inno5635
 75     0.781    86     0.788   inno5717
 67     0.697    94     0.862   toku5717.none
 39     0.406    69     0.633   toku5717.none.prefetch
190     1.979   224     2.055   toku5717.zlib

48core.ssd - 1t x 800m
pre     pre     post    post    engine
secs    ratio   secs    ratio
 638    1.065   1032    1.627   myrocks.none
 916    1.529   1063    1.676   myrocks.zstd
 599    1.000    634    1.000   inno5635
 434    0.724    449    0.708   inno5717
 513    0.856    735    1.159   toku5717.none
 249    0.415    502    0.791   toku5717.none.prefetch
1525    2.545   1776    2.801   toku5717.zlib

Intel NUC

These are results from the Intel NUC using SSD and then a disk.

i3.ssd - 2t x 80m
pre     pre     post    post    engine
secs    ratio   secs    ratio
181     1.448   192     1.560   myrocks.none
182     1.456   189     1.536   myrocks.none.slowcrc
219     1.752   238     1.934   myrocks.zstd
125     1.000   123     1.000   inno5635
114     0.912   107     0.869   inno5717

i3.disk - 1t x 160m
pre     pre     post    post    engine
secs    ratio   secs    ratio
330     1.304   348     1.343   myrocks.none
432     1.707   451     1.741   myrocks.zstd
253     1.000   259     1.000   inno5635
257     1.015   261     1.007   inno5717

Charts

Below are charts from the large server tests for 8 tables & 100M rows/table and then 1 table with 800M rows.

Lesson 02: Installing MySQL

Notes/errata/updates for Chapter 2:
See the official book errata at http://tahaghoghi.com/LearningMySQL/errata.php – Chapter 2 includes pages 9-93 (but we skip pages 83-92). It seems like a lot of pages, however you will skip the operating systems that do not apply to you. Do NOT compile or install from tarball or source; just use the packages that are pre-made. You will want the latest version of MySQL, which at the time of this writing is MySQL 5.7.

You should install the latest version of MySQL, which can be downloaded from http://dev.mysql.com/downloads/mysql/ If you want to install something else, you can install MariaDB or Percona’s patched version.

Note that you do NOT need to install Apache, Perl or PHP. You can skip pages 83-92.

On p. 10, it says that “The MySQL Manual says that you can get a performance increase of up to 30 percent if you compile the code with the ideal settings for your environment.” This was true up through MySQL 5.1 (see http://dev.mysql.com/doc/refman/5.1/en/compile-and-link-options.html) However, with MySQL 5.5 and newer, this is no longer true.

On p. 62, it talks about checking to make sure the mysql user and group are on the machine by using NetInfo Manager. The NetInfo Manager was taken out of Mac OS X Leopard (10.5) and above. Just skip the paragraph starting “To check using the NetInfo Manager”, and proceed to the paragraph with “You can instead check these settings from the shell prompt.”

On p. 71, it talks about the MySQL Migration Toolkit and says it’s part of the “MySQL GUI Tools Bundle”. These days, it’s part of MySQL Workbench.

On p. 75 -78, the book talks about setting your path, which probably should be under “configuration” instead of the troubleshooting section…you might think once you get MySQL installed, you can skip that section, but you need to read it anyway.

On p. 93, it lists http://forge.mysql.com as a resource, but that website has been deprecated since the book was published.

Topics covered:
  • Installing MySQL on Linux, Mac OS X and Windows
  • Verifying packages with MD5
  • Configuring a new server

Reference/Quick Links for MySQL Marinate

Upcoming Webinar Thursday, September 7: Using PMM to Troubleshoot MySQL Performance Issues

Join Percona’s Product Manager, Michael Coburn, as he presents Using Percona Monitoring and Management to Troubleshoot MySQL Performance Issues on Thursday, September 7, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).

Reserve Your Spot

 

Successful applications often become limited by MySQL performance. Michael will show you how to get great MySQL performance using Percona Monitoring and Management (PMM). There will be a demonstration of how to leverage the combination of the query analytics and metrics monitor when troubleshooting MySQL performance issues. We’ll review the essential components of PMM, and use some of the most common database slowness cases as examples of where to look and what to do.

By the end of the webinar you will have a better understanding of:

  • Query metrics, including bytes sent, lock time, rows sent, and more
  • Metrics monitoring
  • How to identify MySQL performance issues
  • Point-in-time visibility and historical trending of database performance

Register for the webinar here.

Michael Coburn, Product Manager

Michael joined Percona as a Consultant in 2012 after having worked with high-volume stock photography websites and email service provider platforms. With a foundation in systems administration, Michael enjoys working with SAN technologies and high availability solutions. A Canadian, Michael currently lives in the Nicoya, Costa Rica area with his wife, two children, and two dogs.

MyRocks Experimental Now Available with Percona Server for MySQL 5.7.19-17

Percona, in collaboration with Facebook, is proud to announce the first experimental release of MyRocks in Percona Server for MySQL 5.7, with packages.

Back in October of 2016, Peter Zaitsev announced that we were going to port MyRocks from Facebook MySQL to Percona Server for MySQL.

Then in April 2017, Vadim Tkachenko announced the availability of experimental builds of Percona Server for MySQL with the MyRocks storage engine.

Now in September 2017, we are pleased to announce the first full experimental release of MyRocks with packages for Percona Server for MySQL 5.7.19-17.

The basis of the MyRocks storage engine is the RocksDB key-value store, which is a log-structured merge-tree (or LSM). It uses much less space, and has a much smaller write volume (write amplification) compared to a B+ tree database implementation such as InnoDB. As a result, MyRocks has the following advantages compared to other storage engines, if your workload uses fast storage (such as SSD):

  • Requires less storage space
  • Provides more storage endurance
  • Ensures better IO capacity

Percona MyRocks is distributed as a separate package that can be enabled as a plugin for Percona Server for MySQL 5.7.19-17.

WARNING: Percona MyRocks is currently considered experimental and is not yet recommended for production use.

We are providing packages for most popular 64-bit Linux distributions:

  • Debian 8 (“jessie”)
  • Debian 9 (“stretch”)
  • Ubuntu 14.04 LTS (Trusty Tahr)
  • Ubuntu 16.04 LTS (Xenial Xerus)
  • Ubuntu 16.10 (Yakkety Yak)
  • Ubuntu 17.04 (Zesty Zapus)
  • Red Hat Enterprise Linux or CentOS 6 (Santiago)
  • Red Hat Enterprise Linux or CentOS 7 (Maipo)

Installation instructions can be found here.
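
Once the packages are installed and the plugin is enabled per those instructions, a quick way to confirm the engine is available is to query the server directly. This is only a sanity-check sketch using standard statements:

-- Confirm the RocksDB storage engine is registered and enabled
SELECT engine, support, comment
FROM information_schema.engines
WHERE engine = 'ROCKSDB';

-- Or simply list all engines and look for ROCKSDB with Support = YES
SHOW ENGINES;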

Due to the differences between Facebook MySQL 5.6.35 and Percona Server for MySQL 5.7, there are some behavioral differences and additional limitations. Some of these are documented here.

We encourage you to install and experiment with MyRocks for Percona Server for MySQL and join the discussion here.

Any issues that you might find can be searched for and reported here.

We thank the RocksDB and MyRocks development teams at Facebook for providing the foundation and assistance in developing MyRocks for Percona Server for MySQL. Without their efforts, this would not have been possible.

Timing load & index for sysbench tables

This post compares MyRocks, InnoDB and TokuDB on the time required to load and index a table for sysbench.

tl;dr
  • MyRocks, InnoDB and TokuDB have similar load performance although there is a regression for InnoDB from 5.6 to 5.7 to 8.x
  • InnoDB create index is much faster starting in 5.7

Configuration

I used my sysbench helper scripts with my sysbench branch and configured it to create 1 table with 800M rows. The binlog was enabled but sync on commit was disabled for the binlog and database log. The sysbench client shared the host with mysqld. The host has 48 HW threads, 50GB of RAM for the OS and MySQL, and fast SSD. The test table is larger than RAM, but it will take me a few days to get details on that. The test was repeated for MyRocks, InnoDB and TokuDB. I continue to use the IO-bound setup as described previously.
  • I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression. 
  • I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF, but I have been setting this to ON for my OLTP benchmarks. 
  • I used InnoDB from upstream 5.6.35, 5.7.17, 8.0.1 and 8.0.2. For 8.x I used latin1/latin1_swedish_ci charset/collation. The performance_schema was enabled. The InnoDB tests did not use compression. 

A sample command line for sysbench is:
bash all.sh 1 800000000 180 300 180 innodb 1 0 /bin/mysql none /sysbench10 /dbdir
Results

The load is in PK order and there are no secondary indexes. Engines have similar performance although there is a slow regression for InnoDB with each new release and there is a big regression from 8.0.1 to 8.0.2 which I hope will be fixed when 8.x approaches GA. The ratio is the time to load for the engine divided by the time to load for InnoDB from MySQL 5.6.35.

load    load    engine
secs    ratio
 7266   1.000   inno5635
 7833   1.078   inno5717
 8286   1.140   inno801
10516   1.447   inno802
 7640   1.051   myrocks.none
 7810   1.074   myrocks.zstd
 7558   1.040   toku5717.none
 7494   1.031   toku5717.none.prefetch
 7726   1.063   toku5717.zlib 

Create index performance has more diversity. The table is larger than RAM, so some of it will be read from storage, and engines with compression (myrocks.zstd, toku5717.zlib) suffer from decompression latency. The prefetch option doesn't help TokuDB in this case. InnoDB create index performance got much faster starting in 5.7. The ratio is the create index time for the engine divided by the time for InnoDB from MySQL 5.6.35.

create  create  engine
secs    ratio
 3565   1.000   inno5635
 1904   0.534   inno5717
 1961   0.550   inno801
 1966   0.551   inno802
 3321   0.931   myrocks.none
 3802   1.066   myrocks.zstd
 9817   2.753   toku5717.none
 9855   2.764   toku5717.none.prefetch
10731   3.010   toku5717.zlib
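
For context, the create index step here builds the standard sysbench secondary index on the k column after the load completes. Assuming the usual sbtest table layout (this is a sketch, not the exact command issued by my helper scripts), it is roughly:

ALTER TABLE sbtest1 ADD INDEX k_1 (k);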

Charts

These charts have the same data as the tables above.

Write-heavy workloads with MyRocks

MyRocks is based on RocksDB and RocksDB is write-optimized, so why don't write-heavy workloads always run faster on MyRocks than on InnoDB? I will start with a summary of MyRocks versus InnoDB: MyRocks has better space & write efficiency, frequently has better write latency and sometimes has better read latency. Better space & write efficiency means you use less SSD and it will last longer.

Don't forget that better write and space efficiency with MyRocks can lead to better read latency. When a database engine does fewer writes to storage there is more IO capacity available for reads which is a big deal with disk and can be a big deal with slower SSD. When a database engine uses less space in storage then it is likely to cache more data in RAM and have better cache hit rates on reads. Many of the benchmarks that I run use uniform distribution for key generation and won't show the benefit from better cache hit rates.

With RocksDB, a write usually means a call to Put, and Put is a blind-write. As shown by many benchmark results, a blind-write can be very fast with RocksDB: insert data into the memtable, optionally flush the WAL to the OS page cache, optionally force the WAL to persistent storage. But SQL update and insert statements usually need much more than a blind-write, and the reads done by MyRocks can explain why some write-heavy workloads are faster with InnoDB. Things that get in the way include:
  • Pluggable storage engine APIs are slow to adopt blind-write optimizations
  • The modified row count must be returned from an update statement 
  • Enforcement of the PK and unique constraints
  • Secondary index maintenance
Pluggable storage engine APIs are slow to adopt blind-write optimizations. I don't blame MySQL and MongoDB for this because such optimizations are limited to write-optimized engines. But MyRocks and MongoRocks are here and I expect that WiredTiger/MongoDB will eventually have an LSM in production. M*Rocks engines can use the merge operator for this. I don't expect blind-write optimizations to ever be implemented for an update-in-place b-tree.

The modified row count must be returned for an update statement and that requires evaluation of the where clause. For RocksDB this requires reads -- from the memtable, maybe from the LSM tree, OS page cache and storage. Reads from storage and the OS page cache might require decompression. This is a lot more work than a blind-write. The usage of blind-write optimizations will result in statements that are update-like and insert-like because the application programmers must be aware of the semantic differences -- modified row count won't be returned, constraint violations won't be acknowledged. I think it is worth doing.

Unique constraints must be enforced for the PK and unique secondary indexes. With the exception of an auto-increment column, this requires a read from the index to confirm the value does not exist. In the best case updates & inserts are in key order and the structures to be searched are cached in RAM -- otherwise this requires reads from the OS page cache and/or storage and might require decompression.

Secondary indexes must be maintained as part of the update & insert statement processing. For an insert this means that a new index entry will be inserted. For an update that requires index maintenance the existing index entry will be deleted and a new one will be inserted. With a b-tree the leaf page(s) will be read (possibly from storage), made dirty in the buffer pool and eventually those pages will be written back to storage. With RocksDB non-unique secondary index maintenance is read-free and blind-writes are done to put the new index entry and possibly delete-mark the old index entry. See the previous paragraph if there is a unique secondary index.
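
To make the distinction concrete, here is a small sketch with a hypothetical table showing where reads sneak into "write" statements, following the reasoning above:

-- Hypothetical table: PK, one unique secondary index, one non-unique secondary index
CREATE TABLE orders (
  id BIGINT NOT NULL PRIMARY KEY,        -- PK constraint check requires a read (unless auto-increment)
  order_no VARCHAR(32) NOT NULL,
  shop_id INT NOT NULL,
  UNIQUE KEY uk_order_no (order_no),     -- unique secondary index: constraint check requires a read
  KEY ix_shop_id (shop_id)               -- non-unique secondary index: maintenance is read-free in MyRocks
) ENGINE=ROCKSDB;

-- The UPDATE below must evaluate the WHERE clause (reads from the memtable,
-- LSM tree, OS page cache or storage) and return the modified row count,
-- so it cannot be a pure blind-write even though RocksDB's Put is.
UPDATE orders SET shop_id = 7 WHERE id = 12345;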

On Open Source Databases. Interview with Peter Zaitsev

“To be competitive with non-open-source cloud deployment options, open source databases need to invest in “ease-of-use.” There is no tolerance for complexity in many development teams as we move to “ops-less” deployment models.” –Peter Zaitsev

I have interviewed Peter Zaitsev, Co-Founder and CEO of Percona.
In this interview, Peter talks about the Open Source Databases market; the Cloud; the scalability challenges at Facebook; compares MySQL, MariaDB, and MongoDB; and presents Percona’s contribution to the MySQL and MongoDB ecosystems.

RVZ

Q1. What are the main technical challenges in obtaining application scaling?

Peter Zaitsev: When it comes to scaling, there are different types. There is a Facebook/Google/Alibaba/Amazon scale: these giants are pushing boundaries, and usually are solving very complicated engineering problems at a scale where solutions aren’t easy or known. This often means finding edge cases that break things like hardware, operating system kernels and the database. As such, these companies not only need to build a very large-scale infrastructure, with a high level of automation, but also ensure it is robust enough to handle these kinds of issues with limited user impact. A great deal of hardware and software deployment practices must be in place for such installations.

While these “extreme-scale” applications are very interesting and get a lot of publicity at tech events and in tech publications, this is a very small portion of all the scenarios out there. The vast majority of applications are running at the medium to high scale, where implementing best practices gets you the scalability you need.

When it comes to MySQL, perhaps the most important question is when you need to “shard.” Sharding — while used by every application at extreme scale — isn’t a simple “out-of-the-box” feature in MySQL. It often requires a lot of engineering effort to correctly implement it.

While sharding is sometimes required, you should really examine whether it is necessary for your application. A single MySQL instance can easily handle hundreds of thousands of moderately complicated queries per second (or more), and terabytes of data. Pair that with MemcacheD or Redis caching, MySQL Replication or more advanced solutions such as Percona XtraDB Cluster or Amazon Aurora, and you can cover the transactional (operational) database needs for applications of a very significant scale.

Besides making such high-level architecture choices, you of course need to also ensure that you exercise basic database hygiene. Ensure that you’re using the correct hardware (or cloud instance type), the right MySQL and operating system version and configuration, have a well-designed schema and good indexes. You also want to ensure good capacity planning, so that when you want to take your system to the next scale and begin to thoroughly look at it you’re not caught by surprise.

Q2. Why did Facebook create MyRocks, a new flash-optimized transactional storage engine on top of RocksDB storage engine for MySQL?

Peter Zaitsev: The Facebook Team is the most qualified to answer this question. However, I imagine that at Facebook scale being efficient is very important because it helps to drive the costs down. If your hot data is in the cache, what is important is that your database is efficient at handling writes — thus you want a “write-optimized engine.”
If you use Flash storage, you also care about two things:

  • A high level of compression since Flash storage is much more expensive than spinning disk.
  • You are also interested in writing as little to the storage as possible, as the more you write the faster it wears out (and needs to be replaced).

RocksDB and MyRocks are able to achieve all of these goals. As an LSM-based storage engine, writes (especially Inserts) are very fast — even for giant data sizes. They’re also much better suited for achieving high levels of compression than InnoDB.

This Blog Post by Mark Callaghan has many interesting details, including this table which shows MyRocks having better performance, write amplification and compression for Facebook’s workload than InnoDB.

Q3. Beringei is Facebook’s open source, in-memory time series database. According to Facebook, large-scale monitoring systems cannot handle large-scale analysis in real time because the query performance is too slow. What is your take on this?

Peter Zaitsev: Facebook operates at extreme scale, so it is no surprise the conventional systems don’t scale well enough or aren’t efficient enough for Facebook’s needs.

I’m very excited Facebook has released Beringei as open source. Beringei itself is a relatively low-end storage engine that is hard to use for a majority of users, but I hope it gets integrated with other open source projects and provides a full-blown high-performance monitoring solution. Integrating it with Prometheus would be a great fit for solutions with extreme data ingestion rates and very high metric cardinality.

Q4. How do you see the market for open source databases evolving?

Peter Zaitsev: The last decade has seen a lot of open source database engines built, offering a lot of different data models, persistence options, high availability options, etc. Some of them were built as open source from scratch, while others were released as open source after years of being proprietary engines — with the most recent example being CMDB2 by Bloomberg. I think this heavy competition is great for pushing innovation forward, and is very exciting! For example, I think that if MongoDB hadn’t shown how many developers love a document-oriented data model, we might never have seen MySQL Document Store in the MySQL ecosystem.

With all this variety, I think there will be a lot of consolidation and only a small fraction of these new technologies really getting wide adoption. Many will either have niche deployments, or will be an idea breeding ground that gets incorporated into more popular database technologies.

I do not think SQL will “die” anytime soon, even though it is many decades old. But I also don’t think we will see it being the dominant “database” language, as it has been since the turn of the millennium.

The interesting disruptive force for open source technologies is the cloud. It will be very interesting for me to see how things evolve. With pay-for-use models of the cloud, the “free” (as in beer) part of open source does not apply in the same way. This reduces incentives to move to open source databases.

To be competitive with non-open-source cloud deployment options, open source databases need to invest in “ease-of-use.” There is no tolerance for complexity in many development teams as we move to “ops-less” deployment models.

Q5. In your opinion what are the pros and cons of MySQL vs. MariaDB?

Peter Zaitsev: While tracing its roots to MySQL, MariaDB is quickly becoming a very different database.
It implements some features MySQL doesn’t, but also leaves out others (MySQL Document Store and Group Replication) or implements them in a different way (JSON support and Replication GTIDs).

From the MySQL side, we have Oracle’s financial backing and engineering. You might dislike Oracle, but I think you agree they know a thing or two about database engineering. MySQL is also far more popular, and as such more battle-tested than MariaDB.

MySQL is developed by a single company (Oracle) and does not have as many external contributors compared to MariaDB — which has its own pluses and minuses.

MySQL is “open core,” meaning some components are available only in the proprietary version, such as Enterprise Authentication, Enterprise Scalability, and others. Alternatives for a number of these features are available in Percona Server for MySQL though (which is completely open source). MariaDB Server itself is completely open source, though there are other components that aren’t that you might need to build a full solution — namely MaxScale.

Another thing MariaDB has going for it is that it is included in a number of Linux distributions. Many new users will be getting their first “MySQL” experience with MariaDB.

For additional insight into MariaDB, MySQL and Percona Server for MySQL, you can check out this recent article.

Q6. What’s new in the MySQL and MongoDB ecosystem?

Peter Zaitsev: This could be its own and rather large article! With MySQL, we’re very excited to see what is coming in MySQL 8. There should be a lot of great changes in pretty much every area, ranging from the optimizer to retiring a lot of architectural debt (some of it 20 years old). MySQL Group Replication and MySQL InnoDB Cluster, while still early in their maturity, are very interesting products.

For MongoDB we’re very excited about MongoDB 3.4, which has been taking steps to be a more enterprise ready database with features like collation support and high-performance sharding. A number of these features are only available in the Enterprise version of MongoDB, such as external authentication, auditing and log redaction. This is where Percona Server for MongoDB 3.4 comes in handy, by providing open source alternatives for the most valuable Enterprise-only features.

For both MySQL and MongoDB, we’re very excited about RocksDB-based storage engines. MyRocks and MongoRocks both offer outstanding performance and efficiency for certain workloads.

Q7. Anything else you wish to add?

Peter Zaitsev: I would like to use this opportunity to highlight Percona’s contribution to the MySQL and MongoDB ecosystems by mentioning two of our open source products that I’m very excited about.

First, Percona XtraDB Cluster 5.7.
While this has been around for about a year, we just completed a major performance improvement effort that allowed us to increase performance up to 10x. I’m not talking about improving some very exotic workloads: these performance improvements are achieved in very typical high-concurrency environments!

I’m also very excited about our Percona Monitoring and Management product, which is unique in being the only fully packaged open source monitoring solution specifically built for MySQL and MongoDB. It is a newer product that has been available for less than a year, but we’re seeing great momentum in adoption in the community. We are focusing many of our resources to improving it and making it more effective.

———————

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With more than 150 professionals in 29 countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of Internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads.
————————-

Resources

Percona, in collaboration with Facebook, announced the first experimental release of MyRocks in Percona Server for MySQL 5.7, with packages. September 6, 2017

eBook, “Practical MySQL Performance Optimization,” by Percona CEO Peter Zaitsev and Principal Consultant Alexander Rubin. (LINK to DOWNLOAD, registration required)

MySQL vs MongoDB – When to Use Which Technology. Peter Zaitsev, June 22, 2017

Percona Live Open Source Database Conference Europe, Dublin, Ireland. September 25 – 27, 2017

Percona Monitoring and Management (PMM) Graphs Explained: MongoDB with RocksDB, by Tim Vaillancourt, June 18, 2017

Related Posts

On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov. ODBMS Industry Watch, 2017-06-30

On the new developments in Apache Spark and Hadoop. Interview with Amr Awadallah. ODBMS Industry Watch, 2017-03-13

On in-memory, key-value data stores. Interview with Ofer Bengal and Yiftach Shoolman. ODBMS Industry Watch, 2017-02-13

follow us on Twitter: @odbmsorg

##

Automatic Partition Maintenance in MariaDB

geoff_montee_g, Tue, 09/05/2017 - 14:30

A MariaDB Support customer recently asked how they could automatically drop old partitions after 6 months. MariaDB does not have a mechanism to do this automatically out-of-the-box, but it is not too difficult to create a custom stored procedure and an event to call the procedure on the desired schedule. In fact, it is also possible to go even further and create a stored procedure that can also automatically add new partitions. In this blog post, I will show how to write stored procedures that perform these tasks.

Partitioned table definition

For this demonstration, I'll use a table definition based on one from MySQL's documentation on range partitioning, with some minor changes:

DROP TABLE IF EXISTS db1.quarterly_report_status;

CREATE TABLE db1.quarterly_report_status (
   report_id INT NOT NULL,
   report_status VARCHAR(20) NOT NULL,
   report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB
PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
   PARTITION p_first VALUES LESS THAN ( UNIX_TIMESTAMP('2016-10-01 00:00:00')),
   PARTITION p201610 VALUES LESS THAN ( UNIX_TIMESTAMP('2016-11-01 00:00:00')),
   PARTITION p201611 VALUES LESS THAN ( UNIX_TIMESTAMP('2016-12-01 00:00:00')),
   PARTITION p201612 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-01-01 00:00:00')),
   PARTITION p201701 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-02-01 00:00:00')),
   PARTITION p201702 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-03-01 00:00:00')),
   PARTITION p201703 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-04-01 00:00:00')),
   PARTITION p201704 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-05-01 00:00:00')),
   PARTITION p201705 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-06-01 00:00:00')),
   PARTITION p201706 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-07-01 00:00:00')),
   PARTITION p201707 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-08-01 00:00:00')),
   PARTITION p201708 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-09-01 00:00:00')),
   PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

The most significant change is that the partition naming scheme is based on the date. This will allow us to more easily determine which partitions to remove.
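
As a quick illustration of why the naming scheme matters, partitions named pYYYYMM sort lexicographically in date order, so finding the ones that fall outside a retention window is a simple string comparison against information_schema. A sketch, assuming a 6-month window:

SELECT partition_name
FROM information_schema.partitions
WHERE table_schema = 'db1'
  AND table_name = 'quarterly_report_status'
  AND partition_name NOT IN ('p_first', 'p_future')
  AND partition_name < CONCAT('p', DATE_FORMAT(NOW() - INTERVAL 6 MONTH, '%Y%m'));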

Stored procedure definition (create new partitions)

The stored procedure itself contains some comments that explain what it does, so I will let the code speak for itself, for the most part. One noteworthy item to point out is that we are not doing ALTER TABLE ... ADD PARTITION. This is because the partition p_future already covers the end range up to MAXVALUE, so we actually need to do ALTER TABLE ... REORGANIZE PARTITION instead.
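
To make that concrete, the statement the procedure generates for each new month looks roughly like the following (a sketch using a hypothetical next partition, p201712):

ALTER TABLE db1.quarterly_report_status
   REORGANIZE PARTITION p_future INTO (
      PARTITION p201712 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-01-01 00:00:00') ),
      PARTITION p_future VALUES LESS THAN (MAXVALUE)
   );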

DROP PROCEDURE IF EXISTS db1.create_new_partitions; DELIMITER $$ CREATE PROCEDURE db1.create_new_partitions(p_schema varchar(64), p_table varchar(64), p_months_to_add int) LANGUAGE SQL NOT DETERMINISTIC SQL SECURITY INVOKER BEGIN DECLARE done INT DEFAULT FALSE; DECLARE current_partition_name varchar(64); DECLARE current_partition_ts int; -- We'll use this cursor later to check -- whether a particular already exists. -- @partition_name_to_add will be -- set later. DECLARE cur1 CURSOR FOR SELECT partition_name FROM information_schema.partitions WHERE TABLE_SCHEMA = p_schema AND TABLE_NAME = p_table AND PARTITION_NAME != 'p_first' AND PARTITION_NAME != 'p_future' AND PARTITION_NAME = @partition_name_to_add; -- We'll also use this cursor later -- to query our temporary table. DECLARE cur2 CURSOR FOR SELECT partition_name, partition_range_ts FROM partitions_to_add; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE; DROP TEMPORARY TABLE IF EXISTS partitions_to_add; CREATE TEMPORARY TABLE partitions_to_add ( partition_name varchar(64), partition_range_ts int ); SET @partitions_added = FALSE; SET @months_ahead = 0; -- Let's go through a loop and add each month individually between -- the current month and the month p_months_to_add in the future. WHILE @months_ahead

Let's try running the new procedure:

MariaDB [db1]> SHOW CREATE TABLE db1.quarterly_report_status\G *************************** 1. row *************************** Table: quarterly_report_status Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated)) (PARTITION p_first VALUES LESS THAN (1475294400) ENGINE = InnoDB, PARTITION p201610 VALUES LESS THAN (1477972800) ENGINE = InnoDB, PARTITION p201611 VALUES LESS THAN (1480568400) ENGINE = InnoDB, PARTITION p201612 VALUES LESS THAN (1483246800) ENGINE = InnoDB, PARTITION p201701 VALUES LESS THAN (1485925200) ENGINE = InnoDB, PARTITION p201702 VALUES LESS THAN (1488344400) ENGINE = InnoDB, PARTITION p201703 VALUES LESS THAN (1491019200) ENGINE = InnoDB, PARTITION p201704 VALUES LESS THAN (1493611200) ENGINE = InnoDB, PARTITION p201705 VALUES LESS THAN (1496289600) ENGINE = InnoDB, PARTITION p201706 VALUES LESS THAN (1498881600) ENGINE = InnoDB, PARTITION p201707 VALUES LESS THAN (1501560000) ENGINE = InnoDB, PARTITION p201708 VALUES LESS THAN (1504238400) ENGINE = InnoDB, PARTITION p_future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec) MariaDB [db1]> CALL db1.create_new_partitions('db1', 'quarterly_report_status', 3); +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201709 | +--------------------------------------------------------+ 1 row in set (0.01 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201710 | +--------------------------------------------------------+ 1 row in set (0.02 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201711 | +--------------------------------------------------------+ 1 row in set (0.02 sec) Query OK, 0 rows affected (0.09 sec) MariaDB [db1]> SHOW CREATE TABLE db1.quarterly_report_status\G *************************** 1. 
row *************************** Table: quarterly_report_status Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated)) (PARTITION p_first VALUES LESS THAN (1475294400) ENGINE = InnoDB, PARTITION p201610 VALUES LESS THAN (1477972800) ENGINE = InnoDB, PARTITION p201611 VALUES LESS THAN (1480568400) ENGINE = InnoDB, PARTITION p201612 VALUES LESS THAN (1483246800) ENGINE = InnoDB, PARTITION p201701 VALUES LESS THAN (1485925200) ENGINE = InnoDB, PARTITION p201702 VALUES LESS THAN (1488344400) ENGINE = InnoDB, PARTITION p201703 VALUES LESS THAN (1491019200) ENGINE = InnoDB, PARTITION p201704 VALUES LESS THAN (1493611200) ENGINE = InnoDB, PARTITION p201705 VALUES LESS THAN (1496289600) ENGINE = InnoDB, PARTITION p201706 VALUES LESS THAN (1498881600) ENGINE = InnoDB, PARTITION p201707 VALUES LESS THAN (1501560000) ENGINE = InnoDB, PARTITION p201708 VALUES LESS THAN (1504238400) ENGINE = InnoDB, PARTITION p201709 VALUES LESS THAN (1506830400) ENGINE = InnoDB, PARTITION p201710 VALUES LESS THAN (1509508800) ENGINE = InnoDB, PARTITION p201711 VALUES LESS THAN (1512104400) ENGINE = InnoDB, PARTITION p_future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec)

We can see that it appears to be working as expected.

Stored procedure definition (drop old partitions)

This additional stored procedure also contains some comments that explain what it does, so I will let the code speak for itself, for the most part. One noteworthy item to point out is that the stored procedure drops all old partitions individually with ALTER TABLE ... DROP PARTITION, and then it increases the range of the p_first partition with ALTER TABLE ... REORGANIZE PARTITION, so that it fills in the gap left behind.
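
For illustration, with the table above and a 6-month retention window, the statements the procedure ends up executing look roughly like this (a sketch, not the procedure's exact generated SQL):

ALTER TABLE db1.quarterly_report_status DROP PARTITION p201610;
ALTER TABLE db1.quarterly_report_status DROP PARTITION p201611;
-- ...one DROP PARTITION per expired month, then widen p_first to close the gap:
ALTER TABLE db1.quarterly_report_status
   REORGANIZE PARTITION p_first, p201702 INTO (
      PARTITION p_first VALUES LESS THAN ( UNIX_TIMESTAMP('2017-02-01 00:00:00') ),
      PARTITION p201702 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-03-01 00:00:00') )
   );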

DROP PROCEDURE IF EXISTS db1.drop_old_partitions; DELIMITER $$ CREATE PROCEDURE db1.drop_old_partitions(p_schema varchar(64), p_table varchar(64), p_months_to_keep int, p_seconds_to_sleep int) LANGUAGE SQL NOT DETERMINISTIC SQL SECURITY INVOKER BEGIN DECLARE done INT DEFAULT FALSE; DECLARE current_partition_name varchar(64); -- We'll use this cursor later to get -- the list of partitions to drop. -- @last_partition_name_to_keep will be -- set later. DECLARE cur1 CURSOR FOR SELECT partition_name FROM information_schema.partitions WHERE TABLE_SCHEMA = p_schema AND TABLE_NAME = p_table AND PARTITION_NAME != 'p_first' AND PARTITION_NAME != 'p_future' AND PARTITION_NAME 0 THEN SELECT CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds'); SELECT SLEEP(p_seconds_to_sleep); END IF; SELECT CONCAT('Dropping partition: ', current_partition_name); -- First we build the ALTER TABLE query. SET @schema = p_schema; SET @table = p_table; SET @partition = current_partition_name; SET @q = 'SELECT CONCAT(''ALTER TABLE '', @schema, ''.'', @table, '' DROP PARTITION '', @partition) INTO @query'; PREPARE st FROM @q; EXECUTE st; DEALLOCATE PREPARE st; -- And then we prepare and execute the ALTER TABLE query. PREPARE st FROM @query; EXECUTE st; DEALLOCATE PREPARE st; SET @first = FALSE; END LOOP; CLOSE cur1; -- If no partitions were dropped, then we can also skip this. IF ! @first THEN -- Then we need to get the date of the new first partition. -- We need the date in UNIX timestamp format. SET @q = 'SELECT DATE_FORMAT(@last_month_to_keep, ''%Y-%m-01 00:00:00'') INTO @new_first_partition_date'; PREPARE st FROM @q; EXECUTE st; DEALLOCATE PREPARE st; SELECT UNIX_TIMESTAMP(@new_first_partition_date) INTO @new_first_partition_ts; -- We also need to get the date of the second partition -- since the second partition is also needed for REORGANIZE PARTITION. SET @q = 'SELECT DATE_ADD(@new_first_partition_date, INTERVAL 1 MONTH) INTO @second_partition_date'; PREPARE st FROM @q; EXECUTE st; DEALLOCATE PREPARE st; SELECT UNIX_TIMESTAMP(@second_partition_date) INTO @second_partition_ts; SELECT CONCAT('Reorganizing first and second partitions. first partition date = ', @new_first_partition_date, ', second partition date = ', @second_partition_date); -- Then we build the ALTER TABLE query. SET @schema = p_schema; SET @table = p_table; SET @q = 'SELECT CONCAT(''ALTER TABLE '', @schema, ''.'', @table, '' REORGANIZE PARTITION p_first, '', @last_partition_name_to_keep, '' INTO ( PARTITION p_first VALUES LESS THAN ( '', @new_first_partition_ts, '' ), PARTITION '', @last_partition_name_to_keep, '' VALUES LESS THAN ( '', @second_partition_ts, '' ) ) '') INTO @query'; PREPARE st FROM @q; EXECUTE st; DEALLOCATE PREPARE st; -- And then we prepare and execute the ALTER TABLE query. PREPARE st FROM @query; EXECUTE st; DEALLOCATE PREPARE st; END IF; END$$ DELIMITER ;

Let's try running the new procedure:

MariaDB [db1]> SHOW CREATE TABLE db1.quarterly_report_status\G *************************** 1. row *************************** Table: quarterly_report_status Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated)) (PARTITION p_first VALUES LESS THAN (1475294400) ENGINE = InnoDB, PARTITION p201610 VALUES LESS THAN (1477972800) ENGINE = InnoDB, PARTITION p201611 VALUES LESS THAN (1480568400) ENGINE = InnoDB, PARTITION p201612 VALUES LESS THAN (1483246800) ENGINE = InnoDB, PARTITION p201701 VALUES LESS THAN (1485925200) ENGINE = InnoDB, PARTITION p201702 VALUES LESS THAN (1488344400) ENGINE = InnoDB, PARTITION p201703 VALUES LESS THAN (1491019200) ENGINE = InnoDB, PARTITION p201704 VALUES LESS THAN (1493611200) ENGINE = InnoDB, PARTITION p201705 VALUES LESS THAN (1496289600) ENGINE = InnoDB, PARTITION p201706 VALUES LESS THAN (1498881600) ENGINE = InnoDB, PARTITION p201707 VALUES LESS THAN (1501560000) ENGINE = InnoDB, PARTITION p201708 VALUES LESS THAN (1504238400) ENGINE = InnoDB, PARTITION p_future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec) MariaDB [db1]> CALL db1.drop_old_partitions('db1', 'quarterly_report_status', 6, 5); +--------------------------------------------------------------------------+ | CONCAT('Dropping all partitions before: ', @last_partition_name_to_keep) | +--------------------------------------------------------------------------+ | Dropping all partitions before: p201702 | +--------------------------------------------------------------------------+ 1 row in set (0.00 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201610 | +--------------------------------------------------------+ 1 row in set (0.00 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (0.02 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (5.02 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201611 | +--------------------------------------------------------+ 1 row in set (5.02 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (5.03 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (10.03 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201612 | +--------------------------------------------------------+ 1 row in set 
(10.03 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (10.05 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (15.05 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201701 | +--------------------------------------------------------+ 1 row in set (15.05 sec) +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | CONCAT('Reorganizing first and second partitions. first partition date = ', @new_first_partition_date, ', second partition date = ', @second_partition_date) | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Reorganizing first and second partitions. first partition date = 2017-02-01 00:00:00, second partition date = 2017-03-01 00:00:00 | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (15.06 sec) Query OK, 0 rows affected (15.11 sec) MariaDB [db1]> SHOW CREATE TABLE db1.quarterly_report_status\G *************************** 1. row *************************** Table: quarterly_report_status Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated)) (PARTITION p_first VALUES LESS THAN (1485925200) ENGINE = InnoDB, PARTITION p201702 VALUES LESS THAN (1488344400) ENGINE = InnoDB, PARTITION p201703 VALUES LESS THAN (1491019200) ENGINE = InnoDB, PARTITION p201704 VALUES LESS THAN (1493611200) ENGINE = InnoDB, PARTITION p201705 VALUES LESS THAN (1496289600) ENGINE = InnoDB, PARTITION p201706 VALUES LESS THAN (1498881600) ENGINE = InnoDB, PARTITION p201707 VALUES LESS THAN (1501560000) ENGINE = InnoDB, PARTITION p201708 VALUES LESS THAN (1504238400) ENGINE = InnoDB, PARTITION p_future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec)

We can see that our changes seem to be working as expected. In addition to old partitions being dropped, we can also see that p_first's date range was updated.

Stored procedure definition (tie other procedures together)

It is probably going to be preferable in most cases to perform all partition maintenance at the same time. Therefore, we can create another stored procedure that calls our other two stored procedures. This is fairly straightforward.

DROP PROCEDURE IF EXISTS db1.perform_partition_maintenance;

DELIMITER $$

CREATE PROCEDURE db1.perform_partition_maintenance(p_schema varchar(64), p_table varchar(64), p_months_to_add int, p_months_to_keep int, p_seconds_to_sleep int)
   LANGUAGE SQL
   NOT DETERMINISTIC
   SQL SECURITY INVOKER
BEGIN
   CALL db1.drop_old_partitions(p_schema, p_table, p_months_to_keep, p_seconds_to_sleep);
   CALL db1.create_new_partitions(p_schema, p_table, p_months_to_add);
END$$

DELIMITER ;

Let's reset our partitioned table to its original state, and then let's try running our new stored procedure.

MariaDB [db1]> SHOW CREATE TABLE db1.quarterly_report_status\G *************************** 1. row *************************** Table: quarterly_report_status Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated)) (PARTITION p_first VALUES LESS THAN (1475294400) ENGINE = InnoDB, PARTITION p201610 VALUES LESS THAN (1477972800) ENGINE = InnoDB, PARTITION p201611 VALUES LESS THAN (1480568400) ENGINE = InnoDB, PARTITION p201612 VALUES LESS THAN (1483246800) ENGINE = InnoDB, PARTITION p201701 VALUES LESS THAN (1485925200) ENGINE = InnoDB, PARTITION p201702 VALUES LESS THAN (1488344400) ENGINE = InnoDB, PARTITION p201703 VALUES LESS THAN (1491019200) ENGINE = InnoDB, PARTITION p201704 VALUES LESS THAN (1493611200) ENGINE = InnoDB, PARTITION p201705 VALUES LESS THAN (1496289600) ENGINE = InnoDB, PARTITION p201706 VALUES LESS THAN (1498881600) ENGINE = InnoDB, PARTITION p201707 VALUES LESS THAN (1501560000) ENGINE = InnoDB, PARTITION p201708 VALUES LESS THAN (1504238400) ENGINE = InnoDB, PARTITION p_future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec) MariaDB [db1]> CALL db1.perform_partition_maintenance('db1', 'quarterly_report_status', 3, 6, 5); +--------------------------------------------------------------------------+ | CONCAT('Dropping all partitions before: ', @last_partition_name_to_keep) | +--------------------------------------------------------------------------+ | Dropping all partitions before: p201702 | +--------------------------------------------------------------------------+ 1 row in set (0.00 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201610 | +--------------------------------------------------------+ 1 row in set (0.00 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (0.02 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (5.02 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201611 | +--------------------------------------------------------+ 1 row in set (5.02 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (5.03 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (10.03 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201612 | +--------------------------------------------------------+ 1 
row in set (10.03 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (10.06 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (15.06 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201701 | +--------------------------------------------------------+ 1 row in set (15.06 sec) +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | CONCAT('Reorganizing first and second partitions. first partition date = ', @new_first_partition_date, ', second partition date = ', @second_partition_date) | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Reorganizing first and second partitions. first partition date = 2017-02-01 00:00:00, second partition date = 2017-03-01 00:00:00 | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (15.08 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201709 | +--------------------------------------------------------+ 1 row in set (15.16 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201710 | +--------------------------------------------------------+ 1 row in set (15.17 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201711 | +--------------------------------------------------------+ 1 row in set (15.17 sec) Query OK, 0 rows affected (15.26 sec) MariaDB [db1]> SHOW CREATE TABLE db1.quarterly_report_status\G *************************** 1. 
row *************************** Table: quarterly_report_status Create Table: CREATE TABLE `quarterly_report_status` ( `report_id` int(11) NOT NULL, `report_status` varchar(20) NOT NULL, `report_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated)) (PARTITION p_first VALUES LESS THAN (1485925200) ENGINE = InnoDB, PARTITION p201702 VALUES LESS THAN (1488344400) ENGINE = InnoDB, PARTITION p201703 VALUES LESS THAN (1491019200) ENGINE = InnoDB, PARTITION p201704 VALUES LESS THAN (1493611200) ENGINE = InnoDB, PARTITION p201705 VALUES LESS THAN (1496289600) ENGINE = InnoDB, PARTITION p201706 VALUES LESS THAN (1498881600) ENGINE = InnoDB, PARTITION p201707 VALUES LESS THAN (1501560000) ENGINE = InnoDB, PARTITION p201708 VALUES LESS THAN (1504238400) ENGINE = InnoDB, PARTITION p201709 VALUES LESS THAN (1506830400) ENGINE = InnoDB, PARTITION p201710 VALUES LESS THAN (1509508800) ENGINE = InnoDB, PARTITION p201711 VALUES LESS THAN (1512104400) ENGINE = InnoDB, PARTITION p_future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0.00 sec)

This stored procedure also seems to be working as expected.

Running the procedure more often than necessary

It should be noted that these stored procedures can be run more often than is necessary. If the procedures are run when no partitions need to be added or deleted, then the procedure will not perform any work. Let's reset our table definition and try it out.

MariaDB [db1]> CALL db1.perform_partition_maintenance('db1', 'quarterly_report_status', 3, 6, 5); +--------------------------------------------------------------------------+ | CONCAT('Dropping all partitions before: ', @last_partition_name_to_keep) | +--------------------------------------------------------------------------+ | Dropping all partitions before: p201702 | +--------------------------------------------------------------------------+ 1 row in set (0.00 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201610 | +--------------------------------------------------------+ 1 row in set (0.00 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (0.03 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (5.03 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201611 | +--------------------------------------------------------+ 1 row in set (5.03 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (5.06 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (10.06 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201612 | +--------------------------------------------------------+ 1 row in set (10.06 sec) +---------------------------------------------------------+ | CONCAT('Sleeping for ', p_seconds_to_sleep, ' seconds') | +---------------------------------------------------------+ | Sleeping for 5 seconds | +---------------------------------------------------------+ 1 row in set (10.08 sec) +---------------------------+ | SLEEP(p_seconds_to_sleep) | +---------------------------+ | 0 | +---------------------------+ 1 row in set (15.09 sec) +--------------------------------------------------------+ | CONCAT('Dropping partition: ', current_partition_name) | +--------------------------------------------------------+ | Dropping partition: p201701 | +--------------------------------------------------------+ 1 row in set (15.09 sec) +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | CONCAT('Reorganizing first and second partitions. first partition date = ', @new_first_partition_date, ', second partition date = ', @second_partition_date) | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Reorganizing first and second partitions. 
first partition date = 2017-02-01 00:00:00, second partition date = 2017-03-01 00:00:00 | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (15.11 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201709 | +--------------------------------------------------------+ 1 row in set (15.18 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201710 | +--------------------------------------------------------+ 1 row in set (15.18 sec) +--------------------------------------------------------+ | CONCAT('Creating partition: ', @partition_name_to_add) | +--------------------------------------------------------+ | Creating partition: p201711 | +--------------------------------------------------------+ 1 row in set (15.18 sec) Query OK, 0 rows affected (15.28 sec) MariaDB [db1]> CALL db1.perform_partition_maintenance('db1', 'quarterly_report_status', 3, 6, 5); +--------------------------------------------------------------------------+ | CONCAT('Dropping all partitions before: ', @last_partition_name_to_keep) | +--------------------------------------------------------------------------+ | Dropping all partitions before: p201702 | +--------------------------------------------------------------------------+ 1 row in set (0.01 sec) Query OK, 0 rows affected (0.02 sec)

As we can see from the above output, the procedure did not perform any work the second time.

Event definition

We want our stored procedure to run automatically every month, so we can use an event to do that. Before testing the event, we need to do two things:

  • We need to recreate the table with the original definition, so that it has all of the original partitions.
  • We need to ensure that event_scheduler=ON is set, and if not, we need to set it.
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'event_scheduler'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | event_scheduler | OFF | +-----------------+-------+ 1 row in set (0.00 sec) MariaDB [(none)]> SET GLOBAL event_scheduler=ON; Query OK, 0 rows affected (0.00 sec) MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'event_scheduler'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | event_scheduler | ON | +-----------------+-------+ 1 row in set (0.00 sec)

And then we can run the following:

DROP EVENT db1.monthly_perform_partition_maintenance_event; CREATE EVENT db1.monthly_perform_partition_maintenance_event ON SCHEDULE EVERY 1 MONTH STARTS NOW() DO CALL db1.perform_partition_maintenance('db1', 'quarterly_report_status', 3, 6, 5);
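
After creating the event, it is worth confirming that it was registered and will fire as expected. A quick sketch using the standard information_schema.events view:

SELECT event_schema, event_name, status, interval_value, interval_field, starts, last_executed
FROM information_schema.events
WHERE event_schema = 'db1'
  AND event_name = 'monthly_perform_partition_maintenance_event';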

However, there's another great change that we can make here. It might not be ideal to only run the procedure once per month, because if the procedure fails for whatever reason, then it might not get another chance to run again until the next month. For that reason, it might be better to run the procedure more often, such as once per day. As mentioned above, the procedure will only do work when partition maintenance is actually necessary, so it should not cause any issues to execute the procedure more often.

If we wanted to run the procedure once per day, then the event definition would become:

DROP EVENT db1.monthly_perform_partition_maintenance_event; CREATE EVENT db1.monthly_perform_partition_maintenance_event ON SCHEDULE EVERY 1 DAY STARTS NOW() DO CALL db1.perform_partition_maintenance('db1', 'quarterly_report_status', 3, 6, 5);

Conclusion

Thanks to the flexibility of stored procedures and events, it is relatively easy to automatically perform partition maintenance in MariaDB. Has anyone else implemented something like this?


Shlomi Noach

Tue, 09/05/2017 - 13:49

Partition management via common_schema

A shameless plug to common_schema's `sql_range_partitions` view, which generates the correct `DROP` statements for purging old partitions, as well as `CREATE` statements for creating the next partitions: sql_range_partitions

Also see Ike Walker's blog post on this topic: http://mechanics.flite.com/blog/2016/03/28/simplifying-mysql-partition-management-using-common-schema/
