Planet MySQL

Moving persistent data out of Redis

Historically, we have used Redis in two ways at GitHub:

We used it as an LRU cache to conveniently store the results of expensive computations over data originally persisted in Git repositories or MySQL. We call this transient Redis.

We also enabled persistence, which gave us durability guarantees over data that was not stored anywhere else. We used it to store a wide range of values: from sparse data with high read/write ratios, like configuration settings, counters, or quality metrics, to very dynamic information powering core features like spam analysis. We call this persistent Redis.

Recently we made the decision to disable persistence in Redis and stop using it as a source of truth for our data. The main motivations behind this choice were to:

  • Reduce the operational cost of our persistence infrastructure by removing some of its complexity.
  • Take advantage of our expertise operating MySQL.
  • Gain some extra performance, by eliminating the I/O latency during the process of writing big changes on the server state to disk.

Transitioning all that information transparently involved planning and coordination. For each problem domain using persistent Redis, we considered the volume of operations, the structure of the data, and the different access patterns to predict the impact on our current MySQL capacity, and the need for provisioning new hardware.

For the majority of callsites, we replaced persistent Redis with GitHub::KV, a MySQL key/value store of our own built atop InnoDB, with features like key expiration. We were able to use GitHub::KV almost identically as we used Redis: from trending repositories and users for the explore page, to rate limiting to spammy user detection.

Our biggest challenge: Migrating the activity feeds

We have lots of “events” at GitHub. Starring a repository, closing an issue and pushing commits are all events that we display on our activity feeds, like the one found on your GitHub homepage.

We used Redis as a secondary indexer for the MySQL table that stores all our events. Previously, when an event happened, we “dispatched” the event identifier to Redis keys corresponding to each user’s feed that should display the event. That’s a lot of write operations and a lot of Redis keys and no single table would be able to handle that fanout. We weren’t going to be able to simply replace Redis with GitHub::KV everywhere in this code path and call it a day.

Our first step was to gather some metrics and let them tell us what to do. We pulled numbers for the different types of feeds we had and calculated the writes and reads per second for each timeline type (e.g., issue events in a repository, public events performed by a user, etc.). One timeline wasn’t ever read, so we were able to axe it right away and immediately knock one off the list. Of the remaining timelines, two were so write-heavy that we knew we couldn’t port them to MySQL as is. So that’s where we began.

Let’s walk through how we handled one of the two problematic timelines. The “organization timeline” that you can see if you toggle the event feed on your home page to one of the organizations you belong to, accounted for 67% of the more than 350 million total writes per day to Redis for these timelines. Remember when I said we “dispatched” event IDs to Redis for every user that should see them? Long story short - we were pushing event IDs to separate Redis keys for every event and every user within an org. So for an active organization that produces, say, 100 events per day and has 1000 members, that would potentially be 100,000 writes to Redis for only 100 events. Not good, not efficient, and would require far more MySQL capacity than what we are willing to accept.

We changed up how writing to and reading from Redis keys worked for this timeline before even thinking about MySQL. We’d write every event happening to one key for the org, and then on retrieval, we’d reject those events that the requesting user shouldn’t see. Instead of doing the filtering each time the event is fanned out, we’d do it on reads.

This resulted in a dramatic 65% reduction of the write operations in for this feature, getting us closer to the point were we could move the activity feeds to MySQL entirely.

Although the single goal in mind was to stop using Redis as a persistent datastore, we thought that, given this was a legacy piece of code that evolved organically over the years, there would be some room for improving its efficiency as well. Reads were fast because the data was properly indexed and compact. Knowing that, we decided to stop writing separately to certain timelines that we could compose from the events contained in others, and therefore reduce the remaining writes another 30% (~11% overall). We got to a point that we were writing less than 1500 keys per second 98% of the time, with spikes below 2100 keys written per second. This was a volume of operations we thought we could handle with our current MySQL infrastructure without adding any new servers.

While we prepared to migrate the activity feeds to MySQL, we experimented with different schema designs, tried out one-record-per-event normalization and fixed-size feed subsets per record, and we even experimented with MySQL 5.7 JSON data type for modeling the list of event IDs. However we finally went with a schema similar to that of GitHub::KV, just without some of the features we didn’t need, like the record’s last updated at and expiration timestamps.

On top of that schema, and inspired by Redis pipelining, we created a small library for batching and throttling writes of the same event that were dispatched to different feeds.

With all that in place, we began migrating each type of feed we had, starting with the least “risky”. We measured risk of migrating any given type based on its number of write operations, as reads were not really the bottleneck.

After we migrated each feed type, we checked cluster capacity, contention and replication delay. We had feature flags in place that enabled writes to MySQL, while still writing to persistent Redis, so that we wouldn’t disrupt user experience if we had to roll back. Once we were sure writes were performing well, and that all the events in Redis were copied to MySQL, we flipped another feature flag to read from the new data store, again measured capacity, and then proceeded with the next activity feed type.

When we were sure everything was migrated and performing properly we deployed a new pull request removing all callsites to persistent Redis. These are the resulting performance figures as of today:

We can see how at the store level, writes (mset) are below 270wps at peak, with reads (mget) below 460ps. These values are way lower than the number of events being written thanks to the way events are batched before writes.

Replication delay is below 180 milliseconds at peak. The blue line, correlated with the number of write operations, shows how delay is checked before any batch is written to prevent replicas from getting out of sync.

What we learned

At the end of the day we just grew out of Redis as a persistent datastore for some of our use cases. We needed something that would work for both and GitHub Enterprise, so we decided to lean on our operational experience with MySQL. However, clearly MySQL isn’t a one-size-fits-all solution and we had to rely on data and metrics to guide us in our usage of it for our event feeds at GitHub. Our first priority was moving off of persistent Redis, and our data-driven approach enabled us to optimize and improve performance along the way.

Work with us

Thank you to everybody on the Platform and Infrastucture teams who contributed to this project. If you would like to work on problems that help scale GitHub out, we are looking for an engineer to join us. The Platform team is responsible for building a resilient, highly available platform for internal engineers and external integrators to add value to our users.

We would love you to join us. Apply here!

MySQL & Friends Community Dinner 2017

FOSDEM is happening again in Brussels, and as usual, there will there be a MySQL and Friends Devroom. We can’t really imagine having a FOSDEM without having a MySQL and Friends Community Dinner, so here we are again..

Like the last couple of years, we have rented the same private space at ICAB, more detailed directions below.

You can buy tickets for the dinner at

The listed ticket price includes a selection of Belgian Speciality Beers and food will be Belgian food served by a new (and improved) caterer: Koken Met Klaas.

If you have any dietary requirements (vegetarian, vegan, gluten-free,…) please let us know ahead of time when buying the ticket as we need to inform the caterer who wil make the necessary accommodations for you.

The reason we charge money up front is two-fold:
  1. We need to be able to give ourselves a rough estimate of the amount of attendees ahead of time, so we can make sure the caterer prepares enough food for everyone to enjoy. Free signups tend to make that number a bit less reliable. We want to make sure there is enough food to go around for everyone attending!
  2. Stress-free financials on the night of the dinner! We all know the effects of Belgian Beer on arithmetic skills. Flash your ticket, get your food.
  3. Costs went up a lot this year, especially due to the catering we now have (expect very good food!). Therefore we had to raise the ticket prices as well as had to find a couple more sponsors to keep ticket prices reasonably low

We also want to dedicate this event in loving memory of Liz, who is on a road trip and is unlikely to come back to Belgium.

We’re looking forward to meeting you all again at Fosdem and the Community Dinner. See you then! Party-Squad – Dimitri VanoverbekeTom De CoomanKenny Gryp Sponsors Once again, we want to thank our generous sponsors, whose help makes this affordable at such a great price. Community Sponsors:



Other Sponsors:

ICAB Brussels – Business and Technology Incubator

Wondering how to get there from Fosdem?

The venue itself is located very close to the VUB. You can find the route to get there right here.

The total distance from the ULB campus is about 2.3km, so you could walk there, but it might be more comfortable to take the tram.

Tram 7 and 25 depart from the “Cambre Etoile” stop regularly (every few minutes), and will take you up to “Hansen-Soulie”, where you need to get out, and walk the remaining 200m. The tram ride takes about 9 minutes.

MySQL 8.0.1: The Next Development Milestone

This post discusses the next MySQL development milestone: MySQL 8.0.1.

From the outset, MySQL 8.0 has received plenty of attention. Both this blog (see the MySQL 8.0 search) and other sites around the Internet have covered it. Early reviews seem positive (including my own MySQL 8.0 early bugs review). There is plenty of excitement about the new features.

As for early feedback on MySQL 8.0, Peter Zaitsev (Percona CEO) listed a set of recommendations for benchmarking MySQL 8.0. I hope these get reviewed and implemented.

MySQL achieved the current development milestone (available for download on on September 12, 2016. Its release immediately came with a detailed review by Geir Hoydalsvik from MySQL. If you haven’t had the opportunity to do so yet, you can also review the MySQL 8.0 release notes.

It now looks like we’re nearing 8.0.1, the next development milestone. I don’t have insider information, but it’s quite clear when navigating that:

Regarding timing, it’s interesting to note that the “What Is New in MySQL 8.0” page was updated on the 6th of January.

It looks like the release might come soon. So, restrain your excitement for a few days (or weeks?) more. Maybe you’ll be able to checkout the all new MySQL 8.0.1!

PS: If MySQL quality interests you, have a look at this recent – and very interesting – change made to the MTR (MySQL Test Run, the MySQL test suite) program. I believe it improves quality for everyone who runs MySQL (including its forks). The tests (which are run worldwide, often for each code change made) will now test the product with its own defaults.

In 2016 Galera Cluster surged past one million downloads – setting up for a great year in 2017

Helsinki, Finland & London, UK  – January 9th 2017 – Codership, whose Galera Cluster technology brings high availability (HA) and scalability to open source databases worldwide, has seen its technology break the one million download barrier. Codership’s customer roster has grown over 100 percent thanks to the adoption of OpenStack by enterprises and the recent strategic partnership with MariaDB.


Galera has been recognised as the most widely used OpenStack High Availability technology for the second year in a row. According to the OpenStack Survey 2016, one-third of users rely on Galera Clusters running on MariaDB and MySQL.


To cope with an increase of customer demand coming from various industries such as financial services, retail and telecoms, Codership has grown its engineering hires by 50 percent. The new team members will ensure Galera continues to deliver on its mission to protect enterprise applications from unplanned downtime.


Codership looks set for further growth. Industry analysts Gartner, predicts by 2018, more than 70 percent of new in-house applications will be developed on an Open Source Database Management System (OSDBMS), and 50 percent of existing commercial Relational Database Management System (RDBMS) instances will have been converted.


Another addition for this year is Galera joining Mirantis’ unlocked partner programme. Mirantis and Codership are collaborating to provide high availability for OpenStack infrastructure datastore used by companies such as online marketplace MercadoLibre, 8th biggest e-commerce platform in the world with over 100 million consumers. Galera Cluster is now the default for OpenStack high availability in Mirantis OpenStack distributions and reference architecture.


Sakari Keskitalo, Codership’s Chief Operating Officer said: “Our mission is to provide world-class, web-scale technology as widely as possible. OpenStack provides an excellent platform for Galera to accomplish our mission. Galera Cluster’s success and popularity has made it ready for deployment ahead of the new data deluge, created by connected machines, coming into the work environment. Codership being the most popular high availability solution for the databases gives us confidence enterprises take clustering very seriously. The more they do, the more we will drive to stay ‘Top of the Stack’.”

Cédric Bruderer: Reset MySQL 5.7 password on macOS over the command line

This one is for all MySQL-DBA's, which are working on macOS. Since the Apple OS has a rather peculiar way of starting and stopping MySQL, compared to Linux, you can run into some issues. These problems occur especially, if you have no access to the GUI.


Put skip-grant-tables into the mysqld section of the my.cnf. A my.cnf can be found in /usr/local/mysql/support-files. You MUST work as root for all the following steps.

shell> sudo -s shell> vi /usr/local/mysql/support-files/my-default.cnf ... [mysqld] skip-grant-tables skip-networking ...

Save the configuration file! (In vi this is "[ESC] + :x")

Continue with stopping MySQL:

launchctl unload /Library/LaunchDaemons/

Restart MySQL, so skip-grant-tables becomes active:

launchctl load /Library/LaunchDaemons/

Reset the password

After MySQL is started again, you can log into the CLI and reset the password:

shell> mysql -u root mysql> FLUSH PRIVILEGES; mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'super-secret-password';

Plan B

If you are not capable of stopping MySQL in a civilised manner, you can use the more rough way. You can send a SIGTERM to the MySQL-Server:

shell> ps -aef | grep mysql | grep -v grep 74 28017 1 0 Fri10AM ?? 5:59.50 /usr/local/mysql/bin/mysqld --user=_mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --log-error=/usr/local/mysql/data/mysqld.local.err --pid-file=/usr/local/mysql/data/

You should receive one line. The second column from the left is the process id. Use this process id to stop the MySQL-Server.

shell> kill -15 [process id]

In this example, the command would look like this:

shell> kill -15 28017

macOS will restart MySQL, since the process has not stopped correctly. The configuration will be read and the changes to the parameters will become effective. Continue with logging in to the CLI.


No matter how secure your MySQL-Password is, it is a lot more important to secure access to the server it self. If your server is not secured by something that prevents access from the internet, it will only take a few minutes for someone with bad intentions to take over your database or worse, the entire server.

Taxonomy upgrade extras: mysqlserver

What is the default sharding key in MySQL Cluster?

MySQL Cluster does an automatic sharding/partitioning to the tables across data nodes, enabling databases to scale horizontally to serve read and write-intensive workloads, but what is the default sharding key used in partitioning the data?
According to the recent update (Oct, 2016) of the MySQL Cluster white paper, primary key is the default sharding key:

By default, sharding is based on hashing of the primary key, which generally leads to a more even distribution of data and queries across the cluster than alternative approaches such as range partitioning.

However, that is not the case in all MySQL Cluster versions so far!
In this post, I’ll do some test cases on MySQL Cluster (of 4 datanodes) to confirm the default sharding key.

Testing on MySQL Cluster 7.2.26

Creating a simple table of one column (as primary key), insert a single value in that table and check which partition(s) will be used to retrieve that value:
mysql> CREATE TABLE shard_check_pk (id int(11) primary key) engine=ndbcluster;
Query OK, 0 rows affected (0.38 sec)
mysql> INSERT INTO shard_check_pk values (1);
Query OK, 1 row affected (0.01 sec)
mysql> EXPLAIN PARTITIONS SELECT * FROM shard_check_pk WHERE id=1;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | shard_check | p3 | eq_ref | PRIMARY | PRIMARY | 4 | const | 1 | |
1 row in set (0.00 sec)

If the PK is the sharding key, then a PK value lookup will be checked in only one partition and – according to the previous explain output – that was the case already in this example, a single id value (id=1) does only require scanning one partition, ”p3”.

What is the case then if we didn’t specify a PK for a table in MySQL Cluster?

mysql> CREATE TABLE shard_check_no_pk (id int(11)) engine=ndbcluster;
Query OK, 0 rows affected (0.19 sec)
mysql> INSERT INTO shard_check_no_pk values (1);
Query OK, 1 row affected (0.00 sec)
mysql> EXPLAIN PARTITIONS SELECT * FROM shard_check_no_pk WHERE id=1;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | shard_check_no_pk | p0,p1,p2,p3 | ALL | NULL | NULL | NULL | NULL | 2 | Using where with pushed condition |
1 row in set (0.00 sec)

As we can see in the previous explain plan, all partitions in the table (p0, p1, p2 & p3) will be scanned to retrieve a single id value (id=1). The reason for that is because MySQL Cluster creates a hidden column to be the sharding key on tables without PKs.

Up to here, the results are as expected but in the latter MySQL Cluster versions (7.3, 7.4 and 7.5), the explain plan has different output!

Testing on MySQL Cluster 7.5.4 (same results on 7.3 and 7.4):

mysql> CREATE TABLE shard_check_pk (id int(11) primary key) engine=ndbcluster;
Query OK, 0 rows affected (0.38 sec)
mysql> INSERT INTO shard_check_pk values (1);
Query OK, 1 row affected (0.01 sec)
mysql> EXPLAIN SELECT * FROM shard_check_pk WHERE id=1;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | SIMPLE | shard_check_pk | p0,p1,p2,p3 | eq_ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
1 row in set, 1 warning (0.00 sec)

Starting from MySQL Cluster 7.3, the explain plan shows as the hidden column is always used as the default sharding key even on a table that has a PK (check my bug report about this case Bug #84374) which is not true according to the ndb_desc tool! Thanks Maurits for the correction.
If we specified the sharding key explicitly, the output will be corrected:

mysql> ALTER TABLE shard_check_pk PARTITION BY KEY (`id`);
Query OK, 1 row affected (1.10 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM shard_check_pk WHERE id=1;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | SIMPLE | shard_check_pk | p3 | eq_ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
1 row in set, 1 warning (0.00 sec)

  1. Primary key is the default sharding key in MySQL Cluster 7.2 and a hidden column will be used if no PKs defined for a table.
  2. Until the bug got fixed and starting from MySQL Cluster 7.3, the hidden column is always the default sharding key even on a table that has a PK The PK is also the default sharding key but the explain plan shows as a hidden column is always used instead even on a table that has a PK!

Proxy Wars: comparing ProxySQL to others

MySQL load balancers and proxies are becoming a very hot topic in the last few years. In the near future there will two presentations to compare proxies:

Due the increased popularity of the subject, I decided to go ahead and do some comparison between ProxySQL and the other proxies.
Here is compare chart. Let me know if I missed anything, or if I should make any corrections.

Please save the dates and make sure to attend these sessions!

Happy Proxying!

Easy-to-use Perl scripts to backup your MySQL database with mysqldump and FTP the files to a remote server

Most users of MySQL utilize the mysqldump utility to backup their database. While mysqldump is handy and easy-to-use (and free), if you have data which is important to your business, then you should take a look at the MySQL Enterprise Edition – and use MySQL Enterprise Backup instead. The MySQL Enterprise Backup allows you to backup your database without the table locking you get with mysqldump. And, it is extremely fast – especially when you have to restore your database. Here is a sample speed comparison between MySQL Enterprise Backup and using mysqldump:

49x Better Performance: Backup

80x Better Performance: Backup

From the image, you can see it takes a long time to either dump or restore a fairly large (73 gigabyte) database compared to mysqldump. Even if your database isn’t this large (and most people don’t care how long their backups take to complete), when it comes time to restore your database in a production environment, the quicker you can restore your database, the better.

If your only option is to use mysqldump, here are two Perl scripts to make the task easier. The first script will backup your database(s) and send a copy of your backup to a remote server via FTP. The second script will connect to your FTP server and delete your old backup files – in case you have a storage limit on your FTP server. You can put these scripts in cron or Windows Task Scheduler, or run them manually. You can have the backup script run as often as possible (maybe once an hour) – but keep in mind there will be table-level locking. The script to delete the old backups only needs to be run once a day.


For the backup script, you will have to enter a few variables to match your system. You will also need to create a configuration file of all of the databases you want to backup. I could have connected to the MySQL database and ran a query (SHOW DATABASES;) to retrieve all of the databases, but I prefer to manually manage the list of databases to backup. With this method, you can skip an hourly backup of static or read-only databases and only backup the databases which are being changed. This configuration file is a text file with a list of the databases to be backed up, and you can use a # (pound sign) to comment out databases you want to skip.

NOTE:You don’t want to backup the following databases: PERFORMANCE_SCHEMA, INFORMATION_SCHEMA or SYS SCHEMA.

# set the directory where you will keep the backup files $backup_folder = '/Users/tonydarnell/cron/mysqlbackups'; # the config file is a text file with a list of the databases to backup # this should be in the same location as this script, but you can modify this # if you want to put the file somewhere else my $config_file = dirname($0) . "/mysql_backup.config"; # Here is where you will put your FTP server name (or IP address) # and your FTP username and password my $host = ""; my $user = "username"; my $password = "password";

You can also modify your mysqldump command which will be use to backup your databases, or use what I have in the script. You will need to add your mysql password where I have the word “PassWord“.

`/usr/local/mysql/bin/mysqldump -R -h192.168.1.2 --events --triggers -u mysqlbackup --password=PassWord --routines --add-drop-database --set-gtid-purged=OFF --add-drop-table $database $table | compress > $folder/$file.Z`;

I created a separate FTP user which has its own home directory, so I don’t have to change the directory once I login via FTP. If you can’t do this, you will need to uncomment the last three lines of this and navigate your FTP user to the correct directory:

# uncomment the last three lines if you can't set the home directory of the FTP user to a specific directory # the directory on the FTP server where you want to save the backup files # my $dir = "mysqlbackups"; # print "Accessing FTP - changing to $dir folder\n"; # $f->cwd($dir) or die "Can't cwd to $dir\n";

That is all you need to modify in the first script for it to work. If you run this script on the command line, or if you run it in cron, you should see something like this:

mysqladmin: [Warning] Using a password on the command line interface can be insecure.

It isn’t a good idea to have your password in plain text anywhere, so you should create a user which only has the limited read-only permissions needed to run mysqldump. You will need to change the value of “database_name” in the GRANT statement to match each database you want to backup. You will need to run the GRANT statement for every database you want to backup, or you can use an asterisk “*” in place of the database name.

CREATE USER 'mysqlbackup'@'' IDENTIFIED WITH sha256_password BY ''; GRANT SELECT, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER ON `database_name`.* TO 'mysqlbackup'@'';

Or, to grant permissions to the mysqlbackup user on all of the tables:


Here is a sample output from the script:

# perl Beginning 2017-01-06-16:35:57 Backing up database_01 - 2017-01-06-1635 ... mysqldump: [Warning] Using a password on the command line interface can be insecure. Finished exporting database_01 - as 2017-01-06-1635/database_01.sql.Z Backing up database_02 - 2017-01-06-1635 ... mysqldump: [Warning] Using a password on the command line interface can be insecure. Finished exporting database_02 - as 2017-01-06-1635/database_02.sql.Z Backing up database_03 - 2017-01-06-1635 ... mysqldump: [Warning] Using a password on the command line interface can be insecure. Finished exporting database_03 - as 2017-01-06-1635/database_03.sql.Z .... Backing up database_04 - 2017-01-06-1635 ... mysqldump: [Warning] Using a password on the command line interface can be insecure. Finished exporting database_04 - as 2017-01-06-1635/database_04.sql.Z ------------------------------------------------------------------ Compressing backup as: 2017-01-06-1635.tar.gz ------------------------------------------------------------------ Creating a tar file from the backup... tar -czf 2017-01-06-1635.tar.gz 2017-01-06-1635 FTP'ing the file - 2017-01-06-1635.tar.gz Deleting the original backup directory and files. Finished 2017-01-06-16:36:49

The backup script creates a new directory with a timestamp as the name of the directory. In this example, the directory was named “2017-01-06-1635“. After the backup is finished, it creates a single tar file and then deletes the backup directory and files. I delete the backup directory and files as the backup is now contained in the single tar file. It is easier to copy and delete a single tar file via FTP than to do the same with a directory containing multiple files.


The script to delete the old backups is fairly simple to configure. You will need to enter the host, username and password of your FTP user, and specify how many backups you want to keep on the server. Since I do a backup once an hour, I keep a week’s worth (168 copies) of backups on my server. I could have checked the date/time on the files and deleted the older files which were X number of days old, but I decided to just go with a certain number of files to keep. I also included a “print_output” variable if you want to suppress any output – simply change this value to anything but “yes” and the script won’t print any output.

# Here is where you will put your FTP server name (or IP address) # and your username and password my $host = ""; my $user = "username"; my $password = "password"; # how many copies of the backup do you want to keep? $total_files_to_keep = 168; $print_output = "yes";

Uncomment this line if you want to see a list of the other files which will not be deleted.

# optional output - remove # to have it print remaining files # if ($print_output eq "yes") { print "| $count of $total_files_available | Keeping: $filename\n"; }

Here is a sample output from running the script:

root# perl -------------------------------------------- Total files: 194 Total backup files: 192 Total files to keep: 168 Total files to delete: 24 ---------------------------------------------- | x of 192 | Skipping: . | x of 192 | Skipping: .. | 1 of 192 | Deleting: 2017-12-29-1352.tar.gz | 2 of 192 | Deleting: 2017-12-29-1452.tar.gz | 3 of 192 | Deleting: 2017-12-29-1552.tar.gz | 4 of 192 | Deleting: 2017-12-29-1652.tar.gz .... | 24 of 192 | Deleting: 2017-12-30-1252.tar.gz ---------------------------------------------- Finished 2017-01-06-15:21:58

When I run this script on my FTP server, and I do a listing of the backup directory, it shows the single-period (.) or current directory and the double-period (..) or the parent directory. I take this into account by skipping all files which do not have “20” in the name (as in the first two letters of the year).

If you login to your FTP server and you don’t see the “.” and “..“, then you will need to remove the “-2” in this part of the script, and simply have $total_files_available = $total_files:

# subtract two because of the . and .. $total_files_available = $total_files - 2;

Here is what I see when I FTP to my server: (notice the “.” and “..“)

root# ftp Trying 1234:f1c0:4738:5088:cb9a:dksi:ebfa:3829... Connected to 220 FTP Server ready. 331 Password required for backups Password: 230 User backups logged in Remote system type is UNIX. Using binary mode to transfer files. ftp> dir 229 Entering Extended Passive Mode (|||58906|) 150 Opening ASCII mode data connection for file list drwx---r-x 2 u63541528-backups ftpusers 4096 Jan 6 16:52 . drwx---r-x 2 u63541528-backups ftpusers 4096 Jan 6 16:52 .. -rw----r-- 1 u63541528-backups ftpusers 45522630 Jan 5 22:52 2017-01-05-2252.tar.gz -rw----r-- 1 u63541528-backups ftpusers 45539118 Jan 5 23:52 2017-01-05-2352.tar.gz -rw----r-- 1 u63541528-backups ftpusers 45558328 Jan 6 01:52 2017-01-06-0152.tar.gz -rw----r-- 1 u63541528-backups ftpusers 45560794 Jan 6 07:52 2017-01-06-0752.tar.gz ....

The scripts are available on GitHub – The scripts are named: and

I am not the best Perl programmer, so there may be a better/easier way to do this. Good luck with the scripts and let me know how they work for you. And follow me on Twitter at ScriptingMySQL.


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] and on LinkedIn. Tony is the author of Twenty Forty-Four: The League of Patriots


Visit for more information.

Log Buffer #501: A Carnival of the Vanities for DBAs

This Log Buffer Edition covers Oracle, SQL Server and MySQL blog posts.


Will the FPDS-NG v 1.5 Upgrade In April 1st 2017 impact Contract Lifecycle Management (CLM)?

Real-Time Materialized Views in #Oracle 12c

Simplifying your data validation code with Database 12.2

Oracle VSIZE Function with Examples

Fixing PRVF-0002 : Could not retrieve local nodename

SQL Server:

Dynamic Data Masking Feature in SQL Server 2016

Removing Email from msdb Database

Identify Servers and the Databases Mounted in SharePoint

SQL Server v.Next : STRING_AGG() performance

Bridging the DevOps Divide


Duplicate Indexes in MySQL

Powerful MariaDB exports using MyMDBDump

Tips and Tricks – How to shard MySQL with ProxySQL in ClusterControl

Faster MySQL replication with group commit and delay

Caching and crashing in lm2

Archiving MySQL and MongoDB Data

This post discusses archiving MySQL and MongoDB data, and determining what, when and how to archive data.

Many people store infrequently used data. This data is taking up storage space and might make your database slower than it could be. Archiving data can be a huge benefit, both regarding the performance impact and storage savings.

Why archive?

One of the reasons for archiving data is freeing up space on your database volumes. You can store archived data on slower, less expensive storage devices, and current data on the faster database drives. Archiving old data makes backups and restores run faster since they need to process less data. Last, but by no means least, archiving data has the benefit of making your queries perform more efficiently since they do not need to process through old data.

What do you archive?

That is the big question. Archiving too much is just as detrimental as not archiving enough (or at all). As you’ll see, finding this balance requires foresight and planning. Fortunately, you can tweak your archiving scheme to make it work better as time goes by,

Some people feel that keeping all the data in their database, even if they don’t access that data frequently, is the best way to go. If you are lucky enough to have vast quantities of storage, and a database that is performing well, keeping all of the data in your database might be a good idea. Even with lots of storage, archiving off some data that you don’t use regularly might have advantages. We all know someone whose desk is piled with stacks of paper. When they need something, they tell us that they know where everything is. Even if they can find the requested item, they need to work through the piles of stuff to locate it. They also have to decide where to put new items so that they can be easily found. In your database, this equates to slower queries and potentially slower writes. Clearing out some of the less frequently accessed data will have a beneficial effect overall.

At the other end of the spectrum are the people who want to archive in a manner that is too aggressive. This means that any requests for data must access the archive location This might be slower and more burdensome, causing the queries to run slowly. In addition, new data written into the database will have to go through an archive process fairly quickly, which might slow down the database. This is the person who puts each and every item they own into storage. It makes for a clean home, but it’s tough to find many of the items that you own. In our database, this means that most queries are run against archived data, and archive processes are frequently running. This too can slow down performance overall.

The best archiving solution is one that meets both the needs of efficient use of storage and efficiency of queries and inserts. You want to be able to write new data quickly, access frequently used data promptly, and still be able to get the information that might not often be used. There is no simple answer here: each company will have different needs and requirements. For some companies, regulations might govern how long data must be stored. With these sorts of requirements in place, you should look to place data that isn’t accessed often on a storage medium that is lower in cost (and often slower in performance). It is still there, but it is not crowding out the more commonly used data. Other companies might query or manipulate data shortly after it is loaded into the database, and they might be able to archive more often.

When do you archive?

This is another big consideration. Do you archive data daily, weekly, monthly, annually or on some other schedule? The basic answer is that it doesn’t matter what the schedule is. It matters that there is some sort of schedule, and that archiving is happening as expected. Keeping to a schedule allows everyone to know that the data is being archived as expected, and will avoid any “gee, we completely forgot about doing that” issues from arising.

Frequent archiving (daily or weekly) is good when you have high data volumes and normally need to access only the latest data in your queries. Think of stock data. Queries to pull trade volumes and pricing over a short time period are more frequent than queries that would analyze a stock’s performance over time. Therefore, archiving old data can be helpful since it keeps the frequently accessed table’s data easily accessible, but still accommodates the need to get at data for longer time spans. With high data volume, you might need to archive often so that one archive process can complete before another is started.

Less frequent archiving might be used when you have longer term projects or if you find your current database is performing reasonably well. In these cases, archiving monthly, quarterly, or annually might make sense. This is like cleaning out your garage or attic. You might do it, but you probably don’t do it every week. The amount of stuff being stored, along with the space to store it in, might determine how often you do this type of cleanup.

How do you go about archiving MySQL and MongoDB data?

There are lots of possibilities here as well. If well planned, it can be an easy implementation. But like many things, figuring this out is usually done once things have gotten a little out of control.

You can archive data using a standard backup, moving it to another table in the database, exporting to a flat file, or moving it to another database altogether. The end goal is a cleaner production environment that still allows access to the archived data if it is needed. The method for performing the archive determines the method used to bring that data back to a state in which it can be queried. One of the considerations must be how much time you are willing and able to invest in making that data available again.

  1. You can use your standard backup method to create and manage your archive, but this is a solution that is cumbersome and prone to error. You can perform a backup and then delete the unwanted data from your table(s). Now, the deleted data is only stored in your backup and must be restored in order to be queried. You should restore to another database for this purpose so that you keep your production environment clean. With this option, you also have to consider the methods for recovering space used by deleted files. This opens to the possibility of someone restoring to the original database, which can cause a host of problems. With MongoDB, there is an optional –archive option that moves the data to an archive location that you specify. MongoDB version 3.2 added this option.
  2. Another possibility is to move the data to another MySQL table or MongoDB collection in the existing database (i.e., moving from the transactions table to transactions_archived). This is a fast and efficient way to backup the data, and it allows for easy querying since the data still resides in the database. Of course, this assumes that you have enough storage space to accommodate the active and the archive tables/collections.
  3. You can also export the data to be archived to a flat file and then delete it from the original table or collection. This is workable if the data needs to be kept available but is unlikely to be regularly needed. (It will need to be imported in order to query it.) This method also comes with all the caveats about needing to delete and recover the space of the archived records, issues with importing into the original database (and ruining all the good archiving work you’ve done, and the possibility of deleting the flat file.
  4. Alternatively, you can move the data to another database. This too can be an effective method for archiving data, and can also allow that data to be made available to others for query analysis. Once again, all warnings about recovering the space apply, but the good thing here is that the data does not need to be restored to be queried. It is simply queried through the other database.

Another option for archiving MySQL data is a tool like pt-archiver. pt-archiver is a component of the Percona Toolkit that nibbles old data from a source table and moves it to a target table. The target can be in the current or an archive database. It is designed for use with an up-and-running database. It has minimal impact on the overall performance of the database. It is part of the Percona Toolkit, so it is available as an open source download. It has the benefit of slowly working to move the data and is always running. This allows it to archive data regularly and cleanly. One warning is that it does delete the data from the source table, so you should test it before running it in production. pt-archiver works with MySQL data only. It is also important to note that removing large quantities of data might cause InnoDB fragmentation. Running OPTIMIZE TABLE to recover the space resolves this. As of version 5.7.4, this is no longer a locking action.

So now what?

Unless you are in the enviable position where archiving MySQL and MongoDB data isn’t an issue, the first step is to come up with an archiving scheme. This will likely involve many different people since there can be an impact across the entire organization. Determine what can and should be archived, and then determine how best to archive the data. Document the process and test it before putting it into production. In the end, your database and your users will thank you.

Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads

This blog compares how PostgreSQL and MySQL handle millions of queries per second.

Anastasia: Can open source databases cope with millions of queries per second? Many open source advocates would answer “yes.” However, assertions aren’t enough for well-grounded proof. That’s why in this blog post, we share the benchmark testing results from Alexander Korotkov (CEO of Development, Postgres Professional) and Sveta Smirnova (Principal Technical Services Engineer, Percona). The comparative research of PostgreSQL 9.6 and MySQL 5.7 performance will be especially valuable for environments with multiple databases.

The idea behind this research is to provide an honest comparison for the two popular RDBMSs. Sveta and Alexander wanted to test the most recent versions of both MySQL and PostgreSQL with the same tool, under the same challenging workloads and using the same configuration parameters (where possible). However, because both PostgreSQL and MySQL ecosystems evolved independently, with standard testing tools (pgbench and SysBench) used for each database, it wasn’t an easy journey.

The task fell to database experts with years of hands-on experience. Sveta has worked as a Senior Principal Technical Support Engineer in the Bugs Verification Group of the MySQL Support Group at Oracle for more than eight years, and since 2015  has worked as a Principal Technical Services Engineer at Percona. Alexander Korotkov is a PostgreSQL major contributor, and the developer of a number PostgreSQL features – including the CREATE ACCESS METHOD command, generic WAL interface, lockfree Pin/UnpinBuffer, index-based search for regular expressions and much more. So we have a pretty decent cast for this particular play!

SvetaDimitri Kravtchuk regularly publishes detailed benchmarks for MySQL, so my main task wasn’t confirming that MySQL can do millions of queries per second. As our graphs will show, we’ve passed that mark already. As a Support Engineer, I often work with customers who have heterogeneous database environments in their shops, and want to know about the impact of migrating jobs from one database to another. So instead, I found the chance to work with the Postgres Professional company and identify both the strong and weak points of the two databases an excellent opportunity.

We wanted to test both databases on the same hardware, using the same tools and tests. We expected to test base functionality, and then work on more detailed comparisons. That way we could compare different real-world use case scenarios and popular options.

Spoiler: We are far from the final results. This is the start of a blog series.

OpenSource Databases on Big Machines, Series 1: “That Was Close…”

PostgreSQL Professional together with Freematiq provided two modern, powerful machines for tests.

Hardware configuration:

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

I also used a smaller Percona machine.

Hardware configuration:

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 251.9G
Disk speed: about 33K IOPS
OS: Ubuntu 14.04.5 LTS
File system: EXT4

Note that machines with smaller numbers of CPU cores and faster disks are more common for MySQL installations than machines with larger numbers of cores.

The first thing we needed to agree on is which tool to use. A fair comparison only makes sense if the workloads are as close as possible.

The standard PostgreSQL tool for performance tests is pgbench, while for MySQL it’s SysBench. SysBench supports multiple database drivers and scriptable tests in the Lua programming language, so we decided to use this tool for both databases.

The initial plan was to convert pgbench tests into SysBench Lua syntax, and then run standard tests on both databases. After initial results, we modified our tests to better examine specific MySQL and PostgreSQL features.

I converted pgbench tests into SysBench syntax, and put the tests into an open-database-bench GitHub repository.

And then we both faced difficulties.

As I wrote already, I also ran the tests on a Percona machine. For this converted test, the results were almost identical:

Percona machine:

OLTP test statistics: transactions: 1000000 (28727.81 per sec.) read/write requests: 5000000 (143639.05 per sec.) other operations: 2000000 (57455.62 per sec.)

Freematiq machine:

OLTP test statistics: transactions: 1000000 (29784.74 per sec.) read/write requests: 5000000 (148923.71 per sec.) other operations: 2000000 (59569.49 per sec.)

I started investigating. The only place where the Percona machine was better than Freematiq’s was disk speed. So I started running the pgbench read-only test, which was identical to SysBench’s point select test with full dataset in memory. But this time SysBench used 50% of the available CPU resources:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4585 smirnova 20 0 0,157t 0,041t 9596 S 7226 1,4 12:27.16 mysqld 8745 smirnova 20 0 1266212 629148 1824 S 7126 0,0 9:22.78 sysbench

Alexander, in turn, had issues with SysBench, which could not create a high load on PostgreSQL when prepared statements were used:

93087 korotkov 20 0 9289440 3,718g 2964 S 242,6 0,1 0:32.82 sysbench 93161 korotkov 20 0 32,904g 81612 80208 S 4,0 0,0 0:00.47 postgres 93116 korotkov 20 0 32,904g 80828 79424 S 3,6 0,0 0:00.46 postgres 93118 korotkov 20 0 32,904g 80424 79020 S 3,6 0,0 0:00.47 postgres 93121 korotkov 20 0 32,904g 80720 79312 S 3,6 0,0 0:00.47 postgres 93128 korotkov 20 0 32,904g 77936 76536 S 3,6 0,0 0:00.46 postgres 93130 korotkov 20 0 32,904g 81604 80204 S 3,6 0,0 0:00.47 postgres 93146 korotkov 20 0 32,904g 81112 79704 S 3,6 0,0 0:00.46 postgres

We contacted SysBench author Alexey Kopytov, and he fixed MySQL issue. The solution is:

  • Use SysBench with the options --percentile=0 --max-requests=0  (reasonable CPU usage)
  • Use the concurrency_kit branch (better concurrency and Lua processing)
  • Rewrite Lua scripts to support prepared statements (pull request:
  • Start both SysBench and mysqld with the jemalloc or tmalloc library pre-loaded

A fix for PostgreSQL is on the way. For now, Alexander converted a standard SysBench test into pgbench format and we stuck with it. Not much new for MySQL, but at least we had a baseline for comparison.

The next difficulty I faced was the default operating system parameters. To make the long story short, I changed them to the recommended ones (described below):

vm.swappiness=1 cpupower frequency-set --governor performance kernel.sched_autogroup_enabled=0 kernel.sched_migration_cost_ns= 5000000 vm.dirty_background_bytes=67108864 vm.dirty_bytes=536870912 IO scheduler [deadline]

The same parameters were better for PostgreSQL performance as well. Alexander set his machine similarly.

After solving these issues we learned and implemented the following:

  • We cannot use a single tool (for now)
  • Alexander wrote a test for pgbench, imitating the standard SysBench tests
  • We are still not able to write custom tests because we use different tools

But we could use these tests as a baseline. After work done by Alexander, we stuck with the standard SysBench tests. I converted them to use prepared statements, and Alexander converted them into pgbench format.

I should mention that I was not able to get the same results as Dimitri for the Read Only and Point Select tests. They are close, but slightly slower. We need to investigate if this is the result of different hardware, or my lack of performance testing abilities. The results from the Read-Write tests are similar.

Another difference was between the PostgreSQL and MySQL tests. MySQL users normally have many connections. Setting the value of the variable

max_conenctions, and limiting the total number of parallel connections to thousands is not rare nowadays. While not recommended, people use this option even without the thread pool plugin. In real life, most of these connections are sleeping. But there is always a chance they all will get used in cases of increased website activity.

For MySQL I tested up to 1024 connections. I used powers of two and multiplies of the number of cores: 1, 2, 4, 8, 16, 32, 36, 64, 72, 128, 144, 256, 512 and 1024 threads.

For Alexander, it was more important to test in smaller steps. He started from one thread and increased by 10 threads, until 250 parallel threads were reached. So you will see a more detailed graph for PostgreSQL, but no results after 250 threads.

Here are our comparison results.


  • pgsql-9.6 is standard PostgreSQL
  • pgsql-9.6 + pgxact-align is PostgreSQL with this patch (more details can be found in this blog post)
  • MySQL-5.7 Dimitri is Oracle’s MySQL Server
  • MySQL-5.7 Sveta is Percona Server 5.7.15



Sync commit in PostgreSQL is a feature, similar to

innodb_flush_log_at_trx_commit=1 in InnoDB, and async commit is similar to innodb_flush_log_at_trx_commit=2.

You see that the results are very similar: both databases are developing very fast and work with modern hardware well.

MySQL results which show 1024 threads for reference.


OLTP RW with innodb_flush_log_at_trx_commit set to 1 and 2

After receiving these results, we did a few feature-specific tests that will be covered in separate blog posts.

More Information

MySQL Options for OLTP RO and Point SELECT tests:

# general table_open_cache = 8000 table_open_cache_instances=16 back_log=1500 query_cache_type=0 max_connections=4000 # files innodb_file_per_table innodb_log_file_size=1024M innodb_log_files_in_group=3 innodb_open_files=4000 # Monitoring innodb_monitor_enable = '%' performance_schema=OFF #cpu-bound, matters for performance #Percona Server specific userstat=0 thread-statistics=0 # buffers innodb_buffer_pool_size=128000M innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex innodb_log_buffer_size=64M # InnoDB-specific innodb_checksums=1 #Default is CRC32 in 5.7, very fast innodb_use_native_aio=1 innodb_doublewrite= 1 # innodb_stats_persistent = 1 innodb_support_xa=0 #(We are read-only, but this option is deprecated) innodb_spin_wait_delay=6 #(Processor and OS-dependent) innodb_thread_concurrency=0 join_buffer_size=32K innodb_flush_log_at_trx_commit=2 sort_buffer_size=32K innodb_flush_method=O_DIRECT_NO_FSYNC innodb_max_dirty_pages_pct=90 innodb_max_dirty_pages_pct_lwm=10 innodb_lru_scan_depth=4000 innodb_page_cleaners=4 # perf special innodb_adaptive_flushing = 1 innodb_flush_neighbors = 0 innodb_read_io_threads = 4 innodb_write_io_threads = 4 innodb_io_capacity=2000 innodb_io_capacity_max=4000 innodb_purge_threads=4 innodb_max_purge_lag_delay=30000000 innodb_max_purge_lag=0 innodb_adaptive_hash_index=0 (depends on workload, always check)

MySQL Options for OLTP RW:

#Open files table_open_cache = 8000 table_open_cache_instances = 16 query_cache_type = 0 join_buffer_size=32k sort_buffer_size=32k max_connections=16000 back_log=5000 innodb_open_files=4000 #Monitoring performance-schema=0 #Percona Server specific userstat=0 thread-statistics=0 #InnoDB General innodb_buffer_pool_load_at_startup=1 innodb_buffer_pool_dump_at_shutdown=1 innodb_numa_interleave=1 innodb_file_per_table=1 innodb_file_format=barracuda innodb_flush_method=O_DIRECT_NO_FSYNC innodb_doublewrite=1 innodb_support_xa=1 innodb_checksums=1 #Concurrency innodb_thread_concurrency=144 innodb_page_cleaners=8 innodb_purge_threads=4 innodb_spin_wait_delay=12 Good value for RO is 6, for RW and RC is 192 innodb_log_file_size=8G innodb_log_files_in_group=16 innodb_buffer_pool_size=128G innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex innodb_io_capacity=18000 innodb_io_capacity_max=36000 innodb_flush_log_at_timeout=0 innodb_flush_log_at_trx_commit=2 innodb_flush_sync=1 innodb_adaptive_flushing=1 innodb_flush_neighbors = 0 innodb_max_dirty_pages_pct=90 innodb_max_dirty_pages_pct_lwm=10 innodb_lru_scan_depth=4000 innodb_adaptive_hash_index=0 innodb_change_buffering=none #can be inserts, workload-specific optimizer_switch="index_condition_pushdown=off" #workload-specific

MySQL SysBench parameters:

LD_PRELOAD=/data/sveta/5.7.14/lib/mysql/  /data/sveta/sbkk/bin/sysbench  [ --test=/data/sveta/sysbench/sysbench/tests/db/oltp_prepared.lua | --test=/data/sveta/sysbench/sysbench/tests/db/oltp_simple_prepared.lua ]  --db-driver=mysql --oltp-tables-count=8 --oltp-table-size=10000000 --mysql-table-engine=innodb --mysql-user=msandbox --mysql-password=msandbox  --mysql-socket=/tmp/mysql_sandbox5715.sock --num-threads=$i --max-requests=0 --max-time=300 --percentile=0 [--oltp-read-only=on --oltp-skip-trx=on]
PostgreSQL pgbench parameters:

$ git clone $ cd pg_oltp_bench $ make USE_PGXS=1 $ sudo make USE_PGXS=1 install $ psql DB -f oltp_init.sql $ psql DB -c "CREATE EXTENSION pg_oltp_bench;" $ pgbench -c 100 -j 100 -M prepared -f oltp_ro.sql -T 300 -P 1 DB $ pgbench -c 100 -j 100 -M prepared -f oltp_rw.sql -T 300 -P 1 DB

Features in MySQL 5.7 that significantly improved performance:

  • InnoDB: transaction list optimization
  • InnoDB: Reduce lock_sys_t::mutex contention
  • InnoDB: fix index->lock contention
  • InnoDB: faster and parallel flushing
    • Multiple page cleaner threads: WL #6642
    • Reduced number of pages which needs to be flushed: WL #7047
    • Improved adaptive flushing: WL #7868
  • MDL (Meta-Data Lock) scalability
    • Remove THR_LOCK::mutex for InnoDB: Wl #6671
    • Partitioned LOCK_grant
    • Number of partitions is constant
    • Thread ID used to assign partition
    • Lock-free MDL lock acquisition for DML

Anastasia: The initial findings of this research were announced at Percona Live Amsterdam 2016. More findings were added to the second version of the same talk given at Moscow HighLoad++ 2016. Hopefully the third iteration of this talk will be available at Percona Live Open Source Database Conference 2017 in Santa Clara. Stay tuned: the Percona Live Committee is working on the program!



















Using MySQL to Output JSON

MySQL has had a JSON data type since version 5.7 was released way back in '15. But did you know you could produce JSON output from non-JSON columns? It is very simple and saves a lot of time over trying to format it in your application.

World DatabaseWe will be using the good old World database that MySQL has used for years in documentation, examples, and in the classroom. Starting with a simple query we will build up to something more complex.

SELECT Name, District, Population FROM City;

This will output the data from the table in a tabular format.

'Kabul', 'Kabol', '1780000'
'Qandahar', 'Qandahar', '237500'
Array or Object?We have two options for composing JSON data: JSON_ARRAY and JSON_OBJECT.

Of the two, you will find JSON_ARRAY the least fussy. It will JSON-ize your data very easily. It takes a list of values or an empty list and returns a JSON array.

We add this function to our example query and it becomes SELECT JSON_ARRAY(Name, District, Population) FROM City;

And the output looks like:

'[\"Kabul\", \"Kabol\", 1780000]'
'[\"Qandahar\", \"Qandahar\", 237500]'

JSON_OBJECT wants key/value pairs and will complain if the key name is NULL or you have an odd number of objects. If we try SELECT JSON_OBJECT(Name, District, Population) FROM City; we will get Error Code: 1582. Incorrect parameter count in the call to native function 'JSON_OBJECT'. This fuctions sees the odd number of arguments as a 'key' and the evens as the 'value' in key/value pairs and therefore we should not have an odd number of arguments. We could stick in a dummy literal string into the select but odds are that we want the three fields specified but need to turn them into key/value pairs. So lets add 'keys' and let the database supply the values.

SELECT JSON_OBJECT('City', Name, 'Dist', District, 'Pop', Population) FROM City;,

And the output looks like:

'{\"Pop\": 1780000, \"City\": \"Kabul\", \"Dist\": \"Kabol\"}'
'{\"Pop\": 237500, \"City\": \"Qandahar\", \"Dist\": \"Qandahar\"}'
ConclusionDevelopers need to work smarter and not harder. And I have been harping on letting the database do the heavy lifting for years. This is an example of letting the database format your information for you rather than feeding it into a function within your application. Sure you can do it but this saves you a step or two and reduces the complexity of your application.

Duplicate Indexes in MySQL

Why do we sometimes want to keep duplicate indexes?

I’ve done dutiful DBA work in the past to identify and remove what are commonly called duplicate indexes. That is, those indexes that look like (a) and (a,b). The thought is that a query will utilize an index as easily on (a) as on (a,b), and removing (a) will save storage cost and write performance. I’ve had the experience, though, of removing (a) and seeing performance tank.

(As an aside, these are really redundant indexes. A duplicate index would be (a,b) and (a,b) by two different names – this can commonly be done by object relational mapping (ORM) or other automated schema creation tools. I’ll call (a) and (a,b) redundant indexes below.)

This test is on Percona Server 5.7.14 with the sys schema installed and performance schema enabled.

Given two tables with the same number of rows and with the same schema except an extra index on the second table:

mysql>show create table mysqlslap.t5\G Table: t5 Create Table: CREATE TABLE `t5` ( `intcol1` int(11) DEFAULT NULL, `intcol2` int(11) DEFAULT NULL, `intcol3` int(11) DEFAULT NULL, `charcol1` varchar(255) DEFAULT NULL, `charcol2` varchar(255) DEFAULT NULL, `charcol3` varchar(255) DEFAULT NULL, KEY `one` (`intcol1`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec)mysql>show create table mysqlslap.t6\G Table: t6 Create Table: CREATE TABLE `t6` ( `intcol1` int(11) DEFAULT NULL, `intcol2` int(11) DEFAULT NULL, `intcol3` int(11) DEFAULT NULL, `charcol1` varchar(255) DEFAULT NULL, `charcol2` varchar(255) DEFAULT NULL, `charcol3` varchar(255) DEFAULT NULL, KEY `one` (`intcol1`), KEY `two` (`intcol1`,`charcol1`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec)

Notice that table t6 has a redundant index on intcol1.

The tables were populated with a statement like the following:

sudo mysqlslap –concurrency=2 –iterations=10000 –query=“insert into mysqlslap.t5(intcol1,intcol2,intcol3,charcol1,charcol2,charcol3,charcol4) values(FLOOR(RAND() * 10000),FLOOR(RAND() * 10000),FLOOR(RAND() * 10000),SUBSTRING(MD5(RAND()) FROM 1 FOR 50),SUBSTRING(MD5(RAND()) FROM 1 FOR 50),SUBSTRING(MD5(RAND()) FROM 1 FOR 50),now());” –delimiter=“;” –verbose

And then for the remaining tables, like this:

mysql>insert into t6 select * from t5;

In both cases, the explain plan is showing the following query will use index one (look at the read_cost value, too):

mysql>explain format=json select intcol1 from mysqlslap.t5 where intcol1>1910858200\G EXPLAIN: { “query_block”: { “select_id”: 1, “cost_info”: { “query_cost”: “7092.73” }, “table”: { “table_name”: “t5”, “access_type”: “range”, “possible_keys”: [ “one” ], “key”: “one”, “used_key_parts”: [ “intcol1” ], “key_length”: “5”, “rows_examined_per_scan”: 17670, “rows_produced_per_join”: 17670, “filtered”: “100.00”, “using_index”: true, “cost_info”: { “read_cost”: “3558.73”, “eval_cost”: “3534.00”, “prefix_cost”: “7092.73”, “data_read_per_join”: “13M” }, “used_columns”: [ “intcol1” ], “attached_condition”: “(`mysqlslap`.`t5`.`intcol1`>1910858200)” } } } mysql>explain format=json select intcol1 from mysqlslap.t6 where intcol1>1910858200\G EXPLAIN: { “query_block”: { “select_id”: 1, “cost_info”: { “query_cost”: “7092.73” }, “table”: { “table_name”: “t6”, “access_type”: “range”, “possible_keys”: [ “one”, “two” ], “key”: “one”, “used_key_parts”: [ “intcol1” ], “key_length”: “5”, “rows_examined_per_scan”: 17670, “rows_produced_per_join”: 17670, “filtered”: “100.00”, “using_index”: true, “cost_info”: { “read_cost”: “3558.73”, “eval_cost”: “3534.00”, “prefix_cost”: “7092.73”, “data_read_per_join”: “13M” }, “used_columns”: [ “intcol1” ], “attached_condition”: “(`mysqlslap`.`t6`.`intcol1`>1910858200)” } } } mysql>explain format=json select intcol1 from mysqlslap.t6 force index(two) where intcol1>1910858200\G EXPLAIN: { “query_block”: { “select_id”: 1, “cost_info”: { “query_cost”: “7508.43” }, “table”: { “table_name”: “t6”, “access_type”: “range”, “possible_keys”: [ “two” ], “key”: “two”, “used_key_parts”: [ “intcol1” ], “key_length”: “5”, “rows_examined_per_scan”: 17368, “rows_produced_per_join”: 17368, “filtered”: “100.00”, “using_index”: true, “cost_info”: { “read_cost”: “4034.84”, “eval_cost”: “3473.60”, “prefix_cost”: “7508.44”, “data_read_per_join”: “12M” }, “used_columns”: [ “intcol1” ], “attached_condition”: “(`mysqlslap`.`t6`.`intcol1`>1910858200)” } } }

But the explain tool only shows what the optimizer predicts it will do; is the optimizer predicting the best execution plan? What really happens?

mysqlslap –concurrency=2 –iterations=1000 –create-schema=mysqlslap –query=“select intcol1 from t5 where intcol1>FLOOR(RAND() * 10000000);select intcol1 from t6 where intcol1>FLOOR(RAND() * 10000000);select intcol1 from t6 force index(two) where intcol1>FLOOR(RAND() * 10000000);” –delimiter=“;” –verbosemysql>select * from sys.schema_index_statistics; ------------------------------------------------------------------------------------------------------------ | table_schema | table_name | index_name | rows_selected | select_latency | rows_inserted | insert_latency | rows_updated | update_latency | rows_deleted | delete_latency | ------------------------------------------------------------------------------------------------------------ | mysqlslap | t6 | two | 200692044 | 5.32 m | 0 | 0 ps | 0 | 0 ps | 0 | 0 ps | | mysqlslap | t6 | one | 200692044 | 4.92 m | 0 | 0 ps | 0 | 0 ps | 0 | 0 ps | | mysqlslap | t5 | one | 200692044 | 4.87 m | 0 | 0 ps | 0 | 0 ps | 0 | 0 ps | | sys | sys_config | PRIMARY | 0 | 0 ps | 0 | 0 ps | 0 | 0 ps | 0 | 0 ps | ------------------------------------------------------------------------------------------------------------

The select_latency is higher for the index on (intcol1,charcol1). That value is coming from the performance schema via the sys schema:

…`sys`.`format_time`(`performance_schema`.`table_io_waits_summary_by_index_usage`.`SUM_TIMER_FETCH`) AS `select_latency`…

The full sys schema view is:

CREATE ALGORITHM=MERGE DEFINER=`mysql.sys`@`localhost` SQL SECURITY INVOKER VIEW `sys`.`schema_index_statistics` AS select `performance_schema`.`table_io_waits_summary_by_index_usage`.`OBJECT_SCHEMA` AS `table_schema`,`performance_schema`.`table_io_waits_summary_by_index_usage`.`OBJECT_NAME` AS `table_name`,`performance_schema`.`table_io_waits_summary_by_index_usage`.`INDEX_NAME` AS `index_name`,`performance_schema`.`table_io_waits_summary_by_index_usage`.`COUNT_FETCH` AS `rows_selected`,`sys`.`format_time`(`performance_schema`.`table_io_waits_summary_by_index_usage`.`SUM_TIMER_FETCH`) AS `select_latency`,`performance_schema`.`table_io_waits_summary_by_index_usage`.`COUNT_INSERT` AS `rows_inserted`,`sys`.`format_time`(`performance_schema`.`table_io_waits_summary_by_index_usage`.`SUM_TIMER_INSERT`) AS `insert_latency`,`performance_schema`.`table_io_waits_summary_by_index_usage`.`COUNT_UPDATE` AS `rows_updated`,`sys`.`format_time`(`performance_schema`.`table_io_waits_summary_by_index_usage`.`SUM_TIMER_UPDATE`) AS `update_latency`,`performance_schema`.`table_io_waits_summary_by_index_usage`.`COUNT_DELETE` AS `rows_deleted`,`sys`.`format_time`(`performance_schema`.`table_io_waits_summary_by_index_usage`.`SUM_TIMER_INSERT`) AS `delete_latency` from `performance_schema`.`table_io_waits_summary_by_index_usage` where (`performance_schema`.`table_io_waits_summary_by_index_usage`.`INDEX_NAME` is not null) order by `performance_schema`.`table_io_waits_summary_by_index_usage`.`SUM_TIMER_WAIT` desc

Just to prove the results aren’t being confounded by the redundant index itself, here are two new tables with the same data:

mysql>show create table mysqlslap.t7\G Table: t7 Create Table: CREATE TABLE `t7` ( `intcol1` int(11) DEFAULT NULL, `intcol2` int(11) DEFAULT NULL, `intcol3` int(11) DEFAULT NULL, `charcol1` varchar(255) DEFAULT NULL, `charcol2` varchar(255) DEFAULT NULL, `charcol3` varchar(255) DEFAULT NULL, KEY `one` (`intcol1`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql>show create table mysqlslap.t8\G Table: t8 Create Table: CREATE TABLE `t8` ( `intcol1` int(11) DEFAULT NULL, `intcol2` int(11) DEFAULT NULL, `intcol3` int(11) DEFAULT NULL, `charcol1` varchar(255) DEFAULT NULL, `charcol2` varchar(255) DEFAULT NULL, `charcol3` varchar(255) DEFAULT NULL, KEY `two` (`intcol1`,`charcol1`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec)

(Restarted to reset the performance schema and sys tables.)

mysqlslap –concurrency=2 –iterations=1000 –create-schema=mysqlslap –query=“select intcol1 from t7 where intcol1>FLOOR(RAND() * 10000000);select intcol1 from t8 where intcol1>FLOOR(RAND() * 10000000);” –delimiter=“;” –verbosemysql>select * from sys.schema_index_statistics; ------------------------------------------------------------------------------------------------------------ | table_schema | table_name | index_name | rows_selected | select_latency | rows_inserted | insert_latency | rows_updated | update_latency | rows_deleted | delete_latency | ------------------------------------------------------------------------------------------------------------ | mysqlslap | t8 | two | 200012000 | 5.16 m | 0 | 0 ps | 0 | 0 ps | 0 | 0 ps | | mysqlslap | t7 | one | 200012000 | 4.67 m | 0 | 0 ps | 0 | 0 ps | 0 | 0 ps | ------------------------------------------------------------------------------------------------------------

Remember, this is the same data and schema except for the indexes.

Conclusion: Sometimes redundant indexes are helpful.

Thank you to Abhinav Gupta for the technical review.

Powerful MariaDB exports using MyMDBDump

You can export data from MariaDB using mysqldump and a bunch of other tools, but if you need really flexible output format, this might not be what you need. Instead, give MyMDBDump a try. This tool will export data just like mysqldump, but the output format is a lot more flexible. I urge you to test it and check out the documentation, but some if the features are, in short:
  • Dynamic column support - Dynamic columns can be exported as binary, JSON or even as SQL, where I think the latter is a pretty unique feature.
  • Oracle export format - This exports data as INSERT statements, just like mysqldump, but in an Oracle friendly way.
  • MS SQL / SQL Server format - This exports data as SQL Server friendly INSERTs.
  • Binary data formats - Supporting plain binary, base64, hex and a number of more formats.
  • Transaction support.
  • JSON export support - Including embedding dynamic columns as JSON.
  • UTF8 support and UTF8 validity checking.
  • Generated ROWNUM column output
  • FLOAT and DOUBLE formatting
  • DATETIME, TIMESTAMP, DATE and TIME custom formating.
And a lot more. Download from sourceforge.

Happy SQL'ing

Tips and Tricks - How to shard MySQL with ProxySQL in ClusterControl

Related resources  Sharding MySQL with MySQL Fabric and ProxySQL  MySQL Load Balancing with ProxySQL - an Overview  How to set up read-write split in Galera Cluster using ProxySQL  How ProxySQL adds Failover and Query Control to your MySQL Replication Setup  Two Database Sharding Tools for MySQL

Having too large a (write) workload on a master is dangerous. If the master collapses and a failover happens to one of its slave nodes, the slave node could collapse under the write pressure as well. To mitigate this problem you can shard horizontally across more nodes.

Sharding increases the complexity of data storage though, and very often, it requires an overhaul of the application. In some cases, it may be impossible to make changes to an application. Luckily there is a simpler solution: functional sharding. With functional sharding you move a schema or table to another master, and thus alleviating the master from the workload of these schemas or tables.

In this Tips & Tricks post, we will explain how you can functionally shard your existing master, and offload some workload to another master using functional sharding. We will use ClusterControl, MySQL replication and ProxySQL to make this happen, and the total time taken should not be longer than 15 minutes in total. Mission impossible? :-)

The example database

In our example we have a serious issue with the workload on our simple order database, accessed by the so_user. The majority of the writes are happening on two tables: orders and order_status_log. Every change to an order will write to both the order table and the status log table.

CREATE TABLE `orders` ( `id` int(11) NOT NULL AUTO_INCREMENT, `customer_id` int(11) NOT NULL, `status` varchar(14) DEFAULT 'created', `total_vat` decimal(15,2) DEFAULT '0.00', `total` decimal(15,2) DEFAULT '0.00', `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `order_status_log` ( `orderId` int(11) NOT NULL, `status` varchar(14) DEFAULT 'created', `changeTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `logline` text, PRIMARY KEY (`orderId`, `status`, `changeTime` ) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `customers` ( `id` int(11) NOT NULL AUTO_INCREMENT, `firstname` varchar(15) NOT NULL, `surname` varchar(80) NOT NULL, `address` varchar(255) NOT NULL, `postalcode` varchar(6) NOT NULL, `city` varchar(50) NOT NULL, `state` varchar(50) NOT NULL, `country` varchar(50) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

What we will do is to move the order_status_log table to another master.

As you might have noticed, there is no foreign key defined on the order_status_log table. This simply would not work across functional shards. Joining the order_status_log table with any other table would simply no longer work as it will be physically on a different server than the other tables. And if you write transactional data to multiple tables, the rollback will only work for one of these masters. If you wish to retain these things, you should consider to use homogenous sharding instead where you keep related data grouped together in the same shard.

Installing the Replication setups

First, we will install a replication setup in ClusterControl. The topology in our example is really basic: we deploy one master and one replica:

But you could import your own existing replication topology into ClusterControl as well.

After the setup has been deployed, deploy the second setup:

While waiting for the second setup to be deployed, we will add ProxySQL to the first replication setup:

Adding the second setup to ProxySQL

After ProxySQL has been deployed we can connect with it via command line, and see it’s current configured servers and settings:

MySQL [(none)]> select hostgroup_id, hostname, port, status, comment from mysql_servers; +--------------+-------------+------+--------+-----------------------+ | hostgroup_id | hostname | port | status | comment | +--------------+-------------+------+--------+-----------------------+ | 20 | | 3306 | ONLINE | read server | | 20 | | 3306 | ONLINE | read server | | 10 | | 3306 | ONLINE | read and write server | +--------------+-------------+------+--------+-----------------------+ MySQL [(none)]> select rule_id, active, username, schemaname, match_pattern, destination_hostgroup from mysql_query_rules; +---------+--------+----------+------------+---------------------------------------------------------+-----------------------+ | rule_id | active | username | schemaname | match_pattern | destination_hostgroup | +---------+--------+----------+------------+---------------------------------------------------------+-----------------------+ | 100 | 1 | NULL | NULL | ^SELECT .* FOR UPDATE | 10 | | 200 | 1 | NULL | NULL | ^SELECT .* | 20 | | 300 | 1 | NULL | NULL | .* | 10 | +---------+--------+----------+------------+---------------------------------------------------------+-----------------------+

As you can see, ProxySQL has been configured with the ClusterControl default read/write splitter for our first cluster. Any basic select query will be routed to hostgroup 20 (read pool) while all other queries will be routed to hostgroup 10 (master). What is missing here is the information about the second cluster, so we will add the hosts of the second cluster first:

MySQL [(none)]> INSERT INTO mysql_servers VALUES (30, '', 3306, 'ONLINE', 1, 0, 100, 10, 0, 0, 'Second repl setup read server'), (30, '', 3306, 'ONLINE', 1, 0, 100, 10, 0, 0, 'Second repl setup read server'); Query OK, 2 rows affected (0.00 sec) MySQL [(none)]> INSERT INTO mysql_servers VALUES (40, '', 3306, 'ONLINE', 1, 0, 100, 10, 0, 0, 'Second repl setup read and write server'); Query OK, 1 row affected (0.00 sec)

After this we need to load the servers to ProxySQL runtime tables and store the configuration to disk:

MySQL [(none)]> LOAD MYSQL SERVERS TO RUNTIME; Query OK, 0 rows affected (0.00 sec) MySQL [(none)]> SAVE MYSQL SERVERS TO DISK; Query OK, 0 rows affected (0.01 sec)

As ProxySQL is doing the authentication for the clients as well, we need to add the os_user user to ProxySQL to allow the application to connect through ProxySQL:

MySQL [(none)]> INSERT INTO mysql_users (username, password, active, default_hostgroup, default_schema) VALUES ('so_user', 'so_pass', 1, 10, 'simple_orders'); Query OK, 1 row affected (0.00 sec) MySQL [(none)]> LOAD MYSQL USERS TO RUNTIME; Query OK, 0 rows affected (0.00 sec) MySQL [(none)]> SAVE MYSQL USERS TO DISK; Query OK, 0 rows affected (0.00 sec)

Now we have added the second cluster and user to ProxySQL. Keep in mind that normally in ClusterControl the two clusters are considered two separate entities. ProxySQL will remain part of the first cluster. Even though it is now configured for the second cluster, it will only be displayed under the first cluster,.

Mirroring the data

Keep in mind that mirroring queries in ProxySQL is still a beta feature, and it doesn’t guarantee the mirrored queries will actually be executed. We have found it working fine within the boundaries of this use case. Also there are (better) alternatives to our example here, where you would make use of a restored backup on the new cluster and replicate from the master until you make the switch. We will describe this scenario in a follow up Tips & Tricks blog post.

Now that we have added the second cluster, we need to create the simple_orders database, the order_status_log table and the appropriate users on the master of the second cluster:

mysql> create database simple_orders; Query OK, 1 row affected (0.01 sec) mysql> use simple_orders; Database changed mysql> CREATE TABLE `order_status_log` ( `orderId` int(11) NOT NULL, `status` varchar(14) DEFAULT 'created', `changeTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `logline` text, PRIMARY KEY (`orderId`, `status`, `changeTime` ) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.00 sec) mysql> create user 'so_user'@'' identified by 'so_pass'; Query OK, 0 rows affected (0.00 sec) mysql> grant select, update, delete, insert on simple_orders.* to 'so_user'@''; Query OK, 0 rows affected (0.00 sec)

This enables us to start mirroring the queries executed against the first cluster onto the second cluster. This requires an additional query rule to be defined in ProxySQL:

MySQL [(none)]> INSERT INTO mysql_query_rules (rule_id, active, username, schemaname, match_pattern, destination_hostgroup, mirror_hostgroup, apply) VALUES (50, 1, 'so_user', 'simple_orders', '(^INSERT INTO|^REPLACE INTO|^UPDATE|INTO TABLE) order_status_log', 20, 40, 1); Query OK, 1 row affected (0.00 sec) MySQL [(none)]> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 1 row affected (0.00 sec)

In this rule ProxySQL will match everything that is writing to the orders_status_log table, and send it in addition to the hostgroup 40. (write server of the second cluster)

Now that we have started mirroring the queries, the backfill of the data from the first cluster can take place. You can use the timestamp from the first entry in the new orders_status_log table to determine the time we started to mirror.

Once the data has been backfilled we can reconfigure ProxySQL to perform all actions on the orders_status_log table on the second cluster. This will be a two step approach: add a new rule to move the read queries to the second cluster’s read servers and except the SELECT … FOR UPDATE queries. Then another one to modify our mirroring query to stop mirroring and only write to the second cluster.

MySQL [(none)]> INSERT INTO mysql_query_rules (rule_id, active, username, schemaname, match_pattern, destination_hostgroup, apply) VALUES (70, 1, 'so_user', 'simple_orders', '^SELECT .* FROM order_status_log', 30, 1), (60, 1, 'so_user', 'simple_orders', '^FROM order_status_log .* FOR UPDATE', 40, 1); Query OK, 2 rows affected (0.00 sec) MySQL [(none)]> UPDATE mysql_query_rules SET destination_hostgroup=40, mirror_hostgroup=NULL WHERE rule_id=50; Query OK, 1 row affected (0.00 sec)

And don’t forget to activate and persist the new query rules:

MySQL [(none)]> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 1 row affected (0.00 sec) MySQL [(none)]> SAVE MYSQL QUERY RULES TO DISK; Query OK, 0 rows affected (0.05 sec)

After this final step we should see the workload drop on the first cluster, and increase on the second cluster. Mission possible and accomplished. Happy clustering!

Tags: shardingMySQLMariaDBproxysqlclustercontrol

Faster MySQL replication with group commit and delay

We’ve been having a problem with MySQL replication at work. Replicas periodically tend to fall behind and we couldn’t really figure out how to speed things up. It wasn’t about resources. The replicas have plenty of CPU and I/O available. We’re also using multithreaded replication (a.k.a. MTR) but most of the replication threads were idle.

One thing that we decided to try out was the new LOGICAL_CLOCK parallelization policy introduced in MySQL 5.7.2. Here’s what the MySQL reference manual says about slave-parallel-type:

When using a multi-threaded slave (slave_parallel_workers is greater than 0), this option specifies the policy used to decide which transactions are allowed to execute in parallel on the slave. The possible values are:

  • DATABASE: Transactions that update different databases are applied in parallel. This value is only appropriate if data is partitioned into multiple databases which are being updated independently and concurrently on the master. Only recommended if there are no cross-database constraints, as such constraints may be violated on the slave.

  • LOGICAL_CLOCK: Transactions that are part of the same binary log group commit on a master are applied in parallel on a slave. There are no cross-database constraints, and data does not need to be partitioned into multiple databases.

We’ve been using --slave-parallel-type=DATABASE, but clearly it has not been offering the level of parallelism we want. So we tried LOGICAL_CLOCK.

Initially, this ended being slower than --slave-parallel-type=DATABASE. My guess was we’re not grouping enough transactions per binary log commit for this to be a big improvement. Fortunately, that’s something we can tune using binlog_group_commit_sync_delay.

Here is the documentation about that:

Controls how many microseconds the binary log commit waits before synchronizing the binary log file to disk. By default binlog-group-commit-sync-delay is set to 0, meaning that there is no delay. Setting binlog-group-commit-sync-delay to a microsecond delay enables more transactions to be synchronized together to disk at once, reducing the overall time to commit a group of transactions because the larger groups require fewer time units per group. With the correct tuning, this can increase slave performance without compromising the master’s throughput.

Setting binlog-group-commit-sync-delay to 3000 (3 ms) and --slave-parallel-type=LOGICAL_CLOCK resulted in a huge improvement in replication delay. Too bad I didn’t learn about this sooner!

Caching and crashing in lm2


lm2 is my ordered key-value storage library. You can read my post about it here. There’s a lot to say about this little library, so this will be the first of a few posts about how lm2 works and why I chose to do things a certain way.


lm2 is essentially a linked list on disk. Everyone knows linked lists aren’t very fast. Searches take a ridiculously long time and require a lot of seeking. That’s why lm2 has a record cache, which stores a subset of the key-value records in memory. This cache really speeds up searches, but it’s used for much more. It’s also used for the write path. All writes in lm2 happen in memory before they’re durably recorded to disk.

There is only one level of caching at the moment. If you think about it, the architecture looks like a 2-level skip list.

A cache like this has some interesting behavior when you have large scans over many records.

Scan resistance

Scan resistance is about keeping the cache “good” when faced with large scans. A scan will access lots of elements, but many will not be accessed again. For example, an LRU is a bad choice for scans because it will insert every accessed element, but many won’t be accessed again.

lm2 uses probability to insert elements into the cache. Like a skip list, records are probabilistically inserted into the record cache whenever they’re accessed. A frequently accessed record may not be in the cache, but it’s definitely more likely. Rarely accessed elements will rarely make it into the cache.

This approach is scan resistant because a single, full collection scan won’t destroy the cache. The other benefit is that cached records tend to be at the areas that are read the most, which I think is what you want from a cache like this.

The bad thing about using a probabilistic cache is that it can take a while for it to “warm up.” We’ll get back to this later.

No dirty records

Besides the time during a write, lm2 does not hold dirty records. This means that the cache has records as they appear on disk. This makes it really easy to evict elements because there isn’t any flushing to do.


The fact that lm2 is append-only and does not overwrite records only applies to some data, like the actual keys and values. There’s a bunch of metadata (pointers, versions, tombstone versions, etc.) that is updated in-place. As I mentioned earlier, all of these updates first happen in memory.

Some systems (like InnoDB) use rollback information to undo changes that happen in place. This doesn’t exist in lm2. Once something changes, there’s no going back. But what if something bad happens halfway (or some other arbitrary point) into a write? This is undefined. So what do you do? Crash!

Crashing isn’t a big deal in lm2. Writes are guaranteed to be fully durable when acknowledged, so partially written data is cleanly discarded. The in-memory state is always thrown out. This includes the cache (which takes a while to build!).

Early on during testing, I realized that recovery after losing the cache was horrible. This is where the poor performance of a linked list really shows. To counteract this effect, lm2 now periodically saves the cache state in the background. Every few seconds, it writes the offsets of the records in memory to a separate file. After a crash, it reads these records back into memory and is able to perform just as well as it did before the crash.

Further reading

For more on scan resistance, check out this page titled “Making the Buffer Pool Scan Resistant” in the MySQL reference manual.

Also see “Saving and Restoring the Buffer Pool State”. This is where I got the record cache saving idea :).

Enabling and Disabling Jemalloc on Percona Server

This post discusses enabling and disabling jemalloc on Percona Server for MySQL.

The benefits of jemalloc versus glibc for use with MySQL have been widely discussed. With jemalloc (along with Transparent Huge Pages disabled) you have less memory fragmentation, and thus more efficient resource management of the available server memory.

For standard installations of Percona Server 5.6+ (releases starting with 5.6.19-67.0), the only thing needed to use jemalloc as the memory library for mysqld is for it to be installed on the server.

Enabling Jemalloc on Percona Server

First thing first: install Jemalloc.

The library is available on the Percona repository, which is available for both apt and yum package management:

Once you have the repo, just run the install command (according to your OS) to install it:

yum install jemalloc / apt-get install jemalloc

Now that you have the jemalloc package installed, all it takes to start using it is…..

  • Restart the server.

That’s it! No modifications needed on the my.cnf file or anywhere else. Plain and simple!

Disabling Jemalloc on Percona Server

If for any reason you need to disable jemalloc and go back to the default library, you have two options: remove the jemalloc package (not too practical), or add the following line to the [mysqld_safe] section of the my.cnf file:

malloc-lib =

In other words, an empty path. That will do the trick. Note that commenting or removing the “malloc-lib” parameter on the cnf file won’t work.

How to Know if Jemalloc is Being Used?

There are couple of ways you can verify this, but the less invasive way is by using the pt-mysql-summary (version 2.2.20 and higher) tool from the Percona Toolkit:

root@reports:~# pt-mysql-summary | grep -A5 -i "memory management" # Memory management library ################################## jemalloc enabled in MySQL config for process with ID 5122 Using jemalloc from /usr/lib/x86_64-linux-gnu/ # The End ####################################################

Fosdem MySQL & Friends changed devroom !

Hi all,

Fosdem’s organization relocated the MySQL & Friends Devroom from H.1308 to H.1309 (Van Rijn).

It’s the same building, just the door next to it.

MySQL will also have a stand in the H building. Don’t hesitate to visit it (I’ve heard there will be some Sakila Dolphins !)