Planet MySQL

What Happens If You Set innodb_open_files Higher Than open_files_limit?

The settings of MySQL configuration variables have a fundamental impact on the performance of your database system. Sometimes it can be a little tricky to predict how changing one variable can affect others, especially when dealing with cases like the one I’ll describe in this post, where the outcome is not very intuitive. So here, we’ll look at what happens when you set innodb_open_files higher than the open_files_limit.

We can set the maximum number of open files in our MySQL configuration file using:

open_files_limit = 10000

If this isn’t set, then the default – which is 5,000 in MySQL 5.7 – is used.

See Sveta’s excellent blog post for an explanation of how to change the open file limit. If this value is set, MySQL is capped by the systemd LimitNOFILE setting; when LimitNOFILE is set to infinity, CentOS 7 falls back to 65536 (though much higher values are possible if specified manually):

[root@centos7-pxc57-3 ~]# grep open_files_limit /etc/my.cnf
open_files_limit=10000
[root@centos7-pxc57-3 ~]# grep LimitNOFILE /lib/systemd/system/mysqld.service.d/limit_nofile.conf
LimitNOFILE=infinity
[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@open_files_limit"
+--------------------+
| @@open_files_limit |
+--------------------+
|              65536 |
+--------------------+
[root@centos7-pxc57-3 ~]# perl -pi -e 's/LimitNOFILE=infinity/LimitNOFILE=20000/' /lib/systemd/system/mysqld.service.d/limit_nofile.conf && systemctl daemon-reload && systemctl restart mysqld
[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@open_files_limit"
+--------------------+
| @@open_files_limit |
+--------------------+
|              20000 |
+--------------------+
[root@centos7-pxc57-3 ~]# perl -pi -e 's/LimitNOFILE=20000/LimitNOFILE=5000/' /lib/systemd/system/mysqld.service.d/limit_nofile.conf && systemctl daemon-reload && systemctl restart mysqld
[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@open_files_limit"
+--------------------+
| @@open_files_limit |
+--------------------+
|               5000 |
+--------------------+

As you can see above, MySQL cannot set the value of open_files_limit higher than the system is configured to allow, and open_files_limit will default back to the maximum if it’s set too high.

That seems pretty straightforward, but what isn’t quite as obvious is how that affects innodb_open_files. The innodb_open_files value configures how many .ibd files MySQL can keep open at any one time.

As this obviously requires files to be open, innodb_open_files should be no higher than open_files_limit (and in practice should be lower). If we try to set it higher, as in this example, MySQL prints a warning in the log file:

[root@centos7-pxc57-3 ~]# grep innodb_open_files /var/log/mysqld.log
2018-09-21T08:31:06.002120Z 0 [Warning] InnoDB: innodb_open_files should not be greater than the open_files_limit.

What the warning doesn’t state is that the value is being lowered. Not to the maximum value allowed though:

[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@innodb_open_files"
+---------------------+
| @@innodb_open_files |
+---------------------+
|                2000 |
+---------------------+

2000? Why 2000?

It’s because if we set innodb_open_files too high, it reverts back to the default value, which per the documentation is:

300 if innodb_file_per_table is not enabled, and the higher of 300 and table_open_cache otherwise. Before 5.6.6, the default value is 300.

And table_open_cache? Well that defaults to 400 for versions of MySQL up to 5.6.7, and 2000 for 5.6.8 onwards.

Note that table_open_cache is another setting completely. innodb_open_files controls the number of InnoDB files (.ibd) the server can keep open at once; whilst table_open_cache controls the number of table definition (.frm) files the server can have open at once.
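Putting the pieces together, a my.cnf fragment keeping the three settings consistent could look like this (the values are illustrative, not recommendations):

```ini
[mysqld]
# Process-wide cap on open file descriptors; also bounded by the
# systemd LimitNOFILE setting, as discussed above.
open_files_limit  = 10000

# Cache of open table instances; its value also feeds the
# innodb_open_files default when innodb_file_per_table is enabled.
table_open_cache  = 2000

# .ibd files InnoDB may keep open at once. Keep this below
# open_files_limit, or it silently falls back to its default.
innodb_open_files = 4000
```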



How a Social Media Platform Benefits from Upgraded Security and Performance

We recently helped a social media management platform improve its security and performance.

The social media platform supports social network integrations for Twitter, Facebook, Instagram, LinkedIn and YouTube, and has more than 15 million users. The company needed to ensure maximum security and performance for its authentication processes by reducing the hours and errors associated with manual administration.

These improvements would not only allow the organization to minimize downtime and expand the feature set available to its developers, but would also ensure the company’s compliance with Europe’s new General Data Protection Regulation (GDPR). The challenge was automating its database user management and authentication processes, which required knowledge of MySQL security and DevOps environments.

Pythian’s experts have deep knowledge of MySQL security and DevOps-structured environments, and were able to develop a customized MySQL Bastion proxy to increase security by restricting user access and automating access management.

The client’s internal processes are now fully automated, thus improving security, productivity and resource availability for the company.

Read the full story about how Pythian helped a client achieve the highest levels of MySQL scalability, reliability and uptime.

Code Challenge #14: Test Your Knowledge of React Hooks

The front-end world of JavaScript keeps buzzing daily with new tools, technologies, and super updates! One such recent release by the React.js team is React Hooks, in React 16.7-alpha. Here's an awesome post by Chris introducing React Hooks.

Hooks give us a way to write cleaner code by letting functional components take on the role of stateful class components in React (note that this does NOT replace class components). Thus, you have the ability to utilize features such as state and lifecycle methods which were previously unavailable in a functional component.

The Challenge

In this challenge, we'll test our knowledge of React Hooks by converting class components into stateful functional components.

You are provided with individual components in a simple pre-configured React project, and you are required to convert each of the following to a functional component:

  • A component having a single state variable.
  • A component having multiple state variables.
  • A component having a componentDidMount lifecycle method.
  • A component having a componentWillUnmount lifecycle method.

A sample codesandbox scaffolded from a create-react-app project is provided with the components. Fork the sandbox to get started.

The only requirement is to re-write the class components to functional components.
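To make the task concrete, here is a minimal sketch of one such conversion (not the challenge solution; the tiny useState stand-in exists only so the file runs outside React, while in the sandbox you'd simply `import { useState } from 'react'`):

```javascript
// A tiny stand-in for React's useState, just enough to run this sketch
// outside React. Each render resets hookIndex to 0 before calling the
// component, much like React walks its hook list in order.
const hookState = [];
let hookIndex = 0;
function useState(initial) {
  const i = hookIndex++;
  if (hookState[i] === undefined) hookState[i] = initial;
  return [hookState[i], (v) => { hookState[i] = v; }];
}

// Before: a class component with a single state variable.
// class Counter extends React.Component {
//   state = { count: 0 };
//   render() { return <p>{this.state.count}</p>; }
// }

// After: the same component as a function, using the useState hook.
function Counter() {
  const [count, setCount] = useState(0);
  // Returning plain data instead of JSX keeps the sketch runnable in Node.
  return { view: `count: ${count}`, bump: () => setCount(count + 1) };
}

// A component with multiple state variables simply calls useState once
// per variable; the hooks are matched up by call order on each render.
```

In React, calling the state setter schedules a re-render, and calling useState again on the next render returns the updated value; that is what the stub mimics by resetting hookIndex between calls.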

Goals for this Challenge
  • Re-write class components to functional components while retaining state and lifecycle properties.

  • Understand how React Hooks work.

  • Write simple React components.

  • and of course, use codesandbox to demo projects.


New to React Hooks? This post by Chris and this one by Peter should help you get up to speed. Also, all required dependencies have been installed in the sandbox to help you get started quickly.


Would you like feedback and reviews on your submission after completion? You can:

  • Share on Twitter and be sure to use the hashtag #ScotchCodeChallenge and we would be sure to see it.
  • Boost your technical writing skills by writing a post on Scotch about how you solved the challenge.
  • Post the link to your completed sandbox in the comment section of this post.

The solution to this challenge will be published next Monday! Happy keyboard slapping!

Using MySQL Enterprise Backup on InnoDB Cluster – Member Restore Use Cases

This is a quick blog post demonstrating a couple of backup restore use cases within a MySQL InnoDB Cluster 8.0 setup. The backup used was completed in a previous post (part 2 in this series) with the MySQL Enterprise Backup 8.0 utility. I’ll then use that backup to build an additional member to add to the cluster. This blog… Read More »

Mastering Continuent Clustering: Keeping a Consistent View of the Cluster

The Question

You already know about the Continuent Connector which is the “secret sauce” that routes your application database traffic to the appropriate MySQL data source of your cluster.

Have you ever wondered how the Connector keeps track of the cluster configuration? How it always knows which host is the master (or masters in a Composite Multimaster topology), and which are slaves?

The Short Version

This information is actually held and maintained by the Managers, which monitor and take care of their local MySQL node.

Each one of the Connectors maintains a single connection to one, and only one, manager per data service. In the case of a Composite Multimaster topology, that will be one Manager chosen per site.

The Nitty Gritty

Every 3 seconds, each Manager sends a “ping” on this previously-established connection, which is also an opportunity to send a refresh of the cluster state, just in case… As long as the Connector receives these pings, all is well.

If a Connector does not hear from the selected Manager for more than 30s (by default, defined by the --connector-keepalive-timeout option), it will just disconnect from that Manager and try to reach another one. Most of the time, the Connector is able to discover another available Manager, connect to it and operations simply continue normally.

If the Connector is unable to reach any of the Managers (which could happen if the host is isolated from the network), the Connector will go into the “ON HOLD” state, which will delay any new connection requests and continue to serve existing client connections.

If the Connector is unable to reach another Manager within 30s (by default, defined by the --connector-delay-before-offline option), the Connector will declare itself isolated and will kill existing connections and reject new ones. This is done so that there is no risk of writing data to a cluster node that may no longer have the master role.
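The decision flow above can be sketched as a tiny state machine (a hypothetical simplification using the defaults from the text; this is not Continuent's actual code):

```python
# Hypothetical sketch of the Connector-side logic described above.
# Defaults from the text: Managers ping every 3s; the Connector gives up
# on a silent Manager after 30s and declares itself offline after a
# further 30s without any reachable Manager.

KEEPALIVE_TIMEOUT = 30      # --connector-keepalive-timeout (seconds)
DELAY_BEFORE_OFFLINE = 30   # --connector-delay-before-offline (seconds)

class Connector:
    def __init__(self):
        self.state = "ONLINE"
        self.last_ping = 0.0
        self.manager_lost_at = None

    def on_ping(self, now):
        # A ping also carries a refresh of the cluster state, just in case.
        self.last_ping = now
        self.manager_lost_at = None
        self.state = "ONLINE"

    def tick(self, now, another_manager_reachable):
        if now - self.last_ping <= KEEPALIVE_TIMEOUT:
            return self.state            # pings flowing: all is well
        # Selected Manager is silent: disconnect and try the others.
        if another_manager_reachable:
            self.on_ping(now)            # reconnected, operations continue
        elif self.manager_lost_at is None:
            self.manager_lost_at = now
            self.state = "ON HOLD"       # delay new connections, keep existing
        elif now - self.manager_lost_at > DELAY_BEFORE_OFFLINE:
            self.state = "OFFLINE"       # isolated: kill and reject connections
        return self.state
```

For instance, a Connector that last heard a ping at t=0 stays ONLINE until t=30, sits ON HOLD while it hunts for another Manager, and goes OFFLINE once a further 30s elapse with no Manager reachable.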

That’s how Continuent Clustering keeps your data safe!

See Percona CEO Peter Zaitsev’s Keynote at AWS re:Invent: MySQL High Availability and Disaster Recovery

Join Percona CEO Peter Zaitsev at AWS re:Invent as he presents MySQL High Availability and Disaster Recovery on Tuesday, November 27, 2018, in the Bellagio Resort, Level 1, Gaugin 2 at 1:45 PM.

In this hour-long session, Peter describes the differences between high availability (HA) and disaster recovery (DR), and then moves through scenarios detailing how each is handled manually and in Amazon RDS.

He’ll review the pros and cons of managing HA and DR in the traditional database environment as well in the cloud. Having full control of these areas is daunting, and Amazon RDS makes meeting these needs easier and more efficient.

Regardless of which path you choose, it is necessary that you monitor your environment, so Peter wraps up with a discussion of metrics you should regularly review to keep your environment working correctly and performing optimally.

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona to one of the most respected open source companies in the business. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance.

You can also stop by and see Percona at AWS re:Invent in booth 1605 in The Venetian Hotel Expo Hall.

Enterprise Backup Support in InnoDB Cluster 8.0

In this blog post, I’ll highlight new Backup Support in the MySQL Enterprise Backup 8.0 utility for Group Replication and InnoDB Cluster. This will also include usage of the MySQL login-path utility for accessing credentials for the backup process as highlighted in this other blog post on Devops ideas with MySQL and Scripting for Binary Log… Read More »

Fun with Bugs #73 - On MySQL Bug Reports I am Subscribed to, Part X

It's time to continue my review of MySQL bug reports that I considered interesting for some reason recently. I have not had any notable reaction from Oracle engineers to my previous post about recent regression bugs in MySQL 8.0.13, so probably this topic is not really that hot. In this boring post I'll just review some bugs I've subscribed to since August that are still not closed, starting from the oldest.

Let me start with a couple of bug reports that remain "Open":
  • Bug #91959 - "UBSAN: signed integer overflow in lock_update_trx_age". It's really unusual to see a bug reported by Shane Bester himself stay just "Open" for months. As he noted, it's really strange to see age defined as a 32-bit value here when age_updated, two lines later, is 64-bit. On the other hand, from the comments in the code it seems the age value is supposed to stay small, and it growing too much is a problem in itself.
  • Bug #92964 - "Slave performance degrades over time". As the description says:
    "When mysql slave is running gtid based asynchronous multi threaded replication, performance degrades over time in significant way when session_track_gtids is set to OWN_GTID;"
    The bug is assigned, but there are no public comments, while the way to reproduce seems to be clearly defined.
Same as the weather at Seven Sisters Cliffs back on September 14, 2018, some recent MySQL bug reports do not add confidence to those few visitors who explore them...
Now let me continue with some "Verified" bugs:
  • Bug #91981 - "Inconsistent user@host definitions for definer/grantor/grantee columns". It seems consistency is not a high priority these days.
  • Bug #92020 - "Introduce new SQL mode rejecting queries with results depending on query plan". I already mentioned this report while reviewing recent MySQL optimizer bugs. I have nothing to add to my conclusion there:
    "So, current MySQL considers different results when different execution plans are used normal!"
  • Bug #92032 - "Restrict usage of session foreign_key_checks". It's a great request from Federico Razzoli and I am happy to see it verified as a bug, fast.
  • Bug #92093 - "Replication crash safety needs relay_log_recovery even with GTID." Recently, Jean-François Gagné has been paying special attention to MySQL replication crash safety in his bug reports, talks and blog posts. This specific report ended up as a request for clearer documentation that should not provoke a (false) feeling of safety. See also his Bug #92109 - "Please make replication crash safe with GTID and less durable setting (bis)." for the request to improve MySQL in this regard one day.
  • Bug #92131 - "ASan: Direct leak of 272 byte(s) in main.mysqlpump_partial_bkp MTR test case". Reports of this kind make me nervous. They make me think that there is still no regular testing of ASan-enabled builds at Oracle. Luckily for us, Percona engineers (like Yura Sorokin) do this for Oracle.
  • Bug #92133 - "DICT_SYS mutex contention causes complete stall when running with 40 mill tables". A proper data dictionary in MySQL 8.0 was supposed to solve some performance problems for instances with a lot of tables. As this perfect bug report from Alexander Rubin shows, this is not yet the case. We can still push a MySQL instance over its limits, and a single instance hosting thousands of different small databases is hardly usable.
  • Bug #92209 - "AVG(YEAR(datetime_field)) makes an error result because of overflow". Perfect analysis from Pin Lin. All recent MySQL (and MariaDB) versions are affected.
  • Bug #92252 - "Mysql generates gtid gaps because of option slave-skip-errors". Yet another bug report from Pin Lin.
  • Bug #92364 - "events_transactions_summary_global_by_event_name not working as expected". This bug was reported by Przemyslaw Malkowski from Percona. We all know Performance Schema is near perfect, but it turned out that "...performance_schema.events_transactions_summary_global_by_event_name behavior for instrumenting transactions seems completely broken.".
  • Bug #92398 - "point in time recovery using mysqlbinlog broken with temporary table -> errors". Let me quote Shane Bester:
    "Running a multi-threaded workload involving temporary and non-temporary tables leads to binary log playback giving errors."
  • Bug #92421 - "Queries with views and operations over local variables don't use indexes". This is a kind of regression in MySQL 5.7. MySQL 8 allows you to work around the problem properly using the ROW_NUMBER() function.
  • Bug #92540 - "Comparison between DATE and DECIMAL does not round nanoseconds". This bug was reported by Alexander Barkov from MariaDB. MariaDB 10.3.x is not affected based on my tests.
To summarize:
  1. It seems ASan-enabled builds are not tested by Oracle engineers on a regular basis.
  2. Percona engineers help a lot with regular MySQL QA and testing for various extreme use cases.
  3. There is a lot to do to make GTID-based replication crash-safe and working fast in common use cases.
More bug reviews are coming soon, so stay tuned.


Features and capabilities continue to arrive in MySQL 8.0, most recently noted in the GA release of MySQL 8.0.13.  The mysql-shell 8.0.12 release introduced a number of things, an important part was a cross-platform mysql-shell secure password handling facility.  The MySQL Login-paths are a part of that security focus.  Here we will look at where InnoDB Cluster 8.0,… Read More »

Compression options in MySQL (part 1)

Over the last year, I have been pursuing a part-time hobby project exploring ways to squeeze as much data as possible into MySQL. As you will see, there are quite a few different ways. Of course, things like compression ratio matter a lot, but other items, like the performance of inserts, selects and updates, along with the total amount of bytes written, are also important. When you start combining all the possibilities, you end up with a large set of compression options and, of course, I am surely missing a ton. This project has been a great learning opportunity and I hope you’ll enjoy reading about my results. Given the volume of results, I’ll have to write a series of posts. This post is the first of the series. I also have to mention that some of my work overlaps work done by one of my colleagues, Yura Sorokin, in a presentation he did in Dublin.

The compression options
  • InnoDB page size in {16k, 32k, 64k} (as references)
  • InnoDB barracuda page compression, block_size in {8k, 4k}
  • InnoDB Transparent page compression with punch holes, page size in {16k, 32k, 64k} * compression algo in {LZ4, Zlib}
  • MyISAM, MyISAM Packed, MyISAM on ZFS with recordsize in {16k, 32k}
  • InnoDB on ZFS, ZFS compression algo in {LZ4, Zlib}, ZFS record size in {16k, 32k, 64k, 128k}, InnoDB page size in {16k, 32k, 64k}
  • TokuDB, TokuDB compression algo in {ZLIB, LZMA, QUICKLZ, SNAPPY}
  • TokuDB on ZFS, TokuDB compression algo set to None, ZFS compression Zlib, ZFS record size in {16k, 32k, 64k, 128k}
  • MyRocks, compression algo in {None, ZSTD}
  • MyRocks on ZFS, MyRocks compression algo set to None, ZFS compression Zlib, ZFS record size in {16k, 32k, 64k, 128k}
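For reference, the first two InnoDB options in the list map to DDL like the following (illustrative only; Barracuda page compression needs innodb_file_per_table, and transparent page compression needs a filesystem with punch-hole support):

```sql
-- Barracuda page compression with an 8KB block size:
ALTER TABLE wiki_pagecounts
  ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Transparent page compression with punch holes, using LZ4:
ALTER TABLE wiki_pagecounts
  COMPRESSION='lz4';
```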

In many interesting cases, the ZFS experiments have been conducted with and without a SLOG.

The test datasets

In order to cover these solutions efficiently, I used a lot of automation and I restricted myself to two datasets. My first dataset consists of a set of nearly 1 billion rows from the Wikipedia access stats. The table schema is:

CREATE TABLE `wiki_pagecounts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `day` date NOT NULL,
  `hour` tinyint(4) NOT NULL,
  `project` varchar(30) NOT NULL,
  `title` text NOT NULL,
  `request` int(10) unsigned NOT NULL,
  `size` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_time` (`day`,`hour`)
);

and here’s a typical row:

mysql> select * from wiki_pagecounts where id = 16\G
*************************** 1. row ***************************
     id: 16
    day: 2016-01-01
   hour: 0
project: aa
  title: 'File:Wiktionary-logo-en.png'
request: 1
   size: 10752
1 row in set (0.00 sec)

The average length of the title column is above 70 and it often has HTML escape sequences for UTF-8 characters in it. The actual column content is not really important but it is not random data. Loading this dataset in plain InnoDB results in a data file of about 113GB.

The second dataset is from the defunct Percona cloud tool project and is named “o1543”. Instead of a large number of rows, it is made of only 77M rows but this time, the table has 134 columns, mostly using float or bigint. The table definition is:

CREATE TABLE `query_class_metrics` (
  `day` date NOT NULL,
  `query_class_id` int(10) unsigned NOT NULL,
  `instance_id` int(10) unsigned NOT NULL,
  `start_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `end_ts` timestamp NOT NULL DEFAULT '1970-01-01 00:00:01',
  `query_count` bigint(20) unsigned NOT NULL,
  `lrq_count` bigint(20) unsigned NOT NULL DEFAULT '0',
  ...
  `Sort_scan_sum` bigint(20) unsigned DEFAULT NULL,
  `No_index_used_sum` bigint(20) unsigned DEFAULT NULL,
  `No_good_index_used_sum` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`start_ts`,`instance_id`,`query_class_id`),
  KEY `start_ts` (`instance_id`)
);

When loaded in plain InnoDB, the resulting data file size is slightly above 87GB.

The test queries

The first test query is, of course, the inserts used to load the datasets. These are multi-insert statements in primary key order, generated by the mysqldump utility.

The second test query is a large range select. For the Wikipedia dataset I used:

select `day`, `hour`, max(request), sum(request), sum(size) from wikipedia_pagecounts.wiki_pagecounts where day = '2016-01-05' group by `day`,`hour`;

while for the o1543 dataset, I used:

select query_class_id, sum(Query_time_sum) as Totat_time,sum(query_count), sum(Rows_examined_sum), sum(InnoDB_pages_distinct_sum) from o1543.query_class_metrics where start_ts between '2014-10-01 00:00:00' and '2015-06-30 00:00:00' group by query_class_id order by Totat_time desc limit 10;

In both cases, a significant amount of data needs to be processed. Finally, I tested random access with the updates. I generated 20k distinct single row updates in random primary key order for the Wikipedia dataset like:

update wikipedia_pagecounts.wiki_pagecounts set request = request + 1 where id = 377748793;

For the o1543 dataset, I used the following update statement:

update o1543.query_class_metrics set Errors_sum = Errors_sum + 1 where query_class_id = 472 and start_ts between '2014-10-01 00:00:00' and '2014-12-31 23:59:59';

which ends up updating very close to 20k rows, well spaced in terms of primary key values.

The metrics recorded

In order to compare the compression options, I recorded key metrics.

  • Time: that’s simply the execution time of the queries. It is very difficult to minimally tune a server for all the different engines. Some also rely on the OS file cache. Take the execution time as a rough performance indicator which could be modified substantially through targeted tuning.
  • Amount of data read and written by the MySQL process, as reported by /proc/$(pidof mysqld)/io.
  • Amount of data written to the actual physical device where the data is stored, from /sys/block/$datadevice/stat. That matters a lot for flash devices that have a finite endurance. The amount of data written to the storage is the main motivation of Facebook with MyRocks.
  • The actual size of the final datasets
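A minimal sketch of how the second and third metrics can be captured (file names come from the text; the helper names are mine):

```python
# Read the I/O counters the benchmark relies on. /proc/<pid>/io reports
# bytes read/written by the mysqld process; field 7 of
# /sys/block/<dev>/stat is the number of 512-byte sectors written to the
# device itself.

def proc_io(pid):
    """Parse /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    with open(f"/proc/{pid}/io") as f:
        for line in f:
            name, value = line.split(":")
            counters[name] = int(value)
    return counters

def device_sectors_written(device):
    """Sectors written to a block device, e.g. device='vdb'."""
    with open(f"/sys/block/{device}/stat") as f:
        return int(f.read().split()[6])
```

Sampling both before and after a test query, and subtracting, gives the amount of data written by mysqld versus what actually reached the storage.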

Even with these simple metrics, you’ll see there is quite a lot to discuss and learn.

The procedure

For the benchmarks, I used an LXC virtual machine. My main goal was to simulate a dataset much larger than the available memory. I tried to limit the MySQL buffers to 128MB, but in some cases, like with MyRocks, that was pretty unfair and it impacted the performance results. Basically, the procedure was:

  1. Start mysqld (no buffer pool load)
  2. Sync + drop cache
  3. Capture: du -hs the datadir
  4. Capture: cat /proc/$(pidof mysqld)/io
  5. Capture: cat /sys/block/vdb/stat
  6. Capture: show global variables and show global status;
  7. Run the test query
  8. Wait for 30 minutes for any flushing or maintenance to complete
  9. Capture: du -hs the datadir
  10. Capture: cat /proc/$(pidof mysqld)/io
  11. Capture: cat /sys/block/vdb/stat
  12. Capture: show global variables and show global status;
  13. Stop mysqld

As much as possible, I automated the whole procedure. On many occasions, I ran multiple runs of the same benchmark to validate unexpected behaviors.

First results: Traditional storage options

Inserting the data

In this first post of the series, I’ll report on the traditional “built-in” options which are InnoDB with Barracuda compression (IBC) and MyISAM with packing. I debated a bit the inclusion of MyISAM in this post since tables become read-only once packed but still, I personally implemented solutions using MyISAM packed a few times.

On the first figure (above), we have the final table sizes in GB for the different options we are considering in this first round of results. Over all the post series, we’ll use the plain InnoDB results as a reference. The Wikipedia dataset has a final size in InnoDB of 112.6GB while the o1543 dataset is slightly smaller, at 87.4GB.

The MyISAM sizes are smaller by 10 to 15% which is expected since InnoDB is page based, with page and row headers, and it doesn’t fully pack its pages. The InnoDB dataset size could have been up to twice as large if the data was inserted in random order of the primary key.

Adding Barracuda page compression with an 8KB block size (InnoDBCmp8k), both datasets shrink by close to 50%. Pushing to a block size of 4KB (InnoDBCmp4k), the Wikipedia dataset clearly doesn’t compress that much, but the o1543 dataset is very compressible, by more than 75%. Looking at the MyISAM Packed results, the o1543 dataset compressed to only 8.8GB, a mere 11% of the original MyISAM size. That means the o1543 dataset could have done well with IBC using a block size of 2KB. Such a compression ratio is really exceptional. I’ve rarely encountered a ratio that favorable in a production database.

With IBC, we know the block size is too small for the compressibility of the dataset when the final size no longer follows the ratio of compressed block size over the original block size. For example, the Wikipedia dataset started at 112.6GB, and a fourth (4KB/16KB) of this number is much smaller than the 46.9GB size of the InnoDB compression with 4k block size. You’ll also see a large number of compression failures in the innodb_cmp table of the information schema.

When the compression fails with IBC, InnoDB splits the page in two and recompresses each half. That process adds an overhead which can be observed in the insertion time figure. While the insertion time of the Wikipedia dataset for InnoDBCmp4k explodes to 2.6 times the uncompressed insertion time, the o1543 dataset takes only 30% more time. I suggest you do not take the times here too formally; the test environment I used was not fully isolated. View these times as trends.

The amount of data written during the inserts, shown on the above figure, has bugged me for a while. Let’s consider, for example, the amount of writes needed to insert all the rows of the Wikipedia dataset in plain InnoDB. The total is 503GB for a dataset of 113GB. From the MySQL status variables, I have 114GB written to the data files (innodb_pages_written * 16kb), 114GB written to the double write buffer (Inndb_dblwr_pages_written * 16kb) and 160GB written to the InnoDB log files (innodb_os_log_written). If you sum all these values, you have about 388GB, a value short by about… 114GB, too close to the size of the dataset to be an accident. What else is written?

After some research, I finally found it! When a datafile is extended, InnoDB first writes zeros to allocate the space on disk. All the tablespaces have to go through that initial allocation phase so here are the missing 114GB.
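The write accounting above is easy to verify with quick arithmetic (figures rounded as in the text):

```python
# Bytes written while loading the 113GB Wikipedia dataset in plain InnoDB,
# reconstructed from the status variables quoted above (all in GB).
datafile_writes = 114     # innodb_pages_written * 16KB
dblwr_writes = 114        # Innodb_dblwr_pages_written * 16KB
redo_log_writes = 160     # innodb_os_log_written
tablespace_zeroing = 114  # zeros written when the datafiles are extended

total = datafile_writes + dblwr_writes + redo_log_writes + tablespace_zeroing
print(total)                   # 502, matching the ~503GB observed
print(round(total / 113, 1))   # write amplification of about 4.4x
```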

Back to the data written during the inserts figure, look at the number of writes required for the Wikipedia dataset when using InnoDBCmp4k. Any idea why it is higher? What if I tell you the double write buffer only writes 16KB pages (actually, it depends on innodb_page_size)? So, when the pages are compressed, they are padded with zeros when written to the double write buffer. We remember that, when compressing to 4KB, we had many compression misses so we ended up with many more pages to write. Actually, Domas Mituzas filed a bug back in 2013 about this. Also, by default (innodb_log_compressed_pages), the compressed and uncompressed versions of the pages are written to the InnoDB log files. Actually, only the writes to the tablespace are reduced by IBC.

MyISAM, since it is not a transactional engine, cheats here. The only overhead is the writes to the b-trees of the index. So, at the expense of durability, MyISAM writes much less data in this simple benchmark.

Range selects

So great, we have now inserted a lot of rows in our tables. How fast can we access these rows? The following figure presents the times to perform large range scans on the datasets. In InnoDB, the times are “fairly” stable. For the Wikipedia dataset, InnoDB compression improves the select performance; the time to decompress a page is apparently shorter than the time to read a full page. MyISAM without compression is the fastest solution but, once again, the benchmark offers the most favorable conditions to MyISAM. MyISAMPacked doesn’t fare as well for the large range select, likely because too much data must be decompressed.

20k updates

Going to the updates, the time required to perform 20k updates by a single thread is shown on the above figure. For both datasets, InnoDB and InnoDBCmp8k show similar times. There is a divergence with the Wikipedia dataset stored on InnoDBCmp4k: the execution time is 57% larger, essentially caused by a large increase in the number of pages read. MyISAM is extremely efficient dealing with the updates of the o1543 dataset since the record size is fixed and the update process is single-threaded.

Finally, let’s examine the number of bytes written per update as shown on the figure above. I was naively expecting about two pages written per update statement, one for the double write buffer and one for the tablespace file. The Wikipedia dataset shows more like three pages written per update, while the o1543 dataset fits rather well with what I was expecting. I had to look at the file_summary_by_instance table of the Performance Schema and the innodb_metrics table to understand. Actually, my updates to the Wikipedia dataset are single row updates executed in autocommit mode, while the updates to the o1543 dataset are from a single statement updating 20k rows. When you do multiple small transactions, you end up writing much more to the undo log and to the system tablespace. The worst case is when the updates are in separate transactions and a long time is allowed for MySQL to flush the dirty pages.

Here are the writes associated with 30 updates, in autocommit mode, 20s apart:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |         120 |
| buffer_data_written             |     2075136 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |          30 |
| buffer_page_written_system_page |          30 |
| buffer_page_written_trx_system  |          30 |
| os_log_bytes_written            |       77312 |
| innodb_dblwr_pages_written      |         120 |
+---------------------------------+-------------+
8 rows in set (0.01 sec)

The index leaf write is where the row is stored in the tablespace; 30 matches the number of rows updated. Each update has dirtied one leaf page, as expected. The undo log is used to store the previous version of the row for rollback, in the ibdata1 file. I wrongly assumed these undo entries would not actually be written to disk, and would only live in the buffer pool and be purged before they needed to be flushed to disk. I don’t understand clearly enough what is written to the system page and trx system to attempt an explanation for these ones. The sum of pages to write is 120, four per update, but you need to multiply by two because of the double write buffer. So, in this worst case scenario, a simple single row update may cause up to eight pages to be written to disk.

Grouping the updates in a single transaction basically removes the pages written to the system_page and trx_system, as these are written once per transaction.

Here is the result for the same 30 updates, sent at 20s intervals, but this time in a single transaction:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |          63 |
| buffer_data_written             |     1124352 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |          31 |
| buffer_page_written_system_page |           1 |
| buffer_page_written_trx_system  |           1 |
| os_log_bytes_written            |       60928 |
| innodb_dblwr_pages_written      |          63 |
+---------------------------------+-------------+

The write load, in terms of the number of pages written, is cut roughly in half, to about four per update. The most favorable case is a single transaction with no sleep in between the updates.

For 30 updates in a single transaction with no sleep, the results are:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |          33 |
| buffer_data_written             |      546304 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |           1 |
| buffer_page_written_system_page |           1 |
| buffer_page_written_trx_system  |           1 |
| os_log_bytes_written            |        4608 |
| innodb_dblwr_pages_written      |          33 |
+---------------------------------+-------------+
8 rows in set (0.00 sec)

Now the undo log is flushed only once, and we are down to approximately two page writes per update, which is what I was originally expecting. The other results fall well into place if you keep in mind that only the index_leaf writes are compressed. The InnoDBCmp4k results for the Wikipedia dataset are higher, essentially because that run took much longer and thus more page flushing occurred.
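As a quick sanity check of the arithmetic across the three runs, the per-update page writes can be recomputed from the counter values above (a small standalone sketch; the numbers are copied from the three innodb_metrics snapshots):

```shell
# Pages physically written per UPDATE for the three 30-update runs:
# every flushed page is counted once in buffer_pages_written and once more
# in innodb_dblwr_pages_written (the doublewrite copy), hence the sum.
awk 'BEGIN {
    updates = 30
    print "autocommit, 20s apart:      " (120 + 120) / updates " pages/update"
    print "one transaction, 20s apart: " (63 + 63)   / updates " pages/update"
    print "one transaction, no sleep:  " (33 + 33)   / updates " pages/update"
}'
```

This reproduces the eight, roughly four, and roughly two pages per update discussed above.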

What did we learn?

Everything can be a pretext to explore and learn. Just to summarize, what have we learned in this post?

  • The InnoDB redo log logs both the compressed and the uncompressed result by default (see innodb_log_compressed_pages)
  • The doublewrite buffer only writes full pages; compressed pages are zero-padded
  • With InnoDB, the total amount of data written to disk during the inserts is more than 5 times the final size. Compression worsens the ratio.
  • A single-row update causes from two up to eight pages to be written to disk

Not bad in terms of collateral learning…


In this post, we reviewed the traditional data compression solutions available with MySQL. In future posts, we’ll start looking at the alternatives. In the next one, I will evaluate InnoDB Transparent page compression with punch hole, a feature available since MySQL 5.7.

Caveats With pt-table-checksum Using Row-Based Replication, and Replication Filters

As per the documentation, pt-table-checksum is a tool to perform online replication consistency checks by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master.

The master and each slave insert checksums into the percona.checksums table, and these are later compared for differences. It’s fairly obvious that the checksums need to be determined independently on each node, and so these inserts must be replicated as STATEMENT and not ROW. Otherwise, the slaves would just insert the same checksum as the master and not calculate it independently.

The tool only requires binlog_format=STATEMENT for its own session. It sets this itself on the master, but will error if this isn’t already set on each slave node. The reason for the error is that the statement changing the session’s binlog_format will not be replicated. So if a slave has binlog_format=ROW, the slave itself will execute the checksum correctly, but the result will be written to its binary log as ROW events, and any slaves further down the chain will simply insert the same result instead of computing their own. See bug 899415.

This is only a problem if we have chained replication, and the error can be skipped with --no-check-binlog-format, so for simple replication setups with ROW or MIXED replication we can still use the tool. If we do not have a slave-of-slave style chained topology, then there’s no need to worry about this.
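For example, before skipping the check on a chained topology, it is worth verifying the format on every intermediate master yourself (a hypothetical manual check, not part of the tool):

```sql
-- Run on each replica that itself has replicas attached:
SELECT @@global.binlog_format;
-- ROW or MIXED here means checksums computed on this node would be
-- replicated downstream as row images, so its own replicas would copy
-- the result instead of computing it independently.
```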

However, there is one caveat to be aware of if you’re using replication filters: when a slave isn’t replicating a particular database due to binlog-ignore-db, this setting behaves differently with ROW-based replication (RBR) than with STATEMENT-based replication.

Per the documentation, with RBR, binlog-ignore-db=testing will cause all updates to testing.* to be skipped. With STATEMENT-based replication, it will cause all updates issued after USE testing; to be ignored (regardless of where the updates were actually being written to).

pt-table-checksum operates in the following way:

use `testing`/*!*/;
SET TIMESTAMP=1541583280/*!*/;
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'testing', 'testing', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, CONCAT(ISNULL(`id`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `testing`.`testing` /*checksum table*/

Due to the use testing statement, the slave then skips these statements without any errors, and simply never writes into percona.checksums.

As per the documentation:

The tool monitors replicas continually. If any replica falls too far behind in replication, pt-table-checksum pauses to allow it to catch up. If any replica has an error, or replication stops, pt-table-checksum pauses and waits.

In this case, you will see the tool continually wait, with the following debug output:

# pt_table_checksum:12398 10967 SELECT MAX(chunk) FROM `percona`.`checksums` WHERE db='testing' AND tbl='testing' AND master_crc IS NOT NULL
# pt_table_checksum:12416 10967 Getting last checksum on slave1
# pt_table_checksum:12419 10967 slave1 max chunk: undef
# pt_table_checksum:12472 10967 Sleep 0.25 waiting for chunks
# pt_table_checksum:12416 10967 Getting last checksum on slave1
# pt_table_checksum:12419 10967 slave1 max chunk: undef
# pt_table_checksum:12472 10967 Sleep 0.5 waiting for chunks
# pt_table_checksum:12416 10967 Getting last checksum on slave1
# pt_table_checksum:12419 10967 slave1 max chunk: undef

We don’t recommend using the tool with replication filters in place, but if --no-check-replication-filters is specified you should be aware of the differences in how different binlog formats handle these filters.

One workaround would be to replace


with the following, which will just ignore writes to that database:
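The two configuration lines themselves are elided from the text, but a plausible reading (an assumption on my part, using the example database testing and a replica-side filter) is to swap the database-level filter for a wild table filter, which matches on the table actually being written rather than on the default database and therefore behaves the same under both binlog formats:

```ini
# Hypothetical before/after, assuming a replica-side filter on database "testing":
# before:
replicate-ignore-db=testing
# after:
replicate-wild-ignore-table=testing.%
```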


More resources

You can read more about pt-table-checksum in this blog post: MySQL Replication Primer with pt-table-checksum and pt-table-sync.

The latest version of Percona Toolkit can be downloaded from our website. All Percona software is open source and free.



MySQL InnoDB Cluster with 2 Data Centers for Disaster Recovery: howto

As you know, MySQL InnoDB Cluster is a High Availability solution for MySQL. However, more and more people are trying to use it as a Disaster Recovery solution across 2 data centers. Natively, this is not yet supported, but such a setup is already possible if we accept the following points:

  • human interaction is required in case of disaster recovery, which, in my experience, is often acceptable
  • human interaction is required if the Primary-Master acting as asynchronous slave leaves its group (crash, network problem, …) or becomes a secondary

These are not big constraints, and it’s relatively easy to deal with them.

The Architecture

The situation is as follows:

  • 2 data centers (one active, one inactive, only used for disaster recovery)
  • 2 MySQL InnoDB Clusters (one in each DC)
  • 3 members in each cluster (to deal with local failure)
  • 1 router in each DC used for asynchronous replication
  • application server(s) with local router in each DC (only those in the active DC should be available, once again, this is a disaster recovery solution)


The first step is to take a backup of the current production server and restore it on the cluster (if you need a live migration with minimal downtime, please check this post and/or this tutorial).

Example of backup using MySQL Enterprise Backup, aka meb:

/opt/meb/bin/mysqlbackup \
   --host \
   --backup-dir /vagrant/backup/ \
   backup-and-apply-log

This machine had the following set of GTIDs executed:

mysql> show master status\G
*************************** 1. row ***************************
             File: binlog.000003
         Position: 2019696
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: aa410ab6-edb1-11e8-9e34-08002718d305:1-30

Now we will restore the backup on all the servers/nodes in DC1 (mysql1, mysql2 and mysql3) and DC2 (mysql4, mysql5 and mysql6):

# systemctl stop mysqld
# rm -rf /var/lib/mysql/*
# /opt/meb/bin/mysqlbackup --backup-dir=/vagrant/backup/ copy-back
# chown -R mysql. /var/lib/mysql
# systemctl start mysqld

Let’s start MySQL Shell (this can be done remotely or on one of the servers, it doesn’t matter):

# mysqlsh
MySQL>JS> dba.configureInstance('clusteradmin@mysql1')
MySQL>JS> dba.configureInstance('clusteradmin@mysql2')
MySQL>JS> dba.configureInstance('clusteradmin@mysql3')
MySQL>JS> dba.configureInstance('clusteradmin@mysql4')
MySQL>JS> dba.configureInstance('clusteradmin@mysql5')
MySQL>JS> dba.configureInstance('clusteradmin@mysql6')

clusteradmin is a user that was created already in the production server from which we took the backup. The user was created with the following statements:

mysql> create user clusteradmin identified by 'MySQL8isGreat';
mysql> grant all privileges on *.* to 'clusteradmin'@'%' with grant option;

The dba.configureInstance() method will ask you to modify 4 variables on a freshly installed MySQL 8.0.13; please confirm and let the shell restart mysqld.
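For reference, the required changes amount to roughly the following my.cnf settings (a sketch of what the shell persists for you; server_id simply needs to be unique on each of the six instances):

```ini
binlog_checksum          = NONE
enforce_gtid_consistency = ON
gtid_mode                = ON
server_id                = 11   # any non-default value, unique per instance
```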

The 4 variables are:

+--------------------------+---------------+----------------+
| Variable                 | Current Value | Required Value |
+--------------------------+---------------+----------------+
| binlog_checksum          | CRC32         | NONE           |
| enforce_gtid_consistency | OFF           | ON             |
| gtid_mode                | OFF           | ON             |
| server_id                | 1             |                |
+--------------------------+---------------+----------------+

Cluster Creation

Once the restart operation has been performed, it’s time to connect to one of the nodes (I use mysql1) and create the first cluster:

MYSQL>JS> \c clusteradmin@mysql1
Creating a session to 'clusteradmin@mysql1'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 8 (X protocol)
Server version: 8.0.13 MySQL Community Server - GPL
No default schema selected; type \use to set one.
MySQL>JS> cluster=dba.createCluster('clusterDC1')
A new InnoDB cluster will be created on instance 'clusteradmin@mysql1:3306'.
Validating instance at mysql1:3306...
This instance reports its own address as mysql1
Instance configuration is suitable.
Creating InnoDB cluster 'clusterDC1' on 'clusteradmin@mysql1:3306'...
Adding Seed Instance...
Cluster successfully created. Use Cluster.addInstance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.

Let’s verify:

JS> cluster.status()
{
    "clusterName": "clusterDC1",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql1:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures.",
        "topology": {
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusteradmin@mysql1:3306"
}

Adding Instances

We can now add the other members to our cluster:

JS> cluster.addInstance('clusteradmin@mysql2')
JS> cluster.addInstance('clusteradmin@mysql3')

Now we have our first cluster ready (the one in DC1):

JS> cluster.status()
{
    "clusterName": "clusterDC1",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql1:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql2:3306": {
                "address": "mysql2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql3:3306": {
                "address": "mysql3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusteradmin@mysql1:3306"
}

Adding members of DC2

Now comes the little trick: we must also add the future nodes of DC2 to the current cluster in DC1. This is only temporary, but it’s needed to avoid playing with replication filters for the InnoDB Cluster metadata (which wouldn’t work completely anyway).
So let’s add mysql4, mysql5 and mysql6 too:

JS> cluster.addInstance('clusteradmin@mysql4')
JS> cluster.addInstance('clusteradmin@mysql5')
JS> cluster.addInstance('clusteradmin@mysql6')

And we can verify the status:

JS> cluster.status()
{
    "clusterName": "clusterDC1",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql1:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to 2 failures.",
        "topology": {
            "mysql1:3306": {
                "address": "mysql1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql2:3306": {
                "address": "mysql2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql3:3306": {
                "address": "mysql3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql4:3306": {
                "address": "mysql4:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql5:3306": {
                "address": "mysql5:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql6:3306": {
                "address": "mysql6:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusteradmin@mysql1:3306"
}

Prepare the replication user

On the primary master, in our case mysql1, we need to create a user that will be used for asynchronous replication between the clusters:

mysql> create user 'repl'@'%' identified by 'replication' REQUIRE SSL;
mysql> grant replication slave on *.* to 'repl'@'%';

MySQL Router Bootstrap

As MySQL Router doesn’t yet allow the bootstrap method when more than one cluster is present in the InnoDB Cluster metadata, this is the right time to perform this operation (don’t try to perform it later, as it will fail):

# mysqlrouter --bootstrap clusteradmin@mysql1 --user mysqlrouter

Note: I've entered Feature Request #93302.

Second InnoDB Cluster Creation

Before creating the second cluster, we need to remove the 3 DC2 nodes from the current cluster (clusterDC1):

JS> cluster.removeInstance('clusteradmin@mysql4')
JS> cluster.removeInstance('clusteradmin@mysql5')
JS> cluster.removeInstance('clusteradmin@mysql6')

Now clusterDC1 is back to the desired state:

JS> cluster.status() { "clusterName": "clusterDC1", "defaultReplicaSet": { "name": "default", "primary": "mysql1:3306", "ssl": "REQUIRED", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "mysql1:3306": { "address": "mysql1:3306", "mode": "R/W", "readReplicas": {}, "role": "HA", "status": "ONLINE" }, "mysql2:3306": { "address": "mysql2:3306", "mode": "R/O", "readReplicas": {}, "role": "HA", "status": "ONLINE" }, "mysql3:3306": { "address": "mysql3:3306", "mode": "R/O", "readReplicas": {}, "role": "HA", "status": "ONLINE" } } }, "groupInformationSourceMember": "mysql://clusteradmin@mysql1:3306" }

But on the nodes of DC2, we still have the InnoDB Cluster metadata information:

SQL> select * from mysql_innodb_cluster_metadata.clusters\G
*************************** 1. row ***************************
         cluster_id: 1
       cluster_name: clusterDC1
 default_replicaset: 1
        description: Default Cluster
mysql_user_accounts: NULL
            options: null
         attributes: {"default": true}

It’s time to connect via MySQL Shell to mysql4 and create the second cluster (clusterDC2):

JS> \c clusteradmin@mysql4
Creating a session to 'clusteradmin@mysql4'
Fetching schema names for autocompletion... Press ^C to stop.
Closing old connection...
Your MySQL connection id is 34 (X protocol)
Server version: 8.0.13 MySQL Community Server - GPL
No default schema selected; type \use to set one.
MySQL>JS> cluster2=dba.createCluster('clusterDC2')
A new InnoDB cluster will be created on instance 'clusteradmin@mysql4:3306'.

The MySQL instance at 'mysql4:3306' currently has the super_read_only
system variable set to protect it from inadvertent updates from applications.
You must first unset it to be able to perform any changes to this instance.
For more information see:

Note: there are open sessions to 'mysql4:3306'.
You may want to kill these sessions to prevent them from performing unexpected updates:
1 open session(s) of 'root@localhost'.

Do you want to disable super_read_only and continue? [y/N]: y

Validating instance at mysql4:3306...
This instance reports its own address as mysql4
Instance configuration is suitable.
Creating InnoDB cluster 'clusterDC2' on 'clusteradmin@mysql4:3306'...
Adding Seed Instance...
Cluster successfully created. Use Cluster.addInstance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.

As you can see, mysql4 was read-only!

We have our second cluster in the cluster2 object:

JS> cluster2.status() { "clusterName": "clusterDC2", "defaultReplicaSet": { "name": "default", "primary": "mysql4:3306", "ssl": "REQUIRED", "status": "OK_NO_TOLERANCE", "statusText": "Cluster is NOT tolerant to any failures.", "topology": { "mysql4:3306": { "address": "mysql4:3306", "mode": "R/W", "readReplicas": {}, "role": "HA", "status": "ONLINE" } } }, "groupInformationSourceMember": "mysql://clusteradmin@mysql4:3306" }

We can verify in the InnoDB Cluster metadata that we now have 2 clusters:

SQL> select * from mysql_innodb_cluster_metadata.clusters\G
*************************** 1. row ***************************
         cluster_id: 1
       cluster_name: clusterDC1
 default_replicaset: 1
        description: Default Cluster
mysql_user_accounts: NULL
            options: null
         attributes: {"default": true}
*************************** 2. row ***************************
         cluster_id: 2
       cluster_name: clusterDC2
 default_replicaset: 2
        description: Default Cluster
mysql_user_accounts: NULL
            options: null
         attributes: {"default": true}
2 rows in set (0.0026 sec)

All good!

Before being able to add mysql5 and mysql6 to clusterDC2, we first need to set up and start asynchronous replication from clusterDC1 to mysql4. This is required because, for the moment, mysql5 and mysql6 have extra transactions (GTIDs) that mysql4 is not yet aware of. Indeed, when mysql4 was removed from the cluster, the metadata was modified, and this modification was also replicated via the group to mysql5 and mysql6. The same happened when mysql5 was removed.

Setup & start asynchronous replication from DC1 to DC2

Setting up asynchronous replication is very easy when GTIDs are used. We only need to run the following 2 commands:
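The two commands themselves are missing from the text; reconstructed from the slave status output that follows (run on mysql4, the Primary-Master of clusterDC2, using the repl user created earlier), they are presumably:

```sql
CHANGE MASTER TO MASTER_HOST='mysql1', MASTER_PORT=3306,
                 MASTER_USER='repl', MASTER_PASSWORD='replication',
                 MASTER_AUTO_POSITION=1, MASTER_SSL=1;
START SLAVE;
```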


And we can check that everything is working as expected:

mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: mysql1 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: binlog.000006 Read_Master_Log_Pos: 54330 Relay_Log_File: mysql4-relay-bin.000002 Relay_Log_Pos: 2875 Relay_Master_Log_File: binlog.000006 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 54330 Relay_Log_Space: 3076 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: Yes Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1247156063 Master_UUID: 5208b04f-edb2-11e8-b3a1-08002718d305 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 9927c511-edb2-11e8-8e54-08002718d305:51-58 Executed_Gtid_Set: 4ac4ffd9-edb2-11e8-a836-08002718d305:1-4, 5208b04f-edb2-11e8-b3a1-08002718d305:1-12, 9927c511-edb2-11e8-8e54-08002718d305:1-58, aa410ab6-edb1-11e8-9e34-08002718d305:1-30, ba9c231d-edb3-11e8-8d74-08002718d305:1-2 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0

mysql4 retrieved all the changes and now also has the previously missing GTIDs.

Adding the other nodes

We can now add mysql5 and mysql6 to clusterDC2:

JS> cluster2.addInstance('clusteradmin@mysql5')
JS> cluster2.addInstance('clusteradmin@mysql6')

And we can check clusterDC2‘s status:

JS> cluster2.status()
{
    "clusterName": "clusterDC2",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql4:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql4:3306": {
                "address": "mysql4:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql5:3306": {
                "address": "mysql5:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysql6:3306": {
                "address": "mysql6:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusteradmin@mysql4:3306"
}

Asynchronous replication to clusterDC1

We now have our 2 clusters running. It’s time to replicate the metadata of clusterDC2 to clusterDC1.

On mysql1 (the Primary-Master, see status()), we need to run the following statements:
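The statements are elided here; they are presumably the mirror image of the DC1-to-DC2 setup (run on mysql1, pointing at clusterDC2’s Primary-Master, mysql4):

```sql
CHANGE MASTER TO MASTER_HOST='mysql4', MASTER_PORT=3306,
                 MASTER_USER='repl', MASTER_PASSWORD='replication',
                 MASTER_AUTO_POSITION=1, MASTER_SSL=1;
START SLAVE;
```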


That slave thread could be stopped, since you SHOULD NOT WRITE in DC2 anyway. But if you have a lot of metadata changes and you rotate your binary logs, you might end up with errant transactions in case of failover, so I recommend leaving it on.

MySQL Router Extra Configuration

We still have to modify MySQL Router manually. We bootstrapped it earlier; now it’s time to change the configuration.

We will have to create two configurations for the router: one for all the application servers in DC1, and one for those in DC2.

/etc/mysqlrouter/mysqlrouter.conf in DC1

# File automatically generated during MySQL Router bootstrap
[DEFAULT]
name=system
user=mysqlrouter
keyring_path=/var/lib/mysqlrouter/keyring
master_key_path=/etc/mysqlrouter/mysqlrouter.key
connect_timeout=30
read_timeout=30

[logger]
level = INFO

[metadata_cache:clusterDC1]
router_id=1
bootstrap_server_addresses=mysql://mysql1:3306,mysql://mysql2:3306,mysql://mysql3:3306
user=mysql_router1_8qm3jeohgy4e
metadata_cluster=clusterDC1
ttl=0.5

[routing:clusterDC1_default_rw]
bind_address=
bind_port=6446
destinations=metadata-cache://clusterDC1/default?role=PRIMARY
routing_strategy=round-robin
protocol=classic

[routing:clusterDC1_default_ro]
bind_address=
bind_port=6447
destinations=metadata-cache://clusterDC1/default?role=SECONDARY
routing_strategy=round-robin
protocol=classic

[routing:clusterDC1_default_x_rw]
bind_address=
bind_port=64460
destinations=metadata-cache://clusterDC1/default?role=PRIMARY
routing_strategy=round-robin
protocol=x

[routing:clusterDC1_default_x_ro]
bind_address=
bind_port=64470
destinations=metadata-cache://clusterDC1/default?role=SECONDARY
routing_strategy=round-robin
protocol=x

/etc/mysqlrouter/mysqlrouter.conf in DC2

# File automatically generated during MySQL Router bootstrap
[DEFAULT]
name=system
user=mysqlrouter
keyring_path=/var/lib/mysqlrouter/keyring
master_key_path=/etc/mysqlrouter/mysqlrouter.key
connect_timeout=30
read_timeout=30

[logger]
level = INFO

[metadata_cache:clusterDC2]
router_id=1
bootstrap_server_addresses=mysql://mysql4:3306,mysql://mysql5:3306,mysql://mysql6:3306
user=mysql_router1_8qm3jeohgy4e
metadata_cluster=clusterDC2
ttl=0.5

[routing:clusterDC2_default_rw]
bind_address=
bind_port=6446
destinations=metadata-cache://clusterDC2/default?role=PRIMARY
routing_strategy=round-robin
protocol=classic

[routing:clusterDC2_default_ro]
bind_address=
bind_port=6447
destinations=metadata-cache://clusterDC2/default?role=SECONDARY
routing_strategy=round-robin
protocol=classic

[routing:clusterDC2_default_x_rw]
bind_address=
bind_port=64460
destinations=metadata-cache://clusterDC2/default?role=PRIMARY
routing_strategy=round-robin
protocol=x

[routing:clusterDC2_default_x_ro]
bind_address=
bind_port=64470
destinations=metadata-cache://clusterDC2/default?role=SECONDARY
routing_strategy=round-robin
protocol=x

Of course, the configuration file, the key (/etc/mysqlrouter/mysqlrouter.key) and the keyring file (/var/lib/mysqlrouter/keyring) must be copied to all routers, with their ownership changed to the mysqlrouter user.

MySQL Router for Replication

As explained in this post, we can also use the MySQL Router in case the Primary-Master acting as asynchronous master dies. If it’s the asynchronous slave that dies, you need to promote the new Primary-Master as asynchronous slave manually.

On a dedicated machine, or on all nodes, you need to set up MySQL Router to point to the other DC. So in DC1 you need the same router configuration as the application servers in DC2, and the opposite in DC2.

For example, I used mysql1 as router for clusterDC2 and mysql4 as router for clusterDC1.

[root@mysql1 vagrant]# cp mysqlrouter.key /etc/mysqlrouter/
[root@mysql1 vagrant]# cp dc2_mysqlrouter.conf /etc/mysqlrouter/mysqlrouter.conf
cp: overwrite ‘/etc/mysqlrouter/mysqlrouter.conf’? y
[root@mysql1 vagrant]# mkdir /var/lib/mysqlrouter
[root@mysql1 vagrant]# cp keyring /var/lib/mysqlrouter/
[root@mysql1 vagrant]# chown mysqlrouter -R /var/lib/mysqlrouter/
[root@mysql1 vagrant]# chown mysqlrouter /etc/mysqlrouter/mysqlrouter.*
[root@mysql1 vagrant]# systemctl start mysqlrouter
[root@mysql1 vagrant]# systemctl status mysqlrouter
● mysqlrouter.service - MySQL Router
   Loaded: loaded (/usr/lib/systemd/system/mysqlrouter.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-11-22 12:32:50 UTC; 4s ago
 Main PID: 15403 (main)
   CGroup: /system.slice/mysqlrouter.service
           └─15403 /usr/bin/mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf

Nov 22 12:32:50 mysql1 systemd[1]: Started MySQL Router.
Nov 22 12:32:50 mysql1 systemd[1]: Starting MySQL Router...

And the same on mysql4 using the config of DC1.

We can test it; we need to use port 6446 to reach the Primary-Master of each cluster:

[root@mysql1 ~]# mysql -h mysql1 -P 6446 -u clusteradmin -p -e "select @@hostname"
Enter password:
+------------+
| @@hostname |
+------------+
| mysql4     |
+------------+
[root@mysql1 ~]# mysql -h mysql4 -P 6446 -u clusteradmin -p -e "select @@hostname"
Enter password:
+------------+
| @@hostname |
+------------+
| mysql1     |
+------------+

Perfect, now we just need to change the asynchronous replication.

Let’s start on mysql1:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST='mysql1', MASTER_PORT=6446, MASTER_USER='repl', MASTER_PASSWORD='replication', MASTER_AUTO_POSITION=1, MASTER_SSL=1;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql1
                  Master_User: repl
                  Master_Port: 6446
                Connect_Retry: 60
              Master_Log_File: binlog.000006
          Read_Master_Log_Pos: 37341
               Relay_Log_File: mysql1-relay-bin.000002
                Relay_Log_Pos: 391
        Relay_Master_Log_File: binlog.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...

We now need to perform the equivalent change on mysql4, which must use mysql4 (its local router) as master, on port 6446.
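The statements are elided; following the same pattern as on mysql1, they would presumably point mysql4’s replication channel at its own local router, which routes to clusterDC1’s Primary-Master:

```sql
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST='mysql4', MASTER_PORT=6446,
                 MASTER_USER='repl', MASTER_PASSWORD='replication',
                 MASTER_AUTO_POSITION=1, MASTER_SSL=1;
START SLAVE;
```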


As you can see, it’s possible to use MySQL InnoDB Cluster as a High Availability (HA) solution and extend it for Disaster Recovery (DR) using asynchronous replication.

Please keep in mind that it is highly recommended to write in only one DC at a time.

Percona Server for MySQL 5.7.23-25 Is Now Available

Percona announces the release of Percona Server for MySQL 5.7.23-25 on November 21, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.23, including all the bug fixes in it. Percona Server for MySQL 5.7.23-25 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

This release fixes a critical bug in a RocksDB submodule.

Bugs Fixed
  • #5049: Severe memory leak regression in the RocksDB Block Cache

Find the release notes for Percona Server for MySQL 5.7.23-25 in our online documentation. Report bugs in the Jira bug tracker.

Percona Server for MySQL 5.5.62-38.14 Is Now Available

Percona announces the release of Percona Server for MySQL 5.5.62-38.14 on November 21, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.5.62, including all the bug fixes in it. Percona Server for MySQL 5.5.62-38.14 is now the current GA release in the 5.5 series. All of Percona’s software is open-source and free.

Note that Percona Server for MySQL 5.5.62-38.14 is the last release of the 5.5 series. This series goes EOL on December 1st, 2018.

Improvements
  • #4790: The accuracy of user statistics has been improved

Bugs Fixed
  • The binary log could be corrupted when the disk partition used for temporary files (tmpdir system variable) had little free space. Bug fixed #1107
  • PURGE CHANGED_PAGE_BITMAPS did not work when the innodb_data_home_dir system variable was used. Bug fixed #4723
Other Bugs Fixed
  • #4773: Percona Server sources can’t be compiled without server.
  • #4781: sql_yacc.yy uses SQLCOM_SELECT instead of SQLCOM_SHOW_XXXX_STATS

Find the release notes for Percona Server for MySQL 5.5.62-38.14 in our online documentation. Report bugs in the Jira bug tracker.

How to Encrypt Your MySQL & MariaDB Backups

We usually take care of things we value, whether it is an expensive smartphone or the company’s servers. Data is one of the most important assets of the organisation, and although we do not see it, it has to be carefully protected. We implement data at rest encryption to encrypt database files or whole volumes which contain our data. We implement data in transit encryption using SSL to make sure no one can sniff and collect data sent across networks. Backups are no different. No matter if this is a full backup or incremental, it will store at least a part of your data. As such, backups have to be encrypted too. In this blog post, we will look into some options you may have when it comes to encrypting backups. First though, let’s look at how you can encrypt your backups and what could be use cases for those methods.

How to encrypt your backup?

Encrypt local file

First of all, you can easily encrypt existing files. Let’s imagine that you have a backup process storing all your backups on a backup server. At some point you decide it’s high time to implement offsite backup storage for disaster recovery. You can use S3 or similar infrastructure from other cloud providers for that. Of course, you don’t want to upload unencrypted backups anywhere outside of your trusted network, so it is critical that you implement backup encryption one way or the other. A very simple method, not requiring any changes in your existing backup scripts, would be to create a script which takes a backup file, encrypts it, and uploads it to S3. One of the methods you can use to encrypt the data is openssl:

openssl enc -aes-256-cbc -salt -in backup_file.tar.gz -out backup_file.tar.gz.enc -k yoursecretpassword

This will create a new, encrypted file, ‘backup_file.tar.gz.enc’ in the current directory. You can always decrypt it later by running:

openssl aes-256-cbc -d -in backup_file.tar.gz.enc -out backup_file.tar.gz -k yoursecretpassword

This method is very simple, but it has some drawbacks. The biggest one is the disk space requirement: when encrypting as described above, you have to keep both the unencrypted and the encrypted file, so you temporarily need twice the disk space of the backup. Of course, depending on your requirements, this might be something you want anyway: keeping non-encrypted files locally will improve recovery speed, since decrypting the data also takes time.

Encrypt backup on the fly

To avoid storing both encrypted and unencrypted data, you may want to implement encryption at an earlier stage of the backup process. We can do that through pipes. Pipes are, in short, a way of sending the data from one command to another, which makes it possible to create a chain of commands that processes data: generate the data, compress it, then encrypt it. An example of such a chain might be:

mysqldump --all-databases --single-transaction --triggers --routines | gzip | openssl enc -aes-256-cbc -k mypass > backup.sql.gz.enc

You can also use this method with xtrabackup or mariabackup. In fact, this is the example from the MariaDB documentation:

mariabackup --user=root --backup --stream=xbstream | openssl enc -aes-256-cbc -k mypass > backup.xb.enc

If you want, you can even upload the data as part of the process:

mysqldump --all-databases --single-transaction --triggers --routines | gzip | openssl enc -aes-256-cbc -k mysecretpassword | tee -a mysqldump.gz.enc | nc backup_host 9991

As a result, you will have a local file 'mysqldump.gz.enc', and a copy of the data will be piped to a program which will do something with it. We used 'nc' (with 'backup_host' standing in for the destination address), which streamed the data to another host on which the following was executed:

nc -l 9991 > backup.gz.enc

In this example we used mysqldump and nc, but you can use xtrabackup or mariabackup together with a tool which uploads the stream to Amazon S3, Google Cloud Storage or some other cloud provider. Any program or script which accepts data on stdin can be used instead of 'nc'.
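If the destination is object storage rather than another host, the 'nc' stage can be swapped for a client that uploads stdin; for example, the AWS CLI accepts '-' as the source to stream stdin straight to S3. A sketch, where the bucket name and password are placeholders:

```shell
# Sketch: same pipeline as above, with the final stage replaced by an
# uploader reading stdin; "aws s3 cp - s3://..." streams straight to S3.
# The bucket name and password are placeholders.
mysqldump --all-databases --single-transaction --triggers --routines \
  | gzip \
  | openssl enc -aes-256-cbc -k mysecretpassword \
  | aws s3 cp - s3://my-backup-bucket/mysqldump.sql.gz.enc
```

With this variant no unencrypted data ever touches the disk, and no intermediate file is needed at all.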

Use embedded encryption

In some cases, the backup tool has embedded support for encryption. An example here is xtrabackup, which can encrypt files internally. Unfortunately, mariabackup, even though it is a fork of xtrabackup, does not support this feature.

Before we can use it, we have to create a key which will be used to encrypt the data. It can be done by running the following command:

root@vagrant:~# openssl rand -base64 24
HnliYiaRo7NUvc1dbtBMvt4rt1Fhunjb

Xtrabackup can accept the key in plain text (using the --encrypt-key option) or read it from a file (using the --encrypt-key-file option). The latter is safer, as passing passwords and keys as plain text on the command line results in them being stored in the bash history. They are also clearly visible in the list of running processes, which is quite bad:

root      1130  0.0  0.6  65508  4988 ?      Ss   00:46  0:00 /usr/sbin/sshd -D
root     13826  0.0  0.8  93100  6648 ?      Ss   01:26  0:00  \_ sshd: root@notty
root     25363  0.0  0.8  92796  6700 ?      Ss   08:54  0:00  \_ sshd: vagrant [priv]
vagrant  25393  0.0  0.6  93072  4936 ?      S    08:54  0:01  |   \_ sshd: vagrant@pts/1
vagrant  25394  0.0  0.4  21196  3488 pts/1  Ss   08:54  0:00  |       \_ -bash
root     25402  0.0  0.4  52700  3568 pts/1  S    08:54  0:00  |           \_ sudo su -
root     25403  0.0  0.4  52284  3264 pts/1  S    08:54  0:00  |               \_ su -
root     25404  0.0  0.4  21196  3536 pts/1  S    08:54  0:00  |                   \_ -su
root     26686  6.0  4.0 570008 30980 pts/1  Sl+  09:48  0:00  |                       \_ innobackupex --encrypt=AES256 --encrypt-key=TzIZ7g+WzLt0PXWf8WDPf/sjIt7UzCKw /backup/

Ideally, you would use the key stored in a file, but then there's a small gotcha you have to be aware of.

root@vagrant:~# openssl rand -base64 24 > encrypt.key
root@vagrant:~# innobackupex --encrypt=AES256 --encrypt-key-file=/root/encrypt.key /backup/
.
.
.
xtrabackup: using O_DIRECT
InnoDB: Number of pools: 1
encryption: unable to set libgcrypt cipher key - User defined source 1 : Invalid key length
encrypt: failed to create worker threads.
Error: failed to initialize datasink.

You may wonder what the problem is. It becomes clear when we open the encrypt.key file in a hexadecimal editor like hexedit:

00000000   6D 6B 2B 4B 66 69 55 4E 5A 49 48 77 39 42 36 72   mk+KfiUNZIHw9B6r
00000010   68 70 39 79 6A 56 44 72 47 61 79 45 6F 75 6D 70   hp9yjVDrGayEoump
00000020   0A                                                .

Notice the last byte, '0A'. This is the line feed character, and it is taken into account when the encryption key is evaluated. Once we remove it, we can finally run the backup.
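A simple way to avoid this gotcha in the first place is to strip the line feed at key generation time; 'tr' removes it before it ever reaches the file:

```shell
# Generate the key and strip the trailing line feed in one go, so the
# file contains exactly the 32 base64 characters and nothing else.
openssl rand -base64 24 | tr -d '\n' > encrypt.key

# Sanity check: 24 random bytes always encode to 32 base64 characters.
[ "$(wc -c < encrypt.key)" -eq 32 ] && echo "key length OK"
# prints: key length OK
```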

root@vagrant:~# innobackupex --encrypt=AES256 --encrypt-key-file=/root/encrypt.key /backup/
xtrabackup: recognized server arguments: --datadir=/var/lib/mysql --innodb_buffer_pool_size=185M --innodb_flush_log_at_trx_commit=2 --innodb_file_per_table=1 --innodb_data_file_path=ibdata1:100M:autoextend --innodb_read_io_threads=4 --innodb_write_io_threads=4 --innodb_doublewrite=1 --innodb_log_file_size=64M --innodb_log_buffer_size=16M --innodb_log_files_in_group=2 --innodb_flush_method=O_DIRECT --server-id=1
xtrabackup: recognized client arguments: --datadir=/var/lib/mysql --innodb_buffer_pool_size=185M --innodb_flush_log_at_trx_commit=2 --innodb_file_per_table=1 --innodb_data_file_path=ibdata1:100M:autoextend --innodb_read_io_threads=4 --innodb_write_io_threads=4 --innodb_doublewrite=1 --innodb_log_file_size=64M --innodb_log_buffer_size=16M --innodb_log_files_in_group=2 --innodb_flush_method=O_DIRECT --server-id=1 --databases-exclude=lost+found --ssl-mode=DISABLED
encryption: using gcrypt 1.6.5
181116 10:11:25 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

181116 10:11:25 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'backupuser' (using password: YES).
181116 10:11:25 version_check Connected to MySQL server
181116 10:11:25 version_check Executing a version check against the server...
181116 10:11:25 version_check Done.
181116 10:11:25 Connecting to MySQL server host: localhost, user: backupuser, password: set, port: not set, socket: /var/lib/mysql/mysql.sock
Using server version 5.7.23-23-57
innobackupex version 2.4.12 based on MySQL server 5.7.19 Linux (x86_64) (revision id: 170eb8c)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 0, set to 1024
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = .
xtrabackup:   innodb_data_file_path = ibdata1:100M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 67108864
xtrabackup: using O_DIRECT
InnoDB: Number of pools: 1
181116 10:11:25 >> log scanned up to (2597648)
xtrabackup: Generating a list of tablespaces
InnoDB: Allocated tablespace ID 19 for mysql/server_cost, old maximum was 0
181116 10:11:25 [01] Encrypting ./ibdata1 to /backup/2018-11-16_10-11-25/ibdata1.xbcrypt
181116 10:11:26 >> log scanned up to (2597648)
181116 10:11:27 >> log scanned up to (2597648)
181116 10:11:28 [01]        ...done
181116 10:11:28 [01] Encrypting ./mysql/server_cost.ibd to /backup/2018-11-16_10-11-25/mysql/server_cost.ibd.xbcrypt
181116 10:11:28 [01]        ...done
181116 10:11:28 [01] Encrypting ./mysql/help_category.ibd to /backup/2018-11-16_10-11-25/mysql/help_category.ibd.xbcrypt
181116 10:11:28 [01]        ...done
181116 10:11:28 [01] Encrypting ./mysql/slave_master_info.ibd to /backup/2018-11-16_10-11-25/mysql/slave_master_info.ibd.xbcrypt
181116 10:11:28 [01]        ...done

As a result, we end up with a backup directory full of encrypted files:

root@vagrant:~# ls -alh /backup/2018-11-16_10-11-25/
total 101M
drwxr-x---  5 root root 4.0K Nov 16 10:11 .
drwxr-xr-x 16 root root 4.0K Nov 16 10:11 ..
-rw-r-----  1 root root  580 Nov 16 10:11 backup-my.cnf.xbcrypt
-rw-r-----  1 root root  515 Nov 16 10:11 ib_buffer_pool.xbcrypt
-rw-r-----  1 root root 101M Nov 16 10:11 ibdata1.xbcrypt
drwxr-x---  2 root root 4.0K Nov 16 10:11 mysql
drwxr-x---  2 root root  12K Nov 16 10:11 performance_schema
drwxr-x---  2 root root  12K Nov 16 10:11 sys
-rw-r-----  1 root root  113 Nov 16 10:11 xtrabackup_checkpoints
-rw-r-----  1 root root  525 Nov 16 10:11 xtrabackup_info.xbcrypt
-rw-r-----  1 root root 2.7K Nov 16 10:11 xtrabackup_logfile.xbcrypt

Xtrabackup has some other options which can be used to tune encryption performance:

  • --encrypt-threads allows for parallelization of the encryption process.
  • --encrypt-chunk-size defines the size of the working buffer for the encryption process.

Should you need to decrypt the files, you can use innobackupex with the --decrypt option:

root@vagrant:~# innobackupex --decrypt=AES256 --encrypt-key-file=/root/encrypt.key /backup/2018-11-16_10-11-25/

As xtrabackup does not clean up the encrypted files, you may want to remove them using the following one-liner:

for i in `find /backup/2018-11-16_10-11-25/ -iname "*\.xbcrypt"`; do rm $i ; done

Backup encryption in ClusterControl

With ClusterControl, encrypted backups are just one click away. All backup methods (mysqldump, xtrabackup, or mariabackup) support encryption. You can either create a backup ad hoc or prepare a schedule for your backups.

In our example we picked a full xtrabackup backup, which we will store on the controller instance.

On the next page we enabled encryption. As stated, ClusterControl will automatically create an encryption key for us. That's it: when you click the "Create Backup" button, the process starts.

The new backup is visible in the backup list, marked as encrypted (the lock icon).

We hope that this blog post gives you some insights into how to make sure your backups are properly encrypted.

Tags:  MySQL MariaDB backup encryption

MySQL Multi Source Replication With GTID and Non-GTID Mode

During a recent migration we moved part of an app to a different region with a new database. Our previous setup was MySQL 5.6.39 with a single write master and multiple slaves to read data.
Current Setup:

But now there will be two masters: one for the old service and a second for the newly migrated one. We moved the new master to MySQL 5.7 with GTID, MTS and logical-clock-based replication. Our use case was to read the whole data set (written on master1 + master2) from the slaves.

So the question is: what did we do to achieve this?

Below is New Setup:

Now the question is: what is slave 5 doing, and how can we generate GTIDs on an intermediate slave?
Ans: With the help of our community expert Jean-François Gagné, we got to know that we can generate GTIDs on an intermediate slave using the process described in his blog post.
So we compiled MySQL from the patched source code and rolled out GTID on it. It worked and started generating GTIDs. We then upgraded our existing slaves from MySQL 5.6 to 5.7 and configured a new slave channel for Master 2 on them.
Conclusion: In this way we have configured multi-source replication, where we get data from both a non-GTID master and a GTID master.
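For reference, once both upstream sources expose GTIDs (master 2 natively, and master 1 through the GTID-generating intermediate slave), the two channels on a MySQL 5.7 slave might be set up roughly as follows. This is a hypothetical sketch: the host names and credentials are placeholders, not the actual setup.

```sql
-- Hypothetical sketch: one replication channel per upstream source.
CHANGE MASTER TO
    MASTER_HOST = 'master1-relay.example.com',  -- intermediate slave generating GTIDs
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'repl_password',
    MASTER_AUTO_POSITION = 1
FOR CHANNEL 'master1';

CHANGE MASTER TO
    MASTER_HOST = 'master2.example.com',        -- new GTID-enabled master
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'repl_password',
    MASTER_AUTO_POSITION = 1
FOR CHANNEL 'master2';

START SLAVE FOR CHANNEL 'master1';
START SLAVE FOR CHANNEL 'master2';
```

Multi-source replication in 5.7 requires master_info_repository and relay_log_info_repository to be set to TABLE on the slave.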

How To Install and Secure phpMyAdmin on Ubuntu 18.04 LTS

phpMyAdmin is a free and open source administration tool for MySQL and MariaDB. It is a web-based tool that allows you to easily manage MySQL or MariaDB databases. In this tutorial, we will explain how to install and secure phpMyAdmin on an Ubuntu 18.04 server.

MariaDB 10.3.11, and MariaDB Connector/C 3.0.7, Connector/ODBC 3.0.7 and Connector/Node.js 2.0.1 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.3.11, the latest stable release in the MariaDB 10.3 series, as well as MariaDB Connector/C 3.0.7 and MariaDB Connector/ODBC 3.0.7, both stable releases, and MariaDB Connector/Node.js 2.0.1, the first beta release of the new 100% JavaScript non-blocking MariaDB client for Node.js, compatible with Node.js […]

The post MariaDB 10.3.11, and MariaDB Connector/C 3.0.7, Connector/ODBC 3.0.7 and Connector/Node.js 2.0.1 now available appeared first on

Where you can find MySQL this week

Just a friendly reminder of another two shows where you can find the MySQL team this week.

  • Feira do Conhecimento Brazil is a unique event with more than 10,000 attendees. There will be about 200 exhibitors on 9,000 m² of expo space presenting technological solutions to improve key areas for society such as health, education, cities, agriculture, water resources, renewable energies, inclusion and diversity. The show will bring together academia, business, government and the community to show how science and technology can help reduce inequalities. We are happy to be part of this show with our talk and a MySQL representative!
    • Place: Fortaleza, Brazil
    • Date: November 21-24, 2018
    • Approved Talk: "Innovation, Business & Technology" 
  • PyCon HK where MySQL Community is a Bronze sponsor. See details below:
    • Place: Hong Kong
    • Date: November 23-24, 2018
    • MySQL talk: "NoSQL Development for MySQL Document Store using Python" by Ivan Ma, MySQL Principal Sales Consultant