Planet MySQL

How to Install BookStack Documentation Wiki on CentOS 7

BookStack is an open source platform to create documentation/wiki content for your project. In this tutorial, I will show you step-by-step how to install and configure BookStack on CentOS 7 under the LEMP (Linux, Nginx, PHP-FPM, MySQL/MariaDB) stack.

MySQL Connector/NET 6.9.12 GA has been released

Dear MySQL users,

MySQL Connector/Net 6.9.12 is a maintenance release for the 6.9.x series
of the .NET driver for MySQL. It can be used for production

It is appropriate for use with MySQL server versions 5.5-5.7.

It is now available in source and binary form from
(note that not all mirror sites may be up to date at this point-if you
can’t find this version on some mirror, please try again later or choose
another download site.)

Note: C/NET 6.9 will not be supported any more. If users are looking to get latest
updates on C/NET, they should upgrade to using C/NET 6.10 and 8.0 series instead.

Changes in MySQL Connector/Net 6.9.12 (2018-05-04, General

Functionality Added or Changed

* Connections made to MySQL 8.0 (up to and including
version 8.0.3) and compatibility with the new data
dictionary are now supported. For information about the
data dictionary, see MySQL Data Dictionary

* Support for the caching_sha2_password authentication
plugin through the classic MySQL protocol was added. In
addition, the sha256_password plugin was extended to
support authentication when RSA keys are available
through non-secure connections. Caching SHA-2 pluggable
authentication offers faster authentication than basic
SHA-256 authentication.

* Support was added for the new caching_sha2_password
padding mechanism introduced in the MySQL 8.0 release
series. The new padding mechanism is enabled when all of
the following conditions apply:

+ The user account is set with the
caching_sha2_password authentication plugin.

+ SSL is disabled explicitly (SslMode=none).

+ The AllowPublicKeyRetrieval connection option is
enabled (AllowPublicKeyRetrieval=true).
When enabled, the new padding mechanism is used to encode
the password during RSA key encryption, which applies the
correct padding to match the server.

Bugs Fixed

* The MySqlConnection.GetSchema(“PROCEDURES”, restrictions)
method call generated an error message, instead of
returning stored procedures, when the server connection
was to the MySQL 8.0 release series. (Bug #25961782)

* Attempting to generate an Entity Framework model from a
MySQL 5.7 database using either EF5 or EF6 produced an
exception that prevented the operation from generating
the expected model. (Bug #22173048, Bug #79163)

The documentation is available at:

Nuget packages are available at:

Enjoy and thanks for the support!

On behalf of the MySQL Release Team,
Sreedhar S

Q&A: “Percona XtraDB Cluster 5.7 and ProxySQL for Your High Availability Needs” Webinar

On March 22, 2018, we held a webinar on how Percona XtraDB cluster 5.7 (PXC) and ProxySQL can help achieve your database clustering high availability needs. Firstly, thanks to all the attendees for taking time to attend the webinar and we are sure you had a webinar experience. We tried answering some of your high availability questions during the call but due to time restrictions if we missed some of the questions then this blog will help clarify them.

Q. You say the replication to servers is virtually synchronous, if there is any latency, does ProxySQL detect this and select a node accordingly?

A. PXC nodes are virtually synchronous, which effectively means while the apply/commit of a transaction may be in progress on one node, other nodes may have completed applying it. There is no direct way for ProxySQL to know about this, but it could be traced by looking at wsrep_last_applied and wsrep_last_committed. Also, if a user expects to always fetch updated data, then a wsrep_sync_wait configuration can be used.

Q. Hello, do you suggest geoReplication / wan clustering for an e-commerce website? Let ‘s say served by an Italian pxc cluster and served by a US PXC cluster?

A. Geo-distributed PXC is already in use by a lot of customers, and is meant to exactly serve the use-case you have pointed out. An important aspect of geo-distributed clustering (that often gets missed) is to configure timeout and window setting to accommodate network latency and segment settings. There is also a separate webinar on this topic and you can surely get in touch with us to find out more details on how to configure it correctly.

Q. Can we add a read-only node with PXC?

A. You can simply mark the selected nodes as super_read_only (or read_only). Replication continues as normal but direct traffic is blocked.

Q. Does the ProxySQL impact performance?

A. Using all of ProxySQL’s features gives you a huge performance improvement. Here is the sample use case.

Q. I have not had “excellent” results with Drupal. (e.g., clearing cache sometimes causes corruption, although i ensure all tables do have a primary key). Any advice on its suitability? I am currently using proxySQL with a single percona (non-cluster) 5.7 but would like to try again with PXC if advisable.

A. Not sure what exact problem you faced, but you may want to check this variable and articles around it wsrep_drupal_282555_workaround.

Q. Another question (to queue up as you are able to answer if possible): do you recommend SSL between ProxySQL — and specifically, what are the performance impacts, especially if there’s some latency between proxySQL and master percona db for writes?

A. We recommend SSL for security reasons, but it depends on the individual setup. Currently, ProxySQL does not support SSL from frontends. This feature is only available since 2.0.

Q1. Can i put two PXC clusters in master-slave replication mode with automatic failover?

Q2. How can i setup two PXC clusters in master-slave replication model with automatic failover?

A. You can have async master-slave replication link among two PXC clusters, but automatic failover of the node (if the acting master from cluster-1 fails then another active nodes of the cluster takes over as master) is currently not supported.

Q. Can the replication be done from ProxySQL level, so that if one node goes down in slave PXC, another node in PXC will take over the slave role?

A. This feature is not supported through ProxySQL. You can monitor replication lag through ProxySQL.

Q. Suppose if i have 5 node cluster in DC1 & DC2, how can we make transaction successful as soon as nodes in DC1 are committed rather than waiting for certification from nodes in DC2?

A. Given the transaction is executed on the local node and during commit (as a pre-commit stage) it is replicated (replication action doesn’t include certification and commit) to the other nodes of the cluster. Once replicated each node can parallelly certify, apply and commit the transaction. So this effectively means a transaction doesn’t need to be certified on all the nodes of the cluster before communicating commit success to end-user. Once the transaction is replicated, originating node can complete the local commit and communicate success to the application.

Q. Hi, thank you for the webinar is ProxySQL support HA, is it a single point of failure?

A. ProxySQL supports Native Clustering, thereby forming a ProxySQL cluster (vs. a single ProxySQL node) and in turn helps avoid a single point failure.

Q.  What is a good setup you will recommend, make proxySQL on some other server/vm or on the same as one of PXC nodes?

A. We would recommend installing ProxySQL on an independent node (or share with other applications). We don’t recommend installing ProxySQL on a PXC node. If the node hosting PXC and ProxySQL goes down (network or power failure), even though the cluster is working, the application will still lose connectivity as the ProxySQL gateway goes down as well.

Q. Let’s say we have three nodes, good quorum, what happened when one node goes down for maintenance what happens to the quorum since only two nodes now?

A. Two nodes can still form the quorum and continue servicing the workload.

Q If the transaction is not committed to all the nodes then will the cluster remains locked for read too?

A. No. The transaction commit is independent of a read action. “transaction commit” can continue in the background and the user can continue to read from the cluster node. If a user has configured wsrep_sync_wait, which effectively means wait for a transaction to get committed to fetch updated data only, then the read may wait for transaction commit to complete.

Q. Is there a way to do partitioning over data? To not have 100% replicate in each master?

A. PXC/Galera, being a multi-master solution, doesn’t recommend unsync data nodes. As an end-user you can still achieve it by setting wsrep_on=off -> execute a workload (this will not be replicated on the cluster) -> wsrep_on=on (all action post this point will again follow replication). This can lead to data inconsistency, though,  and shutdown of the cluster if the workload or action are not properly segregated – so not-recommended.

Q. Are changes done by triggers rollbackable?

A. Yes, they are.

Q. Does ProxySQL prevent “mysql server gone away” in mostly idle daemons?

A. ProxySQL Monitor Module regularly probes the backend nodes and marks the node as OFFLINE in the ProxySQL database if MySQL server is down.

Q. Can proxysql cache rules use regexes?

A. We can use regex with ProxySQL query rules. Go here for more info.

Q. Can PMM be used in Digitalocean droplets?

A. Yes.

Q. In regards to PXC, how much delay is introduced when data is written since it has to appear on all nodes?

A. When a user initiates a transaction on given node (let’s call it an originating node), then it is first applied (not committed) on the said node and a binary write-set is created. This write-set is then replicated on other nodes of the cluster. Once the replication is successful, each node can independently certify, apply and commit the transaction. Since originating node has already applied the transaction, it just needs to certify and commit the transaction. But it is interesting to note that the apply stage on the other replicated node is fast too, given that the transaction is now packed in a database optimized apply format. In short, there would be no delay (or marginal delay). Delay could be higher if the transaction is a huge transaction, as the apply stage could take time. That is one of the reasons Galera doesn’t recommend huge transactions.

Q. How does PXC (Percona XtraDB Cluster) allow DDL (schema changes) on one server with DML on the same table on another server? (This can break MySQL Master-Master replication)?

A. PXC executes DDL using the TOI (total order isolated) protocol. In short, while DDL is executing it takes complete control of the node (no other parallel DML or DDL is allowed). DDL executes at the same position on all then does.

Q. Can ProxySQL split read-write queries based on stored procedure names (patterns)? e.g. sp_write vs sp_read?

A. ProxySQL read/write split is based on mysql_query_rules and hostgroups. For more info.

Q. Can we use ProxySQL with a single node for the query caching feature? Especially since query cache will be discontinued in MySQL 8?

A. If you configure Query Cache properly, you can cache queries for a single node. 

Q. Must binary logging be enabled for ProxySQL / PXC to work?

A. PXC replicates write-sets. While binary logging is not needed, PXC still needs these write-sets that are generated using binary logging module so PXC can then enable emulation based bin logs for a generation of these write-sets (persistence to disk is not needed). If disk space is not a constraint, we recommend you enable binary logging.

Q. Please define Galera and Percona, as well as the relationship between the two?

A. Galera is replication technology owned and developed by Codership and distributed under GPL license. Percona has adopted the said technology along with its Percona Server for MySQL and build PXC. Percona continues to refresh updates made to Galera and related wsrep-plugin on a regular basis. At the same time, Percona also continues to refresh from Percona-Server for MySQL for related enhancement and bug fixes.

Q. Is the ProxySql Admin tool the script/tool that you mentioned would autodetect your existing PXC or there’s a different script? Trying to know if you need to have the PXC and ProxySQL installed at the same time?

With ProxySQL, do we need to wait for active threads on the PXC to drain before shutting down the PXC?

A. ProxySQL Admin (proxysql-admin) script helps you configure your PXC nodes to ProxySQL database. PXC and ProxySQL should be up and running to initiate proxysql-admin script. For more info.

If you trigger PXC node shutdown proxysql_galera_checker script marks the node as offline in the ProxySQL DB, and new connections aren’t redirected to the offline node.

Q. 1) HAProxy and ProxySQL: which one has the better performance when the number of Clusters is large? Up to 30 Clusters?
2) What´s the better tool to monitoring a large number of clusters and nodes?

A. For PXC, we strongly recommend ProxySQL as it is closely integrated with PXC. HAProxy works with PXC as well, and before ProxySQL we had customers using it. For a quick comparison, you can take a look at following article.

Q. When the PXC settings have a maximum connection how does ProxySQL allow for much more than the standard connections?

A. ProxySQL terminates the connection with a connection timeout error.

FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_insert.lua:61: SQL error, errno = 9001, state = 'HY000': Max connect timeout reached while reaching hostgroup 10 after 10012ms


Once again, thanks for your questions and queries. If you still have more questions or need clarification, you can log them at the percona-xtradb-cluster forum. We would also like to know what else you expect from Percona XtraDB Cluster in upcoming releases.

The post Q&A: “Percona XtraDB Cluster 5.7 and ProxySQL for Your High Availability Needs” Webinar appeared first on Percona Database Performance Blog.

Causes and Workarounds for Slave Performance Too Slow with Row-Based Events

Recently I worked on one customer issue that I would describe as “slave performance too slow”. During a quick analysis, I’ve found that the replication slave SQL thread cannot keep up while processing row-based events from the master’s binary log.

For example:

mysql> SHOW SLAVE STATUSG *************************** 1. row *************************** ... Master_Log_File: binlog.0000185 Read_Master_Log_Pos: 86698585 ... Relay_Master_Log_File: binlog.0000185 Slave_IO_Running: Yes Slave_SQL_Running: Yes ... Exec_Master_Log_Pos: 380 Relay_Log_Space: 85699128 ... Master_UUID: 98974e7f-2fbc-18e9-72cd-07003817585c ... Retrieved_Gtid_Set: 98974e7f-2fbc-18e9-72cd-07003817585c:1055-1057 Executed_Gtid_Set: 7f42e2c5-3fbc-16e7-7fb8-05003715789a:1-2, 98974e7f-2fbc-18e9-72cd-07003817585c:1-1056 ...

The processlist state for the SQL thread can be one of the following: Reading event from the relay log, or System lock, or potentially some other state. In my case:

mysql> SHOW PROCESSLIST; +----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+ ... | 4 | system user | | NULL | Connect | 268 | Reading event from the relay log | NULL | ... +----+-----------------+-----------------+------+---------+------+----------------------------------+------------------+

What causes that?

Let’s take a look what could potentially cause such behavior and what we need to pay attention to. When the SQL thread applies the change from a row-based event, it has to locate the exact row that was updated. With a primary key, this is trivial as only one row can possibly have the same value for the primary key.

However, if there is no primary key on the table on the replication slave side, the SQL thread must search the entire table to locate the row to update or delete. It repeats the search for each updated row. This search is both very resource usage intensive (CPU usage can be up to 100%) and slow causing the slave to fall behind.

For InnoDB tables, the “hidden” key used for the clustered index for tables without a primary key cannot be used to avoid searching the entire table for the rows to update or delete. We need to keep in mind that the “hidden” key is unique only to each MySQL instance, so the replication master and replication slave generally don’t have the same values for the “hidden” key for the same row. What can we do to solve that?

The best solution is to ensure that all tables have a primary key. This not only ensures the SQL thread can easily locate rows to update or delete, but it is also considered as a best practice since it ensures all rows are unique.

If there is no way to logically add a natural primary key for the table, a potential solution is to add an auto-increment unsigned integer column as the primary key.

The query below helps you to locate tables without a primary key:

SELECT tables.table_schema, tables.table_name, tables.table_rows FROM information_schema.tables LEFT JOIN ( SELECT table_schema, table_name FROM information_schema.statistics GROUP BY table_schema, table_name, index_name HAVING SUM( CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END ) = COUNT(*) ) puks ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name WHERE puks.table_name IS NULL AND tables.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') AND tables.table_type = 'BASE TABLE' AND engine='InnoDB';

Please note that for InnoDB, there must always be a unique NOT NULL key for all tables. It is required for the clustered index. So adding an explicit “dummy” column as suggested above will not add to the overall storage requirements as it will merely replace the hidden key.

It’s not always possible to add a primary key to the table immediately if, for example, there are many relations on the application side/legacy system, lack of resources, unknown application behavior after the change which required testing, etc.

In this case, a short-term solution is to change the search algorithm used by the replication slave to locate the rows changed by row-based events.

The search algorithm is set using the slave_rows_search_algorithms option which is available in MySQL 5.6 and later. The default value is to use an index scan if possible, otherwise a table scan.

However, for tables without a primary key using a hash scan, which causes the SQL thread to temporarily cache hashes to reduce the overhead of searching the whole table. The value of slave_rows_search_algorithms can be changed dynamically using:

mysql> SET GLOBAL slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';
Just to note INDEX_SCAN,HASH_SCAN is the default value in MySQL 8.0.

One thing to be aware of when using hash scans that the hashes are only reused within one row-based event. (Each row-based event may have changes to several rows in the same table originating from the same SQL statement).

The binlog_row_event_max_size option on the replication master controls the maximum size of a row-based event. The default max event size is 8kB. This means that switching to hash scans only improves the performance of the SQL thread when:

  1. Several rows fit into one row based event. It may help to increase the value of binlog_row_event_max_size on the replication master, if you perform updates or deletes on large rows (e.g., with blob or text data). You can only set the binlog_row_event_max_size in the MySQL configuration file, and resetting this value requires a restart.
  2. One statement changes several rows.


Even if enabling hash scans improves the performance enough for the replication slave to keep up, the permanent solution is to add an explicit primary key to each table. This should be the general rule of thumb in the schema design in order avoid and/or minimize many issues like slave performance too slow (as described in this post).

Next, I am going to investigate how we can find out the exact thread state using Performance Schema in order to make issue identification less of a guessing game.

The post Causes and Workarounds for Slave Performance Too Slow with Row-Based Events appeared first on Percona Database Performance Blog.

MySQL Shell 8.0 – What’s New?

MySQL Document Store

In MySQL 8.0 a new stack of components is introduced, turning MySQL to a competitive Document Store solution.

The MySQL Document Store takes advantage of existing capabilities including:

  • Efficient and safe data management from the InnoDB storage engine
  • ACID compliance and Transactional Operations
  • High Availability with Group Replication and InnoDB cluster

One of the most attractive components introduced in MySQL 8.0 is a development API (DevAPI) that enables you to create JSON document based applications through a fluent API.…

MariaDB 10.0.35, MariaDB Galera Cluster 5.5.60 and MariaDB Connector C 3.0.4 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.0.35, MariaDB Galera Cluster 5.5.60 as well as MariaDB Connector/C 3.0.4, all stable releases. See the release notes and changelogs for details. Download MariaDB 10.0.35 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera Cluster 5.5.60 […]

The post MariaDB 10.0.35, MariaDB Galera Cluster 5.5.60 and MariaDB Connector C 3.0.4 now available appeared first on

Manage Your MySQL Database Credentials with Vault

Any software system requires secrets to run, be it credentials to communicate with databases, tokens to access resources, or encryption keys that need to be distributed. Secrets management means to deal with all kinds of secrets in a structured and secure way. In this blog entry we will show how to use Hashicorp Vault to […]

MySQL master discovery methods, part 1: DNS

This is the first in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via DNS

In DNS master discovery applications connect to the master via a name that gets resolved to the master's box. By way of example, apps would target the masters of different clusters by connecting to,, etc. It is up for the DNS to resolve those names to IPs.

Issues for concern are:

  • You will likely have multiple DNS servers. How many? In which data centers / availability zones?
  • What is your method for distributing/deploying a name change to all your DNS servers?
  • DNS will indicate a TTL (Time To Live) such that clients can cache the IP associated with a name for a given number of seconds. What is that TTL?

As long as things are stable and going well, discovery via DNS makes sense. Trouble begins when the master fails over. Assume M used to be the master, but got demoted. Assume R used to be a replica, that got promoted and is now effectively the master of the topology.

Our failover solution has promoted R, and now needs to somehow apply the change, such that the apps connect to R instead of M. Some notes:

  • The apps need not change configuration. They should still connect to,, etc.
  • Our tool instructs DNS servers to make the change.
  • Clients will still resolve to old IP based on TTL.
A non planned failover illustration #1

Master M dies. R gets promoted. Our tool instructs all DNS servers on all DCs to update the IP address.

Say TTL is 60 seconds. Say update to all DNS servers takes 10 seconds. We will have between 10 and 70 seconds until all clients connect to the new master R.

During that time they will continue to attempt connecting to M. Since M is dead, those attempts will fail (thankfully).

A non planned failover illustration #2

Master M gets network isolated for 30 seconds, during which time we failover. R gets promoted. Our tool instructs all DNS servers on all DCs to update the IP address.

Again, assume TTL is 60 seconds. As before, it will take between 10 and 70 seconds for clients to learn of the new IP.

Clients who will require between 40 and 70 seconds to learn of the new IP will, however, hit an unfortunate scenario: the old master M reappears on the grid. Those clients will successfully reconnect to M and issue writes, leading to data loss (writes to M no longer replicate anywhere).

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R. We need to change DNS records. Since this is a planned failover, we set the old master to read_only=1, or even better, we network isolated it.

And still our clients take 10 to 70 seconds to recognize the new master.


The above numbers are just illustrative. Perhaps DNS deployment is quicker than 10 seconds. You should do your own math.

TTL is a compromise which you can tune. Setting lower TTL will mitigate the problem, but will cause more hits on the DNS servers.

For planned takeover we can first deploy a change to the TTL, to, say, 2sec, wait 60sec, then deploy the IP change, then restore TTL to 60.

You may choose to restart apps upon DNS deployment. This emulates apps' awareness of the change.

Sample orchestrator configuration

orchestrator configuration would look like this:

"ApplyMySQLPromotionAfterMasterFailover": true, "PostMasterFailoverProcesses": [ "/do/what/you/gotta/do to apply dns change for {failureClusterAlias} to {successorHost}" ],

In the above:

  • ApplyMySQLPromotionAfterMasterFailover instructs orchestrator to set read_only=0; reset slave all on promoted server.
  • PostMasterFailoverProcesses really depends on your setup. But orchestrator will supply with hints to your scripts: identity of cluster, identity of successor.

See orchestrator configuration documentation.

Axis Order in Spatial Reference Systems

MySQL 8.0 has many new GIS features, including a catalog of spatial reference systems (SRSs) and support for geographic (latitude-longitude) computations.

In a Cartesian SRS, it doesn’t really matter which coordinate is on which axis. The axes are orthogonal and the units are the same on both axes, so if a user consistently puts the X value in the Y coordinate and the Y value in the X coordinate, it doesn’t affect computations (in the functions MySQL currently supports).…

ProxySQL Query Rewrite Use Case

In this blog post, I’m going to revisit the ProxySQL Query Rewrite feature. You may have seen me talking about possible use case scenarios in the past few conferences, but the reason I’m starting with this is that query rewriting was the original intention for building ProxySQL.

Why would you need to rewrite a query?

  • You’ve identified a query that’s causing bottleneck or slowness
  • A special operation requires query routing
  • You cannot modify application code

So here we have a case of a bad query hitting the backend database. You as a DBA have identified the query as causing severe slowdown, which could lead to a site-wide outage. This query needs to be optimized, and you have asked the developer to correct this bad query. Their answer isn’t really what you expected. You can rewrite some queries to have the same data result by choosing a different optimizer path. In cases where an application was written in ORM – such as Hibernate or similar – it is not easy to quickly make a code change.

The query rewrite feature of ProxySQL makes this possible (until the application can be modified).

How do we rewrite a query? There are two ways to accomplish this with ProxySQL.

Query rewrite is just a match_pattern + replace_pattern activity, whereas match_digest is only used for matching a query, not rewriting it. Logically, match_digest serves the same purpose of username, schemaname, proxy_addr, etc. It only matches the query.

These two different mechanisms offers ways to optimize query matching operation efficiently depending on the query type (such as DML operation versus SELECT query). Please note that if your intention is to rewrite queries, the rule must match the original query by using match_pattern. Query rules are processed by using rule_id field and only applied if active = 1.

Here’s how we can demonstrate match_digest in our test lab:

mysql> SELECT hostgroup hg, sum_time, count_star, digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC limit 10; +----+-----------+------------+-----------------------------------+ | hg | sum_time | count_star | digest_text | +----+-----------+------------+-----------------------------------+ | 0 | 243549572 | 85710 | SELECT c FROM sbtest10 WHERE id=? | | 0 | 146324255 | 42856 | COMMIT | | 0 | 126643488 | 44310 | SELECT c FROM sbtest7 WHERE id=? | | 0 | 126517140 | 42927 | BEGIN | | 0 | 123797307 | 43820 | SELECT c FROM sbtest1 WHERE id=? | | 0 | 123345775 | 43460 | SELECT c FROM sbtest6 WHERE id=? | | 0 | 122121030 | 43010 | SELECT c FROM sbtest9 WHERE id=? | | 0 | 121245265 | 42400 | SELECT c FROM sbtest8 WHERE id=? | | 0 | 120554811 | 42520 | SELECT c FROM sbtest3 WHERE id=? | | 0 | 119244143 | 42070 | SELECT c FROM sbtest5 WHERE id=? | +----+-----------+------------+-----------------------------------+ 10 rows in set (0.00 sec) mysql> INSERT INTO mysql_query_rules (rule_id,active,username,match_digest, match_pattern,replace_pattern,apply) VALUES (10,1,'root','SELECT.*WHERE id=?','sbtest2','sbtest10',1); Query OK, 1 row affected (0.00 sec) mysql> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 0 rows affected (0.00 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ 1 row in set (0.00 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | 593 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ 1 row in set (0.00 sec)

We can also monitor Query Rules activity live using the ProxyTop utility:

To reset ProxySQL’s statistics for query rules, use following steps:

mysql> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.01 sec) mysql> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 0 rows affected (0.00 sec)

Here’s a match_pattern example:

mysql> SELECT hostgroup hg, sum_time, count_star, digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC limit 5; +----+----------+------------+----------------------------------+ | hg | sum_time | count_star | digest_text | +----+----------+------------+----------------------------------+ | 0 | 98753983 | 16292 | BEGIN | | 0 | 84613532 | 16232 | COMMIT | | 1 | 49327292 | 16556 | SELECT c FROM sbtest3 WHERE id=? | | 1 | 49027118 | 16706 | SELECT c FROM sbtest2 WHERE id=? | | 1 | 48095847 | 16396 | SELECT c FROM sbtest4 WHERE id=? | +----+----------+------------+----------------------------------+ 5 rows in set (0.01 sec) mysql> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (20,1,'root','DISTINCT(.*)ORDER BY c','DISTINCT1',1); Query OK, 1 row affected (0.00 sec) mysql> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 0 rows affected (0.01 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply | +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ | 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 | | 0 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | NULL | 1 | +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ 2 rows in set (0.01 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply | +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ | 9994 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 | | 6487 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | NULL | 1 | +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ 2 rows in set (0.00 sec) mysql> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.00 sec) mysql> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 0 rows affected (0.00 sec)

The key in query ruling for a rewrite is the order of the apply field:

  • apply = 1 means don’t evaluate any other rules if there’s a match already.
  • apply = 0 means evaluate the next rules in the chain.

As we can see in the test below, all queries matching with rule_id = 10 or rule_id = 20 have hits. In reality, all rules in runtime_mysql_query_rules are active. If we want to disable a rule that is in the mysql_query_rules table, set active = 0:

mysql> update mysql_query_rules set apply = 1 where rule_id in (10); Query OK, 1 row affected (0.00 sec) mysql> update mysql_query_rules set apply = 0 where rule_id in (20); Query OK, 1 row affected (0.00 sec) mysql> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 0 rows affected (0.00 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply | +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ | 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 1 | | 0 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | NULL | 0 | +------+---------+--------+--------+----------+--------------------+------------------------+-----------------+-----------+-------+ 2 rows in set (0.00 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, flagIN, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | flagIN | apply | +-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+ | 10195 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | 0 | 1 | | 6599 | 20 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | 0 | 0 | +-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+ 2 rows in set (0.00 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, flagIN, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | flagIN | apply | +-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+ | 20217 | 5 | NULL | 1 | root | NULL | DISTINCT(.*)ORDER BY c | DISTINCT1 | 0 | 1 | | 27020 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | 0 | 0 | +-------+---------+--------+--------+----------+--------------------+------------------------+-----------------+--------+-------+ 2 rows in set (0.00 sec) mysql> update mysql_query_rules set active = 0 where rule_id = 5; Query OK, 1 row affected (0.00 sec) mysql> LOAD MYSQL QUERY RULES TO RUNTIME; Query OK, 0 rows affected (0.02 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | 0 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 0 | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ 1 row in set (0.00 sec) mysql> SELECT hits, mysql_query_rules.rule_id,digest,active,username, match_digest, match_pattern, replace_pattern, cache_ttl, apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | hits | rule_id | digest | active | username | match_digest | match_pattern | replace_pattern | cache_ttl | apply | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ | 4224 | 10 | NULL | 1 | root | SELECT.*WHERE id=? | sbtest2 | sbtest10 | NULL | 0 | +------+---------+--------+--------+----------+--------------------+---------------+-----------------+-----------+-------+ 1 row in set (0.01 sec)

Additionally, ProxySQL can help to identify bad queries. Login to the admin module and follow these steps:

Find the most time-consuming queries:

mysql> SELECT SUM(sum_time), SUM(count_star), digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time) DESC LIMIT 3G *************************** 1. row *************************** SUM(sum_time): 95053795 SUM(count_star): 13164 digest_text: BEGIN *************************** 2. row *************************** SUM(sum_time): 85094367 SUM(count_star): 13130 digest_text: COMMIT *************************** 3. row *************************** SUM(sum_time): 52110099 SUM(count_star): 13806 digest_text: SELECT c FROM sbtest3 WHERE id=? 3 rows in set (0.00 sec)

Find highest average execution time:

mysql> SELECT SUM(sum_time), SUM(count_star), SUM(sum_time)/SUM(count_star) avg, digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time)/SUM(count_star) DESC limit 1; +---------------+-----------------+--------+--------------------------------+ | SUM(sum_time) | SUM(count_star) | avg | digest_text | +---------------+-----------------+--------+--------------------------------+ | 972162 | 1 | 972162 | CREATE INDEX k_5 ON sbtest5(k) | +---------------+-----------------+--------+--------------------------------+ 1 row in set (0.00 sec)

The above information can also be gathered from information_schema.events_statements_summary_by_digest, but I prefer the ProxySQL admin interface. Also, you can run the slow query log analysis by running a detailed pt-query-digest on your system to identify slow queries. You can also use PMM’s QAN.


I’ve found the best documentation on ProxySQL query rewrite is at IBM’s site, where they explain query rewrite fundamentals with examples. It’s worth a read. I’m not going to get into the details of these techniques here, but if you find more relevant resources, please post them in the comments section.

A few of the possible query optimization techniques:

  • Operation merging
  • Operation movement
  • Predicate translation

At the time of this blog post, ProxySQL has also announced a new fast schema routing algorithm to support thousands of shards.

There may be other cases where you want to divert traffic to another table. Think of a table hitting the maximum integer value, and you want to keep inserts going into a new table while you alter the old one to correct the issue. In the mean time, all selects can still point to the old table to continue operation.

As of MySQL 5.7.6, Oracle also offers query rewrite as a plugin, and you can find the documentation here. The biggest disadvantage of using Oracle’s built-in solution is the rewrite rule sits with the server it is implemented on. That’s where ProxySQL has a bigger advantage: it sits between the application and database server, so the rule applies to the entire topology, not just for a single host.

As you can see, ProxySQL query rewrite is a great way to solve some real operational issues and make you a hero to the team and project. To become a rock star, you might want to consider Percona Training on ProxySQL. The training will provide the knowledge to set up a ProxySQL environment with best practices, understand when and how to change the configuration, and maintain it to ensure increasing your uptime SLAs. Contact us for more details at


The post ProxySQL Query Rewrite Use Case appeared first on Percona Database Performance Blog.

How to Overcome Accidental Data Deletion in MySQL & MariaDB

Someone accidently deleted part of the database. Someone forgot to include a WHERE clause in a DELETE query, or they dropped the wrong table. Things like that may and will happen, it is inevitable and human. But the impact can be disastrous. What can you do to guard yourself against such situations, and how can you recover your data? In this blog post, we will cover some of the most typical cases of the data loss, and how you can prepare yourself so you can recover from them.


There are things you should do in order to ensure a smooth recovery. Let’s go through them. Please keep in mind that it’s not “pick one” situation - ideally you will implement all of the measures we are going to discuss below.


You have to have a backup, there is no getting away from it. You should have your backup files tested - unless you test your backups, you cannot be sure if they are any good and if you will ever be able to restore them. For disaster recovery you should keep a copy of your backup somewhere outside of your datacenter - just in case the whole datacenter becomes unavailable. To speed up the recovery, it’s very useful to keep a copy of the backup also on the database nodes. If your dataset is large, copying it over the network from a backup server to the database node which you want to restore may take significant time. Keeping the latest backup locally may significantly improve recovery times.

Logical Backup

Your first backup, most likely, will be a physical backup. For MySQL or MariaDB, it will be either something like xtrabackup or some sort of filesystem snapshot. Such backups are great for restoring a whole dataset or for provisioning new nodes. However, in case of deletion of a subset of data, they suffer from significant overhead. First of all, you are not able to restore all of the data, or else you will overwrite all changes that happened after the backup was created. What you are looking for is the ability to restore just a subset of data, only the rows which were accidentally removed. To do that with a physical backup, you would have to restore it on a separate host, locate removed rows, dump them and then restore them on the production cluster. Copying and restoring hundreds of gigabytes of data just to recover a handful of rows is something we would definitely call a significant overhead. To avoid it you can use logical backups - instead of storing physical data, such backups store data in a text format. This makes it easier to locate the exact data which was removed, which can then be restored directly on the production cluster. To make it even easier, you can also split such logical backup in parts and backup each and every table to a separate file. If your dataset is large, it will make sense to split one huge text file as much as possible. This will make the backup inconsistent but for the majority of the cases, this is no issue - if you will need to restore the whole dataset to a consistent state, you will use physical backup, which is much faster in this regard. If you need to restore just a subset of data, the requirements for consistency are less stringent.

Point-In-Time Recovery

Backup is just a beginning - you will be able to restore your data to the point at which the backup was taken but, most likely, data was removed after that time. Just by restoring missing data from the latest backup, you may lose any data that was changed after the backup. To avoid that you should implement Point-In-Time Recovery. For MySQL it basically means you will have to use binary logs to replay all the changes which happened between the moment of the backup and the data loss event. The below screenshot shows how ClusterControl can help with that.

What you will have to do is to restore this backup up to the moment just before the data loss. You will have to restore it on a separate host in order not to make changes on the production cluster. Once you have the backup restored, you can log into that host, find the missing data, dump it and restore on the production cluster.

Delayed Slave

All of the methods we discussed above have one common pain point - it takes time to restore the data. It may take longer, when you restore all of the data and then try to dump only the interesting part. It may take less time if you have logical backup and you can quickly drill down to the data you want to restore, but it is by no means a quick task. You still have to find a couple of rows in a large text file. The larger it is, the more complicated the task gets - sometimes the sheer size of the file slows down all actions. One method to avoid those problems is to have a delayed slave. Slaves typically try to stay up to date with the master but it is also possible to configure them so that they will keep a distance from their master. In the below screenshot, you can see how to use ClusterControl to deploy such a slave:

In short, we have here an option to add a replication slave to the database setup and configure it to be delayed. In the screenshot above, the slave will be delayed by 3600 seconds, which is one hour. This lets you to use that slave to recover the removed data up to one hour from the data deletion. You will not have to restore a backup, it will be enough to run mysqldump or SELECT ... INTO OUTFILE for the missing data and you will get the data to restore on your production cluster.

ClusterControl Single Console for Your Entire Database Infrastructure Find out what else is new in ClusterControl Install ClusterControl for FREE Restoring Data

In this section, we will go through a couple of examples of accidental data deletion and how you can recover from them. We will walk through recovery from a full data loss, we will also show how to recover from a partial data loss when using physical and logical backups. We will finally show you how to restore accidentally deleted rows if you have a delayed slave in your setup.

Full Data Loss

Accidental “rm -rf” or “DROP SCHEMA myonlyschema;” has been executed and you ended up with no data at all. If you happened to also remove files other than from the MySQL data directory, you may need to reprovision the host. To keep things simpler we will assume that only MySQL has been impacted. Let’s consider two cases, with a delayed slave and without one.

No Delayed Slave

In this case the only thing thing we can do is to restore the last physical backup. As all of our data has been removed, we don’t need to be worried about activity which happened after the data loss because with no data, there is no activity. We should be worried about the activity which happened after the backup took place. This means we have to do a Point-in-Time restore. Of course, it will take longer than to just restore data from the backup. If bringing your database up quickly is more crucial than to have all of the data restored, you can as well just restore a backup and be fine with it.

First of all, if you still have access to binary logs on the server you want to restore, you can use them for PITR. First, we want to convert the relevant part of the binary logs to a text file for further investigation. We know that data loss happened after 13:00:00. First, let’s check which binlog file we should investigate:

root@vagrant:~# ls -alh /var/lib/mysql/binlog.* -rw-r----- 1 mysql mysql 1.1G Apr 23 10:32 /var/lib/mysql/binlog.000001 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:33 /var/lib/mysql/binlog.000002 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:35 /var/lib/mysql/binlog.000003 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:38 /var/lib/mysql/binlog.000004 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:39 /var/lib/mysql/binlog.000005 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:41 /var/lib/mysql/binlog.000006 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:43 /var/lib/mysql/binlog.000007 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:45 /var/lib/mysql/binlog.000008 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:47 /var/lib/mysql/binlog.000009 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:49 /var/lib/mysql/binlog.000010 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:51 /var/lib/mysql/binlog.000011 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:53 /var/lib/mysql/binlog.000012 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:55 /var/lib/mysql/binlog.000013 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:57 /var/lib/mysql/binlog.000014 -rw-r----- 1 mysql mysql 1.1G Apr 23 10:59 /var/lib/mysql/binlog.000015 -rw-r----- 1 mysql mysql 306M Apr 23 13:18 /var/lib/mysql/binlog.000016

As can be seen, we are interested in the last binlog file.

root@vagrant:~# mysqlbinlog --start-datetime='2018-04-23 13:00:00' --verbose /var/lib/mysql/binlog.000016 > sql.out

Once done, let’s take a look at the contents of this file. We will search for ‘drop schema’ in vim. Here’s a relevant part of the file:

# at 320358785 #180423 13:18:58 server id 1 end_log_pos 320358850 CRC32 0x0893ac86 GTID last_committed=307804 sequence_number=307805 rbr_only=no SET @@SESSION.GTID_NEXT= '52d08e9d-46d2-11e8-aa17-080027e8bf1b:443415'/*!*/; # at 320358850 #180423 13:18:58 server id 1 end_log_pos 320358946 CRC32 0x487ab38e Query thread_id=55 exec_time=1 error_code=0 SET TIMESTAMP=1524489538/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/; drop schema sbtest /*!*/;

As we can see, we want to restore up to position 320358785. We can pass this data to the ClusterControl UI:

Delayed Slave

If we have a delayed slave and that host is enough to handle all of the traffic, we can use it and promote it to master. First though, we have to make sure it caught up with the old master up to the point of the data loss. We will use some CLI here to make it happen. First, we need to figure out on which position the data loss happened. Then we will stop the slave and let it run up to the data loss event. We showed how to get the correct position in the previous section - by examining binary logs. We can either use that position (binlog.000016, position 320358785) or, if we use a multithreaded slave, we should use GTID of the data loss event (52d08e9d-46d2-11e8-aa17-080027e8bf1b:443415) and replay queries up to that GTID.

First, let’s stop the slave and disable delay:

mysql> STOP SLAVE; Query OK, 0 rows affected (0.01 sec) mysql> CHANGE MASTER TO MASTER_DELAY = 0; Query OK, 0 rows affected (0.02 sec)

Then we can start it up to a given binary log position.

mysql> START SLAVE UNTIL MASTER_LOG_FILE='binlog.000016', MASTER_LOG_POS=320358785; Query OK, 0 rows affected (0.01 sec)

If we’d like to use GTID, the command will look different:

mysql> START SLAVE UNTIL SQL_BEFORE_GTIDS = ‘52d08e9d-46d2-11e8-aa17-080027e8bf1b:443415’; Query OK, 0 rows affected (0.01 sec)

Once the replication stopped (meaning all of the events we asked for have been executed), we should verify that the host contains the missing data. If so, you can promote it to master and then rebuild other hosts using new master as the source of data.

This is not always the best option. All depends on how delayed your slave is - if it is delayed by a couple of hours, it may not make sense to wait for it to catch up, especially if write traffic is heavy in your environment. In such case, it’s most likely faster to rebuild hosts using physical backup. On the other hand, if you have a rather small volume of traffic, this could be a nice way to actually quickly fix the issue, promote new master and get on with serving traffic, while the rest of the nodes are being rebuilt in the background.

Partial Data Loss - Physical Backup

In case of the partial data loss, physical backups can be inefficient but, as those are the most common type of a backup, it’s very important to know how to use them for partial restore. First step will always be to restore a backup up to a point in time before the data loss event. It’s also very important to restore it on a separate host. ClusterControl uses xtrabackup for physical backups so we will show how to use it. Let’s assume we ran the following incorrect query:

DELETE FROM sbtest1 WHERE id < 23146;

We wanted to delete just a single row (‘=’ in WHERE clause), instead we deleted a bunch of them (< in WHERE clause). Let’s take a look at the binary logs to find at which position the issue happened. We will use that position to restore the backup to.

mysqlbinlog --verbose /var/lib/mysql/binlog.000003 > bin.out

Now, let’s look at the output file and see what we can find there. We are using row-based replication therefore we will not see the exact SQL that was executed. Instead (as long as we will use --verbose flag to mysqlbinlog) we will see events like below:

### DELETE FROM `sbtest`.`sbtest1` ### WHERE ### @1=999296 ### @2=1009782 ### @3='96260841950-70557543083-97211136584-70982238821-52320653831-03705501677-77169427072-31113899105-45148058587-70555151875' ### @4='84527471555-75554439500-82168020167-12926542460-82869925404'

As can be seen, MySQL identifies rows to delete using very precise WHERE condition. Mysterious signs in the human-readable comment, “@1”, “@2”, mean “first column”, “second column”. We know that the first column is ‘id’, which is something we are interested in. We need to find a large DELETE event on a ‘sbtest1’ table. Comments which will follow should mention id of 1, then id of ‘2’, then ‘3’ and so on - all up to id of ‘23145’. All should be executed in a single transaction (single event in a binary log). After analysing the output using ‘less’, we found:

### DELETE FROM `sbtest`.`sbtest1` ### WHERE ### @1=1 ### @2=1006036 ### @3='123' ### @4='43683718329-48150560094-43449649167-51455516141-06448225399' ### DELETE FROM `sbtest`.`sbtest1` ### WHERE ### @1=2 ### @2=1008980 ### @3='123' ### @4='05603373460-16140454933-50476449060-04937808333-32421752305'

The event, to which those comments are attached started at:

#180427 8:09:21 server id 1 end_log_pos 29600687 CRC32 0x8cfdd6ae Xid = 307686 COMMIT/*!*/; # at 29600687 #180427 8:09:21 server id 1 end_log_pos 29600752 CRC32 0xb5aa18ba GTID last_committed=42844 sequence_number=42845 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= '0c695e13-4931-11e8-9f2f-080027e8bf1b:55893'/*!*/; # at 29600752 #180427 8:09:21 server id 1 end_log_pos 29600826 CRC32 0xc7b71da5 Query thread_id=44 exec_time=0 error_code=0 SET TIMESTAMP=1524816561/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/; BEGIN /*!*/; # at 29600826

So, we want to restore the backup up to the previous commit at position 29600687. Let’s do that now. We’ll use external server for that. We will restore backup up to that position and we will keep the restore server up and running so we can then later extract the missing data.

Once restore is completed, let’s make sure our data has been recovered:

mysql> SELECT COUNT(*) FROM sbtest.sbtest1 WHERE id < 23146; +----------+ | COUNT(*) | +----------+ | 23145 | +----------+ 1 row in set (0.03 sec)

Looks good. Now we can extract this data into a file which we will load back on the master.

mysql> SELECT * FROM sbtest.sbtest1 WHERE id < 23146 INTO OUTFILE 'missing.sql'; ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement

Something is not right - this is because the server is configured to be able to write files only in a particular location - it’s all about security, we don’t want to let users save contents anywhere they like. Let’s check where we can save our file:

mysql> SHOW VARIABLES LIKE "secure_file_priv"; +------------------+-----------------------+ | Variable_name | Value | +------------------+-----------------------+ | secure_file_priv | /var/lib/mysql-files/ | +------------------+-----------------------+ 1 row in set (0.13 sec)

Ok, let’s try one more time:

mysql> SELECT * FROM sbtest.sbtest1 WHERE id < 23146 INTO OUTFILE '/var/lib/mysql-files/missing.sql'; Query OK, 23145 rows affected (0.05 sec)

Now it looks much better. Let’s copy the data to the master:

root@vagrant:~# scp /var/lib/mysql-files/missing.sql missing.sql 100% 1744KB 1.7MB/s 00:00

Now it’s time to load the missing rows on the master and test if it succeeded:

mysql> LOAD DATA INFILE '/var/lib/mysql-files/missing.sql' INTO TABLE sbtest.sbtest1; Query OK, 23145 rows affected (2.22 sec) Records: 23145 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT COUNT(*) FROM sbtest.sbtest1 WHERE id < 23146; +----------+ | COUNT(*) | +----------+ | 23145 | +----------+ 1 row in set (0.00 sec)

That’s all, we restored our missing data.

Partial Data Loss - Logical Backup

In the previous section, we restored lost data using physical backup and an external server. What if we had logical backup created? Let’s take a look. First, let’s verify that we do have a logical backup:

root@vagrant:~# ls -alh /root/backups/BACKUP-13/ total 5.8G drwx------ 2 root root 4.0K Apr 27 07:35 . drwxr-x--- 5 root root 4.0K Apr 27 07:14 .. -rw-r--r-- 1 root root 2.4K Apr 27 07:35 cmon_backup.metadata -rw------- 1 root root 5.8G Apr 27 07:35 mysqldump_2018-04-27_071434_complete.sql.gz

Yes, it’s there. Now, it’s time to decompress it.

root@vagrant:~# mkdir /root/restore root@vagrant:~# zcat /root/backups/BACKUP-13/mysqldump_2018-04-27_071434_complete.sql.gz > /root/restore/backup.sql

When you look into it, you will see that the data is stored in multi-value INSERT format. For example:

INSERT INTO `sbtest1` VALUES (1,1006036,'18034632456-32298647298-82351096178-60420120042-90070228681-93395382793-96740777141-18710455882-88896678134-41810932745','43683718329-48150560094-43449649167-51455516141-06448225399'),(2,1008980,'69708345057-48265944193-91002879830-11554672482-35576538285-03657113365-90301319612-18462263634-56608104414-27254248188','05603373460-16140454933-50476449060-04937808333-32421752305')

All we need to do now is to pinpoint where our table is located and then where the rows, which are of interest to us, are stored. First, knowing mysqldump patterns (drop table, create new one, disable indexes, insert data) let’s figure out which line contains CREATE TABLE statement for ‘sbtest1’ table:

root@vagrant:~/restore# grep -n "CREATE TABLE \`sbtest1\`" backup.sql > out root@vagrant:~/restore# cat out 971:CREATE TABLE `sbtest1` (

Now, using a method of trial and error, we need to figure out where to look for our rows. We’ll show you the final command we came up with. The whole trick is to try and print different range of lines using sed and then check if the latest line contains rows close to, but later than what we are searching for. In the command below we look for lines between 971 (CREATE TABLE) and 993. We also ask sed to quit once it reaches line 994 as the rest of the file is of no interest to us:

root@vagrant:~/restore# sed -n '971,993p; 994q' backup.sql > 1.sql root@vagrant:~/restore# tail -n 1 1.sql | less

The output looks like below:

INSERT INTO `sbtest1` VALUES (31351,1007187,'23938390896-69688180281-37975364313-05234865797-89299459691-74476188805-03642252162-40036598389-45190639324-97494758464','60596247401-06173974673-08009930825-94560626453-54686757363'),

This means that our row range (up to row with id of 23145) is close. Next, it’s all about manual cleaning of the file. We want it to start with the first row we need to restore:

INSERT INTO `sbtest1` VALUES (1,1006036,'18034632456-32298647298-82351096178-60420120042-90070228681-93395382793-96740777141-18710455882-88896678134-41810932745','43683718329-48150560094-43449649167-51455516141-06448225399')

And end up with the last row to restore:


We had to trim some of the unneeded data (it is multiline insert) but after all of this we have a file which we can load back on the master.

root@vagrant:~/restore# cat 1.sql | mysql -usbtest -psbtest -h10.0.0.101 sbtest mysql: [Warning] Using a password on the command line interface can be insecure.

Finally, last check:

mysql> SELECT COUNT(*) FROM sbtest.sbtest1 WHERE id < 23146; +----------+ | COUNT(*) | +----------+ | 23145 | +----------+ 1 row in set (0.00 sec)

All is good, data has been restored.

Partial Data Loss, Delayed Slave Related resources  How to Schedule Database Backups with ClusterControl  MySQL vs MariaDB vs Percona Server: Security Features Comparison  Capacity Planning for MySQL and MariaDB - Dimensioning Storage Size

In this case, we will not go through the whole process. We already described how to identify the position of a data loss event in the binary logs. We also described how to stop a delayed slave and start the replication again, up to a point before the data loss event. We also explained how to use SELECT INTO OUTFILE and LOAD DATA INFILE to export data from external server and load it on the master. That’s all you need. As long as the data is still on the delayed slave, you have to stop it. Then you need to locate the position before the data loss event, start the slave up to that point and, once this is done, use the delayed slave to extract data which was deleted, copy the file to master and load it to restore the data.


Restoring lost data is not fun, but if you follow the steps we went through in this blog, you will have a good chance of recovering what you lost.

Tags:  MySQL MariaDB human error data loss recovery

Authenticating Vault Against LDAP for Accessing MySQL Through ProxySQL

Earlier this year, I was presented with the challenge of streamlining user access to MySQL, allowing users self-serve access using their LDAP credentials, while logging all access. Of course, various MySQL forks allow for user auditing, but the solution is also needed to eventually support other data storage systems without native user auditing. This gave me the opportunity to do a trial integration of MySQL, Vault, ProxySQL, and LDAP; Vault would be used to dynamically create user accounts, and ProxySQL would be used to limit access and log activity. To evaluate the functionality and configuration of the integration, I used Docker to set up a test environment.

Below I will present the methods used to:

  1. Provision the Docker environment.
  2. Configure OpenLDAP.
  3. Initialize Vault and configure it to support LDAP and MySQL.
  4. Add users to ProxySQL.


1. Provision the Docker environment

I typically use docker-compose to spin up test environments, because it makes networking and container configuration easy. All files needed to set up the environment described here can be cloned from

From the directory containing the docker-compose-vault.yaml file, start the cluster with the following statement:

docker-compose -f docker-compose-vault.yaml up -d

I use Kitematic for an easy-to-manage list of running containers. From Kitematic, it is easy to jump into any container by clicking the EXEC button:

Initialization scripts

In the docker-compose yaml file, I have included several initialization helper scripts. If you were provisioning this kind of cluster for production, most of these things would already be set up – for example, replication would already be running and LDAP would be in place already. For this test environment, just run these four quick scripts.

This one sets up replication between the primary and replica containers:

source mysqlprimary mysqlreplica

This script is used to add the required monitor user for proxysql.


This one installs LDAP utilities on the app container (for various reasons, this was nontrivial to add to the Docker image):


Finally, this script is used to initialize Vault using three generated keys:


Example output:

Sealed: false Key Shares: 1 Key Threshold: 1 Unseal Progress: 0 Unseal Nonce:

Set up MySQL Authorization

I am ready to add MySQL to Vault. This section is an update to a previous Vault walk-through by Derek Downey on this blog (, using updated syntax and plugins.

vault mount database

Example output:

Successfully mounted 'database' at 'database'!

If you are using MySQL 5.6 or earlier, use the plugin_name=mysql-legacy-database-plugin, because those earlier versions limit the username to 16 characters. More on the error you will otherwise see below. Here, root is the username and password is the password. The hostname is mysqlprimary and the port is the default 3306.

vault write database/config/mysql \ plugin_name=mysql-database-plugin \ connection_url="root:password@tcp(mysqlprimary:3306)/" \ allowed_roles="readonly"

2. Configure OpenLDAP

Configure LDAP Authentication Backend

Next, I will configure LDAP to set up groups and users in openldap/phpldapadmin.

Using Kitematic, I can click directly to the web address hosting the phpldapadmin GUI, using the upward facing arrow icon in the WEB PREVIEW section on the right:

Or just visit this link in a browser:


Login is required:

The credentials configured in the docker-compose yaml, for the openldap container, are:

user = cn=admin,dc=proxysql,dc=com

password = password

Click login and enter these credentials:

Set up groups and users:

Create two Organisational Units: groups and users.

After these two OUs are created, the directory structure will look like this:

Then create a child entry under ou=groups called developers, and a child entry under ou=users called vthompson.

Click on ou=groups, and then choose Create a child entry.

Choose Posix Group for the group=developers.

Choose Generic: User Account for the user=vthompson.

Choose the right password type for the use. In testing, clear is fine.

Finish the rest of the fields:

Completed entries:

Add the memberUid attribute to the developers group:

Configure LDAP

The IP address of the openldap container needs to be added to several configuration statements below. Here is a way to get the IP address from the local machine running Docker:

docker inspect --format='{{ .NetworkSettings.Networks.adjacents_default.IPAddress }}' openldap

Example output:

Test the user created above directly against ldap. The IP address is that found above,; the username is vthompson and the password is thompson.

ldapwhoami -vvv -H ldap:// -D cn=vthompson,ou=users,dc=proxysql,dc=com -x -wthompson

Example output:

dn:cn=vthompson,ou=users,dc=proxysql,dc=com Result: Success (0)

If you get a failure at this point, one of two things may have happened. You may have forgotten to add the memberUid as a new attribute (see above), or you may have entered the password in phpldapadmin incorrectly. You can test the password using the check password functionality in the GUI:

Another way to ensure the LDAP configuration is done correctly is to test binddn against it; you should get an output of all groups and users set up in phpldapadmin. Here we are using the admin credentials.

ldapsearch -H ldap:// -x -D "cn=admin,dc=proxysql,dc=com" -w password -b "dc=proxysql,dc=com"

Example output:

# extended LDIF # # LDAPv3 # base <dc=proxysql,dc=com> with scope subtree # filter: (objectclass=*) # requesting: ALL # # dn: dc=proxysql,dc=com objectClass: top objectClass: dcObject objectClass: organization o: proxysql dc: proxysql # admin, dn: cn=admin,dc=proxysql,dc=com objectClass: simpleSecurityObject objectClass: organizationalRole cn: admin description: LDAP administrator userPassword:: e1NTSEF9bVI3NUhCRTl4dG1nbK0raHVCc0lGZnpINU0wVlRJK2c= # groups, dn: ou=groups,dc=proxysql,dc=com ou: groups objectClass: organizationalUnit objectClass: top # users, dn: ou=users,dc=proxysql,dc=com ou: users objectClass: organizationalUnit objectClass: top # developers, groups, dn: cn=developers,ou=groups,dc=proxysql,dc=com cn: developers gidNumber: 500 objectClass: posixGroup objectClass: top memberUid: vthompson # vthompson, users, dn: cn=vthompson,ou=users,dc=proxysql,dc=com givenName: Valerie sn: Thompson cn: vthompson uid: vthompson userPassword:: dGhvbRBzb24= uidNumber: 1000 gidNumber: 500 homeDirectory: /home/users/vthompson loginShell: /bin/sh objectClass: inetOrgPerson objectClass: posixAccount objectClass: top # search result search: 2 result: 0 Success # numResponses: 7 # numEntries: 6

3. Initialize Vault and configure it to support LDAP and MySQL.

Attach LDAP to Vault

Continue to do the next steps from the app container.

First, set up LDAP authentication inside Vault:

vault auth-enable ldap

Example output:

Successfully enabled 'ldap' at 'ldap'!


Verify LDAP was set up in Vault:

vault auth -methods

Example output:

Path Type Default TTL Max TTL Replication Behavior Description ldap/ ldap system system replicated token/ token system system replicated token based credentials


Next, write the LDAP configs to Vault. First, look at the default presets:

vault read auth/ldap/config

Example output:

Key Value --- ----- binddn case_sensitive_names false certificate deny_null_bind true discoverdn false groupattr cn groupdn groupfilter (|(memberUid={{.Username}})(member={{.UserDN}})(uniqueMember={{.UserDN}})) insecure_tls false starttls false tls_max_version tls12 tls_min_version tls12 upndomain url ldap:// userattr cn userdn


Then write the appropriate configs. The groupfilter shown is particular to openldap.

vault write auth/ldap/config \ url="ldap://" \ binddn="cn=admin,dc=proxysql,dc=com" \ bindpass="password" \ userdn="ou=users,dc=proxysql,dc=com" \ userattr="uid" \ groupdn="ou=groups,dc=proxysql,dc=com" \ groupattr="cn" \ groupfilter="(|(memberUid={{.Username}})(member={{.UserDN}})(uniqueMember={{.UserDN}}))" \ insecure_tls=true


Example output:

Success! Data written to: auth/ldap/config


Now look at settings again after writing configs:

vault read auth/ldap/config

Example output:

Key Value --- ----- binddn cn=admin,dc=proxysql,dc=com case_sensitive_names false certificate deny_null_bind true discoverdn false groupattr cn groupdn ou=groups,dc=proxysql,dc=com groupfilter (|(memberUid={{.Username}})(member={{.UserDN}})(uniqueMember={{.UserDN}})) insecure_tls true starttls false tls_max_version tls12 tls_min_version tls12 upndomain url ldap:// userattr uid userdn ou=users,dc=proxysql,dc=com

Now test LDAP authentication against Vault. It is important to do this in a new connection to the app container, because the connection being used to do the above configuration has the root token authenticated.

Until policies are mapped in a later command below, this command will report “* user is not a member of any authorized group.” That is okay for now.

vault auth -method=ldap username=vthompson [thompson]

Example output:

Error making API request. URL: PUT http://vault:9200/v1/auth/ldap/login/vthompson Code: 400. Errors: * user is not a member of any authorized group


Map LDAP Groups to Roles and Policies

Create a mysql readonly role with a default expiration. In this case, MySQL is the database, but a variety of database connections are supported. This section is done back in the root-authenticated app container connection.

vault write database/roles/readonly \ db_name=mysql \ creation_statements="CREATE USER '{{name}}'@'%' IDENTIFIED BY '{{password}}';GRANT SELECT ON *.* TO '{{name}}'@'%';" \ default_ttl="1h" \ max_ttl="24h"

Example output:

Success! Data written to: database/roles/readonly


View the vault policies:

vault policies

Example output:

default root


The following hcl configuration file was pre-loaded into the app container in the docker-compose file. Take a look at the contents:

# cat mysqlread.hcl path "database/creds/readonly" { policy = "read" capabilities = ["list", "read"] }


Use this file to create and apply this to the mysql read-only role created above.

vault policy-write mysqlread mysqlread.hcl

Example output:

Policy 'mysqlread' written.


And set the mapping:

vault write auth/ldap/groups/developers policies=mysqlread

Example output:

Success! Data written to: auth/ldap/groups/developers


Use Token to Connect

Here, the test is as the created user, not root, so go back to the connection to the app container that is not root-authenticated, or open a new connection to the app container.

vault auth -method=ldap username=vthompson [thompson]

Example output:

Successfully authenticated! You are now logged in. The token below is already saved in the session. You do not need to "vault auth" again with the token. token: e17233fc-a30b-717e-20f5-d5faa293ad61 token_duration: 2764799 token_policies: [default mysqlread]


When LDAP is authenticated as above, there will be a token in the session. Use this to get a MySQL credential from the path associated with the read-only policy granted above.

Again, do this as the created user, not root.

vault read database/creds/readonly

Example output:

Key Value --- ----- lease_id database/creds/readonly/e0ece849-d53d-7ab8-8502-ff1533a61b74 lease_duration 12h0m0s lease_renewable true password A1a-7y7zzr463qqzr34u username v-ldap-vthom-readonly-ux9rx9z3pr


Note: If your setup uses MySQL 5.6 or earlier, the default username generated by Vault will be too long to fit in the data field. For example:

* Error 1470: String 'v-ldap-vthom-readonly-ux9rx9z3pr' is too long for user name (should be no longer than 16)

To fix this, change the plugin used when setting up MySQL authorization above to “mysql-legacy-database-plugin.”


Test database access directly from the app container with username/password returned from that output. The name of the container running MySQL is mysqlprimary.

mysql -hmysqlprimary -uv-ldap-vthom-readonly-ux9rx9z3pr -cA1a-7y7zzr463qqzr34u


At this point, Vault is serving temporary MySQL user accounts, authenticated by LDAP. I also wanted to test ProxySQL functionality, so I added this temporary MySQL user to ProxySQL. I did this manually for testing, but in any production scenario, this part would be automated with a wrapper script.


4. Add users to ProxySQL.

Connect ProxySQL

If ProxySQL were supported as a database connection from Vault, then I would use a direct connection to it. I did some experimentation to force the authorization step to write to ProxySQL; the primary blocker was lack of support for transactions in ProxySQL. I understand there may have been further work on this since my tests.

Here, I will add the user manually to ProxySQL. From the proxysql container, login as admin:

mysql -u admin -padmin -h -P6032


Then add the user provided by Vault above. Remember ProxySQL requires writing to runtime and disk:

insert into mysql_users (username,password) values ("v-ldap-vthom-readonly-ux9rx9z3pr","A1a-7y7zzr463qqzr34u"); LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;


To allow tracking user activity through ProxySQL, enable proxysql query logging:



The following statements turn on the default query rules:

select * from mysql_query_rules\G UPDATE mysql_query_rules SET log=1; LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;


Now MySQL can be accessed through ProxySQL:

mysql -hproxysql -u v-ldap-vthom-readonly-ux9rx9z3pr -pA1a-7y7zzr463qqzr34u -P3306


Network restrictions could force connections to go through ProxySQL if needed. Filters could also be enabled inside ProxySQL to further limit access to the data.

Do a sample select to populate the log. The default pattern above is defined as match_pattern: ^SELECT, so either run your select in uppercase, or add a different match_pattern.

SELECT * from mysql.user;


The logs are stored in the following directory on the proxysql container:

ls -al /var/lib/proxysql/ | grep log


The tool to read the logs in plain text is not installed by default. Here is one way to enable it on the proxysql container:

yum install git -y git clone yum groupinstall "Development Tools" -y cd proxysql/tools make


Finally, use the tool to read the logs:

./eventslog_reader_sample /var/lib/proxysql/queries.log.00000001


Example output:

ProxySQL LOG QUERY: thread_id="3" username="v-ldap-vthom-readonly-ux9rx9z3pr" schemaname=information_schema" client="" HID=1 server="mysqlreplica:3306" starttime="2018-04-27 02:13:33.430834" endtime="2018-04-27 02:13:33.436488" duration=5654us digest="0x3B281446289B0759" SELECT * from mysql.user



As shown, Vault can be used to authenticate against LDAP and provide (and expire) temporary MySQL credentials. These can additionally be added to ProxySQL to restrict access to the data stored in MySQL via filters and to log activity on the database.

Using Docker, you can set up a test environment easily to test the functionality and configuration settings needed for this type of stack. Once the process and functionality are confirmed, further automation can be used to deploy a similar stack to production.


ClickHouse Meetup in Salt Lake City

Join Percona CTO Vadim Tkachenko at the Cloud Native Utah meetup in Salt Lake City on Tuesday, May 8, 2018, for an Intro to ClickHouse.

Next week, I’ll be switching from MyRocks performance testing and present an introduction to ClickHouse to the Cloud Native Utah meetup.

Interestingly enough, even though it is totally different from OLTP engines, ClickHouse uses a MergeTree engine. MergeTree engines have a lot of similarities with Log Structured Merge Tree (which is what is used by MyRocks / RocksDB). This the structure is optimized to run on huge datasets / low memory scenarios.

PingCAP TiDB and CockroachDB – the new databases on the block – are using RocksDB as the main storage engine. So is Log Structured Merge Tree the future of databases?

We can talk about this and other questions next week in Salt Lake City. If you are in town please join us at the Cloud Native Utah meetup.


The post ClickHouse Meetup in Salt Lake City appeared first on Percona Database Performance Blog.

Linux configuration for MySQL Cluster

NDB Cluster was designed from the ground up for real-time operations.
It has its origins in the telecom industry where predictability of performance
and latency is absolutely critical. In addition the telecom vendors are
competing very much on pricing of their product.

This leads to that it is important to get the most performance from each
telecom server. Increasing the performance by 10% means that you need 10%
less network equipment. If many servers are shipped this can be a substantial
cost that is worth spending valuable engineering time to achieve.

Thus if you are using or are planning to MySQL Cluster on many servers it
is a good idea to spend some time ensuring that one achieves optimal
performance. If you are operating MySQL Cluster in just a few servers the
payback on your time investment might not be as high.

Another reason to take a deep interest in performance of MySQL Cluster is
of course that you want to learn more about Linux and MySQL Cluster.

In this blog post I will mainly discuss one specific optimisation that can have
large impact on your performance. This is placing the execution threads of
MySQL Cluster and Linux on to the proper CPUs.

In my book MySQL Cluster 7.5 inside and out I devote 3 chapters to this
topic. One chapter on the usage of hyperthreading in data nodes, one
chapter on configuration of CPU locking for data nodes in NDB and
one chapter on configuring Linux interrupt handling.

In this blog post I will mainly discuss the Linux configuration.

NDB is a distributed DBMS where all operations pass through the network,
thus network operations is an essential part of the setup of a MySQL Cluster.

So how does a network packet arrive from the wire into MySQL Cluster and
out again.

The first thing that happens is that the network card receives a packet. It now
needs to inform a CPU about this packet such that Linux TCP/IP code can
handle the packet.

Now the Linux TCP/IP is divided into 3 parts. The first part is handling the
HW interrupt from the network card, the second part is handling the
soft interrupt. The final part is a function call interrupt from the soft interrupt
to the device driver to handle most of the interrupt handling.

The function call interrupt part was introduced in Linux 2.6.35, thus in older
Linux kernels this concept doesn't exist.

Modern Linux kernels also use the NAPI mechanism, this means that the
HW interrupt is disabled during the time we process a set of HW interrupts.
This mechanisms avoids overloading the CPUs with interrupt handling and
takes care of interrupts in batches at high load.

Now Linux interrupt setup is highly configurable and the defaults depends
among other things on the Linux distribution.

There are three main areas that can be configured for Linux. These are
RSS (Receive Side Scaling), RPS (Receive Packet Steering) and
RFS (Receive Flow Steering).

RSS handles setup of the HW interrupt and the soft interrupt. Often the
default is to spread these interrupts on all CPUs. This is usually not a
good idea for MySQL Cluster since some of the CPUs we want to
protect as much as possible from all OS activity (these are in particular
the LDM threads).

RPS is the new mechanism introduced in Linux 2.6.35 and one can configure
the CPUs where the function call interrupts are handled.

Finally RFS tries to ensure that the function call interrupt is executed
on the same CPUs where the application calls recv from. In NDB data nodes
this happens in the recv threads. In a MySQL Server the communication
from the data nodes arrives in the NDB API receive threads (this can be
configured to be locked to a specific CPU). The communication from MySQL
clients arrive in the connection thread where they are executed, so these are
spread on all CPUs the MySQL server is executing on.

When the NDB data node has processed the messages arriving on the network
it will send some responses onto the network. In the NDB data nodes this
happens either in all the threads or it happens in a send thread or a combination
of the two.

Linux has the possibility to configure transmit interrupts as well through
something called XPS (Transmit Packet Steering). The optimal behaviour is
achieved if the transmit interrupts are handled by the same CPU as the send
call is done from.

Configuring Linux and MySQL Cluster together can have a major positive
impact on performance. To achieve the best performance it is important to
consider the flow of data through the servers to process the database requests
from NDB clients (e.g. the MySQL Server).

A Look at MyRocks Performance

In this blog post, I’ll look at MyRocks performance through some benchmark testing.

As the MyRocks storage engine (based on the RocksDB key-value store ) is now available as part of Percona Server for MySQL 5.7, I wanted to take a look at how it performs on a relatively high-end server and SSD storage. I wanted to check how it performs for different amounts of available memory for the given database size. This is similar to the benchmark I published a while ago for InnoDB (

In this case, I plan to use a sysbench-tpcc benchmark ( and I will execute it for both MyRocks and InnoDB. We’ll use InnoDB as a baseline.

For the benchmark, I will use 100 TPC-C warehouses, with a set of 10 tables (to shift the bottleneck from row contention). This should give roughly 90GB of data size (when loaded into InnoDB) and is a roughly equivalent to 1000 warehouses data size.

To vary the memory size, I will change innodb_buffer_pool_size from 5GB to 100GB for InnoDB, and rocksdb_block_cache_size for MyRocks.

For MyRocks we will use LZ4 as the default compression on disk. The data size in the MyRocks storage engine is 21GB. Interesting to note, that in MyRocks uncompressed size is 70GB on the storage.

For both engines, I did not use FOREIGN KEYS, as MyRocks does not support it at the moment.

MyRocks does not support SELECT .. FOR UPDATE statements in REPEATABLE-READ mode in the Percona Server for MySQL implementation. However, “SELECT .. FOR UPDATE” is used in this benchmark. So I had to use READ-COMMITTED mode, which is supported.

The most important setting I used was to enable binary logs, for the following reasons:

  1. Any serious production uses binary logs
  2. With disabled binary logs, MyRocks is affected by a suboptimal transaction coordinator

I used the following settings for binary logs:

  • binlog_format = ‘ROW’
  • binlog_row_image=minimal
  • sync_binlog=10000 (I am not using 0, as this causes serious stalls during binary log rotations, when the  content of binary log is flushed to storage all at once)

While I am not a full expert in MyRocks tuning yet, I used recommendations from this page: The Facebook-MyRocks engineering team also provided me input on the best settings for MyRocks.

Let’s review the results for different memory sizes.

This first chart shows throughput jitter. This helps to understand the distribution of throughput results. Throughput is measured every 1 second, and on the chart I show all measurements after 2000 seconds of a run (the total length of each run is 3600 seconds). So I show the last 1600 seconds of each run (to remove warm-up phases):

To better quantify results, let’s take a look at them on a boxplot. The quickest way to understand boxplots is to take a look at the middle line. It represents a median of measurements (see more at

Before we jump to the summary of results, let’s take a look at a variation of the throughput for both InnoDB and MyRocks. We will zoom to a 1-second resolution chart for 100 GB of allocated memory:

We can see that there is a lot of variation with periodical 1-second performance drops with MyRocks. At this moment, I do not know what causes these drops.

So let’s take a look at the average throughput for each engine for different memory settings (the results are in tps, and more is better):

Memory, GB InnoDB MyRocks 5 849.0664 4205.714 10 1321.9 4298.217 20 1808.236 4333.424 30 2275.403 4394.413 40 2968.101 4459.578 50 3867.625 4503.215 60 4756.551 4571.163 70 5527.853 4576.867 80 5984.642 4616.538 90 5949.249 4620.87 100 5961.2 4599.143


This is where MyRocks behaves differently from InnoDB. InnoDB benefits greatly from additional memory, up to the size of working dataset. After that, there is no reason to add more memory.

At the same time, interestingly MyRocks does not benefit much from additional memory.

Basically, MyRocks performs as expected for a write-optimized engine. You can refer to my article How Three Fundamental Data Structures Impact Storage and Retrieval for more details. 

In conclusion, InnoDB performs better (compared to itself) when the working dataset fits (or almost fits) into available memory, while MyRocks can operate (and outperform InnoDB) on small memory sizes.

IO and CPU usage

It is worth looking at resource utilization for each engine. I took vmstat measurements for each run so that we can analyze IO and CPU usage.

First, let’s review writes per second (in KB/sec). Please keep in mind that these writes include binary log writes too, not just writes from the storage engine.

Memory, GB InnoDB MyRocks 5 244754.4 87401.54 10 290602.5 89874.55 20 311726 93387.05 30 313851.7 93429.92 40 316890.6 94044.94 50 318404.5 96602.42 60 276341.5 94898.08 70 217726.9 97015.82 80 184805.3 96231.51 90 187185.1 96193.6 100 184867.5 97998.26


We can also calculate how many writes per transaction each storage engine performs:

This chart shows the essential difference between InnoDB and MyRocks. MyRocks, being a write-optimized engine, uses a constant amount of writes per transaction.

For InnoDB, the amount of writes greatly depends on the memory size. The less memory we have, the more writes it has to perform.

What about reads?

The following table shows reads in KB per second.

Memory, GB InnoDB MyRocks 5 218343.1 171957.77 10 171634.7 146229.82 20 148395.3 125007.81 30 146829.1 110106.87 40 144707 97887.6 50 132858.1 87035.38 60 98371.2 77562.45 70 42532.15 71830.09 80 3479.852 66702.02 90 3811.371 64240.41 100 1998.137 62894.54


We can translate this to the number of reads per transaction:

This shows MyRocks’ read-amplification. The allocation of more memory helps to decrease IO reads, but not as much as for InnoDB.

CPU usage

Let’s also review CPU usage for each storage engine. Let’s start with InnoDB:

The chart shows that for 5GB memory size, InnoDB spends most of its time in IO waits (green area), and the CPU usage (blue area) increases with more memory.

This is the same chart for MyRocks:

In tabular form:

Memory, GB engine us sys wa id 5 InnoDB 8 2 57 33 5 MyRocks 56 11 18 15 10 InnoDB 12 3 57 28 10 MyRocks 57 11 18 13 20 InnoDB 16 4 55 25 20 MyRocks 58 11 19 11 30 InnoDB 20 5 50 25 30 MyRocks 59 11 19 10 40 InnoDB 26 7 44 24 40 MyRocks 60 11 20 9 50 InnoDB 35 8 38 19 50 MyRocks 60 11 21 7 60 InnoDB 43 10 36 10 60 MyRocks 61 11 22 6 70 InnoDB 51 12 34 4 70 MyRocks 61 11 23 5 80 InnoDB 55 12 31 1 80 MyRocks 61 11 23 5 90 InnoDB 55 12 32 1 90 MyRocks 61 11 23 4 100 InnoDB 55 12 32 1 100 MyRocks 61 11 24 4


We can see that MyRocks uses a lot of CPU (in us+sys state) no matter how much memory is allocated. This leads to the conclusion that MyRocks performance is limited more by CPU performance than by available memory.

MyRocks directory size

As MyRocks writes all changes and compacts SST files down the road, it would be interesting to see how the data directory size changes during the benchmark so we can estimate our storage needs. Here is a chart of datadirectory size:

We can see that datadirectory goes from 20GB at the start, to 31GB during the benchmark. It is interesting to observe the data growing until compaction shrinks it.


In conclusion, I can say that MyRocks performance increases as the ratio of dataset size to memory increases, outperforming InnoDB by almost five times in the case of 5GB memory allocation. Throughput variation is something to be concerned about, but I hope this gets improved in the future.

MyRocks does not require a lot of memory and shows constant write IO, while using most of the CPU resources.

I think this potentially makes MyRocks a great choice for cloud database instances, where both memory and IO can cost a lot. MyRocks deployments would make it cheaper to deploy in the cloud.

I will follow up with further cloud-oriented benchmarks.

Extras Raw results, scripts and config

My goal is to provide fully repeatable benchmarks. To this end, I’m  sharing all the scripts and settings I used in the following GitHub repo:

MyRocks settings

rocksdb_max_open_files=-1 rocksdb_max_background_jobs=8 rocksdb_max_total_wal_size=4G rocksdb_block_size=16384 rocksdb_table_cache_numshardbits=6 # rate limiter rocksdb_bytes_per_sync=16777216 rocksdb_wal_bytes_per_sync=4194304 rocksdb_compaction_sequential_deletes_count_sd=1 rocksdb_compaction_sequential_deletes=199999 rocksdb_compaction_sequential_deletes_window=200000 rocksdb_default_cf_options="write_buffer_size=256m;target_file_size_base=32m;max_bytes_for_level_base=512m;max_write_buffer_number=4;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=30;max_write_buffer_number=4;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio;compression=kLZ4Compression;bottommost_compression=kLZ4Compression;compression_opts=-14:4:0" rocksdb_max_subcompactions=4 rocksdb_compaction_readahead_size=16m rocksdb_use_direct_reads=ON rocksdb_use_direct_io_for_flush_and_compaction=ON

InnoDB settings

# files  innodb_file_per_table  innodb_log_file_size=15G  innodb_log_files_in_group=2  innodb_open_files=4000 # buffers  innodb_buffer_pool_size= 200G  innodb_buffer_pool_instances=8  innodb_log_buffer_size=64M # tune  innodb_doublewrite= 1  innodb_support_xa=0  innodb_thread_concurrency=0  innodb_flush_log_at_trx_commit= 1  innodb_flush_method=O_DIRECT_NO_FSYNC  innodb_max_dirty_pages_pct=90  innodb_max_dirty_pages_pct_lwm=10  innodb_lru_scan_depth=1024  innodb_page_cleaners=4  join_buffer_size=256K  sort_buffer_size=256K  innodb_use_native_aio=1  innodb_stats_persistent = 1  #innodb_spin_wait_delay=96 # perf special  innodb_adaptive_flushing = 1  innodb_flush_neighbors = 0  innodb_read_io_threads = 4  innodb_write_io_threads = 2  innodb_io_capacity=2000  innodb_io_capacity_max=4000  innodb_purge_threads=4  innodb_adaptive_hash_index=1

Hardware spec

Supermicro server:

  • CPU:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
  • Memory: 256GB of RAM
  • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
  • Filesystem: ext4
  • Percona-Server-5.7.21-20
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

The post A Look at MyRocks Performance appeared first on Percona Database Performance Blog.

Keep Sensitive Data Secure in a Replication Setup

This blog post describes how to keep sensitive data secure on slave servers in a MySQL async replication setup.

Almost every web application has a sensitive data: passwords, SNN, credit cards, emails, etc. Splitting the database to secure and “public” parts allows for restricting user and application parts access to sensitive data.

Field encryption

This is based on MySQL encryption functions or on client-side encryption when the authorized user knows a secret, but encrypted data is distributed to all slaves.

  • If possible, use hashes with a big enough salt, and do not store real sensitive data in the database. A good example is passwords. An end-user sends the login and password, application/SQL code calculates the hash with a salt value unique for each end-user and compares the hash with the value stored in the database. Even if the attacker gets the hashes, it’s still hard or even impossible to extract real passwords for all users. Make sure that you are using a good random number generator for the salt, application-side secret, and a good hash function (not MD5).
  • Encryption is not suitable if you are going to provide public access to your database (via slave dumps in sql/csv/xml/json format).
  • Encryption is a complex topic. Check here for a good blog post explaining hashing usage, and try to find a security consultant if you are inventing some “new” method of storing and encrypting data.
Field encryption example

I’m using a single server setup, because the most important part of data separation should be done on the application side. The secure part of the application has a secret passphrase. For example, you can place the code working with authentication, full profile and payments on a separate server and use a dedicated MySQL account.

create database encrypted; use encrypted; create table t(c1 int, c2 varchar(255), rnd_pad varbinary(16), primary key(c1)); SET block_encryption_mode = 'aes-256-cbc'; SET @key_str = SHA2('My secret passphrase',512); SET @init_vector = RANDOM_BYTES(16); insert into t (c1,c2, rnd_pad) values (1, AES_ENCRYPT('Secret', @key_str, @init_vector), @init_vector); -- decrypt data select c1, AES_DECRYPT(c2,@key_str, rnd_pad) from t;

  • GOOD: Master and slave servers have exactly the same data and no problems with replication.
  • GOOD: Even if two different end-users have exactly the same password, the stored values are different due to random bytes in the init vector for AES encryption.
  • GOOD: Both the encryption and random number generation uses an external library (openssl).
  • CONF: It’s important to have binlog_format=ROW to avoid sending the secret to slave servers.
  • CONF: Do not allow end-users to change data without changing the init_vector, especially for small strings without random padding. Each update should cause init_vector re-generation.
  • BAD: Encrypted data is still sent to slave servers. If the encryption algorithm or protocol is broken, it is possible to get access to data from an insecure part of the application.
  • BAD: The described protocol still could be insecure.
Replication filters

There are two types of replication filters: a master-side with binlog-*db and a slave-side with replicate-*.

Both could cause replication breakage. Replication filters were created for STATEMENT-based replication and are problematic with modern binlog_format=ROW + gtid_mode=on setup. You can find several cases related to database-level slave-side filters in this blog post. If you still need slave-side filtering, use per-table replicate-wild-*-table options.


Even if binary logging is disabled for a specific database, the statement still could be stored in the binary log if it’s a DDL statement, or if the binlog_format is STATEMENT or MIXED and default database is not used by the statement. For details, see the reference manual for the binlog-do-db option. In order to avoid replication issues, you should use ROW-based replication and run SET SESSION sql_log_bin=0; before each DDL statement is executed against the ignored database. It’s not a good idea to use binlog-do-db, because you are losing control of what should be replicated.

Why is binary log filtering useful? Changing the sql_log_bin variable is prohibited inside transactions. The sql_log_bin is DANGEROUS, please do not use it instead of binlog-ignore-db in production on the application side. If you need it for database administration, make sure that you are always typing the “session” word before sql_log_bin. This makes problematic consistent updates of multiple entities inside database.

We still should have the ability to hide just one column from the table. But if we are ignoring the database, we should provide a method of reading non-secure data on slaves / by restricted MySQL accounts. This is possible with triggers and views:

create database test; set session sql_log_bin=0; create table test.t(c1 int, c2 int, primary key(c1)); alter table test.t add primary key(c1); set session sql_log_bin=1; create database test_insecure; create table test_insecure.t(c1 int, c2 int default NULL, primary key(c1)); use test delimiter // create trigger t_aft_ins after insert on test.t FOR EACH ROW BEGIN INSERT test_insecure.t (c1) values (NEW.c1); END // create trigger t_aft_upd after update on test.t FOR EACH ROW BEGIN UPDATE test_insecure.t SET c1 = NEW.c1 WHERE c1 = OLD.c1; END // create trigger t_aft_del after delete on test.t FOR EACH ROW BEGIN DELETE FROM test_insecure.t WHERE c1 = OLD.c1; END // delimiter ; -- just on slave: create database test; create view test.t as select * from test_insecure.t; -- typical usage INSERT INTO test.t values(1,1234); SELECT * from test.t; -- works on both master and slave, c2 field will have NULL value on slave.

  • BAD: The data is not the same on the master and slaves. It potentially breaks replication. It’s not possible to use a slave’s backup to restore the master or promote the slave as a new master.
  • BAD: Triggers could reduce DML statement performance.
  • GOOD: The sensitive data is not sent to slaves at all (and not written to binary log).
  • GOOD: It works with GTID
  • GOOD: It requires no application changes (or almost no application changes).
  • GOOD: binlog-ignore-db allows us to not use the dangerous sql_log_bin variable after initial table creation.

The post Keep Sensitive Data Secure in a Replication Setup appeared first on Percona Database Performance Blog.

MySQL Support for Ubuntu 18.04 and Fedora 28

Last week, Ubuntu released their newest long term support version, Ubuntu 18.04. As of today, we offer 18.04 packages for recent versions of almost all MySQL products, both for manual download and through our APT repos. Congratulations to the Ubuntu community on another solid release, and a special shout-out to our friends over at Canonical: we […]