Planet MySQL

What Should I Monitor, and How Should I Do It?

Monitoring tools offer two core types of functionality: alerts based on aliveness checks and comparing metrics to thresholds, and displaying time-series charts of status counters. Nagios + Graphite are the prototypical time-series tools that do these things.

But these tools don’t answer the crucial questions about what we should monitor. What kinds of aliveness/health checks should we build into Nagios? Which metrics should we monitor with thresholds to raise alarms, and what should the thresholds be? What graphs should we build of status counters, which graphs should we examine and what do they mean?

We need guiding principles to help answer these questions. This webinar briefly introduces the principles that motivate and inform what we do at VividCortex, then explains which types of health checks and charts are valuable and what conclusions should be drawn from them. The webinar is focused mostly on MySQL database monitoring, but will be relevant beyond MySQL as well. Some of the questions we answer are:

  • What status counters from MySQL are central and core, and which are peripheral?
  • What is the meaning of MySQL status metrics?
  • Which subsystems inside MySQL are the most common causes of problems in production?
  • What is the unit of work-getting-done in MySQL, and how can you measure it?
  • Which open-source tools do a good job at monitoring in the way we recommend at VividCortex?
  • Which new and/or popular open-source tools should you evaluate when choosing a solution?

You will leave this webinar with a solid understanding of the types of monitoring you should be doing, the low-hanging fruit, and tools for doing it. This is not just a sales pitch for VividCortex. Register below, and we will send you a link to the recording and a copy of the slide deck.

Pic Cred

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL for Visual Studio 1.2.4 has been released

MySQL for Visual Studio 1.2.4 has been released

The MySQL Windows Experience Team is proud to announce the release of MySQL for Visual Studio 1.2.4. This is a maintenance release for 1.2.x, and it can be used for production environments.

MySQL for Visual Studio is a product that includes all of the Visual Studio integration functionality to create and manage MySQL databases when developing .NET applications.

This version is appropriate for using with MySQL server versions 5.5-5.6. The 5.7 server version is also compatible but is not fully supported, thus it should be used at your own discretion.

As this is a GA version of the MySQL for Visual Studio product, it can be downloaded by using the product standalone installer at this link or with the MySQL Windows Installer.

What’s new in 1.2.4

  • The MySQL for Visual studio plugin is now available in Visual Studio 2015.
  • We stopped supporting Visual Studio 2008.

Bugs Fixed

  • Added Microsoft Visual Studio 2015 support. (Bug #21438524, Bug #77708)

Known limitations

  • MySQL for Visual Studio project templates are designed to work with MVC 4, but Microsoft Visual Studio 2015 is the first VS version that ships with MVC 5. As a workaround for this release, you must install MVC 4 to get the project templates working with VS 2015. If MVC 4 is not installed, then the MySQL template menus and toolbars will be disabled.
  • If MySQL for Visual Studio 1.2.4 is installed before MVC 4, then you must uninstall and then reinstall the 1.2.4 plugin. Executing a “Change” or “Repair” will not work.
  • The Launch Workbench and Launch MySQL Utilities toolbar buttons are disabled in this release.

Quick links

Enjoy and thanks for the support!

MySQL for Visual Studio Team.

PlanetMySQL Voting: Vote UP / Vote DOWN

Become a MySQL DBA blog series - Database upgrades

Database vendors typically release patches with bug/security fixes on a monthly basis, why should we care? The news is full of reports of security breaches and hacked systems, so unless security is not a concern, you might want to have the most current security fixes on your systems. Major versions are rarer, and usually harder (and riskier) to upgrade to. But they might bring along some important features that make the upgrade worth the effort.

In this blog post, we will cover one of the most basic tasks of the DBA - minor and major database upgrades.  

This is the sixth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include Replication Topology Changes, Schema Changes, High Availability, Backup & Restore, Monitoring & Trending.

MySQL upgrades

Once every couple of years, a MySQL version becomes outdated and is not longer supported by Oracle. It happened to MySQL 5.1 on December 4, 2013, and earlier to MySQL 5.0 on January 9, 2012. It will also happen to MySQL 5.5 somewhere in 2018, 8 years after the GA was released. It means that for both MySQL 5.0 and MySQL 5.1, users cannot rely on fixes - not even for serious, security bugs. This is usually the point where you really need to plan an upgrade of MySQL to a newer version.

You won’t be dealing only with major version upgrades, though - it’s more likely that you’ll be upgrading to minor versions more often, like 5.6.x -> 5.6.y. Most likely, it is so that the newest version brings some fixes for bugs that affect your workload, but it can be any other reason. 

There is a significant difference in the way you perform a major and a minor version upgrade.


Before you can even think about performing an upgrade, you need to decide what kind of testing you need to do. Ideally, you have a staging/development environment where you do tests for your regular releases. If that is the case, the best way of doing pre-upgrade tests will be to build a database layer of your staging environment using the new MySQL version. Once that is done, you can proceed with a regular set of tests. More is better - you want to focus not only on the “feature X works/does not work” aspect but also performance.

On the database side, you can also do some generic tests. For that you would need a list of queries in a slow log format. Then you can use pt-upgrade to run them on both the old and the new MySQL version, comparing the response time and result sets. In the past, we have noticed that pt-upgrade returns a lot of false positives - it may report a query as slow while in fact, the query is perfectly fine on both versions. For that, you may want to introduce some additional sanity checks - parse pt-upgrade output, grab the slow queries it reported, execute them once more on the servers and compare the results again. What you need to keep in mind that you should connect to both old and new database servers in the same way (socket connection will be faster than TCP).

Typical results from such generic tests are queries where the execution plan has changed - usually it’s enough to add some indexes or force the optimizer to pick a correct one. You can also see queries with discrepancies in the result set - it’s most likely a result of lack of explicit ORDER BY in the query - you can’t rely on rows being sorted the way they are if you didn’t sort them explicitly.

Minor version upgrades

A minor upgrade is relatively easy to perform - most of the time, all you need to do is to just install the new version using the package manager of your distribution. Once you do that, you need to ensure that MySQL has been started after the upgrade and then you should run the mysql_upgrade script. This script goes through the tables in your database and ensures all of them are compatible with the current version. It may also fix your system tables if required.

Obviously, installing the new version of a package requires the service to be stopped. Therefore you need to plan the upgrade process. It may slightly differ depending if you use Galera Cluster or MySQL replication.

MySQL replication

When we are dealing with MySQL replication, the upgrade process is fairly simple. You need to upgrade slave by slave, taking them out of rotation for the time required to perform the upgrade (it is a short time if everything goes right, not more than few minutes of downtime). For that you may need to do some temporary changes in your proxy configuration to ensure that the traffic won’t be routed to the slave that is under maintenance. It’s hard to give any details here because it depends on your setup. In some cases, it might not even be needed to make any changes as the proxy can adapt to topology changes on it’s own and detects which node is available and which is not. That’s how you should configure your proxy, by the way.

Once every slave has been updated, you need to execute a planned failover. We discussed the process in an earlier blog post. The process may also depend on your setup. It doesn’t have to be manual one if you have tools to automate it for you (MHA for example). Once a new master is elected and failover is completed, you should perform the upgrade on the old master which, at this point, should be slaving off the new master. This will conclude minor version upgrade for the MySQL replication setup.

Galera Cluster

With Galera, it is somewhat easier to perform upgrades - you need to stop the nodes one by one, upgrade the stopped node and then restart before moving to the next. If your proxy needs some manual tweaks to ensure traffic won’t hit nodes which are undergoing maintenance, you will have to make those changes. If it can detect everything automatically, all you need to do is to stop MySQL, upgrade and restart. Once you gone over all nodes in the cluster, the upgrade is complete.

Major version upgrades

A major version upgrade in MySQL would be 5.x -> 5.y or even 4.x > 5.y. Such upgrade is more tricky and complex that the minor upgrades we just covered in earlier paragraphs.

The recommended way of performing the upgrade is to dump and reload the data - this requires some time (depends on the database size) but it’s usually not feasible to do it while the slave is out of rotation. Even when using mydumper/myloader, the process will take too long. In general, if the dataset is larger than a hundred of gigabytes, it will probably require additional preparations.

While it might be possible to do just a binary upgrade (install new packages), it is not recommended as there could be some incompatibilities in binary format between the old version and the new one, which, even after mysql_upgrade has been executed, may still cause some problems. We’ve seen cases where a binary upgrade resulted is some weird behavior in how the optimizer works, or caused instability. All those issues were solved by performing the dump/reload process. So, while you may be ok to run a binary upgrade, you may also run into serious problems - it’s your call and eventually it’s your decision. If you decide to perform a binary upgrade, you need to do detailed (and time-consuming) tests to ensure it does not break anything. Otherwise you are at risk. That’s why dump and reload is the officially recommended way to upgrade MySQL and that’s why we will focus on this approach to the upgrade.

MySQL replication

If our setup is based on MySQL replication, we will build a slave on the new MySQL version. Let’s say we are upgrading from MySQL 5.5 to MySQL 5.6. As we have to perform a long dump/reload process, we may want to build a separate MySQL host for that. A simplest way would be to use xtrabackup to grab the data from one of the slaves along with the replication coordinates. That data will allow you to slave the new node off the old master. Once the new node (still running MySQL 5.5 - xtrabackup just moves the data so we have to use the same, original, MySQL version) is up and running, it’s time to dump the data. You can use any of the logical backup tools that we discussed in our earlier post on Backup and Restore. It doesn’t matter as long as you can restore the data later.

After the dump had been completed, it’s time to stop the MySQL, wipe out the current data directory, install MySQL 5.6 on the node, initialize the data directory using mysql_install_db script and start the new MySQL version. Then it’s time to load the dumps - a process which also may take a lot of time. Once done, you should have a new and shiny MySQL 5.6 node. It’s time now to sync it back with the master - you can use coordinates collected by xtrabackup to slave the node off a member of the production cluster running MySQL 5.5. What’s important to remember here is that, as you want to eventually slave the node off the current production cluster, you need to ensure that binary logs won’t rotate out. For large datasets, the dump/reload process may take days so you want to adjust expire_logs_days accordingly on the master. You also want to confirm you have enough free disk space for all those binlogs.

Once we have a MySQL 5.6 slaving off MySQL 5.5 master, it’s time to go over the 5.5 slaves and upgrade them. The easiest way now would be to leverage xtrabackup to copy the data from the 5.6 node. So, we take a 5.5 slave out of rotation, stop the MySQL server, wipe out data directory, upgrade MySQL to 5.6, restore data from the other 5.6 slave using xtrabackup. Once that’s done, you can setup the replication again and you should be all set.

This process is much faster than doing dump/reload for each of the slaves - it’s perfectly fine to do it once per replication cluster and then use physical backups to rebuild other slaves. If you use AWS, you can rely on EBS snapshots instead of xtrabackup. Similar to the logical backup, it doesn’t really matter how you rebuild the slaves as long as it will work.

Finally, once all of the slaves were upgraded, you need to failover from the 5.5 master to one of the 5.6 slaves. At this point it may happen that you won’t be able to keep the 5.5 in the replication (even if you setup master - master replication between them). In general, replicating from a new version of MySQL to an older one is not supported - replication might break. One way or another, you’ll want to upgrade and rebuild the old master using the same process as with slaves.

Galera Cluster

Compared to MySQL Replication, Galera is, at the same time, both trickier and easier to upgrade. A cluster created with Galera should be treated as a single MySQL server. This is crucial to remember when discussing Galera upgrades - it’s not a master with some slaves or many masters connected to each other - it’s like a single server. To perform an upgrade of a single MySQL server you need to either do the offline upgrade (take it out of rotation, dump the data, upgrade MySQL to 5.6, load the data, bring it back into rotation) or create a slave, upgrade it and finally failover to it (the process we described in the previous section, while discussing MySQL replication upgrade).

Same thing applies for Galera cluster - you either take everything down for the upgrade (all nodes) or you have to build a slave - another Galera cluster connected via MySQL replication.

An online upgrade process may look as follows. For starters, you need to create the slave on MySQL 5.6 - process is exactly the same as above: create a node with MySQL 5.5 (it can be a Galera but it’s not required), use xtrabackup to copy the data and replication coordinates, dump the data using a logical backup tool, wipe out the data directory, upgrade MySQL to 5.6 Galera, bootstrap the cluster, load the data, slave the node off the 5.5 Galera cluster.

At this point you should have two Galera clusters - 5.5 and a single node of Galera 5.6, both connected via replication. Next step will be to build the 5.6 cluster to a production size. It’s hard to tell how to do it - if you are in the cloud, you can just spin up new instances. If you are using colocated servers in a datacenter, you may need to move some of the hardware from the old to the new cluster. You need to keep in mind the total capacity of the system to make sure it can cope with some nodes taken out of rotation. While hardware management may be tricky, what is nice is that you don’t have to do much regarding building the 5.6 cluster - Galera will use SST to populate new nodes automatically.

In general, the goal of this phase is to build a 5.6 cluster that’s large enough to handle the production workload. Once it’s done, you need to failover to 5.6 Galera cluster - this will conclude the upgrade. Of course, you may still need to add some more nodes to it but it’s now a regular process of provisioning Galera nodes, only now you use 5.6 instead of 5.5.


Blog category: Tags:
PlanetMySQL Voting: Vote UP / Vote DOWN

Announced – MySQL 5.7 Community Contributor Award Program 2015!

I am glad to share this post with community. It is really great story for a young guy to be among experts!

Today Lenka Kasparova announced results of: MySQL 5.7 Community Contributor Award Program 2015!

Awarded as:

Shahriyar Rzayev, the Azerbaijan MySQL User Group leader for reproducing bugs in MySQL 5.7

See my MySQL BUG database activity -> Reporter: Shahriyar Rzayev

Also there is a gift from MySQL Community team:

The post Announced – MySQL 5.7 Community Contributor Award Program 2015! appeared first on Azerbaijan MySQL UG.

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL QA Episode 9: Reducing Testcases for Experts: multi-threaded

Welcome to MySQL QA Episode 9. This episode will go more in-depth into Reducing Testcases for Experts: multi-threaded

We will explore how to use to do true multi-threaded testcase reduction – a world’s first.


  1. Expert configurable variables & their default settings

Full-screen viewing @ 720p resolution recommended.

The post MySQL QA Episode 9: Reducing Testcases for Experts: multi-threaded appeared first on MySQL Performance Blog.

PlanetMySQL Voting: Vote UP / Vote DOWN

Webinar: Learn how to add HA and DR to MySQL operating on-prem and in VMware vCloud Air public cloud

Join us this Thursday to learn how VMware Continuent adds HA, DR and real-time data warehouse loading to off-the-shelf MySQL operating on-prem and in VMware vCloud Air public cloud.  We introduce vCloud Air basics, then do a deep dive into the VMware Continuent system architecture covering important issues like fail-over, zero-downtime maintenance, and load scaling. We will conclude with a
PlanetMySQL Voting: Vote UP / Vote DOWN

MariaDB 10.1.6 now available

Download MariaDB 10.1.6

Release Notes Changelog What is MariaDB 10.1?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.6. This is a Beta release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 10.1? page in the MariaDB Knowledge Base for general information about the MariaDB 10.1 series.

Thanks, and enjoy MariaDB!

PlanetMySQL Voting: Vote UP / Vote DOWN

FromDual.en: MySQL Environment MyEnv 1.2.1 has been released

Taxonomy upgrade extras: myenvoperationMySQL Operationsmulti instanceconsolidationtestingupgrade

FromDual has the pleasure to announce the release of the new version 1.2.1 of its popular MySQL, Galera Cluster, MariaDB and Percona Server multi-instance environment MyEnv.

The new MyEnv can be downloaded here.

In the inconceivable case that you find a bug in the MyEnv please report it to our bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to

Upgrade from 1.1.x to 1.2.1 # cd ${HOME}/product # tar xf /download/myenv-1.2.1.tar.gz # rm -f myenv # ln -s myenv-1.2.1 myenv

If you are using plug-ins for showMyEnvStatus create all the links in the new directory structure:

cd ${HOME}/product/myenv ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/
Changes in MyEnv 1.2.1 MyEnv
  • Bug in myenv.server fixed (chmod user:).
  • Distribution check for CentOS 7.1 fixed (bug report Jörg).
  • Directory /var/run/mysqld is created by myenv.server
  • Missing function output added.
  • Output changes unified, execute with LC_ALL=C made language independent.
MyEnv Installer
  • Template my.cnf extended by hostname and instance name.
  • Skip in init script implemented, (bug report Jörg).
  • MyEnv my.cnf.template merged with website.
  • Template AUTO_INCREMENT values fixed.
  • Galera Cluster variable causal read in my.cnf template updated.
  • Functions separated into own library.
  • All templates moved from etc to tpl.
  • my.cnf template provided with hostname tag.
MyEnv Utilities
  • Skript alter_engine.php now supports socket.
  • Check in alter_engine.php for Primary Key length of 767 bytes was removed because it was wrong.
  • Skript decrypt_mylogin_cnf.php added.
  • Script log_maintenance.php added.
  • Usage and --help added to

For subscriptions of commercial use of MyEnv please get in contact with us.

PlanetMySQL Voting: Vote UP / Vote DOWN

Yet another MySQL 5.7 silent change

When a new version of MySQL appears, the first source of information for the brave experimenter is a page in the manual named What is new in MySQL X.X, also known as MySQL in a nutshell. For MySQL 5.7, the in-a-nutshell page lists quite a lot of changes. In that page, the list of removed features is enough to send a chill down the spine of most any DBA. Some of the items in the deprecation section are removals in disguise, as they require immediate, rather than delayed, action to use the new version with existing application (SET PASSWORD comes to mind immediately.)

However, for all the abundance of changes, there is at least one (I fear I may find more) that is not mentioned in that page, and yet it may cause some discomfort to users:

mysql [localhost] {msandbox} (information_schema) > select @@version;
| @@version |
| 5.7.8-rc |
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > select * from GLOBAL_STATUS;
Empty set, 1 warning (0.00 sec)

mysql [localhost] {msandbox} (information_schema) > show warnings\G
*************************** 1. row ***************************
Level: Warning
Code: 1287
Message: 'INFORMATION_SCHEMA.GLOBAL_STATUS' is deprecated and will be removed in
a future release. Please use performance_schema.global_status instead
1 row in set (0.00 sec)

Did I miss a deprecation information in MySQL 5.6? Apparently not. The manual says that information_schema.GLOBAL_STATUS is deprecated as of 5.7.6 (and so are GLOBAL_VARIABLES, SESSION_STATUS, and SESSION_VARIABLES).

Until recently, a deprecation notice used to mean that the feature would be removed in a future version, but it remained in place, so I could get organized to adapt my procedures to the recommended changes. So I would expect that a deprecation in 5.7 would become a removal in 5.8, and a removal in 5.7 would mean that the feature had been deprecated in a previous release. But in this case, the deprecation is effectively killing the feature today. It’s no use to me if the information_schema.GLOBAL_VARIABLES is still there when it does not return results. This change can break lots of procedures that check the %_STATUS and %_VARIABLES tables to see if a given variable is available or not. For example, after enabling the semi-sync replication plugin, we want to check if the procedure was successful:

master [localhost] {msandbox} ((none)) > select version();
| version() |
| 5.6.25-log |
1 row in set (0.00 sec)

master [localhost] {msandbox} ((none)) > select * from information_schema.global_variables where variable_name='rpl_semi_sync_master_enabled';
1 row in set (0.00 sec)

The same query does not give us what we expect in MySQL 5.7.8.

master [localhost] {msandbox} ((none)) > select * from information_schema.global_variables where variable_name='rpl_semi_sync_master_enabled';
Empty set, 1 warning (0.00 sec)

master [localhost] {msandbox} ((none)) > show warnings\G
*************************** 1. row ***************************
Level: Warning
Code: 1287
Message: 'INFORMATION_SCHEMA.GLOBAL_VARIABLES' is deprecated and will be removed in a future release. Please use performance_schema.global_variables instead
1 row in set (0.00 sec)

When we query the new table, we get it.

master [localhost] {msandbox} ((none)) > select * from performance_schema.global_variables where variable_name='rpl_semi_sync_master_enabled';
| rpl_semi_sync_master_enabled | ON |
1 row in set (0.00 sec)

However, if we had an automated procedure that was querying the information_schema table for this variable, the result of the operation would suggest that the plugin had not been installed yet or that the installation had failed.

If this is a true deprecation, the old table should continue working. If it is a change like removing the password column or changing the syntax of SET PASSWORD, users would be better off with a complete removal of the offending table. In that case, at least they would be notified of the problem and fix it. As it is now, they may waste time trying to find the cause of incomprehensible failures such as Bug#77732.

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 5.7 : no more password column!

Maintaining a project like MySQL::Sandbox is sometimes tiring, but it has its advantages. One of them is that everything related to the server setup comes to my attention rather earlier than if I were an average DBA or developer.

I try to keep MySQL Sandbox up to date with every release of MySQL and (to a lesser extent) MariaDB [1]. For this reason, I am used to trying a new release with MySQL Sandbox, and … seeing it fail.

Of the latest changes in MySQL, probably the most disruptive was what happened in MySQL 5.7.6, where the mysql.user table lost the password column.

Yep. No ‘password’ column anymore. And just to make the setup procedure harder, the syntax of SET PASSWORD was changed, and deprecated.

Previously, I could run:

mysql [localhost] {msandbox} (mysql) > select version();
| version() |
| 5.6.25 |
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select host,user,password from user;
| host | user | password |
| localhost | root | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | msandbox | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | rsandbox | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |
8 rows in set (0.00 sec)

In the latest releases, though, this fails.

mysql [localhost] {msandbox} (mysql) > select version();
| version() |
| 5.7.8-rc |
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select host,user,password from user;
ERROR 1054 (42S22): Unknown column 'password' in 'field list'

Instead of a password column (which was CHAR(41)), we have now an authentication_string column of type TEXT.

| host | user | authentication_string |
| localhost | root | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | msandbox | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.% | rsandbox | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |

Fixing MySQL Sandbox to handle this issue and to be at the same time compatible with previous releases was quite challenging, but in the end I did it. Recent versions of the sandbox can handle all the releases from Oracle, Percona, and MariaDB without showing hiccups.

So, for testing, the issue is solved. Now comes the hard part: when thousands of database administration procedures will break down for lack of the password column. To all the DBAs and database developers out there: good luck!

  1. It is my pleasure to disclose that MariaDB 10.1 runs in MySQL Sandbox 3.0.55+, with only minimal changes.  ↩

PlanetMySQL Voting: Vote UP / Vote DOWN

What makes a MySQL server failure/recovery case?

Or: How do you reach the conclusion your MySQL master/intermediate-master is dead and must be recovered?

This is an attempt at making a holistic diagnosis of our replication topologies. The aim is to cover obvious and not-so-obvious crash scenarios, and to be able to act accordingly and heal the topology.

At we are dealing with very large amounts of MySQL servers. We have many topologies, and many servers in each topology. See past numbers to get a feel for it. At these numbers failures happen frequently. Typically we would see normal slaves failing, but occasionally -- and far more frequently than we would like to be paged for -- an intermediate master or a master would crash. But our current (and ever in transition) setup also include SANs, DNS records, VIPs, any of which can fail and bring down our topologies.

Tackling issues of monitoring, disaster analysis and recovery processes, I feel safe to claim the following statements:

  • The fact your monitoring tool cannot access your database does not mean your database has failed.
  • The fact your monitoring tool can access your database does not mean your database is available.
  • The fact your database master is unwell does not mean you should fail over.
  • The fact your database master is alive and well does not mean you should not fail over.

Bummer. Let's review a simplified topology with a few failure scenarios. Some of these scenarios you will find familiar. Some others may be caused by setups you're not using. I would love to say I've seen it all but the more I see the more I know how strange things can become.

We will consider the simplified case of a master with three replicas: we have M as master, A, B, C as slaves.


A common monitoring scheme is to monitor each machine's IP, availability of MySQL port (3306) and responsiveness to some simple query (e.g. "SELECT 1"). Some of these checks may run local to the machine, others remote.

Now consider your monitoring tool fails to connect to your master.

I've marked the slaves with question marks as the common monitoring schema does not associate the master's monitoring result to the slaves'.  Can you safely conclude your master is dead? Are your feeling comfortable with initiating a failover process? How about:

  • Temporary network partitioning; it just so happens that your monitoring tool cannot access the master, though everyone else can.
  • DNS/VIP/name cache/name resolving issue. Sometimes similar to the above; does you monitoring tool host think the master's IP is what it really is? Has something just changed? Some cache expired? Some cache is stale?
  • MySQL connection rejection. This could be due to a serious "Too many connections" problem on the master, or due to accidental network noise.

Now consider the following case: a first tier slave is failing to connect to the master:

The slave's IO thread is broken; do we have a problem here? Is the slave failing to connect because the master is dead, or because the slave itself suffers from a network partitioning glitch?

A holistic diagnosis

In the holistic approach we couple the master's monitoring with that of its direct slaves. Before I continue to describe some logic, the previous statement is something we must reflect upon.

We should associate the master's state with that of its direct slaves. Hence we must know which are its direct slaves. We might have slaves D, E, F, G replicating from B, C. They are not in our story. But slaves come and go. Get provisioned and de-provisioned. They get repointed elsewhere. Our monitoring needs to be aware of the state of our replication topology.

My preferred tool for the job is orchestrator, since I author it. It is not a standard monitoring tool and does not serve metrics; but it observes your topologies and records them. And notes changes. And acts as a higher level failure detection mechanism which incorporates the logic described below.

We continue our discussion under the assumption we are able to reliably claim we know our replication topology. Let's revisit our scenarios from above and then add some.

We will further only require MySQL client protocol connection to our database servers.

Dead master

A "real" dead master is perhaps the clearest failure. MySQL has crashed (signal 11); or the kernel panicked; or the disks failed; or power went off. The server is really not serving. This is observed as:

In the holistic approach, we observe that:

  • We cannot reach the master (our MySQL client connection fails).
  • But we are able to connect to the slaves A, B, C
  • And A, B, C are all telling us they cannot connect to the master

We have now cross referenced the death of the master with its three slaves. Funny thing is the MySQL server on the master may still be up and running. Perhaps the master is suffering from some weird network partitioning problem (when I say "weird", I mean we have it; discussed further below). And perhaps some application is actually still able to talk to the master!

And yet our entire replication topology is broken. Replication is not there for beauty; it serves our application code. And it's turning stale. Even if by some chance things are still operating on the master, this still makes for a valid failover scenario.

Unreachable master

Compare the above with:

Our monitoring scheme cannot reach our master. But it can reach the slaves, an they're all saying: "I'm happy!"

This gives us suspicion enough to avoid failing over. We may not actually have a problem: it's just us that are unable to connect to the master.


There are still interesting use cases. Consider the problem of "Too many connections" on the master. You are unable to connect; the application starts throwing errors; but the slaves are happy. They were there first. They started replicating at the dawn of time, long before there was an issue. Their persistent connections are good to go.

Or the master may suffer a deadlock. A long, blocking ALTER TABLE. An accidental FLUSH TABLES WITH READ LOCK. Or whatever occasional bug we hit. Slaves are still connected; but new connections are hanging; and your monitoring query is unable to process.

And still our holistic approach can find that out: as we are able to connect to our slaves, we are also able to ask them: well what have your relay logs have to say about this? Are we progressing in replication position? Do we actually find application content in the slaves' relay logs? We can do all this via MySQL protocol ("SHOW SLAVE STATUS", "SHOW RELAYLOG EVENTS").

Understanding the topology gives you greater insight into your failure case; you have increasing leevels of confidentiality in your analysis. Strike that: in your automated analysis.

Dead master and slaves

They're all gone!

You cannot reach the master and you cannot reach any of its slaves. Once you are able to associate your master and slaves you can conclude you either have a complete DC power failure problem (or is this cross DC?) or you are having a network partitioning problem. Your application may or may not be affected -- but at least you know where to start. Compare with:

Failed DC

I'm stretching it now, because when a DC fails all the red lights start flashing. Nonetheless, if M, A, B are all in one DC and C is on another, you have yet another diagnosis.

Dead master and some slaves

Things start getting complicated when you're unable to get an authorized answer from everyone. What happens if the master is dead as well as one of its slaves? We previously expected all slaves to say "we cannot replicate". For us, master being unreachable, some slaves being dead and all other complaining on IO thread is good enough indication that the master is dead.

All first tier slaves not replicating

Not a failover case, but certainly needs to ring the bells. All master's direct slaves are failing replication on some SQL error or are just stopped. Our topology is turning stale.

Intermediate masters

With intermediate master the situation is not all that different. In the below:

The servers E, F, G replicating from C provide us with the holistic view on C. D provides the holistic view on A.

Reducing noise

Intermediate master failover is a much simpler operation than master failover. Changing masters require name resolve changes (of some sort), whereas moving slaves around the topology affects no one.

This implies:

  • We don't mind over-reacting on failing over intermediate masters
  • We pay with more noise

Sure, we don't mind failing over D elsewhere, but as D is the only slave of A, it's enough that D hiccups that we might get an alert ("all" intermediate master's slaves are not replicating). To that effect orchestrator treats single slave scenarios differently than multiple slaves scenarios.

Not so fun setups and failures

At we are in transition between setups. We have some legacy configuration, we have a roadmap, two ongoing solutions, some experimental setups, and/or all of the above combined. Sorry.

Some of our masters are on SAN. We are moving away from this; for those masters on SANs we have cold standbys in an active-passive mode; so master failure -> unmount SAN -> mount SAN on cold standby -> start MySQL on cold standby -> start recovery -> watch some TV -> go shopping -> end recovery.

Only SANs fail, too. When the master fails, switching over to the cold standby is pointless if the origin of the problem is the SAN. And given that some other masters share the same SAN... whoa. As I said we're moving away from this setup for Pseudo GTID and then for Binlog Servers.

The SAN setup also implied using VIPs for some servers. The slaves reference the SAN master via VIP, and when the cold standby start up it assumes the VIP, and the slaves know nothing about this. Same setup goes for DC masters. What happens when the VIP goes down? MySQL is running happily, but slaves are unable to connect. Does that make for a failover scenario? For intermediate masters we're pushing it to be so, failing over to a normal local-disk based server; this improves out confidence in non-SAN setups (which we have plenty of, anyhow).

Double checking

You sample your server once every X seconds. But in a failover scenario you want to make sure your data is up to date. When orchestrator suspects a dead master (i.e. cannot reach the master) it immediately contacts its direct slaves and checks their status.

Likewise, when orchestrator sees a first tier slave with broken IO thread, it immediately contacts the master to check if everything is fine.

For intermediate masters orchestrator is not so concerned and does not issue emergency checks.

How to fail over

Different story. Some other time. But failing over makes for complex decisions, based on who the replicating slaves are; with/out log-slave-updates; with-out GTID; with/out Pseudo-GTID; are binlog servers available; which slaves are available in which data centers. Or you may be using Galera (we're not) which answers most of the above.

Anyway we use orchestrator for that; it knows our topologies, knows how they should look like, understands how to heal them, knows MySQL replication rules, and invokes external processes to do the stuff it doesn't understand.

PlanetMySQL Voting: Vote UP / Vote DOWN

Testing the MySQL JSON Labs Release using MySQL sandbox

The MySQL 5.7.7 release candidate has been available for several months, but it doesn't include the new JSON datatype or built-in JSON functions. Those are currently only available in the MySQL JSON Labs Release. Unlike the regular 5.7.7 release, the MySQL JSON Labs Release is only available in two download formats:

In order to try out the new JSON data type and functions on my Mac laptop, I need to build it from source.

Read on to see how I did that with the help of MySQL Sandbox.


Download the tarball

cd ~ wget

Extract the tarball contents

tar xvf mysql-5.7.7-labs-json.tar.gz

Build a new tarball for my OS

cd ~/mysql-5.7.7-labs-json/ mkdir bld cd bld cmake .. -DBUILD_CONFIG=mysql_release -DDOWNLOAD_BOOST=1 -DWITH_BOOST=~/my_boost/ make && ./scripts/make_binary_distribution

Create a sandbox server

make_sandbox --add_prefix=json --export_binaries mysql-5.7.7-labs-json-osx10.8-x86_64.tar.gz ```

Next I connect to my sandbox, verify the version, and run a quick test using the new JSON datatype and JSON functions:

``` $HOME/sandboxes/msb_json5_7_7/use

mysql [localhost] {msandbox} ((none)) > select version(); +-----------------+ | version() | +-----------------+ | 5.7.7-labs-json | +-----------------+ 1 row in set (0.00 sec)

mysql [localhost] {msandbox} ((none)) > use test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A

Database changed

mysql [localhost] {msandbox} (test) > create table json_table (json_string json); Query OK, 0 rows affected (0.06 sec)

mysql [localhost] {msandbox} (test) > insert into json_table (json_string)

-> values ('{"a":["hello world","goodbye world"]}');

Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select jsn_extract(json_string,'$.a[0]')

-> from json_table;

+-----------------------------------+ | jsn_extract(json_string,'$.a[0]') | +-----------------------------------+ | "hello world" | +-----------------------------------+ 1 row in set (0.00 sec) ```

PlanetMySQL Voting: Vote UP / Vote DOWN

Log Buffer #433: A Carnival of the Vanities for DBAs

This Log Buffer Edition covers Oracle, SQL Server and MySQL blogs of the running week.


  • While checking the sources of the Cassandra/NetBeans integration into GitHub yesterday, something went very badly wrong and ALL the source files in my Maven project that disappeared!
  • AWR Reports, Performance Hub, historisches SQL Monitoring in 12c
  • Oracle Database Mobile Server 12c: Advanced data synchronization engine
  • ORA-39001, ORA-39000 and ORA-39142
  • ORA-15410: Disks in disk group do not have equal size

SQL Server:

  • SAN and NAS protocols and how they impact SQL Server
  • SQL Style Habits: Attack of the Skeuomorphs
  • Is It Worth Writing Unit Tests?
  • Large SQL Server Database Backup on an Azure VM and Archiving
  • Reporting Services: Drawing a Buffer on a Map


  • MySQL Tcpdump system : use percona-toolkit to analyze network packages
  • Replication in real-time from Oracle and MySQL into data warehouses and analytics
  • Altering tablespace of table – new in MySQL 5.7
  • MySQL QA Episode 8: Reducing Testcases for Engineers: tuning
  • MySQL upgrade 5.6 with innodb_fast_checksum=1


Learn more about Pythian’s expertise in Oracle , SQL ServerMySQL, as well as the author Fahd Mirza.


The post Log Buffer #433: A Carnival of the Vanities for DBAs appeared first on Pythian - Data Experts Blog.

PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB vs TokuDB in LinkBench benchmark

Previously I tested Tokutek’s Fractal Trees (TokuMX & TokuMXse) as MongoDB storage engines – today let’s look into the MySQL area.

I am going to use modified LinkBench in a heavy IO-load.

I compared InnoDB without compression, InnoDB with 8k compression, TokuDB with quicklz compression.
Uncompressed datasize is 115GiB, and cachesize is 12GiB for InnoDB and 8GiB + 4GiB OS cache for TokuDB.

Important to note is that I used tokudb_fanout=128, which is only available in our latest Percona Server release.
I will write more on Fractal Tree internals and what does tokudb_fanout mean later. For now let’s just say it changes the shape of the fractal tree (comparing to default tokudb_fanout=16).

I am using two storage options:

  • Intel P3600 PCIe SSD 1.6TB (marked as “i3600” on charts) – as a high end performance option
  • Crucial M500 SATA SSD 900GB (marked as “M500” on charts) – as a low end SATA SSD

The full results and engine options are available here

Results on Crucial M500 (throughput, more is better)

    Engine Throughput [ADD_LINK/10sec]

  • InnoDB: 6029
  • InnoDB 8K: 6911
  • TokuDB: 14633

There TokuDB outperforms InnoDB almost two times, but also shows a great variance in results, which I correspond to a checkpoint activity.

Results on Intel P3600 (throughput, more is better)

  • Engine Throughput [ADD_LINK/10sec]
  • InnoDB: 27739
  • InnoDB 8K: 9853
  • TokuDB: 20594

To understand the reasoning why InnoDB shines on a fast storage let’s review IO usage by all engines.
Following chart shows Reads in KiB, that engines, in average, performs for a request from client.

Following chart shows Writes in KiB, that engines, in average, performs for a request from client.

There we can make interesting observations that TokuDB on average performs two times less writes than InnoDB, and this is what allows TokuDB to be better on slow storages. On a fast storage, where there is no performance penalty on many writes, InnoDB is able to get ahead, as InnoDB is still better in using CPUs.

Though, it worth remembering, that:

  • On a fast expensive storage, TokuDB provides a better compression, which allows to store more data in limited capacity
  • TokuDB still writes two time less than InnoDB, that mean twice longer lifetime for SSD (still expensive).

Also looking at the results, I can make the conclusion that InnoDB compression is inefficient in its implementation, as it is not able to get befits: first, from doing less reads (well, it helps to get better than uncompressed InnoDB, but not much); and, second, from a fast storage.

The post InnoDB vs TokuDB in LinkBench benchmark appeared first on MySQL Performance Blog.

PlanetMySQL Voting: Vote UP / Vote DOWN

The Q&A: Creating best-in-class backup solutions for your MySQL environment

Thank you for attending my July 15 webinar, “Creating Best in Class Backup solutions for your MySQL environment.” Due to the amount of content we discussed and some minor technical difficulties faced near the end of webinar we have decided to cover the final two slides of the presentation along with the questions asked by attendees during the webinar via this blog post.

The slides are available for download. And you can watch the webinar in it’s entirety here.

The final two slides were about our tips for having a good backup and recovery strategy. Lets see the bullet points along with what would have been their explanation during the webinar :

  • Use the three types of backups
    • Binary for full restores, new slaves
      • Binary backups are easy to restore, plus takes the least amount of time to restore. The mean time to recover is mostly bound by the time to transfer backup to the appropriate target server,
    • Logical for partial restores
      • Logical backups, especially when done table-wise come in handy when you’re wanting to restore one or few smaller tables,
    • Binlog for point in time recovery
      • Very often the need is to have Point In Time Recovery, with a Full backup of any type (Logical or Binary) its half the story, we still need the DML statements processed on the server in order to bring it to the latest state, thats where Binary logs (Binlog) backups come into picture.
  • Store on more than one server and off-site
    •  Store your backups on more than one location, what if the backup server goes down ? Considering offsite storages like Amazon S3 and Glacier with weekly or monthly backups retention can be cheaper options.
  • Test your backups!!!!
    • Testing your backups is very important, its always great to know backups are recoverable and not corrupted. Spin off an EC2 instance if you want, copy and restore the backup there, roll-forward a days worth of binlogs just to be sure.
  • Document restore procedures, script them and test them!!!
    • Also when you test your backups, make sure to document the steps to restore the backup to avoid last minute hassle over which commands to use.
  • If taking from a slave run pt-table-checksum
    • Backups are mostly taken from slaves, as such make sure to checksum them regularly, you dont wanna backup inconsistent data. 
  • Configuration files, scripts
    • Data is not the only thing you should be backing up, backup your config files, scripts and user access at a secure location.
  • Do you need to backup everything all days?
    • For very large instances doing a logical backup is a toughie, in such cases evaluate your backup needs, do you want to backup all the tables ? Most of the time smaller tables are the more important ones, and needs partial restore, backup only those.
  • Hardlinking backups can save lot of disk space in some circumstances
    • There are schemas which contains only a few high activity tables, rest of them are probably updated once a week or are updated by an archiver job that runs montly, make sure to hardlink the files with the previous backup, it can save good amount of space in such scenarios.
  • Monitor your Backups
    • Lastly, monitor your backups. You do not want to realize that you’re backup had been failing the whole time. Even a simple email notification from your backup scripts can help reduce the chance of failure.

Now lets try to answer some of the questions asked during the webinar :

Q : –use-memory=2G, is that pretty standard, if we have more more, should we have a higher value?
Usually we would evaluate the value based on size of xtrabackup_logfile (amount of transactions to apply). If you have more free memory feel free to provide it to –use-memory, you dont want to let the memory be a bottleneck in the restore process.

Q : which is the best backup option for a 8Tb DB?
Usually it would depend on what type of data would you have and business requirements for the backups. For eg: a full xtrabackup and later incrementals on the weekdays would be a good idea. Time required for backups play an important role here, backing up to a slow NAS share can be time consuming, and it will make xtrabackup record lot of transactions which will further increase your restore time. Also look into backing up very important medium-small size tables via logical backups.

Q : I’m not sure if this will be covered, but if you have a 3 master-master-master cluster using haproxy, is it recommended to run the backup from the haproxy server or directly on a specific server? Would it be wise to have a 4th server which would be part of the cluster, but not read from to perform the backups?
I am assuming this a Galera cluster setup, in which case you can do backups locally on any of the node by using tools like percona xtrabackup, however the best solution would be spinning off a slave from one of the nodes and running backups there.

Q : With Mudumper, can we strem the data over SSH or netcat to another server? Or would one have to use something like NFS? I’ve used mysqldump and piped it over netcat before.. curious if we can do that with Mydumper ?
Mydumper is similar in nature with other mysql client tools. They can be run remotely (–host option). Which means you can run mydumper from another server to backup from the master or slave. Mydumper can be piped for sure too.

Q : Is Mydumper still maintained. It hasn’t had a release since March of last year?
Indeed, Max Bubenick from Percona is currently maintaining the project. Actually he has added new features to the tool which  makes it more comprehensive and feature rich. He is planning the next release soon, stay tuned for the blog post.

Q : Is MyDumper an opensource ? prepare and restore are same ?
Absolutely. Right now we need to download the source and compile, however very soon we will have packages built for it too. Prepare and Restore are common terminologies used in the backup lingo, in the webinar, Restore means copying back the backup files from its storage location to the destination location, whereas Prepare means applying the transactions to the backup and making it ready to restore.

Q : Is binlog mirroring needed on Galera (PXC)?
It is good idea to keep binlog mirroring. Even though the IST and SST will do its job to join the node, the binlogs could play a role in case you wanted to rollforward a particular schema on a slave or QA instance.

Q : As we know that Percona XtraBackup takes Full & Incremental as well. Like that Does MyDumper helps in taking the incremental backup.?
At this moment we do not have the ability to take Incremental backups with mydumper or with any other logical backup tool. However, Weekly full backups (logical) and daily binlog backups can serve as the same strategy with other Incremental backup solutions, plus they are easy to restore

Q : Is it possible to encrypt the output file ? What will be Best methodology to back up data with the database size of 7 to 8 Gb and increses 25 % each day ? what is difference between innobackupex and mydumper ?
Indeed its possible to encrypt the backup files, as a matter of fact, we encrypt backup files with GPG keys before uploading to offsite storage. The best method to backup a 7 to 8G instance would be implementing all 3 types of backup we discussed in the webinar, your scenarios require planning for the future, so its always best to have different solutions available as the data grows. Innobackupex is part of the Percona-Xtrabackup toolkit and is a script which does binary backups of databases, MyDumper on the other hand is a logical backup tool which creates backups as text files.

Q : How can I optimize a MySQL dump of a large database? The main bottleneck while taking MySQL dump backup of a large database is if any table is found to be corrupted then it never goes beyond by skipping this corrupted tables temporary. Can we take database backup of large database without using locking mechanism i.e. Does someone know how to make the backup without locking the tables ? Is there any tools which would faster in restoration and backup technique or how come we use MySQL dump to optimize this kind of issue in future during crash recovery.
Mysqldump is logical backup tool, and as such it executes full table scans to backup the tables and write them down in the output file, hence its very difficult to improve performance of mysqldump (query-wise). Assuming that you’re referring the corruption to MyISAM tables, it is highly recommended you repair them before backing up, also to make sure mysqldump doesnt fail due to error on such a corrupt table try using –force option to mysqldump. If you’re using MyISAM tables first recommendation would be to switch to Innodb, with most of the tables innodb locking can be greatly reduced, actually till a point where the locking is negligible, look into –single-transaction. Faster backup recovery can be achieved with binary backups, look into using Percona Xtrabackup tool, we have comprehensive documentation to get you started.

Hope this was a good webinar and we have answered most of your questions. Stay tuned for more such webinars from Percona.

The post The Q&A: Creating best-in-class backup solutions for your MySQL environment appeared first on MySQL Performance Blog.

PlanetMySQL Voting: Vote UP / Vote DOWN