Planet MySQL

Articles on setting up highly available DNS/DHCP servers using MySQL Cluster

I found this old article that shows how to set up
highly available DNS/DHCP servers using MySQL Cluster.

It uses 4 machines with 4 replicas and describes how to do it with MySQL Cluster 7.3;
obviously today it would make more sense to use 7.5 or 7.6. In that case an extension
would be to add ndb_read_backup=on to my.cnf to ensure that all MySQL servers in the
cluster read the local copy rather than going over the network to read from another node.
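A minimal my.cnf sketch of that extension (only the ndb_read_backup setting comes from the suggestion above; placement under [mysqld] follows standard MySQL config conventions):

```ini
[mysqld]
# Prefer reading the local replica instead of going over the network
ndb_read_backup = ON
```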

Another tool that uses NDB for DNS and DHCP, which I found and discussed in my book
on MySQL Cluster, is Kea; here is an article describing how to set up Kea using NDB.

MySQL 8.0: Support for BLOBs in TempTable engine

In some cases, the server creates internal temporary tables while processing statements. These tables can be stored in memory or on disk – the first option is preferred, but there are some limitations. One such restriction was the presence of TEXT or BLOB columns in the table; as the in-memory storage engines (MEMORY and TempTable) did not support these types, the server had to use the on-disk engine (InnoDB or MyISAM).…

Database security tasks.

Do you like what you are about to read? Sign up as a DBAdojo patron. How serious are companies and the people who support their databases about security? Not serious enough, it seems. How many databases are running with users with … Continue reading →

Use Cases for MariaDB Data Versioning

By Rasmus Johansson, Thu, 07/05/2018 - 19:54

Working in software development, versioning of code is something that we’ve often taken for granted. Task definitions and bug descriptions are preferably also managed by a system that versions every change. On top of this, we use a lot of documents for designing, documenting and managing our development cycles. For example, some of the tools we use are Jira, Google Docs and our own Knowledge Base to accomplish these things, which all provide versioning support. In MariaDB Server 10.3, we’ve introduced an elegant and easy way of versioning data, called System-Versioned Tables. Let’s look at what it can be used for.

A Look Back at Data for GDPR and PCI DSS Compliance

The General Data Protection Regulation (GDPR) is now enforced by the European Union (EU). All companies collecting user data in the EU have to comply with the GDPR rules. In addition to the daily questions and statements coming over email asking you to agree to new terms because of GDPR, the companies also have to store personal and private data in a way that fulfills the criteria of GDPR.

Card payments also have their own rules. There are standards like the Payment Card Industry Data Security Standard (PCI DSS), which are followed by banks and other online businesses. For both kinds of compliance, three questions about data changes typically need answers: 1) what happened when, 2) by whom, and 3) what did the data look like before and after? In MariaDB Server, the MariaDB Audit plugin is there for dealing with 1) and 2). It can also be used for 3) by looking in the audit logs at the changes made, but it doesn’t give you the full data for how it looked before and after. With the newly released System-Versioned Tables this is possible. Let’s say that payment card information is stored in a database table. By turning on versioning for that table, all changes will create a new version of the row(s) affected by the change. The rows will also be time stamped, which means that you can query the row to see what it looked like before.

Handling Personal Data in Registries

When you think about versioning, one thing that comes to mind is registries of all sorts, which are the domain of GDPR when it comes to handling personal data. There are many types of personal data, and one that is important to all of us is healthcare data, for example the patient registries of hospitals. In these registries, versioning is of great importance for keeping track of patients’ health history and related information such as medication. Other personal data registries are civil registers, tax registers, school and student registers and employee registers. The list is endless.

Rapidly Changing Data Sets

All of the above mentioned examples, with data versioning applied in one way or another, can be seen as slowly changing data. What I mean is that, although the systems can be huge and the total number of transactions enormous, each piece of data doesn’t change that often. For example, my information in the civil register doesn’t change every second. But what if we have rapidly changing data, such as share rates at a stock exchange or vehicle tracking data for a shipping company? In these cases too, we can make use of MariaDB’s data versioning.

Creating applications or software for the above purposes on top of a database that provides data versioning out-of-the-box will lead to easier design, less customization and more secure solutions.

Step-by-step Example

I’ll end with a GDPR example. I have a newsletter with subscribers and want to make sure that I always know when and what has happened to the subscriber data. I create a table in the database for the purpose and turn on versioning for the table.

CREATE TABLE Subscriber (
  SubscriberId int(11) NOT NULL AUTO_INCREMENT,
  FirstName varchar(50) NOT NULL,
  LastName varchar(50) NOT NULL,
  Newsletter bit NOT NULL,
  PRIMARY KEY (SubscriberId)
) ENGINE=InnoDB WITH SYSTEM VERSIONING;

I insert myself as subscriber.

INSERT INTO Subscriber (FirstName, LastName, Newsletter) VALUES ('Rasmus', 'Johansson', 1);

I then try to add a column to the table.

ALTER TABLE Subscriber ADD COLUMN Gender char(1) NULL;
ERROR 4119 (HY000): Not allowed for system-versioned `Company`.`Subscriber`. Change @@system_versioning_alter_history to proceed with ALTER.

It results in the above error, because changing a versioned table is not permitted by default. I turn on the possibility to change the table and then the ALTER succeeds.

SET @@system_versioning_alter_history = 1;
ALTER TABLE Subscriber ADD COLUMN Gender char(1) NULL;
Query OK, 1 row affected (0.17 sec)

I also want a constraint on the new column.

ALTER TABLE Subscriber ADD CONSTRAINT con_gender CHECK (Gender in ('f','m'));

Then I do a couple of updates in the table.

UPDATE Subscriber SET Newsletter = 0 WHERE SubscriberId = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Inserted: 1  Warnings: 0
UPDATE Subscriber SET Gender = 'm' WHERE SubscriberId = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Inserted: 1  Warnings: 0

Finally, I delete the row in the table:

DELETE FROM Subscriber WHERE SubscriberId = 1;

If we ask for the rows in the table, including the old versions, we get the following.

SELECT *, ROW_START, ROW_END FROM Subscriber FOR SYSTEM_TIME ALL;
+--------------+-----------+-----------+------------+--------+----------------------------+----------------------------+
| SubscriberId | FirstName | LastName  | Newsletter | Gender | ROW_START                  | ROW_END                    |
+--------------+-----------+-----------+------------+--------+----------------------------+----------------------------+
|            1 | Rasmus    | Johansson | #          | NULL   | 2018-06-08 10:57:36.982721 | 2018-06-08 11:14:07.654996 |
|            1 | Rasmus    | Johansson |            | NULL   | 2018-06-08 11:14:07.654996 | 2018-06-08 11:15:05.971761 |
|            1 | Rasmus    | Johansson |            | NULL   | 2018-06-08 11:15:05.971761 | 2018-06-08 11:15:28.459109 |
|            1 | Rasmus    | Johansson |            | m      | 2018-06-08 11:15:28.459109 | 2038-01-19 03:14:07.999999 |
+--------------+-----------+-----------+------------+--------+----------------------------+----------------------------+

Even though I deleted the row, I get four old versions of the row. All this was handled by the database. The only thing I had to do was to turn on versioning for the table. What will you do with MariaDB’s System-Versioned Tables? We’d love to hear from you!


MySQL Performance : 8.0 GA on IO-bound TPCC

This post is mainly inspired by findings from the previous testing of MySQL 8.0 on TPCC workload(s) and observations from IO-bound Sysbench OLTP on Optane -vs- SSD. But also by several "urban myths" I'm often hearing when discussing with users about their IO-bound OLTP performance problems :
Myth #1 : "if I'll double the number of my storage drives -- I'll get x2 times better TPS !"
  • this was mostly true during "HDD era", and again..
  • (ex.: a single thread app doing single random IO reads from a single HDD will not go faster by doing the same from 2x HDD -- similar like single thread workload will not run faster on 8CPU cores -vs- 2CPU cores, etc.)
  • all depends on your workload and how many parallel IO operations you're involving..
  • indeed, it is much easier to saturate HDD, but it's much harder to do it with modern SSD/NVMe
  • NOTE : we're speaking about OLTP (e.g. if you started to observe full table scans in your workload -- means you're already doing something wrong ;-))
  • simple rule : if you're not saturating your storage on any of its limits => you'll not see any gain by adding more drives, you'll probably just have a bigger storage space, that's all.

Myth #2 : "I'll go faster with flash drive which is claimed capable x2 times more IOps in specs than my current drive !"
  • if you're expecting to run OLTP workload, rather pay attention to IO latency first !
  • sometimes it may not be mentioned that these x2 times more IOps were obtained with x4 times more IO threads ;-))
  • e.g. a drive capable of x2 times more IOps but with x4 times higher latency will still be x4 times slower than your current drive at 1 user load, and 2, and 4, and .. maybe up to 64 ? (depends on where your current drive's limit is reached) -- and if you don't have more than 64 concurrent users ? -- then you'll never see your x2 times more IOps, but rather x4 times worse TPS ;-))
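The arithmetic behind this claim can be sketched with Little's law (deliverable IOps ~ concurrency / latency, capped by the drive's saturation limit); the drive numbers below are invented for illustration, not measurements:

```python
# Little's law sketch: IOps ~ concurrency / latency, capped at saturation.
# Hypothetical drives (illustrative numbers only):
#   drive A: 100 us per IO, saturates at 200k IOps
#   drive B: "x2 more IOps" (400k) but x4 the latency (400 us)

def iops(concurrency, latency_s, max_iops):
    """IOps a drive can deliver at a given concurrency level."""
    return min(concurrency / latency_s, max_iops)

drive_a = dict(latency_s=100e-6, max_iops=200_000)
drive_b = dict(latency_s=400e-6, max_iops=400_000)

for users in (1, 4, 16, 64, 256):
    a, b = iops(users, **drive_a), iops(users, **drive_b)
    print(f"{users:4d} users: A={a:8.0f} IOps  B={b:8.0f} IOps")
```

Up to 64 users the "faster" drive B is still behind; only well past the point where drive A saturates does B's higher IOps ceiling pay off.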

Test yourself - this is the only advice I could give you !
  • because "only a real test will give you a real answer" (and I'm repeating to say it again and again.. ;-))
  • testing your own workload will give you the best answer !
  • otherwise you may still use some generic benchmark workloads which are representative for you
  • for ex. your new flash drive may look better on all points and pass all generic pure IO tests very well, but show x3 times worse results once used by MySQL -- and this is just because, for ex., every involved fsync() takes x3 times more time, etc. (based on a real story, no kidding ;-))

So far, for the following story I'll use :
  • Sysbench-TPCC workload, 10x100W (x10 of 100 warehouses, ~100GB data, here is why)
  • Same Skylake server as before, same config, etc.
  • and the same Optane & SSD as in the previous testing, except that I'll also add to the game x2 Optane drives used as a single RAID-0 MDM volume !
  • EXT4 is used on top of each drive or volume

Starting Test scenario :
  • concurrent users : 1, 2, 4, .. 1024
  • trx_commit : 1 (flush REDO on every COMMIT)
  • Buffer Pool (BP) : 128GB (in-memory), 32GB (IO-bound)

Sorry, no "user friendly" graphs this time, but hope it'll still be easy to understand the results ;-))
On the first graph you'll see :
  • 3 test results : with x2 Optane first, then with single Optane, and then single SSD
  • on each test the load is going from 1, 2, 4 .. up to 1024 concurrent users
  • the Commits/sec curve is showing obtained TPS level

and the first result is with 128GB BP :

InnoDB Buffer Pool 128GB Comments :
  • first of all, as you can see, TPS is not better with two -vs- one Optane
  • as we're not hitting any single limit of the Optane drive, there is no gain in using x2 ;-))
  • also, even though the data volume used is ~100GB, we're not observing here such a big difference between Optane -vs- SSD as it was with Sysbench OLTP on a similar data size and still "in-memory"..
  • this is because TPCC workload is much less aggressive on IO writes, so REDO flushing is less impacted..

now, how will TPS be impacted if the BP size is smaller, just 32GB ?

Read more... (9 min remaining to read)

How to split MySQL/MariaDB datadir to multiple mount points

If you are going to be using InnoDB tables and if you plan to have innodb_file_per_table enabled, then your best option would probably be to use the CREATE TABLE statement’s “DATA DIRECTORY” option, so that you can place a table outside the data directory.
From the MySQL documentation:


For InnoDB, the DATA DIRECTORY=’directory’ option allows you to create InnoDB file-per-table tablespaces outside the MySQL data directory. Within the directory that you specify, MySQL creates a subdirectory corresponding to the database name, and within that a .ibd file for the table. The innodb_file_per_table configuration option must be enabled to use the DATA DIRECTORY option with InnoDB. The full directory path must be specified. See Section 14.7.5, “Creating File-Per-Table Tablespaces Outside the Data Directory” for more information.

There’s additional information about that here:

If you expect that any single table will exceed 1 TB, then you may need to use partitioning for that table, and you may need to use different “DATA DIRECTORY” clauses for different partitions.

Sometimes we are curious to know how backup and restoration would be performed on these databases when using CREATE TABLE statements with the “DATA DIRECTORY” option.

In the case of mysqldump, Percona XtraBackup, or MariaDB Backup:

mysqldump backups would include the DATA DIRECTORY option. However, a mysqldump backup would probably be impractical for a database that is several TB large.

Percona XtraBackup seems to do something a little interesting. It places the tablespace file in the same directory as the rest of the backup during the actual backup step, but during the restore step, it places the tablespace file back in the correct directory as specified by the DATA DIRECTORY option. See the output below for an example:

[ec2-user@ip-172-30-0-249 ~]$ mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 10.1.31-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use db1;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [db1]> CREATE TABLE tab (
    ->   str varchar(50)
    -> ) DATA DIRECTORY='/mariadb_data/';
Query OK, 0 rows affected (0.01 sec)
MariaDB [db1]> \q
Bye
[ec2-user@ip ~]$ sudo ls -l /mariadb_data/
total 0
drwxrwx--- 2 mysql mysql 20 Mar 23 15:04 db1
[ec2-user@ip ~]$ sudo ls -l /mariadb_data/db1/
total 192
-rw-rw---- 1 mysql mysql 98304 Mar 23 15:04 tab.ibd
[ec2-user@ip ~]$ cd backups/
[ec2-user@ip backups]$ sudo innobackupex /home/ec2-user/backups/
180323 15:06:05 innobackupex: Starting the backup operation
IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex prints "completed OK!".
...
180323 15:06:14 completed OK!
[ec2-user@ip backups]$ sudo ls -l
total 4
drwx------ 8 root root 4096 Mar 23 15:06 2018-03-23_15-06-05
[ec2-user@ip backups]$ sudo ls -l 2018-03-23_15-06-05/db1/ | grep "tab\..*"
-rw-r----- 1 root root   481 Mar 23 15:06 tab.frm
-rw-r----- 1 root root 98304 Mar 23 15:06 tab.ibd
-rw-r----- 1 root root    25 Mar 23 15:06 tab.isl
[ec2-user@ip backups]$ sudo systemctl stop mariadb
[ec2-user@ip backups]$ sudo innobackupex --apply-log 2018-03-23_15-06-05/
180323 15:14:17 innobackupex: Starting the apply-log operation
IMPORTANT: Please check that the apply-log run completes successfully.
           At the end of a successful apply-log run innobackupex prints "completed OK!".
...
180323 15:14:21 completed OK!
[ec2-user@ip backups]$ sudo rm /mariadb_data/db1/tab.ibd
[ec2-user@ip backups]$ sudo rm -fr /var/lib/mysql/*
[ec2-user@ip backups]$ sudo innobackupex --copy-back /home/ec2-user/backups/2018-03-23_15-06-05/
180323 15:16:17 innobackupex: Starting the copy-back operation
IMPORTANT: Please check that the copy-back run completes successfully.
           At the end of a successful copy-back run innobackupex prints "completed OK!".
...
180323 15:16:25 completed OK!
[ec2-user@ip backups]$ sudo ls -l /mariadb_data/db1/
total 96
-rw-r----- 1 root root 98304 Mar 23 15:16 tab.ibd

MariaDB Backup is based on Percona XtraBackup, so you will see almost the same behavior when using it.

Log Buffer #548: A Carnival of the Vanities for DBAs

This Log Buffer Edition covers blog posts from Cloud, Oracle, and MySQL.


Google Stackdriver lets you track your cloud-powered applications with monitoring, logging and diagnostics. Using Stackdriver to monitor Google Cloud Platform (GCP) or Amazon Web Services (AWS) projects has many advantages—you can get detailed performance data and can set up tailored alerts.

This post is courtesy of Sam Dengler, AWS Solutions Architect. Message brokers can be used to solve a number of needs in enterprise architectures, including managing workload queues and broadcasting messages to a number of subscribers.

New Cloud Filestore service brings GCP users high-performance file storage.

One of the biggest trends in application development today is the use of APIs to power the backend technologies supporting a product.

It’s no secret that data is an essential part of running a business, no matter how large or small a business may be. Many companies host their business data using relational databases.


How can I print to PDF? How can I get a document/report with my data? How can I export my data from APEX to Excel?

Almost a year ago, Oracle released Oracle GoldenGate 12c. At that time, there were two architectures released: Microservices and Classic. Both architectures provided the same enterprise-level replication. The only difference was that one enabled a RESTful API interface with HTML5 pages and the other was still command-line driven.

Ubuntu 16.04: Installation of Chrome browser fails with libnss3 (>= 2:3.22) [2]

UTL_FILE_DIR and 18c

DevOps in OAC: Scripting Oracle Cloud Instance Management with PSM Cli


One problem that’s a lot less common these days is swapping. Most of the issues that cause swapping with MySQL have been nailed down to several different key configuration points, either in the OS or MySQL, or issues like the swap insanity issue.

MariaDB 10.3 is now generally available (10.3.7 was released GA on 2018-05-25). The article What’s New in MariaDB Server 10.3 by the MariaDB Corporation lists three key improvements in 10.3: temporal data processing, Oracle compatibility features, and purpose-built storage engines.

MySQL 8.0 InnoDB Cluster on ARM64 with Oracle Linux and the Raspberry Pi 3B.

Detecting Incompatible Use of Spatial Functions before Upgrading to MySQL 8.0

JFG Posted on the Percona Community Blog – A Nice Feature in MariaDB 10.3: no InnoDB Buffer Pool in Core Dumps

Audit Log’s JSON format logging

Blood, sweat, tears – and JSON format logging is finally supported by the Audit Log plugin. This comes paired with the feature that allows reading log events, which can be useful for rapid analysis of the audit log trail without the need to access the files directly.…

New Webinar: Disaster Recovery Planning for MySQL & MariaDB with ClusterControl

Everyone should have a disaster recovery plan for MySQL & MariaDB!

Join Vinay Joosery, CEO at Severalnines, on July 24th for our new webinar on Disaster Recovery Planning for MySQL & MariaDB with ClusterControl; especially if you find yourself wondering about disaster recovery planning for MySQL and MariaDB, if you’re unsure about RTO and RPO or whether you should have a secondary datacenter, or are concerned about disaster recovery in the cloud…

Organizations need an appropriate disaster recovery plan in order to mitigate the impact of downtime. But how much should a business invest? Designing a highly available system comes at a cost, and not all businesses and certainly not all applications need five 9’s availability.

Vinay will explain key disaster recovery concepts and walk us through the relevant options from the MySQL & MariaDB ecosystem in order to meet different tiers of disaster recovery requirements; and demonstrate how ClusterControl can help fully automate an appropriate disaster recovery plan.

Sign up below to join the discussion!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, July 24th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, July 24th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

  • Business Considerations for DR
    • Is 100% uptime possible?
    • Analyzing risk
    • Assessing business impact
  • Defining DR
    • Outage Timeline
    • RTO
    • RPO
    • RTO + RPO = 0 ?
  • DR Tiers
    • No offsite data
    • Database backup with no Hot Site
    • Database backup with Hot Site
    • Asynchronous replication to Hot Site
    • Synchronous replication to Hot Site
  • Implementing DR with ClusterControl
    • Demo
  • Q&A

Vinay Joosery, CEO & Co-Founder, Severalnines

Vinay Joosery, CEO, Severalnines, is a passionate advocate and builder of concepts and business around distributed database systems. Prior to co-founding Severalnines, Vinay held the post of Vice-President EMEA at Pentaho Corporation - the Open Source BI leader. He has also held senior management roles at MySQL / Sun Microsystems / Oracle, where he headed the Global MySQL Telecoms Unit, and built the business around MySQL's High Availability and Clustering product lines. Prior to that, Vinay served as Director of Sales & Marketing at Ericsson Alzato, an Ericsson-owned venture focused on large scale real-time databases.

This webinar builds upon a related white paper written by Vinay on disaster recovery, which you can download here:

We look forward to “seeing” you there!

Tags:  webinar disaster recovery MySQL MariaDB clustercontrol

How to Set Up Replication Between AWS Aurora and an External MySQL Instance

Amazon RDS Aurora (MySQL) provides its own low latency replication. Nevertheless, there are cases where it can be beneficial to set up replication from Aurora to an external MySQL server, as Amazon RDS Aurora is based on MySQL and supports native MySQL replication. Here are some examples of when replicating from Amazon RDS Aurora to an external MySQL server can make good sense:

  • Replicating to another cloud or datacenter (for added redundancy)
  • Need to use an independent reporting slave
  • Need to have an additional physical backup
  • Need to use another MySQL flavor or fork
  • Need to failover to another cloud and back

In this blog post I will share simple step-by-step instructions on how to do it.

Steps to set up MySQL replication from AWS RDS Aurora to a MySQL server
  1. Enable binary logs in the option group in Aurora (Binlog format = mixed). This will require a restart.
  2. Create a snapshot and restore it (create a new instance from a snapshot). This is only needed to make a consistent copy with mysqldump. As Aurora does not allow “super” privileges, running mysqldump --master-data  is not possible. The snapshot is the only way to get a consistent backup with the specific binary log position.
  3. Get the binary log information from the snapshot. In the console, look for the “Alarms and Recent Events” for the restored snapshot instance. We should see something like:
    Binlog position from crash recovery is mysql-bin-changelog.000708 31278857
  4. Install MySQL 5.6 (e.g. Percona Server 5.6) on a separate EC2 instance (for Aurora 5.6 – note that you should use MySQL 5.7 for Aurora 5.7). After MySQL is up and running, import the timezones:
    # mysql_tzinfo_to_sql /usr/share/zoneinfo/|mysql
    Sample config:
    [mysqld]
    log-bin=log-bin
    log-slave-updates
    binlog-format=MIXED
    server-id=1000
    relay-log=relay-bin
    innodb_log_file_size=1G
    innodb_buffer_pool_size=2G
    innodb_flush_method=O_DIRECT
    innodb_flush_log_at_trx_commit=0 # as this is a replication slave
  5. From now on we will make all backups from the restored snapshot. First get all users and import those to the new instance:
    pt-show-grants -h -u percona > grants.sql
    # check that grants are valid and upload to MySQL
    mysql -f < grants.sql
    Make a backup of all schemas except the “mysql” system tables, as Aurora uses a different format for those (make sure to connect to the snapshot):
    host=""
    mysqldump --single-transaction -h $host -u percona --triggers --routines \
      --databases `mysql -u percona -h $host -NBe "select group_concat(schema_name separator ' ') from information_schema.schemata where schema_name not in ('mysql', 'information_schema', 'performance_schema')"` > all.sql
  6. Restore to the local database:
    mysql -h localhost < all.sql
  7. Restore users again (some users may fail to create where there are missing databases):
    mysql -f < grants.sql
  8. Download the RDS/Aurora SSL certificate:
    # cd /etc/ssl
    # wget ''
    # chown mysql.mysql rds-combined-ca-bundle.pem
  9. Configure MySQL replication. Take the values for the binary log name and position from #3 above. Please note: now we connect to the actual instance, not a snapshot:
    # mysql -h localhost
    ...
    mysql> CHANGE MASTER TO
        ->   MASTER_HOST='dev01-aws-1...',
        ->   MASTER_USER='awsreplication',
        ->   MASTER_PASSWORD='<pass>',
        ->   MASTER_LOG_FILE = 'mysql-bin-changelog.000708',
        ->   MASTER_LOG_POS = 31278857,
        ->   MASTER_SSL_CA = '/etc/ssl/rds-combined-ca-bundle.pem',
        ->   MASTER_SSL_CAPATH = '',
        ->   MASTER_SSL_VERIFY_SERVER_CERT=1;
    mysql> start slave;
  10. Verify that the slave is working. Optionally, specify a slave delay in seconds by adding the MASTER_DELAY option to CHANGE MASTER TO (this can be done at any time).
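The coordinate hand-off between step 3 and step 9 is easy to script; a minimal sketch (the event text, host and user are copied from the example above, the helper names are mine, and the password is deliberately omitted):

```python
import re

# "Recent Events" message reported by Aurora after the snapshot restore (step 3)
EVENT = "Binlog position from crash recovery is mysql-bin-changelog.000708 31278857"

def parse_binlog_position(event_text):
    """Extract (log_file, log_pos) from the Aurora recent-events message."""
    m = re.search(r"recovery is\s+(\S+)\s+(\d+)", event_text)
    if not m:
        raise ValueError("no binlog coordinates found")
    return m.group(1), int(m.group(2))

def change_master_sql(host, user, log_file, log_pos):
    """Render the CHANGE MASTER TO statement used in step 9 (password omitted)."""
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_USER='{user}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}, "
        f"MASTER_SSL_CA='/etc/ssl/rds-combined-ca-bundle.pem', "
        f"MASTER_SSL_VERIFY_SERVER_CERT=1;"
    )

log_file, log_pos = parse_binlog_position(EVENT)
print(change_master_sql("dev01-aws-1", "awsreplication", log_file, log_pos))
```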

I hope those steps will be helpful for setting up an external MySQL replica.

The post How to Set Up Replication Between AWS Aurora and an External MySQL Instance appeared first on Percona Database Performance Blog.

Linux OS Tuning for MySQL Database Performance

In this post we will review the most important Linux settings to adjust for performance tuning and optimization of a MySQL database server. We’ll note how some of the Linux parameter settings used in OS tuning may vary according to the system type: physical, virtual or cloud. Other posts have addressed MySQL parameters, like Alexander’s blog MySQL 5.7 Performance Tuning Immediately After Installation. That post remains highly relevant for the latest versions of MySQL, 5.7 and 8.0. Here we will focus more on the Linux operating system parameters that can affect database performance.

Server and Operating System

Here are some Linux parameters that you should check and consider modifying if you need to improve database performance.

Kernel – vm.swappiness

The value represents the kernel’s tendency to swap out memory pages. On a database server with an ample amount of RAM, we should keep this value as low as possible. The extra I/O can slow down or even render the service unresponsive. A value of 0 disables swapping completely, while 1 causes the kernel to perform the minimum amount of swapping. In most cases the latter setting should be OK:

# Set the swappiness value as root
echo 1 > /proc/sys/vm/swappiness
# Alternatively, using sysctl
sysctl -w vm.swappiness=1
# Verify the change
cat /proc/sys/vm/swappiness
1
# Alternatively, using sysctl
sysctl vm.swappiness
vm.swappiness = 1

The change should also be persisted in /etc/sysctl.conf:

vm.swappiness = 1

Filesystems – XFS/ext4/ZFS

XFS

XFS is a high-performance, journaling file system designed for high scalability. It provides near-native I/O performance even when the file system spans multiple storage devices. XFS has features that make it suitable for very large file systems, supporting files up to 8EiB in size: fast recovery, fast transactions, delayed allocation for reduced fragmentation and near-raw I/O performance with DIRECT I/O.

The default options for mkfs.xfs are good for optimal speed, so the simple command:

# Use default mkfs options mkfs.xfs /dev/target_volume

will provide the best performance while ensuring data safety. Regarding mount options, the defaults should fit most cases. On some filesystems you can see a performance increase by adding the noatime mount option in /etc/fstab. For XFS filesystems the default atime behaviour is relatime, which has almost no overhead compared to noatime and still maintains sane atime values. If you create an XFS file system on a LUN that has a battery-backed, non-volatile cache, you can further increase the performance of the filesystem by disabling the write barrier with the mount option nobarrier. This helps you avoid flushing data more often than necessary. If a BBU (backup battery unit) is not present, however, or you are unsure about it, leave barriers on, otherwise you may jeopardize data consistency. With these options, an /etc/fstab file should look like the one below:

/dev/sda2  /datastore  xfs  defaults,nobarrier
/dev/sdb2  /binlog     xfs  defaults,nobarrier


ext4

ext4 has been developed as the successor to ext3 with added performance improvements. It is a solid option that will fit most workloads. We should note here that it supports files up to 16TB in size, a smaller limit than XFS. This is something you should consider if extreme table space size/growth is a requirement. Regarding mount options, the same considerations apply. We recommend the defaults for a robust filesystem without risks to data consistency. However, if an enterprise storage controller with a BBU cache is present, the following mount options will provide the best performance:

/dev/sda2  /datastore  ext4  noatime,data=writeback,barrier=0,nobh,errors=remount-ro
/dev/sdb2  /binlog     ext4  noatime,data=writeback,barrier=0,nobh,errors=remount-ro

Note: The data=writeback option results in only metadata being journaled, not actual file data. This carries the risk of corrupting recently modified files in the event of a sudden power loss, a risk which is minimised by the presence of a BBU-enabled controller. nobh only works with the data=writeback option enabled.


ZFS

ZFS is a filesystem and LVM combined in one enterprise storage solution, with extended protection against data corruption. There are certainly cases where the rich feature set of ZFS makes it an essential option to consider, most notably when advanced volume management is a requirement. ZFS tuning for MySQL can be a complex topic and falls outside the scope of this blog. For further reference, there is a dedicated blog post on the subject by Yves Trudeau:

Disk Subsystem – I/O scheduler 

Most modern Linux distributions come with the noop or deadline I/O scheduler by default, both providing better performance than the cfq and anticipatory ones. However, it is always good practice to check the scheduler for each device; if the value shown is different from noop or deadline, the policy can be changed without rebooting the server:

# View the I/O scheduler setting. The value in square brackets shows the running scheduler
cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# Change the setting. Note that "sudo echo noop > ..." would fail, because the
# redirection is performed by the unprivileged shell, so use tee instead
echo noop | sudo tee /sys/block/sdb/queue/scheduler
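When auditing many devices, the active scheduler is the bracketed token in the sysfs value; a small parsing sketch (the function name is mine):

```python
import re

def active_scheduler(sysfs_value):
    """Return the scheduler shown in square brackets in the
    /sys/block/<dev>/queue/scheduler string, or None if absent."""
    m = re.search(r"\[([\w-]+)\]", sysfs_value)
    return m.group(1) if m else None

print(active_scheduler("noop deadline [cfq]"))  # prints "cfq"
```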

To make the change persistent, you must modify the GRUB configuration file:

# Change the line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop"

AWS Note: There are cases where the I/O scheduler has a value of none, most notably in AWS VM instance types where EBS volumes are exposed as NVMe block devices. This is because the setting has no use for modern PCIe/NVMe devices: they have a very large internal queue and bypass the I/O scheduler altogether. For such devices, none is the optimal setting.

Disk Subsystem – Volume optimization

Ideally, different disk volumes should be used for the OS installation, binlog, data and the redo log, if possible. The separation of OS and data partitions, not just logically but physically, will improve database performance. The RAID level can also have an impact: RAID-5 should be avoided, as the checksumming needed to ensure integrity is costly. The best performance without compromising redundancy is achieved by using an advanced controller with a battery-backed cache unit and preferably RAID-10 volumes spanned across multiple disks.

AWS Note: For further information about EBS volumes and AWS storage optimisation, Amazon has documentation at the following links:

Database settings

System Architecture – NUMA settings

Non-uniform memory access (NUMA) is a memory design in which an SMP system's processors can access their own local memory faster than non-local memory (memory local to other CPUs). This may result in suboptimal database performance and potentially swapping: when the buffer pool memory allocation is larger than the RAM available local to the node and the default memory allocation policy is selected, swapping occurs. A NUMA-enabled server will report different distances between CPU nodes, while a uniform one will report a single distance:

# NUMA system
numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65525 MB
node 0 free: 296 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 9538 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 65536 MB
node 2 free: 12701 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 65535 MB
node 3 free: 7166 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

# Uniform system
numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 64509 MB
node 0 free: 4870 MB
node distances:
node   0
  0:  10

In the case of a NUMA system, where numactl shows different distances across nodes, the MySQL variable innodb_numa_interleave should be enabled to ensure memory interleaving. Percona Server provides improved NUMA support by additionally introducing the flush_caches option; when enabled, it helps with allocation fairness across nodes. To determine whether or not allocation is equal across nodes, examine numa_maps for the mysqld process with this script:
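For reference, a config sketch of the settings just mentioned (innodb_numa_interleave is an upstream MySQL 5.7+ variable; flush-caches and numa-interleave are Percona Server mysqld_safe options; verify both against your version):

```ini
[mysqld]
# Interleave buffer pool allocations across NUMA nodes
innodb_numa_interleave = ON

[mysqld_safe]
# Percona Server additions: drop OS caches before startup and start
# mysqld under numactl --interleave=all
flush-caches    = 1
numa-interleave = 1
```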

# The perl script will report memory allocation per CPU node:
# 3595 is the pid of the mysqld process
perl < /proc/3595/numa_maps
N0        :     16010293 ( 61.07 GB)
N1        :     10465257 ( 39.92 GB)
N2        :     13036896 ( 49.73 GB)
N3        :     14508505 ( 55.35 GB)
active    :          438 (  0.00 GB)
anon      :     54018275 (206.06 GB)
dirty     :     54018275 (206.06 GB)
kernelpagesize_kB:         4680 (  0.02 GB)
mapmax    :          787 (  0.00 GB)
mapped    :         2731 (  0.01 GB)
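If the summary script above isn't at hand, a rough awk equivalent can be built from the N<node>=<pages> fields in numa_maps (a sketch; it assumes 4 KiB pages, so huge-page mappings are under-counted):

```shell
# Sum per-node page counts from a numa_maps file and print totals.
numa_summary() {
  awk '{
         for (i = 1; i <= NF; i++)
           if ($i ~ /^N[0-9]+=[0-9]+$/) { split($i, a, "="); pages[a[1]] += a[2] }
       }
       END {
         for (n in pages)
           printf "%s: %d pages (%.2f GB)\n", n, pages[n], pages[n] * 4096 / 1073741824
       }' "$1" | sort
}

# Usage against a running mysqld (pid 3595 as in the example above):
# numa_summary /proc/3595/numa_maps
```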


In this blog post we examined a few important OS related settings and explained how they can be tuned for better database performance.

While you are here …

You might also find value in this recorded webinar Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance


The post Linux OS Tuning for MySQL Database Performance appeared first on Percona Database Performance Blog.

Monitoring Master-Slave Replication in MySQL 8

MySQL 8 introduced a number of enhancements to improve both replication performance and the monitoring thereof. Improvements included more efficient replication of small updates on big JSON documents, the addition of Performance Schema tables for slave performance metrics, tracing and debug logging improvements, among others. With regards to monitoring, Group Replication has been improved by introducing new replication timestamps, additional columns to the performance schema, as well as by making the relationship between replication threads more efficient. We covered the various improvements to replication performance in the Replication Performance Enhancements in MySQL 8 blog. Today’s blog will provide some practical guidelines on monitoring your MySQL 8 master-slave and group (cluster) replication.

Two Types of MySQL Replication

As mentioned in previous blogs, MySQL supports two base types of replication logging, statement-based and row-based, plus a mixed format that combines the two:

  1. In statement-based logging, the master writes SQL statements to the binary log. Replication of the master to the slave works by executing the logged SQL statements on the slave. Abbreviated to SBR, statement-based replication corresponds to the MySQL statement-based binary logging format.
  2. In row-based logging, the master writes events to the binary log that indicate how individual table rows are changed. Replication of the master to the slave works by copying the events representing the changes to the table rows to the slave. This is called row-based replication, which can be abbreviated as RBR.
  3. You can also configure MySQL to use a mix of both statement-based and row-based logging, depending on which is most appropriate for the change to be logged. This is called mixed-format logging. When using mixed-format logging, a statement-based log is used by default. Depending on certain statements, and also the storage engine being used, the log is automatically switched to row-based in particular cases. Replication using the mixed format is referred to as mixed-based replication or mixed-format replication.
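The format is chosen via the binlog_format system variable; a minimal config sketch:

```ini
[mysqld]
# One of STATEMENT, ROW (the default since MySQL 5.7) or MIXED
binlog_format = MIXED
```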

How you go about tracking replication progress will largely depend on the logging and replication types employed.

General Replication Monitoring

In the Replication Performance Enhancements in MySQL 8 blog, we learned that new replication timestamps were added to the information provided by SHOW SLAVE STATUS. It provides status information on essential parameters of the slave threads and is useful for verifying that replication is occurring and that no errors have occurred between the slave and the master.

It provides a few key pieces of information, such as:

  • Slave_IO_Running tells us if the Slave is able to connect to the Master.
  • Slave_SQL_Running indicates if data received from the Master is being processed.
  • Seconds_Behind_Master shows the difference between the last timestamp read in the binlogs and the current time. This is important to understand: it does not directly report the delay between when information is updated/inserted on the master and recorded on the slave. A slow network can artificially inflate the number, as can long-running queries or blocking/locking operations. If it lags behind the Master by any appreciable amount, it's usually a clear sign that replication is not working properly.
  • Last_IO_Errno, Last_IO_Error, Last_SQL_Errno, and Last_SQL_Error are just what they say: the last error number and error message from the IO and SQL threads.

Here is some typical SHOW SLAVE STATUS output with the above fields highlighted. The “\G” statement terminator is utilized instead of a semi-colon in order to produce a more readable vertical layout:


*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 106
               Relay_Log_File: stg06-cf-relay-bin.000002
                Relay_Log_Pos: 251
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                   Last_Errno: 0
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106
              Relay_Log_Space: 409
              Until_Condition: None
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
               Last_SQL_Errno: 0
1 row in set (0.00 sec)
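A minimal health check built on the fields above can be scripted (a sketch; the mysql invocation in the usage comment uses placeholder connection details):

```shell
# Parse `SHOW SLAVE STATUS\G` output from stdin and flag problems.
check_slave() {
  awk -F': ' '
    /Slave_IO_Running:/      { io  = $2 }
    /Slave_SQL_Running:/     { sql = $2 }
    /Seconds_Behind_Master:/ { lag = $2 }
    END {
      if (io != "Yes" || sql != "Yes") { print "replication BROKEN"; exit 1 }
      if (lag + 0 > 60) { print "replication LAGGING " lag "s"; exit 2 }
      print "replication OK (lag " lag "s)"
    }'
}

# Usage (placeholder host/credentials):
# mysql -h slave1 -u monitor -p -e 'SHOW SLAVE STATUS\G' | check_slave
```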

Using Performance Schema Replication Tables

Replication being such an integral process, DBAs frequently want to know how it is proceeding. Traditionally, the health of MySQL replication was monitored using the SHOW SLAVE STATUS command. Eventually, this static command could no longer live up to expectations, because SHOW SLAVE STATUS does not scale well: there are multiple distinct parts to the slave server:

  • the receiver
  • the applier
  • intermediate queues, etc

Once transactions are received by a replication channel’s receiver (I/O) thread, they are added to the channel’s relay log file and passed through to an applier thread.

In some cases, there are multiple instances of each part.

SHOW SLAVE STATUS now produces 54 fields and interleaves a lot of different information. By MySQL 5.7, things had reached the point where an SQL interface allowing users to query exactly what they require from the replication status was in order. So, the MySQL team decided to add replication monitoring tables to the performance_schema database.

The benefits provided by the new tables include:

  • Easier to access exactly what is required, through an SQL interface.
  • Pieces of data can be assigned to variables and thus used in stored procedures etc.
  • Easier testing with SELECT item from Performance Schema tables.
  • Logically unrelated information is split into different tables.
  • Cross-reference monitoring data seamlessly by joining with other Performance Schema tables, Information_Schema tables etc.
  • Easier to extend.
  • More flexibility to accommodate a lot of replication information but still be organized and easy to use.

Six tables were created to deliver the above benefits:

  • replication_connection_configuration
  • replication_connection_status
  • replication_applier_configuration
  • replication_applier_status: the applier module's status is further split into the coordinator's status and the workers' status, which gives us two more tables:
  • replication_applier_status_by_coordinator
  • replication_applier_status_by_worker
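For example, instead of parsing the full SHOW SLAVE STATUS output, the receiver-side state can be queried directly (column names as in MySQL 5.7+; requires a server with replication configured, so shown here as a sketch):

```sql
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_connection_status;
```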

There are also 8 new raw and summary instrument TRANSACTIONS tables:

  • events_transactions_current
  • events_transactions_history
  • events_transactions_history_long
  • events_transactions_summary_by_account_by_event_name
  • events_transactions_summary_by_host_by_event_name
  • events_transactions_summary_by_thread_by_event_name
  • events_transactions_summary_by_user_by_event_name
  • events_transactions_summary_global_by_event_name

Developers familiar with the Performance Schema interface can even extend the replication tables to provide additional information by adding rows to the tables.

For a request read as a statement from the relay log on a replication slave:

  1. Statements in the relay log are stored and are read as text. There is no network protocol, so the statement/abstract/new_packet instrument is not used. Instead, the initial instrument is statement/abstract/relay_log.
  2. When the statement is parsed, the exact statement type is known. For example, if the request is an INSERT statement, the Performance Schema refines the instrument name from statement/abstract/relay_log to statement/sql/insert, which becomes the final name.

An Example

When the SQL thread has executed all events from the relay log, it enters MYSQL_BIN_LOG::wait_for_update_relay_log(), which waits for the "update_cond" condition to be raised. At this point the slave goes idle. Ever since MySQL 5.5, we can track conditions using the Performance Schema: a condition wait is timed for the duration the thread spends waiting for the condition to be raised by another thread, in other words, the total time that the SQL thread was idle.

The condition waited for within wait_for_update_relay_log() maps to the wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond event within the Performance Schema. The following code snippet shows a snapshot from the MySQL server that is in the middle of a replication tree ( both the slave and master reside on the same box, so both the relay logs and binary logs are enabled ):

mysql> select * from events_waits_summary_global_by_event_name order by sum_timer_wait desc limit 5;
+--------------------------------------------------+------------+--------------------+----------------+----------------+----------------+
| EVENT_NAME                                       | COUNT_STAR | SUM_TIMER_WAIT     | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+--------------------------------------------------+------------+--------------------+----------------+----------------+----------------+
| wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond |     595360 | 677877417470900396 |       15021936 |  1138128892553 | 20142735786288 |
| wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond   |      60466 | 167695654597814052 |      187124608 |  2773387450762 | 84468739641228 |
| wait/io/file/myisam/kfile                        |    3237163 |    844640545485132 |         696600 |      260919992 |  1122678434952 |
| wait/io/file/myisam/dfile                        |    4156623 |    377275756973940 |         206712 |       90764968 |  2911791051108 |
| wait/synch/cond/sql/COND_queue_state             |         11 |    323434243065288 |   166075717824 | 29184140278662 | 59224373303796 |
+--------------------------------------------------+------------+--------------------+----------------+----------------+----------------+

The wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond is the analogous condition for the binary log: it is raised to signal waiting threads (such as the dump thread serving a connected slave's IO thread) that new events have been written to the binary log.

Monitoring Row-based Replication

The current progress of the replication thread when using row-based replication is also monitored through Performance Schema instrument stages, enabling you to track the processing of operations and check the amount of work completed as well as the estimated work. When these Performance Schema instrument stages are enabled, the events_stages_current table shows stages for replication threads and their progress.


To track progress of all three row-based replication event types (write, update, delete):

Enable the three Performance Schema stages by issuing:

mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES'
    -> WHERE NAME LIKE 'stage/sql/Applying batch of row changes%';

Wait for some events to be processed by the replication applier and then check its progress by looking at the events_stages_current table. For example, to get progress for update events, use the following statement:

mysql> SELECT WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current
    -> WHERE EVENT_NAME LIKE 'stage/sql/Applying batch of row changes (update)';

If binlog_rows_query_log_events is enabled, the original query is written to the binary log and surfaced in the processlist_info column. Hence, here's the SELECT statement to see the original query that triggered the event:

mysql> SELECT db, processlist_state, processlist_info FROM performance_schema.threads
    -> WHERE processlist_state LIKE 'stage/sql/Applying batch of row changes%' AND thread_id = N;
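Combining the two stage columns gives a rough completion percentage (a sketch; NULLIF guards against WORK_ESTIMATED being 0 early in the batch):

```sql
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED,
       ROUND(100 * WORK_COMPLETED / NULLIF(WORK_ESTIMATED, 0), 1) AS pct_done
  FROM performance_schema.events_stages_current
 WHERE EVENT_NAME LIKE 'stage/sql/Applying batch of row changes%';
```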

An Example

Here’s an example that shows Replication Applier Status Details:

mysql> select * from performance_schema.replication_applier_status_by_coordinator;
+--------------+-----------+---------------+-------------------+--------------------+----------------------+
| CHANNEL_NAME | THREAD_ID | SERVICE_STATE | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
+--------------+-----------+---------------+-------------------+--------------------+----------------------+
|              |        24 | ON            |                 0 |                    | 0000-00-00 00:00:00  |
| 56-cluster   |        26 | ON            |                 0 |                    | 0000-00-00 00:00:00  |
+--------------+-----------+---------------+-------------------+--------------------+----------------------+

mysql> select * from performance_schema.replication_applier_status_by_worker;
+--------------+-----------+-----------+---------------+-----------------------+-------------------+--------------------+----------------------+
| CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
+--------------+-----------+-----------+---------------+-----------------------+-------------------+--------------------+----------------------+
|              |         1 |        27 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
|              |         2 |        29 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
|              |         3 |        31 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
|              |         4 |        33 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
| 56-cluster   |         1 |        28 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
| 56-cluster   |         2 |        30 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
| 56-cluster   |         3 |        32 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
| 56-cluster   |         4 |        34 | ON            | <GTID SET>            |                 0 |                    | 0000-00-00 00:00:00  |
+--------------+-----------+-----------+---------------+-----------------------+-------------------+--------------------+----------------------+



In today’s blog, we learned about new Performance Schema tables that can be utilized for monitoring MySQL 8 master-slave and group (cluster) replication. We also reviewed some of the key differences between statement-based and row-based logging. In a future blog we’ll see how Monyog makes monitoring replication easier.


The post Monitoring Master-Slave Replication in MySQL 8 appeared first on Monyog Blog.

MySQL Performance : 8.0 GA on IO-bound Sysbench OLTP with Optane -vs- SSD

MySQL Performance on IO-bound workloads is still extremely dependent on the underlying storage layer (thus is directly depending on your Storage Performance).. Indeed, flash storage is definitively changing the game, but even with flash there is, as usual, "flash and flash" -- all storage vendors are improving their stuff constantly, so every time you have something new to discover and to learn ;-)) During all my MySQL 8.0 GA tests I was very pleasantly surprised by the IO performance delivered by Intel Optane SSD. However, what the storage device can deliver alone on pure IO tests is not at all the same as what you could observe when it's used by MySQL -- unfortunately, in the past I've observed many cases where, with a device claimed to be x2 times faster, we were not even observing a 10% gain.. But MySQL 8.0 is probably the best placed MySQL version today to re-visit all this IO-bound story (there are many "under-hood" changes in the code helping to use your storage more efficiently) -- well, we're definitively yet very far from "perfect" ;-)) -- but again, as usual -- "everything is relative".. And for the "relative" point of this article I'll propose you to get a look at 2 different flash drives from Intel :

yes, both devices are from Intel, so it's only Intel -vs- Intel (e.g. no blame for "competition" post ;-))
so far, if you look at their specs, both drives are looking pretty good for avg. IO-bound workloads, except that the Optane drive is claimed to be x5 times faster in all categories (and it's particularly impressive with its low IO latency -- see 1M IO-bound QPS with MySQL 8.0 -- so, I can confirm at least the latency ;-)) -- however, will the x5 claim still be valid in different conditions / workloads ?
For my testing I'll use the same config as before (the same Skylake server as well), and the key config points are :
  • InnoDB page size : 16K
  • trx_commit : 1 (flush REDO on every commit)
  • single flash drive (just EXT4 on top of single drive, no RAID, no LVM, etc..)
  • InnoDB Buffer Pool (BP) : 128G / 64G / 32G (will vary along with test scenarios)

(no double write, no binlog for the moment -- these ones will be covered later, patience ;-))..
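Expressed as a my.cnf sketch (the option names are standard; the exact config file from the post isn't reproduced here):

```ini
[mysqld]
innodb_page_size               = 16k
innodb_flush_log_at_trx_commit = 1     # flush REDO on every commit
innodb_buffer_pool_size        = 128G  # also run with 64G / 32G per scenario
innodb_doublewrite             = 0     # no double write for these runs
skip-log-bin                           # no binlog for the moment
```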
So far, let's start first with so called "in-memory" Sysbench OLTP_RW workload :
  • test workload : Sysbench OLTP_RW uniform
  • concurrent users : 1, 2, 4, .. 1024
  • data volume : 8 tables of 10M rows each (~20GB)
  • InnoDB Buffer Pool (BP) : 128GB
  • so, all the data will be fully cached in BP, thus NO any IO reads (the reason to be so called "in-memory")
  • however, there will be definitively IO writes (as there are writes in OLTP_RW ;-))
  • and in our case, if all is going well, there will be only 2 types of writes :
    • 1) dirty pages flushing
    • 2) REDO log writes
  • the 1) is going in background, and as soon as your flushing is fast enough -- your performance is not impacted
  • the 2) are pure sequential writes only mixed with periodic fsync(), and generally as soon as you use a good flash drive for your REDO files -- you're fine ;-))

Ok, what about results ?
Sysbench OLTP_RW 10Mx8-tables BP = 128GB : Intel SSD -vs- Intel Optane

Comments :
  • on the left side of the graph you can see TPS level of MySQL 8.0 running on SSD, and on the right side -- the same but on Optane
  • concurrent users level is progressing from 1 to 2, 4, 8, .. 1024
  • and from TPS results you can see that up to 16 users there is nearly no TPS difference between SSD and Optane
  • then Optane is slightly better on 64usr level, and definitively better on higher load
  • but you don't see any sign of expected x5 difference, right ?
  • and if you'll stop here, you could just say "all this Optane story is b*shit !", Dimitri, WTF ?.. ;-))

Indeed, on such a small data volume and having all the active data set cached in BP you'll not see much difference with a faster storage.. -- however, what will happen if your data will grow ?..

Read more... (9 min remaining to read)

New Whitepaper: Disaster Recovery Planning for MySQL & MariaDB

We’re happy to announce that our new whitepaper Disaster Recovery Planning for MySQL & MariaDB is now available to download for free!

Database outages are almost inevitable and understanding the timeline of an outage can help us better prepare, diagnose and recover from one. To mitigate the impact of downtime, organizations need an appropriate disaster recovery (DR) plan. However, it makes no business sense to abstract the cost of a DR solution from the design of it, so organizations have to implement the right level of protection at the lowest possible cost.

This white paper provides essential insights into how to build such a plan, discussing the database mechanisms involved as well as how these mechanisms can be fully automated with ClusterControl, a management platform for open source database systems.

Topics included in this whitepaper are…

  • Business Considerations for Disaster Recovery
    • Is 100% Uptime Possible?
    • Analysing Risk
    • Assessing Business Impact
  • Defining Disaster Recovery
    • Recovery Time Objectives
    • Recovery Point Objectives
  • Disaster Recovery Tiers
    • Offsite Data
    • Backups and Hot Sites

Download the whitepaper today!

About the Author

Vinay Joosery, CEO & Co-Founder, Severalnines

Vinay Joosery, CEO, Severalnines, is a passionate advocate and builder of concepts and business around distributed database systems. Prior to co-founding Severalnines, Vinay held the post of Vice-President EMEA at Pentaho Corporation - the Open Source BI leader. He has also held senior management roles at MySQL / Sun Microsystems / Oracle, where he headed the Global MySQL Telecoms Unit, and built the business around MySQL's High Availability and Clustering product lines. Prior to that, Vinay served as Director of Sales & Marketing at Ericsson Alzato, an Ericsson-owned venture focused on large scale real-time databases.

Related resources

  • Download the Whitepaper
  • Cloud Disaster Recovery for MariaDB and MySQL
  • Several Ways to Intentionally Fail or Crash your MySQL Instances for Testing

About ClusterControl

ClusterControl is the all-inclusive open source database management system for users with mixed environments that removes the need for multiple management tools. ClusterControl provides advanced deployment, management, monitoring, and scaling functionality to get your MySQL, MongoDB, and PostgreSQL databases up-and-running using proven methodologies that you can depend on to work. At the core of ClusterControl is its automation functionality that lets you automate many of the database tasks you have to perform regularly, like deploying new databases, adding and scaling new nodes, running backups and upgrades, and more.

To learn more about ClusterControl click here.

Tags:  MySQL MariaDB disaster recovery whitepaper

Fixing ER_MASTER_HAS_PURGED_REQUIRED_GTIDS when pointing a slave to a different master

GTID replication has made it convenient to setup and maintain MySQL replication. You need not worry about binary log file and position thanks to GTID and auto-positioning. However, things can go wrong when pointing a slave to a different master. Consider a situation where the new master has executed transactions that haven’t been executed on the old master. If the corresponding binary logs have been purged already, how do you point the slave to the new master?

The scenario

Based on technical requirements and architectural change, there is a need to point the slave to a different master by

  1. Pointing it to another node in a PXC cluster
  2. Pointing it to another master in master/master replication
  3. Pointing it to another slave of a master
  4. Pointing it to the slave of a slave of the master … and so on and so forth.

Theoretically, pointing to a new master with GTID replication is easy. All you have to do is run:
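With auto-positioning, that statement takes the familiar shape below (host and credentials are placeholders):

```sql
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST = 'new_master_host',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = '<password>',
    MASTER_AUTO_POSITION = 1;
START SLAVE;
```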


Alas, in some cases, replication breaks due to missing binary logs:

*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: pxc_57_5
                  Master_User: repl
                  Master_Port: 3306
**redacted**
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
** redacted **
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
** redacted **
             Master_Server_Id: 1
                  Master_UUID: 4998aaaa-6ed5-11e8-948c-0242ac120007
             Master_Info_File: /var/lib/mysql/
** redacted **
      Last_IO_Error_Timestamp: 180613 08:08:20
     Last_SQL_Error_Timestamp:
** redacted **
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3
                Auto_Position: 1

The strange issue here is that if you point the slave back to the old master, replication works just fine. The error says that binary logs the slave needs are missing on the new master. If replication performance is otherwise fine and the slave can easily catch up, then it looks like there are transactions executed on the new master that have not been executed on the old master but are recorded in the now-missing binary logs. The binary logs were most likely lost to manual purging with PURGE BINARY LOGS, or to automatic purging if expire_logs_days is set.

At this point, it would be prudent to check and sync old master and new master with tools such as pt-table-checksum and pt-table-sync. However, if a consistency check has been performed and no differences have been found, or there’s confidence that the new master is a good copy—such as another node in the PXC cluster—you can follow the steps below to resolve the problem.


To solve the problem, the slave needs to execute the missing transactions. But since these transactions have been purged, the steps below provide the workaround.

Step 1: Find the GTID sequences purged from the new master that are needed by the slave

To identify which GTID sequences are missing, run SHOW GLOBAL VARIABLES LIKE 'gtid_purged'; and SHOW MASTER STATUS; on the new master and SHOW GLOBAL VARIABLES LIKE 'gtid_executed'; on the slave:

New Master:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_purged';
+---------------+-------------------------------------------------------------------------------------+
| Variable_name | Value                                                                               |
+---------------+-------------------------------------------------------------------------------------+
| gtid_purged   | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-2, 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 |
+---------------+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------------------------------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                   |
+------------------+----------+--------------+------------------+-------------------------------------------------------------------------------------+
| mysql-bin.000004 |      741 |              |                  | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6, 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 |
+------------------+----------+--------------+------------------+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


Slave:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
+---------------+------------------------------------------+
| Variable_name | Value                                    |
+---------------+------------------------------------------+
| gtid_executed | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3 |
+---------------+------------------------------------------+
1 row in set (0.00 sec)

Take note that 1904cf31-912b-ee17-4906-7dae335b4bfc and 4998aaaa-6ed5-11e8-948c-0242ac120007 are UUIDs, each identifying the MySQL instance where a transaction originated.

Based on the output:

  • The slave has executed 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3
  • The new master has executed 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6 and 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11
  • The new master has purged 1904cf31-912b-ee17-4906-7dae335b4bfc:1-2 and 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11

This means that the slave has no issue with 1904cf31-912b-ee17-4906-7dae335b4bfc: it has executed sequences 1-3, it requires sequences 4-6, and sequences 3-6 are still available on the master. However, the slave cannot fetch sequences 1-11 from 4998aaaa-6ed5-11e8-948c-0242ac120007 because these have been purged from the master.

To summarize, the missing GTID sequences are 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11.
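Rather than working the sets out by eye, the missing range can also be computed with the built-in GTID_SUBTRACT() function, using the values gathered above:

```sql
-- missing = (purged on the new master) minus (executed on the slave)
SELECT GTID_SUBTRACT(
    '1904cf31-912b-ee17-4906-7dae335b4bfc:1-2,4998aaaa-6ed5-11e8-948c-0242ac120007:1-11',
    '1904cf31-912b-ee17-4906-7dae335b4bfc:1-3'
) AS missing_gtids;
-- returns 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11
```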

Step 2: Identify where the purged GTID sequences came from

From the SHOW SLAVE STATUS output in the introduction section, it says that the Master_UUID is 4998aaaa-6ed5-11e8-948c-0242ac120007, which means the new master is the source of the missing transactions. You can also verify the new Master’s UUID by running SHOW GLOBAL VARIABLES LIKE 'server_uuid';

mysql> SHOW GLOBAL VARIABLES LIKE 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 4998aaaa-6ed5-11e8-948c-0242ac120007 |
+---------------+--------------------------------------+
1 row in set (0.00 sec)

If the new master’s UUID does not match the missing GTID, it is most likely that this missing sequence came from its old master, another master higher up the chain or from another PXC node. If that other master still exists, you can run the same query on those masters to check.

The missing sequence range here is small (1-11). Small sets of locally executed transactions like this are typically the result of performing maintenance directly on the server: creating users, fixing privileges or updating passwords. However, there is no guarantee that this is the reason, since the binary logs have already been purged. If you still want to point the slave to the new master, proceed to step 3 or step 4.

Step 3: Inject the missing transactions on the slave as empty transactions

The workaround is to pretend that those missing GTID sequences have been executed on the slave by injecting 11 empty transactions as instructed here by running:
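The pattern for a single empty transaction (shown here for the first missing sequence; the same shape applies to sequences 2 through 11) is:

```sql
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:1';
BEGIN; COMMIT;
SET GTID_NEXT='AUTOMATIC';
```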


It looks tedious, but a simple script can automate this:

cat
#!/bin/bash
uuid=$1
first_sequence_no=$2
last_sequence_no=$3
while [ "$first_sequence_no" -le "$last_sequence_no" ]
do
    echo "SET GTID_NEXT='$uuid:$first_sequence_no';"
    echo "BEGIN;COMMIT;"
    first_sequence_no=`expr $first_sequence_no + 1`
done
echo "SET GTID_NEXT='AUTOMATIC';"

bash 4998aaaa-6ed5-11e8-948c-0242ac120007 1 11
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:1';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:2';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:3';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:4';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:5';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:6';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:7';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:8';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:9';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:10';
BEGIN;COMMIT;
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:11';
BEGIN;COMMIT;
SET GTID_NEXT='AUTOMATIC';

Before executing the generated output on the slave, stop replication first:

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:1';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:2';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:3';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:4';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:5';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:6';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:7';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:8';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:9';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:10';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:11';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> SET GTID_NEXT='AUTOMATIC';
Query OK, 0 rows affected (0.00 sec)
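Typing those eleven statement pairs by hand is error-prone, so it can help to generate them with a short script. A minimal sketch in Python (the helper name is mine, not from the article; the UUID and sequence range are the ones used in this example):

```python
def empty_transactions(server_uuid, first, last):
    """Emit SQL that injects empty transactions for GTIDs uuid:first .. uuid:last."""
    statements = []
    for seq in range(first, last + 1):
        # Pin the next transaction to a specific GTID, then commit nothing.
        statements.append(f"SET GTID_NEXT='{server_uuid}:{seq}';")
        statements.append("BEGIN;COMMIT;")
    # Return the session to normal automatic GTID assignment.
    statements.append("SET GTID_NEXT='AUTOMATIC';")
    return "\n".join(statements)

sql = empty_transactions('4998aaaa-6ed5-11e8-948c-0242ac120007', 1, 11)
print(sql)
```

Feeding the generated statements to the mysql client on the stopped slave has the same effect as the interactive session above.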

There’s an even easier way to inject the empty transactions: the mysqlslavetrx tool from MySQL Utilities. Stop the slave first, then run
mysqlslavetrx --gtid-set=4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 --slaves=root:password@:3306
to achieve the same result as above.

By running SHOW GLOBAL VARIABLES LIKE 'gtid_executed'; on the slave you can see that sequences 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 have been executed already:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
+---------------+-------------------------------------------------------------------------------------+
| Variable_name | Value                                                                               |
+---------------+-------------------------------------------------------------------------------------+
| gtid_executed | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3,4998aaaa-6ed5-11e8-948c-0242ac120007:1-11  |
+---------------+-------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Resume replication and check if replication is healthy by running START SLAVE; and SHOW SLAVE STATUS\G

mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: pxc_57_5
                  Master_User: repl
                  Master_Port: 3306
               ** redacted **
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
               ** redacted **
        Seconds_Behind_Master: 0
               ** redacted **
             Master_Server_Id: 1
                  Master_UUID: 4998aaaa-6ed5-11e8-948c-0242ac120007
               ** redacted **
           Retrieved_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:4-6
            Executed_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6,4998aaaa-6ed5-11e8-948c-0242ac120007:1-11
                Auto_Position: 1
               ** redacted **
1 row in set (0.00 sec)

At this point, we have already solved the problem. However, there’s another way to restore the slave much faster but at the cost of erasing all the existing binary logs on the slave as mentioned in this article. If you want to do this, proceed to step 4.
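Deciding which sequences to inject is plain set arithmetic on GTID sets: subtract the slave’s gtid_executed from the master’s. A rough Python sketch of that logic (function names are mine; it handles only the simple uuid:first-last interval syntax, while real GTID sets can be more elaborate):

```python
def parse_gtid_set(gtid_set):
    """Parse 'uuid1:1-3,uuid2:1-11' into {uuid: set of sequence numbers}."""
    executed = {}
    for part in gtid_set.replace("\n", "").split(","):
        uuid, _, intervals = part.strip().partition(":")
        seqs = executed.setdefault(uuid, set())
        for interval in intervals.split(":"):  # one UUID may carry several intervals
            first, _, last = interval.partition("-")
            seqs.update(range(int(first), int(last or first) + 1))
    return executed

def missing_gtids(master_executed, slave_executed):
    """Return {uuid: sorted sequence numbers} the slave still lacks."""
    master = parse_gtid_set(master_executed)
    slave = parse_gtid_set(slave_executed)
    return {uuid: sorted(seqs - slave.get(uuid, set()))
            for uuid, seqs in master.items()
            if seqs - slave.get(uuid, set())}

# The situation from this article: the new master has executed transactions
# that the slave has never seen.
gap = missing_gtids(
    "1904cf31-912b-ee17-4906-7dae335b4bfc:1-6,"
    "4998aaaa-6ed5-11e8-948c-0242ac120007:1-11",
    "1904cf31-912b-ee17-4906-7dae335b4bfc:1-3",
)
print(gap)
```

Sequences the slave can still retrieve from the new master’s binary logs will flow through replication normally; only the purged ones need to be injected as empty transactions.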

Step 4. Add the missing sequences to GTID_EXECUTED by modifying GTID_PURGED.

If you followed the steps in Step 3, you do not need to perform Step 4!

To add the missing transactions, you’ll need to stop the slave, run RESET MASTER on it, and place both the original value of gtid_executed and the missing sequences in the gtid_purged variable. A word of caution on using this method: it will purge the existing binary logs of the slave.

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.02 sec)

mysql> RESET MASTER;
Query OK, 0 rows affected (0.02 sec)

mysql> SET GLOBAL gtid_purged="1904cf31-912b-ee17-4906-7dae335b4bfc:1-3,4998aaaa-6ed5-11e8-948c-0242ac120007:1-11";
Query OK, 0 rows affected (0.02 sec)

Similar to Step 3, running SHOW GLOBAL VARIABLES LIKE 'gtid_executed'; on the slave shows that sequence 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 has been executed already:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
+---------------+-------------------------------------------------------------------------------------+
| Variable_name | Value                                                                               |
+---------------+-------------------------------------------------------------------------------------+
| gtid_executed | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3,4998aaaa-6ed5-11e8-948c-0242ac120007:1-11  |
+---------------+-------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Run START SLAVE; and SHOW SLAVE STATUS\G to resume replication and check if replication is healthy:

mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: pxc_57_5
                  Master_User: repl
                  Master_Port: 3306
               ** redacted **
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
               ** redacted **
        Seconds_Behind_Master: 0
               ** redacted **
             Master_Server_Id: 1
                  Master_UUID: 4998aaaa-6ed5-11e8-948c-0242ac120007
               ** redacted **
           Retrieved_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:4-6
            Executed_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6,4998aaaa-6ed5-11e8-948c-0242ac120007:1-11
                Auto_Position: 1
               ** redacted **
1 row in set (0.00 sec)

Step 5. Done.

Summary

In this article, I demonstrated how to point the slave to a new master even if the new master is missing some of the binary logs the slave needs to execute. Although it is possible to do so with the workarounds shared above, it is prudent to check the consistency of the old and new master before switching the slave over.

The post Fixing ER_MASTER_HAS_PURGED_REQUIRED_GTIDS when pointing a slave to a different master appeared first on Percona Database Performance Blog.

Upgrading to MySQL 8.0 with Spatial Data

The big change from MySQL 5.7 to 8.0 when it comes to spatial data, is the support for multiple spatial reference systems and geographic computations. This means that the SRIDs of geometries actually have meaning and affect computations. In 5.7 and earlier, however, the SRIDs are ignored, and all computations are Cartesian.…

On Innovation. Interview with Scott McNealy

“We made it a point to hire really smart, visionary people and then let them do their work.
I wanted to delegate and let people be in charge of things. My own decision-making process was to decide who got to decide. To make decisions, you have to first outline the problem, and if you hire really great people, they’re going to know more about the problem they’re dealing with than you ever will.”–Scott McNealy

I have interviewed Scott McNealy. Scott is a Silicon Valley pioneer, most famous for co-founding Sun Microsystems in 1982. We talked about Innovation, AI, Big Data, Redis, Curriki and Wayin.


Q1. You co-founded Sun Microsystems in 1982, and served as CEO and Chairman of the Board for 22 years. What are the main lessons learned in all these years?

Scott McNealy: Companies and technologies can go down as fast as they go up. Smart people make leading easier. Managerial courage is hard to find. Tech has the shelf life of a banana. Being CEO is a young person’s game.
Luck helps. Data is the new oil. It used to be Microsoft and Apple, but now it’s Google, Facebook and Amazon. As things continue to evolve, it’ll be someone else tomorrow. Have lunch or be lunch, or get tenure and do lunch.

Q2. You piloted the company from startup to legendary Silicon Valley giant. What were the main reasons for this success?

Scott McNealy: We made it a point to hire really smart, visionary people and then let them do their work.
I wanted to delegate and let people be in charge of things. My own decision-making process was to decide who got to decide. To make decisions, you have to first outline the problem, and if you hire really great people, they’re going to know more about the problem they’re dealing with than you ever will.
Then, once the parameters of each problem were clear, we would make decisions and take decisive action quickly. There was no time to grind out and wait until answers became obvious because competition wasn’t going to wait. I put a lot of pressure on my managers to be quick decision makers and let them know that it was okay to fail if they failed fast, applied that to their next endeavor.

Q3. What are in your opinion the top 3 criteria for successful innovation?

Scott McNealy: The quality of the people you hire; participative but not consensus-driven management; listening to both customers and engineers.

Q4. In your opinion how can we create a culture that supports and sustains innovation?

Scott McNealy: You have to have a common purpose that everyone can rally around. At Sun, we always had a cause and a very specific set of enemies, and while we drew from a diverse array of backgrounds, ideas and cultures, we always maintained that emphasis. Diversity of race, culture, gender was not as important as having a diversity of backgrounds and perspectives while most importantly having a commonality of purpose. Fostering that commonality of purpose is what really great organizations do and something I am very proud of from my time at Sun.

Q5. What do you think stops/slows down innovation?

Scott McNealy: Innovation and entrepreneurship are inherently risky, so a general climate averse to risk is a definite innovation killer. I’ve spoken in Japan on numerous occasions, and as smart and industrious as the people are, the innovation is lacking solely because of the pervasive cultural aversion to failure. Failure happens. Get over it. Just try to make bets small enough that they won’t bury the company if they fail. The other problem comes when managers are afraid to cannibalize their own products and revenue. If you don’t, someone else surely will.

Q6. Do you think becoming an innovator can be taught?

Scott McNealy: Innovation is probably like leadership. It really can’t be taught but it can be identified and allowed to succeed. Not all employees are equally talented. Bill Joy and Andy Bechtolsheim and James Gosling are special people. They had a virtual carte blanche to do what they thought was right. And they were usually right. So making a company innovative is correctly identifying the innovators and letting them do their thing.

Q7. You have been quoted saying "In a world without fences, who needs gates?" Do you really believe that people can’t hide things anymore?

Scott McNealy: People haven’t been able to hide anything for a long time—it’s just coming to light more prominently now. I’ve been saying since the ’90s that privacy is a myth. A lot of people bristled when I said it then, but with all the news around Facebook and Cambridge Analytica, it’s staring them in the face now. The only organization that should terrify you regarding privacy is government. They are the ones that can actually negatively impact your life through IRS audits, student loan grants/forgiveness, health care choices, judicial decisions, etc. It is truly scary how much the government knows about you and how much they can change your life. You can leave FB, but you can’t escape the Feds or even the local governments with their regulations and permits, etc.

Q8. Big Data, Machine Learning, AI driven algorithms: How these new technologies will change the world (for the better or the worst)?

Scott McNealy: All these new technologies will drive massive changes in the landscape from a business sense—the gorillas of today are not the gorillas of tomorrow. Of the three, though, big data is the most overarching and still the largest challenge organizations face today. And frankly it is something that will only compound as the years go on.
First and foremost, machine learning and AI driven algorithms are a byproduct of big data, and the more sophisticated those technologies become, the more data, insights and inferences can be made – ultimately resulting in more data. The ability to not only store, but make better use of data, will always be at the center of innovation.
Opening that whole data market to innovation is why we bought MySQL when I was at Sun — it’s a huge opportunity and a necessary step in the evolution of the technology. Now, as data continues to get bigger and the use cases and applications require faster transactions and 99.99% uptime, it’s critical that organizations have the right database for the right time.
At this point, however, web applications and the infinite number of edge and IoT use cases are multiplying by the day and a common theme underlying each of them is speed and reliability. And because of this shifting landscape, the incumbents are not in the best position to drive that innovation. This is why new approaches and open-source friendly solutions such as Redis are so critical.

Q9: Why did you join Redis Labs’ advisory board?

Scott McNealy: I love the management team, the technology, and their desire to use me as a mentor and advisor.

Q10. Curriki: What is the road ahead?

Scott McNealy: Curriki (curricula + Wiki) was founded over 11 years ago with the idea of creating a free and open community for teachers, parents and students where they could go to find, share, and contribute learning resources. It was modeled after the success of the Java Developer Community. At the time this was a completely new and innovative concept for K-12.

Today we are the leaders in this space, with close to 15 million users from 194 countries around the world who have touched over 200 million students since our inception. We have over 250,000 multimedia learning resources and over 1,000 groups of educators collaborating on creating free and open content. Our vision is to build out this high-quality content into full K-12 multimedia curricula that can be used in the classroom or self-paced. We want it to be scored in real time so that students and educators know exactly where they stand and what resources they need to access to improve their outcomes.

Q11. Wayin: What is the road ahead?

Scott McNealy: Wayin has been used by major brands such as Coca-Cola, IBM, P&G, and Nordstrom for a number of years to create interactive experiences that collect first-party data. These experiences have lived on clients’ owned and earned channels.
Wayin has just released the capability to publish these data-collecting experiences inside Google DoubleClick ad units and Snapchat ads, as well as calls to action on Instagram and Facebook ads. Being able to deliver paid media ads that aren’t just trying to force someone to watch a video or look at an image, but instead deliver an interactive experience, where the brand can collect first-party data and marketing opt-ins in return for entry into something like a competition, is exciting. It’s changing the relationship of ads from being interruptive on digital channels to something participatory with a value exchange. And given the risks brands now face in using third-party data for targeting ads, after the Cambridge Analytica fallout and the EU’s enforcement of GDPR, marketers are now all trying to build up their own first-party datasets around consumers. So this solution is timely.

Scott McNealy (@ScottMcNealy)
Co-Founder, Former Chairman of the Board, and CEO, Sun Microsystems, Inc.
Co-Founder, and Board Member, Curriki

Co-Founder, and Executive Chairman of the Board, Wayin
Board Member, San Jose Sharks Sports and Entertainment

Scott McNealy is an outspoken advocate for personal liberty, small government, and free-market competition.
In 1982, he co-founded Sun Microsystems and served as CEO and Chairman of the Board for 22 years. He piloted the company from startup to legendary Silicon Valley giant in computing infrastructure, network computing, and open source software.
Today McNealy is heavily involved in advisory roles for companies that range from startup stage to large corporations, including Curriki and Wayin. Curriki (curriculum + wiki) is an independent 501(c)(3) organization working toward eliminating the education divide by providing free K-12 curricula and collaboration tools through an open-source platform. Wayin, the Digital Campaign CMS platform enables marketers and agencies to deliver authentic interactive campaign experiences across all digital properties including web, social, mobile and partner channels. Wayin services more than 300 brands across 80 countries and 10 industries.
Scott McNealy is an enthusiastic ice hockey fan, and an avid golfer with a single digit handicap. He resides in California with his wife, Susan, and they have 4 sons.


– Technology Pioneer Scott McNealy Joins Redis Labs Advisory Board

Related Posts

– On Artificial Intelligence, Machine Learning, and Deep Learning. Interview with Pedro Domingos. ODBMS Industry Watch, 2018-06-18

– On Technology Innovation, AI and IoT. Interview with Philippe Kahn. ODBMS Industry Watch,  2018-01-27

Follow us on Twitter: @odbmsorg


What's Right and What's Wrong With Oracle's Way of MySQL Server Development

Recently it's quite common to state that "Oracle's Acquisition Was Actually the Best Thing to Happen to MySQL". I am not going to argue with that - Oracle proved over the years that they are committed to continuing active development of this great open source RDBMS, and they have invested a lot into making it better and implementing features that were missing or have recently become important. Unlike Sun Microsystems, they seem to clearly know what to do with this software to make it more popular and make money on it.

Among the right things Oracle does for MySQL server development I'd like to highlight the following:
  1. MySQL server development continues, with new features added, the most popular OSes supported, regular releases happening, and source code still published at GitHub under the GPL license.
  2. Oracle continues to maintain public MySQL bugs database and fix bugs reported there.
  3. Oracle accepts external contributions to MySQL server under clear conditions. They acknowledge contributions in public. The release notes, in particular, mention authors of each community-provided patch.
  4. Oracle works hard on improving performance and scalability of MySQL server.
  5. Oracle tries to provide good background for their new MySQL designs (check new InnoDB redo logging, for example).
  6. Oracle cooperates with MySQL Community. They organize their own community events and participate in numerous related conferences, including (but not limited to) the biggest Percona Live ones. Oracle engineers speak and write about their work in progress. Oracle seems to actively support some open source tools that work with MySQL server, like ProxySQL.
  7. Oracle still keeps and maintains pluggable storage engine architecture and plugin APIs, even though their own development is recently mostly related to InnoDB storage engine.
  8. Oracle still maintains and improves public MySQL Manual.
So, Oracle is doing well with MySQL, and dozens of their customers, community members and MySQL experts keep stating this all the time. But, as a former and current "MySQL Entomologist" (somebody who has worked on processing MySQL bug reports from the community and has reported MySQL bugs for 13 years), I clearly see problems with the way Oracle handles MySQL server development. I have written and spoken about these problems in public since the end of 2012 or so, and would like to summarize them in this post.
MySQL's future is bright, but there are some clouds

Here is the list of problems I see:
  1. Oracle does not develop MySQL server in a true open source way.
  2. Oracle does not care enough to maintain public bugs database properly.
  3. Some older MySQL features remain half-baked, not well tested, not properly integrated with each other or with new features, and not properly documented, for years.
    In general, Oracle's focus seems to be more on new developments and cool features for MySQL (some of which get ignored and go nowhere with time).
  4. Oracle's internal QA efforts still seem to be somewhat limited.
    We get regression bugs, ASAN failures, debug assertions, crashes, test failures, etc. in official releases, and Oracle MySQL still relies a lot on QA by the MySQL Community (while not highlighting this fact much in public).
  5. The MySQL Manual still has many details missing and is not fixed fast enough.
    Moreover, it is not open source, so the only way for the community to fix or improve it is to add comments or report documentation bugs, and wait.
In the upcoming weeks I am going to explain each of these items in a separate post, with some links to my older blog posts, MySQL server bug reports and other sources that should illustrate my points. In the meantime I am open for comments from those who disagree with the theses presented above.

This Week in Data with Colin Charles 44: MongoDB 4.0 and Facebook MyRocks

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

There have been two big pieces of news this week: the release of MongoDB 4.0 and the fact that Facebook has migrated the Messenger backend over to MyRocks.

MongoDB 4.0 is stable, with support for multi-document ACID transactions. I quite like the engineering chalk and talks videos on the transactions page. There are also improvements to help manage your MongoDB workloads in a Kubernetes cluster. MongoDB Atlas supports global clusters (geographically distributed databases, low latency writes, and data placement controls for regulatory compliance), HIPAA compliance, and more. ZDNet calls it the “operational database that is developer friendly”. The TechCrunch take was more focused on MongoDB Atlas, MongoDB launches Global Clusters to put geographic data control within reach of anyone.

In addition to that, I found this little snippet on CNBC featuring Michael Gordon, MongoDB CFO, very interesting: last quarter MongoDB Inc reported 53% year-over-year growth in their subscription revenue business. The fastest-growing piece of the business? Cloud-hosted database as a service offering. They partner with Amazon, Google and Microsoft. They are looking to grow in the Chinese market.

Did you attend MongoDB World 2018? I personally can’t wait to see the presentations. Do not forget to read the MongoDB 4.0 release notes in the manual. Take heed of this important note: “In most cases, multi-document transaction incurs a greater performance cost over single document writes, and the availability of multi-document transaction should not be a replacement for effective schema design.”

As for Facebook Messenger migrating to MyRocks, this blog post is highly detailed: Migrating Messenger storage to optimize performance. This is a migration from the previous HBase backend to MyRocks. End users should notice a more responsive product and better search. For Facebook, storage consumption went down by 90%! The migration methodology used to ensure Messenger usage was not disrupted for end users is also worth paying attention to. A more personal note comes from Yoshinori Matsunobu, as MyRocks is something he’s been spearheading. Don’t forget that you can try out MyRocks in Percona Server for MySQL as well as in MariaDB Server 10.2 and 10.3. As for compression, Percona Server for MySQL supports Zstandard (zstd for short); MariaDB does not, but offers various other compression algorithms.

Have you seen the Percona Open Source Database Community Blog? Jean-François Gagné recently wrote about how he posted on the Community Blog (so a very nice behind the scenes kind of post), and I hope you also read A Nice Feature in MariaDB 10.3: No InnoDB Buffer Pool in Core Dumps. Don’t forget to add this new blog to your RSS feed readers.

Lastly, as a quick note, there will unlikely be a column next week. I’m taking a short vacation, so see you in the following week!

Releases

Link List

Industry Updates
  • Louis Fahrberger (formerly of Clustrix, MariaDB Corporation, InfoBright and MySQL) is now an Account Executive in Sales for MemSQL.
  • The Wall Street Journal reports on Oracle Cloud and how the business continues to grow. “Revenues from its cloud services businesses jumped 25% year over year to $1.7 billion for its fiscal fourth quarter that ended May 31”.
  • The Financial Times reports “Red Hat sinks as currency swings cloud full-year sales outlook”. The CFO, Eric Shander, said, “we continue to expect strong demand for our hybrid cloud enabling technologies”.
Upcoming appearances
  • OSCON – Portland, Oregon, USA – July 16-19 2018

I look forward to feedback/tips via e-mail at or on Twitter @bytebot.

The post This Week in Data with Colin Charles 44: MongoDB 4.0 and Facebook MyRocks appeared first on Percona Database Performance Blog.

Percona XtraDB Cluster 5.7.22-29.26 Is Now Available

Percona announces the release of Percona XtraDB Cluster 5.7.22-29.26 (PXC) on June 29, 2018. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.7.22-29.26 is now the current release, based on the following:


The following variables are deprecated starting from this release:

  • wsrep-force-binlog-format
  • wsrep_sst_method = mysqldump

Since the use of binlog_format=ROW is enforced in 5.7, the wsrep_forced_binlog_format variable is much less significant. The same applies to mysqldump, as xtrabackup is now the recommended SST method.

New features
  • PXC-907: The new variable wsrep_RSU_commit_timeout allows configuring the timeout (in microseconds) that RSU waits for an active commit connection.
  • Percona XtraDB Cluster now supports the keyring_vault plugin, which allows storing the master key in a vault server.
  • Percona XtraDB Cluster 5.7.22 depends on Percona XtraBackup 2.4.12 in order to fully support vault plugin functionality.
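For illustration, enabling the vault keyring together with the new RSU timeout might look like the following my.cnf sketch. The file path and the timeout value are placeholders of mine, not taken from the release notes:

```ini
[mysqld]
# Keyring plugins must be loaded early, before InnoDB initializes.
early-plugin-load = keyring_vault.so
# Separate file holding the vault server URL, token, and secret mount point.
keyring_vault_config = /etc/mysql/keyring_vault.conf

# New in 5.7.22: how long (in microseconds) RSU waits for an active commit connection.
wsrep_RSU_commit_timeout = 5000
```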
Fixed Bugs
  • PXC-2127: Percona XtraDB Cluster shutdown process hung if thread_handling option was set to pool-of-threads due to a regression in  5.7.21.
  • PXC-2128: Duplicated auto-increment values were set for the concurrent sessions on cluster reconfiguration due to the erroneous readjustment.
  • PXC-2059: The error message about the necessity of the SUPER privilege, which appears when CREATE TRIGGER statements fail due to enabled WSREP, was made clearer.
  • PXC-2061: Wrong values could be read, depending on timing, when read causality was enforced with wsrep_sync_wait=1, because of waiting on the commit monitor to be flushed instead of waiting on the apply monitor.
  • PXC-2073: A CREATE TABLE AS SELECT statement was not replicated if the result set was empty.
  • PXC-2087: The cluster entered a deadlock state if a table had a unique key and an INSERT ... ON DUPLICATE KEY UPDATE statement was executed.
  • PXC-2091: The check for the maximum number of rows that can be replicated as part of a single transaction (a Galera limit) was enforced even when replication was disabled with wsrep_on=OFF.
  • PXC-2103: Interruption of a local running transaction in the COMMIT state by a replicated background transaction, while waiting for the binlog backup protection, caused the commit to fail and, eventually, an assert in Galera.
  • PXC-2130: Percona XtraDB Cluster failed to build with Python 3.
  • PXC-2142: Replacing Percona Server with Percona XtraDB Cluster on CentOS 7 with the yum swap command produced a broken symlink in place of the /etc/my.cnf configuration file.
  • PXC-2154: rsync SST is now aborted with an error message if used on a node with the keyring_vault plugin configured, because rsync SST does not support keyring_vault. Percona also does not recommend using rsync-based SST for data-at-rest encryption with keyring.
  • PXB-1544: xtrabackup --copy-back didn’t read which encryption plugin to use from the plugin-load setting of the my.cnf configuration file.
  • PXB-1540: On encountering a zero-sized keyring file, Percona XtraBackup removed and immediately recreated it, which could affect external software that noticed the file had undergone manipulation.

Other bugs fixed:

PXC-2072 “flush table <table> for export should be blocked with mode=ENFORCING”.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post Percona XtraDB Cluster 5.7.22-29.26 Is Now Available appeared first on Percona Database Performance Blog.