Planet MySQL

Manage Your MySQL Databases With SQLyog

SQLyog is a professional-grade GUI tool for the MySQL RDBMS. SQLyog is available for the Windows operating system starting from Windows XP/Windows 2003 to Windows 8/Server 2008 R2. It will also work under Linux, Unix and Mac OS X using the Wine environment. Moreover, a subset of SQLyog Enterprise/Ultimate functionality is available with the free SQLyog Job Agent (SJA) for Linux as a native Linux utility. This makes it possible to specify and test ‘scheduled jobs’ on a Windows environment and port execution parameters seamlessly to a Linux environment.

Why Use a Database Administration Tool

Database administration tools provide a GUI/Web interface to automate database tasks like querying tables, find-and-replace operations, and any other task that you might want to run on a database. For MySQL, the free MySQL Workbench is the obvious candidate. Although it’s far more visually appealing than the command-line interface, its performance is less than stellar under a high workload. As such, many users have reported that MySQL Workbench hangs or fails to respond. Frustration has led DBAs to far superior paid solutions such as SQLyog. It not only delivers more robust operation, but also comes with a variety of useful features that help you get the most out of your DBA tasks.

Throughout the remainder of this blog, we’ll explore how SQLyog helps boost your productivity via some of its unique and value-adding features.

Session Restore

Have you ever experienced the frustration of your system crashing, or accidentally closing your database administration application and losing all your unsaved tabs and queries? If you’re using SQLyog, you can breathe a sigh of relief; we’ve got you covered!

With the Session Restore feature, you can restore your previous session along with all the query tabs, query builder tabs, schema designer tabs and the history tab just the way they were prior to the crash.

In fact, SQLyog will always restore your previous session on startup by default. However, you can turn off this behavior from the Others tab of the main Preferences dialog. It’s accessible by selecting Tools > Preferences from the main menu.

The Restore session on startup option

Session Savepoints

SQLyog can also save Session Savepoints and reopen connections from a savepoint in the same state as when the savepoint was created.

As with the automatic restoring of the previous session on startup, the term “session” encompasses all open connections, all Query, Query Builder and Schema Designer tabs opened for every connection, and the History tab.

The Session Savepoint commands are located on the File menu.

Session Savepoint commands in the File menu

To save the current session, select either Save Session… or Save Session As… from the File menu. Save Session… will save to the current active session (thereby overwriting it) whereas Save Session As… will open the Save As dialog for you to choose your own file name.

Session information is written to a file with a “.ysav” extension. SQLyog employs a SQLite database to keep track of the last saved file’s location.

Automatic and manually saved Session Savepoints can be employed simultaneously by SQLyog. If automatic session restoration is enabled, the last active session will be loaded on startup. Then, any saved Session Savepoints can be accessed via the Open Session Savepoint… command.

Once a saved Session Savepoint has been opened, it may be manually closed by choosing the End Session command from the File menu. That causes SQLyog to stop tracking activity for that session. It will, however, continue to track the session for automatic session restoration if that option is activated.

If, for instance, you are a consultant working with many clients, this is a highly useful feature. You can have many savepoints for each client and get access to all the connections & scripts with a single click. And you won’t confuse what belongs to client A with what belongs to client B.

Autocomplete

Autocomplete, or word completion, is a software feature where the application predicts the rest of a word a user is typing. It’s one of the most coveted features in both development and administration software for a couple of very good reasons:

  1. It speeds up human-computer interactions by providing one or more words that the user might intend to enter after only a few characters have been typed into a text input field.
  2. It reduces typos by inserting the exact text of the object that the user wishes to access.

With respect to database administration software like SQLyog, the purpose of the autocomplete feature is to facilitate the insertion of database table and column names as well as MySQL keywords and functions into queries and DDL statements.

Available in SQLyog Professional, Enterprise and Ultimate editions, the Autocomplete feature is made possible by a small database engine that is built right into SQLyog. It contains the names of MySQL keywords and functions, and for every connection it also maintains a list of object names for fast retrieval.

In addition to the two general benefits of Autocompletion stated above, there are very real and tangible benefits when dealing with long and/or awkward object names that are often found in RDBMSes. By reducing the possibility of misspellings, there is far less risk of accidentally executing incorrect queries and reaching erroneous conclusions.

Autocomplete basically works in four independent and differing ways.

‘Auto-popup’ or ‘tooltip’

Whenever you type an object name followed by a period (.), a list of available child objects will pop up in a small window. You can then select an item from the list by:

    1. using the up and down arrow keys to move the selection highlighting, and then,
    2. pressing the TAB or ENTER key to select the item.

For instance, try typing a database name like “sakila.” to bring up the list of tables within the sakila database:


Moreover, once you’ve selected a table, entering another period (.) will bring up the popup list once again with a list of columns within that table:




The Autocomplete feature also works with functions. For example, typing the letters “sub” followed by the Ctrl + SPACE key combination will bring up the popup list with functions that match your typed letters:


Typing the name of a function followed by an opening parenthesis “(“ will display the parameter list of that function. Continuing with our previous example, typing “substr(“ will bring up the parameters for the substr() function:


In this case, there are actually four signatures for the substr() function, each with a different parameter list. You can iterate through each of these using the up and down arrows. As you do so, the parameters will change.

The current parameter is highlighted in red.


While writing a routine call, the parameter list will highlight the current parameter in red.

The Ctrl+Shift+Space Shortcut

If, for whatever reason, the parameter list goes away, such as when invoking a different autocomplete feature, the Ctrl+Shift+Space keyboard shortcut will always display the parameter list of a routine when the cursor is positioned inside it.

TAB-key functionality

Highlighting a database in the Object Browser and typing one or more characters in the editor pane will activate the TAB key to act as an object iterator. For every press of the TAB key, autocomplete iterates over all possible inputs that start with that character combination. Inputs are a combination of functions, keywords and database objects from the selected database.

For example, say that the film table of the sakila database is selected in the Object Browser. After typing in a value of “fil” in the editor, pressing the TAB key once will select the first matching word – in this case “film”. Pressing the TAB key a second time will change the word to the next matching word – in this instance “film_actor”. Once the desired word is present on the screen, simply continue typing from there.

The TAB key autocomplete functionality in action

CTRL+ENTER functionality

Similar to the TAB-key functionality, pressing the CTRL+ENTER key combination after typing one or more characters in the editor pane will bring up a small window containing a list of keywords and functions starting with that character combination. You can then use the arrow-keys to navigate to the one you want. Pressing ENTER inserts the selected entry.

CTRL+SPACE functionality

Whenever the cursor is placed in the editor pane, pressing CTRL+SPACE always opens a small window with a list of ALL possible keywords and functions. As above, use the up and down arrow keys to select an item and then press ENTER to insert the selected item.

Using Autocomplete with Aliases

Since version 6.5 Autocomplete supports table aliases and column aliases.

Autocomplete on a table alias

In the above image, you will see that all Autocomplete functionalities identify ‘f’ as an alias of the film table.

Autocomplete supports both table and column aliases in SELECT, UPDATE and DELETE statements except in the following cases, which are not supported at this time:

  • SELECT statements used to specify a value for INSERT such as:
    "INSERT INTO mytable (col1) values ((SELECT ...))"
  • SELECT statements used as an argument to a function such as:
    "SELECT IF((SELECT ....),'true','false')"
Column-width Persistence

The Result Tab displays the data from SELECT queries in a Grid view by default. It formats the results in Excel-like rows and columns. (Form and Text views are also available.) The Grid view offers a few benefits, such as sorting: Clicking on the header sorts the results by that column. This sorting is done on the client side without sending any additional queries to the MySQL server.

In SQLyog, the Grid view offers another advantage.

With most other GUI tools using a Data GRID, you will need to spend time adjusting the width of column headers every time you open the program. In SQLyog, column widths are persistent across sessions and even across master/slave replicas of the same database.

The Result grid

Conclusion

In this blog, we explored the many advantages offered by database administration tools, in particular those provided by SQLyog. Its unique features include:

  • Session Restore recalls your previous session along with all the query tabs, query builder tabs, schema designer tabs and the history tab just the way they were before a sudden outage or application shut down.
  • Session Savepoints allow for the manual saving and restoring of sessions to any number of save point states.
  • Available in SQLyog Professional, Enterprise and Ultimate editions, the Autocomplete feature displays suggestions for table, view, column, and routine names, along with their parameters.
    By reducing the possibility of misspellings, it lowers the risk of accidentally executing incorrect queries and reaching erroneous conclusions.

Autocomplete works in four ways:

    1. Whenever you type an object name followed by a period (.), a list of available child objects will pop up in a small window.
    2. Highlighting a database in the Object Browser and typing one or more characters in the editor pane will activate the TAB key to act as an object iterator. For every press of the TAB key, autocomplete iterates over all possible inputs that start with that character combination. Inputs are a combination of functions, keywords and database objects from the selected database.
    3. Similar to the TAB-key functionality, pressing the CTRL+ENTER key combination after typing one or more characters in the editor pane will bring up a small window containing a list of keywords and functions starting with that character combination.
    4. Whenever the cursor is placed in the editor pane pressing CTRL+SPACE always opens a small window with a list of ALL possible keywords and functions.

Since version 6.5 Autocomplete supports table aliases and column aliases.

  • With most other GUI tools using a Data GRID, you will need to spend time adjusting the width of column headers every time you open the program. In SQLyog, column widths are persistent across sessions and even across master/slave replicas of the same database.

SQLyog not only delivers more robust operation than free database administration tools, but also comes with a variety of useful features that help you get the most out of your DBA tasks. Download SQLyog free trial.

The post Manage Your MySQL Databases With SQLyog appeared first on SQLyog Blog.

Fun with Bugs #57 - On MySQL Bug Reports I am Subscribed to, Part I

I've decided to stop reviewing MySQL Release Notes in this series, but it does not mean that I am not interested in MySQL bugs any more. At the moment I am subscribed to 91 active MySQL bugs reported by other MySQL users, and in this blog post I am going to present 15 of them, the most recently reported ones. I'd really like to see them fixed or at least properly processed as soon as possible.

In some cases I am going to add my speculations on how the bug could be handled better, or maybe highlight some important details about it. It is not my job any more to process/"verify" any community bug reports for any kind of MySQL, but I did that for many years and I've spent more than 5 years "on the other side", being a member of the Community, so in some cases I allow myself to share a strong opinion on what might be done differently on the Oracle side.

As a side note, I started to subscribe to MySQL bugs mostly after I left Oracle, as before that I got email notification about each and every change in every MySQL bug report ever created...

Here is the list, starting from the most recent ones:
  • Bug #88422 - "MySQL 5.7 innodb purge thread get oldest readview could block other transaction". It is one of those bug reports without a test case from the reporter. It is tempting to set it to "Verified" just "based on code review", as the code in 5.7 quite obviously shows both the holding of trx_sys->mutex and the linear complexity of the search, which in the worst case (when most of the read views are closed) depends on the number of read views:
    /**
    Get the oldest (active) view in the system.
    @return oldest view if found or NULL */

    ReadView*
    MVCC::get_oldest_view() const
    {
            ReadView*       view;

            ut_ad(mutex_own(&trx_sys->mutex));

            for (view = UT_LIST_GET_LAST(m_views);
                 view != NULL;
                 view = UT_LIST_GET_PREV(m_view_list, view)) {

                    if (!view->is_closed()) {
                            break;
                    }
            }

            return(view);
    }

    But probably the current Oracle bug verification rules do not allow just marking it as verified. After all, somebody will have to create a test case... So, my dear old friend Sinisa Milivojevic decided to try to force the bug reporter to provide a test case instead of spending some time trying to create one himself. I am not going to blame him for that; why not try the easy way :) But I consider his statement in the comment dated [10 Nov 16:21]:

    "... 5.7 methods holds no mutex what so ever..."

    a bit wrong, as we can see that the mutex is acquired when the get_oldest_view() method is called:

    void
    MVCC::clone_oldest_view(ReadView* view)
    {
            mutex_enter(&trx_sys->mutex);

            ReadView*       oldest_view = get_oldest_view();

            if (oldest_view == NULL) {
    ...
  • Bug #88381 - "Predicate cannot be pushed down "past" window function". Here the bug reporter has provided enough hints for a test case. One can probably just check 'Handler%' status variables before and after query execution to come to the conclusion. Moreover, it seems the Oracle developer Dag Wanvik accepted this as a known limitation, but the bug still remains "Open" and nobody knows if it was copied to the internal bugs database, got prioritized, or if any work on this is planned any time soon. We shall see. You may also want to monitor MDEV-10855.
  • Bug #88373 - "Renaming a column breaks replication from 5.7 to 8.0 because of impl. collation". This bug was quickly verified by Umesh Shastry. I expect a lot of "fun" for users upgrading to MySQL 8.0 when it becomes GA, especially in replication setups.
  • Bug #88328 - "Performance degradation with the slave_parallel_workers increase". There is no test case, just some general description and ideas about the root cause when semi-sync replication is used. I expect this bug to stay "Open" for a long time, as it is a topic for good research and blog posts like this one, that is, work for a real expert!
  • Bug #88223 - "Replication with no tmpdir space and InnoDB as tmp_storage_engine can break". Here we have a clear and simple test case from Sveta Smirnova (no wonder, she also worked in the bugs verification team at MySQL, Sun and Oracle). I hope Umesh will verify it soon. As a side note, it is explained (in the comments) elsewhere that InnoDB as internal_tmp_disk_storage_engine may not be the best possible option. We do not have this variable and do not plan to support InnoDB for internal temporary tables in MariaDB 10.2+.
  • Bug #88220 - "compressing and uncompressing InnoDB tables seems to be inconsistent". See also other, older bug reports mentioned there that are duplicates/closely related, but were not getting proper attention.
  • Bug #88150 - "'Undo log record is too big.' error occurring in very narrow range of str length". It was reported by my colleague Geoff Montee and is already fixed in recent versions of MariaDB (see MDEV-14051 for the details and some nice examples of gdb usage by a developer)!
  • Bug #88127 - "Index not used for 'order by' query with utf8mb4 character set". Here I am just curious whether bugs like this will ever be caught by Oracle QA before any public releases.
  • Bug #88071 - "An arresting Performance degradation when set sort_buffer_size=32M". Here the test case is clear: just run the sysbench oltp test at high concurrency with different values of sort_buffer_size. Still, Sinisa Milivojevic decided to explain when the RAM limit may play a role instead of just showing how it works great (if it does) on any server with enough RAM... Let's see how this attempt to force the bug reporter to work/explain more may end up...
  • Bug #87947 - "Optimizer chooses ref over range when access when range access is faster". Nice example of a case when optimizer trace may be really useful. Øystein Grøvlen kindly explained that "range access and ref access are not comparable costs". I wish we get better cost model for such cases in MySQL one day.
  • Bug #87837 - "MySQL 8 does not start after upgrade to 8.03". It is expected actually, and even somewhat documented in the release notes, that MySQL 8.0.3 is not compatible with any older version. So, it is more of a MySQL Installer bug (a product I do not care much about), but I still subscribed to it as yet another source of potential fun during further upgrade attempts.
  • Bug #87716 - "SELECT FOR UPDATE with BETWEEN AND gets row lock excessively". I think I already studied once why rows are locked differently by InnoDB with IN() compared to a BETWEEN that selects the same rows. But I'd like to know what Oracle's take on this is, and I'd like to study this specific test case in detail one day as well (a small do-it-yourself illustration follows this list).
  • Bug #87670 - "Force index for group by is not always honored". Clear and simple test case, so no wonder it was immediately verified.
  • Bug #87621 - "Huge InnoDB slowdown when selecting strings without indexes ". I'd like to check with perf one day where the time is mostly spent during this test. For now I think this is a result of the way "long" data are stored on separate pages in InnoDB. What's also interesting here is a test case where R is used to generate the data set.
  • Bug #87589 - "Documentation incorrectly states that LOAD DATA LOCAL INFILE does not use tmpdir". This was yet another report from my colleague Geoff Montee. lsof is your friend; maybe I should talk about it one day at FOSDEM (the call for papers is still open :) I like to find and follow bugs and missing details in the MySQL manual, maybe because I would never be able to contribute to it as a writer directly...
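
As a quick do-it-yourself illustration of the locking comparison behind Bug #87716, one can enable the extended InnoDB lock output and compare equivalent BETWEEN and IN() predicates. The table and column names below are made up for the example; this is not the reporter's actual test case:

shell> mysql test <<'SQL'
SET GLOBAL innodb_status_output_locks = ON;  -- include per-record lock details in the monitor output
START TRANSACTION;
SELECT * FROM t WHERE id BETWEEN 5 AND 7 FOR UPDATE;
SHOW ENGINE INNODB STATUS\G
ROLLBACK;
START TRANSACTION;
SELECT * FROM t WHERE id IN (5, 6, 7) FOR UPDATE;
SHOW ENGINE INNODB STATUS\G
ROLLBACK;
SQL

Comparing the record locks listed in the TRANSACTIONS section of the two monitor outputs shows how the two logically equivalent predicates are locked differently.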

So, this list shows my typical recent interests related to MySQL bugs - mostly InnoDB, optimizer, replication problems, fine manual and just some fun details like the way some Oracle engineers try to avoid working extra hard while processing bugs... I am also happy to know that in some cases MariaDB is able to deliver fixes faster.

New optimizer hint for changing the session system variable.

In MySQL 8.0.3, we have introduced a new optimizer hint called SET_VAR. This hint allows you to change the value of a system variable for a single query.

Earlier, changing a system variable for a single query would require four steps:

1) Store current value.…
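
To illustrate the idea (the table, column and variable below are generic examples, not taken from the original post), the old multi-step approach and the new hint look roughly like this:

shell> mysql test <<'SQL'
-- the old way: store the current value, change it, run the query, restore the old value
SET @old_sort_buffer_size = @@SESSION.sort_buffer_size;
SET SESSION sort_buffer_size = 16 * 1024 * 1024;
SELECT name FROM t1 ORDER BY name;
SET SESSION sort_buffer_size = @old_sort_buffer_size;

-- with the SET_VAR hint, the changed value applies to this single statement only
SELECT /*+ SET_VAR(sort_buffer_size = 16M) */ name FROM t1 ORDER BY name;
SQL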

Shinguz: MariaDB master/master GTID based replication with keepalived VIP

Some of our customers still want to have old-style MariaDB master/master replication clusters. Time goes by, new technologies appear but some old stuff still remains.

The main problem in a master/master replication set-up is to make the service highly available for the application (applications typically cannot deal with more than one point of contact). This can be achieved with a load balancer (HAProxy, Galera Load Balancer (GLB), ProxySQL or MaxScale) in front of the MariaDB master/master replication cluster. But the load balancer itself should also be highly available. And this is typically achieved by a virtual IP (VIP) in front of one of the load balancers. To make operations of the VIP more handy, the VIP is controlled by a service like keepalived or corosync.

Because I like simple solutions (I am a strong believer in the KISS principle), I thought about avoiding the load balancer in the middle, attaching the VIP directly to the master/master replication servers and letting it be controlled by keepalived as well.

Important: A master/master replication set-up is vulnerable to split-brain situations. Neither keepalived nor master/master replication helps you avoid conflicts or prevents this situation in any way. If you are sensitive to split-brain situations you should look at Galera Cluster. Keepalived is made for stateless services like load balancers, but not for databases.

Set-up a MariaDB master/master replication cluster

Because most Linux distributions ship somewhat outdated versions of the software, we use the MariaDB 10.2 repository from the MariaDB website:

#
# /etc/yum.repos.d/MariaDB-10.2.repo
#
# MariaDB 10.2 CentOS repository list - created 2017-11-08 20:32 UTC
# http://downloads.mariadb.org/mariadb/repositories/
#
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.2/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1

Then we install the MariaDB server and start it:

shell> yum makecache
shell> yum install MariaDB-server MariaDB-client
shell> systemctl start mariadb
shell> systemctl enable mariadb

For the MariaDB master/master replication set-up configuration we use the following parameters:

#
# /etc/my.cnf
#
[mysqld]
server_id                = 1          # 2 on the other node
log_bin                  = binlog-m1  # binlog-m2 on the other node
log_slave_updates        = 1
gtid_domain_id           = 1          # 2 on the other node
gtid_strict_mode         = On
auto_increment_increment = 2
auto_increment_offset    = 1          # 2 on the other node
read_only                = On         # super_read_only for MySQL 5.7 and newer

Then we close the master/master replication ring according to: Starting with empty server.

mariadb> SET GLOBAL gtid_slave_pos = "";
mariadb> CHANGE MASTER TO master_host="192.168.56.101", master_user="replication", master_use_gtid=current_pos;
mariadb> START SLAVE;
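
To verify on each node that the ring is healthy (standard MariaDB commands, not part of the original how-to), check the replication threads and the GTID positions:

shell> mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Gtid_IO_Pos'
shell> mysql -e "SELECT @@gtid_current_pos, @@gtid_slave_pos"

Both Slave_IO_Running and Slave_SQL_Running should report Yes, and the GTID positions should advance on both nodes when either of them receives writes.
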
Installing keepalived

Literature:


The next step is to install and configure keepalived. This can be done as follows:

shell> yum install keepalived
shell> systemctl enable keepalived

Important: In my tests I got crashes and core dumps with keepalived which disappeared after a full upgrade of CentOS 7.

Configuring keepalived

The most important part is the keepalived configuration file:

#
# /etc/keepalived/keepalived.conf
#
global_defs {

  notification_email {
    root@localhost
    dba@example.com
  }
  notification_email_from root@master1   # master2 on the other node
  smtp_server localhost 25

  router_id MARIADB_MM
  enable_script_security
}

# Health checks

vrrp_script chk_mysql {

  script "/usr/sbin/pidof mysqld"
  weight 2      # Is relevant for the diff in priority
  interval 1    # every ... seconds
  timeout 3     # script considered failed after ... seconds
  fall 3        # number of failures for K.O.
  rise 1        # number of success for OK
}

vrrp_script chk_failover {

  script "/etc/keepalived/chk_failover.sh"
  weight -4     # Is relevant for the diff in priority
  interval 1    # every ... seconds
  timeout 1     # script considered failed after ... seconds
  fall 1        # number of failures for K.O.
  rise 1        # number of success for OK
}

# Main configuration

vrrp_instance VI_MM_VIP {

  state MASTER              # BACKUP on the other side
  interface enp0s9          # private heartbeat interface
  priority 100              # Higher means: elected first (BACKUP: 99)
  virtual_router_id 42      # ID for all nodes of Cluster group
  debug 0                   # 0 .. 4, seems not to work?

  unicast_src_ip 192.168.56.101   # Our private IP address
  unicast_peer {
    192.168.56.102                # Peers private IP address
  }

  # For keepalived communication
  authentication {
    auth_type PASS
    auth_pass Secr3t!
  }

  # VIP to move around
  virtual_ipaddress {
    192.168.1.99/24 dev enp0s8    # public interface for VIP
  }

  # Check health of local system. See vrrp_script above.
  track_script {
    chk_mysql
    # If File /etc/keepalived/failover is touched failover is triggered
    # Similar can be reached when priority is lowered followed by a reload
    chk_failover
  }

  # When node becomes MASTER this script is triggered
  notify_master "/etc/keepalived/keepalived_master.sh --user=root --password= --wait=yes --variable=read_only"
  # When node becomes SLAVE this script is triggered
  notify_backup "/etc/keepalived/keepalived_backup.sh --user=root --password= --kill=yes --variable=read_only"

  # Possibly fault and stop should also call keepalived_backup.sh to be on the safe side...
  notify_fault "/etc/keepalived/keepalived_fault.sh arg1 arg2"
  notify_stop "/etc/keepalived/keepalived_stop.sh arg1 arg2"

  # ANY state transit is triggered
  notify /etc/keepalived/keepalived_notify.sh

  smtp_alert    # send notification during state transit
}

With the command:

shell> systemctl restart keepalived

the service is started and/or the configuration is reloaded.

The scripts we used in the configuration file are the following:

  • chk_failover.sh
  • keepalived_backup.sh
  • keepalived_fault.sh
  • keepalived_master.sh
  • keepalived_notify.sh
  • keepalived_stop.sh
#!/bin/bash
#
# /etc/keepalived/keepalived_notify.sh
#

TYPE=${1}
NAME=${2}
STATE=${3}
PRIORITY=${4}

TS=$(date '+%Y-%m-%d_%H:%M:%S')
LOG=/etc/keepalived/keepalived_notify.log

echo $TS $0 $@ >>${LOG}
#!/bin/bash
#
# /etc/keepalived/chk_failover.sh
#

/usr/bin/stat /etc/keepalived/failover 2>/dev/null 1>&2
if [ ${?} -eq 0 ] ; then
  exit 1
else
  exit 0
fi

To make MariaDB master/master replication more robust against replication problems, we took the following (configurable) actions on the database side (a rough sketch of these steps follows the two lists below):

Getting the MASTER role:

  • Waiting for catch-up replication
  • Make the MariaDB instance read/write

Getting the BACKUP role:

  • Make the MariaDB instance read-only
  • Kill all open connections
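
The role-change scripts themselves are not reproduced here. As a rough sketch only (the SQL and the handling of options such as --wait and --kill are assumptions, not the actual keepalived_master.sh/keepalived_backup.sh), the two transitions boil down to something like this:

#!/bin/bash
# Sketch of a notify_master handler: wait for catch-up replication, then allow writes.
until [ "$(mysql -e "SHOW SLAVE STATUS\G" | awk '/Seconds_Behind_Master/ {print $2}')" = "0" ]; do
  sleep 1                                # wait until replication has caught up
done
mysql -e "SET GLOBAL read_only = OFF;"   # make the MariaDB instance read/write

#!/bin/bash
# Sketch of a notify_backup handler: block writes, then kill the remaining client connections.
mysql -e "SET GLOBAL read_only = ON;"    # make the MariaDB instance read-only
for ID in $(mysql -BN -e "SELECT id FROM information_schema.processlist WHERE user NOT IN ('system user', 'root')"); do
  mysql -e "KILL ${ID};" 2>/dev/null     # drop open application connections
done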

Testing scenarios

The following scenarios where tested under load (insert_test.sh):

  • Intentional fail-over for maintenance:
    shell> touch /etc/keepalived/failover
    shell> rm -f /etc/keepalived/failover
  • Stopping keepalived:
    shell> systemctl stop keepalived
    shell> systemctl start keepalived
  • Stopping MariaDB node:
    shell> systemctl stop mariadb
    shell> systemctl start mariadb
  • Reboot server:
    shell> reboot
  • Simulation of split-brain:
    shell> ip link set enp0s9 down
    shell> ip link set enp0s9 up

Problems

Problems we faced during set-up and testing were:

  • SElinux/AppArmor
  • Firewall

Keepalived controlling 2 virtual IPs

A second scenario we wanted to build is a MariaDB master/master GTID based replication cluster with 2 VIP addresses. This is done either to have a read-only VIP and a read/write VIP, or to put half of the load on one master and half of the load on the other master:

For this scenario we used the same scripts but a slightly different keepalived configuration:

#
# /etc/keepalived/keepalived.conf
#
global_defs {

  notification_email {
    root@localhost
    dba@example.com
  }
  notification_email_from root@master1   # master2 on the other node
  smtp_server localhost 25

  router_id MARIADB_MM
  enable_script_security
}

# Health checks

vrrp_script chk_mysql {

  script "/usr/sbin/pidof mysqld"
  weight 2      # Is relevant for the diff in priority
  interval 1    # every ... seconds
  timeout 3     # script considered failed after ... seconds
  fall 3        # number of failures for K.O.
  rise 1        # number of success for OK
}

vrrp_script chk_failover {

  script "/etc/keepalived/chk_failover.sh"
  weight -4     # Is relevant for the diff in priority
  interval 1    # every ... seconds
  timeout 1     # script considered failed after ... seconds
  fall 1        # number of failures for K.O.
  rise 1        # number of success for OK
}

# Main configuration

vrrp_instance VI_MM_VIP1 {

  state MASTER              # BACKUP on the other side
  interface enp0s9          # private heartbeat interface
  priority 100              # Higher means: elected first (BACKUP: 99)
  virtual_router_id 42      # ID for all nodes of Cluster group

  unicast_src_ip 192.168.56.101   # Our private IP address
  unicast_peer {
    192.168.56.102                # Peers private IP address
  }

  # For keepalived communication
  authentication {
    auth_type PASS
    auth_pass Secr3t!
  }

  # VIP to move around
  virtual_ipaddress {
    192.168.1.99/24 dev enp0s8    # public interface for VIP
  }

  # Check health of local system. See vrrp_script above.
  track_script {
    chk_mysql
    chk_failover
  }

  # ANY state transit is triggered
  notify /etc/keepalived/keepalived_notify.sh

  smtp_alert    # send notification during state transit
}

vrrp_instance VI_MM_VIP2 {

  state BACKUP              # MASTER on the other side
  interface enp0s9          # private heartbeat interface
  priority 99               # Higher means: elected first (MASTER: 100)
  virtual_router_id 43      # ID for all nodes of Cluster group

  unicast_src_ip 192.168.56.101   # Our private IP address
  unicast_peer {
    192.168.56.102                # Peers private IP address
  }

  # For keepalived communication
  authentication {
    auth_type PASS
    auth_pass Secr3t!
  }

  # VIP to move around
  virtual_ipaddress {
    192.168.1.98/24 dev enp0s8    # public interface for VIP
  }

  # Check health of local system. See vrrp_script above.
  track_script {
    chk_mysql
    chk_failover
  }

  # ANY state transit is triggered
  notify /etc/keepalived/keepalived_notify.sh

  smtp_alert    # send notification during state transit
}

New Book: Pro MySQL NDB Cluster

It is with great pleasure that I can announce that a new book dedicated to MySQL NDB Cluster has just been released. The book Pro MySQL NDB Cluster is written by my colleague Mikiya Okuno and myself and is a nearly 700-page deep dive into the world of MySQL NDB Cluster. The book is published by Apress.

Tip: There are several ways to cluster MySQL. This book is about the product MySQL Cluster (often called MySQL NDB Cluster to clarify which cluster it is). There is also MySQL InnoDB Cluster, clustering using replication, and clustering through operating system or hardware features. Pro MySQL NDB Cluster is only about the former.

We very much hope you will enjoy the book. Feedback and questions are most welcome, for example on Twitter (@nippondanji and @JWKrogh).

Note: At the time of writing, only the eBook is available for purchase. A softcover version will follow as soon as it has been possible to print it; this can also be pre-ordered now.

The book is divided into five parts and 20 chapters.

Part I – The Basics

The first part provides some background information on the various parts in MySQL NDB Cluster and how it works. The chapters are:

  • Chapter 1: Architecture and Core Concepts
  • Chapter 2: The Data Nodes
Part II – Installation and Configuration

The second part focuses on the installation and configuration related topics, including replication between clusters. The chapters are:

  • Chapter 3: System Planning
  • Chapter 4: Configuration
  • Chapter 5: Installation
  • Chapter 6: Replication
Part III – Daily Tasks and Maintenance

In the third part, the topics include tasks that are part of the daily routine of a database administrator, plus a tutorial where the tasks discussed in parts II and III are handled through MySQL Cluster Manager. The chapters are:

  • Chapter 7: The NDB Management Client and Other NDB Utilities
  • Chapter 8: Backups and Restores
  • Chapter 9: Table Maintenance
  • Chapter 10: Restarts
  • Chapter 11: Upgrades and Downgrades
  • Chapter 12: Security Considerations
  • Chapter 13: MySQL Cluster Manager
Part IV – Monitoring and Troubleshooting

The fourth part continues with two topics that are also part of the daily routine: monitoring and troubleshooting. The chapters are:

  • Chapter 14: Monitoring Solutions and the Operating System
  • Chapter 15: Sources for Monitoring Data
  • Chapter 16: Monitoring MySQL NDB Cluster
  • Chapter 17: Typical Troubles and Solutions
Part V – Development and Performance Tuning

The final part covers topics that are related to development and to tuning the cluster with respect to performance. The chapters are:

  • Chapter 18: Developing Applications Using SQL with MySQL NDB Cluster
  • Chapter 19: MySQL NDB Cluster as a NoSQL Database
  • Chapter 20: MySQL NDB Cluster and Application Performance Tuning

Sysbench: in-memory, InnoDB and a small server, MySQL 5.0 to 8.0

This has results for in-memory sysbench with InnoDB and MySQL versions 5.0, 5.1, 5.5, 5.6, 5.7 and 8.0. The previous post covered MyISAM. There are four types of tests: write-heavy, scan-heavy, point-query and inlist-query. For MyISAM the results for the tests within each group were similar. That is less true for InnoDB.

tl;dr - because the other tl;dr are too long
  • InnoDB in 5.5 frequently has lousy performance for write-heavy tests. I didn't debug it.
  • InnoDB 5.7 and 8.0 have great scan performance, are much better for high-concurrency workloads (to be covered in a future post), but suffer from more CPU overhead for low-concurrency workloads.

tl;dr - from 5.0.96 to 8.0.3
  • For update-nonindex QPS decreased 41% on the i5 NUC and 52% on the i3 NUC. Results for update-one and update-inlist were similar. The regression was smaller for update-index and insert. For delete 8.0.3 gets more QPS than 5.0.96.
  • For scan-heavy tests the QPS for MySQL 8.0.3 is between 1.05X and 1.26X better than for 5.0.96 on all tests except read-only with range-size=100 where it is 2% to 5% worse. 
  • For point-query the QPS decreased 36% on the i5 NUC and 46% on the i3 NUC
  • For the inlist-heavy tests the QPS decreased ~22%.
  • The QPS regression is larger for point-query tests than for inlist-heavy ones. New MySQL releases bring new features, which bring new CPU overheads. For point-query there is 1 row fetched per SELECT versus 100 per SELECT for the inlist-heavy tests, so the inlist-heavy tests amortize that overhead over more rows (see the small example below).
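
To make the last point concrete, these are the two query shapes being compared (using sysbench's standard sbtest schema; the actual id values are generated by the benchmark):

shell> mysql sbtest <<'SQL'
-- point-query: one row fetched per SELECT, so per-statement overhead dominates
SELECT c FROM sbtest1 WHERE id = 42;
-- inlist query (random-points style): ~100 rows fetched per SELECT, amortizing that overhead
SELECT c FROM sbtest1 WHERE id IN (42, 137, 4215, 9001 /* ... about 100 ids ... */);
SQL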

tl;dr - from 5.6.35 to 8.0.3
  • For update-nonindex QPS decreased 23% on the i5 NUC and 30% on the i3 NUC. Results for update-one and update-inlist were similar. The regression was smaller for update-index. Unlike the 5.0 to 8.0 result QPS decreased by ~25% for delete and ~25% for insert.
  • For scan-heavy tests InnoDB 5.7.17 has the best QPS. While there have been regressions since then, scan performance in InnoDB 8.0.3 is still better than in 5.6.35 by up to 1.5X.
  • For point-query the QPS decreased ~11% on the i5 NUC and 17% on the i3 NUC
  • For the inlist-heavy tests the QPS decreased ~7%

Configuration

The tests used InnoDB from upstream MySQL versions 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.3. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines, including disabling SSL for MySQL 5.7 and using the same charset and collation. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream. The built-in InnoDB is used for MySQL 5.0 and 5.1.

Sysbench is run with 2 tables and 2M rows per table. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database fits in RAM as the i3 NUC has 8gb of RAM and the i5 NUC has 16gb.
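
For readers who want to approximate this setup, a sysbench 1.0 style invocation for one of the tests might look like the lines below. This is an assumption about an equivalent command line, not the exact scripts and wrapper used for these results:

shell> sysbench oltp_point_select --db-driver=mysql --mysql-user=sbtest --mysql-password=sbtest \
       --mysql-db=sbtest --tables=2 --table-size=2000000 prepare
shell> sysbench oltp_point_select --db-driver=mysql --mysql-user=sbtest --mysql-password=sbtest \
       --mysql-db=sbtest --tables=2 --table-size=2000000 --threads=1 --time=600 run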

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is much smaller. Assuming I will always disable turbo boost in the future I might as well stick with an i3 NUC for my next purchase.

Results


All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. For all tests but scan the result has the QPS for the test with 1 client relative to the QPS for InnoDB in MySQL 5.0.96. The tests are explained here.

Graphs

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. Fortunately the results within each group are similar and one graph per group is sufficient. The tests are explained here.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graph is for update-nonindex using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96.
  • From MySQL 5.0.96 to 8.0.3 QPS decreased 41% on the i5 NUC and 52% on the i3 NUC
  • From MySQL 5.6.35 to 8.0.3 QPS decreased 23% on the i5 NUC and 30% on the i3 NUC
  • For the update-index test the QPS decreased ~20% from 5.0.96 to 8.0.3. From 5.6.35 to 8.0.3 it decreased 6% for the i5 NUC and 13% for the i3 NUC.
  • For the delete test 8.0.3 gets more QPS than 5.0.96. But from 5.6.35 to 8.0.3 the QPS decreased by ~25%.
  • For the insert test the QPS decreased 9% for the i5 NUC and 15% for the i3 NUC from 5.0.96 to 8.0.3. From 5.6.35 to 8.0.3 it decreased 22% for the i5 NUC and 27% for the i3 NUC. 
The scan-heavy group includes read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graph is for read-write with range-size=100 using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. 
  • The QPS for MySQL 8.0.3 is between 1.05X and 1.26X better than for 5.0.96 on all tests except read-only with range-size=100 where it is 2% to 5% worse. 
  • InnoDB 5.7.17 has the best range scan performance. While there have been regressions since then, scan performance in InnoDB 8.0.3 is still better than in 5.6.35 and up to 1.5X better.
The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 the QPS decreased 36% on the i5 NUC and 46% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 the QPS decreased ~11% on the i5 NUC and 17% on the i3 NUC.

The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 the QPS decreased ~22%. From MySQL 5.6.35 to 8.0.3 the QPS decreased ~7%.

update-inlist

This section and the ones that follow have the QPS and QPS ratio for each MySQL release on the i3 and i5 NUC. The QPS ratio is the QPS for the release relative to the QPS for MySQL 5.0.96 using the test with 1 client.

From MySQL 5.0.96 to 8.0.3 QPS decreased by 38% on the i5 NUC and 41% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 20% on the i5 NUC and 11% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3306    1.00    4335    1.00    5.0.96
2718    0.82    4051    0.93    5.1.72
 645    0.20    1155    0.27    5.5.51
2201    0.67    3349    0.77    5.6.35
2050    0.62    3192    0.74    5.7.17
2037    0.62    3086    0.71    8.0.1
1960    0.59    2709    0.62    8.0.2
1963    0.59    2693    0.62    8.0.3

update-one

From MySQL 5.0.96 to 8.0.3 QPS decreased by ~50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 26% on the i5 NUC and 33% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
13748   1.00    15689   1.00    5.0.96
12084   0.88    13269   0.85    5.1.72
 9065   0.66    10343   0.66    5.5.51
 9650   0.70    10969   0.70    5.6.35
 8097   0.59     9648   0.61    5.7.17
 7552   0.55     8876   0.57    8.0.1
 6491   0.47     8222   0.52    8.0.2
 6468   0.47     8147   0.52    8.0.3

update-index

From MySQL 5.0.96 to 8.0.3 QPS decreased by 17% on the i5 NUC and 20% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 6% on the i5 NUC and 13% on the i3 NUC. The regression here is much smaller than for the other update-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 3414   1.00    6185    1.00    5.0.96
 3323   0.97    5394    0.87    5.1.72
 3265   0.96    5173    0.84    5.5.51
 3134   0.92    5489    0.89    5.6.35
 2983   0.87    5861    0.95    5.7.17
 2910   0.85    5494    0.89    8.0.1
 2798   0.82    5170    0.84    8.0.2
 2721   0.80    5160    0.83    8.0.3

update-nonindex

From MySQL 5.0.96 to 8.0.3 QPS decreased by ~50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 24% on the i5 NUC and 31% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
11321   1.00    13146   1.00    5.0.96
 9965   0.88    11392   0.87    5.1.72
 4543   0.40     7294   0.55    5.5.51
 7860   0.69    10148   0.77    5.6.35
 6337   0.56     9135   0.69    5.7.17
 6295   0.56     8514   0.65    8.0.1
 5499   0.49     7812   0.59    8.0.2
 5398   0.48     7745   0.59    8.0.3

delete

From MySQL 5.0.96 to 8.0.3 QPS is 1.1X better on the i5 NUC and ~1.5X better on the i3 NUC. But from MySQL 5.6.35 to 8.0.3 QPS decreased by ~25%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 7484   1.00    13819   1.00    5.0.96
 7547   1.01    12725   0.92    5.1.72
 9950   1.33    14830   1.07    5.5.51
14590   1.95    19900   1.44    5.6.35
12758   1.70    17621   1.28    5.7.17
12143   1.62    16539   1.20    8.0.1
11104   1.48    15260   1.10    8.0.2
11085   1.48    15202   1.10    8.0.3

read-write with --range-size=100

From MySQL 5.0.96 to 8.0.3 QPS is 1.04X better on the i5 NUC and 1.17X better on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 1% on the i5 NUC and 2% on the i3 NUC. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 7% on the i5 NUC and 5% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 8208   1.00    11038   1.00    5.0.96
 7875   0.96    10126   0.92    5.1.72
 8904   1.08    11275   1.02    5.5.51
 9844   1.20    11591   1.05    5.6.35
10152   1.24    12290   1.11    5.7.17
 9694   1.18    11698   1.06    8.0.1
 9578   1.17    11570   1.05    8.0.2
 9610   1.17    11481   1.04    8.0.3

read-write with --range-size=10000

From MySQL 5.0.96 to 8.0.3 QPS is ~1.27X better. From MySQL 5.6.35 to 8.0.3 QPS is ~1.5X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 3% on the i5 NUC and 5% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
330     1.00    387     1.00    5.0.96
316     0.96    376     0.97    5.1.72
293     0.89    363     0.94    5.5.51
283     0.86    335     0.87    5.6.35
442     1.34    500     1.29    5.7.17
434     1.32    489     1.26    8.0.1
423     1.28    491     1.27    8.0.2
421     1.28    486     1.26    8.0.3

read-only with --range-size=100

From MySQL 5.0.96 to 8.0.3 QPS decreased by 5% on the i5 NUC and 2% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS is ~1.07X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 3% on the i5 NUC and 8% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
10748   1.00    12490   1.00    5.0.96
 9858   0.92    11816   0.95    5.1.72
10100   0.94    11396   0.91    5.5.51
 9644   0.90    11120   0.89    5.6.35
11369   1.06    12247   0.98    5.7.17
10516   0.98    11750   0.94    8.0.1
10398   0.97    11980   0.96    8.0.2
10509   0.98    11874   0.95    8.0.3

read-only.pre with --range-size=10000

From MySQL 5.0.96 to 8.0.3 QPS is 1.22X better on the i5 NUC and 1.3X better on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS is ~1.5X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 2% on the i5 NUC and 5% on the i3 NUC.

This test is run before the write-heavy tests. The next section has results for the test run after write-heavy tests. They get similar QPS which means that b-tree fragmentation isn't an issue here for an in-memory workload.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
264     1.00    317     1.00    5.0.96
255     0.97    305     0.96    5.1.72
239     0.91    294     0.93    5.5.51
226     0.86    272     0.86    5.6.35
358     1.36    394     1.24    5.7.17
351     1.33    392     1.24    8.0.1
341     1.30    384     1.21    8.0.2
341     1.30    387     1.22    8.0.3

read-only with --range-size=10000

From MySQL 5.0.96 to 8.0.3 QPS is 1.22X better on the i5 NUC and 1.28X better on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS is ~1.5X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 7% on the i5 NUC and 5% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
268     1.00    316     1.00    5.0.96
256     0.96    303     0.96    5.1.72
239     0.89    295     0.93    5.5.51
221     0.82    272     0.86    5.6.35
359     1.34    393     1.24    5.7.17
352     1.31    387     1.22    8.0.1
345     1.29    388     1.23    8.0.2
343     1.28    385     1.22    8.0.3

point-query.pre

From MySQL 5.0.96 to 8.0.3 QPS decreased ~40%. From MySQL 5.6.35 to 8.0.3 QPS decreased ~30%.

This test is run before the write-heavy tests. The next section has results for the test run after write-heavy tests. They get similar QPS which means that b-tree fragmentation isn't an issue here for an in-memory workload.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
28360   1.00    27125   1.00    5.0.96
22444   0.79    23049   0.85    5.1.72
18938   0.67    20004   0.74    5.5.51
18329   0.65    20505   0.76    5.6.35
16592   0.59    18126   0.67    5.7.17
16010   0.56    16402   0.60    8.0.1
15031   0.53    16175   0.60    8.0.2
15208   0.54    16188   0.60    8.0.3

point-query

From MySQL 5.0.96 to 8.0.3 QPS decreased ~40%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 11% on the i5 NUC and 17% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
28733   1.00    26883   1.00    5.0.96
21941   0.76    23487   0.83    5.1.72
18541   0.65    20287   0.75    5.5.51
18554   0.65    19257   0.72    5.6.35
16883   0.59    17723   0.66    5.7.17
16163   0.56    16828   0.63    8.0.1
15005   0.52    16626   0.62    8.0.2
15492   0.54    17121   0.64    8.0.3

random-points.pre

From MySQL 5.0.96 to 8.0.3 QPS decreased ~20%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~6%. The reduction for point-query tests above is larger than the reduction for random-points here. I assume that point-query suffers more from the new CPU overhead in new MySQL releases (more features == more code in network, parse and optimize). The point query test pays that overhead cost per row fetched. For the random-points test here, 100 rows are fetched per SELECT statement.

This test is run before the write-heavy tests. The next section has results for the test run after write-heavy tests. They get similar QPS which means that b-tree fragmentation isn't an issue here for an in-memory workload.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3665    1.00    3801    1.00    5.0.96
3516    0.96    3707    0.98    5.1.72
3267    0.89    3468    0.91    5.5.51
3040    0.83    3203    0.84    5.6.35
2931    0.80    3121    0.82    5.7.17
2810    0.77    3006    0.79    8.0.1
2817    0.77    3030    0.80    8.0.2
2832    0.77    3023    0.80    8.0.3

random-points

From MySQL 5.0.96 to 8.0.3 QPS decreased ~21%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 7%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3693    1.00    3793    1.00    5.0.96
3498    0.95    3684    0.97    5.1.72
3269    0.89    3434    0.91    5.5.51
3036    0.82    3223    0.85    5.6.35
2947    0.80    3123    0.82    5.7.17
2801    0.76    3009    0.79    8.0.1
2839    0.77    3042    0.80    8.0.2
2833    0.77    2998    0.79    8.0.3

hot-points

From MySQL 5.0.96 to 8.0.3 QPS decreased 32%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~10%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
4918    1.00    5525    1.00    5.0.96
4388    0.89    5036    0.91    5.1.72
3931    0.80    4595    0.83    5.5.51
3666    0.75    4242    0.77    5.6.35
3458    0.70    3898    0.71    5.7.17
3314    0.67    3753    0.68    8.0.1
3335    0.68    3764    0.68    8.0.2
3338    0.68    3764    0.68    8.0.3

insert

From MySQL 5.0.96 to 8.0.3 QPS decreased 9% on the i5 NUC and 15% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 22% on the i5 NUC and 27% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8027    1.00     9681   1.00    5.0.96
7726    0.96     9215   0.95    5.1.72
6932    0.86     8851   0.91    5.5.51
9340    1.16    11244   1.16    5.6.35
7853    0.98     9892   1.02    5.7.17
7413    0.92     9257   0.96    8.0.1
6941    0.86     8567   0.88    8.0.2
6829    0.85     8822   0.91    8.0.3

This Week in Data with Colin Charles 14: A Meetup in Korea and The Magic Quadrant

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

We’re close to opening up the call for papers for Percona Live Santa Clara 2018 and I expect this to happen next week. We also have a committee all lined up and ready to vote on submissions.

In other news, I’ve spent some time preparing for the Korean MySQL Power Group meetup to be held in Seoul this Saturday, 11 November 2017. This is a great opportunity for us to extend our reach in Asia. This meetup gathers together top DBAs from Internet companies that use MySQL and related technologies.

Gartner has released their Magic Quadrant for Operational Database Management Systems 2017. Reprint rights have been given to several vendors, e.g. EnterpriseDB and Microsoft; I’m sure you can find other links. The Magic Quadrant features far fewer database vendors now, as many have been dropped. What’s your take on it?

Releases

This was a slow release week. Check out:

Link List

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

MySQL & MariaDB Database Backup Resources

Most organizations do not realize they have a problem with database backups until they need to restore the data and find it’s not there or not in the form that they were expecting.

The designated administrator managing the database environments must be prepared for situations where any failure may cause an impact to the availability, integrity, or usability of a database or application. Reacting to these failures is a key component of the administrator’s responsibilities and their ability to react correctly depends on whether they have a well-planned strategy for database backups and recovery.

Pixar’s “Toy Story 2” famously almost never happened, due to a command-line mistake that deleted the movie combined with an ineffective backup strategy. The movie went on to take in nearly $500 million worldwide at the box office… money that, had one team member not made their own personal backup, might never have been made.

ClusterControl provides you with sophisticated backup and failover features using a point-and-click interface to easily restore your data if something goes wrong and can be your DBA-sidekick when it comes to building an effective backup strategy. There are many aspects to consider though when building such a strategy.

Here at Severalnines we have database experts who have written much about the topic and in this blog we will collect the top resources to help you build your own database backup strategy for MySQL and MariaDB databases more specifically.

If you are running a MySQL or MariaDB environment our best resource for you is the free whitepaper “The DevOps Guide to Database Backups for MySQL and MariaDB.” The guide covers the two most popular backup utilities available for MySQL and MariaDB, namely mysqldump and Percona XtraBackup. It further covers topics such as how database features like binary logging and replication can be leveraged in backup strategies and provides best practices that can be applied to high availability topologies in order to make database backups reliable, secure and consistent.
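
As a taste of the two utilities the whitepaper covers, a minimal logical backup and a minimal physical backup look roughly like this (paths and credentials are placeholders):

shell> mysqldump --single-transaction --master-data=2 --routines --triggers --all-databases > /backups/full_$(date +%F).sql
shell> xtrabackup --backup --user=backup --password=secret --target-dir=/backups/$(date +%F)

Here --single-transaction gives a consistent snapshot of InnoDB tables without locking them, while --master-data=2 records the binary log position needed for point-in-time recovery or for seeding a replica.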

In addition to the whitepaper there are two webinars focused on backups that you can watch on demand: “MySQL Tutorial - Backup Tips for MySQL, MariaDB & Galera Cluster” and “Become a MySQL DBA - Deciding on a Relevant Backup Solution.” Each of these webinars offers tips and best practices on building a backup plan and summarizes much of the content that is available throughout our website.

Here are our most popular and relevant blogs on the topic...

Overview of Backup and Restores

In the blog “Become a MySQL DBA - Backup and Restore” we provide a high-level overview of backups and restores when managing a MySQL environment. Included in the blog is an overview of different backup methodologies, overview of logical and physical backups, and some best practices and guidelines you can follow.

The Impact of MySQL Storage Engines on Backups

In the blog “The Choice of MySQL Storage Engine and its Impact on Backup Procedures” we discuss how the selection of different types of storage engines (like MyISAM, InnoDB, etc) can have an impact on your backup strategy.

Building a Backup Strategy and Plan

In our blog “mysqldump or Percona XtraBackup? Backup Strategies for MySQL Galera Cluster” we discuss the different options available to you when making your backup and restore plan with special focus on doing it in a way that does not affect performance.

Making Sure You Perform a Good Backup

In our blog “How to Perform Efficient Backups for MySQL and MariaDB” we discuss a number of ways to backup MySQL and MariaDB, each of which comes with pros and cons.

Using ClusterControl for Backups

In the blog “ClusterControl Tips & Tricks - Best Practices for Database Backups” we show how to effectively manage your backup plan using ClusterControl. With ClusterControl you can schedule logical or physical backups with failover handling and easily restore backups to bootstrap nodes or systems.

Additional Blogs

There are several more blogs that have been written over the years that can also aid you in ensuring your backups are performed successfully and efficiently. Here’s a list of them...

Full Restore of a MySQL or MariaDB Galera Cluster from Backup

Performing regular backups of your database cluster is imperative for high availability and disaster recovery. This blog post provides a series of best practices on how to fully restore a MySQL or MariaDB Galera Cluster from backup.

Read the Blog

What’s New in ClusterControl 1.4 - Backup Management

This blog post covers the new backup features available in ClusterControl version 1.4.

Read the Blog

ClusterControl Tips & Tricks: Customizing your Database Backups

ClusterControl follows some best practices to perform backups using mysqldump or Percona xtrabackup. Although these work for the majority of database workloads, you might still want to customize your backups. This blog shows you how.

Read the Blog

Architecting for Failure - Disaster Recovery of MySQL/MariaDB Galera Cluster

Whether you use unbreakable private data centers or public cloud platforms, Disaster Recovery (DR) is indeed a key issue. This is not just about copying your data to a backup site and being able to restore it; it is about business continuity and how fast you can recover services when disaster strikes.

Read the Blog

Using BitTorrent Sync to Transfer Database Backups Offsite

BitTorrent Sync is a simple replication application providing encrypted bidirectional file transfers that can run behind NAT and is specifically designed to handle large files. By leveraging the simplicity of BitTorrent Sync, we can transfer backup files away from our cluster, enhancing backup availability and reducing the cost of a broken backup, since you can regularly verify your backups off-site.

Read the Blog

How to Clone Your Database

If you are managing a production database, chances are high that you’ve had to clone your database to a different server than the production server. The basic method of creating a clone is to restore a database from a recent backup onto a different database server. Other methods include replicating from a source database while it is up, in which case it is important the original database be unaffected by any cloning procedure.

Read the Blog

Not Using MySQL? Here are some resources we have to help with other database technologies…

Become a MongoDB DBA: MongoDB Backups

This is our fifth post in the “Become a MongoDB DBA” blog series - how do you make a good backup strategy for MongoDB, what tools are available and what you should watch out for.

Read the Blog

Become a MongoDB DBA: Recovering your Data

This is our sixth post in the “Become a MongoDB DBA” blog series - how do you recover MongoDB using a backup.

Read the Blog

Become a PostgreSQL DBA - Logical & Physical PostgreSQL Backups

Taking backups is one of the most important tasks of a DBA - it is crucial to the availability and integrity of the data. Part of our Become a PostgreSQL DBA series, this blog post covers some of the backup methods you can use with PostgreSQL.

Read the Blog

Tags:  MySQL MariaDB database backups backup restore disaster recovery

MySQL and Linux Context Switches

In this blog post, I’ll look at MySQL and Linux context switches and what the normal number per second is for a database environment.

You might have heard many times about the importance of looking at the number of context switches as an indicator of whether MySQL is suffering from internal contention issues. I often get asked what a “normal” or “acceptable” number is, and at what point you should worry about the number of context switches per second.

First, let’s talk about what context switches are in Linux. This StackOverflow Thread provides a good discussion, with a lot of details, but basically it works like this:  

The process (or thread, in MySQL’s case) is running its computations. Sooner or later, it has to do some blocking operation: disk IO, network IO, blocking on a mutex or a yield. Execution then switches to another process; this is called a voluntary context switch. On the other hand, the process/thread may be preempted by the scheduler because it has used its allotted amount of CPU time (and other tasks need to run), or because a higher priority task needs to run. These are called involuntary context switches. When all the processes in the system are added together and totaled, you get the system-wide number of context switches reported (using, for example, vmstat):

root@nuc2:~# vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
17  0      0 12935036 326152 2387388    0    0     0     5     0      1  9  0 91  0  0
20  0      0 12933936 326152 2387384    0    0     0     3 32228 124791 77 22  1  0  0
17  0      0 12933348 326152 2387364    0    0     0    11 33212 124575 78 22  1  0  0
16  0      0 12933380 326152 2387364    0    0     0    78 32470 126100 78 22  1  0  0

This is a global number. In many cases, however, it is better to look at it as context switches per CPU logical core. This is because cores execute tasks independently. As such, they have mostly independent causes for context switches. If you have a large number of cores, there can be quite a difference:

The number of context switches per second on this system looks high (at more than 1,000,000). Considering it has 56 logical cores, however, it is only about 30,000 per second per logical core (which is not too bad).
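As a quick sanity check on your own hardware, you can derive the per-core rate by dividing the system-wide "cs" column from vmstat by the number of logical cores. This is a minimal sketch (not from the original post), assuming a Linux box with vmstat and nproc available:

cs=$(vmstat 10 2 | tail -1 | awk '{print $12}')   # context switches/sec, averaged over 10s
cores=$(nproc)                                    # number of logical CPU cores
echo "cs/s total: $cs, per logical core: $((cs / cores))"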

So how do we judge if the number of context switches is too high in your system? One answer is that it is too high if you’re wasting too much CPU on context switches. This brings up the question: how many context switches can the system handle if it is only doing context switches?

It is easy to find this out!  

Sysbench has a “threads” test designed specifically to measure this. For example:

sysbench --thread-locks=128 --time=7200 --threads=1024 threads run

Check the vmstat output or the Context Switches PMM graph:

We can see this system can handle up to 35 million context switches per second in total (or some 500K per logical CPU core on average).

I don’t recommend using more than 10% of CPU resources on context switching, so I would try to keep the number of context switches at no more than 50K per logical CPU core.

Now let’s think about context switches from the other side: how many context switches do we expect to have at the very minimum for given load? Even if all the stars align and your query to MySQL doesn’t need any disk IO or context switches due to waiting for mutexes, you should expect at least two context switches: one to the client thread which processes the query and one for the query response sent to the client.    

Using this logic, if we have 100,000 queries/sec we should expect 200,000 context switches at the very minimum.

In the real world, though, I would not worry about contention being a big issue if you have less than ten context switches per query.
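A rough way to estimate your own context switches per query is to compare the vmstat rate with the change in the Questions status counter over the same interval. The following is a hedged sketch only (it assumes the mysql client can connect without extra options and that the server is not idle):

q1=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'Questions'" | awk '{print $2}')
cs=$(vmstat 10 2 | tail -1 | awk '{print $12}')   # context switches/sec over ~10 seconds
q2=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'Questions'" | awk '{print $2}')
echo "queries/s: $(( (q2 - q1) / 10 )), context switches per query: $(( cs * 10 / (q2 - q1) ))"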

It is worth noting that in MySQL not every contention results in a context switch. InnoDB implements its own mutexes and RW-locks, which often try to “spin” to wait for a resource to become available. This wastes CPU time directly rather than doing a context switch.
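If you want to see how much of that spinning is going on, one place to look (not mentioned in the original post) is the SEMAPHORES section of the InnoDB status output, which reports spin waits, spin rounds and OS waits for InnoDB mutexes and RW-locks:

mysql> SHOW ENGINE INNODB STATUS\G

A high ratio of spin rounds to OS waits means InnoDB is burning CPU spinning rather than sleeping, which will not show up as context switches.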

Summary:

  • Look at the number of context switches per logical core rather than the total for easier-to-compare numbers
  • Find out how many context switches your system can handle per second, and don’t get too concerned if your context switches are no more than 10% of that number
  • Think about the number of context switches per query: the minimum possible is two, and values less than 10 make contention an unlikely issue
  • Not every MySQL contention results in a high number of context switches

InnoDB Cluster: setting up Production… for disaster! (2/2)

Ok, so now we’ve got our InnoDB Cluster a-clustering and MySQL Router a-routing; now we need some disaster to be a-disaster-recovering…

A foreword first.

If you’re looking to use Enterprise Backup to recover a single node and restore that node back into an existing InnoDB Cluster, LeFred takes you through that one nicely here.

Preparing for backup

On our single primary server, the one that allows writes, which was ic2/10.0.0.12 in my case:

mysql -uroot -poracle << EOF
SET sql_log_bin = OFF;
create user 'backup'@'%' identified by 'oracle';
grant all on *.* to 'backup'@'%';
SET sql_log_bin = ON;
EOF

Let’s create something to backup (if you haven’t already done so of course):

mysqlsh --uri root:oracle@localhost:3306 --sql
create database nexus;

Let’s run a full backup to start with:

mysqlbackup -ubackup -poracle --socket=/var/lib/mysql/mysql.sock \
 --backup-dir=/home/vagrant/backup/full \
 --backup-image=full_backup.img backup-to-image

ls /home/vagrant/backup/full/full_backup.img

mysqlbackup --backup-image=/home/vagrant/backup/full/full_backup.img validate

Ok, everything ok.

Enterprise Backup needs to create 2 tables, in CSV format, which is why we commented out the disabled_storage_engines parameter in the my.cnf before. If we hadn’t commented it out, there would be warnings saying that these tables can’t be created, and hence we’d need to use end_lsn in the incremental backup instead of “history:last_backup”. A more manual process.
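For reference, that more manual process would look something like the following hedged sketch: record the end_lsn printed at the end of the previous backup and feed it to the next incremental run via --start-lsn (the LSN value, backup directory and image name below are placeholders, not taken from this walkthrough):

mysqlbackup -ubackup -poracle --socket=/var/lib/mysql/mysql.sock \
 --incremental --start-lsn=2654255716 \
 --backup-dir=/home/vagrant/backup/incre_lsn \
 --backup-image=incr_backup_lsn.mbi backup-to-image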

Let’s create something to use our incremental backup:

create table nexus.replicant (
 id INT(11) NOT NULL AUTO_INCREMENT,
 `First name` varchar(40) not null default '',
 `Last name` varchar(40) not null default '',
 `Replicant` enum('Yes','No') not null default 'Yes',
 PRIMARY KEY (id)
) engine=InnoDB row_format=COMPACT;

Let’s back it up:

mysqlbackup --user=backup -poracle --socket=/var/lib/mysql/mysql.sock \
 --incremental=optimistic \
 --incremental-base=history:last_backup \
 --backup-dir=/home/vagrant/backup/incre \
 --backup-image=incr_backup1.mbi backup-to-image

And validate it too:

mysqlbackup --backup-image=/home/vagrant/backup/incre/incr_backup1.mbi validate

 

Disaster Time

On all 3 nodes, it’s time to create a disaster:

sudo -i
systemctl stop mysqld.service
rm -rf /var/lib/mysql/*
rm -rf /var/log/mysqld.log

 

Now, on ic2, where we took the backup (from the master), restore the full backup:

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/full/full_backup.img \
 --backup-dir=/tmp/mysql_backup_pitr --datadir=/var/lib/mysql \
 --with-timestamp copy-back-and-apply-log

And then, restore the incremental:

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/incre/incr_backup1.mbi \
 --incremental-backup-dir=/tmp/mysql_backup_pitr_incr \
 --datadir=/var/lib/mysql --incremental --with-timestamp \
 copy-back-and-apply-log

 

Still on ic2 (from where the backup was run), and as the o.s. root user:

chown -R mysql:mysql /var/lib/mysql/*
systemctl start mysqld.service
mysql -uroot -poracle << EOF
SET sql_log_bin = OFF;
reset master;
# How To Manually Remove InnoDB Cluster Metadata Directory (Doc ID 2274320.1)
drop database mysql_innodb_cluster_metadata;
SET sql_log_bin = ON;
EOF

The previous step is needed because, when we start the instance, it still has the cluster metadata even though everything else has been lost. So, in order to return to a safe InnoDB Cluster environment, we drop the metadata schema and rebuild it. The slight difference here is that we still have the /etc/my.cnf file.

Now to configure our InnoDB Cluster again:

mysqlsh --uri ic:oracle@10.0.0.12:3306
dba.configureLocalInstance();
dba.checkInstanceConfiguration('ic:oracle@10.0.0.12:3306')
var cluster = dba.createCluster( 'myCluster');
cluster.status();

Now we have 1 instance, we can reuse the same backup (full & incremental) to restore the other 2 nodes. We reuse the same commands as before, on both ic1 & ic3:

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/full/full_backup.img \
 --backup-dir=/tmp/mysql_backup_pitr --datadir=/var/lib/mysql \
 --with-timestamp copy-back-and-apply-log

mysqlbackup --defaults-file=/etc/my.cnf \
 --backup-image=/home/vagrant/backup/incre/incr_backup1.mbi \
 --incremental-backup-dir=/tmp/mysql_backup_pitr_incr \
 --datadir=/var/lib/mysql --incremental --with-timestamp \
 copy-back-and-apply-log

Now to do some specifics.

On ic1:

cd /var/lib/mysql
rm ic2-bin*
chown -R mysql:mysql /var/lib/mysql/*
systemctl start mysqld.service
mysql -uroot -poracle << EOF
SET sql_log_bin = OFF;
reset master;
drop database mysql_innodb_cluster_metadata;
SET sql_log_bin = ON;
EOF
mysqlsh --uri ic:oracle@10.0.0.11:3306
dba.configureLocalInstance();
dba.checkInstanceConfiguration('ic:oracle@10.0.0.11:3306')

And on ic3:

cd /var/lib/mysql
rm ic2-bin*
chown -R mysql:mysql /var/lib/mysql/*
systemctl start mysqld.service
mysql -uroot -poracle << EOF
SET sql_log_bin = OFF;
reset master;
drop database mysql_innodb_cluster_metadata;
SET sql_log_bin = ON;
EOF
mysqlsh --uri ic:oracle@10.0.0.13:3306
dba.configureLocalInstance();
dba.checkInstanceConfiguration('ic:oracle@10.0.0.13:3306')

Now, of course, this seems like a lot of manual work. Alternatively, we could just as easily have taken a fresh full backup from the restored and re-established ic2, and then used “full/meta/backup_gtid_executed.sql” to seed the other instances before adding them to the cluster. The approach shown above, however, lets us reuse the original full and incremental backups. A sketch of the alternative follows.
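A hypothetical sketch only, assuming a new full backup was taken on the rebuilt ic2 and restored onto ic1/ic3 with copy-back-and-apply-log as before; the path below is illustrative and depends on where that backup's meta directory ends up:

mysql -uroot -poracle << EOF
RESET MASTER;
SOURCE /home/vagrant/backup/full/meta/backup_gtid_executed.sql;
EOF

After seeding the GTID state this way, the nodes can be added to the cluster without replaying the incremental restore on each of them.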

Back on ic2:

mysqlsh --uri ic:oracle@10.0.0.12:3306
cluster.addInstance('ic@10.0.0.11:3306')
cluster.addInstance('ic@10.0.0.13:3306')
cluster.status()

We’re back in business.

 


InnoDB Cluster: setting up Production… for disaster! (1/2)

Want to setup InnoDB Cluster and be prepared for a Disaster Recovery scenario? Get ready:

Here’s a way to set up InnoDB Cluster across 3 servers running Oracle Linux 7.2, using MySQL Commercial Server 5.7.19, MySQL Shell 8.0.3 DMR and MySQL Router. As this is the first blog post for a complete disaster recovery scenario of InnoDB Cluster, we’ll also be installing MySQL Enterprise Backup.

If you’re new to InnoDB Cluster then I’d highly recommend looking at the following to understand how it works and what Group Replication, Shell & Router are:

So, to start installing, we’ll need to include a few things first:

  • Python is a must, if it’s not already installed, so:
sudo yum install python
  • And numa awareness is needed for 5.7 MySQL Server:
sudo yum install numactl

As I’m installing the Enterprise software (I will need Enterprise Backup), I’ve previously downloaded the rpms for Oracle Linux 7.2 from http://edelivery.oracle.com and left them in a directory on each of the 3 servers I’m installing on.

Therefore I can run:

sudo yum install -y mysql-commercial-common-5.7.19-1.1.el7.x86_64.rpm \
 mysql-commercial-client-5.7.19-1.1.el7.x86_64.rpm \
 mysql-commercial-libs-5.7.19-1.1.el7.x86_64.rpm \
 mysql-commercial-server-5.7.19-1.1.el7.x86_64.rpm \
 mysql-router-commercial-2.1.4-1.1.el7.x86_64.rpm

And, as I downloaded the following from http://www.mysql.com/downloads:

sudo yum install -y mysql-shell-8.0.3-0.1.dmr.el7.x86_64.rpm \
 meb-4.1.0-el7.x86_64.rpm

Remember, this is done on all 3 servers (in my case ic1, ic2 & ic3 / 10.0.0.11, 10.0.0.12 & 10.0.0.13).

Start the mysql servers. As this is the first start, the data directory is initialized at the same time (i.e. a randomly generated root password is created, which we will need later).

sudo systemctl start mysqld.service

Enable the service to automatically start on server reboot:

sudo systemctl enable mysqld.service

And, in my scenario, as you will see, it’s really not a Production environment, so I’m going to take it easy with my password & security convention. I would NOT recommend this for your production environment.

Insert the following at the end of the config file, to avoid having to create complex passwords:

sudo vi + /etc/my.cnf
..
..
validate-password=OFF

Then restart mysql so the parameter takes effect:

sudo systemctl restart mysqld.service

Change the root password (if you really want to):

sudo grep 'A temporary password is generated for root@localhost' /var/log/mysqld.log | tail -1
mysql -uroot -p
SET sql_log_bin = OFF;
alter user 'root'@'localhost' identified by 'oracle';
SET sql_log_bin = ON;
exit

I don’t want these changes being replicated across the nodes, because I’m actually doing this on all servers, hence I use “SET sql_log_bin = OFF” so the changes aren’t reflected in the bin logs.

Now, change to root and run MySQL Shell to start configuring the InnoDB Cluster environment:

sudo -i
mysqlsh
MySQL JS > dba.configureLocalInstance();
MySQL JS > \q

This is where we get asked a couple of questions:

Detecting the configuration file...
Found configuration file at standard location: /etc/my.cnf
Do you want to modify this file? [Y|n]: [Y|n]:
MySQL user 'root' cannot be verified to have access to other hosts in the network.
1) Create root@% with necessary grants
2) Create account with different name
3) Continue without creating account
4) Cancel
Please select an option [1]: 2
Please provide an account name (e.g: icroot@%) to have it created with the necessary privileges or leave empty and press Enter to cancel.
Account Name: ic
Password for new account: ******
Confirm password: ******
Validating instance...

As you can see, I chose option 2, to create the user ic/oracle for my cluster. You can choose the name you wish here.

It’s quite normal for the output to tell you to restart the mysql server for the changes in the config to take place. We’ll do it after the next step.

Now, MEB needs a couple of CSV-type tables to run the full and incremental backups (mysql.backup_history & mysql.backup_progress). If we want MEB to run automatically without having to script the incremental backup (using end_lsn), want these tables available so we can monitor backups from Enterprise Monitor, and want to reduce the number of warnings produced on each backup run, we’ll have to comment out the “disabled_storage_engines” entry in the config file:

sudo vi + /etc/my.cnf
..
..
#disabled_storage_engines = MyISAM,BLACKHOLE,FEDERATED,CSV,ARCHIVE
..

And restart (remember, we’re still doing this on all 3 nodes):

systemctl restart mysqld.service
mysqlsh
dba.checkInstanceConfiguration('ic:oracle@10.0.0.12:3306');

We should get an “OK” here. Older versions of Shell might tell you otherwise.

Once the mysql server has restarted, run the following. Specific privileges are required for our “ic” user:

mysql -uroot -poracle << EOF
SET sql_log_bin = OFF;
GRANT SELECT ON *.* TO ic@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
SET sql_log_bin = ON;
EOF

So, everything we’ve done up until now has been done on all 3 OL servers.

Now, on just one server, doesn’t matter which, let’s create our InnoDB Cluster:

mysqlsh --uri ic:oracle@10.0.0.12:3306
var cluster = dba.createCluster( 'myCluster');
cluster.status();

Now that we’ve got 1 instance in our cluster, let’s add the other 2 that we configured at the same time as this one.

cluster.addInstance('ic:oracle@10.0.0.11:3306');
cluster.addInstance('ic:oracle@10.0.0.13:3306');

Remember, by default we’re in single primary mode, i.e. there will only be 1 RW server and the other 2, the ones we just added, will be in SUPER READ ONLY mode, i.e. no writes will be allowed. If you want to go to multi master mode, then that’s a different createCluster option.
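For reference, a hedged sketch of that multi-primary variant, for a cluster created from scratch (the option name has changed across Shell versions: multiMaster in this era, multiPrimary in later Shells; check \? dba.createCluster in your version before relying on it):

mysqlsh --uri ic:oracle@10.0.0.12:3306
var cluster = dba.createCluster('myCluster', {multiMaster: true, force: true});
cluster.status();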

Now to check that all 3 are in the same cluster:

cluster.status();

Ok, a slight word of warning (especially if you haven’t paid attention to the other documentation referenced at the beginning): if you come out of the mysqlsh session and want to go back in and get the cluster status, you’ll need to run this first:

var cluster = dba.getCluster("myCluster")

Otherwise shell will just complain saying that it doesn’t understand what “cluster.” is.

And so, after that, we should see something like the following:

{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "10.0.0.12:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "10.0.0.11:3306": {
                "address": "10.0.0.11:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "10.0.0.12:3306": {
                "address": "10.0.0.12:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "10.0.0.13:3306": {
                "address": "10.0.0.13:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

Looking good! Before we jump for joy, we have to make the changes persistent in the my.cnf. This has to be done locally on each machine:

On ic2:

mysqlsh --uri ic:oracle@10.0.0.12:3306
dba.configureLocalInstance('ic:oracle@localhost:3306');

On ic1:
Say Y when it asks you to override SUPER READ ONLY mode, otherwise no changes will occur.

mysqlsh --uri ic:oracle@10.0.0.11:3306
dba.configureLocalInstance('ic:oracle@localhost:3306');

On ic3:
Again, say Y when it asks you to override SUPER READ ONLY mode, otherwise no changes will occur.

mysqlsh --uri ic:oracle@10.0.0.13:3306
dba.configureLocalInstance('ic:oracle@localhost:3306');

And, now, just to make sure, back on the master server, or even from any of the 3 nodes:

mysqlsh --uri ic:oracle@10.0.0.12:3306
var cluster = dba.getCluster( 'myCluster');
cluster.status();

With our InnoDB Cluster up and running, all we need to do now is to let the application connect seamlessly using MySQL Router.

I’m installing on the same servers as the instances but the normal operation would be to install each mysqlrouter process on the app server.

And to make things clearer, I’ll only install 2 mysqlrouters, on ic1 & ic3:

On ic1:

sudo mkdir /opt/mysql/router
chown -R mysql:mysql /opt/mysql/router
mysqlrouter --bootstrap ic:oracle@10.0.0.12:3306 --directory /opt/mysql/router --user=mysql
/opt/mysql/router/start.sh

On ic3:

sudo mkdir /opt/mysql/router
chown -R mysql:mysql /opt/mysql/router
mysqlrouter --bootstrap ic:oracle@10.0.0.12:3306 --directory /opt/mysql/router --user=mysql --force
/opt/mysql/router/start.sh

As we’re on the same servers as the MySQL instance, we could even forget tcp/ip access and use sockets:

mysqlrouter --bootstrap ic:oracle@10.0.0.12:3306 --directory /opt/mysql/router --conf-use-sockets

In fact, maybe installing a 3rd mysqlrouter process away from the app servers isn’t a bad idea: the connect string from the app could list the IPs & sockets of the first 2 mysqlrouters, and if they fail, there’s a fallback mysqlrouter connection.

So, now they’re configured, let’s test them. Try this on ic3 (or ic1):

mysql -uic -poracle -P6446 -h10.0.0.11 -e "select @@hostname" mysql -uic -poracle -P6446 -h10.0.0.11 -e "select @@hostname" mysql -uic -poracle -P6446 -h10.0.0.11 -e "select @@hostname"

Remember, by default port 6446 is for RW traffic and port 6447 is for read-only traffic, so the following connections are read-only:

mysql -uic -poracle -P6447 -h10.0.0.11 -e "select @@hostname" mysql -uic -poracle -P6447 -h10.0.0.11 -e "select @@hostname" mysql -uic -poracle -P6447 -h10.0.0.11 -e "select @@hostname"

And on ic1 or ic3:

mysql -uic -poracle -P6446 -h10.0.0.13 -e "select @@hostname" mysql -uic -poracle -P6446 -h10.0.0.13 -e "select @@hostname" mysql -uic -poracle -P6446 -h10.0.0.13 -e "select @@hostname" mysql -uic -poracle -P6447 -h10.0.0.13 -e "select @@hostname" mysql -uic -poracle -P6447 -h10.0.0.13 -e "select @@hostname" mysql -uic -poracle -P6447 -h10.0.0.13 -e "select @@hostname"

Obviously, you can personalize mysqlrouter. Let’s read from all 3 instances independently of which is primary or secondary in the innodbcluster_metadata_schema cache:

vi /opt/mysql/router/mysqlrouter.conf
..
[metadata_cache:myCluster]
router_id=2
bootstrap_server_addresses=mysql://10.0.0.12:3306,mysql://10.0.0.11:3306,mysql://10.0.0.13:3306
user=mysql_router2_r7wlgz3p4fct
metadata_cluster=myCluster
ttl=300
#ttl=30
..
[routing:myCluster_default_ro]
bind_address=0.0.0.0
bind_port=6447
#destinations=metadata-cache://myCluster/default?role=SECONDARY
destinations=ic1,ic2,ic3
mode=read-only
protocol=classic
..

Thanks for reading. Hopefully this has helped someone…

…. Next up, a complete disaster recovery situation …


Towards Bigger Small Data


In MySQL ecosystems, it has been a very common practice to shard MySQL databases by application keys and to manage many small MySQL instances. The main reason for sharding was that a single server could not handle that much data. 10 years ago, typical commodity servers had only 16GB RAM, and a typical storage configuration was RAID10 with 6~12 SATA HDDs. There was no affordable flash storage available at that time. Because such machines could handle only a few hundred random IOPS, and the buffer pool was so limited, we couldn't put much data on a single server (at most a few hundred GBs per server). Even small-mid sized applications easily exceeded single server capacity. They had to split databases across multiple servers.

Disadvantages of Sharding
Sharding by application has been a common practice to scale MySQL beyond a single machine. But it comes with several disadvantages:
  • You have to write application logic to manage shards, and you need to manage many more MySQL instances
  • Atomic transactions are not supported across multiple shards
  • Globally consistent backups cannot be taken across multiple shards
  • Cross-instance joins are not supported (or are very slow)
  • It is hard to use secondary keys efficiently. Unless secondary keys are part of the sharding key, you need to query all shards, which is very expensive, especially if you need sorting
Atomic transactions, joins and secondary keys are big deals. Not all applications can easily give them up, and their owners might not be willing to spend time implementing sharding logic, either.

Shared Nothing SQL database is not general purpose
There are a couple of database products that offer transparent sharding in the database internals. MySQL Cluster (NDB) is one of the MySQL engines that has existed for many years, offering a shared nothing distributed database. You can run atomic transactions and cross-shard joins across multiple instances (data nodes). MySQL Cluster also supports a NoSQL interface (called NDBAPI) to query data nodes directly, which boosts performance significantly. I'm not going to talk about the NoSQL interface, since I'd like to discuss SQL databases. Note that MySQL Cluster is a product name and it is not InnoDB. If you want to continue to use InnoDB, Vitess can be a good solution; it helps to build a transparent, shared nothing database. Transparent shared nothing databases like MySQL Cluster and Vitess solve some of the issues mentioned above, but there are still issues like the ones below, and users may need to redesign tables and/or rewrite queries as needed.
  • Limited secondary key support. This is the same issue as mentioned above. In MySQL Cluster, by default, rows are distributed by primary key. If you do not explicitly specify the primary key in the WHERE clause, queries need to scan all data nodes, which may significantly limit concurrency and increase latency, depending on how many data nodes you have and other query conditions (e.g. sorting).
  • Queries using JOINs may need lots of network round-trips. Lots of improvements have been made in MySQL Cluster for handling joins, but it's still slower than InnoDB, especially if queries are complex.
As a result, to utilize MySQL Cluster, it is generally recommended not to rely on secondary keys and not to use joins frequently. There are many good use cases, especially if people are 100% sure how their databases will be used, but it is not recommended as a general purpose SQL database.

Bigger Small Data and its challenges
Several people at Facebook called the MySQL service "Small Data". The combined MySQL footprint was pretty large, but each instance was small enough (normally up to 1TB). 10 years ago, people had to run small instances because of limited hardware support. Nowadays, commodity servers have more than 256GB RAM and more than 10TB of flash storage. There are many small-mid sized databases that fit in 10TB. Successful databases grow beyond 10TB, so they will have to shard anyway. But what about experimental projects, and the many other applications that are expected to grow only to a limited size? Instead of spending engineering effort managing shards and rewriting tables/queries, why not just put everything on a single server and take advantage of atomic and consistent transactions, secondary indexes and joins, by running "Bigger Small Data"? There are a couple of public MySQL services supporting bigger storage sizes. Amazon Aurora (AWS) and Alibaba PolarDB (Alibaba Cloud) are both based on MySQL and claim to support more than 60TB instance sizes. It was not surprising to me that they chose bigger small data rather than a shared nothing distributed database, because they have lots of customers who want to do whatever they want; they cannot stop customers from using joins. But being supported does not necessarily mean working as expected. To make MySQL really work with bigger small data, it needs lots of improvements beyond 8.0, especially around concurrency and long running operations, including the following.
  • Parallel query
  • Parallel binary copy
  • Parallel DDL
  • Resumable DDL
  • Better join algorithm
  • Much faster replication
  • Handling many more connections
  • Good resource control, so that some bad users don't eat all resources

At a minimum, these are needed to answer questions like "how can MySQL handle general operations if our instance grows 10x bigger?"

I'm not worried about short queries (row look-ups by primary or secondary keys); these are where MySQL has been great so far. I'm worried about long running queries. The most common long running query is a full table scan of a single table. In general, a logical dump of a 10TB table takes (much) more than 10 times as long as scanning a 1TB table. InnoDB needs to keep a history list for consistent reads. Maintaining the history list and reading consistent snapshots from rollback segments get more expensive if rows are heavily updated while long running transactions are open. If you run daily backups with mysqldump, you might not tolerate a 10x~20x longer backup time, which might not finish within 24 hours. To make logical dumps shorter, parallel query support is needed, and this is not currently supported by MySQL. Physical backups (binary copies) also need parallelism, though that can be implemented relatively easily, since physical backup tools are written outside of MySQL and are easily extended.
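Until server-side parallel query arrives, one common client-side workaround is to dump tables in parallel. The following is only a rough sketch (the database name, output path and degree of parallelism are placeholders), and it deliberately trades consistency for speed:

mysql -N -e "SHOW TABLES FROM bigdb" | \
 xargs -P 8 -I {} sh -c 'mysqldump --single-transaction bigdb {} > /backup/bigdb.{}.sql'
# note: each mysqldump process opens its own snapshot, so the per-table dumps
# are not consistent with each other; a real parallel dump inside the server
# would not have this limitation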

Running ALTER TABLE on a 10TB table is also challenging. Amazon Aurora supports instant DDL: adding a nullable column at the end of a table can be done without rebuilding the table. But there are still many DDL operations that require copying tables. First, you will want to know when the operation will end; MySQL currently does not tell you. If it is expected to take 20 days, you might worry about what will happen if mysqld restarts before the DDL finishes. It would be great if the database remembered the DDL state periodically and could resume the operation when mysqld restarts.

Replication is another technical challenge. MySQL replication is asynchronous, and slaves often lag if master instances are heavily updated. With bigger small data, the update volume on the master can be 10x or more. How can slaves handle 10x more replication traffic? Amazon Aurora does not have a MySQL replication lag issue if you run it in a single region: Aurora keeps six storage copies in the same region, across three availability zones, so you can scale reads within that region. But scaling reads across multiple regions requires MySQL Replication, and that is challenging unless MySQL Replication is made a lot faster. Alibaba PolarDB offers InnoDB physical replication across different datacenters, which is significantly more concurrent and faster than binlog based replication. But you are constrained to one engine (InnoDB, though it's by far the most common engine), and debugging replication issues might be harder, since it's no longer binlog based.

Reliability improvements should not be ignored. 10x larger data means the instance serves many more connections and queries. If a handful of bad queries can make the whole instance unavailable, the impact on bigger small data is much higher than on small instances. Good resource management is needed. High priority queries can be useful too, for making some important low latency queries finish earlier without being affected by expensive queries.

There are lots of technical challenges to making Bigger Small Data really work. I expect Amazon Aurora will be ahead in implementing these important features to make Aurora truly support bigger data. But I'm almost sure that AWS won't release them as open source software. For everybody to get such features, somebody else will have to implement them. I hope Oracle will, but I understand that they need to compete with Amazon Aurora and will not be willing to give their features to AWS for free. A more realistic scenario might be multiple companies, including us, implementing features, releasing them as open source and contributing them to Oracle and/or MariaDB. I think these will be interesting technical projects for MySQL over the next couple of years.

Sysbench: in-memory, MyISAM and a small server, MySQL 5.0 to 8.0

This has results for in-memory sysbench with MyISAM and MySQL versions 5.0, 5.1, 5.5, 5.6, 5.7 and 8.0. I will publish results for InnoDB tomorrow.

tl;dr - from 5.0.96 to 8.0.3
  • For write-heavy tests QPS decreased by more than 50%
  • For scan-heavy tests QPS decreased by no more than 14%
  • For point-query tests QPS decreased by 30% on the i5 NUC and 40% on the i3 NUC
  • For inlist-query tests QPS decreased by 25% on the i5 NUC and 21% on the i3 NUC. There is a large reduction in QPS between MySQL 5.5.51 and 5.6.35. Maybe the optimizer overhead grew for queries with large in-lists.

tl;dr - from 5.6.35 to 8.0.3
  • For write-heavy tests QPS decreased by ~30%
  • For scan-heavy tests QPS did not decrease
  • For point-query tests QPS decreased by ~15%
  • For inlist-query tests QPS increased by 9% on the i5 NUC and decreased by 5% on the i3 NUC

Configuration

The tests used MyISAM from upstream MySQL versions 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.3. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines, including disabling SSL for MySQL 5.7 and using the same charset and collation. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream.

Sysbench is run with 2 tables and 2M rows per table. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database fits in RAM as the i3 NUC has 8gb of RAM and the i5 NUC has 16gb.
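The benchmarks are driven by the author's own helper scripts (linked above), so purely as a hedged illustration, a stock sysbench 1.0 invocation matching the table count and row count described here might look like this (the sbtest database, credentials and the bundled oltp_read_write script are assumptions, not the author's setup):

mysql -uroot -e "CREATE DATABASE IF NOT EXISTS sbtest"
sysbench oltp_read_write --mysql-user=root --mysql-db=sbtest \
 --tables=2 --table-size=2000000 prepare
sysbench oltp_read_write --mysql-user=root --mysql-db=sbtest \
 --tables=2 --table-size=2000000 --threads=1 --time=600 run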

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is much smaller. Assuming I will always disable turbo boost in the future I might as well stick with an i3 NUC for my next purchase.

Why MyISAM

Results with MyISAM are a great complement to results with InnoDB. InnoDB has changed significantly across releases. I doubt that MyISAM has. Results with MyISAM make it easier to isolate the impact of the code that runs above MyISAM, including parse, optimize and network handling.

I also like MyISAM even though I never depended on it in production.

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. For all tests but scan the result has the QPS for the test with 1 client relative to the QPS for MyISAM in MySQL 5.0.96. The tests are explained here.

Graphs

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. Fortunately the results within each group are similar and one graph per group is sufficient. The tests are explained here.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graph is for update-nonindex using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 the QPS decrease is more than 50%. From MySQL 5.6.35 to 8.0.3 the decrease is ~30%. I assume the QPS decrease is from the CPU overhead of code above MyISAM.

The scan-heavy group includes read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graph is for read-write with range-size=100 using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. The regression for this test is worse than for the other scan-heavy tests. From MySQL 5.0.96 to 8.0.3 the QPS decrease is 10% on the i5 NUC and 14% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 there is no decrease.

The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 the QPS decrease is 30% on the i5 NUC and 40% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 the decrease is ~15%.

The inlist-query group includes the random-points test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 the QPS decrease is 25% on the i5 NUC and 21% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 the increase is 9% on the i5 NUC and the decrease is 5% on the i3 NUC.
scan

I don't have much to write. The scan takes 0 or 1 seconds when the time is rounded to a whole number and it is hard to use that for comparisons.

update-inlist

This section and the ones that follow have the QPS and QPS ratio for each MySQL release on the i3 and i5 NUC. The QPS ratio is the QPS for the release relative to the QPS for MySQL 5.0.96 using the test with 1 client.

From MySQL 5.0.96 to 8.0.3 QPS decreased by more than 50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~38%. All of the update-only tests show a similar regression. QPS on the i5 NUC is not much better than on the i3 NUC probably because I disabled turbo boost.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
9099    1.00    9191    1.00    5.0.96
8361    0.92    8483    0.92    5.1.72
7711    0.85    7892    0.86    5.5.51
6590    0.72    7008    0.76    5.6.35
5171    0.57    5663    0.62    5.7.17
5143    0.57    5468    0.59    8.0.1
4122    0.45    4477    0.49    8.0.2
4134    0.45    4352    0.47    8.0.3

update-one

From MySQL 5.0.96 to 8.0.3 QPS decreased by more than 50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~25%. All of the update-only tests show a similar regression.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
21569   1.00    21117   1.00    5.0.96
18491   0.86    18254   0.86    5.1.72
15775   0.73    15884   0.75    5.5.51
12947   0.60    13329   0.63    5.6.35
11019   0.51    11734   0.56    5.7.17
10098   0.47    10773   0.51    8.0.1
 9197   0.43    10093   0.48    8.0.2
 8983   0.42     9920   0.47    8.0.3

update-index

From MySQL 5.0.96 to 8.0.3 QPS decreased by ~50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~25%. All of the update-only tests show a similar regression from 5.0.96 to 8.0.3.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
18757   1.00    17301   1.00    5.0.96
15413   0.82    15654   0.90    5.1.72
13545   0.72    13793   0.80    5.5.51
11288   0.60    11519   0.67    5.6.35
 9504   0.51    10066   0.58    5.7.17
 8798   0.47     9407   0.54    8.0.1
 8038   0.43     8851   0.51    8.0.2
 8039   0.43     8744   0.51    8.0.3

update-nonindex

From MySQL 5.0.96 to 8.0.3 QPS decreased by more than 50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~27%. All of the update-only tests show a similar regression from 5.0.96 to 8.0.3.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
21501   1.00    20200   1.00    5.0.96
17892   0.83    17926   0.89    5.1.72
15533   0.72    16066   0.80    5.5.51
12502   0.58    13353   0.66    5.6.35
10709   0.50    11458   0.57    5.7.17
 9811   0.46    10621   0.53    8.0.1
 8819   0.41     9667   0.48    8.0.2
 8852   0.41     9704   0.48    8.0.3

delete

From MySQL 5.0.96 to 8.0.3 QPS decreased by ~50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~25%. The regression for delete is similar to the update-only tests above.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
19216   1.00    18615   1.00    5.0.96
16669   0.87    16932   0.91    5.1.72
14918   0.78    15045   0.81    5.5.51
12444   0.65    13008   0.70    5.6.35
11184   0.58    11624   0.62    5.7.17
10175   0.53    10779   0.58    8.0.1
 9387   0.49    10080   0.54    8.0.2
 9138   0.48     9989   0.54    8.0.3

read-write with --range-size=100

Of all the scan heavy tests that follow this one has the worst regression. But 10% on the i5 NUC and 14% on the i3 NUC over so many releases is not a big deal. The scans here are smaller than for the next test, so the regression from writes is more significant here.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 9359   1.00    9941    1.00    5.0.96
 8711   0.93    9595    0.97    5.1.72
 8860   0.95    9546    0.96    5.5.51
 7945   0.85    8842    0.89    5.6.35
 8304   0.89    9226    0.93    5.7.17
 8100   0.87    8848    0.89    8.0.1
 8089   0.86    8968    0.90    8.0.2
 8059   0.86    8919    0.90    8.0.3

read-write with --range-size=10000

The regression from 5.0.96 to 8.0.3 is small to non-existent. See the comment for the previous result.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
227     1.00    240     1.00    5.0.96
225     0.99    239     1.00    5.1.72
209     0.92    236     0.98    5.5.51
193     0.85    217     0.90    5.6.35
254     1.12    283     1.18    5.7.17
216     0.95    242     1.01    8.0.1
215     0.95    238     0.99    8.0.2
214     0.94    238     0.99    8.0.3

read-only with --range-size=100

The regression here is small. Compared to the result for read-write above with range-size=100, this test doesn't suffer the CPU regression from writes.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8510    1.00    9549    1.00    5.0.96
8202    0.96    8939    0.94    5.1.72
8268    0.97    8937    0.94    5.5.51
7613    0.89    8324    0.87    5.6.35
8370    0.98    9121    0.96    5.7.17
8287    0.97    9307    0.97    8.0.1
8458    0.99    9374    0.98    8.0.2
8367    0.98    9129    0.96    8.0.3

read-only.pre with --range-size=10000

No regression here as QPS is flat to increasing from 5.0 to 8.0. This test is run before and after the write-heavy tests to determine the impact from fragmentation. If you compare the QPS from this test and the one that follows there is an impact as the QPS here is slightly larger.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
178     1.00    214     1.00    5.0.96
179     1.01    214     1.00    5.1.72
168     0.94    197     0.92    5.5.51
160     0.90    190     0.89    5.6.35
216     1.21    253     1.18    5.7.17
199     1.12    224     1.05    8.0.1
187     1.05    212     0.99    8.0.2
188     1.06    215     1.00    8.0.3

read-only with --range-size=10000

The regression here from 5.0 to 8.0 is small.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
175     1.00    188     1.00    5.0.96
173     0.99    187     0.99    5.1.72
162     0.93    186     0.99    5.5.51
150     0.86    170     0.90    5.6.35
199     1.14    223     1.19    5.7.17
169     0.97    189     1.01    8.0.1
168     0.96    186     0.99    8.0.2
168     0.96    186     0.99    8.0.3

point-query.pre

This test is run before the write-heavy tests. The results in the next section are from the same workload run after the write-heavy tests. The QPS difference between the two is small. From MySQL 5.0.96 to 8.0.3 the QPS decreased by 29% on the i5 NUC and 41% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 the QPS decreased by ~15%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
27525   1.00    44193   1.00    5.0.96
23924   0.87    43199   0.98    5.1.72
20545   0.75    40376   0.91    5.5.51
19287   0.70    37051   0.84    5.6.35
17562   0.64    34449   0.78    5.7.17
17206   0.63    32156   0.73    8.0.1
16713   0.61    31871   0.72    8.0.2
16241   0.59    31504   0.71    8.0.3

point-query

From MySQL 5.0.96 to 8.0.3 the QPS decreased by 30% for the i5 NUC and 40% for the i3 NUC. From MySQL 5.6.35 to 8.0.3 the QPS decreased by ~14%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
27032   1.00    44978   1.00    5.0.96
22935   0.85    43018   0.96    5.1.72
20339   0.75    39985   0.89    5.5.51
18974   0.70    36666   0.82    5.6.35
17837   0.66    34467   0.77    5.7.17
16810   0.62    32055   0.71    8.0.1
16327   0.60    31946   0.71    8.0.2
16352   0.60    31389   0.70    8.0.3

random-points.pre

This test is run before the write-heavy tests. The results in the next section are from the same workload run after the write-heavy tests. There is little difference between the two. From MySQL 5.0.96 to 8.0.3 the QPS decreased by 33% for the i5 NUC and 21% for the i3 NUC. From MySQL 5.6.35 to 8.0.3 the QPS decreased by ~5%. The biggest decrease in QPS comes between 5.5.51 and 5.6.35. The regression here is smaller than for the point-query tests above. The point-query test does a lot of work (network, parse, optimize) to fetch 1 row. That overhead is amortized over more rows here, as this test uses an in-list to fetch 100 rows. But the regression here is larger than I expected, perhaps because of the regression from 5.5 to 5.6.


i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1971    1.00    3321    1.00    5.0.96
1951    0.99    3478    1.05    5.1.72
1941    0.98    3285    0.99    5.5.51
1682    0.85    2311    0.70    5.6.35
1625    0.82    2149    0.65    5.7.17
1591    0.81    2206    0.66    8.0.1
1584    0.80    2206    0.66    8.0.2
1559    0.79    2219    0.67    8.0.3

random-points

See the comment above. From MySQL 5.0.96 to 8.0.3 the QPS decreased by 25% for the i5 NUC and 21% for the i3 NUC. From MySQL 5.6.35 to 8.0.3 the QPS increased by 8% on the i5 NUC and decreased by 5% on the i3 NUC. The biggest decrease in QPS comes between 5.5.51 and 5.6.35.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1917    1.00    3151    1.00    5.0.96
1895    0.99    3146    1.00    5.1.72
1844    0.96    3109    0.99    5.5.51
1584    0.83    2168    0.69    5.6.35
1589    0.83    2109    0.67    5.7.17
1559    0.81    2064    0.66    8.0.1
1530    0.80    2088    0.66    8.0.2
1512    0.79    2351    0.75    8.0.3

hot-points

From MySQL 5.0.96 to 8.0.3 the QPS decreased by ~25%. From MySQL 5.6.35 to 8.0.3 the QPS decreased by ~10%. The biggest decrease in QPS comes between 5.5.51 and 5.6.35 which is similar to the results above for random-points. Both use large in-lists.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   release
2689    1.00    2910    1.00    5.0.96
2570    0.96    2831    0.97    5.1.72
2559    0.95    2850    0.98    5.5.51
2192    0.82    2407    0.83    5.6.35
2113    0.79    2301    0.79    5.7.17
2057    0.76    2265    0.78    8.0.1
2015    0.75    2228    0.77    8.0.2
1937    0.72    2184    0.75    8.0.3

insert

From MySQL 5.0.96 to 8.0.3 the QPS decreased by ~50%. From MySQL 5.6.35 to 8.0.3 the QPS decreased by ~27%. This is similar to the other write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
17603   1.00    16862   1.00    5.0.96
18724   1.06    19231   1.14    5.1.72
13510   0.77    14043   0.83    5.5.51
12056   0.68    12830   0.76    5.6.35
10471   0.59    10963   0.65    5.7.17
 9719   0.55    10147   0.60    8.0.1
 8906   0.51     9473   0.56    8.0.2
 8718   0.50     9316   0.55    8.0.3

Advice on advice

Advice is free but sometimes you get what you pay for. In technology there is an abundance of great ideas and much of that free advice might be excellent. Thanks to my time with web-scale MySQL I have a lot of experience in receiving advice so I will offer advice on giving it. I realize that my advice on advice applies to my advice. I don't write many posts like this, my last might have been this one.

A big problem is that time is limited. We don't have enough time to evaluate all of the good ideas. Just like a modern query optimizer we can't do exhaustive search. Our innovation budget is small so we then have to find the best idea worth doing that can be implemented and deployed given the small budget. Another problem is uncertainty. The system we have is likely good enough today. It was probably good enough yesterday. Any benchmark is unlikely to capture the ability to adapt to changes over time in workload and hardware. That ability is valuable.

Finally, my advice. The list below is ordered from least to most effective ways to offer advice:
  • You should do X (and can you tell me more about your system?)
  • ... and I have taken the time to understand your system
  • ... and I will get the resources to evaluate it
  • ... and I will get the resources to implement it
  • ... and I will join the oncall for it when this is deployed
  • ... and my reputation is on the line when this goes horribly wrong
  • ... and I will stick around for a few years to fix all of the bugs that show up


HAProxy: All the Severalnines Resources

Load balancers are an essential component in MySQL and MariaDB database high availability; especially when making topology changes transparent to applications and implementing read-write split functionality.

HAProxy is free, open source software that provides a high availability load balancer and proxy server for TCP and HTTP-based applications that spreads requests across multiple servers.

ClusterControl provides support for deployment, configuration and optimization of HAProxy as well as for other popular load balancing and caching technologies for MySQL and MariaDB databases.

Here are our top resources for HAProxy to get you started with this widely used technology.

Tutorials

MySQL Load Balancing with HAProxy - Tutorial

We have recently updated our tutorial on MySQL Load Balancing with HAProxy. Read about deployment and configuration, monitoring, ongoing maintenance, health check methods, read-write splitting, redundancy with VIP and Keepalived and more.

Read More

On-Demand Webinars

How to deploy and manage HAProxy, MaxScale or ProxySQL with ClusterControl

In this webinar we talk about support for proxies for MySQL HA setups in ClusterControl: how they differ and what their pros and cons are. And we show you how you can easily deploy and manage HAProxy, MaxScale and ProxySQL from ClusterControl during a live demo.

Watch the replay

How To Set Up SQL Load Balancing with HAProxy

In this webinar, we cover the concepts around the popular open-source HAProxy load balancer, and show you how to use it with your SQL-based database clusters.

Watch the replay

Performance Tuning of HAProxy for Database Load Balancing

This webinar discusses the performance tuning basics for HAProxy and explains how to take advantage of some of the new features in 1.5, which was released in June 2014 after 4 years of development work.

Watch the replay

Introducing the Severalnines MySQL© Replication Blueprint

The Severalnines Blueprint for MySQL Replication covers all aspects of a MySQL Replication topology: the ins and outs of deployment, setting up replication, monitoring, upgrades, performing backups and managing high availability using proxies such as ProxySQL, MaxScale and HAProxy. This webinar provides an in-depth walk-through of this blueprint and explains how to make best use of it.

Watch the replay

Top Blogs

HAProxy Connections vs MySQL Connections - What You Should Know

Max connections determines the maximum number of connections to the database server. This can be set on both the database server and the proxy in front of it. In this blog post, we’ll dive into HAProxy and MySQL maximum connections variables, and see how to get the best of both worlds.

Read More

SQL Load Balancing Benchmark - Comparing Performance of MaxScale vs HAProxy

In a previous post, we gave you a quick overview of the MaxScale load balancer and walked through installation and configuration. We did some quick benchmarks using sysbench, a system performance benchmark that supports testing CPU, memory, IO, mutex and also MySQL performance. We will be sharing the results in this blog post.

Read More

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

Deploying a MySQL Galera Cluster with redundant load balancing takes a bit of time. This blog looks at how long it would take to do it manually vs using ClusterControl to perform the task.

Read More

Read-Write Splitting for Java Apps using Connector/J, MySQL Replication and HAProxy

In this blog post, we will play around with Java and MySQL Replication to perform read-write splitting for Java Apps using Connector/J.

Read More

High availability read-write splitting with php-mysqlnd, MySQL Replication and HAProxy

In this blog post, we explore the use of php-mysqlnd_ms with a PHP application (Wordpress) on a standard MySQL Replication backend.

Read More

Become a ClusterControl DBA: Making your DB components HA via Load Balancers

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

Read More

Wordpress Application Clustering using Kubernetes with HAProxy and Keepalived

In this blog post, we’re going to play with Kubernetes application clustering and pods. We’ll use Wordpress as the application, with a single MySQL server. We will also have HAProxy and Keepalived to provide simple packet forwarding (for external network) with high availability capability.

Read More

How Galera Cluster Enables High Availability for High Traffic Websites

This post gives an insight into how Galera can help to build HA websites.

Read More

Tags:  haproxy MySQL load balancer clustercontrol

MySQL InnoDB Cluster: how to handle a performance issue on one member?

 

Sometimes when you are using a MySQL InnoDB Cluster, you might encounter a performance issue because one node becomes dramatically slow.

Why?

First of all, why? A node can apply transactions more slowly than the other nodes for many different reasons. The most frequent are, for example, slower disks (remember, it’s advised to have nodes with the same specifications), but if you are using a RAID controller with a BBU, during its learning cycle write performance can decrease by 10x or even more. Another example could be an increase in IO operations that floods the full IO capacity of the system. Making a local backup or sharing the server’s resources with some other components could lead to such behavior.

Flow Control

To avoid having a node lag too much behind, and to sustain the same throughput across the whole cluster, Group Replication uses a flow control mechanism (see this post to understand how it works). In summary, when a node’s apply queue grows and reaches a threshold, the other members slow down the rate at which they commit transactions to give the lagging member time to recover.
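For reference, the queue sizes that trigger flow control are exposed as Group Replication system variables on each member. A quick, generic way (not from the original post) to see the values in effect:

mysql> SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%';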

And me?

But in such a situation, is there something I can do?

In fact, yes! There are two types of cases leading to such a situation:

  1. an unforeseen incident
  2. a scheduled operation

For the first case, that’s why you must monitor your MySQL InnoDB Cluster. In the performance_schema.replication_group_member_stats table, you have the number of transactions in the apply queue (COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE):

select * from replication_group_member_stats\G
*************************** 1. row ***************************
 CHANNEL_NAME: group_replication_applier
 VIEW_ID: 15100558540053980:3
 MEMBER_ID: a2f6f75f-c3b2-11e7-9324-08002718d305
 COUNT_TRANSACTIONS_IN_QUEUE: 0
 COUNT_TRANSACTIONS_CHECKED: 0
 COUNT_CONFLICTS_DETECTED: 0
 COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
 TRANSACTIONS_COMMITTED_ALL_MEMBERS: a38ed37b-c3b2-11e7-94e6-08002718d305:1-14, d59a083e-c3b2-11e7-b358-08002718d305:1-18
 LAST_CONFLICT_FREE_TRANSACTION: d59a083e-c3b2-11e7-b358-08002718d305:18
 COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
 COUNT_TRANSACTIONS_REMOTE_APPLIED: 0
 COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
 COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 2. row ***************************
 CHANNEL_NAME: group_replication_applier
 VIEW_ID: 15100558540053980:3
 MEMBER_ID: a301c6af-c3b2-11e7-98c8-08002718d305
 COUNT_TRANSACTIONS_IN_QUEUE: 0
 COUNT_TRANSACTIONS_CHECKED: 7
 COUNT_CONFLICTS_DETECTED: 0
 COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
 TRANSACTIONS_COMMITTED_ALL_MEMBERS: a38ed37b-c3b2-11e7-94e6-08002718d305:1-14, d59a083e-c3b2-11e7-b358-08002718d305:1-18
 LAST_CONFLICT_FREE_TRANSACTION: d59a083e-c3b2-11e7-b358-08002718d305:18
 COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
 COUNT_TRANSACTIONS_REMOTE_APPLIED: 8
 COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
 COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 3. row ***************************
 CHANNEL_NAME: group_replication_applier
 VIEW_ID: 15100558540053980:3
 MEMBER_ID: a38ed37b-c3b2-11e7-94e6-08002718d305
 COUNT_TRANSACTIONS_IN_QUEUE: 0
 COUNT_TRANSACTIONS_CHECKED: 15
 COUNT_CONFLICTS_DETECTED: 0
 COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
 TRANSACTIONS_COMMITTED_ALL_MEMBERS: a38ed37b-c3b2-11e7-94e6-08002718d305:1-14, d59a083e-c3b2-11e7-b358-08002718d305:1-18
 LAST_CONFLICT_FREE_TRANSACTION: d59a083e-c3b2-11e7-b358-08002718d305:18
 COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
 COUNT_TRANSACTIONS_REMOTE_APPLIED: 2
 COUNT_TRANSACTIONS_LOCAL_PROPOSED: 15
 COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
3 rows in set (0.00 sec)

This is an example of a query you can then use:

mysql> SELECT MEMBER_HOST, COUNT_TRANSACTIONS_IN_QUEUE TRX_LOCAL_Q,
       COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE TRX_APPLY_Q
       FROM performance_schema.replication_group_member_stats t1
       JOIN performance_schema.replication_group_members t2
       ON t2.MEMBER_ID=t1.MEMBER_ID;
+-------------+-------------+-------------+
| MEMBER_HOST | TRX_LOCAL_Q | TRX_APPLY_Q |
+-------------+-------------+-------------+
| mysql1      |           0 |           0 |
| mysql3      |           0 |        8415 |
| mysql2      |           0 |           0 |
+-------------+-------------+-------------+

You can also use a SYS schema view I created (get it here):

select * from sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| YES              | NO        |                   0 |                    0 |
+------------------+-----------+---------------------+----------------------+

Safe Procedure

So when this happens (expected or not), how should our cluster behave with that member?

When the incident lasts too long and flow control starts to kick in, the whole cluster starts slowing down… If the incident will take a long time to resolve, or if this is a planned maintenance, the first thing we want to do is stop sending MySQL traffic to that node (reads, and/or writes in the case of a Multi-Primary cluster). Then we want that specific node to stop sending its statistics to the other nodes.

If you are using ProxySQL as the router for your MySQL InnoDB Cluster, ProxySQL will automatically stop sending traffic to the node as soon as the transactions_behind threshold is reached (see this post). If you are using MySQL Router, currently the best way to handle this is to add a firewall rule blocking communication between the router and the MySQL instance running on that member.
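For example, on the lagging member a single rule of this kind could be used (the router address and the port below are placeholders, adjust them to your setup; group communication, which uses its own port, is left untouched):

# drop incoming connections from the MySQL Router host to the local mysqld
iptables -A INPUT -p tcp -s 192.168.1.10 --dport 3306 -j DROP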

Once we have stopped sending traffic to the member, we can tell it to stop sending its flow-control statistics (so that they are ignored):

mysql> set global group_replication_flow_control_mode='DISABLED';

Now that this node won’t trigger flow control anymore, the cluster runs at its optimal speed, which gives you extra time to fix the problem or finish the maintenance. The node is still part of the cluster! It’s just lagging, and nobody cares.

That’s also why, in a multi-primary setup, it’s recommended not to write to that node: conflicts may happen very often, and since it’s lagging, reading stale data might not be acceptable either.

As soon as the maintenance is finished and the queue has recovered, you can set the mode back to ‘QUOTA’. This works as expected in 8.0.3; on earlier releases the statistics were not completely ignored and flow control kicked in anyway.
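For example, once the apply queue has drained:

mysql> set global group_replication_flow_control_mode='QUOTA';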

Let’s see this in action in the following video cast:

On the future of Data Warehousing. Interview with Jacque Istok and Mike Waas

“Open source software comes with a promise, and that promise is not about looking at the code, rather it’s about avoiding vendor lock-in.” –Jacque Istok.

“The cloud has out-paced the data center by far and we should expect to see the entire database market being replatformed into the cloud within the next 5-10 years.” –Mike Waas.

I have interviewed Jacque Istok, Head of Data Technical Field for Pivotal, and Mike Waas, founder and CEO of Datometry.
Main topics of the interview are: the future of Data Warehousing, how are open source and the Cloud affecting the Data Warehouse market, and Datometry Hyper-Q and Pivotal Greenplum.

RVZ

Q1. What is the future of Data Warehouses?

Jacque Istok: I believe that what we’re seeing in the market is a slight course correct with regards to the traditional data warehouse. For 25 years many of us spent many cycles building the traditional data warehouse.
The single source of the truth. But the long duration it took to get alignment from each of the business units regarding how the data related to each other combined with the cost of the hardware and software of the platforms we built it upon left everybody looking for something new. Enter Hadoop and suddenly the world found out that we could split up data on commodity servers and, with the right human talent, could move the ball forward faster and cheaper. Unfortunately the right human talent has proved hard to come by and the plethora of projects that have spawned up are neither production ready nor completely compliant or compatible with the expensive tools they were trying to replace.
So what looks to be happening is the world is looking for the features of yesterday combined with the cost and flexibility of today. In many cases that will be a hybrid solution of many different projects/platforms/applications, or at the very least, something that can interface easily and efficiently with many different projects/platforms/applications.

Mike Waas: Indeed, flexibility is what most enterprises are looking for nowadays when it comes to data warehousing. The business needs to be able to tap data quickly and effectively. However, in today’s world we see an enormous access problem with application stacks that are tightly bonded with the underlying database infrastructure. Instead of maintaining large and carefully curated data silos, data warehousing in the next decade will be all about using analytical applications from a quickly evolving application ecosystem with any and all data sources in the enterprise: in short, any application on any database. I believe data warehouses remain the most valuable of databases, therefore, cracking the access problem there will be hugely important from an economic point of view.

Q2. How is open source affecting the Data Warehouse market?

Jacque Istok: The traditional data warehouse market is having its lunch eaten by open source. Whether it’s one of the Hadoop distributions, one of the up and coming new NoSQL engines, or companies like Pivotal making large bets and open source production proven alternatives like Greenplum. What I ask prospective customers is if they were starting a new organization today, what platforms, databases, or languages would you choose that weren’t open source? The answer is almost always none. Open source software comes with a promise, and that promise is not about looking at the code, rather it’s about avoiding vendor lock-in.

Mike Waas: Whenever a technology stack gets disrupted by open source, it’s usually a sign that the technology has reached a certain maturity and customers have begun doubting the advantage of proprietary solutions. For the longest time, analytical processing was considered too advanced and too far-reaching in scope for an open source project. Greenplum Database is a great example for breaking through this ceiling: it’s the first open source database system with a query optimizer not only worth that title but setting a new standard, and a whole array of other goodies previously only available in proprietary systems.

Q3. Are databases an obstacle to adopting Cloud-Native Technology?

Jacque Istok: I believe quite the contrary, databases are a requirement for Cloud-Native Technology. Any applications that are created need to leverage data in some way. I think where the technology is going is to make it easier for developers to leverage whichever database or datastore makes the most sense for them or they have the most experience with – essentially leveraging the right tool for the right job, instead of the tool “blessed” by IT or Operations for general use. And they are doing this by automating the day 0, day 1, and day 2 operations of those databases, making it easy for anyone to instantiate and use these platforms, which has never really been the case before.

Mike Waas: In fact, a cloud-first strategy is incomplete unless it includes the data assets, i.e., the databases. Now, databases have always been one of the hardest things to move or replatform, and, naturally, it’s the ultimate challenge when moving to the cloud: firing up any new instance in the cloud is as easy as 1-2-3, but what do you do with the tens of years of investment in application development? I would say it’s actually not the database that’s the obstacle but the applications and their dependencies.

Q4. What are the pros and cons of moving enterprise data to the cloud?

Jacque Istok: I think there are plenty of pros to moving enterprise data to the cloud, the extent of that list will really depend on the enterprise you’re talking to and the vertical that they are in. But cons? The only cons would be using these incredible tools incorrectly, at which point you might find yourself spending more money and feeling that things are slower or less flexible. Treating the cloud as a virtual data center, and simply moving things there without changing how they are architected or how they are used would be akin to taking

Mike Waas: I second that. A few years ago enterprises were still concerned about security, completeness of offering, and maturity of the stack. But now, the cloud has out-paced the data center by far and we should expect to see the entire database market being replatformed into the cloud within the next 5-10 years. This is going to be the biggest revolution in the database industry since the relational model with great opportunities for vendors and customers alike.

Q5. How do you quantify when is appropriate for an enterprise to move their data management to a new platform?

Jacque Istok: It’s pretty easy from my perspective, when any enterprise is done spending exorbitant amounts of money it might be time to move to a new platform. When you are coming up on a renewal or an upgrade of a legacy and/or expensive system it might be time to move to a new platform. When you have new initiatives to start it might be time to move to a new platform. When you are ready to compete with your competitors, both known and unknown (aka startups), it might be time to move to a new platform. The move doesn’t have to be scary either, as some products are designed to be a bridge to a modern data platform.

Mike Waas: Traditionally, enterprises have held off from replatforming for too long: the switching cost has deterred them from adopting new and highly superior technology with the result that they have been unable to cut costs or gain true competitive advantage. Staying on an old platform is simply bad for business. Every organization needs to ask themselves constantly the question whether their business can benefit from adopting new technology. At Datometry, we make it easy for enterprises to move their analytics — so easy, in fact, the standard reaction to our technology is, “this is too good to be true.”

Q6. What is the biggest problem when enterprises want to move part or all of their data management to the cloud?

Jacque Istok: I think the biggest problem tends to be not architecting for the cloud itself, but instead treating the cloud like their virtual data center. Leveraging the same techniques, the same processes, and the same architectures will not lead to the cost or scalability efficiencies that you were hoping for.

Mike Waas: As Jacque points out, you really need to change your approach. However, the temptation is to use the move to the cloud as a trigger event to rework everything else at the same time. This quickly leads to projects that spiral out of control, run long, go over budget, or fail altogether. Being able to replatform quickly and separate the housekeeping from the actual move is, therefore, critical.
However, when it comes to databases, trouble runs deeper as applications and their dependencies on specific databases are the biggest obstacle. SQL code is embedded in thousands of applications and, probably most surprising, even third-party products that promise portability between databases get naturally contaminated with system-specific configuration and SQL extensions. We see roughly 90% of third-party systems (ETL, BI tools, and so forth) having been so customized to the underlying database that moving them to a different system requires substantial effort, time, and money.

Q7. How does an enterprise move the data management to a new platform without having to re-write all of the applications that rely on the database?

Mike Waas: At Datometry, we looked very carefully at this problem and, with what I said above, identified the need to rewrite applications each time new technology is adopted as the number one problem in the modern enterprise. Using Adaptive Data Virtualization (ADV) technology, this will quickly become a problem of the past! Systems like Datometry Hyper-Q let existing applications run natively and instantly on a new database without requiring any changes to the application. What would otherwise be a multi-year migration project and run into the millions, is now reduced in time, cost, and risk to a fraction of the conventional approach. “VMware for databases” is a great mental model that has worked really well for our customers.

Q8. What is Adaptive Data Virtualization technology, and how can it help adopting Cloud-Native Technology?

Mike Waas: Adaptive Data Virtualization is the simple, yet incredibly powerful, abstraction of a database: by intercepting the communication between application and database, ADV is able to translate in real-time and dynamically between the existing application and the new database. With ADV, we are drawing on decades of database research and solving what is essentially a compatibility problem between programming languages and systems with an elegant and highly effective approach. This is a space that has traditionally been served by consultants and manual migrations, which are incredibly labor-intensive and expensive undertakings.
Through ADV, adopting cloud technology becomes orders of magnitude simpler as it takes away the compatibility challenges that hamper any replatforming initiative.

Q9. Can you quantify what are the reduced time, cost, and risk when virtualizing the data warehouse?

Jacque Istok: In the past, virtualizing the data warehouse meant sacrificing performance in order to get some of the common benefits of virtualization (reduced time for experimentation, maximizing resources, relative ease to readjust the architecture, etc). What we have found recently is that virtualization, when done correctly, actually provides no sacrifices in terms of performance, and the only question becomes whether or not the capital cost expenditure of bare metal versus the opex cost structure of virtual is something that makes sense for your organisation.

Mike Waas: I’d like to take it a step further and include ADV into this context too: instead of a 3-5 year migration, employing 100+ consultants, and rewriting millions of lines of application code, ADV lets you leverage new technology in weeks, with no re-writing of applications. Our customers can expect to save at least 85% of the transition cost.

Q10. What is the massively parallel processing (MPP) Scatter/Gather Streaming™ technology, and what is it useful for?

Jacque Istok: This is arguably one of the most powerful features of Pivotal Greenplum and it allows for the fastest loading of data in the industry. Effectively we scatter data into the Greenplum data cluster as fast as possible with no care in the world to where it will ultimately end up. Terabytes of data per hour, basically as much as you can feed down the wires, is sent to each of the workers within the cluster. The data is therefore disseminated to the cluster in the fastest physical way possible. At that point, each of the workers gathers the data that is pertinent to them according to the architecture you have chosen for the layout of those particular data elements, allowing for a physical optimization to be leveraged during interrogation of the data after it has been loaded.

Q11. How Datometry Hyper-Q & Pivotal Greenplum data warehouse work together?

Jacque Istok: Pivotal Greenplum is the world’s only true open source, production proven MPP data platform that provides out of the box ANSI compliant SQL capabilities along with Machine Learning, AI, Graph, Text, and Spatial analytics all in one. When combined with Datometry Hyper-Q, you can transparently and seamlessly take any Teradata application and, without changing a single line of code or a single piece of SQL, run it and stop paying the outrageous Teradata tax that you have been bearing all this time. Once you’re able to take out your legacy and expensive Teradata system, without a long investment to rewrite anything, you’ll be able to leverage this software platform to really start to analyze the data you have. And that analysis can be either on premise or in the cloud, giving you a truly hybrid and cross-cloud proven platform.

Mike Waas: I’d like to share a use case featuring Datometry Hyper-Q and Pivotal Greenplum, featuring a Fortune 100 Global Financial Institution needing to scale their business intelligence application, built using 2000-plus stored procedures. The customer’s analysis showed that replacing their existing data warehouse footprint was prohibitively expensive, and rewriting the business applications for a more cost-effective and modern data warehouse posed significant expense and business risk. Hyper-Q allowed the customer to transfer the stored procedures in days, without refactoring the application logic or re-implementing various control-flow primitives, which would have been a time-consuming and expensive proposition.

Qx. Anything else you wish to add?

Jacque Istok: Thank you for the opportunity to speak with you. We have found that there has never been a more valid time than right now for customers to stop paying their heavy Teradata tax, and the combination of Pivotal Greenplum and Datometry Hyper-Q allows them to do that right now, with no risk, and immediate ROI. On top of that, they then find themselves on a modern data platform – one that allows them to grow into more advanced features as they are able. Pivotal Greenplum becomes their bridge to transforming their organization by offering the advanced analytics they need while giving them traditional, production proven capabilities immediately. At the end of the day, there isn’t a single Teradata customer that I’ve spoken to that doesn’t want Teradata-like capabilities at Hadoop-like prices, and you get all this and more with Pivotal Greenplum.

Mike Waas: Thank you for this great opportunity to speak with you. We, at Datometry, believe that data is the key that will unlock competitive advantage for enterprises and without adopting modern data management technologies, it is not possible to unlock value. According to the leading industry group, TDWI, “today’s consensus says that the primary path to big data’s business value is through the use of so-called ‘advanced’ forms of analytics based on technologies for mining, predictions, statistics, and natural language processing (NLP). Each analytic technology has unique data requirements, and DWs must modernize to satisfy all of them.”
We believe virtualizing the data warehouse is the cornerstone of any cloud-first strategy because data warehouse migration is one of the most risk-laden and most expensive initiatives that a company can embark on during their journey to the cloud.
Interestingly, the cost of migration is primarily the cost of process and not technology and this is where Datometry comes in with its data warehouse virtualization technology.
We are the key that unlocks the power of new technology for enterprises to take advantage of the latest technology and gain competitive advantage.

———————

Jacque Istok serves as the Head of Data Technical Field for Pivotal, responsible for setting both data strategy and the execution of pre- and post-sales activities for data engineering and data science. Prior to that, he was Field CTO, helping customers architect and understand how the entire Pivotal portfolio could be leveraged appropriately.
A hands-on technologist, Mr. Istok has spent the majority of his career implementing and advising customers on the architecture of big data applications and back-end infrastructure.

Prior to Pivotal, Mr. Istok co-founded Professional Innovations, Inc. in 1999, a leading consulting services provider in the business intelligence, data warehousing, and enterprise performance management space, and served as its President and Chairman. Mr. Istok is on the board of several emerging startup companies and serves as their strategic technical advisor.


Mike Waas, CEO Datometry, Inc.
Mike Waas founded Datometry after having spent over 20 years in database research and commercial database development. Prior to Datometry, Mike was Sr. Director of Engineering at Pivotal, heading up Greenplum’s Advanced R&D team. He is also the founder and architect of Greenplum’s ORCA query optimizer initiative. Mike has held senior engineering positions at Microsoft, Amazon, Greenplum, EMC, and Pivotal, and was a researcher at Centrum voor Wiskunde en Informatica (CWI), Netherlands, and at Humboldt University, Berlin.

Mike received his M.S. in Computer Science from the University of Passau, Germany, and his Ph.D. in Computer Science from the University of Amsterdam, Netherlands. He has authored or co-authored 36 publications on the science of databases and has 24 patents to his credit.

Resources

– Datometry Releases Hyper-Q Data Warehouse Virtualization Software Version 3.0, AUGUST 11, 2017

– Replatforming Custom Business Intelligence | Use Case, ODBMS.org, NOVEMBER 7, 2017

– Disaster Recovery Cloud Data Warehouse | Use Case, ODBMS.org, NOVEMBER 3, 2017

– Scaling Business Intelligence in the Cloud | Use Case, ODBMS.org, NOVEMBER 3, 2017

– Re-Platforming Data Warehouses – Without Costly Migration Of Applications, ODBMS.org, NOVEMBER 3, 2017

– Meet Greenplum 5: The World’s First Open-Source, Multi-Cloud Data Platform Built for Advanced Analytics, ODBMS.org, SEPTEMBER 21, 2017

Related Posts

– On Open Source Databases. Interview with Peter Zaitsev, ODBMS Industry Watch, Published on 2017-09-06

– On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov, ODBMS Industry Watch, Published on 2017-06-30

– On the new developments in Apache Spark and Hadoop. Interview with Amr Awadallah, ODBMS Industry Watch, Published on 2017-03-13

Follow us on Twitter: @odbmsorg


Data quality checkers

At Drivy, we store, process and analyse hundreds of gigabytes of data in our production systems and our data warehouse. Data is of utmost importance to us because it makes our marketplace run and we use it to continuously improve our service.

Making sure that the data we store and use is what we expect is a challenge. We use multiple techniques to achieve this goal such as high standard coding practices or checker jobs we run on production data to make sure that our assumptions are respected.

Defining data quality

There are several research papers discussing data quality dimensions, as professionals have a hard time agreeing on the terminology. I found that the article written by the DAMA UK Working Group successfully defines six key dimensions, which I summarize as follows:

  • Completeness: are all data items recorded? If something is mandatory, 100% completeness must be achieved. Sampling data, for example, does not achieve completeness.
  • Consistency: can we match the same data across data stores?
  • Timeliness: do we store data when the event occurred? For example, if we know that an event occurred 6 hours ago and we stored it only 1 hour ago, it could break a timeliness constraint.
  • Uniqueness: do we have duplicate records? Nothing will be recorded more than once based upon how a record is identified.
  • Validity: do we store data conforming to the syntax (format, type, range of values) of its definition? Storing a negative integer for a user’s age breaks the validity of the record for example.
  • Accuracy: does the data describe the real world? For example, if a temperature sensor is misconfigured and reports wrong data points that are still within the accepted validity range, the data generated is not accurate.

For example in a log database, uniqueness does not always need to be enforced. However, in another table aggregating these logs we might want to enforce the uniqueness dimension.
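As a toy illustration (the field names and thresholds below are invented for this example, not our actual schema), predicates for the validity and timeliness dimensions could look like this:

from datetime import timedelta

def is_valid(record):
    # Validity: age must be an integer within a plausible range
    return isinstance(record['age'], int) and 0 <= record['age'] <= 130

def is_timely(record, max_lag=timedelta(hours=6)):
    # Timeliness: the event must be stored shortly after it occurred
    return record['stored_at'] - record['occurred_at'] <= max_lag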

Data quality in the context of data warehousing

My main goal was to enforce a high quality of data in our data warehouse, which we fill with standard ETL processes.

For our web application, we already have checker jobs (we talked about this in this blog post) in the context of a monolithic Rails application with MySQL databases. They are somewhat simpler: they run on a single database and check the quality of data we have control over, because we wrote the code that produces it. We can also afford to perform migrations or backfills in case we detect corruption and want to fix the data.

When working with ETL processes and, in the end, a data warehouse, we have different needs. The main issue we face is that we pull data from various databases, third parties, APIs, spreadsheets, unreliable FTP connections, etc. Unfortunately, we have little or no control over what we fetch or collect from these external systems. Working with external datasources is a hard challenge.

We ingest raw data, we build aggregates and summaries, and we cross join data. Freshness depends on the source of the data and how we extract it. We don’t want alerts on data that is already corrupted upstream (this point is debatable), but we want to know if an upstream datasource gets corrupted. We usually want to compare datasets side by side (especially when pulling from another database) to make sure that the consistency dimension is respected.

Overall, I find it hard to enforce strict respect of all data quality dimensions with 100% confidence, as the data we pull from upstream will never fully respect what was advertised. Data quality checkers help us improve our data quality, make sure preconditions hold true, and aim for better data quality in the long run.

Abstractions

Now that we have a clearer idea about what data quality dimensions are and what we want to achieve, we can start building something. My goal was to be able to perform checks proving that the data quality dimensions are respected. I had to come up with high-level abstractions to get a flexible library to work with, and this research article helped me.

My key components can be defined as follows:

Data quality checks are performed at a specified interval on one or multiple datasets that are coming from various datasources, using predicates we define. Checks have a tolerance and trigger alerts on alert destinations with an alert level defined by the severity of the found errors.

Let’s define each word used here (a small sketch of how these pieces fit together follows the list):

  • Alert levels: define how important the error is
  • Alerters: alert people or systems when errors are detected
  • Checkers: perform predicate checks on datasets
  • Parsers: create datasets from a source (parse a CSV file, read database records, call an API etc.)
  • Tolerance levels: tolerate some errors on a check (number, percentage, known broken points)
  • Escalation policies: switch alert destination depending on alert level
  • Logger: logs failing datasets somewhere
  • Clock: defines when a checker should be executed
  • Scheduler: run checks when they are up for execution
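As a minimal sketch of how these components could interact (all names below are invented for illustration, they are not the actual classes of our library):

def run_due_checks(checks, clock, now):
    for check in checks:
        if not clock.is_due(check, now):                          # Clock: is this checker due?
            continue
        dataset = check.parser.parse()                            # Parser: build the dataset from its source
        failing = [e for e in dataset if not check.predicate(e)]  # Checker: apply the predicate
        if check.tolerance.accepts(len(failing), len(dataset)):   # Tolerance: some errors may be acceptable
            continue
        level = check.alert_levels.level_for(len(failing))        # Alert level: derived from severity
        check.logger.log(check, failing)                          # Logger: keep the failing dataset somewhere
        for alerter in check.escalation.alerters_for(level):      # Escalation policy: pick destinations
            alerter.notify(check, level, failing)                 # Alerter: notify people or systems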

Checkers

Checkers are the most important components of the system. They actually perform the defined data quality checks on datasets. When implementing a new checker, you subclass one of the abstract checkers, which provide the core functionality (extraction types, alert destinations, alert levels, logging, etc.).

Available checkers:

  • PredicateSingleDatasetChecker: check that each element of the dataset respects a predicate
  • OffsetPredicateSingleDatasetChecker: given a predicate and an offset, check that two elements separated by that offset respect the predicate. This is very useful for comparing time records, for example
  • PredicateDoubleDatasetsChecker: iterate over two datasets at the same time and check that each pair of records respects a predicate (a usage sketch follows this list)
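For example, a hypothetical usage sketch of the single-dataset checker; only the constructor signature (dataset, predicate, options) is taken from the code sample at the end of this post, and the run() entry point is invented here since the real method name is not shown:

from data_quality.checkers import PredicateSingleDatasetChecker

dataset = [{'age': 25}, {'age': -3}]
validity_check = PredicateSingleDatasetChecker(
    dataset,
    lambda record: record['age'] >= 0,  # validity predicate: ages must be non-negative
    {}                                  # checker options (alerting, logging, ...)
)
failing_elements = validity_check.run()  # hypothetical entry point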

Scheduler

We rely on Apache Airflow to specify, schedule and run our tasks for our data platform. We therefore created a pipeline putting together the data quality checks library with Airflow tasks and scheduling capabilities to easily run checks.

The main pipeline is executed every 15 minutes. Each data-quality check is composed of 2 main tasks:

  • a task with a ShortCircuitOperator which determines whether the quality check needs to be executed now; if the quality check is not due, the second task is skipped
  • a task with a SubDagOperator to actually run the check: extract the dataset, run the checker and perform any alerting if needed (a minimal sketch of this layout follows the list)
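As an illustration, the wiring of one check could look roughly like this with Airflow 1.x APIs (DAG and task names are invented, and the real sub-DAG tasks that extract, check and alert are omitted):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import ShortCircuitOperator
from airflow.operators.subdag_operator import SubDagOperator


def quality_check_is_due(**context):
    # Decide whether the quality check is up for running right now.
    # Returning False makes the ShortCircuitOperator skip the sub-DAG.
    return True  # placeholder decision logic


def build_check_subdag(parent_dag_id, task_id, start_date, schedule_interval):
    # Build the sub-DAG that extracts the dataset, runs the checker and alerts
    # (tasks omitted; this only shows the wiring).
    return DAG('%s.%s' % (parent_dag_id, task_id),
               start_date=start_date,
               schedule_interval=schedule_interval)


dag = DAG('data_quality_checks',
          start_date=datetime(2017, 1, 1),
          schedule_interval=timedelta(minutes=15))

is_due = ShortCircuitOperator(task_id='data_freshness_is_due',
                              python_callable=quality_check_is_due,
                              provide_context=True,
                              dag=dag)

run_check = SubDagOperator(task_id='data_freshness_run',
                           subdag=build_check_subdag(dag.dag_id,
                                                     'data_freshness_run',
                                                     dag.start_date,
                                                     dag.schedule_interval),
                           dag=dag)

is_due >> run_check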
Airflow directed acyclic graph in charge of running the various data quality checks

Airflow sub graph of a quality check running on two datasets side by side

Alerts

When a check runs and detects a malfunction, we get alerted. For now we only use Slack, but there is room for other alerters such as text messages, PagerDuty or emails.

When an alert triggers, we see what the alert is about, the purpose of the associated check, and how important it is, along with the number of failing elements. Remember that alerts can have a certain level of tolerance - some errors can be tolerated - and different alert levels to help triage alerts. We also get a quick view of the data points which failed the check, to get a rough idea of what’s going on without jumping to the logs or immediately looking at the dataset.

Sample alert message on Slack showing a breach of SLA for data freshness

If we need to investigate further, we can look at the logs in Airflow or inspect the raw dataset. We find it convenient to have alerts in Slack so that we can start threads explaining why an alert triggered and if we need to take actions.

The future

We’ve been using these data quality checks over the last 3 months and we’ve been really happy with them. They make us trust our data more, help us detect issues, and prove that our assumptions are indeed respected. It’s also a good opportunity to step up our data quality level: we can lower thresholds over time, review SLAs and put more pressure on the level of confidence we have in critical parts of our storage.

For now, we plan to add more checkers (we currently have 20-30) to see if we’re happy with what we have, improve it and keep building on it.

Open source

We thought about open sourcing what we built, but we think that it’s a bit too soon and we want to gain more confidence before publishing it on GitHub.

Ideas and thoughts

If data quality is of interest to you and you want to react to this blog post, I would be thrilled to hear from you! Reach out on my Twitter.

Code sample

To get an idea of what a data quality checker looks like, here is a sample quality check which checks if data is fresh enough for various tables in our data warehouse (Redshift). This class can easily be tested, to have automated tests proving that alerts trigger with specific datasets.

This class is complete enough so that Airflow can know how to extract data from Redshift, transform and run the check automatically.

# -*- coding: utf-8 -*-
import datetime
from datetime import timedelta

from data_quality.alert_levels import FailingElementsThresholds
from data_quality.checkers import PredicateSingleDatasetChecker
from data_quality.tolerances import LessThan
from data_quality_checks.base_checkers.base import BaseQualityCheck
from data_quality_checks.base_checkers.base import DatasetTypes
from data_quality_checks.base_checkers.base import ExtractionTypes
from data_quality_checks.base_checkers.base import ScheduleTypes

# Assumed: project configuration module listing the tables per freshness target
# (the import was not shown in the original sample).
import config


class DataFreshness(BaseQualityCheck):
    # Run a query on Redshift
    EXTRACTION_TYPE = ExtractionTypes.REDSHIFT_QUERY
    # Dataset can be parsed from a CSV
    DATASET_TYPE = DatasetTypes.CSV

    SCHEDULE_TYPE = ScheduleTypes.CRON
    CRON_SCHEDULE = '20,50 7-22 * * *'

    def alert_level(self):
        # 0-2: warning
        # 2-3: error
        # > 3: critical
        return FailingElementsThresholds(2, 3)

    def tolerance(self):
        # Get notified as soon as we have a single issue
        return LessThan(0)

    def description(self):
        return 'Check that data is fresh enough in various tables'

    def checker(self, dataset):
        class Dummy(PredicateSingleDatasetChecker):
            def __init__(self, dataset, predicate, options, parent):
                super(Dummy, self).__init__(dataset, predicate, options)
                self.parent = parent

            def checker_name(self):
                return self.parent.__class__.__name__

            def description(self):
                return self.parent.description()

        fn = lambda e: e['last_update'] >= self.target_time(e['table_name'])
        return Dummy(dataset, fn, self.checker_options(), self)

    def freshness_targets(self):
        conf = {
            5: config.FINANCE_TABLES,
            8: config.CORE_TABLES,
            24: config.NON_URGENT_TABLES,
        }
        res = []
        for lag, tables in conf.iteritems():
            for table in tables:
                res.append({'table': table, 'target': timedelta(hours=lag)})
        return res

    def freshness_configuration(self, table):
        targets = self.freshness_targets()
        table_conf = [e for e in targets if e['table'] == table]
        if len(table_conf) != 1:
            raise KeyError
        return table_conf[0]

    def target_time(self, table):
        now = datetime.datetime.now()
        lag = self.freshness_configuration(table)['target']
        return now - lag

    def query(self):
        parts = []
        for table_conf in self.freshness_targets():
            query = '''
                SELECT MAX("{col}") last_update, '{table}' table_name
                FROM "{table}"
            '''.format(
                col=table_conf.get('col', 'created_at'),
                table=table_conf['table'],
            )
            parts.append(query)
        the_query = ' UNION '.join(parts)
        return self.remove_whitespace(the_query)

IO-bound insert benchmark for InnoDB on a small server: MySQL 5.0 to 8.0

This post explains performance for the insert benchmark with InnoDB from MySQL versions 5.0 to 8.0. The goal is to understand how performance has changed across releases. This uses an IO-bound workload with small servers. The previous post covered an in-memory workload for InnoDB.

tl;dr - from 5.0.96 to 8.0.3
  • Insert performance improved by ~3X after MySQL 5.5 when I switched from the built-in InnoDB to modern InnoDB.
  • Write stalls are a problem but I need better metrics for reporting this.
  • Secondary index scans are 1.09X faster on the i5 NUC and 1.64X faster on the i3 NUC
  • PK index scans are 1.88X faster on the i5 NUC and 1.96X faster on the i3 NUC
  • QPS increased by 5% on the i5 NUC and decreased by 25% on the i3 NUC in the read-write test that does 100 writes/second

tl;dr - from 5.6.35 to 8.0.3
  • The insert rate decreased by 8% on the i5 NUC and increased by 19% on the i3 NUC
  • There is a big improvement to range scan performance starting in MySQL 5.7
  • Secondary index scans are 1.24X faster on the i5 NUC and 1.16X faster on the i3 NUC
  • PK index scans are 1.35X faster on the i5 NUC and 1.25X faster on the i3 NUC.
  • QPS didn't change on the i5 NUC and increased by 6% on the i3 NUC in the read-write test that does 100 writes/second

Configuration

The tests used InnoDB from upstream MySQL versions 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.3. The built-in InnoDB (remember that?) was used for 5.0, 5.1 and 5.5 and then I switched to modern InnoDB starting with 5.6. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. The insert benchmark is described here. The my.cnf files are here for the i3 NUC and i5 NUC. The i5 NUC has more RAM, faster CPUs and faster storage than the i3 NUC. I tried to tune my.cnf for all engines, including: disabling SSL for MySQL 5.7, using the same charset and collation, and setting innodb_purge_threads=1 to reduce mutex contention. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream.
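The linked my.cnf files are the authoritative configuration; purely as an illustration, the settings described above map to options of this kind (values here are examples, not the exact files used):

[mysqld]
log_bin                              # binlog enabled...
sync_binlog=0                        # ...but no fsync for the binlog
innodb_flush_log_at_trx_commit=2     # no fsync of the redo log at commit
innodb_purge_threads=1               # reduce mutex contention
character_set_server=latin1          # same charset/collation for every version (charset shown is illustrative)
collation_server=latin1_swedish_ci
skip_ssl                             # disable SSL (MySQL 5.7)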

The database is larger than RAM as the test table has ~250M rows for the i3 NUC and ~500M rows for the i5 NUC. The i3 NUC has 8gb of RAM and the i5 NUC has 16gb. The insert benchmark loaded the table with 250M or 500M rows, then did a full scan of each index on the table (PK, 3 secondary, PK again), then two read-write tests. The first read-write test tries to insert 1000 rows/second with one client while the other client does short range scans as fast as possible. The second read-write test is similar except the insert rate limit is 100/second.

Results

All of the data for the tests is here.

Results: load

The graph below has the insert rate for each release relative to the rate for InnoDB in MySQL 5.0.96. Performance improves dramatically in 5.6 thanks to switching from the built-in to modern InnoDB.

Additional metrics help to explain performance. The metrics are explained here. With the switch from built-in to modern InnoDB in MySQL 5.6 there are big improvements to the insert rate, write-amplification (wKB/i) and the CPU overhead per insert (Mcpu/i). From 5.6 to 8.0 on the i5 NUC the insert rate decreased by 8% while the CPU overhead/insert increased by 9%. I assume the new overhead is from code above InnoDB. From 5.6 to 8.0 on the i3 NUC the insert rate increased by 19% while the CPU overhead/insert decreased by 25%. I assume the change in CPU overhead explains the change in performance. While write-amplification decreased with modern InnoDB, the storage read overhead per insert (rKB/i) has increased. I haven't tried to explain that. Write stalls might be a problem because the 99th percentile insert rate is much worse than the average, but I need a better metric for reporting this.

i3 NUC
        ips.av  ips.99  rKB/i   wKB/i   Mcpu/i  size(GB)
5.0.96   864     502    0.38    106.56  10545    52
5.1.72   883     548    0.36    106.75  10850    52
5.5.51  1246     741    3.52     78.61   7791    47
5.6.35  2665    1185    6.10     39.08   4934    46
5.7.17  3242    1814    2.54     34.24   3526    46
8.0.3   3162    1771    2.63     35.08   3663    46

i5 NUC
        ips.av  ips.99  rKB/i   wKB/i   Mcpu/i  size(GB)
5.0.96  1892    1088    0.28    114.11   8533   102
5.1.72  1877    1116    0.29    114.33   8850   102
5.5.51  2516    1452    1.55     95.17   7754    94
5.6.35  5978    2431    2.10     44.50   4265    90
5.7.17  5634    3131    1.26     44.92   4690    90
8.0.3   5482    3017    1.29     45.93   4657    90

Results: scan

Below are tables that show the number of seconds for each full index scan: 1 is the PK, 2/3/4 are the secondary indexes and 5 is the PK again. The scanned index has 250M entries on the i3 NUC and 500M entries on the i5 NUC. The first PK index scan (#1) is slower than the second PK scan (#5) because writeback is in progress during the first scan and competes for storage IO. Writeback for a dirty page can also delay a read waiting for a clean page. Something was done to make range scans faster starting in MySQL 5.7 (thanks to the InnoDB team).

Comparing InnoDB between 5.0.96 and 8.0.3: secondary index scans are 1.09X faster on the i5 NUC and 1.64X faster on the i3 NUC, PK index scans are 1.88X faster on the i5 NUC and 1.96X faster on the i3 NUC.

Comparing InnoDB between 5.6.35 and 8.0.3: secondary index scans are 1.24X faster on the i5 NUC and 1.16X faster on the i3 NUC, PK index scans are 1.35X faster on the i5 NUC and 1.25X faster on the i3 NUC. Range scans are much faster for InnoDB starting in MySQL 5.7.

#seconds to scan an index, i3 NUC
1       2       3       4       5       2+3+4   engine
-       -       -       -       -       -----   ------
1349    422     198     208     200      828    5.0.96
1161    601     200     216     134     1017    5.1.72
 319    184     197     223     101      604    5.5.51
 292    181     194     212     127      587    5.6.35
 419    148     157     203      99      508    5.7.17
 403    147     154     204     102      505    8.0.3

#seconds to scan an index, i5 NUC
1       2       3       4       5       2+3+4   engine
-       -       -       -       -       -----   ------
2377    255     271     273     313     799     5.0.96
2477    266     283     286     399     835     5.1.72
 425    296     317     307     174     920     5.5.51
 475    290     302     314     225     906     5.6.35
 417    250     267     249     164     766     5.7.17
 420    227     260     245     167     732     8.0.3

Hardware efficiency metrics help to understand the performance improvements between MySQL 5.6.35 and 5.7.17. First are results for index scan #4, which is a secondary index. In MySQL 5.7.17 the CPU overhead per row read (Mcpu/o) is lower and that might explain why the storage read IO rate (rMB/s) is larger.

i3 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
212     35.2    0.031   7       13.584  5.6.35
203     37.8    0.031   7       12.552  5.7.17

i5 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
314     44.2    0.028   13      10.748  5.6.35
249     56.0    0.028   14       7.592  5.7.17

And then metrics for index scan #5, which is the second scan of the PK. The improvement in HW efficiency here is similar to the i3 NUC results above.

i3 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
127     130.5   0.067   16      16.560  5.6.35
 99     161.7   0.065   15       9.728  5.7.17

i5 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
225     147.0   0.067   32      14.732  5.6.35
164     198.1   0.067   32       8.900  5.7.17

Results: read-write, 1000 inserts/second

This section has results for the read-write tests where the writer does 1000 inserts/second. The graph has the query rate relative to the rate for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 QPS increased by 24% on the i5 NUC and decreased by 22% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS increased by 2% on the i5 NUC and 18% on the i3 NUC. The results for MySQL 5.1.72 are an outlier in this test and in the test below with 100 inserts/second. It looks like InnoDB is able to cache the secondary indexes for 5.1.72, so it does less storage reads per query and the QPS is larger.
Additional metrics help explain the performance, although the results here are more confusing than for other tests. The metrics are explained here. The built-in InnoDB with MySQL 5.5.51 was unable to sustain the target insert rate on the i3 NUC. Write stalls were a problem for all releases based on the values in IPS.99 that are less than 999 and MySQL 5.5.51 was the worst. The CPU overhead per query (CPU/q) decreased starting with MySQL 5.6.35.

i3 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  999     972     65.75    389      98     31.45  31282
5.1.72  999     961     63.98   4562    2149      0.26   6488
5.5.51  948     761     81.24    161      33     78.49  63049
5.6.35  999     977     67.92    258     156    101.61  37834
5.7.17  999     977     65.64    311     238     98.68  26829
8.0.3   999     978     65.97    303     230     99.50  27868

i5 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  999     988     71.25    820     694    30.27   23455
5.1.72  999     994     65.48   5460    2619     0.25    5374
5.5.51  999     968     88.74    710     480    53.41   24597
5.6.35  999     987     73.09    992     626    59.45   16798
5.7.17  999     988     71.22   1042     738    61.07   16115
8.0.3   999     989     71.63   1019     722    61.36   16723

Results: read-write, 100 inserts/second

This section has results for the read-write tests where the writer does 100 inserts/second. The graph has the query rate relative to the rate for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 QPS increased by 5% on the i5 NUC and decreased by 25% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS didn't change on the i5 NUC and increased by 6% on the i3 NUC. The QPS for 5.1.72 is an outlier. Similar to above, it looks like InnoDB cached the indexes in that setup and QPS was much larger.
Additional metrics help explain the performance. The metrics are explained here. All of the releases sustained the target insert rate and write stalls were not a problem. The CPU overhead per query (CPU/q) decreased by 26% on the i5 NUC and 25% on the i3 NUC and that might explain the increase in the query rate.

i3 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  100     100     122.04  1171     810    21.97   14084
5.1.72  100     100      95.08  4881    4707     0.01    5278
5.5.51  100     100     165.01   670     369    66.26   16998
5.6.35  100     100     126.74   827     534    66.49   13675
5.7.17  100     100     128.32   900     566    69.56   10269
8.0.3   100     100     129.58   883     547    69.86   10625

i5 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  100     100     107.04  1267    1180    25.84   15237
5.1.72  100     100      91.49  5625    5455     0.01    4477
5.5.51  100     100     171.97  1139     918    51.67   14015
5.6.35  100     100     157.74  1334     951    50.42   11546
5.7.17  100     100     160.00  1383     996    52.50   10807
8.0.3   100     100     161.31  1334    1041    52.85   11333

How to inherit properties from a base class entity using @MappedSuperclass with JPA and Hibernate

Introduction Last week, one of my blog readers asked me for a way to reuse the @Id mapping so that it won’t have to be declared on each and every entity. Because this is a good opportunity to introduce @MappedSuperclass, I decided to answer the question with a blog post. Domain Model Assuming we are … Continue reading How to inherit properties from a base class entity using @MappedSuperclass with JPA and Hibernate →
