Planet MySQL

Join Pythian at Percona Live 2018

Percona Live 2018 is coming up soon, and Pythian is excited to be participating again this year. The Percona Live Open Source Database Conference is the premier open source database event for individuals and businesses that develop and use open source database software. This year the conference will take place April 23 – April 25 in Santa Clara, California. Once again Pythian will be hosting the highly anticipated Pythian Community Dinner on Tuesday, April 24th at Pedro’s Restaurant and Cantina.

We’ve also got a fantastic lineup of speakers. You’ll want to make sure you register for the following sessions:

Monday, April 23, 2018

9:30 am – 12:30 pm PST

  • Derek Downey will present “Hands on ProxySQL”
  • Matthias Crauwels & Pep Pla will present “MySQL Break/Fix Lab”

Tuesday, April 24, 2018

1:30 am – 12:30 pm

  • Igor Donchovski will present “How to scale MongoDB”

11:30 am – 12:20 pm PST

  • Gabriel Ciciliani will present “Running database services on DC/OS”

Wednesday, April 25, 2018

11:00 am – 11:50 am

  • Igor Donchovski will present “Securing Your Data: All Steps for Encrypting Your MongoDB Database”
REGISTER FOR THE PYTHIAN COMMUNITY DINNER


 Visit the Percona Live site for more information. We look forward to seeing you at the show!

Fun With Bugs #62 - On Bugs Related to JSON Support in MySQL

Compared to "online ALTER" or FULLTEXT indexes in InnoDB, I am not that exposed to using JSON in MySQL. OK, there is the EXPLAIN ... FORMAT=JSON statement that is quite useful, and the optimizer trace in MySQL is also in JSON format, but otherwise I rarely have to even read this kind of data since my lame attempt to pass one introductory MongoDB training course back in 2015 (when I mostly enjoyed the mongo shell). I care even less about storing JSON in MySQL.

But it seems community users do use it, and there is a book for them... So, out of pure curiosity, last week I decided to check known bugs reported by MySQL Community users that are related to this feature (it has a separate category, "MySQL Server: JSON", in the public bugs database). This is a (short enough) list of my findings:
  • Bug #89543 - "stack-overflow on gis.geojson_functions / json.json_functions_innodb". This is a recent bug report from Laurynas Biveinis. Based on the last comment, it may have something to do with a specific GCC compiler version and an exact build option (WITH_UBSAN=ON) used. But for me it's yet another indication that community (Percona's in this case) extra QA still matters for MySQL quality.
  • Bug #88033 - "Generated INT column with value from JSON document". As Geert Vanderkelen says: "would be great if JSON null is NULL".
  • Bug #87440 - "use union index,the value of key_len different with used_key_parts’ value." This is actually a bug in the optimizer (or in EXPLAIN ... FORMAT=JSON), but as it is still "Open" and nobody cared to check it, it stays in the wrong category.
  • Bug #85755 - "JSON containing null value is extracted as a string "null"". Similar to Geert's case above, Dave Pullin seems to need an easy way to get a real NULL in SQL when the value is null in JSON. It is stated in the comments that everything now works as designed, and it works exactly the same way in MariaDB 10.2.13. Some users are not happy though, and the bug remains "Verified" (a possible workaround is sketched right after this list).
  • Bug #84082 - "JSON_OBJECT("k", dateTime) adds milliseconds to time". In MariaDB 10.2.13 the result is different:
    MariaDB [(none)]> SELECT version(), JSON_OBJECT("k", cast("2016-11-19 17:46:31" as datetime(0)));
    +-----------------+--------------------------------------------------------------+
    | version()       | JSON_OBJECT("k", cast("2016-11-19 17:46:31" as datetime(0))) |
    +-----------------+--------------------------------------------------------------+
    | 10.2.13-MariaDB | {"k": "2016-11-19 17:46:31"}                                 |
    +-----------------+--------------------------------------------------------------+
    1 row in set (0.00 sec)
  • Bug #83954 - "JSON handeling of DECIMAL values, JSON from JSON string". Take care when you cast values to JSON type.
  • Bug #81677 - "Allows to force JSON columns encoding into pure UTF-8". This report is still "Open" and probably everything works as designed.
  • Bug #79813 - "Boolean values are returned inconsistently with JSON_OBJECT". Take care while working with boolean properties.
  • Bug #79053 - "Second argument "one" to JSON_CONTAINS_PATH is a misnomer". Nice request from Roland Bouman to use "some" instead of "one" in this case. Probably too late to implement it...
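To illustrate the JSON null complaints in Bug #88033 and Bug #85755 above, here is a minimal sketch of the kind of workaround users end up with; the doc column and my_table name are hypothetical, and the exact behavior should be verified on your server version:

-- Extracting a JSON null yields the JSON literal, and unquoting it gives the 4-character string 'null':
SELECT JSON_UNQUOTE(JSON_EXTRACT('{"k": null}', '$.k')) AS v;

-- A common workaround is to map that string back to SQL NULL explicitly:
SELECT NULLIF(JSON_UNQUOTE(JSON_EXTRACT(doc, '$.k')), 'null') AS v FROM my_table;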
There are also several related feature requests; of them, the following are still "Open":
  • Bug #84167 - "Managing deeply nested JSON documents with JSON_SET, JSON_MERGE". As a comment says, "It would by nice if MySQL came up with a solution for this in a more constructive way".
  • Bug #82311 - "Retrieve the last value of an array (JSON_POP)". I am not 100% sure what the user wants here, but even I can get last array element in MongoDB using $slice, for example:
    > db.text.find();
    { "_id" : ObjectId("5aafe41c6bd09c09625aa3d1"), "item" : "test" }
    { "_id" : ObjectId("5ab0e8c208d6fe081ebb4f40"), "item" : null }
    { "_id" : ObjectId("5ab0f0f6cc695c62b4b0a301"), "item" : [ { "val" : 10 }, { "val" : 11 } ] }
    > db.text.find({}, { item: { $slice: -1 }, _id: 0});
    { "item" : "test" }
    { "item" : null }
    { "item" : [ { "val" : 11 } ] }
    >
  • Bug #80545 - "JSON field should be allow INDEX by specifying table_flag". It seems some third-party storage engine developers want a way to create and use some nice indexes of their engines to get efficient index-based search in JSON documents.
  • Bug #80349 - "MySQL 5.7 JSON: improve documentation and possible improvements". Simon Mudd asked for more documentation on the way the feature is designed (for those interested in the details of the JSON data type implementation in modern MySQL versions there is a detailed enough public worklog, WL#8132) and asked some more questions. Morgan Tocker answered at least some of them, but this report still stays "Open" since that time...
It would be great for the remaining few "Open" reports from community users to be properly processed, and for the couple of real bugs identified to be fixed. There are also features to work on, and in the meantime workarounds and best practices should be shared. Maybe there are enough of them for an entire book.

That's almost all I have to say about JSON in MySQL at the moment.

Database Audit Log Monitoring for Security and Compliance

We recently conducted a webinar on audit log analysis for MySQL and MariaDB databases. This blog provides a deeper dive into the security and compliance concerns surrounding databases.

Database auditing is the tracking of database resource utilization and authority; specifically, the monitoring and recording of user database actions. Auditing can be based on a variety of factors, including individual actions, such as the type of SQL statement executed, or on a combination of factors such as user name, application, time, etc. Performing regular database log analysis bolsters your internal security measures by answering questions like who changed your critical data, when it was changed, and more. Database auditing also helps you comply with increasingly demanding compliance requirements.

The purpose of this blog is to outline the importance of audit log analysis using MariaDB and Enterprise MySQL as examples.

Auditing Requirements

The requirement to track access to database servers and the data itself is not new, but in recent years there has been a marked need for more sophisticated tools. When auditing is enabled, each operation on the audited database records a trail of information such as what database objects were impacted, who performed the operation and when. The comprehensive audit trail of executed database actions can be maintained over time to allow DBAs, security staff, and any other authorized personnel to perform in-depth analysis of access and modification patterns against data in the DBMS.

Keep in mind that auditing tracks what a particular user has done once access has been allowed. Hence, auditing occurs post-activity; it does not do anything to prohibit access. Of course, some database auditing solutions have grown to include capabilities that will identify nefarious access and shut it down before destructive actions can occur.

Audit trails produced by intrusion detection help promote data integrity by enabling the detection of security breaches.  In this capacity, an audited system can serve as a deterrent against users tampering with data because it helps to identify infiltrators. Your company’s business practices and security policies may dictate being able to trace every data modification back to the initiating user. Moreover, government regulations may require your organization to analyze data access and produce regular reports, either on an ongoing basis, or on a case-by-case basis, when there is a need to identify the root cause of data integrity problems.  Auditing is beneficial for all of these purposes.

Moreover, should unauthorized, malicious, or simply ill-advised operations take place, proper auditing will lead to the timely discovery of the root cause and resolution.

The General Data Protection Regulation

On April 27, 2016, the General Data Protection Regulation (GDPR) was adopted by the European Parliament and the Council of the European Union; it takes effect on May 25, 2018. It is a regulation in EU law governing data protection and privacy for all individuals within the European Union. It introduces numerous security and compliance requirements for all organizations worldwide that handle, process, collect or store personal information of EU citizens. This means that organizations that work with personal data will have to elevate their security measures and auditing mechanisms when handling Personally Identifiable Information (PII) of EU citizens.

Furthermore, organizations will have to ensure that only people who should have access to the personal information of EU citizens are granted access, and in case of unauthorized access, organizations must have mechanisms to detect and be alerted of any such incident so that any issues can be resolved expeditiously. Following a data breach, organizations must disclose full information on these events to their local data protection authority (DPA) and all customers concerned by the breach within 72 hours so they can respond accordingly.

Failing to comply with the GDPR could result in heavy fines of up to 4% of the offending organization’s global revenue, or up to €20 million (whichever is greater). With this in mind, it is crucial for all affected organizations to make sure they implement adequate log monitoring on their databases, as defined by the regulation.

How Databases Implement Auditing

There is no set standard that defines how a database should implement auditing, so vendors and Database Administrators (DBAs) differ in their approach.  Some employ special tables while others utilize log files. The two DBMSes that we’ll be looking at here today, MariaDB and MySQL, employ log-based auditing.

The MariaDB Audit Plugin

Prior to MariaDB 5.5.20, in order to record user access you would have had to employ third-party database solutions. To help businesses comply with internal auditing regulations and those of various regulatory bodies, MariaDB developed the MariaDB Audit Plugin. The MariaDB Audit Plugin can also be used with MySQL, but it includes some unique features that are available only for MariaDB.

In MariaDB, the Audit Plugin logs detailed information for any type of access from users to your database server and tables, including:

    • Timestamp
    • Server-Host
    • User
    • Client-Host
    • Connection-ID
    • Query-ID
    • Operation
    • Database
    • Table
    • Error-Code
Installation

Getting the MariaDB Audit Plugin installed, configured and activated is fairly simple. In fact, you only need a few minutes to enable auditing for your database. A server restart is not needed, so you do not need to plan any downtime for the installation of the plugin. The only requirement is that you are running MariaDB or MySQL Server version 5.5 or newer (MySQL 5.5.14, MariaDB 5.5.20).

If you installed MariaDB from its official packages, you probably already have the plugin on your system, even though it’s neither installed nor enabled by default. Look for a file called “server_audit.so” (in Linux) or “server_audit.dll” (in Windows) inside your plugins directory.  The file path of the plugin library is stored in the plugin_dir system variable. To see the value of this variable and determine the file path of the plugin library, execute the following SQL statement:

SHOW GLOBAL VARIABLES LIKE 'plugin_dir';
+---------------+--------------------------+
| Variable_name | Value                    |
+---------------+--------------------------+
| plugin_dir    | /usr/lib64/mysql/plugin/ |
+---------------+--------------------------+

If you don’t find the plugin file inside your plugins directory, download it from the MariaDB site and place it in the plugins directory manually. (In Linux, ensure that the MariaDB server can read the file by giving it 755 permissions and root user ownership.)

Next, install the plugin using the command:
INSTALL PLUGIN server_audit
SONAME 'server_audit.so';

To confirm the plugin is installed and enabled, run the query show plugins;. You should see it appear in the results list:

+-----------------------------+----------+--------------------+-----------------+---------+
| Name                        | Status   | Type               | Library         | License |
+-----------------------------+----------+--------------------+-----------------+---------+
| SERVER_AUDIT                | ACTIVE   | AUDIT              | server_audit.so | GPL     |
+-----------------------------+----------+--------------------+-----------------+---------+

The MariaDB Audit Plugin comes with many variables to let you fine-tune your auditing to help you better concentrate on just those events and statements that are important to you. You can see the currently set variables with the command show global variables like "server_audit%";:

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE "server_audit%";
+-------------------------------+-----------------------+
| Variable_name                 | Value                 |
+-------------------------------+-----------------------+
| server_audit_events           |                       |
| server_audit_excl_users       |                       |
| server_audit_file_path        | server_audit.log      |
| server_audit_file_rotate_now  | OFF                   |
| server_audit_file_rotate_size | 1000000               |
| server_audit_file_rotations   | 9                     |
| server_audit_incl_users       |                       |
| server_audit_logging          | ON                    |
| server_audit_mode             | 0                     |
| server_audit_output_type      | file                  |
| server_audit_syslog_facility  | LOG_USER              |
| server_audit_syslog_ident     | mysql-server_auditing |
| server_audit_syslog_info      |                       |
| server_audit_syslog_priority  | LOG_INFO              |
+-------------------------------+-----------------------+

These variables should be specified in the MariaDB server configuration file (e.g. /etc/my.cnf.d/server.cnf) in the [server] section in order to be persistent between server restarts. For example, to have the variable server_audit_logging set to ON, add the line server_audit_logging=ON to the file:

[server]

server_audit_logging=ON

Here is a quick rundown of some of the most important variables:

  • server_audit_logging – Enables audit logging; if it’s not set to ON, audit events will not be recorded and the audit plugin will not do anything.
  • server_audit_events – Specifies the events you wish to have in the log. By default the value is empty, which means that ALL events are recorded. The options are:
    • CONNECTION (users connecting and disconnecting)
    • QUERY (queries and their result)
    • TABLE (which tables are affected by the queries)
  • server_audit_excl_users, server_audit_incl_users – These variables specify which users’ activity should be excluded from or included in the audit. server_audit_incl_users has the higher priority. By default, all users’ activity is recorded.
  • server_audit_output_type – By default auditing output is sent to a file. The other option is “syslog”, meaning all entries go to the syslog facility.
  • server_audit_syslog_facility, server_audit_syslog_priority – Specifies the syslog facility and the priority of the events that should go to syslog.
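Most of these variables are also dynamic, so as an illustration only (the user name below is hypothetical, and you should still persist the settings in the configuration file as described above), a minimal runtime sketch could look like this:

SET GLOBAL server_audit_logging = ON;
SET GLOBAL server_audit_events = 'CONNECT,QUERY,TABLE';  -- CONNECT covers the connection events listed above
SET GLOBAL server_audit_incl_users = 'app_user';         -- hypothetical user to audit
SET GLOBAL server_audit_output_type = 'file';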
Understanding the Log File Entries

Once you have the audit plugin configured and running, you can examine the log file (e.g. /var/lib/mysql/server_audit.log). There you will find all the event types that have been enabled via the server_audit_events variable. For example, CONNECTION entries will show you the user and the host from which connects and disconnects occur:

20140901 15:33:43,localhost.localdomain,root,localhost,5,0,CONNECT,,,0
20140901 15:45:42,localhost.localdomain,root,localhost,5,0,DISCONNECT,,,0

Here are some example TABLE and QUERY entries:

20140901 15:19:44,localhost.localdomain,root,localhost,4,133,WRITE,video_king,stores,
20140901 15:19:44,localhost.localdomain,root,localhost,4,133,QUERY,video_king,'DELETE FROM stores LIMIT 10',0

The first entry shows that there was a WRITE operation on the database video_king and the table stores. The query that made the WRITE changes follows: DELETE FROM stores LIMIT 10. The order of these statements will always be the same: first the TABLE event and then the QUERY event that caused it.

A READ operation looks like this:

20140901 15:20:02,localhost.localdomain,root,localhost,4,134,READ,video_king,stores,
20140901 15:20:05,localhost.localdomain,root,localhost,4,134,QUERY,stores,'SELECT * FROM stores LIMIT 100',0

MySQL Enterprise Audit

MySQL Enterprise Edition includes MySQL Enterprise Audit, implemented using a server plugin named audit_log. MySQL Enterprise Audit uses the open MySQL Audit API to enable standard, policy-based monitoring, logging, and blocking of connection and query activity executed on specific MySQL servers. Designed to meet the Oracle audit specification, MySQL Enterprise Audit provides an out-of-the-box auditing and compliance solution for applications that are governed by both internal and external regulatory guidelines.

Installation

The plugin is included with MySQL Enterprise Edition, so you simply need to add the following to your my.cnf file to register and enable the audit plugin:

[mysqld]

plugin-load=audit_log.so

(Keep in mind that the file name suffix is platform-dependent: .so on Linux, .dll on Windows, etc.)

Alternatively, you can load the plugin at runtime:

mysql> INSTALL PLUGIN audit_log SONAME 'audit_log.so';

Auditing for a specific MySQL server can be dynamically enabled and disabled via the audit_log_policy global variable. It uses the following named values to enable or disable audit stream logging and to filter the audit events that are logged to the audit stream:

  • “ALL” – enable audit stream and log all events
  • “LOGINS” – enable audit stream and log only login events
  • “QUERIES” – enable audit stream and log only query events
  • “NONE” – disable audit stream

Another global variable, audit_log_rotate_on_size, allows you to automate the rotation and archival of audit stream log files based on size. Archived log files are renamed and appended with a datetime stamp when a new file is opened for logging.

The MySQL audit stream is written as XML, using UTF-8 (without compression or encryption), so that it can be easily formatted for viewing using a standard XML parser. This enables you to leverage third-party tools to view the contents. You may override the default format by setting the audit_log_format system variable at server startup. Formats include:

  • Old-style XML format (audit_log_format=OLD): The original audit logging format used by default in older MySQL series.
  • New-style XML format (audit_log_format=NEW): An XML format that has better compatibility with Oracle Audit Vault than old-style XML format. MySQL 5.7 uses new-style XML format by default.
  • JSON format (audit_log_format=JSON)

By default, the file is named “audit.log” and resides in the server data directory. To change the name of the file, you can set the audit_log_file system variable at server startup.
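Putting those options together, a minimal my.cnf sketch might look like the following (the values are illustrative only; check the MySQL Enterprise Audit documentation for your version before using them):

[mysqld]
plugin-load=audit_log.so
audit_log_policy=ALL
audit_log_format=NEW
audit_log_rotate_on_size=1073741824
audit_log_file=audit.log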

MySQL Enterprise Audit was designed to be transparent at the application layer by allowing you to control the mix of log output buffering and asynchronous or synchronous disk writes to minimize the associated overhead that comes when the audit stream is enabled. The net result is that, depending on the chosen audit log stream options, most application users will see little to no difference in response times when the audit stream is enabled.

Monyog MySQL Monitor

While reading the audit log file is great for a quick casual look, it’s not the most practical way to monitor database logs. Chances are you’ll be better off using the syslog option or, better still, taking advantage of tools that report on the audit log and/or syslog. There, you can process entries to focus on certain types of events of interest.

One such tool is Monyog MySQL Monitor.  Version 8.5.0 introduces audit log analysis for MySQL Enterprise and MariaDB.  This feature parses the audit log maintained by the server and displays the content in clean tabular format.

Monyog accesses the audit log file the same way it does other MySQL log files, including the Slow Query, General Query and Error logs.

Figure 1: Audit Log Options

You can select the server and the time frame for which you want to see the audit log. Then, clicking on “SHOW AUDIT LOG” fetches the contents of the log. The limit on the number of rows that can be fetched in one time frame is 10,000.

Figure 2: Audit Log Screen

The section at the top gives you a quick summary of the audit log as percentages: Failed Logins, Failed Events, Schema Changes, Data Changes and Stored Procedures. All of these legends are clickable and show the corresponding audit log entries when clicked. Furthermore, you can use the filter option to fetch audit log entries based on Username, Host, Operation, Database and Table/Query. There is also an option to export the fetched audit log content in CSV format.

Conclusion

In this blog, we outlined the importance of audit log analysis using MariaDB and Enterprise MySQL as examples.  

In recent years, there has been a marked need for more sophisticated tools due to increased internal and external security and auditing policies.  

A properly audited system can serve as a deterrent against users tampering with data because it helps to identify infiltrators.  Once an unauthorized, malicious, or simply ill-advised operation has taken place, proper auditing will lead to the timely discovery of the root cause and resolution.

Both MariaDB and MySQL implement auditing via native plugins.  These are fully configurable and may record database activities in a variety of formats.  The resulting log files may be read directly or analyzed by a tool such as the Monyog MySQL Monitor.  It provides a summary of Failed Logins, Failed Events, Schema changes, Data Changes and Stored Procedure, as well as fields such as Username, Host, Operation, Database and Table/Query, all within an easy-to-read tabular format.

Download a 14-day free trial of Monyog MySQL monitor. Here’s the complete video for all those who couldn’t attend the webinar.

The post Database Audit Log Monitoring for Security and Compliance appeared first on Monyog Blog.

Updated: Become a ClusterControl DBA: Making your DB components HA via Load Balancers

Choosing your HA topology

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

The Virtual IPs need either an application to manage them (MHA, Orchestrator), some scripting (Keepalived, Pacemaker/Corosync) or an engineer to fail over manually, and the decision making in the process can become complex. The Virtual IP failover itself is a straightforward and simple process: remove the IP address from one host, assign it to another and use arping to send a gratuitous ARP response. In theory a Virtual IP can be moved in a second, but it will take a few seconds before the failover management application is sure the host has failed and acts accordingly; in reality this is somewhere between 10 and 30 seconds. Another limitation of Virtual IPs is that some cloud providers do not allow you to manage your own Virtual IPs or to assign them at all. For example, Google does not allow you to do that on their compute nodes.

Resource managers like Zookeeper and Etcd can monitor your databases and (re)configure your applications once a host fails or a slave gets promoted to master. In general this is a good idea but implementing your checks with Zookeeper and Etcd is a complex task.

A load balancer or proxy will sit in between the application and the database host and work transparently, as if the client were connecting to the database host directly. Just like with Virtual IPs and resource managers, the load balancers and proxies also need to monitor the hosts and redirect the traffic if one host is down. ClusterControl supports two proxies, HAProxy and ProxySQL, and both are supported for MySQL master-slave replication and Galera Cluster. HAProxy and ProxySQL each have their own use cases; we will describe them in this post as well.

Why do you need a load balancer?

In theory you don’t need a load balancer but in practice you will prefer one. We’ll explain why.

If you have virtual IPs set up, all you have to do is point your application to the correct (virtual) IP address and everything should be fine connection-wise. But suppose you have scaled out the number of read replicas; you might want to provide virtual IPs for each of those read replicas as well, for maintenance or availability reasons. This might become a very large pool of virtual IPs that you have to manage. If one of those read replicas has a failure, you need to re-assign the virtual IP to another host, or else your application will connect to either a host that is down or, in the worst case, a lagging server with stale data. Keeping the replication state in the application managing the virtual IPs is therefore necessary.

Galera poses a similar challenge: in theory you can add as many hosts as you’d like to your application config and pick one at random. The same problem arises when this host is down: you might end up connecting to an unavailable host. Using all hosts for both reads and writes might also cause rollbacks due to the optimistic locking in Galera: if two connections try to write to the same row at the same time, one of them will receive a rollback. In case your workload has such concurrent updates, it is advised to write to only one node in Galera. Therefore you want a manager that keeps track of the internal state of your database cluster.

Both HAProxy and ProxySQL offer the functionality to monitor the MySQL/MariaDB database hosts and keep track of the state of your cluster and its topology. For replication setups, in case a slave replica is down, both HAProxy and ProxySQL can redistribute the connections to another host. But if the replication master is down, HAProxy will refuse the connection and ProxySQL will return a proper error to the client. For Galera setups, both load balancers can elect a master node from the Galera cluster and send the write operations only to that specific node.

On the surface HAProxy and ProxySQL may seem to be similar solutions, but they differ a lot in features and the way they distribute connections and queries. HAProxy supports a number of balancing algorithms like least connections, source, random and round-robin while ProxySQL distributes connections using the weight-based round-robin algorithm (equal weight means equal distribution). Since ProxySQL is an intelligent proxy, it is database aware and is also able to analyze your queries. ProxySQL is able to do read/write splitting based on query rules where you can forward the queries to the designated slaves or master in your cluster. ProxySQL includes additional functionality like query rewriting, caching and query firewall with real-time, in-depth statistics generation about the workload.
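As a rough illustration of ProxySQL's rule-based read/write splitting (the hostgroup numbers and match patterns below are assumptions for the sketch, not ClusterControl defaults):

-- send SELECT ... FOR UPDATE to the writer hostgroup (10), plain SELECTs to the readers (20);
-- anything that matches no rule goes to the user's default hostgroup
INSERT INTO mysql_query_rules (rule_id,active,match_digest,destination_hostgroup,apply)
VALUES (1,1,'^SELECT.*FOR UPDATE',10,1);
INSERT INTO mysql_query_rules (rule_id,active,match_digest,destination_hostgroup,apply)
VALUES (2,1,'^SELECT',20,1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;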

That should be enough background information on this topic, so let’s see how you can deploy both load balancers for MySQL replication and Galera topologies.

Deploying HAProxy

Using ClusterControl to deploy HAProxy on a Galera cluster is easy: go to the relevant cluster and select “Add Load Balancer”:

And you will be able to deploy an HAProxy instance by adding the host address and selecting the server instances you wish to include in the configuration:

By default the HAProxy instance will be configured to send connections to the server instances receiving the least number of connections, but you can change that policy to either round robin or source.

Under advanced settings you can set timeouts, the maximum number of connections and even secure the proxy by whitelisting an IP range for the connections.

Under the nodes tab of that cluster, the HAProxy node will appear:

Now your Galera cluster is also available via the newly deployed HAProxy node on port 3307. Don’t forget to GRANT your application access from the HAProxy IP, as the traffic will now be coming from the proxy instead of the application hosts. Also, remember to point your application connection to the HAProxy node.
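For example, a minimal sketch (the user, database name and HAProxy IP below are hypothetical):

-- allow the application user to connect through the HAProxy host
CREATE USER 'app'@'10.0.0.10' IDENTIFIED BY 'secret';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app'@'10.0.0.10';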

Now suppose one server instance goes down; HAProxy will notice this within a few seconds and stop sending traffic to that instance:

The two other nodes are still fine and will keep receiving traffic. This keeps the cluster highly available without the client even noticing the difference.

Deploying a secondary HAProxy node

Now that we have moved the responsibility of retaining high availability over the database connections from the client to HAProxy, what if the proxy node dies? The answer is to create another HAProxy instance and use a virtual IP controlled by Keepalived as shown in this diagram:

The benefit compared to using virtual IPs on the database nodes is that the logic for MySQL is at the proxy level and the failover for the proxies is simple.

So let’s deploy a secondary HAProxy node:

After we have deployed a secondary HAProxy node, we need to add Keepalived:

And after Keepalived has been added, your nodes overview will look like this:

So now, instead of pointing your application connections to the HAProxy node directly, you have to point them to the virtual IP.

In the example here, we used separate hosts to run HAProxy on, but you could easily add them to existing server instances as well. HAProxy does not bring much overhead, although you should keep in mind that in case of a server failure, you will lose both the database node and the proxy.

Deploying ProxySQL

Deploying ProxySQL to your cluster is done in a similar way to HAProxy: "Add Load Balancer" in the cluster list under ProxySQL tab.

In the deployment wizard, specify where ProxySQL will be installed, along with the administration user/password and the monitoring user/password used to connect to the MySQL backends. From ClusterControl, you can either create a new user to be used by the application (the user will be created on both MySQL and ProxySQL) or use existing database users (the user will be created on ProxySQL only). Set whether you are using implicit transactions or not. Basically, if you don’t use SET autocommit=0 to create new transactions, ClusterControl will configure read/write split.

After ProxySQL has been deployed, it will be available under the Nodes tab:

Opening the ProxySQL node overview will present you the ProxySQL monitoring and management interface, so there is no reason to log into ProxySQL on the node anymore. ClusterControl covers most of the ProxySQL important stats like memory utilization, query cache, query processor and so on, as well as other metrics like hostgroups, backend servers, query rule hits, top queries and ProxySQL variables. In the ProxySQL management aspect, you can manage the query rules, backend servers, users, configuration and scheduler right from the UI.

Check out our ProxySQL tutorial page, which covers extensively how to perform database load balancing for MySQL and MariaDB with ProxySQL.

Deploying Garbd

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes (50% + 1 node), so in a two-node system there would be no majority, resulting in split brain. Fortunately, it is possible to add garbd (the Galera Arbitrator Daemon), a lightweight stateless daemon that can act as the odd node. The added benefit of adding the Galera Arbitrator is that you can now make do with only two nodes in your cluster.

If ClusterControl detects that your Galera cluster consists of an even number of nodes, you will be given the warning/advice by ClusterControl to extend the cluster to an odd number of nodes:

Choose the host to deploy garbd on wisely, as it will receive all replicated data. Make sure the network can handle the traffic and is secure enough. You could choose one of the HAProxy or ProxySQL hosts to deploy garbd on, as in the example below:

Take note that starting from ClusterControl 1.5.1, garbd cannot be installed on the same host as ClusterControl due to risk of package conflicts.

After installing garbd, you will see it appear next to your two Galera nodes:

Final thoughts

We showed you how to make your MySQL master-slave and Galera cluster setups more robust and retain high availability using HAProxy and ProxySQL. garbd is also a nice daemon that can save you the extra third node in your Galera cluster.

This finalizes the deployment side of ClusterControl. In our next blog, we will show you how to integrate ClusterControl within your organization by using groups and assigning certain roles to users.

Tags:  clustercontrol deployment haproxy installation load balancing MariaDB MaxScale MySQL

MySQL 8.0.4rc

MySQL 8.0.4rc was just released as "Pre-General Availability Draft: 2018-03-19".

I decided to take a quick peek and note my impressions here. Some of this is old news for many, as this release has been talked about for a while, but I added my thoughts anyway.
The first thing I noticed was a simple issue with using the updated mysql client. My older version was still in my path, which resulted in:

ERROR 2059 (HY000): Authentication plugin 'caching_sha2_password' cannot be loaded

The simple fix is to make sure you are using the updated mysql client. Of course other options exist, like changing the authentication plugin back to mysql_native_password, but why bother, use the secure method. This is a very good enhancement for security, so do not be shocked if you have some connection issues while you get your connections onto this more secure method.
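If you really do need the old behavior for a legacy client, a hedged sketch (the account name is made up; verify default_authentication_plugin and the IDENTIFIED WITH syntax against the documentation):

-- per-account fallback for an old client library
CREATE USER 'legacy_app'@'%' IDENTIFIED WITH mysql_native_password BY 'secret';

-- or server-wide in my.cnf:
-- [mysqld]
-- default_authentication_plugin=mysql_native_password

Otherwise, with the updated client the connection works as expected: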

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 36
Server version: 8.0.4-rc-log
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
So the first very cool enhancement... 
mysql> show create table user\G
*************************** 1. row ***************************
       Table: user
Create Table: CREATE TABLE `user` (
  `Host` char(60) COLLATE utf8_bin NOT NULL DEFAULT '',
  `User` char(32) COLLATE utf8_bin NOT NULL DEFAULT '',
  `Select_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Insert_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Update_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Delete_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Create_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Drop_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Reload_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Shutdown_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Process_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `File_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Grant_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `References_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Index_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Alter_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Show_db_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Super_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Create_tmp_table_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Lock_tables_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Execute_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Repl_slave_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Repl_client_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Create_view_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Show_view_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Create_routine_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Alter_routine_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Create_user_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Event_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Trigger_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Create_tablespace_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `ssl_type` enum('','ANY','X509','SPECIFIED') CHARACTER SET utf8 NOT NULL DEFAULT '',
  `ssl_cipher` blob NOT NULL,
  `x509_issuer` blob NOT NULL,
  `x509_subject` blob NOT NULL,
  `max_questions` int(11) unsigned NOT NULL DEFAULT '0',
  `max_updates` int(11) unsigned NOT NULL DEFAULT '0',
  `max_connections` int(11) unsigned NOT NULL DEFAULT '0',
  `max_user_connections` int(11) unsigned NOT NULL DEFAULT '0',
  `plugin` char(64) COLLATE utf8_bin NOT NULL DEFAULT 'caching_sha2_password',
  `authentication_string` text COLLATE utf8_bin,
  `password_expired` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `password_last_changed` timestamp NULL DEFAULT NULL,
  `password_lifetime` smallint(5) unsigned DEFAULT NULL,
  `account_locked` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Create_role_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Drop_role_priv` enum('N','Y') CHARACTER SET utf8 NOT NULL DEFAULT 'N',
  `Password_reuse_history` smallint(5) unsigned DEFAULT NULL,
  `Password_reuse_time` smallint(5) unsigned DEFAULT NULL,
  PRIMARY KEY (`Host`,`User`)
) /*!50100 TABLESPACE `mysql` */ ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 COMMENT='Users and global privileges'
1 row in set (0.00 sec)
Yep, the user table is InnoDB and has its own tablespace.
With the addition of the new Data Dictionary you will now notice INFORMATION_SCHEMA changes. As a simple example, the COLUMNS table historically has not been a view, but that has now changed, along with many others, as you can see via the URL provided.

mysql> show create table COLUMNS \G
*************************** 1. row ***************************
                View: COLUMNS
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`mysql.infoschema`@`localhost`
This appears to have been done to help INFORMATION_SCHEMA performance by removing the temporary table creation per query against the INFORMATION_SCHEMA.
Chapter 14 of the documentation goes into depth on this; the provided URL below will help you find more information, and future blog posts might touch more on this. The previously mentioned Data Dictionary also leads into the ability to have atomic Data Definition Language (DDL) statements, or atomic DDL.

This is likely to trip up a few setups if you do not review your queries before setting up replication to a new MySQL 8.0 instance. I say that because of how the handling of table maintenance could be impacted. If you write clean queries with "IF EXISTS" it won't be a big problem. Overall it is a more transaction-based feature that protects your data and your rollback options.

Resource management looks very interesting, and I will have to take more time to focus on it, as it is a new feature in MySQL 8.0. Overall, you can assign threads to resource groups and no longer have to set the priority of a query; instead, your grouping defines how a query should behave and the resources allotted to it.
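As a rough sketch of what this looks like (the group name, CPU range and table name are made up; check the CREATE RESOURCE GROUP documentation for the exact limits):

-- create a user-level group pinned to two vCPUs with a lower thread priority
CREATE RESOURCE GROUP batch_jobs TYPE = USER VCPU = 2-3 THREAD_PRIORITY = 10;

-- bind the current session's thread to it, or hint a single statement
SET RESOURCE GROUP batch_jobs;
SELECT /*+ RESOURCE_GROUP(batch_jobs) */ COUNT(*) FROM some_big_table;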
mysql> select @@version;
+------------+
| @@version  |
+------------+
| 5.7.16-log |
+------------+
1 row in set (0.00 sec)
mysql> desc INFORMATION_SCHEMA.RESOURCE_GROUPS;
ERROR 1109 (42S02): Unknown table 'RESOURCE_GROUPS' in information_schema
mysql> select @@version;
+--------------+
| @@version    |
+--------------+
| 8.0.4-rc-log |
+--------------+
1 row in set (0.00 sec)
mysql> desc INFORMATION_SCHEMA.RESOURCE_GROUPS;
+------------------------+-----------------------+------+-----+---------+-------+
| Field                  | Type                  | Null | Key | Default | Extra |
+------------------------+-----------------------+------+-----+---------+-------+
| RESOURCE_GROUP_NAME    | varchar(64)           | NO   |     | NULL    |       |
| RESOURCE_GROUP_TYPE    | enum('SYSTEM','USER') | NO   |     | NULL    |       |
| RESOURCE_GROUP_ENABLED | tinyint(1)            | NO   |     | NULL    |       |
| VCPU_IDS               | blob                  | YES  |     | NULL    |       |
| THREAD_PRIORITY        | int(11)               | NO   |     | NULL    |       |
+------------------------+-----------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

More insight is now available into your InnoDB buffer pool with regard to the indexes that are cached in it.
mysql> desc INFORMATION_SCHEMA.INNODB_CACHED_INDEXES;
+----------------+---------------------+------+-----+---------+-------+
| Field          | Type                | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| SPACE_ID       | int(11) unsigned    | NO   |     |         |       |
| INDEX_ID       | bigint(21) unsigned | NO   |     |         |       |
| N_CACHED_PAGES | bigint(21) unsigned | NO   |     |         |       |
+----------------+---------------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

If you are unsure what to set the InnoDB buffer pool size, log file sizes or flush method to, MySQL will now set these for you based on the available memory.
innodb_dedicated_server
[mysqld]
innodb-dedicated-server=1

mysql> select @@innodb_dedicated_server;
+---------------------------+
| @@innodb_dedicated_server |
+---------------------------+
|                         1 |
+---------------------------+
This simple test set my innodb_buffer_pool_size to 6GB, for example, when the default is 128MB.
Numerous JSON additions have been made, as well as regular expression changes. Both look promising.
The only replication enhancement in this release itself is that it now supports binary logging of partial updates to JSON documents using a compact binary format.
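If I read the release notes correctly, this is controlled by the binlog_row_value_options variable; a hedged sketch (verify the variable and its scope against the documentation for your exact version):

-- enable compact partial-JSON updates in the row-based binary log
SET GLOBAL binlog_row_value_options = 'PARTIAL_JSON';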
However, overall many features are available (you can read all about them here), one of which (I wish my client had it tomorrow) is replication filters per channel. My test instance already had binary logs enabled, but they are on by default now, along with TABLE-based (versus file-based) master and slave info (big fan of having that transaction-based by default).
Keep in mind this is just a first glance at this release and very high-level thoughts on it; many other changes exist. Looking over other blog posts about this release, as well as the manual and release notes, will also help. Certainly download and review it, as it looks to be very promising from the administration, security and replication points of view.

MySQL Yum repo setups for commercial and community use cases

MySQL Package Management Options In this blog we will explore some interesting ways to install MySQL Community and Enterprise Edition binaries using your associated Linux package manager.  In this case we’ll look mostly at the Yum package manager on Oracle Linux.  The benefit of these package managers is that you can install software packages easily,… Read More »

ProxySQL new routing algorithm to handle hundreds of thousands of schemas/shards

ProxySQL is a service designed to scale and handle traffic in very large setups.
In the past we have already shown examples of how ProxySQL can handle thousands of MySQL servers, and millions (yes, millions!) of distinct users.
Recently, however, a new challenge arose: can ProxySQL perform routing based on MySQL schema name for a very large number of schemas?
We know that in production setups ProxySQL is already performing routing to MySQL servers based on schema name for a few hundred schemas, but can it handle thousands or more?

Technically there is no limit on how many shards ProxySQL can handle, as long as there are rules for them.
Tibor Korocz already pointed out in a blog post that more query rules can affect performance.
In the case of query routing based on schema name no regular expressions are involved (therefore each rule is not expensive), but the same principle applies: the more rules that need to be evaluated, the longer it takes to make a decision (in this case, to perform the routing).

The scope of this blog post is to understand the performance implications of performing routing based on the number of rules.
To simplify the setup for running some benchmarks, we used only one MySQL server as a backend, and we created 50000 schemas:

for i in `seq 10001 60000` ; do
  echo "CREATE DATABASE IF NOT EXISTS shard_$i;"
done | mysql -u root

To run the benchmark we used various numbers of rules.
For every test, we configured ProxySQL with the right number of rules: for 100 schemas we created 100 rules, for 1000 schemas 1000 rules, etc.
For example, for 200 rules we used this configuration:

( echo "DELETE FROM mysql_query_rules; INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);" for i in `seq 10001 10200` ; do echo "INSERT INTO mysql_query_rules (active,username,schemaname,destination_hostgroup,apply) VALUES (1,\"sbtest\",\"shard_$i\",1,1);" done echo "LOAD MYSQL QUERY RULES TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032

We then ran queries as the following:

for j in `seq 1 50000` ; do
  echo "USE shard_$(($RANDOM%200+10001))"
  echo "SELECT 1;"
done | mysql -u sbtest -psbtest -h 127.0.0.1 -P6033 --ssl-mode=disabled -NB > /dev/null

It is relevant to note that we aren’t interested in the total execution time (this is why we used this simple one-liner to generate traffic), but in the time spent inside the Query Processor.
Because the execution time of the queries is not relevant, we also configured the query cache.
To compute the time spent in the Query Processor we set variable mysql-stats_time_query_processor='true' and computed the value of status variable Query_Processor_time_nsec. This status variable measures the time spent inside the Query Processor, in nanoseconds.
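For reference, a sketch of how this looks from the ProxySQL admin interface (statement forms as I understand them; double-check against the ProxySQL documentation for your version):

-- on the admin interface (port 6032)
UPDATE global_variables SET variable_value='true'
WHERE variable_name='mysql-stats_time_query_processor';
LOAD MYSQL VARIABLES TO RUNTIME;

SELECT * FROM stats_mysql_global
WHERE variable_name='Query_Processor_time_nsec';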

As we expected, the more query rules ProxySQL needs to evaluate, the longer it takes to compute the destination hostgroup for that schema name. This results in latency before executing the query:

The graph above shows that for 10 rules the average latency is around 1us (microsecond), for 100 rules the average latency is 2us, and that it gradually grows to 6us for 500 rules and 256us for 20000 rules.

An average latency of 6us for 500 rules doesn’t seem to be an issue, but an average latency of 256us (0.256ms) for 20000 rules does seem like an issue.
What is also important to note is that these values are the averages across all the shards. Routing decisions for some shards are faster than for others, depending on the order in which the rules are written.

In fact, with 20000 rules the average latency for shard_10001 is 0.89us, while the average latency for shard_30000 is 581.1us!

Surely, this doesn’t seem like a scalable solution.

MySQL Query Rules Fast Routing

ProxySQL 1.4.7 introduces a new routing capability that enhances what is already configurable in mysql_query_rules.
The new capability is configured using the table mysql_query_rules_fast_routing. This table is simpler than mysql_query_rules, and its purpose is simple: given username, schemaname and flagIN, it finds the destination_hostgroup.

For reference, this is the table definition of mysql_query_rules_fast_routing:

CREATE TABLE mysql_query_rules_fast_routing (
    username VARCHAR NOT NULL,
    schemaname VARCHAR NOT NULL,
    flagIN INT NOT NULL DEFAULT 0,
    destination_hostgroup INT CHECK (destination_hostgroup >= 0) NOT NULL,
    comment VARCHAR NOT NULL,
    PRIMARY KEY (username, schemaname, flagIN)
)

Table mysql_query_rules_fast_routing should be considered an extension of mysql_query_rules. After processing the rules in mysql_query_rules, the rules in mysql_query_rules_fast_routing will be evaluated, unless the last matching rule defined in mysql_query_rules sets apply=1.
Because of the nature of the rules in mysql_query_rules_fast_routing, one and only one rule will be evaluated.
The rules defined in mysql_query_rules_fast_routing are loaded into a hash table, where the key is username, schemaname and flagIN, and the value is the destination_hostgroup.

New routing algorithm in action

This is the command we used to configure mysql_query_rules_fast_routing with different numbers of rules, up to 50000 rules in this example:

( echo "DELETE FROM mysql_query_rules; DELETE FROM mysql_query_rules_fast_routing;" echo "INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);" for i in `seq 10001 60000` ; do echo "INSERT INTO mysql_query_rules_fast_routing (username,schemaname,flagIN,destination_hostgroup,comment) VALUES (\"sbtest\",\"shard_$i\",0,1,\"\");" done echo "LOAD MYSQL QUERY RULES TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032

The results are really interesting:

In other words:

  • between 10 rules and 10k rules, the average latency is between 1.7us and 1.9us
  • at 20k rules the average latency is 2.2us
  • at 50k rules the average latency is 2.4us

What is really interesting is that this new routing algorithm allows ProxySQL to perform schema routing for up to 10000 shards while introducing less latency than the old routing algorithm was introducing for 100 shards, and it scales very well to 50k rules with very small overhead compared to 100 shards in the old routing algorithm.

Disabling mysql-stats_time_query_processor by default

In ProxySQL 1.4.4 the new variable mysql-stats_time_query_processor was introduced to dynamically enable or disable measuring the time spent in the Query Processor. Although timing is very important to understand the impact of having too many rules, measuring elapsed time with good precision has some performance impact.
In fact, on the same server where we ran these benchmarks, measuring elapsed time with very good precision has a cost (latency) of 0.3us, mostly spent in kernel space (system calls).
That also means that from the metrics above you should subtract 0.3us to determine the latency when time measurement is not enabled. For example, "at 50k rules the average latency is 2.4us" becomes "at 50k rules the average latency is 2.1us".
For this reason, since ProxySQL 1.4.4, mysql-stats_time_query_processor is disabled by default.

Why is flagIN relevant?

The new routing algorithm performs routing based on username, schemaname and flagIN.
I think some are asking why flagIN is relevant.
The flagIN column in mysql_query_rules_fast_routing allows you to feed in a flagOUT that was set in mysql_query_rules based on other criteria relevant for routing.
For example, assume that you don’t want to perform routing based only on username+schemaname, but also want read/write split.
You can use mysql_query_rules to:

  • set flagOUT = 1 for all queries to be sent to the server that is the master, no matter in which cluster
  • set flagOUT = 2 for all queries to be sent to the server that is the slave, no matter in which cluster

The value of flagOUT resulting from mysql_query_rules will become the flagIN in mysql_query_rules_fast_routing. This allows you to combine read/write split with routing based on username+schemaname.
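To make that concrete, a rough sketch of what the first-stage rules could look like (the match_digest patterns and flag values are assumptions for illustration; statements that match no rule keep the default flagIN of 0):

-- stage 1: classify queries as writes (flagOUT=1) or reads (flagOUT=2) without applying yet
INSERT INTO mysql_query_rules (rule_id,active,username,match_digest,flagOUT,apply)
VALUES (1,1,'sbtest','^SELECT.*FOR UPDATE',1,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_digest,flagOUT,apply)
VALUES (2,1,'sbtest','^SELECT',2,0);
-- stage 2: rows in mysql_query_rules_fast_routing with flagIN=1 point to the master
-- hostgroup of the relevant cluster, rows with flagIN=2 to the slaves hostgroup
LOAD MYSQL QUERY RULES TO RUNTIME;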
For benchmarking purposes, let's create 200,000 rules:

( echo "DELETE FROM mysql_query_rules; DELETE FROM mysql_query_rules_fast_routing; INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);" for i in `seq 10001 60000` ; do echo "INSERT INTO mysql_query_rules_fast_routing (username,schemaname,flagIN,destination_hostgroup,comment) VALUES (\"sbtest\",\"shard_$i\",0,1,\"\");" done echo "INSERT INTO mysql_query_rules_fast_routing SELECT username, schemaname, FlagIN+1, destination_hostgroup, comment FROM mysql_query_rules_fast_routing;" echo "INSERT INTO mysql_query_rules_fast_routing SELECT username, schemaname, FlagIN+2, destination_hostgroup, comment FROM mysql_query_rules_fast_routing;" echo "LOAD MYSQL QUERY RULES TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032 mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT COUNT(*) FROM mysql_query_rules_fast_routing" +----------+ | COUNT(*) | +----------+ | 200000 | +----------+

Updating the graph after testing routing based on 200k rules:

As you can see from the graph above, ProxySQL is now able to perform routing decisions based on 200k rules with almost no extra impact compared to 100 rules in the legacy algorithm.

Drawback of the new algorithm

This new routing algorithm seems to have a minimum overhead of 1.7us (or 1.4us with mysql-stats_time_query_processor disabled).
The minimum overhead is not present in the old/legacy algorithm, therefore it makes sense to use the new routing algorithm only if you have more than 100 routing rules.
Furthermore, the ProxySQL Cluster capability doesn't support mysql_query_rules_fast_routing yet.

Memory usage

To reduce contention between worker threads in ProxySQL, threads maintain their own copy of the query rules. This means that the more rules you have and the more threads you have configured, the more memory is needed. In this example, loading 200k rules and using 4 threads led to a memory usage of 280MB.
In the future we plan to make it configurable whether, for query processing, ProxySQL should use more memory and be lock-free, use less memory and not be lock-free, or a mix of the two options:

  • mysql_query_rules to be lock free
  • mysql_query_rules_fast_routing to be shared between threads

These are implementation details we will be looking into when planning to optimize ProxySQL even further.

Conclusion

ProxySQL 1.4.7 introduces a new routing algorithm that doesn’t replace the legacy algorithm, but enhances it.
Thanks to the new routing algorithm, ProxySQL can now easily handle routing based on hundreds of thousands of schemas/shards with almost no impact (a few microseconds).

Percona XtraDB Cluster, MySQL Asynchronous Replication and log-slave-updates

Recently, I’ve been looking into issues with the interactions between MySQL asynchronous replication and Galera replication. In this blog post, I’d like to share what I’ve learned.

MySQL asynchronous replication and Galera replication interactions are complicated due to the number of factors involved (Galera replication vs. asynchronous replication, replication filters, and row-based vs. statement-based replication). So as a start, I’ll look at an issue that came up with setting up an asynchronous replication channel between two Percona XtraDB Cluster (PXC) clusters.

Here’s a view of the desired topology:

The Problem

We want to set up an asynchronous replication channel between two PXC clusters. We also set log-slave-updates on the async slave (PXC node 2a in the topology diagram).

This is an interesting configuration and results in unexpected behavior as the replication depends on the node where the operation was performed. Let’s use CREATE TABLE as an example.

  • Run CREATE TABLE on Node 1a.  The table replicates to Node 1b, but not to the nodes in cluster 2.
  • Run CREATE TABLE on Node 1b. The table replicates to all nodes (both cluster 1 and cluster 2).
Some background information

Understanding the problem requires some knowledge of MySQL threads. However, as a simplification, I’ll group the threads into three groups:

  • Main MySQL Client Thread: This thread handles the requests for the client connection (here the client is an external entity).
  • Async Replication Threads: There are multiple threads in use here, some handle I/O, and some apply the updates, but we will view them as a single entity for simplicity.
  • Galera Threads: There are also multiple threads here, similar to the Async Replication Threads. (The name Galera here refers to the underlying replication technology used by PXC.)

For more information on MySQL threads, see
https://dev.mysql.com/doc/refman/5.7/en/mysql-threads.html

Why is the data not replicating?

In the first case (CREATE TABLE executed on Node1a)

  • The table is replicated from Node1a -> Node 1b via Galera replication.
  • The table is not replicated to cluster 2, because the change reaches Node 1b via the Galera threads and the async replication threads do not pick it up.

In the second case (CREATE TABLE executed on Node 1b)

  • The table is replicated from Node1b -> Node 1a via Galera replication.
  • The table is replicated from Node1b -> Node 2a via async replication. This differs from the first case because the statement is executed on the Main MySQL client thread.  The async replication threads pick up the changes and send them to Node 2a.
  • The table is replicated from Node 2a -> Node 2b via Galera replication because log-slave-updates has been enabled on Node2a.

That last part is the important bit. We can view the Galera replication threads as another set of asynchronous replication threads. So if data is coming in via async replication, it has to be made visible to Galera by log-slave-updates.  This is true in the other direction also: log-slave-updates must be enabled for Galera to supply data to async replication.

This is very similar to chained replication
https://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html

The Solution

In this scenario, the answer is to set log-slave-updates on Node 1b (the async master) and on Node 2a (the async slave).

We set log-slave-updates on node 1b to allow the async threads to pick up the changes from the Galera threads.

We set log-slave-updates on node 2a to allow the Galera threads to pick up the changes from the async threads. Starting with PXC 5.7.17, calling START SLAVE on a PXC node will return an error unless log-slave-updates is enabled.

You must enable log-slave-updates on the node for data to be transferred between Galera and asynchronous replication.
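
A quick way to sanity-check this on each node is to look at the relevant variables; note that log_slave_updates is not dynamic, so it has to be set in my.cnf (together with the binary log) and the server restarted. A minimal sketch:

-- Run on every PXC node, async master and async slave included
SHOW GLOBAL VARIABLES WHERE Variable_name IN ('log_bin', 'log_slave_updates');
-- Both should report ON; if not, add to my.cnf under [mysqld]:
--   log-bin           = binlog
--   log-slave-updates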

Recommendations/Best Practices

If you plan to use MySQL asynchronous replication with Percona XtraDB Cluster (either as async master or slave), we recommend that you enable log-slave-updates on all nodes within the cluster. This is to (1) ensure that any async replication connections to/from the cluster work correctly and (2) ensure that all the nodes within a cluster share the same configuration and behavior.

Recommended configuration diagram for the clusters:

The post Percona XtraDB Cluster, MySQL Asynchronous Replication and log-slave-updates appeared first on Percona Database Performance Blog.

Getting Started with ProxySQL - MySQL & MariaDB Load Balancing Tutorial

We’re excited to announce a major update to our tutorial “Database Load Balancing for MySQL and MariaDB with ProxySQL”.

ProxySQL is a lightweight yet complex protocol-aware proxy that sits between the MySQL clients and servers. It acts as a gateway, separating clients from the databases, and is therefore the entry point used to access all the database servers.
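
To give a feel for what such routing looks like, here is a minimal read/write split sketch against the ProxySQL admin interface (hostgroups 1 and 2 are assumptions for this example, not values from the tutorial):

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 1, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 2, 1);  -- plain reads go to hostgroup 2; anything unmatched stays on the user's default (writer) hostgroup
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;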

In this new update we’ve…

  • Updated the information about how to best deploy ProxySQL via ClusterControl
  • Revamped the section “Getting Started with ProxySQL”
  • Added a new section on Data Masking
  • Added new frequently asked questions (FAQs)

Load balancing and high availability go hand-in-hand. ClusterControl makes it easy to deploy and configure several different load balancing technologies for MySQL and MariaDB with a point-and-click graphical interface, allowing you to easily try them out and see which ones work best for your unique needs.

ClusterControl for ProxySQL

Included in ClusterControl Advanced and Enterprise, ProxySQL enables MySQL, MariaDB and Percona XtraDB database systems to easily manage intense, high-traffic database applications without losing availability. ClusterControl offers advanced, point-and-click configuration management features for the load balancing technologies we support. We know the issues regularly faced and make it easy to customize and configure the load balancer for your unique application needs.

We know load balancing and support many different technologies. ClusterControl has many things preconfigured to get you started with a couple of clicks. If you run into challenges, we also provide resources and on-the-spot support to help ensure your configurations are running at peak performance.


ClusterControl delivers on an array of features to help deploy and manage ProxySQL

  • Advanced Graphical Interface - ClusterControl provides the only GUI on the market for the easy deployment, configuration and management of ProxySQL.
  • Point and Click deployment - With ClusterControl you’re able to apply point and click deployments to MySQL, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Galera Cluster, and Percona XtraDB technologies, as well as the top related load balancers: HAProxy, MaxScale and ProxySQL.
  • Suite of monitoring graphs - With comprehensive reports you have a clear view of data points like connections, queries, data transfer and utilization, and more.
  • Configuration Management - Easily configure and manage your ProxySQL deployments with a simple UI. With ClusterControl you can create servers, re-orientate your setup, create users, set rules, manage query routing, and enable variable configurations.

Make sure to check out the updated tutorial today!

Tags:  proxysql MySQL MariaDB database load balancing high availability

MySQL @ FOSSASIA 2018!

MySQL is again a part of FOSSASIA - Free Open Source Summit Asia 2018, which will be held on March 22-25, 2018 in Singapore. Do not miss our booth in the exhibition area, and do not miss the MySQL-related talks in the Database track... See some of them below and in the FOSSASIA schedule:

Saturday, March 24, 2018:

  • 10:00-11:00 MySQL Community Gathering - come and join the MySQL meetup and other community members! Come to the Lounge area of the Summit to meet Ricky Setyawan.
  • 11:30-11:55 Database Lightning Talk by Ricky Setyawan, the Principal Sales Consultant
  • 17:00-17:25 Breaking through with MySQL 8.0, by Ricky Setyawan, the Principal Sales Consultant

Sunday, March 25, 2018:

  • 10:00-10:25 Atomic DDL in MySQL 8.0 by Shipra Jain, the Principal Software Engineer
  • 10:30-10:55 MySQL for Distributed transaction and Usage of JSON as a fusion between SQL & NOSQL by Ajo Robert, the Principal Member Technical Staff 
  • 11:00-11:25 Histograms and the way we use it in MySQL 8.0 by Amit Bhattacharya, the Senior SW Development Manager 
  • 13:00-13:25 What's new in MySQL Optimizer 8.0 by Chaithra M G the Principal MySQL SW Developer
  • 13:30-13:55 MySQL : Improving Connection Security by Harin Nalin Vadodara the Principal MTS
  • 14:00-14:25 New Replication Features in MySQL 8.0 by Venkatesh Duggirala the Senior Principal Member of Technical Staff
  • 14:30-14:55 Improved Error Logging in MySQL 8.0 by Praveenkumar Hulakund the SW Developer
  • 15:00-15:25 The State of the Art on MySQL Group Replication by Hemant Dangi the Senior Technical Staff
  • 15:30-15:55 Enhanced XA Support for Replication in MySQL-5.7 by Nisha Gopalakrishnan the Principal Member of Technical Staff
  • 16:00-16:25 MySQL Performance Schema - A great insight of running MySQL Server by Manyank Prasad the Principal Member of Technical Staff

We are looking forward to talking to you at our booth in the expo area or in Training Room 2-1, where the Database track is held.

NDB Cluster and disk columns

NDB is mainly an in-memory database. We do, however, also have the possibility to
store non-indexed columns on disk. This data uses a page cache, like any
other normal disk-based DBMS.

Interestingly, with the increase in memory sizes one could think that
disk data becomes less important for MySQL Cluster. The answer is actually
the opposite.

The reason is again the HW development. NDB is designed with predictable
latency as a very basic requirement. In the past disks meant hard drives. Access
time to a hard disk was several milliseconds at best. Given that our requirement
was to handle complex transactions within 10 milliseconds disk data storage
was out of the question.

Modern HW is completely different: it uses SSD devices, first attached through
the SATA interface, which enabled up to around 500 MByte per second and
a few thousand IO operations per second (IOPS). The second step was the
introduction of SSD devices on the PCI bus. This lifted performance to more
than 1 GByte per second. These devices are extremely small and still very powerful.
I have an Intel NUC at home that has two of those devices.

Thus the performance difference between disk storage and RAM has decreased.

The next step on the way was to change the storage protocol and introduce NVMe
devices. These still use the same HW, but use a new standard that is designed for
the new type of storage devices. Given those devices we have now the ability to
execute millions of IOPS on a standard server box with access times of a few tens
of microseconds.

For NDB this means that this HW fits very well into the NDB architecture. As part
of developing the Partial LCP algorithm we also did a lot of work on improving
our disk data implementation. We see more and more people using disk data
columns in NDB.

The next step is even more interesting: it will bring storage onto the memory bus, with
access times of around one microsecond. For NDB this disk storage can be treated as
memory to start with, thus making it possible to soon have multiple TBytes of memory
in standard boxes.

Thus HW development is making the NDB engine more and more interesting to use.

One notable example that uses disk data columns in NDB is HopsFS. They use the
disk data columns to store small files in the meta data server of the HopsFS
implementation of the Hadoop HDFS Name Server. This means much faster
access to small files. The tests they did showed that they could handle hundreds
of thousands of file reads and writes per second even using fairly standard SSD disks
on the servers.

The implementation of disk data in NDB is done such that each row can have three
parts. The fixed memory part is accessed quickly using a row id. The variable
sized part is accessed through a pointer from the fixed size part.

The disk columns are also accessed through a reference in the fixed size part. This
reference is an 8-byte value that refers to the page id and page index of the disk
columns.
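
For reference, a minimal sketch of how disk columns are declared (the object names and sizes are arbitrary examples):

CREATE LOGFILE GROUP lg_1
  ADD UNDOFILE 'undo_1.log'
  INITIAL_SIZE 128M
  ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
  ADD DATAFILE 'data_1.dat'
  USE LOGFILE GROUP lg_1
  INITIAL_SIZE 1G
  ENGINE NDBCLUSTER;

CREATE TABLE t_disk (
  id BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- indexed column, always stored in memory
  payload VARBINARY(4096)                   -- non-indexed column, stored on disk via the tablespace
) ENGINE NDBCLUSTER TABLESPACE ts_1 STORAGE DISK;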

Before we can access those pages we go through a page cache. The page cache was
implemented using caching techniques that were state of the art a few years ago.

The idea is quite simple. The page cache uses a normal hot page queue. Pages are
brought up in this queue when they are accessed. A single access will bring a page up,
but to stay in the page cache more permanently a page has to be accessed several times.

Now each page is represented in those queues by a page state record. The basis
of the page cache algorithm is that a page can be represented in a page state
record even if the page is not in the page cache.

NDB has a configuration variable called DiskPageBufferEntries; by default this is
set to 10. It is the multiplication factor for how many more pages we keep
page state records for, compared to the number of pages we have in the page cache.

So for example if we have set DiskPageBufferMemory to 10 GBytes and
DiskPageBufferEntries to 10, we will have page state records that cover pages of
100 GBytes in the queues. Thus even when a page is paged out we keep it in the
list, and thus we can see patterns of reuse that are longer than the page cache
we have access to. The factor of 10 means that the page state records take up
about 3% of the size of the page cache itself. Thus the benefit of the extra
knowledge about page usage patterns comes at a fairly low cost. The factor of
10 is configurable.
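
If you want to observe how well the page cache serves your workload, the ndbinfo schema exposes page cache counters; a minimal sketch (the column names are from recent versions and may differ slightly in yours):

SELECT node_id,
       page_requests_direct_return AS served_from_cache,
       page_requests_wait_io       AS waited_for_disk,
       page_requests_direct_return /
         (page_requests_direct_return + page_requests_wait_io) AS hit_ratio
FROM ndbinfo.diskpagebuffer;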

Many cloud servers come equipped with hundreds of GBytes of memory (some even TBytes)
and can also store a number of TBytes on NVMe devices. NDB is well suited
for those modern machines, and MySQL Cluster 7.6 has been designed to be
suitable for this new generation of HW.

Discovering rows that have been updated since last checkpoint

One important problem that requires a solution is to decide whether
a row has been updated since the last checkpoint or not.

Most implementations use some kind of mechanism that requires extra
memory resources and/or CPU resources to handle this.

NDB uses the fact that each row is already stamped with a timestamp.
The timestamp is what we call a global checkpoint id. A new global
checkpoint is created about once every 2 seconds (can be faster or
slower by configuration).

Thus we will overestimate the number of rows written since the last checkpoint
by a little bit, but with checkpoints taking a few minutes the extra overhead
of this is only around 1% (for example, with a 3-minute checkpoint and a new
global checkpoint every 2 seconds, the uncertainty window is roughly 2/180,
i.e. about 1%, of the checkpoint period).

Thus when we scan rows we check the global checkpoint id of the row; if
it is bigger than the global checkpoint that the last checkpoint had fully
covered, we write the row as changed since the last checkpoint. We
also have the same information at the page level, so we can check
the page header and very quickly scan past an entire page if it hasn't been
updated since the last checkpoint.

The same type of scanning is also used to bring a restarting node back in
sync with the live node. This algorithm has been present in NDB since
MySQL 5.1.

Partial LCPs and Read-only tables

In MySQL Cluster 7.5 we use Complete Checkpoints. In MySQL Cluster 7.6
we implement an approach where we only checkpoint a part of the database
in each checkpoint.

A special case is a checkpoint of a table partition where no changes
at all have happened since the last checkpoint. In this case we implemented
a special optimisation such that it is not necessary to checkpoint anything
at all for this table partition. It is only necessary to write a new LCP
control file, which is 4 kBytes in size for each table partition (it can grow to
8 kBytes if the recovery will require more than 980 checkpoints to
recover).

This means that if your database contains a large set of read-only tables,
there will be no need to checkpoint those tables at all. This feature
is used also when setting EnablePartialLcp to false.

Partial LCPs and disk space

One of the main objectives of the new Partial LCP algorithm in MySQL
Cluster 7.6 is to keep up with the development of modern HW.

I have already described in previous blogs how Partial LCP can handle
nicely even database sizes of 10 TBytes of memory with a very modest
load on the disk devices.

Now modern HW has shifted from using hard drives to using SSDs.

The original approach in NDB assumes that the checkpoints and
REDO logs are stored on hard drives. In MySQL Cluster 7.5 the
disk space required for the REDO log is a bit larger than the
DataMemory size. The reason is that we want to survive even when
loading massive amounts of data.

In MySQL Cluster 7.5 we cannot remove any checkpoint files until
a checkpoint is fully completed. This means that we require around
4x the memory size of disk space for REDO logs and checkpoints.

With hard drives this is not a problem at all. As an example my
development box has 32 GBytes of memory with 2 TByte of disk
space. Thus 64x more disk space compared to the memory space.

With modern servers this size difference between memory and
disks is decreasing. For example many cloud VMs only have
a bit more than 2x the disk size compared to the memory size.

So one goal of MySQL Cluster 7.6 is to fit in much less disk
space.

The aim is to solve this with a three-pronged approach.

1) Partial LCP means that we can execute the checkpoints much
faster. Since REDO logs only need to be kept for around two
checkpoints, this means a significant decrease in the size requirements
for REDO logs. The aim is to only need around 10% of the memory
size in disk space for the REDO logs. This work is not completed
in 7.6.4. As usual there are no guarantees when this work will be
completed.

2) Using Partial LCP we can throw away old LCP files as soon
as we have created a new recoverable LCP for the table partition.
Thus it is no longer necessary to store 2 LCPs on disk. At the
same time there is some overhead related to Partial LCPs. With the
default settings this overhead is 50% plus a bit more. Thus we should
always fit within about 1.6x the memory size.

It is possible to set EnablePartialLcp to false, in this case all
checkpoints will be Complete Checkpoints. This means more
writes to disk for checkpoints, but it will decrease the storage
space to around the same as the memory size.

3) Using CompressedLCP set to 1 we can decrease LCP storage
by another factor of 2-3x (usually around 2.7x). This feature has
existed for a long time in NDB.
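
To make the effect concrete, here is a back-of-the-envelope calculation for a hypothetical 1024 GByte DataMemory, using only the ratios quoted above (illustrative numbers, not measurements):

SELECT 1024 * 4.0                      AS gb_checkpoint_75,      -- 7.5: ~4x DataMemory for LCPs + REDO
       1024 * 1.6 + 1024 * 0.10        AS gb_partial_lcp_76,     -- 7.6: ~1.6x LCP + ~10% REDO (target)
       1024 * 1.6 / 2.7 + 1024 * 0.10  AS gb_with_compressedlcp; -- CompressedLCP=1 shrinks the LCP part ~2.7x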

Thus it should be possible to significantly decrease the requirements
on storage space when running NDB using MySQL Cluster 7.6.

NDB Checkpoints and research on In-Memory Databases

I just read an article called Low-Overhead Asynchronous Checkpointing in
Main-Memory Database Systems. It was mentioned in a course in Database
Systems at Carnegie-Mellon University, see here.

In MySQL Cluster 7.6.4 we released a new variant of our checkpointing designed
for modern HW with TBytes of main memory. I think studying this implementation
will be very worthwhile both for users of NDB, but also for researchers in DBMS
implementations. It implements a new class of checkpoint algorithms that is currently
a research topic in the database research community.

It was interesting to compare our approach that I called Partial LCP with approaches
taken by other commercial in-memory databases and with the approach presented
in the paper.

LCP is Local CheckPoint which is the name we use for our checkpoint protocol
in NDB.

The course presents a number of ideal properties of a checkpoint implementation.

The first property is that it doesn't slow down regular transaction processing.

In the case of NDB we execute checkpoints at a steady pace which consumes
around 5-10% of the available CPU resources. This will decrease even more with
the implementation in 7.6.

The second is that it doesn't introduce any latency spikes.

NDB checkpointing, both new and old, executes in steps of at most 10-20
microseconds. So there will be an extremely small impact on the latency of
transactions due to checkpointing.

The third property is that it doesn't require excessive memory overhead.

NDB checkpointing consumes a configurable buffer in each database thread. The
ideal size of this is around 1 MByte. In addition we have a REDO log buffer that
is usually a bit bigger than that. That is all there is to it. There is no extra memory
space needed for checkpointing rows. The checkpointing performs a normal scan
of the rows and copies the memory content to the buffer and as soon as the buffer
is full it writes it to disk using sequential disk writes.

It is fair to say that NDB does a good job in handling those ideal properties.

The course presents two variants called fuzzy checkpoints and consistent checkpoints.
The course defines fuzzy checkpoints as a checkpoint that can write uncommitted
data. I would normally use the term fuzzy checkpoint to mean that the checkpoint
is not consistent at a database level, but can still be consistent on a row basis.

Actually NDB is a mix of the definition provided in the course material. It is a
consistent checkpoint for each row. But different rows can be consistent at very
different points in time. So on a row basis NDB is consistent, but at the database
level the checkpoint is fuzzy. Thus to perform recovery one needs to install the
checkpoint and then apply the REDO log to get a consistent checkpoint restored.

Next the course presents two variants called Complete Checkpoints and Delta
Checkpoints. Complete Checkpoint means that the entire database is written in
each checkpoint. Delta Checkpoint means that only changes are written in a
checkpoint.

This is where MySQL Cluster 7.6 differs from 7.5. 7.5 uses a Complete Checkpoint
scheme. 7.6 uses a Partial Checkpoint scheme.

In my view the NDB variant is a third variant, which is neither a Complete nor a
Delta Checkpoint. Partial means that it writes the Delta, that is, it writes all changes
since the last checkpoint. But it also writes a Complete Checkpoint for a part
of the database, thus the name Partial Checkpoint. Thus it is similar to an
incremental backup scheme.

NDB can divide the database into up to 2048 parts, and each checkpoint can write
0 parts (only if no changes occurred in the table partition since the last checkpoint).
It can write 1 part if the number of writes is very small, it can write all 2048 parts
if almost all rows have been updated, and it can write anywhere between 1 and
2048 based on how many rows were updated since the last checkpoint.

Almost all commercial In-Memory DBMSs still use a complete checkpoint scheme.
As we move towards TBytes of memory this is no longer a plausible approach.

The NDB approach means that we can perform a checkpoint in a few minutes
even in a system with 16 TBytes of memory where we need to write about
8 GBytes plus the changes since the last checkpoint.

Thus NDB takes the step into a new world of massively large In-Memory DBMSs
with the introduction of MySQL Cluster 7.6 and its new Partial LCP implementation.

My new book "MySQL Cluster 7.5 inside and out" describes the LCP
implementation in 7.5; a description of the Partial LCP can be found in my blogs
and also in some very detailed descriptions in the source code itself, among other
things a 10-page proof that the algorithm actually works :)

The nice thing with the Partial LCP approach in NDB is that it requires no
more work after writing the checkpoint. There is no need to merge checkpoints;
this happens automatically at recovery. There is some amount of overhead in
that some rows can appear in multiple checkpoints, and thus there is
some amount of overhead at recovery. We calculate the number of parts to use
based on the amount of changes. We even implemented an LCP simulator that
calculates the overhead while inserting and deleting large amounts of rows;
it has been used to find the proper configuration parameters for the algorithm.


Shinguz: MySQL Environment MyEnv 2.0.0 has been released

FromDual has the pleasure to announce the release of the new version 2.0.0 of its popular MySQL, Galera Cluster and MariaDB multi-instance environment MyEnv.

The new MyEnv can be downloaded here.

In the inconceivable case that you find a bug in the MyEnv please report it to the FromDual bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 1.1.x to 2.0.0

# cd ${HOME}/product
# tar xf /download/myenv-2.0.0.tar.gz
# rm -f myenv
# ln -s myenv-2.0.0 myenv
Plug-ins

If you are using plug-ins for showMyEnvStatus create all the links in the new directory structure:

cd ${HOME}/product/myenv
ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/
Upgrade of the instance directory structure

From MyEnv v1 to v2 the directory structure of instances has fundamentally changed. Nevertheless MyEnv v2 works fine with MyEnv v1 directory structures.

Old structure

~/data/instance1/ibdata1
~/data/instance1/ib_logfile?
~/data/instance1/my.cnf
~/data/instance1/error.log
~/data/instance1/mysql
~/data/instance1/test
~/data/instance1/general.log
~/data/instance1/slow.log
~/data/instance1/binlog.0000??
~/data/instance2/...

New structure

~/database/instance1/binlog/binlog.0000??
~/database/instance1/data/ibdata1
~/database/instance1/data/ib_logfile?
~/database/instance1/data/mysql
~/database/instance1/data/test
~/database/instance1/etc/my.cnf
~/database/instance1/log/error.log
~/database/instance1/log/general.log
~/database/instance1/log/slow.log
~/database/instance1/tmp/
~/database/instance2/...

But over time you may want to migrate the old structure to the new one. The following steps describe how to upgrade the MyEnv instance structure from v1 to v2:

mysql@chef:~ [mysql-57, 3320]> mypprod
mysql@chef:~ [mypprod, 3309]> stop
.. SUCCESS!
mysql@chef:~ [mypprod, 3309]> mkdir ~/database/mypprod
mysql@chef:~ [mypprod, 3309]> mkdir ~/database/mypprod/binlog ~/database/mypprod/data ~/database/mypprod/etc ~/database/mypprod/log ~/database/mypprod/tmp
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/binary-log.* ~/database/mypprod/binlog/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/my.cnf ~/database/mypprod/etc/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/error.log ~/database/mypprod/log/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/slow.log ~/database/mypprod/log/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/general.log ~/database/mypprod/log/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/* ~/database/mypprod/data/
mysql@chef:~ [mypprod, 3309]> rmdir ~/data/mypprod
mysql@chef:~ [mypprod, 3309]> vi /etc/myenv/myenv.conf
- datadir = /home/mysql/data/mypprod
+ datadir = /home/mysql/database/mypprod/data
- my.cnf = /home/mysql/data/mypprod/my.cnf
+ my.cnf = /home/mysql/database/mypprod/etc/my.cnf
+ instancedir = /home/mysql/database/mypprod
mysql@chef:~ [mypprod, 3309]> source ~/.bash_profile
mysql@chef:~ [mypprod, 3309]> cde
mysql@chef:~/database/mypprod/etc [mypprod, 3309]> vi my.cnf
- log_bin = binary-log
+ log_bin = /home/mysql/database/mypprod/binlog/binary-log
- datadir = /home/mysql/data/mypprod
+ datadir = /home/mysql/database/mypprod/data
- tmpdir = /tmp
+ tmpdir = /home/mysql/database/mypprod/tmp
- log_error = error.log
+ log_error = /home/mysql/database/mypprod/log/error.log
- slow_query_log_file = slow.log
+ slow_query_log_file = /home/mysql/database/mypprod/log/slow.log
- general_log_file = general.log
+ general_log_file = /home/mysql/database/mypprod/log/general.log
mysql@chef:~/database/mypprod/etc [mypprod, 3309]> cdb
mysql@chef:~/database/mypprod/binlog [mypprod, 3309]> vi binary-log.index
- ./binary-log.000001
+ /home/mysql/database/mypprod/binlog/binary-log.000001
- ./binary-log.000001
+ /home/mysql/database/mypprod/binlog/binary-log.000001
mysql@chef:~/database/mypprod/binlog [mypprod, 3309]> start
mysql@chef:~/database/mypprod/binlog [mypprod, 3309]> exit
Changes in MyEnv 2.0.0

MyEnv
  • New v2 instance directory structure and instancedir variable introduced, aliases adapted accordingly.
  • Configuration files aliases.conf and variables.conf made more user friendly.
  • PHP 7 support added.
  • Made MyEnv MySQL 8.0 ready.
  • Packaging available: RPM for RHEL 6 and 7 and SLES 11 and 12, DEB for Ubuntu/Debian.
  • OEM agent plug-in made ready for OEM v12.
  • More strict configuration checking.
  • Version more verbose.
  • Database health check via mysqladmin replaced by UNIX socket probing.
  • Various bug fixes (#168, #161, ...)
  • MyEnv made ready for systemd.
  • Bind-address output nicer in up.
  • New variables added to my.cnf template (super_read_only, innodb_tmpdir, innodb_flush_log_at_trx_commit, MySQL Group Replication, crash-safe Replication, GTID, MySQL 8.0)
MyEnv Installer
  • Installer made ready for systemd.
  • Question for angel process (mysqld_safe) and cgroups added.
  • Check for duplicate socket added.
  • Various bug fixes.
  • Purge data implemented.
MyEnv Utilities
  • Utility mysqlstat.php added.
  • Scripts for keepalived added.
  • Utilities mysql-create-instance.sh and mysql-remove-instance.sh removed.
  • Famous insert_test.sh, insert_test.php and test table improved.

For subscriptions of commercial use of MyEnv please get in contact with us.

Taxonomy upgrade extras:  MyEnv multi-instance virtualization consolidation SaaS Operations release upgrade mysqld_multi

MySQL 8.0: It Goes to 11!

MySQL has over the years grown into a considerable family of products and components. Years ago, someone running MySQL would typically have a single Server instance. Advanced users might have two and run replication. These days, MySQL is much more of a distributed system, offering advanced replication technology for HA, load balancing, systems for monitoring […]

Verifying Query Performance Using ProxySQL

In this blog post, we’ll look at how you can verify query performance using ProxySQL.

In the previous blog post, I showed you how much information you can get from the “stats.stats_mysql_query_digest” table in ProxySQL. I also mentioned you could even collect and graph these metrics. I will show you this is not just theory; it is possible.

These graphs can be very useful for understanding the impact of the changes you make on query count or execution time.

I used our all-time favorite benchmark tool called Sysbench. I was running the following query:

UPDATE sbtest1 SET c=? WHERE k=?

There was no index on “k” when I started the test. During the test, I added an index. We expect to see some changes in the graphs.
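
The mid-test change itself was just an index addition; a minimal sketch (the index name is my own choice):

ALTER TABLE sbtest1 ADD INDEX idx_k (k);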

I selected from the “stats.stats_mysql_query_digest” table into a file every second, then I used Percona Monitoring and Management (PMM) to create graphs from the metrics. (I am going to write another blog post on how you can use PMM to create graphs from any kind of metrics.)
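
A minimal sketch of the per-second collection query, run against the ProxySQL admin interface (which columns you export is up to you):

SELECT hostgroup, schemaname, digest_text,
       count_star, sum_time, min_time, max_time
FROM stats.stats_mysql_query_digest
WHERE digest_text LIKE 'UPDATE sbtest1%';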

Without the index, the update was running only 2-3 times per second. After adding the index, it went up to 400-500 per second. We can see the results immediately on the graph.

Let’s see the average execution time:

Without the index, the update took 600000-700000 microseconds, which is around 0.7s. After adding the index, it dropped to 0.01s. This is a big win, but most importantly we can see the effects on query response time and query count whenever we make changes to the schema, queries or configuration.

Conclusion

If you already have a ProxySQL server collecting and graphing these metrics, they could be quite useful when you are optimizing your queries. They can help make sure you are moving in the right direction with your tunings/modifications.

The post Verifying Query Performance Using ProxySQL appeared first on Percona Database Performance Blog.

ChickTech Austin PopUp Workshop: Database Basics with MySQL

This Saturday I will be teaching Database Basics with MySQL and there are literally just two seats left! The MySQL Community Team is always looking for ways to reach new audiences and we would like to make this class available to other groups (so let me know if you are interested).

And six hours is a really short time so it only scratches the surface.  Maybe we also need some intermediate classes above and beyond the introduction.

Comparing Oracle RAC HA Solution to Galera Cluster for MySQL or MariaDB

Business has continuously desired to derive insights from information to make reliable, smarter, real-time, fact-based decisions. As firms rely more on data and databases, information and data processing is the core of many business operations and business decisions. The faith in the database is total. None of the day-to-day company services can run without the underlying database platforms. As a consequence, the need for scalability and performance of database system software is more critical than ever. The principal benefits of a clustered database system are scalability and high availability. In this blog, we will try to compare Oracle RAC and Galera Cluster in the light of these two aspects. Real Application Clusters (RAC) is Oracle’s premium solution for clustering Oracle databases and provides High Availability and Scalability. Galera Cluster is the most popular clustering technology for MySQL and MariaDB.

Architecture overview

Oracle RAC uses Oracle Clusterware software to bind multiple servers. Oracle Clusterware is a cluster management solution that is integrated with Oracle Database, but it can also be used with other services, not only the database. Oracle Clusterware is additional software installed on servers running the same operating system, which lets the servers be chained together to operate as if they were one server.

Oracle Clusterware watches the instance and automatically restarts it if a crash occurs. If your application is well designed, you may not experience any service interruption. Only a group of sessions (those connected to the failed instance) is affected by the failure. The blackout can be efficiently masked to the end user using advanced RAC features like Fast Application Notification and the Oracle client’s Fast Connection Failover. Oracle Clusterware controls node membership and prevents split brain symptoms, in which two or more nodes attempt to control the cluster.

Galera Cluster is a synchronous active-active database clustering technology for MySQL and MariaDB. Galera Cluster differs from what is known as Oracle’s MySQL Cluster - NDB. MariaDB cluster is based on the multi-master replication plugin provided by Codership. Since version 5.5, the Galera plugin (wsrep API) is an integral part of MariaDB. Percona XtraDB Cluster (PXC) is also based on the Galera plugin. The Galera plugin architecture stands on three core layers: certification, replication, and group communication framework. Certification layer prepares the write-sets and does the certification checks on them, guaranteeing that they can be applied. Replication layer manages the replication protocol and provides the total ordering capability. Group Communication Framework implements a plugin architecture which allows other systems to connect via gcomm back-end schema.

To keep the state identical across the cluster, the wsrep API uses a Global Transaction ID. A GTID unique identifier is created and associated with each transaction committed on the database node. In Oracle RAC, the various database instances share access to resources such as data blocks in the buffer cache or enqueues. Access to the shared resources between RAC instances needs to be coordinated to avoid conflict. To organize shared access to these resources, the distributed cache maintains information such as the data block ID, which RAC instance holds the current version of the data block, and the lock mode in which each instance holds the data block.

Data storage key concepts

Oracle RAC relies on a distributed disk architecture. The database files, control files and online redo logs for the database need to be accessible to each node in the cluster. There are various ways to configure shared storage, including directly attached disks, Storage Area Networks (SAN), Network Attached Storage (NAS) and Oracle ASM. The two most popular are OCFS and ASM. Oracle Cluster File System (OCFS) is a shared file system designed specifically for Oracle RAC. OCFS eliminates the requirement that Oracle database files be connected to logical drives and enables all nodes to share a single Oracle Home. Oracle ASM is Oracle's advised storage management solution that provides an alternative to conventional volume managers, file systems, and raw devices. Oracle ASM provides a virtualization layer between the database and storage. It treats multiple disks as a single disk group and lets you dynamically add or remove drives while keeping databases online.

There is no need to build sophisticated shared disk storage for Galera, as each node has its full copy of data. However it is a good practice to make the storage reliable with battery-backed write caches.

Oracle RAC, Cluster storage

Galera replication, disks attached to database nodes

Cluster nodes communication and cache

Oracle Real Application Clusters has a shared cache architecture; it utilizes Oracle Grid Infrastructure to enable the sharing of server and storage resources. Communication between nodes is the critical aspect of cluster integrity. Each node must have at least two network adapters or network interface cards: one for the public network interface, and one for the interconnect. Each cluster node is connected to all other nodes via a private high-speed network, also known as the cluster interconnect.

Oracle RAC, network architecture

The private network is typically formed with Gigabit Ethernet, but for high-volume environments, many vendors offer low-latency, high-bandwidth solutions designed for Oracle RAC. Linux also provides a means of bonding multiple physical NICs into a single virtual NIC to provide increased bandwidth and availability.

While the default approach to connecting Galera nodes is to use a single NIC per host, you can have more than one card. ClusterControl can assist you with such setup. The main difference is the bandwidth requirement on the interconnect. Oracle RAC ships blocks of data between instances, so it places a heavier load on the interconnect as compared to Galera write-sets (which consist of a list of operations).

With Redundant Interconnect Usage in RAC, you can identify multiple interfaces to use for the private cluster network, without the need for bonding or other technologies. This functionality is available starting with Oracle Database 11gR2. If you use the Oracle Clusterware redundant interconnect feature, then you must use IPv4 addresses for the interfaces (UDP is the default).

To manage high availability, each cluster node is assigned a virtual IP address (VIP). In the event of node failure, the failed node's IP address can be reassigned to a surviving node to allow applications to continue reaching the database through the same IP address.

A sophisticated network setup is necessary for Oracle's Cache Fusion technology to couple the physical memory in each host into a single cache. Oracle Cache Fusion allows data stored in the cache of one Oracle instance to be accessed by any other instance by transporting it across the private network. It also protects data integrity and cache coherency by transmitting locking and supplementary synchronization information across cluster nodes.

On top of the described network setup, you can set a single database address for your application - the Single Client Access Name (SCAN). The primary purpose of SCAN is to provide ease of connection management. For instance, you can add new nodes to the cluster without changing your client connection string. This works because Oracle automatically distributes requests based on the SCAN IPs, which point to the underlying VIPs. SCAN listeners act as the bridge between clients and the underlying local listeners, which are VIP-dependent.

For Galera Cluster, the equivalent of SCAN would be adding a database proxy in front of the Galera nodes. The proxy would be a single point of contact for applications; it can blacklist failed nodes and route queries to healthy nodes. The proxy itself can be made redundant with Keepalived and a Virtual IP.

Failover and data recovery

The main difference between Oracle RAC and MySQL Galera Cluster is that Galera is a shared nothing architecture. Instead of shared disks, Galera uses certification-based replication with group communication and transaction ordering to achieve synchronous replication. A database cluster should be able to survive the loss of a node, although it's achieved in different ways. In the case of Galera, the critical aspect is the number of nodes: Galera requires a quorum to stay operational. A three-node cluster can survive the crash of one node. With more nodes in your cluster, your availability will grow. Oracle RAC doesn't require a quorum to stay operational after a node crash, because access to the distributed storage keeps consistent information about the cluster state. However, your data storage could be a potential point of failure in your high availability plan. While it's a reasonably straightforward task to have Galera cluster nodes spread across geolocation data centers, it wouldn't be that easy with RAC. Oracle RAC requires additional high-end disk mirroring; however, basic RAID-like redundancy can be achieved inside an ASM diskgroup.

Disk Group Type       Supported Mirroring Levels                Default Mirroring Level
External redundancy   Unprotected (none)                        Unprotected
Normal redundancy     Two-way, three-way, unprotected (none)    Two-way
High redundancy       Three-way                                 Three-way
Flex redundancy       Two-way, three-way, unprotected (none)    Two-way (newly-created)
Extended redundancy   Two-way, three-way, unprotected (none)    Two-way

ASM Disk Group redundancy

Locking Schemes

In a single-user database, a user can alter data without concern for other sessions modifying the same data at the same time. However, in a multi-user, multi-node database environment, this becomes more tricky. A multi-user database must provide the following:

  • data concurrency - the assurance that users can access data at the same time,
  • data consistency - the assurance that each user sees a consistent view of the data.

Cluster instances require three main types of concurrency locking:

  • Data concurrency reads on different instances,
  • Data concurrency reads and writes on different instances,
  • Data concurrency writes on different instances.

Oracle lets you choose the policy for locking, either pessimistic or optimistic, depending on your requirements. To provide concurrency locking, RAC has two additional services: the Global Cache Service (GCS) and the Global Enqueue Service (GES). These two services cover the Cache Fusion process, resource transfers, and resource escalations among the instances. GES includes cache locks, dictionary locks, transaction locks and table locks. GCS maintains the block modes and block transfers between the instances.

In a Galera cluster, each node has its own storage and buffers. When a transaction is started, database resources local to that node are involved. At commit, the operations that are part of that transaction are broadcast as part of a write-set to the rest of the group. Since all nodes have the same state, the write-set will either be successful on all nodes or it will fail on all nodes.

Galera Cluster uses optimistic concurrency control at the cluster level, which means transactions can be aborted at COMMIT time. The first commit wins. When such aborts occur at the cluster level, Galera Cluster returns a deadlock error. This may or may not impact your application architecture. A high number of rows to replicate in a single transaction would impact node responses, although there are techniques to avoid such behavior.
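
A minimal sketch of what the losing transaction sees (table and column names are arbitrary; the error code is the standard MySQL deadlock error that Galera reuses for certification failures):

-- Session on node 1:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;  -- wins certification

-- Concurrent session on node 2, committing slightly later:
BEGIN;
UPDATE accounts SET balance = balance - 20 WHERE id = 1;
COMMIT;
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction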

Hardware & Software requirements

Configuring the hardware for both clusters doesn’t require potent resources. A minimal Oracle RAC cluster configuration would be satisfied by two servers with two CPUs, at least 1.5 GB of RAM, an amount of swap space equal to the amount of RAM and two Gigabit Ethernet NICs. Galera’s minimum configuration is three nodes (one of the nodes can be an arbitrator, garbd), each with a 1GHz single-core CPU, 512MB of RAM and a 100 Mbps network card. While these are the minimums, we can safely say that in both cases you would probably like to have more resources for your production system.

Each node stores software so you would need to prepare several gigabytes of your storage. Oracle and Galera both have the ability to individually patch the nodes by taking them down one at a time. This rolling patch avoids a complete application outage as there are always database nodes available to handle traffic.

What is important to mention is that a production Galera cluster can easily run on VMs or basic bare metal, while RAC would need investment in sophisticated shared storage and fiber communication.

Monitoring and management

Oracle Enterprise Manager is the favored approach for monitoring Oracle RAC and Oracle Clusterware. Oracle Enterprise Manager is an Oracle web-based unified management system for monitoring and administering your database environment. It’s part of the Oracle Enterprise license and should be installed on a separate server. Cluster monitoring and management is done via a combination of the crsctl and srvctl commands, which are part of the cluster binaries. Below you can find a couple of example commands.

Clusterware Resource Status Check:

crsctl status resource -t (or shorter: crsctl stat res -t)

Example:

$ crsctl stat res ora.test1.vip
NAME=ora.test1.vip
TYPE=ora.cluster_vip_net1.type
TARGET=ONLINE
STATE=ONLINE on test1

Check the status of the Oracle Clusterware stack:

crsctl check cluster

Example:

$ crsctl check cluster -all
*****************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
*****************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Check the status of Oracle High Availability Services and the Oracle Clusterware stack on the local server:

crsctl check crs

Example:

$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Stop Oracle High Availability Services on the local server.

crsctl stop has

Start Oracle High Availability Services on the local server.

crsctl start has

Displays the status of node applications:

srvctl status nodeapps

Displays the configuration information for all SCAN VIPs

srvctl config scan

Example:

srvctl config scan -scannumber 1
SCAN name: testscan, Network: 1
Subnet IPv4: 192.51.100.1/203.0.113.46/eth0, static
Subnet IPv6:
SCAN 1 IPv4 VIP: 192.51.100.195
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes:
SCAN VIP is individually disabled on nodes:

The Cluster Verification Utility (CVU) performs system checks in preparation for installation, patch updates, or other system changes:

cluvfy comp ocr

Example:

Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+DATA" available on all the nodes
NOTE: This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Verification of OCR integrity was successful.

Galera nodes and the cluster rely on the wsrep API, which exposes a number of statuses. There are currently 34 dedicated status variables that can be viewed with the SHOW STATUS statement.

mysql> SHOW STATUS LIKE 'wsrep_%';
wsrep_apply_oooe
wsrep_apply_oool
wsrep_cert_deps_distance
wsrep_cluster_conf_id
wsrep_cluster_size
wsrep_cluster_state_uuid
wsrep_cluster_status
wsrep_connected
wsrep_flow_control_paused
wsrep_flow_control_paused_ns
wsrep_flow_control_recv
wsrep_local_send_queue_avg
wsrep_local_state_uuid
wsrep_protocol_version
wsrep_provider_name
wsrep_provider_vendor
wsrep_provider_version
wsrep_flow_control_sent
wsrep_gcomm_uuid
wsrep_last_committed
wsrep_local_bf_aborts
wsrep_local_cert_failures
wsrep_local_commits
wsrep_local_index
wsrep_local_recv_queue
wsrep_local_recv_queue_avg
wsrep_local_replays
wsrep_local_send_queue
wsrep_ready
wsrep_received
wsrep_received_bytes
wsrep_replicated
wsrep_replicated_bytes
wsrep_thread_count

The administration of MySQL Galera Cluster is in many aspects very similar. There are just a few exceptions, like bootstrapping the cluster from the initial node or recovering nodes via SST or IST operations.

Bootstrapping cluster:

$ service mysql bootstrap # sysvinit
$ service mysql start --wsrep-new-cluster # sysvinit
$ galera_new_cluster # systemd
$ mysqld_safe --wsrep-new-cluster # command line

The equivalent web-based, out-of-the-box solution to manage and monitor Galera Cluster is ClusterControl. It provides a web-based interface to deploy clusters, monitor key metrics, provide database advisors, and take care of management tasks like backup and restore, automatic patching, traffic encryption and availability management.

Restrictions on workload

Oracle provides SCAN technology, which we found missing in Galera Cluster. The benefit of SCAN is that the client’s connection information does not need to change if you add or remove nodes or databases in the cluster. When using SCAN, the Oracle database randomly connects to one of the available SCAN listeners (typically three) in a round robin fashion and balances the connections between them. Two kinds of load balancing can be configured: client-side connect-time load balancing and server-side run-time load balancing. Although there is nothing similar within Galera Cluster itself, the same functionality can be addressed with additional software like ProxySQL, HAProxy or MaxScale combined with Keepalived.

When it comes to application workload design for Galera Cluster, you should avoid conflicting updates on the same row, as they lead to deadlocks across the cluster. Avoid bulk inserts or updates, as these might be larger than the maximum allowed writeset. That might also cause cluster stalls.
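
The writeset limits are exposed as ordinary server variables, so they are easy to check before designing bulk operations; a minimal sketch (defaults differ between Galera/PXC versions):

SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_%';
-- wsrep_max_ws_rows : maximum number of rows in a single writeset
-- wsrep_max_ws_size : maximum writeset size in bytes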

When designing Oracle HA with RAC, you need to keep in mind that RAC only protects against server failure; you still need to mirror the storage and have network redundancy. Modern web applications require access to location-independent data services, and because of RAC’s storage architecture limitations, this can be tricky to achieve. You also need to spend a notable amount of time gaining the relevant knowledge to manage the environment; it is a long process. On the application workload side, there are some drawbacks. Distributing separate read or write operations on the same dataset is not optimal, because latency is added by supplementary inter-node data exchange. Things like partitioning, sequence caches, and sorting operations should be reviewed before migrating to RAC.

Multi data-center redundancy

According to the Oracle documentation, the maximum distance between two boxes connected in a point-to-point fashion and running synchronously can be only 10 km. Using specialized devices, this distance can be increased to 100 km.

Galera Cluster is well known for its multi-datacenter replication capabilities. It has rich support for wide area network (WAN) settings. It can be configured for high network latency by taking Round-Trip Time (RTT) measurements between cluster nodes and adjusting the necessary parameters. The wsrep_provider_options parameter allows you to configure settings like suspect_timeout, interactive_timeout, join_retrans_timeouts and many more.
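
As a hedged sketch of such tuning (evs.suspect_timeout and evs.inactive_timeout are standard Galera provider options; the values are arbitrary examples, and some provider options can only be set at startup via my.cnf):

SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
-- relax failure-detection timers for a high-latency WAN link
SET GLOBAL wsrep_provider_options = 'evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M';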

Using Galera and RAC in Cloud

Per the Oracle note www.oracle.com/technetwork/database/options/.../rac-cloud-support-2843861.pdf, no third-party cloud currently meets Oracle’s requirements regarding natively provided shared storage. “Native” in this context means that the cloud provider must support shared storage as part of their infrastructure, as per Oracle’s support policy.

Thanks to its shared nothing architecture, which is not tied to a sophisticated storage solution, Galera cluster can be easily deployed in a cloud environment. Things like:

  • optimized network protocol,
  • topology-aware replication,
  • traffic encryption,
  • detection and automatic eviction of unreliable nodes,

make the cloud migration process more reliable.

Licenses and hidden costs

Oracle licensing is a complex topic and would require a separate blog article. The cluster factor makes it even more difficult. The cost goes up as we have to add some options to license a complete RAC solution. Here we just want to highlight what to expect and where to find more information.

RAC is a feature of Oracle Enterprise Edition license. Oracle Enterprise license is split into two types, per named user and per processor. If you consider Enterprise Edition with per core license, then the single core cost is RAC 23,000 USD + Oracle DB EE 47,500 USD, and you still need to add a ~ 22% support fee. We would like to refer to a great blog on pricing found on https://flashdba.com/2013/09/18/the-real-cost-of-oracle-rac/.


Flashdba calculated the price of a four-node Oracle RAC. The total amount was 902,400 USD plus an additional 595,584 USD for three years of DB maintenance, and that does not include features like partitioning or in-memory database, all that with a 60% Oracle discount.

Galera Cluster is an open source solution that anyone can run for free. Subscriptions are available for production implementations that require vendor support. A good TCO calculation can be found at https://severalnines.com/blog/database-tco-calculating-total-cost-ownership-mysql-management.

Conclusion

While there are significant differences in architecture, both clusters share the main principles and can achieve similar goals. Oracle’s enterprise product comes with everything out of the box (and its price). With a cost in the range of >1M USD as seen above, it is a high-end solution that many enterprises would not be able to afford. Galera Cluster can be described as a decent high availability solution for the masses. In certain cases, Galera may well be a very good alternative to Oracle RAC. One drawback is that you have to build your own stack, although that can be completely automated with ClusterControl. We’d love to hear your thoughts on this.

Tags:  MySQL galera cluster galera oracle rac ha high availability
