Planet MySQL

How to run multiple ProxySQL instances

Some time ago I wrote about the ability to run multiple ProxySQL instances listening on the same port(s).
This capability has been present since ProxySQL 1.3.0, but it is not widely used.
Although the feature seems super interesting, how to use it is not entirely intuitive.
After a series of Q&A on Slack (mostly reported at the bottom of this blog post), I decided it was time to describe this feature properly.
This blog post tries to describe how to use it, starting from installation.

Note: this feature requires Linux kernel >= 3.9.
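
Under the hood this relies on the SO_REUSEPORT socket option, introduced in kernel 3.9, which lets several processes bind the same TCP port while the kernel distributes incoming connections among them; that is presumably what ProxySQL uses here. As a minimal sketch of the mechanism (illustrative only, not ProxySQL's actual listener code), here is how a Go program can opt into SO_REUSEPORT; run it twice and both instances will bind port 6033:

package main

import (
    "context"
    "fmt"
    "net"
    "os"
    "syscall"

    "golang.org/x/sys/unix"
)

func main() {
    lc := net.ListenConfig{
        Control: func(network, address string, c syscall.RawConn) error {
            var sockErr error
            err := c.Control(func(fd uintptr) {
                // SO_REUSEPORT lets several processes bind the same port;
                // the kernel then balances new connections among them.
                sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            })
            if err != nil {
                return err
            }
            return sockErr
        },
    }
    ln, err := lc.Listen(context.Background(), "tcp", "0.0.0.0:6033")
    if err != nil {
        panic(err) // a second copy would fail here without SO_REUSEPORT
    }
    defer ln.Close()
    for {
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        fmt.Println("connection accepted by pid", os.Getpid())
        conn.Close()
    }
}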

Installation

In this example, installation is performed using deb packages from github.com repository:

root@nebula:~# wget -q https://github.com/sysown/proxysql/releases/download/v1.4.1/proxysql_1.4.1-ubuntu16_amd64.deb
root@nebula:~#
root@nebula:~# dpkg -i proxysql_1.4.1-ubuntu16_amd64.deb
Selecting previously unselected package proxysql.
(Reading database ... 269571 files and directories currently installed.)
Preparing to unpack .../proxysql_1.4.1-ubuntu16_amd64.deb ...
Unpacking proxysql (1.4.1) ...
Setting up proxysql (1.4.1) ...
Processing triggers for systemd (229-4ubuntu16) ...
Processing triggers for ureadahead (0.100.0-19) ...

During the installation, an empty datadir is created, as well as a minimal config file:

root@nebula:~# ls -l /var/lib/proxysql/
total 0
root@nebula:~# ls -l /etc/proxysql.cnf
-rw-r--r-- 1 root root 4517 ago 1 13:23 /etc/proxysql.cnf

Starting ProxySQL

When ProxySQL starts, it normally runs as a daemon. As part of this process, it checks whether there is a pid file in the datadir and whether another proxysql instance is already running: if so, it exits immediately.
Therefore running two instances of ProxySQL needs some extra configuration.
Although it is possible for two proxysql instances to share the same database file proxysql.db, that setup is a bit complex and we won't describe it here. We will instead run two proxysql instances, each with its own datadir and (minimal!) configuration file.

Preparing the datadirs

We will simply create two directories in /var/lib/proxysql , each directory representing the datadir of a process.

root@nebula:~# mkdir /var/lib/proxysql/proxy01
root@nebula:~# mkdir /var/lib/proxysql/proxy02

Next, we will create 2 config files, one in each datadir. The only thing we will configure is the listener port of the Admin interface, because otherwise we wouldn't be able to choose which instance's Admin we connect to.
Note that it is also possible to configure two different Unix domain sockets, but in this example we will use TCP sockets.

root@nebula:~# cat > /var/lib/proxysql/proxy01/proxysql.cnf << EOF
> datadir="/var/lib/proxysql/proxy01"
> admin_variables=
> {
> mysql_ifaces="0.0.0.0:6031"
> }
> EOF
root@nebula:~# cat > /var/lib/proxysql/proxy02/proxysql.cnf << EOF
> datadir="/var/lib/proxysql/proxy02"
> admin_variables=
> {
> mysql_ifaces="0.0.0.0:6032"
> }
> EOF

Verify the files:

root@nebula:~# cat /var/lib/proxysql/proxy01/proxysql.cnf
datadir="/var/lib/proxysql/proxy01"
admin_variables=
{
mysql_ifaces="0.0.0.0:6031"
}
root@nebula:~# cat /var/lib/proxysql/proxy02/proxysql.cnf
datadir="/var/lib/proxysql/proxy02"
admin_variables=
{
mysql_ifaces="0.0.0.0:6032"
}

Now we can start two proxysql instances, each one with its config file:

root@nebula:~# proxysql -c /var/lib/proxysql/proxy01/proxysql.cnf
root@nebula:~# proxysql -c /var/lib/proxysql/proxy02/proxysql.cnf

Next, let's check what is running:

root@nebula:~# ps -eF | grep proxysql | grep cnf
root 16383  1855 0  8307  7052 6 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy01/proxysql.cnf
root 16384 16383 0 24920 31956 5 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy01/proxysql.cnf
root 16410  1855 0  8307  3368 3 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy02/proxysql.cnf
root 16411 16410 0 24920  8044 2 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy02/proxysql.cnf

Perfect, we have four proxysql processes! Remember, when starting as a daemon, proxysql immediately forks:

  • the child process is the real service
  • the parent process checks the exit status of the child process and, if the child crashes, restarts it immediately (within about 15ms); a sketch of this pattern follows below.
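
As an illustration of that supervisor pattern, here is a minimal sketch in Go (ProxySQL implements this internally in C++; the -f foreground flag and the exact restart delay are assumptions here, not ProxySQL's actual code):

package main

import (
    "log"
    "os"
    "os/exec"
    "time"
)

func main() {
    for {
        // Run the real service in the foreground as a child process.
        child := exec.Command("proxysql", "-f", "-c", "/var/lib/proxysql/proxy01/proxysql.cnf")
        child.Stdout, child.Stderr = os.Stdout, os.Stderr
        if err := child.Run(); err == nil {
            return // clean shutdown: stop supervising
        }
        // Abnormal exit: restart the child almost immediately.
        log.Println("child exited abnormally; restarting")
        time.Sleep(15 * time.Millisecond)
    }
}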

We can now verify that the two services are using a datadir each:

root@nebula:~# ls -l /var/lib/proxysql/
total 8
drwxr-xr-x 2 root root 4096 ago 1 20:32 proxy01
drwxr-xr-x 2 root root 4096 ago 1 20:32 proxy02
root@nebula:~# ls -l /var/lib/proxysql/proxy01/
total 108
-rw-r--r-- 1 root root    87 ago 1 20:30 proxysql.cnf
-rw------- 1 root root 98304 ago 1 20:32 proxysql.db
-rw------- 1 root root  1239 ago 1 20:32 proxysql.log
-rw-r--r-- 1 root root     6 ago 1 20:32 proxysql.pid
root@nebula:~# ls -l /var/lib/proxysql/proxy02/
total 108
-rw-r--r-- 1 root root    87 ago 1 20:30 proxysql.cnf
-rw------- 1 root root 98304 ago 1 20:32 proxysql.db
-rw------- 1 root root  1239 ago 1 20:32 proxysql.log
-rw-r--r-- 1 root root     6 ago 1 20:32 proxysql.pid

Verify listening processes

Next, let's check which ports are used by the two proxysql instances:

root@nebula:~# netstat -ntap | grep proxysql
tcp 0 0 0.0.0.0:6031 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql

As expected, they are listening on port 6031 (pid 16384, proxysql1), port 6032 (pid 16411, proxysql2), and port 6033 (both pid 16384 and 16411).

But why is each proxysql process reported four times as listening on port 6033?
In proxysql, each MySQL Thread listens on that port. Because mysql-threads=4 by default, each proxysql process has 4 threads listening on port 6033. That is, multiple threads are listening on the same socket:

root@nebula:~# lsof -i:6033 -n
COMMAND    PID USER FD  TYPE DEVICE SIZE/OFF NODE NAME
proxysql 16384 root 15u IPv4 705075 0t0 TCP *:6033 (LISTEN)
proxysql 16384 root 16u IPv4 705076 0t0 TCP *:6033 (LISTEN)
proxysql 16384 root 17u IPv4 705077 0t0 TCP *:6033 (LISTEN)
proxysql 16384 root 18u IPv4 705078 0t0 TCP *:6033 (LISTEN)
proxysql 16411 root 15u IPv4 700302 0t0 TCP *:6033 (LISTEN)
proxysql 16411 root 16u IPv4 700303 0t0 TCP *:6033 (LISTEN)
proxysql 16411 root 17u IPv4 700304 0t0 TCP *:6033 (LISTEN)
proxysql 16411 root 18u IPv4 700305 0t0 TCP *:6033 (LISTEN)

As a result, 8 threads are listening on the same port, 4 threads for each proxysql process.

Create a user and a backend in ProxySQL

To test how connections are load balanced, we first need to create a user in ProxySQL to allow clients to connect.
We will also configure a backend.
Because each proxysql instance has its own configuration, both need to be configured.

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "INSERT INTO mysql_users (username,password) VALUES ('sbtest','sbtest'); LOAD MYSQL USERS TO RUNTIME;" ; done
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "INSERT INTO mysql_servers (hostname) VALUES ('127.0.0.1'); LOAD MYSQL SERVERS TO RUNTIME;" ; done
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.

Load balance traffic

It is now time to check how the kernel will load balance the traffic.
We will run very simple queries, specifically SELECT 1, using mysqlslap:

root@nebula:~# time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 1000 -q "SELECT 1"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
	Average number of seconds to run all queries: 0.000 seconds
	Minimum number of seconds to run all queries: 0.000 seconds
	Maximum number of seconds to run all queries: 0.004 seconds
	Number of clients running queries: 8
	Average number of queries per client: 1

real	0m1.131s
user	0m0.344s
sys	0m0.868s

How do we verify how many queries were executed and connections established on each proxysql?

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 4034           |
| Questions                  | 4033           |
+----------------------------+----------------+
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 3967           |
| Questions                  | 3967           |
+----------------------------+----------------+

The kernel load balanced the traffic almost evenly:

  • 50.41% of the connections to proxysql1
  • 49.59% of the connections to proxysql2

PROXYSQL PAUSE

Very old versions of ProxySQL (0.2 or 1.0) had a command that allowed the listener to be shut down gracefully. As ProxySQL became more modular and complex, this feature was broken along the way, but it was finally restored in 1.4.1, following a feature request.
The command PROXYSQL PAUSE stops accepting new connections, while continuing to process requests from current connections.
Why is this feature important? If ProxySQL needs to be restarted (for example to perform an upgrade), it is possible to execute PROXYSQL PAUSE, wait for all clients to disconnect, and finally stop the process, so that clients' connections are terminated gracefully.
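
To make that upgrade workflow concrete, here is a minimal sketch of the drain step in Go (assuming admin credentials admin/admin on port 6031, the github.com/go-sql-driver/mysql package, and the stats_mysql_global counter Client_Connections_connected for open client connections; adapt to your setup):

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    // Connect to the Admin interface of the instance we want to upgrade.
    db, err := sql.Open("mysql", "admin:admin@tcp(127.0.0.1:6031)/")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Stop accepting new client connections on this instance only.
    if _, err := db.Exec("PROXYSQL PAUSE"); err != nil {
        log.Fatal(err)
    }

    // Wait until the existing client connections have drained.
    for {
        var name string
        var connected int
        err := db.QueryRow("SELECT Variable_Name, Variable_Value FROM stats_mysql_global WHERE Variable_Name = 'Client_Connections_connected'").Scan(&name, &connected)
        if err != nil {
            log.Fatal(err)
        }
        if connected == 0 {
            break
        }
        log.Printf("%d client connections still open, waiting...", connected)
        time.Sleep(time.Second)
    }

    // Now the process can be stopped and upgraded without killing any client.
    log.Println("drained: safe to stop this instance")
}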

Let's try this running again mysqlslap for a longer period of time, and issuing PROXYSQL PAUSE on one of the proxies.

(Note: both proxies were erroneously restarted between the previous test and this one, therefore the pids changed and the counters were reset)

root@nebula:~# ( ( sleep 2 ; mysql -u admin -padmin -h 127.0.0.1 -P6031 -e "PROXYSQL PAUSE") & ) ; time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 300 -q "SELECT sleep(0.01)"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Benchmark
	Average number of seconds to run all queries: 0.011 seconds
	Minimum number of seconds to run all queries: 0.010 seconds
	Maximum number of seconds to run all queries: 0.111 seconds
	Number of clients running queries: 8
	Average number of queries per client: 1

real	0m3.763s
user	0m0.204s
sys	0m0.632s

Let's check where the queries were executed. We expect proxysql1 to have received fewer connections, because its listener was stopped during the test:

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 644            |
| Questions                  | 644            |
+----------------------------+----------------+
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 1757           |
| Questions                  | 1756           |
+----------------------------+----------------+

One important thing to highlight is that mysqlslap didn't receive any error: this is because when the listener is stopped, current connections are not terminated.

Now let's check listeners. Only one process is listening on port 6033:

root@nebula:~# netstat -ntlp | grep proxysql
tcp 0 0 0.0.0.0:6031 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql

That means that if we execute mysqlslap again, all traffic will go to proxysql2:

root@nebula:~# time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 300 -q "SELECT sleep(0.01)"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
	Average number of seconds to run all queries: 0.011 seconds
	Minimum number of seconds to run all queries: 0.011 seconds
	Maximum number of seconds to run all queries: 0.018 seconds
	Number of clients running queries: 8
	Average number of queries per client: 1

real	0m3.817s
user	0m0.220s
sys	0m0.648s
root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 644            |
| Questions                  | 644            |
+----------------------------+----------------+
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 4158           |
| Questions                  | 4156           |
+----------------------------+----------------+

Confirmed! All new requests went to proxysql2.

PROXYSQL RESUME

Surprise! :)
ProxySQL doesn't just have the command PROXYSQL PAUSE, but also the command PROXYSQL RESUME: all the MySQL Threads inside ProxySQL start listening again on the listener port. Let's run mysqlslap again, and while it runs execute PROXYSQL RESUME on proxysql1:

root@nebula:~# ( ( sleep 2 ; mysql -u admin -padmin -h 127.0.0.1 -P6031 -e "PROXYSQL RESUME") & ) ; time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 300 -q "SELECT sleep(0.01)"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Benchmark
	Average number of seconds to run all queries: 0.012 seconds
	Minimum number of seconds to run all queries: 0.011 seconds
	Maximum number of seconds to run all queries: 0.017 seconds
	Number of clients running queries: 8
	Average number of queries per client: 1

real	0m3.839s
user	0m0.224s
sys	0m0.700s

Let's now check where connections were established and queries executed:

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 1210           |
| Questions                  | 1210           |
+----------------------------+----------------+
+----------------------------+----------------+
| Variable_Name              | Variable_Value |
+----------------------------+----------------+
| Client_Connections_created | 5993           |
| Questions                  | 5990           |
+----------------------------+----------------+

As expected, proxysql1 has now resumed operation and is processing requests!
And finally, netstat confirms that both proxies are listening on port 6033:

root@nebula:~# netstat -ntlp | grep proxysql
tcp 0 0 0.0.0.0:6031 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql

Conclusion

Running multiple ProxySQL instances on the same box, all listening on the same port(s), is a great way to provide HA, perform graceful upgrades, or test new functionality and configurations.
All this, completely transparent to the application.

Q & A

Below is the Q&A we had on Slack on this subject.

Q. Wait, "Multiple ProxySQL instances can listen on same TCP port" - really? How did you do that?
A. Yes, actually, it's very easy: you just need Linux kernel 3.9+. Yet, a lot of production systems use 2.6. (For example, CentOS 6.x used 2.6, while current CentOS 7 uses 3.10.)

Q. Wow. The kernel must round robin incoming connections to listening processes? And keeps track by source IP and TCP port number?
A. Yes. Once a connection is bound to a process, it stays locked there, until it is disconnected.

Q. So tell me again how I can use this in ProxySQL.
A. The new command "PROXYSQL PAUSE" allows a graceful shutdown. If you have 2 proxies, you can issue PROXYSQL PAUSE in one of them, and from that moment on, all new connections will go to the other proxy. When the first proxy has no more connections, you can shut it down.

Q. And my connecting applications will never know the difference! I could run /usr/bin/proxysql01 and /usr/bin/proxysql02 both on port 6033, then PAUSE proxysql01, shut it down, and upgrade just that proxysql01 binary, while traffic still routes through 02?
A. Yes you can. If you run them in the foreground, that is easy. If you run them in the background, you need to specify 2 different datadirs, because otherwise the 2nd won't start: it will find a pid file.

Q. Right, data directory management ... though you could technically have both proxysql01 and proxysql02 share the same proxysql.db file? Admin them both separately, tell each to SAVE BLAH TO DISK, one config database, but then you are overwriting the db file so better be careful you know what you are doing!
A. Exactly. They can share the same proxysql.db, but using different database files makes things easier.

Q. I guess you better also run them on separate admin TCP ports -- else what a nightmare.
A. Yes. Technically you can use different Unix domain sockets, but that just makes management more complex. Using different datadirs and different admin ports is way easier. Something like 6301, 6302, etc.

Q. But no matter what, I have to remember to make any configuration changes twice, on both instances, (or more) right?
A. Yes, you have to manage each instance separately, and be careful to keep them in sync since they are listening on the same host on the same port. Linux chooses which one gets each incoming proxysql connection and you cannot control that. If you have different RUNTIME configs on the same port, you will get unpredictable results.

Q. I guess I could carefully use different pid files but share one proxysql.db file, SAVE {CONFIG} TO DISK in my master admin instance, then LOAD {CONFIG} FROM DISK and LOAD {CONFIG} TO RUNTIME on the other instances, to make sure they all had the same changes.
A. Yes, that would work. Example, LOAD MYSQL QUERY RULES FROM DISK and LOAD MYSQL QUERY RULES TO RUNTIME would bring up changes into your secondary instance(s) after saving from primary.

Q. In summary, this feature seems powerful but could be tricky to manage.
A. Yes, you must be careful with administration here. This setup is only needed if you want to be able to upgrade proxysql in place without losing service for even a few seconds. Remember you can still do a lot with proxysql in its more typical mode of one binary running and listening on one port. You can still accept thousands of connections and route them to mysql backends based on almost any criteria.

Releasing ProxySQL 1.4.1

ProxySQL is a high performance, high availability, protocol aware proxy for MySQL, with a GPL license!
Today I am excited to announce the release of ProxySQL 1.4.1, the first stable release of series 1.4.x .
The previous version of ProxySQL 1.4 (1.4.0) was released in April and was tested in various environments, leading to what is today a stable 1.4 release.
A list of features, improvements and changes introduced in 1.4 are listed here.
As announced previously, ProxySQL 1.3.x versions will mostly be bug-fix releases and are unlikely to introduce new features: all new features will go into ProxySQL 1.4.
If you are currently using ProxySQL 1.3 in production, make sure to test 1.4 before upgrading, and to perform a backup before the upgrade itself.

A special thanks to all the people who report bugs: this makes each version of ProxySQL better than the previous one.
Please report any bugs or feature requests on the GitHub issue tracker.

Thanks

Group Replication: The Sweet and the Sour

In this blog, we’ll look at group replication and how it deals with flow control (FC) and replication lag. 

Overview

In the last few months, we had two main actors in the MySQL ecosystem: ProxySQL and Group-Replication (with the evolution to InnoDB Cluster). 

While I have covered the first extensively, my last serious work on Group Replication dates back to a lab version from years past.

Given that Oracle decided to declare it GA, and Percona’s decision to provide some level of Group Replication support, I decided it was time for me to take a look at it again.

We’ve already seen a lot of coverage of many Group Replication topics. There are articles about Group Replication and performance, Group Replication and basic functionalities (or the lack of them, like automatic node provisioning), Group Replication and ProxySQL, and so on.

But one question kept coming up over and over in my mind. If Group Replication and InnoDB Cluster have to work as an alternative to other (virtually) synchronous replication mechanisms, what changes do our customers need to consider if they want to move from one to the other?

Solutions using Galera (like Percona XtraDB Cluster) must take into account a central concept: clusters are data-centric. What matters is the data and the data state. Both must be the same on each node at any given time (commit/apply). To guarantee this, Percona XtraDB Cluster (and other solutions) use a set of data validation and Flow Control processes that work to ensure a consistent cluster data set on each node.

The upshot of this principle is that an application can query ANY node in a Percona XtraDB Cluster and get the same data, or write to ANY node and know that the data is visible everywhere in the cluster at (virtually) the same time.

Last but not least, inconsistent nodes should be excluded and either rebuilt or fixed before rejoining the cluster.

If you think about it, this is very useful. Guaranteeing consistency across nodes allows you to transparently split write/read operations, failover from one node to another with very few issues, and more.

When I conceived of this blog on Group Replication (or InnoDB Cluster), I put myself in the customer shoes. I asked myself: “Aside from all the other things we know (see above), what is the real impact of moving from Percona XtraDB Cluster to Group Replication/InnoDB Cluster for my application? Since Group Replication still (basically) uses replication with binlogs and relaylog, is there also a Flow Control mechanism?” An alarm bell started to ring in my mind.

My answer is: “Let’s do a proof of concept (PoC), and see what is really going on.”

The POC

I setup a simple set of servers using Group Replication with a very basic application performing writes on a single writer node, and (eventually) reads on the other nodes. 

You can find the schema definition here. Mainly I used the four tables from my windmills test suite — nothing special or specifically designed for Group Replication. I’ve used this test a lot for Percona XtraDB Cluster in the past, so it was a perfect fit.

Test Definition

The application will do very simple work, and I wanted to test four main cases:

  1. One thread performing one insert at each transaction
  2. One thread performing 50 batched inserts at each transaction
  3. Eight threads performing one insert at each transaction
  4. Eight threads performing 50 batched inserts at each transaction

As you can see, a pretty simple set of operations. Then I decided to test it using the following four conditions on the servers:

  1. One slave worker FC as default
  2. One slave worker FC set to 25
  3. Eight slave workers FC as default
  4. Eight slave workers FC set to 25

Again nothing weird or strange from my point of view. I used four nodes:

  1. Gr1 Writer
  2. Gr2 Reader
  3. Gr3 Reader minimal latency (~10ms)
  4. Gr4 Reader minimal latency (~10ms)

Finally, I had to be sure I measured the lag in a way that allowed me to reference it consistently on all nodes. 

I think we can safely say that the incoming GTID (Received_transaction_set from performance_schema.replication_connection_status) is definitely the last change applied to the master that the slave node knows about. More recent changes could have occurred, but network delay can prevent them from being “received.” The other point of reference is GTID_EXECUTED, which refers to the latest GTID processed on the node itself.

The closest query that can track the distance will be:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-2),':',1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-2),':',1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

Or in the case of a single worker:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

The result will be something like this:

+---------------+
| last_executed |
+---------------+
|         23607 |
+---------------+
+---------------+
| last_received |
+---------------+
|         23607 |
+---------------+
+----------+
| real_lag |
+----------+
|        0 |
+----------+

The whole set of tests can be found here, with all the commands you need to run the application (you can find it here) and replicate the tests. I will focus on the results (otherwise this blog post would be far too long), but I invite you to see the details.

The Results

Efficiency on Writer by Execution Time and Rows/Sec

Using the raw data from the tests (Excel spreadsheet available here), I was interested in identifying if and how the Writer is affected by the use of Group Replication and flow control.

Reviewing the graph, we can see that the Writer has a linear increase in the execution time (when using default flow control) that matches the increase in the load. Nothing there is concerning, and all-in-all we see what is expected if the load is light. The volume of rows at the end justifies the execution time.

It’s a different scenario if we use flow control. The execution time increases significantly in both cases (single worker/multiple workers). In the worst case (eight threads, 50 inserts batch) it becomes four times higher than the same load without flow control.

What happens to the inserted rows? In the application, I traced the rows inserted/sec. It is easy to see what is going on there:

We can see that the Writer with flow control activated inserts less than a third of the rows it processes without flow control. 

We can definitely say that flow control has a significant impact on the Writer performance. To clarify, let’s look at this graph:

Without flow control, the Writer processes a high volume of rows in a limited amount of time (results from the test of eight workers, eight threads, 50 insert batch). With flow control, the situation changes drastically. The Writer takes a long time processing a significantly smaller number of rows/sec. In short, performance drops significantly.

But hey, I’m OK with that if it means having a consistent data set across all nodes. In the end, Percona XtraDB Cluster and similar solutions pay a significant performance price to match the data-centric principle.

Let’s see what happens on the other nodes.

Entries Lag

Well, this scenario is not so good:

When NOT using flow control, the nodes lag behind the writer significantly. Remember that by default flow control in Group Replication is set to 25000 entries (I mean 25K of entries!!!).

What happens is that as soon as I put some salt (see load) on the Writer, the slave nodes start to lag. When using the default single worker, that has a significant impact. When using multiple workers, we see that the lag happens mainly on the node(s) with minimal (10ms) network latency. The sad thing is that the lag does not really go down compared to the single-thread worker, indicating that a minimal latency of just 10ms is enough to affect replication.

Time to activate the flow control and have no lag:

Unfortunately, this is not the case. As we can see, the lag of the single worker remains high for Gr2 (154 entries). When using multiple workers, the Gr3/4 nodes perform much better, with significantly less lag (but still high, at ~1K entries).

It is important to remember that at this time the Writer is processing one-third or less of the rows it is normally able to. It is also important to note that I set the flow control entry limit to 25, and the Gr3 (and Gr4) nodes still lag more than 1K entries behind.

To clarify, let’s check the two graphs below:

Using the Writer (Master) as a baseline in entry #N, without flow control, the nodes (slaves) using Group Replication start to significantly lag behind the writer (even with a light load).

The distance in this PoC ranged from very minimal (with 58 entries), up to much higher loads (3849 entries):

Using flow control, the Writer (Master) diverges less, as expected. But even though it takes a significant drop in performance (to one-third or less), the nodes still lag. The worst case is up to 1363 entries.

I need to underline here that we have no further way (that I am aware of, anyway) to tune the lag and prevent it from happening.

This means an application cannot transparently split writes/reads and expect consistency. The gap is too high.

A Graph That Tells Us a Story

I used Percona Monitoring and Management (PMM) to keep an eye on the nodes while doing the tests. One of the graphs really showed me that Group Replication still has some “limits” as the replication mechanism for a cluster:

This graph shows the MySQL queries executed on all the four nodes, in the testing using 8-50 threads-batch and flow control. 

As you can see, the Gr1 (Writer) is the first one to take off, followed by Gr2. Nodes Gr3 and Gr4 require a bit more time, given the binlog transmission (and 10ms delay). Once the data is there, they (inconsistently) match the Gr2 node. This is an effect of flow control asking the Master to slow down. But as previously seen, the nodes never match the Writer. When the load test is over, the nodes continue to process the queue for an additional ~130 seconds. Considering that the whole load takes 420 seconds on the Writer, this means that one-third of the total time on the Writer is spent syncing the slaves AFTERWARDS.

The above graph shows the same test without flow control. It is interesting to see how the Writer moved above 300 queries/sec, while Gr2 stayed around 200 and Gr3/4 far below. The Writer was able to process the whole load in ~120 seconds instead of 420, while Gr3/4 continued to process the load for an additional ~360 seconds.

This means that without flow control set, the nodes lag around 360 seconds behind the Master. With flow control set to 25, they lag 130 seconds.

This is a significant gap.

Conclusions

Going back to the reason I started this PoC: it looks like my application(s) are not a good fit for Group Replication, given that I use Percona XtraDB Cluster to scale out reads and to efficiently move my writer to another node when I need to.

Group Replication is still based on asynchronous replication (as my colleague Kenny said). It makes sense in many other cases, but it doesn’t compare to solutions based on virtually synchronous replication. It still requires a lot of refinement.

On the other hand, for applications that can afford a significant gap between writers and readers, it is probably fine. But … doesn’t standard replication already cover that?

Reviewing the Oracle documentation (https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html), I can see why Group Replication as part of the InnoDB cluster could help improve high availability when compared to standard replication.

But I also think it is important to understand that Group Replication (and derived solutions like InnoDB cluster) are not comparable to, or a replacement for, data-centric solutions such as Percona XtraDB Cluster. At least not yet.

Good MySQL to everyone.

Platform End of Life (EOL) Announcement for RHEL 5 and Ubuntu 12.04 LTS

Upstream platform vendors have announced the general end of life (EOL) for Red Hat Enterprise Linux 5 (RHEL 5) and its derivatives, as well as Ubuntu 12.04 LTS. With this announcement comes some implications to support for Percona software running on these operating systems.

RHEL 5 reached EOL on March 31st, 2017, and Ubuntu 12.04 LTS on April 28th, 2017. Pursuant to our end of life policies, we are announcing that these EOLs will go into effect for Percona software on August 1st, 2017. As of this date, we will no longer produce new packages, binary builds, hotfixes, or bug fixes for Percona software on these platforms.

We generally align our platform end of life dates with those of the upstream platform vendor. The platform end of life dates are published in advance on our website under the page Supported Linux Platforms and Versions.

Per our policies, Percona will continue to provide operational support for your databases on EOLed platforms. However, we will be unable to provide any bug fixes, builds or OS-level assistance if you encounter an issue outside the database itself.

Each platform vendor has a supported migration or upgrade path to their next major release.  Please reach out to us if you need assistance in migrating your database to your vendor’s supported platform – Percona will be happy to assist you.

Webinar Wednesday August 2, 2017: MySQL Disk Encryption with LUKS

Join Percona’s Senior Architect, Matthew Boehm, as he presents MySQL Disk Encryption with LUKS on Wednesday, August 2, 2017, at 1:00 pm PDT / 4:00 pm EDT (UTC-7).


Clients require strong security measures for PCI, HIPAA, or PHI. You must encrypt MySQL data “at rest” to satisfy these standards. InnoDB’s built-in encryption features work, but there are some caveats to that solution.

In this talk, you’ll see how to encrypt your entire disk to protect everything: data, redo logs, and binary logs.

Register for the webinar here.

Matthew Boehm, Architect

Matthew joined Percona in the fall of 2012 as a MySQL consultant. His areas of knowledge include the traditional Linux/Apache/MySQL/PHP stack, memcached, MySQL Cluster, massive sharding topologies, PHP development and a bit of MySQL-C-API development. Previously, Matthew DBAed for the 5th largest MySQL installation at eBay/PayPal, and also hails from managed hosting environments. During his off-hours, Matthew is a nationally-ranked competitive West Coast Swing dancer, and travels to competitions around the US. He enjoys working out, camping, biking and playing MMOs with his son.

How We Encrypt Data In MySQL With Go

A SaaS product needs to use security measures you might not ordinarily use in an on-premises solution. In particular, it’s important that all sensitive data be secured. Encryption plays an important role in information security. At VividCortex, we encrypt data in-flight and at-rest, so your sensitive data is never exposed.

We use Go and MySQL extensively at VividCortex and thought other Go programmers might be interested to see how we’ve integrated encryption into our services layer (APIs). (And if you'd like to learn more about programming with Go in general, please take a look at our free ebook The Ultimate Guide to Building Database-Driven Apps with Go.)



Encryption Techniques

At a high level, you can think of two kinds of data encryption inside of MySQL or any similar data store. I’ll oversimplify for purposes of illustration. You can:

  1. Store the data in MySQL as normal, but encrypt the container that holds MySQL. Usually this means storing MySQL’s data on an encrypted disk volume. The protection? Broadly speaking, if someone gains access to a backup disk, they can’t see your data.
  2. Encrypt the data before sending it to MySQL. In this case the security boundary is pushed out further: even if someone gets access to the server, and can run SQL commands, they can’t see your data.

Each of these has advantages and disadvantages. These include ease of use, programmer overhead, ability to inspect (e.g. recovering from backups), searchability and indexability, and so on. There are a lot of things to consider here. Just a few:

  • Will data be exposed if backups are unencrypted? (Our backups are encrypted, by the way.)
  • Are sensitive values possibly in cleartext in query logs?
  • Will sensitive values be visible in status commands like SHOW FULL PROCESSLIST?

At VividCortex we err on the side of safety and security, rather than favoring convenience. There’s a fairly simple question that does a pretty good job of illustrating our goal: if someone succeeds in a SQL injection attack against our databases, will they see any sensitive data in cleartext? The answer needs to be “no.” This is a higher standard than on-disk encryption. It means that someone has to get access to the keys for the particular data they’re trying to decrypt, in order to see anything other than garbage. And those keys are not present or accessible on the database servers in any form, not even in-memory.

Making It Convenient

Convenience is important. If it’s too hard to do encryption, there’s an increased risk that it won’t be done. Fortunately, Go’s elegant interfaces for the database/sql package make the burden transparent to the programmer!

We learned how to do this from Jason Moiron’s excellent blog post on the Valuer and Scanner interfaces. Please read that if you haven’t yet.

To implement transparent encryption and decryption, we created a custom data type that implements the Valuer and Scanner interfaces. The implementation is straightforward and quite similar to Jason’s example of compressing and decompressing, except that we used encryption libraries instead.

Now our code is incredibly simple to use with encrypted values. All we do is define a variable of our custom type. For example, instead of

var password string
err = rows.Scan(&password)

We simply use

var password EncryptedValue
err = rows.Scan(&password)

It’s similarly simple to insert values encrypted into the database. Magic! This is why I often say that Go’s design, although it seems minimalistic at first, is actually very advanced and powerful.
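
Here is a minimal sketch of what such a custom type can look like, assuming AES-GCM and a single hard-coded key (VividCortex's actual implementation and key management are not shown in this post, so every detail below is illustrative only):

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "database/sql/driver"
    "errors"
    "fmt"
)

// Demo key only; a real system would load keys from a key-management
// service, never from the database host.
var key = make([]byte, 32)

// EncryptedValue is plaintext inside the application but is stored
// encrypted in MySQL.
type EncryptedValue struct{ Plaintext []byte }

// Value implements driver.Valuer: it encrypts on the way into the database.
func (e EncryptedValue) Value() (driver.Value, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    // Stored blob = nonce || sealed payload.
    return gcm.Seal(nonce, nonce, e.Plaintext, nil), nil
}

// Scan implements sql.Scanner: it decrypts on the way out of the database.
func (e *EncryptedValue) Scan(src interface{}) error {
    raw, ok := src.([]byte)
    if !ok {
        return fmt.Errorf("EncryptedValue: unexpected type %T", src)
    }
    block, err := aes.NewCipher(key)
    if err != nil {
        return err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return err
    }
    if len(raw) < gcm.NonceSize() {
        return errors.New("EncryptedValue: ciphertext too short")
    }
    nonce, ct := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
    e.Plaintext, err = gcm.Open(nil, nonce, ct, nil)
    return err
}

func main() {
    v := EncryptedValue{Plaintext: []byte("s3cret")}
    blob, _ := v.Value()
    var out EncryptedValue
    if err := out.Scan(blob.([]byte)); err != nil {
        panic(err)
    }
    fmt.Printf("%s\n", out.Plaintext) // s3cret
}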

"Go feels under-engineered because it only solves real problems." Exactly. http://t.co/18LhLT0ALB #golang

— VividCortex (@VividCortex) September 18, 2014

Nuts And Bolts

The code is small. The exact details of all the code are not all that important for this blog post; much of it is about things that are out of scope here. The gist of it, though, is that we store values as byte arrays:

  • The first byte is an indicator of the version of our encryption algorithm used, so there’s a clear migration path for changes.
  • The next four bytes indicate which key we used to encrypt this value, so we have 4 billion possible keys.
  • The rest is the encrypted payload.

We can even change this in the future. For example, we can switch on the first byte’s value, if we want, to determine whether the key ID is in the next 4 bytes, or if it’s something more, such as the next 8 bytes. So we can easily expand the number of keys we can indicate. We can also, if we ever hit version 255, use that to indicate that the version number continues in the next byte. This is a standard trick used, among other places, by the MySQL wire protocol.

The result is that we have a simple and future-proof way to encrypt values.
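
As a quick, hedged illustration of that layout (the helper names and the big-endian byte order here are assumptions, not VividCortex's published code):

package main

import (
    "encoding/binary"
    "fmt"
)

// encode packs the envelope described above: 1 version byte,
// 4 bytes of key ID (so ~4 billion possible keys), then the payload.
func encode(version byte, keyID uint32, payload []byte) []byte {
    out := make([]byte, 5+len(payload))
    out[0] = version
    binary.BigEndian.PutUint32(out[1:5], keyID)
    copy(out[5:], payload)
    return out
}

// decode unpacks the same envelope.
func decode(blob []byte) (version byte, keyID uint32, payload []byte) {
    return blob[0], binary.BigEndian.Uint32(blob[1:5]), blob[5:]
}

func main() {
    blob := encode(1, 42, []byte("ciphertext..."))
    v, k, p := decode(blob)
    fmt.Println(v, k, string(p)) // 1 42 ciphertext...
}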

Alternative Approaches

In addition to the approaches we’ve mentioned, there are several others. There are commercial projects designed to help ease the encryption and decryption techniques you might otherwise wrap around MySQL and perhaps fumble in some ways. There are encryption functions inside of MySQL—but educate yourself about those before using them. There are others, too, but you should be able to find all you need with a search.

Conclusions

By using Go’s built-in interfaces, we created a solution for transparently encrypting values in our database so that it’s never in the database in cleartext, either on-disk or in-memory. The code is easy for programmers to use, which improves our security posture automatically. All sensitive data gets encrypted in-flight and at-rest, and an attacker would have to have extensive access to our systems (an SQL injection wouldn’t suffice) to be able to decrypt the data.

We highly recommend that you use the standard Go interfaces for the power they give you. And please, ask your SaaS providers, including us, hard questions about security and how it’s implemented. Every service needs to be secure to make the Internet a safer place.

If you liked this and want to learn more about Go, you might also like our webinar about developing database-driven apps with Go and MySQL. Click below to watch a recording.

Post Updated 7/31/2017

MySQL 8.0.2 More Flexible Undo Tablespace Management

In MySQL 8.0.2 DMR we will introduce features which make managing undo tablespaces easier in InnoDB.

The main improvement is that you can now create and drop undo tablespaces at any time.  You can change the config file setting before any startup, whether recovery is needed or not. …

What’s New in MariaDB Connector/C 3.0


We are pleased to announce the general availability (GA) of MariaDB Connector/C 3.0. MariaDB Connector/C 3.0.2 is the newest version of MariaDB Connector/C. This release is compatible with MariaDB Connector/C 2.3 - no code changes necessary to upgrade.

MariaDB Connector/C 3.0 includes new security enhancements, plugins and API functions.

Security

In addition to OpenSSL, MariaDB Connector/C 3.0 now supports:

  • GnuTLS
  • Windows SChannel: removes dependencies on external libraries
  • Windows SChannel: becomes the default for SSL on Windows
  • TLSv1.1 and TLSv1.2 support
  • Passphrase protected private keys

Plugins
  • All plugins can either be linked statically or built as shared objects (or dynamic link libraries on Windows)
  • Pluggable Virtual IO (PVIO) for communication via socket, named pipe and shared memory
  • Connection plugins, e.g for Aurora failover or replication (master write, slave read)
  • Remote IO plugin, which allows to access remote files (via http, https, ftp, ldap, ...)
  • Trace plugin (for analyzing and dumping network traffic)
  • New GSSAPI authentication plugin

New API Functions

MariaDB Connector/C 3.0 is introducing the following new API functions:

  • Bulk operations (array binding) for prepared statements (insert, update, delete).
  • Support for extended client/server capabilities (requires MariaDB 10.2 or newer)
  • mariadb_get_charset_by_name and mariadb_get_charset_by_nr, which return charset information for a given internal number or name of a character set. These functions were previously used internally by MariaDB Connector/ODBC and are now exported, so they can also be used within plugins.
  • mariadb_stmt_execute_direct prepares and executes in one step (mainly used by MariaDB ODBC driver)
  • mariadb_cancel aborts a connection immediately by making all subsequent read/write operations fail
  • mysql_get_option and mysql_get_optionv (variable argument list) for obtaining option values for a given connection.
  • mysql_reconnect, which was previously used internally (if the option MYSQL_OPT_RECONNECT was set), is now part of the API and can be used by applications and plugins to re-establish a failing connection
  • mysql_reset_connection resets the current connection and clears session state
  • mysql_stmt_warning_count returns warnings per statement
  • Functions for obtaining session state changes:
    • mysql_session_track_get_first
    • mysql_session_track_get_next
  • Added tls_version support for SChannel. tls_version has to be specified via mysql_options(mysql, MARIADB_OPT_TLS_VERSION, ...)

Download the MariaDB Connector now and learn about the newest evolution of MariaDB Connector/C 3.0.


Cassandra Query Language

After installing Cassandra and reading Cassandra: The Definitive Guide, it struck me that I should learn a bit more about the Cassandra Query Language (CQL). So, after I set up a single-node environment, I created a .bashcassandra environment file to connect as a student user to the Cassandra instance:

# Add the Java and JRE paths to the $PATH environments.
export set PATH=$PATH:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.45.x86_64:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.45.x86_64/jre

# Add the $JAVA_HOME and $JRE_HOME environment variables.
export set JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.45.x86_64/
export set JRE_HOME=/usr

Having started Cassandra as the cassandra user, I connected to the Cassandra Query Language Shell (cqlsh) to learn how to write CQL. You can find the basic structure of the Cassandra Query Language (CQL) on the Apache Cassandra website. I also discovered that CQL by itself can't join tables; for that you need Apache SparkSQL, which became available in 2015 and adds the ability to perform joins in Cassandra.

I also learned that you can't use a CREATE OR REPLACE command when you change certain aspects of User-Defined Functions (UDFs). You actually need to drop a UDF before changing its RETURNS NULL ON NULL INPUT clause to a CALLED ON NULL INPUT clause, or vice versa. You also can't embed Java that connects to the database without using the cassandra-java-driver-2.0.2 driver.

You connect to the cqlsh like this:

cqlsh

Here’s my script that creates a Cassandra keyspace, which is more or less a database. You use the USE command to connect to the keyspace or database, like you would in MySQL. You do not have sequences in Cassandra because they’re not a good fit for a distributed architecture, and Cassandra does not support a native procedural extension like relational databases do. You must create user-defined functions (UDFs) by embedding the logic in Java.

This script does the following:

  • Creates a keyspace
  • Uses the keyspace
  • Conditionally drops tables and functions
  • Creates two tables
  • Inserts data into the two tables
  • Queries data from the tables

I also included a call to a UDF inside a query in two of the examples. One of the queries demonstrates how to return a JSON structure from a query. To simplify things and clarify the script’s behavior, the details are outlined below.

  • The first segment of the script creates the keyspace, changes the scope to use the keyspace, conditionally drops tables, creates tables, and inserts values into the tables:

    /* Create a keyspace in Cassandra, which is like a database
       in MySQL or a schema in Oracle. */
    CREATE KEYSPACE IF NOT EXISTS student
    WITH REPLICATION = {
       'class':'SimpleStrategy'
      ,'replication_factor': 1 }
    AND DURABLE_WRITES = true;

    /* Use the keyspace or connect to the database. */
    USE student;

    /* Drop the member table from the student keyspace. */
    DROP TABLE IF EXISTS member;

    /* Create a member table in the student keyspace. */
    CREATE TABLE member
    ( member_number       VARCHAR
    , member_type         VARCHAR
    , credit_card_number  VARCHAR
    , credit_card_type    VARCHAR
    , PRIMARY KEY ( member_number ));

    /* Conditionally drop the contact table from the student keyspace. */
    DROP TABLE IF EXISTS contact;

    /* Create a contact table in the student keyspace. */
    CREATE TABLE contact
    ( contact_number      VARCHAR
    , contact_type        VARCHAR
    , first_name          VARCHAR
    , middle_name         VARCHAR
    , last_name           VARCHAR
    , member_number       VARCHAR
    , PRIMARY KEY ( contact_number ));

    /* Insert a row into the member table. */
    INSERT INTO member
    ( member_number, member_type, credit_card_number, credit_card_type )
    VALUES
    ('SFO-12345','GROUP','2222-4444-5555-6666','VISA');

    /* Insert a row into the contact table. */
    INSERT INTO contact
    ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
    VALUES
    ('CUS_00001','FAMILY','Barry', NULL,'Allen','SFO-12345');

    /* Insert a row into the contact table. */
    INSERT INTO contact
    ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
    VALUES
    ('CUS_00002','FAMILY','Iris', NULL,'West-Allen','SFO-12345');

    /* Insert a row into the member table. */
    INSERT INTO member
    ( member_number, member_type, credit_card_number, credit_card_type )
    VALUES
    ('SFO-12346','GROUP','3333-8888-9999-2222','VISA');

    /* Insert a row into the contact table. */
    INSERT INTO contact
    ( contact_number, contact_type, first_name, middle_name, last_name, member_number )
    VALUES
    ('CUS_00003','FAMILY','Caitlin','Marie','Snow','SFO-12346');
  • The following queries the member table:

    /* Select all columns from the member table. */
    SELECT * FROM member;

    It returns the following:

    member_number | credit_card_number  | credit_card_type | member_type
    --------------+---------------------+------------------+-------------
        SFO-12345 | 2222-4444-5555-6666 |             VISA |       GROUP
        SFO-12346 | 3333-8888-9999-2222 |             VISA |       GROUP
  • Create a concatenate user-defined function (UDF) for Cassandra. The first step requires you to edit the cassandra.yaml file, which you'll find in the /etc/cassandra/default.conf directory. There is a single parameter to edit: the enable_user_defined_functions parameter. By default the parameter is set to false, and you need to enable it before you can create UDFs.

    After you make the edit, the cassandra.yaml file should look like this:

    # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
    # INFO level
    # UDFs (user defined functions) are disabled by default.
    # As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code.
    enable_user_defined_functions: true

    After you make the change, you can create your own UDF. The following UDF formats the first, middle, and last name so there's only one space between the first and last name when the middle name value is null.

    This type of function must use a CALLED ON NULL INPUT clause in lieu of a RETURNS NULL ON NULL INPUT clause. The latter would force the function to return a null value if any one of the parameters were null.

    /* Drop the concatenate function because a replace disallows changing a
       RETURNS NULL ON NULL INPUT with a CALLED ON NULL INPUT without raising
       an "89: InvalidRequest" exception. */
    DROP FUNCTION concatenate;

    /* Create a user-defined function to concatenate names. */
    CREATE OR REPLACE FUNCTION concatenate
    ( first_name  VARCHAR
    , middle_name VARCHAR
    , last_name   VARCHAR )
    CALLED ON NULL INPUT
    RETURNS VARCHAR
    LANGUAGE java AS $$
      /* Concatenate first and last names when middle name is null,
         and first, middle, and last names when middle name is not null. */
      String name;

      /* Check for null middle name. */
      if (middle_name == null) {
        name = first_name + " " + last_name; }
      else {
        name = first_name + " " + middle_name + " " + last_name; }
      return name;
    $$;
  • Query the values from the contact table with the UDF function in the SELECT-list:

    /* Query the contact information. */
    SELECT member_number
    ,      contact_number
    ,      contact_type
    ,      concatenate(first_name, middle_name, last_name) AS full_name
    FROM   contact;

    It returns the following:

     member_number | contact_number | contact_type | full_name
    ---------------+----------------+--------------+--------------------
         SFO-12345 |      CUS_00001 |       FAMILY |        Barry Allen
         SFO-12345 |      CUS_00002 |       FAMILY |    Iris West-Allen
         SFO-12346 |      CUS_00003 |       FAMILY | Caitlin Marie Snow
  • Query the values from the contact table with a JSON format:

    /* Query the contact information and return in a JSON format. */
    SELECT JSON contact_number
    ,           contact_type
    ,           concatenate(first_name, middle_name, last_name) AS full_name
    FROM   contact;

    It returns the following:

    [json]
    -------------------------------------------------------------------------------------------------
    {"contact_number": "CUS_00001", "contact_type": "FAMILY", "full_name": "Barry Allen"}
    {"contact_number": "CUS_00002", "contact_type": "FAMILY", "full_name": "Iris West-Allen"}
    {"contact_number": "CUS_00003", "contact_type": "FAMILY", "full_name": "Caitlin Marie Snow"}

You can call the script from a relative directory inside cqlsh, like this:

source 'cstudent.cql'
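Equivalently, you can run the script non-interactively by passing the file to cqlsh from the operating-system shell:

cqlsh -f cstudent.cql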

At the end of the day, the concept of adding and removing nodes is attractive. However, the lack of normal relational mechanics and the narrowly supported set of CQL semantics leave me with open questions. For example, is clustering without a coordinator really valuable enough to settle for eventual, or tunable, consistency with such a narrowly scoped query language?

As always, I hope this helps those looking for a quick how-to on Cassandra.

How to Find Processlist Thread id in gdb

I was involved in a discussion on a complex MySQL-related problem where we had to study the backtraces of all threads in gdb (produced by the thread apply all bt command, if you ever forget it) in the hope of finding out why MySQL hangs. In the process, the question came up of how to find the thread id for each thread, to match it against previously collected outputs of SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS.

I assumed I knew the answer, as I had found this out recently enough for this blog post (and before that for a real customer case). The idea is simple: find a frame where the function has a parameter of THD * type (usually named thd), like this:

#10 0x0000000000cb47fe in do_command (thd=0x7f32512b7000)
    at /usr/src/debug/percona-server-5.7.18-15/percona-server-5.7.18-15/sql/sql_parse.cc:960

and check the thread_id item of this structure.

In that blog post it looked as simple as referring to thd in do_command's frame, without even checking much:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f7f5ce02b00 (LWP 9232))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
238     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) p do_command::thd->thread_id
$9 = 14

I prefer to double check my suggestions before making them, so I immediately tried this on my CentOS 6.9 VM, which has run a recent Percona Server 5.7.x by default since the times when I worked at Percona:
[root@centos ~]# gdb -p `pidof mysqld`
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
...
Loaded symbols for /usr/lib64/mysql/plugin/tokudb_backup.so
0x00007f550ad35383 in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.el6_9.2.x86_64 jemalloc-3.6.0-1.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libcom_err-1.41.12-23.el6.x86_64 libgcc-4.4.7-18.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 libstdc++-4.4.7-18.el6.x86_64 nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64 numactl-2.0.9-2.el6.x86_64 openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f550d2b2820 (LWP 1978))]#0  0x00007f550ad35383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->thread_id
No frame is currently executing in block do_command(THD*).
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f54d837b700 (LWP 2183))]#0  0x00007f550ad35383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->thread_id
Cannot take address of method thread_id.
(gdb) call do_command::thd->thread_id()
Cannot evaluate function -- may be inlined

As you can see, I started to check threads one by one and apply that good old trick. Thread 1 had no frame executing do_command(), but I did not give up and proceeded to the next thread, as I knew I had at least one active connection (I had checked the output of SHOW PROCESSLIST). There I got a surprise: there was no way to get the thread_id of thd. I used tab completion, so I knew that thread_id (a variable or method) exists, but the attempt to call it also failed, as you can see.

This is the problem with using gdb-based "tricks" over code that evolves and changes over time. The last time I used p do_command::thd->thread_id, it was probably for MariaDB 10.1.x, and the item was there. But MySQL 5.7 (and all forks based on it) brought many code changes, so we should be ready for changes in unexpected places.

I did not add more comments on finding the thread id to that discussion; I made a note to myself and then, later, decided to check the source code of MySQL 5.7 (I did not have the Percona Server 5.7 source at hand, but they hardly differ in such basic details) to find out what had changed in the THD structure so that thread_id is no longer just a variable. I expected to see the structure defined in sql/sql_class.h as in the past, but grep would find it even if that's no longer the case:
[root@centos mysql-server]# grep -n "class THD" sql/*.h
sql/debug_sync.h:27:class THD;
sql/derror.h:24:class THD;
sql/event_data_objects.h:40:class THD;
...
sql/sql_class.h:1412:class THD :public MDL_context_owner,
sql/sql_class.h:4175:    raise_error() or raise_warning() methods provided by class THD.
sql/sql_cmd.h:25:class THD;
...

I found the following there:
class THD :public MDL_context_owner,
           public Query_arena,
           public Open_tables_state
{
...
private:
  my_thread_id  m_thread_id;
public:
...
  /**
    Assign a value to m_thread_id by calling
    Global_THD_manager::get_new_thread_id().
  */
  void set_new_thread_id();
  my_thread_id thread_id() const { return m_thread_id; }
...

So, in MySQL 5.7 thread_id() is, indeed, a method that was inlined, and essentially it returns the private m_thread_id item. Benefits of C++... I have also highlighted the Global_THD_manager singleton, because during my later gdb sessions I found out that the simple global list of threads is gone as well: in 5.7 everything is done via that Global_THD_manager. That is a topic for some other post, though.

At least now I know what to do in gdb:
...
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f54d8236700 (LWP 2275))]#0  0x00007f550ad35383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->m_thread_id
$1 = 86
(gdb) p do_command::thd->m_main_security_ctx
$3 = {m_user = {m_ptr = 0x7f5500fdaf90 "myuser", m_length = 6,
    m_charset = 0x1ded640, m_alloced_length = 8, m_is_alloced = true},
  m_host = {m_ptr = 0x7f54d98ab090 "localhost", m_length = 9,
    m_charset = 0x1ded640, m_alloced_length = 16, m_is_alloced = true},
  m_ip = {m_ptr = 0x7f54f0eb0210 "127.0.0.1", m_length = 9,
    m_charset = 0x1ded640, m_alloced_length = 16, m_is_alloced = true},
  m_host_or_ip = {m_ptr = 0x7f54d98ab090 "localhost", m_length = 9,
    m_charset = 0x1ded640, m_alloced_length = 0, m_is_alloced = false},
  m_external_user = {m_ptr = 0x15167ab "", m_length = 0,
    m_charset = 0x1ded640, m_alloced_length = 0, m_is_alloced = false},
  m_priv_user = "myuser", '\000' <repeats 89 times>, m_priv_user_length = 6,
  m_proxy_user = '\000' <repeats 161 times>, m_proxy_user_length = 0,
  m_priv_host = "localhost", '\000' <repeats 51 times>,
  m_priv_host_length = 9, m_master_access = 1589248, m_db_access = 0,
  m_password_expired = false}
...So, I know that specific thread  7 was for a session with Id 86 in the output of SHOW PROCESSLIST, and (from m_main_security_ctx, also a new name for old things in 5.7) I know it was a session of myuser connecting locally.

To summarize, there were notable changes in MySQL 5.7 in the THD structure and in thread-management code in general, so make sure to re-check your "old gdb tricks" when you start working with 5.7. Reading the code helps.

Unfortunately (for gdb beginners like me), a lot of C++ approaches were introduced, including singletons, template-based iterators instead of simple doubly linked lists, and so on, so one has to work hard to adapt to them. I hope to discuss some of my further findings and new "C++-specific" and "MySQL 5.7-specific" approaches to studying MySQL in gdb in upcoming posts.

Percona Server for MySQL 5.7.18-16 Is Now Available

Percona is glad to announce the GA release of Percona Server for MySQL 5.7.18-16 on July 28, 2017 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.7.18, and including all the bug fixes in it, Percona Server for MySQL 5.7.18-16 is the current GA release in the Percona Server for MySQL 5.7 series. All of Percona's software is open source and free, and you can find all the release details in the 5.7.18-16 milestone on Launchpad.

Please note that RHEL 5, CentOS 5 and Ubuntu 12.04 and older are not supported in future releases of Percona Server, and no further packages will be added for these distributions.

New Features:

  • Percona Server for MySQL is now available on Debian 9 (stretch). The support only covers the amd64 architecture.
  • Percona Server for MySQL can now be built with the support of OpenSSL 1.1.
  • MyRocks storage engine has been merged into Percona Server.
  • TokuDB now supports killing a query that is waiting on an FT locktree lock.
  • TokuDB now supports using the MySQL DEBUG_SYNC facility within Percona FT.

Bugs Fixed:

  • Row counts in TokuDB could be lost intermittently after restarts. Bug fixed #2.
  • In TokuDB, two races in the fractal tree lock manager could significantly affect transactional throughput for some applications that used a small number of concurrent transactions. These races manifested as transactions unnecessarily waiting for an available lock. Bug fixed #3.
  • Percona FT could assert when opening a dictionary with no useful information to an error log. Bug fixed #23.
  • Percona FT could assert for various reasons deserializing nodes with no useful error output. Bug fixed #24.
  • It was not possible to build Percona Server on Debian 9 (stretch) due to issues with OpenSSL 1.1. Bug fixed #1702903 (upstream #83814).
  • Packaging was using the dpkg --verify command which is not available on wheezy/precise. Bug fixed #1694907.
  • Enabling and disabling the slow query log rotation spuriously added the version suffix to the next slow query log file name. Bug fixed #1704056.
  • With two client connections to a server (debug server build), the server could crash after one of the clients set the global option userstat and flushed the client statistics (FLUSH CLIENT_STATISTICS) and then both clients were closed. Bug fixed #1661488.
  • Percona FT did not pass cmake flags on to snappy cmake. Bug fixed #41.
  • The progress status for partitioned TokuDB table ALTERs was misleading. Bug fixed #42.
  • When a client application connected to the Aurora cluster endpoint using SSL (the --ssl-verify-server-cert or --ssl-mode=VERIFY_IDENTITY option), wildcard and SAN enabled SSL certificates were ignored. Note that the --ssl-verify-server-cert option is deprecated in Percona Server 5.7. Bug fixed #1673656 (upstream #68052).
  • Killing a stored procedure execution could result in an assert failure on a debug server build. Bug fixed #1689736 (upstream #86260).
  • The SET STATEMENT .. FOR statement changed the global instead of the session value of a variable if the statement occurred immediately after the SET GLOBAL or SHOW GLOBAL STATUS command. Bug fixed #1385352.
  • When running SHOW ENGINE INNODB STATUS, the Buffer pool size, bytes entry contained 0. Bug fixed #1586262.
  • The synchronization between the LRU manager and page cleaner threads was not done at shutdown. Bug fixed #1689552.
  • Spurious lock_wait_timeout_thread wakeup in lock_wait_suspend_thread() could occur. Bug fixed #1704267 (upstream #72123).

Other bugs fixed: #1686603, #6, #44, #65, #1160986, #1686934, #1688319, #1689989, #1690012, #1691682, #1697700, #1699788, #1121072, and #1684601 (upstream #86016).

The release notes for Percona Server for MySQL 5.7.18-16 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Note

Due to a new package dependency, Ubuntu/Debian users should use apt-get dist-upgrade or apt-get install percona-server-server-5.7 to upgrade.

Node.js and MySQL on the Oracle Cloud

Let's explore how to deploy a node.js app with a MySQL backend on the Oracle Cloud. I want to cover several aspects:
  • How to deploy and initialize a MySQL instance
  • How to adapt the source code
  • How to deploy the application
  • How to scale the application
There are different ways to configure this. I tested the easiest deployment with MySQL Cloud Service and the Application Container Cloud for the node.js part. All configurations are done via the cloud web GUI. There is also a REST interface available. But let's keep that for later.
If you don't have access to the Oracle Cloud you can get a trial access here.
How to deploy a MySQL instance

Once you have logged into the Oracle Cloud you can create new instances from the dashboard. The following screenshots describe the process.


On the next screen we upload the public key for admin access to the instance. Either upload your own public key or generate a new key pair. (If you generate a new key pair you need to download the private key to your local machine.)

I skipped the backup and monitoring configurations for this demo. Let's focus on the application instead. After creating the instance (approx. 10 min) you can navigate via the dashboard to this instance and get the IP address. This is needed for the next step.
To initialize the database I ran a little script that uses ssh to reach the instance (with the private key), switches user to "oracle" and then calls the MySQL CLI to run a few SQL statements.
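A minimal sketch of such a script (the IP address, key path, password and init.sql contents are illustrative assumptions, not the original):

#!/bin/bash
# Sketch of the init script: adjust host, key and credentials to your instance.
HOST=203.0.113.10                     # public IP shown on the instance page
KEY=$HOME/.ssh/mysqlcs_private_key    # private key from the provisioning step

# init.sql holds the schema bootstrap, e.g. CREATE DATABASE / CREATE TABLE.
# The SQL is piped over ssh into the MySQL CLI running as the oracle user.
ssh -i "$KEY" opc@"$HOST" 'sudo su - oracle -c "mysql -u root -pMySecretPwd"' < init.sql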

How to adapt the source code
    The Application Container Cloud sets a few environment variables that should be used inside the application to adapt to the environment. In my case these are the following variables:
    • PORT is the port number that the application should listen on
    • MYSQLCS_USER_NAME is the MySQL user name for the database backend
    • MYSQLCS_USER_PASSWORD is the corresponding password
    • MYSQLCS_CONNECT_STRING is the hostname and port of the database backend
    I could have hardcoded the database connection parameters, but that is inflexible if the architecture changes. So let's use these variables.
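    The Node.js code, reconstructed here as a minimal sketch (it assumes the npm mysql driver; the actual listing may differ), reads the variables like this:

    // Minimal sketch: read the ACC-provided environment variables and
    // serve one page backed by MySQL (assumes the "mysql" npm package).
    const http  = require('http');
    const mysql = require('mysql');

    // MYSQLCS_CONNECT_STRING has the form "host:port"
    const [dbHost, dbPort] = process.env.MYSQLCS_CONNECT_STRING.split(':');

    const pool = mysql.createPool({
      host:     dbHost,
      port:     dbPort,
      user:     process.env.MYSQLCS_USER_NAME,
      password: process.env.MYSQLCS_USER_PASSWORD
    });

    http.createServer((req, res) => {
      pool.query('SELECT NOW() AS now', (err, rows) => {
        if (err) { res.writeHead(500); return res.end(err.message); }
        res.end('DB time: ' + rows[0].now);
      });
    }).listen(process.env.PORT || 8080);  // PORT is set by the platform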

    How to deploy the application

    There are two simple steps needed: creating an application and defining service bindings. In my case the only service binding is the MySQL backend. But one step after the other. First let's create the application. You need a manifest.json file to describe it.
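    A minimal manifest.json for a Node.js application might look like this (the runtime version and start command are assumptions for illustration):

    {
      "runtime": { "majorVersion": "6.10" },
      "command": "node server.js"
    }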
    Ideally you create a zip archive with the source code, the manifest.json file and all other resources that your application needs. If you want to use my zip archive, feel free. You find it on GitHub. From the Oracle Cloud Dashboard click on "create instance -> application container" and then select "Create Instance" and "Node.js". (Java SE, Java EE, Python, Ruby and PHP are available as well.)

    On the pop-up you define the application artifacts, number of application instances and the memory per instance. After you click "create" the application is deployed automatically within a few minutes.
    The last step is to connect the application service with the database backend. To achieve that, click on the application in the application overview page. Here you find the URL under which your application will be available. And on the left hand side you see three tabs:

    Overview, Deployments and Administration. Click on "Deployments". Here you can add the service binding as described in the following screenshot:

    After modifying the service bindings you have to click "Apply changes". This will restart the application instances. Obviously needed because now the environment variables for the database backend are set correctly.
    That's it. We have an application. The URL to access the new app is listed in the application overview tab. Because this URL is not so nice for offering a short-url service, I registered a new domain and forwarded it to the anota application. Maybe it is still running? Check here.

    How to scale the application

    This is really easy. On the application overview tab you can just increase the number of instances and the memory per instance. After applying the changes, the Application Container Cloud platform will deploy new instances, stop spare instances or reconfigure the existing ones. If you use my ANOTA application, go to the report page. The last line prints the hostname of the application server. Requests are automatically load balanced between the available application instances.

    Summary

    There are some minor changes needed for an application to run on the Oracle Cloud Platform: reading the port and database connection parameters from the provided environment variables, and that's it. Deployment is really easy via the GUI. And scalability is simple now that the full Oracle Cloud Platform is available and can be provisioned within minutes.

    What is MySQL Partitioning?

    In this blog, we’ll quickly look at MySQL partitioning.

    Partitioning is a way in which a database (MySQL in this case) splits its actual data into separate tables that are still treated as a single table by the SQL layer.

    When partitioning, it's a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE and DELETE statements should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather and queries ALL partitions in a UNION that is not concurrent.

    Generally, you must add the partition key into the primary key along with the auto increment, i.e., PRIMARY KEY (part_id,id). If you don’t have well-designed and small columns for this composite primary key, it could enlarge all of your secondary indexes.

    You can partition by range or hash. Range is great because you have groups of known IDs in each table, and it helps when querying across partition IDs. This still can create hotspots in the newest partition, as all new inserts go there. Partitioning by hash “load balances” the table, and allows you to write to partitions more concurrently. This makes range queries on the partition key a bad idea.
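    As an illustration (the table and column names here are hypothetical, not from the post), a range-partitioned table and a pruned query might look like this:

    -- Range partitioning by year; the partition key is part of the
    -- primary key alongside the auto-increment column.
    CREATE TABLE orders (
      id         BIGINT NOT NULL AUTO_INCREMENT,
      order_date DATE   NOT NULL,
      amount     DECIMAL(10,2),
      PRIMARY KEY (id, order_date)
    )
    PARTITION BY RANGE ( YEAR(order_date) ) (
      PARTITION p2016 VALUES LESS THAN (2017),
      PARTITION p2017 VALUES LESS THAN (2018),
      PARTITION pmax  VALUES LESS THAN MAXVALUE
    );

    -- Including the partition key in WHERE lets the optimizer prune;
    -- EXPLAIN's "partitions" column shows which partitions are hit.
    EXPLAIN SELECT COUNT(*)
    FROM orders
    WHERE order_date BETWEEN '2017-01-01' AND '2017-06-30';

    -- The hash alternative spreads inserts across partitions instead:
    --   PARTITION BY HASH (id) PARTITIONS 8;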

    In MySQL 5.7, partitioning became native to the storage engine, deprecating the old method where the MySQL server itself had to handle the partitions. This means InnoDB partitions (and a larger number of partitions) are a better choice than in the past.

    As with all features and recommendations, this only makes sense if it helps your data and workload!

    MySQL Group Replication: who is the primary master – updated!

    Some time ago I wrote this article explaining how to find which node was the Primary Master in a MySQL Group Replication cluster running in Single-Primary Mode.

    In the latest release of Group Replication, MySQL 8.0.2 DMR, Jaideep improved the visibility of Group Replication by extending the performance_schema tables (see his article).

    Thanks to these improvements, it's now very easy to find which host is acting as the Primary Master. This is the query you can use:

    mysql-sql> SELECT MEMBER_HOST as `PRIMARY` FROM performance_schema.replication_group_members WHERE MEMBER_ROLE='PRIMARY';
    +---------+
    | PRIMARY |
    +---------+
    | mysql3  |
    +---------+
    1 row in set (0.00 sec)

    Much easier, isn't it?

    I'll let you discover the other improvements and features in MySQL 8.0.2, and don't forget that your feedback is very important!

    High Availability with Multi-Source Replication in MariaDB Server

    gerrynarvaja | Wed, 07/26/2017 - 19:29

    Multi-source replication is a handy way to simplify your high availability database setup – especially as compared to the regular MySQL replication approach. Let's look at how it works.

    (Note: This post is a revision of one from 2014, with minor updates made.)

    The Basics

    For our example scenario, we'll use a three-server topology, which is commonly employed for simple failover. (I used the same setup in my post on how to enable GTIDs in MariaDB Server.) A is the active master, B is a standby master set up to replicate from A, and C is a multi-purpose slave replicating from A. I also set up A to replicate from B. This way, if A fails, the system will start writing to B, and once A comes back up it will recover all the transactions from B through replication. Typically this setup is used in combination with a tool like Master High Availability Manager (MHA) to migrate the slave from one master to the next.

    For this particular case, assuming that the application(s) write to the active master only, we're going to set the domain ID for all servers to 1.

    SET GLOBAL gtid_domain_id = 1;

    We'll do the same in the my.cnf files:

    gtid-domain-id=1

    Set Up the Slave to Replicate from Two Masters

    With the basics taken care of, we now will set up server C to replicate from both server A (active master) and server B (standby master). Keep in mind that now each transaction that reaches server C will come from two different sources. For example, if you issue an INSERT on the active master A, it will replicate to C and B. Since C is also B's slave, the same transaction will replicate from B to C. By default, C will apply both transactions, which is redundant and can lead to data inconsistencies and errors. To avoid this, it is necessary to set the gtid_ignore_duplicates variable to ON.

    SET GLOBAL gtid_ignore_duplicates=ON;

    And do the same in the my.cnf files:

    gtid-ignore-duplicates=ON

    This way, when C receives the same transaction through two different slave connections, it will verify the GTID and only apply it once.

    The next step is to set the second slave connection. In my previous article, I set up the gtid_slave_pos global variable. Since server C is already replicating successfully from server A, this variable already has the proper value and there is no need to set it up manually. We only need to define the new connection using CHANGE MASTER TO:

    CHANGE MASTER 's-102' TO
      MASTER_HOST='192.168.56.102',
      MASTER_USER='repl',
      MASTER_PASSWORD='repl',
      master_use_gtid=slave_pos;

    Notice that we add the label 's-102' after the MASTER reserved word. This string identifies the second replication connection. This identifier needs to be used for every replication command. So, to start replicating from server B, you would use START SLAVE like this:

    START SLAVE 's-102';

    In the same way, to check the status of this replication connection you would use:

    SHOW SLAVE 's-102' STATUS\G

    You can also issue the commands to all replication connections by using the keyword ALL; for example:

    SHOW ALL SLAVES STATUS\G
    STOP ALL SLAVES;
    START ALL SLAVES;

    Notice that when using the keyword ALL, you also have to use SLAVES (the plural form). The STOP and START commands will show as many warnings as there are replication connections:

    MariaDB [(none)]> START ALL SLAVES;
    Query OK, 0 rows affected, 2 warnings (0.02 sec)

    MariaDB [(none)]> SHOW WARNINGS;
    +-------+------+-----------------------+
    | Level | Code | Message               |
    +-------+------+-----------------------+
    | Note  | 1937 | SLAVE 's-102' started |
    | Note  | 1937 | SLAVE '' started      |
    +-------+------+-----------------------+
    2 rows in set (0.00 sec)

    After completing the setup, the resulting topology will look similar to this:

    +-----+         +-----+
    |  A  |         |  B  |
    +-----+         +-----+
       |               |
        \               /
         \             /
          V           V
            +-----+
            |  C  |
            +-----+

    Failures

    If server A goes away or the connection between server A and server C fails, server C will still receive all the transactions through the alternate replication connection to server B (labeled 's-102'). Once server A comes back online again, and/or server C can reconnect to it, server C will compare all the GTIDs coming from server A and ignore those which have already been applied.

    The advantage of using this approach over traditional single-source replication is that there is no need to re-source the slaves. When the active master fails and the applications start writing to the alternative master, the slaves will continue to replicate from it, simplifying the management of failover scenarios.



    Enabling GTIDs for Server Replication in MariaDB Server 10.2

    gerrynarvaja | Wed, 07/26/2017 - 18:56

    I originally wrote this post in 2014, after the release of MariaDB Server 10.0. Most of what was in that original post still applies, but I've made some tweaks and updates since replication and high availability (HA) remain among the most popular MariaDB/MySQL features.

    Replication first appeared on the MySQL scene more than a decade ago, and as replication implementations became more complex over time, some limitations of MySQL’s original replication mechanisms started to surface. To address those limitations, MySQL v5.6 introduced the concept of global transaction identifiers (GTIDs), which enable some advanced replication features. MySQL DBAs were happy with this, but complained that in order to implement GTIDs you needed to stop all the servers in the replication group and restart them with the feature enabled. There are workarounds – for instance, Booking.com documented a procedure to enable GTIDs with little or no downtime, but it involves more complexity than most organizations are willing to allow. (Check out this blog post for more on how Booking.com handles replication and high availability.)

    MariaDB implements GTIDs differently from MySQL, making it possible to enable and disable them with no downtime. Here’s how.

    A Simple HA Implementation

    Let’s start with a common HA implementation, with three servers running MariaDB 10.0 or higher: One active master (server A), one passive master (server B), and a slave replicating from the active master (server C). The active and passive masters are set up to replicate master-master.

    I’m not showing it, but between servers A and B and the application you would want an additional layer to switch database traffic to server B in case A fails. Some organizations might deploy another slave replicating from B, or a mechanism to move server C to replicate from B, such as Master High Availability Manager (MHA), but let’s keep things simple here.

    Step 1 – Setting Up the Configuration Files

    GTIDs in MariaDB 10 have three parts: server ID, transaction ID, and domain ID. The server ID and transaction ID are similar in concept to those found in MySQL 5.6. In MariaDB, the server ID is a number and not a UUID, and it is taken from the server_id global variable. The domain ID is an important new concept for multisource replication, which you can read more about in the domain ID article in the MariaDB knowledgebase. In our case, this is the only variable we need to set up; the server ID should already be set up if replication is functional. Let’s use 1 for server A's domain ID, 2 for server B's and 3 for server C's by executing commands like the following on each of the servers:

    SET GLOBAL gtid_domain_id = 1;

    Keep in mind that each session can have its own value for gtid_domain_id, so you'll have to reset all existing connections to properly reset the gtid_domain_id. Finally, persist the values in the corresponding my.cnf files:

    # Domain = 1 for active master: server A
    gtid-domain-id=1

    Step 2 – Changing Replication on the Slave

    Running SHOW MASTER STATUS on server A, the active master, shows the current coordinates for its binary log file:

    MariaDB [(none)]> SHOW MASTER STATUS\G
    *************************** 1. row ***************************
                File: mariadb-bin.000001
            Position: 510
        Binlog_Do_DB:
    Binlog_Ignore_DB:
    1 row in set (0.00 sec)

    We can use these coordinates to find the information we need in order to use GTIDs from the active master by running this command:

    SELECT BINLOG_GTID_POS('mariadb-bin.000001', 510);
    +--------------------------------------------+
    | BINLOG_GTID_POS('mariadb-bin.000001', 510) |
    +--------------------------------------------+
    | 1-101-1                                    |
    +--------------------------------------------+
    1 row in set (0.00 sec)

    Note that the GTID can be an empty string; for clarity these examples work with non-empty GTID values. The result from the function call is the current GTID, which corresponds to the binary file position on the master. With this value, we can now modify the slave configuration on servers B and C, executing the following statements on each of them:

    STOP SLAVE;
    SET GLOBAL gtid_slave_pos = '1-101-1';
    CHANGE MASTER TO master_use_gtid=slave_pos;
    START SLAVE;

    Check the slave status to see that the change has taken effect:

    MariaDB [mysql]> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.56.101
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mariadb-bin.000001
              Read_Master_Log_Pos: 510
                   Relay_Log_File: mysqld-relay-bin.000002
                    Relay_Log_Pos: 642
            Relay_Master_Log_File: mariadb-bin.000001
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    ...
                       Using_Gtid: Slave_Pos
                      Gtid_IO_Pos: 1-101-1
    1 row in set (0.00 sec)

    The last two lines of SHOW SLAVE STATUS indicate that the slave is now using GTIDs to track replication.

    Conclusion

    As you can see, the procedure to enable GTIDs is straightforward and doesn't require restarting servers or planning for downtime. If you want to revert to regular replication using the binary log position, you can do so by using RESET SLAVE on the slave and resetting the proper binary log coordinates the traditional way (a sketch follows after the binlog excerpt below). In fact, once you update your servers to MariaDB Server and review the binary log files with the mysqlbinlog utility, you'll notice that every transaction in the MariaDB binary logs already has the GTID included. For the binary log used in this article's examples, here's what you see:

    # at 314
    #140807 14:16:01 server id 101  end_log_pos 352   GTID 1-101-1
    /*!100001 SET @@session.gtid_domain_id=1*//*!*/;
    /*!100001 SET @@session.server_id=101*//*!*/;
    /*!100001 SET @@session.gtid_seq_no=1*//*!*/;
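    As promised, here is a sketch of reverting a slave to binlog-position replication (the coordinates are the ones from the examples above; take them from your own SHOW SLAVE STATUS output in practice):

    STOP SLAVE;
    RESET SLAVE;
    CHANGE MASTER TO
      MASTER_HOST='192.168.56.101',
      MASTER_LOG_FILE='mariadb-bin.000001',
      MASTER_LOG_POS=510,
      master_use_gtid=no;
    START SLAVE;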

    I hope that the ease of implementing GTIDs in MariaDB Server piques your curiosity and encourages you to explore the variety of replication features. For more on replication and other high availability/disaster recovery strategies, check out our white paper, High Availability with MariaDB TX: The Definitive Guide.



    MariaDB Galera Cluster 5.5.57 and Connector/C 3.0.2 now available

    The MariaDB project is pleased to announce the availability of MariaDB Galera Cluster 5.5.57 as well as MariaDB Connector/C 3.0.2. See the release notes and changelogs for details.

      • Download MariaDB Galera Cluster 5.5.57
      • Release Notes
      • Changelog
      • What is MariaDB Galera Cluster?
      • MariaDB APT and YUM Repository Configuration Generator
      • Download MariaDB Connector/C 3.0.2
      • Release Notes
      • Changelog

    […]

    The post MariaDB Galera Cluster 5.5.57 and Connector/C 3.0.2 now available appeared first on MariaDB.org.

    What is innodb_autoinc_lock_mode and why should I care?

    In this blog post, we’ll look at what innodb_autoinc_lock_mode is and how it works.

    I was recently discussing innodb_autoinc_lock_mode with some colleagues to address issues at a company I was working with.

    This variable defines the lock mode to use for generating auto-increment values. The permissible values are 0, 1 or 2 (for “traditional”, “consecutive” or “interleaved” lock mode, respectively). In most cases, this variable is set to the default of 1.

    We recommend setting it to 2 when BINLOG_FORMAT=ROW. With the interleaved mode, INSERT statements don't use the table-level AUTO-INC lock, so multiple statements can execute at the same time. Setting it to 0 or 1 can cause a huge hit in concurrency for certain workloads.
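    A my.cnf fragment along these lines would do it (a sketch; note that innodb_autoinc_lock_mode is not dynamic, so the server must be restarted for it to take effect):

    [mysqld]
    binlog_format            = ROW
    innodb_autoinc_lock_mode = 2   # interleaved

    You can then confirm the running value with SELECT @@innodb_autoinc_lock_mode;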

    Interleaved (or 2) is the fastest and most scalable lock mode, but it is not safe with STATEMENT-based replication, or in recovery scenarios where SQL statements are replayed from the binary log. Another consideration is that with lock mode 2 the IDs might not be consecutive (something you shouldn't rely on anyway). That means you could do three inserts and expect IDs 100, 101 and 102, but end up with 100, 102 and 104. For most people, this isn't a huge deal.

    If you are only doing simple inserts, this might not help you. I did a sysbench test on MySQL 5.7 in Amazon RDS with 100 threads and found no difference in performance or throughput between lock modes 1 and 2. It helps the most when the number of rows to be inserted can't be determined in advance, such as with INSERT INTO ... SELECT statements.
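    The exact invocation isn't given here; a comparable insert-heavy test with sysbench 1.0 might look like this (the RDS endpoint and credentials are placeholders):

    # Run "prepare" with the same options first to create the test tables.
    sysbench oltp_insert \
      --mysql-host=myinstance.abc123.us-east-1.rds.amazonaws.com \
      --mysql-user=sbtest --mysql-password=secret \
      --tables=8 --table-size=1000000 \
      --threads=100 --time=300 run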

    You can find a longer-form discussion in the manual, but I highly recommend setting this value to 2 if you are not using STATEMENT-based replication.
