Planet MySQL

Charset and Collation Settings Impact on MySQL Performance

Following my post MySQL 8 is not always faster than MySQL 5.7, this time I decided to test very simple read-only, CPU-intensive workloads where all data fits in memory. In these workloads there are NO IO operations, only memory and CPU operations.

My Testing Setup

Environment specification

  • Release | Ubuntu 18.04 LTS (bionic)
  • Kernel | 4.15.0-20-generic
  • Processors | physical = 2, cores = 28, virtual = 56, hyperthreading = yes
  • Models | 56x Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz
  • Memory Total | 376.6G
  • Provider | packet.net x2.xlarge.x86 instance

I will test two workloads, sysbench oltp_read_only and oltp_point_select, varying the number of threads:

sysbench oltp_read_only --mysql-ssl=off --report-interval=1 --time=300 --threads=$i --tables=10 --table-size=10000000 --mysql-user=root run

sysbench oltp_point_select --mysql-ssl=off --report-interval=1 --time=300 --threads=$i --tables=10 --table-size=10000000 --mysql-user=root run

The results for OLTP read-only (latin1 character set):

threads   MySQL 5.7.25 throughput   MySQL 8.0.15 throughput   throughput ratio (5.7/8.0)
1         1241.18                   1114.4                    1.11
4         4578.18                   4106.69                   1.11
16        15763.64                  14303.54                  1.10
24        21384.57                  19472.89                  1.10
32        25081.17                  22897.04                  1.10
48        32363.27                  29600.26                  1.09
64        39629.09                  35585.88                  1.11
128       38448.23                  34718.42                  1.11
256       36306.44                  32798.12                  1.11

The results for point_select (latin1 character set):

threads   MySQL 5.7.25 throughput   MySQL 8.0.15 throughput   throughput ratio (5.7/8.0)
1         31672.52                  28344.25                  1.12
4         110650.7                  98296.46                  1.13
16        390165.41                 347026.49                 1.12
24        534454.55                 474024.56                 1.13
32        620402.74                 554524.73                 1.12
48        806367.3                  718350.87                 1.12
64        1120586.03                972366.59                 1.15
128       1108638.47                960015.17                 1.15
256       1038166.63                891470.11                 1.16

We can see that in the OLTP read-only workload, MySQL 8.0.15 is slower by 10%, and for the point_select workload MySQL 8.0.15 is slower by 12-16%.

Although the difference is not dramatic, it is consistent enough to show that MySQL 8.0.15 does not perform as well as MySQL 5.7.25 in the workloads that I am testing.

However, the picture changes if we use the utf8mb4 character set instead of latin1.

Let’s compare MySQL 5.7.25 latin1 vs utf8mb4, as utf8mb4 is now the default CHARSET in MySQL 8.0.

But before we do that, let’s also take a look at COLLATION.

For utf8mb4, MySQL 5.7.25 uses the default collation utf8mb4_general_ci. However, I read that for proper sorting and comparison for Eastern European languages, you may want to use the utf8mb4_unicode_ci collation. For MySQL 8.0 the default collation is utf8mb4_0900_ai_ci.
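As a quick reference, here is how to inspect and change these settings on a test schema; a minimal sketch, assuming the sysbench schema is named sbtest (the name is illustrative):

-- check the server-wide defaults
SELECT @@character_set_server, @@collation_server;

-- change the schema default used for newly created tables
ALTER DATABASE sbtest CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- existing tables keep their charset; convert them explicitly, e.g.:
-- ALTER TABLE sbtest1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- verify what each table actually uses
SELECT table_name, table_collation FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'sbtest';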

So let’s compare each version latin1 vs utf8mb4 (with default collation). First 5.7:

Threads   utf8mb4_general_ci   latin1     latin1 ratio
4         2957.99              4578.18    1.55
24        13792.55             21384.57   1.55
64        24516.99             39629.09   1.62
128       23977.07             38448.23   1.60

So here we can see that utf8mb4 in MySQL 5.7 is really much slower than latin1 (by 55-62%).

And the same for MySQL 8.0.15

Threads   utf8mb4_0900_ai_ci (default)   latin1     latin1 ratio
4         3968.88                        4106.69    1.03
24        18446.19                       19472.89   1.06
64        32776.35                       35585.88   1.09
128       31301.75                       34718.42   1.11

For MySQL 8.0 the hit from utf8mb4 is much lower (up to 11%)

Now let’s compare all collations for utf8mb4

For MySQL 5.7

Threads   utf8mb4_general_ci (default)   utf8mb4_bin   utf8mb4_unicode_ci   utf8mb4_unicode_520_ci
4         2957.99                        3328.8        2157.61              1942.78
24        13792.55                       15857.29      9989.96              9095.17
64        24516.99                       28125.16      16207.26             14768.64
128       23977.07                       27410.94      15970.6              14560.6

If you plan to use utf8mb4_unicode_ci, you will get an even further performance hit (compared to utf8mb4_general_ci).

And for MySQL 8.0.15

Threads   utf8mb4_general_ci   utf8mb4_bin   utf8mb4_unicode_ci   utf8mb4_0900_ai_ci (default)
4         3461.8               3628.01       3363.7               3968.88
24        16327.45             17136.16      15740.83             18446.19
64        28960.62             30390.29      27242.72             32776.35
128       27967.25             29256.89      26489.83             31301.75

So now let’s compare MySQL 8.0 vs MySQL 5.7 in utf8mb4 with default collations:

Threads   MySQL 8.0 utf8mb4_0900_ai_ci   MySQL 5.7 utf8mb4_general_ci   MySQL 8.0 ratio
4         3968.88                        2957.99                        1.34
24        18446.19                       13792.55                       1.34
64        32776.35                       24516.99                       1.34
128       31301.75                       23977.07                       1.31

So there we are. In this case, MySQL 8.0 is actually better than MySQL 5.7 by up to 34%.

Conclusions

There are several observations to make:

  • MySQL 5.7 outperforms MySQL 8.0 with the latin1 charset
  • MySQL 8.0 outperforms MySQL 5.7 by a wide margin if we use the utf8mb4 charset
  • Be aware that utf8mb4 is now the default in MySQL 8.0, while MySQL 5.7 uses latin1 by default
  • When running comparisons between MySQL 8.0 and MySQL 5.7, be aware of which charset you are using, as it may affect the results a lot.

MySQL User Camp, Bangalore – February 13th, 2019

At MySQL we love opportunities to interact with our users and MySQL User Camp gives us the chance to do just that. The first MySQL User Camp of this year was held on February 13th 2019 at the Oracle Kalyani Magnum Office.…


Presentation : JSON improvements in MySQL 8.0

MySQL User Camp is a prestigious MySQL meetup that has been happening in India for the past 6 years. Mydbops DBAs PonSuresh and Vignesh presented on “JSON functions and their improvements in MySQL 8.0” at MySQL User Camp Bangalore on 13-02-2019.

JSON support has been improved a lot in MySQL 8.0, and improvements in MySQL's LOB storage boost its performance further. This presentation covers the JSON performance enhancements in MySQL 8.0 along with its basic functions.

JSON improvements in MySQL 8.0 from Mydbops

How to Install Matomo Web Analytics on CentOS 7

Matomo (formerly Piwik) is a free and open source web analytics application developed by a team of international developers, that runs on a PHP/MySQL web server. This tutorial will show you how to install Matomo on a CentOS 7 system using Nginx as the web server and we will secure the website with a Let's Encrypt SSL certificate.

Oracle Open World 2019 – CodeONE Call For Paper

The Oracle Open World 2019 Call for Papers is open until March 13th.

MySQL track will be part of CodeONE, the parallel conference focused on developers.

We encourage you to submit a session related to the following topics:

  • case studies / user stories of your MySQL usage
  • lessons learned in running web scale MySQL
  • production DBA/devops perspectives into MySQL Architecture, Performance, Replication, InnoDB, Security, …
  • Migration to MySQL
  • MySQL 8.0 (Document Store, InnoDB Cluster, new Data Dictionary, …)

Don’t miss the chance to participate in this amazing event. Submit now() here!

Percona XtraBackup Now Supports Dump of InnoDB Buffer Pool

InnoDB keeps hot data in memory in its buffer, the InnoDB Buffer Pool. For a long time, when a MySQL instance needed to bounce, this hot cached data was lost and the instance required a warm-up period to perform as well as it did before the service restart.

That is not the case anymore. Newer versions of MySQL/MariaDB allow users to save the state of this buffer by dumping tablespace IDs and page IDs to a file on disk; the file is loaded automatically on startup, bringing the newly started server's buffer pool back to the state it was in prior to the restart.

Details about the MySQL implementation can be found at https://dev.mysql.com/doc/refman/5.7/en/innodb-preload-buffer-pool.html

With that in mind, Percona XtraBackup version 2.4.13 can now instruct MySQL to dump the content of the buffer pool while taking a backup. This means you can restore the backup on a new server and have MySQL perform just like the source instance in terms of InnoDB Buffer Pool data.

How it works

The buffer pool dump happens at the beginning of backup if --dump-innodb-buffer-pool is set.

The user can choose to change the default innodb_buffer_pool_dump_pct. If --dump-innodb-buffer-pool-pct is set, it stores the current MySQL innodb_buffer_pool_dump_pct value, then changes it to the desired percentage. After the backup completes, the original value is restored.

The actual file copy happens at the end of the backup.
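In plain SQL terms, the dump phase is roughly equivalent to the following sketch of the standard InnoDB dump variables that XtraBackup drives (the restored value of 25 is MySQL's default for innodb_buffer_pool_dump_pct):

-- raise the dump percentage for the duration of the backup
SET GLOBAL innodb_buffer_pool_dump_pct = 100;

-- trigger the dump of tablespace/page IDs to the ib_buffer_pool file
SET GLOBAL innodb_buffer_pool_dump_now = ON;

-- poll until the dump has completed
SHOW STATUS LIKE 'Innodb_buffer_pool_dump_status';

-- restore the original value afterwards
SET GLOBAL innodb_buffer_pool_dump_pct = 25;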

Percona XtraDB Cluster

A very good use case is PXC/Galera. When a node initiates an SST, we would like the joiner to have a copy of the InnoDB Buffer Pool from the donor. We can configure PXC nodes to do that:

[xtrabackup]
dump-innodb-buffer-pool
dump-innodb-buffer-pool-pct=100

Here is an example of a PXC node that just received SST:

Before PXB-1548:

[root@marcelo-altmann-pxb-pxc-3 ~]# systemctl stop mysql && rm -rf /var/lib/mysql/* && systemctl start mysql && mysql -psekret -e "SHOW ENGINE INNODB STATUS\G" | grep 'Database pages'
mysql: [Warning] Using a password on the command line interface can be insecure.
Database pages          311

Joiner started with a cold buffer pool.

After adding dump-innodb-buffer-pool and dump-innodb-buffer-pool-pct=100 to my.cnf:

[root@marcelo-altmann-pxb-pxc-3 ~]# systemctl stop mysql && rm -rf /var/lib/mysql/* && systemctl start mysql && mysql -psekret -e "SHOW ENGINE INNODB STATUS\G" | grep 'Database pages'
mysql: [Warning] Using a password on the command line interface can be insecure.
Database pages          30970

Joiner started with a copy of the buffer pool from the donor, which will reduce the joiner warm-up period.

Conclusion

The new version of Percona XtraBackup can help to minimize the time a newly restored backup will take to perform like the source server.


Photo by Jametlene Reskp on Unsplash

Galera Cluster 4 available for use in the latest MariaDB 10.4.3 Release Candidate!

The much anticipated release of Galera 4 makes its way first into the latest release of MariaDB Server 10.4.3 Release Candidate, which you can download now. Congratulations to MariaDB Corporation and MariaDB Foundation on this release candidate — please get testing Galera 4, and share your feedback with us via our Google Group discussion list! Do not forget that you can always reach us via email at info@galeracluster.com or via our contact us form.

The feature we are most excited about is one requested by many users: huge transaction support. Large write transactions are processed on the “master” node until commit time, and there have long been configuration options to prevent overly large transactions. Galera 4 implements streaming replication, which means it is now possible to replicate transactions of any size, while still keeping existing transaction size limits. In MariaDB, it also makes use of group commit in the binary log (group commit is enabled for both regular replication and streaming transactions), so if you have binary logging enabled, this may be of interest to you from a throughput standpoint. How does all this work? The transaction itself is replicated in small increments, with a configurable size threshold. The replicated rows are locked and cannot conflict. A point of note: LOAD DATA has been refactored to work with streaming replication, so there are no intermediate commits during such processing.

As a MariaDB user, to perform State Snapshot Transfers (SSTs) you will be using Mariabackup. The server itself has backup locks (fixed in MariaDB Server 10.4.2). All in all, this should be helpful for Mariabackup SST.

If you look at the mysql database, there are several new tables of interest: wsrep_cluster, wsrep_cluster_members, and wsrep_streaming_log. As a DBA, you can utilise wsrep_cluster and wsrep_cluster_members to visualise the currently active nodes in your cluster.

To implement causal reads from the application, there are also 3 new functions that will benefit developers in Galera 4: WSREP_LAST_WRITTEN_GTID(), which returns the Global Transaction ID (GTID) that corresponds to the last write transaction made by the client, WSREP_LAST_SEEN_GTID(), which returns the GTID of the last write transaction that was observed by the client, and WSREP_SYNC_WAIT_UPTO_GTID(), which requires a GTID in the argument, and blocks the client until the GTID provided has been applied and committed on the node, or until a timeout occurs.
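A sketch of how an application might chain these functions for a causal read (the GTID literal is purely illustrative):

-- on the connection that performed the write:
SELECT WSREP_LAST_WRITTEN_GTID();
-- suppose it returns 'dd5e...:42'

-- on another connection that must observe that write:
SELECT WSREP_SYNC_WAIT_UPTO_GTID('dd5e...:42');
-- blocks until the node has applied and committed that GTID (or a timeout occurs),
-- after which the causal read can safely run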

Some understated features that have been present since Galera 3.x include intelligent donor selection and cluster crash recovery. These were originally planned for Galera 4, but made it into earlier releases largely due to customer requirements.

When it comes to donor node selection in Galera 3, the State Snapshot Transfer (SST) donor was selected at random. However with Galera 4, you get a much more intelligent choice when it comes to choosing a donor, as it will favour a donor that can provide an Incremental State Transfer (IST), or pick a donor in the same segment. As a database administrator (DBA), you can force this via setting wsrep_sst_donor.

No matter where you host your Galera Cluster, there is a chance that the data center suffers a catastrophic power outage, taking out your entire set of nodes in that data center for some period of time (you of course were running with data center High Availability in mind, and have another, right?). When power gets restored, you typically had to bootstrap one node as a new cluster and then start MariaDB on the other nodes (nodes can automatically join a cluster, but first a node must be elected!). You now have a new option, pc.recovery=ON, which means that all nodes maintain cluster group information persistently, and in a shutdown/restart scenario the full group starts up again in the same configuration! This is a huge win from an automation standpoint, as it means the DBA does not have to be on call during the restart.

By now, you are probably wanting to update a test cluster from an earlier MariaDB version. Currently, an idle 10.3 Galera Cluster can be upgraded in rolling fashion to a 10.4 Galera Cluster with Galera 4. It should also be safe to direct DML-only workloads to a mixed cluster during said process; however, the 10.4 nodes should be read only and writes should be directed to the 10.3 nodes.

However, the safest way might be to do a node-by-node upgrade from MariaDB 10.3 to the current MariaDB 10.4 release candidate. So the best way to do this, as quoted in the MariaDB release notes, is to:

  • Install MariaDB 10.4.2 and Galera 4 version 26.4.0
  • Start MariaDB server, but make sure it is not trying to connect to the cluster by configuring wsrep_provider=none (see the sketch below)
  • While MariaDB server is running, run mysql_upgrade for the server
  • Restart MariaDB Server with the proper wsrep_provider, and it will join the cluster topology

Note that the update operation should be done fast enough so that joining will happen with Incremental State Transfer (IST). If the 10.4 node joins back with SST, the data directory would be overwritten with the old 10.3 data directory from the donor node.
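For step two, a minimal sketch of the relevant configuration (the file location, section name, and provider path are assumptions that vary by distribution):

[galera]
# during mysql_upgrade: keep the node detached from the cluster
wsrep_provider=none
# after the upgrade, restore the real provider, e.g.:
# wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so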

So, what are you waiting for? Try out MariaDB 10.4.3 Release Candidate with Galera 4 today! As always, don’t forget to drop feedback at our Google Group discussion list. We will also be monitoring the maria-discuss and maria-developers lists, as well as MariaDB’s Jira for any pesky bugs or feature enhancement feedback!

MySQL Master High Availability and Failover: more thoughts

Some months ago, Shlomi Noach published a series about Service Discovery.  In his posts, Shlomi describes many ways for an application to find the master.  He also gives detail on how these solutions cope with failover to a slave, including their integration with Orchestrator. This is a great series, and I recommend its reading for everybody implementing master failover, with or without

Building MySQL on OCI Compute with Scripting

The Oracle Cloud has a lot to offer with its Security-Focused 2nd Generation OCI environment.  My previous blog on OCI IaaS walks us through some of the great IaaS features such as compartments and VCN Subnets where you can run services across private ip addresses.  What I’d like to look at this time is…

SSL Encryption at ProxySQL Part 2

This is the second part of a two part series. In SSL Encryption at ProxySQL Part 1 we saw how to enable backend SSL encryption for ProxySQL. In this second article, we are going to see how to enable frontend SSL, so clients can establish secure remote connections towards a ProxySQL instance.

In a standard configuration, a client connection is unencrypted, which can lead to data being intercepted on the way. In previous versions of ProxySQL it was recommended to deploy ProxySQL on your application servers and use sockets to secure traffic; however, starting from version 2.0, frontend SSL connections are available.

Note: Frontend SSL is available from ProxySQL v2.0+.

Verify encryption on ProxySQL

Before going ahead let's make sure connections are not using SSL encryption.

By default, ProxySQL listens for MySQL Traffic on 0.0.0.0:6033 and ProxySQL Admin commands on 0.0.0.0:6032 (used for monitoring and configuration).

# When Frontend SSL is disabled
$ mysql -h127.0.0.1 -P6033 -usysbench -psysbench -e '\s' | grep -P 'SSL|Connection'
Connection id:    5
SSL:              Not in use
Connection:       127.0.0.1 via TCP/IP

As the above output indicates, SSL is not currently in use for our connection, even though we are connected over TCP. Assuming you have already configured backend SSL as described in the last blog post, we can see that SSL is in use for backend connections:

# When Backend SSL is enabled:
$ mysql -h127.0.0.1 -P6033 -usysbench -psysbench -e 'SHOW SESSION STATUS LIKE "Ssl_cipher"'
+---------------+----------------------+
| Variable_name | Value                |
+---------------+----------------------+
| Ssl_cipher    | ECDHE-RSA-AES256-SHA |
+---------------+----------------------+

Enabling encryption on ProxySQL

Now let's enable frontend SSL using ProxySQL's Admin interface.

UPDATE global_variables SET variable_value='true' WHERE variable_name='mysql-have_ssl';
-- or equivalently:
SET mysql-have_ssl='true';

Load this configuration to runtime, and save it to disk if you want to make the change persistent.

LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;

Note: changes will only apply to new client connections.

Let's go ahead and verify this by executing the following commands:

$ mysql -h127.0.0.1 -P6032 -uadmin -padmin -e '\s' | grep -P 'SSL|Connection'
SSL:              Cipher in use is DHE-RSA-AES256-SHA
Connection:       127.0.0.1 via TCP/IP

$ mysql -h127.0.0.1 -P6033 -usysbench_FB -psysbench -e '\s' | grep -P 'SSL|Connection'
SSL:              Cipher in use is DHE-RSA-AES256-SHA
Connection:       127.0.0.1 via TCP/IP

As we can see, an SSL cipher is now in use for frontend connections towards ProxySQL.

Supported protocols for frontend SSL:

  • SSLv2
  • SSLv3
  • TLSv1
  • TLSv1.1
  • TLSv1.2

Supported ciphers:

  • DHE-RSA-AES256-GCM-SHA384
  • DHE-RSA-AES256-SHA256
  • DHE-RSA-AES256-SHA
  • DHE-RSA-CAMELLIA256-SHA
  • AES256-GCM-SHA384
  • AES256-SHA256
  • AES256-SHA
  • CAMELLIA256-SHA
  • DHE-RSA-AES128-GCM-SHA256
  • DHE-RSA-AES128-SHA256
  • DHE-RSA-AES128-SHA
  • DHE-RSA-SEED-SHA
  • DHE-RSA-CAMELLIA128-SHA
  • AES128-GCM-SHA256
  • AES128-SHA256
  • AES128-SHA
  • SEED-SHA
  • CAMELLIA128-SHA
  • DES-CBC3-SHA

We have performed a simple read-only test using sysbench against ProxySQL with backend/frontend SSL enabled and disabled. ProxySQL was configured to use 8 internal threads.

From the above graphs we can see that encryption has not added any major overhead or impacted performance much, making it safe to use encryption and eliminate data flowing insecurely through the network.

For reference, we used the following sysbench one-liner commands:

Backend SSL Test:

for conn in 1 8 128 ; do for i in $(seq 1 3) ; do echo "${conn}:${i}"; ulimit -n 1048576; ./src/sysbench ./src/lua/oltp_read_only.lua --db-debug=on --report-interval=20 --table-size=70000000 --tables=20 --mysql-db=sbtest_rw --mysql-user=sysbench --mysql-password=sysbench --db-driver=mysql --mysql-host=127.0.0.1 --max-requests=0 --mysql-port=6033 --db-ps-mode=disable --skip-trx=on --threads=${conn} --max-time=60 run ; ulimit -n 1048576; ./src/sysbench ./src/lua/oltp_read_only.lua --db-debug=on --report-interval=20 --table-size=70000000 --tables=20 --mysql-db=sbtest_rw --mysql-user=sysbench --mysql-password=sysbench --db-driver=mysql --mysql-host=127.0.0.1 --max-requests=0 --mysql-port=6033 --db-ps-mode=disable --skip-trx=on --threads=${conn} --max-time=120 run |tee /data/benchmark/v2.0/backend_ssl/v2.0_8threads${conn}connections.${i}_line.log ; done ; done

Frontend SSL Test:

for conn in 1 8 128 ; do for i in $(seq 1 3) ; do echo "${conn}:${i}"; ulimit -n 1048576; ./src/sysbench ./src/lua/oltp_read_only.lua --db-debug=on --report-interval=20 --table-size=70000000 --tables=20 --mysql-db=sbtest_rw --mysql-user=sysbench --mysql-password=sysbench --mysql-ssl=on --db-driver=mysql --mysql-host=127.0.0.1 --max-requests=0 --mysql-port=6033 --db-ps-mode=disable --skip-trx=on --threads=${conn} --max-time=60 run ; ulimit -n 1048576; ./src/sysbench ./src/lua/oltp_read_only.lua --db-debug=on --report-interval=20 --table-size=70000000 --tables=20 --mysql-db=sbtest_rw --mysql-user=sysbench --mysql-password=sysbench --mysql-ssl=on --db-driver=mysql --mysql-host=127.0.0.1 --max-requests=0 --mysql-port=6033 --db-ps-mode=disable --skip-trx=on --threads=${conn} --max-time=120 run |tee /data/benchmark/v2.0/frontend_backend/v2.0_8threads${conn}connections.${i}_line.log ; done ; done

Happy ProxySQLing !

Authored by: Ashwini Ahire & Nick Vyzas

If you have any questions please do not hesitate to contact us. Our performance and scalability experts will help you to analyze your infrastructure and help to build fast and reliable architecture. We also offer long term support and DBRE consulting for ProxySQL users.

MySQL Challenge: 100k Connections

In this post, I want to explore a way to establish 100,000 connections to MySQL. Not just idle connections, but executing queries.

100,000 connections. Is that really needed for MySQL, you may ask? Although it may seem excessive, I have seen a lot of different setups in customer deployments. Some deploy an application connection pool, with 100 application servers and 1,000 connections in each pool. Some applications use a “re-connect and repeat if the query is too slow” technique, which is a terrible practice. It can lead to a snowball effect, and could establish thousands of connections to MySQL in a matter of seconds.

So now I want to set an overachieving goal and see if we can achieve it.

Setup

For this I will use the following hardware:

Bare metal server provided by packet.net, instance size: c2.medium.x86

  • CPU: physical cores @ 2.2 GHz (1x AMD EPYC 7401P)
  • Memory: 64 GB of ECC RAM
  • Storage: INTEL® SSD DC S4500, 480GB

This is a server grade SATA SSD.

I will use five of these boxes, for the reason explained below. One box for the MySQL server and four boxes for client connections.

For the server I will use Percona Server for MySQL 8.0.13-4 with the thread pool plugin. The plugin is required to support the thousands of connections.

Initial server setup

Network settings (Ansible format):

- { name: 'net.core.somaxconn', value: 32768 }
- { name: 'net.core.rmem_max', value: 134217728 }
- { name: 'net.core.wmem_max', value: 134217728 }
- { name: 'net.ipv4.tcp_rmem', value: '4096 87380 134217728' }
- { name: 'net.ipv4.tcp_wmem', value: '4096 87380 134217728' }
- { name: 'net.core.netdev_max_backlog', value: 300000 }
- { name: 'net.ipv4.tcp_moderate_rcvbuf', value: 1 }
- { name: 'net.ipv4.tcp_no_metrics_save', value: 1 }
- { name: 'net.ipv4.tcp_congestion_control', value: 'htcp' }
- { name: 'net.ipv4.tcp_mtu_probing', value: 1 }
- { name: 'net.ipv4.tcp_timestamps', value: 0 }
- { name: 'net.ipv4.tcp_sack', value: 0 }
- { name: 'net.ipv4.tcp_syncookies', value: 1 }
- { name: 'net.ipv4.tcp_max_syn_backlog', value: 4096 }
- { name: 'net.ipv4.tcp_mem', value: '50576   64768 98152' }
- { name: 'net.ipv4.ip_local_port_range', value: '4000 65000' }
- { name: 'net.ipv4.netdev_max_backlog', value: 2500 }
- { name: 'net.ipv4.tcp_tw_reuse', value: 1 }
- { name: 'net.ipv4.tcp_fin_timeout', value: 5 }

These are the typical settings recommended for 10Gb networks and high concurrent workloads.
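If you are not managing the host with Ansible, the same settings can be applied directly with sysctl; a minimal sketch of a few of them (persist them under /etc/sysctl.d/ to survive reboots):

# apply a few of the settings above by hand
sudo sysctl -w net.core.somaxconn=32768
sudo sysctl -w net.ipv4.ip_local_port_range="4000 65000"
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=4096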

Limits settings for systemd:

[Service]
LimitNOFILE=1000000
LimitNPROC=500000

And the relevant setting for MySQL in my.cnf:

back_log=3500
max_connections=110000

For the client I will use sysbench version 0.5 and not 1.0.x, for the reasons explained below.

The workload is:

sysbench --test=sysbench/tests/db/select.lua --mysql-host=139.178.82.47 --mysql-user=sbtest --mysql-password=sbtest --oltp-tables-count=10 --report-interval=1 --num-threads=10000 --max-time=300 --max-requests=0 --oltp-table-size=10000000 --rand-type=uniform --rand-init=on run

Step 1. 10,000 connections

This one is very easy, as there is not much to do to achieve this. We can do this with only one client. But you may face the following error on the client side:

FATAL: error 2004: Can't create TCP/IP socket (24)

This is caused by the open file limit, which is also a limit on TCP/IP sockets. This can be fixed by setting ulimit -n 100000 on the client.

The performance we observe:

[  26s] threads: 10000, tps: 0.00, reads: 33367.48, writes: 0.00, response time: 3681.42ms (95%), errors: 0.00, reconnects:  0.00
[  27s] threads: 10000, tps: 0.00, reads: 33289.74, writes: 0.00, response time: 3690.25ms (95%), errors: 0.00, reconnects:  0.00

Step 2. 25,000 connections

With 25,000 connections, we hit an error on MySQL side:

Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug

If you try to lookup information on this error you might find the following article:  https://www.percona.com/blog/2013/02/04/cant_create_thread_errno_11/

But it does not help in our case, as we have all limits set high enough:

cat /proc/`pidof mysqld`/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             500000               500000               processes
Max open files            1000000              1000000              files
Max locked memory         16777216             16777216             bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       255051               255051               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

This is where we start using the thread pool feature:  https://www.percona.com/doc/percona-server/8.0/performance/threadpool.html

Add:

thread_handling=pool-of-threads

to the my.cnf and restart Percona Server.
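After the restart you can confirm the thread pool is active; a quick check (the Threadpool_* status counters are specific to Percona Server's thread pool):

mysql> SHOW GLOBAL VARIABLES LIKE 'thread_handling';
mysql> SHOW GLOBAL STATUS LIKE 'Threadpool%';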

The results:

[   7s] threads: 25000, tps: 0.00, reads: 33332.57, writes: 0.00, response time: 974.56ms (95%), errors: 0.00, reconnects:  0.00
[   8s] threads: 25000, tps: 0.00, reads: 33187.01, writes: 0.00, response time: 979.24ms (95%), errors: 0.00, reconnects:  0.00

We have the same throughput, but actually the 95% response time has improved (thanks to the thread pool) from 3690 ms to 979 ms.

Step 3. 50,000 connections

This is where we encountered the biggest challenge. At first, trying to get 50,000 connections in sysbench we hit the following error:

FATAL: error 2003: Can't connect to MySQL server on '139.178.82.47' (99)

Error (99) is cryptic and it means: Cannot assign requested address.

It comes from the limit on the number of ports an application can open. By default on my system it is:

cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999

This says there are only 28,231 available ports — 60999 minus 32768 — or the limit of TCP connections you can establish from or to the given IP address.

You can extend this using a wider range, on both the client and the server:

echo 4000 65000 > /proc/sys/net/ipv4/ip_local_port_range

This will give us 61,000 connections, but this is very close to the limit for one IP address (maximal port is 65535). The key takeaway from here is that if we want more connections we need to allocate more IP addresses for MySQL server. In order to achieve 100,000 connections, I will use two IP addresses on the server running MySQL.
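Allocating an additional address is a one-liner on Linux; a sketch (the interface name and the extra address here are purely illustrative):

# attach a second IP address for mysqld to accept connections on
sudo ip addr add 139.178.82.48/32 dev eth0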

After sorting out the port ranges, we hit the following problem with sysbench:

sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 50000

FATAL: pthread_create() for thread #32352 failed. errno = 12 (Cannot allocate memory)

In this case, it’s a problem with sysbench memory allocation (namely the Lua subsystem). Sysbench can allocate memory for only 32,351 connections. This problem is even more severe in sysbench 1.0.x.

Sysbench 1.0.x limitation

Sysbench 1.0.x uses a different Lua JIT, which hits memory problems even with 4,000 connections, so it is impossible to go over 4,000 connections in sysbench 1.0.x.

So it seems we hit a limit with sysbench sooner than with Percona Server. In order to use more connections, we need to use multiple sysbench clients, and if 32,351 connections is the limit for sysbench, we have to use at least four sysbench clients to get up to 100,000 connections.

For 50,000 connections I will use 2 servers (each running separate sysbench), each running 25,000 threads from sysbench.

The results for each sysbench look like:

[  29s] threads: 25000, tps: 0.00, reads: 16794.09, writes: 0.00, response time: 1799.63ms (95%), errors: 0.00, reconnects:  0.00
[  30s] threads: 25000, tps: 0.00, reads: 16491.03, writes: 0.00, response time: 1800.70ms (95%), errors: 0.00, reconnects:  0.00

So we have about the same throughput (16794*2 = 33588 tps in total), however the 95% response time doubled. This is to be expected as we are using twice as many connections compared to the 25,000 connections benchmark.

Step 4. 75,000 connections

To achieve 75,000 connections we will use three servers with sysbench, each running 25,000 threads.

The results for each sysbench:

[ 157s] threads: 25000, tps: 0.00, reads: 11633.87, writes: 0.00, response time: 2651.76ms (95%), errors: 0.00, reconnects:  0.00
[ 158s] threads: 25000, tps: 0.00, reads: 10783.09, writes: 0.00, response time: 2601.44ms (95%), errors: 0.00, reconnects:  0.00

Step 5. 100,000 connections

There is nothing eventful about achieving 75k and 100k connections: we just spin up additional servers and start sysbench. For 100,000 connections we need four servers for sysbench; each shows:

[ 101s] threads: 25000, tps: 0.00, reads: 8033.83, writes: 0.00, response time: 3320.21ms (95%), errors: 0.00, reconnects:  0.00
[ 102s] threads: 25000, tps: 0.00, reads: 8065.02, writes: 0.00, response time: 3405.77ms (95%), errors: 0.00, reconnects:  0.00

So we have the same throughput (8065*4=32260 tps in total) with 3405ms 95% response time.

A very important takeaway from this: with 100k connections and using a thread pool, the 95% response time is even better than for 10k connections without a thread pool. The thread pool allows Percona Server to manage resources more efficiently and provides better response times.

Conclusions

100k connections is quite achievable for MySQL, and I am sure we could go even further. There are three components to achieve this:

  • Thread pool in Percona Server
  • Proper tuning of network limits
  • Using multiple IP addresses on the server box (one IP address per approximately 60k connections)
Appendix: full my.cnf

[mysqld]
datadir {{ mysqldir }}
ssl=0
skip-log-bin
log-error=error.log
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
character_set_server=latin1
collation_server=latin1_swedish_ci
skip-character-set-client-handshake
innodb_undo_log_truncate=off
# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=110000
# files
innodb_file_per_table
innodb_log_file_size=15G
innodb_log_files_in_group=2
innodb_open_files=4000
# buffers
innodb_buffer_pool_size= 40G
innodb_buffer_pool_instances=8
innodb_log_buffer_size=64M
# tune
innodb_doublewrite= 1
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit= 0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
innodb_lru_scan_depth=2048
innodb_page_cleaners=4
join_buffer_size=256K
sort_buffer_size=256K
innodb_use_native_aio=1
innodb_stats_persistent = 1
#innodb_spin_wait_delay=96
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_io_capacity=1500
innodb_io_capacity_max=2500
innodb_purge_threads=4
innodb_adaptive_hash_index=0
max_prepared_stmt_count=1000000
innodb_monitor_enable = '%'
performance_schema = ON

Automating Tungsten upgrades using Ansible

Continuent Tungsten is one of the few all-in-one solutions for MySQL high availability. In this post I will show you how to automate the upgrade process using Ansible. I will walk you through the individual tasks and, finally, give you the complete playbook.

We will use a rolling approach, upgrading the slaves first, and finally upgrading the former master. There is no need for a master switch, as the process is transparent to the application.

I am assuming you are using the .ini based installation method. If you are still using a staging host, I suggest you update your setup.

Pre tasks

The first step is ensuring the cluster is healthy, because we don’t want to start taking nodes offline unless we are sure the cluster is in good shape. One way of doing that is by using the built-in script tungsten_monitor. When we run the playbook, we only need to validate the cluster status on one node, so I am adding run_once: true to limit this task to a single node.

- name: verify cluster is healthy
  shell: "/opt/continuent/tungsten/cluster-home/bin/tungsten_monitor"
  run_once: true

Next, we may want to do some parameter changes as part of the upgrade. The following tasks will help deal with that. The first one adds a line to the end of the file, while the second will look for a line starting with skip-validation-check, and add the new line right after.

Notice I am using become: yes so that the tasks will run as root.

- name: Disable bridge mode for connector
  become: yes
  lineinfile:
    path: /etc/tungsten/tungsten.ini
    line: connector-bridge-mode=false
    insertafter: EOF

- name: Disable the check for modified files
  become: yes
  lineinfile:
    path: /etc/tungsten/tungsten.ini
    line: skip-validation-check=ModifiedConfigurationFilesCheck
    insertafter: '^skip-validation-check'

Now we need to know the current master node for the purpose of the upgrade, so we get that from cctrl and store it as a fact so we can reference it further down the road.

- name: Capture Master Node
  become: yes
  shell: "su - tungsten -c \"echo 'ls' | cctrl \" | grep -B 5 'REPLICATOR(role=master, state=ONLINE)' | grep progress | cut -d'(' -f1 | sed 's/|//'"
  changed_when: False
  run_once: true
  register: master_node_result

The cluster might have a different master now than when it was initially provisioned, so we need to update the tungsten.ini file accordingly.

- name: Replace current master in tungsten.ini file
  become: yes
  lineinfile:
    path: /etc/tungsten/tungsten.ini
    regexp: '^master='
    line: 'master={{ master_node_result.stdout }}'

Now we need to set downtime on the monitoring system. The delegate_to keyword is useful when you want to run a task on a different host than the one Ansible is running against.

- name: Downtime alerts
  shell: "/usr/local/bin/downtime --host='{{ ansible_hostname }}' --service='{{ item }}' --comment='Downtime services for Tungsten Upgrade'"
  with_items: [ 'Tungsten policy mode', 'Tungsten Replicator Status', 'Tungsten THL Retention' ]
  run_once: true
  delegate_to: nagios_host.example.com

Next, we set the cluster policy to maintenance to prevent any automatic operations from interfering with our upgrade.

- name: Set Policy to Maintenance
  become: yes
  shell: "su - tungsten -c \"echo 'set policy maintenance' | cctrl\""
  run_once: true
  register: maintenance_results

- debug: var=maintenance_results.stdout_lines

Upgrade process

Now we are ready to start the upgrade on the slaves. The approach I am taking here is to copy the rpm file from my local machine to the servers and install using yum. The tungsten rpm package does everything automatically, provided the tungsten.ini file exists.

- name: copy Tungsten packages
  copy:
    src: /vagrant/tungsten
    dest: /tmp

- name: Install Continuent Tungsten rpm
  yum:
    name:
      - /tmp/tungsten/tungsten-clustering-5.3.5-623.x86_64.rpm
    state: present

Once the slaves are done, we operate on the master and bring the replicator online on it.

- name: Upgrade Continuent Tungsten rpm on master
  become: yes
  yum:
    name:
      - tungsten-clustering
    state: present

- name: Online cluster replicator on master
  shell: "su - tungsten -c \"trepctl online\""

After that is done, we can set the policy to automatic again, which will bring the slave replicators online.

- name: Set Policy to automatic
  become: yes
  shell: "su - tungsten -c \"echo 'set policy automatic' | /opt/continuent/tungsten/tungsten-manager/bin/cctrl\""
  register: maintenance_results

- debug: var=maintenance_results.stdout_lines

Wrapping up

Finally, we do a check of the cluster status and bring hosts out of maintenance mode.

- name: Check cluster status
  shell: "su - tungsten -c \"echo ls | cctrl\""
  register: maintenance_results
  run_once: true

- debug: var=maintenance_results.stdout_lines
  run_once: true

- name: Cancel downtime alerts
  shell: "/usr/local/bin/downtime --cancel --host='{{ ansible_hostname }}' --service='{{ item }}' --comment='Downtime services for Tungsten Upgrade'"
  with_items: [ 'Tungsten policy mode', 'Tungsten Replicator Status', 'Tungsten THL Retention' ]
  run_once: true
  delegate_to: nagios_host.example.com

The full playbook is available here.

Closing thoughts

Ansible is a very powerful tool, and when you are dealing with hundreds or thousands of servers, you can save a lot of time by automating repetitive tasks. I hope you have learned something useful from this post and if you have any comments or Ansible tips you’d like to share, do let me know in the section below.

Happy automating!

MySQL Exact Row Count For All The Tables

Getting the row count from MySQL tables is not a big deal, and arguably there is no need for a blog post about it. Simply query the INFORMATION_SCHEMA and get the row count for the tables. But these are not your actual row counts: they show the row count of each table as of the last statistics update. So if you want to track your tables' growth, you should run SELECT COUNT(*) on all the tables and insert the results somewhere. There are a lot of ways to do this; I'm just writing it up as a blog post so others can benefit from it.
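For comparison, the approximate counts mentioned above come from a query like this (the schema name is illustrative):

SELECT table_name, table_rows
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'prod_db';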

Row Count - From Stored Procedure:

We’ll get the list of table names from the INFORMATION_SCHEMA, use a cursor to run SELECT COUNT(*) on each table, and save the row count values to a table.

In this example, I'm going to collect the row counts of the tables from the database called prod_db. The procedure and tracking table will be saved in the database called dbadmin.

use dbadmin;

CREATE TABLE table_growth (
    id INT(11) NOT NULL AUTO_INCREMENT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    table_name VARCHAR(50) DEFAULT NULL,
    rows INT(11) DEFAULT NULL,
    PRIMARY KEY (id)
);

delimiter //
CREATE PROCEDURE rows_growth()
BEGIN
    DECLARE start INTEGER DEFAULT 0;
    DECLARE t_name VARCHAR(255);
    DECLARE table_names CURSOR FOR
        SELECT table_name FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='prod_db';
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET start=1;
    OPEN table_names;
    get_tablename: LOOP
        FETCH table_names INTO t_name;
        -- the handler sets start=1 when the cursor is exhausted
        IF start = 1 THEN
            LEAVE get_tablename;
        END IF;
        SET @query = CONCAT('insert into table_growth (table_name, rows) select "', t_name, '" as tablename, count(*) from prod_db.', t_name);
        -- select @query;  (debug)
        PREPARE insert_stmt FROM @query;
        EXECUTE insert_stmt;
        DEALLOCATE PREPARE insert_stmt;
    END LOOP get_tablename;
    CLOSE table_names;
END//
delimiter ;

Row Count - From Shell Script:

mysql -h IP_ADDRESS -usqladmin -p'password' Foodie -N -e "SELECT table_name FROM INFORMATION_SCHEMA.tables WHERE table_schema='Foodie';" | while read -r table_name
do
    count=$(mysql -h IP_ADDRESS -usqladmin -p'password' Foodie -N -e "SELECT COUNT(*) FROM $table_name;")
    mysql -h IP_ADDRESS -usqladmin -p'password' dbadmin -e "INSERT INTO table_growth (table_name, rows) VALUES ('$table_name', $count);"
done
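To run the procedure and inspect the collected history:

CALL dbadmin.rows_growth();
SELECT * FROM dbadmin.table_growth ORDER BY id DESC LIMIT 10;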

If your tables hold a huge amount of data and you are running one or two read replicas, then use a replica's IP address for the SELECT COUNT(*) queries.
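To build an actual growth history over time, you could schedule the procedure with the event scheduler; a minimal sketch (the daily interval and event name are arbitrary choices):

SET GLOBAL event_scheduler = ON;

CREATE EVENT dbadmin.collect_table_growth
ON SCHEDULE EVERY 1 DAY
DO CALL dbadmin.rows_growth();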

Fun with Bugs #80 - On MySQL Bug Reports I am Subscribed to, Part XVI

Today I'd like to continue my review of public MySQL bug reports with a list of some bugs I've subscribed to over the last 3 weeks. It's already long enough and includes nice cases to check and share. Note that I usually subscribe to a bug either because it directly affects me or customers I work with, or I consider it technically interesting (so I mostly care about InnoDB, replication, partitioning and optimizer bugs), or it's a "metabug" - a problem in the way the public bug report is handled by Oracle engineers. These are my interests related to MySQL bugs.

As usual, I start with the oldest bugs and try to mention bug reporters by name with links to their other reports whenever this may give something useful to a reader. I try to check if MariaDB is also affected in some cases. Check also my summary comments at the end of this blog post.
  • Bug #94148 - "Unnecessary Shared lock on parent table During UPDATE on a child table". In this bug report Uday Varagani reasonably pointed out that formally there is no need to lock the parent row when a column NOT included in the foreign key gets updated. This happens, though, when this column is included in the index used to support the foreign key constraint. IMHO it's a reasonable feature request, and both Trey Raymond and Sveta Smirnova tried their best to highlight this, but the report now has a "Need Feedback" status with a request to explain the new algorithm suggested. It's simple - "Stop it": check that the column changed is NOT one the foreign key is defined on, even if it's in the same index... I see no reason NOT to verify this as a reasonable feature request. Is it a new policy that every feature request should come with details on how to implement it? I truly doubt it.
  • Bug #94224 - "[5.6] Optimizer reconsiders index based on index definition order, not value". Domas Mituzas found yet another case (see also Bug #36817 - "Non optimal index choice, depending on index creation order" from Jocelyn Fournier, a bug I verified more than 10 years ago) where in MySQL the order of index definition matters more to the optimizer than anything else. My quick check shows that MariaDB 10.3.7 is not affected:
    MariaDB [test]> explain select distinct b from t1 where c not in (0) and d > 0;
    +------+-------------+-------+-------+---------------+--------------------+---------+------+------+-------------+
    | id   | select_type | table | type  | possible_keys | key                | key_len | ref  | rows | Extra       |
    +------+-------------+-------+-------+---------------+--------------------+---------+------+------+-------------+
    |    1 | SIMPLE      | t1    | index | NULL          | non_covering_index | 9       | NULL |    1 | Using where |
    +------+-------------+-------+-------+---------------+--------------------+---------+------+------+-------------+
    1 row in set (0.002 sec)

    MariaDB [test]> alter table t1 add index covering_index (b, c, d);
    Query OK, 0 rows affected (0.149 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    MariaDB [test]> explain select distinct b from t1 where c not in (0) and d > 0;
    +------+-------------+-------+-------+---------------+----------------+---------+------+------+--------------------------+
    | id   | select_type | table | type  | possible_keys | key            | key_len | ref  | rows | Extra                    |
    +------+-------------+-------+-------+---------------+----------------+---------+------+------+--------------------------+
    |    1 | SIMPLE      | t1    | index | NULL          | covering_index | 14      | NULL |    1 | Using where; Using index |
    +------+-------------+-------+-------+---------------+----------------+---------+------+------+--------------------------+
    1 row in set (0.025 sec)

    Fortunately, MySQL 8 is no longer affected. Unfortunately, we do not see a public comment showing the results of testing on MySQL 5.7 (or any version, for that matter) from the engineer who verified the bug. I already pointed out that this "metabug" is becoming popular in my previous blog post.
  • Bug #94243 - "WL#9508 introduced non-idiomatic potentially-broken C macros". Laurynas Biveinis from Percona found new code that in an ideal world would not pass any serious code review.
  • Bug #94251 - "Aggregate function result is dependent by window is defined directly or as named". This bug was reported by Владислав Сокол. From what I see:
    MariaDB [test]> WITH RECURSIVE cte AS (
        -> SELECT 1 num
        -> UNION ALL
        -> SELECT num+1 FROM cte WHERE num < 5
        -> )
    -> SELECT num, COUNT(*) OVER (frame) cnt_named, COUNT(*) OVER (ORDER BY num DESC) cnt_direct
        -> FROM cte
        -> WINDOW frame AS (ORDER BY num DESC);
    +------+-----------+------------+
    | num  | cnt_named | cnt_direct |
    +------+-----------+------------+
    |    1 |         5 |          5 |
    |    2 |         4 |          4 |
    |    3 |         3 |          3 |
    |    4 |         2 |          2 |
    |    5 |         1 |          1 |
    +------+-----------+------------+
    5 rows in set (0.117 sec)

    MariaDB [test]> WITH RECURSIVE cte AS (
        -> SELECT 1 num
        -> UNION ALL
        -> SELECT num+1 FROM cte WHERE num < 5
        -> )
    -> SELECT num, COUNT(*) OVER (frame) cnt_named, COUNT(*) OVER (ORDER BY num DESC) cnt_direct
        -> FROM cte
        -> WINDOW frame AS (ORDER BY num DESC)
        -> ORDER BY num desc;
    +------+-----------+------------+
    | num  | cnt_named | cnt_direct |
    +------+-----------+------------+
    |    5 |         1 |          1 |
    |    4 |         2 |          2 |
    |    3 |         3 |          3 |
    |    2 |         4 |          4 |
    |    1 |         5 |          5 |
    +------+-----------+------------+
    5 rows in set (0.003 sec)

    MariaDB 10.3.7 is NOT affected.
  • Bug #94283 - "MySQL 8.0.15 is slower than MySQL 5.7.25". Percona's CTO Vadim Tkachenko reported that MySQL 8.0.15 is notably slower than 5.7.25 on a simple oltp_read_write sysbench test. He had recently written a separate blog post about this, with more details. There is one detail to clarify based on today's comment from Peter Zaitsev (was the same default character set used?), but as my dear friend Sinisa Milivojevic verified the bug without any questions, requests, or his own test outputs shared, we can assume that Oracle officially accepted this performance regression (even though the "regression" tag was not set).

    Check also later Bug #94387 - "MySQL 8.0.15 is slower than MySQL 5.7.25 in read only workloads", yet another performance regression report from Vadim, where he found that on read only (sysbench oltp_point_select) all in memory workloads MySQL 8.0.15 may also be slower than MySQL 5.7.25.
  • Bug #94302 - "reset master could not break dump thread in some cases". This bug was reported by Ashe Sun. This is definitely a corner case, as it happens only while the master is still writing to the very first binary log. We can not find out from the public comments in the bug report whether any other versions besides 5.7.x are affected. This is yet another "metabug" - during my days in Oracle's MySQL bugs verification team we had to check on all versions still supported and present the results explicitly.
  • Bug #94319 - "Format_description_log_event::write can cause segfaults". Nice bug report by Manuel Ung from Facebook.
  • Bug #94330 - "Test for possible compressed failures before upgrade?". The change of zlib version starting from MySQL 5.7.24 means that some operations on InnoDB tables with ROW_FORMAT=COMPRESSED that previously worked may start to fail. In this report Monty Solomon asks for some way to determine if there will be a problem with existing compressed tables before upgrading to 5.7.24. The bug is still "Open".
  • Bug #94338 - "Dirty read-like behavior in READ COMMITTED transaction". The bug reporter, Masaki Oguro, stated that MySQL 8 is not affected (only 5.6 and 5.7), and the bug is verified on these versions, so we should assume that's really the case. But I miss a public comment showing the result of testing on the recent MySQL 8.0.15.
  • Bug #94340 - "backwards incompatible changes in 8.0: Error number: 3747". Simon Mudd complains about incompatible change in 8.0.13 that does not allow slave to easily switch from SBR to RBR without restart (and was not clearly documented as a change in behavior). Make sure to read all comments.
  • Bug #94370 - "Performance regression of btr_cur_prefetch_siblings". Nice bug report with a patch from Zhai Weixiang.
  • Bug #94383 - "simple ALTER cause unnecessary InnoDB index rebuilds, 5.7.23 or later 5.7 rlses". In this bug report Mikhail Izioumtchenko presented the detailed analysis and suggested diagnostics patches to show what really happens and why. This bug is also a regression of a kind, so while testing results are presented, I still think that it could be processed better according to the good old rules I have in mind.
  • Bug #94394 - "Absence of mysql.user leads to auto-apply of --skip-grant-tables". Great finding by Ceri Williams from Percona. Sveta Smirnova provided a separate MTR test case and clarified the impact of the bug. Surely this is also a regression comparing to MySQL 5.7, as there you can not start MySQL if mysql.user table is missing. I leave it to a reader to decide if there is any security-related impact of this bug...
  • Bug #94396 - "Error message too broad: The used command is not allowed with this MySQL version". This bug was reported by my former colleague in Percona Support, famous Bill Karwin. Informative error messages matter for good user experience.
We rely on MySQL in the same way as those guys on top of the dolphin pyramid in this strange monument in some courtyard somewhere at the Lanes. A reliable foundation matters, so regressions had better be avoided. To summarize:
  1. Looks like it's time for Oracle to spend some efforts to make MySQL 8 great again, by fixing some of the bugs mentioned above, especially performance regressions vs MySQL 5.7 found recently by Vadim Tkachenko from Percona.
  2. Oracle continues to introduce backward-incompatible changes in behavior in minor MySQL 8.0.x releases at GA stage. This is not really good for any production environment.
  3. Asking bug reporters to provide "the basics of such a new algorithm" when they complain that current one is wrong or not optimal is a new word in bugs processing!
  4. When I joined MySQL bugs verification team in 2005 we've set up a culture of bugs processing that included, among other things, presenting in a public comment any successful or unsuccessful attempt to verify the bug, by copy-pasting all commands and statements used along with the outputs, whenever possible and with enough context to show what was really checked. I've studied this approach from Oracle's Tom Kyte over the previous 10 years when I followed him closely. I used to think it's standard for more than a decade already, a kind of my (and not only my) "heritage". It's sad to see this approach is no longer followed by many Oracle engineers who process bugs, in too many cases.
  5. Oracle engineers still do not use "regression" tag when setting "Verified" status for obviously regression bugs. I think bug reporters should care then to always set it when they report regression of any kind.

Percona Live 2019 First Sneak Peek!

We know you’ve been really looking forward to a glimpse of what to expect at Percona Live Austin, so here is the first sneak peek of the agenda!

Our conference committee has been reviewing hundreds of talks over the last few weeks and is delighted to present some initial talks.

  • New features in MySQL 8.0 Replication by Luís Soares, Oracle OSS
  • Shaping the Future of Privacy & Data Protection by Cristina DeLisle, XWiki SAS
  • Galera Cluster New Features by Seppo Jaakola, Codership
  • MySQL Security and Standardization at PayPal by Stacy Yuan & Yashada Jadha, PayPal
  • Mailchimp Scale: a MySQL Perspective by John Scott, Mailchimp
  • The State of Databases in 2019 by Dinesh Joshi, Apache Cassandra

PingCAP will be sponsoring the TiDB track and will have a day of really exciting content to share! Liu Tang, Chief Engineer at PingCAP, will be presenting: Using Chaos Engineering to Build a Reliable TiDB. Keep your eye out for more coming soon!

We could not put on this conference without the support of our sponsors. Being a sponsor at Percona Live gives companies the opportunity to showcase their products and services, interact with the community for invaluable face time, meet with users or customers, and showcase their recruitment opportunities.

It is with great pleasure that we announce the first round of sponsors for Percona Live!

Diamond Sponsors

Silver Sponsors

If you’d like to find out more about being a sponsor, download the prospectus here. Stay tuned for more updates on the conference agenda!

Shinguz: FromDual Backup and Recovery Manager for MariaDB and MySQL 2.1.0 has been released

FromDual has the pleasure to announce the release of the new version 2.1.0 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).

The new FromDual Backup and Recovery Manager can be downloaded from here. How to install and use the Backup and Recovery Manager is described in the FromDual Backup and Recovery Manager (brman) installation guide.

In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 1.2.x to 2.1.0

brman 2.1.0 requires a new PHP package for ssh connections.

shell> sudo apt-get install php-ssh2
shell> cd ${HOME}/product
shell> tar xf /download/brman-2.1.0.tar.gz
shell> rm -f brman
shell> ln -s brman-2.1.0 brman
Changes in FromDual Backup and Recovery Manager 2.1.0

This release is a new major release series. It contains a lot of new features. We have tried to maintain backward-compatibility with the 1.2 and 2.0 release series. But you should test the new release seriously!

You can verify your current FromDual Backup Manager version with the following command:

shell> fromdual_bman --version
shell> bman --version
FromDual Backup Manager
  • Usage (--help) updated.
  • Some WARN severities downgraded to INFO to keep mail output clean.
  • Error messages made more flexible and fixed PHP library advice.
  • Split some redundant code from bman library into brman library.
  • Security fix: Password from config file is hidden now.
  • Bug on simulation of physical backup fixed (xtrabackup_binlog_info not found).
  • Options --backup-name and --backup-overwrite introduced for restore automation.
  • Minor typo bugs fixed.
  • Option --options removed.
  • Sort order for schema backup changed to ORDER BY ASC.
  • 2 PHP errors fixed for simulation.
  • Maskerade API added.
  • Physical backup sftp archiving with special characters (+foodmarat) in archive directory name fixed.

FromDual Recovery Manager
  • rman now has a progress report.
  • Full logical restore is implemented.
  • Schema logical restore is implemented.
  • Physical restore is implemented.
  • Physical restore of compressed backups is implemented.
  • Option --cleanup-first was implemented for physical backup as well.
  • Option: --stop-instance implemented.

FromDual Backup Manager Catalog
  • No changes.

Subscriptions for commercial use of the FromDual Backup and Recovery Manager can be obtained from us.


ProxySQL Native Galera Support

One of the latest enhancements in ProxySQL v2.0 is native support for Galera Clustering. In previous versions of ProxySQL an external scheduler was required to track the status of Galera nodes; however, due to widespread use, we have now integrated support directly into ProxySQL's core configuration.

This blog discusses how to take advantage of the new feature and integrate ProxySQL with Galera Cluster to monitor node status and implement read-write split, using a 3-node cluster.

So, let’s have a look at what's new in ProxySQL's admin interface! There you'll find the following new tables and variables available to configure your Galera cluster and monitor the cluster's status.

ProxySQL Admin

The definition of the mysql_galera_hostgroups table used to configure your Galera cluster is as follows:

+--------------------------------------------+
| tables                                     |
+--------------------------------------------+
| [..]                                       |
| mysql_galera_hostgroups                    |
| [..]                                       |
| runtime_mysql_galera_hostgroups            |
| [..]                                       |
+--------------------------------------------+

CREATE TABLE mysql_galera_hostgroups (
    writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
    backup_writer_hostgroup INT CHECK (backup_writer_hostgroup>=0 AND backup_writer_hostgroup<>writer_hostgroup) NOT NULL,
    reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND backup_writer_hostgroup<>reader_hostgroup AND reader_hostgroup>0),
    offline_hostgroup INT NOT NULL CHECK (offline_hostgroup<>writer_hostgroup AND offline_hostgroup<>reader_hostgroup AND backup_writer_hostgroup<>offline_hostgroup AND offline_hostgroup>=0),
    active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
    max_writers INT NOT NULL CHECK (max_writers >= 0) DEFAULT 1,
    writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1,2)) NOT NULL DEFAULT 0,
    max_transactions_behind INT CHECK (max_transactions_behind>=0) NOT NULL DEFAULT 0,
    comment VARCHAR,
    UNIQUE (reader_hostgroup),
    UNIQUE (offline_hostgroup),
    UNIQUE (backup_writer_hostgroup));

The fields have the following semantics:

  • writer_hostgroup: the id of the hostgroup that will contain all the members that are writers
  • backup_writer_hostgroup: if the cluster is running in multi-primary mode (i.e. there are multiple nodes with read_only=0) and max_writers is set to a smaller number than the total number of nodes, the additional nodes are moved to this backup writer hostgroup
  • reader_hostgroup: the id of the hostgroup that will contain all the members that are readers (i.e. nodes that have read_only=1)
  • offline_hostgroup: when ProxySQL's monitoring determines a host to be OFFLINE, the host will be moved to the offline_hostgroup
  • active: a boolean value (0 or 1) to activate a hostgroup
  • max_writers: controls the maximum number of allowable nodes in the writer hostgroup, as mentioned previously, additional nodes will be moved to the backup_writer_hostgroup
  • writer_is_also_reader: when set to 1, a node in the writer_hostgroup will also be placed in the reader_hostgroup so that it is used for reads. When set to 2, the nodes from the backup_writer_hostgroup will be placed in the reader_hostgroup instead of the node(s) in the writer_hostgroup.
  • max_transactions_behind: determines the maximum number of writesets a node in the cluster can have queued before the node is SHUNNED to prevent stale reads (this is determined by querying the wsrep_local_recv_queue Galera variable).

For reference: https://github.com/sysown/proxysql/wiki/Main-(runtime)#mysql_galera_hostgroups
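To sanity-check that a configuration change has actually taken effect, you can compare the configured table against its runtime counterpart (runtime_mysql_galera_hostgroups, visible in the table listing above); a minimal sketch run from the admin interface:

SELECT writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup, max_writers, writer_is_also_reader
  FROM runtime_mysql_galera_hostgroups;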

The definition of the monitor.mysql_server_galera_log table used to monitor your Galera cluster is as follows:

+------------------------------------+
| tables                             |
+------------------------------------+
| [..]                               |
| mysql_server_galera_log            |
| [..]                               |
+------------------------------------+

CREATE TABLE mysql_server_galera_log (
    hostname VARCHAR NOT NULL,
    port INT NOT NULL DEFAULT 3306,
    time_start_us INT NOT NULL DEFAULT 0,
    success_time_us INT DEFAULT 0,
    primary_partition VARCHAR NOT NULL DEFAULT 'NO',
    read_only VARCHAR NOT NULL DEFAULT 'YES',
    wsrep_local_recv_queue INT DEFAULT 0,
    wsrep_local_state INT DEFAULT 0,
    wsrep_desync VARCHAR NOT NULL DEFAULT 'NO',
    wsrep_reject_queries VARCHAR NOT NULL DEFAULT 'NO',
    wsrep_sst_donor_rejects_queries VARCHAR NOT NULL DEFAULT 'NO',
    error VARCHAR,
    PRIMARY KEY (hostname, port, time_start_us))

The fields have the following semantics:

  • hostname: the Galera node hostname or IP address
  • port: the Galera MySQL service port
  • time_start_us: the time the monitor check was started (microseconds)
  • success_time_us: the amount of time for the monitor check to complete (microseconds)
  • primary_partition: whether the Galera member node is PRIMARY
  • read_only: whether the node is READ ONLY
  • wsrep_local_recv_queue: the length of the receive queue during the check (see wsrep_local_recv_queue)
  • wsrep_local_state: the node's local state number (see wsrep_local_state)
  • wsrep_desync: whether the node has been set to desync (see wsrep_desync)
  • wsrep_reject_queries: whether the node has been set to reject queries (see wsrep_reject_queries)
  • wsrep_sst_donor_rejects_queries: whether the node has been set to reject queries while acting as a donor (see wsrep_sst_donor_reject_queries)
  • error: any error messages that occurred while checking a node
The global variables that control the health check timeout and interval:

ProxySQL Admin> select * from global_variables where variable_name like '%monitor_galera%';
+-------------------------------------------+----------------+
| variable_name                             | variable_value |
+-------------------------------------------+----------------+
| mysql-monitor_galera_healthcheck_interval | 5000           |
| mysql-monitor_galera_healthcheck_timeout  | 800            |
+-------------------------------------------+----------------+
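These two thresholds can be tuned like any other ProxySQL variable from the admin interface; a minimal sketch (the values below are purely illustrative, not recommendations):

-- shorten the check interval to 2s and allow a 1s timeout (illustrative values)
UPDATE global_variables SET variable_value=2000 WHERE variable_name='mysql-monitor_galera_healthcheck_interval';
UPDATE global_variables SET variable_value=1000 WHERE variable_name='mysql-monitor_galera_healthcheck_timeout';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;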

ProxySQL Configuration
We will configure the three-node cluster in ProxySQL as follows:

  • one writer node responsible for handling all write traffic
  • one backup writer node which will be available as a standby in case the primary writer node goes offline (or becomes unavailable for writes)
  • one reader node for handling SELECT traffic

The nodes that will be used are:

  • 172.16.1.112: db-node01
  • 172.16.1.113: db-node02
  • 172.16.1.114: db-node03

First connect to the admin interface to start configuring ProxySQL:

mysql -P6032 -uadmin -padmin -h 127.0.0.1 --prompt "ProxySQL Admin> "

Now we can set up ProxySQL’s behavior for our Galera cluster. Let's set up the following hostgroups:

  • offline_hostgroup with hostgroup_id=1
  • writer_hostgroup with hostgroup_id=2
  • reader_hostgroup with hostgroup_id=3
  • backup_writer_hostgroup with hostgroup_id=4

We'll set max_writers=1 to ensure we only have one writer at a time, and since we want the writer to be dedicated to writes ONLY, we'll also set writer_is_also_reader=0.

INSERT INTO mysql_galera_hostgroups (offline_hostgroup, writer_hostgroup, reader_hostgroup, backup_writer_hostgroup, active, max_writers, writer_is_also_reader, max_transactions_behind) VALUES (1,2,3,4,1,1,0,100);

We need to manually populate the mysql_servers table with information about our Galera nodes. The node with the lowest weight will be moved to the backup_writer_hostgroup automatically after the configuration is loaded into runtime; for now we'll just add it to the writer_hostgroup.

INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight) VALUES (2,'172.16.1.112',3306,100);
INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight) VALUES (2,'172.16.1.113',3306,10);
INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight) VALUES (3,'172.16.1.114',3306,100);

Now, we can verify the configured tables:

ProxySQL Admin> select hostgroup_id,hostname,port,status,weight,max_connections from mysql_servers;
+--------------+--------------+------+--------+--------+-----------------+
| hostgroup_id | hostname     | port | status | weight | max_connections |
+--------------+--------------+------+--------+--------+-----------------+
| 2            | 172.16.1.112 | 3306 | ONLINE | 100    | 1000            |
| 2            | 172.16.1.113 | 3306 | ONLINE | 10     | 1000            |
| 3            | 172.16.1.114 | 3306 | ONLINE | 100    | 1000            |
+--------------+--------------+------+--------+--------+-----------------+

ProxySQL Admin> select * from mysql_galera_hostgroups;
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| writer_hostgroup | backup_writer_hostgroup | reader_hostgroup | offline_hostgroup | active | max_writers | writer_is_also_reader | max_transactions_behind | comment |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| 2                | 4                       | 3                 | 1                 | 1      | 1           | 0                     | 100                     | NULL    |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
1 row in set (0.00 sec)

We can now load our configuration to runtime and also save the configuration to disk to persist across restarts:

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

After loading the configuration to runtime, we can now see that host 172.16.1.113, which is configured with a lower weight, has been moved to hostgroup 4. To recap, this happened because we configured ProxySQL to have max_writers=1 and backup_writer_hostgroup=4.

The backup writer node will only be used in case the node 172.16.1.112 becomes unavailable.

ProxySQL Admin> select hostgroup,srv_host,status,ConnUsed,MaxConnUsed,Queries,Latency_us from stats.stats_mysql_connection_pool order by srv_host;
+-----------+--------------+--------+----------+-------------+-----------+------------+
| hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries   | Latency_us |
+-----------+--------------+--------+----------+-------------+-----------+------------+
| 2         | 172.16.1.112 | ONLINE | 3        | 4           | 930742390 | 118        |
| 4         | 172.16.1.113 | ONLINE | 0        | 0           | 0         | 136        |
| 3         | 172.16.1.114 | ONLINE | 1        | 1           | 233130345 | 123        |
+-----------+--------------+--------+----------+-------------+-----------+------------+

Now it’s time to go ahead and define some query rules, which will handle read/write split. For illustrative purposes we'll just use some generic rules to redirect SELECT traffic (NOTE: IT IS NOT RECOMMENDED TO USE GENERIC QUERY RULES IN A PRODUCTION ENVIRONMENT, QUERY RULES SHOULD BE GENERATED FOR SPECIFIC QUERIES OR QUERY PATTERNS INSTEAD).

The default_hostgroup for the application user in the mysql_users table is set to 2, i.e. the writer_hostgroup.

  • 1st Rule: the query processor scans the query rules looking for a match against the ^SELECT.* pattern; if a match is found, ProxySQL forwards the query to destination_hostgroup=3.
  • 2nd Rule: queries matching the ^SELECT.* FOR UPDATE pattern should always be served from the writer hostgroup, so we set destination_hostgroup=2.
  • All traffic not matching the above criteria will be routed to the default hostgroup, i.e. the writer_hostgroup.

INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 3, 0);
INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply) VALUES (1, '^SELECT.* FOR UPDATE', 2, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
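Once the rules are loaded, you can verify that reads and writes are being routed as intended by checking the per-hostgroup query digests in ProxySQL's stats schema; a quick sketch:

SELECT hostgroup, digest_text, count_star
  FROM stats_mysql_query_digest
 ORDER BY count_star DESC
 LIMIT 5;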

Make sure read_only is enabled in MySQL on any hosts that should be part of the reader_hostgroup, and optionally set writer_is_also_reader=0 if you want to prevent hosts in your writer_hostgroup from being used for reads.

db-node03 mysql> SET GLOBAL read_only=ON;

Alternatively, you can configure writer_is_also_reader=2: this way you won't need to set read_only=ON, because the host(s) in the backup_writer_hostgroup will be used as reader(s).
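If you decide to switch to that behaviour later, it is a simple update followed by a reload; a sketch, assuming the hostgroup ids configured above:

UPDATE mysql_galera_hostgroups SET writer_is_also_reader=2 WHERE writer_hostgroup=2;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;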

When you are done with the configuration, there is a useful table in ProxySQL, shown below, which gives you a quick view of the state of each node of the cluster as seen by ProxySQL:

ProxySQL Admin> select * from mysql_server_galera_log order by time_start_us desc limit 3;
+--------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
| hostname     | port | time_start_us    | success_time_us | primary_partition | read_only | wsrep_local_recv_queue | wsrep_local_state | wsrep_desync | wsrep_reject_queries | wsrep_sst_donor_rejects_queries | error |
+--------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
| 172.16.1.114 | 3306 | 1529510693289001 | 1234            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 172.16.1.113 | 3306 | 1529510693287804 | 1209            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 172.16.1.112 | 3306 | 1529510693286879 | 1158            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
+--------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+

Testing time!

Let’s enable a Donor/Desync state on the Galera node we've configured as the priority writer node and check how ProxySQL handles write traffic. When the writer node changes to a Donor/Desync status we expect to see ProxySQL move all write traffic to the backup writer node after promoting it from HG4 (backup_writer_hostgroup) to HG2 (writer_hostgroup).

1.) Check the initial ProxySQL configuration:

ProxySQL Admin> select hostgroup,srv_host,status,ConnUsed,MaxConnUsed,Queries,Latency_us from stats.stats_mysql_connection_pool order by srv_host;
+-----------+--------------+--------+----------+-------------+------------+------------+
| hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries    | Latency_us |
+-----------+--------------+--------+----------+-------------+------------+------------+
| 2         | 172.16.1.112 | ONLINE | 4        | 4           | 2295114892 | 131        |
| 4         | 172.16.1.113 | ONLINE | 0        | 0           | 0          | 162        |
| 3         | 172.16.1.114 | ONLINE | 1        | 1           | 539211603  | 142        |
+-----------+--------------+--------+----------+-------------+------------+------------+

2.) Desync db-node01 by setting wsrep_desync=ON

db-node01 mysql> SET GLOBAL wsrep_desync=ON;

3.) Re-check ProxySQL's configuration:

ProxySQL Admin> select hostgroup,srv_host,status,ConnUsed,MaxConnUsed,Queries,Latency_us from stats.stats_mysql_connection_pool order by srv_host;
+-----------+--------------+--------+----------+-------------+-----------+------------+
| hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries   | Latency_us |
+-----------+--------------+--------+----------+-------------+-----------+------------+
| 1         | 172.16.1.112 | ONLINE | 0        | 0           | 0         | 149        |
| 2         | 172.16.1.113 | ONLINE | 3        | 4           | 1156479   | 129        |
| 3         | 172.16.1.114 | ONLINE | 0        | 1           | 542028027 | 128        |
+-----------+--------------+--------+----------+-------------+-----------+------------+

Great stuff! We see that db-node01 was moved to the offline_hostgroup as expected and db-node02 has been allocated to the writer_hostgroup in order to continue serving write traffic.

The same behaviour will occur when the primary writer node goes down and leaves the cluster. As a final note, it's worthwhile to mention that, apart from monitoring the wsrep_desync variable, ProxySQL also continuously checks the status of the wsrep_reject_queries and wsrep_sst_donor_rejects_queries variables and takes the required action when needed.
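That last scenario can be simulated in the same way as the desync test above; a sketch (wsrep_reject_queries accepts NONE, ALL and ALL_KILL, where ALL rejects new queries while keeping connections open and ALL_KILL also kills existing connections):

db-node02 mysql> SET GLOBAL wsrep_reject_queries=ALL;
-- ProxySQL should move the node to the offline_hostgroup shortly afterwards; revert with:
db-node02 mysql> SET GLOBAL wsrep_reject_queries=NONE;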
Happy ProxySQLing!

Authored by: Ashwini Ahire & Nick Vyzas

“How to write your first patch ? ” – MariaDB Unconference Presentations

Have you ever wondered how to get started with contributions to the world’s most popular open source database? Did you have problems with building and configuring from source code, writing the contribution patch, and testing the server with the mysql-test-run (mtr) framework afterwards? How to make your patch visible to other developers? In […]

The post “How to write your first patch ? ” – MariaDB Unconference Presentations appeared first on MariaDB.org.

Measuring Percona Server for MySQL On-Disk Decryption Overhead

Percona Server for MySQL 8.0 comes with enterprise-grade data-at-rest encryption features. However, there is always the question of how much overhead (or performance penalty) comes with data decryption. As we saw in my networking performance post, SSL under high concurrency might be problematic. Is this the case for data decryption?

To measure any overhead, I will start with a simplified read-only workload, where data gets decrypted during read IO.

During query execution, the data in memory is already decrypted so there is no additional processing time. The decryption happens only for blocks that require a read from storage.
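One way to confirm that a benchmark run is actually reading (and therefore decrypting) blocks from storage is to watch InnoDB's buffer pool miss counters; a quick sketch:

-- Innodb_buffer_pool_reads counts reads that missed the buffer pool and hit storage;
-- Innodb_buffer_pool_read_requests counts all logical read requests
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';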

For the benchmark I will use the following workload:

sysbench oltp_read_only --mysql-ssl=off --tables=20 --table-size=10000000 --threads=$i --time=300 --report-interval=1 --rand-type=uniform run

The data size for this workload is about 50GB, so I will use innodb_buffer_pool_size = 5GB to emulate heavy disk read IO during the benchmark. In the second run, I will use innodb_buffer_pool_size = 60GB, so all data is kept in memory and there are NO disk read IO operations.
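Since innodb_buffer_pool_size is a dynamic variable (MySQL 5.7.5 and later), switching between the two configurations does not strictly require a restart; a sketch with the sizes in bytes:

-- 5GB buffer pool for the IO-bound run
SET GLOBAL innodb_buffer_pool_size = 5368709120;
-- 60GB buffer pool for the in-memory run
SET GLOBAL innodb_buffer_pool_size = 64424509440;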

I will only use table-level encryption at this time (i.e. no encryption for the binary log, system tablespace, or redo and undo logs).
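For context, table-level encryption is a per-table flag that requires a keyring plugin to be loaded at server startup; a minimal sketch, assuming the sysbench dataset in the default sbtest schema and the keyring_file plugin (both are assumptions, not part of the benchmark description):

-- check that a keyring plugin is active (assumption: keyring_file loaded at startup)
SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE 'keyring%';
-- encrypt one of the sysbench tables
ALTER TABLE sbtest.sbtest1 ENCRYPTION='Y';
-- list tables with the encryption flag set
SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE CREATE_OPTIONS LIKE '%ENCRYPTION%';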

The server I am using has AES hardware CPU acceleration. Read more at https://en.wikipedia.org/wiki/AES_instruction_set

Benchmark N1, heavy read IO

Threads  Encrypted storage  No encryption  Encryption overhead
1        389.11             423.47         1.09
4        1531.48            1673.2         1.09
16       5583.04            6055           1.08
32       8250.61            8479.61        1.03
64       8558.6             8574.43        1.00
96       8571.55            8577.9         1.00
128      8570.5             8580.68        1.00
256      8576.34            8585           1.00
512      8573.15            8573.73        1.00
1024     8570.2             8562.82        1.00
2048     8422.24            8286.65        0.98

Benchmark N2, data in memory, no read IO

Threads  Encryption  No encryption
1        578.91      567.65
4        2289.13     2275.12
16       8304.1      8584.06
32       13324.02    13513.39
64       20007.22    19821.48
96       19613.82    19587.56
128      19254.68    19307.82
256      18694.05    18693.93
512      18431.97    18372.13
1024     18571.43    18453.69
2048     18509.73    18332.59

Observations

For a high number of threads, there is no measurable difference between encrypted and unencrypted storage. This is because a lot of CPU resources are spent on contention and waits, so the relative time spent in decryption is negligible.

However, we can see some performance penalty at a low number of threads: up to 9%, even with hardware-accelerated decryption. When the data fully fits into memory, there is no measurable difference between encrypted and unencrypted storage.

So if you have hardware support, you should see little impact when using storage encryption with MySQL. The easiest way to check whether you have such support is to look at the CPU flags and search for the ‘aes’ string:

> lscpu | grep aes
Flags: ... tsc_deadline_timer aes xsave avx f16c ...
