Planet MySQL

40 million tables in MySQL 8.0 with ZFS

In my previous blog post about millions of tables in MySQL 8, I was able to create one million tables and test its performance. My next challenge is to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema). In this blog post I show how to do it and what challenges to expect.

Background

Once again – why do we need so many tables in MySQL, and what is the use case? The main reason is customer isolation. With the new focus on security and privacy (take GDPR for example), it is much easier and more beneficial to create a separate schema (or “database” in MySQL terms) for each customer. That creates a new set of challenges that we will need to solve. Here is the summary:

  1. Too many files. For each table, MySQL traditionally creates an FRM file. With MySQL 8.0 and its new data dictionary this is no longer the case for InnoDB tables: it does not create FRM files, only an IBD file.
  2. Too much storage overhead. Just to create 40 million tables we will need ~4-5 TB of space. The ZFS filesystem can help a lot here, through compression – see below.
  3. MySQL does not work well with so many tables. We have observed a lot of overhead (MySQL needs to open/close table definition files) and contention (table definitions need to be stored in memory to avoid a performance penalty, which introduces mutex contention).
Challenges

When I approached the task of creating 40 million tables, my first challenge was disk space. Just to create them, I needed at least 5Tb of fast disk storage. The good news is: we have the ZFS filesystem which provides compression out of the box. With compression I was able to use just a 250G drive with ZFS – the compression ratio is > 10x:

# du -sh --apparent-size /var/lib/mysql-data
4.7T    /var/lib/mysql-data
# du -sh /var/lib/mysql-data
131G    /var/lib/mysql-data

The second challenge is how to create those tables in a reasonable amount of time. I created a script to “provision” the databases (create all 40 million tables). The good news is that the performance regression in “create table” speed and the scalability bug have been fixed, so I was able to use this script to create 40 million tables using shared tablespaces (one tablespace per schema):

#!/bin/bash
function do_db {
	db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
	if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi;
	mysql -vvv -e "create database $db";
	mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
	for i in {1..100}
	do
		table="CREATE TABLE sbtest$i (
		id int(10) unsigned NOT NULL AUTO_INCREMENT,
		k int(10) unsigned NOT NULL DEFAULT '0',
		c varchar(120) NOT NULL DEFAULT '',
		pad varchar(60) NOT NULL DEFAULT '',
		PRIMARY KEY (id),
		KEY k_1 (k)
		) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
		mysql $db -e "$table"
	done
}
c=0
for m in {1..4000000}
do
	for i in {1..40}
	do
		let c=$c+1
		echo $c
		db="sbtest_$c"
		do_db &
	done
	wait
	#if [ $c > 4000000 ]; then exit; fi
done
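
Since provisioning 40 million tables takes a while, it helps to check progress from time to time. Below is a minimal sketch of such a check using MySQL Connector/Python and information_schema; the socket path is the one used elsewhere in this post, while the user and password are placeholders. Note that counting rows in information_schema.tables over tens of millions of tables is itself not instant.

import mysql.connector

# User and password are placeholders; the socket path matches the config used in this post.
db = mysql.connector.connect(unix_socket="/var/lib/mysql-data/mysql.sock",
                             user="root", password="secret")
cursor = db.cursor()

# Schemas created by the provisioning script are named sbtest_1, sbtest_2, ...
cursor.execute("SELECT COUNT(*) FROM information_schema.schemata "
               "WHERE schema_name LIKE 'sbtest\\_%'")
schemas = cursor.fetchone()[0]

# Tables inside those schemas (this query gets slow with tens of millions of tables).
cursor.execute("SELECT COUNT(*) FROM information_schema.tables "
               "WHERE table_schema LIKE 'sbtest\\_%'")
tables = cursor.fetchone()[0]

print("provisioned so far: {} schemas, {} tables".format(schemas, tables))

cursor.close()
db.close()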

40 million tables in MySQL 8

Now it’s time for a real test. I’m using the latest MySQL 8 version (at the time of writing): 8.0.12. This implements the new data dictionary.

MySQL config file:

[mysqld]
datadir=/var/lib/mysql-data
socket=/var/lib/mysql-data/mysql.sock
datadir=/var/lib/mysql-data
log-error = /var/lib/mysql-log/error.log
server_id = 12345
log_bin = /var/lib/mysql-log/binlog
relay_log=/var/lib/mysql-log/relay-bin
skip-log-bin=1
innodb_log_group_home_dir = /var/lib/mysql-log
innodb_doublewrite = 0
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=2G
innodb_buffer_pool_size=4G
tablespace_definition_cache = 524288
schema_definition_cache = 524288
table_definition_cache = 524288
table_open_cache=524288
open-files-limit=1000000

Sysbench shell script:

function run_sb() {
	conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
	sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt
}
let db_count=400000 table_count=100 max_time=10000 num_threads=32
run_sb

Sysbench lua script:

pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"

if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end

function thread_init(thread_id)
   set_vars()
end

function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val

   oltp_db_count = tonumber(oltp_db_count) or 1
   -- local oltp_db_count = 4

   table_name = "sbtest_" .. sb_rand(1, oltp_db_count)..".sbtest".. sb_rand(1, oltp_tables_count)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[###########-###########-###########-###########-###########]])

   rs = db_query("SELECT id FROM " .. table_name .." LIMIT 1")
end

Please note that the tables are empty – no data.

Now we can run the benchmark. Unfortunately, we have a serious mutex contention in the data dictionary. Here are the results:

[ 453s ] thds: 32 tps: 1203.96 qps: 1203.96 (r/w/o: 1203.96/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 454s ] thds: 32 tps: 1202.32 qps: 1202.32 (r/w/o: 1202.32/0.00/0.00) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 455s ] thds: 32 tps: 1196.74 qps: 1196.74 (r/w/o: 1196.74/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 456s ] thds: 32 tps: 1197.18 qps: 1197.18 (r/w/o: 1197.18/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 457s ] thds: 32 tps: 887.11 qps: 887.11 (r/w/o: 887.11/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 458s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 459s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 460s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 461s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 462s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 463s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 464s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 465s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 466s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 467s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 468s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 469s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 470s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 471s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 472s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 473s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 474s ] thds: 32 tps: 403.96 qps: 403.96 (r/w/o: 403.96/0.00/0.00) lat (ms,95%): 16819.24 err/s: 0.00 reconn/s: 0.00
[ 475s ] thds: 32 tps: 1196.00 qps: 1196.00 (r/w/o: 1196.00/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 476s ] thds: 32 tps: 1208.96 qps: 1208.96 (r/w/o: 1208.96/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 477s ] thds: 32 tps: 1192.06 qps: 1192.06 (r/w/o: 1192.06/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 478s ] thds: 32 tps: 1173.89 qps: 1173.89 (r/w/o: 1173.89/0.00/0.00) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00

As we can see, for ~15 seconds no queries were processed: a complete MySQL stall. That situation – complete stall – happens constantly, every ~25-30 seconds.
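
One way to see these stalls from the application side is to sample a global status counter once a second and watch the delta collapse to zero. Here is a minimal sketch, assuming MySQL Connector/Python and the socket path from the config above (user and password are placeholders); note that during a full stall even this monitoring query may hang, which is a signal in itself:

import time
import mysql.connector

# User and password are placeholders; the socket path matches the config above.
db = mysql.connector.connect(unix_socket="/var/lib/mysql-data/mysql.sock",
                             user="root", password="secret")
cursor = db.cursor()

prev = None
while True:
    cursor.execute("SHOW GLOBAL STATUS LIKE 'Questions'")
    current = int(cursor.fetchone()[1])
    if prev is not None:
        # A delta near zero while sysbench is running points at a server-wide stall.
        print("{} statements in the last second".format(current - prev))
    prev = current
    time.sleep(1)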

SHOW ENGINE INNODB STATUS shows the mutex contention:

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 498635
--Thread 140456572004096 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451898689280 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451896919808 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456571119360 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457044215552 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456572299008 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457043035904 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140456571709184 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451897214720 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451896624896 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457042740992 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140451899279104 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
--Thread 140457042446080 has waited at que0que.cc line 1072 for 0.00 seconds the semaphore:
Mutex at 0x34a3fe0, Mutex PARSER created pars0pars.cc:98, lock var 1
OS WAIT ARRAY INFO: signal count 89024
RW-shared spins 11216, rounds 14847, OS waits 3641

I’ve filed a new MySQL bug: DICT_SYS mutex contention causes complete stall when running with 40 mill tables.

I’ve also tested with the pareto distribution in sysbench, with the ratio set to 0.05 (5%) and even 0.01 (1%), and the mutex contention is still an issue. I used the following updated sysbench script:

function run_sb() {
	conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
	sysbench $conn --rand-type=$rand_type --rand-pareto-h=$pareto_h --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time $test_name run | tee -a sysbench_2.txt
}
let db_count=400000 table_count=100 max_time=10000 num_threads=32
rand_type="pareto"
pareto_h=0.01
test_name="./select_custom.lua"
echo "Now running $rand_type for $max_time seconds, test=$test_name"
run_sb

And the results with 0.01 (1%) are the following:

[ 55s ] thds: 32 tps: 72465.29 qps: 72465.29 (r/w/o: 72465.29/0.00/0.00) lat (ms,95%): 0.53 err/s: 0.00 reconn/s: 0.00
[ 56s ] thds: 32 tps: 68641.04 qps: 68641.04 (r/w/o: 68641.04/0.00/0.00) lat (ms,95%): 0.61 err/s: 0.00 reconn/s: 0.00
[ 57s ] thds: 32 tps: 70479.82 qps: 70479.82 (r/w/o: 70479.82/0.00/0.00) lat (ms,95%): 0.57 err/s: 0.00 reconn/s: 0.00
[ 58s ] thds: 32 tps: 31395.55 qps: 31395.55 (r/w/o: 31395.55/0.00/0.00) lat (ms,95%): 0.49 err/s: 0.00 reconn/s: 0.00
[ 59s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 61s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 62s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 63s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 64s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 65s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 66s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 67s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 68s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 69s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 71s ] thds: 32 tps: 18879.04 qps: 18879.04 (r/w/o: 18879.04/0.00/0.00) lat (ms,95%): 0.75 err/s: 0.00 reconn/s: 0.00
[ 72s ] thds: 32 tps: 70924.82 qps: 70924.82 (r/w/o: 70924.82/0.00/0.00) lat (ms,95%): 0.48 err/s: 0.00 reconn/s: 0.00
[ 73s ] thds: 32 tps: 72395.57 qps: 72395.57 (r/w/o: 72395.57/0.00/0.00) lat (ms,95%): 0.47 err/s: 0.00 reconn/s: 0.00
[ 74s ] thds: 32 tps: 72483.22 qps: 72484.22 (r/w/o: 72484.22/0.00/0.00) lat (ms,95%): 0.58 err/s: 0.00 reconn/s: 0.00

ZFS

The ZFS filesystem provides compression, which helps tremendously in this case. When MySQL creates an InnoDB table it will create a new blank .ibd file and pre-allocate some pages, which will be blank and therefore compress extremely well. I have configured ZFS compression and can see a > 10x compression ratio:

# zfs get all | grep compressratio
mysqldata             compressratio     12.47x  -
mysqldata             refcompressratio  1.00x   -
mysqldata/mysql       compressratio     12.47x  -
mysqldata/mysql       refcompressratio  1.00x   -
mysqldata/mysql/data  compressratio     12.51x  -
mysqldata/mysql/data  refcompressratio  12.54x  -
mysqldata/mysql/log   compressratio     2.79x   -
mysqldata/mysql/log   refcompressratio  4.57x   -

Conclusion

It is possible to create 40 million tables with MySQL 8.0 using shared tablespaces. ZFS provides an excellent compression ratio (with gzip), which can help by reducing the overhead of the “schema per customer” architecture. Unfortunately, the new data dictionary in MySQL 8.0.12 suffers from DICT_SYS mutex contention, which causes constant “stalls”.

The post 40 million tables in MySQL 8.0 with ZFS appeared first on Percona Database Performance Blog.

This Week in Data With Colin Charles 51: Debates Emerging on the Relicensing of OSS

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

There has been a lot of talk around licenses in open source software, and it has hit the database world in the past weeks. Redis Labs relicensed some AGPL software to the Commons Clause (in their case, Apache + Commons Clause; so you can’t really call it Apache any longer). I’ll have more to say on this topic soon, but in the meantime you might enjoy reading Open-source licensing war: Commons Clause. This was the most balanced article I read about this move and the kerfuffle it has caused. We also saw this with Lerna (not database related), and here’s another good read: Open Source Devs Reverse Decision to Block ICE Contractors From Using Software.

Reviewing is under way for Percona Live Europe 2018 talks: the review of the tutorials is complete. We can expect to see a schedule by mid-September, so hang in there—I’ve received a lot of messages asking if talks are going to be approved or not.

Releases
  • While not a new release, MySQL Shell 8.0.12 is worth spending some time with, especially since you might enjoy the pluggable password store.
  • SqlKata for C# – SqlKata is an elegant SQL query builder for C#. It helps you talk to your database engine with a greater degree of freedom and lets you write complex queries in an object-oriented manner when you need to. Works with MySQL, PostgreSQL, and more.
Link List

Industry Updates
  • Balazs Pocze is now a database SRE at Wikimedia Foundation. He has spoken at several Percona Live events too!
Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data With Colin Charles 51: Debates Emerging on the Relicensing of OSS appeared first on Percona Database Performance Blog.

Webinar Wed 9/5: Choosing the Right Open Source Database

Please join Percona’s CEO, Peter Zaitsev as he presents Choosing the Right Open Source Database on Wednesday, September 5th, 2018 at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

 

Register Now

 

The world of open-source databases is overwhelming. There are dozens of types of databases – relational DBMS, time-series, graph, document, etc. – not to mention the dozens of software options within each of those categories. More and more, the strategies of enterprises involve open source software and open source database software. This allows companies to be more agile, quick to market, and cost-effective.

So how do you know which open source database to choose? Open-source database companies worldwide are competing for our attention. In this talk, we will distill the madness. I will give an overview of the types of open-source databases, use-cases, and key players in the ecosystem.

Register for this webinar on choosing the right open source database.

The post Webinar Wed 9/5: Choosing the Right Open Source Database appeared first on Percona Database Performance Blog.

MySQL Connector/Python on iOS Using Pythonista 3

One of the nice things about MySQL Connector/Python is that it is available in a pure Python implementation. This makes it very portable. Today I have been exploring the possibility of taking advantage of that to make MySQL Connector/Python available on my iPad.

There are a few Python interpreters available for the iPad. The one I will be discussing today is Pythonista 3, which has support for both Python 2.7 and 3.6. One of the things that caught my interest is that it comes with libraries to work with iOS, such as accessing the contacts and photos, as well as UI tools. It is a commercial program (AUD 15), but so far it looks to be worth the money. There are other options and I hope to write about those another day.

MySQL Connector/Python is available from PyPI. This makes it easy to install the module in a uniform way across platforms. Unfortunately, Pythonista 3 does not support the pip command out of the box. Instead there is a community contribution called StaSh that provides a shell-like environment from which pip can be executed. So, our first step is to install StaSh.

Coding with MySQL Connector/Python on an iPad

Install StaSh

StaSh is a project maintained by ywangd and can be found on GitHub. It installs itself when you execute a small Python program that can be found in the README.md file on the project repository page. You copy the source code into a new file in Pythonista 3 and execute it (using the “Play” button):

StaSh is installed by executing a downloaded script.

Then you need to restart Pythonista 3. At this point StaSh is installed.

Installing a PyPI Package Using StaSh pip

In order to be able to use the pip command through StaSh, you need to launch the launch_stash.py program which was installed in the previous step. The program can be found in the This iPad folder:

Open the launch_stash.py Script in This iPad

Open the program and use the “Play” button again to execute it. This creates the shell. You can do other things with StaSh than use the pip command, but for the purpose of installing MySQL Connector/Python that is all that is required. The command is:

pip install mysql-connector-python

The console output is:

Using the pip command in StaSh to install MySQL Connector/Python.

That’s it. Now you can use the MySQL Connector/Python modules on your iPad just as in any other environment.

Example

To verify it is working, let’s create a simple example. For this to work, you need MySQL installed allowing external access which likely requires enabling access to port 3306 in your firewall.

A simple example that queries the city table in the world database for the city with ID = 130 is:

import mysql.connector

connect_args = {
    "host": "192.0.2.1",
    "port": 3306,
    "user": "testuser",
    "password": "password",
}

db = mysql.connector.connect(**connect_args)
cursor = db.cursor(named_tuple=True)
sql = "SELECT * FROM world.city WHERE ID = 130"
cursor.execute(sql)
print(cursor.fetchone())
cursor.close()
db.close()

Edit the connect_args dictionary with the connection arguments required for your MySQL instance.

Warning: The connection arguments are included inside the source code here to keep the example simple. Do not do this in real programs. It is unsafe to store the password in the source code and it makes the program hard to maintain.

When you run it, the details of Sydney, Australia are printed to the console:

Example of querying the world.city table using MySQL Connector/Python in Pythonista 3.
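
As the warning above says, hard-coding credentials is a bad habit. One simple alternative is to keep them in a small INI-style file and read it at runtime. This is just a sketch using configparser from the standard library; the file name connect.ini and the [connection] section are arbitrary names for the example:

import configparser
import mysql.connector

# connect.ini lives next to the script (and outside version control), e.g.:
# [connection]
# host = 192.0.2.1
# port = 3306
# user = testuser
# password = password

config = configparser.ConfigParser()
config.read("connect.ini")

connect_args = dict(config["connection"])
connect_args["port"] = int(connect_args["port"])  # the connector expects an integer port

db = mysql.connector.connect(**connect_args)
print(db.is_connected())
db.close()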

This all looks great, but what about the X DevAPI? Unfortunately, there are some problems there.

X DevAPI

The X DevAPI is new in MySQL 8.0. It is included in MySQL Connector/Python in the mysqlx module. However, it does not work out of the box in Pythonista 3. When you try to execute the import mysqlx command, it fails:

>>> import mysqlx
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/var/mobile/Containers/Shared/AppGroup/BA51D6FF-34C8-460C-88B7-0078A16FBBD2/Pythonista3/Documents/site-packages-3/mysqlx/__init__.py", line 35, in <module>
    from .connection import Session
  File "/private/var/mobile/Containers/Shared/AppGroup/BA51D6FF-34C8-460C-88B7-0078A16FBBD2/Pythonista3/Documents/site-packages-3/mysqlx/connection.py", line 50, in <module>
    from .crud import Schema
  File "/private/var/mobile/Containers/Shared/AppGroup/BA51D6FF-34C8-460C-88B7-0078A16FBBD2/Pythonista3/Documents/site-packages-3/mysqlx/crud.py", line 34, in <module>
    from .statement import (FindStatement, AddStatement, RemoveStatement,
  File "/private/var/mobile/Containers/Shared/AppGroup/BA51D6FF-34C8-460C-88B7-0078A16FBBD2/Pythonista3/Documents/site-packages-3/mysqlx/statement.py", line 36, in <module>
    from .expr import ExprParser
  File "/private/var/mobile/Containers/Shared/AppGroup/BA51D6FF-34C8-460C-88B7-0078A16FBBD2/Pythonista3/Documents/site-packages-3/mysqlx/expr.py", line 34, in <module>
    from .protobuf import Message, mysqlxpb_enum
  File "/private/var/mobile/Containers/Shared/AppGroup/BA51D6FF-34C8-460C-88B7-0078A16FBBD2/Pythonista3/Documents/site-packages-3/mysqlx/protobuf/__init__.py", line 75, in <module>
    from . import mysqlx_connection_pb2
  File "/private/var/mobile/Containers/Shared/AppGroup/BA51D6FF-34C8-460C-88B7-0078A16FBBD2/Pythonista3/Documents/site-packages-3/mysqlx/protobuf/mysqlx_connection_pb2.py", line 8, in <module>
    from google.protobuf import reflection as _reflection
  File "/var/containers/Bundle/Application/743A0BDC-D77A-4479-A1D9-DFF16785FBC7/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/site-packages/google/protobuf/reflection.py", line 69, in <module>
    from google.protobuf.internal import python_message
  File "/var/containers/Bundle/Application/743A0BDC-D77A-4479-A1D9-DFF16785FBC7/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/site-packages/google/protobuf/internal/python_message.py", line 849
    except struct.error, e:
                       ^
SyntaxError: invalid syntax

So, the Protobuf module that comes with Pythonista 3 uses the old comma syntax for exception handling. This is not allowed in Python 3 (PEP 3110). An obvious thought is to try to use the StaSh pip command to update Protobuf, but StaSh does not touch the built-in modules. Installing a second copy of Protobuf does not help either. If I find a solution, I will provide an update. However, for the time being you will need to stick to the good old mysql.connector module.
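
To make the failure above concrete, here is the difference between the two exception syntaxes. The struct.unpack() call is just an arbitrary way to raise struct.error; the commented-out line shows the Python 2-only form used by the bundled Protobuf code, while the as form required by PEP 3110 works in both Python 2.6+ and Python 3:

import struct

try:
    struct.unpack("i", b"")        # raises struct.error: not enough bytes
# except struct.error, e:          # Python 2 only -- a SyntaxError in Python 3
except struct.error as e:          # PEP 3110 form, valid in Python 2.6+ and 3
    print("caught:", e)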

My oldest still-open MySQL bug

This morning I received an update on a MySQL bug: someone added a comment to an issue I filed in November 2003, originally for MySQL 4.0.12 and 4.1.0. It’s MySQL bug#1956 (note the very low number!), “Parser allows naming of PRIMARY KEY”:

[25 Nov 2003 16:23] Arjen Lentz

Description:
When specifying a PRIMARY KEY, the parser still accepts a name for the index
even though the MySQL server will always name it PRIMARY.
So, the parser should NOT accept this, otherwise a user can get into a confusing
situation as his input is ignored/changed.

How to repeat:
CREATE TABLE pk (i INT NOT NULL);
ALTER TABLE pk ADD PRIMARY KEY bla (i);

'bla' after PRIMARY KEY should not be accepted.

Suggested fix:
Fix grammar in parser.

Most likely we found it during a MySQL training session; training days have always been a great source of bug catching, as students would be encouraged to try lots of quirky things and explore.

It’s not a critical issue, but one from the “era of sloppiness” as I think we may now call it, with the benefit of hindsight. At the time, it was just regarded as lenient: the parser would silently ignore things it couldn’t process in a particular context. For example, creating a foreign key on a table using an engine that didn’t support foreign keys would see the foreign keys silently disappear, rather than reporting an error. Many if not most of those quirks have been cleaned up over the years. Some are very difficult to get rid of, as people use them in code and thus essentially we’re stuck with old bad behaviour, as otherwise we’d break too many applications. I think that one neat example of that is auto-casting:

SELECT "123 apples" + 1;
+------------------+
| "123 apples" + 1 |
+------------------+
|              124 |
+------------------+
1 row in set, 1 warning (0.000 sec)

SHOW WARNINGS;
+---------+------+-----------------------------------------------+
| Level   | Code | Message                                       |
+---------+------+-----------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '123 hello' |
+---------+------+-----------------------------------------------+

At least that one chucks a warning now, which is good as it allows developers to catch mistakes and thus prevent trouble.  A lenient parser (or grammar) can be convenient, but it tends to also enable application developers to be sloppy, which is not really a good thing.  Something that creates a warning may be harmless, or an indication of a nasty mistake: the parser can’t tell.  So it’s best to ensure that code is clean, free even of warnings.
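
If your application happens to use MySQL Connector/Python, those warnings are easy to surface in code too. A minimal sketch, assuming the connector’s get_warnings flag and cursor.fetchwarnings() call, with placeholder connection details:

import mysql.connector

# Placeholder connection details.
db = mysql.connector.connect(host="127.0.0.1", user="testuser", password="secret")
db.get_warnings = True              # fetch warnings automatically after each statement

cursor = db.cursor()
cursor.execute('SELECT "123 apples" + 1')
print(cursor.fetchall())            # the truncated result, e.g. [(124.0,)]
print(cursor.fetchwarnings())       # e.g. [('Warning', 1292, "Truncated incorrect DOUBLE value: ...")]

cursor.close()
db.close()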

Going back to the named PRIMARY KEY issue… effectively, a PRIMARY KEY has the name ‘PRIMARY’ internally.  So it can’t really have another name, and the parser should not accept an attempt to name it.  The name silently disappears so when you check back in SHOW CREATE TABLE or SHOW INDEXES, you won’t ever see it.
CREATE TABLE pkx (i INT PRIMARY KEY x) reports a syntax error. So far so good.
But, CREATE TABLE pkx (i INT, PRIMARY KEY x (i)) is accepted, as is ALTER TABLE pkx ADD PRIMARY KEY x (i).

MySQL 8.0 still allows these constructs, as does MariaDB 10.3.9!

The bug is somewhere in the parser, so that’s sql/sql_yacc.yy.  I’ve tweaked and added things in there before, but it’s been a while and particularly in parser grammar you have to be very careful that changes don’t have other side-effects.  I do think it’s worthwhile fixing even these minor issues.

Is It a Read Intensive or a Write Intensive Workload?

One of the common ways to classify database workloads is whether it is  “read intensive” or “write intensive”. In other words, whether the workload is dominated by reads or writes.

Why should you care? Because recognizing if the workload is read intensive or write intensive will impact your hardware choices, database configuration as well as what techniques you can apply for performance optimization and scalability.

This question looks trivial on the surface, but as you go deeper—complexity emerges. There are different “levels” of reads and writes for you to consider. You can also choose to look at event counts or at the time it takes to do operations. These can provide very different responses, especially as the cost difference between a single read and a single write can be an order of magnitude.

Let’s examine the TPC-C Benchmark from this point of view, or more specifically its implementation in Sysbench. The illustrations below are taken from Percona Monitoring and Management (PMM) while running this benchmark.

Analyzing Read/Write Workload by Counts


At the highest level, you can think about queries that are sent to the database. In this case we can see about 30K of SELECT queries versus 20K of UPDATE+INSERT queries, making this benchmark slightly more read intensive by this measure.


Another way to look at the load is through actual operations at the row level – a single query may touch just one row or may touch millions. In this benchmark, the difference between looking at the workload from a SQL command standpoint versus a row operation standpoint yields the same result, but that is not always going to be the case.


Let’s now look at the operating system level. We can see the amount of data written to the disk is 2x more than the amount of data being read from the disk. This workload is write intensive by this measure.

Yet another way to look at your workload is from the aspect of tables. This view shows us how much the tables are being accessed for reads and writes, which in turn allows us to see whether a given table is getting more reads or writes. This is helpful, for example, if you are considering moving some of the tables to a different server and want to clearly understand how your workload will be impacted.
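
If you do not have PMM at hand, a similar per-table read/write split can be pulled straight out of performance_schema. A minimal sketch, assuming the standard table_io_waits_summary_by_table table and placeholder connection details:

import mysql.connector

# Placeholder connection details.
db = mysql.connector.connect(host="127.0.0.1", user="testuser", password="secret")
cursor = db.cursor()

cursor.execute("""
    SELECT object_schema, object_name, count_read, count_write
    FROM performance_schema.table_io_waits_summary_by_table
    WHERE count_star > 0
    ORDER BY count_star DESC
    LIMIT 10
""")

for schema, table, reads, writes in cursor:
    print("{}.{}: {} row reads, {} row writes".format(schema, table, reads, writes))

cursor.close()
db.close()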

Analyzing Read/Write Workload by Response Time

As I mentioned already, the counts often do not reflect the time to respond, which is typically more representative of the real work being done. To look at timing information from a query point of view, we want to look at query analytics.


The “Load” column here is a measure of such a combined response time, versus the count which is reflective of query counts. Looking at this list, we can see that three out of the top five queries are SELECT queries. Looking at the numbers overall, we can see we have a read intensive application from this perspective.

In terms of row level operations, there is currently no easy way to see if reads or writes are dominating overall, but you can get an idea from the table operations dashboard:


This shows the load on a per table basis. It labels reads “Fetch” and breaks down writes in more detail—“Update”, “Delete”, “Inserts”—which is helpful. Not all writes are equal either.

If we want to look at a response time based view of read vs write at the operating system level, we can check out the disk IO Load graph. You can see that in this case it happens to match the IO activity graph, with storage taking more time to serve write requests than read requests.

Summary

As you can see, the question about whether a workload is read intensive or write intensive, while simple on the surface, can have many different answers. You might ask me “OK, so what should I use?” Well… it really depends.

Looking at query counts is a great way to understand the application’s demands on the database—you can’t really do anything to change the database size.  However by changing the database configuration and schema you may drastically alter the impact of these queries, both from the standpoint of the number of rows they crunch and in terms of the disk IO they require.

The response time based statistics, gathered from the impact your queries cause on the system or disk IO, provide a better representation of the load these queries currently generate.

Another thing to keep in mind—reads and writes are not created equal. My rule of thumb for InnoDB is that a single row write is about 10x more expensive than a single row read.
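
To make that rule of thumb concrete, here is the arithmetic applied to the rough counts from the beginning of this post (about 30K reads and 20K writes per second, where each statement touches roughly one row in this benchmark):

# Approximate per-second counts from the benchmark above.
reads_per_sec = 30000      # SELECT statements
writes_per_sec = 20000     # UPDATE + INSERT statements
write_cost = 10            # rule of thumb: a single row write ~ 10x a single row read

read_load = reads_per_sec * 1
write_load = writes_per_sec * write_cost
total = read_load + write_load

print("read share of load:  {:.0%}".format(read_load / total))   # ~13%
print("write share of load: {:.0%}".format(write_load / total))  # ~87%

So a workload that looks mildly read intensive by counts can easily turn out to be write intensive once the relative cost of writes is taken into account.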

More resources that you might enjoy

If you found this post useful, you might also like to see some of Percona’s other resources.

For an introduction to PMM, our free and open source management and monitoring software, you might find value in my recorded webinar, MySQL Troubleshooting and Performance Optimization with PMM.

Our white paper Performance at Scale could also provide useful insight if you are at the planning or review stage.

The post Is It a Read Intensive or a Write Intensive Workload? appeared first on Percona Database Performance Blog.

Installation and configuration of Percona XtraDB Cluster on CentOS 7.3

This blog will show how to install Percona XtraDB Cluster on three CentOS 7.3 servers, using the packages from the Percona repositories. This is a step-by-step installation and configuration blog. We recommend Percona XtraDB Cluster for maximum availability/reliability and optimal READ/WRITE scale-out. We are a private-label, independent and vendor-neutral consulting, support, managed services and education solutions provider for MySQL, MariaDB, Percona Server and ClickHouse, with core expertise in performance, scalability, high availability and database reliability engineering. All our blog posts are purely focused on education and research across open source database systems infrastructure operations. To engage us for building and managing web-scale database infrastructure operations, please contact us at contact@minervadb.com.

This cluster will be assembled of three servers/nodes:

node #1
hostname: PXC1
IP: 138.197.70.35

node #2
hostname: PXC2
IP: 159.203.118.230

node #3
hostname: PXC3
IP: 138.197.8.226

Prerequisites

  • All three nodes have a CentOS 7.3 installation.
  • Firewall has been set up to allow connecting to ports 3306, 4444, 4567 and 4568
  • SELinux is disabled
Installing from Percona Repository on 138.197.70.35
  • Install the Percona repository package:

$ sudo yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm

  • You should see the following if successful:

Installed:
  percona-release.noarch 0:0.1-4

Complete!

  • Check that the packages are available:

$ sudo yum list | grep Percona-XtraDB-Cluster-57
Percona-XtraDB-Cluster-57.x86_64             5.7.14-26.17.1.el7    percona-release-x86_64
Percona-XtraDB-Cluster-57-debuginfo.x86_64   5.7.14-26.17.1.el7    percona-release-x86_64

  • Install the Percona XtraDB Cluster packages:

$ sudo yum install Percona-XtraDB-Cluster-57

  • Start the Percona XtraDB Cluster server:

$ sudo service mysql start

  • Copy the automatically generated temporary password for the superuser account:

$ sudo grep 'temporary password' /var/log/mysqld.log

  • Use this password to login as root:

$ mysql -u root -p

  • Change the password for the superuser account and log out. For example:

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'root';
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye

  • Stop the mysql service:

$ sudo service mysql stop

Repeat the same Percona XtraDB Cluster installation process for 159.203.118.230 and 138.197.8.226.

Configuring nodes

We have to configure the nodes 138.197.70.35, 159.203.118.230 and 138.197.8.226 separately in order to implement a fully operational Percona XtraDB Cluster ecosystem.

Configuring the node 138.197.70.35

Configuration file /etc/my.cnf for the first node should look like:

[mysqld]
datadir=/var/lib/mysql
user=mysql
# Path to Galera library
wsrep_provider=/usr/lib64/libgalera_smm.so
# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://138.197.70.35,159.203.118.230,138.197.8.226
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node #1 address
wsrep_node_address=138.197.70.35
# SST method
wsrep_sst_method=xtrabackup-v2
# Cluster name
wsrep_cluster_name=pxc_cluster
# Authentication for SST method
wsrep_sst_auth="sstuser:sstuser"

The first node can be started with the following command:

# /etc/init.d/mysql bootstrap-pxc

Since we are using CentOS 7.3, the systemd bootstrap service should be used instead:

# systemctl start mysql@bootstrap.service

This command will start the cluster with initial wsrep_cluster_address set to gcomm://. This way the cluster will be bootstrapped and in case the node or MySQL have to be restarted later, there would be no need to change the configuration file.

After the first node has been started, cluster status can be checked by:

mysql> show status like 'wsrep%';
+------------------------------+------------------------------------------------------------+
| Variable_name                | Value                                                        |
+------------------------------+------------------------------------------------------------+
| wsrep_local_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 |
| wsrep_protocol_version | 7 |
| wsrep_last_committed | 8 |
| wsrep_replicated | 4 |
| wsrep_replicated_bytes | 906 |
| wsrep_repl_keys | 4 |
| wsrep_repl_keys_bytes | 124 |
| wsrep_repl_data_bytes | 526 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 9 |
| wsrep_received_bytes | 1181 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 2 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.111111 |
| wsrep_local_cached_downto | 3 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_interval | [ 28, 28 ] |
| wsrep_cert_deps_distance | 1.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 2 |
| wsrep_cert_bucket_count | 22 |
| wsrep_gcache_pool_size | 3128 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_addresses | 159.203.118.230:3306,138.197.8.226:3306,138.197.70.35:3306 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | b79d90df-1077-11e7-9922-3a1b217f7371 |
| wsrep_cluster_conf_id | 3 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 2 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.20(r7e383f7) |
| wsrep_ready | ON |
+------------------------------+------------------------------------------------------------+
60 rows in set (0.01 sec)

This output above shows that the cluster has been successfully bootstrapped.

In order to perform a successful State Snapshot Transfer using XtraBackup, a new user needs to be set up with the proper privileges:

mysql@PXC1> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'sstuser';
mysql@PXC1> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
mysql@PXC1> FLUSH PRIVILEGES;

Configuration file /etc/my.cnf on the second node (PXC2) should look like this:

[mysqld]
datadir=/var/lib/mysql
user=mysql
# Path to Galera library
wsrep_provider=/usr/lib64/libgalera_smm.so
# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://138.197.70.35,159.203.118.230,138.197.8.226
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node #2 address
wsrep_node_address=159.203.118.230
# SST method
wsrep_sst_method=xtrabackup-v2
# Cluster name
wsrep_cluster_name=pxc_cluster
# Authentication for SST method
wsrep_sst_auth="sstuser:sstuser"

The second node can be started with the following command:

# systemctl start mysql

Cluster status can now be checked on both nodes. This is the example from the second node (PXC2):

mysql> show status like 'wsrep%';
+------------------------------+------------------------------------------------------------+
| Variable_name                | Value                                                        |
+------------------------------+------------------------------------------------------------+
| wsrep_local_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 |
| wsrep_protocol_version | 7 |
| wsrep_last_committed | 8 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 10 |
| wsrep_received_bytes | 1238 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_cached_downto | 6 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_interval | [ 28, 28 ] |
| wsrep_cert_deps_distance | 1.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 2 |
| wsrep_cert_bucket_count | 22 |
| wsrep_gcache_pool_size | 2300 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_addresses | 159.203.118.230:3306,138.197.8.226:3306,138.197.70.35:3306 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 248e2782-1078-11e7-a269-4a3ec033a606 |
| wsrep_cluster_conf_id | 3 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.20(r7e383f7) |
| wsrep_ready | ON |
+------------------------------+------------------------------------------------------------+
60 rows in set (0.00 sec)

This output shows that the new node has been successfully added to the cluster.

MySQL configuration file /etc/my.cnf on the third node (PXC3) should look like this:

[mysqld]
datadir=/var/lib/mysql
user=mysql
# Path to Galera library
wsrep_provider=/usr/lib64/libgalera_smm.so
# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://138.197.70.35,159.203.118.230,138.197.8.226
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node #3 address
wsrep_node_address=138.197.8.226
# SST method
wsrep_sst_method=xtrabackup-v2
# Cluster name
wsrep_cluster_name=pxc_cluster
# Authentication for SST method
wsrep_sst_auth="sstuser:sstuser"

The third node can now be started with the following command:

# systemctl start mysql

Percona XtraDB Cluster status can now be checked from the third node (PXC3):

mysql> show status like 'wsrep%';
+------------------------------+------------------------------------------------------------+
| Variable_name                | Value                                                        |
+------------------------------+------------------------------------------------------------+
| wsrep_local_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 |
| wsrep_protocol_version | 7 |
| wsrep_last_committed | 8 |
| wsrep_replicated | 2 |
| wsrep_replicated_bytes | 396 |
| wsrep_repl_keys | 2 |
| wsrep_repl_keys_bytes | 62 |
| wsrep_repl_data_bytes | 206 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 4 |
| wsrep_received_bytes | 529 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_cached_downto | 6 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_interval | [ 28, 28 ] |
| wsrep_cert_deps_distance | 1.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 2 |
| wsrep_cert_bucket_count | 22 |
| wsrep_gcache_pool_size | 2166 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_addresses | 159.203.118.230:3306,138.197.8.226:3306,138.197.70.35:3306 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 3f51b20e-1078-11e7-8405-8e9b37a37cb1 |
| wsrep_cluster_conf_id | 3 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 1 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.20(r7e383f7) |
| wsrep_ready | ON |
+------------------------------+------------------------------------------------------------+
60 rows in set (0.03 sec)

This output confirms that the third node has joined the cluster.
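
Rather than eyeballing the full wsrep output on each node, the membership check can be scripted. Here is a minimal sketch using MySQL Connector/Python that asks all three nodes for their cluster status; the credentials are placeholders and the account must be allowed to connect remotely:

import mysql.connector

NODES = ["138.197.70.35", "159.203.118.230", "138.197.8.226"]

for host in NODES:
    # Placeholder credentials -- use an account allowed to connect remotely.
    db = mysql.connector.connect(host=host, user="root", password="root")
    cursor = db.cursor()
    cursor.execute(
        "SHOW STATUS WHERE Variable_name IN "
        "('wsrep_cluster_size', 'wsrep_cluster_status', 'wsrep_local_state_comment')"
    )
    status = dict(cursor.fetchall())
    print(host, status)   # expect cluster size 3, status Primary, state Synced
    cursor.close()
    db.close()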

Testing Replication

Creating the new database on the PXC1 node:

mysql> create database minervadb;
Query OK, 1 row affected (0.01 sec)

Creating the example table on the PXC2 node:

mysql> use minervadb;
Database changed
mysql> CREATE TABLE example (node_id INT PRIMARY KEY, node_name VARCHAR(30));
Query OK, 0 rows affected (0.01 sec)

Inserting records on the PXC3 node:

mysql> INSERT INTO minervadb.example VALUES (1, 'MinervaDB');
Query OK, 1 row affected (0.07 sec)

Retrieving all the rows from that table on the PXC1 node:

mysql> select * from minervadb.example;
+---------+-----------+
| node_id | node_name |
+---------+-----------+
|       1 | MinervaDB |
+---------+-----------+
1 row in set (0.00 sec)

 

The post Installation and configuration of Percona XtraDB Cluster on CentOS 7.3 appeared first on MySQL Consulting, Support and Remote DBA Services.

MySQL InnoDB Cluster: upgrade from 8.0.11 to 8.0.12

In April I already posted an article on how to safely upgrade your MySQL InnoDB Cluster; let’s review this procedure now that MySQL 8.0.12 is out.

To upgrade all the members of a MySQL InnoDB Cluster (Group), you need to keep in mind the following points:

  • upgrade all the nodes one by one
  • always end with the Primary Master in case of Single Primary Mode
  • after upgrading the binaries, don’t forget to start MySQL without starting Group Replication (group_replication_start_on_boot=0)
  • run mysql_upgrade

Let’s see this in action on the video below:

As you could see, this is quick and easy.

Summary

This is an overview of the operations to run on each node, one by one:

mysql> set persist group_replication_start_on_boot=0;
# systemctl stop mysqld
# yum update mysql-community-server mysql-shell
# systemctl start mysqld
# mysql_upgrade
mysql> set persist group_replication_start_on_boot=1;
mysql> restart;
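
Between nodes it is worth confirming that the member you just upgraded is back ONLINE before touching the next one. A minimal sketch of such a check, assuming MySQL Connector/Python and placeholder connection details:

import mysql.connector

# Placeholder connection details -- point this at any member of the group.
db = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
cursor = db.cursor()

cursor.execute(
    "SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE "
    "FROM performance_schema.replication_group_members"
)
for host, port, state in cursor:
    print("{}:{} -> {}".format(host, port, state))
# Proceed with the next node only when every member reports ONLINE.

cursor.close()
db.close()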

Enjoy MySQL InnoDB Cluster, good migration to 8.0.12, and don’t forget to register for Oracle Open World if you want to learn more about MySQL 8.0 and InnoDB Cluster!

Installation and configuration of Percona XtraDB Cluster on CentOS 7.3

This blog will show how to install the Percona XtraDB Cluster on three CentOS 7.3 servers, using the packages from Percona repositories. This is a step-by-step installation and configuration blog, We recommend Percona XtraDB Cluster for maximum availability / reliability and scale-out READ/WRITE optimally. We are an private-label independent and vendor neutral consulting, support, managed services and education solutions provider for MySQL, MariaDB, Percona Server and ClickHouse with core expertise in performance, scalability, high availability and database reliability engineering. All our blog posts are purely focussed on education and research across open source database systems infrastructure operations. To engage us for building and managing web-scale database infrastructure operations, Please contact us on contact@minervadb.com  

This cluster will be assembled of three servers/nodes:

node #1 hostname: PXC1 IP: 138.197.70.35 node #2 hostname: PXC2 IP: 159.203.118.230 node #3 hostname: PXC3 IP: 138.197.8.226

Prerequisites

  • All three nodes have a CentOS 7.3 installation.
  • Firewall has been set up to allow connecting to ports 3306, 4444, 4567 and 4568
  • SELinux is disabled
Installing from Percona Repository on 138.197.70.35
  • Install the Percona repository package:

$ sudo yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm

  • You should see the following if successful:

Installed: percona-release.noarch 0:0.1-4 Complete!

  • Check that the packages are available:

$ sudo yum list | grep Percona-XtraDB-Cluster-57 Percona-XtraDB-Cluster-57.x86_64 5.7.14-26.17.1.el7 percona-release-x86_64 Percona-XtraDB-Cluster-57-debuginfo.x86_64 5.7.14-26.17.1.el7 percona-release-x86_64

  • Install the Percona XtraDB Cluster packages:

$ sudo yum install Percona-XtraDB-Cluster-57

  • Start the Percona XtraDB Cluster server:

$ sudo service mysql start

  • Copy the automatically generated temporary password for the superuser account:

$ sudo grep 'temporary password' /var/log/mysqld.log

  • Use this password to login as root:

$ mysql -u root -p

  • Change the password for the superuser account and log out. For example:

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'root'; Query OK, 0 rows affected (0.00 sec) mysql> exit Bye

  • Stop the mysql service:

$ sudo service mysql stop

Repeat the same Percona XtraDB Cluster installation process for 159.203.118.230 and 138.197.8.226 Configuring nodes

We have to configure separately the nodes 138.197.70.35, 159.203.118.230 and 138.197.8.226 for successfully implementing an fully operational Percona XtraDB Cluster ecosystem.

Configuring the node 138.197.70.35

Configuration file /etc/my.cnf for the first node should look like:

[mysqld] datadir=/var/lib/mysql user=mysql # Path to Galera library wsrep_provider=/usr/lib64/libgalera_smm.so # Cluster connection URL contains the IPs of node#1, node#2 and node#3 wsrep_cluster_address=gcomm://138.197.70.35,159.203.118.230,138.197.8.226 # In order for Galera to work correctly binlog format should be ROW binlog_format=ROW # MyISAM storage engine has only experimental support default_storage_engine=InnoDB # This changes how InnoDB autoincrement locks are managed and is a requirement for Galera innodb_autoinc_lock_mode=2 # Node #1 address wsrep_node_address=138.197.70.35 # SST method wsrep_sst_method=xtrabackup-v2 # Cluster name wsrep_cluster_name=pxc_cluster # Authentication for SST method wsrep_sst_auth="sstuser:sstuser"

The first node can be started with the following command:

# /etc/init.d/mysql bootstrap-pxc

We are using CentOS 7.3 so systemd bootstrap service should be used:

# systemctl start mysql@bootstrap.service

This command will start the cluster with initial wsrep_cluster_address set to gcomm://. This way the cluster will be bootstrapped and in case the node or MySQL have to be restarted later, there would be no need to change the configuration file.

After the first node has been started, cluster status can be checked by:

mysql> show status like 'wsrep%'; +------------------------------+------------------------------------------------------------+ | Variable_name | Value | +------------------------------+------------------------------------------------------------+ | wsrep_local_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 | | wsrep_protocol_version | 7 | | wsrep_last_committed | 8 | | wsrep_replicated | 4 | | wsrep_replicated_bytes | 906 | | wsrep_repl_keys | 4 | | wsrep_repl_keys_bytes | 124 | | wsrep_repl_data_bytes | 526 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 9 | | wsrep_received_bytes | 1181 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 2 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0.111111 | | wsrep_local_cached_downto | 3 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_interval | [ 28, 28 ] | | wsrep_cert_deps_distance | 1.000000 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 2 | | wsrep_cert_bucket_count | 22 | | wsrep_gcache_pool_size | 3128 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0.000000 | | wsrep_incoming_addresses | 159.203.118.230:3306,138.197.8.226:3306,138.197.70.35:3306 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | b79d90df-1077-11e7-9922-3a1b217f7371 | | wsrep_cluster_conf_id | 3 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 2 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 3.20(r7e383f7) | | wsrep_ready | ON | +------------------------------+------------------------------------------------------------+ 60 rows in set (0.01 sec)

This output above shows that the cluster has been successfully bootstrapped.

In order to perform successful State Snapshot Transfer using XtraBackup new user needs to be set up with proper privileges:

mysql@PXC1> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'sstuser'; mysql@PXC1> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost'; mysql@PXC1> FLUSH PRIVILEGES;

Configuration file /etc/my.cnf on the second node (PXC2) should look like this:

[mysqld] datadir=/var/lib/mysql user=mysql # Path to Galera library wsrep_provider=/usr/lib64/libgalera_smm.so # Cluster connection URL contains the IPs of node#1, node#2 and node#3 wsrep_cluster_address=gcomm://138.197.70.35,159.203.118.230,138.197.8.226 # In order for Galera to work correctly binlog format should be ROW binlog_format=ROW # MyISAM storage engine has only experimental support default_storage_engine=InnoDB # This changes how InnoDB autoincrement locks are managed and is a requirement for Galera innodb_autoinc_lock_mode=2 # Node #2 address wsrep_node_address=159.203.118.230 # SST method wsrep_sst_method=xtrabackup-v2 # Cluster name wsrep_cluster_name=pxc_cluster # Authentication for SST method wsrep_sst_auth="sstuser:sstuser"

The second node can be started with the following command:

# systemctl start mysql

The cluster status can now be checked on both nodes. This is the output from the second node (PXC2):

mysql> show status like 'wsrep%'; +------------------------------+------------------------------------------------------------+ | Variable_name | Value | +------------------------------+------------------------------------------------------------+ | wsrep_local_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 | | wsrep_protocol_version | 7 | | wsrep_last_committed | 8 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 10 | | wsrep_received_bytes | 1238 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_cached_downto | 6 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_interval | [ 28, 28 ] | | wsrep_cert_deps_distance | 1.000000 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 2 | | wsrep_cert_bucket_count | 22 | | wsrep_gcache_pool_size | 2300 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0.000000 | | wsrep_incoming_addresses | 159.203.118.230:3306,138.197.8.226:3306,138.197.70.35:3306 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 248e2782-1078-11e7-a269-4a3ec033a606 | | wsrep_cluster_conf_id | 3 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 0 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 3.20(r7e383f7) | | wsrep_ready | ON | +------------------------------+------------------------------------------------------------+ 60 rows in set (0.00 sec)

This output shows that the new node has been successfully added to the cluster.

The MySQL configuration file /etc/my.cnf on the third node (PXC3) should look like this:

[mysqld]
datadir=/var/lib/mysql
user=mysql
# Path to Galera library
wsrep_provider=/usr/lib64/libgalera_smm.so
# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://138.197.70.35,159.203.118.230,138.197.8.226
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node #3 address
wsrep_node_address=138.197.8.226
# SST method
wsrep_sst_method=xtrabackup-v2
# Cluster name
wsrep_cluster_name=pxc_cluster
# Authentication for SST method
wsrep_sst_auth="sstuser:sstuser"

The third node can now be started with the following command:

# systemctl start mysql

Percona XtraDB Cluster status can now be checked from the third node (PXC3):

mysql> show status like 'wsrep%'; +------------------------------+------------------------------------------------------------+ | Variable_name | Value | +------------------------------+------------------------------------------------------------+ | wsrep_local_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 | | wsrep_protocol_version | 7 | | wsrep_last_committed | 8 | | wsrep_replicated | 2 | | wsrep_replicated_bytes | 396 | | wsrep_repl_keys | 2 | | wsrep_repl_keys_bytes | 62 | | wsrep_repl_data_bytes | 206 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 4 | | wsrep_received_bytes | 529 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_cached_downto | 6 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_interval | [ 28, 28 ] | | wsrep_cert_deps_distance | 1.000000 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 2 | | wsrep_cert_bucket_count | 22 | | wsrep_gcache_pool_size | 2166 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0.000000 | | wsrep_incoming_addresses | 159.203.118.230:3306,138.197.8.226:3306,138.197.70.35:3306 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 3f51b20e-1078-11e7-8405-8e9b37a37cb1 | | wsrep_cluster_conf_id | 3 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 5ea977b8-0fc0-11e7-8f73-26f60f083bd5 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 1 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 3.20(r7e383f7) | | wsrep_ready | ON | +------------------------------+------------------------------------------------------------+ 60 rows in set (0.03 sec)

This output confirms that the third node has joined the cluster.
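Rather than scanning the full output every time, you can ask for just the handful of wsrep variables that matter for cluster health; for example, on any node:

# Quick health check: cluster size, cluster state and readiness of the local node
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
          ('wsrep_cluster_size','wsrep_cluster_status',
           'wsrep_local_state_comment','wsrep_ready','wsrep_connected');"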

Testing Replication

Creating the new database on the PXC1 node:

mysql> create database minervadb;
Query OK, 1 row affected (0.01 sec)

Creating the example table on the PXC2 node:

mysql> use minervadb;
Database changed
mysql> CREATE TABLE example (node_id INT PRIMARY KEY, node_name VARCHAR(30));
Query OK, 0 rows affected (0.01 sec)

Inserting records on the PXC3 node:

mysql> INSERT INTO minervadb.example VALUES (1, 'MinervaDB');
Query OK, 1 row affected (0.07 sec)

Retrieving all the rows from that table on the PXC1 node:

mysql> select * from minervadb.example;
+---------+-----------+
| node_id | node_name |
+---------+-----------+
|       1 | MinervaDB |
+---------+-----------+
1 row in set (0.00 sec)
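To confirm the row is visible everywhere, the same query can be run against each node in a loop (this assumes your MySQL user is allowed to connect remotely to every node; the IPs are the ones from the configuration files above):

for node in 138.197.70.35 159.203.118.230 138.197.8.226; do
    echo "== ${node} =="
    mysql -h "$node" -u root -p -e "SELECT * FROM minervadb.example;"
done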

 

The post Installation and configuration of Percona XtraDB Cluster on CentOS 7.3 appeared first on MySQL Consulting, Support and Remote DBA Services.

Scaling IO-Bound Workloads for MySQL in the Cloud

Is increasing GP2 volume size or increasing IOPS for IO1 volumes a valid method for scaling IO-bound workloads? In this post I’ll focus on one question: how much can we improve performance if we use faster cloud volumes? This post is a continuation of previous cloud research posts:

To recap, in Amazon EC2 we can use gp2 and io1 volumes. gp2 performance scales with size, i.e., for a gp2 volume of 500GB we get 1500 iops; for 1000GB, 3000 iops; and for 3334GB, 10000 iops (the maximum possible value). For io1 volumes we can “buy” throughput up to 30000 iops.
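In other words, gp2 provisions a baseline of roughly 3 IOPS per GB, capped at 10000 IOPS. A tiny helper reproduces the numbers above (it deliberately ignores the 100 IOPS floor and burst credits):

# Approximate gp2 baseline IOPS: 3 IOPS per GB, capped at 10000
gp2_iops() {
    local size_gb=$1
    local iops=$(( size_gb * 3 ))
    (( iops > 10000 )) && iops=10000
    echo "$iops"
}

gp2_iops 500    # 1500
gp2_iops 1000   # 3000
gp2_iops 3334   # 10000 (cap)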

So I wanted to check how both InnoDB and RocksDB storage engines perform on these volumes with different throughput.

Benchmark Scenario

I will use the same data size that I used in Saving With MyRocks in The Cloud, that is sysbench-tpcc with 50 tables, 100W (100 warehouses) each: about 500GB of data in InnoDB and 100GB in RocksDB (compressed with LZ4).
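For reference, a sysbench-tpcc dataset like that is prepared and run roughly as follows (host, credentials, thread counts and run time are illustrative, not the exact values used for these benchmarks):

# Prepare 50 tables with a scale of 100 warehouses each
./tpcc.lua --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
           --mysql-db=sbtest --db-driver=mysql \
           --tables=50 --scale=100 --threads=32 prepare

# Run the workload, reporting throughput every second
./tpcc.lua --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
           --mysql-db=sbtest --db-driver=mysql \
           --tables=50 --scale=100 --threads=32 --time=3600 --report-interval=1 run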

Volume settings: gp2 volumes from 500GB (1000GB for InnoDB) to 3400GB in 100GB increments (each increment raises throughput by 300 iops); io1 volumes: 1TB in size, iops from 1000 to 30000 in 1000 iops increments.
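On EC2 such a sweep can be scripted with the AWS CLI by modifying the volume between runs; a rough sketch, where the volume ID and the benchmark wrapper script are placeholders and not the exact tooling used here:

VOL_ID="vol-0123456789abcdef0"      # placeholder volume ID

for size in $(seq 500 100 3400); do
    aws ec2 modify-volume --volume-id "$VOL_ID" --size "$size"
    # Volume modifications are asynchronous: poll until the state leaves "modifying"
    while [ "$(aws ec2 describe-volumes-modifications --volume-ids "$VOL_ID" \
               --query 'VolumesModifications[0].ModificationState' --output text)" = "modifying" ]; do
        sleep 60
    done
    ./run_benchmark.sh "gp2-${size}GB"   # placeholder for the actual sysbench-tpcc run
done

# For io1, the provisioned IOPS are changed instead, e.g.:
# aws ec2 modify-volume --volume-id "$VOL_ID" --volume-type io1 --iops 5000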

Let’s take a look at the results. I will use a slightly different format than usual, but hopefully it represents the results better. You will see density throughout the plots: a higher and narrower chart represents less variance in the throughput. Each plot shows the distribution of the throughput.

Results on GP2 volumes:

It’s quite interesting to see how the result scales with better IO throughput. InnoDB does not improve its throughput after a gp2 size of 2600GB, while MyRocks continues to scale linearly. The problem with MyRocks is that there is a lot of variance in throughput (I will show a one-second resolution chart below).

Results on IO1 volumes

Here MyRocks again shows impressive growth as we add more IO capacity, but it also shows a lot of variance on high-capacity volumes.

Let’s compare how engines perform with one second resolution. GP2 volume, 3400GB:

IO1 volume, 30000 iops:

So for MyRocks there seems to be periodic background activity, which does not allow it to achieve a stable throughput.

Raw results, if you’d like to review them, can be found here: https://github.com/Percona-Lab-results/201808-rocksdb-cloudio

Conclusions

If you are looking to improve throughput in IO-bound workloads, either increasing GP2 volume size or increasing IOPS for IO1 volumes is a valid method, especially for the MyRocks engine.

The post Scaling IO-Bound Workloads for MySQL in the Cloud appeared first on Percona Database Performance Blog.

MySQL Shell 8.0.12 – What’s New?

Pluggable Password Store

This feature, which is enabled by default, allows the Shell to use an external component, the “Secret Store”, to persist session passwords in a secure way. Whenever a new session is created while working with the Shell in interactive mode, the option to persist the password for that session will be made available.…
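For example, connecting from the command line now offers to save the password in the secret store, and stored credentials can be listed from the Shell itself. A small sketch, where the host name is a placeholder and the prompt wording may differ between versions and platforms:

# Connect; after entering the password you are asked whether to save it
mysqlsh user@db.example.com:3306 --sql

# List the URIs whose passwords are currently kept in the secret store
mysqlsh --js -e 'print(shell.listCredentials())'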

Getting Help in MySQL Shell

The MySQL Shell has a collection of commands and APIs that allow you to perform a variety of tasks on MySQL instances, interactively or through scripts.

Although there’s documentation for the JavaScript and Python APIs as well as the User Guide, it comes in handy to be able to get help on specific elements of the application while working with it; this is what the centralized help system does.…
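In practice this means you can ask the shell itself for help; for example:

# Start the shell and use the built-in help commands at the prompt:
mysqlsh
#   \?        overview of the centralized help system
#   \? dba    help on the AdminAPI "dba" global object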

SQLyog MySQL GUI 13.1.1 Released

This release fixes a rare crash as well as a regression bug introduced in 13.1.0.

Bug Fixes:

  • Fixed a regression bug introduced in 13.1.0 where an error was returned when adding or editing data in the ‘Result’ tab; the same steps also sometimes caused SQLyog to crash. This is fixed now.

 

The post SQLyog MySQL GUI 13.1.1 Released appeared first on SQLyog Blog.

Scaling from a Dark Ages of Big Data to Brighter Today

This topic has been up in the air for a long time. What would be the best approach: fewer large boxes, or more smaller ones for the same price? What are the pros and cons of each approach for building the database layer infrastructure? What is the best way to spread write traffic, and so on? Let’s think out loud.

First things first, let's talk about having a single box versus a few. This is really simple. Having one box for all reads and writes is the simplest architecture possible, but like every simplification it causes some inconveniences. Like outages. Nothing lasts forever, and neither will a single box or even a cloud instance (they still run on bare metal somewhere, right?), which means it will become unavailable at some point. If you have just a single box serving the traffic, you will have to wait for recovery, or have fun restoring production from a backup with downtime and most likely data loss. Not really exciting.

Here comes the second box, and with it MySQL replication, which raises more questions than answers. Since we’re investing money in the second box, should it serve live traffic or just be a small backup replica? Well, if the master box dies and the second one is not powerful enough to serve all the traffic, we can still call it an outage. Not as severe as with a single box, since we will likely not lose any data, but the service is still interrupted, which will not make the business happy.

What would be the answer? The answer is to have enough capacity in the cluster. If you use two boxes, you should have 50% spare capacity, which means one completely idle box, so that if the first one dies the second can handle the full production traffic. It looks like we’ve just spent 50% of the infrastructure budget for almost nothing. We can still use the stand-by node for backups and to serve some ad-hoc queries, but it still doesn’t look efficient, and it is still vulnerable to a double fault.

Simple math shows us that in the case of 3 boxes we only need 1/3, or 33.3%, spare capacity, and for four boxes this number is down to 25%, which looks much better than half. If you have 10 boxes and one of them dies, overall capacity will only drop by 10%, so you may not even notice until you get an alert. So what would be the best number? Well, it depends on your MySQL workload types, application requirements, and the environment you’re in. A cloud environment allows you to fire up instances quickly, so you only have to wait for the data to be copied to disk from the latest backup, which usually brings the turnaround down to less than 24 hours. For a bare-metal installation, you have to consider the delivery time for new boxes, so you need to ensure you have enough spare capacity to wait for new boxes to arrive, or the ability to route traffic to another data center.
In my experience, I recommend starting with at least three boxes:

  • Active master (write traffic)
  • Stand-by master (read traffic)
  • Backup instance

Having the master serve write traffic and replication-sensitive queries lets you use the stand-by master for read traffic and keep it warm in case of an outage and automatic failover (which can be done using MHA, Orchestrator, etc.). The backup instance is usually used as a delayed replica to prevent data loss in case of human mistake, application bug, and so on, and can be scaled down in a cloud environment to keep overall ownership costs low. Three nodes is also the minimal recommended number of nodes for multi-writable MySQL-based technologies like MySQL InnoDB Cluster, MySQL Group Replication, Galera Cluster and XtraDB Cluster.
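MySQL supports delayed replication natively, which is what makes the backup instance useful as a delayed replica; a minimal sketch, run on the backup replica, delaying it by one hour:

mysql -e "STOP SLAVE; CHANGE MASTER TO MASTER_DELAY = 3600; START SLAVE;"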

The next question is how to route traffic from the application to the database cluster. There are several ways to do that. One approach is to make the application aware of several hosts and have it route traffic to different MySQL servers, which includes doing the read-write split on the application side. The issue with this approach is that whenever a server is unavailable or overloaded, the application instance will be unable to keep working properly unless fairly complicated logic is implemented.

Another approach is to use third-party software to route traffic dynamically based on the current cluster status, load, and roles. There is a number of such tools available on the market; you can find most of them, including benchmarks, here. Using third-party tools gives you the ability to keep your application infrastructure-unaware, avoiding complicated non-user-oriented logic inside the application, and, more importantly, lets you change query routing dynamically without an application deployment.

Speaking of routers, there are some things that make ProxySQL a great choice: it supports query-level logic and is able to understand the query itself, the user, the schema, and other details about the traffic. This allows you to utilize many ProxySQL features like query-based routing, load balancing, failover support, and more. We usually recommend setting up ProxySQL instances on the application boxes for availability reasons. A typical architecture also includes a monitoring host with graphing and failover tools; you can find it in the infrastructure diagram below.
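As an illustration of that query-level logic, a basic read/write split in ProxySQL can be configured through its admin interface roughly like this (the hostgroup numbers 10/20, the server IPs and the default admin credentials are placeholders for the example):

mysql -h 127.0.0.1 -P 6032 -u admin -padmin <<'SQL'
-- writer (hostgroup 10) and reader (hostgroup 20)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.0.0.1', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.0.2', 3306);

-- keep SELECT ... FOR UPDATE on the writer, route plain SELECTs to the reader
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
  VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
  VALUES (2, 1, '^SELECT', 20, 1);

LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
SQL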

If you have any questions, please do not hesitate to contact us. Our performance and scalability experts will help you analyze your infrastructure and build a fast and reliable architecture. We also offer long-term support and consulting for ProxySQL users.


Authored by: Vlad Fedorkov

Exporting significant SQL reports with ActiveRecord

A few months ago we faced a memory issue in some of our background jobs. Heroku was killing our dyno because it was exceeding its allowed memory. Thanks to our instrumentation of Sidekiq, it was easy to spot the culprit: the job was doing a fairly complex SQL request and outputting the query’s result into a CSV file before archiving that file.

In this article, I’ll explain what happened and detail the method we used to solve the problem. I had never seen or used this technique before, so I thought it would be nice to share it.

More context

We run a tiny framework, something more like a convention, to run SQL queries and archive the results. If I remove the noise of the framework, we had code like this:

rows = with_replica_database do
  ActiveRecord::Base.connection.select_rows(query)
end

CSV.generate do |csv|
  csv << header
  rows.each { |row| csv << row }
end

In this simplified example, there are:

  • with_replica_database: a helper that runs a piece of code against a replica database,
  • query: our SQL query, as a String, and
  • header: a placeholder for the Array of our columns names.

We used select_rows as the results of the query didn’t really match any of our models. It is a reporting query that does too many joins, group bys, and subqueries. The query takes dozens of minutes to run. We could, and probably should, integrate that into our ETL, but that’s not the point…

The resulting CSV file wasn’t that big, maybe a hundred megabytes.

The issue

The memory consumption came from the many rows returned by the select_rows method. Each row is an array containing many entries, as our CSV has many columns. Each entry could be a complex datatype converted by ActiveRecord into even more complex Ruby objects: we had many instances of Time with their TimeZone, BigDecimal, …

Since the query returns millions of rows, even with linear complexity the memory consumption is too high.

An impossible approach

At first I thought about paginating the results, much in the same way that find_each works. The problem was that for 10000 rows, if I paginated by 1000, it would take 10 times the time of the same request without pagination.

Our query looked like this:

SELECT t.a, u.b, SUM(v.c) as c
FROM t
JOIN u ON u.id = t.u_id
JOIN v ON v.id = u.v_id
GROUP BY t.a, u.b

Just imagine t, u, v being subqueries with unions, OR conditions, other GROUP BYs, and more poorly performing stuff. The sad part is the GROUP BY, which requires the engine to go through all the results in order to group rows correctly. Using pagination on this would be something like:

SELECT t.a, u.b, SUM(v.c) as c
FROM t
JOIN u ON u.id = t.u_id
JOIN v ON v.id = u.v_id
GROUP BY 1, 2
ORDER BY 1, 2
LIMIT 10000 OFFSET 1000000

So the fewer entries on a page, the less memory used on the client side but the more time spent in the database, because more requests will be made. The more entries on a page, the more memory used on the client side but the less time spent in the database, because fewer requests will be made.

In the end, this approach wouldn’t have been future-proof.

Focusing more on the problem

It was easy to try to find solutions to the “results do not fit in memory” problem because it is a known one. It is common with Rails that long lists and association preloading will cause memory issues. The usual quick fix is to use the find_each or in_batches methods.

I realized that I didn’t actually need to load everything into memory: I’m only interested in getting one line at a time in order to write it into the CSV and then forget about it, thanks to the garbage collector.

Solving the right problem

After acknowledging what the true issue was, it was possible to find something more efficient: streaming APIs.

CSV.generate do |csv|
  csv << header
  with_replica_database do
    mysql2 = ActiveRecord::Base.connection.instance_variable_get(:@connection)
    rows = mysql2.query(query, stream: true, cache_rows: false)
    rows.each { |row| csv << row }
  end
end

The idea was to bypass ActiveRecord and use the underlying MySQL client, which provides the stream option. I’m sure there are similar options for other databases.

With that implementation, we only do one request, so no pagination, but we won’t have all the results in memory. We never needed to have all those results in memory in the first place anyway.

Conclusion

I would be very interested to use this feature with ActiveRecord’s ability to return models rather than rows. Maybe it is already possible but I didn’t find it. If you have any further information on the subject, please let me know!

I hope you won’t have to use these lower level APIs. But, if you do encounter the same kind of memory issues, don’t throw money at it right away. Try this first ^^

And obviously, most of this could be avoided by tweaking the layout of data and their relations. In our case, denormalization could make this easier but we’re not ready to pay that cost - yet.

Replication from External Primary/Leader into GCP

This is a post based on recent tutorials I published, with the goal of discussing how to prepare your current MySQL instance to be configured as an External Primary Server with a Replica/Follower into Google Cloud Platform.

First, I want to talk about the jargon used here. I will be using primary to represent the external “master” server, and replica to represent the “slave” server. Personally, I prefer the terms leader/follower, but primary/replica currently seems to be more common in the industry. At some point the word slave will be used, but only because it is the keyword embedded in the server to represent a replica.

The steps given here are in the context of a VM running a one-click install of WordPress acquired through the Google Marketplace (formerly known as Launcher).

To help prepare for replication you need to configure your primary to meet some requirements.

  1. server-id must be configured; binary logging needs to be enabled; and GTID needs to be enabled and enforced. Tutorial.
  2. A replication user must exist on the primary (remember that you may need root to create it)
  3. A dump file must be generated using the mysqldump command, with the replication metadata (binary log position and GTID information) included in it

The steps above are also necessary if you are migrating from another cloud or on-prem.
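As a sketch, requirements 1 and 2 above might look like this on the primary (the user name and password are placeholders, and the configuration file path depends on your distribution):

# [mysqld] section of my.cnf (a restart is required if these were not already set):
#   server-id                = 1
#   log-bin                  = mysql-bin
#   gtid_mode                = ON
#   enforce_gtid_consistency = ON

# Replication user for the replica to connect with:
mysql -u root -p -e "
  CREATE USER 'replication_user'@'%' IDENTIFIED BY 'a-strong-password';
  GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'%';
  FLUSH PRIVILEGES;"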

Why split the application and database and use a service like Cloud SQL?

First, you will be able to use your application server to do what it was mainly designed for: serve requests of your WordPress application (and it doesn’t much matter for the purposes of this post if you are using nginx or Apache).

Databases are heavy; their deadly sin is gluttony, and they tend to occupy as much memory as they can to make lookups fairly fast. Once you are faced with this reality, sharing resources with your application is not a good idea.

Next, you may say: I could use Kubernetes! Yes, you could, but just because you can do something doesn’t mean you should. Configuring stateful applications inside Kubernetes is a challenge, and the fact that pods can be killed at any moment may pose a threat to your data consistency if it happens mid transaction. There are solutions on the market that use MySQL on top of Kubernetes, but that would be a totally different discussion.

You also don’t need to use Cloud SQL: you can set up your database replicas, or even the primary, on another VM (which still wins compared with putting the database and application together), but in this scenario you are perpetually at risk of hitting the limits of your finite hardware capabilities.

Finally, Cloud SQL has 99.95% availability and is curated by Google’s SRE team. That means you can focus your efforts on what really matters, developing your application, and not spend hours or even days setting up servers. Other convenient features include PITR (Point in Time Recovery) and High Availability in case a failover is necessary.

Setting up the replica on GCP

Accessing the SQL menu in your Google Cloud Console will give you a listing of your current Cloud SQL instances. From there, execute the following steps:

  1. Click on the Migrate Data button
  2. Once you have familiarized yourself with the steps shown on the screen, click on Begin Migration
  3. In the Data source details section, fill out the form as follows:
    1. Name of data source: Any valid name for a Cloud SQL instance that will represent the primary server name
    2. Public IP address of source: The IP address of the primary
    3. Port number of source: The port number for the primary, usually 3306
    4. MySQL replication username: The username associated with the replication permissions on the primary
    5. MySQL replication password: The password for the replication username
    6. Database version: Choose between MySQL 5.6 and MySQL 5.7. If you are not sure which version you are running, execute SELECT @@version; in your primary server and you will have the answer.
    7. (Optional) Enable SSL/TLS certification: Upload or enter the Source CA Certificate
  4. Click on Next

The next section, Cloud SQL read replica creation, will allow you to choose:

  1. Read replica instance ID: Any valid name for a Cloud SQL instance that will represent the replica server name
  2. Location: choose the Region and then the Zone for which your instance will be provisioned.
  3. Machine Type: Choose a Machine Type for your replica; this can be modified later! In some cases it is recommended to choose a higher instance configuration than the one you will keep after the replication synchronization finishes
  4. Storage type: Choice between SSD and HDD. For higher performance choose SSD
  5. Storage capacity: It can be from 10GB up to 10TB. The checkbox for Enable automatic storage increases means whenever you’re near capacity, space will be incrementally increased. All increases are permanent
  6. SQL Dump File: Dump generated containing binary logging position and GTID information.
  7. (Optional) More options can be configured by clicking on Show advanced options like Authorized networks, Database flags, and Labels.
  8. Once you’ve filled out this information, click on Create.

The following section, Data synchronization, will display the previously selected options as well as the Outgoing IP Address, which must be added to your proxy, firewall, or white-list so that the replica is able to connect and fetch replication data. Once you are sure your primary can be accessed using the specified credentials and the IP has been white-listed, you can click on Next. After that, replication will start.

Live demo

If you want to see this feature in action, please check this video from Google Cloud Next 2018:

Extend Metrics for Percona Monitoring and Management Without Modifying Code

Percona Monitoring and Management (PMM) provides an excellent solution for system monitoring. Sometimes, though, you’ll have the need for a metric that’s not present in the list of node_exporter metrics out of the box. In this post, we introduce a simple method and show how to extend the list of available metrics without modifying the node_exporter code. It’s based on the textfile collector.

Enable the textfile collector in pmm-client

This collector is not enabled by default in the latest version of pmm-client. So, first let’s enable the textfile collector.

# pmm-admin rm linux:metrics
OK, removed system pmm-client-hostname from monitoring.

# pmm-admin add linux:metrics -- --collectors.enabled=diskstats,filefd,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat,textfile --collector.textfile.directory="/tmp"
OK, now monitoring this system.

# pmm-admin ls
pmm-admin 1.13.0

PMM Server      | 10.178.1.252
Client Name     | pmm-client-hostname
Client Address  | 10.178.1.252
Service Manager | linux-upstart

-------------- -------------------- ----------- -------- ------------ --------
SERVICE TYPE   NAME                 LOCAL PORT  RUNNING  DATA SOURCE  OPTIONS
-------------- -------------------- ----------- -------- ------------ --------
linux:metrics  pmm-client-hostname  42000       YES      -

Notice that the whole list of default collectors has to be re-enabled. Also, don't forget to specify the directory for reading files with the metrics (--collector.textfile.directory="/tmp"). The exporter reads files with the .prom extension.
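A .prom file simply contains one metric per line in the Prometheus text format; for example, the following creates a metric that the exporter will pick up on its next scrape (the metric name is made up for illustration):

# Any file ending in .prom inside --collector.textfile.directory is read
echo 'node_custom_example_metric 42' > /tmp/custom_example.prom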

Add a crontab task

The second step is to add a crontab task to collect metrics and place them into a file.

Here are the cron commands for collecting the total number of Docker containers and the number of running containers.

*/1 * * * * root echo -n "" > /tmp/docker_all.prom; /usr/bin/docker ps -a | sed -n '1!p' | /usr/bin/wc -l | sed -ne 's/^/node_docker_containers_total /p' >> /tmp/docker_all.prom;
*/1 * * * * root echo -n "" > /tmp/docker_running.prom; /usr/bin/docker ps | sed -n '1!p' | /usr/bin/wc -l | sed -ne 's/^/node_docker_containers_running_total /p' >> /tmp/docker_running.prom;

The results of the commands are placed into the files /tmp/docker_all.prom and /tmp/docker_running.prom, which are read by the exporter.

Adding the crontab tasks by using a script

Also, we have a few bash scripts that make it much easier to add crontab tasks.

The first one allows you to collect the logged-in users and the size of the InnoDB data files.

You may use the suggested names of files and metrics or set new ones.

The second script is more universal: it allows us to get the size of any directories or files. This script can be placed directly into a crontab task; you just need to specify the list of monitored objects (e.g. /var/log /var/cache/apt /var/lib/mysql/ibdata1):

echo "*/5 * * * * root bash /root/object_sizes.sh /var/log /var/cache/apt /var/lib/mysql/ibdata1" > /etc/cron.d/object_size

So, I hope this has provided useful insight into how to set up the collection of new PMM metrics without the need to write code. Please feel free to use the scripts or configure commands similar to the ones provided above.

More resources you might enjoy

If you are new to PMM, there is a great demo site of the latest version, showing you those out of the box metrics. Or how about our free webinar on monitoring Amazon RDS with PMM?

The post Extend Metrics for Percona Monitoring and Management Without Modifying Code appeared first on Percona Database Performance Blog.

Webinar Wed 8/29: Databases in the Hosted Cloud

Please join Percona’s Chief Evangelist, Colin Charles on Wednesday, August 29th, 2018, as he presents Databases in the Hosted Cloud at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

Register Now

 

Nearly everyone today uses some form of database in the hosted cloud. You can use hosted MySQL, MariaDB, Percona Server, and PostgreSQL in several cloud providers as a database as a service (DBaaS).

In this webinar, Colin Charles explores how to efficiently deploy a cloud database configured for optimal performance, with a particular focus on MySQL.

You’ll learn the differences between the various public cloud offerings, including Amazon RDS and Aurora, Google Cloud SQL, Rackspace OpenStack DBaaS, Microsoft Azure, and Alibaba Cloud, as well as the access methods and the level of control you have. Hosting in the cloud can be a challenge, but after today’s webinar we’ll make sure you walk away with a better understanding of how you can leverage the cloud for your business needs.

Topics include:

  • Backup strategies
  • Planning multiple data centers for availability
  • Where to host your application
  • How to get the most performance out of the solution
  • Cost
  • Monitoring
  • Moving from one DBaaS to another
  • Moving from a DBaaS to your own hosted platform

Register Now.

The post Webinar Wed 8/29: Databases in the Hosted Cloud appeared first on Percona Database Performance Blog.

Generating a mysqldump to import into Google Cloud SQL

This tutorial is for you if you are trying to import your current database into a Google Cloud SQL instance that will be set up as a replica for replication purposes.

According to the documentation, you will need to run:

mysqldump \
    -h [MASTER_IP] -P [MASTER_PORT] -u [USERNAME] -p \
    --databases [DBS] \
    --hex-blob --skip-triggers --master-data=1 \
    --order-by-primary --compact --no-autocommit \
    --default-character-set=utf8 --ignore-table [VIEW] \
    --single-transaction --set-gtid-purged=on | gzip | \
    gsutil cp - gs://[BUCKET]/[PATH_TO_DUMP]

The mysqldump parameters are:

  • -h the hostname or IPv4 address of the primary should replace [MASTER_IP]
  • -P the port of the primary server; usually the [MASTER_PORT] value will be 3306
  • -u takes the username passed on [USERNAME]
  • -p informs that a password will be given
  • --databases the list of databases to be imported (mysqldump expects the names separated by spaces). Keep in mind [DBS] should not include the sys, performance_schema, information_schema, and mysql schemas
  • --hex-blob necessary for dumping binary columns whose types could be BINARY, BLOB and others
  • --skip-triggers recommended for the initial load; you can import the triggers at a later moment
  • --master-data according to the documentation: “It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped server”
  • --order-by-primary it dumps the data in the primary key order
  • --compact produces a more compact output, enabling several flags for the dump
  • --no-autocommit encloses each table dump between SET autocommit=0 and COMMIT statements
  • --default-character-set informs the default character set
  • --ignore-table must list the VIEW to be ignored on import; for multiple views, use this option multiple times. Views can be imported later on, after the promotion of the replica is done
  • --single-transaction a START TRANSACTION is sent to the database so the dump will contain the data up to that point in time
  • --set-gtid-purged writes the state of the GTID information into the dump file and disables binary logging when the dump is loaded into the replica

After that, the result is compressed into a GZIP file and uploaded to a bucket on Google Cloud Storage with gsutil cp - gs://[BUCKET]/[PATH_TO_DUMP], where [BUCKET] is the bucket you created on GCS and [PATH_TO_DUMP] is the desired path for the file.

Be aware that no DDL operations should be performed on the database while the dump is being generated, otherwise you might find inconsistencies.
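Since --master-data writes a CHANGE MASTER TO statement and --set-gtid-purged writes a SET @@GLOBAL.GTID_PURGED statement near the top of the dump, a quick sanity check that the uploaded file really carries the replication coordinates could look like this:

gsutil cat gs://[BUCKET]/[PATH_TO_DUMP] | zcat | head -n 100 \
    | grep -E 'CHANGE MASTER TO|GTID_PURGED'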

See something wrong in this tutorial? Please don’t hesitate to message me through the comments or the contact page.
