Planet MySQL

systemd: a revolution gets used to itself

New versions of systemd (like in Ubuntu 16.04) are able to configure pid Cgroups (ulimit -u via Cgroups) using TasksMax.

Check the commit implementing TaksMax.

Author: Lennart Poettering <lennart@poettering.net> Date: Fri Nov 13 19:28:32 2015 +0100 core: enable TasksMax= for all services by default, and set it to 512 Also, enable TasksAccounting= for all services by default, too. See: http://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html

But not we’ve got silly defaults:

$ systemctl show -p TasksMax docker TasksMax=512 $ systemctl show -p TasksMax mysql TasksMax=512

So if you use Docker or MySQL most likely you are going to have trouble without really putting load on your server.

Just set TasksMax for your service.

TasksMax=infinity

Feel free to configure that setting fitting you load :)

Viel Spaß

Erkan \o/

Update 1: There will be an update for Debian/Ubuntu setting TaskMax=infinity as default.

Update 2: With version 230 there is a new default \o/ KillUserProcesses=yes. Check it out yourself and cry :(


PlanetMySQL Voting: Vote UP / Vote DOWN

Linux User-Group Console

This post shows you how to add the menu option and GUI to set users and groups. It’s quite a bit easier than mastering all the command-line syntax. It makes setting up the required user and group accounts for an Oracle Enterprise or MySQL database solution much easier.

You add the utility by calling the yum (Yellowdog Updater, Modified) utility like this:

yum installed -y system-config_users

You should see the following:

Loaded plugins: langpacks Usage: yum [options] COMMAND   List of Commands:   check Check for problems in the rpmdb check-update Check for available package updates clean Remove cached data deplist List a package's dependencies distribution-synchronization Synchronize installed packages to the latest available versions downgrade downgrade a package erase Remove a package or packages from your system fs Creates filesystem snapshots, or lists/deletes current snapshots. fssnapshot Creates filesystem snapshots, or lists/deletes current snapshots. groups Display, or use, the groups information help Display a helpful usage message history Display, or use, the transaction history info Display details about a package or group of packages install Install a package or packages on your system langavailable Check available languages langinfo List languages information langinstall Install appropriate language packs for a language langlist List installed languages langremove Remove installed language packs for a language list List a package or groups of packages load-transaction load a saved transaction from filename makecache Generate the metadata cache provides Find what package provides the given value reinstall reinstall a package repo-pkgs Treat a repo. as a group of packages, so we can install/remove all of them repolist Display the configured software repositories search Search package details for the given string shell Run an interactive yum shell swap Simple way to swap packages, instead of using shell update Update a package or packages on your system update-minimal Works like upgrade, but goes to the 'newest' package match which fixes a problem that affects your system updateinfo Acts on repository update information upgrade Update packages taking obsoletes into account version Display a version for the machine and/or available repos.     Command line error: no such option: -i sh-4.2# yum install -y system-config-users Loaded plugins: langpacks adobe-linux-x86_64 | 951 B 00:00 ol7_UEKR3 | 1.2 kB 00:00 ol7_latest | 1.4 kB 00:00 Resolving Dependencies --> Running transaction check ---> Package system-config-users.noarch 0:1.3.5-2.el7 will be installed --> Processing Dependency: system-config-users-docs for package: system-config-users-1.3.5-2.el7.noarch --> Running transaction check ---> Package system-config-users-docs.noarch 0:1.0.9-6.el7 will be installed --> Processing Dependency: rarian-compat for package: system-config-users-docs-1.0.9-6.el7.noarch --> Running transaction check ---> Package rarian-compat.x86_64 0:0.8.1-11.el7 will be installed --> Processing Dependency: rarian = 0.8.1-11.el7 for package: rarian-compat-0.8.1-11.el7.x86_64 --> Processing Dependency: rarian for package: rarian-compat-0.8.1-11.el7.x86_64 --> Processing Dependency: librarian.so.0()(64bit) for package: rarian-compat-0.8.1-11.el7.x86_64 --> Running transaction check ---> Package rarian.x86_64 0:0.8.1-11.el7 will be installed --> Finished Dependency Resolution   Dependencies Resolved   ================================================================================ Package Arch Version Repository Size ================================================================================ Installing: system-config-users noarch 1.3.5-2.el7 ol7_latest 337 k Installing for dependencies: rarian x86_64 0.8.1-11.el7 ol7_latest 97 k rarian-compat x86_64 0.8.1-11.el7 ol7_latest 65 k system-config-users-docs noarch 1.0.9-6.el7 ol7_latest 307 k   Transaction Summary ================================================================================ Install 1 Package (+3 Dependent packages)   Total download size: 805 k Installed size: 3.9 M Downloading packages: (1/4): rarian-0.8.1-11.el7.x86_64.rpm | 97 kB 00:00 (2/4): rarian-compat-0.8.1-11.el7.x86_64.rpm | 65 kB 00:00 (3/4): system-config-users-1.3.5-2.el7.noarch.rpm | 337 kB 00:00 (4/4): system-config-users-docs-1.0.9-6.el7.noarch.rpm | 307 kB 00:00 -------------------------------------------------------------------------------- Total 830 kB/s | 805 kB 00:00 Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : rarian-0.8.1-11.el7.x86_64 1/4 Installing : rarian-compat-0.8.1-11.el7.x86_64 2/4 Installing : system-config-users-1.3.5-2.el7.noarch 3/4 Installing : system-config-users-docs-1.0.9-6.el7.noarch 4/4 Verifying : rarian-compat-0.8.1-11.el7.x86_64 1/4 Verifying : system-config-users-1.3.5-2.el7.noarch 2/4 Verifying : rarian-0.8.1-11.el7.x86_64 3/4 Verifying : system-config-users-docs-1.0.9-6.el7.noarch 4/4   Installed: system-config-users.noarch 0:1.3.5-2.el7   Dependency Installed: rarian.x86_64 0:0.8.1-11.el7 rarian-compat.x86_64 0:0.8.1-11.el7 system-config-users-docs.noarch 0:1.0.9-6.el7   Complete!

After successfully installing the radian, rarian-compat, system-config-users, and system-config-users-docs packages, you will find that there’s now a Users and Groups option when you navigate by clicking on Applications and then clicking on Sundry from the menu.

Menu Instructions

  1. You navigate to the Applications menu, and choose Sundry from the menu list and Users and Groups from the menu item to continue.

  1. You will be prompted for the sudoer’s password in this dialog.

  1. At this point, you can use the GUI interface to set users and groups.

As always, I hope this helps those trying to set users and passwords without mastering the command-line syntax.


PlanetMySQL Voting: Vote UP / Vote DOWN

Infiniflash Benchmark

Sandisk (FusionIO) and Nexenta are working together to build this SDS solution.

Infiniflash is a very large SDS production, which manages for very large DW system who requires large storage space and also high IOPS.

We test infiniflash system ,read this Infiniflash_benchmark


PlanetMySQL Voting: Vote UP / Vote DOWN

Releasing ocelotgui 1.0.0

Today ocelotgui, the Ocelot Graphical User Interface for MySQL and MariaDB, version 1.0.0, is generally available for Linux. Read the manual, see the screenshots, and download binary (.deb or .rpm) packages and GPL-licensed C++ source here.


PlanetMySQL Voting: Vote UP / Vote DOWN

Galera Error Failed to Report Last Committed (Interrupted System Call)

In this blog, we’ll discuss the ramifications of the Galera Error Failed to Report Last Committed (Interrupted System Call).

I have recently seen this error with Percona XtraDB Cluster (or Galera):

[Warning] WSREP: Failed to report last committed 549684236, -4 (Interrupted system call)

It was posted in launchpad as a bug in 2013: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1434646

My colleague Przemek replied, and explained it as:

Reporting the last committed transaction is just a part of the certification index purge process. In case it fails for some reason (it occasionally does), the cert index purge may be a little delayed. But it does not mean the transaction was not applied successfully. This is a warning after all.

If we look up this error in the source code, we realize it is reusing Linux system errors. Specifically:

#define EINTR 4 /* Interrupted system call */

As there isn’t much documentation regarding this error, and internet searches did not bring up useful information, my colleague David Bennett and I delved into the source code (as we do on occasion).

If we look in the Galera source code gcs_sm.hpp we see:

289  * @retval -EINTR  - was interrupted by another thread

We also see:

317                 /* was interrupted, will be handled by someone else */

This means that the thread was interrupted, but the server will retry on another thread. As it is just a warning, it isn’t anything to be too concerned about – unless they begin to pile up (which could be a sign of concurrency issues).

The specific warning is thrown from galera_service_thd.cpp here:

58                 if (gu_unlikely(ret < 0))
59                 {
60                     log_warn << "Failed to report last committed "
61                              << data.last_committed_ << ", " << ret
62                              << " (" << strerror (-ret) << ')';
63                     // @todo: figure out what to do in this case
64                 }

This warning could be handled better so as to not bloody the logs, or sound cryptic enough to concern administrators.


PlanetMySQL Voting: Vote UP / Vote DOWN

Asynchronous Query Execution with MySQL 5.7 X Plugin

In this blog, we will discuss MySQL 5.7 asynchronous query execution using the X Plugin.

Overview

MySQL 5.7 supports X Plugin / X Protocol, which allows (if the library supports it) asynchronous query execution. In 2014, I published a blog on how to increase a slow query performance with the parallel query execution. There, I created a prototype in the bash shell. Here, I’ve tried a similar idea with NodeJS + mysqlx library (which uses MySQL X Plugin).

TL;DR version: By using the MySQL X Plugin with NodeJS I was able to increase query performance 10x (some query rewrite required).

X Protocol and NodeJS

Here are the steps required:

  1. First, we will need to enable X Plugin in MySQL 5.7.12+, which will use a different port (33060 by default).
  2. Second, download and install NodeJS (>4.2) and mysql-connector-nodejs-1.0.2.tar.gz (follow Getting Started with Connector/Node.JS guide).
    # node --version v4.4.4 # wget https://dev.mysql.com/get/Downloads/Connector-Nodejs/mysql-connector-nodejs-1.0.2.tar.gz # npm install mysql-connector-nodejs-1.0.2.tar.gz
    Please note: on older systems, you will probably need to upgrade the nodejs version. Follow the Installing Node.js via package manager guide.
  3. All set! Now we can use the asynchronous queries feature.

Test data 

I’m using the same Wikipedia Page Counts dataset (wikistats) I’ve used for my Apache Spark and MySQL example. Let’s imagine we want to compare the popularity of MySQL versus PostgeSQL in January 2008 (comparing the total page views). Here are the sample queries:

mysql> select sum(tot_visits) from wikistats_by_day_spark where url like '%mysql%'; mysql> select sum(tot_visits) from wikistats_by_day_spark where url like '%postgresql%';

The table size only holds data for English Wikipedia for January 2008, but still has ~200M rows and ~16G in size. Both queries run for ~5 minutes each, and utilize only one CPU core (one connection = one CPU core). The box has 24 CPU cores, Intel(R) Xeon(R) CPU L5639 @ 2.13GHz. Can we run the query in parallel, utilizing all cores?

That is possible now with NodeJS and X Plugin, but require some preparation:

  1. Partition the table using hash, 24 partitions:
    CREATE TABLE `wikistats_by_day_spark_part` ( `id` int(11) NOT NULL AUTO_INCREMENT, `mydate` date NOT NULL, `url` text, `cnt` bigint(20) NOT NULL, `tot_visits` bigint(20) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=239863472 DEFAULT CHARSET=latin1 /*!50100 PARTITION BY HASH (id) PARTITIONS 24 */
  2. Rewrite the query running one connection (= one thread) per each partition, choosing its own partition for each thread:
    select sum(tot_visits) from wikistats_by_day_spark_part partition (p<N>) where url like '%mysql%';
  3. Wrap it up inside the NodeJS Callback functions / Promises.

The code

var mysqlx = require('mysqlx'); var cs_pre = { host: 'localhost', port: 33060, dbUser: 'root', dbPassword: 'mysql' }; var cs = { host: 'localhost', port: 33060, dbUser: 'root', dbPassword: 'mysql' }; var partitions = []; var res = []; var total = 0; mysqlx.getNodeSession( cs_pre ).then(session_pre => { var sql="select partition_name from information_schema.partitions where table_name = 'wikistats_by_day_spark_part' and table_schema = 'wikistats' "; session_pre.executeSql(sql) .execute(function (row) { partitions.push(row); }).catch(err => { console.log(err); }) .then( function () { partitions.forEach(function(p) { mysqlx.getNodeSession( cs ).then(session => { var sql="select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(" + p + ") where url like '%mysql%';" console.log("Started SQL for partiton: " + p); return Promise.all([ session.executeSql(sql) .execute(function (row) { console.log(p + ":" + row); res.push(row); total = Number(total) + Number(row); }).catch(err => { console.log(err); }), session.close() ]); }).catch(err => { console.log(err + "partition: " + p); }).then(function() { // All done if (res.length == partitions.length) { console.log("All done! Total: " + total); // can now sort "res" array if needed an display } }); }); }); session_pre.close(); }); console.log("Starting...");

The explanation

The idea here is rather simple:

  1. Find all the partitions for the table by using “select partition_name from information_schema.partitions”
  2. For each partition, run the query in parallel: create a connection, run the query with a specific partition name, define the callback function, then close the connection.
  3. As the callback function is used, the code will not be blocked, but rather proceed to the next iteration. When the query is finished, the callback function will be executed.
  4. Inside the callback function, I’m saving the result into an array and also calculating the total (actually I only need a total in this example).
    .execute(function (row) { console.log(p + ":" + row); res.push(row); total = Number(total) + Number(row); ...

Asynchronous Salad: tomacucumtoes,bersmayonn,aise *

This may blow your mind: because everything is running asynchronously, the callback functions will return when ready. Here is the result of the above script:

$ time node async_wikistats.js Starting... Started SQL for partiton: p0 Started SQL for partiton: p1 Started SQL for partiton: p2 Started SQL for partiton: p3 Started SQL for partiton: p4 Started SQL for partiton: p5 Started SQL for partiton: p7 Started SQL for partiton: p8 Started SQL for partiton: p6 Started SQL for partiton: p9 Started SQL for partiton: p10 Started SQL for partiton: p12 Started SQL for partiton: p13 Started SQL for partiton: p11 Started SQL for partiton: p14 Started SQL for partiton: p15 Started SQL for partiton: p16 Started SQL for partiton: p17 Started SQL for partiton: p18 Started SQL for partiton: p19 Started SQL for partiton: p20 Started SQL for partiton: p21 Started SQL for partiton: p22 Started SQL for partiton: p23

… here the script will wait for the async calls to return, and they will return when ready – the order is not defined.

Meanwhile, we can watch MySQL processlist:

+------+------+-----------------+-----------+---------+-------+--------------+-------------------------------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+------+-----------------+-----------+---------+-------+--------------+-------------------------------------------------------------------------------------------------------------------+ | 186 | root | localhost:44750 | NULL | Sleep | 21391 | cleaning up | PLUGIN | | 2290 | root | localhost | wikistats | Sleep | 1417 | | NULL | | 2510 | root | localhost:41737 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p0) where url like '%mysql%' | | 2511 | root | localhost:41738 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p1) where url like '%mysql%' | | 2512 | root | localhost:41739 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p2) where url like '%mysql%' | | 2513 | root | localhost:41741 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p4) where url like '%mysql%' | | 2514 | root | localhost:41740 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p3) where url like '%mysql%' | | 2515 | root | localhost:41742 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p5) where url like '%mysql%' | | 2516 | root | localhost:41743 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p6) where url like '%mysql%' | | 2517 | root | localhost:41744 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p7) where url like '%mysql%' | | 2518 | root | localhost:41745 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p8) where url like '%mysql%' | | 2519 | root | localhost:41746 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p9) where url like '%mysql%' | | 2520 | root | localhost:41747 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p10) where url like '%mysql%' | | 2521 | root | localhost:41748 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p11) where url like '%mysql%' | | 2522 | root | localhost:41749 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p12) where url like '%mysql%' | | 2523 | root | localhost:41750 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p13) where url like '%mysql%' | | 2524 | root | localhost:41751 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p14) where url like '%mysql%' | | 2525 | root | localhost:41752 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p15) where url like '%mysql%' | | 2526 | root | localhost:41753 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p16) where url like '%mysql%' | | 2527 | root | localhost:41754 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p17) where url like '%mysql%' | | 2528 | root | localhost:41755 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p18) where url like '%mysql%' | | 2529 | root | localhost:41756 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p19) where url like '%mysql%' | | 2530 | root | localhost:41757 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p20) where url like '%mysql%' | | 2531 | root | localhost:41758 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p21) where url like '%mysql%' | | 2532 | root | localhost:41759 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p22) where url like '%mysql%' | | 2533 | root | localhost:41760 | NULL | Query | 2 | Sending data | PLUGIN: select sum(tot_visits) from wikistats.wikistats_by_day_spark_part partition(p23) where url like '%mysql%' | | 2534 | root | localhost | NULL | Query | 0 | starting | show full processlist | +------+------+-----------------+-----------+---------+-------+--------------+-------------------------------------------------------------------------------------------------------------------+

And CPU utilization:

Tasks: 41 total, 1 running, 33 sleeping, 7 stopped, 0 zombie %Cpu0 : 91.9 us, 1.7 sy, 0.0 ni, 6.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu1 : 97.3 us, 2.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu2 : 97.0 us, 3.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu3 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu4 : 95.7 us, 2.7 sy, 0.0 ni, 1.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu5 : 98.3 us, 1.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu6 : 98.3 us, 1.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu7 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu8 : 96.7 us, 3.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu9 : 98.3 us, 1.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu10 : 95.7 us, 4.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu11 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu12 : 98.0 us, 2.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu13 : 98.0 us, 1.7 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu14 : 97.7 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu15 : 97.3 us, 2.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu16 : 98.0 us, 2.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu17 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu18 : 97.3 us, 2.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu19 : 98.7 us, 1.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu20 : 99.3 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu21 : 97.3 us, 2.3 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu22 : 97.0 us, 3.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu23 : 96.0 us, 4.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st ... PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 18901 mysql 20 0 25.843g 0.017t 7808 S 2386 37.0 295:34.05 mysqld

Now, here is our “salad”:

p1:2499 p23:2366 p2:2297 p0:4735 p12:12349 p14:1412 p3:2045 p16:4157 p20:3160 p18:8717 p17:2967 p13:4519 p15:5462 p10:1312 p5:2815 p7:4644 p9:766 p4:3218 p6:4175 p21:2958 p8:929 p19:4182 p22:3231 p11:4020

As we can see, all partitions are in random order. If needed, we can even sort the result array (which isn’t needed for this example as we only care about the total). Finally our result and timing:

All done! Total: 88935 real 0m30.668s user 0m0.256s sys 0m0.028s

Timing and Results

  • Original query, single thread: 5 minutes
  • Modified query, 24 threads in Node JS: 30 seconds
  • Performance increase: 10x

If you are interested in the original question (MySQL versus PostgreSQL, Jan 2008):

  • MySQL, total visits: 88935
  • PostgreSQL total visits: 17753

Further Reading:

PS: Original Asynchronous Salad Joke, by Vlad @Crazy_Owl (in Russian)


PlanetMySQL Voting: Vote UP / Vote DOWN

Generated MySQL Columns and Changing Values

I was speaking at PHP[Tek] this week on the JSON Data Type and using generated columns. JSON columns can not be indexed but data from a JSON column can be extracted via a generated column and that column can be indexed. All was going well until someone asked me about modifying data in a generated column. Was it possible?

I blinked hard. I have not tried that! I had not seen any mention of that in the documentation. So I had to admit that I did not know and would have to try that.

The Test
mysql> CREATE TABLE gentest (a INT, b INT AS (a + 1) STORED, INDEX(b));
...
mysql> INSERT INTO gentest VALUES (1),(2),(3),(4);
...
So now we have a table with data to test. So lets try to modify the value of one of the generated columns.
mysql> UPDATE gentest SET b = 9 WHERE a = 1;

And what did the server do?

It returned an error and told me The value specified for generated column 'b' in table 'gentest' is not allowed. Lesson LearnedSo now I know that the server will not allow you to mangle, er, change the value of generated columns away from their definition. And yes, I find the same thing with both the VIRTUAL and STORED versions of generated columns.
PlanetMySQL Voting: Vote UP / Vote DOWN

How to Setup MySQL Master-Master Replication

This article consolidates information from several sources into the format I use to setup MySQL Master/Master Replication. The beauty of Linux and open source is that there are many different ways to do this. Please take a look at my references and use them to accommodate any needs you may have.
PlanetMySQL Voting: Vote UP / Vote DOWN

Understanding MySQL Fabric Faulty Server Detection

Awhile ago I found myself analyzing a MySQL fabric installation to understand why a group member was occasionally being marked as FAULTY even when the server was up and running and no failures were observed.  

server_uuid address status mode weight ------------------------------------ ----------- ------- ---------- ------ ab0b0653-6121-11c5-55a0-007543445454 mysql1:3306 PRIMARY READ_WRITE 1.0 f34dd331-2432-11f4-a2d3-006754678533 mysql2:3306 FAULTY READ_ONLY 1.0

 

Upon reviewing mysqlfabric logs, I found the following warnings were being logged from time to time:

[WARNING] 1442221217.920115 - FailureDetector(xc_grp_1) - Server (f34dd331-2432-11f4-a2d3-006754678533) in group (xc_grp_1) is unreachable

 

Since I was not clear under which circumstances a server is marked as FAULTY, I decided to review MySQL Fabric code (Python) to better understand the process.

The module responsible for printing this message is failure_detection.py and more specifically, the _run method belonging to FailureDetector class. This method will loop through every server in a group, and attempt a connection to the MySQL instance running on that node. MySQLServer.Is_alive (mysql/fabric/server.py) method is called for this purpose.

Before reviewing the failure detection process, we first need to know that there are four MySQL fabric parameters that will affect when a server is considered unstable or faulty:

DETECTION_TIMEOUT
DETECTION_INTERVAL
DETECTIONS
FAILOVER_INTERVAL

 

Based on the above variables, the logic followed by FailureDetector._run() to mark a server as FAULTY is the following:

1) Every {DETECTION_INTERVAL/DETECTIONS} seconds, a connection against each server in the group is attempted with a timeout equal to DETECTION_TIMEOUT

2) If DETECTION_TIMEOUT is exceeded, the observed message is logged and a counter incremented

3) When this counter reaches DETECTIONS, the server is marked as “unstable” and if the last time the master changed was greater than FAILOVER_INTERVAL ago, the server is marked as FAULTY

With a better understanding of the logic followed by MySQL fabric to detect faulty nodes, I went to the configuration file to check the existing values for each of the parameters:

DETECTIONS=3 DETECTION_TIMEOUT=1 FAILOVER_INTERVAL=0 DETECTION_INTERVAL=6

From the values above we can notice that each group will be polled every 2 seconds (DETECTION_INTERVAL/DETECTIONS) and that the monitored server should respond within a second for the test to be considered successful.

On high concurrency nodes, or nodes under heavy load, a high polling frequency combined with tight timeouts could cause the servers to be marked as FAULTY just because the connection attempt would not be completed or processed (in the case of high connection rates or saturated network interfaces) before the timeout occurs.

Also, having FAILOVER_INTERVAL reduced to 0, will cause the server to be marked as FAULTY even if a failover had just occurred.

A less aggressive configuration would be more appropriated for heavy loaded environment:

DETECTIONS=3 -> It's Ok DETECTION_TIMEOUT -> 5 TO 8 FAILOVER_INTERVAL -> 1200 DETECTION_INTERVAL -> 45 TO 60 Conclusion

As with any other database clustering solution that relies on a database connection to test node status, situations where the database server would take longer to respond should also be considered. The polling frequency should be adjusted so the detection window is within an acceptable range, but the rate of monitoring connections generated is also kept to the minimum. Check timeouts should also be adjusted to avoid false positives caused by the server not being able to respond in a timely manner.

 


PlanetMySQL Voting: Vote UP / Vote DOWN

AWS Aurora Benchmarking part 2

Some time ago, I published the article on AWS Aurora Benchmarking (AWS Aurora Benchmarking – Blast or Splash?), in which I analyzed the behavior of different solutions using synchronous replication in AWS environment. This blog follows up with some of the comments and suggestions I received regarding that post from the community and Amazon engineers.

I decided to perform another round of tests, keeping in mind comments and suggestions received.

I presented some of the results during the Percona conference in Santa Clara last April 2016. The following is the transposition that presentation, with more details.

Not interested in the preliminary descriptions? Go to the results section

Why new tests?

A very good question, with an easy answer.

Aurora is a product that is still under development and refinement: six months of development could present major changes in performance. Not only that, but the initial tests focused on entry-level solutions, meaning I was analyzing the kind of users that are currently starting their business and looking for a flexible solution that allows them to save money and scale.

This time, I put the focus on enterprise solutions by analyzing what an already well-established company would get when looking for a decent scalable solution.

These are two different scenarios.

Why so many (different) tests?

I used many different benchmarking tools, and I am still planning to run others. Why so? Why not simply use one of them?

Again, a simple answer. I used different tools because in some cases, they provide me a different way of accessing and using data. I also do not trust benchmarking tools, not even the ones I developed. I wanted to test the same thing using different tools and compare the results. ONLY if I see a common pattern, then would I consider the test valid. Personally, I tend to discard any test that is not consistent, or if the analysis performed is using a single benchmarking tool. In my opinion, being lazy is not an option when doing these kind of exercises.

About the tests

It was difficult to compare apples to apples here. And I think that is the main point to keep in mind.

Aurora is not a standard RDS solution, like we are used to. Aurora looks like MySQL, smells like MySQL, but is not vanilla MySQL. To achieve what they have, the engineers had to change many parts. The more you dig in, the more you realize there are significant differences.

Because of that, I had to focus more on identifying what each solution can do and compare the solutions against expectations, rather than comparing the numbers.

I was more interested to see what happen if:

  • I have a burst of connections, and my application goes from 4K to 40K connections. Will it crash? Will it slow down?
  • How long should I wait if a node fails?
  • What should I not have in my schema design, to prevent bottlenecks?

Those are relevant questions in my opinion, more so than discovering that solution A has 3000 rows written/sec, and solution B has 3100. Or that I might (might) have some additional page rotation, file -> memory-> flushes because the amount of memory differs.

That is valuable information, for sure, but less valuable than having a decent understanding of which platform will help my business grow and remain stable.

What is the right tool for the job? This is the question I am addressing.

Tests run

I had run three main kinds of tests:

  • Performance and load stress
  • High availability failover
  • Response time (latency) from the application point of view
Performance and load stress

These tests were the most extensive and demanding.

I analyzed the capacity to serve the load under different conditions, from a light load up to full utilization, and some degree of resource saturation.

  • The first set of tests were to evaluate a simple load on a single table, causing the table to become a hotspot and showing how the platform would manage the increasing contention.
  • The second set of tests were to perform a similar load, but distributing it cross multiple tables and batching the operations. Parallelization, contention, scalability and distributed hotspots were in the picture.

The two above focused on write operations only, and were done using different tools (comparing the results as they were complementary).

  • Third set of tests, using my own stress tool, were focused on R/W oriented usage. The tests were executed against multiple tables, performing CRUD actions, using simple and batch insert, reads by PK, index, by range, IN and exact match conditions.
  • The fourth set of tests were performed using a TPC-C like load (OLTP).
  • The fifth set of tests were using sysbench in OLTP mode, with 250 tables.

The scope of the last three set of tests was to identify how the platforms would manage the load, considering the following:

  • Read and write contention on the same tables
  • High level of parallelism (from the application)
  • Possible hot-spots (TPCC district)
  • Increasing utilization (memory, threads, IO)
  • Saturation (connections)

Finally, all tests were run with fully utilized BufferPool.

The machines

Small boxes (first round of tests):

EIP = 1 VPC = 1 ELB=1 Subnets = 4 (1 public, 3 private) HAProxy = 6 MHA Monitor (micro ec2) = 1 NAT Instance (EC2) =1 (hosting EIP) DB Instances (EC2) = 3 (m4.xlarge) 16GB Application Instances (EC2) = 6 (4) EBS SSD 3000 PIOS Aurora RDS node = 3 (db.r3.xlarge) 30GB

Large boxes (latest tests):

EIP = 1 VPC = 1 ELB=1 Subnets = 4 (1 public, 3 private) HAProxy = 4 MHA Monitor (micro ec2) = 1 NAT Instance (EC2) =1 (hosting EIP) DB Instances (EC2) = 3 (c3.8xlarge) 60GB Application Instances (EC2) = 4 EBS SSD 5000 PIOS Aurora RDS node = 3 (db.r3.8xlarge) 244GB

A note

It was pointed out to me that I deliberately chose to use an Ec2 solution for Percona XtraDB Cluster with less memory than the one available in Aurora. This is true, and we must take that into consideration. The reason for this is that the only Ec2 solution matching the memory of a db_r3.8xlarge is the d2.8xlarge.

I did try it, but the level of scalability I got (from the CPU point of view) was less efficient than the one available with c3.8xlarge. I decided to prefer CPU resources to memory, especially because I was going to test concurrency and parallelism in conjunction with the load increase.

From the result, I feel confident that I chose correctly – but I am open to comment.

The layout

This is what the setup looks like:

Where you read Java, those are the application nodes running the different test applications.

Two words about Aurora first

Aurora has a few key concepts that we must have clearly in mind, especially how it manages the writes across replica, and how connections are implemented.

The IO activity

To replicate the information across the different storage, Aurora only replicates FRM files and data coming from IB_LOGS. This is a quite significant advantage to other forms of replication, given the limited number of bytes that are replicated over the network (and also if they are replicated six times).

Another significant advantage is that Aurora does not use a double write buffer, which is obviously another blast (see the recent optimization in Percona Server https://www.percona.com/blog/2016/05/09/percona-server-5-7-parallel-doublewrite/ ).

In other words, writes in Aurora are organized by filling its commit queue and pushing the changes as group commit to the storage.

In some presentations, you might have seen that all steps are asynchronous. But is important to underline that a commit is acknowledged by Aurora when at least two availability zones (AZ) have received and written the incoming data related to that commit. Writes here mean received in the storage node incoming queue and with a quorum of four over six nodes.

This means that no matter what, data has to travel on the network to reach the final destination, and ACK signals come back before Aurora returns the ACK to the commit operation. The network is in the same region, but still it could represent an incognita about performance. No wonder we could have some latency at this stage!

As you can see, what I am reporting is also confirmed in the image below (and in the observations). The point is that the impact of steps 1 – 2 is not obviously clear.

Thread pooling

Aurora also use thread pooling – a lot! That will become very clear later, and as more of the work is based on parallelism, the more efficient thread pooling seems to be.

In most cases we are used to seeing CPUs on database servers not fully utilized, unless there is some heavy ordering operation or a bad query. That behavior is also (not only) a direct consequence of the connection-to-thread model, which implies a period of latency and stand by. In Aurora, the incoming connections are not following the same model. Instead, the pool redistributes the load of the incoming connection to a pool of threads, optimizing the latency period, resulting in a higher CPU utilization. Which is what you want from your resource: to be utilized and not waiting for something else to do its job.

 

The results

Without wasting more electronic ink, let see what comes out of this round of tests (not the final one by the way). To simplify the results, I will also report the graphs from the first set of tests, but will focus on the latest.Small Boxes = SB, Large Boxes LB.

Small Boxes = SB, Large Boxes = LB.

First Test: IIBench

As declared previously, my scope was to verify how the two platforms would have reacted to a simple load focus on inserts with a basic single table. The bufferpool was saturated before running the test.

SB

LB

As we can see, in the presence of a hot spot the solution using Percona XtraDB Cluster outperformed Aurora, in both cases. What is notable, though, is that while XtraDB Cluster remained approximately around the same time/performance, Aurora is significantly reduced the time taken. This shows that Aurora was taking advantage of the more powerful platform, while XtraDB Cluster was not able to.

With further analyzation of the details, we notice that Aurora performs better atomically. It was able to manage more writes/second as well as rows and pages managed. But it was inconsistent: Aurora had performance hiccups at regular intervals. As such the final result was that it took more time to process the whole workload.

I was not able to dig to deeply, given some metrics are not fully available in Aurora. As such I had to rely fully on Aurora engineers, who mentioned to me that hot-spot contention was a possible issue.

Aurora Handler calls:

XtraDB Cluster Handlers:

The execution in XtraDB Cluster showed fewer calls but constant performance, while Aurora has hiccups.

Aurora page activity write:

XtraDB Cluster page activity write:

The trend shown by the handlers stayed consistent in the page management and rows insert, as expected.

Second Test: Application Ingest

As mentioned, this test showed many threads from different application servers, inserted by a batch of 50 statements against multiple tables.

The results coming from this test are quite favorable to Aurora, as we can see starting from the time taken to complete the same workload:

LB

SB

With small ones, the situation was inverted.

But here is where the interesting part starts.

Aurora can manage significantly higher numbers of rows, as the picture below shows:

The results are also constant, and don’t decrease significantly like the inserts with XtraDB Cluster.

The number of handler commits, however, are significantly less.

Once more they stay the same with the load increase, without impacting performance.

Reviewing all handler calls, we get our first surprise.

XtraDB Cluster handler calls:

Aurora handler calls:

The gap/drop existing in the two graphs are the different tests (with an increasing number of threads).

Two things to notice here: the first one is that XtraDB Cluster decreases in performance while processing the load, while Aurora does not. The second (you need to zoom the image) is the number of commits is floating in XtraDB Cluster, while it stays fixed in Aurora.

An even bigger surprise comes up when reviewing the connections graphs.

As expected, XtraDB Cluster has all my connections open, and the number of threads running is quite close to the number of connected threads.

Both of them follow the increasing number of connected threads.

But this is not the case in Aurora.

Also, if my applications are trying to open ~800 threads, the Aurora node see only a part of them, and the number of running is fixed to 32 threads.

The important things to consider here are that a) my applications don’t connect directly to the Aurora instance, but to a connector (MariaDB), and b) that Aurora, in this case, caps the number of running threads to the number of CPU available on the instance (here 32).

Given that, I expected to have worse performance (but I don’t). The fact that Aurora uses one thread for multiple connections seems to be working quite efficiently.

The number of rows inserted is also consistent with the handler calls, and has better performance than XtraDB Cluster.

Aurora rows inserted:

XtraDB Cluster rows inserted

Again we have the same trend, only, this time, Aurora performs better than XtraDB Cluster.

Third Test: OLTP Application

When run on the small boxes, this test saw XtraDB Cluster performing much better than Aurora. The time taken by Aurora was ~3 times the time taken by XtraDB Cluster.

With a large box, I had the inverse result: Aurora is outperforming XtraDB Cluster from 2 to 7 times the speed.

Analyzing the number of commands executed with the increasing workload, we can see how XtraDB Cluster can perform better than Aurora with a workload of 128 threads, but starts to have worse performance as the load increases.

On the other hand, Aurora manages the read/write load without significant performance loss, which includes being able to increase the number of commits/sec.

Reviewing the handler calls, we see that the handler commit calls are significantly less in Aurora (as already noticed in the ingest tests).

Another thing to note is that the number of calls for XtraDB Cluster is significantly higher and not scaling, while Aurora has a nice scaling trend.

Fourth Test: TPCC-mysql

The TPCC test is mainly to test OLTP traffic, with the note that some tables (like district) might become a hotspot. The tests I ran were executed against 400 warehouses, and used 128 threads maximum for the small box and 2048 threads for the large box.

During this test, I hit one of the Aurora limitations and I escalated it to the Aurora engineers (who are aware of the problem).

Small boxes:

In the case of small boxes, there is nothing to say: XtraDB Cluster manages the load more efficiently. This trend is not optimal, having significant fluctuation. Aurora is just not able to keep it up.

Large boxes:

 

It is a different and a more complex scenario in the case of the use of large boxes. I would like to say that Aurora performs better.

This is true for two of the three tests, and up to when it got stuck by internal limitation Aurora was also performing better on the third. But then its performance just collapsed.

With a more in-depth investigation, I noticed that under the hood Aurora was not performing as well as it appeared. This comes out quite clearly by looking at a comparison between the graphs covering Comm_ execution, open files, handlers and InnoDBrow lock time.

In all of them it is evident how XtraDB Cluster keeps serving the workload with consistent behavior, while Aurora fails the second test on (512 threads) — not just on the third with 2048 threads.

Aurora:

XtraDB Cluster:

It is clear that Aurora was better served during the test with 256 threads going over the 450K com select serve (in 10 sec interval), compared with XtraDB Cluster that was not able to go over 350K.

But in the following tests, while XtraDB Cluster was able to keep going (with decreasing performance), Aurora started to struggle with very inconsistent behavior.

This was also confirmed by the open files graph.

Aurora:

XtraDB Cluster:

The graphs show the instances of files open during the test, not the ones already open. It reflects the Open_file metric “The number of files that are open. This count includes regular files opened by the server. It does not include other types of files such as sockets or pipes. Also, the count does not include files that storage engines open using their own internal functions rather than asking the server level to do so.”

I was quite surprised by the number of files open by Aurora.

Handlers reflected the same behavior, as well.

Aurora:

XtraDB Cluster:

Perfectly in line with the com trend.

So what was increasing in reverse?

Aurora:

XtraDB Cluster:

As you can see from the above, the exactly same workload generated an increasing lock row time, from quite low in the test with 256 threads, up to a crazy high with 2048 threads.

As mentioned, we know that TPCC has a couple of tables that act as hotspots, and we already saw with IIbench how Aurora is not working efficiently in that case.

I also was getting a lot of 188 errors during the test. This is an Aurora internal error. When I reported it, I was told they know about it, and they are planning to work on it.

I hope they do soon, because if this issue is solved it is very likely that Aurora will not only be able to manage the tested workload, but exceed it by far.

I am saying this because also with the identified issues Aurora was able to keep going and manage a more than decent response time during the second test (with 512 threads).

Fifth Test: Sysbench

I added the sysbench tests to test scalability, and to see the what happens when the system reaches a saturation point. This test brought up some limitations existing in the Aurora solution, related more to the connector than the Aurora engine itself.

Aurora has a limit of 16k connections. I wanted to see what happens if I got to saturation point or close to it. It doesn’t matter if this is a ridiculously high number or not.

What happened was that Aurora managed traffic up to 4K. The closer I got to the limit, however, the more I had a connectivity issue. At the end I had to run the test with 8K, 12K and 20K threads pointing directly to the Aurora instance, bypassing the connector that was not able to serve the traffic. After that, I was able to hit up to ~15500 threads (but with a lot of inconsistent performance). I am defining the limit of a meaningful test from the previous level of 12K threads.

XtraDB Cluster was able to scale up to 16K no problem.

What also is notable here is that Aurora was able to manage the workload more efficiently regarding transaction handling (i.e., as transactions executed and latency).

The number of transactions executed by Aurora was ~three times the one executed by XtraDB Cluster.

Regarding latency, Aurora showed less latency then XtraDB Cluster.

Internally, Aurora and XtraDB Cluster operations were once again different regarding how the workload was handled. The most divergent result was the handler calls:

Commit calls in Aurora were a fraction of the calls in XtraDB Cluster, while the number of rollbacks was higher.

The read calls had an even more divergent behavior, with XtraDB Cluster performing a higher number of read_keys, while Aurora was having a very limited number of them. Read_rnd are very high in XtraDB Cluster, but totally absent in Aurora (note that in Aurora, read_rnds are reported but seem not to increase). On the other hand, Aurora reported a high number of read_rnd_next, while XtraDB Cluster has none.

HA availability Fail-over time

Both solutions:

In this test, the fail-over time for the solution using Galera and HAProxy was more efficient. For both a limited or mid-level load. One assumption is that given Aurora has to verify both the status of the data transmitted and its consistency across the six data store nodes in every case; the process is not as fast as it could be.

It could also be that the cluster connector is not as efficient as it should in redirecting the traffic from one node to another. It would be a very interesting exercise to replace it with some other custom solution.

Note that I was performing the tests following the Amazon recommendation to use the following to simulate a real crash:

ALTER SYSTEM CRASH [INSTANCE|NODE]

As such, I was not doing anything strange or out of the ordinary.

It is worth mentioning that of the eight seconds taken by MySQL/Galera to perform the failover, six were due to the HAProxy settings (which had a 3000 ms interval and two loops in the settings before executing failover).

Execution latency

The purpose of these tests was to identify the latency existing between the moment that application sends the request and the moment MySQL/Aurora took the request in “charge”. The expectation is that the busier the database, the higher the latency.

For this test, I reported both results: the one coming from the old tests with the small box, and the new one with the large box.

Small boxes:

Large boxes:

It is clear from the graphs that the two tests report different scenarios. In the first, Galera was able to manage the load more efficiently and serve requests with lower latency. For the new tests, I had used a higher number of threads than the ones for the small box. Nevertheless, in the second test the CPU utilization and the number of running threads lead me to think that Aurora was finally able to utilize resources more efficiently and the lower latency.

The latency jumped up again when the number of connections rose above 12K, but that was expected given previous tests results.

Conclusions High Availability

The two platforms were able to manage the failover operation in a limited time frame (below 1 minute). Nevertheless, MySQL/Galera was shown to be more efficient and consistent. This result is a direct consequence of synchronous replication, which by design prevents MySQL/Galera from allowing an active node to fall behind.

In my opinion, the replication method used in Aurora is efficient, and given that data is shared across the read replicas, fail-over should happen faster.

The tests suffered because of the connector, and I have the feeling that having another solution in place may bring some surprises (actually, I would like to test that as well).

Performance

In this run of tests, Aurora was able to invert the results I had in the first test with the small boxes. In almost all cases, Aurora performed as well or better then XtraDB Cluster. There are still cases where Aurora is penalized, and those are the ones where hotspots are present. The contention in Aurora is killing performance, and raise errors (188). But I hope we will see a significant evolution soon.

General Comments on Aurora

The product is evolving quickly, and benchmark results may become obsolete in very short time (this is why it is important to have repeatable and comparable tests). From my point of view, in this set of tests Aurora clearly shows where it’s a better fit: higher-end levels, where high availability and CPU power is the focus (not concerns about the cost).

There is no reason to use Aurora in small-mid boxes: the platform is not going to be as efficient as a standard solution like XtraDB Cluster. But if cost is not an issue, and the applications require a lot of parallelism, Aurora on db.r3.8xlarge is a good solution.

I still see space for improvements (like for cluster connectors, or the time taken to restart a cluster after a full stop, or contention reduction). But I am also confident that the work led by the development team will fix most of my concerns (and more) soon.

Final note: it would be nice to have the code open source, so that the community could contribute (but I understand the business reasons not to).

About Cost

I don’t think it is this the right place to mention the cost of each solution (especially because each need is different).

As such, I am not reporting any specific numbers. You can, however, follow the links below and do the necessary math:

Aurora cost calculator

AWS cost calculator

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Planets9s - ClusterControl 1.3 webinar replay, Polyglot Persistence Meetups, and more!

Welcome to this week’s Planets9s, covering all the latest resources and technologies we create around automation and management of open source database infrastructures.

Watch the replay: ClusterControl 1.3 webinar with new features for MySQL, MariaDB, Percona Server and PostgreSQL

Thanks to everyone who joined our ClusterControl 1.3 release webinar this week. If you weren’t able to attend or would simply like to watch it again, the replay is now available online. Johan Andersson, CTO at Severalnines, gave an overview of ClusterControl’s deployment, monitoring, management and scaling functionalities for MySQL, MariaDB, Percona Server, MongoDB and PostgreSQL, as well as the new features around key management, operational reports and the new deployment tool for MySQL NDB Cluster … and more!

Watch the webinar replay

Sign up for our European Polyglot Persistence Meetups Tour

Polyglot Persistence means that when storing data, it is best to use multiple data storage technologies, chosen based upon the way data is being used by the application. We’re starting off with Amsterdam, then moving on to Dublin, Paris, Berlin, Stockholm and London. And for our kick-off in Amsterdam, we’re lucky to be sponsored by Booking.com. Please sign up for the meetup of your choice, as we’re announcing locations, speakers etc. via the Meetup platform.

Sign up for a meetup

Become a ClusterControl DBA: Operational Reports for MySQL and MariaDB

Performing database health checks is a lot more time-consuming than it should be. As a DBA, you have to gather data from multiple places in order to understand the state of your databases. This is why we introduced operational reports in ClusterControl 1.3, where we gather some of the most important data into a single document, which can be quickly reviewed. This new blog post explains how you can make best use of this new functionality in ClusterControl.

Read the blog

That’s it for this week! Feel free to share these resources with your colleagues and follow us in our social media channels.

Have a good end of the week,

Jean-Jérôme Schmidt
Planets9s Editor
Severalnines AB

Tags:
PlanetMySQL Voting: Vote UP / Vote DOWN

Log Buffer #475: A Carnival of the Vanities for DBAs

This Log Buffer Edition goes through various blogs, and selects some of the top posts from Oracle, SQL Server and MySQL.

Oracle:

MOS Note:136697.1 – New HCHECK.SQL for Oracle Database 12c

ORAchk / EXAchk questions.

Cloud control won’t start!

ASMLib is an optional utility that can be used on Linux systems to manage Oracle ASM devices.

ORA-56841: Master Diskmon cannot connect to a CELL.

Oracle BITAND Function with Examples.

 

SQL Server:

Natively Compiled Stored Procedures: What they are all about

Considerations around validation errors 41305 and 41325 on memory optimized tables with foreign keys

Taking Azure SQL Data Warehouse for a Test-Drive.

Persistent PowerShell: The PowerShell Profile.

SQL Server Always On Endpoint Encryption Algorithm Compatibility Error.

 

MySQL:

Fixing MySQL scalability problems with ProxySQL or thread pool.

Installing a Web, Email & MySQL Database Cluster on Debian 8.4 Jessie with ISPConfig 3.1

Planets9s – Download the new ClusterControl 1.3 for MySQL, MongoDB & PostgreSQL

AWS Aurora Benchmark – Choose the right tool for the job

Where is the MySQL 5.7 root password?


PlanetMySQL Voting: Vote UP / Vote DOWN

Why Use Slow Query Logs When There’s VividCortex?

At VividCortex, we believe that database monitoring can be much, much better than what many people currently settle for. In fact, we know that monitoring needs to be better -- more reliable and more efficient than what many engineering and operations teams currently use. Databases are getting larger and more complex all the time, and older monitoring techniques can’t properly optimize modern systems or keep up with issues when they arise. For instance, we consider slow query logs a prime example of a monitoring tool that’s become virtually obsolete in a world where you can find an alternative such as VividCortex.

We tell people to beware of slow query logs for two principal reasons: they can put a dangerous strain on your system, and, even when used well, they don’t provide nearly enough insight. That means that even if you decide to take the (unnecessary!) risk of running slow query logs, they don’t track metrics such as CPU usage -- they don’t provide adequate information to be a reliable guide for understanding your database. As Baron Schwartz, VividCortex’s founder and CEO, wrote in a blog post a couple years ago, “If you’re using slow query log analysis to ‘optimize’ your server, you’re potentially working on the wrong things.” There’s no reason to waste time and effort on such imprecise methods.

Though slow query logs have been a cornerstone of database monitoring for a long time, we frequently see our customers relieved to be able to move on to better strategies and more powerful tools. “Traditionally, the way for us to diagnose and follow up on the performance of any data was to follow the old rule of checking slow query logs,” Sergio Roysen, senior MySQL DBA at Shopify, told us in a recent conversation. Prior to using VividCortex, Sergio said, “it was painful to turn on the slow query log to capture all traffic and run pt-query-digest afterwards. If you don’t capture all traffic then those tools will show you a biased point of view, because they’re only logging the slow queries, and even then you get too much data. It’s just too slow to analyze -- you can’t just open the dashboard and see which queries are not using indexes.”   

Sergio went on to compare other qualities of VividCortex with what he expects when using more traditional tooling. Sergio said, “[Unlike VividCortex,] you can’t enable slow query logs to capture all traffic for an extended period of time. It requires so much storage that you can only do it for 5 minutes and even then you need to be very careful.” And that’s true: with a slow query log, you need to be cautious, as you can inadvertently take down your server if you leave the log running for too long. On top of that risk, you’re then left with short slices of time with which to work, to identify the issues causing problems in your system; and you’re missing all the other performance metrics from the system during that time, too.

Beyond that, VividCortex doesn’t merely do the work of slow query logs better, faster, or more reliably; VividCortex changes your entire expectations for the sort of information you’ll discover and use. “For instance,” Sergio said, “VividCortex proved to be a big help when we were diagnosing an issue regarding missing rows in a table that we were planning to use for a new feature with new code. We weren’t sure if those rows had actually never been inserted, or if they had been inserted but then deleted, later on. We realized that we had performed an online schema change in that table recently -- such an operation works by inserting rows in a temporary table with the new schema while using triggers to keep track of the changes taking place at the same time. With any tool other than VividCortex, it would have been very difficult for us to go beyond a hypothesis for how those rows were lost. But with VividCortex, we were able to look and say, ‘Guess what? Those inserts that our schema change tool should have run, just didn't happen as they should have.’ VividCortex helped us find out why. It not only showed us what was there -- it also showed us what was not there and what should have been. After that, we knew we could trust VividCortex.”

For us, that kind of trust represents more than VividCortex’s ability to provide better visibility than older tools like slow query logs; it means our customers understand that as their systems and technologies grow, VividCortex will grow alongside them. As modern databases evolve and make increasing demands on monitoring, VividCortex will continue to meet them.


PlanetMySQL Voting: Vote UP / Vote DOWN

Percona Server 5.6.30-76.3 is now available


Percona
is glad to announce the release of Percona Server 5.6.30-76.3 on May 25, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.30, including all the bug fixes in it, Percona Server 5.6.30-76.3 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.30-76.3 milestone on Launchpad.

Bugs Fixed:

  • When Read Free Replication was enabled for TokuDB, and there was no explicit primary key for the replicated TokuDB table, there could be duplicated records in the table on update operation. The fix disables Read Free Replication for tables without an explicit primary key and does rows lookup for UPDATE and DELETE binary log events and issues warning. Bug fixed #1536663 (#950).
  • Attempting to execute a non-existing prepared statement with Response Time Distribution plugin enabled could lead to a server crash. Bug fixed #1538019.
  • TokuDB was using using different memory allocators; this was causing safemalloc warnings in debug builds and crashes because memory accounting didn’t add up. Bug fixed #1546538 (#962).
  • Fixed heap allocator/deallocator mismatch in Metrics for scalability measurement. Bug fixed #1581051.
  • Percona Server is now built with system zlib library instead of the older bundled one. Bug fixed #1108016.
  • Reduced the memory overhead per page in the InnoDB buffer pool. The fix was based on Facebook patch #91e979e. Bug fixed #1536693 (upstream #72466).
  • CREATE TABLE ... LIKE ... could create a system table with an unsupported enforced engine. Bug fixed #1540338.
  • Change buffer merge could throttle to 5% of I/O capacity on an idle server. Bug fixed #1547525.
  • Slave_open_temp_tables would fail to decrement on the slave with a disabled binary log if the master was killed. Bug fixed #1567361.
  • The server will now show a more descriptive error message when Percona Server fails with errno == 22 "Invalid argument", if innodb_flush_method was set to ALL_O_DIRECT. Bug fixed #1578604.
  • Killed connection threads could get their sockets closed twice on shutdown. Bug fixed #1580227.
  • AddressSanitizer build with LeakSanitizer enabled was failing at gen_lex_hash invocation. Bug fixed #1580993 (upstream #80014).
  • apt-cache show command for percona-server-client was showing innotop included as part of the package. Bug fixed #1201074.
  • mysql-systemd would fail with PAM authentication and proxies due to a regression introduced when fixing #1534825 in Percona Server 5.6.29-76.2. Bug fixed #1558312.
  • Upgrade logic for figuring if TokuDB upgrade can be performed from the version on disk to the current version was broken due to a regression introduced when fixing bug #684 in Percona Server 5.6.27-75.0. Bug fixed #717.
  • If ALTER TABLE was run while tokudb_auto_analyze variable was enabled it would trigger auto-analysis, which could lead to a server crash if ALTER TABLE DROP KEY was used because it would be operating on the old table/key meta-data. Bug fixed #945.
  • The TokuDB storage engine with tokudb_pk_insert_mode set to 1 is safe to use in all conditions. On INSERT IGNORE or REPLACE INTO, it tests to see if triggers exist on the table, or replication is active with !BINLOG_FORMAT_STMT before it allows the optimization. If either of these conditions is met, then it falls back to the “safe” operation of looking up the target row first. Bug fixed #952.
  • Bug in TokuDB Index Condition Pushdown was causing ORDER BY DESC to reverse the scan outside of the WHERE bounds. This would cause a query to hang in a sending data state for several minutes in some environments with large amounts of data (3 billion records) if the ORDER BY DESC statement was used. Bugs fixed #988, #233, and #534.

Other bugs fixed: #1399562 (upstream #75112), #1510564 (upstream #78981), #1496282 (#964), #1496786 (#956), #1566790, #1552673, #1567247, #1567869, #718, #914, #970, #971, #972, #976, #977, #981, #637, and #982.

Release notes for Percona Server 5.6.30-76.3 are available in the online documentation. Please report any bugs on the launchpad bug tracker.


PlanetMySQL Voting: Vote UP / Vote DOWN

Watch the replay: ClusterControl 1.3 webinar with new features for MySQL, MariaDB, Percona Server, PostgreSQL and more!

Thanks to everyone who joined us yesterday for our ClusterControl 1.3 release webinar!

Johan Andersson, CTO at Severalnines and creator of ClusterControl, walked us through the latest features of the 1.3 release and demonstrated them live as well. In addition to an overview of ClusterControl’s deployment, monitoring, management and scaling functionalities for MySQL, MariaDB, Percona Server, MongoDB and PostgreSQL, Johan focussed our attention on new features around key management, operational reports and more.

One feature-set that triggered particular interest in yesterday’s audience was the automated deployment of a production setup of NDB / MySQL Cluster: users can create a production setup of NDB/MySQL Cluster from ClusterControl and deploy management, SQL/API and data nodes - all via the ClusterControl interface.

The replay of this webinar and the slides are now available for viewing online:

Sign up for the the replay Read the slides

To get started with ClusterControl, download it today.

Webinar Agenda
  • ClusterControl overview
  • New features deep-dive
    • Key management and encryption
    • Additional operational reports
    • Improved security
    • Create / mirror repository
    • Create NDB / MySQL Cluster
  • Live Demo
  • Q&A
Speaker

Johan Andersson, CTO, Severalnines - Johan's technical background and interest are in high performance computing as demonstrated by the work he did on main-memory clustered databases at Ericsson as well as his research on parallel Java Virtual Machines at Trinity College Dublin in Ireland. Prior to co-founding Severalnines, Johan was Principal Consultant and lead of the MySQL Clustering & High Availability consulting group at MySQL / Sun Microsystems / Oracle, where he designed and implemented large-scale MySQL systems for key customers. Johan is a regular speaker at MySQL User Conferences as well as other high profile community gatherings with popular talks and tutorials around architecting and tuning MySQL Clusters.

For more information on ClusterControl 1.3:

To get started with ClusterControl, download it today.

Tags:
PlanetMySQL Voting: Vote UP / Vote DOWN

Shinguz: Why you should take care of MySQL data types

A customer reported last month that MySQL does a full table scan (FTS) if a query was filtered by a INT value on a VARCHAR column. First I told him that this is not true any more because MySQL has fixed this behaviour long time ago. He showed me that I was wrong:

CREATE TABLE `test` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `data` varchar(64) DEFAULT NULL, `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), KEY `data` (`data`) ) ENGINE=InnoDB; EXPLAIN SELECT * FROM test WHERE data = 42\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test partitions: NULL type: ALL possible_keys: data key: NULL key_len: NULL ref: NULL rows: 522500 filtered: 10.00 Extra: Using where EXPLAIN SELECT * FROM test WHERE data = '42'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test partitions: NULL type: ref possible_keys: data key: data key_len: 67 ref: const rows: 1 filtered: 100.00 Extra: NULL

When I executed the query I got some more interesting information:

SELECT * FROM test WHERE data = '42'; Empty set (0.00 sec) SELECT * FROM test WHERE data = 42; +--------+----------------------------------+---------------------+ | id | data | ts | +--------+----------------------------------+---------------------+ | 1096 | 42a5cb4a3e76857a3efe7af44ba9f4dd | 2016-05-25 10:26:59 | ... | 718989 | 42a1921fb2df42126d85f9586532eda4 | 2016-05-25 10:27:12 | +--------+----------------------------------+---------------------+ 767 rows in set, 65535 warnings (0.26 sec)

Looking at the warnings we also find the reason: MySQL does the cast on the column and not on the value which is a bit odd IMHO:

show warnings; | Warning | 1292 | Truncated incorrect DOUBLE value: '80f52706c2f9de40472ec29a7f70c992' |

A bit suspicious I looked at the warnings of the query execution plan again:

show warnings; +---------+------+---------------------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------------------------------------------------------+ | Warning | 1739 | Cannot use ref access on index 'data' due to type or collation conversion on field 'data' | | Warning | 1739 | Cannot use range access on index 'data' due to type or collation conversion on field 'data' | +---------+------+---------------------------------------------------------------------------------------------+

I thought this was fixed, but it seems not. The following releases behave like this: MySQL 5.0.96, 5.1.73, 5.5.38, 5.6.25, 5.7.12 and MariaDB 5.5.41, 10.0.21 and 10.1.9

The other way around it seems to work in both cases:

SELECT * FROM test WHERE id = 42; +----+----------------------------------+---------------------+ | id | data | ts | +----+----------------------------------+---------------------+ | 42 | 81d74057d7be8f20563da404bb1b3ab0 | 2016-05-25 10:26:56 | +----+----------------------------------+---------------------+ SELECT * FROM test WHERE id = '42'; +----+----------------------------------+---------------------+ | id | data | ts | +----+----------------------------------+---------------------+ | 42 | 81d74057d7be8f20563da404bb1b3ab0 | 2016-05-25 10:26:56 | +----+----------------------------------+---------------------+ EXPLAIN SELECT * FROM test WHERE id = 42\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test partitions: NULL type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 filtered: 100.00 Extra: NULL
Taxonomy upgrade extras: query tuningexplaindata typesql
PlanetMySQL Voting: Vote UP / Vote DOWN

The differences between IDEMPOTENT and AUTO-REPAIR mode

I posted recently Lossless RBR for MySQL 8.0 about a concern I have about moving to minimal RBR in MySQL 8.0.  This seems to be the direction that Oracle is considering, but I am not sure it is a good idea as a default setting.

I talked about a hypothetical new replication mode lossless RBR and also about recovery after a crash where perhaps the data on the slave may get out of sync with the master. Under normal circumstances this should not happen but in the real world sometimes it does.

Note: I’m talking about an environment that does not use GTID.  GTID is good but may have its own issues and it’s probably best to leave those discussions to another post.

So let us talk about the difference between IDEMPOTENT mode (slave_exec_mode=IDEMPOTENT) and what I’ll call AUTO-REPAIR mode, mentioned in feature request bug#54250 to Oracle in 2010.  By default the DBA wants to avoid any data corruption, so this should be the default behaviour. Thus I’d prefer auto-repair mode to be off by default, stopping replication if any inconsistencies are found. I could enable it if I see such an issue as it should help me recover the state of the database without adding further “corruption” to the slave.

If I’m confident that this procedure works fine and I’m monitoring the counters mentioned below then it may be fine to leave enabled all the time.

A slave fails, it may crash and it recovers. It’s likely that the replication position it “remembers” is behind the actual state in the database.

If we use full RBR (default setting) in these circumstances then we may get in a set of changes which the SQL thread tries to apply.

They’ll be in the form of:

before row image / after row image

before row image / after row image

where each row image is the set of column values prior to and after the row changes.  Traditionally we use the abbreviations BI and AI for this.

Currently the SQL thread will start up and look for the first row to change and once it has found it change it.  If the exact matching conditions it needs can not be found then an error will be generated and replication stops.

IDEMPOTENT mode attempts to address this and tries to “continue whatever the cost”. To be honest I’m not exactly sure what it does, but it’s clear that it will either do nothing or perhaps it might try to find the row by primary key and update that row. I’d expect it probably does nothing.

See a comment later on.  So I did go and check and the comments in slave_exec_mode say that it suppresses duplicate-key and no-key-found errors. There is no mention of updates where the full AI is unavailable. (e.g. when using minimal RBR)

It also looks like it does not “repair” the issue, but it simply ignores it. The documentation is not 100% clear to me.

I made a comment about different options for AUTO-REPAIR mode and when it can work and when it can not. In FULL RBR mode it should always be able to do something. In MINIMAL RBR mode there will be cases when it can not. Let’s see the case of FULL RBR mode:

  1. For an UPDATE when the requested row can not be found:
    • auto-repair mode would INSERT the row. You have a full AI so you can do this safely.
    • A counter should be updated to record this action.
  2. For a DELETE row operation when the row can not be found:
    • auto-repair mode would ignore the error and given the row does not exist anyway the effect of the DELETE has already been accomplished.
    • A counter should be updated to record this action
  3. For an INSERT row operation when the ROW already exists.
    • Duplicate key insert) This is what generally breaks replication.
    • auto-repair mode would treat this as an UPDATE operation (based on the primary key in the table) and ensure the row is changed to have the same primary key and the columns of the AI.
    • Again a counter should be updated to record this action.

In FULL RBR mode these 3 actions should allow replication to continue. The database is no more corrupt than it was before. In fact it’s in a state that’s somewhat better.

In many cases other row events will proceed as expected without issue:  INSERTS will happen, UPDATES and DELETEs to existing rows will work as the row is found, and things will proceed as normal.

So should we get in a situation like this we can check the 3 counters and this gives us a clue as to the number of “repair actions” which MySQL has had to execute.  It also gives us an idea of how inconsistent the slave seems to be, though those inconsistencies should now have been removed.

As I said I can’t remember exactly what IDEMPOTENT mode does in these 3 circumstances.  It may do something similar to my AUTO-REPAIR mode or it may just skip the errors.

Why don’t I know?  Well I’m currently in a plane and the mysql documentation is not provided with my mysql server software and I’m not online so I can’t check.  I used to find the info file or a pdf of the manual quite helpful in such situations and would love to see it put back again so I don’t need to speculate about what the documentation says.

Yes, I could update this text when I’m back online, but I think I’ll make the point and leave this paragraph here.

So with FULL RBR the situation seems to me to be clear. IDEMPOTENT mode may not do the same thing as the AUTO-REPAIR mode, and whether it does or not there are no counters to see the effect it produces on my server. So I’m blind. I do not like that.

Let’s change the topic slightly and now switch to MINIMAL RBR and do the same thing. In theory now IDEMPOTENT mode and AUTO-REPAIR mode may seem to be the same (assuming IDEMPOTENT mode changes what it can) but that’s also not entirely true.

With minimal RBR mode we get a set of  primary key plus changed columns for each row that changes. For INSERTS we get the full ROW and for DELETES we only need the primary key. That should be enough.

What changes here are the UPDATES: as if we don’t get the full row image we can not know what was in the table before. We only have information on the new data.  So other columns which are not mentioned are unknown to us. If we are UPDATING a row and we can not find it, an INSERT is not possible as we do not have enough information to complete the columns that are unknown to us. So replication MUST stop if we want to avoid corruption.

Additionally, with minimal RBR UPDATES even if you find the ROW to UPDATE you can not be sure you are doing the right thing as you have no reference to the content or state of the before image. My thought here was that the ideal thing would be to send with each row a checksum of the row content on the master.  This would be “small” (so efficient) and could be checked against the row content on the slave prior to making the update.  If the values match we know the RBR UPDATE is working on expected data.  This makes a DBA feel more comfortable.

Table definitions on a master and its slaves are not always identical.  There are several reasons for this such as the fact that different (major) versions of MySQL are being used, or simply due to it being impossible to take downtime on the server some sort of out of band ALTER TABLE may have been run on the slave and that change is still pending on the master. The typical case here is adding new columns, or changing the type, width, character set or collation of an existing column. In these circumstances the binary image on the master and slave may well not be the same so the before row image “checksum” on the master would not be usable.  To detect such a situation it may be necessary to also send a table definition checksum with the row before image checksum, though this could be sent for each set of events on a table not each row. The combination of the two values should be enough to allow us to be ensure that minimal RBR changes can be validated even if we do not push down a full before image into the binlog stream. Again, if the definitions do not match it would seem sensible to update a counter to indicate such a situation.  We probably do not want to stop replication in this situation. Those who do not expect any sort of differences between master and slave may be paranoid enough to want to not continue, but I know for my usage I’d like to monitor changes to the counter but probably just continue.

Even my proposed LOSSLESS RBR would need this checksum to be safe as it would not contain the full before image but only the PK + all columns for an UPDATE operation, so potentially “slave drift” might happen and go undetected.

I can see therefore that optionally being able to add to minimal- and lossless-RBR such checksums would be a good way to ensure that replication works safely and pushes out changes to the slaves which are expected, and catches unexpected inconsistencies. 

The additional counters mentioned would help “catch” the number of inconsistencies that take place and they would be good even with the current replication setup when IDEMPOTENT mode is used. This lack of visibility of errors should make most DBAs rather sleepless, but I suspect there are those that are not aware and those that just have to live without that knowledge. Having these extra counters would help us see when things are not the same and allow us to take any necessary action based on that information should it be necessary.

I hope with this post I have clarify why IDEMPOTENT mode is not the same as my suggested AUTO-REPAIR mode and when it’s safe to continue replicating and when it is not under a variety of different conditions which would normally make RBR stop.

It also seems clear to me that MINIMAL RBR would benefit from some additional checksums to allow the DBA to be more confident that the changes being made on the slave match those made on the master.  This is especially so if using minimal RBR.

As always comments and feedback on this post is most welcome.


PlanetMySQL Voting: Vote UP / Vote DOWN

Looking inside the MySQL 5.7 document store

In this blog, we’ll look at the MySQL 5.7 document store feature, and how it is implemented.

Document Store

MySQL 5.7.12 is a major new release, as it contains quite a number of new features:

  1. Document store and “MongoDB” like NoSQL interface to JSON storage
  2. Protocol X / X Plugin, which can be used for asynchronous queries (I will write about it as well)
  3. New MySQL shell

Peter already wrote the document store overview; in this post, I will look deeper into the document store implementation. In my next post, I will demonstrate how to use document store for Internet of Things (IoT) and event logging.

Older MySQL 5.7 versions already have a JSON data type, and an ability to create virtual columns that can be indexed. The new document store feature is based on the JSON datatype.

So what is the document store anyway? It is an add-on to a normal MySQL table with a JSON field. Let’s take a deep dive into it and see how it works.

First of all: one can interface with the document store’s collections using the X Plugin (default port: 33060). To do that:

  1. Enable X Plugin and install MySQL shell.
  2. Login to a shell:
    mysqlsh --uri root@localhost
  3. Run commands (JavaScript mode, can be switched to SQL or Python):
    mysqlsh --uri root@localhost Creating an X Session to root@localhost:33060 Enter password: No default schema selected. Welcome to MySQL Shell 1.0.3 Development Preview Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help', 'h' or '?' for help. Currently in JavaScript mode. Use sql to switch to SQL mode and execute queries. mysql-js> db = session.getSchema('world_x') <Schema:world_x> mysql-js> db.getCollections() { "CountryInfo": <Collection:CountryInfo> }

Now, how is the document store’s collection different from a normal table? To find out, I’ve connected to a normal MySQL shell:

mysql world_x Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 2396 Server version: 5.7.12 MySQL Community Server (GPL) Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql> show create table CountryInfo *************************** 1. row *************************** Table: CountryInfo Create Table: CREATE TABLE `CountryInfo` ( `doc` json DEFAULT NULL, `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL, PRIMARY KEY (`_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.00 sec) mysql> show tables; +-------------------+ | Tables_in_world_x | +-------------------+ | City | | Country | | CountryInfo | | CountryLanguage | +-------------------+ 4 rows in set (0.00 sec)

So the document store is actually an InnoDB table with one field: doc json + Primary key, which is a generated column.

As we can also see, there are four tables in the world_x database, but db.getCollections() only shows one. So how does MySQL distinguish between a “normal” table and a “document store” table? To find out, we can enable the general query log and see which query is being executed:

$ mysql -e 'set global general_log=1' $ tail /var/log/general.log 2016-05-17T20:53:12.772114Z 186 Query SELECT table_name, COUNT(table_name) c FROM information_schema.columns WHERE ((column_name = 'doc' and data_type = 'json') OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2 2016-05-17T20:53:12.773834Z 186 Query SHOW FULL TABLES FROM `world_x`

As you can see, every table that has a specific structure (doc JSON or specific generation_expression) is considered to be a JSON store. Now, how does MySQL translate the .find or .add constructs to actual MySQL queries? Let’s run a sample query:

mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1) [ { "GNP": 8510700, "IndepYear": 1776, "Name": "United States", "_id": "USA", "demographics": { "LifeExpectancy": 77.0999984741211, "Population": 278357000 }, "geography": { "Continent": "North America", "Region": "North America", "SurfaceArea": 9363520 }, "government": { "GovernmentForm": "Federal Republic", "HeadOfState": "George W. Bush", "HeadOfState_title": "President" } } ] 1 document in set (0.02 sec)

and now look at the slow query log again:

2016-05-17T21:02:21.213899Z 186 Query SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

We can verify that MySQL translates all document store commands to SQL. That also means that it is 100% transparent to the existing MySQL storage level and will work with other storage engines. Let’s verify that, just for fun:

mysql> alter table CountryInfo engine=MyISAM; Query OK, 239 rows affected (0.06 sec) Records: 239 Duplicates: 0 Warnings: 0 mysql-js> db.getCollection("CountryInfo").find('Name= "United States"').limit(1) [ { "GNP": 8510700, "IndepYear": 1776, "Name": "United States", "_id": "USA", "demographics": { "LifeExpectancy": 77.0999984741211, "Population": 278357000 }, "geography": { "Continent": "North America", "Region": "North America", "SurfaceArea": 9363520 }, "government": { "GovernmentForm": "Federal Republic", "HeadOfState": "George W. Bush", "HeadOfState_title": "President" } } ] 1 document in set (0.00 sec) 2016-05-17T21:09:21.074726Z 2399 Query alter table CountryInfo engine=MyISAM 2016-05-17T21:09:41.037575Z 2399 Quit 2016-05-17T21:09:43.014209Z 186 Query SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1

Worked fine!

Now, how about the performance? We can simply take the SQL query and run explain:

mysql> explain SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1 *************************** 1. row *************************** id: 1 select_type: SIMPLE table: CountryInfo partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 239 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)

Hmm, it looks like it is not using an index. That’s because there is no index on Name. Can we add one? Sure, we can add a virtual column and then index it:

mysql> alter table CountryInfo add column Name varchar(255) -> GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.Name'))) VIRTUAL; Query OK, 0 rows affected (0.12 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> alter table CountryInfo add key (Name); Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain SELECT doc FROM `world_x`.`CountryInfo` WHERE (JSON_EXTRACT(doc,'$.Name') = 'United States') LIMIT 1 *************************** 1. row *************************** id: 1 select_type: SIMPLE table: CountryInfo partitions: NULL type: ref possible_keys: name key: name key_len: 768 ref: const rows: 1 filtered: 100.00 Extra: NULL 1 row in set, 1 warning (0.00 sec)

That is really cool! We have added an index, and now the original query starts using it. Note that we do not have to reference the new field, the MySQL optimizer is smart enough to translate the (JSON_EXTRACT(doc,'$.Name') = 'United States' to an index scan on the virtual column.

But please note: JSON attributes are case-sensitive. If you will use (doc,'$.name') instead of (doc,'$.Name') it will not generate an error, but will simply break the search and all queries looking for “Name” will return 0 rows.

Finally, if you looked closely at the output of db.getCollection("CountryInfo").find('Name= "United States"').limit(1) , you noticed that the database has outdated info:

"government": { "GovernmentForm": "Federal Republic", "HeadOfState": "George W. Bush", "HeadOfState_title": "President" }

Let’s change “George W. Bush” to “Barack Obama” using the .modify clause:

mysql-js> db.CountryInfo.modify("Name = 'United States'").set("government.HeadOfState", "Barack Obama" ); Query OK, 1 item affected (0.02 sec) mysql-js> db.CountryInfo.find('Name= "United States"') [ { "GNP": 8510700, "IndepYear": 1776, "Name": "United States", "_id": "USA", "demographics": { "LifeExpectancy": 77.0999984741211, "Population": 278357000 }, "geography": { "Continent": "North America", "Region": "North America", "SurfaceArea": 9363520 }, "government": { "GovernmentForm": "Federal Republic", "HeadOfState": "Barack Obama", "HeadOfState_title": "President" } } ] 1 document in set (0.00 sec)

Conclusion

Document store is an interesting concept and a good add-on on top of the existing MySQL JSON feature. Using the new .find/.add/.modify methods instead of the original SQL statements can be convenient in some cases.

Some might ask, “why do you want to use document store and store information in JSON inside the database if it is relational anyway?” Storing data in JSON can be quite useful in some cases, for example:

  • You already have a JSON (i.e., from external feeds) and need to store it anyway. Using the JSON datatype will be more convenient and more efficient.
  • You have a flexible schema, typical for the Internet of Things for example, where some sensors might only send temperature data, some might send temperature/humidity/light (but light information is only recorded during the day), etc. Storing it in the JSON format can be more convenient so that you do not have to declare all possible fields in advance, and do not have to run “alter table” if a new sensor starts sending new types of data.

In the next two blog posts, I will show how to use document store for Internet of Things / event streaming, and how to use X Protocol for asynchronous queries in MySQL.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Keywords and Reserved Words

I've seen some confusion on what constitutes a keyword or a reserved word in MySQL. The manual defines them, and has a complete list. Reserved words are a special subset of keywords, and you can't use a reserved word as an identifier unless you quote it with backticks. I discourage the use of backticks to quote identifiers, because it allows you to use lots of words and characters in your identifier that you'll probably regret later. For example:

`` mysql> create tableYou will regret this!((╯°□°)╯︵ ┻━┻` int); Query OK, 0 rows affected (0.03 sec)

mysql> desc You will regret this!; +--------------------------------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------------------------------+---------+------+-----+---------+-------+ | (╯°□°)╯︵ ┻━┻ | int(11) | YES | | NULL | | +--------------------------------+---------+------+-----+---------+-------+ 1 row in set (0.00 sec) ```

There is also an exception for table names. You can use a reserved word as a table name without quoting it with backticks as long as you prefix it with the schema name, but again I would discourage that. Here's an example:

mysql> create table table (id int); ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table (id int)' at line 1 mysql> create table test.table (id int); Query OK, 0 rows affected (0.04 sec)

Non-reserved keywords are a little bit trickier. Many of them probably should be reserved, but are not. For example, lots of data type names are not reserved, nor is the word "view". Thus I can create a table like this without even having to use backticks:

create table view ( bit bit, bool bool, boolean boolean, date date, datetime datetime, enum enum('enum'), text text, time time, timestamp timestamp, year year, comment int comment 'int' );

Yuck. I think it's a good practice to familiarize yourself with all of the non-reserved MySQL keywords and avoid using them as identifiers, but unfortunately there is nothing to prevent you from doing so.


PlanetMySQL Voting: Vote UP / Vote DOWN

Pages