Planet MySQL

New Option to Stop the Server If Binlogging Fails in MySQL 5.6

In this post I will try to explain the new MySQL binlog_error_action server option. This new option is available from MySQL 5.6.22 and later.

Background:
——————–
As part of MySQL replication, all data changes that happen on the master server are recorded into a binary log so that they can be sent to the slaves and replayed there. If an error occurs that prevents mysqld from writing to the binary log (disk full, read-only file system, etc.) then the logs are simply disabled and operations continue on the master. This error mainly occurs during rotation of the binary log or while opening a binary log file.

This problem creates a serious potential for data loss within a replication group. When a master hits this failure in production, all downstream replication clients stop receiving replication events. The master will not store binlog events for committed transactions to its binary log and consequently there will be no new events in the binary log to send to replication clients. If that master then fails, then all the transactions the master received while binlogging was turned off are lost forever. This can lead to out of sync slaves and improper backups.

Error message when file system becomes readonly:
As part of the bug fix for Bug#51014 the binlog_error_action server option was introduced. Using this option a user can choose to either ignore the error (still the default so as to avoid behavior changes in GA releases) or to abort the server. The IGNORE_ERROR option value refers to the default behavior (as described above) where binlogging will simply be disabled and the master will continue with its normal operations.

mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_error_action';
+---------------------+--------------+
| Variable_name       | Value        |
+---------------------+--------------+
| binlog_error_action | IGNORE_ERROR |
+---------------------+--------------+
1 row in set (0.00 sec)

However, the ABORT_SERVER option value will cause the server to exit when binlogging operations fail. At the time of the resulting server exit a fatal error is pushed to clients and the server will shut down. The error details are:

Error code: ER_BINLOG_LOGGING_IMPOSSIBLE
Error message: Binary logging not possible. Either disk is full or file system
is read only while rotating the binlog. Aborting the server

Specifying the option:
———————————–
This option can be specified as a startup option (either on the command-line or in a config file) or dynamically in the running server using the SET command:
mysql> SET GLOBAL binlog_error_action=ABORT_SERVER;
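The same setting can of course go into the config file so that it survives restarts; a minimal my.cnf sketch (shown purely for illustration):

[mysqld]
binlog_error_action = ABORT_SERVER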

Demonstration of the new option in the case of a read-only file system:
——————————————————————————————————————
Step 1: SET GLOBAL binlog_error_action = ABORT_SERVER;
Step 2: Make the file system holding the binary logs read-only
Step 3: FLUSH LOGS; (a command-level sketch of these steps is shown below)
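Roughly, and leaving out the exact error output since it depends on the system, the demonstration boils down to the following (the mount point is only an example):

mysql> SET GLOBAL binlog_error_action = ABORT_SERVER;
# on the OS, remount the file system that holds the binary logs read-only (test systems only!)
# mount -o remount,ro /var/lib/mysql
mysql> FLUSH LOGS;   -- the binary log rotation now fails and the server aborts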

Summary:
——————
If an error occurs that prevents mysqld from writing to the binary log the existing behaviour is: binary logging is disabled and the server continues with its normal operations. Now with the new binlog_error_action server option the user can choose to either ignore the error (IGNORE_ERROR) or to abort the server (ABORT_SERVER) when binary logging failures occur. This optional behavior was first introduced in MySQL 5.6.20 using the binlogging_impossible_mode server option. That option name is now deprecated in MySQL 5.6.22 and the option is instead now referred to as binlog_error_action.

We look forward to your feedback on this new feature! If you have any questions or encounter any bugs, please do let us know by opening a support ticket or filing a bug. As always, THANK YOU for using MySQL!



Testing backup locks during Xtrabackup SST on Percona XtraDB Cluster

Background on Backup Locks

I was very excited to see Backup locks support in release notes for the latest Percona XtraDB Cluster 5.6.21 release. For those who are not aware, backup locks offer an alternative to FLUSH TABLES WITH READ LOCK (FTWRL) in Xtrabackup. While Xtrabackup can hot-copy Innodb, everything else in MySQL must be locked (usually briefly) to get a consistent snapshot that lines up with Innodb. This includes all other storage engines, but also things like table schemas (even on Innodb) and async replication binary logs. You can skip this lock, but it isn’t generally considered a ‘safe’ backup in every case.

Until recently, Xtrabackup (like most other backup tools) used FTWRL to accomplish this. This worked great, but had the unfortunate side-effect of locking every single table, even the Innodb ones.  This functionally meant that even a hot-backup tool for Innodb had to take a (usually short) global lock to get a consistent backup with MySQL overall.

Backup locks change that by introducing a new locking command on Percona Server called 'LOCK TABLES FOR BACKUP'.  This works by locking writes to non-transactional tables, as well as locking DDL on all tables (including Innodb).  If Xtrabackup (of a recent vintage) detects that it's backing up a Percona Server (also of recent vintage), it will automatically use LOCK TABLES FOR BACKUP instead of FLUSH TABLES WITH READ LOCK.
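In terms of raw statements, the sequence the backup tool runs is roughly this (a simplified sketch; the real log is shown later in this post):

mysql> LOCK TABLES FOR BACKUP;
-- non-transactional writes and all DDL are now blocked; InnoDB writes continue
-- (the non-InnoDB files, .frm files, etc. are copied at this point)
mysql> UNLOCK TABLES;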

The TL;DR of this is that you can keep on modifying your Innodb data through the entire backup, since we don’t need to use FTWRL any longer.

This feature was introduced in Percona Server 5.6.16-64.0 and Percona XtraBackup 2.2.  I do not believe you will find it in any other MySQL variant, though I could be corrected.

What this means for Percona XtraDB Cluster (PXC)

The most common (and logical) SST method for Percona XtraDB Cluster is using Xtrabackup. This latest release of PXC includes support for backup locks, meaning that Xtrabackup donor nodes will no longer need to get a global lock. Practically for PXC users, this means that your Donor nodes can stay in rotation without causing client interruptions due to FTWRL.

Seeing it in action

To test this out, I spun up a 3-node cluster on AWS and fired up a sysbench run on the first node. I forced an SST on the node. Here is a snippet of the innobackup.backup.log (generated by all Xtrabackup donors in Percona XtraDB Cluster):

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy and Percona LLC and/or its affiliates 2009-2013. All Rights Reserved. This software is published under the GNU GENERAL PUBLIC LICENSE Version 2, June 1991. Get the latest version of Percona XtraBackup, documentation, and help resources: http://www.percona.com/xb/p 141218 19:22:01 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtraback up;mysql_socket=/var/lib/mysql/mysql.sock' as 'sst' (using password: YES). 141218 19:22:01 innobackupex: Connected to MySQL server 141218 19:22:01 innobackupex: Starting the backup operation IMPORTANT: Please check that the backup run completes successfully. At the end of a successful backup run innobackupex prints "completed OK!". innobackupex: Using server version 5.6.21-70.1-56 innobackupex: Created backup directory /tmp/tmp.Rm0qA740U3 141218 19:22:01 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/etc/my.cnf" --defaults-group="mysqld" --backup --suspe nd-at-end --target-dir=/tmp/tmp.dM03LgPHFY --innodb_data_file_path="ibdata1:12M:autoextend" --tmpdir=/tmp/tmp.dM03LgPHFY --extra-lsndir='/tmp/tmp.dM 03LgPHFY' --stream=xbstream innobackupex: Waiting for ibbackup (pid=21892) to suspend innobackupex: Suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_suspended_2' xtrabackup version 2.2.7 based on MySQL server 5.6.21 Linux (x86_64) (revision id: ) xtrabackup: uses posix_fadvise(). xtrabackup: cd to /var/lib/mysql xtrabackup: open files limit requested 0, set to 5000 xtrabackup: using the following InnoDB configuration: xtrabackup: innodb_data_home_dir = ./ xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend xtrabackup: innodb_log_group_home_dir = ./ xtrabackup: innodb_log_files_in_group = 2 xtrabackup: innodb_log_file_size = 1073741824 xtrabackup: using O_DIRECT >> log scanned up to (10525811040) xtrabackup: Generating a list of tablespaces [01] Streaming ./ibdata1 >> log scanned up to (10529368594) >> log scanned up to (10532685942) >> log scanned up to (10536422820) >> log scanned up to (10539562039) >> log scanned up to (10543077110) [01] ...done [01] Streaming ./mysql/innodb_table_stats.ibd [01] ...done [01] Streaming ./mysql/innodb_index_stats.ibd [01] ...done [01] Streaming ./mysql/slave_relay_log_info.ibd [01] ...done [01] Streaming ./mysql/slave_master_info.ibd [01] ...done [01] Streaming ./mysql/slave_worker_info.ibd [01] ...done [01] Streaming ./sbtest/sbtest1.ibd >> log scanned up to (10546490256) >> log scanned up to (10550321726) >> log scanned up to (10553628936) >> log scanned up to (10555422053) [01] ...done ... [01] Streaming ./sbtest/sbtest17.ibd >> log scanned up to (10831343724) >> log scanned up to (10834063832) >> log scanned up to (10837100278) >> log scanned up to (10840243171) [01] ...done xtrabackup: Creating suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_suspended_2' with pid '21892' >> log scanned up to (10843312323) 141218 19:24:06 innobackupex: Continuing after ibbackup has suspended 141218 19:24:06 innobackupex: Executing LOCK TABLES FOR BACKUP... 
141218 19:24:06 innobackupex: Backup tables lock acquired 141218 19:24:06 innobackupex: Starting to backup non-InnoDB tables and files innobackupex: in subdirectories of '/var/lib/mysql/' innobackupex: Backing up files '/var/lib/mysql//mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (74 files) >> log scanned up to (10846683627) >> log scanned up to (10847773504) innobackupex: Backing up files '/var/lib/mysql//sbtest/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (21 files) innobackupex: Backing up file '/var/lib/mysql//test/db.opt' innobackupex: Backing up files '/var/lib/mysql//performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (53 files) >> log scanned up to (10852976291) 141218 19:24:09 innobackupex: Finished backing up non-InnoDB tables and files 141218 19:24:09 innobackupex: Executing LOCK BINLOG FOR BACKUP... 141218 19:24:09 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS... 141218 19:24:09 innobackupex: Waiting for log copying to finish >> log scanned up to (10856996124) xtrabackup: The latest check point (for incremental): '9936050111' xtrabackup: Stopping log copying thread. .>> log scanned up to (10856996124) xtrabackup: Creating suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_log_copied' with pid '21892' 141218 19:24:10 innobackupex: Executing UNLOCK BINLOG 141218 19:24:10 innobackupex: Executing UNLOCK TABLES 141218 19:24:10 innobackupex: All tables unlocked 141218 19:24:10 innobackupex: Waiting for ibbackup (pid=21892) to finish xtrabackup: Transaction log of lsn (9420426891) to (10856996124) was copied. innobackupex: Backup created in directory '/tmp/tmp.Rm0qA740U3' 141218 19:24:30 innobackupex: Connection to database server closed 141218 19:24:30 innobackupex: completed OK!

We can see the LOCK TABLES FOR BACKUP issued at 19:24:06 and unlocked at 19:24:10. Let’s see Galera apply stats from this node during that time:

mycluster / ip-10-228-128-220 (idx: 0) / Galera 3.8(rf6147dd) Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Window Flow time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm p_ms 19:23:55 P 5 3 Dono 698µs 0 72 0 5418 0.0 3.5M 0 0 187k 94 3k 3 2 0 19:23:56 P 5 3 Dono 701µs 0 58 0 5411 0.0 3.5M 0 0 188k 229 3k 3 2 0 19:23:57 P 5 3 Dono 701µs 0 2 0 5721 0.0 3.7M 0 0 188k 120 3k 3 2 0 19:23:58 P 5 3 Dono 689µs 0 5 0 5643 0.0 3.6M 0 0 188k 63 3k 3 2 0 19:23:59 P 5 3 Dono 679µs 0 55 0 5428 0.0 3.5M 0 0 188k 115 3k 3 2 0 19:24:01 P 5 3 Dono 681µs 0 1 0 4623 0.0 3.0M 0 0 188k 104 3k 3 2 0 19:24:02 P 5 3 Dono 690µs 0 0 0 4301 0.0 2.7M 0 0 188k 141 3k 3 2 0 19:24:03 P 5 3 Dono 688µs 0 2 0 4907 0.0 3.1M 0 0 188k 227 3k 3 2 0 19:24:04 P 5 3 Dono 692µs 0 44 0 4894 0.0 3.1M 0 0 188k 116 3k 3 2 0 19:24:05 P 5 3 Dono 706µs 0 0 0 5337 0.0 3.4M 0 0 188k 63 3k 3 2 0

Initially the node is keeping up ok with replication. The Down Queue (wsrep_local_recv_queue) is sticking around 0. We’re applying 4-5k transactions per second (Ops Dn). When the backup lock kicks in, we do see an increase in the queue size, but note that transactions are still applying on this node:

19:24:06 P 5 3 Dono 696µs 0 170 0 5671 0.0 3.6M 0 0 187k 130 3k 3 2 0 19:24:07 P 5 3 Dono 695µs 0 2626 0 3175 0.0 2.0M 0 0 185k 2193 3k 3 2 0 19:24:08 P 5 3 Dono 692µs 0 1248 0 6782 0.0 4.3M 0 0 186k 1800 3k 3 2 0 19:24:09 P 5 3 Dono 693µs 0 611 0 6111 0.0 3.9M 0 0 187k 651 3k 3 2 0 19:24:10 P 5 3 Dono 708µs 0 93 0 5316 0.0 3.4M 0 0 187k 139 3k 3 2 0

So this node isn’t locked from innodb write transactions, it’s just suffering a bit of IO load while the backup finishes copying its files and such. After this, the backup finished up and the node goes back to a Synced state pretty quickly:

19:24:11 P 5 3 Dono 720µs 0 1 0 4486 0.0 2.9M 0 0 188k 78 3k 3 2 0 19:24:12 P 5 3 Dono 715µs 0 0 0 3982 0.0 2.5M 0 0 188k 278 3k 3 2 0 19:24:13 P 5 3 Dono 1.2ms 0 0 0 4337 0.0 2.8M 0 0 188k 143 3k 3 2 0 19:24:14 P 5 3 Dono 1.2ms 0 1 0 4901 0.0 3.1M 0 0 188k 130 3k 3 2 0 19:24:16 P 5 3 Dono 1.1ms 0 0 0 5289 0.0 3.4M 0 0 188k 76 3k 3 2 0 19:24:17 P 5 3 Dono 1.1ms 0 42 0 4998 0.0 3.2M 0 0 188k 319 3k 3 2 0 19:24:18 P 5 3 Dono 1.1ms 0 15 0 3290 0.0 2.1M 0 0 188k 75 3k 3 2 0 19:24:19 P 5 3 Dono 1.1ms 0 0 0 4124 0.0 2.6M 0 0 188k 276 3k 3 2 0 19:24:20 P 5 3 Dono 1.1ms 0 4 0 1635 0.0 1.0M 0 0 188k 70 3k 3 2 0 19:24:21 P 5 3 Dono 1.1ms 0 0 0 5026 0.0 3.2M 0 0 188k 158 3k 3 2 0 19:24:22 P 5 3 Dono 1.1ms 0 20 0 4100 0.0 2.6M 0 0 188k 129 3k 3 2 0 19:24:23 P 5 3 Dono 1.1ms 0 0 0 5412 0.0 3.5M 0 0 188k 159 3k 3 2 0 19:24:24 P 5 3 Dono 1.1ms 0 315 0 4567 0.0 2.9M 0 0 187k 170 3k 3 2 0 19:24:25 P 5 3 Dono 1.0ms 0 24 0 5535 0.0 3.5M 0 0 188k 131 3k 3 2 0 19:24:26 P 5 3 Dono 1.0ms 0 0 0 5427 0.0 3.5M 0 0 188k 71 3k 3 2 0 19:24:27 P 5 3 Dono 1.0ms 0 1 0 5221 0.0 3.3M 0 0 188k 256 3k 3 2 0 19:24:28 P 5 3 Dono 1.0ms 0 0 0 5317 0.0 3.4M 0 0 188k 159 3k 3 2 0 19:24:29 P 5 3 Dono 1.0ms 0 1 0 5491 0.0 3.5M 0 0 188k 163 3k 3 2 0 19:24:30 P 5 3 Sync 1.0ms 0 0 0 5540 0.0 3.5M 0 0 188k 296 3k 3 2 0 19:24:31 P 5 3 Sync 992µs 0 106 0 5594 0.0 3.6M 0 0 187k 130 3k 3 2 0 19:24:33 P 5 3 Sync 984µs 0 19 0 5723 0.0 3.7M 0 0 188k 275 3k 3 2 0 19:24:34 P 5 3 Sync 976µs 0 0 0 5508 0.0 3.5M 0 0 188k 182 3k 3 2 0

Compared to Percona XtraDB Cluster 5.5

The Backup Locking is only a feature of Percona XtraDB Cluster 5.6, so if we repeat the experiment on 5.5, we can see a more severe lock:

141218 20:31:19 innobackupex: Executing FLUSH TABLES WITH READ LOCK... 141218 20:31:19 innobackupex: All tables locked and flushed to disk 141218 20:31:19 innobackupex: Starting to backup non-InnoDB tables and files innobackupex: in subdirectories of '/var/lib/mysql/' innobackupex: Backing up files '/var/lib/mysql//sbtest/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (21 files) innobackupex: Backing up files '/var/lib/mysql//mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (72 files) >> log scanned up to (6633554484) innobackupex: Backing up file '/var/lib/mysql//test/db.opt' innobackupex: Backing up files '/var/lib/mysql//performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (18 files) 141218 20:31:21 innobackupex: Finished backing up non-InnoDB tables and files 141218 20:31:21 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS... 141218 20:31:21 innobackupex: Waiting for log copying to finish xtrabackup: The latest check point (for incremental): '5420681649' xtrabackup: Stopping log copying thread. .>> log scanned up to (6633560488) xtrabackup: Creating suspend file '/tmp/tmp.Cq5JRZEFki/xtrabackup_log_copied' with pid '23130' 141218 20:31:22 innobackupex: All tables unlocked

Our lock lasts from 20:31:19 until 20:31:21, so it’s fairly short. Note that with larger databases with more schemas and tables, this can be quite a bit longer. Let’s see the effect on the apply rate for this node:

mycluster / ip-10-229-68-156 (idx: 0) / Galera 2.11(r318911d) Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Window Flow time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm p_ms 20:31:13 P 5 3 Dono N/A 0 73 0 3493 0.0 1.8M 0 0 1.8m 832 746 2 2 0.0 20:31:14 P 5 3 Dono N/A 0 29 0 3578 0.0 1.9M 0 0 1.8m 850 749 3 2 0.0 20:31:15 P 5 3 Dono N/A 0 0 0 3513 0.0 1.8M 0 0 1.8m 735 743 2 2 0.0 20:31:16 P 5 3 Dono N/A 0 0 0 3651 0.0 1.9M 0 0 1.8m 827 748 2 2 0.0 20:31:17 P 5 3 Dono N/A 0 27 0 3642 0.0 1.9M 0 0 1.8m 840 762 2 2 0.0 20:31:18 P 5 3 Dono N/A 0 0 0 3840 0.0 2.0M 0 0 1.8m 563 776 2 2 0.0 20:31:19 P 5 3 Dono N/A 0 0 0 4368 0.0 2.3M 0 0 1.8m 823 745 2 1 0.0 20:31:20 P 5 3 Dono N/A 0 3952 0 339 0.0 0.2M 0 0 1.8m 678 751 1 1 0.0 20:31:21 P 5 3 Dono N/A 0 7883 0 0 0.0 0.0 0 0 1.8m 678 751 0 0 0.0 20:31:22 P 5 3 Dono N/A 0 4917 0 5947 0.0 3.1M 0 0 1.8m 6034 3k 7 6 0.0 20:31:24 P 5 3 Dono N/A 0 10 0 8238 0.0 4.3M 0 0 1.8m 991 1k 7 6 0.0 20:31:25 P 5 3 Dono N/A 0 0 0 3016 0.0 1.6M 0 0 1.8m 914 754 2 1 0.0 20:31:26 P 5 3 Dono N/A 0 0 0 3253 0.0 1.7M 0 0 1.8m 613 766 1 1 0.0 20:31:27 P 5 3 Dono N/A 0 1 0 3600 0.0 1.9M 0 0 1.8m 583 777 2 1 0.0 20:31:28 P 5 3 Dono N/A 0 0 0 3640 0.0 1.9M 0 0 1.8m 664 750 2 2 0.0

The drop here is more severe and the apply rate hits 0 (and stays there for the duration of the FTWRL).

Implications

Obviously Xtrabackup running on a PXC node will cause some load on the node itself, so there still may be good reasons to keep a Donor node out of rotation from your application.  However, this is less of an issue than it was in the past, where writes would definitely stall on a Donor node and present potentially intermittent stalls to the application.

How you allow applications to start using a Donor node automatically (or not) depends on how you have set up HA between the application and the cluster.  If you use HAproxy or similar with clustercheck, you can either modify the script itself or change a command line argument. The node is in the Donor/Desynced state below:

[root@ip-10-229-64-35 ~]# /usr/bin/clustercheck clustercheckuser clustercheckpassword!
HTTP/1.1 503 Service Unavailable
Content-Type: text/plain
Connection: close
Content-Length: 44

Percona XtraDB Cluster Node is not synced.

[root@ip-10-229-64-35 ~]# /usr/bin/clustercheck clustercheckuser clustercheckpassword! 1
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Percona XtraDB Cluster Node is synced.

For those doing their own custom health checking, you basically just need to pass nodes that have a wsrep_local_state_comment of either ‘Synced’ or ‘Donor/Desynced’.
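For illustration, a minimal custom check could be built around that status variable like this (a sketch only; the credentials handling and the exit-code convention are assumptions, not part of clustercheck):

state=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | awk '{print $2}')
case "$state" in
  Synced|Donor/Desynced) exit 0 ;;  # healthy: keep the node in rotation
  *)                     exit 1 ;;  # anything else: remove it from rotation
esac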

The post Testing backup locks during Xtrabackup SST on Percona XtraDB Cluster appeared first on MySQL Performance Blog.



Chef Cookbooks for ClusterControl - Management and Monitoring for your Database Clusters

December 22, 2014 By Severalnines

If you are automating your infrastructure deployments with Chef, then read on. We are glad to announce the availability of a Chef cookbook for ClusterControl. This cookbook replaces previous cookbooks we released for ClusterControl and Galera Cluster. For those using Puppet, please have a look at our Puppet module for ClusterControl.

 

ClusterControl Cookbook on Chef Supermarket

 

The ClusterControl cookbook is available on Chef Supermarket, and getting the cookbook is as easy as:

$ knife cookbook site download clustercontrol
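From there a typical workflow, assuming a standard chef-repo layout (the paths are illustrative), is to extract the downloaded archive into your cookbooks directory and upload it to the Chef server:

$ tar -xzf clustercontrol-*.tar.gz -C ~/chef-repo/cookbooks/
$ knife cookbook upload clustercontrol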

 

This cookbook supports the installation of ClusterControl on top of existing database clusters:

  • Galera Cluster
  • MySQL Galera Cluster by Codership
  • Percona XtraDB Cluster by Percona
  • MariaDB Galera Cluster by MariaDB
  • MySQL Cluster (NDB)
  • MySQL Replication
  • Standalone MySQL/MariaDB server
  • MongoDB or TokuMX Clusters
  • Sharded Cluster
  • Replica Set

 

Installing ClusterControl using Cookbook

 

We will show you how to install ClusterControl on top of an existing database cluster using the cookbook. It requires the following criteria to be met:

  • The node for ClusterControl must be a clean/dedicated host.
  • ClusterControl node must run on 64-bit Linux platform, on the same OS distribution as the monitored DB nodes. Mixing Debian with Ubuntu or CentOS with Red Hat is acceptable.
  • ClusterControl node must have an internet connection during the initial deployment. After the deployment, ClusterControl does not need internet access.
  • Make sure your database cluster is up and running before doing this deployment.

 

read more



The Monitoring mistake OR how dreaming can bring ideas

"Hi Paul how is going?" "Good Marco and you?" "Good, I had a stressful week last week but now is ok, I mange to close some pending activities, working a little bit more during the day, doing that I was able to reduce the queue of pending task, and now all is normal again", "good for you that you manage, I had too many things ongoing and was not able to dedicate more time to queue".

 

The simple (boring) conversation above hides one of the most complex elaborations of monitoring data. We as human beings do a lot of data processing in a very short time. We may be slower than computers at some calculations, but no computer can compete with us when it comes to multitasking and data processing.

To answer someone asking how you are, you do not simply review your status at that moment: based on the last time you saw the person, your brain reviews all the relevant data and produces a synthesis of the relevant facts. You then summarize it as "good", because you do not consider it relevant to walk your friend through every single fact, only the conclusion.

Not only that: during the same process you evaluate, in relation to your relationship with the person, what kind of information you want to share and why, and how to present it so that it is relevant and interesting for the interaction.

The simple act of talking may also happen while you are walking along the street, watching out for traffic, and showing interest, curiosity or annoyance to your interlocutor.

Each expression you show on your face is the result of the data collection, analysis and decisions your brain is making, plus a few that come from deeper inner processes, like deep fear or surprise, but that is out of context here.

The interesting thing is that we are so used to doing this, and to summarizing so efficiently, that we find it funny, or totally out of context, when we see someone not doing so.

Just think about how hilarious Sheldon Lee Cooper is (for those who do not know what I am talking about: http://en.wikipedia.org/wiki/Sheldon_Cooper).

In the show, Sheldon quite often answers the simple question "How are you?" with a massive amount of information that is not only not understood but also totally irrelevant, and in that context, hilarious.

Hilarious in that context, I said, but far from hilarious in real life. We are so exposed to external signals and information that we should not, and cannot, spend time and resources elaborating incoming information just to know "how our friend is doing". Evolution decided that it is the owner of the information who has to process it, who has to elaborate his own data and expose only what is relevant for the interaction.

 

Just think what life would be like if, instead of saying "Good, thank you" to the question "How are you?", you started to enumerate every fact in every single detail, or with some aggregation, to each person who asked, and expected them to work out whether that means good or not. Crazy, eh? I would say quite inefficient, and a source of possible misunderstanding as well.

Someone may decide that working an hour more per day is totally unacceptable, and as such your status would be "Bad" instead of "Good", which is the exact opposite of how you really feel.

As said, this way of acting and behaving does not come out of nowhere; it is the result of a long process refined over 2.5 million years (since Homo habilis). Evolution decided that it is much more efficient to have Marco tell Paul how he is doing than to have Paul try to read all the information coming from Marco and then work out, with his own parameters, how Marco is doing.

I am going to say that evolution is right, and I am quite happy with what we have achieved, even if it took us a few million years to get there.

I am also confident that you, too, see how this is more efficient and correct.

So, for God's sake, why are we still using a method that is not only inefficient but also exposes us to mistakes when we need to know how complex systems feel, systems that are less complex than us, but complex anyhow?

Why are we "monitoring" things, exposing numbers, and pretending to elaborate on them with the illusion of finally getting the answer to "How are you?"

Would it not be much more efficient, and statistically less prone to errors, to just ask: "Hi, my nice system, how are you today?" "Marco, you are boring, you ask me that every day; anyhow, I am good." "Is there anything you need?" "Yes please, check the disk space, I may run out in a week." "Oh, thanks for letting me know in advance, I will."

 

Am I crazy? No, I don't think so. Is this something we see only in the movies? Again, no, I don't think so; actually, it is not so far from what we could start to do.

How can we move away from a quite wrong way of doing things, collecting useless amounts of data to analyze, towards getting simple, synthetic information?

 

Here is my dream

Let us start simple. You cannot ask someone "How are you?" if he is dead; that is a yes/no condition. But this does not apply to complex systems: in our body we lose cells every day, they die, but we replace them, and our brain does not send us a warning message for each one of them.

But we do get alert messages if the number of them becomes so high that primary functions can be compromised.

In short, our brain discriminates between what can be compensated for automatically and what cannot, and bothers us only when the latter occurs.

What can be done to create a monitoring monad that is internally consistent and that allows us to scale and to aggregate?

The main point, as stated above, is not to flood the interlocutor with information, but at the same time not to lose the meaning and, if needed, the details.

This is the first point: we can separate what we need to archive from what we need to get as an answer.

To be clear: I ask you "How are you?" and "Good, thank you" is all I need to know, but at the same time I may be in a position to download your data and collect all the relevant metrics.

I had to wear a heart monitor after a few events, and what happened was quite simple. They attached to my body a monitor that sent detailed metrics of my heart directly to them, and in addition they called me to ask, "How do you feel today?" The detailed information was there for them to dig into if something went wrong.

The detailed information is easy to collect; the challenge comes from the amount of data, how to store and aggregate it and so on, but all of that is the usual boring old stuff.

What I find interesting instead is how to get the monad to work, how to define the correct way to balance the analysis.

My idea is quite simple; assume the easiest case, where we have to process just three different metrics to get a meaningful state, something like IO / CPU user / incoming network traffic.

A simple solid such as a tetrahedron (the figure is omitted here) will work fine; each vertex of the solid is a metric, plus one that is the yes/no condition (am I alive?).

The centre of the solid represents the initial state, the state in which all the forces are in perfect balance and there is no variation in the position of the point of balance from the centre itself.

We know that no system is ever in perfect balance. We also know that each system may behave differently depending on N factors, where N is not determined and changes not only with the kind of system but also between systems belonging to the same class. In short, trying to define N is a waste of time.

 

What can be done, and guess what, it is exactly what we do when we grow from a blastula into a newborn: we can learn what the correct level of variation is, meaning we can learn from each system which variation does not compromise its functions.

 

Initially we may have a defined AC, the acceptable range inside which the point can fluctuate; for each vertex we have an F for the possible fluctuation. When F = 0 in one or more of the directions, we can say "Houston, we have a problem".

 

While learning, our system will identify the right shape and distance for the F, such that the initial circle may become something like this (figure omitted):

This means that any movement of our point inside the AC area will give us the answer "I am good, thanks". Any movement outside will generate a signal like "I am stressed, my CPU is overloaded".

This is a very simple, basic example, and it may not be clear how this scales or how it could handle much more complex scenarios. So let us go ahead.

 

A simple solid like a triangular pyramid covers something very basic. But if, for instance, you need to provide the status of a more complex interaction, say a database status or a more complex system, then you may have one or many solids with much more complex interactions.

With a solid such as the disdyakis triacontahedron we have 62 vertices, meaning that with the same algorithm we can associate a significant number of metrics.

 

Each solid is seen from the whole as a single entity, as if enclosed in a sphere that shows only the "final" status.

The flexibility comes from the fact that we can connect any solid to another, either exclusively or as subsystem(s); at the same time, each element can be quite complex internally while exposing a simple and direct status.

 

So, for instance, a simple one could look like the first diagram (omitted here), while a more complex, and probably the most common, arrangement would look like the second (also omitted).

In this case we can assume we have one group describing the status of a storage engine, another for whatever happens on the storage, and so on, until we have a node of our architecture fully described.

At this point it should be clear that once we have covered the internal complexity of the variation for each solid, the outcome is a simplified message, "I am good", no matter at what level we look at it.

That will allow us to have quite complex systems, with complex relations, described and reporting their status in a very simple and immediate way.

 

Understanding what is going on in a system like the one pictured (figure omitted) can be quite difficult and time consuming. Using standard ways of monitoring, we cannot be sure whether there is a correlation between the metrics, and whether the behaviour of the node is being taken into account correctly.

 

Using the new approach will allow us, first of all, to get simple feedback:

Basically, given an affected node (badly affected... let's say it is dead), all the others still answer "I am good", but the related nodes will start to say "I am not happy", "I am very sad, my node is dead", "I am cool, don't worry, load and service are under control".

And I will focus directly on my dead node and how to fix it. Given the way I collect my information and report the state, I will be able to see it in the timeline and focus directly on the moment the issue started for that node.

 

No message is a message

What is also very important to consider is that once we have defined the correct behaviour for each solid, which happens during the learning period, we also know what the expected behaviour is and what signals we should see.

 

In short, if you go to the gym and do 45 minutes on the treadmill, you expect to have a higher heart rate and to feel fatigued and sweaty. If that doesn't happen, then either you were cheating and not doing the exercise properly, or you are probably a cyborg and were not aware of it.

 

Getting the right signal in the right context, even when the signal is a negative one, is as important as, or even more important than, getting a good one.

 

Last year my mother had quite major surgery. The day after it she was great, feeling perfect, no pain, no bad signals. And yet she was actually deteriorating; the doctors started to be uncomfortable with her NOT feeling some level of pain or discomfort. Luckily they took action and saved her (at the last second), but they did.

 

Nowadays we just collect metrics; we very rarely put them in relation to each other, and even more rarely do we treat a negative reading as a relevant event. This is because we currently do not have a way to contextualize the right behaviour, to know how things should be handled correctly, and therefore what the deviation from that is.

The implementation I am describing not only takes into account the behaviour rather than the single event, it can also trace and identify the lack of a negative signal, a signal that must take place to keep the behaviour healthy.

Conclusion

What I really want to stress is that the way we do monitoring today is like trying to manage the space shuttle with stone tools and a scalpel.

 

There are many solutions out there, but all of them are focused on more or less the same approach/model.

Better than nothing, of course, and yes, we still have situations in which we have NO monitoring at all. But I still think that changing the wrapping paper is not doing something new with the content.

 

I do not pretend to know how to implement my idea; the algorithm to calculate the variation and the interaction in the solid is something I do not see within my range. But that just means I need to find someone able to share a dream, and with quite good mathematical skills.



Log Buffer #402, A Carnival of the Vanities for DBAs

This Log Buffer edition hits the ball out of the park by smashing yet another record, surfacing with a unique collection of blog posts from various database technologies. Enjoy!!!

Oracle:

EM12c and the Optimizer Statistics Console.

SUCCESS and FAILURE Columns in DBA_STMT_AUDIT_OPTS.

OBIEE and ODI on Hadoop : Next-Generation Initiatives To Improve Hive Performance.

Oracle 12.1.0.2 Bundle Patching.

Performance Issues with the Sequence NEXTVAL Call.

SQL Server:

GUIDs GUIDs everywhere, but how is my data unique?

Questions About T-SQL Transaction Isolation Levels You Were Too Shy to Ask.

Introduction to Advanced Transact SQL Stairway and Using the CROSS JOIN Operator.

Introduction to Azure SQL Database Scalability.

What To Do When the Import and Export Wizard Fails.

MySQL:

Orchestrator 1.2.9 GA released.

Making HAProxy 1.5 replication lag aware in MySQL.

Monitor MySQL Performance Interactively With VividCortex.

InnoDB’s multi-versioning handling can be Achilles’ heel.

Memory summary tables in Performance Schema in MySQL 5.7.



Store UUID in an optimized way

A few years ago Peter Zaitsev, in a post titled “To UUID or not to UUID,” wrote: “There is timestamp based part in UUID which has similar properties to auto_increment and which could be used to have values generated at same point in time physically local in BTREE index.”

For this post I’ve rearranged the timestamp part of UUID (Universal Unique Identifier) and did some benchmarks.

Many people store UUID as char(36) and use it as the row identity value (PRIMARY KEY) because it is unique across every table, every database and every server, and allows easy merging of records from different databases. But here comes the problem: using it as a PRIMARY KEY causes the problems described below.

Problems with UUID
  • UUID has 36 characters which makes it bulky.
  • InnoDB stores data in PRIMARY KEY order, and all the secondary keys also contain the PRIMARY KEY. So having UUID as the PRIMARY KEY makes the indexes bigger, and they may not fit into memory.
  • Inserts are random and the data is scattered.

Despite the problems with UUID, people still prefer it because it is UNIQUE across every table and can be generated anywhere. In this blog, I will explain how to store UUIDs in an efficient way by rearranging the timestamp part of the UUID.

Structure of UUID

MySQL uses UUID version 1 which is a 128-bit number represented by a utf8 string of five hexadecimal numbers

  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or we do not know how to find the hardware address of an interface on your operating system). In this case, spatial uniqueness cannot be guaranteed. Nevertheless, a collision should have very low probability.

The timestamp is mapped as follows:
When the timestamp has the (60 bit) hexadecimal value 1d8eebc58e0a7d7, the following parts of the UUID are set: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 before the most significant digits of the timestamp (in 11d8) indicates the UUID version; for time-based UUIDs this is 1.

The fourth and fifth parts would be mostly constant if the UUIDs are generated from a single server. The first three numbers are based on the timestamp, so they will be monotonically increasing. Let's rearrange the total sequence to make the UUID closer to sequential. This makes inserts and recent-data lookups faster. The dashes ('-') add no value, so let's remove them.
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66
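For illustration, the same rearrangement can be expressed directly in SQL (a quick sketch using the sample UUID above; the derived-table alias is arbitrary):

mysql> SELECT CONCAT(SUBSTR(u,15,4), SUBSTR(u,10,4), SUBSTR(u,1,8), SUBSTR(u,20,4), SUBSTR(u,25)) AS rearranged
    -> FROM (SELECT '58e0a7d7-eebc-11d8-9669-0800200c9a66' AS u) t;
+----------------------------------+
| rearranged                       |
+----------------------------------+
| 11d8eebc58e0a7d796690800200c9a66 |
+----------------------------------+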

Benchmarking

I created three tables:

  • events_uuid – UUID binary(16) PRIMARY KEY
  • events_int – Additional BIGINT auto increment column and made it as primary key and index on UUID column
  • events_uuid_ordered – Rearranged UUID binary(16) as PRIMARY KEY

I created three stored procedures which each insert 25K random rows at a time into their respective tables. There are three more stored procedures which call the random-insert stored procedures in a loop and also measure the time taken to insert 25K rows, and the data and index size after each loop. In total I inserted 25M records.

    • Data Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Data Size in MB

      The data size for UUID table is more than other two tables.
    • Index Size
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Index Size in MB
    • Total Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Total Size in MB
    • Time taken
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Time Taken in seconds

For the table with UUID as PRIMARY KEY, you can notice that as the table grows big, the time taken to insert rows is increasing almost linearly. Whereas for other tables, the time taken is almost constant.

The size of the UUID table is almost 50% bigger than the Ordered UUID table and 30% bigger than the table with BIGINT as PRIMARY KEY. Comparing the Ordered UUID table and the BIGINT table, the time taken to insert rows and the sizes are almost the same. But they may vary slightly based on the index structure.

root@localhost:~# ls -lhtr /media/data/test/ | grep ibd
-rw-rw---- 1 mysql mysql  13G Jul 24 15:53 events_uuid_ordered.ibd
-rw-rw---- 1 mysql mysql  20G Jul 25 02:27 events_uuid.ibd
-rw-rw---- 1 mysql mysql  15G Jul 25 07:59 events_int.ibd

Table Structure

#1 events_int
CREATE TABLE `events_int` (
  `count` bigint(20) NOT NULL AUTO_INCREMENT,
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
  `event` int(11) DEFAULT NULL,
  `ref_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `campaign_id` binary(16) COLLATE utf8_unicode_ci DEFAULT '',
  `unique_id` binary(16) COLLATE utf8_unicode_ci DEFAULT NULL,
  `user_agent` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `city` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `country` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `demand_partner_id` binary(16) DEFAULT NULL,
  `publisher_id` binary(16) DEFAULT NULL,
  `site_id` binary(16) DEFAULT NULL,
  `page_id` binary(16) DEFAULT NULL,
  `action_at` datetime DEFAULT NULL,
  `impression` smallint(6) DEFAULT NULL,
  `click` smallint(6) DEFAULT NULL,
  `sold_impression` smallint(6) DEFAULT NULL,
  `price` decimal(15,7) DEFAULT '0.0000000',
  `actioned_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `unique_ads` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `notification_url` text COLLATE utf8_unicode_ci,
  PRIMARY KEY (`count`),
  KEY `id` (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

#2 events_uuid
CREATE TABLE `events_uuid` (
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
~
~
  PRIMARY KEY (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

#3 events_uuid_ordered
CREATE TABLE `events_uuid_ordered` (
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
~
~
  PRIMARY KEY (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Conclusions

 

    • Create function to rearrange UUID fields and use it

DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `ordered_uuid`(uuid BINARY(36))
RETURNS binary(16)
DETERMINISTIC
RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),SUBSTR(uuid, 10, 4),SUBSTR(uuid, 1, 8),SUBSTR(uuid, 20, 4),SUBSTR(uuid, 25)));
//
DELIMITER ;

Inserts

INSERT INTO events_uuid_ordered VALUES (ordered_uuid(uuid()),'1','M',....);

Selects

SELECT HEX(uuid),is_active,... FROM events_uuid_ordered ;

    • Define UUID as binary(16), as binary does not have any character set (a sketch for converting the stored value back to a standard UUID string follows below)
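If you also need to display the stored (rearranged) value as a normal UUID string again, an inverse helper can be written in the same style (a hypothetical sketch, not part of the original post; the function name is made up):

DELIMITER //
CREATE FUNCTION uuid_from_ordered(id BINARY(16)) RETURNS char(36) DETERMINISTIC
RETURN LOWER(CONCAT(SUBSTR(HEX(id),9,8),'-',SUBSTR(HEX(id),5,4),'-',SUBSTR(HEX(id),1,4),'-',SUBSTR(HEX(id),17,4),'-',SUBSTR(HEX(id),21,12)));
//
DELIMITER ;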

 


The post Store UUID in an optimized way appeared first on MySQL Performance Blog.



MySQL Debian/Ubuntu packaging sprint

Debian/Ubuntu packaging sprint participants. From left: James Page, Norvald H. Ryeng, George Lorch, Akhil Mohan, Otto Kekäläinen, Robie Basak.

Last week, Canonical invited the MySQL packaging team in Debian to a packaging sprint in their London office, and most of us were able to participate. We’ve met online on IRC and UOSs before, but this was the first time we were all in the same room.

The results of our sprint will soon be available in a .deb near you. Since Debian Jessie is currently in feature freeze, most of it will hit Ubuntu first. The two main things we achieved on the MySQL side were to make MySQL 5.6 ready for Ubuntu Vivid (15.04) and to split MySQL, Percona and MariaDB configuration files. The configuration file split reduces coupling between MySQL and variant packages, and it is key to solving most of the challenges the team faces. We can now do packaging work on one variant without coordinating with all the others. That is a major achievement!

In addition to this, we also made smaller improvements: parallel builds and MTR test runs, defined and implemented a test plan in order to run the right tests at the right time (build vs. DEP8), and cleanup of the debian/rules file. Not all of this is visible to the end user, but to the package maintainers it’s a big deal.

Time flies when you’re having fun, so when the week was over, we still had more we’d like to do with the packages. The work will continue, but at least most of the heavy lifting is now done. I don’t think we ever would have got this far without meeting in person.

Let me end by extending my thanks to Canonical for organizing and hosting this sprint! It was really useful and fun! I think we all achieved and learned a lot. So once again, to Canonical and everybody that participated, thank you, and merry Christmas!



A Year In The Life Of MaxScale

Fri, 2014-12-19 09:14 by mriddoch

 

This time of the year it is traditional, at least in the UK, to look back and reflect on the year that is coming to a close. Since we have just produced the release candidate for MaxScale and are looking forward to the GA release early in the New Year, it seems like a good time to reflect on the events that have brought us to this stage in the story of MaxScale.

Going Public

The start of 2014 also marked the start for MaxScale, with the first public announcements regarding MaxScale and the first downloadable binaries. MaxScale itself had been started internally before that, but we wanted to hold off on letting it out into "the wild" until enough of the functionality was in place to do more than just give "what it might be" type promises. At first we only had tar files available, and only for CentOS/RedHat Linux distributions; we also made the source code available on GitHub for the first time.

That first version of MaxScale contained the fundamental modules it needed to demonstrate the functionality we wanted to show, namely the MySQL protocol modules that allowed connections using the MySQL client protocol and allowed MaxScale to connect to the backend MariaDB and MySQL databases. It also contained the first iterations of the monitors that allowed us to monitor the basic state of a Galera cluster or a master/slave replication setup using MySQL replication. The two classes of router that MaxScale supports were represented by the read connection router and an early version of the read/write splitter router. This early version included many restrictions on the SQL that would and would not work with the splitter. In the first few months of the year we worked on improving the functionality of these modules and talking about MaxScale at events such as MySQL meetup groups, FOSDEM and Percona Live. We also created the MaxScale Google Group as a way to allow users to communicate with us and for the MaxScale team to announce improvements and bug fixes within MaxScale. We also launched the public bug site, http://bugs.mariadb.com.

We had always planned MaxScale with five plugin classes: protocols, monitors, routers, filters and authentication; at this time we had two plugin classes missing completely, filters and authentication. The filters we could do without for the time being, but we needed some form of authentication, so we built that into the protocol module as a short-term expediency. We built what we called "transparent authentication": we loaded the user data from the backend database at startup and used this to authenticate the users that logged into MaxScale. We had to do this since it is key to the design of MaxScale that we have the ability to create multiple connections from MaxScale to the backend databases and to be able to create those at different times in the lifecycle of the client session. This meant we could not simply proxy the packet exchange for the login sequence; we had to do more than that. While this is a vital part of the design, it does lead to some challenges, and it is perhaps one area in which users need to be aware of MaxScale when creating authentication configurations within the underlying database cluster. A considerable amount of time has been invested in answering questions in this area and providing documentation; however, it still remains one of those areas that needs more time invested in explaining fully how to set up the authentication models that are required.

Improved Galera Support

We were considering better support for Galera clusters; in particular we wanted to do something that would allow us to segregate the write operations in order to remove write conflicts. We considered writing a Galera-specific router that would send all writes to a single node whilst distributing the reads across all nodes. It had been my plan to use this as a tutorial on the subject of writing your own router. It was while doing this that I realised it was not really necessary, since our existing read/write splitter could do this task for us; it just needed a way to determine which node should receive the writes.

The existing read/write splitter used the master bit in the server status to determine which database should receive the writes; therefore all that was needed was for the Galera monitor to select one of the running nodes to be the nominal master and set the master bit for that node. We wanted the selection to be predictable, so that if two MaxScales were front-ending the same Galera cluster they would both choose the same node as the master. We used the fact that Galera allocates each node an ID and selected the node with the lowest ID as the master. The result was a solution that provided a Galera cluster as a high-availability solution with read scale-out for MySQL/MariaDB. Failover was very quick, since there was no database activity or binlog manipulation to do as there would be with a MySQL replication cluster. What this solution does not give you is any write scale-out ability.

Users and Surprises

It had always been my hope that users would find MaxScale useful and want to extend its functionality to fit their particular needs; this is part of the reason for the plugin nature of MaxScale and the division of responsibilities between those plugins. However, it was somewhat surprising to be contacted with the idea of using MaxScale not as a proxy between the database clients and the databases but as a proxy between the master and slaves in a MySQL replication cluster. I have to admit that my first reaction was that this was really not something MaxScale was designed to do, but upon meeting with the people proposing it I was convinced not just that it was an extremely good idea, but also that it was something we could fit into the MaxScale architecture without having to make changes to the core or jeopardising the original MaxScale concept or design. So began a period of working closely with Booking.com to produce a specialist router module for MaxScale that allows MaxScale to act as an intermediate master within a replication tree. It was particularly nice to be presented with such an unexpected use case and to find that MaxScale was flexible enough to facilitate it. It was also nice to then see such a well-respected member of the user community publicly talk about this use of our work and even present it at conferences.

We have also had a number of other requests from users throughout the year; these have resulted in either the modification of existing modules to fit better in a given environment or the creation of completely new modules. This has included a number of different monitor modules to cover more database deployment architectures, MMM multi-master and MySQL NDB Cluster being two cases of this. We also produced a hinting mechanism at the request of a user, such that the original SQL text can include hints as to the destination to which statements should be sent.

We have also come across other users that wanted additional functionality who have written or are writing plugin modules of their own. This again has been a vindication of our original concept of the pluggable architecture and the flexibility that we had hoped to achieve.

Filters & Other New Features

Throughout the year we have also added many new features to MaxScale, the biggest of these probably being the introduction of the filter API and a set of "example" filters for various tasks. This has provided us with some simple logging mechanisms, query rewriting and query duplication functionality. Other features that have been added include the ability to monitor not just a single-level master/slave implementation, but a fully hierarchical replication cluster.

The concept of weighting in the routing decision has been added; this provides a way to allow MaxScale to manage sets of servers that have dissimilar configurations, or to segregate load between servers within a cluster. A client application has been added to allow some management and monitoring of MaxScale to be undertaken. The read/write splitter has been enhanced not just to remove the limitations that existed when it was first launched, but also to add new facilities to control the routing and to support constructs not previously available.

Authentication has also undergone some change, with the addition of a mechanism to allow MaxScale to track new or changed users within the database and the ability to wildcard-match IP addresses in the MySQL user table. Support has also been added for users that have been given access to only a subset of databases.

The MySQL replication cluster monitor has also been extended to measure slave lag within the cluster. This slave lag is then made available to the routing modules so that the read/write splitter can be configured to disregard slaves that are more than a certain amount behind the master, or to always send read operations to the slave that is most up to date.

As well as new features and enhancements we have also spent a lot of time fixing bugs, writing test cases and generally trying to improve MaxScale and the MaxScale documentation.

Packaging is another area that has benefited from more attention during the year, with RPM and Debian packages being introduced, allowing for much easier installation of MaxScale and also reflecting support for more Linux distributions.

Beta Testing & Towards First GA

The last few months have seen the focus of the MaxScale team shift towards the upcoming GA release, with new features playing less of a role in development. That is not to say there have not been any new features, but these have been targeted for release after the first GA release. This allows us to concentrate on stabilising the features in the GA release and having them more fully tested, without preventing new features moving forwards.

We also engaged with a number of users that expressed an interest in becoming beta testers for the MaxScale GA release. This, together with the internal testing which we have been performing is helping improve the quality of the MaxScale core and those modules that will be included in the first GA release.

Thank You

MaxScale of course is not a "one man band" and I would like to thank the developers that have been working with me over the year; Massimiliano and Vilho, plus those that have joined the team during the later part of the year; Markus, Martin & Timofey. They have all put an enormous amount of effort in to bring MaxScale to the brink of GA release. I should also not forget Ivan Zoratti with whom a lot of the early ideas for MaxScale have been developed, although he is no longer working with the team his influence is still felt in a lot of areas.

The other group I would like to thank are the users that have given their time to try MaxScale, giving us feedback and encouragement that what we are working on is useful and has a place in the database infrastructure. It has also been very gratifying to get personal feedback at events and to see others outside of the group start to talk about MaxScale and share some of the same thoughts we have had within the group as to what MaxScale can do within a database deployment - even if some of those have surprised us.

The MaxScale Release Candidate packages are available for download from http://www.mariadb.com/

Tags: MaxScale

Shinguz: Avoid temporary disk tables with MySQL

Taxonomy upgrade extras: temporary table, disk, select, query tuning

For processing SELECT queries MySQL sometimes needs the help of temporary tables. These temporary tables can be created either in memory or on disk.

The number of creations of such temporary tables can be found with the following command:

mysql> SHOW GLOBAL STATUS LIKE 'created_tmp%tables';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 4     |
| Created_tmp_tables      | 36    |
+-------------------------+-------+
There are 2 different reasons why MySQL is creating a temporary disk table instead of a temporary memory table:
  • The result is bigger than the smaller one of the MySQL variables max_heap_table_size and tmp_table_size.
  • The result contains columns of type BLOB or TEXT.
In the following example we can see how the temporary disk table can be avoided without changing the column types:

mysql> CREATE TABLE test (
         id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
       , data TEXT
       , type TINYINT UNSIGNED
       );

mysql> INSERT INTO test VALUES
         (NULL, 'State is green', 1), (NULL, 'State is green', 1)
       , (NULL, 'State is red', 3), (NULL, 'State is red', 3)
       , (NULL, 'State is red', 3), (NULL, 'State is orange', 2);

mysql> EXPLAIN SELECT data, COUNT(*) FROM test GROUP BY data;
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

mysql> SHOW SESSION STATUS LIKE 'created_tmp%tables';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 3     |
+-------------------------+-------+

mysql> SELECT data, COUNT(*) FROM test GROUP BY data;
+-----------------+----------+
| data            | count(*) |
+-----------------+----------+
| State is green  |        2 |
| State is orange |        1 |
| State is red    |        3 |
+-----------------+----------+

mysql> SHOW SESSION STATUS LIKE 'created_tmp%tables';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 4     |
+-------------------------+-------+

mysql> SELECT SUBSTR(data, 1, 32), COUNT(*) FROM test GROUP BY SUBSTR(data, 1, 32);

mysql> SHOW SESSION STATUS LIKE 'created_tmp%tables';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 5     |
+-------------------------+-------+

This method can be used if changing the table structure from TEXT to VARCHAR or using a RAM disk are not possible solutions.
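For the first cause – a result set that exceeds the in-memory limits – a hedged sketch of checking and, per session, raising the thresholds could look like this (the 64M value is only an illustration, not a recommendation):

mysql> SHOW GLOBAL VARIABLES WHERE Variable_name IN ('tmp_table_size', 'max_heap_table_size');
mysql> SET SESSION tmp_table_size      = 64*1024*1024;
mysql> SET SESSION max_heap_table_size = 64*1024*1024;

The smaller of the two values is the effective in-memory limit, so both usually have to be raised together; raising them does not help with the second cause, because a result containing BLOB or TEXT columns still goes to disk.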



Orchestrator 1.2.9 GA released

Orchestrator 1.2.9 GA has been released. Noteworthy:

  • Added "ReadOnly" (true/false) configuration param. You can have orchestrator completely read-only
  • Added "AuthenticationMethod": "multi": works like BasicAuth (your normal HTTP user+password) only it also accepts the special user called "readonly", which, surprise, can only view and not modify
  • Centralized/serialized most backend database writes (with hundreds or thousands of monitored servers, high concurrency could lead to too many connections opened on the backend database).
  • Fixed evil evil bug that would skip some checks if binary logs were not enabled
  • Better hostname resolution (now also asking the MySQL server to resolve the hostname; resolution results are cached)
  • Pseudo-GTID (read here, here, here) support now considered stable (apart from being tested it has already been put to practice multiple times in production at Outbrain, in different planned and unplanned crash scenarios)

I continue developing orchestrator as free and open source at my new employer, Booking.com.

 

 



Making HAProxy 1.5 replication lag aware in MySQL

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However with MySQL replication, another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be not only ‘up’ or ‘down’ but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations

We can use a script like this:

$ less agent.php
<?php
# NOTE: the beginning of this script (the variable assignments and the first
# branches of set_weight) was mangled when the post was rendered as HTML.
# The names and values below are a plausible reconstruction, not the author's
# exact code: adjust credentials, paths and the query to your environment.
$port       = $argv[1];           // port the agent listens on
$mysql_port = $argv[2];           // port of the local MySQL instance
$mysql      = "/usr/bin/mysql";
$user       = "haproxy";
$password   = "haproxy_pwd";
$query      = "SHOW SLAVE STATUS";

function set_weight($lag) {
    // Replication broken: Seconds_Behind_Master is NULL, mark the server down
    if (!is_numeric($lag)) {
        return "down";
    } elseif ($lag < 10) {
        return "up 100%";
    } elseif ($lag >= 10 && $lag < 60) {
        return "up 50%";
    } else {
        return "up 5%";
    }
}

set_time_limit(0);
$socket = stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
if (!$socket) {
    echo "$errstr ($errno)\n";
} else {
    while ($conn = stream_socket_accept($socket, 9999999999999)) {
        $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee \"$query\" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
        exec($cmd, $lag);
        $weight = set_weight($lag[0]);
        unset($lag);
        fputs($conn, $weight);
        fclose($conn);
    }
    fclose($socket);
}
?>

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'127.0.0.1' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 127.0.0.1 6789
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
up 100%
Connection closed by foreign host.

Assuming it is run locally on the app server, that 2 replicas are available (192.168.10.2 and 192.168.10.3) and that the application will send all reads on port 3307, you will define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front
    bind *:3307
    mode tcp
    option tcplog
    log global
    default_backend read_only-back

backend read_only-back
    mode tcp
    balance leastconn
    server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions
    server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0

# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0

# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,200

Slave1 lagging

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 25

# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0

# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,50
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,150

Slave2 down

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0

# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: NULL

# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,DOWN (agent),100
read_only-back,BACKEND,UP,100

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding server as down. It is therefore recommended to keep a regular health check alongside the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are: stop the agent and mark the servers as UP using the stats socket or add the master as a backup server.

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.
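As a rough, hedged sketch of what that change could look like – assuming pt-heartbeat was started with the common convention of writing into a percona.heartbeat table (the schema and table names depend on the --database and --table options you used) – the agent would run a query along these lines instead of SHOW SLAVE STATUS:

mysql> SELECT GREATEST(0, UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(ts)) AS lag_seconds
         FROM percona.heartbeat
        ORDER BY ts DESC
        LIMIT 1;

The advantage is that the lag is measured against an actual heartbeat row written on the master, which also avoids the known cases where Seconds_Behind_Master can be misleading.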

The post Making HAProxy 1.5 replication lag aware in MySQL appeared first on MySQL Performance Blog.



Chegg

Pythian revamped Chegg’s MySQL architecture and processes and has provided 24×7 operations and incident management support and continual architectural guidance for Chegg’s MySQL, MongoDB, and Cassandra environments for the past five years—prior to, during, and after their IPO. Pythian also spearheaded an implementation on Amazon Web Services’ cloud for the elasticity needed by an educational application’s cyclical peaks. Chegg’s partnership with Amazon and Pythian has allowed for rapid growth and constant evolution without sacrificing performance. Many years later, Chegg still has no in-house DBAs, and relies on Pythian’s expertise for all their data management needs.

Chegg Case Study



Drop Table slow


A few days back we were given a MySQL database hosted in EC2 with around 5TB of data, with the task of moving some of the tables to a new server. The intention was to share the tables between 2 EC2 instances. Since AWS has the option to take online snapshots, the plan was to take a snapshot, create a new machine from that snapshot and drop the unwanted tables. Everything went as planned until we created the new machine from the snapshot. The real challenge was dropping the unwanted tables. It took around 4 minutes to drop a table whose size was 20GB, and 20 minutes to drop a 100GB table, and the time kept increasing for larger tables. MySQL even went "defunct" when we killed the drop query, and at times it crashed. To track down this issue we executed the drop table in one session and checked the processlist from another session, which gave the output below.
mysql> show processlist \G
*************************** 1. row ***************************
     Id: 2352
   User: dbuser
   Host: localhost
     db: db
Command: Query
   Time: 2573
  State: checking permissions
   Info: DROP TABLE `test`
I wondered whether it was a MySQL user permission or an OS-related one, so I went on to check the InnoDB status and found something interesting in the "ROW OPERATIONS" section, printed below.
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 6117, id 47405873539392, state: doing insert buffer merge
Number of rows inserted 167872705, updated 1018100, deleted 55251906, read 1953144114
171.93 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
InnoDB Change Buffering:
From MySQL reference manual “Index changes resulting from SQL statements, which could normally involve random I/O operations, are held back and performed periodically by a background thread. This sequence of operations can write the disk blocks for a series of index values more efficiently than if each value were written to disk immediately."
So here is what had really happened. The given DB server was big in data, received frequent inserts, and did not have enough RAM to hold all the pages in the InnoDB buffer pool. So it had to cache all the secondary index changes in the insert buffer. These cached changes are flushed to disk only when the pages are loaded into the buffer pool or when the server is idle. When we took a snapshot and mounted it on a new server, the database was idle, so the InnoDB main thread started merging all the changes cached in the insert buffer to disk. It was the InnoDB main thread that held the lock on those tables, and the drop statement we executed had to wait for the InnoDB main thread to complete the insert buffer merge.
So we waited for the buffer merge process to complete and then executed the drop statements. All those tables got dropped in seconds; even a 700GB table was dropped in 5 to 10 seconds. Setting innodb_fast_shutdown=0 and shutting down MySQL also performs the buffer merge operation, but my feeling is that merging the buffer by keeping the server idle or by setting innodb_fast_shutdown=0 and restarting takes about the same amount of time.
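If you would rather watch the merge progress than guess, here is a hedged sketch of how it could be done: the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of SHOW ENGINE INNODB STATUS shows the change buffer size and merge activity, and on MySQL 5.6 and later similar counters are exposed through INNODB_METRICS (some of these counters may be disabled by default and can be switched on with innodb_monitor_enable):

mysql> SHOW ENGINE INNODB STATUS\G
mysql> SELECT NAME, COUNT, STATUS
         FROM information_schema.INNODB_METRICS
        WHERE NAME LIKE 'ibuf%';

When the merge has finished, the reported change buffer size stops shrinking and the merge counters stop increasing.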
The "checking permissions" state that we saw in the processlist output is also very misleading. It is natural for everybody to think that the drop statement is waiting for a MySQL user permission or an OS-level permission, but actually it was waiting on the InnoDB main thread.
I still wonder why buffer merges are needed for a table that is about to be dropped. I need to think about this further.
Note: There might be some other reasons for drop table slowness as well, but this is the one I have faced to date.

--Aravinth C



Monitor MySQL Performance Interactively With VividCortex

If you’re monitoring MySQL performance on dozens or hundreds of servers, chances are you have a “rainbow chart” – a time-series chart with hundreds of tiny area graphs stacked on top of each other, crawling imperceptibly one pixel at a time across the big-screen monitor in your office. The trouble with these charts is they’re hard to see. It takes many minutes for enough new pixels to display after a change. In the meantime you can’t see the change clearly.

At VividCortex, we think we’ve found a better way to keep tabs on what’s going on in your infrastructure: a bubble visualization. It is compact and immediately communicates current status and trend, with visualizations that your eye interprets at a glance.

This is the Activity layout on the Hosts dashboard. It’s designed to scale to hundreds, even thousands of hosts. Tabular layouts and strip-charts won’t do the trick, but live, interactive bubbles will.

The bubbles update live, as agents send metrics to the APIs. They’re actually a D3 physics simulation of heat, gravity, friction, and several other forces. A moment later, this environment’s load characteristics changed dramatically!

The bubbles are sized according to the host’s load, so busier hosts are larger. The color is the trend: redder bubbles are currently trending upwards¹, and bluer ones are getting less loaded. Similar hosts are grouped together in a single circle. It takes a lot of words to explain, but when you see it changing live, you get it instantly.

You don’t see it in this screenshot, but as the visualization auto-selects the hottest host, the right-hand sidebar of the page updates with the host’s details and history, making this a great dashboard to display on that big screen for everyone to see. If you’re using it interactively, quick-targets activate so you can drill down into the auto-selected host.

If you’re not getting immediate feedback from your MySQL monitoring system, what are you waiting for? Start your 30-day free trial today. Installation is super-fast and unintrusive.

¹ In case you’re curious about the trend calculation, we use a MACD – Moving Average Convergence-Divergence. The exact metric varies by host type, but is either a utilization metric such as CPU busy, or a throughput metric such as queries per second.



Scaling TokuDB Performance with Binlog Group Commit

TokuDB offers high throughput for write intensive applications, and the throughput scales with the number of concurrent clients.  However, when the binary log is turned on, TokuDB 7.5.2 throughput suffers.  The throughput scaling problem is caused by a poor interaction between the binary log group commit algorithm in MySQL 5.6 and the way TokuDB commits transactions.   TokuDB 7.5.4 for Percona Server 5.6 fixes this problem, and the result is roughly an order of magnitude increase in SysBench throughput for in memory workloads.

MySQL uses two phase commit protocol to synchronize the MySQL binary log with the recovery logs of the storage engines when a transaction commits.  Since fsync’s are used to ensure the durability of the data in the various logs, and fsync’s can be very slow, the fsync can easily become a bottleneck.  A group commit algorithm can be used to amortize the fsync cost over many log writes.  The binary log group commit algorithm is intended to amortize the cost of the binary log fsync’s over many transactions.

The binary log group commit blog describes how two phase commit works with the binary log in MySQL 5.6.

When a transaction commits, it runs through a prepare phase and a commit phase. Hey, it is called two phase commit for a reason.

During the prepare phase, TokuDB writes a prepare event to its recovery log and uses a group commit algorithm to fsync its recovery log. Since there can be many transactions in the prepare phase concurrently, the transaction prepare throughput scales with the number of transactions.

During the commit phase, the transaction’s write events are written to the binary log and the binary log is fsync’ed. MySQL 5.6 uses a group commit algorithm to fsync the binary log.  Also during the commit phase, TokuDB writes a commit event to its recovery log and uses a group commit algorithm to fsync its recovery log. Since the transaction has already been prepared and the binlog has already been written, the fsync of the TokuDB recovery log is not necessary. XA crash recovery will commit all of the prepared transactions that the binary log knows about and abort the others.

Unfortunately, MySQL 5.6 serializes the commit phase so that the commit order is the same as the write order in the binary log. Since the commit phase is serialized, TokuDB’s group commit algorithm is ineffective. Luckily, MySQL 5.6 tells TokuDB to ignore durability in the commit phase (the HA_IGNORE_DURABILITY property is set), so TokuDB does not fsync its recovery log. This fixes the throughput bottleneck caused by serialized fsync’s of the TokuDB recovery log during the commit phase of the two phase commit.

Since MariaDB uses a different binlog group commit algorithm, we have some additional work to ensure that TokuDB works nicely with it.

We used the SysBench update non-indexed test to measure throughput and will post a detailed blog with results later.

 

 

The post Scaling TokuDB Performance with Binlog Group Commit appeared first on Tokutek.



InnoDB’s multi-versioning handling can be Achilles’ heel

I believe InnoDB storage engine architecture is great for a lot of online workloads, however, there are no silver bullets in technology and all design choices have their trade offs. In this blog post I’m going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multiversion concurrency control (MVCC) storage engine, which means many versions of a single row can exist at the same time. In fact there can be a huge number of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of the SELECT query that is currently running.

In most cases this is not a big deal – if you have many short transactions happening you will have only a few row versions to deal with. If you just use the system for reporting queries but do not modify data aggressively at the same time you also will not have many row versions. However, if you mix heavy updates with slow reporting queries going at the same time you can get into a lot of trouble.

Consider for example an application with a hot row (something like an actively updated counter) which gets 1000 updates per second, together with some heavy batch job that takes 1000 seconds to run. In such a case we will have 1000 updates/s × 1000 s = 1M row versions to deal with.

Let’s now talk about how those old row versions are stored in InnoDB: they are stored in the undo space as essentially a linked list, where each row version points to the previous row version, together with transaction visibility information that helps decide which version will be visible to a given query. Such a design favors short new queries, which will typically need to see one of the newer rows, so they do not have to go too far down this linked list. This is not the case for reporting queries that might need to read rather old row versions corresponding to the time when the query was started, or for logical backups that use consistent reads (think mysqldump or mydumper), which often need to access such very old row versions.

So going through the linked list of versions is expensive, but how expensive can it get? A lot depends on whether the UNDO space fits in memory – in which case the list will be traversed efficiently – or it does not, in which case you might be looking at massive disk IO. Keep in mind the undo space is not clustered by PRIMARY KEY, as the normal data in InnoDB tables is, so if you’re updating multiple rows at the same time (the typical case) you will be looking at a row-version chain spread over many pages, often with as little as one row version per page, requiring either massive IO or a large number of UNDO space pages to be present in the InnoDB buffer pool.

Where it can get even worse is an index scan. This is because indexes in InnoDB are structured to include all row versions corresponding to a key value, current and past. This means, for example, that the index entry for KEY=5 will contain pointers to all rows that either have value 5 now or had value 5 at some time in the past and have not been purged yet. Where it can really bite is the following: InnoDB needs to know which of the values stored for the key are visible to the current transaction – and that might mean going through all the long version chains for each of the keys.
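Before going to the experiment, a hedged note on how you can watch this building up on your own systems: the "History list length" value in the TRANSACTIONS section of SHOW ENGINE INNODB STATUS tracks how much unpurged row-version history InnoDB is keeping, and on 5.6 and later a similar counter is exposed in INNODB_METRICS (assuming the counter is enabled in your build):

mysql> SHOW ENGINE INNODB STATUS\G
mysql> SELECT NAME, COUNT
         FROM information_schema.INNODB_METRICS
        WHERE NAME = 'trx_rseg_history_len';

A history list length that keeps growing while a long reporting query or backup runs is exactly the situation described above.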

This is all theory, so let’s see how we can simulate such a workload and how bad things can really get in practice.

I have created a 1 billion row “sysbench” table which takes some 270GB of space, and I will use a small buffer pool – 6GB. I will run sysbench with 64 threads and a pareto distribution (hot rows) while running a full table scan query concurrently: select avg(k) from sbtest1. Here is the exact sysbench run, done after the prepare step.

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user=root --mysql-password=password --test=/usr/share/doc/sysbench/tests/db/oltp.lua run

Here is the explain for the “reporting” query, which you would expect to be a rather efficient index scan. With just 4 bytes per value, 1 billion values would be just 4GB (a bit more in reality because of InnoDB overhead) – not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 953860873
        Extra: Using index
1 row in set (0.00 sec)

2 days have passed and the “reporting” query is still running… furthermore the foreground workload started to look absolutely bizarre:

[207850s] threads: 64, tps: 0.20, reads: 7.40, writes: 0.80, response time: 222481.28ms (95%), errors: 0.70, reconnects: 0.00
[207860s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207870s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207880s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207890s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207900s] threads: 64, tps: 2.70, reads: 47.60, writes: 11.60, response time: 268815.49ms (95%), errors: 0.00, reconnects: 0.00
[207910s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207920s] threads: 64, tps: 2.30, reads: 31.60, writes: 9.50, response time: 294954.28ms (95%), errors: 0.00, reconnects: 0.00
[207930s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207940s] threads: 64, tps: 2.90, reads: 42.00, writes: 12.20, response time: 309332.04ms (95%), errors: 0.00, reconnects: 0.00
[207950s] threads: 64, tps: 0.20, reads: 4.60, writes: 1.00, response time: 318922.41ms (95%), errors: 0.40, reconnects: 0.00
[207960s] threads: 64, tps: 0.20, reads: 1.90, writes: 0.50, response time: 335170.09ms (95%), errors: 0.00, reconnects: 0.00
[207970s] threads: 64, tps: 0.60, reads: 13.20, writes: 2.60, response time: 292842.88ms (95%), errors: 0.60, reconnects: 0.00
[207980s] threads: 64, tps: 2.60, reads: 37.60, writes: 10.20, response time: 351613.43ms (95%), errors: 0.00, reconnects: 0.00
[207990s] threads: 64, tps: 5.60, reads: 78.70, writes: 22.10, response time: 186407.21ms (95%), errors: 0.00, reconnects: 0.00
[208000s] threads: 64, tps: 8.10, reads: 120.20, writes: 32.60, response time: 99179.05ms (95%), errors: 0.00, reconnects: 0.00
[208010s] threads: 64, tps: 19.50, reads: 280.50, writes: 78.90, response time: 27559.69ms (95%), errors: 0.00, reconnects: 0.00
[208020s] threads: 64, tps: 50.70, reads: 691.28, writes: 200.70, response time: 5214.43ms (95%), errors: 0.00, reconnects: 0.00
[208030s] threads: 64, tps: 77.40, reads: 1099.72, writes: 311.31, response time: 2600.66ms (95%), errors: 0.00, reconnects: 0.00
[208040s] threads: 64, tps: 328.20, reads: 4595.40, writes: 1313.40, response time: 911.36ms (95%), errors: 0.00, reconnects: 0.00
[208050s] threads: 64, tps: 538.20, reads: 7531.90, writes: 2152.10, response time: 484.46ms (95%), errors: 0.00, reconnects: 0.00
[208060s] threads: 64, tps: 350.70, reads: 4913.45, writes: 1404.09, response time: 619.42ms (95%), errors: 0.00, reconnects: 0.00
[208070s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208080s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208090s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208100s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208110s] threads: 64, tps: 1.60, reads: 24.20, writes: 6.80, response time: 42385.40ms (95%), errors: 0.10, reconnects: 0.00
[208120s] threads: 64, tps: 0.80, reads: 28.20, writes: 3.40, response time: 51381.54ms (95%), errors: 2.80, reconnects: 0.00
[208130s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208140s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208150s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208160s] threads: 64, tps: 0.60, reads: 14.20, writes: 2.40, response time: 93248.04ms (95%), errors: 0.80, reconnects: 0.00

As you can see we have long stretches of time when no queries complete at all… followed by some spikes of higher performance. This is how it looks on the graph:

Corresponding CPU usage:

This shows that we are not only observing something we would expect given the InnoDB design, but there also seems to be some severe starvation happening in this case, which we can confirm:

Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790809552640 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790804735744 has waited at row0sel.cc line 3506 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790810756864 has waited at row0sel.cc line 4125 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790811158272 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive

Waiting more than 3 minutes for a buffer pool block to become available is a big issue – this lock should never be held for more than a few microseconds.

SHOW PROCESSLIST confirms that even the most basic selects by primary key can get stalled for a long time:

| 5499 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5500 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5501 | root | localhost | sbtest | Query |    185 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5502 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5503 | root | localhost | sbtest | Query |     14 | statistics   | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 ORDER BY c | 0 | 0 |
| 5504 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5505 | root | localhost | sbtest | Query |     14 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=1 | 0 | 0 |
| 5506 | root | localhost | sbtest | Query |    236 | updating     | DELETE FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5507 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5508 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 | 0 | 0 |
| 5509 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5510 | root | localhost | sbtest | Query |     14 | updating     | UPDATE sbtest1 SET c='80873149502-45132831358-41942500598-17481512835-07042367094-39557981480-593123 | 0 | 0 |
| 5511 | root | localhost | sbtest | Query |    236 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=18 | 0 | 1 |
| 5512 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=7 | 0 | 0 |
| 6009 | root | localhost | sbtest | Query | 195527 | Sending data | select avg(k) from sbtest1 | 0 | 0 |

How do I know it is an UNDO space related issue in this case? Because it ends up taking the majority of the buffer pool:

mysql> select page_type, count(*) from INNODB_BUFFER_PAGE group by page_type;
+-------------------+----------+
| page_type         | count(*) |
+-------------------+----------+
| EXTENT_DESCRIPTOR |        1 |
| FILE_SPACE_HEADER |        1 |
| IBUF_BITMAP       |      559 |
| IBUF_INDEX        |      855 |
| INDEX             |     2186 |
| INODE             |        1 |
| SYSTEM            |      128 |
| UNDO_LOG          |   382969 |
| UNKNOWN           |     6508 |
+-------------------+----------+
9 rows in set (1.04 sec)

And it does so in a very crazy way – when there is almost no work being done, the UNDO_LOG content of the buffer pool grows very rapidly, while when we are getting some work done, the INDEX type pages take a lot more space. To me this looks as if the index scan, as it progresses, touches some hot rows and some not-so-hot ones that contain fewer row versions and so do not put much pressure on the undo space.

Take away: Now you might argue that this workload and situation is rather artificial and rather narrow. It may well be. My main point here is: what if you are looking at just part of your workload, such as your main short application queries, and not taking reporting or backups into account “because their performance is not important”? In that case you might be in for a big surprise. Those background activities might be taking much more than you would expect, and in addition they might have a much more severe impact on your main application workload, as in the case above.
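A hedged, practical corollary: before blaming the storage engine, it is worth checking whether a long-running transaction or query is forcing InnoDB to keep all that history around. A simple sketch is to list the oldest open transactions, since their read views are what pin the old row versions:

mysql> SELECT trx_mysql_thread_id, trx_started, trx_rows_modified, trx_query
         FROM information_schema.INNODB_TRX
        ORDER BY trx_started
        LIMIT 5;

Anything that has been open for hours – a forgotten session, a reporting query, a logical backup – is a candidate for the kind of undo-space pressure shown above.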

The post InnoDB’s multi-versioning handling can be Achilles’ heel appeared first on MySQL Performance Blog.



New Webinar: A DevOps Guide to Database Infrastructure Automation for eCommerce

December 17, 2014 By Severalnines

For an online shop, the website is the cash register. It has to be open for business 24 hours a day.

As the ops person on duty, when you get a call at 3am after your website went down, your priority number one is to restore the service asap. But why can we not have our application stack automatically recover, and not have the pager wake us at all? Why do we still stick to tedious manual processes, which take up time and resources, and hinder future growth?

Infrastructure automation isn’t easy, but it’s not rocket science either, says Riaan Nolan. Riaan has been in operations for the past decade, and has built over a dozen eCommerce properties. Automation is a worthwhile investment for retailers serious about eCommerce, but deciding on which tools to invest in can be a confusing and overwhelming process.

Join us for this webinar to understand the key pain points that online retailers experience which indicate it’s time to invest in database automation. Our guest speaker this time will be Riaan Nolan of Foodpanda, Rocket Internet’s global online food delivery marketplace operating in over 40 countries. 

 

New Webinar: A DevOps Guide to Database Infrastructure Automation for eCommerce

 

DATE & TIME

 

Europe/MEA/APAC

Tuesday, February 17th 2015 at 09:00 GMT (UK) / 10:00 CET (Germany, France, Sweden)

Register Now

 

North America/LatAm

Tuesday, February 17th 2015 at 9:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

 

TOPICS 
  • eCommerce infrastructure challenges in 2014
  • Provisioning of test/QA and highly available production environments across multi-datacenter and multi-cloud environments
  • Building and maintaining configuration management systems such as Puppet and Chef
  • Enabling self-service infrastructure services to internal dev teams
  • Health and performance monitoring 
  • Capacity analysis and planning
  • Elastic scaling 
  • Automating failure handling
  • Disaster recovery

 

SPEAKER

     Riaan Nolan

  • Expert Live Systems Administrator, foodpanda | Hellofood
  • Senior Systems Administrator / Infrastructure Lead, Rocket Internet GmbH
  • Senior Technology Manager, Africa Internet Accelerator
  • Uses Amazon EC2, VPC and Autoscale with Cloudformation.
  • First Puppet Labs Certified Professional in South Africa. 
  • Specialties: Puppet Automation, Cloud Deployments, eCommerce, eMarketing, Specialized Linux Services, Windows, Process making, Budgets, Asset Tracking, Procurement

 

 




Memory summary tables in Performance Schema in MySQL 5.7

One of the great innovations in MySQL 5.7 is the memory summary tables in Performance Schema and the corresponding views in the sys schema.

And as a troubleshooting freak, I have huge reason to welcome this feature.

Before version 5.7 we had very limited abilities to diagnose memory issues in MySQL. We could use operating system tools such as vmstat, top and free, but they only showed that the MySQL server was using memory, not how it was using it. In version 5.7 things have changed.
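One hedged caveat before the examples: memory statistics are only collected for memory instruments that are enabled. Whether they are on by default depends on your 5.7 build and configuration; if they are off, they can be enabled at runtime (this only affects allocations made from that point on, which is also why negative counts can appear for memory freed afterwards):

mysql> UPDATE performance_schema.setup_instruments
          SET ENABLED = 'YES'
        WHERE NAME LIKE 'memory/%';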

Let's examine what we can learn about memory usage by the MySQL server.

First, this is the total amount of memory used by all internal MySQL structures:

mysql> select * from sys.memory_global_total;
+-----------------+
| total_allocated |
+-----------------+
| 458.44 MiB      |
+-----------------+

There are also similar statistics by host, user and thread, including the threads created for user connections:

mysql> select host, current_count_used ccu, current_allocated, current_avg_alloc, current_max_alloc, total_allocated from sys.memory_by_host_by_current_bytes WHERE host IS NOT NULL;
+------------+------+-------------------+-------------------+-------------------+-----------------+
| host       | ccu  | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+------------+------+-------------------+-------------------+-------------------+-----------------+
| background | 2773 | 10.84 MiB         | 4.00 KiB          | 8.00 MiB          | 30.69 MiB       |
| localhost  | 1509 | 809.30 KiB        | 549 bytes         | 176.38 KiB        | 83.59 MiB       |
+------------+------+-------------------+-------------------+-------------------+-----------------+

mysql> select host, current_count_used ccu, current_allocated, current_avg_alloc, current_max_alloc, total_allocated from sys.memory_by_user_by_current_bytes;
+------+------+-------------------+-------------------+-------------------+-----------------+
| user |  ccu | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+------+------+-------------------+-------------------+-------------------+-----------------+
| root | 1401 | 1.09 MiB          | 815 bytes         | 334.97 KiB        | 42.73 MiB       |
| mark |  201 | 496.08 KiB        | 2.47 KiB          | 334.97 KiB        | 5.50 MiB        |
+------+------+-------------------+-------------------+-------------------+-----------------+

mysql> select thread_id tid, user, current_count_used ccu, current_allocated ca, current_avg_alloc caa, current_max_alloc cma, total_allocated from sys.memory_by_thread_by_current_bytes;
+-----+-------------------------+--------+-------------+-----------+-----------+-----------------+
| tid | user                    |    ccu |          ca |       caa |       cma | total_allocated |
+-----+-------------------------+--------+-------------+-----------+-----------+-----------------+
|   1 | sql/main                | 660327 | 2.53 GiB    | 4.01 KiB  | 2.10 GiB  | 2.69 GiB        |
| 150 | root@127.0.0.1          |    620 | 4.06 MiB    | 6.71 KiB  | 2.00 MiB  | 32.17 MiB       |
| 146 | sql/slave_sql           |     38 | 1.31 MiB    | 35.37 KiB | 1.00 MiB  | 1.44 MiB        |
| 145 | sql/slave_io            |    102 | 1.08 MiB    | 10.84 KiB | 1.00 MiB  | 2.79 MiB        |
...
|  92 | innodb/io_write_thread  |      0 | 0 bytes     | 0 bytes   | 0 bytes   | 384 bytes       |
| 124 | innodb/io_write_thread  |      0 | 0 bytes     | 0 bytes   | 0 bytes   | 384 bytes       |
|  28 | innodb/io_read_thread   |      0 | 0 bytes     | 0 bytes   | 0 bytes   | 2.25 KiB        |
|  60 | innodb/io_read_thread   |      0 | 0 bytes     | 0 bytes   | 0 bytes   | 384 bytes       |
| 139 | innodb/srv_purge_thread |    -24 | -328 bytes  | 14 bytes  | 272 bytes | 754.21 KiB      |
|  69 | innodb/io_write_thread  |    -14 | -1008 bytes | 72 bytes  | 0 bytes   | 34.28 KiB       |
|  68 | innodb/io_write_thread  |    -20 | -1440 bytes | 72 bytes  | 0 bytes   | 298.05 KiB      |
|  74 | innodb/io_write_thread  |    -23 | -1656 bytes | 72 bytes  | 0 bytes   | 103.55 KiB      |
|   4 | innodb/io_log_thread    |    -40 | -2880 bytes | 72 bytes  | 0 bytes   | 132.38 KiB      |
|  72 | innodb/io_write_thread  |   -106 | -7632 bytes | 72 bytes  | 0 bytes   | 1.10 MiB        |
+-----+-------------------------+--------+-------------+-----------+-----------+-----------------+
145 rows in set (2.65 sec)

This way you can find out, for example, which thread uses the most memory and kill it if necessary.
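For example, to turn a heavy thread_id into a connection id that KILL understands, one possible approach (a sketch using the raw x$ view, so that the ORDER BY works on bytes rather than on the pretty-printed string) is to join it with performance_schema.threads:

mysql> SELECT t.processlist_id, m.thread_id, m.user,
              sys.format_bytes(m.current_allocated) AS current_allocated
         FROM sys.x$memory_by_thread_by_current_bytes m
         JOIN performance_schema.threads t ON t.thread_id = m.thread_id
        ORDER BY m.current_allocated DESC
        LIMIT 5;

Background threads have a NULL processlist_id and cannot be killed this way; for user connections, KILL <processlist_id> terminates the offender.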

But it is not always a good idea to kill a thread, even if it is certain that it leaks memory. For example, in the case described in bug #69848 it is not a good idea to kill the slave IO and SQL threads, although one of them certainly leaks memory.

You can see in the output below how the slave IO thread used more and more memory until I stopped my test. But after two cups of tea the memory usage was still high, and it stayed at 1.04 GiB even on an idle server:

mysql> select thread_id tid, user, current_count_used ccu, current_allocated ca, current_avg_alloc caa, current_max_alloc cma, total_allocated from sys.memory_by_thread_by_current_bytes where thread_id in (145, 146);
+-----+---------------+--------+------------+-----------+------------+-----------------+
| tid | user          |    ccu |         ca |       caa |        cma | total_allocated |
+-----+---------------+--------+------------+-----------+------------+-----------------+
| 145 | sql/slave_io  | 170357 | 501.41 MiB | 3.01 KiB  | 496.90 MiB | 8.06 GiB        |
| 146 | sql/slave_sql |  10133 | 1.54 MiB   | 159 bytes | 1.00 MiB   | 10.38 GiB       |
+-----+---------------+--------+------------+-----------+------------+-----------------+
2 rows in set (2.76 sec)

...

mysql> select thread_id tid, user, current_count_used ccu, current_allocated ca, current_avg_alloc caa, current_max_alloc cma, total_allocated from sys.memory_by_thread_by_current_bytes where thread_id in (145, 146);
+-----+---------------+--------+------------+-----------+------------+-----------------+
| tid | user          |    ccu |         ca |       caa |        cma | total_allocated |
+-----+---------------+--------+------------+-----------+------------+-----------------+
| 145 | sql/slave_io  | 229012 | 641.95 MiB | 2.87 KiB  | 636.07 MiB | 10.32 GiB       |
| 146 | sql/slave_sql |  14033 | 1.61 MiB   | 120 bytes | 1.00 MiB   | 10.79 GiB       |
+-----+---------------+--------+------------+-----------+------------+-----------------+
2 rows in set (3.04 sec)

...

mysql> select thread_id tid, user, current_count_used ccu, current_allocated ca, current_avg_alloc caa, current_max_alloc cma, total_allocated from sys.memory_by_thread_by_current_bytes where thread_id in (145, 146);
+-----+---------------+--------+------------+-----------+------------+-----------------+
| tid | user          |    ccu |         ca |       caa |        cma | total_allocated |
+-----+---------------+--------+------------+-----------+------------+-----------------+
| 145 | sql/slave_io  | 412396 | 1.04 GiB   | 2.64 KiB  | 1.03 GiB   | 17.10 GiB       |
| 146 | sql/slave_sql |  26083 | 1.79 MiB   | 72 bytes  | 1.00 MiB   | 12.03 GiB       |
+-----+---------------+--------+------------+-----------+------------+-----------------+
2 rows in set (2.79 sec)

In this context we would rather find out how the memory is used, in order to decide whether we can find a workaround. In version 5.7 we have a tool which gives us this information: the table performance_schema.memory_summary_by_thread_by_event_name and its column CURRENT_NUMBER_OF_BYTES_USED, which contains the amount of currently used memory.

mysql> select THREAD_ID tid,EVENT_NAME,COUNT_ALLOC ca,SUM_NUMBER_OF_BYTES_ALLOC snba,SUM_NUMBER_OF_BYTES_FREE as bf,CURRENT_NUMBER_OF_BYTES_USED as cbu,HIGH_NUMBER_OF_BYTES_USED as hbu from performance_schema.memory_summary_by_thread_by_event_name where SUM_NUMBER_OF_BYTES_ALLOC > 0 and THREAD_ID in (145) and CURRENT_NUMBER_OF_BYTES_USED>0 order by CURRENT_NUMBER_OF_BYTES_USED DESC;
+-----+-------------------------------+----------+------------+-------------+------------+------------+
| tid | EVENT_NAME                    |       ca |       snba | bf          | cbu        | hbu        |
+-----+-------------------------------+----------+------------+-------------+------------+------------+
| 145 | memory/sql/thd::main_mem_root |     1039 | 1104280592 |           0 | 1104280592 | 1104280592 |
| 145 | memory/innodb/std             | 19378997 | 2848628832 |  2838767096 |    9861736 |    9862208 |
| 145 | memory/sql/NET::buff          |        3 |     557077 |       16391 |     540686 |     557077 |
| 145 | memory/innodb/trx0undo        |    11424 |    3929856 |     3720360 |     209496 |     209496 |
| 145 | memory/mysys/array_buffer     |       11 |      25952 |        8320 |      17632 |      25504 |
| 145 | memory/vio/read_buffer        |        1 |      16384 |           0 |      16384 |      16384 |
| 145 | memory/sql/THD::transaction...|        1 |       4112 |           0 |       4112 |       4112 |
| 145 | memory/client/MYSQL           |        4 |       1366 |           0 |       1366 |       1366 |
| 145 | memory/client/mysql_options   |        9 |      545 |           0 |        545 |        545 |
| 145 | memory/vio/vio                |        1 |        496 |           0 |        496 |        496 |
| 145 | memory/sql/Log_event          |       11 |       1510 |        1122 |        388 |        772 |
| 145 | memory/innodb/mem0mem         |  2405744 | 1220964072 | 11220963816 |        256 |      21928 |
| 145 | memory/sql/String::value      |        4 |        232 |          16 |        216 |        216 |
| 145 | memory/sql/THD::Session_tra...|        6 |        209 |           0 |        209 |        209 |
| 145 | memory/sql/Gtid_set::Interv...|        1 |        200 |           0 |        200 |        200 |
| 145 | memory/sql/Mutex_cond_array...|        1 |        168 |           0 |        168 |        168 |
| 145 | memory/mysys/lf_slist         |        5 |        160 |           0 |        160 |        160 |
| 145 | memory/sql/THD::debug_sync_...|        1 |        120 |           0 |        120 |        120 |
| 145 | memory/sql/Owned_gtids::sid...|        1 |         96 |           0 |         96 |         96 |
| 145 | memory/mysys/lf_node          |        2 |         80 |           0 |         80 |         80 |
| 145 | memory/innodb/ha_innodb       |        1 |         72 |           0 |         72 |         72 |
| 145 | memory/sql/Sid_map::Node      |        1 |         20 |           0 |         20 |         20 |
| 145 | memory/sql/plugin_ref         |        2 |         16 |           0 |         16 |         16 |
| 145 | memory/sql/THD::Session_sys...|        1 |          8 |           0 |          8 |          8 |
+-----+-------------------------------+----------+------------+-------------+------------+------------+
24 rows in set (0.17 sec)

I watched this output for quite some time while the slave was active, so I can be certain that memory/sql/thd::main_mem_root is the culprit:

| 145 | memory/sql/thd::main_mem_root |      707 |    512337872 |         0 |   512337872 |  512337872 |
...
| 145 | memory/sql/thd::main_mem_root |      802 |    658752752 |         0 |   658752752 |  658752752 |
...
| 145 | memory/sql/thd::main_mem_root |     1039 |   1104280592 |         0 |  1104280592 | 1104280592 |


Unfortunately, in this case this knowledge is hardly useful for users (though it is for MySQL developers!), because this member of class THD (./sql/sql_class.h):

is used for two purposes:
    - for conventional queries, to allocate structures stored in main_lex
    during parsing, and allocate runtime data (execution plan, etc.)
    during execution.
    - for prepared queries, only to allocate runtime data. The parsed
    tree itself is reused between executions and thus is stored elsewhere.


Still, we can be sure that the reason for the memory leak is not the SQL thread, so we can save our time and avoid unnecessary optimizations. In other situations we may be luckier and observe information which helps us find a workaround.

How to install osCommerce on Ubuntu 14.04 (Trusty Tahr)


This document describes how to install osCommerce on Ubuntu 14.04. Open Source Commerce (osCommerce) is a popular e-commerce and online store-management software program that can easily be used on any web server with PHP and MySQL installed. osCommerce is available to users as free software under the GNU General Public License. The versatile and fuss-free software enables easy setup and maintenance of e-stores with minimal effort.


