Planet MySQL

MySQL at OpenSUSE Conference, Dubrovnik, Croatia & JavaScript UnConference, Hamburg, Germany

The OpenSUSE Conference in Dubrovnik, Croatia started today and runs until Monday, April 28, 2014. The MySQL team is part of this event. Don't miss visiting our booth, located on the ground floor, and come listen to two MySQL talks (see the OpenSUSE schedule):

  • Georgi Kodinov will give a talk on "Protect your MySQL server". His talk is scheduled for Friday, April 25, 2014 @ 13:00 in Ragusa.
  • Norvald Ryeng will talk about "The upstream view" - package maintenance as seen from MySQL Engineering. His talk is scheduled for Sunday, April 27, 2014 @ 13:00 in Ragusa.

For the whole conference plan you can check the Schedule section on the OpenSUSE website.

The second event is a small one-day one, the JavaScript UnConference, taking place in Hamburg, Germany on April 26, 2014. Ulf Wendel will attend and will be available at the MySQL table as well as around the event. Since no schedule is usually prepared before an UnConference, there is no official schedule announced; attendees pick the most interesting topics on site, so don't miss the chance to vote for the MySQL talk given by Ulf!

We are looking forward to talking to you at the OpenSUSE Conference & JavaScript UnConference!


Importing Raster Based Spatial Data into MySQL 5.7

Setting up the Geospatial Data Abstraction Library
GDAL 1.10+ (by OSGeo) supports converting spatial data between various raster and vector (points, lines, polygons, etc.) based formats, and it supports MySQL. MySQL and ESRI Shapefiles are generally the best supported combination today.

Version 1.10 of GDAL has big improvements over older versions, but most Linux distros only package the older versions (EL6 has GDAL 1.7). I would recommend building the latest 1.11 RC release from source, because the latest stable 1.10.1 source package had a bug when building with MySQL 5.6+.

Building it from source would look something like this on UNIX/Linux:

./configure --with-mysql=/usr/local/mysql-5.7.4-labs-april-linux-el6-x86_64/bin/mysql_config
make
make install

You’ll also need to build the related Python-based tools separately. For example:

yum install python-devel
cd /home/matt/gdal/gdal-1.11.0/swig/python
make
python setup.py install --prefix=/usr/

One additional note on these new Python tools: if you already have an older version of GDAL installed (PostGIS bundles an older version), you’ll see this error when you try to run them, because the old library comes earlier in your runtime linker’s path, if not in its cache:

/home/matt/gdal/gdal-1.11.0/swig/python/scripts/ o41078a1.tif -f "ESRI Shapefile" testdata/
Traceback (most recent call last):
  File "/home/matt/gdal/gdal-1.11.0/swig/python/scripts/", line 36, in <module>
    import gdal, ogr, osr
  File "/usr/lib64/python2.6/site-packages/GDAL-1.11.0-py2.6-linux-x86_64.egg/", line 2, in <module>
    from osgeo.gdal import deprecation_warn
  File "/usr/lib64/python2.6/site-packages/GDAL-1.11.0-py2.6-linux-x86_64.egg/osgeo/", line 21, in <module>
    _gdal = swig_import_helper()
  File "/usr/lib64/python2.6/site-packages/GDAL-1.11.0-py2.6-linux-x86_64.egg/osgeo/", line 17, in swig_import_helper
    _mod = imp.load_module('_gdal', fp, pathname, description)
ImportError: /usr/lib64/python2.6/site-packages/GDAL-1.11.0-py2.6-linux-x86_64.egg/osgeo/ undefined symbol: GDALRasterBandGetVirtualMem

In this case, we can tell the linker to preload our newer 1.11.0 library in our shell this way:

export LD_PRELOAD=/usr/local/lib/

Setting up MySQL
I would also recommend that you use the MySQL 5.7.4-april lab release (the one labeled “MySQL GIS…” in the pulldown). You can read more about the new GIS work in 5.7 here.

An Example
Now that we have all of the necessary pieces in place, here’s a complete working example using Oracle Linux 6.5 on x86_64, with the MySQL 5.7.4-april labs release (the pre-built binaries), and GDAL 1.11.0 (built from source):

  • 1) I downloaded a sample GeoTIFF file to use.
  • 2) I created a database to play in:
    mysql> create database tifftest;
  • 3) I converted the tiff data into shapefile data (raster data -> polygon data), which we can then load into MySQL:
    mkdir testdata
    /home/matt/gdal/gdal-1.11.0/swig/python/scripts/ o41078a1.tif -f "ESRI Shapefile" testdata/
  • 4) I finally loaded the generated shapefile into MySQL:
    ogr2ogr -f MySQL MySQL:tifftest,user=root testdata/out.shp -update -overwrite -progress -nln testdata

The final results

mysql> show tables;
+--------------------+
| Tables_in_tifftest |
+--------------------+
| geometry_columns   |
| spatial_ref_sys    |
| testdata           |
+--------------------+
3 rows in set (0.00 sec)

mysql> show create table testdata\G
*************************** 1. row ***************************
       Table: testdata
Create Table: CREATE TABLE `testdata` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `dn` decimal(10,0) DEFAULT NULL,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=InnoDB AUTO_INCREMENT=3488 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select ogr_fid, st_astext(shape), dn from testdata limit 3\G
*************************** 1. row ***************************
         ogr_fid: 1
st_astext(shape): POLYGON((752396.619854 4557341.646744,752401.496654 4557341.646744,752401.496654 4557339.208344,752403.935054 4557339.208344,752403.935054 4557334.331544,752401.496654 4557334.331544,752401.496654 4557336.769944,752399.058254 4557336.769944,752399.058254 4557339.208344,752396.619854 4557339.208344,752396.619854 4557341.646744))
              dn: 0
*************************** 2. row ***************************
         ogr_fid: 2
st_astext(shape): POLYGON((752311.275854 4557327.016344,752313.714254 4557327.016344,752313.714254 4557322.139544,752311.275854 4557322.139544,752311.275854 4557327.016344))
              dn: 0
*************************** 3. row ***************************
         ogr_fid: 3
st_astext(shape): POLYGON((752386.866254 4557329.454744,752389.304654 4557329.454744,752389.304654 4557324.577944,752386.866254 4557324.577944,752384.427854 4557324.577944,752384.427854 4557327.016344,752386.866254 4557327.016344,752386.866254 4557329.454744))
              dn: 0
3 rows in set (0.00 sec)

I hope that this is helpful! If you ever run into a situation where you need to import/load raster-based spatial data into MySQL (spatial databases generally only support the vector format), as I did recently, then this should at least help you get started.


Abdel-Mawla Gharieb: How to Set Up MySQL Master/Slave Replication?

It's not easy to find a simple source on how to set up MySQL replication, so I thought it might be useful, at least for beginners, to write a direct and simple how-to blog on setting up Master/Slave replication in MySQL.

Before going through the replication setup steps, I think it's better to first explain how replication works in MySQL.

MySQL replication is mainly a three-part process:

  • The master server records all data changes in its binary logs (binary log events) and sends them to the slave, once the slave connects, using a thread called the binlog dump thread.
  • The slave copies the binary log events sent by the master's binlog dump thread into its relay logs using a thread called the slave I/O thread.
  • The slave applies these changes from the relay logs to its data by replaying (executing) all events using a thread called the slave SQL thread.
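As a toy illustration of these three steps, the following sketch uses plain files and `cp` in place of the real binlog, relay log, and replication threads (nothing here is actual MySQL behavior; it only mirrors the flow of events):

```shell
# Toy model of the three replication steps; files stand in for logs
# and sequential commands stand in for the three replication threads.
workdir=$(mktemp -d)

# 1) The master records data changes as events in its binary log.
printf 'INSERT INTO t VALUES (1)\nINSERT INTO t VALUES (2)\n' \
    > "$workdir/master-binlog"

# 2) The slave I/O thread copies the binlog events into the relay log.
cp "$workdir/master-binlog" "$workdir/relay-log"

# 3) The slave SQL thread replays each relay-log event against its data.
applied=$(while IFS= read -r event; do
    echo "applied: $event"
done < "$workdir/relay-log")
echo "$applied"

rm -rf "$workdir"
```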

Now, let's go through the setup process, which is divided into three main sections:

Master's side configuration:
  • Add the following variables to the MySQL configuration file (my.cnf):
    [mysqld]
    server-id=1 ## must be unique
    log-bin=mysql-bin
    binlog_format=ROW
  • Restart MySQL so that the configuration changes take effect:
    shell> /etc/init.d/mysql restart
  • Create a MySQL user to be used by the slave:
    SQL> GRANT REPLICATION SLAVE ON *.* TO 'slave_user_name'@'slave_ip' IDENTIFIED BY 's3cret';
  • Take a full snapshot of the master's databases:
    shell> mysqldump -u root -p --all-databases --flush-privileges --single-transaction --master-data=2 --flush-logs --triggers --routines --events --hex-blob > /path/to/backupdir/full_backup-$TIMESTAMP.sql


    If you have MyISAM tables, you should omit the option --single-transaction; --master-data=2 automatically turns on the --lock-all-tables option, which is required for MyISAM tables.
  • After preparing the backup file, transfer it to the slave server.
  • If you are setting up fresh master/slave servers (no data on the master yet), you don't have to back up the master's databases; all you need is to execute the following command on the master to record the master's coordinates, which will be used later on the slave:
    SQL> SHOW MASTER STATUS\G
    *************************** 1. row ***************************
                 File: mysql-bin.000275
             Position: 120
         Binlog_Do_DB:
     Binlog_Ignore_DB:
    Executed_Gtid_Set:

And that is it on the master's side; let's do the slave's work.

Slave's side configuration:
  • Add the following variables to the my.cnf file:
    [mysqld]
    server-id=2 ## must be unique
    relay_log=relay-log
    skip-slave-start ## useful to make any checks before starting the slave (this way, the slave must be started manually after each mysql restart)
  • Restart MySQL so that the configuration changes take effect:
    shell> /etc/init.d/mysql restart
  • Restore the full backup taken from the master (if performed):
    shell> mysql -u root -p < /path/to/backupdir/full_backup-$TIMESTAMP.sql
  • Get the master's coordinate information from the backup file:
    shell> head -n 50 /path/to/backupdir/full_backup-$TIMESTAMP.sql | grep "CHANGE MASTER TO"

    Or use the output of "SHOW MASTER STATUS;" in case no backup was taken (check the final step in the "Master's side configuration").

  • Set the master information on the slave:
    SQL> CHANGE MASTER TO
      -> MASTER_HOST='master_ip',
      -> MASTER_PORT=3306,
      -> MASTER_USER='slave_user_name', ## the user which was created on the master
      -> MASTER_PASSWORD='s3cret',
      -> MASTER_LOG_FILE='mysql-bin.000275', ## taken from the step above
      -> MASTER_LOG_POS=120; ## taken from the step above
  • Start replication: SQL> START SLAVE;
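The coordinate-extraction step above can be scripted; here is a sketch run against a mock dump header (the file name, log name, and position are illustrative, matching the example output shown earlier):

```shell
# Simulate the first lines of a dump taken with --master-data=2, then
# extract the coordinates needed by CHANGE MASTER TO.
backup=$(mktemp)
cat > "$backup" <<'EOF'
-- MySQL dump 10.13
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000275', MASTER_LOG_POS=120;
EOF

coords=$(head -n 50 "$backup" | grep "CHANGE MASTER TO")
log_file=$(echo "$coords" | sed "s/.*MASTER_LOG_FILE='\([^']*\)'.*/\1/")
log_pos=$(echo "$coords" | sed "s/.*MASTER_LOG_POS=\([0-9]*\).*/\1/")
echo "$log_file $log_pos"

rm -f "$backup"
```

The two extracted values are what go into MASTER_LOG_FILE and MASTER_LOG_POS in the CHANGE MASTER TO statement above.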
Replication checking and troubleshooting:
  • Once the slave is started, check the replication status on the slave with the following command:
    SQL> SHOW SLAVE STATUS\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host:
                      Master_User: slave_user_name
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000275
              Read_Master_Log_Pos: 120
                   Relay_Log_File: mysql-relay.000001
                    Relay_Log_Pos: 283
            Relay_Master_Log_File: mysql-bin.000275
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                              ...
                   Last_SQL_Error:
                              ...
  • If Slave_IO_State = "Connecting ...", make sure that the slave user information is set correctly and that there are no firewall restrictions between the two servers (master and slave); this can be checked by connecting to the master's MySQL from the slave server using the replication user (in this example, slave_user_name).
  • If both Slave_IO_Running and Slave_SQL_Running = Yes, then the replication has been set up correctly.
  • If the Slave_SQL_Running = No, check the value of Last_SQL_Error for more details about the SQL error.
  • If you know that error and you want to ignore it, you can execute "SET GLOBAL sql_slave_skip_counter = 1;" on the slave and then start the slave again with "START SLAVE;".
  • To restrict all normal users from changing data on the slave - which might break the replication - the option "read-only" should be added in the slave's my.cnf file.
  • The server option "server-id" must be unique among all servers inside the replication topology (masters and slaves).
  • If your database size is big (100GB or so), the Xtrabackup tool can be used instead of mysqldump when preparing the master snapshot, for faster backup and restore operations. For more information on how to use Xtrabackup, check out this link.
  • For more information on how to setup MySQL replication, check out the manual documentation.


Encrypted and incremental MySQL backups with Percona XtraBackup

We’ve recently received a number of questions on how to implement incremental MySQL backups alongside encryption with Percona XtraBackup. Some users thought it was not possible because with the default --encrypt options in XtraBackup all files are encrypted, but alas, that is not the case. This is where the option --extra-lsndir becomes useful: it allows you to save the LSN (Log Sequence Number) information to another directory and exclude it from encryption, letting you reuse the same information needed by incremental backups. Enough talk, let me show you.

Because you would usually want to script your backup and restore procedure, I use variables here as well to make you more familiar with them. First, I create three folders where my backups will be stored: ‘full’ for full backups, ‘incr’ for incremental backups, and ‘lsns’ to store an extra copy of my xtrabackup_checkpoints file via --extra-lsndir.

mkdir -p /ssd/msb/msb_5_5_360/bkp/full
mkdir -p /ssd/msb/msb_5_5_360/bkp/incr
mkdir -p /ssd/msb/msb_5_5_360/bkp/lsns

Second, to have better control of where my backups go, I prefer assigning timestamped folders myself and using the --no-timestamp option to innobackupex.

CURDATE=$(date +%Y-%m-%d_%H_%M_%S)

Then manually create the specific directory where the backup’s xtrabackup_checkpoints file would be saved:

mkdir -p /ssd/msb/msb_5_5_360/bkp/lsns/$CURDATE

Of course, I need an encryption key for my encrypted backups. In this case, taking the example from the manual, I used openssl to generate a random key. You can use your own key string as long as its size conforms to the size required by the --encrypt algorithm you chose.

echo -n $( openssl enc -aes-256-cbc -pass pass:Password -P -md sha1 \
    | grep iv | cut -d'=' -f2 ) > /ssd/msb/msb_5_5_360/bkp/backups.key
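As a sanity check, you can verify the key file's length matches what the cipher expects: AES-256 uses a 256-bit (32-byte) key, and the openssl recipe above emits a 32-character iv string. The key below is a stand-in for the generated one:

```shell
# Write a stand-in 32-character key and verify its size before pointing
# --encrypt-key-file at it (AES256 expects a 32-byte key).
keyfile=$(mktemp)
printf '0123456789abcdef0123456789abcdef' > "$keyfile"
keylen=$(wc -c < "$keyfile")
echo "$keylen"
rm -f "$keyfile"
```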

Next, I would run my full backup:

innobackupex --defaults-file=/ssd/msb/msb_5_5_360/my.sandbox.cnf \
    --extra-lsndir=/ssd/msb/msb_5_5_360/bkp/lsns/$CURDATE \
    --encrypt=AES256 --encrypt-key-file=/ssd/msb/msb_5_5_360/bkp/backups.key \
    --no-timestamp /ssd/msb/msb_5_5_360/bkp/full/$CURDATE

The output says my full backup is saved to:

innobackupex: Backup created in directory '/ssd/msb/msb_5_5_360/bkp/full/2014-04-23_01_20_46'
140423 01:20:55  innobackupex: Connection to database server closed
140423 01:20:55  innobackupex: completed OK!

Now here’s the trick: because the full backup is encrypted, we will use the xtrabackup_checkpoints file separately saved by xtrabackup to the --extra-lsndir path we specified above to get the LSN and use that for our next incremental backup.

LAST_LSN=$( cat /ssd/msb/msb_5_5_360/bkp/lsns/$CURDATE/xtrabackup_checkpoints \
    | grep to_lsn | cut -d'=' -f2 )
CURDATE=$(date +%Y-%m-%d_%H_%M_%S)
mkdir /ssd/msb/msb_5_5_360/bkp/lsns/$CURDATE

Above, we get the LSN value and assign it to a variable. Similarly, we created a new CURDATE string for our incremental backup to use and created a new directory for the xtrabackup_checkpoints file. If you plan to create another incremental backup based off of what we are about to take now, you will use this next xtrabackup_checkpoints file to get LAST_LSN.
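The checkpoint parsing can be tried on its own against a mock xtrabackup_checkpoints file (the field layout follows what XtraBackup writes via --extra-lsndir; the LSN values here are made up):

```shell
# Simulate a saved xtrabackup_checkpoints file and pull out to_lsn,
# the value fed to --incremental-lsn for the next incremental backup.
lsndir=$(mktemp -d)
cat > "$lsndir/xtrabackup_checkpoints" <<'EOF'
backup_type = full-backuped
from_lsn = 0
to_lsn = 1626007
last_lsn = 1626007
EOF

# Same grep/cut pipeline as above, with tr added to strip the padding
# around the '=' sign.
LAST_LSN=$(grep to_lsn "$lsndir/xtrabackup_checkpoints" \
    | cut -d'=' -f2 | tr -d ' ')
echo "$LAST_LSN"

rm -rf "$lsndir"
```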

With the upcoming Percona XtraBackup 2.2.1, you will no longer need --extra-lsndir nor have to parse the xtrabackup_checkpoints file for this purpose: a new feature will allow the user to save backup metadata to an InnoDB table.

So, now that we have our $LAST_LSN value, we execute our incremental backup with the command:

innobackupex --defaults-file=/ssd/msb/msb_5_5_360/my.sandbox.cnf \
    --extra-lsndir=/ssd/msb/msb_5_5_360/bkp/lsns/$CURDATE \
    --encrypt=AES256 --encrypt-key-file=/ssd/msb/msb_5_5_360/bkp/backups.key \
    --no-timestamp --incremental --incremental-lsn $LAST_LSN \
    /ssd/msb/msb_5_5_360/bkp/incr/$CURDATE

Again, based on the output, my backup was created at:

innobackupex: Backup created in directory '/ssd/msb/msb_5_5_360/bkp/incr/2014-04-23_01_21_00'
140423 01:21:47  innobackupex: Connection to database server closed
140423 01:21:47  innobackupex: completed OK!

Now we have a full backup and an incremental backup. Of course, to make sure our backups are usable, we’d like to validate them. To do that, our first step is to decrypt both the full and incremental backups. innobackupex has another handy --decrypt option for that; you can even use --parallel to make it faster.

innobackupex --decrypt=AES256 \
    --encrypt-key-file=/ssd/msb/msb_5_5_360/bkp/backups.key \
    /ssd/msb/msb_5_5_360/bkp/full/2014-04-23_01_20_46
innobackupex --decrypt=AES256 \
    --encrypt-key-file=/ssd/msb/msb_5_5_360/bkp/backups.key \
    /ssd/msb/msb_5_5_360/bkp/incr/2014-04-23_01_21_00

Once the backups are decrypted, we can go through the usual process of preparing the full and incremental backups as described in the manual.

innobackupex --defaults-file=/ssd/msb/msb_5_5_360/my.sandbox.cnf \
    --apply-log --redo-only /ssd/msb/msb_5_5_360/bkp/full/2014-04-23_01_20_46
innobackupex --defaults-file=/ssd/msb/msb_5_5_360/my.sandbox.cnf \
    --apply-log --redo-only /ssd/msb/msb_5_5_360/bkp/full/2014-04-23_01_20_46 \
    --incremental-dir=/ssd/msb/msb_5_5_360/bkp/incr/2014-04-23_01_21_00
innobackupex --defaults-file=/ssd/msb/msb_5_5_360/my.sandbox.cnf \
    --apply-log /ssd/msb/msb_5_5_360/bkp/full/2014-04-23_01_20_46

The post Encrypted and incremental MySQL backups with Percona XtraBackup appeared first on MySQL Performance Blog.


How to Setup Centralized Authentication of ClusterControl Users with LDAP

April 24, 2014 By Severalnines

ClusterControl 1.2.6 introduces integration with Active Directory and LDAP authentication. This allows users to log into ClusterControl by using their corporate credentials instead of a separate password. LDAP groups can be mapped onto ClusterControl user groups to apply roles to the entire group, so it is very convenient for larger organizations who have a centralized LDAP-compliant authentication system. This blog shows you how to configure LDAP authentication in ClusterControl, and allow users to use their Active Directory or LDAP username and password to log in to ClusterControl. 


LDAP authentication can be configured from ClusterControl, in the Admin dashboard (ClusterControl > Admin > LDAP Settings). If you are running ClusterControl v1.2.5 or older, please upgrade to v1.2.6. It requires the PHP LDAP package to be installed and enabled. To install it, simply use the following command:

For Red Hat-based:

$ yum install php-ldap

For Debian-based:

$ apt-get install php5-ldap

You may need to restart Apache web server to ensure that PHP LDAP is loaded properly. If you are upgrading your ClusterControl installation using the s9s_upgrade_cmon script, PHP LDAP will be installed automatically.


Organization Structure


The following is a simple organization structure running on OpenLDAP:

dc=severalnines,dc=org (2)

  • ou=group (4)
    • cn=Database Administrator
    • cn=Manager
    • cn=System Administrator
    • cn=Programmer
  • ou=people (4)
    • cn=Eric Steven
    • cn=Matthew Carey
    • cn=Michael Joshua
    • cn=Ronnie Harp



Really large NLP corpora

Jeeze people. You’re all noisy. I’m sure it was all done for posterity’s sake.

23M  irclogs/MagNET/#perl.log
29M  irclogs/freenode/#mysql.log
36M  irclogs/freenode/#debian.log
37M  irclogs/foonetic/#xkcd.log
39M  irclogs/OFTC/#debian.log
43M  irclogs/freenode/#jquery.log
44M  irclogs/freenode/#perl.log

FYI: Galera is just a provider

It's no news that Galera is the synchronous multi-master solution for MySQL.
But Galera is just a provider.

What do you mean by provider?

Remember configuring “Galera”?
There are two options
wsrep_provider and wsrep_provider_options
These define the provider (Galera) and the options for the provider.
The rest of the wsrep_ options are for wsrep.

wsrep API defines a set of application callbacks and replication library calls necessary to implement synchronous writeset replication of transactional databases and similar applications. It aims to abstract and isolate replication implementation from application details. Although the main target of this interface is a certification-based multi-master replication, it is equally suitable for both asynchronous and synchronous master/slave replication.

So yes, other providers than galera are possible \o/

For example, an asynchronous replication provider could already benefit from the parallel applying provided by wsrep :)

Have fun



Concurrent, read-only, not cached: MongoDB, TokuMX, MySQL

I repeated the tests described here using a database larger than RAM. The test database has 8 collections/tables with 400M documents/rows per table. I previously reported results for this workload using a server with 24 CPU cores and a slightly different flash storage device. This time I provide a graph and use a server with more CPU cores. The goals for this test are to determine whether the DBMS can use the capacity of a high-performance storage device, to measure the impact of different filesystem readahead settings for MongoDB and TokuMX, and to measure the impact of different read page sizes for TokuMX and InnoDB. It will take two blog posts to share everything. I think I will have much better QPS for MongoDB and TokuMX in my next post so I won't list any conclusions here.
Setup

I used my forked Java and C sysbench clients. The test query fetches one document/row by PK. The test database has 8 collections/tables with 400M rows per collection/table. All are in one database. I still need to enhance the Java sysbench client to support a database per collection. I tested the configurations listed below. I don't think these are the best configurations for TokuMX and MongoDB and am running more tests to confirm. The test server has 144G RAM, 40 CPU cores and a fast flash storage device.
  • fb56.handler - 740G database, MySQL 5.6.12 with the Facebook patch, InnoDB, page_size=8k, data fetched via HANDLER
  • fb56.sql - 740G database, MySQL 5.6.12 with the Facebook path, InnoDB, page_size=8k, data fetched via SELECT
  • orig57.handler - 740G database, official MySQL 5.7.4, InnoDB, page_size=8k, data fetched via HANDLER. 
  • orig57.sql - 740G database, official MySQL 5.7.4, InnoDB, page_size=8k, data fetched via SELECT
  • tokumx32 - 554G database, TokuMX 1.4.1, quicklz, readPageSize=32K, 16K filesystem readahead
  • tokumx64 - 582G database, TokuMX 1.4.1, quicklz, readPageSize=64K, 32K filesystem readahead
  • mongo24 - 834G database, MongoDB 2.4.9, powerOf2Sizes=0, 16K filesystem readahead
  • mongo26 - 874G database, MongoDB 2.6.0, powerOf2Sizes=1, 16K filesystem readahead
Results

Results for MySQL 5.7.4 are not in the graph to keep it readable; they are similar to MySQL 5.6.12. Note that MySQL is able to get more than 100,000 QPS at high concurrency, TokuMX reaches 30,000 and MongoDB isn't able to reach 20,000. I think MongoDB and TokuMX can do a lot better when I reduce the filesystem readahead for both and reduce the read page size for TokuMX; results for that are in my next post. MongoDB also suffers in this test because the PK index is so large that all leaf nodes cannot fit in RAM, so there is more than one disk read per query. This isn't something that goes away via tuning. The workaround is to make sure the database:RAM ratio isn't too big (and spend more money on hardware).
This lists the QPS from the graph.

point queries per second
     8     16     32     40  clients
 39928  63542 102294 107769  fb56.handler
 33630  56834  91132 102336  fb56.sql
 39714  63359 101987 106205  orig57.handler
 33561  56725  90900 101476  orig57.sql
 12586  22738  31407  32167  tokumx32
 10119  16373  18310  18232  tokumx64
 12782  16639  17350  17435  mongo24
 12503  17474  17988  18022  mongo26
Analysis

These tables list the average disk read rate from iostat r/s and the average number of disk reads per query. InnoDB is by far the most efficient, with the smallest number of disk reads per query. TokuMX benefits from having the smallest database courtesy of quicklz compression but might suffer from a larger read page size (32k and 64k). But I don't think that is the only reason why the disk reads per query ratio is so much larger for TokuMX than for InnoDB. I am repeating tests with an 8k read page size to confirm. MongoDB suffers from a PK index that is too large to be cached, so disk reads are done for it and the document store. Both TokuMX and MongoDB might also do extra reads because of the filesystem readahead and I am repeating tests with smaller values for it to confirm.

iostat r/s
     8     16     32     40  clients
 33661  53502  86028  90616  fb56.handler
 29120  49155  78748  88423  fb56.sql
 33776  53702  86193  89755  orig57.handler
 29244  49268  78801  88027  orig57.sql
 26756  47813  65885  67840  tokumx32
 23728  37442  41357  42089  tokumx64
 18966  24440  25147  25322  mongo24
 18312  25313  25701  25781  mongo26

disk reads per query
     8     16     32     40  clients
  0.84   0.84   0.84   0.84  fb56.handler
  0.86   0.86   0.86   0.86  fb56.sql
  0.85   0.84   0.84   0.84  orig57.handler
  0.87   0.86   0.86   0.86  orig57.sql
  2.12   2.10   2.09   2.10  tokumx32
  2.34   2.28   2.25   2.29  tokumx64
  1.48   1.46   1.44   1.45  mongo24
  1.54   1.44   1.42   1.43  mongo26
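The disk-reads-per-query table is derived by dividing iostat r/s by QPS; for example, for fb56.handler at 8 clients (numbers taken from the tables above):

```shell
# One cell of the disk-reads-per-query table: iostat r/s divided by QPS
# for fb56.handler at 8 clients (33661 r/s, 39928 QPS).
ratio=$(awk 'BEGIN { printf "%.2f", 33661 / 39928 }')
echo "$ratio"
```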


RW locks are hard

MongoDB and TokuMX saturated at a lower QPS rate than MySQL when running read-only workloads on a cached database with high concurrency. Many of the stalls were on the per-database RW-lock and I was curious about the benefit from removing that lock. I hacked MongoDB to not use the RW-lock per query (not safe for production) and repeated the test. I got less than 5% more QPS at 32 concurrent clients. I expected more, looked at performance with PMP and quickly realized there were several other sources of mutex contention that are largely hidden by contention on the per-database RW-lock. So this problem won't be easy to fix but I think it can be fixed.

The easy way to implement a reader-writer lock protects all internal state (reader and writer counts and wait queues) with one internal mutex, "mtx" in what follows, and makes waiters sleep on condition variables. That includes pthread_rwlock_t in glibc, the last time I checked, and the per-database RW-lock used by MongoDB. InnoDB used this pattern many years ago and then we rewrote it to make InnoDB better on multi-core. An implementation like this tends to have problems on multi-core servers. The first problem comes from locking/unlocking the internal mutex at least twice per use: once to get the RW-lock in read or write mode and then again to unlock it. When there is contention it can be locked/unlocked many more times than twice per use by threads that wait, wake up, and then wait again. If the operation protected by this RW-lock is very fast then a plain mutex is usually a better choice. Note that even when all threads are trying to lock in read mode there is still contention on the internal mutex ("mtx"). Another problem occurs when the thread trying to unlock the RW-lock is blocked trying to lock the internal state mutex ("mtx"). There might be other threads ready to run as soon as the unlock gets through, but the unlock is stalled because incoming lock requests are competing for the same mutex. I have seen many PMP thread stacks where the unlocking thread is stuck on the lock_mutex call.


Something better

The alternative that scales better is to use a lock-free approach to get and set internal state in the RW-lock. We did this as part of the Google MySQL patch many years ago and that code was contributed upstream. Such an approach removes much of the contention added by an inefficient RW-lock. It won't prevent contention added because threads want the lock in read and write mode at the same time. That still requires some threads to wait. When we did the work at Google on the InnoDB RW-lock, Yasufumi Kinoshita was working on a similar change. I am very happy he continues to make InnoDB better.

A lock-free implementation for a DBMS is likely to be much more complex than what you might read about on the web or a top-tier systems conference paper. There is more complexity because of the need to support performance monitoring, manageability, special semantics and the occasional wrong design decision. For performance monitoring we need to know how frequently a lock is used and how long threads wait on it. For manageability we need to know what threads wait on a lock and which thread holds it. A frequent pattern is for today's special semantics to become tomorrow's design decisions that we regret. But we can't expect perfection given the need to move fast and the rate at which hardware changes.

The low-level reader-writer lock in MongoDB, QLock, is a RW-lock with special semantics. It has two modes each for read and write locks:  r, R, w and W. It also supports upgrades and downgrades: W to R, R to W, w to X and X to w (I didn't mention X above). Internally there are 6 condition variables, one each for r, R, w and W and then two others, U and X, to support upgrades and downgrades. Read the source for more details. I don't understand the code enough to guess whether lock-free state changes can be supported as they were for the InnoDB RW-lock.
MongoDB details

I spent a few hours browsing the source for the MongoDB RW-lock and these are my notes. I hope they help you; otherwise they will be a reference for me in the future. Queries that call find to fetch one row by PK start to run in mongod via the newRunQuery function. That function gets the per-database RW-lock in read mode by creating a Client::ReadContext object on the stack, and ReadContext acquires the lock:
    /** "read lock, and set my context, all in one operation"     *  This handles (if not recursively locked) opening an unopened database.     */    Client::ReadContext::ReadContext(const string& ns, const std::string& path) {        {            lk.reset( new Lock::DBRead(ns) );            Database *db = dbHolder().get(ns, path);            if( db ) {                c.reset( new Context(path, ns, db) );                return;            }        }        ...
The dbHolder().get() call above locks a mutex in DatabaseHolder while using the database name to find the database object. There is simple string searching while the mutex is locked. It might be easy to move some of that work outside the scope of the mutex and perhaps use a mutex per hash table bucket.

        Database * get( const string& ns , const string& path ) const {
            SimpleMutex::scoped_lock lk(_m);
            Paths::const_iterator x = _paths.find( path );
            if ( x == _paths.end() )
                return 0;
            const DBs& m = x->second;
            string db = _todb( ns );
            DBs::const_iterator it = m.find(db);
            if ( it != m.end() )
                return it->second;
            return 0;
        }

        static string __todb( const string& ns ) {
            size_t i = ns.find( '.' );
            if ( i == string::npos ) {
                uassert( 13074 , "db name can't be empty" , ns.size() );
                return ns;
            }
            uassert( 13075 , "db name can't be empty" , i > 0 );
            return ns.substr( 0 , i );
        }


Let's get back to the DBRead constructor that was called in the ReadContext constructor above. It calls lockDB to do the real work. The code below will call other functions that lock mutexes, but no mutex is held by the caller of the code below. In my case the block with "if (DB_LEVEL_LOCKING_ENABLED)" is entered and lockTop gets called to do the real work.

    Lock::DBRead::DBRead( const StringData& ns )
        : ScopedLock( 'r' ), _what(ns.toString()), _nested(false) {
        lockDB( _what );
    }

    void Lock::DBRead::lockDB(const string& ns) {
        fassert( 16254, !ns.empty() );
        LockState& ls = lockState();

        Acquiring a(this,ls);

        if ( ls.isRW() )
            StringData db = nsToDatabaseSubstring(ns);
            Nestable nested = n(db);
            if( !nested )
            if( nested )
        else {
            _locked_r = true;

Well, lockTop doesn't do the real work during my benchmark. It calls qlk.lock_r to do that.
    void Lock::DBRead::lockTop(LockState& ls) {
        switch( ls.threadState() ) {
        case 'r':
        case 'w':
            break;
        default:
            verify(false);
        case  0  :
            qlk.lock_r();
            _locked_r = true;
        }
    }
Almost there, just one more level of indirection. The call to qlk.lock_r calls the lock_r method on an instance of QLock and then something gets done.
    void lock_r() {
        verify( threadState() == 0 );
        lockState().lockedStart( 'r' );
        q.lock_r();
    }
    inline void QLock::lock_r() {
        boost::mutex::scoped_lock lk(m);
        while( !r_legal() ) {
            r.c.wait(m);
        }
        r.n++;
    }
Eventually the unlock_r method is called for the same instance of QLock. I won't show the route there however.
    inline void QLock::unlock_r() {
        boost::mutex::scoped_lock lk(m);
        fassert(16137, r.n > 0);
        --r.n;
        notifyWeUnlocked('r');
    }
And notifyWeUnlocked provides the special semantics. This includes not letting a new reader in when there is a pending write request. The code below also wakes all waiting write requests when one is waiting. This might cause many threads to be scheduled to run even though at most one will get the RW-lock. InnoDB does something similar.
    inline void QLock::notifyWeUnlocked(char me) {
        fassert(16201, W.n == 0);
        if ( me == 'X' ) {
            X.c.notify_all();
        }
        if( U.n ) {
            // U is highest priority
            if( (r.n + w.n + W.n + X.n == 0) && (R.n == 1) ) {
                U.c.notify_one();
                return;
            }
        }
        if ( X_legal() && i_block(me, 'X') ) {
            X.c.notify_one();
        }
        if ( W_legal() && i_block(me, 'W') ) {
            W.c.notify_one();
            if( _areQueueJumpingGlobalWritesPending() )
                return;
        }
        if ( R_legal_ignore_greed() && i_block(me, 'R') ) {
            R.c.notify_all();
        }
        if ( w_legal_ignore_greed() && i_block(me, 'w') ) {
            w.c.notify_all();
        }
        if ( r_legal_ignore_greed() && i_block(me, 'r') ) {
            r.c.notify_all();
        }
    }
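The wake-everyone behavior is easy to demonstrate in miniature. Below is a hedged Python sketch (illustrative only, not MongoDB's code): four "writer" threads wait on one condition variable; notify_all() wakes all of them, but only the first one to re-acquire the mutex and see the lock free actually gets it, and the rest go back to waiting. That wasted scheduling is the cost the paragraph above describes.

```python
import threading
import time

cond = threading.Condition()
state = {"held": True, "stop": False}  # the lock starts out held
grants = []

def writer(i):
    with cond:
        # woken by notify_all(); must re-check the predicate
        while state["held"] and not state["stop"]:
            cond.wait()
        if not state["stop"]:
            state["held"] = True   # exactly one waiter wins the lock
            grants.append(i)

threads = [threading.Thread(target=writer, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
time.sleep(0.2)            # let all the writers block on the condition

with cond:
    state["held"] = False
    cond.notify_all()      # wakes every waiting writer...

time.sleep(0.2)
with cond:
    state["stop"] = True   # release the losers so the demo can exit
    cond.notify_all()
for t in threads:
    t.join()

print(len(grants))         # ...but only one of them got the lock
```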

PlanetMySQL Voting: Vote UP / Vote DOWN

MariaDB 5.5.37 Overview and Highlights

MariaDB 5.5.37 was recently released (it is the latest MariaDB 5.5), and is available for download here:

This is a maintenance release, and so there are not too many big changes of note, just a number of normal bug fixes. However, there are a few items worth mentioning:

  • Includes all bugfixes and updates from MySQL 5.5.37
  • XtraDB updated to the version from Percona Server 5.5.36-34.0
  • TokuDB updated to version 7.1.5
  • Default compression for TokuDB is now TOKUDB_ZLIB (instead of TOKUDB_UNCOMPRESSED)
  • The MariaDB Audit Plugin is now included.

Given the number of InnoDB and XtraDB fixes (in 5.5.37 and 5.5.36 respectively), if you are using InnoDB or XtraDB+ in MariaDB 5.5, then I would definitely recommend upgrading to MariaDB 5.5.37.

If interested, there is more about the 5.5.37 release here:

And the full list of fixed bugs and changes in MariaDB 5.5.37 can be found here:

Hope this helps.



Newbie password mistake

I received a panic call from a newbie MySQL DBA. Or should I say the ‘Linux Admin’/’MySQL DBA’/’CSS guru’/’PHP Programmer’/’Network Admin’/’Backup Operator’/’CIO’ of a small business. He had reset his password and was now locked out. Luckily, he had only changed his own password and still had root access.

What he did:
mysql> use mysql;
SET PASSWORD for 'mrdoesall'@'10.%' = 'bigsecret';

Long-time MySQL DBAs should be groaning at this with a wince, remembering when it happened to them. For those of you who did not catch the problem: the Password column of the mysql.user table is now set to the literal string ‘bigsecret’. When our friend tries to log in, the password he types is hashed and compared to the value in user.Password. The hashed value does not match the stored plain string, and the login fails.

What he meant to do:
mysql> use mysql;
SET PASSWORD for 'mrdoesall'@'10.%' = PASSWORD('bigsecret');
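To see why the first form can never work, here is a Python sketch (an illustration, not from the original post) of what MySQL's PASSWORD() computes for native authentication in the 5.x series: a 41-character hash. That hash is what login compares against, so a stored literal like 'bigsecret' can never match.

```python
import hashlib

def mysql_password_hash(password: str) -> str:
    """Equivalent of MySQL 5.x PASSWORD() for native authentication:
    '*' followed by the uppercase hex of SHA1 applied twice."""
    stage1 = hashlib.sha1(password.encode("utf-8")).digest()
    stage2 = hashlib.sha1(stage1).hexdigest().upper()
    return "*" + stage2

# The server stores this 41-character value, not the word itself,
# so comparing it against the literal 'bigsecret' always fails.
print(mysql_password_hash("bigsecret"))
```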

So with the help of the root account, all was resolved. The CLI can let you trip over your own feet, which is one of the reasons I recommend MySQL Workbench to novices and non-full-time DBAs.


MySQL Connect — Reviewing the submitted presentations

The presentations submitted for MySQL Connect are amazing. This year I am once again very fortunate, and slightly cursed, to be on the selection committee. Fortunate in that I get a glimpse of some truly amazing work being done with MySQL. And cursed in that there are a lot of great talks I still want to see that will not make it.

We had over 190 submissions for 50-odd speaking slots. In the past there has been some overlap, with two members of the same engineering team proposing slight variations on a topic that could be combined. Sometimes there are blatantly obvious marketing sessions that will not only speed up your queries and brighten your teeth, but also walk your dog. Usually it is easy to pick out a few obviously poor proposals. But not this year. They are all pretty much technical raw meat, dripping with staggering loads of vital information. Submissions came from the usual players you would expect to hear at a MySQL conference (Oracle, Percona, Pythian, Yahoo, Mozilla, SkySQL, etc.) and from an amazing number of people in the community.

So please ignore several of us while we tear our hair out and wish for 26-hour conference days. We hope you will be eager for MySQL Connect to arrive when you see the list of presentations.


Percona Live 2014 behind; MySQL ahead

I started using MySQL 11 years ago.  That’s not too long compared to other people in the industry, but nonetheless here’s my perspective on the state of the MySQL industry after attending Percona Live MySQL Conference & Expo 2014.

In short, the attitude around MySQL has changed from “Does it work?” to “Is it fast and reliable?” to “How do we manage it?” To generalize further, these periods correspond roughly to the original MySQL AB team, then to Percona and Oracle; the last period is the current one, in which key players like WebScaleSQL are emerging.

Does it work?

Peter Zaitsev said in one of his keynote talks that MySQL used to be considered a toy.  Today that assessment is wrong, or at least very difficult to defend.  The proof is that nearly all of the largest and most successful web companies today use it (Facebook, Twitter, Google, Tumblr, Box, etc.), and myriad other web and traditional companies use it too.  MySQL works, but it’s not a panacea, and successful companies realize this.  I’ll talk more about this later.

Is it fast and reliable?

There have been rough spots, like MySQL 5.0 and the MySQL-Sun-Oracle transition, but MySQL is past these.  The history is, of course, more nuanced but generally speaking those rough spots gave rise to Percona.  Fast and reliable has been at the heart of Percona Server even before it was Percona Server (i.e. when it was still a collection of patches).  Other projects and companies were created during this time, but in my biased opinion Percona held the fort.  When MySQL became an Oracle product, the collective MySQL conscience waited to see if they would kill or revive it.  They have revived it.  MySQL 5.6 is great, and 5.7 is looking good so far too.  The work Percona did and still does combined with Oracle’s work has made MySQL a product you can bet the business on.  In other words: MySQL won’t fail you.  Moreover, the work at companies like Fusion-io proves that the state of the art apropos performance is alive and well, as highlighted by Nisha Talagala’s excellent keynote “The Evolution of MySQL in the All-Flash Datacenter.”

How do we manage it?

MySQL has become a business.  Let me put it another way that’s difficult to say because I consider myself a hacker but I think it’s true nonetheless (and I’ve heard others say it too): MySQL isn’t cool any more.  “Cool” is, in the context of technology, a weird euphemism for “new and unstable but useful and promising”.  MySQL was all these in past years, but now it’s mature, proven to be stable and useful, and it has delivered on the promise of being a free, open-source RDBMS that’s stable and useful.  As a business product, the concern is manageability: deploying, scaling, monitoring, maintaining, etc.  These are not new concerns; the difference today is focus: in the past these mattered less because we still had core usability and performance issues, but today MySQL usability and performance are solved problems.  Mark Callaghan’s PLMCE 2012 keynote was aptly titled: “What Comes Next?”  In 2012 he saw that MySQL at core was stable, so he turned his attention to things around it which can be pain points, like migrating shards and row compression.  In other words, his message was not “here’s what we still need to fix in MySQL”, it was “here’s what we need to manage MySQL sanely.”  He reiterated this message in a recent blog post, “Modern databases”:

“We have much needed work on write-optimized database algorithms – Tokutek, LevelDB, RocksDB, HBase, Cassandra. We also get reports of amazing performance. I think there is too much focus on peak performance and not enough on predictable performance and manageability.”

In my humble opinion, this is the current state of the MySQL industry: learning, developing, and establishing how to manage MySQL.  Although the new WebScaleSQL collaboration is focused prima facie on performance at scale, as Mark said in his blog post, “Predictable performance is part of manageability.”  There are many other companies and projects for managing various aspects of MySQL, like ClusterControl for MySQL Galera by Severalnines and Propagator by Outbrain (both were at PLMCE this year).

Earlier I said “MySQL works, but it’s not a panacea and successful companies realize this.”  Successful companies like Dyn (who presented this year) use MySQL and other technologies.  It’s important to realize that MySQL is one part of a business.  The other parts are Hadoop, Redis, memcached, etc.  OpenStack and other cloud platforms are increasingly mentioned, too.  Therefore, managing MySQL is only half the story.  The other half is understanding MySQL’s place in and interaction with other business technologies.

In summary, for me Percona Live MySQL Conference & Expo 2014 highlighted how MySQL has become one castle in the kingdom whereas 10 years ago it was an outpost on the frontier.  People no longer ask “is MySQL fast and reliable?” Instead they ask, “how can we manage 100 MySQL instances and a handful of other technologies with 2 full-time DBAs?”  The MySQL industry will continue to add features and improve performance, but we have shifted from doing that in the service of making a stable product to making a manageable product.

The post Percona Live 2014 behind; MySQL ahead appeared first on MySQL Performance Blog.


It is event season for SkySQL and MariaDB: Where will we see you next?

The conference season has begun for us here at SkySQL. It seems like almost all of the Linux and MySQL shows are in the next four months. This is the time of year that we get to collect those airline points, have our beds made for us, eat out a few times a day and put on a few pounds! I have been practicing for this all year. Check out our Events Section to see all of our upcoming events.

read more



Updates from VividCortex

What's new in VividCortex? We'll do these blog posts once a week to let you know all of the changes such as new features, bug fixes, performance improvements, and even a peek into the future.

Upcoming Events

If you haven't seen the VividCortex application in action, please register for a live demo on production systems on May 14th.

Catch us at the following upcoming events:

Upcoming Improvements
  • Agent management UI in the web application.
  • Agent performance improvements.
  • Events dashboard redesign and performance improvements.
  • Replication dashboard, for real-time live updating view of your replication topology and status.
Improvements This Week
  • Added the ability to pin agents to a specified version on specific hosts, all hosts in an environment, and for hosts that have not yet been registered (future host registrations). This is useful for hosts that may be very sensitive and for which a careful evaluation is desired before allowing the agents to upgrade themselves. It's not user-controllable yet, but a new agent management UI is in development, and this will allow users to manage pinned hosts easily. For now, if you want to take advantage of this feature, please use the Intercom icon at the bottom of your left navigation bar to request us to manage it for you.
  • Greatly improved performance of some edge cases in Top Queries, Top Processes, and similar.

Webapp bug fixes:

  • Fixed a layout bug in the Graphs page when it initially loads.
  • Normalized data from sparklines in Top Queries and related views, so the units are nicely consistent.
  • Fixed a problem with the Host Filter not doing type-ahead suggestions correctly.
  • Removed an inappropriate metric from the InnoDB Buffer Pool graph.

Agent bug fixes:

  • General agent performance improvement in all agents, via an update to the Go version we use to build agents.
  • Fixed a problem that could cause vc-mysql-query to be unresponsive and/or use too much CPU while idle.
  • De-duplicated some disk-usage data that was double-reported in some cases.
  • Added the ability to generate metrics for COM_STMT_CLOSE commands in the MySQL protocol. Because no response is returned to the client for this command, these events weren't being counted.
  • Fixed several cases where idle servers could be diagnosed with false positive faults.
  • Improved the probabilistic query sampling algorithm.
  • Fixed a problem that caused vc-aggregator to stop sending metrics after network errors.
  • Made vc-mysql-metrics behave more nicely when its MySQL login credentials are changed.
Agent Performance This Week

Here's a graph of agent performance on a typical server over the last week. We're investigating the slight increase in CPU usage by vc-agent-007, but notice that it's only using 0.1779% of a CPU anyway, so that's not a big concern.


Ubuntu 14.04 – some MySQL ecosystem notes

Following my previous post on the launch, I just rolled Ubuntu 14.04 LTS on an Amazon EC2 t1.micro instance (not something you expect to run a database server on, for sure – 1 vCPU, 0.613GiB RAM). If you do an apt-cache search mysql, you get 435 results with the default configuration (trusty: main & universe).

If you do apt-get install mysql-server, you get MySQL 5.5. You enter the password of choice, and before you know it, MySQL is installed (a SELECT VERSION() will return 5.5.35-1ubuntu1).

Next you decide to install MariaDB. I run an apt-get install mariadb-server. It pulls in libjemalloc (for TokuDB) and I expect future releases to ship this engine by default. You enter the password, and you get a new message (as pictured).


I verify my test database that I created exists. It does. A SELECT VERSION() returns 5.5.36-MariaDB-1. The innodb_version returns 5.5.36-MariaDB-33.0.

I’m curious about MySQL 5.6 now. So I run apt-get install mysql-server-5.6. Not so straightforward. 

    start: Job failed to start
    invoke-rc.d: initscript mysql, action "start" failed.
    dpkg: error processing package mysql-server-5.6 (--configure):
     subprocess installed post-installation script returned error exit status 1
    Setting up mysql-common-5.6 (5.6.16-1~exp1) ...
    Processing triggers for libc-bin (2.19-0ubuntu6) ...
    Errors were encountered while processing:
     mysql-server-5.6
    E: Sub-process /usr/bin/dpkg returned an error code (1)

Looks like MySQL 5.6 is more memory hungry… I edited /etc/mysql/my.cnf to ensure that innodb_buffer_pool_size = 64M (I increased this to 128M and it worked too) was set (there was nothing in the default config) and re-ran apt-get install mysql-server-5.6 and it started. My test database was still around ;-)
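For reference, the workaround amounts to one line in the server config; a sketch of the relevant fragment (64M is simply the value that worked on this t1.micro — tune it for your instance):

```ini
# /etc/mysql/my.cnf — the default config shipped with no such setting
[mysqld]
innodb_buffer_pool_size = 64M   ; 128M also worked on this instance
```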

I wanted to make sure that MySQL 5.6 isn’t more memory hungry just on that instance so I created yet another clean t1.micro instance and did an apt-get install mysql-server-5.6. Same error. Reported lp#1311387.

Nothing to report in particular about Percona – 5.5.34 Percona XtraDB Cluster (GPL), Release 31.1 (Ubuntu), wsrep_25.9.rXXXX. One thing is for sure – if you’re playing around with the ecosystem, installs and upgrades aren’t exactly straightforward.

Related posts:

  1. MariaDB 10.0.5 storage engines – check the Linux packages
  2. Using MariaDB on CentOS 6
  3. Testing Fedora 19
