Planet MySQL

Password reuse policy in MySQL 8.0

MySQL has various kinds of password policy enforcement tools: a password can expire (even automatically), can be forced to be of a certain length, contain amounts of various types of characters and be checked against a dictionary of common passwords or the user account name itself.…

MySQL Connector/Java 5.1.46 GA has been released

Dear MySQL Users,

MySQL Connector/J 5.1.46, a maintenance release of the production 5.1
branch has been released. Connector/J is the Type-IV pure-Java JDBC
driver for MySQL.

MySQL Connector Java is available in source and binary form from the
Connector/J download pages at
and mirror sites as well as Maven-2 repositories.

MySQL Connector Java (Commercial) is already available for download on the
My Oracle Support (MOS) website. This release will shortly be available on
eDelivery (OSDC).

As always, we recommend that you check the “CHANGES” file in the
download archive to be aware of changes in behavior that might affect
your application.

MySQL Connector/J 5.1.46 includes the following general bug fixes and
improvements, also available in more detail on

Changes in MySQL Connector/J 5.1.46 (2018-03-12)

Version 5.1.46 is a maintenance release of the production 5.1
branch. It is suitable for use with MySQL Server versions
5.5, 5.6, 5.7, and 8.0. It supports the Java Database
Connectivity (JDBC) 4.2 API.

Functionality Added or Changed

* Because Connector/J restricted TLS versions to v1.1 and
below by default when connecting to MySQL Community
Server 8.0 (which used to be compiled with yaSSL by
default and thus supporting only TLS v1.1 and below), it
failed to connect to to a MySQL 8.0.4 Community Server
(which has been compiled with OpenSSL by default and thus
supports TLS v1.2) that was configured to only allow TLS
v1.2 connections. TLS v1.2 is now enabled for connections
with MySQL Community Server 8.0.4 and later. (Bug

* The bundle for Connector/J 5.1 delivered by Oracle now
contains an additional jar package with the name
(mysql-connector-java-commercial-5.1.ver.jar for
commercial bundles). It is identical with the other jar
package with the original package named
(mysql-connector-java-commercial-5.1.ver-bin.jar for
commercial bundles), except for its more Maven-friendly
file name. (Bug #27231383)

* The lower bound for the connection property
packetDebugBufferSize has been changed to 1, to avoid the
connection errors that occur when the value is set to 0.
(Bug #26819691)

* For multi-host connections, when a MySQL Server was
configured with autocommit=0, Connection.getAutoCommit()
did not return the correct value. This was because
useLocalSessionState=true was assumed for multi-host
connections, which might not be the case, resulting thus
in inconsistent session states.
With this fix, by default, Connector/J executes some
extra queries in the connection synchronization process
to guarantee consistent session states between the client
and the server at any connection switch. This would mean,
however, that when none of the hosts are available during
an attempted server switch, an exception for closed
connection will be thrown immediately while, in earlier
Connector/J versions, there would be a connection error
thrown first before a closed connection error. Error
handling in some applications might need to be adjusted
Applications can skip the new session state
synchronization mechanism by having
useLocalSessionState=true. (Bug #26314325, Bug #86741)

* Connector/J now supports the new caching_sha2_password
authentication plugin for MySQL 8.0, which is the default
authentication plugin for MySQL 8.0.4 and later (see
Caching SHA-2 Pluggable Authentication
gable-authentication.html) for details).
To authenticate accounts with the caching_sha2_password
plugin, either a secure connection to the server using
reference-using-ssl.html) or an unencrypted connection
that supports password exchange using an RSA key pair
(enabled by setting one or both of the connecting
properties allowPublicKeyRetrieval and
serverRSAPublicKeyFile) must be used.
Because earlier versions of Connector/J 5.1 do not
support the caching_sha2_password authentication plugin
and therefore will not be able to connect to accounts
that authenticate with the new plugin (which might
include the root account created by default during a new
installation of a MySQL 8.0 Server), it is highly
recommended that you upgrade now to Connector/J 5.1.46,
to help ensure that your applications continue to work
smoothly with the latest MySQL 8.0 Server.

Bugs Fixed

* When Connector/J 5.1.44 or earlier connected to MySQL
5.7.20 or later, warnings are issued because Connector/J
used the deprecated system variables tx_isolation and
tx_read_only. These SQL-level warnings, returned from a
SHOW WARNINGS statement, might cause some applications to
throw errors and stop working. With this fix, the
deprecated variables are no longer used for MySQL 5.7.20
and later; also, to avoid similar issues, a SHOW WARNINGS
statement is no longer issued for the use of deprecated
variables. (Bug #27029657, Bug #88227)

* When the default database was not specified for a
connection, the connection attributes did not get stored
in the session_connect_attrs table in the Performance
Schema of the MySQL Server. (Bug #22362474, Bug #79612)

On Behalf of Oracle/MySQL Release Engineering Team
Hery Ramilison

MySQL 8.0 : meta-data added to Performance_Schema’s Instruments

In MySQL 8.0, the engineers have added useful meta-data to the table SETUP_INSTRUMENT. This table lists the classes of instrumented objects for which events can be collected.


Let’s have a quick look at these new columns:

PROPERTIES can have the following values

  • global_statistics: only global summaries are available for this instrument. Example: memory/performance_schema/metadata_locks that return the memory used for table performance_schema.metadata_locks
  • mutable: only applicable for statement instruments as they can “mutate” into a more specific one. Example: statement/abstract/relay_log that returns the new event just read from the relay log.
  • progress: applied only to stage instruments, it reports progress data. Example: stage/sql/copy to tmp table
  • singleton: instruments having a single instance. Example: wait/synch/mutex/sql/LOCK_error_log, like most global mutex locks this lock on error log is a singleton.
  • user: instrument related to user workload. Example: the instrument on idle


This define the life or creation occurrence of the instrument. The possible values from low to high are:

  • 0 : unknown
  • 1 : permanent
  • 2 : provisioning
  • 3 : ddl
  • 4 : cache
  • 5 : session
  • 6 : transaction
  • 7 : query
  • 8 : intra_query

For example, wait/synch/mutex/sql/THD::LOCK_thd_query as a volatility of 5, which means this mutex is created each time a session connects and destroyed when the session disconnects.

There is no point then to enable an instrument for an object already created.


Finally, now there is a documentation column describing the purpose of the instrument. Currently there are 80 instruments documented with the help of that column.

This is an example:

NAME: memory/performance_schema/prepared_statements_instances ENABLED: YES TIMED: NULL PROPERTIES: global_statistics VOLATILITY: 1 DOCUMENTATION: Memory used for table performance_schema.prepared_statements_instances

All this is explained in details in the MySQL’s manual.

Enjoy MySQL 8.0 and I wish you a pleasant discovery of all the new features !

dbdeployer release candidate

The latest release of dbdeployer is possibly the last one with a leading 0. If no serious bugs are found in the next two weeks, the next release will bear a glorious 1.0.

Latest news

The decision to get out of the stream of pre-releases that were published until now comes because I have implemented all the features that I wanted to add: mainly, all the ones that I wished to add to MySQL-Sandbox but it would have been too hard:

The latest addition is the ability of running multi-source topologies. Now we can run four topologies:

  • master-slave is the default topology. It will install one master and two slaves. More slaves can be added with the option --nodes.
  • group will deploy three peer nodes in group replication. If you want to use a single primary deployment, add the option --single-primary. Available for MySQL 5.7 and later.
  • fan-in is the opposite of master-slave. Here we have one slave and several masters. This topology requires MySQL 5.7 or higher.
    all-masters is a special case of fan-in, where all nodes are masters and are also slaves of all nodes.

It is possible to tune the flow of data in multi-source topologies. The default for fan-in is three nodes, where 1 and 2 are masters, and 2 are slaves. You can change the predefined settings by providing the list of components:

$ dbdeployer deploy replication \
--topology=fan-in \
--nodes=5 \
--master-list="1 2 3" \
--slave-list="4 5" \
8.0.4 \

In the above example, we get 5 nodes instead of 3. The first three are master (--master-list="1 2 3") and the last two are slaves (--slave-list="4 5") which will receive data from all the masters. There is a test automatically generated to test replication flow. In our case it shows the following:

$ ~/sandboxes/fan_in_msb_8_0_4/test_replication
# master 1
# master 2
# master 3
# slave 4
ok - '3' == '3' - Slaves received tables from all masters
# slave 5
ok - '3' == '3' - Slaves received tables from all masters
# pass: 2
# fail: 0

The first three lines show that each master has done something. In our case, each master has created a different table. Slaves in nodes 5 and 6 then count how many tables they found, and if they got the tables from all masters, the test succeeds.
Note that for all-masters topology there is no need to specify master-list or slave-list. In fact, those lists will be auto-generated, and they will both include all deployed nodes.

What now?

Once I make sure that the current features are reasonably safe (I will only write more tests for the next 10~15 days) I will publish the first (non-pre) release of dbdeployer. From that moment, I'd like to follow the recommendations of the Semantic Versioning:

  • The initial version will be 1.0.0 (major, minor, revision);
  • The spects for 1.0 will be the API that needs to be maintained.
  • Bug fixes will increment the revision counter.
  • New features that don't break compatibility with the API will increment the minor counter;
  • New features or changes that break compatibility will trigger a major counter increment.

Using this method will give users a better idea of what to expect. If we get a revision number increase, it is only bug fixes. An increase in the minor counter means that there are new features, but all previous features work as before. An increase in the major counter means that something will break, either because of changed interface or because of changed behavior.
In practice, the tests released with 1.0.0 should run with any 1.x subsequent version. When those tests need changes to run correctly, we will need to bump up the major version.

Let's see if this method is sustainable. So far, I haven't had need to do behavioural changes, which are usually provoked by new versions of MySQL that introduce incompatible behavior (definitely MySQL does not follow the Semantic Versioning principles.) When the next version becomes available, I will see if this RC of dbdeployer can stand its ground.

Online Schema Change for Tables with Triggers.

In this post, We will learn how to handle online schema change if the table has triggers.

In PXC, an alter can be made directly ( TOI ) on tables with less than a 1G ( by default) , but on a 20GB or 200GB table we need some downtime to do ( RSU ).

Pt-osc is a good choice for Percona Cluster/Galera. By default percona toolkit’s pt-online-schema-change will create After “insert / update / delete” triggers for maintaining the sync between the shadow and the original table.

pt-online-schema-change process flow:

Check out the complete slides for effective MySQL administration here

If the tables has triggers already then pt-osc wont work well in those cases. It was an limitation with online schema changes.

Still MySQL 5.6, We cannot create multiple triggers for the same event and type.

From Documentation:

There cannot be multiple triggers for a given table that have the same trigger event and action time. For example, you cannot have two BEFORE UPDATE triggers for a table. But you can have a BEFORE UPDATE and a BEFORE INSERT trigger, or a BEFORE UPDATE and an AFTER UPDATE trigger.

On this case, We will have to drop the triggers before starting online schema change. And re-create the triggers after completion of online schema change. In a production environment it’s a complex operation to perform and requires a downtime.

On MySQL 5.6.32:

[root@mysql-5.6.32 ~]# pt-online-schema-change --version pt-online-schema-change 3.0.6 [root@mysql-5.6.32 ~]# pt-online-schema-change D=mydbops,t=employees,h=localhost \ --user=root --alter "drop column test,add column test text" \ --no-version-check --execute The table `mydbops`.`employees` has triggers. This tool needs to create its own triggers, so the table cannot already have triggers.

From MySQL 5.7.2, A table can hold multiple triggers.

From Documentation:

It is possible to define multiple triggers for a given table that have the same trigger event and action time. For example, you can have two BEFORE UPDATE triggers for a table.

Complete list of new features of MySQL 5.7 here. This relaxed the complexity of implementation of pt-osc support for tables with triggers.

pt-online-schema-change – v3.0.4, released on 2017-08-02 came with an option –preserve-triggers. Which added a 5.7 only feature, To allow pt-osc to handle OSC operation even the table has triggers.

We can find interesting discussions and implementation complexities in the following ticket

Even Gh-ost won’t work for PXC without locking the table in MySQL 5.7. Issues

On MySQL 5.7.19:

[root@mysql-5.7.19 ~]# pt-online-schema-change --version pt-online-schema-change 3.0.6 [root@mysql-5.7.19 ~]# pt-online-schema-change D=mydbops,t=employees,h=localhost \ --user=root --alter "drop column test,add column test text" \ --no-version-check --preserve-triggers --execute Operation, tries, wait: analyze_table, 10, 1 copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `mydbops`.`employees`... Creating new table... Created new table mydbops._employees_new OK. Altering new table... Altered `mydbops`.`_employees_new` OK. 2018-03-02T07:27:35 Creating triggers... 2018-03-02T07:27:35 Created triggers OK. 2018-03-02T07:27:35 Copying approximately 10777 rows... 2018-03-02T07:27:35 Copied rows OK. 2018-03-02T07:27:35 Adding original triggers to new table. 2018-03-02T07:27:35 Analyzing new table... 2018-03-02T07:27:35 Swapping tables... 2018-03-02T07:27:36 Swapped original and new tables OK. 2018-03-02T07:27:36 Dropping old table... 2018-03-02T07:27:36 Dropped old table `mydbops`.`_employees_old` OK. 2018-03-02T07:27:36 Dropping triggers... 2018-03-02T07:27:36 Dropped triggers OK. Successfully altered `mydbops`.`employees`.

–preserve-triggers If this option is enabled, pt-online-schema-change will create all the existing triggers to the new table (mydbops._employees_new) after copying rows from the original table (mydbops.employees).

Explained the output with PTDEBUG=1 enabled for better understanding.

1. pt-osc created similar table and applied modifications on it.

# Cxn:3953 2845 DBI::db=HASH(0x2db1260) Connected dbh to mysql-5.7.19 h=localhost # TableParser:3265 2845 SHOW CREATE TABLE `mydbops`.`employees` Creating new table... # pt_online_schema_change:10392 2845 CREATE TABLE `mydbops`.`_employees_new` ( # `employeeNumber` int(11) NOT NULL, # `lastName` varchar(50) DEFAULT NULL, # `firstName` varchar(50) DEFAULT NULL, # `extension` varchar(10) DEFAULT NULL, # `email` varchar(100) DEFAULT NULL, # `officeCode` varchar(10) DEFAULT NULL, # `reportsTo` int(11) DEFAULT NULL, # `jobTitle` varchar(50) DEFAULT NULL, # `test` text, # PRIMARY KEY (`employeeNumber`), # KEY `reportsTo` (`reportsTo`), # KEY `officeCode` (`officeCode`) # ) ENGINE=InnoDB DEFAULT CHARSET=latin1 Created new table mydbops._employees_new OK. Altering new table... # pt_online_schema_change:9192 2845 ALTER TABLE `mydbops`.`_employees_new` drop column test,add column test text Altered `mydbops`.`_employees_new` OK.

2. pt-osc created After [insert / update / delete] triggers to sync the upcoming data between the source table and new table.

# pt_online_schema_change:11058 2845 CREATE TRIGGER `pt_osc_mydbops_employees_del` AFTER DELETE ON `mydbops`.`employees` FOR EACH ROW DELETE IGNORE FROM `mydbops`.`_employees_new` WHERE `mydbops`.`_employees_new`.`employeenumber` <=> OLD.`employeenumber` # pt_online_schema_change:11058 2845 CREATE TRIGGER `pt_osc_mydbops_employees_upd` AFTER UPDATE ON `mydbops`.`employees` FOR EACH ROW BEGIN DELETE IGNORE FROM `mydbops`.`_employees_new` WHERE !(OLD.`employeenumber` <=> NEW.`employeenumber`) AND `mydbops`.`_employees_new`.`employeenumber` <=> OLD.`employeenumber`;REPLACE INTO `mydbops`.`_employees_new` (`employeenumber`, `lastname`, `firstname`, `extension`, `email`, `officecode`, `reportsto`, `jobtitle`, `test`) VALUES (NEW.`employeenumber`, NEW.`lastname`, NEW.`firstname`, NEW.`extension`, NEW.`email`, NEW.`officecode`, NEW.`reportsto`, NEW.`jobtitle`, NEW.`test`);END # pt_online_schema_change:11058 2845 CREATE TRIGGER `pt_osc_mydbops_employees_ins` AFTER INSERT ON `mydbops`.`employees` FOR EACH ROW REPLACE INTO `mydbops`.`_employees_new` (`employeenumber`, `lastname`, `firstname`, `extension`, `email`, `officecode`, `reportsto`, `jobtitle`, `test`) VALUES (NEW.`employeenumber`, NEW.`lastname`, NEW.`firstname`, NEW.`extension`, NEW.`email`, NEW.`officecode`, NEW.`reportsto`, NEW.`jobtitle`, NEW.`test`) 2018-03-02T05:56:46 Created triggers OK.

3. pt-osc copied the existing records.

# pt_online_schema_change:11332 2845 INSERT LOW_PRIORITY IGNORE INTO `mydbops`.`_employees_new` (`employeenumber`, `lastname`, `firstname`, `extension`, `email`, `officecode`, `reportsto`, `jobtitle`, `test`) SELECT `employeenumber`, `lastname`, `firstname`, `extension`, `email`, `officecode`, `reportsto`, `jobtitle`, `test` FROM `mydbops`.`employees` FORCE INDEX(`PRIMARY`) WHERE ((`employeenumber` >= ?)) AND ((`employeenumber` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 2845 copy nibble*/ lower boundary: 0 upper boundary: 2335

4. After existing row copied, pt-osc created the trigger present on the source table to the newly created table (mydbops._employees_new).

# pt_online_schema_change:9795 2845 CREATE DEFINER=`root`@`%` TRIGGER `mydbops`.`mydbops_employee_update` BEFORE UPDATE ON _employees_new # FOR EACH ROW # BEGIN # INSERT INTO employees_audit # SET action = 'update', # employeeNumber = OLD.employeeNumber, # lastname = OLD.lastname, # changedat = NOW(); # END

5. Swapping the table and dropping the triggers.

2018-03-02T05:56:47 Analyzing new table... # pt_online_schema_change:10465 2836 ANALYZE TABLE `mydbops`.`_employees_new` /* pt-online-schema-change */ 2018-03-02T05:56:47 Swapping tables... # pt_online_schema_change:10503 2836 RENAME TABLE `mydbops`.`employees` TO `mydbops`.`_employees_old`, `mydbops`.`_employees_new` TO `mydbops`.`employees` 2018-03-02T05:56:47 Swapped original and new tables OK. 2018-03-02T05:56:47 Dropping old table... # pt_online_schema_change:9937 2845 DROP TABLE IF EXISTS `mydbops`.`_employees_old` 2018-03-02T05:56:47 Dropped old table `mydbops`.`_employees_old` OK. 2018-03-02T05:56:47 Dropping triggers... # pt_online_schema_change:11182 2845 DROP TRIGGER IF EXISTS `mydbops`.`pt_osc_mydbops_employees_del` # pt_online_schema_change:11182 2845 DROP TRIGGER IF EXISTS `mydbops`.`pt_osc_mydbops_employees_upd` # pt_online_schema_change:11182 2845 DROP TRIGGER IF EXISTS `mydbops`.`pt_osc_mydbops_employees_ins` 2018-03-02T05:56:47 Dropped triggers OK. Successfully altered `mydbops`.`employees`.

I hope this gives you a better idea about –preserve-triggers.

Key Takeaways:

  • Upto MySQL 5.6, Only way to alter the tables using pt-osc is to drop the existing triggers and create after the alter done.
  • From MySQL 5.7, We can use –preserve-triggers option of pt-osc for seamless schema changes though we have triggers present on our table.

It gives us one more reason to recommend MySQL 5.7 upgrade. I also feel pt-osc can be provided with support for the tables with before triggers at least for MySQL versions until 5.6.


External Tables + Merge

This is an example of how you would upload data from a flat file, or Comma Separated Value (CSV) file. It’s important to note that in the file upload you are transferring information that doesn’t have surrogate key values by leveraing joins inside a MERGE statement.

Step #1 : Create a virtual directory

You can create a virtual directory without a physical directory but it won’t work when you try to access it. Therefore, you should create the physical directory first. Assuming you’ve created a /u01/app/oracle/upload file directory on the Windows platform, you can then create a virtual directory and grant permissions to the student user as the SYS privileged user.

The syntax for these steps is:

CREATE DIRECTORY upload AS '/u01/app/oracle/upload'; GRANT READ, WRITE ON DIRECTORY upload TO student;

Step #2 : Position your CSV file in the physical directory

After creating the virtual directory, copy the following contents into a file named kingdom_import.csv in the /u01/app/oracle/upload directory or folder. If you attempt to do this in Windows, you need to disable Windows UAC before performing this step.

Place the following in the kingdom_import.csv file. The trailing commas aren’t too meaningful in Oracle but they’re very helpful if you use the file in MySQL. A key element in creating this files requires that you avoid trailing line returns at the bottom of the file because they’re inserted as null values. There should be no lines after the last row of data.

'Narnia',77600,'Peter the Magnificent','20-MAR-1272','19-JUN-1292', 'Narnia',77600,'Edmund the Just','20-MAR-1272','19-JUN-1292', 'Narnia',77600,'Susan the Gentle','20-MAR-1272','19-JUN-1292', 'Narnia',77600,'Lucy the Valiant','20-MAR-1272','19-JUN-1292', 'Narnia',42100,'Peter the Magnificent','12-APR-1531','31-MAY-1531', 'Narnia',42100,'Edmund the Just','12-APR-1531','31-MAY-1531', 'Narnia',42100,'Susan the Gentle','12-APR-1531','31-MAY-1531', 'Narnia',42100,'Lucy the Valiant','12-APR-1531','31-MAY-1531', 'Camelot',15200,'King Arthur','10-MAR-0631','12-DEC-0686', 'Camelot',15200,'Sir Lionel','10-MAR-0631','12-DEC-0686', 'Camelot',15200,'Sir Bors','10-MAR-0631','12-DEC-0635', 'Camelot',15200,'Sir Bors','10-MAR-0640','12-DEC-0686', 'Camelot',15200,'Sir Galahad','10-MAR-0631','12-DEC-0686', 'Camelot',15200,'Sir Gawain','10-MAR-0631','12-DEC-0686', 'Camelot',15200,'Sir Tristram','10-MAR-0631','12-DEC-0686', 'Camelot',15200,'Sir Percival','10-MAR-0631','12-DEC-0686', 'Camelot',15200,'Sir Lancelot','30-SEP-0670','12-DEC-0682',

Step #3 : Reconnect as the student user

Disconnect and connect as the student user, or reconnect as the student user. The reconnect syntax that protects your password is:

CONNECT student@xe

Step #4 : Run the script that creates tables and sequences

Copy the following into a create_kingdom_upload.sql file within a directory of your choice. Then, run it as the student account.

-- Conditionally drop tables and sequences. BEGIN FOR i IN (SELECT table_name FROM user_tables WHERE table_name IN ('KINGDOM','KNIGHT','KINGDOM_KNIGHT_IMPORT')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT sequence_name FROM user_sequences WHERE sequence_name IN ('KINGDOM_S1','KNIGHT_S1')) LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name; END LOOP; END; / -- Create normalized kingdom table. CREATE TABLE kingdom ( kingdom_id NUMBER , kingdom_name VARCHAR2(20) , population NUMBER); -- Create a sequence for the kingdom table. CREATE SEQUENCE kingdom_s1; -- Create normalized knight table. CREATE TABLE knight ( knight_id NUMBER , knight_name VARCHAR2(24) , kingdom_allegiance_id NUMBER , allegiance_start_date DATE , allegiance_end_date DATE); -- Create a sequence for the knight table. CREATE SEQUENCE knight_s1; -- Create external import table. CREATE TABLE kingdom_knight_import ( kingdom_name VARCHAR2(20) , population NUMBER , knight_name VARCHAR2(24) , allegiance_start_date DATE , allegiance_end_date DATE) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY upload ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BAFFLE 'UPLOAD':'kingdom_import.bad' DISCARDFILE 'UPLOAD':'kingdom_import.dis' LOGFILE 'UPLOAD':'kingdom_import.log' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY "'" MISSING FIELD VALUES ARE NULL ) LOCATION ('kingdom_import.csv')) REJECT LIMIT UNLIMITED;

Step #5 : Test your access to the external table

There a number of things that could go wrong with setting up an external table, such as file permissions. Before moving on to the balance of the steps, you should test what you’ve done. Run the following query from the student account to check whether or not you can access the kingdom_import.csv file.

COL kingdom_name FORMAT A8 HEADING "Kingdom|Name" COL population FORMAT 99999999 HEADING "Population" COL knight_name FORMAT A30 HEADING "Knight Name" SELECT kingdom_name , population , knight_name , TO_CHAR(allegiance_start_date,'DD-MON-YYYY') AS allegiance_start_date , TO_CHAR(allegiance_end_date,'DD-MON-YYYY') AS allegiance_end_date FROM kingdom_knight_import;

Step #6 : Create the upload procedure

Copy the following into a create_upload_procedure.sql file within a directory of your choice. Then, run it as the student account.

-- Create a procedure to wrap the transaction. CREATE OR REPLACE PROCEDURE upload_kingdom IS BEGIN -- Set save point for an all or nothing transaction. SAVEPOINT starting_point; -- Insert or update the table, which makes this rerunnable when the file hasn't been updated. MERGE INTO kingdom target USING (SELECT DISTINCT k.kingdom_id , kki.kingdom_name , kki.population FROM kingdom_knight_import kki LEFT JOIN kingdom k ON kki.kingdom_name = k.kingdom_name AND kki.population = k.population) source ON (target.kingdom_id = source.kingdom_id) WHEN MATCHED THEN UPDATE SET kingdom_name = source.kingdom_name WHEN NOT MATCHED THEN INSERT VALUES ( kingdom_s1.nextval , source.kingdom_name , source.population); -- Insert or update the table, which makes this rerunnable when the file hasn't been updated. MERGE INTO knight target USING (SELECT kn.knight_id , kki.knight_name , k.kingdom_id , kki.allegiance_start_date AS start_date , kki.allegiance_end_date AS end_date FROM kingdom_knight_import kki INNER JOIN kingdom k ON kki.kingdom_name = k.kingdom_name AND kki.population = k.population LEFT JOIN knight kn ON k.kingdom_id = kn.kingdom_allegiance_id AND kki.knight_name = kn.knight_name AND kki.allegiance_start_date = kn.allegiance_start_date AND kki.allegiance_end_date = kn.allegiance_end_date) source ON (target.kingdom_allegiance_id = source.kingdom_id) WHEN MATCHED THEN UPDATE SET allegiance_start_date = source.start_date , allegiance_end_date = source.end_date WHEN NOT MATCHED THEN INSERT VALUES ( knight_s1.nextval , source.knight_name , source.kingdom_id , source.start_date , source.end_date); -- Save the changes. COMMIT; EXCEPTION WHEN OTHERS THEN ROLLBACK TO starting_point; RETURN; END; /

Step #7 : Run the upload procedure

You can run the file by calling the stored procedure built by the script. The procedure ensures that records are inserted or updated into their respective tables.

EXECUTE upload_kingdom;

Step #8 : Test the results of the upload procedure

You can test whether or not it worked by running the following queries.

-- Check the kingdom table. SELECT * FROM kingdom; -- Format Oracle output. COLUMN knight_id FORMAT 999 HEADING "Knight|ID #" COLUMN knight_name FORMAT A23 HEADING "Knight Name" COLUMN kingdom_allegiance_id FORMAT 999 HEADING "Kingdom|Allegiance|ID #" COLUMN allegiance_start_date FORMAT A11 HEADING "Allegiance|Start Date" COLUMN allegiance_end_date FORMAT A11 HEADING "Allegiance|End Date" SET PAGESIZE 999 -- Check the knight table. SELECT knight_id , knight_name , kingdom_allegiance_id , TO_CHAR(allegiance_start_date,'DD-MON-YYYY') AS allegiance_start_date , TO_CHAR(allegiance_end_date,'DD-MON-YYYY') AS allegiance_end_date FROM knight;

It should display the following information:

KINGDOM_ID KINGDOM_NAME POPULATION ---------- -------------------- ---------- 1 Narnia 42100 2 Narnia 77600 3 Camelot 15200 Kingdom Knight Allegiance Allegiance Allegiance ID # Knight Name ID # Start Date End Date ------ ----------------------- ---------- ----------- ----------- 1 Peter the Magnificent 2 20-MAR-1272 19-JUN-1292 2 Edmund the Just 2 20-MAR-1272 19-JUN-1292 3 Susan the Gentle 2 20-MAR-1272 19-JUN-1292 4 Lucy the Valiant 2 20-MAR-1272 19-JUN-1292 5 Peter the Magnificent 1 12-APR-1531 31-MAY-1531 6 Edmund the Just 1 12-APR-1531 31-MAY-1531 7 Susan the Gentle 1 12-APR-1531 31-MAY-1531 8 Lucy the Valiant 1 12-APR-1531 31-MAY-1531 9 King Arthur 3 10-MAR-0631 12-DEC-0686 10 Sir Lionel 3 10-MAR-0631 12-DEC-0686 11 Sir Bors 3 10-MAR-0631 12-DEC-0635 12 Sir Bors 3 10-MAR-0640 12-DEC-0686 13 Sir Galahad 3 10-MAR-0631 12-DEC-0686 14 Sir Gawain 3 10-MAR-0631 12-DEC-0686 15 Sir Tristram 3 10-MAR-0631 12-DEC-0686 16 Sir Percival 3 10-MAR-0631 12-DEC-0686 17 Sir Lancelot 3 30-SEP-0670 12-DEC-0682

You can rerun the procedure to check that it doesn’t alter any information, then you could add a new knight to test the insertion portion.

Concurrent sandbox deployment

Version 0.3.0 of dbdeployer has gained the ability of deploying multiple sandboxes concurrently. Whenever we deploy a group of sandboxes (replication, multiple) we can use the --concurrent flag, telling dbdeployer that it should run operations concurrently.

What happens when a single sandbox gets deployed? There are six sets of operations:

  1. Create the sandbox directory and write down its scripts;
  2. Run the initialisation script;
  3. Start the database server;
  4. Run the pre-grants SQL commands (if any;)
  5. Load the grants;
  6. Run the post-grants SQL commands (if any;)

When several sandboxes are deployed concurrently, dbdeployer runs only the first step, and then creates a list of commands with an associated priority index. These commands are assembled for every sandbox, and then executed concurrently for every step.
The sequence of events for a deployment of three sandboxes in replication would be like this:

  1. Create the sandbox skeleton for every sandbox;
  2. Initialise all database servers;
  3. start all the servers;
  4. run the pre-grants, grants, post-grants scripts.
  5. Runs the group initialisation script (start master and slaves, or setup group replication).

Depending on the computer architecture, the server version, and the number of nodes, the speed of deployment can increase from 2 to 5 times.

Let's see an example:

$ time dbdeployer deploy replication 5.7.21
real 0m13.789s
user 0m1.143s
sys 0m1.873s

$ time dbdeployer deploy replication 5.7.21 --concurrent
real 0m7.780s
user 0m1.329s
sys 0m1.811s

There is a significant speed increase. The gain rises sharply if we use an higher number of nodes.

$ time dbdeployer deploy replication 5.7.21 --nodes=5
real 0m23.425s
user 0m1.923s
sys 0m3.106s

$ time dbdeployer deploy replication 5.7.21 \
--nodes=5 --concurrent
real 0m7.686s
user 0m2.248s
sys 0m2.777s

As we can see, the time for deploying 5 nodes is roughly the same used for 3 nodes. While the sequential operations take time proportionally with the number of nodes, the concurrent task stays almost constant.

Things a re a bit different for group replication, as the group initialisation (which happens after all the servers are up and running) takes more time than the simple master/slave deployment, and can't be easily reduced using the current code.

A similar optimisation happens when we delete multiple sandboxes. Here the operation is at sandbox level (1 replication cluster = 1 sandbox) not at server level, and for that reason the gain is less sharp. Still, operations are noticeably faster.

There is room for improvement, but I have seen that the total testing time for dbdeployer test suite has dropped from 26 to 15 minutes. I think it was a week end well spent.

Checking User Threads With gdb in MySQL 5.7+

In one of my gdb-related posts last year I noted that there is no more simple global list of user threads in MySQL 5.7+:
"I had highlighted Global_THD_manager singleton also as during my next gdb sessions I had found out that simple global list of threads is also gone and in 5.7 everything is done via that Global_THD_manager. This is a topic for some other post, though."In that post and many times later when I had to deal with MySQL 5.7+ I just checked OS threads one by one in gdb using thread  1 ... thread N commands. This is not efficient at best, as I also hit numerous background threads that I often do not care about. So, a couple of weeks ago I finally decided to get back to this topic and find out how to check just user threads one by one in recent MySQL versions. I had a nice hint by Shane Bester on how to get information about $i-th thread (that he shared in one of his comments to my Facebook post):
set $value = (THD**)(Global_THD_manager::thd_manager-> + (sizeof(THD**) * $i))I've attached gdb to an instance of Percona Server 5.7.x that I had running in my CentOS 6.9 VM and tried few commands to check types and content of the Global_THD_manager elements:
(gdb) p Global_THD_manager::thd_manager
$1 = (Global_THD_manager *) 0x7fab087fd000
(gdb) p Global_THD_manager::thd_manager->thd_list
$2 = {m_size = 2, m_capacity = 500, m_buff = {{
      data = "\000\060b\344\252\177\000\000\000\220i\344\252\177\000\000\000\200x\344\252\177", '\000' <repeats 3977 times>, align = {<No data fields>}}},
  m_array_ptr = 0x7fab087fd010, m_psi_key = 0}
So, we see that internally there is some array of elements thd_list with m_size items (2 in my case) probably stored in some pre-allocated buffer of m_capacity (500) elements, stored in The type of elements is not clear, but we can try Shane's hint and assume that they are of type THD**. Let's try to check what we see there after type castings:
(gdb) p (THD**)(Global_THD_manager::thd_manager->
$4 = (THD **) 0x7fab087fd010
(gdb) p  *(THD**)(Global_THD_manager::thd_manager->
$5 = (THD *) 0x7faae4623000
(gdb) p  **(THD**)(Global_THD_manager::thd_manager->
$6 = {<MDL_context_owner> = {
    _vptr.MDL_context_owner = 0x1c51f50}, <Query_arena> = {
...So, we get reasonable addresses and when we dereference the resulting THD** pointer twice we indeed get a structure that looks like THD of MySQL 5.7+ (it's very different, say, in MariaDB 10.1.x), with reasonable content (that is huge and skipped above).

I've tried to get processlist id of thread based on findings of that post using intermediate gdb variables:

(gdb) set $ppthd = (THD**)(Global_THD_manager::thd_manager->
(gdb) p *($ppthd)
$7 = (THD *) 0x7faae4623000

(gdb) set $pthd = *($ppthd)
(gdb) p $pthd->m_thread_id
$10 = 5and then directly, using offsets and checking for security contexts of threads:
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->
$14 = {m_ptr = 0x7faae463b060 "myuser", m_length = 6, m_charset = 0x1d21760,
  m_alloced_length = 8, m_is_alloced = true}
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager-> + (sizeof(THD**)))).m_main_security_ctx.m_user
$15 = {m_ptr = 0x7faae46b1090 "root", m_length = 4, m_charset = 0x1d21760,
  m_alloced_length = 8, m_is_alloced = true}
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager-> + (sizeof(THD**)))).m_thread_id
$16 = 9to confirm that I correctly get user names and thread ids for both 2 user threads I had in that "list". As usual Shane Bester was right!

Now, if you want to get more details about Global_THD_manager, you can just check the sql/mysqld_thd_manager.h file. I was interested mostly in the following:
  int get_num_thread_running() const { return num_thread_running; }
  uint get_thd_count() const { return global_thd_count; }

  static Global_THD_manager *thd_manager;

  // Array of current THDs. Protected by LOCK_thd_list.
  typedef Prealloced_array<THD*, 500, true> THD_array;
  THD_array thd_list;

  // Array of thread ID in current use. Protected by LOCK_thread_ids.
  typedef Prealloced_array<my_thread_id, 1000, true> Thread_id_array;
  Thread_id_array thread_ids;First of all, how consistent it is to use both int and uint data types for values that are always >=0... The fact that our thd_list elements is actually some template-based container, Prealloced_array, it also interesting, as it would be useful to find out how it is implemented. We can find all relevant details in the include/prealloced_array.h file. I'd like to highlight the following here:
"The interface is chosen to be similar to std::vector."

  size_t         m_size;
  size_t         m_capacity;
  // This buffer must be properly aligned.
  my_aligned_storage<Prealloc * sizeof(Element_type), MY_ALIGNOF(double)>m_buff;
Element_type *m_array_ptr;To summarize, MySQL 5.7+ uses more C++ now, with templates, singletons, iterators and more, but still Oracle prefers to implement their own container types instead of using some standard ones. One of these generic types, Prealloced_array, is widely used and is easy to deal with in gdb, as long as you know the element type.

This Week in Data with Colin Charles 31: Meltdown/Spectre Performance Regressions and Percona Live 2018

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Have you been following the Meltdown/Spectre performance regressions? Some of the best blog posts have been coming from Brendan Gregg, who’s keynoting at Percona Live this year. We’ve also got Scott Simpson from Upwork giving a keynote about how and why they use MongoDB. This is in addition to all the other fun talks we have, so please register now. Don’t forget to also book your hotel room!

Even though the Percona Live conference now covers much more than just MySQL, it’s worth noting that the MySQL Community Awards 2018: Call for Nominations! is happening now. You have until Friday, March 15, 2018, to make a nomination. Winners get into the Hall of Fame. Yes, I am also on the committee to make selections.

Another highlight: Open-sourcing a 10x reduction in Apache Cassandra tail latency by Dikang Gu of Instagram (Facebook). This is again thanks to RocksDB. Check out Rocksandra, and don’t forget to register for Percona Live to see the talk: Cassandra on RocksDB.

This week, I spent some time at Percona Headquarters in Raleigh, North Carolina. The building from the outside is pictured well in Google Maps. I thought it might be fun to show you a few photos (the office is huge with quite a handful working there despite the fact that Percona is largely remote).

Percona awards and bookshelf, featuring some very antique encyclopedias.


Peter Zaitsev, Percona CEO, outside his office (no, it is not an open office plan – everyone has rooms, including visitors like myself).


We’re all at SCALE16x now – so come see our talks (Peter Zaitsev and I are both speaking), and we have a booth where you can say hello to Rick Golba, Marc Sherwood and Dave Avery.

Releases Link List Upcoming appearances
  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018

I look forward to feedback/tips via e-mail at or on Twitter @bytebot.

MySQL : command delimiter curiosity – go & ego

Recently, I received a question related to GO as delimter to send a query. The user got some generated statements from a third party tool that looked like this:

/* CreateTable VersionInfo */ CREATE TABLE VersionInfo (Version BIGINT NOT NULL) ENGINE = INNODB GO /* VersionMigration migrated */ /* VersionUniqueMigration migrating ========================================== */ /* CreateIndex VersionInfo (Version) */ CREATE UNIQUE INDEX UC_Version ON VersionInfo (Version ASC) GO

and so on…

To be honest I was lost ?! I never heard about that syntax and I was convinced that this was not valid (and you?)…

But in fact it is ! It’s the long command name for \g and EGO is the one for \G.

You can try help in the client (see the manual) and you will see it:

ego (\G) Send command to mysql server, display result vertically. ... go (\g) Send command to mysql server.

I wasn’t aware of this… So I tried of course:

node1 [localhost] {root} ((none)) > select now() -> go ->

so it failed, it expected ‘;‘, ‘\g‘ or ‘\G‘ to send the command…

Then I found this bug #69534 where the solution was explained: --named-commands. In fact the client needs to be started with this option to enable the long named commands, see the manual. The manual is not that clear as it takes for example \q and quit… but in fact both are working even when --named-commands is not enabled.

Let’s try starting the client with that option to see if it works:

node1 [localhost] {msandbox} ((none)) > select now() -> go +---------------------+ | now() | +---------------------+ | 2018-03-09 22:49:04 | +---------------------+ 1 row in set (0.00 sec)

Wuuhuuu It works \o/

So I learned something about MySQL that was completely NOT new. I don’t know how useful this can be other than for a quiz but at least if you have a tool generating statements like these ones, now you know how to use them even without replacing all GO by ‘;‘.

Cache amplification

How much of the database must be in cache so that a point-query does at most one read from storage? I call this cache-amplification or cache amplification. The answer depends on the index structure (b-tree, LSM, something else). Cache amplification can join read, write and space amplification. Given that RWS was renamed RUM by the excellent RUM Conjecture now we have CRUM which is close to crummy. I briefly wrote about this in a previous post.

To do at most 1 storage read for a point query:
  • clustered b-tree - everything above the leaf level must be in cache. This is a key/pointer pair per leaf block. The InnoDB primary key index is an example.
  • non-clustered b-tree - the entire index must be in cache. This is a key/pointer pair per row which is much more memory than the cache-amplification for a clustered-btree. Non-covering secondary indexes with InnoDB are an example, although in that case everything you must also consider the cache-amplification for the PK index.
  • LSM - I assume there is a bloom filter per SST. Bloom filters for all levels but the max level should be in cache. Block indexes for all levels should be in cache. Data blocks don't have to be in cache. I assume there are no false positives from the bloom filter so at most one data block will be read. Note that with an LSM, more space amplification means more cache amplification. So cache-amp is worse (higher) for tiered compaction than for leveled.
  • something else - there have a been a few interesting variants on the same theme that I call index+log -- BitCask, ForestDB and WiscKey. These are similar to a non-clustered b-tree in that the entire index must be in cache so that the storage read can be spent on reading the data from the log.
I have ignored hash-based solutions for now but eventually they will be important. SILT is a great example of a solution with excellent cache-amplification.

Updated to correct what should be in cache for the LSM.

MyRocks Storage Engine in MariaDB is Now Release Candidate

MyRocks Storage Engine in MariaDB is Now Release Candidate Sergey Petrunya Fri, 03/09/2018 - 11:19

The MyRocks storage engine was introduced in MariaDB Server 10.2 as an alpha plugin – the maturity of plugins is separate from the database. It became a beta plugin earlier this year, and with the release of MariaDB Server 10.3.5 (RC) last week, it is now a release candidate plugin.

So, what is MyRocks? It is a storage engine like InnoDB, but optimized for disk space and write efficiency. It uses a log-structured merge-tree (LSM Tree) technology to achieve higher compression and write performance. MyRocks is developed by Facebook, where it is used in production.

We make MyRocks available for MariaDB users via binaries and packages for recent versions of Ubuntu/Debian (deb), Red Hat/CentOS (rpm), Generic Linux (tarballs) and Microsoft Windows. For developers, you can continue to use features like common table expressions (CTEs) and window functions. For administrators, you can continue to enable and configuration parallel replication.

While it’s easy to get started with MyRocks using MariaDB, you have to run a couple of commands to enable it. The process is documented in our knowledge base (and there are links for further MyRocks documentation).

Additional Resources

The MyRocks storage engine was introduced in MariaDB Server 10.2 as an alpha plugin – the maturity of plugins is separate from the database. It became a beta plugin earlier this year, and with the release of MariaDB Server 10.3.5 (RC) last week, it is now a release candidate plugin.

Login or Register to post comments

Sneak Peek at Proxytop Utility

In this blog post, I’ll be looking at a new tool Proxytop for managing MySQL topologies using ProxySQL. Proxytop is a self-contained, real-time monitoring tool for ProxySQL. As some of you already know ProxySQL is a popular open source, high performance and protocol-aware proxy server for MySQL and its forks (Percona and MariaDB).

My lab uses MySQL and ProxySQL on Docker containers provided by Nick Vyzas. This lab also uses Alexey Kopytov’s Sysbench utility to perform benchmarking against ProxySQL.


Installation of Proxytop is pretty straightforward:

## You may first need to install system Python and MySQL dev packages ## e.g. "sudo apt install python-dev libmysqlclient-dev" pip install MySQL-python npyscreen wget -P /usr/bin

At this stage, we have everything we need to demonstrate Proxytop. The lab we have setup provides a bunch of bash scripts to demonstrate load for reruns. I’m using following script under the bin directory:

root@localhost docker-mysql-proxysql]# ./bin/docker-benchmark.bash [Fri Feb 16 10:19:58 BRST 2018] Dropping 'sysbench' schema if present and preparing test dataset:mysql: [Warning] Using a password on the command line interface can be insecure. [Fri Feb 16 10:19:58 BRST 2018] Running Sysbench Benchmarksi against ProxySQL:sysbench 1.0.12 (using bundled LuaJIT 2.1.0-beta2)

This script is totally customizable to benchmark as parameters can be tuned within the script:


Now let’s take a look at the Proxytop utility. It has menu driven style similarly to Innotop. Once you are in the tool, use [tab] to toggle between screens. Various shortcuts are also available to do things like changing sort order (‘s’), filter on specific criteria (‘l’) or changing the refresh interval for the view you are on (‘+’ / ‘-’).

Current, y it supports viewing the following aspects of a ProxySQL instance.

  • ConnPool – “ProxySQL Connection Pool” statistics
  • QueryRules – “ProxySQL Query Rules” statistics and definitions
  • GloStat – “ProxySQL Global Status” statistics
  • ProcList – “ProxySQL Processlist” for all incoming DML / DQL
  • ComCount – “ProxySQL Command Counter” statistics

We’ll go each of these screens in detail.

ConnPool Screen:

This screen basically shows the Connection Pool, specifically:

  • MySQL hostname and port
  • Assigned ProxySQL hostgroup
  • Connection statistics: Used / Free / OK / Error
  • MySQL Server state in ProxySQL i.e. ONLINE / OFFLINE / etc.
  • MySQL Server latency

Query Rules Screen:

This screen shows query rules and their use by count, and can be sorted either by rule_id or hits (ascending or descending) by cycling through the ordering list by pressing “s”.

It also allows you to view the actual definition of each rule by selecting and entering a rule. In the popup window, you will find a list of the relevant and defined columns for the query rule. For example:

If you have a lot of query rules defined, you can filter on a specific rule by pressing the letter “l”:

Global Statistics Screen: This screen shows Global Statistics from ProxySQL divided into four sections.

  • Connection Information
  • Prepared Statement Information
  • Command Information
  • Query Cache information

Proclist Screen: In this screen, we’re able to see running active queries with a minimum of a five-second refresh interval. In this way you can monitor long running queries in flight for troubleshooting:

ComCount Screen: This screen shows all command types executed with the total time and counts for each type, and also provides drill down to view the number of queries executed within specific ranges. This way type of workload can be easily identified both during testing and production:

You can drill down on each Com by using arrows and hitting enter key:

We all know the power of command line utilities such as proxysql-admin. The proxysql-admin utility is designed to be part of the configuration and ad-hoc monitoring of ProxySQL that is explained here in this blog post. Proxytop is designed to be menu driven to repeat commands in intervals. You can easily monitor and administer ProxySQL from the command line, but sometimes running recursive commands and monitoring over a period of time is annoying. This tool helps with that situation.

On RDBMS, NoSQL and NewSQL databases. Interview with John Ryan

“The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood.”–John Ryan

I have interviewed John Ryan, Data Warehouse Solution Architect (Director) at UBS.


Q1. You are an experienced Data Warehouse architect, designer and developer. What are the main lessons you have learned in your career?

John Ryan: The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the requirement – the second most important lesson.

Everyone from the senior stakeholders to architects, designers and developers need to fully understand the goal. Not the solution, but the “problem we’re trying to solve”. End users never ask for what they need (the requirement), but instead, they describe a potential solution. IT professionals are by nature delivery focused, and, get frustrated when it appears “the user changed their mind”. I find the user seldom changes their mind. In reality, the requirement was never fully understood.

To summarise. Focus on the problem not the solution. Then (once understood), suggest a dozen solutions and pick the best one. But keep it simple.

Q2.  How has the Database industry changed in the last 20 years?

John Ryan: On the surface, not a lot. As recent as 2016 Gartner estimated Oracle, Microsoft and IBM accounted for 80% of the commercial database market, but that hides an underlying trend that’s disrupting this $50 billion industry.

Around the year 2000, the primary options were Oracle, DB2 or SQL Server with data warehouse appliances from Teradata and Netezza. Fast forward to today, and the database engine rankings include over 300 databases of which 50% are open source, and there are over 11 categories including Graph, Wide column, Key-value and Document stores; each suited to a different use-case.

While relational databases are still popular, 4 of the top 10 most popular solutions are non-relational (classed as NoSQL), including MongoDB, Redis and Cassandra. Cross-reference this against a of survey of the most highly sought skills, and we find MongoDB, Redis and Cassandra again in the top ranking, with nearly 40% of respondents seeking MongoDB skills compared to just 12% seeking Oracle expertise.

Likewise open source databases make up 60% of the top ten ranking, with open source database MySQL in second place behind Oracle in the rankings, and Gartner states that “By 2018, more than 70% of new in-house applications will be developed on an [Open Source] DBMS”.

The move towards cloud computing and database-as-a-service is causing further disruption in the data warehouse space with cloud and hybrid challengers including Vertica, Amazon Redshift and Snowflake.

In conclusion, the commercial relational vendors currently dominate the market in revenue terms. However, there has been a remarkable growth of open source alternatives, and huge demand for solutions to handle high velocity unstructured and semi-structured data. These use-cases including social media, and the Internet of Things are ill-suited to the legacy structured databases provided by Oracle, DB2 and SQL Server, and this void has been largely filled by open source NoSQL and NewSQL databases.

Q3. RDBMS vs. NoSQL vs. NewSQL: How do you compare Database Technologies?

John Ryan: The traditional RDBMS solutions from Oracle, IBM and Microsoft implement the relational model on a 1970s hardware architecture, and typically provide a good general purpose database platform which can be applied to most OLTP and Data Warehouse use cases.

However, as Dr. Michael Stonebraker indicated in this 2007 paper, The End of an Architectural Era (It’s Time for a Complete Rewrite), these are no longer fit for purpose, as both the hardware technology, and processing demands have moved on. In particular, the need for real time (millisecond) performance, greater scalability to handle web-scale applications, and the need to handle unstructured and semi-structured data.

Whereas the legacy RDBMS is a general purpose (will do anything) database, the NoSQL and NewSQL solutions are dedicated to a single problem, for example, short lived OLTP operations.

The Key-Value NoSQL databases were developed to handle the massive transaction volume, and low latency needed to handle web commerce at Amazon and LinkedIn. Others (eg. MongoDB) where developed to handle semi-structured data, while still others (eg. Neo4J) were built to efficiently model data networks of the type found at Facebook or LinkedIn.

The common thread with NoSQL databases is they tend to use an API interface rather than industry standard SQL, although increasingly that’s changing. They do however, entirely reject the relational model and ACID compliance. They typically don’t support cross-table join operations, and are entirely focused on low latency, trading consistency for scalability.

The so-called NewSQL databases include VoltDB , MemSQL and CockroachDB are a return to the relational model, but re-architected for modern hardware and web scale use cases.  Like NoSQL solutions, they tend to run on a shared nothing architecture, and scale to millions of transactions per second, but they also have full transaction support and ACID compliance that are critical for financial operations.

Q4. What are the typical trade-off of performance and consistency when using NoSQL and NewSQL databases to support high velocity OLTP and real time analytics?

John Ryan: The shared nothing architecture is built to support horizontal scalability, and when combined with data replication, can provide high availability and good performance. If one node in the cluster fails, the system continues, as the data is available on other nodes. The NoSQL database is built upon this architecture, and to maximize throughput, ACID compliance is relaxed in favor of Eventual Consistency, and in some cases (eg. Cassandra), it supports tunable consistency, allowing the developer to trade performance for consistency, and durability.

For example, after a write operation, the change cannot be considered durable (the D in ACID) until the change is replicated to at least one, ideally two other nodes , but this would increase latency, and reduce performance. It’s possible however, to relax this constraint, and return immediately, with the risk the change may be lost if the node crashes before the data is successfully replicated. This becomes even more of a potential issue if the node is temporarily disconnected from the network, but is allowed to continue accepting transactions until the connection is restored. In practice, consistency will be eventually be achieved when the connection is reestablished – hence the term Eventual Consistency.

A NewSQL database on the other hand accepts no such compromise, and some databases (eg. VoltDB), even support full serializability, executing transactions as if they were executed serially. Impressively, they manage this impressive feat at a rate of millions of transactions per second, potentially on commodity hardware.

Q5. One of the main challenges for real time systems architects is the potentially massive throughput required which could exceed a million transactions per second.  How do handle such a challenge?

John Ryan: The short answer is – with care! The longer answer is described in my article, Big Data – Velocity. I’d break the problem into three components, Data Ingestion, Transformation and Storage.

Data ingestion requires message based middleware (eg. Apache Kafka), with a range of adapters and interfaces, and the ability to smooth out the potentially massive spikes in velocity, with the ability to stream data to multiple targets.

Transformation, typically requires an in-memory data streaming solution to restructure and transform data in near-real time. Options include Spark Streaming, Storm or Flink.

Storage and Analytics is sometimes handled by a NoSQL database, but for application simplicity (avoiding the need to implement transactions or handle eventual consistency problems in the application), I’d recommend a NewSQL database.
All the low-latency, high throughput of the NoSQL solutions, but with the flexibility and ease of a full relational database, and full SQL support.

In conclusion, the solution needs to abandon the traditional batch oriented solution in favour of an always-on streaming solution with all processing in memory.

Q6.  Michael Stonebraker introduced the so called “One Size no longer fits all”-concept. Has this concept come true on the database market?

John Ryan: First stated in the paper One Size Fits All – An Idea Whose Time Has Come And Gone Dr. Michael Stonebraker argued that the legacy RDBMS dominance was at an end, and would be replaced by specialized database technology including stream processing, OLTP and Data Warehouse solutions.

Certainly disruption in the Data Warehouse database market has been accelerated with the move towards the cloud, and as this Gigaom Report illustrates, there are at least nine major players in the market, with new specialized tools including Google Big Query, Amazon Redshift and Snowflake, and the column store (in memory or on secondary storage) dominates.

Finally, the explosion of specialized NoSQL and NewSQL databases, each with its own specialty including Key-Value, Document Stores, Text Search and Graph databases lend credence to the statement “One Size no longer fits all”.

I do think however, we’re still in a transformation stage, and the shake-out is not yet complete. I think a lot of large corporations (especially Financial Services) are wary of change, but it’s already happening.

I think the quote Stewart Brand is appropriate: “Once a new technology rolls over you, if you’re not part of the steamroller, your part of the road”.

Q7. Eventual consistency vs ACID. A pragmatic approach or a step too far?

John Ryan: As with so many decisions in IT, it depends. Eventual Consistency was built into the Amazon Dynamo database as a pragmatic decision, because it’s difficult to maintain high throughput and availability at scale. Amazon accepted the relatively minor risk of inconsistencies because of the huge benefits including scalability and availability. In many other web scale applications (eg. Social media), the implications of temporary inconsistency are not important, and it’s therefore an acceptable approach.

Having said that, it does place a huge addition burden on the developer to code for relatively rare and unexpected conditions, and I’d question why would anyone settle for a database which supported eventual consistency, when full ACID compliance and transaction handling is available.

Q8. What is your take on the Lambda Architecture ?

John Ryan: The Lambda Architecture is a an approach to handle massive data volumes, and provide real time results by running two parallel solutions, a Batch Processing and a real time Speed Processing stream.

As a Data Warehouse and ETL designer, I’m aware of the complexity involved in data transformation, and I was immediately concerned about an approach which involved duplicating the logic, probably in two different technologies.

I’d also question the sense of repeatedly executing batch processing on massive data volumes when processing is moving to the cloud, on a pay for what you use basis.

I’ve since found this article on Lambda by Jay Kreps of LinkedIn and more recently Confluent (who developed Apache Kafka), and he describes the challenges from a position of real (and quite painful) experience.

The article recommends an alternative approach, a single Speed Processing stream with slight changes to allow data to be reprocessed (the primary advantage of the Lambda architecture). This solution, named the Kappa Architecture, is based upon the ability of Kafka to retain and re-play data, and it seems an entirely sensible and more logical approach.

Qx Anything else you wish to add?

John Ryan: Thank you for this opportunity. It may be a Chinese curse that “may you live in interesting times”, but I think it’s a fascinating time to be working in the database industry.


John Ryan, Data Warehouse Solution Architect (Director), UBS.

John has over 30 years experience in the IT industry, and has maintained a keen interest in database technology since his University days in the 1980s when he invented a B-Tree index, only to find Oracle (and several others) had got there already.

He’s worked as a Data Warehouse Architect and Designer for the past 20 years in a range of industries including Mobile Communications, Energy and Financial Services. He’s regularly found writing articles on Big Data and database architecture, and you can find him on LinkedIn.





Related Posts

– On the InterSystems IRIS Data Platform ,ODBMS Industry Watch,  2018-02-09

– Facing the Challenges of Real-Time Analytics. Interview with David Flower, ODBMS Industry Watch, 2017-12-19

– On the future of Data Warehousing. Interview with Jacque Istok and Mike Waas, ODBMS Industry Watch, 2017-11-09

– On Vertica and the new combined Micro Focus company. Interview with Colin Mahony, ODBMS Industry Watch, 2017-10-25

– On Open Source Databases. Interview with Peter Zaitsev, ODBMS Industry Watch, 2017-09-06

Follow us on Twitter: @odbmsorg



MySQL 8.0 : more about new authentication plugin and replication

MySQL 8.0’s new default authentication plugin means more secure connections for users connections but also for replication… but you need to be aware of it !

Yesterday Giuseppe – the datacharmer – Maxia, submitted a bug related to a strange behavior of MySQL’s replication.

He observed that after creating a new user for replication (using the new authentication method, caching_sha2_password), when he created the slave as usual, replication was not starting… Slave_IO_Running was constantly in “Connecting” and the Last_IO_Error was:

Last_IO_Error: error connecting to master 'repl2@' - retry-time: 60 retries: 51

So, like almost everybody would do, Giuseppe, tried the credentials he created using the mysql client… and it worked ! He could connect to the master, so the credentials where correct ! But the most surprising thing is that replication restarted automatically at the next retry.

He didn’t change anything and all sudendly it worked. I’m sure he tried this scenario several times with always the same result before submitting the bug.

But this is not a bug… In fact, with the new default authentication plugin you need to have the public key of the master for RSA key pair-based password exchange.

With Asynchronous Replication, you need to add to your CHANGE MASTER statement one of the two options below:

  •  MASTER_PUBLIC_KEY_PATH, which indicates the path name to a file containing a slave-side copy of the public key required.
  •  GET_MASTER_PUBLIC_KEY, which notifies the SLAVE to request the public key to the MASTER when set to 1.

For Group Replication, there are also two options:

  • group_replication_recovery_public_key_path
  • group_replication_recovery_get_public_key

In Giuseppe’s case, why did the connection from the client helped ?

This is because caching_sha2_password creates an in-memory cache after first successful authentication through secure channel for a given user account. This cache allows fast authentication based on sha256 scramble for subsequent connection.


But if you setup replication like that without using the appropriate replication options listed above, your solution is temporary, because if the cache gets cleared (FLUSH PRIVILEGES, or restart of the server), the connection will fail again.

Also please note, that this is not the safer way, defining the path and let the DBA distribute the public key is of course more secure, and the the best remains to use SSL.


Indeed, sometimes safer rhymes with a little more difficulty, or at least with some more operations. This means that if you are using, and I recommend you do to so, the new authentication plugin, you will have to manage your keys or at least setup replication to get it from the master.

All this is already in the manual, you can find all this information in the following pages :

Binlog Encryption with Percona Server for MySQL

In this blog post, we’ll look at how to turn on binlog encryption in Percona Server for MySQL.

Why do I need this?

As you probably know, Percona Server for MySQL’s binlog contains sensitive information. Replication uses the binlog to copy events between servers. They contain all the information from one server so that it can be applied on another. In other words, if somebody has access to a binlog, it means they have access to all the data in the server. Moreover, said person (or, “Hacker”) could create a clone copy of our server by just making a replica of it. In the end, they have access to our binlog. This shows how important protecting a binlog really is – leakage of binlogs not only make a particular table/tablespace or a group of tables accessible to a hacker, but literally the whole server is at risk. The same situation is true with relay log – a relay log is really a copy of binlog on the slave server.

But have no fear – a new feature to the rescue – binary log encryption. Since Percona Server for MySQL version 5.7.20-19 (beta version) it is possible to enable binlog encryption for all the binlogs and relay logs produced by the server.

How do you turn it on?

To start binlog encryption, you need to start the server with –encrypt-binlog=1. This, in turn, requires –master_verify_checksum and –binlog_checksum both to be ON. Also, you need to install one of the keyring plugins.

From now on all the binlogs and relay logs produced by the server get encrypted. However, for the replication to be safe as a whole the connection between servers also has to be encrypted. See for details on how to do this.

Please note that this does not mean that all binlogs in our replication schema get encrypted. Remember you need to turn on encrypt-binlog on slave servers too, even if they do not produce binlog files. Slave servers still produce relay logs when replicating from a master server. Turn on encrypt-binlog on slave servers so that their relay logs also get encrypted.

How does this work in the big picture?

The master encrypts the event before writing it into the binlog. The slave connects to master and ask for events. The master decrypts the events from the binary log and sends them over to slave.

Note that events send between the master and slave servers are not encrypted! This is why the connection between the master and slave needs to use a secure channel, i.e., TLS.

The slave receives events from the master, encrypts them and writes them down into the relay log.

That is why we need to enable encrypt-binlog on a slave. The relay log has to get encrypted too.

Next, the slave decrypts events from relay log and applies them. After applying the event the slave encrypts it and writes it down into its binlog file (given binlog is enabled on the slave).

In summary to make our replication secure, we need:

  • Turn on encrypt-binlog on the master
  • Turn on encrypt-binlog on the slave
  • The connection between master and slave needs to use TLS.

It’s worth noting that servers in replication have no idea if other servers are encrypted or not.

Why do master_verify_checksum and binlog_checksum need to be turned ON?

This is needed for “authenticate encryption”. Simply put, this is how we make sure that what we decrypt has not been changed by a third party. Also, it checks if the key that was used to decrypt the event was the correct one.

Digging deeper with mysqlbinlog

Mysqlbinlog is a standalone application that lets you read binlog files. As I write this blog post, it is not capable of decrypting binary logs – at least not by itself. However, it still can read encrypted binlog files when using a running Percona Server for MySQL. Use option –read-from-remote-server to read binary log produced by a given server.

Let’s see what happens when we try to read an encrypted binlog with mysqlbinlog without read-from-remote-server enabled. You will get something like this:

As you can see it is only possible to read binary log till event type 9f gets read. This event is the Start_encryption_event. After this event the rest of the binlog is encrypted. One thing to note is that Start_encryption_event is never propagated in replication. For instance, the master server is run with –encryt_binlog. This means that the server writes Start_encryption_event to its binary logs. However it is never sent to the slave server (the slave has no idea whether the master is encrypted).

Another option you can use with mysqlbinlog is –force option. It forces mysqlbinlog to read all the events from the binlog, even if they are encrypted. You will see something like this in the output:

As you can see, it is only possible to read two first events – until the Start_encryption_event. However, this time we can see that there are other events that follow, which are encrypted.

Running mysqlbinlog (without –read-from-remote) on encrypted binary logs may only make sense if we want to see if a given binary log is encrypted. For point-in-time recovery, and for other purposes that might require reading encrypted binlog, we would use mysqlbinlog with –read-from-remote option.

For instance, if we want to read binlog master-bin.000001, and Percona Server for MySQL is running on, port 3033, with user:robert, password:hard_password, we would use mysqlbinlog like this:

mysqlbinlog –read-from-remote-server –protocol=tcp –host= –port=3033 –user=robert –password=hard_password master-bing.000001.

When you look at the output of this command, you see something like this:

You can now see the decrypted binlog. One interesting thing to note here is that we do not see our Start_encryption_event (type 9f). This proves my point – Start_encryption_event never leaves the server (we are reading from the server now as we use –read-from-remote-server).

For more information how to use mysqlbinlog for point-in-time recovery see

However, for more modern approaches for point-in-time recovery that do not use mysqlbinlog and make use of parallel appliers, see here:

Have fun with binlog encryption!

Migrating MySQL Users to Amazon RDS

In this blog post, we’ll look at what is needed when migrating MySQL users to Amazon RDS. We’ll discuss how we can transform MySQL user grants and make them compatible with Amazon RDS.

In order to deliver a managed service experience, Amazon RDS does not provide shell access to the underlying operating system. It also restricts access to certain procedures that require advanced privileges.

Every MySQL instance has some users with ALL PRIVILEGES, and you can’t directly migrate these users to Amazon RDS because it does not support following privileges for regular users.

  • SUPER – Enable use of other administrative operations such as CHANGE MASTER TO, KILL, PURGE BINARY LOGS, SET GLOBAL, and mysqladmin debug command. Level: Global.
  • SHUTDOWN – Enable use of mysqladmin shutdown. Level: Global.
  • FILE – Enable the user to cause the server to read or write files. Level: Global.
  • CREATE TABLESPACE – Enable tablespaces and log file groups to be created, altered, or dropped. Level: Global.

The RDS parameter groups manage changes to the MySQL configuration (dynamic and non-dynamic variables). Amazon RDS also provides stored procedures to perform various administrative tasks that require SUPER privileges.

For example, we’ve got this user in MySQL instance running on Amazon EC2.

db01 (none)> show grants for percona@'%'; +-----------------------------------------------------------------------------------------------------------------------------------+ | Grants for percona@% | +-----------------------------------------------------------------------------------------------------------------------------------+ | GRANT ALL PRIVILEGES ON *.* TO 'percona'@'%' IDENTIFIED BY PASSWORD '*497030855D20D6B22E65436D0DFC75AA347B32F0' WITH GRANT OPTION | +-----------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

If we try to run the same grants in RDS, it will fail.

[RDS] (none)> GRANT ALL PRIVILEGES ON *.* TO 'percona'@'%' IDENTIFIED BY PASSWORD '*497030855D20D6B22E65436D0DFC75AA347B32F0' WITH GRANT OPTION; ERROR 1045 (28000): Access denied for user 'admin'@'%' (using password: YES)

We’ll follow these steps for migrating users to RDS.

  1. Identify users with privileges that aren’t supported by RDS.
  2. Export their grants using pt-show-grants.
  3. Import grants in a separate clean MySQL instance running the same version.
  4. Remove the forbidden privileges using the REVOKE statement.
  5. Export grants again using pt-show-grants and load them to RDS.
Identify users having privileges that aren’t supported by RDS

First, we’ll find the users with privileges that aren’t supported by Amazon RDS. I’ve excluded the localhost users because there is no direct shell access in RDS and you shouldn’t migrate these users.

db01 (none)> select concat("'",user,"'@'",host,"'") as 'user', CONCAT("REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM '",user,"'@'",host,"';") as 'query' from mysql.user where host not in ('localhost','') and (Super_Priv='Y' OR Shutdown_priv='Y' OR File_priv='Y' OR Create_tablespace_priv='Y'); +---------------+----------------------------------------------------------------------------+ | user | query | +---------------+----------------------------------------------------------------------------+ | 'appuser'@'%' | REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM 'appuser'@'%'; | | 'percona'@'%' | REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM 'percona'@'%'; | +---------------+----------------------------------------------------------------------------+ 2 rows in set (0.00 sec)

We’ve two users with incompatible grants. Let’s transform their grants to make them compatible with RDS. We’ll use the query in second column output later in this process.

Export grants using pt-show-grants

The next step is exporting these two users’ grants using pt-show-grants:

[root@db01 ~]# pt-show-grants --only='appuser'@'%','percona'@'%' -- Grants dumped by pt-show-grants -- Dumped from server Localhost via UNIX socket, MySQL 5.6.38-83.0 at 2018-02-24 10:02:21 -- Grants for 'appuser'@'%' GRANT FILE ON *.* TO 'appuser'@'%' IDENTIFIED BY PASSWORD '*46BDE570B30DFEDC739A339B0AFA17DB62C54213'; GRANT ALTER, CREATE, CREATE TEMPORARY TABLES, DELETE, DROP, EXECUTE, INSERT, LOCK TABLES, SELECT, TRIGGER, UPDATE ON `sakila`.* TO 'appuser'@'%'; -- Grants for 'percona'@'%' GRANT ALL PRIVILEGES ON *.* TO 'percona'@'%' IDENTIFIED BY PASSWORD '*497030855D20D6B22E65436D0DFC75AA347B32F0' WITH GRANT OPTION;

As we can see from above output, both users have at least one privilege that isn’t supported by RDS. Now, all we need to do is to import these users into a separate clean MySQL instance running the same version, and REVOKE the privileges that aren’t supported by RDS.

Import users in a separate MySQL instance running the same version

I’m going to import grants in a separate VM where I’ve just installed Percona Server for MySQL 5.6. Let’s call this instance as db02:

[root@db02 ~]# pt-show-grants --host=db01 --only='appuser'@'%','percona'@'%' --user=percona --ask-pass | mysql Enter password:

Remove the forbidden privileges using the REVOKE statement

In this step, we will use REVOKE statement from Step 1 to remove the privileges that aren’t supported by Amazon RDS:

db02 (none)> REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM 'appuser'@'%'; Query OK, 0 rows affected (0.00 sec) db02 (none)> REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM 'percona'@'%'; Query OK, 0 rows affected (0.00 sec)

Export grants again using pt-show-grants and load them to RDS

At this point, db02 has the grants that are compatible with RDS. Let’s take a look at them:

[root@db02 ~]# pt-show-grants --only='appuser'@'%','percona'@'%' -- Grants dumped by pt-show-grants -- Dumped from server Localhost via UNIX socket, MySQL 5.6.39-83.1 at 2018-02-24 10:10:38 -- Grants for 'appuser'@'%' GRANT USAGE ON *.* TO 'appuser'@'%' IDENTIFIED BY PASSWORD '*46BDE570B30DFEDC739A339B0AFA17DB62C54213'; GRANT ALTER, CREATE, CREATE TEMPORARY TABLES, DELETE, DROP, EXECUTE, INSERT, LOCK TABLES, SELECT, TRIGGER, UPDATE ON `sakila`.* TO 'appuser'@'%'; -- Grants for 'percona'@'%' GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE USER, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, PROCESS, REFERENCES, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, SELECT, SHOW DATABASES, SHOW VIEW, TRIGGER, UPDATE ON *.* TO 'percona'@'%' IDENTIFIED BY PASSWORD '*497030855D20D6B22E65436D0DFC75AA347B32F0' WITH GRANT OPTION;

These grants look good, these can be safely migrated to RDS now. Let’s do it:


We have successfully migrated users to Amazon RDS, which would have failed in direct migration.

What about rest of the users that don’t have SUPER/SHUTDOWN/FILE/CREATE TABLESPACE privileges? Well, it’s easy. We can migrate them directly using pt-show-grants. They don’t need any transformation before migration.

List them using the following query:

db01 (none)> select concat("'",user,"'@'",host,"'") as 'user' from mysql.user where host not in ('localhost','') and (Super_Priv<>'Y' AND Shutdown_priv<>'Y' AND File_priv<>'Y' AND Create_tablespace_priv<>'Y'); +-----------------------+ | user | +-----------------------+ | 'readonly'@'%' | | 'repl'@'' | +-----------------------+ 2 rows in set (0.01 sec)

Export them using pt-show grants and load to RDS.

[root@db01 ~]# pt-show-grants --only='readonly'@'%','repl'@'' | mysql --host=<rds.endpoint> --user=percona -p Enter password:


Amazon RDS is a great platform for hosting your MySQL databases. When migrating MySQL users to Amazon RDS, some grants might fail because of having privileges that aren’t supported by RDS. Using pt-show-grants from Percona Toolkit and a separate clean MySQL instance, we can easily transform grants and migrate MySQL users to Amazon RDS without any hassle.

Comparison of Window Functions & CTEs in MySQL 8 vs MariaDB

Every MySQL database programmer should learn and apply the newly added MariaDB and MySQL Window Functions and Common Table Expressions(CTEs) in their daily work. Both CTEs and window functions enable easy solutions to many query challenges that in prior releases have been difficult and sometimes impossible to surmount. Mastering these features opens the door to query solutions that are more robust, execute faster, and are easier to maintain over time than prior solutions using older techniques.

In our last blog we compared User Roles in MySQL 8 vs. MariaDB.  Today, we will here compare Window Functions and Common Table Expressions in both databases.

Window Functions

While all database administrators are familiar with aggregate functions like COUNT(), SUM(), and AVG(), far less people make use of window functions in their queries.  Unlike aggregate functions, which operate on an entire table, window functions operate on a set of rows and return a single aggregated value for each row.

The main advantage of using window functions over regular aggregate functions is that window functions do not cause rows to become grouped into a single output row.  Instead, the rows retain their separate identities and an aggregated value is added to each row.

Window Functions in MariaDB

Windowing functions were added to the ANSI/ISO Standard SQL:2003 and then extended in ANSI/ISO Standard SQL:2008. DB2, Oracle, Sybase, PostgreSQL and other products have had full implementations for years. Other vendors added support for window functions later on. Case in point, when Microsoft introduced Window Functions in SQL Server 2005, it only included a handful of functions, namely ROW_NUMBER, RANK, NTILE and DENSE_RANK. It was not until SQL Server 2012 that they implemented a full range of Window Functions.

After numerous wishes and feature requests for window functions over the years, they were finally introduced in MariaDB 10.2.0 to great fanfare!  Now MariaDB includes window functions such as ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE, COUNT, SUM, AVG, BIT_OR, BIT_AND and BIT_XOR.

The Syntax

Window function queries are characterised by the OVER keyword, following which the set of rows used for the calculation is specified. By default, the set of rows used for the calculation (the “window”) is the entire dataset, which can be ordered with the ORDER BY clause. The PARTITION BY clause is then employed to reduce the window to a particular group within the dataset.

Here’s an example:
Given the following table of student test scores:

+------------+---------+--------+ | name | test | score | +------------+---------+--------+ | Steve | SQL | 75 | +------------+---------+--------+ | Robert | SQL | 43 | +------------+---------+--------+ | Tracy | SQL | 56 | +------------+---------+--------+ | Tatiana | SQL | 87 | +------------+---------+--------+ | Steve | Tuning | 73 | +------------+---------+--------+ | Robert | Tuning | 31 | +------------+---------+--------+ | Tracy | Tuning | 88 | +------------+---------+--------+ | Tatiana | Tuning | 83 | +------------+---------+--------+

The following two queries return the average test scores partitioned by test and by name respectively – in other words, aggregated by test and by name:

SELECT name, test, score, AVG(score) OVER (PARTITION BY test) AS average_by_test FROM test_scores; +----------+--------+-------+-----------------+ | name | test | score | average_by_test | +----------+--------+-------+-----------------+ | Steve | SQL | 75 | 65.2500 | | Steve | Tuning | 73 | 68.7500 | | Robert | SQL | 43 | 65.2500 | | Robert | Tuning | 31 | 68.7500 | | Tracy | SQL | 56 | 65.2500 | | Tracy | Tuning | 88 | 68.7500 | | Tatiana | SQL | 87 | 65.2500 | | Tatiana | Tuning | 83 | 68.7500 | +----------+--------+-------+-----------------+ SELECT name, test, score, AVG(score) OVER (PARTITION BY name) AS average_by_name FROM student; +---------+--------+-------+-----------------+ | name | test | score | average_by_name | +---------+--------+-------+-----------------+ | Steve | SQL | 75 | 74.0000 | | Steve | Tuning | 73 | 74.0000 | | Robert | SQL | 43 | 37.0000 | | Robert | Tuning | 31 | 37.0000 | | Tracy | SQL | 56 | 72.0000 | | Tracy | Tuning | 88 | 72.0000 | | Tatiana | SQL | 87 | 85.0000 | | Tatiana | Tuning | 83 | 85.0000 | +---------+--------+-------+-----------------+

In both cases, note that the original scores are still available to each row.

Window Functions in MySQL 8

MySQL has been even later in adopting the Window Functions standard, with it being part of version 8.0 that is expected to be released later this year.   

MySQL employs the same ANSI/ISO Standard as other DBMSes whereby Window function queries are characterised by the OVER keyword and the PARTITION BY clause is employed to reduce the window to a specific group within the result set.

The currently supported functions include:

Name Description CUME_DIST() Cumulative distribution value DENSE_RANK() Rank of current row within its partition, without gaps FIRST_VALUE() Value of argument from the first row of window frame LAG() Value of argument from row lagging current row within partition LAST_VALUE() Value of argument from the last row of window frame LEAD() Value of argument from row leading current row within partition NTH_VALUE() Value of argument from N-th row of window frame NTILE() Bucket number of the current row within its partition. PERCENT_RANK() Percentage rank value RANK() Rank of current row within its partition, with gaps ROW_NUMBER() Number of current row within its partition

As an example, we’ll explore the CUME_DIST() function.

It returns the cumulative distribution of a value within a group of values; that is, the percentage of partition values less than or equal to the value in the current row. This represents the number of rows preceding or peer with the current row in the window ordering of the window partition divided by the total number of rows in the window partition. Return values range from 0 to 1.

This function is usually used with ORDER BY to sort partition rows into the desired order. Without ORDER BY, all rows are peers having a value of 1.

The following query shows, for the set of values in the val column, the CUME_DIST() value for each row, as well as the percentage rank value returned by the similar PERCENT_RANK() function. For reference, the query also displays row numbers using ROW_NUMBER():

SELECT val, ROW_NUMBER() OVER w AS 'row_number', CUME_DIST() OVER w AS 'cume_dist', PERCENT_RANK() OVER w AS 'percent_rank' FROM numbers WINDOW w AS (ORDER BY val); +------+------------+--------------------+--------------+ | val | row_number | cume_dist | percent_rank | +------+------------+--------------------+--------------+ | 1 | 1 | 0.2222222222222222 | 0 | | 1 | 2 | 0.2222222222222222 | 0 | | 2 | 3 | 0.3333333333333333 | 0.25 | | 3 | 4 | 0.6666666666666666 | 0.375 | | 3 | 5 | 0.6666666666666666 | 0.375 | | 3 | 6 | 0.6666666666666666 | 0.375 | | 4 | 7 | 0.8888888888888888 | 0.75 | | 4 | 8 | 0.8888888888888888 | 0.75 | | 5 | 9 | 1 | 1 | +------+------------+--------------------+--------------+

The OVER clause is permitted for many aggregate functions, including:

  • AVG()
  • COUNT()
  • MAX()
  • MIN()
  • SUM()
  • VAR_SAMP()

These can be used as window or non-window functions, depending on whether the OVER clause is present or absent:

MySQL also supports non-aggregate functions that are used only as window functions. For these, the OVER clause is mandatory:

  • LAG()
  • LEAD()
  • NTILE()
  • RANK()

As an example of a non-aggregate window functions, this query uses ROW_NUMBER(), which produces the row number of each row within its partition. In this case, rows are numbered per country. By default, partition rows are unordered and row numbering is indeterminate. To sort partition rows, include an ORDER BY clause within the window definition. The query uses unordered and ordered partitions (the row_num1 and row_num2 columns) to illustrate the difference that omitting and including ORDER BY makes:

SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1, ROW_NUMBER() OVER(PARTITION BY country ORDER BY year, product) AS row_num2 FROM sales; +------+---------+------------+--------+----------+----------+ | year | country | product | profit | row_num1 | row_num2 | +------+---------+------------+--------+----------+----------+ | 2000 | Finland | Computer | 1500 | 2 | 1 | | 2000 | Finland | Phone | 100 | 1 | 2 | | 2001 | Finland | Phone | 10 | 3 | 3 | | 2000 | India | Calculator | 75 | 2 | 1 | | 2000 | India | Calculator | 75 | 3 | 2 | | 2000 | India | Computer | 1200 | 1 | 3 | | 2000 | USA | Calculator | 75 | 5 | 1 | | 2000 | USA | Computer | 1500 | 4 | 2 | | 2001 | USA | Calculator | 50 | 2 | 3 | | 2001 | USA | Computer | 1500 | 3 | 4 | | 2001 | USA | Computer | 1200 | 7 | 5 | | 2001 | USA | TV | 150 | 1 | 6 | | 2001 | USA | TV | 100 | 6 | 7 | +------+---------+------------+--------+----------+----------+

Common Table Expressions (CTEs)

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. However, unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

A CTE can be used to:

  • Create a recursive query.
  • Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
  • Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.
  • Reference the resulting table multiple times in the same statement.

Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.

Recursive CTEs

Recursive common table expressions (CTEs) were an implementation of standard SQL:1999 for hierarchical queries.  The first implementations of Recursive CTEs began appearing in 2007. The recursive CTEs from the standard were relatively close to the existing implementation in IBM DB2 version 2.  Recursive CTEs were eventually supported by Microsoft SQL Server (since SQL Server 2008 R2), Firebird 2.1, PostgreSQL 8.4+, SQLite 3.8.3+, Oracle 11g Release 2, and IBM Informix version 11.50+.

Without Common-table-expressions or a connected-by clause it is still possible to achieve hierarchical queries with user-defined recursive functions, but these tend to result in very complex SQL.

CTEs in MariaDB

In MariaDB, a non-recursive CTE is basically considered to be a query-local VIEW whose syntax is more readable than nested FROM (SELECT …). A CTE can refer to another and it can be referenced from multiple places.

Thus, CTEs are similar to derived tables. For example,

SQL with derived table:

SELECT * FROM ( SELECT * FROM employees WHERE dept = 'Engineering' ) AS engineers WHERE ...

SQL with CTE:

WITH engineers AS ( SELECT * FROM employees WHERE dept = 'Engineering' ) SELECT * FROM engineers WHERE ...

SQL is generally poor at recursion.  One of the advantages of CTEs is that they permit a query to reference itself, hence recursive SQL. A recursive CTE will repeatedly execute subsets of the data until it obtains the complete result set. This makes it particularly useful for handling hierarchical or tree-structured data.

With recursive CTEs you can achieve things that would be very difficult to do with standard SQL and at a faster execution speed. They can help solve many types of business problems and even simplify some complex SQL/application logic down to a simple recursive call to the database.

Some example uses for recursive CTE are to find gaps in data, create organization charts and create test data.

WITH RECURSIVE signifies a recursive CTE. It is given a name, followed by a body (the main query) as follows:

Below is a recursive CTE that counts from 1 to 50. WITH cte AS (SELECT 1 AS n -- anchor member UNION ALL SELECT n + 1 -- recursive member FROM cte WHERE n < 50 -- terminator ) SELECT n FROM cte;

The above statement prints a number series from 1 to 49.


MySQL 8.0 adds CTEs via the standard WITH keyword, in much the same way it is implemented in competing products.

To specify common table expressions, use a WITH clause that has one or more comma-separated subclauses. Each subclause provides a subquery that produces a result set, and associates a name with the subquery. The following example defines CTEs named cte1 and cte2 in the WITH clause, and refers to them in the top-level SELECT that follows the WITH clause:

WITH cte1 AS (SELECT a, b FROM table1), cte2 AS (SELECT c, d FROM table2) SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c;

Immediately preceding SELECT for statements that include a SELECT statement:


A recursive common table expression is one having a subquery that refers to its own name. For example:

WITH RECURSIVE cte (n) AS ( SELECT 1 UNION ALL SELECT n + 1 FROM cte WHERE n < 5 ) SELECT * FROM cte; +------+ | n | +------+ | 1 | | 2 | | 3 | | 4 | | 5 | +------+

Window Functions and Common Table Expressions (CTEs) have been a mainstay of many popular database products for some time now.  With the release of MySQL version 8 and MariaDB 10.2.0, both vendors have caught up with competing DBMSes such as SQL Server and Oracle.

The post Comparison of Window Functions & CTEs in MySQL 8 vs MariaDB appeared first on SQLyog Blog.

Updated: ClusterControl Tips & Tricks - Transparent Database Failover for your Applications

ClusterControl is a great tool to deploy and manage databases clusters - if you are into MySQL, you can easily deploy clusters based on both traditional MySQL master-slave replication, Galera Cluster or MySQL NDB Cluster. To achieve high availability, deploying a cluster is not enough though. Nodes may (and will most probably) go down, and your system has to be able to adapt to those changes.

This adaptation can happen at different levels. You can implement some kind of logic within the application - it would check the state of cluster nodes and direct traffic to the ones which are reachable at the given moment. You can also build a proxy layer which will implement high availability in your system. In this blog post, we’d like to share some tips on how you can achieve that using ClusterControl.

Deploying HAProxy using the ClusterControl

HAProxy is the standard - one of the most popular proxies used in connection with MySQL (but not only, of course). ClusterControl supports deployment and monitoring of HAProxy nodes. It also helps to implement high availability of the proxy itself using keepalived.

Deployment is pretty simple - you need to pick or fill in the IP address of a host where HAProxy will be installed, pick port, load balancing policy, decide if ClusterControl should use existing repository or the most recent source code to deploy HAProxy. You can also pick which backend nodes you’d like to have included in the proxy configuration, and whether they should be active or backup.

By default, the HAProxy instance deployed by ClusterControl will work on MySQL Cluster (NDB), Galera Cluster, PostgreSQL streaming replication and MySQL Replication. For master-slave replication, ClusterControl can configure two listeners, one for read-only and another one for read-write. Applications will then have to send reads and writes to the respective ports. For multi-master replication, ClusterControl will setup the standard  TCP load-balancing based on least connection balancing algorithm (e.g., for Galera Cluster where all nodes are writeable).

Keepalived is used to add high availability to the proxy layer. When you have at least two HAProxy nodes in your system, you can install Keepalived from the ClusterControl UI.

You’ll have to pick two HAProxy nodes and they will be configured as an active - standby pair. A Virtual IP would be assigned to the active server and, should it fail, it will be reassigned to the standby proxy. This way you can just connect to the VIP and all your queries will be routed to the currently active and working HAProxy node.

You can find more details in how the internals are configured by reading through our HAProxy tutorial.

Deploying ProxySQL using ClusterControl

While HAProxy is a rock-solid proxy and very popular choice, it lacks database awareness, e.g., read-write split. The only way to do it in HAProxy is to create two backends and listen on two ports - one for reads and one for writes. This is, usually, fine but it requires you to implement changes in your application - the application has to understand what is a read and what is a write, and then direct those queries to the correct port. It’d be much easier to just connect to a single port and let the proxy decide what to do next - this is something HAProxy cannot do as what it does is just routing packets - no packet inspection is done and, especially, it has no understanding of the MySQL protocol.

ProxySQL solves this problem - it talks MySQL protocol and it can (among other things) perform a read-write split through its powerful query rules and route the incoming MySQL traffic according to various criterias. Installation of MaxScale from ClusterControl is simple - you want to go to Manage -> Load Balancer section and fill the “Deploy ProxySQL” tab with the required data.

In short, we need to pick where ProxySQL will be installed, what administration user and password it should have, which monitoring user it should use to connect to the  MySQL backends and verify their status and monitor state. From ClusterControl, you can either create a new user to be used by the application - you can decide on its name, password, access to which databases are granted and what MySQL privileges that user will have. Such user will be created on both MySQL and ProxySQL side. Second option, more suitable for existing infrastructures, is to use the existing database users. You need to pass username and password, and such user will be created only on ProxySQL.

Finally, you need to answer a question: are you using implicit transactions? By that we understand transactions started by running SET autocommit=0; If you do use it, ClusterControl will configure ProxySQL to send all of the traffic to the master. This is required to ensure ProxySQL will handle transactions correctly in ProxySQL 1.3.x and earlier. If you don’t use SET autocommit=0 to create new transaction, ClusterControl will configure read/write split.

ProxySQL, as every proxy, can become a single point of failure and it has to be made redundant to achieve high availability. There are a couple of methods to do that. One of them is to collocate ProxySQL on the web nodes. The idea here is that, most of the time, the ProxySQL process will work just fine and the reason for its unavailability is that the whole node went down. In such case, if ProxySQL is collocated with the web node, not much harm has been done because that particular web node will not be available either.

Another method, is to use Keepalived in a similar way like we did in the case of HAProxy.

You can find more details in how the internals are configured by reading through our ProxySQL tutorial.

Tags:  MySQL MariaDB haproxy MaxScale keepalived high availability replication load balancer failover

Preparing your Community Connector for MySQL 8 – part 2 – SHA256

In part 1 of this series we looked at implementing support for the caching_sha2_password authentication plugin using the libmyqlclient C library.  This is possible for C based connectors or connectors that can make use of an external C library.  For some, this is not possible.  For example, Java and C# both function better if they don’t have to thunk out to an external library.  For these, implementing the MySQL client/server protocol natively is required.  This part 2 is about implementing support for the caching_sha2_password plugin natively.  It is expected that you already have a working connector and understand the client/server protocol.  For reference please internal documentation here.  This post will not attempt to always give you precise byte positions or counts.  For that please see the above linked internal documentation.

This blog post will not attempt to explain every aspect of implementing the connection phase of the client server protocol.   For help in implementing it please refer to

Initial Handshake

When initially connected, the server sends an initial handshake packet with lots of information.  Part of that information is the default authentication plugin that is set for the server.  Many connectors give the user the option of specifying an authentication plugin via a configuration or connection string option.  Doing this can eliminate a round trip during authentication.  However if you don’t do this or the user has not specified an authentication method, then attempting authentication using the default method is the way to go.

So a handshake response packet is now sent.  Part of this packet is the username, the “auth response”, and the plugin name.   Username is simple and the plugin name will be the name of the authentication plugin you received in the initial handshake packet.   By default starting with 8.0.4, that will be “caching_sha2_password” (assuming we are not switching to a different method).  The “auth response” part of the packet is the SHA256 scramble of the password.  This scramble uses the format

XOR(SHA256(PASSWORD), SHA256(SHA256(SHA256(PASSWORD)), seed_bytes))

Here the seed_bytes are the bytes originally passed to the plugin.

After sending the handshake response packet to the server, the server will respond with a packet.  There are four possible responses from the server:

  • Server sends a “More data” packet (first byte == 0x01) with the second byte = 0x03.  This will be followed by a normal OK packet.  This is the case when the user’s password is already in the server cache and authentication has succeeded.  We call this the “fast” authentication.
  • Server sends back an “error” packet.  This means the user’s password is in the cache but it doesn’t match what was given.
  • Server sends back an “auth switch” packet (first byte == 0xFE).  This means that the user who is trying to authenticate is using a different authentication plugin than the one specified and an auth switch cycle needs to happen.
  • Server sends back a “More data” packet (first byte == 0x01) with the second byte = 0x04.  This means that more data is needed to complete the authentication.  In the example of using “caching_sha2_password” this means that the users password is not in the server cache and the server is asking the client to send the user’s full password.  This is called the “full” authentication.
Full Authentication

When the server cache does not contain the password hash, the server asks for the full password so the cache can be populated.  This can be done in one of two ways:

  • Passing it in plain text if the connection is already secure using SSL/TLS.
  • Encrypting it with the servers public key if the connection is not secure.
Full Authentication via SSL/TLS

If the connection is already secure then all that is necessary is to pass the users password in clear text.  This is as simple as simply sending a single packet containing only the password as a zero terminated array of bytes.  The server will then respond with either an “Ok” packet if authentication was successful or an “Error” packet if not.

Full Authentication via RSA Key Exchange

If the connection is not already secure then passing the password in clear text is obviously not going to work.  This case is handled by the client encrypting the user’s password with the server’s public key and sending that back.  Here are the steps to accomplishing that.

  • In response to receiving the 0x01 0x04 packet from the server, the client has two choices.  The client could respond with a packet containing the single byte 0x02.  This requests the server send it’s public key.  This is considered somewhat insecure.  A better option if available would be for the client to have the server public key locally.  This would be stored in some implementation defined way.
  • If the client requested the server send the public key, the server will then respond with a “More data” packet containing the servers public key
  • The client then responds with the password encrypted with the servers public key. (see below for more details)
  • The server would then respond with either the Ok or Error packet.
Password Encryption (a closer look)

The password is “obfuscated” first by employing a rotating “xor” against the seed bytes that were given to the authentication plugin upon initial handshake.   Pseudocode for this might look like this:

for (int i=0; i < password.length; i++) buffer[i] = password[i] ^ seedbytes[i % seedbytes.length];

Buffer would then be encrypted using the RSA public key the server passed to the client.  The resulting buffer would then be passed back to the server.

RSA Padding Change

It’s important to note that a incompatible change happened in server 8.0.5.  Prior to server 8.0.5 the encryption was done using RSA_PKCS1_PADDING.  With 8.0.5 it is done with RSA_PKCS1_OAEP_PADDING.  This means that if you have implemented support for this authentication scheme for servers prior to 8.0.5 you will need to update your connector to make this change.


I hope this blog has helped you understand a bit better how to implement this new security protocol into your product.  We continually strive to find new and better ways to help protect your data.  Please let us know if you have any questions or find any errors with this post!