Planet MySQL

Percona XtraBackup 2.4.13 Is Now Available

Percona is glad to announce the release of Percona XtraBackup 2.4.13 on January 18, 2018. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

New features and improvements:
  • PXB-1548: Percona XtraBackup enables updating the ib_buffer_pool file with the latest pages present in the buffer pool using the --dump-innodb-buffer-pool option. Thanks to Marcelo Altmann for contribution.
Bugs fixed
  • xtrabackup did not delete missing tables from the partial backup which led to error messages logged by the server on startup. Bug fixed PXB-1536.
  • The --history option did not work when autocommit was disabled. Bug fixed PXB-1569.
  • xtrabackup could fail to backup encrypted tablespace when it was recently created or altered. Bug fixed PXB-1648.
  • When the --throttle option was used, the applied value was different from the one specified by the user (off by one error). Bug fixed PXB-1668.
  • It was not allowed for MTS (multi-threaded slaves) without GTID to be backed up with --safe-slave-backup. Bug fixed PXB-1672.
  • Percona Xtrabackup could crash when the ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL. Bug fixed PXB-1679.
  • xbcrypt could display an assertion failure and generated core if the required parameters are missing. Bug fixed PXB-1683.
  • Using --lock-ddl-per-table caused the server to scan all records of partitioned tables which could lead to the “out of memory error”. Bugs fixed PXB-1691 and PXB-1698.
  • xtrabackup --prepare could hang while performing insert buffer merge. Bug fixed PXB-1704.
  • Incremental backups did not update xtrabackup_binlog_info with --binlog-info=lockless. Bug fixed PXB-1711

Other bugs fixed:  PXB-1570PXB-1609PXB-1632

Release notes with all the improvements for version 2.4.13 are available in our online documentation. Please report any bugs to the issue tracker.

Replication Manager Works with MariaDB

Some time ago I wrote a script to manage asynchronous replication links between Percona XtraDB clusters. The original post can be found here. The script worked well with Percona XtraDB Cluster but it wasn’t working well with MariaDB®.  Finally, the replication manager works with MariaDB.

First, let’s review the purpose of the script. Managing replication links between Galera based clusters is a tedious task. There are many potential slaves and many potential masters. Furthermore, each replication link must have only a single slave. Just try to imagine how you would maintain the following replication topology:

A complex replication topology

The above topology consists of five clusters and four master-to-master links. The replication manager can easily handle this topology. Of course, it is not a fix to the limitations of asynchronous replication. You must make sure your writes are replication safe. You could want, for example, a global user list or to centralize some access logs. Just to refresh memories, here are some of the script highlights:

  • Uses the Galera cluster for Quorum
  • Configurable, arbitrarily complex topologies
  • The script stores the topology in database tables
  • Elects slaves automatically
  • Monitors replication links
  • Slaves can connect to a list of potential masters

As you probably know, MariaDB has a different GTID implementation and syntax for the multi-source replication commands. I took some time to investigate why the script was failing and fixed it. Now, provided you are using MariaDB 10.1.4+ with GTIDs, the replication manager works fine.

You can found the script here. Be aware that although I work for Percona, the script is not officially supported by Percona.

MySQL 8.0.13 : InnoDB Transparent Tablespace Encryption for General Tablespaces

In my previous post (here) I talked about “InnoDB Transparent tablespace encryption” which was introduced in MySQL 5.7.11. That encryption was limited to file-per-table tablespaces. In MySQL 8.0.13, encryption for general tablespace is introduced. This blog post aims to give a peek into this new feature.…

Percona Server for MySQL 8.0.13-4 Is Now Available

Percona announces the release of Percona Server for MySQL 8.0.13-4 on January 17, 2019 (downloads are available here and from the Percona Software Repositories). This release contains a fix for a critical bug that prevented Percona Server for MySQL 5.7.24-26 from being upgraded to version 8.0.13-3 if there are more than around 1000 tables, or if the maximum allocated InnoDB table ID is around 1000. Percona Server for MySQL 8.0.13-4 is now the current GA release in the 8.0 series.

All of Percona’s software is open-source and free.

Percona Server for MySQL 8.0 includes all the features available in MySQL 8.0 Community Edition in addition to enterprise-grade features developed by Percona. For a list of highlighted features from both MySQL 8.0 and Percona Server for MySQL 8.0, please see the GA release announcement.

Note: If you are upgrading from 5.7 to 8.0, please ensure that you read the upgrade guide and the document Changed in Percona Server for MySQL 8.0.

Bugs Fixed
  • It was not possible to upgrade from MySQL 5.7.24-26 to 8.0.13-3 if there were more than around 1000 tables, or if the maximum allocated InnoDB table ID was around 1000. Bug fixed #5245.
  • SHOW BINLOG EVENTS FROM <bad offset> is not diagnosed inside Format_description_log_events. Bug fixed #5126 (Upstream #93544).
  • There was a typo in mysqld_safe.sh: trottling was replaced with throttling. Bug fixed #240. Thanks to Michael Coburn for the patch.
  • Percona Server for MySQL 8.0 could crash with the “Assertion failure: dict0dict.cc:7451:space_id != SPACE_UNKNOWN” exception during an upgrade from Percona Server for MySQL 5.7.23 to Percona Server for MySQL 8.0.13-3 with --innodb_file_per_table=OFF. Bug fixed #5222.
  • On Debian or Ubuntu, a conflict was reported on the /usr/bin/innochecksum file when attempting to install Percona Server for MySQL 8 over MySQL 8. Bug fixed #5225.
  • An out-of-bound read exception could occur on debug builds in the compressed columns with dictionaries feature. Bug fixed #5311.
  • The innodb_data_pending_reads server status variable contained an incorrect value. Bug fixed #5264. Thanks to Fangxin Lou for the patch.
  • A memory leak and needless allocation in compression dictionaries could happen in mysqldump. Bug fixed #5307.
  • A compression-related memory leak could happen in mysqlbinlog. Bug fixed #5308.

Other bugs fixed: #4797#5209#5268#5270#5306#5309

Find the release notes for Percona Server for MySQL 8.0.13-4 in our online documentation. Report bugs in the Jira bug tracker.

Shinguz: MariaDB/MySQL Environment MyEnv 2.0.2 has been released

FromDual has the pleasure to announce the release of the new version 2.0.2 of its popular MariaDB, Galera Cluster and MySQL multi-instance environment MyEnv.

The new MyEnv can be downloaded here. How to install MyEnv is described in the MyEnv Installation Guide.

In the inconceivable case that you find a bug in the MyEnv please report it to the FromDual bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 1.1.x to 2.0

Please look at the MyEnv 2.0.0 Release Notes.

Upgrade from 2.0.x to 2.0.2 shell> cd ${HOME}/product shell> tar xf /download/myenv-2.0.2.tar.gz shell> rm -f myenv shell> ln -s myenv-2.0.2 myenv
Plug-ins

If you are using plug-ins for showMyEnvStatus create all the links in the new directory structure:

shell> cd ${HOME}/product/myenv shell> ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/
Upgrade of the instance directory structure

From MyEnv 1.0 to 2.0 the directory structure of instances has fundamentally changed. Nevertheless MyEnv 2.0 works fine with MyEnv 1.0 directory structures.

Changes in MyEnv 2.0.2 MyEnv
  • Error message fixed.
  • bind_address 0.0.0.0 is optimized to *.
  • State up and down are coloured now.
  • Complaint on missing symbolic link to my.cnf added.
  • New start-timeout configuration variable added. Important for Galera SST.
  • Default MariaDB my.cnf hash added to avoid complaints.
  • mysqld is consistently searched in sbin, bin and libexec now for RHEL/CentOS 7 compatibility.
  • Avoid EGPCS error messages during MyEnv start/stop.
  • Not used aReleaseVersion removed, side effect is to not have performance issues any more on up in huge MyEnv set-ups with older MySQL releases.

MyEnv Installer
  • Function answerQuestion on previous error message works now.
  • Try and catch for existing configuration file improved.
  • Default answer is "q" on error and instance name and blacklist name check is fixed.
  • myenv.conf backup file has a correct timestamp now.
  • Create symlink to datadir for my.cnf.
  • Purge of database is done from instancedir and not datadir any more.

MyEnv Utilities
  • galera_monitor.sh output made nicer.
  • Script az_test.php added, initial test found already a bug in MariaDB 10.3.
  • Script slave_monitor.sh added.
  • Option check made more careful for drop_partition.php and merge_partition.php.
  • Timestamp problem fixed for year change in split_partition.php.

For subscriptions of commercial use of MyEnv please get in contact with us.

Taxonomy upgrade extras:  MyEnv multi-instance virtualization consolidation SaaS Operations release upgrade mysqld_multi

Using Parallel Query with Amazon Aurora for MySQL

Parallel query execution is my favorite, non-existent, feature in MySQL. In all versions of MySQL – at least at the time of writing – when you run a single query it will run in one thread, effectively utilizing one CPU core only. Multiple queries run at the same time will be using different threads and will utilize more than one CPU core.

On multi-core machines – which is the majority of the hardware nowadays – and in the cloud, we have multiple cores available for use. With faster disks (i.e. SSD) we can’t utilize the full potential of IOPS with just one thread.

AWS Aurora (based on MySQL 5.6) now has a version which will support parallelism for SELECT queries (utilizing the read capacity of storage nodes underneath the Aurora cluster). In this article, we will look at how this can improve the reporting/analytical query performance in MySQL. I will compare AWS Aurora with MySQL (Percona Server) 5.6 running on an EC2 instance of the same class.

In Short

Aurora Parallel Query response time (for queries which can not use indexes) can be 5x-10x better compared to the non-parallel fully cached operations. This is a significant improvement for the slow queries.

Test data and versions

For my test, I need to choose:

  1. Aurora instance type and comparison
  2. Dataset
  3. Queries
Aurora instance type and comparison

According to Jeff Barr’s excellent article (https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/) the following instance classes will support parallel query (PQ):

“The instance class determines the number of parallel queries that can be active at a given time:

  • db.r*.large – 1 concurrent parallel query session
  • db.r*.xlarge – 2 concurrent parallel query sessions
  • db.r*.2xlarge – 4 concurrent parallel query sessions
  • db.r*.4xlarge – 8 concurrent parallel query sessions
  • db.r*.8xlarge – 16 concurrent parallel query sessions
  • db.r4.16xlarge – 16 concurrent parallel query sessions”

As I want to maximize the concurrency of parallel query sessions, I have chosen db.r4.8xlarge. For the EC2 instance I will use the same class: r4.8xlarge.

Aurora:

mysql> show global variables like '%version%'; +-------------------------+------------------------------+ | Variable_name           | Value      | +-------------------------+------------------------------+ | aurora_version          | 1.18.0      | | innodb_version          | 1.2.10      | | protocol_version        | 10      | | version                 | 5.6.10      | | version_comment         | MySQL Community Server (GPL) | | version_compile_machine | x86_64                       | | version_compile_os      | Linux      | +-------------------------+------------------------------+

MySQL on ec2

mysql> show global variables like '%version%'; +-------------------------+------------------------------------------------------+ | Variable_name           | Value                              | +-------------------------+------------------------------------------------------+ | innodb_version          | 5.6.41-84.1                              | | protocol_version        | 10                              | | slave_type_conversions  |                              | | tls_version             | TLSv1.1,TLSv1.2                              | | version                 | 5.6.41-84.1                              | | version_comment         | Percona Server (GPL), Release 84.1, Revision b308619 | | version_compile_machine | x86_64                                               | | version_compile_os      | debian-linux-gnu                              | | version_suffix          |                              | +-------------------------+------------------------------------------------------+

Table

I’m using the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time  (You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance).

mysql> show table status like 'ontime'\G *************************** 1. row ***************************           Name: ontime         Engine: InnoDB        Version: 10     Row_format: Compact           Rows: 173221661 Avg_row_length: 409    Data_length: 70850183168 Max_data_length: 0   Index_length: 0      Data_free: 7340032 Auto_increment: NULL    Create_time: 2018-09-26 02:03:28    Update_time: NULL     Check_time: NULL      Collation: latin1_swedish_ci       Checksum: NULL Create_options:        Comment: 1 row in set (0.00 sec)

The table is very wide, 84 columns.

Working with Aurora PQ (Parallel Query)

Documentation: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html

Aurora PQ works by doing a full table scan (parallel reads are done on the storage level). The InnoDB buffer pool is not used when Parallel Query is utilized.

For the purposes of the test I turned PQ on and off (normally AWS Aurora uses its own heuristics to determine if the PQ will be helpful or not):

Turn on and force:

mysql> set session aurora_pq = 1; Query OK, 0 rows affected (0.00 sec) mysql> set aurora_pq_force = 1; Query OK, 0 rows affected (0.00 sec)

Turn off:

mysql> set session aurora_pq = 0; Query OK, 0 rows affected (0.00 sec)

The EXPLAIN plan in MySQL will also show the details about parallel query execution statistics.

Queries

Here, I use the “reporting” queries, running only one query at a time. The queries are similar to those I’ve used in older blog posts comparing MySQL and Apache Spark performance (https://www.percona.com/blog/2016/08/17/apache-spark-makes-slow-mysql-queries-10x-faster/ )

Here is a summary of the queries:

  1. Simple queries:
    • select count(*) from ontime where flightdate > '2017-01-01'
    • select avg(DepDelay/ArrDelay+1) from ontime
  2. Complex filter, single table:

select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE   DestState not in ('AK', 'HI', 'PR', 'VI')   and OriginState not in ('AK', 'HI', 'PR', 'VI')   and flightdate > '2015-01-01'    and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0   ORDER by DepDelay DESC LIMIT 10;

3. Complex filter, join “reference” table

select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay FROM ontime_ind o JOIN carriers c on o.carrier = c.carrier_code WHERE   (carrier_name like 'United%' or carrier_name like 'Delta%')   and ArrDelay > 30   ORDER by DepDelay DESC LIMIT 10\G

4. select one row only, no index

Query 1a: simple, count(*)

Let’s take a look at the most simple query: count(*). This variant of the “ontime” table has no secondary indexes.

select count(*) from ontime where flightdate > '2017-01-01';

Aurora, pq (parallel query) disabled:

I disabled the PQ first to compare:

mysql> select count(*) from ontime where flightdate > '2017-01-01'; +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (8 min 25.49 sec) mysql> select count(*) from ontime where flightdate > '2017-01-01'; +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (2 min 48.81 sec) mysql> mysql> select count(*) from ontime where flightdate > '2017-01-01'; +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (2 min 48.25 sec) Please note: the first run was “cold run”; data was read from disk. The second and third run used the cached data. Now let's enable and force Aurora PQ: mysql> set session aurora_pq = 1; Query OK, 0 rows affected (0.00 sec) mysql> set aurora_pq_force = 1;  Query OK, 0 rows affected (0.00 sec) mysql> explain select count(*) from ontime where flightdate > '2017-01-01'\G *************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: ontime         type: ALL possible_keys: NULL          key: NULL      key_len: NULL          ref: NULL         rows: 173706586        Extra: Using where; Using parallel query (1 columns, 1 filters, 0 exprs; 0 extra) 1 row in set (0.00 sec)

(from the EXPLAIN plan, we can see that parallel query is used).

Results:

mysql> select count(*) from ontime where flightdate > '2017-01-01';                                                                                                                           +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (16.53 sec) mysql> select count(*) from ontime where flightdate > '2017-01-01'; +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (16.56 sec) mysql> select count(*) from ontime where flightdate > '2017-01-01'; +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (16.36 sec) mysql> select count(*) from ontime where flightdate > '2017-01-01'; +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (16.56 sec) mysql> select count(*) from ontime where flightdate > '2017-01-01'; +----------+ | count(*) | +----------+ |  5660651 | +----------+ 1 row in set (16.36 sec)

As we can see the results are very stable. It does not use any cache (ie: innodb buffer pool) either. The result is also interesting: utilizing multiple threads (up to 16 threads) and reading data from disk (using disk cache, probably) can be ~10x faster compared to reading from memory in a single thread.

Result: ~10x performance gain, no index used

Query 1b: simple, avg

set aurora_pq = 1; set aurora_pq_force=1; select avg(DepDelay) from ontime; +---------------+ | avg(DepDelay) | +---------------+ |        8.2666 | +---------------+ 1 row in set (1 min 48.17 sec) set aurora_pq = 0; set aurora_pq_force=0;   select avg(DepDelay) from ontime; +---------------+ | avg(DepDelay) | +---------------+ |        8.2666 | +---------------+ 1 row in set (2 min 49.95 sec) Here we can see that PQ gives use ~2x performance increase.

Summary of simple query performance

Here is what we learned comparing Aurora PQ performance to native MySQL query execution:

  1. Select count(*), not using index: 10x performance increase with Aurora PQ.
  2. select avg(…), not using index: 2x performance increase with Aurora PQ.
Query 2: Complex filter, single table

The following query will always be slow in MySQL. This combination of the filters in the WHERE condition makes it extremely hard to prepare a good set of indexes to make this query faster.

select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE   DestState not in ('AK', 'HI', 'PR', 'VI')   and OriginState not in ('AK', 'HI', 'PR', 'VI')   and flightdate > '2015-01-01'   and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = '0' ORDER by DepDelay DESC LIMIT 10;

Let’s compare the query performance with and without PQ.

PQ disabled:

mysql> set aurora_pq_force = 0; Query OK, 0 rows affected (0.00 sec) mysql> set aurora_pq = 0;                                                                                                                                                                   Query OK, 0 rows affected (0.00 sec) mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G *************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: ontime         type: ALL possible_keys: NULL          key: NULL      key_len: NULL          ref: NULL         rows: 173706586        Extra: Using where; Using filesort 1 row in set (0.00 sec) mysql> select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10; +------------+---------+-----------+--------+------+ | FlightDate | carrier | FlightNum | Origin | Dest | +------------+---------+-----------+--------+------+ | 2017-10-09 | OO      | 5028 | SBP | SFO  | | 2015-11-03 | VX      | 969 | SAN | SFO  | | 2015-05-29 | VX      | 720 | TUL | AUS  | | 2016-03-11 | UA      | 380 | SFO | BOS  | | 2016-06-13 | DL      | 2066 | JFK | SAN  | | 2016-11-14 | UA      | 1600 | EWR | LAX  | | 2016-11-09 | WN      | 2318 | BDL | LAS  | | 2016-11-09 | UA      | 1652 | IAD | LAX  | | 2016-11-13 | AA      | 23 | JFK | LAX  | | 2016-11-12 | UA      | 800 | EWR | SFO  | +------------+---------+-----------+--------+------+

10 rows in set (3 min 42.47 sec)

/* another run */

10 rows in set (3 min 46.90 sec)

This query is 100% cached. Here is the graph from PMM showing the number of read requests:

  1. Read requests: logical requests from the buffer pool
  2. Disk reads: physical requests from disk

Buffer pool requests:

Now let’s enable and force PQ:

PQ enabled:

mysql> set session aurora_pq = 1; Query OK, 0 rows affected (0.00 sec) mysql> set aurora_pq_force = 1;                                                                                                                              Query OK, 0 rows affected (0.00 sec) mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G *************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: ontime         type: ALL possible_keys: NULL          key: NULL      key_len: NULL          ref: NULL         rows: 173706586        Extra: Using where; Using filesort; Using parallel query (12 columns, 4 filters, 3 exprs; 0 extra) 1 row in set (0.00 sec) mysql> select SQL_CALC_FOUND_ROWS                                                                                                                                                                      -> FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest -> FROM ontime    -> WHERE    ->  DestState not in ('AK', 'HI', 'PR', 'VI')    ->  and OriginState not in ('AK', 'HI', 'PR', 'VI')    ->  and flightdate > '2015-01-01'    ->   and ArrDelay < 15    -> and cancelled = 0    -> and Diverted = 0    -> and DivAirportLandings = 0    ->  ORDER by DepDelay DESC    -> LIMIT 10; +------------+---------+-----------+--------+------+ | FlightDate | carrier | FlightNum | Origin | Dest | +------------+---------+-----------+--------+------+ | 2017-10-09 | OO      | 5028 | SBP | SFO  | | 2015-11-03 | VX      | 969 | SAN | SFO  | | 2015-05-29 | VX      | 720 | TUL | AUS  | | 2016-03-11 | UA      | 380 | SFO | BOS  | | 2016-06-13 | DL      | 2066 | JFK | SAN  | | 2016-11-14 | UA      | 1600 | EWR | LAX  | | 2016-11-09 | WN      | 2318 | BDL | LAS  | | 2016-11-09 | UA      | 1652 | IAD | LAX  | | 2016-11-13 | AA      | 23 | JFK | LAX  | | 2016-11-12 | UA      | 800 | EWR | SFO  | +------------+---------+-----------+--------+------+ 10 rows in set (41.88 sec) /* run 2 */ 10 rows in set (28.49 sec) /* run 3 */ 10 rows in set (29.60 sec)

Now let’s compare the requests:

As we can see, Aurora PQ is almost NOT utilizing the buffer pool (there are a minor number of read requests. Compare the max of 4K requests per second with PQ to the constant 600K requests per second in the previous graph).

Result: ~8x performance gain

Query 3: Complex filter, join “reference” table

In this example I join two tables: the main “ontime” table and a reference table. If we have both tables without indexes it will simply be too slow in MySQL. To make it better, I have created an index for both tables and so it will use indexes for the join:

CREATE TABLE `carriers` (  `carrier_code` varchar(8) NOT NULL DEFAULT '',  `carrier_name` varchar(200) DEFAULT NULL,  PRIMARY KEY (`carrier_code`),  KEY `carrier_name` (`carrier_name`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 mysql> show create table ontime_ind\G ...  PRIMARY KEY (`id`),  KEY `comb1` (`Carrier`,`Year`,`ArrDelayMinutes`),  KEY `FlightDate` (`FlightDate`) ) ENGINE=InnoDB AUTO_INCREMENT=178116912 DEFAULT CHARSET=latin1

Query:

select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay FROM ontime_ind o JOIN carriers c on o.carrier = c.carrier_code WHERE   (carrier_name like 'United%' or carrier_name like 'Delta%')   and ArrDelay > 30   ORDER by DepDelay DESC LIMIT 10\G

PQ disabled, explain plan:

mysql> set aurora_pq_force = 0; Query OK, 0 rows affected (0.00 sec) mysql> set aurora_pq = 0;                                                                                                                                                                   Query OK, 0 rows affected (0.00 sec) mysql> explain    -> select SQL_CALC_FOUND_ROWS    -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay    -> FROM ontime_ind o    -> JOIN carriers c on o.carrier = c.carrier_code    -> WHERE    ->  (carrier_name like 'United%' or carrier_name like 'Delta%')    ->  and ArrDelay > 30    ->  ORDER by DepDelay DESC    -> LIMIT 10\G *************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: c         type: range possible_keys: PRIMARY,carrier_name          key: carrier_name      key_len: 203          ref: NULL         rows: 3        Extra: Using where; Using index; Using temporary; Using filesort *************************** 2. row ***************************           id: 1  select_type: SIMPLE        table: o         type: ref possible_keys: comb1          key: comb1      key_len: 3          ref: ontime.c.carrier_code         rows: 2711597        Extra: Using index condition; Using where 2 rows in set (0.01 sec)

As we can see MySQL uses indexes for the join. Response times:

/* run 1 – cold run */

10 rows in set (29 min 17.39 sec)

/* run 2  – warm run */

10 rows in set (2 min 45.16 sec)

PQ enabled, explain plan:

mysql> explain -> select SQL_CALC_FOUND_ROWS -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay -> FROM ontime_ind o -> JOIN carriers c on o.carrier = c.carrier_code -> WHERE -> (carrier_name like 'United%' or carrier_name like 'Delta%') -> and ArrDelay > 30 -> ORDER by DepDelay DESC -> LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: c type: ALL possible_keys: PRIMARY,carrier_name key: NULL key_len: NULL ref: NULL rows: 1650 Extra: Using where; Using temporary; Using filesort; Using parallel query (2 columns, 0 filters, 1 exprs; 0 extra) *************************** 2. row *************************** id: 1 select_type: SIMPLE table: o type: ALL possible_keys: comb1 key: NULL key_len: NULL ref: NULL rows: 173542245 Extra: Using where; Using join buffer (Hash Join Outer table o); Using parallel query (11 columns, 1 filters, 1 exprs; 0 extra) 2 rows in set (0.00 sec)

As we can see, Aurora does not use any indexes and uses a parallel scan instead.

Response time:

mysql> select SQL_CALC_FOUND_ROWS    -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay    -> FROM ontime_ind o    -> JOIN carriers c on o.carrier = c.carrier_code    -> WHERE    ->  (carrier_name like 'United%' or carrier_name like 'Delta%')    ->  and ArrDelay > 30    ->  ORDER by DepDelay DESC    -> LIMIT 10\G ... *************************** 4. row ***************************    FlightDate: 2017-05-04 UniqueCarrier: UA       TailNum: N68821     FlightNum: 1205        Origin: KOA OriginCityName: Kona, HI          Dest: LAX  DestCityName: Los Angeles, CA      DepDelay: 1457      ArrDelay: 1459 *************************** 5. row ***************************    FlightDate: 1991-03-12 UniqueCarrier: DL       TailNum:     FlightNum: 1118        Origin: ATL OriginCityName: Atlanta, GA          Dest: STL  DestCityName: St. Louis, MO ... 10 rows in set (28.78 sec) mysql> select found_rows(); +--------------+ | found_rows() | +--------------+ |      4180974 | +--------------+ 1 row in set (0.00 sec)

Result: ~5x performance gain

(this is actually comparing the index cached read to a non-index PQ execution)

Summary

Aurora PQ can significantly improve the performance of reporting queries as such queries may be extremely hard to optimize in MySQL, even when using indexes. With indexes, Aurora PQ response time can be 5x-10x better compared to the non-parallel, fully cached operations. Aurora PQ can help improve performance of complex queries by performing parallel reads.

The following table summarizes the query response times:

Query Time, No PQ, index Time, PQ select count(*) from ontime where flightdate > ‘2017-01-01’ 2 min 48.81 sec 16.53 sec select avg(DepDelay) from ontime; 2 min 49.95 sec 1 min 48.17 sec select SQL_CALC_FOUND_ROWS

FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest

FROM ontime

WHERE

DestState not in (‘AK’, ‘HI’, ‘PR’, ‘VI’)

and OriginState not in (‘AK’, ‘HI’, ‘PR’, ‘VI’)

and flightdate > ‘2015-01-01’

and ArrDelay < 15

and cancelled = 0

and Diverted = 0

and DivAirportLandings = 0

ORDER by DepDelay DESC

LIMIT 10;

3 min 42.47 sec 28.49 sec select SQL_CALC_FOUND_ROWS

FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay

FROM ontime_ind o

JOIN carriers c on o.carrier = c.carrier_code

WHERE

(carrier_name like ‘United%’ or carrier_name like ‘Delta%’)

and ArrDelay > 30

ORDER by DepDelay DESC

LIMIT 10\G

2 min 45.16 sec 28.78 sec


Photo by Thomas Lipke on Unsplash

MySQL Mailing Lists

Over the last few years, our community discussions have moved to MySQL Community Slack and MySQL Community Forums. And, Q&A discussions have moved to StackExchange and StackOverflow. The traditional MySQL mailing lists do not have any meaningful traffic, and are being discontinued.

So I encourage you to join the MySQL Community Slack. We are all there, ready to discuss and help: MySQL Engineers, and famous DBAs and developers , and newbies too. Come join us!

Pages