Planet MySQL

Shinguz: MySQL Enterprise Backup Support Matrix

MySQL Enterprise Backup (MEB) is somewhat limited in its support of older MySQL versions, so you should consider the following release matrix:

MEB        Supported   MySQL 5.5   MySQL 5.6   MySQL 5.7   MySQL 8.0
3.11.x     NO          x           x
3.12.x     YES         x           x
4.0.x      NO                                  x
4.1.x      YES                                 x
8.0.x      YES                                             8.0.x*

* MySQL Enterprise Backup 8.0.15 only supports MySQL 8.0.15. For earlier versions of MySQL 8.0, use the MySQL Enterprise Backup version with the same version number as the server.

MySQL Enterprise Backup is available for download from the My Oracle Support (MOS) website. This release will be available on Oracle eDelivery (OSDC) after the next upload cycle. MySQL Enterprise Backup is a commercial extension to the MySQL family of products.

As an open-source alternative, Percona XtraBackup is available for MySQL databases.

Compatibility with MySQL Versions: 3.11, 3.12, 4.0, 4.1, 8.0.

MySQL Enterprise Backup User's Guide: 3.11, 3.12, 4.0, 4.1, 8.0.


How Network Bandwidth Affects MySQL Performance

Network is a major part of a database infrastructure. However, often performance benchmarks are done on a local machine, where a client and a server are collocated – I am guilty myself. This is done to simplify the setup and to exclude one more variable (the networking part), but with this we also miss looking at how network affects performance.

The network is even more important for clustering products like Percona XtraDB Cluster and MySQL Group Replication. Also, we are working on our Percona XtraDB Cluster Operator for Kubernetes and OpenShift, where network performance is critical for overall performance.

In this post, I will look into networking setups. These are simple and trivial, but are a building block towards understanding networking effects for more complex setups.

Setup

I will use two bare-metal servers, connected via a dedicated 10Gb network. I will emulate a 1Gb network by changing the network interface speed with the following command:

ethtool -s eth1 speed 1000 duplex full autoneg off

I will run a simple benchmark:

sysbench oltp_read_only --mysql-ssl=on --mysql-host=172.16.0.1 --tables=20 --table-size=10000000 --mysql-user=sbtest --mysql-password=sbtest --threads=$i --time=300 --report-interval=1 --rand-type=pareto

This is run with the number of threads varied from 1 to 2048. All data fits into memory – innodb_buffer_pool_size is big enough – so the workload is CPU-intensive in memory: there is no IO overhead.

Operating System: Ubuntu 16.04

Benchmark N1. Network bandwidth

In the first experiment I will compare 1Gb network vs 10Gb network.

threads    throughput, 1Gb network    throughput, 10Gb network
1          326.13                     394.4
4          1143.36                    1544.73
16         2400.19                    5647.73
32         2665.61                    10256.11
64         2838.47                    15762.59
96         2865.22                    17626.77
128        2867.46                    18525.91
256        2867.47                    18529.4
512        2867.27                    17901.67
1024       2865.4                     16953.76
2048       2761.78                    16393.84

 

Obviously the 1Gb network performance is a bottleneck here, and we can improve our results significantly if we move to the 10Gb network.

To see that the 1Gb network is the bottleneck, we can check the network traffic chart in PMM:

We can see we achieved 116MiB/sec (or 928Mb/sec)  in throughput, which is very close to the network bandwidth.
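If you do not have PMM at hand, a rough cross-check is possible from MySQL itself by sampling the cumulative Bytes_sent and Bytes_received status counters (a minimal sketch; you have to take two samples and divide the difference by the interval yourself):

mysql> SHOW GLOBAL STATUS LIKE 'Bytes_sent';
mysql> SHOW GLOBAL STATUS LIKE 'Bytes_received';
-- sample both counters again, e.g. 10 seconds later, and compute
-- (new_value - old_value) / 10 to get an approximate bytes/sec figure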

But what can we do if our network infrastructure is limited to 1Gb?

Benchmark N2. Protocol compression

There is a feature in the MySQL protocol whereby you can enable compression for the network exchange between client and server:

--mysql-compression=on  for sysbench.

Let’s see how it will affect our results.

threads    throughput, 1Gb network    throughput, 1Gb with compression
1          326.13                     198.33
4          1143.36                    771.59
16         2400.19                    2714
32         2665.61                    3939.73
64         2838.47                    4454.87
96         2865.22                    4770.83
128        2867.46                    5030.78
256        2867.47                    5134.57
512        2867.27                    5133.94
1024       2865.4                     5129.24
2048       2761.78                    5100.46

 

Here is an interesting result. When we use all available network bandwidth, the protocol compression actually helps to improve the result.

threads    throughput, 10Gb network    throughput, 10Gb with compression
1          394.4                       216.25
4          1544.73                     857.93
16         5647.73                     3202.2
32         10256.11                    5855.03
64         15762.59                    8973.23
96         17626.77                    9682.44
128        18525.91                    10006.91
256        18529.4                     9899.97
512        17901.67                    9612.34
1024       16953.76                    9270.27
2048       16393.84                    9123.84

 

But this is not the case with the 10Gb network. The CPU resources needed for compression/decompression are a limiting factor, and with compression the throughput actually reaches only about half of what we get without compression.

Now let’s talk about protocol encryption, and how using SSL affects our results.

Benchmark N3. Network encryption

threads    throughput, 1Gb network    throughput, 1Gb SSL
1          326.13                     295.19
4          1143.36                    1070
16         2400.19                    2351.81
32         2665.61                    2630.53
64         2838.47                    2822.34
96         2865.22                    2837.04
128        2867.46                    2837.21
256        2867.47                    2837.12
512        2867.27                    2836.28
1024       2865.4                     1830.11
2048       2761.78                    1019.23

threads    throughput, 10Gb network    throughput, 10Gb SSL
1          394.4                       359.8
4          1544.73                     1417.93
16         5647.73                     5235.1
32         10256.11                    9131.34
64         15762.59                    8248.6
96         17626.77                    7801.6
128        18525.91                    7107.31
256        18529.4                     4726.5
512        17901.67                    3067.55
1024       16953.76                    1812.83
2048       16393.84                    1013.22

 

For the 1Gb network, SSL encryption shows some penalty – about 10% for a single thread – but otherwise we hit the bandwidth limit again. We also see a scalability hit at a high number of threads, which is more visible in the 10Gb network case.

With 10Gb, the SSL protocol does not scale after 32 threads. Actually, it appears to be a scalability problem in OpenSSL 1.0, which MySQL currently uses.

In our experiments, we saw that OpenSSL 1.1.1 provides much better scalability, but you need a special build of MySQL from source code linked against OpenSSL 1.1.1 to achieve this. I don’t show those results here, as we do not have production binaries.

Conclusions
  1. Network performance and utilization will affect the general application throughput.
  2. Check whether you are hitting network bandwidth limits.
  3. Protocol compression can improve the results if you are limited by network bandwidth, but it can also make things worse if you are not.
  4. SSL encryption has some penalty (~10%) with a low number of threads, but it does not scale for high-concurrency workloads.

MySQL Connector/Python 8.0.14+: Changed Expression Handling in mysqlx

The X DevAPI allows you to work with JSON documents and SQL tables at the same time. Furthermore, the CRUD style API is more intuitive than SQL statements for some programmers. Either way, the X DevAPI allows you to mix JSON documents, SQL tables, CRUD methods, and SQL statements to give you the best of all worlds. In MySQL Connector/Python, the X DevAPI is implemented in the mysqlx module.

This blog will look at how MySQL Connector/Python handles expressions, and how you in version 8.0.14 and later need to use the mysqlx.expr() method to explicitly define expressions.

Information

The changed way to work with expressions does not apply when defining fields. In that case, you can still specify the expression inline.

Expression Handling

One original feature of the X DevAPI in MySQL Connector/Python was that expressions were automatically handled when you inlined them into statement definitions. For example, you could increase the population like:

result = countryinfo.modify("Name = :country") \
    .set("demographics.Population",
         "CAST(FLOOR(demographics.Population * 1.10) AS unsigned)") \
    .bind("country", country_name) \
    .execute()

Notice the expression in line 3.

While this was nice, it also caused some problems such as Bug 92416 where regular data could end up being interpreted as an expression by mistake. For example:

schema = db.create_schema("db1")
mycol = schema.create_collection("mycol")
mycol.add({"A": "(@)"}).execute()

In this example (@) would be interpreted as an expression even though it should be taken as a string.

The solution has been to require the mysqlx.expr() method to be used to define all expressions. This then allows MySQL to interpret all strings as literals. While it does remove the shortcut method, it also removes any ambiguities and thus makes the code safer.

Tip

The mysqlx.expr() method is also available in MySQL Connector/Python 8.0.13 and earlier as well as other connectors. If you have not yet upgraded to the latest version, it is recommended to start using mysqlx.expr() now to avoid problems when upgrading.

Let’s look at an example to better understand how the change works.

Example

As an example, consider an application that uses the world_x database and updates the population of a country with 10%. This can be done using the following expression:

CAST(FLOOR(demographics.Population * 1.10) AS unsigned)

At first, the application will use the inline method to specify the expression, then we will look at changing this to work in MySQL Connector/Python 8.0.14 and later.

Inline Expression

The source code for updating the population using an inline expression can be seen in the following sample program:

import mysqlx
import mysql.connector

connect_args = {
    "host": "127.0.0.1",
    "port": 33060,
    "user": "pyuser",
    "password": "Py@pp4Demo",
};

print("Using inline expression.")
print("MySQL Connector/Python {0}".format(mysql.connector.__version__))

db = mysqlx.get_session(**connect_args)
schema = db.get_schema("world_x")
countryinfo = schema.get_collection("countryinfo")
country_name = "Australia"

db.start_transaction()

# Get the population of the country
population = countryinfo.find("Name = :country") \
    .fields("demographics.Population AS Population") \
    .bind("country", country_name)

before = population.execute()
print("Before ...: {0}".format(before.fetch_one()['Population']))

# Update the population
result = countryinfo.modify("Name = :country") \
    .set("demographics.Population",
         "CAST(FLOOR(demographics.Population * 1.10) AS unsigned)") \
    .bind("country", country_name) \
    .execute()

after = population.execute()
print("After ....: {0}".format(after.fetch_one()['Population']))

# Leave the data in the same state as before the changes
db.rollback()
db.close()

The population is updated in the statement defined and executed in lines 30-34. The population is fetched both before and after and printed together with the MySQL Connector/Python version. At the end, the transaction is rolled back, so it is possible to execute the program several times while getting the same output.

Information

The mysql.connector module is only included in order to print the MySQL Connector/Python version.

The output using MySQL Connector/Python 8.0.13 is:

Using inline expression.
MySQL Connector/Python 8.0.13
Before ...: 18886000
After ....: 20774600

This is as expected. However, in MySQL Connector/Python 8.0.14 and later, the result is quite different:

Using inline expression.
MySQL Connector/Python 8.0.15
Before ...: 18886000
After ....: CAST(FLOOR(demographics.Population * 1.10) AS unsigned)

Now the expression is taken as a literal string – oops.

Warning

This also highlights that you must be careful when working with a schemaless data model. The database will not stop you from putting wrong data types into your documents.

Let’s look at how this can be fixed.

Explicit Expressions

The solution to the issue we just saw is to use explicit expressions. You can do that using the mysqlx.expr() method. This returns an expression object that you can use in your statements. The same example as before – but using an explicit expression – is:

import mysqlx
import mysql.connector

connect_args = {
    "host": "127.0.0.1",
    "port": 33060,
    "user": "pyuser",
    "password": "Py@pp4Demo",
};

print("Using explicit expression.")
print("MySQL Connector/Python {0}".format(mysql.connector.__version__))

db = mysqlx.get_session(**connect_args)
schema = db.get_schema("world_x")
countryinfo = schema.get_collection("countryinfo")
country_name = "Australia"

db.start_transaction()

# Get the population of the country
population = countryinfo.find("Name = :country") \
    .fields("demographics.Population AS Population") \
    .bind("country", country_name)

before = population.execute()
print("Before ...: {0}".format(before.fetch_one()['Population']))

# Update the population
expr = mysqlx.expr("CAST(FLOOR(demographics.Population * 1.10) AS unsigned)")
result = countryinfo.modify("Name = :country") \
    .set("demographics.Population", expr) \
    .bind("country", country_name) \
    .execute()

after = population.execute()
print("After ....: {0}".format(after.fetch_one()['Population']))

# Leave the data in the same state as before the changes
db.rollback()
db.close()

The only change is the definition of the expression in line 30 and the use of it in line 32.

Tip

The expression object can be re-used if you need the same expression in several statements.

Now, MySQL Connector/Python 8.0.13 and 8.0.15 updates the population to the same value. First 8.0.13:

Using explicit expression.
MySQL Connector/Python 8.0.13
Before ...: 18886000
After ....: 20774600

Then 8.0.15:

Using explicit expression.
MySQL Connector/Python 8.0.15
Before ...: 18886000
After ....: 20774600

Further Reading

If this blog has caught your interest in MySQL Connector/Python, whether you are looking at using the traditional Python Database API specification (PEP 249) or the new X DevAPI, then I have written MySQL Connector/Python Revealed, published by Apress.
The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.

Enjoy.

Percona Server for MySQL 5.7.25-28 Is Now Available

Percona is glad to announce the release of Percona Server 5.7.25-28 on February 18, 2019. Downloads are available here and from the Percona Software Repositories.

This release is based on MySQL 5.7.25 and includes all the bug fixes in it. Percona Server 5.7.25-28 is now the current GA (Generally Available) release in the 5.7 series.

All software developed by Percona is open-source and free.

In this release, Percona Server introduces the variable binlog_skip_flush_commands. This variable controls whether or not FLUSH commands are written to the binary log. Setting this variable to ON can help avoid problems in replication. For more information, refer to our documentation.

Note

If you’re currently using Percona Server 5.7, Percona recommends upgrading to this version of 5.7 prior to upgrading to Percona Server 8.0.

Bugs fixed
  • FLUSH commands written to the binary log could cause errors in case of replication. Bug fixed #1827 (upstream #88720).
  • Running LOCK TABLES FOR BACKUP followed by STOP SLAVE SQL_THREAD could block replication preventing it from being restarted normally. Bug fixed #4758.
  • The ACCESS_DENIED field of the information_schema.user_statistics table was not updated correctly. Bug fixed #3956.
  • MySQL could report that the maximum number of connections was exceeded with too many connections being in the CLOSE_WAIT state. Bug fixed #4716 (upstream #92108).
  • Wrong query results could be received in semi-join subqueries with materialization-scan that allowed inner tables of different semi-join nests to interleave. Bug fixed #4907 (upstream bug #92809).
  • In some cases, the server using the MyRocks storage engine could crash when TTL (Time to Live) was defined on a table. Bug fixed #4911.
  • Running a SELECT statement with the ORDER BY and LIMIT clauses could result in less than optimal performance. Bug fixed #4949 (upstream #92850).
  • There was a typo in mysqld_safe.sh: trottling was replaced with throttling. Bug fixed #240. Thanks to Michael Coburn for the patch.
  • MyRocks could crash while running START TRANSACTION WITH CONSISTENT SNAPSHOT if other transactions were in specific states. Bug fixed #4705.
  • In some cases, mysqld could crash when inserting data into a database the name of which contained special characters (CVE-2018-20324). Bug fixed #5158.
  • MyRocks incorrectly processed transactions in which multiple statements had to be rolled back. Bug fixed #5219.
  • In some cases, the MyRocks storage engine could crash without triggering the crash recovery. Bug fixed #5366.
  • When bootstrapped with undo or redo log encryption enabled on a very fast storage, the server could fail to start. Bug fixed #4958.

Other bugs fixed: #2455, #4791, #4855, #4996, #5268.

This release also contains fixes for the following CVE issues: CVE-2019-2534, CVE-2019-2529, CVE-2019-2482, CVE-2019-2434.

Find the release notes for Percona Server for MySQL 5.7.25-28 in our online documentation. Report bugs in the Jira bug tracker.

 

How to Migrate from Oracle to MySQL / Percona Server

Migrating from Oracle to MySQL/Percona Server is not a trivial task, although it is getting easier, especially with the arrival of MySQL 8.0 and Percona's announcement of Percona Server for MySQL 8.0 GA. Aside from planning your migration from Oracle to Percona Server, you must ensure that you understand the purpose of the move and why Percona Server is the right target.

This blog will focus on migrating from Oracle to Percona Server as the specific target database of choice. There is a page on the Oracle website, SQL Developer Supplementary Information for MySQL Migrations, which can be used as a reference for the planned migration. This blog will not cover the overall process of migration, as it is a long process, but it will hopefully provide enough background information to serve as a guide for your migration.

Since Percona Server is a fork of MySQL, almost all features that come along in MySQL are present in Percona Server. So any reference of MySQL here is applicable as well to Percona Server. We previously blogged about migrating Oracle Database to PostgreSQL. I’ll reiterate again the reasons why one would consider migrating from Oracle to an open-source RDBMS such as PostgreSQL or Percona Server/MySQL/MariaDB.

  1. Cost: As you may know, Oracle license costs are very expensive, and there are additional costs for some features like partitioning and high availability, so overall it is very expensive.
  2. Flexible open source licensing and easy availability from public cloud providers like AWS.
  3. Benefit from open source add-ons to improve performance.
Planning and Development Strategy

Migration from Oracle to Percona Server 8.0 can be a pain, since there are a lot of key factors that need to be considered and addressed. For example, Oracle can run on a Windows Server machine, but Percona Server does not support Windows. Although you can compile it for Windows, Percona itself does not offer any support for Windows. You must also identify your database architecture requirements, since Percona Server is not designed for OLAP (Online Analytical Processing) or data-warehousing applications. Percona Server / MySQL RDBMS are a perfect fit for OLTP (Online Transaction Processing).

Identify the key aspects of your database architecture. For example, if your current Oracle architecture implements MAA (Maximum Availability Architecture) with Data Guard plus Oracle RAC (Real Application Cluster), you should determine its equivalence in Percona Server. There is no straight answer for this within MySQL/Percona Server. However, you can choose from synchronous replication, asynchronous replication (Percona XtraDB Cluster is still currently on version 5.7.x), or Group Replication. Then, there are multiple alternatives that you can implement for your own high-availability solution: for example (to name a few), using a Corosync/Pacemaker/DRBD/Linux stack, using MHA (Master High Availability), using a Keepalived/HAProxy/ProxySQL stack, or simply relying on ClusterControl, which supports Keepalived, HAProxy, ProxySQL, Garbd, and Maxscale for your high-availability solutions.

On the other side, a question you also have to consider as part of the plan is: "How will Percona provide support, who will help us when Percona Server itself encounters a bug, and how quickly can we get help when we need it?". One more thing to consider is budget, especially if the purpose of migrating from an enterprise database to an open-source RDBMS is cost-cutting.

There are different options from migration planning to the things you need to do as part of your development strategy. Such options include engaging with experts in the MySQL/Percona Server field and that includes us here at Severalnines. There are lots of MySQL consulting firms that can help you through this since migration from Oracle to MySQL requires a lot of expertise and know-how in the MySQL Server area. This should not be limited to the database but it should cover expertise in scalability, redundancy, backups, high-availability, security, monitoring/observability, recovery and engaging on mission critical systems. Overall, it should have an understanding of your architectural insight without exposing confidentiality of your data.

Assessment or Preliminary Check

Backing up your data, including configuration or setup files, kernel tunings, and automation scripts, must not be overlooked. It's an obvious task, but before you migrate, always secure everything first, especially when moving to a different platform.

You must assess as well that your applications are following the up-to-date software engineering conventions and ensure that they are platform agnostic. These practices can be to your benefit especially when moving to a different database platform, such as Percona Server for MySQL.

Take note that the operating system that Percona Server requires can be a show-stopper if your application and database run on a Windows Server and the application is Windows dependent; then this could be a lot of work! Always remember that Percona Server is on a different platform: perfection might not be guaranteed but can be achieved close enough.

Lastly, make sure that the targeted hardware is designed to work feasibly with Percona's server requirements or that it is bug-free at least (see here). You may consider stress testing first with Percona Server before reliably moving off your Oracle Database.

What You Should Know

It is worth noting that in Percona Server / MySQL, you can create multiple databases within one instance, whereas Oracle does not come with the same functionality as MySQL.

In MySQL, physically, a schema is synonymous with a database. You can substitute the keyword SCHEMA for DATABASE in MySQL SQL syntax, for example using CREATE SCHEMA instead of CREATE DATABASE, whereas Oracle makes a distinction between the two. A schema represents only a part of a database: the tables and other objects owned by a single user. Normally, there is a one-to-one relationship between the instance and the database.
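For example, the following two statements are interchangeable in MySQL / Percona Server (db1 and db2 are just placeholder names):

mysql> CREATE DATABASE db1;
mysql> CREATE SCHEMA db2;   -- SCHEMA is simply a synonym for DATABASE in MySQL
mysql> SHOW DATABASES;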

For example, in a replication setup equivalent in Oracle (e.g. Real Application Clusters or RAC), you have your multiple instances accessing a single database. This lets you start Oracle on multiple servers, but all accessing the same data. However, in MySQL, you can allow access to multiple databases from your multiple instances and can even filter out which databases/schema you can replicate to a MySQL node.

Referencing one of our previous blogs, the same principle applies when speaking of converting your database with the available tools found on the internet.

There is no such tool that can 100% convert Oracle database into Percona Server / MySQL; some of it will be manual work.

Check out the following sections for things that you must be aware of when it comes to migration and verifying the logical SQL result.

Data Type Mapping

MySQL / Percona Server has a number of data types that are almost the same as Oracle's, but not as rich. However, since the arrival of MySQL 5.7.8, there is support for a native JSON data type.

Below is its data-type equivalent representation (tabular representation is taken from here):

 #   Oracle                          Oracle description                              MySQL
 1   BFILE                           Pointer to binary file, ≤ 4G                    VARCHAR(255)
 2   BINARY_FLOAT                    32-bit floating-point number                    FLOAT
 3   BINARY_DOUBLE                   64-bit floating-point number                    DOUBLE
 4   BLOB                            Binary large object, ≤ 4G                       LONGBLOB
 5   CHAR(n), CHARACTER(n)           Fixed-length string, 1 ≤ n ≤ 255                CHAR(n), CHARACTER(n)
 6   CHAR(n), CHARACTER(n)           Fixed-length string, 256 ≤ n ≤ 2000             VARCHAR(n)
 7   CLOB                            Character large object, ≤ 4G                    LONGTEXT
 8   DATE                            Date and time                                   DATETIME
 9   DECIMAL(p,s), DEC(p,s)          Fixed-point number                              DECIMAL(p,s), DEC(p,s)
10   DOUBLE PRECISION                Floating-point number                           DOUBLE PRECISION
11   FLOAT(p)                        Floating-point number                           DOUBLE
12   INTEGER, INT                    38 digits integer                               INT DECIMAL(38)
13   INTERVAL YEAR(p) TO MONTH       Date interval                                   VARCHAR(30)
14   INTERVAL DAY(p) TO SECOND(s)    Day and time interval                           VARCHAR(30)
15   LONG                            Character data, ≤ 2G                            LONGTEXT
16   LONG RAW                        Binary data, ≤ 2G                               LONGBLOB
17   NCHAR(n)                        Fixed-length UTF-8 string, 1 ≤ n ≤ 255          NCHAR(n)
18   NCHAR(n)                        Fixed-length UTF-8 string, 256 ≤ n ≤ 2000       NVARCHAR(n)
19   NCHAR VARYING(n)                Varying-length UTF-8 string, 1 ≤ n ≤ 4000       NCHAR VARYING(n)
20   NCLOB                           Variable-length Unicode string, ≤ 4G            NVARCHAR(max)
21   NUMBER(p,0), NUMBER(p)          8-bit integer, 1 ≤ p < 3                        TINYINT (0 to 255)
                                     16-bit integer, 3 ≤ p < 5                       SMALLINT
                                     32-bit integer, 5 ≤ p < 9                       INT
                                     64-bit integer, 9 ≤ p < 19                      BIGINT
                                     Fixed-point number, 19 ≤ p ≤ 38                 DECIMAL(p)
22   NUMBER(p,s)                     Fixed-point number, s > 0                       DECIMAL(p,s)
23   NUMBER, NUMBER(*)               Floating-point number                           DOUBLE
24   NUMERIC(p,s)                    Fixed-point number                              NUMERIC(p,s)
25   NVARCHAR2(n)                    Variable-length UTF-8 string, 1 ≤ n ≤ 4000      NVARCHAR(n)
26   RAW(n)                          Variable-length binary string, 1 ≤ n ≤ 255      BINARY(n)
27   RAW(n)                          Variable-length binary string, 256 ≤ n ≤ 2000   VARBINARY(n)
28   REAL                            Floating-point number                           DOUBLE
29   ROWID                           Physical row address                            CHAR(10)
30   SMALLINT                        38 digits integer                               DECIMAL(38)
31   TIMESTAMP(p)                    Date and time with fraction                     DATETIME(p)
32   TIMESTAMP(p) WITH TIME ZONE     Date and time with fraction and time zone       DATETIME(p)
33   UROWID(n)                       Logical row addresses, 1 ≤ n ≤ 4000             VARCHAR(n)
34   VARCHAR(n)                      Variable-length string, 1 ≤ n ≤ 4000            VARCHAR(n)
35   VARCHAR2(n)                     Variable-length string, 1 ≤ n ≤ 4000            VARCHAR(n)
36   XMLTYPE                         XML data                                        LONGTEXT

Data type attributes and options:

Oracle                                  MySQL
BYTE and CHAR column size semantics     Size is always in characters

Transactions

Percona Server uses XtraDB (an enhanced version of InnoDB) as its primary storage engine for handling transactional data; although various storage engines can be an alternative choice for handling transactions such as TokuDB (deprecated) and MyRocks storage engines.

Whilst there are advantages and benefits to using or exploring MyRocks alongside XtraDB, the latter is the more robust, de facto storage engine that Percona Server uses, and it is enabled by default, so we'll use this storage engine as the basis for migration with regards to transactions.

By default, Percona Server / MySQL has autocommit variable set to ON which means that you have to explicitly handle transactional statements to take advantage of ROLLBACK for ignoring changes or taking advantage of using SAVEPOINT.

It's basically the same concept that Oracle uses in terms of commit, rollbacks and savepoints.

For explicit transactions, this means that you have to use the START TRANSACTION/BEGIN; <SQL STATEMENTS>; COMMIT; syntax.

Otherwise, if you have to disable autocommit, you have to explicitly COMMIT all the time for statements that require changes to your data.
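A minimal sketch of an explicit transaction with a SAVEPOINT (the accounts table and its columns are made up purely for illustration):

mysql> START TRANSACTION;
mysql> INSERT INTO accounts (id, balance) VALUES (1, 100);
mysql> SAVEPOINT before_adjustment;
mysql> UPDATE accounts SET balance = balance - 10 WHERE id = 1;
mysql> ROLLBACK TO SAVEPOINT before_adjustment;   -- undoes only the UPDATE
mysql> COMMIT;                                    -- persists the INSERT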

Dual Table

MySQL has DUAL compatibility with Oracle, which is meant for compatibility with databases that use a dummy table named DUAL.

This suits Oracle's usage of DUAL so any existing statements in your application that use DUAL might require no changes upon migration to Percona Server.

The FROM clause is mandatory for every SELECT statement in Oracle, so the Oracle database uses the DUAL table for SELECT statements where a table name is not required.

In MySQL, the FROM clause is not mandatory, so the DUAL table is not necessary. However, the DUAL table does not work exactly the same as it does in Oracle, but for simple SELECTs in Percona Server, this is fine.

See the following example below:

In Oracle,

SQL> DESC DUAL;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DUMMY                                              VARCHAR2(1)

SQL> SELECT CURRENT_TIMESTAMP FROM DUAL;

CURRENT_TIMESTAMP
---------------------------------------------------------------------------
16-FEB-19 04.16.18.910331 AM +08:00

But in MySQL:

mysql> DESC DUAL;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DUAL' at line 1

mysql> SELECT CURRENT_TIMESTAMP FROM DUAL;
+---------------------+
| CURRENT_TIMESTAMP   |
+---------------------+
| 2019-02-15 20:20:28 |
+---------------------+
1 row in set (0.00 sec)

Note: the DESC DUAL syntax does not work in MySQL, and the results also differ, as CURRENT_TIMESTAMP (which uses the TIMESTAMP data type) in MySQL does not include the timezone.

SYSDATE

Oracle's SYSDATE function is almost the same in MySQL.

In MySQL, SYSDATE() likewise returns the date and time, but it is a function that requires parentheses () with no arguments. To demonstrate this, below is Oracle and MySQL using SYSDATE.

In Oracle, using plain SYSDATE just returns the date of the day without the time. To get both the time and the date, use TO_CHAR to convert the date-time into the desired format; whereas in MySQL, you might not need TO_CHAR at all, as SYSDATE() returns both.

See example below.

In Oracle:

SQL> SELECT TO_CHAR (SYSDATE, 'MM-DD-YYYY HH24:MI:SS') "NOW" FROM DUAL;

NOW
-------------------
02-16-2019 04:39:00

SQL> SELECT SYSDATE FROM DUAL;

SYSDATE
---------
16-FEB-19

But in MySQL:

mysql> SELECT SYSDATE() FROM DUAL;
+---------------------+
| SYSDATE()           |
+---------------------+
| 2019-02-15 20:37:36 |
+---------------------+
1 row in set (0.00 sec)

If you want to format the date, MySQL has a DATE_FORMAT() function.
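For example, the MySQL counterpart of the Oracle TO_CHAR call above would look roughly like this (a sketch; the format string simply mirrors the earlier example):

mysql> SELECT DATE_FORMAT(SYSDATE(), '%m-%d-%Y %H:%i:%s') AS "NOW";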

You can check the MySQL Date and Time documentation for more info.

TO_DATE

Oracle's TO_DATE equivalent in MySQL is the STR_TO_DATE() function.

It’s almost identical to the one in Oracle: in Oracle, TO_DATE returns the DATE data type, while in MySQL, STR_TO_DATE() returns the DATETIME data type.

Oracle:

SQL> SELECT TO_DATE ('20190218121212','yyyymmddhh24miss') as "NOW" FROM DUAL;

NOW
-------------------------
18-FEB-19

MySQL:

mysql> SELECT STR_TO_DATE('2019-02-18 12:12:12','%Y-%m-%d %H:%i:%s') as "NOW" FROM DUAL;
+---------------------+
| NOW                 |
+---------------------+
| 2019-02-18 12:12:12 |
+---------------------+
1 row in set (0.00 sec)

SYNONYM

In MySQL, there is no support for, nor any equivalent of, Oracle's SYNONYM.

A possible alternative in MySQL is to use a VIEW.

Although SYNONYM can be used to create an alias of a remote table,

e.g.

CREATE PUBLIC SYNONYM emp_table FOR hr.employees@remote.us.oracle.com

In MySQL, you can take advantage of the FEDERATED storage engine.

e.g.

CREATE TABLE hr_employees (
  id INT(20) NOT NULL AUTO_INCREMENT,
  name VARCHAR(32) NOT NULL DEFAULT '',
  other INT(20) NOT NULL DEFAULT '0',
  PRIMARY KEY (id),
  INDEX name (name),
  INDEX other_key (other)
)
ENGINE=FEDERATED
DEFAULT CHARSET=utf8mb4
CONNECTION='mysql://fed_user@remote_host:9306/federated/test_table';

Or you can simplify the process with the CREATE SERVER syntax, so that creating a table that acts as your SYNONYM for accessing a remote table becomes easier. See the documentation for more info on this.
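A minimal sketch of that approach, reusing the connection details from the FEDERATED example above (the server name fedlink and the table name are illustrative):

CREATE SERVER fedlink
  FOREIGN DATA WRAPPER mysql
  OPTIONS (USER 'fed_user', HOST 'remote_host', PORT 9306, DATABASE 'federated');

CREATE TABLE hr_employees_link (
  id INT(20) NOT NULL AUTO_INCREMENT,
  name VARCHAR(32) NOT NULL DEFAULT '',
  other INT(20) NOT NULL DEFAULT '0',
  PRIMARY KEY (id)
) ENGINE=FEDERATED DEFAULT CHARSET=utf8mb4
  CONNECTION='fedlink/test_table';   -- server name / remote table name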

Behaviour of Empty String and NULL

Take note that in Percona Server / MySQL, an empty string is not NULL, whereas Oracle treats an empty string as a NULL value.

In Oracle:

SQL> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;

Nul
---
Yes

In MySQL:

mysql> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;
+-----------+
| Null Eval |
+-----------+
| No        |
+-----------+
1 row in set (0.00 sec)

Sequences

In MySQL, there is no exact equivalent of Oracle's SEQUENCE.

Although there are some posts that simulate the functionality of this approach, you might be able to get the next key using LAST_INSERT_ID() as long as your table's clustered index, the PRIMARY KEY, is defined with AUTO_INCREMENT.
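A common workaround is a small AUTO_INCREMENT table (a sketch only; it hands out unique ascending values but has none of the CYCLE/CACHE options of an Oracle SEQUENCE, and the table name post_seq is just an example):

CREATE TABLE post_seq (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY);

INSERT INTO post_seq VALUES (NULL);   -- allocate the next value
SELECT LAST_INSERT_ID();              -- returns that value for the current connection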

Character String Functions

Unlike Oracle, MySQL / Percona Server has a handful of string functions, but not as many helpful built-in functions as Oracle.

It would take too long to discuss them here one by one, but you can check the documentation from MySQL and compare it against Oracle's string functions.

DML Statements

INSERT/UPDATE/DELETE statements from Oracle work largely the same way in MySQL.

Oracle's INSERT ALL/INSERT FIRST is not supported in MySQL.

Instead, you need to state your MySQL queries one by one.

e.g.

In Oracle:

SQL> INSERT ALL
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City')
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City')
SELECT * FROM dual;

2 rows created.

But in MySQL, you have to run the insert one at a time:

mysql> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City');
Query OK, 1 row affected (0.00 sec)

INSERT ALL/INSERT FIRST also cannot be reproduced as it is used in Oracle, where you can take advantage of conditions by adding a WHEN keyword to your syntax; there is no equivalent option in MySQL / Percona Server for this case.

Hence, your alternative solution for this is to use stored procedures.
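A rough sketch of that alternative is shown below (the customers_ph and customers_other tables and the routing rule are hypothetical, purely to mirror Oracle's INSERT FIRST ... WHEN behaviour):

DELIMITER //
CREATE PROCEDURE insert_customer(IN p_id INT, IN p_name VARCHAR(64), IN p_city VARCHAR(64))
BEGIN
  IF p_city = 'Davao City' THEN
    INSERT INTO customers_ph (customer_id, customer_name, city) VALUES (p_id, p_name, p_city);
  ELSE
    INSERT INTO customers_other (customer_id, customer_name, city) VALUES (p_id, p_name, p_city);
  END IF;
END //
DELIMITER ;

CALL insert_customer(3000, 'Another Customer', 'Davao City');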

Outer Joins "+" Symbol

Oracle's use of the + operator for left and right joins is not supported in MySQL, where the + operator is only used for arithmetic.

Hence, if you have the + operator in your existing Oracle SQL statements, you need to replace it with LEFT JOIN or RIGHT JOIN.
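For example, an Oracle-style outer join such as this one (customers and orders are illustrative table names only) becomes an explicit LEFT JOIN:

-- Oracle, using the (+) operator:
SELECT c.customer_name, o.order_id
  FROM customers c, orders o
 WHERE c.customer_id = o.customer_id(+);

-- MySQL / Percona Server equivalent:
SELECT c.customer_name, o.order_id
  FROM customers c
  LEFT JOIN orders o ON c.customer_id = o.customer_id;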

You might want to check the official documentation for "Outer Join Simplification" of MySQL.

START WITH..CONNECT BY

Oracle uses START WITH..CONNECT BY for hierarchical queries.

Starting with MySQL / Percona 8.0, there is support for generating hierarchical data results which uses models such as adjacency list or nested set models. This is called Common Table Expressions (CTE) in MySQL.

Similar to PostgreSQL, MySQL uses the WITH RECURSIVE syntax for hierarchical queries, so you translate your CONNECT BY statements into WITH RECURSIVE statements.

See below how this differs between Oracle and MySQL / Percona Server.

In Oracle:

SELECT cp.id, cp.title, CONCAT(c2.title, ' > ' || cp.title) as path
  FROM category cp
 INNER JOIN category c2 ON cp.parent_id = c2.id
 WHERE cp.parent_id IS NOT NULL
 START WITH cp.id >= 1
 CONNECT BY NOCYCLE PRIOR c2.id = cp.parent_id;

And in MySQL:

WITH RECURSIVE category_path (id, title, path) AS
(
  SELECT id, title, title as path
    FROM category
   WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.title, CONCAT(cp.path, ' > ', c.title)
    FROM category_path AS cp
    JOIN category AS c ON cp.id = c.parent_id
)
SELECT * FROM category_path
ORDER BY path;

PL/SQL in MySQL / Percona?

MySQL / Percona RDBMS has a different approach than Oracle's PL/SQL.

MySQL uses stored procedures and stored functions, which are similar to PL/SQL and use BEGIN..END syntax.

Oracle's PL/SQL is compiled before execution, when it is loaded into the server, while MySQL's stored routines are compiled when invoked and stored in the cache.

You may want to check out this documentation as a reference guide on converting your PL/SQL to MySQL.

Migration Tools

I did some research for any tools that could be a de facto standard for migration but I couldn’t find a good answer.

I did, though, find sqlines, and it looks simple but promising.

While I didn’t deep-dive into it, the website offers a handful of insights, which could help you on migrating from Oracle to MySQL/Percona Server. There are also paid tools such as this and this.

I've also searched through GitHub but found nothing more appealing as a solution to the problem. If you're aiming to migrate from Oracle onto Amazon, they have the AWS Schema Conversion Tool, which supports migrating from Oracle to MySQL.

Overall, the reason why migration is not an easy thing to do is mainly because Oracle RDBMS is such a beast with lots of features that Percona Server / MySQL or MariaDB RDBMS still do not have.

Anyhow, if you find or know of any tools that you find helpful and beneficial for migrating from Oracle to MySQL / Percona Server, please leave a comment on this blog!

Testing

As part of your migration plan, testing is a vital task that affects your decisions with regards to migration.

dbdeployer (a port of MySQL Sandbox) is a very helpful tool that you can take advantage of. It makes it pretty easy to try and test different approaches, and it saves you time compared to setting up the whole stack, if your purpose is to try and test the RDBMS platform first.

For testing your SQL stored routines (functions or procedures), triggers, and events, I suggest you use tools such as mytap or Google's Unit Testing Framework.

Percona also offers a number of tools that are available for download on their website. Check out Percona Toolkit here. You can cherry-pick the tools according to your needs, especially for testing and production-usage tasks.


Overall, the things that you need to keep in mind as your guidelines when doing a test of your MySQL Server are:

  • After your installation, you need to consider doing some tuning. Check out this Percona blog for help.
  • Do some benchmarks and stress-load testing for your configuration setup on your current node. Check out mysqlslap and sysbench, which can help you with this. Also check out our blog "How to Benchmark Performance of MySQL & MariaDB using SysBench".
  • Check that your DDLs are correctly defined, including data types, constraints, clustered and secondary indexes, and partitions, if you have any.
  • Check your DML, especially that the syntax is correct and that the data is saved correctly as expected.
  • Check your stored routines, events, and triggers to ensure they run and return the expected results.
  • Verify that your queries are performant. I suggest you take advantage of open-source tools or try our ClusterControl product. It offers monitoring/observability, especially of your MySQL / Percona Server. You can use ClusterControl to monitor your queries and their query plans to make sure they are performant.

Deprecation of TLSv1.0 2019-02-28

Ahead of the PCI move to deprecate the use of ‘early TLS’, we’ve previously taken steps to disable TLSv1.0.

Unfortunately at that time we encountered some issues which led us to rollback these changes. This was to allow users of operating systems that did not – yet – support TLSv1.1 or higher to download Percona packages over TLSv1.0.

Since then, we have been tracking our usage statistics for older operating systems that don’t support TLSv1.1 or higher at https://repo.percona.com. We now receive very few legitimate requests for these downloads.

Consequently,  we are ending support for TLSv1.0 on all Percona web properties.

While the packages will still be available for download from percona.com, we are unlikely to update them as the OSes are end-of-life (e.g. RHEL5). Also, in the future you will need to download these packages from a client that supports TLSv1.1 or greater.

For example, EL5 will not receive an updated version of OpenSSL to support TLS versions greater than TLSv1.0. PCI has called for the deprecation of ‘early TLS’ standards. Therefore you should upgrade any EL5 installations to EL6 or greater as soon as possible. As noted in this support policy update by Red Hat, EL5 stopped receiving support under extended user support (EUS) in March 2015.

To continue to receive updates for your OS and for any Percona products that you use, you need to update to more recent versions of CentOS, Scientific Linux, and RedHat Enterprise Linux.


Photo by Kevin Noble on Unsplash

7 JavaScript Playgrounds to Use in 2019

The importance of online code editing platforms cannot be overemphasized. As much as we love our local IDEs, one too many times we find ourselves needing to quickly share and/or collaborate with a friend or colleague on our local projects.

In cases like this, online playgrounds give us that added functionality to write and run code, as well as share snippets and have colleagues test them out and give us feedback. In this post, we'll be looking at the top 7 online code editors for JavaScript, in no particular order.

CodePen

CodePen is an online social development environment for front-end developers. It comes fully equipped with all the features you'll need to build, test, share, collaborate and even deploy your websites.

Plans and Features

CodePen plans are flexible enough to accommodate whatever needs you might have. First, there are individual and team plans. Under the individual plans, you have a range of other plans that have been carefully structured to offer you a wide range of choices depending on what features are more important to you and on how much you can afford.

The free plan has all you need to perform usual operations like creating pens, sharing snippets, running tests and so much more. If you're developing for experimental purposes or just building out your project for development purposes, the free plan has all you need.

The PRO plan, however, adds a little bit to what you can do with CodePen. With CodePen PRO, you get unlimited privacy, asset hosting space, live view, embedded themes and so much more. On the PRO plan, you won't miss your local IDE. If you have specific needs for these features and you've got a few dollars to spare, then these plans are totally worth the price.

You can visit the pricing page to have a closer look at what each plan offers to help you decide. However, that's not all, as we mentioned earlier, there's also a team plan for people working in teams. The price of the team plan is subject to the number of members in your team. However, it comes with a standard $12/month/member price.

CodeSandbox

CodeSandbox is an online editor that allows developers to focus on writing code while it handles all the necessary background processes and configurations. Unlike CodePen, CodeSandbox focuses more on building and sharing code demos that contain back-end components. The editor is optimized to analyze npm dependencies, show custom error messages, and also make projects searchable by npm dependency.

CodeSandbox offers dedicated sandboxes to help developers quickly get started on developing with their favorite tools to build web applications. With CodeSandbox, all you need to do is open the browser, select your preferred development tool and start writing code. It abstracts all the background tasks and configurations so that you only need to worry about writing your code. With CodeSandbox, we can do all of the following and more:

Github integration

With CodeSandbox 2.0, you can commit, create repositories and open pull requests right from within CodeSandbox.

Host static files

CodeSandbox hosts all the files in the public directory for you. Previously this functionality didn't exist on CodeSandbox, but now you can add static images, web workers, etc. As a result, you can imitate all the functionality of your local development server, as you now have full access to the index.html file.

Real-time collaboration

With CodeSandbox, you can create a project and open a Live Session to generate a live URL you can share with colleagues. Through the URL, all your colleagues can work together on the project at the same time. This may seem difficult to manage when the number grows and you can't track what each person is doing. As a result, you can switch to a Classroom Mode where you can specify who can edit the sandbox and who can only view.

Visual Studio Code integration

CodeSandbox has a feature called VSCode in browser. It gives you access to VSCode features like keybindings, user snippets, breadcrumbs, etc. All you need to do is copy your settings files directly from VSCode to CodeSandbox, and you will have control of all the files in your sandbox. These are only a handful of the features you'll get with CodeSandbox; there are more, like working in teams, sandbox containers, etc. I invite you to check out their docs page for more.

Back-end support

It is worth noting that CodeSandbox is one of the few online playgrounds that support back-end languages like Node.js. What's more, it has npm support; as a result, you can install any npm package you require in seconds.

StackBlitz

StackBlitz is a VSCode-powered online playground for web developers. It offers developers the ability to create projects in just one click. The collaboration features of StackBlitz make it possible to share a created project with colleagues through a unique project URL.

Seamless project setup

Like other playgrounds we've already mentioned, StackBlitz automatically takes care of all the background processes involved in setting up projects, like installing dependencies, compiling, bundling, etc. This way it abstracts the setup process for you so you can focus on what you do best: writing code. StackBlitz is renowned for its ability to handle Angular and React projects; all you need to set up a new one is the click of a button.

VSCode similarity

Because VSCode powers StackBlitz, it comes packed with all the aesthetic features we love in VSCode, giving you the look and feel of your local IDE.

TypeScript support

StackBlitz has TypeScript support and TypeScript auto-completion, a feature that is not found in other IDEs like Plunker.

NPM support

With StackBlitz, you can import any npm package into your project, just like you would in VSCode. Better still, you can copy snippets from documentation pages and blogs into the editor and it'll automatically detect the missing packages and prompt you to install them.

Image credit to Eric Simons.

Offline support

Thanks to StackBlitz's in-browser development server, you can continue writing code and editing your work even when offline. This gives you the superpower to build offline while leveraging the power of the online platform, killing two birds with one stone. We can't possibly go over all the perks, but you're welcome to see them for yourself on the website.

JS Fiddle

JSFiddle is an online playground for creating, testing and showcasing collaborative HTML, CSS and JavaScript code snippets, known as 'fiddles'. It is one of the earliest playgrounds and laid the foundation for the creation of other modern-day playgrounds. At the moment, it may seem a bit basic compared to modern playgrounds like CodePen; however, it still performs basic playground operations like testing, sharing, and collaboration. As an added advantage, JSFiddle can also perform complex Ajax simulations. JSFiddle lets you quickly get started by providing access to unique boilerplate templates for diverse technologies. As a result, it'll take you just about the click of a button to generate boilerplate code for React, Vue or whichever technology you want.

As much as JSFiddle behaves like other playgrounds, it has certain features that make it equally unique in its own way. Unlike most others, with JSFiddle you can customize, add and use other JavaScript tools and frameworks easily. With TogetherJS support, JSFiddle offers an interactive user interface that lets users collaborate and share fiddles with colleagues. Moreover, it has built-in Git support. You can also use JSFiddle for any of the following:

  • Bug reporting (test-case) for Github Issues
  • Presenting code answers on Stack Overflow
  • Live code collaboration
  • Code snippets hosting
JS BIN

JSBin is a live playground for HTML, CSS and JavaScript and a range of other preprocessors like Jade, Markdown and much more. It provides specific editor panels with an intuitive user interface for user interactivity. With JSBin, users can create bins, share, collaborate and test code.

JSBin is mostly focused on the ability to share code. It shares not just the code but also the complete output result of the code in the bin. Its real-time collaboration features let colleagues view and see changes in real time as you type in the editor panels.

Though JSBin opens with a default HTML template on start, there are many libraries available for you. You can add as many libraries as required for your project to build your desired product.

With JSBin, all created bins have a unique sharing URL that other users can use to collaborate, fork and/or edit the bin, while your original bin remains intact. Moreover, with JSBin, you can perform the following operations and more:

  • Export your bins as a gist
  • live reload bins in editor and full preview
  • Download bins
  • Save snapshots of bins
  • Templatize bins
  • Archive bins
  • etc

JSBin also has a PRO plan that adds more functionality to what you can do. It gives an additional layer of features to extend your bin functionality. With JSBin PRO, you get:

  • SSL embeds
  • Custom embed CSS and editor settings
  • Sandbox mode - does not save the bin
  • Private bins
  • Dropbox
  • Vanity URLs - blog post
  • Asset hosting
Scrimba

Scrimba is an entirely different kind of online playground from what we've seen so far: it is built around interactive coding screencasts, and it offers you the ability to pause the video, edit the instructor's code, and see the result in the editor.

In-video interaction

This provides an extra layer of interactivity on top of the usual playground functionality. Scrimba lets you take an instructor's demo and build it up into whatever use case you desire. In the playground mode, you can interact with the instructor's code however you please: you can edit, copy, paste and basically perform any interactive operation you desire.

Cross-platform

Scrimba is completely cross-platform. It adapts to any screen size and arranges contents in both portrait and landscape mode to match your screen's resolution requirements; this way, it constantly serves you the best visual output.

Console and dependencies

Scrimba has a built-in console that both the user and the instructor can access to run commands and debug code. You can log messages just the same way you would in any other editor. It also lets you install external dependencies into your project. What's better, the Scrimba team has designed the console in a way that you can see the console output without opening the console panel.

Analytics

Scrimba has a sidebar notes feature that tracks all the changes you make in the editor. This is a handy feature that gives you the ability to keep track of all your changes and refer back to them in the future when the need arises.

Liveweaver

Liveweaver is an online playground for HTML, CSS, and JavaScript for web designers and developers. Like other editors we've come across, Liveweaver gives users the ability to create, test, and share code with colleagues. It opens into distinct editor panels for HTML, CSS, JavaScript and output respectively. One remarkable feature of Liveweaver is its simplicity. A newbie could easily open up Liveweaver, create a small app and run it without reading any prior documentation.

Moreover, Liveweaver supports a wide range of third-party libraries like Bootstrap, jQuery, Three.js, etc. This helps developers build more robust applications without library support hindrances.

Liveweave is free to use and comes packed with over 20 handy JavaScript libraries. With Liveweave, you can do all of the following and more

  • Toggle live preview in editor panel
  • Toggle light and dark mode for better viewing
  • built-in ruler for measurements and aesthetics
  • Collaborate with colleagues easily with the TeamUp feature
  • Download weaves easily with a button click
  • Supports SVG
  • etc
Conclusion

In this post we have discussed seven top online JavaScript editors you can use in 2019. Each playground has its unique abilities, so it's difficult to choose a "best" one amongst them. We can, however, choose the one whose features best solve our needs. Disclaimer: the seven online playgrounds discussed here are entirely my choice based on my own experience with them. There are other amazing online playgrounds like Plunker, CSS Deck, Dabblet, etc. If you have other playgrounds in mind, please do share them with us in the comments below. Hopefully, this post will help you pick the most suitable one for your needs.

Using FORCE INDEX: When Your Query Optimizer Gets It Wrong

A Guest Post from Jordan Raine, Clio

Jordan Raine is a staff software developer at Clio, the leading provider of legal practice management, client relationship management, and client intake software. Clio is dedicated to helping lawyers meet client needs and run efficient, profitable practices. Jordan wrote this post to help MySQL developers save time and effort and ultimately improve their customers' experience.

Unlike most code a developer writes, writing SQL only requires us to describe what data we want and not how to get it. When given a query like SELECT id, author_id FROM posts WHERE author_id = 123 ORDER BY id, you needn’t concern yourself with what indexes are used (if any), what type of sort is used, or any other number of implementation details. Instead, the query optimizer handles this for you. This keeps SQL concise and readable and, for the most part, the query optimizer chooses the best path for a given set of data.

But sometimes the query optimizer gets it wrong. Let’s look at one example of this and what to do about it.

We have a blog app that pulls from a posts table that looks like this:

CREATE TABLE `posts` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `body` text,
  `author_id` bigint(20) NOT NULL,
  `coauthor_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_posts_on_author_id` (`author_id`),
  KEY `index_posts_on_coauthor_id` (`coauthor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Our app is doing really great and there are over 100 million rows in the posts table. Unfortunately, the coauthor feature wasn’t as popular as expected and is rarely used:

mysql> SELECT COUNT(*) FROM posts WHERE coauthor_id IS NOT NULL;
+----------+
| COUNT(*) |
+----------+
| 159286 |
+----------+
1 row in set (0.04 sec)

Only about 0.01% of the posts have a coauthor, leaving 99.9% of the rows NULL. When MySQL calculates index statistics, the coauthor_id index comes back with extremely low cardinality: only 29!

mysql> show indexes from posts;
+-------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name                   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| posts |          0 | PRIMARY                    |            1 | id          | A         |   112787604 |     NULL | NULL   |      | BTREE      |         |               |
| posts |          1 | index_posts_on_author_id   |            1 | author_id   | A         |     2891989 |     NULL | NULL   |      | BTREE      |         |               |
| posts |          1 | index_posts_on_coauthor_id |            1 | coauthor_id | A         |          29 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

When an index has such low cardinality, MySQL is less likely to use it. After all, what’s the point of an index that can eliminate very few values?

However, even with such low cardinality, the query optimizer chooses to use it when looking up posts by a coauthor:

mysql> EXPLAIN SELECT * FROM posts WHERE coauthor_id = 98543\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: posts
type: ref
possible_keys: index_posts_on_coauthor_id
key: index_posts_on_coauthor_id
key_len: 9
ref: const
rows: 2
Extra: NULL
1 row in set (0.00 sec)

mysql> SELECT * FROM posts WHERE coauthor_id = 98543;
+----------+--------------+-----------+-------------+
| id | body | author_id | coauthor_id |
+----------+--------------+-----------+-------------+
| 21168595 | Lipsum Lorem | 1 | 98543 |
| 25695860 | Lipsum Lorem | 99833 | 98543 |
+----------+--------------+-----------+-------------+
2 rows in set (0.01 sec)

This even works when looking up multiple coauthors in a single query:

mysql> EXPLAIN SELECT * FROM posts WHERE coauthor_id IN
(98543,99592,99096,98022,99643,99091,98578,99910,99910,98842)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: posts
type: range
possible_keys: index_posts_on_coauthor_id
key: index_posts_on_coauthor_id
key_len: 9
ref: NULL
rows: 19
Extra: Using index condition
1 row in set (0.00 sec)

mysql> SELECT * FROM posts WHERE coauthor_id IN
(98543,99592,99096,98022,99643,99091,98578,99910,99910,98842);
+-----------+--------------+-----------+-------------+
| id | body | author_id | coauthor_id |
+-----------+--------------+-----------+-------------+
| <results removed for brevity> |
+-----------+--------------+-----------+-------------+
19 rows in set (0.01 sec)

But things go off the rails when we add an extra ID to the IN clause:

mysql> EXPLAIN SELECT * FROM posts WHERE coauthor_id IN
(98543,99592,99096,98022,99643,99091,98578,99910,99910,98842,98511)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: posts
type: ALL
possible_keys: index_posts_on_coauthor_id
key: NULL
key_len: NULL
ref: NULL
rows: 112787604
Extra: Using where
1 row in set (0.01 sec)

mysql> SELECT * FROM posts WHERE coauthor_id IN
(98543,99592,99096,98022,99643,99091,98578,99910,99910,98842,98511);
+-----------+--------------+-----------+-------------+
| id | body | author_id | coauthor_id |
+-----------+--------------+-----------+-------------+
| <results removed for brevity> |
+-----------+--------------+-----------+-------------+
21 rows in set (37.39 sec)

Instead of using an index, the query optimizer has decided to scan the entire table. What happened?

To understand why this happens, we need to look at how MySQL picks an index. Generally, the best index is the one that matches the fewest rows. To estimate how many rows will be found in each index for a given query, the query optimizer can use one of two methods: using “dives” into the index or using index statistics.

Index dives — counting entries in an index for a given value — provide accurate row count estimates, but an index dive is done for each index and value (e.g., for an IN with 10 values, it performs 10 index dives), which can degrade performance. Using index statistics avoids this problem, providing near constant time lookups by reading the table’s index statistics (i.e., cardinality). (More detail on how this is done can be found here.)

In other words: slow and accurate or fast and sloppy. For tables with evenly distributed data, the latter works great. However, when an index is sparsely populated, like with coauthor_id, index statistics can be wildly inaccurate.
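One cheap thing to try before reaching for hints (a hedged suggestion; for a heavily skewed column like coauthor_id the estimate can stay misleading) is to refresh the persisted statistics:

mysql> ANALYZE TABLE posts;
-- recomputes index statistics from a sample of pages; it helps when the
-- statistics are merely stale, not when the data distribution itself is skewed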

In cases like this, we can lend a hand using index hints like USE INDEX and FORCE INDEX. Let’s start with USE INDEX, which according to the documentation, “tells MySQL to use only one of the named indexes to find rows in the table”:

mysql> EXPLAIN SELECT * FROM posts USE INDEX(index_posts_on_coauthor_id) WHERE
coauthor_id IN
(98543,99592,99096,98022,99643,99091,98578,99910,99910,98842,98511)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: posts
type: ALL
possible_keys: index_posts_on_coauthor_id
key: NULL
key_len: NULL
ref: NULL
rows: 112787604
Extra: Using where
1 row in set (0.00 sec)

This recommendation wasn't enough; MySQL still decided it was better to use no index at all. Let’s try again but this time use FORCE INDEX, which “acts like USE INDEX, with the addition that a table scan is assumed to be very expensive”:

mysql> EXPLAIN SELECT * FROM posts FORCE INDEX(index_posts_on_coauthor_id)
WHERE coauthor_id IN
(98543,99592,99096,98022,99643,99091,98578,99910,99910,98842,98511)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: posts
type: range
possible_keys: index_posts_on_coauthor_id
key: index_posts_on_coauthor_id
key_len: 9
ref: NULL
rows: 37761660
Extra: Using index condition
1 row in set (0.00 sec)

mysql> SELECT * FROM posts FORCE INDEX(index_posts_on_coauthor_id) WHERE
coauthor_id IN
(98543,99592,99096,98022,99643,99091,98578,99910,99910,98842,98511);
+-----------+--------------+-----------+-------------+
| id | body | author_id | coauthor_id |
+-----------+--------------+-----------+-------------+
| <results removed for brevity> |
+-----------+--------------+-----------+-------------+
21 rows in set (0.00 sec)

Thankfully, this does the trick, coercing the optimizer into a query plan we know to be more performant. Instead of 37 seconds, the query finishes in less than 1ms!

But why did the behavior change when adding one more ID to the query above, increasing it from 10 to 11 values? In MySQL 5.6, a new variable called eq_range_index_dive_limit was added, setting a threshold for equality ranges like IN:

mysql> SHOW VARIABLES LIKE 'eq_range_index_dive_limit';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 10 |
+---------------------------+-------+
1 row in set (0.00 sec)

By default, it is set to 10 in MySQL 5.6 and was increased to 200 in MySQL 5.7.4. 
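If you are on a version where the limit is still 10, another option besides index hints (a sketch; measure the impact before changing it globally) is to raise the threshold so index dives are still used for longer IN lists:

mysql> SET SESSION eq_range_index_dive_limit = 200;
-- or, once you are satisfied, for all new connections:
mysql> SET GLOBAL eq_range_index_dive_limit = 200;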

While intended to improve performance by reducing index dives, this variable surprised some teams − including my own − by significantly degrading performance of certain queries. When a routinely run query degrades in performance by 10,000x overnight, your database quickly becomes occupied entirely by those queries, grinding your app and your business to a halt. There’s a detailed look at one example of this from the MySQL at Facebook blog.

The query optimizer has been tuned to handle the most common data shapes and, in most cases, will choose a good query plan. However, when this is not the case, FORCE INDEX provides a way to influence the choices of the query optimizer. This heavy-handed approach comes with added responsibility, though: you’re now responsible for specifying what data you want and how MySQL should retrieve it. As data evolves and new queries are introduced, the index you’ve forced MySQL to use may no longer be best.

It’s worth considering why the optimizer chooses a catastrophic query plan. In our example, based on a real world system, the shape of our data poorly fit the schema we’d chosen. Instead of continuing to optimize the query using FORCE INDEX, it may be time to roll up our sleeves and change the schema.

 

Jordan Raine, Staff Software Developer at Clio, is a full-stack web developer who loves to write code and solve interesting problems. He particularly enjoys Ruby/Rails backends, in addition to work in Java. Jordan also has extensive HTML, CSS, and JavaScript frontend experience. Whenever possible, he welcomes the opportunity to participate in open sourcing projects and contribute back to the community.

 
