Planet MySQL

Percona Live Presents: The MySQL Query Optimizer Explained Through Optimizer Trace

During my presentation at Percona Live 2019 I will show how using Optimizer Trace can give insight into the inner workings of the MySQL Query Optimizer. Through the presentation, the audience will be introduced to optimizer trace, learn more about the decisions the query optimizer makes, and learn about the query execution strategies the query optimizer has at its disposal. I’ll be covering the main phases of the MySQL optimizer and its optimization strategies, including query transformations, data access strategies, the range optimizer, the join optimizer, and subquery optimization.
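If you want to experiment before the talk, enabling the trace is straightforward (the query below is purely illustrative; the optimizer_trace variable and the OPTIMIZER_TRACE table are standard MySQL features):

```sql
-- Enable tracing for the current session
SET optimizer_trace="enabled=on";

-- Run the query you want the optimizer to explain (example table)
SELECT * FROM orders WHERE customer_id = 42;

-- Retrieve the trace; one row is kept per traced statement
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

-- Turn tracing back off when done
SET optimizer_trace="enabled=off";
```

The TRACE column contains a JSON document describing each optimization phase, which is exactly what the presentation walks through.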

Who’d benefit most from the presentation?

DBAs, developers, support engineers and other people who are concerned about MySQL query performance will benefit from this presentation. Knowing optimizer trace will enable them to understand why the query optimizer selected a particular query plan, which is very helpful when tuning queries for better performance.

Whose presentations are you most looking forward to?

I’m definitely looking forward to A Proactive Approach to Monitoring Slow Queries by Shashank Sahni of ThousandEyes Inc. It is always interesting to learn how users of MySQL monitor their systems to detect and improve slow queries.

The post Percona Live Presents: The MySQL Query Optimizer Explained Through Optimizer Trace appeared first on Percona Community Blog.

How to Integrate Tungsten Cluster Monitoring Tools with PagerDuty Alerts

Overview The Skinny

In this blog post we will discuss how to best integrate various Continuent-bundled cluster monitoring solutions with PagerDuty, a popular alerting service.

Agenda What’s Here?
  • Briefly explore the bundled cluster monitoring tools
  • Describe the procedure for establishing alerting via PagerDuty
  • Examine several of the monitoring tools included with the Continuent Tungsten Clustering software, and provide examples of how to send an email to PagerDuty from each tool
Exploring the Bundled Cluster Monitoring Tools A Brief Summary

Continuent provides multiple methods out of the box to monitor the cluster health. The most popular is the suite of Nagios/NRPE scripts (i.e. cluster-home/bin/check_tungsten_*). We also have Zabbix scripts (i.e. cluster-home/bin/zabbix_tungsten_*). Additionally, there is a standalone script available, tungsten_monitor, based upon the shared Ruby-based tpm libraries. We also include a very old shell script, but it is obsolete.

Implementing a Simple PagerDuty Alert How To Add a PagerDuty Email Endpoint for Alerting
  • Create a new user to get the alerts:
    Configuration -> Users -> Click on the [+ Add Users] button
    • Enter the desired email address and invite. Be sure to respond to the invitation before proceeding.
  • Create a new escalation policy:
    Configuration -> Escalation Policies -> Click on the [+ New Escalation Policy] button
    • Enter the policy name at the top, i.e. Continuent Alert Escalation Policy
    • “Notify the following users or schedules” – click in the box and select the new user created in the first step
    • “escalates after” Set to 1 minute, or your desired value
    • “If no one acknowledges, repeat this policy X times” – set to 1 time, or your desired value
    • Finally, click on the green [Save] button at the bottom
  • Create a new service:
    Configuration -> Services -> Click on the [+ New Service] button
    • General Settings: Name – Enter the service name, i.e. Continuent Alert Emails from Monitoring
    • Integration Settings: Integration Type – Click on the second radio choice “Integrate via email”
    • Integration Settings: Integration Name – Email (automatically set for you, no action needed here)
    • Integration Settings: Integration Email – Adjust this email address, i.e. alerts, then copy this email address into a notepad for use later
    • Incident Settings: Escalation Policy – Select the Escalation Policy you created in the third step, i.e. “Continuent Alert Escalation Policy”
    • Incident Settings: Incident Timeouts – Check the box in front of Auto-resolution
    • Finally, click on the green [Add Service] button at the bottom

At this point, you should have a PagerDuty integration email address available for testing.

Go ahead and send a test email to that email address to make sure the alerting is working.

If the test works, you have successfully set up a PagerDuty email endpoint to use for alerting. Congratulations!

How to Send Alerts to PagerDuty using the tungsten_monitor Script Invoking the Bundled Script via cron

The tungsten_monitor script provides a mechanism for monitoring the cluster state when monitoring tools like Nagios aren’t available.

Each time the tungsten_monitor runs, it will execute a standard set of checks:

  • Check that all Tungsten services for this host are running
  • Check that all replication services and datasources are ONLINE
  • Check that replication latency does not exceed a specified amount
  • Check that the local connector is responsive
  • Check disk usage

Additional checks may be enabled using various command line options.

The tungsten_monitor is able to send you an email when problems are found.

It is suggested that you run the script as root so it is able to use the mail program without warnings.

Alerts are cached to prevent them from being sent multiple times and flooding your inbox. You may pass --reset to clear out the cache or --lock-timeout to adjust the amount of time this cache is kept. The default is 3 hours.

An example root crontab entry to run tungsten_monitor every five minutes:

*/5 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor >/dev/null 2>/dev/null

An alternate example root crontab entry to run tungsten_monitor every five minutes in case your version of cron does not support the new syntax:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor >/dev/null 2>/dev/null

All messages will be sent to /opt/continuent/share/tungsten_monitor/lastrun.log

The online documentation for the tungsten_monitor script is available on the Continuent documentation site.

Big Brother is Watching You! The Power of Nagios and the check_tungsten_* scripts

We have two very descriptive blog posts about how to implement the Nagios-based cluster monitoring solution:

Global Multimaster Cluster Monitoring Using Nagios and NRPE

Essential Cluster Monitoring Using Nagios and NRPE

We also have Nagios-specific documentation to assist with configuration.

In the event you are unable to get Nagios working with Tungsten Clustering, please open a support case via our ZenDesk-based support portal. For more information about getting support, visit the Continuent website.

There are many available NRPE-based check scripts; the online documentation for each is listed at the end of this post.

Big Brother Tells You Tell the Nagios server how to contact PagerDuty

The key is to have a contact defined for the PagerDuty-specific email address, which is handled by the Nagios configuration file /opt/local/etc/nagios/objects/contacts.cfg:


define contact{
        use                generic-contact
        contact_name       pagerduty
        alias              PagerDuty Alerting Service Endpoint
        email
}

define contactgroup{
        contactgroup_name  admin
        alias              PagerDuty Alerts
        members            pagerduty,anotherContactIfDesired,etc
}

Teach the Targets Tell NRPE on the Database Nodes What To Do

The NRPE commands are defined in the /etc/nagios/nrpe.cfg file on each monitored database node:


command[check_tungsten_online]=/usr/bin/sudo -u tungsten /opt/continuent/tungsten/cluster-home/bin/check_tungsten_online
command[check_tungsten_latency]=/usr/bin/sudo -u tungsten /opt/continuent/tungsten/cluster-home/bin/check_tungsten_latency -w 2.5 -c 4.0
command[check_tungsten_progress_alpha]=/usr/bin/sudo -u tungsten /opt/continuent/tungsten/cluster-home/bin/check_tungsten_progress -t 5 -s alpha
command[check_tungsten_progress_beta]=/usr/bin/sudo -u tungsten /opt/continuent/tungsten/cluster-home/bin/check_tungsten_progress -t 5 -s beta
command[check_tungsten_progress_gamma]=/usr/bin/sudo -u tungsten /opt/continuent/tungsten/cluster-home/bin/check_tungsten_progress -t 5 -s gamma

Note that sudo is in use to give the nrpe user access as the tungsten user to the tungsten-owned check scripts using the sudo wildcard configuration.
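A sudoers entry along these lines supports the wildcard configuration mentioned above (this is a sketch only; the file name, user names, and paths are assumptions that must be adjusted to your installation):

```
# /etc/sudoers.d/nrpe -- illustrative example, not a verbatim Continuent config
# Allow the nrpe user to run any tungsten-owned check script as the tungsten user
nrpe ALL=(tungsten) NOPASSWD: /opt/continuent/tungsten/cluster-home/bin/check_tungsten_*
```

The wildcard means new check scripts in that directory are usable without further sudoers edits.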

Additionally, there is no harm in defining commands that may not be called, which allows for simple administration: keep the master copy in one place, push updates to all nodes as needed, then restart nrpe.

Big Brother Sees You Tell the Nagios server to begin watching

Here are the service check definitions for the /opt/local/etc/nagios/objects/services.cfg file:


# Service definition
define service{
        service_description  check_tungsten_online for all cluster nodes
        host_name            db1,db2,db3,db4,db5,db6,db7,db8,db9
        check_command        check_nrpe!check_tungsten_online
        contact_groups       admin
        use                  generic-service
}

# Service definition
define service{
        service_description  check_tungsten_latency for all cluster nodes
        host_name            db1,db2,db3,db4,db5,db7,db8,db9
        check_command        check_nrpe!check_tungsten_latency
        contact_groups       admin
        use                  generic-service
}

# Service definition
define service{
        service_description  check_tungsten_progress for alpha
        host_name            db1,db2,db3
        check_command        check_nrpe!check_tungsten_progress_alpha
        contact_groups       admin
        use                  generic-service
}

# Service definition
define service{
        service_description  check_tungsten_progress for beta
        host_name            db4,db5,db6
        check_command        check_nrpe!check_tungsten_progress_beta
        contact_groups       admin
        use                  generic-service
}

# Service definition
define service{
        service_description  check_tungsten_progress for gamma
        host_name            db7,db8,db9
        check_command        check_nrpe!check_tungsten_progress_gamma
        contact_groups       admin
        use                  generic-service
}

Summary The Wrap-Up

In this blog post we discussed how to best integrate various cluster monitoring solutions with PagerDuty, a popular alerting service.

To learn about Continuent solutions in general, check out the Continuent website.

The Library Please read the docs!

For more information about monitoring Tungsten clusters, please visit the Tungsten Clustering documentation.

Below is a list of the Nagios NRPE plugin scripts provided by Tungsten Clustering, each covered on its own page of the online documentation.

  • check_tungsten_latency – reports warning or critical status based on the replication latency levels provided.
  • check_tungsten_online – checks whether all the hosts in a given service are online and running. This command only needs to be run on one node within the service; the command returns the status for all nodes. The service name may be specified by using the -s SVCNAME option.
  • check_tungsten_policy – checks whether the policy is in AUTOMATIC mode and returns a CRITICAL if not.
  • check_tungsten_progress – executes a heartbeat operation and validates that the sequence number has incremented within a specific time period. The default is one (1) second, and may be changed using the -t SECS option.
  • check_tungsten_services – confirms that the services and processes are running; their state is not confirmed. To check state with a similar interface, use the check_tungsten_online command.

Tungsten Clustering is the most flexible, performant global database layer available today. Use it under your SaaS offering as a strong base upon which to grow your worldwide business!

For more information, please visit the Continuent website.

Want to learn more or run a POC? Contact us.

How to Connect Golang with MySQL

Today, I will describe MySQL database connectivity with Golang. MySQL is the most popular open-source relational database. I will show you, step by step, how to connect Golang with a MySQL database. I assume you have configured a Golang environment on your system; if not, please set one up by following my previous […]
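The full post walks through the setup in detail; as a minimal sketch, connecting from Go typically uses the standard database/sql package with the go-sql-driver/mysql driver (the DSN values below are placeholders, and a running MySQL server plus the driver package are required, so treat this as illustrative rather than the post's exact code):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	// MySQL driver; registers itself with database/sql on import
	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// DSN format: user:password@tcp(host:port)/dbname -- placeholder values
	db, err := sql.Open("mysql", "appuser:secret@tcp(127.0.0.1:3306)/appdb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// sql.Open does not actually connect; Ping verifies the connection works
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected")
}
```

From here, db.Query and db.Exec run statements against the server.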

The post How to Connect Golang with MySQL appeared first on

Exposing MyRocks Internals Via System Variables: Part 1, Data Writing

Series Introduction

Back in 2016 I decided to write a blog series on InnoDB in hopes that it would help give a good description of the high level mechanics of the storage engine. The main motivating factor at that time was that I knew there was a lot of information out there about InnoDB, but a lot of it was ambiguous or even contradictory and I wanted to help make things a bit clearer if I could.

Now there’s a new storage engine that’s rising in popularity that I feel needs similar attention. Specifically MyRocks, the log-structured merge-driven RocksDB engine in MySQL. Given the amount of discussion in the community about MyRocks, I’m sure most of you already have some degree of familiarity, or at the very least have heard the name.

Now we’ve arrived at a point where this is no longer just a Facebook integration project and major players in the community like MariaDB and Percona have their own implemented solutions that come packaged with their respective binaries. Things went a bit further in December of 2018, when Percona announced that TokuDB would be deprecated and encouraged its users to explore MyRocks. This is when I decided we needed to take this engine a little more seriously: dig in, see how it works, and see what it has to offer.

Unlike InnoDB, I found that the problem was not an overabundance of ambiguous information, but rather small amounts of information scattered across various wikis, blog posts, and presentation slides. This initially made researching difficult, but as I came to know the engine better things got clearer and clearer. My hope is to take what I’ve learned and pass it along to you in a way that can help you gain the familiarity and confidence you need to explore this technology.

Much like my InnoDB series, I’m drafting this by presenting system variables in an order where the topics discussed for one variable lead to the next, and then to the next, and so on. Though I have to note here that my series on InnoDB was largely based on production experience, whereas this blog series is based on research and lab testing. I feel confident in the definitions that I’ve provided, but the recommendations are based more on theory than anything else.

With this in mind I would like to offer the same disclaimer I provided for my InnoDB series.

“….I should note that while tuning recommendations are provided, the objective of this blog post series is NOT meant to be a tuning primer, but instead to explore the mechanics that each variable interacts with…”

Let’s get started!!

Rows vs. Key-Value

The first thing that needs to be clarified when we talk about MyRocks is that it’s a storage engine that converts your MySQL requests into requests for the RocksDB engine that comes packaged with the installation. Unlike MySQL, RocksDB is not a relational database and instead is a key-value store. When you enter column-based records into MyRocks, a process transparently converts them into key-value records.

The key of the record is a combination of an internal index id and the primary key as defined by your create table statement, if available. You can read more about this starting on slide 46 of this tutorial presentation. The value holds any record metadata plus the non-primary key columns. Just keep this in mind as we continue through the blog series as there will be several usages of the term ‘key’ which can mean either the primary key value of the table or the key component of the RocksDB key-value backend.

You can read more about the key-value format of MyRocks / RocksDB by checking the MyRocks record format page on the Facebook MyRocks Wiki.

Column Families

Data that is converted by MyRocks from row format to key-value format is stored in column families. A column family in RocksDB is like a table for an engine like InnoDB or MyISAM. The usage of column families in MyRocks from the perspective of MySQL is almost completely transparent. You will still query tables in MySQL and on the back end your requests will be converted to use column families, much like how row-level data and key-value data are converted transparently.

When you create a table in MyRocks, you will specify what column family the table belongs to. If the table has any secondary indexes you can specify what column families those indexes belong to. Keep in mind that a table and its secondary indexes don’t need to be in the same column family, and in contrast, you can have multiple tables and indexes in the same column family.

You create a column family by using the create table statement and noting the name of the column family as part of a comment for the primary key like in the example below where we create a table t1 and place it in the column family cf_t1.

CREATE TABLE `t1` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` char(255) NOT NULL,
  PRIMARY KEY (`c1`) COMMENT 'cf_t1'
) ENGINE=ROCKSDB DEFAULT CHARSET=latin1;

If we wanted to we could create another table and place it in the cf_t1 column family by using the same comment.

In the case that we do not provide a comment in the create table statement, the table and its secondary indexes will all be created in the ‘default’ column family, which is automatically created when MyRocks is installed.

Each column family is going to have its own resources, data files, memory caches, etc. Creating multiple column families will help reduce the chances of the resource contention found when putting everything in a single column family. There is no solid recommendation as to how many column families you should use, but per Yoshinori Matsunobu’s MyRocks introduction slides (slide 52 to be specific), you should aim to create no more than 20.

You can find more information about column families and how to create them on the Getting Started with MyRocks page in the Facebook MyRocks wiki.

System Variables vs. Column Family Options

Now that we know a little bit about column families, we should address how configuration works for MyRocks. This whole blog series looks at mechanics from the variable / option perspective, so knowing how to configure those variables and options would be handy!

MyRocks still uses system variables, but it can also be configured using column family options. Earlier I pointed out that column families have their own resources. You can configure options for these resources for each column family independently if you like.

You can see the column family option settings by looking at the information_schema.ROCKSDB_CF_OPTIONS table. It will show you each column family, the option type, and the option value. You’re going to want to be aware of the fact that there is a difference between system variables, which more often than not designate global level settings for MyRocks, and column family options, which are more specific to each column family.
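For example, to inspect how a single column family is configured, you might run a query along these lines (the cf_t1 name is the illustrative column family from earlier; the column names follow the MyRocks information_schema tables):

```sql
-- List every option and its current value for one column family
SELECT OPTION_TYPE, VALUE
FROM information_schema.ROCKSDB_CF_OPTIONS
WHERE CF_NAME = 'cf_t1';
```

Running it without the WHERE clause shows the options for all column families at once, which is useful for spotting unintended differences between them.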

To make things even more confusing, you should know that sometimes column family options are set by system variables, but more often than not they’re set using the variable rocksdb_default_cf_options, which allows you to pass in a list of settings and subsettings that would be the default option for each column family. A good example of this can be found in Mark Callaghan’s blog post on default values for MyRocks.

If you change the default option for column families using the variable rocksdb_default_cf_options and restart MySQL, all column families will take on the new settings immediately.

Rocksdb_default_cf_options is used to set default column family options, but you can further specify options for each column family using rocksdb_override_cf_options. You can see an example of using overrides in Alexey Maykov’s presentation on MyRocks Internals slide 24.
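As a sketch of how the two variables fit together in my.cnf (the option names are real, but the values and the cf_t1 column family are examples only, not recommendations):

```
[mysqld]
# Defaults applied to every column family; options are semicolon-separated
rocksdb_default_cf_options=write_buffer_size=64m;min_write_buffer_number_to_merge=1

# Per-column-family overrides, in the form cf_name={option=value;...}
rocksdb_override_cf_options=cf_t1={write_buffer_size=128m}
```

With this configuration every column family gets a 64Mb memtable except cf_t1, which gets 128Mb.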

The important takeaway here is that you configure MyRocks using both system variables and column family options. Below we’re going to start talking about these variables, settings, and the associated mechanic. If I’m referring to a column family option the heading will be prefixed with ‘CF_OPTION’, if I’m referring to a system variable there will be no prefix.

In-Memory Data Writes

Ok, now we know about row format, column families, and how to configure MyRocks. Now we can start digging into the details of how MyRocks works.

New data or data changes occurring in MyRocks first get written to a memory space called a “memtable”. Each column family will have its own memtable. As the data gets written into memory, a log of the changes gets written to a write-ahead log. In the case that a crash occurs between the time the data is written into the memtable and the time that memtable data gets flushed to disk, the write-ahead log can be used to replay the data changes that occurred so that MyRocks can recover and pick up right where it left off.

For those of you that are familiar with InnoDB, this is going to sound a lot like how data initially gets written in the buffer pool along with data changes being entered into the InnoDB redo logs. This is a very common method of ensuring data durability as a part of achieving ACID compliance.

Memtables have a size limit, which by default is 64Mb. Once the memtable fills up it’s converted to what’s called an “immutable memtable”. Immutable means that the data in that memtable can no longer be changed. In order to continue handling data write requests, a new memtable, now called the “active memtable”, is created to handle new write requests while the immutable memtable is rotated out so its data can be written to disk.
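You can observe this activity from the SQL layer: MyRocks exposes per-column-family counters, including memtable statistics, through the information_schema tables (the 'default' column family name below is the one MyRocks always creates; other names depend on your schema):

```sql
-- Per-column-family statistics, including memtable entry and size counters
SELECT *
FROM information_schema.ROCKSDB_CFSTATS
WHERE CF_NAME = 'default';
```

SHOW ENGINE ROCKSDB STATUS provides a more verbose, engine-wide view of the same internals.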

Variables and CF_OPTIONS

Let’s take a more detailed look at the mechanics associated with column families and memtables as well as the variables that are associated with them.


rocksdb_no_create_column_family

As mentioned above, you create column families using comments in the create table statement. You may want to include a safety measure in your system to ensure that new column families aren’t created, in order to keep your column family count under control.

The variable rocksdb_no_create_column_family allows you to prevent the creation of new column families via use of comments. If the variable is enabled and if you attempt to create a new column family using this method, you will get an error.

Default: OFF

I can see how someone may want to protect their system to ensure that no unexpected column families are created, but one important factor to consider is that this variable is not dynamic, meaning if you were to enable it you would need to restart MySQL in order to create a new column family and I think that is enough to justify leaving this variable disabled.

CF_OPTION write_buffer_size

As mentioned earlier, data initially gets written into MyRocks column families using memtables. There is one active memtable for each column family in MyRocks.

The column family option write_buffer_size designates how large the memtable will be.

Default: 67108864 (64Mb)

CF_OPTION min_write_buffer_number_to_merge

Once an active memtable is full, it will be converted into an immutable memtable so it can then be read and flushed to disk. However, this can be extended so that you can have multiple immutable memtables collected prior to a flush. This is controlled by the column family option min_write_buffer_number_to_merge. The value of this option will designate how many immutable memtables to create before starting the flushing process.

For example, if you have this value set to 2: as you write data into the active memtable and it fills, it will become immutable and a new memtable will take its place, but flushing won’t occur. Writes will continue and eventually the new active memtable will fill, be converted to immutable, and replaced. You now have 2 immutable memtables, and at this point the flush will occur.

I should note that I do believe the definition for this column family option is accurate; however, during my testing, I found that regardless of what I set this option to, I was never able to create more than one immutable memtable. I have created a bug with Percona to raise this issue with them.

Default: 1

I would be inclined to leave this variable at its default value. If you increase it, you will decrease the number of times data is flushed to disk, but each flush will be larger, creating bigger bursts of disk activity. If your immutable memtables aren’t finished flushing and your active memtable fills to capacity, writes will stop until the flush can complete.


rocksdb_db_write_buffer_size

All column families will have their own active and immutable memtables, meaning the amount of memory used by writes is roughly the combined size of the active and immutable memtables per column family multiplied by the number of column families you have. If you’re not paying attention to your configuration and add a lot of column families, this may use more memory than you anticipated and could potentially lead to OOMs. However, you can protect yourself from this by using the rocksdb_db_write_buffer_size variable. In order to prevent excessive memory use, this variable puts a cap on the total amount of memory that may be used across all active memtables. It has no impact on immutable memtables.

For example, if you have two column families with active memtables at 64Mb each, but have this variable set to 32Mb, MyRocks will attempt to rotate active memtables to immutable before the 32Mb cap is hit. However, if you expand this to a greater value such as 128Mb, then no restrictions will apply, as the two 64Mb memtables can use at most 128Mb.

Again, it’s important to note that this has no impact on the amount of memory used by immutable memtables. So consider the example of having two column families, each with a 64Mb write_buffer_size and a min_write_buffer_number_to_merge value of 1. If you have rocksdb_db_write_buffer_size set to 64Mb, it’s possible to have one column family with a full active memtable and a full immutable memtable, while the other column family also has a full immutable memtable, meaning 192Mb of memory is in use.

I would say that the best way to approach this is that the maximum memory capacity used by the write buffers across all column families would be this value plus the maximum potential of all immutable memtables across all column families.

One thing I would like to note is that if you check the documentation for this variable at the time of this writing it states that this is the write buffer size per column family. However, in my testing, I showed that this setting is more of a global scope. As such, I wrote this blog post to reflect what my testing showed given that I believe what I have provided is the proper definition. I’ve opened a bug report with Percona to report this.

Default: 0

The default of 0 indicates that there is no upper limit for how much memory can be used by active memtables in the MyRocks instance.

I personally do not prefer having unrestricted caches as I’ve seen enough out-of-memory kills of MySQL to be wary. You’ll need to consider how many column families you wish to create, and how tolerant you are of the number of flushes being performed. Also, keep in mind that while memtables are used for reads, they are not anywhere near as important as the block cache (caching for reads will be covered in a later post). You won’t want to give memtables a majority share of physical memory like you would for something like the InnoDB Buffer Pool, for those of you who are familiar.


rocksdb_allow_concurrent_memtable_write

Enabling this variable allows multiple threads to write to the same memtable concurrently. If it is not enabled, multiple threads will have to write to the memtable in a sequential fashion.

It should be noted that this is only available for the memtable factory ‘SkipList’. RocksDB allows for memtables to be stored in different formats and these are called “factories”. If you’re familiar with InnoDB you can think of this kind of like a row format. You can find information about the different memtable factories and their different features on this page in the Facebook RocksDB Github Wiki, but I believe that going deep into the details on this particular topic may be premature given that, as far as I’m aware, SkipList is the only memtable factory supported by MyRocks for now.

While it’s possible that you may see performance increases in some use cases by enabling this feature, as seen in this blog post by Mark Callaghan, it does come at the cost of losing in-place updates and filter deletes, meaning some DML statements that update or delete data may be impacted by enabling this feature.

Also, if you are going to enable this feature, you will also need to enable the rocksdb_enable_write_thread_adaptive_yield variable, which is the next variable covered in this blog post.

Default: OFF

As noted above, enabling this feature may increase throughput for write-heavy workloads, but at the cost of in-place updates and filter deletes. I would recommend testing this with appropriate performance benchmarks prior to implementation in production.


rocksdb_enable_write_thread_adaptive_yield

By default, writes are not grouped, regardless of whether you are writing to a single column family or multiple column families. When you enable this variable you allow grouping to occur, as mutexes for writing to memtables are acquired on a timed basis instead of as part of each transaction commit.

Default: OFF

Per the comments in RocksDB options.h file, enabling this may allow for additional throughput for concurrent workloads, but there is very little documentation out there to suggest what the negative impact of enabling this variable may be. It’s recommended to test this thoroughly before enabling in production.


rocksdb_max_total_wal_size

As data is written into the memtables, it is also written into the write-ahead log. The purpose of the write-ahead log is to store information about committed data changes in order to ensure that the changes can be recovered in case of a crash. For those of you who are familiar with InnoDB, this plays the same role as the InnoDB redo log files.

The write-ahead log file can be found in the rocksdb data directory (default: /var/lib/mysql/.rocksdb) with the file extension of ‘.log’.

Regardless of whether or not your writes are going to one column family or several, all writes will go to the same write-ahead log. In the case that the memtable is flushed to disk, a new write-ahead log will be created at that time. However, you can also limit the size of write-ahead log files on disk by using this variable. Keep in mind that if you limit the size of disk consumption that can be used by write-ahead log files, once that limit is hit it will force a memtable flush. For example, if you have this variable set to 10Mb, once the 10Mb limit is reached it will force all memtables to flush to disk so that a new write-ahead log file can be created.

Default: 0

The default value indicates that the write-ahead log size should be chosen dynamically by MyRocks. Based on my testing it appears that this will effectively place very little limit on the log files. Generally speaking, you can expect that the amount of space consumed on disk would be the maximum amount of space of all of your column family’s active memtables, or rocksdb_db_write_buffer_size, whichever is smaller.

For example, if you have no limit on rocksdb_db_write_buffer_size and you have 3 column families at their default setting of 64Mb, you can expect that you’ll have at least 192Mb of write-ahead logs on disk.

Based on my testing, my recommendation would be to leave this variable at the default, mainly because you will want to keep disk flushes to a minimum and setting a hard limit for write-ahead logging may result in more flushing than if it were left to its original value. However, I think a good exception would be if you plan to use a lot of column families and if you want to make sure the space consumed by write-ahead logs doesn’t get out of control.


rocksdb_wal_dir

You have the option of storing the write-ahead log files within the data directory for MyRocks, or, if you like, you can specify a different directory using the system variable rocksdb_wal_dir.

Default: NULL (place in data directory)

It’s been common in the past for database admins to separate random I/O and sequential I/O patterns by placing them on different storage devices or platforms. This is becoming less common as SSDs and similar technologies are becoming the database storage of choice, but if you’re still using spinning media, you may want to consider putting your write-ahead logs on a disk that is dedicated for the sequential I/O pattern typically found with log writing.
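If you do decide to separate the sequential WAL writes onto their own device, the change is a single startup option in my.cnf; the mount point below is purely illustrative:

```ini
[mysqld]
# Hypothetical dedicated spinning disk for sequential WAL I/O.
# Defaults to NULL, i.e. the MyRocks data directory.
rocksdb_wal_dir = /mnt/wal-disk/myrocks-wal
```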


When a transaction is committed, you would want to sync it to the write-ahead log before returning a commit confirmation to the client in order to ensure that the transaction is ACID-compliant. If this did not occur, you wouldn’t have the assurance that the transaction that was recently committed would actually be able to be recovered from the write-ahead log in the case of a crash.

The variable rocksdb_flush_log_at_trx_commit controls this activity, allowing you to relax the requirements to sync transaction data to the log as part of the commit process and increase performance at the cost of ACID compliance. For those of you who are familiar with InnoDB, this variable is similar to innodb_flush_log_at_trx_commit.

Per the documentation, the following options are available to you with this variable.

1 – Write data to the write-ahead log and sync the data to disk as part of the transaction commit process.

2 – Write the data to the write-ahead log at commit and sync to disk once per second.

0 – Write the data to the write-ahead log at commit but do not sync to disk.

The interesting part about value 0 is that it doesn’t specify under what specific conditions the write-ahead log would be synced to disk. Upon further testing, I found that options 0 and 2 behaved identically and have opened a bug with Percona to address this problem.

Default: 1

I would recommend leaving this at its default value unless you can live with the risk of potentially losing data in the case of a crash.
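For reference, if you did decide to accept the risk and relax durability, the setting could be applied in my.cnf like this (a sketch; 1 remains the recommended default):

```ini
[mysqld]
# 1 (default, recommended): sync the WAL to disk on every commit.
# rocksdb_flush_log_at_trx_commit = 1

# 2: write at commit, sync about once per second; a crash can lose up
# to roughly one second of committed transactions.
rocksdb_flush_log_at_trx_commit = 2
```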


There are times when you may want to disable write-ahead logging. This can increase write performance overall as transactions don’t have to wait for confirmation that log data has been written prior to commit, which is good for bulk loading your database, or even restoring from a logical backup. The fact that this can be configured dynamically for an individual MySQL session also makes this variable worth considering. However, you have to be careful as data written into a memtable is not guaranteed to be flushed to disk until it has become immutable and subsequently flushed. Until that point, any data written to the memtable with this option enabled would be lost in a crash.

Default: OFF

I think it’s pretty clear that leaving this at the default of ‘OFF’ is best. If you were to disable write-ahead logging, you would want to do so dynamically, and only within a specific MySQL session.


Once a new write-ahead log file is created under one of the previously noted conditions, old versions of the write-ahead log file should be kept in the archive directory of the MyRocks data directory. For example, with my default configuration values this directory is located in /var/lib/mysql/.rocksdb/archive.

Per the documentation, this variable is supposed to control how long archived files should be kept. With a value of ‘0’ it should not delete any archived files. However, in my testing, I discovered that the behavior of write-ahead log archiving was different than what the documentation describes. Instead, if you have a value of 0, there is no write-ahead log archiving at all. Write-ahead log archiving only happened in my lab once this variable was set to a non-zero value. I have created a bug with Percona to address this.

Default: 0

Given that write-ahead logs are there to serve a purpose of ensuring data durability in a crash, I don’t think it’s very beneficial to hold onto old write-ahead log files. As such, I would be inclined to leave this at zero, or perhaps keep this at a very low value in order to ensure that you aren’t consuming a lot of disk space for old write-ahead logs.


The whole point of the write-ahead log is to be able to recover from a crash with committed data still intact. This is the ‘durable’ component of ACID compliance. However, MySQL could crash in the middle of an entry addition, or other corruption could occur. This variable allows you to configure how MyRocks should handle that type of an issue during crash recovery.

Permitted values are as follows.

0 – If a corrupted WAL entry is detected as the last entry in the WAL, truncate the entry and start up normally; otherwise, refuse to start.

1 – If a corrupted WAL entry is detected, fail to start. This is the most conservative recovery mode.

2 – If a corrupted WAL entry is detected in the middle of the WAL, truncate all of WAL entries after that (even though there may be uncorrupted entries) and then start up normally. For replication slaves, this option is fine, since the slave instance can recover any lost data from the master without breaking consistency. For replication masters, this option may end up losing data if you do not run failover. For example, if the master crashed and was restarted (by mysqld_safe, or auto restart after OS reboot) with this mode and it silently truncated several WAL entries, the master would lose some data which may be present on one or more slaves.

3 – If a corrupted WAL entry is detected in the middle of the WAL, skip the WAL entry and continue to apply as many healthy WAL entries as possible. This is the most dangerous recovery option and it is not generally recommended.

Default: 1

I believe that the most conservative crash recovery method should be used until you are forced to select another option. In the case that you attempt to start MySQL and MyRocks provides an error in regard to log corruption, use the data that you’ve collected from the error log and select the next least conservative value.
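As a sketch of that escalation path (assuming the variable being described here is rocksdb_wal_recovery_mode, which is how MyRocks exposes this setting; verify the name in your version’s documentation):

```ini
[mysqld]
# Start with the most conservative mode (the default):
# rocksdb_wal_recovery_mode = 1

# Only after a WAL-corruption startup error, step down to the next
# least conservative value and retry:
rocksdb_wal_recovery_mode = 0
```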

In my opinion, if option 3 ever needs to be used, you should start the system, do a logical dump of the database using something like mysqldump, stop MySQL, clear out your data directory, and reload from the dump that you created.

Associated Metrics

Here are some of the metrics you should pay attention to when it comes to initial data writing.

You can find the following information using system status variables:

    • Rocksdb_rows_deleted: The number of rows that have been deleted from tables using the MyRocks storage engine since the last MySQL restart.
    • Rocksdb_rows_inserted: The number of rows that have been inserted into tables using the MyRocks storage engine since the last MySQL restart.
    • Rocksdb_rows_updated: The number of rows that have been updated in tables using the MyRocks storage engine since the last MySQL restart.
    • Rocksdb_bytes_written: The number of bytes written into MyRocks tables including updates and deletes since the last MySQL restart.
    • Rocksdb_memtable_total: The total amount of memory currently being consumed by memtables in bytes.
    • Rocksdb_stall_memtable_limit_slowdowns: The number of slowdowns that have occurred due to MyRocks getting close to the maximum number of allowed memtables since the last MySQL restart.
    • Rocksdb_stall_memtable_limit_stops: The number of write stalls that have occurred due to MyRocks hitting the maximum number of allowed memtables since the last MySQL restart.
    • Rocksdb_stall_total_slowdowns: The total number of write slowdowns that have occurred in the MyRocks engine since the last MySQL restart.
    • Rocksdb_stall_total_stops: The total number of write stalls that have occurred in the MyRocks engine since the last MySQL restart.
    • Rocksdb_wal_bytes: The number of bytes written to the write-ahead log since the last MySQL restart.
    • Rocksdb_wal_synced: The number of times that a write-ahead log has been synced to disk since the last MySQL restart.
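To illustrate how you might consume these counters, here is a small sketch that derives a couple of health indicators from them. The counter values are fabricated sample data; on a live server they would come from SHOW GLOBAL STATUS LIKE 'Rocksdb%'.

```python
# Derived health indicators from MyRocks status counters.
# The values below are hypothetical sample data, not real output.
status = {
    "Rocksdb_wal_bytes": 52_428_800,   # bytes written to the WAL
    "Rocksdb_wal_synced": 1_600,       # number of WAL syncs to disk
    "Rocksdb_stall_total_stops": 3,
    "Rocksdb_stall_total_slowdowns": 12,
}

def avg_bytes_per_wal_sync(s):
    # Guard against division by zero right after a restart.
    return s["Rocksdb_wal_bytes"] / max(s["Rocksdb_wal_synced"], 1)

def has_write_pressure(s):
    # Any stall or slowdown since restart suggests memtable pressure.
    return (s["Rocksdb_stall_total_stops"]
            + s["Rocksdb_stall_total_slowdowns"]) > 0

print(avg_bytes_per_wal_sync(status))  # 32768.0
print(has_write_pressure(status))      # True
```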

In the information_schema.ROCKSDB_CFSTATS table you can find the following information about each column family.

  • CUR_SIZE_ALL_MEM_TABLES: The current size of all memtables, including active and immutable, per column family.
  • CUR_SIZE_ACTIVE_MEM_TABLE: The current size of all active memtables per column family.
  • NUM_ENTRIES_ACTIVE_MEM_TABLE: The number of record changes in the active memtable per column family.
  • NUM_ENTRIES_IMM_MEM_TABLES: The number of record changes in the immutable memtable(s) per column family.
  • NUM_IMMUTABLE_MEM_TABLE: The current number of immutable memtables for each column family.

In this post, we explained column families, the differences between server variables and column family options, how data gets written into memory space, the associated logging, and different conditions that designate when that data will be flushed to disk.

Stay tuned for the next post where we will discuss the mechanics surrounding the flush from immutable memtable to disk.

A big thank you goes out to my internal fact checker, Sandeep Varupula, as well as George Lorch of Percona for acting as an additional external technical reviewer!

On the Database Market. Interview with Merv Adrian

“Anyone who expects to have some of their work in the cloud (e.g. just about everyone) will want to consider the offerings of the cloud platform provider in any shortlist they put together for new projects. These vendors have the resources to challenge anyone already in the market.”– Merv Adrian.

I have interviewed Merv Adrian, Research VP, Data & Analytics at Gartner. We talked about the database market, the Cloud and the 2018 Gartner Magic Quadrant for Operational Database Management Systems.


Q1. Looking Back at 2018, how has the database market changed?

Merv Adrian: At a high level, much is similar to the prior year. The DBMS market returned to double digit growth in 2017 (12.7% year over year in Gartner’s estimate) to $38.8 billion. Over 73% of that growth was attributable to two vendors: Amazon Web Services and Microsoft, reflecting the enormous shift to new spending going to cloud and hybrid-capable offerings. In 2018, the trend grew, and the erosion of share for vendors like Oracle, IBM and Teradata continued. We don’t have our 2018 data completed yet, but I suspect we will see a similar ballpark for overall growth, with the same players up and down as last year. Competition from Chinese cloud vendors, such as Alibaba Cloud and Tencent, is emerging, especially outside North America.

Q2. What most surprised you?

Merv Adrian: The strength of Hadoop. Even before the merger, both Cloudera and Hortonworks continued steady growth, with Hadoop as a cohort outpacing all other nonrelational DBMS activity from a revenue perspective. With the merger, Cloudera becomes the 7th largest vendor by revenue and usage and intentions data suggest continued growth in the year ahead.

Q3. Is the distinction between relational and nonrelational database management still relevant?

Merv Adrian: Yes, but it’s less important than the cloud. As established vendors refresh and extend product offerings that build on their core strengths and capabilities to provide multimodel DBMSs and/or broad portfolios of both, the “architecture” battle will ramp up. New disruptive players and existing cloud platform providers will have to battle established vendors where they are strong – so on the DMSA side, players like Snowflake will have more competition, and on the OPDBMS side, relational and nonrelational providers alike – such as EnterpriseDB, MongoDB, and Datastax – will battle more for a cloud foothold than a “nonrelational” one.
Specific nonrelational plays like Graph, Time Series, and ledger DBMSs will be more disruptive than the general “nonrelational” category.

Q4. Artificial intelligence is moving from sci-fi to the mainstream. What is the impact on the database market?

Merv Adrian: Vendors are struggling to make the case that much of the heavy lifting should move to their DBMS layer with in-database processing. Although it’s intuitive, it represents a different buyer base, with different needs for design, tools, expertise and operational support. They have a lot of work to do.

Q5. Recently Google announced BigQuery ML. Machine Learning in the (Cloud) Database. What are the pros and cons?

Merv Adrian: See the above answer. Google has many strong offerings in the space – putting them together coherently is as much of a challenge for them as anyone else, but they have considerable assets, a revamped executive team under the leadership of Thomas Kurian, and are entering what is likely to be a strong growth phase for their overall DBMS business. They are clearly a candidate to be included in planning and testing.

Q6. You recently published the 2018 Gartner Magic Quadrant for Operational Database Management Systems. In a nutshell. what are your main insights?

Merv Adrian: Much of that is included in the first answer above. What I didn’t say there is that the degree of disruption varies between the Operational and DMSA wings of the market, even though most of the players are the same. Most important, specialists are going to be less relevant in the big picture as the converged model of application design and multimodel DBMSs make it harder to thrive in a niche.

Q7. To qualify for inclusion in this Magic Quadrant, vendors had to support two of the following four use cases: traditional transactions, distributed variable data, operational analytical convergence, and event processing or data in motion. What is the rationale behind this inclusion criterion?

Merv Adrian: The rationale is to offer our clients the offerings with the broadest capabilities. We can’t cover all possibilities in depth, so we attempt to reach as many as we can within the constraints we design to map to our capacity to deliver. We call out specialists in various other research offerings such as Other Vendors to Consider, Cool Vendor, Hype Cycle and other documents, and pieces specific to categories where client inquiry makes it clear we need to have a published point of view.

Q8. How is the Cloud changing the overall database market?

Merv Adrian: Massively. In addition to functional and architectural disruption, it’s changing pricing, support, release frequency, and user skills and organizational models. The future value of data center skills, container technology, multicloud and hybrid challenges and more are hot topics.

Q9. In your Quadrant you listed Amazon Web Services, Alibaba Cloud and Google. These are no pure database vendors, strictly speaking. What role do they play in the overall Operational DBMS market?

Merv Adrian: Anyone who expects to have some of their work in the cloud (e.g. just about everyone) will want to consider the offerings of the cloud platform provider in any shortlist they put together for new projects. These vendors have the resources to challenge anyone already in the market. And their deep pockets, and the availability of open source versions of every DBMS technology type that they can use – including creating their own versions with optimizations for their stack and pre-built integrations to upstream and downstream technologies required for delivery – make them formidable.

Q10. What are the main data management challenges and opportunities in 2019?

Merv Adrian: Avoiding silver bullet solutions, sticking to sound architectural principles based on understanding real business needs, and leveraging emerging ideas without getting caught in dead end plays. Pretty much the same as always. The details change, but sound design and a focus on outcomes remain the way forward.

Qx Anything else you wish to add?

Merv Adrian: Fasten your seat belt. It’s going to be a bumpy ride.


Merv Adrian, Research VP, Data & Analytics. Gartner

Merv Adrian is an Analyst on the Data Management team following operational DBMS, Apache Hadoop, Spark, nonrelational DBMS and adjacent technologies. Mr. Adrian also tracks the increasing impact of open source on data management software and monitors the changing requirements for data security in information platforms.


–  2018 Gartner Magic Quadrant for Operational Database Management Systems

–  2019 Gartner Magic Quadrant for Data Management Solutions for Analytics

–  2019 Gartner Magic Quadrant for Data Science and Machine Learning Platforms

Related Posts

  • On the technical challenges posed by the 5G revolution. Q&A with Dheeraj Remella
  • On Neo4j and Google Cloud Partnership. Q&A with Lance Walter
  • On Kubernetes, Cloud and Couchbase Autonomous Operator. Q&A with Anil Kumar
  • On AI, Big Data, Healthcare in China. Q&A with Luciano Brustia
  • On MariaDB Connector/Node.js. Q&A with Thomas Boyd
  • On Yellowbrick Data Warehouse. Q&A with Gary Orenstein
  • On RedisGraph. Q&A with Daniel Howard
  • On Software Quality. Q&A with Alexander Boehm and Greg Law
  • On Redis. Q&A with Yiftach Shoolman, Founder and CTO of Redis Labs
  • On Apache NiFi, Kafka and Lambda Architecture. Q&A with Jordan Martz
  • On Vertica Database, Apache ORC and Parquet. Q&A with Deepak Majeti
  • On Apache Kafka®. Q&A with Gwen Shapira
  • On Cloud-based AI. Q&A with James Kobielus

Follow us on Twitter: @odbmsorg


How to speed-up builds with Omnibus caching

To build a package, Omnibus compiles dependencies. If it’s a Python project, Omnibus will compile python each time it packages the project. If the project depends on a library – it will have to be recompiled every time. That’s suboptimal, let’s see what we can do to reduce the build time. I was inspecting omnibus/omnibus.rb when […]

The post How to speed-up builds with Omnibus caching appeared first on TwinDB.

End of Life Announcement for Query Analyzer and MySQL Configuration Generator

Percona will no longer be offering the free services of Query Analyzer and MySQL Configuration Generator hosted at  We made this decision based on observed low usage numbers, outdated technology, and lack of support for MySQL 5.7+, MongoDB, and PostgreSQL.

Although there isn’t a like-for-like replacement available, you might wish to consider the following options, which offer similar and effective tools to support your business:

1. Query Analyzer – Consider the combination of PMM (Percona Monitoring and Management) + EverSQL (use code PERCONA-EVERSQL-10OFF for a 10% discount on the first billing cycle) to unify slow query identification with query optimization! We’ve blogged about this before:

Monitor and Optimize Slow Queries with PMM and EverSQL – Part One

Monitor and Optimize Slow Queries with PMM and EverSQL – Part Two

2. Percona toolkit – Our command line tools enable DBAs to be more productive, for example, pt-config-diff can help you spot the differences between a running instance and configuration file (or across configuration files!), and pt-variable-advisor examines the output of SHOW VARIABLES and reports bad values in order to properly tune your MySQL server.

We would like to thank all of the users of Query Analyzer and the MySQL Configuration Generator for their support over the years. We would be happy to discuss the alternative options available if you would find this useful. Please contact for more information.

SQL SELECT Query Example | SQL SELECT Statement Tutorial

SQL SELECT Query Example | SQL SELECT Statement Tutorial is today’s topic. SELECT is the most commonly used statement in SQL. The SQL SELECT statement is used to fetch data from a database table, and the data is returned in the form of a result table.

The Select command in SQL is one of the most powerful and most heavily used commands. It is probably the first command anyone learns in SQL, even before CREATE, which is used to create a table. SELECT is used in SQL to fetch records from database tables, and you can do a great deal with it.

For example, you can select all records; you can choose a few records based on the condition specified in the WHERE clause; you can select all columns using the wildcard (*); or you can select only a few columns by explicitly declaring them in the query.

SQL SELECT Query Example

The SQL SELECT syntax is the following.

SELECT column-names FROM table-name

Here, column-names are the field names of the table you want to select data from. If you’re going to select all the fields available in the table, use the following syntax.

SELECT * FROM table_name;

Here are some of the fundamental examples of SQL SELECT Query Statements.

Fetch all the records

Select * from Apps

The above query fetches all the records from the defined table. In our case, it is Apps. The output is the following.


Finding how many rows in the table

Let’s count the rows of the table using the select statement.

select count(*) from Apps

The output is the following.


Select a few records based on some condition

If you are not aware of the WHERE statement, then check out SQL WHERE Clause Example.

See the following query.

select * from Apps where AppName = 'MoneyControl'

In this query, we are fetching records based on the WHERE condition: only those records whose AppName is MoneyControl. The output is the following.


Select a few columns instead of all columns

Let’s see the scenario where we do not need to fetch all the columns, only the required ones. See the below query.

select AppName, AppCategory from Apps

We are only selecting the AppName and AppCategory from the Apps table. See the output.


Select distinct (unique) records from columns

Let’s see the scenario where we need to find unique records based on the column values. See the following query.

select distinct AppName from Apps

In our table, the AppName column already has distinct values; but if it contained repeated values, DISTINCT would eliminate the duplicates.


Select value with condition based on >, <, >=, <=

We can select values with a condition based on >, <, >=, and <=. See the following code.

select AppName from Apps where AppPrice > 60

See the following example.


Select query using BETWEEN, NOT BETWEEN

As the name suggests, BETWEEN is used to get data within a range. Note that both endpoints of the range are included.

select * from Apps where AppPrice BETWEEN 70 AND 100;

The output is the following.


For NOT BETWEEN, you need to use the keyword NOT BETWEEN in the above query instead of BETWEEN. It will give us the rows that are not in the range.
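If you’d like to experiment with these queries without setting up a MySQL server, here is a minimal sketch using Python’s built-in sqlite3 module. The Apps table and its rows are made up for illustration; they are not the article’s data set.

```python
import sqlite3

# Build a throwaway in-memory Apps table with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Apps (AppName TEXT, AppCategory TEXT, AppPrice INTEGER)"
)
conn.executemany(
    "INSERT INTO Apps VALUES (?, ?, ?)",
    [
        ("MoneyControl", "Finance", 80),
        ("ChatNow", "Social", 50),
        ("MapGo", "Travel", 95),
    ],
)

# Count the rows.
print(conn.execute("SELECT count(*) FROM Apps").fetchone()[0])  # 3

# Filter with a WHERE condition.
print(conn.execute(
    "SELECT AppCategory FROM Apps WHERE AppName = 'MoneyControl'"
).fetchone()[0])  # Finance

# Comparison operator.
print([r[0] for r in conn.execute(
    "SELECT AppName FROM Apps WHERE AppPrice > 60")])

# BETWEEN is inclusive of both endpoints.
print([r[0] for r in conn.execute(
    "SELECT AppName FROM Apps WHERE AppPrice BETWEEN 70 AND 100")])
```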

Finally, SQL SELECT Query Example | SQL SELECT Statement Tutorial is over.

The post SQL SELECT Query Example | SQL SELECT Statement Tutorial appeared first on AppDividend.

Percona Live Presents: Vitess – Running Sharded MySQL on Kubernetes

The topic I’m presenting addresses a growing and unfulfilled need: the ability to run stateful workloads in Kubernetes. Running stateless applications is now considered a solved problem. However, it’s currently not practical to put databases like MySQL in containers, give them to Kubernetes, and expect it to manage their life cycles.

Sugu Sougoumarane, CTO of Planetscale and creator of Vitess

Vitess addresses this need by providing all the necessary orchestration and safety, and it has multiple years of mileage to show for it. Storage is the last piece of the puzzle that needs to be solved in Kubernetes, and it’s exciting to see people look towards Vitess to fill this gap.

Who’d benefit most from the presentation?

Anybody who is looking to move to Kubernetes and is wondering what to do about their data is the perfect audience. Needless to say, Vitess also addresses problems of scalability. So, those who are looking to scale MySQL will also benefit from our talk.

Whose presentations are you most looking forward to?

I’m looking forward to An Open-Source, Cloud Native Database (CNDB) by David Cohen, of Intel, and others. They are doing something unique by bridging the gap from legacy systems and cloud-based architectures that are coming up today, and using all open source technology.

I’ll be presenting my talk Vitess: Running Sharded MySQL on Kubernetes at Percona Live 2019 on Wednesday, May 29 alongside Dan Kozlowski, also of PlanetScale. If you’d like to register for the conference, use the code SEEMESPEAK for a 20% discount on your ticket.

Percona Live 2019 takes place in Austin Texas from May 28 – May 30, view the full programme here.

The post Percona Live Presents: Vitess – Running Sharded MySQL on Kubernetes appeared first on Percona Community Blog.

MySQL-python: Adding caching_sha2_password and TLSv1.2 Support

Python 2 reaches EOL on 2020-01-01 and one of its commonly used third-party packages is MySQL-python. Since MySQL-python does not support Python 3, if you have not yet migrated away from both of these you may have come across some issues when using more recent versions of MySQL and enforcing a secured installation. This post will look at two specific issues that you may come across (caching_sha2_password in MySQL 8.0, and TLSv1.2 with MySQL >=5.7.10 when using OpenSSL) that will prevent you from connecting to a MySQL database, and will buy you a little extra time to migrate code.

For CentOS 7, MySQL-python is built against the client library provided by the mysql-devel package, which does not support some of the newer features, such as caching_sha2_password (the new default authentication plugin as of MySQL 8.0.4) and TLSv1.2. This means that you cannot take advantage of these security features and therefore leave yourself with an increased level of vulnerability.

So, what can we do about this? Thankfully, it is very simple to resolve, so long as you don’t mind getting your hands dirty!

Help! MySQL-python won’t connect to my MySQL 8.0 instance

First of all, let’s take a look at the issues that the latest version of MySQL-python (MySQL-python-1.2.5-1.el7.rpm for CentOS 7) has when connecting to a MySQL 8.0 instance. We will use a Docker container to help test this out by installing MySQL-python along with the Percona Server 8.0 client so that we can compare the two.

=> docker run --rm -it --network host --name rpm-build centos:latest
# yum install -y -q MySQL-python
warning: /var/cache/yum/x86_64/7/base/packages/MySQL-python-1.2.5-1.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Public key for MySQL-python-1.2.5-1.el7.x86_64.rpm is not installed
Importing GPG key 0xF4A80EB5:
 Userid     : "CentOS-7 Key (CentOS 7 Official Signing Key) <>"
 Fingerprint: 6341 ab27 53d7 8a78 a7c2 7bb1 24c6 a8a7 f4a8 0eb5
 Package    : centos-release-7-6.1810.2.el7.centos.x86_64 (@CentOS)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
# yum install -q -y; percona-release setup ps80
* Enabling the Percona Original repository
<*> All done!
The percona-release package now contains a percona-release script that can enable additional repositories for our newer products.
For example, to enable the Percona Server 8.0 repository use:
  percona-release setup ps80
Note: To avoid conflicts with older product versions, the percona-release setup command may disable our original repository for some products.
For more information, please visit:
* Disabling all Percona Repositories
* Enabling the Percona Server 8.0 repository
* Enabling the Percona Tools repository
<*> All done!
# yum install -q -y percona-server-client
warning: /var/cache/yum/x86_64/7/ps-80-release-x86_64/packages/percona-server-shared-8.0.15-5.1.el7.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 8507efa5: NOKEY
Public key for percona-server-shared-8.0.15-5.1.el7.x86_64.rpm is not installed
Importing GPG key 0x8507EFA5:
 Userid     : "Percona MySQL Development Team (Packaging key) <>"
 Fingerprint: 4d1b b29d 63d9 8e42 2b21 13b1 9334 a25f 8507 efa5
 Package    : percona-release-1.0-11.noarch (installed)
 From       : /etc/pki/rpm-gpg/PERCONA-PACKAGING-KEY

Next we will create a client config to save some typing later on and then check that we can connect to the MySQL instance that is already running; if you don’t have MySQL running on your local machine then you can install it in the container.

# cat <<EOS > /root/.my.cnf
> [client]
> user = testme
> password = my-secret-pw
> host =
> protocol = tcp
> ssl-mode = REQUIRED
> EOS

mysql> /* hide passwords */ PAGER sed "s/AS '.*' REQUIRE/AS 'xxx' REQUIRE/" ;
PAGER set to 'sed "s/AS '.*' REQUIRE/AS 'xxx' REQUIRE/"'
mysql> SHOW CREATE USER CURRENT_USER();
CREATE USER 'testme'@'%' IDENTIFIED WITH 'caching_sha2_password' AS 'xxx' REQUIRE SSL PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK PASSWORD HISTORY DEFAULT PASSWORD REUSE INTERVAL DEFAULT
mysql> SELECT @@version, @@version_comment, ssl_version, ssl_cipher, user FROM sys.session_ssl_status INNER JOIN sys.processlist ON thread_id = thd_id AND conn_id = CONNECTION_ID();
8.0.12-1  Percona Server (GPL), Release '1', Revision 'b072566'  TLSv1.2  ECDHE-RSA-AES128-GCM-SHA256  testme@localhost

All looks good here, so now we will check that MySQLdb can also connect:

# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2059, "Authentication plugin 'caching_sha2_password' cannot be loaded: /usr/lib64/mysql/plugin/ cannot open shared object file: No such file or directory")

Changing the user’s authentication plugin

We have hit the first issue, because MySQL 8.0 introduced the caching_sha2_password plugin and made it the default authentication plugin, so we can’t connect at all. However, we can gain access by changing the grants for the user to use the old default plugin and then test again.

mysql> ALTER USER 'testme'@'%' IDENTIFIED WITH mysql_native_password BY "my-secret-pw";

# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (1045, "Access denied for user 'testme'@'localhost' (using password: YES)")

Configuring SSL options

We still can’t connect, so what can be wrong? Well, we haven’t added any SSL details to the config other than specifying that we need to use SSL, so we will add the necessary options and make sure that we can connect.

# cat <<EOS >> /root/.my.cnf
> ssl-ca = /root/certs/ca.pem
> ssl-cert = /root/certs/client-cert.pem
> ssl-key = /root/certs/client-key.pem
> EOS
# mysql -Bse "select 1"
1
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
(('8.0.12-1', "Percona Server (GPL), Release '1', Revision 'b072566'", 'TLSv1', 'ECDHE-RSA-AES256-SHA', 'testme@localhost'),)

Forcing TLSv1.2 or later to further secure connections

That looks much better now, but if you look closely then you will notice that the MySQLdb connection is using TLSv1, which will make your security team either sad, angry or perhaps both as the connection can be downgraded! We want to secure the installation and keep data over the wire safe from prying eyes, so we will remove TLSv1 and also TLSv1.1 from the list of versions accepted by MySQL and leave only TLSv1.2 (TLSv1.3 is not supported with our current MySQL 8.0 and OpenSSL versions). Any guesses what will happen now?

mysql> SET PERSIST_ONLY tls_version = "TLSv1.2";
mysql> RESTART;

# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2026, 'SSL connection error: error:00000001:lib(0):func(0):reason(1)')

MySQLdb can no longer connect to MySQL, sadly with a rather cryptic message! However, as we made the change that triggered this we don’t need to decipher it and can now start looking to add TLSv1.2 support to MySQLdb, so roll up your sleeves!

Solution: Build a new RPM

In order to build a new RPM we will need to do a little extra work in the container first of all, but it doesn’t take long and is pretty simple to do. We are going to install the necessary packages to create a basic build environment and then rebuild the MySQL-python RPM from its current source RPM.

## Download packages
# yum install -q -y rpm-build yum-utils gnupg2 rsync deltarpm gcc
Package yum-utils-1.1.31-50.el7.noarch already installed and latest version
Package gnupg2-2.0.22-5.el7_5.x86_64 already installed and latest version
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.

## Create build directory tree
# install -d /usr/local/builds/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}

## Configure the RPM macro
# echo "%_topdir /usr/local/builds/rpmbuild" > ~/.rpmmacros

## Switch to a temporary directory to ease cleanup
# cd "$(mktemp -d)"; pwd
/tmp/tmp.xqxdBeLkXs

## Download the source RPM
# yumdownloader --source -q MySQL-python
Enabling updates-source repository
Enabling base-source repository
Enabling extras-source repository

## Extract the source RPM
# rpm2cpio "$(ls -1 MySQL-python*src.rpm)" | cpio -idmv
MySQL-python-no-openssl.patch
MySQL-python.spec
234 blocks

We are now ready to start making some changes to the source code and build specifications, but first of all we need to take note of another change that occurred in MySQL 8.0. Older code will reference my_config.h, which has since been removed and is no longer required for building; the fix for this will be shown below.

## Adjust the spec file to use percona-server-devel and allow a build number
# sed -i "s/mysql-devel/percona-server-devel/g; s/Release: 1/Release: %{buildno}/" MySQL-python.spec

## Store the ZIP filename and extract
# SRC_ZIP="$(ls -1 MySQL-python*.zip)"; unzip "${SRC_ZIP}"

## Store the source directory and remove the include of my_config.h
# SRC_DIR=$(find . -maxdepth 1 -type d -name "MySQL-python*"); sed -i 's/#include "my_config.h"/#define NO_MY_CONFIG/' "${SRC_DIR}/_mysql.c"

## View our _mysql.c change
# fgrep -m1 -B3 -A1 -n NO_MY_CONFIG "${SRC_DIR}/_mysql.c"
41-#if defined(MS_WINDOWS)
42-#include <config-win.h>
43-#else
44:#define NO_MY_CONFIG
45-#endif

## Update the source
# zip -uv "${SRC_ZIP}" "${SRC_DIR}/_mysql.c"
updating: MySQL-python-1.2.5/_mysql.c (in=84707) (out=17314) (deflated 80%)
total bytes=330794, compressed=99180 -> 70% savings

Now that we have adjusted the source code and specification we can start work on the new package so that we can once again connect to MySQL.

## Sync the source to the build tree
# rsync -ac ./ /usr/local/builds/rpmbuild/SOURCES/

## Copy the new specification file to the build tree
# cp -a MySQL-python.spec /usr/local/builds/rpmbuild/SPECS/

## Build a new source RPM with our updates
# rpmbuild --define "buildno 2" -bs /usr/local/builds/rpmbuild/SPECS/MySQL-python.spec
Wrote: /usr/local/builds/rpmbuild/SRPMS/MySQL-python-1.2.5-2.el7.src.rpm

## Use the new source RPM to install any missing dependencies
# yum-builddep -q -y /usr/local/builds/rpmbuild/SRPMS/MySQL-python-1.2.5-2.el7.src.rpm &>>debug.log

## Build a new RPM
# rpmbuild --define "buildno 2" --rebuild /usr/local/builds/rpmbuild/SRPMS/MySQL-python-1.2.5-2.el7.src.rpm &>>debug.log
# tail -n1 debug.log
+ exit 0

All went well and so we can now install the new package and see if it worked.

## Install the local RPM
# yum localinstall -q -y /usr/local/builds/rpmbuild/RPMS/x86_64/MySQL-python-1.2.5-2.el7.x86_64.rpm

## Check to see which libmysqlclient is used
# ldd /usr/lib64/python2.7/site-packages/_mysql.so | fgrep libmysqlclient
        => /usr/lib64/mysql/ (0x00007f61660be000)

## Test the connection
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
(('8.0.12-1', "Percona Server (GPL), Release '1', Revision 'b072566'", 'TLSv1.2', 'ECDHE-RSA-AES128-GCM-SHA256', 'testme@localhost'),)

Almost there… now force user authentication with the caching_sha2_password plugin

Hurrah! We can once again connect to the database and this time we are using TLSv1.2 and have thus increased our security. There is one thing left to do now though. Earlier on we needed to change the authentication plugin, so we will now change it back for extra security and see if all is still well.

mysql> ALTER USER 'testme'@'%' IDENTIFIED WITH caching_sha2_password BY "my-secret-pw";

# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
(('8.0.12-1', "Percona Server (GPL), Release '1', Revision 'b072566'", 'TLSv1.2', 'ECDHE-RSA-AES128-GCM-SHA256', 'testme@localhost'),)

Mission successful! Hopefully, if you find yourself in need of a little extra time to migrate away from MySQLdb, this will help.

Image modified from photo by David Clode on Unsplash

Galera Cluster with new Galera Replication library 3.26 and MySQL 5.6.43, MySQL 5.7.25 Generally Available (GA)

Codership is pleased to announce a new Generally Available (GA) release of Galera Cluster for MySQL 5.6 and 5.7, consisting of MySQL-wsrep 5.6.43 and MySQL-wsrep 5.7.25 with a new Galera Replication library 3.26 (release notes, download), implementing wsrep API version 25. This release incorporates all changes to MySQL 5.6.43 (release notes, download) and 5.7.25 respectively (release notes, download).

Compared to the previous 3.25 release, the Galera Replication library has a few new features and enhancements: GCache page store fixes around an early release of the GCache page, a check for duplicate UUIDs to prevent a node joining with a duplicate UUID, and improvements to the internal handling of IPv6 addresses. From a compilation standpoint, dynamic symbol dispatch was disabled to avoid symbol conflicts during dynamic loading.

MySQL 5.6.43 with Galera Replication library 3.26 is an updated rebase, with a few notes: on Ubuntu 16.04 Xenial, the server cannot be bootstrapped with systemd and must rely on the SysV init scripts; however, normal server operations will work with systemd. On Debian 9 (Stretch), the service command cannot be used to start the server.

MySQL 5.7.25 with Galera Replication library 3.26 is an updated rebase, with additional packages added for openSUSE Leap 15 and Ubuntu 18.04 LTS (Bionic Beaver). As with the 5.6.43 release, the service command cannot be used to start the server on Debian 9 (Stretch). Note that performing a State Snapshot Transfer (SST) between a MySQL 5.6 node and a MySQL 5.7 node is not supported. It is also worth noting that InnoDB tablespaces outside the data directory (typically /var/lib/mysql) are not supported, as they may not be copied over during an SST operation.

You can get the latest release of Galera Cluster from the Codership website. There are package repositories for Debian, Ubuntu, CentOS, RHEL, OpenSUSE and SLES. The latest versions are also available via the FreeBSD Ports Collection.

Jinja2 for better Ansible

Jinja2 is a modern and designer-friendly templating language for Python frameworks. It is fast, reliable and widely used for dynamic file generation based on its parameters. In this blog, I'd like to share how and where the Jinja2 template language is used in Ansible, and how we can create better Ansible playbooks.

How it works

Jinja variables and expressions are indicated using the following default delimiters:

  • {% … %} for control statements (conditions)
  • {{ … }} for expressions (variables)
  • {# … #} for comments (describe the task)

Here’s an example using Jinja expressions:

- hosts:
  vars_files:
    - vars.yml
  tasks:
    - name: Checking the IP address
      debug:
        msg: "IP address {{ ip }}"
    - name: Checking OS name
      debug:
        msg: "OS NAME {{ os_name }}"

Variable definitions are needed for Jinja to resolve expressions. In the above example, definitions are required for ip and os_name.
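To make the delimiter behavior concrete, here is a deliberately tiny stand-in for Jinja's {{ … }} expression substitution, written in plain Python (illustrative only; real Jinja2 does far more, including the {% … %} control statements):

```python
import re

def render(template, variables):
    """Replace each {{ name }} expression with the value of that
    variable, mimicking Jinja's default expression delimiters."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda match: str(variables[match.group(1)]),
                  template)

print(render("IP address {{ ip }}, OS NAME {{ os_name }}",
             {"ip": "10.0.0.5", "os_name": "CentOS"}))
# → IP address 10.0.0.5, OS NAME CentOS
```

If a referenced variable has no definition, this sketch raises a KeyError, which mirrors why the definitions below are required.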

In Ansible, there are more than 21 places where we can declare or define a variable or assign it a value; below are three important ones:

  • Role defaults
  • Passing a YAML or JSON file with the --vars-file option
  • Environment variables

Variable files (vars_file or vars)

Pass the path to a file containing variable definitions using the --vars-file option. The file path must be one of the following:

  • Absolute file path
  • Relative to the project path
  • Relative to the ansible folder

When --vars-file is passed, Ansible Container checks if the path is an absolute path to a file. If not, it checks for the file relative to the project path, which is the current working directory. If the file is still not found, it looks for the file relative to the ansible folder within the project path.
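The resolution order just described can be sketched in plain Python (resolve_vars_file is our own illustrative helper, not part of Ansible):

```python
import os

def resolve_vars_file(path, project_path, ansible_dir="ansible"):
    """Resolve a --vars-file argument: an absolute path is used as-is,
    otherwise the file is looked up relative to the project path and
    then relative to the ansible folder within the project path."""
    if os.path.isabs(path):
        candidates = [path]
    else:
        candidates = [os.path.join(project_path, path),
                      os.path.join(project_path, ansible_dir, path)]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(path)
```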

Variable files can also be specified using the vars_files directive under settings in container.yml. For example:

- hosts:
  vars_files:
    - vars.yml
  tasks:
    ...

This templating is helpful for many automation tasks. It can be used to create dynamic configurations for MySQL, Nagios, and so on, depending upon the available resources.


A common rule of thumb is to set the MySQL innodb_buffer_pool_size to about 70% of total RAM for better performance. It’s easy to derive this from Ansible variables:

mysql_innodb_buffer_pool_size: "{{ (ansible_memtotal_mb * 0.7) | int }}M"


The ansible_memtotal_mb value is retrieved from the setup module, which returns the system stats and assigns them to the respective variables.

Command to get complete stats about the system.

To get stats about the local system:

ansible --connection=local -m setup

To get stats about the remote system from the inventory file:

ansible -i inventory_file group_name -m setup

Fact gathering can be disabled by adding “gather_facts: no” to the respective play.


- hosts: all
  gather_facts: no

Variable definitions can be auto-generated using the Ansible stats (system resources). Based on the conditions, values are returned for the respective variables.

Below is a sample YAML file with the syntax and the variables.


---
# MySQL connection settings.
mysql_port: "3306"
mysql_data_dir: "/var/lib/mysql"
mysql_pid_file: "{{ mysql_data_dir }}/"
mysql_socket: "{{ mysql_data_dir }}/mysql.sock"

# Slow query log settings.
mysql_slow_query_log_enabled: yes
mysql_slow_query_time: "2"
mysql_slow_query_log_file: "{{ mysql_data_dir }}/mysql-slow.log"

# Based on resources
mysql_max_connections: "{{ (ansible_memtotal_mb // 12) | int }}"

# Set .._buffer_pool_size up to 70% of RAM but beware of setting too high.
mysql_innodb_buffer_pool_size: "{{ (ansible_memtotal_mb * 0.7) | int }}M"

# Set .._log_file_size to 25% of buffer pool size.
mysql_innodb_log_file_size: '{{ ((mysql_innodb_buffer_pool_size | string | replace("M", "") | int) * 0.25) | int }}M'

When we have the variable definition ready we need to apply it for generating the configuration file with required fields.

mysql_conf.j2: (template)

# {{ ansible_managed }}

[client]
port = {{ mysql_port }}
socket = {{ mysql_socket }}

[mysqld]
port = {{ mysql_port }}
datadir = {{ mysql_data_dir }}
socket = {{ mysql_socket }}
pid-file = {{ mysql_pid_file }}

# Slow query log configuration.
{% if mysql_slow_query_log_enabled %}
slow_query_log = 1
slow_query_log_file = {{ mysql_slow_query_log_file }}
long_query_time = {{ mysql_slow_query_time }}
{% endif %}

# InnoDB settings.
innodb_buffer_pool_size = {{ mysql_innodb_buffer_pool_size }}
innodb_log_file_size = {{ mysql_innodb_log_file_size }}

# Setting max connections
{% if mysql_max_connections | int > 3000 %}
max_connections = 3000
thread_cache_size = {{ (3000 * 0.15) | int }}
{% elif mysql_max_connections | int < 150 %}
max_connections = 150
thread_cache_size = {{ (150 * 0.15) | int }}
{% else %}
max_connections = {{ mysql_max_connections }}
thread_cache_size = {{ (mysql_max_connections | int * 0.15) | int }}
{% endif %}

The template above contains the condition mapping along with the variable precedence. If a condition matches, it returns values appropriate to the available resources; otherwise it keeps the default value.
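The arithmetic encoded in the variables file and the template boils down to a few lines; here is an equivalent sketch in plain Python (the function name is ours, for illustration only):

```python
def mysql_sizing(memtotal_mb):
    """Mirror the template's arithmetic: buffer pool at 70% of RAM,
    redo log at 25% of the buffer pool, and max_connections derived
    from RAM but clamped to the range [150, 3000]."""
    buffer_pool_mb = int(memtotal_mb * 0.7)
    log_file_mb = int(buffer_pool_mb * 0.25)
    max_connections = min(3000, max(150, memtotal_mb // 12))
    thread_cache_size = int(max_connections * 0.15)
    return buffer_pool_mb, log_file_mb, max_connections, thread_cache_size
```

For example, a host with 8192 MB of RAM yields a 5734M buffer pool, a 1433M redo log, 682 max connections and a thread cache of 102, the same numbers that appear in the generated my.cnf later in the post.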


- hosts:
  vars_files:
    - mysql_conf.yml
  tasks:
    - name: Creating my.cnf with respected resources
      template:
        src: mysql_conf.j2
        dest: my.cnf

Command to generate my.cnf using the template:

ansible-playbook playbook.yml


Mydbops-MacBook-Air:jinja dhanasekar$ ansible-playbook playbook.yml

PLAY [] **********************************************

TASK [Gathering Facts] ****************************************
ok: []

TASK [Creating my.cnf with respected resources] ***************
changed: []

PLAY RECAP ****************************************************
                 : ok=2    changed=1    unreachable=0    failed=0

my.cnf: (OUTPUT)

# Ansible managed

[client]
port = 3306
socket = /var/lib/mysql/mysql.sock

[mysqld]
port = 3306
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
pid-file = /var/lib/mysql/

# Slow query log configuration.
slow_query_log = 1
slow_query_log_file = /var/lib/mysql/mysql-slow.log
long_query_time = 2

# InnoDB settings.
innodb_buffer_pool_size = 5734M
innodb_log_file_size = 1433M

# Setting max connections
max_connections = 682
thread_cache_size = 102

The above my.cnf was generated using the template. I hope this gives you a better idea about templating with Jinja2.

Key takeaways:

  1. Easy to debug. Line numbers of exceptions directly point to the correct line in the template even with the column number.
  2. Configurable syntax with respect to the YAML files.

Ultimate Guide: Derived Queries with Spring Data JPA

The post Ultimate Guide: Derived Queries with Spring Data JPA appeared first on Thoughts on Java.

All applications need to execute queries to retrieve data from their database. With JPA and Hibernate, you can write JPQL, native SQL queries or Criteria Queries and you can use all of them with Spring Data JPA. In most cases, Spring Data just makes it a little bit easier.

Spring Data JPA acts as a layer on top of JPA, and it offers you 2 ways to define your query:

  1. You can let Spring Data JPA derive the query from the name of a method in your repository.
  2. You can define your own JPQL or native query using a @Query annotation.

Both options work great, and you should be familiar with them. In this article, I will focus on derived queries, and I will dive deeper into defining a custom query in a future article.

Example model

But before we do that, let’s take a quick look at the domain model that we will use in all of the examples. It consists of an Author and a Book entity with a many-to-many association between them.

Derived queries

Spring Data often gets praised for its derived query feature. As long as your method name starts with find…By, read…By, query…By, count…By, or get…By and follows the right pattern, Spring Data generates the required JPQL query.
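To illustrate the idea behind the method name parser, here is a deliberately simplified sketch in Python (this is not Spring's actual parser; the real one is far more capable, handling keywords like Or, Containing and OrderBy as well):

```python
PREFIXES = ("findBy", "readBy", "queryBy", "countBy", "getBy")

def attributes_in_where_clause(method_name):
    """Split a simple derived-query method name into the entity
    attributes that would appear in the generated WHERE clause,
    assuming attributes are combined only with And."""
    for prefix in PREFIXES:
        if method_name.startswith(prefix):
            return method_name[len(prefix):].split("And")
    raise ValueError("not a derived query method name")

print(attributes_in_where_clause("findByFirstNameAndLastName"))
# → ['FirstName', 'LastName']
```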

That might sound like you will never need to write your own queries again. But that’s not the case. It’s a great way to define simple queries. But as soon as you need to use more than 2 query parameters or your query gets at least a little bit complex, you should use a custom query. That’s either because the query name gets really complicated to write and read or because you exceed the capabilities of the method name parser.

That said, let’s now take a look at Spring Data JPA’s derived query feature.

Simple derived query with parameters

Let’s start with a simple example of a query that loads Author entities with a given firstName.

public interface AuthorRepository extends JpaRepository<Author, Long> {

    List<Author> findByFirstName(String firstName);
}

As you can see, the definition of a derived query is pretty simple and self-explanatory. I started the name of the method with findBy and then referenced the entity attributes for which I want to filter in the WHERE clause by its name. And then I define a method parameter with the same name as the referenced entity attribute.

You can then use this query by injecting an instance of the AuthorRepository and calling the findByFirstName method with the firstName you want to search for.

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
public class TestQueryMethod {

    @Autowired
    private AuthorRepository authorRepository;

    @Test
    @Transactional
    public void testQueryMethodAuthor() {
        List<Author> a = authorRepository.findByFirstName("Thorben");
    }
}

When you run this test case and activate the logging of SQL statements, you can see the generated SQL statement in your log file.

2019-04-16 10:38:22.523 DEBUG 24408 --- [ main] org.hibernate.SQL : select author0_.id as id1_0_, author0_.first_name as first_na2_0_, author0_.last_name as last_nam3_0_, author0_.version as version4_0_ from author author0_ where author0_.first_name=?

Derived queries with multiple parameters

You can extend this method to search for Author entities with a given firstName and lastName by combining them with And. Spring Data JPA, of course, also allows you to concatenate multiple checks using an Or clause.

public interface AuthorRepository extends JpaRepository<Author, Long> {

    List<Author> findByFirstNameAndLastName(String firstName, String lastName);
}

As expected, when you call this repository method, Spring Data JPA and Hibernate generate an SQL query with a WHERE clause that filters the result based on the first_name and last_name columns.

2019-04-16 10:38:22.661 DEBUG 24408 --- [ main] org.hibernate.SQL : select author0_.id as id1_0_, author0_.first_name as first_na2_0_, author0_.last_name as last_nam3_0_, author0_.version as version4_0_ from author author0_ where author0_.first_name=? and author0_.last_name=?

Traverse associations in derived queries

If you want to filter for an attribute of an associated entity, you can traverse managed relationships by referencing the attribute that maps the association followed by the attribute of the related entity.

The following code snippet shows an example in which I reference the books attribute on the Author entity to traverse the mapped association and then reference the title attribute of the associated Book entity. That creates a query that returns all authors who have written a book with a given title.

public interface AuthorRepository extends JpaRepository<Author, Long> {

    List<Author> findByBooksTitle(String title);
}

When you call this query method, Hibernate generates an SQL query that joins the author and the book table and compares the value in the title column with the provided bind parameter value in the WHERE clause.

2019-04-16 10:37:31.200 DEBUG 20024 --- [ main] org.hibernate.SQL : select author0_.id as id1_0_, author0_.first_name as first_na2_0_, author0_.last_name as last_nam3_0_, author0_.version as version4_0_ from author author0_ left outer join book_author books1_ on left outer join book book2_ on where book2_.title=?

Other comparison operators

If you just reference an entity attribute in your method name, Spring Data JPA will generate a simple equals comparison. You can also specify different comparison operations by using one of the following keywords together with the name of your entity attribute:

  • Like – to check if the value of an entity attribute is like a provided String.
  • Containing – to check if the value of an entity attribute contains the provided String.
  • IgnoreCase – to ignore the case when comparing the value of an entity attribute with a provided String.
  • Between – to check if the value of an entity attribute is between 2 provided values.
  • LessThan / GreaterThan – to check if the value of an entity attribute is less than or greater than a provided one.

Here is a simple example that selects an Author entity whose firstName contains the String “thor” while ignoring its case.

public interface AuthorRepository extends JpaRepository<Author, Long> {

    List<Author> findByFirstNameContainingIgnoreCase(String firstName);
}

When you call this method on the AuthorRepository, Spring Data JPA and Hibernate generate an SQL query that converts the provided String and the value in the first_name column to upper case and creates a LIKE expression to check if the first_name contains the provided String.

2019-04-16 10:38:22.693 DEBUG 24408 --- [ main] org.hibernate.SQL : select author0_.id as id1_0_, author0_.first_name as first_na2_0_, author0_.last_name as last_nam3_0_, author0_.version as version4_0_ from author author0_ where upper(author0_.first_name) like upper(?)
2019-04-16 10:38:22.695 TRACE 24408 --- [ main] o.h.type.descriptor.sql.BasicBinder : binding parameter [1] as [VARCHAR] - [%tho%]

Order the results of a derived query

You can, of course, also order your query results. In JPQL, this would require an ORDER BY clause in your query. With Spring Data JPA, you just need to add the keyword OrderBy to your query method followed by the name of the entity attribute and the abbreviation ASC or DESC for your preferred order.

The following example uses this feature to retrieve all Book entities whose title contains a provided String in the ascending order of their title.

public interface BookRepository extends JpaRepository<Book, Long> {

    List<Book> findByTitleContainsOrderByTitleAsc(String title);
}

When you call this method on the BookRepository, Spring Data JPA and Hibernate generate an SQL statement with the expected ORDER BY clause.

2019-04-16 15:34:44.517 DEBUG 17108 --- [ main] org.hibernate.SQL : select book0_.id as id1_1_, book0_.title as title2_1_, book0_.version as version3_1_ from book book0_ where book0_.title like ? order by book0_.title asc

If you require dynamic ordering, you can add a parameter of type Sort to your query method. This is one of the special parameters supported by Spring Data JPA, and it triggers the generation of an ORDER BY clause.

public interface BookRepository extends JpaRepository<Book, Long> {

    List<Book> findByTitleContains(String title, Sort sort);
}

You then need to instantiate a Sort object and specify the ordering and the entity attributes that shall be used to generate the ORDER BY clause.

Sort sort = new Sort(Sort.Direction.ASC, "title");
List<Book> b = bookRepository.findByTitleContains("Hibernate", sort);

When you execute the test case, the findByTitleContains generates the same SQL statement as the previous method. But this time, you define the order dynamically, and you can adjust it at runtime.

2019-04-16 15:34:44.517 DEBUG 17108 --- [ main] org.hibernate.SQL : select book0_.id as id1_1_, book0_.title as title2_1_, book0_.version as version3_1_ from book book0_ where book0_.title like ? order by book0_.title asc

Limiting the number of results

Using Hibernate or any other JPA implementation, you can limit the number of returned records on the Query interface. With Spring Data JPA, you can do the same by adding the keywords Top or First followed by a number between the find and By keywords.

When you call the findFirst5ByTitleOrderByTitleAsc method on the BookRepository, Spring Data JPA and Hibernate generate a query that returns the first 5 Book entities whose title contains the given String.

public interface BookRepository extends JpaRepository<Book, Long> {

    List<Book> findFirst5ByTitleOrderByTitleAsc(String title);
}

As you might have expected, the generated SQL statement contains a LIMIT clause to return the first 5 records.

2019-04-16 16:10:42.977 DEBUG 24352 --- [ main] org.hibernate.SQL : select book0_.id as id1_1_, book0_.title as title2_1_, book0_.version as version3_1_ from book book0_ where book0_.title like ? order by book0_.title asc limit ?

Paginate the results of a derived query

And after we had a look at ordering and limiting the number of returned records, we also need to talk about pagination. Spring Data JPA provides another special parameter for it. You just need to add a parameter of type Pageable to your query method definition and change the return type to Page<YourEntity>.

public interface BookRepository extends JpaRepository<Book, Long> {

    Page<Book> findAll(Pageable pageable);
}

The Pageable interface makes it very easy to step through the pages. You just define which page number you want to retrieve and how many records should be on a page. That’s it. Spring Data JPA takes care of the rest.

Pageable pageable = PageRequest.of(0, 10);
Page<Book> b = bookRepository.findAll(pageable);

As expected, the generated SQL query contains a LIMIT clause and it would also contain an OFFSET clause, if you don’t request the first page.

2019-04-16 16:43:49.221 DEBUG 17156 --- [ main] org.hibernate.SQL : select book0_.id as id1_1_, book0_.title as title2_1_, book0_.version as version3_1_ from book book0_ limit ?
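The page arithmetic behind these statements is simple; a generic sketch follows (this is not Spring's internal code, just an illustration):

```python
def page_to_offset_limit(page_number, page_size):
    """Translate a zero-based page request into the OFFSET and LIMIT
    values of the resulting SQL statement."""
    if page_number < 0 or page_size < 1:
        raise ValueError("invalid page request")
    return page_number * page_size, page_size

# PageRequest.of(0, 10) needs no offset, so only a LIMIT shows up in the SQL;
# a request for page 2 of size 10 would add OFFSET 20.
print(page_to_offset_limit(2, 10))
# → (20, 10)
```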


Spring Data JPA just provides a relatively small usability layer on top of JPA, but it offers several features that make working with JPA much easier. The derived query feature, which I showed you in this tutorial, is an excellent example of that.

Sure, you could write all these queries yourself, but you don’t have to. As long as your method name doesn’t get too long or complicated, I recommend letting Spring Data JPA generate the required JPQL statement and take care of the query execution. As a rule of thumb, as long as your query doesn’t need more than 2 parameters, a derived query is the easiest approach.

If your query requires more than 2 parameters or you can’t express it in a short and simple method name, you should define the query yourself. I will show you how to do that in one of my next articles.

The post Ultimate Guide: Derived Queries with Spring Data JPA appeared first on Thoughts on Java.

Avoiding Double Payments in a Distributed Payments System

How we built a generic idempotency framework to achieve eventual consistency and correctness across our payments micro-service architecture.

Authors: Jon Chew and Ninad Khisti

One of the conference rooms in our San Francisco office

Background

Airbnb has been migrating its infrastructure to a Service Oriented Architecture (“SOA”). SOA offers many upsides, such as enabling developer specialization and the ability to iterate faster. However, it also poses challenges for billing and payments applications because it makes it more difficult to maintain data integrity. An API call to a service that makes further API calls to downstream services, where each service changes state and potentially has side effects, is equivalent to executing a complex distributed transaction.

To ensure consistency among all services, protocols such as two-phase commit might be used. Without such a protocol, distributed transactions present challenges to maintaining data integrity, allowing graceful degradation, and achieving consistency. Requests also inevitably fail within distributed systems — connections will drop and timeout at some point, especially for transactions that consist of multiple network requests.

There are three different common techniques used in distributed systems to achieve eventual consistency: read repair, write repair, and asynchronous repair. There are benefits and trade-offs to each approach. Our payments system uses all three in various functions.

Asynchronous repair involves the server being responsible for running data consistency checks, such as table scans, lambda functions, and cron jobs. Additionally, asynchronous notifications from the server to the client are widely used in the payments industry to force consistency on the client side. Asynchronous repair, along with notifications, can be used in conjunction with read and write repair techniques, offering a second line of defense with trade-offs in solution complexity.

Our solution in this particular post utilizes write repair, where every write call from the client to the server attempts to repair an inconsistent, broken state. Write repair requires clients to be smarter (we’ll expand on this later), and allows them to repeatedly fire the same request and never have to maintain state (aside from retries). Clients can thus request eventual consistency on-demand, giving them control over the user experience. Idempotency is an extremely important property when implementing write repair.

What is idempotency?

For an API request to be idempotent, clients can make the same call repeatedly and the result will be the same. In other words, making multiple identical requests should have the same effect as making a single request.

This technique is commonly used in billing and payment systems involving money movement — it is crucial that a payments request gets processed completely exactly once (also known as “exactly-once delivery”). Importantly, if a single operation to move money is called multiple times, the underlying system should move money at most once. This is critical for Airbnb Payments APIs in order to avoid multiple payouts to the host, and even worse, multiple charges to the guest.
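As a minimal illustration of this property (not Airbnb's implementation; all names here are invented), a server can record the response for each idempotency key and replay it on retries instead of moving money again:

```python
import uuid

class PaymentService:
    """Toy in-memory illustration of idempotent charging: repeated
    calls with the same idempotency key charge the guest only once."""

    def __init__(self):
        self._responses = {}   # idempotency key -> recorded response
        self.charges = []      # side effect: money actually moved

    def charge(self, idempotency_key, guest, amount):
        # Replay the recorded outcome instead of repeating the side effect.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges.append((guest, amount))  # money moves exactly here
        response = {"status": "charged", "guest": guest, "amount": amount}
        self._responses[idempotency_key] = response
        return response

service = PaymentService()
key = str(uuid.uuid4())
first = service.charge(key, "guest-1", 100)
retry = service.charge(key, "guest-1", 100)  # duplicate request
# Both calls return the same response, but only one charge was made.
```

A production system would of course persist the key-to-response mapping durably and handle concurrent requests for the same key.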

By design, idempotency safely allows multiple identical calls from clients using an auto-retry mechanism for an API to achieve eventual consistency. This technique is common among client-server relationships with idempotency, and something that we use in our distributed systems today.

At a high level, the below diagram illustrates some simple, example scenarios with duplicate requests and ideal idempotent behavior. No matter how many charge requests are made, the guest will always be charged at most once.

An idempotent request is a request that is made with identical parameters and the outcome will always be the same, consistently (the guest is charged at most once).

The Problem Statement

Guaranteeing eventual consistency for our payments system is of the utmost importance. Idempotency is a desirable mechanism to achieve this in a distributed system. In an SOA world, we will inevitably run into problems. For example, how would a client recover if it failed to consume the response? What if the response was lost or the client timed out? What about race conditions resulting in a user clicking “Book” twice? Our requirements included the following:

  • Instead of implementing a single, custom solution specific for a given use case, we needed a generic yet configurable idempotency solution to be used across Airbnb’s various Payments SOA services.
  • While SOA-based payment products were being iterated on, we couldn’t compromise on data consistency since this would impact our community directly.
  • We needed ultra low latency, so building a separate, stand-alone idempotency service wouldn’t be sufficient. Most importantly, the service would suffer from the same problems it was originally intended to solve.
  • As Airbnb is scaling its engineering organization using SOA, it would be highly inefficient to have every developer specialize on data integrity and eventual consistency challenges. We wanted to shield product developers from these nuisances to allow them to focus on product development and iterate faster.

Additionally, considerable trade-offs with code readability, testability and ability to troubleshoot were all considered non-starters.

Solution Explained

We wanted to be able to identify each incoming request uniquely. Additionally, we needed to accurately track and manage where a specific request was in its lifecycle.

We implemented and utilized “Orpheus”, a general-purpose idempotency library, across multiple payments services. Orpheus is the legendary Greek mythological hero who was able to orchestrate and charm all living things.

We chose a library as a solution because it offers low latency while still providing clean separation between high-velocity product code and low-velocity system management code. At a high level, it consists of the following simple concepts:

  • An idempotency key is passed into the framework, representing a single idempotent request
  • Tables of idempotency information, always read and written from a sharded master database (for consistency)
  • Database transactions are combined in different parts of the codebase to ensure atomicity, using Java lambdas
  • Error responses are classified as “retryable” or “non-retryable”

We’ll detail how a complex, distributed system with idempotency guarantees can become self-healing and eventually consistent. We’ll also walk through some of the trade-offs and additional complexities from our solution that one should be mindful of.

Keep database commits to a minimum

One of the key requirements in an idempotent system is to produce only two outcomes, success or failure, with consistency. Otherwise, deviations in data can lead to hours of investigation and incorrect payments. Because databases offer ACID properties, database transactions can be effectively used to atomically write data while ensuring consistency. A database commit can be guaranteed to succeed or fail as a unit.

Orpheus is centered around the assumption that almost every standard API request can be separated into three distinct phases: Pre-RPC, RPC, and Post-RPC.

An “RPC”, or Remote Procedure Call(s), is when a client makes a request to a remote server and waits for that server to finish the requested procedure(s) before resuming its process. In the context of payments APIs, we refer to an RPC as a request to a downstream service over a network, which can include external payments processors and acquiring banks. In brief, here is what happens in each phase:

  1. Pre-RPC: Details of the payment request are recorded in the database.
  2. RPC: The request is made live to the external processor and the response is received. This is a place to do one or more idempotent computations or RPCs (for example, query service for the status of a transaction first if it’s a retry-attempt).
  3. Post-RPC: Details of the response from the external service are recorded in the database, including its successfulness and whether a bad request is retryable or not.

To maintain data integrity, we adhere to two simple ground rules:

  1. No service interaction over networks in Pre and Post-RPC phases
  2. No database interactions in the RPC phases

We essentially want to avoid mixing network communication with database work. We’ve learned the hard way that network calls (RPCs) during the Pre and Post-RPC phases are vulnerable and can result in bad things like rapid connection pool exhaustion and performance degradation. Simply put, network calls are inherently unreliable. Because of this, we wrapped Pre and Post-RPC phases in enclosing database transactions initiated by the library itself.

We also want to call out that a single API request may consist of multiple RPCs. Orpheus does support multi-RPC requests, but in this post we wanted to illustrate our thought process with only the simple single-RPC case.

As shown in the example diagram below, every database commit in each of the Pre-RPC and Post-RPC phases is combined into a single database transaction. This ensures atomicity — entire units of work (here the Pre-RPC and Post-RPC phases) can fail or succeed as a unit consistently. The motive is that the system should fail in a manner it could recover from. For example, if several API requests failed in the middle of a long sequence of database commits, it would be extremely difficult to systematically keep track of where each failure occurred. Note that all network communication (the RPC) is explicitly separated from all database transactions.

Network communication is strictly kept separate from database transactions

A database commit here includes an idempotency library commit and application layer database commits, all combined in the same code block. Without being careful, this could actually begin to look really messy in real code (spaghetti, anyone?). We also felt that it shouldn’t be the responsibility of the product developer to call certain idempotency routines.

Java Lambdas to the Rescue

Thankfully, Java lambda expressions can be used to combine multiple statements into a single database transaction seamlessly, with no impact on testability or code readability.

Here is an example, simplified usage of Orpheus, with Java lambdas in action:
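A hypothetical sketch of what such a lambda-based API could look like follows; the method and type names are illustrative and not Orpheus's actual interface. Each phase is passed as a lambda, and the library wraps the two database phases in transactions while the RPC runs outside any transaction.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative sketch only: an idempotency library accepting the Pre-RPC, RPC,
// and Post-RPC phases as lambdas. Orpheus's real API may differ.
public class IdempotencyClient {
    public static <P, R> R executeIdempotentRequest(
            String idempotencyKey,         // key under which idempotency rows would be recorded
            Supplier<P> preRpc,            // runs in a DB transaction
            Function<P, R> rpc,            // network call, outside any transaction
            Function<R, R> postRpc) {      // runs in a DB transaction
        P prepared = inTransaction(preRpc);                  // Pre-RPC phase
        R response = rpc.apply(prepared);                    // RPC phase
        return inTransaction(() -> postRpc.apply(response)); // Post-RPC phase
    }

    // Stand-in: a real implementation would begin and commit a database transaction.
    private static <T> T inTransaction(Supplier<T> work) {
        return work.get();
    }

    public static void main(String[] args) {
        String result = executeIdempotentRequest(
                "key-123",
                () -> "payment-row",         // record intent
                row -> row + ":charged",     // call the processor
                resp -> resp + ":recorded"); // record outcome
        System.out.println(result); // payment-row:charged:recorded
    }
}
```

The product code supplies only the three lambdas; the transaction boundaries stay inside the library, which keeps system-management code out of the product path.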

At a deeper level, here is a simplified excerpt from the source code:

The separation of these concerns does offer some trade-offs. Developers must use forethought to ensure code readability and maintainability as new developers constantly contribute. They also need to consistently verify that the proper dependencies and data get passed along. API calls now have to be refactored into three smaller chunks, which could arguably be restrictive on the way developers write code. It might actually be really difficult for some complex API calls to be effectively broken down into a three-step approach. One of our services has implemented a Finite State Machine with every transition as an idempotent step using StatefulJ, where you can safely multiplex idempotent calls within an API call.

Handling Exceptions — To Retry or Not to Retry?

With a framework like Orpheus, the server should know when a request is safe to retry and when it isn’t. For this to happen, exceptions should be handled with meticulous intention — they should be categorized as either “retryable” or “non-retryable”. This undoubtedly adds a layer of complexity for developers and could create bad side-effects if they are not judicious and prudent.

For example, suppose a downstream service was offline temporarily, but the exception raised was mistakenly labeled as “non-retryable” when it really should have been “retryable”. The request would be “failed” indefinitely, and subsequent retry requests would perpetually return the incorrect non-retryable error. Conversely, double payments could occur if an exception was labeled “retryable” when it actually should have been “non-retryable” and requires manual intervention.

In general, we believe unexpected runtime exceptions due to network and infrastructure issues (5XX HTTP statuses) are retryable. We expect these errors to be transient, and we expect that a later retry of the same request may eventually be successful.

We categorize validation errors, such as invalid input and states (for example, you can’t refund a refund), as non-retryable (4XX HTTP statuses) — we expect all subsequent retries of the same request to fail in the same manner. We created a custom, generic exception class that handled these cases, defaulting to “non-retryable”, and for certain other cases, categorized as “retryable”.
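The heuristics above can be sketched as a minimal classification rule (illustrative names; the real exception hierarchy is richer): 5XX statuses map to retryable errors and 4XX statuses to non-retryable ones.

```java
// Sketch of classifying errors as retryable or not, following the heuristics
// described above. Names are illustrative.
public class ErrorClassifier {
    public enum Classification { RETRYABLE, NON_RETRYABLE }

    public static Classification classify(int httpStatus) {
        if (httpStatus >= 500) {
            return Classification.RETRYABLE;     // transient infrastructure error
        }
        if (httpStatus >= 400) {
            return Classification.NON_RETRYABLE; // validation error: retries will fail identically
        }
        throw new IllegalArgumentException("not an error status: " + httpStatus);
    }

    public static void main(String[] args) {
        System.out.println(classify(503)); // RETRYABLE
        System.out.println(classify(422)); // NON_RETRYABLE
    }
}
```

Defaulting the custom exception class to non-retryable, as described above, means an unclassified error fails safe rather than risking a double payment.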

It is essential that request payloads for each request remain the same and are never mutated, otherwise it would break the definition of an idempotent request.

Categorizing “retryable” and “non-retryable” exceptions

There are of course more vague edge cases that need to be handled with care, such as handling a NullPointerException appropriately in different contexts. For example, a null value returned from the database due to a connectivity blip is different from an erroneous null field in a request from a client or from a third party response.

Clients Play a Vital Role

As alluded to at the beginning of this post, the client must be smarter in a write repair system. It must own several key responsibilities when interacting with a service that uses an idempotency library like Orpheus:

  • Pass in a unique idempotency key for every new request; reuse the same idempotency key for retries.
  • Persist these idempotency keys to the database before calling the service (to later use for retries).
  • Properly consume successful responses and subsequently unassign (or nullify) idempotency keys.
  • Ensure mutation of the request payload between retry attempts is not allowed.
  • Carefully devise and configure auto-retry strategies based on business needs (using exponential backoff or randomized wait times (“jitter”) to avoid the thundering herd problem).
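The last point can be sketched as exponential backoff with "full jitter": each retry waits a random time up to an exponentially growing cap, which spreads retries out and avoids the thundering herd. The constants here are illustrative.

```java
import java.util.Random;

// Sketch of exponential backoff with full jitter. Constants are illustrative;
// a real client would tune them to its business needs.
public class RetryBackoff {
    private static final long BASE_DELAY_MS = 100;
    private static final long MAX_DELAY_MS = 10_000;

    // Returns the wait before retry `attempt` (0-based), in milliseconds.
    public static long delayMillis(int attempt, Random random) {
        long cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << Math.min(attempt, 20));
        return (long) (random.nextDouble() * cap); // jitter spreads out simultaneous retries
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + delayMillis(attempt, random) + " ms");
        }
    }
}
```

Because the same idempotency key is reused on every attempt, an aggressive retry schedule stays safe on the server side; the jitter only protects the server from load spikes.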
How to Choose an Idempotency Key?

Choosing an idempotency key is crucial — the client can choose either to have request-level idempotency or entity-level idempotency based on what key to use. This decision to use one over the other would depend on different business use-cases, but request-level idempotency is the most straightforward and common.

For request-level idempotency, a random and unique key should be chosen by the client in order to ensure idempotency at the level of the entire entity collection. For example, if we wanted to allow multiple, different payments for a reservation booking (such as Pay Less Upfront), we just need to make sure the idempotency keys are different. UUID is a good example of a format to use for this.

Entity-level idempotency is far more stringent and restrictive than request-level idempotency. Say we want to ensure that a given $10 payment with ID 1234 would only be refunded $5 once, since we can technically make $5 refund requests twice. We would then want to use a deterministic idempotency key based on the entity model to ensure entity-level idempotency. An example format would be “payment-1234-refund”. Every refund request for a unique payment would consequently be idempotent at the entity-level (Payment 1234).
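The two key styles can be sketched as follows; the payment-1234-refund format mirrors the example above, while the helper names are hypothetical.

```java
import java.util.UUID;

// Sketch of the two key styles: a random UUID for request-level idempotency,
// and a deterministic entity-derived key for entity-level idempotency.
public class IdempotencyKeys {
    // Request-level: a fresh random key per logical request.
    public static String requestLevelKey() {
        return UUID.randomUUID().toString();
    }

    // Entity-level: deterministic, so every refund of the same payment collides
    // on the same key and is idempotent at the entity level.
    public static String entityLevelRefundKey(long paymentId) {
        return "payment-" + paymentId + "-refund";
    }

    public static void main(String[] args) {
        System.out.println(requestLevelKey());
        System.out.println(entityLevelRefundKey(1234)); // payment-1234-refund
    }
}
```

With the entity-level key, a second $5 refund request for payment 1234 presents the same key as the first and is therefore deduplicated by the framework.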

Each API Request Has an Expiring Lease

Multiple identical requests can be fired due to multiple user-clicks or if the client has an aggressive retry policy. This could potentially create race conditions on the server or double payments for our community. To avoid these, API calls, with the help of the framework, each need to acquire a database row-level lock on an idempotency key. This grants a lease, or a permission, for a given request to proceed further.

A lease comes with an expiration to cover the scenario where there are timeouts on the server side. If there is no response, then an API request can be retryable only after the current lease has expired. The application can configure the lease expiration and RPC timeouts according to their needs. A good rule of thumb is to have a higher lease expiration than the RPC timeout.

Orpheus additionally offers a maximum retryable window for an idempotency key to provide a safety net in order to avoid rogue retries from unexpected system behavior.
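A minimal in-memory sketch of the lease rule described above follows; the real system uses a database row-level lock, and the names here are hypothetical. A request may proceed only if no identical request currently holds an unexpired lease on the same idempotency key.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of an expiring lease per idempotency key. Illustrative only:
// the production mechanism is a database row-level lock, not a map.
public class LeaseRegistry {
    private final Map<String, Instant> leaseExpiry = new HashMap<>();
    private final Duration leaseDuration;

    public LeaseRegistry(Duration leaseDuration) {
        this.leaseDuration = leaseDuration;
    }

    // Grants the lease if the key is unleased or its previous lease has expired.
    public synchronized boolean tryAcquire(String idempotencyKey, Instant now) {
        Instant expiry = leaseExpiry.get(idempotencyKey);
        if (expiry != null && now.isBefore(expiry)) {
            return false; // an identical request still holds the lease
        }
        leaseExpiry.put(idempotencyKey, now.plus(leaseDuration));
        return true;
    }

    public static void main(String[] args) {
        LeaseRegistry leases = new LeaseRegistry(Duration.ofSeconds(30));
        Instant t0 = Instant.now();
        System.out.println(leases.tryAcquire("key-1", t0));                 // granted
        System.out.println(leases.tryAcquire("key-1", t0.plusSeconds(5)));  // denied: lease held
        System.out.println(leases.tryAcquire("key-1", t0.plusSeconds(31))); // granted: lease expired
    }
}
```

Setting the lease duration above the RPC timeout, as the rule of thumb suggests, ensures a retry cannot start while the original attempt's RPC may still be in flight.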

Recording the Response

We also record responses, to maintain and monitor idempotent behavior. When a client makes the same request for a transaction that has reached a deterministic end-state, such as a non-retryable error (validation errors, for example) or a successful response, the response is recorded in the database.

Persisting responses does have a performance trade-off — clients are able to receive quick responses on subsequent retries, but this table will grow in proportion to the application’s throughput. The table can quickly become bloated if we’re not careful. One potential solution is to periodically remove rows older than a certain timeframe, but removing an idempotent response too early has negative implications, too. Developers should also be wary not to make backwards-incompatible changes to the response entities and structure.

Avoid Replica Databases — Stick to Master

When reading and writing idempotency information with Orpheus, we chose to do this directly from the master database. In a system of distributed databases, there is a trade-off between consistency and latency. Since we couldn’t tolerate high latency or reading stale data, using master for these tables made the most sense for us. In doing so, there is no need for a cache or a database replica. If a database system is not configured for strong read-consistency (our systems are backed by MySQL), using replicas for these operations could actually create adverse effects from an idempotency perspective.

For example, suppose a payments service stored its idempotency information on a replica database. A client submits a payment request to the service, which ends up being successful downstream, but the client doesn’t receive a response due to a network issue. The response, currently stored on the service’s master database, will eventually be written to the replica. However, in the event of replica lag, the client could correctly fire an idempotent retry to the service and the response would not be recorded to the replica yet. Because the response “does not exist” (on the replica), the service could mistakenly execute the payment again, resulting in duplicate payments. The below example illustrates how just a few seconds of replica lag could cause significant financial impact to Airbnb’s community.

A duplicate payment created as a result of replica lag: Response 1 is not found on the replica in the retry attempt (Request 2).

Duplicate payment avoided by storing idempotency information only on master: Response 1 is found immediately on master and returned in the retry attempt (Request 2).

When using a single master database for idempotency, it was pretty clear that scaling would quickly become an issue. We alleviated this by sharding the database by idempotency key. The idempotency keys we use have high cardinality and even distribution, making them effective shard keys.
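Routing by idempotency key can be sketched as a simple deterministic function (hypothetical names; a production system would use a stronger hash and a shard map).

```java
// Sketch of routing idempotency reads and writes to a shard chosen from the key.
// A high-cardinality, evenly distributed key gives an even spread across shards.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    public int shardFor(String idempotencyKey) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(idempotencyKey.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(16);
        // Retries with the same key always land on the same shard:
        System.out.println(router.shardFor("payment-1234-refund"));
        System.out.println(router.shardFor("payment-1234-refund"));
    }
}
```

Because retries reuse the same idempotency key, they deterministically land on the same shard master, preserving the read-your-writes behavior the design depends on.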

Final thoughts

Achieving eventual consistency does not come without introducing some complexity. Clients need to store and handle idempotency keys and implement automated retry-mechanisms. Developers require additional context and must be surgically precise when implementing and troubleshooting Java lambdas. They must be deliberate when handling exceptions. Additionally, as the current version of Orpheus is battle-tested, we are continuously finding things to improve on: request-payload matching for retries, improved support for schema changes and nested migrations, actively restricting database access during RPC phases, and so on.

While these are considerations at top of mind, where has Orpheus gotten Airbnb Payments so far? Since the launch of the framework, we have achieved five nines in consistency for our payments, while our annual payment volume has simultaneously doubled (read this if you’d like to learn more on how we measure data integrity at scale).

If you’re interested in working on the intricacies of a distributed payments platform and helping travelers around the world belong anywhere, the Airbnb Payments team is hiring!

Shoutout to Michel Weksler and Derek Wang for their thought leadership and architectural philosophy on this project!

Avoiding Double Payments in a Distributed Payments System was originally published in Airbnb Engineering & Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

Percona Universities in South America Next Week

There’s just one week to go before the first of this year’s Percona University events in South America. We’re really pleased with the lineup for all three events. We’re also incredibly happy with the response that we have had from the community. While we realize that a free event is… well… free… you are still giving up your time and travel. We place great value on that and we’re making sure that you’ll be hearing quality technical talks from Percona and our guest speakers. Most of the talks will be presented in Spanish – Portuguese in Brazil – although slides will be made available in English.

In fact, the events have been so popular that it’s quite possible that by the time you read this we’ll be operating a wait list for Montevideo (Tuesday April 23), Buenos Aires (Thursday, April 25), and São Paulo (Saturday, April 27).

A request…

So that others don’t miss out on this (rare!) opportunity, if you have reserved a place and can no longer attend, please can you cancel your booking? You can do that after logging into your eventbrite (or MeetUp) account.

After all… we want to make sure that these events are so successful that our CEO, Peter Zaitsev, will find the idea of returning to South America to present at a future series of events completely irresistible.

To wrap up this blog post, let me mention that for the Montevideo event, we’ll have some guest speakers from other companies, to make the content even better; and remember there is a raffle at the end, in which you can get a signed copy of the High Performance MySQL book signed by Peter himself (and other goodies to be discussed ;)).

Looking forward to seeing you next Tuesday!

MySQL High Availability Framework Explained – Part III: Failure Scenarios

In this three-part blog series, we introduced a High Availability (HA) Framework for MySQL hosting in Part I, and discussed the details of MySQL semisynchronous replication in Part II. Now in Part III, we review how the framework handles some of the important MySQL failure scenarios and recovers to ensure high availability.

MySQL Failure Scenarios

Scenario 1 – Master MySQL Goes Down
  • The Corosync and Pacemaker framework detects that the master MySQL is no longer available. Pacemaker demotes the master resource and tries to recover with a restart of the MySQL service, if possible.
  • At this point, due to the semisynchronous nature of the replication, all transactions committed on the master have been received by at least one of the slaves.
  • Pacemaker waits until all the received transactions are applied on the slaves and lets the slaves report their promotion scores. The score calculation is done in such a way that the score is ‘0’ if a slave is completely in sync with the master, and is a negative number otherwise.
  • Pacemaker picks the slave that has reported the 0 score and promotes that slave which now assumes the role of master MySQL on which writes are allowed.
  • After slave promotion, the Resource Agent triggers a DNS rerouting module. The module updates the proxy DNS entry with the IP address of the new master, thus, facilitating all application writes to be redirected to the new master.
  • Pacemaker also sets up the available slaves to start replicating from this new master.

Thus, whenever a master MySQL goes down (whether due to a MySQL crash, OS crash, system reboot, etc.), our HA framework detects it and promotes a suitable slave to take over the role of the master. This ensures that the system continues to be available to the applications.
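The promotion rule described in the steps above can be sketched as follows (illustrative names; this is not Pacemaker's actual API): a slave reports a score of 0 when fully in sync and a negative score otherwise, and only a fully caught-up slave may become the new master.

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the promotion rule: pick the slave reporting the highest score,
// and promote it only if that score is 0 (fully in sync with the old master).
public class PromotionPicker {
    public static Optional<String> pickNewMaster(Map<String, Integer> promotionScores) {
        return promotionScores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .filter(entry -> entry.getValue() == 0) // only an in-sync slave qualifies
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(pickNewMaster(Map.of("S1", 0, "S2", -3))); // Optional[S1]
    }
}
```

With semisynchronous replication, at least one slave is guaranteed to have received every committed transaction, so a candidate with score 0 exists after the slaves finish applying.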

Scenario 2 – Slave MySQL Goes Down
  • The Corosync and Pacemaker framework detects that the slave MySQL is no longer available.
  • Pacemaker tries to recover the resource by trying to restart MySQL on the node. If it comes up, it is added back to the current master as a slave and replication continues.
  • If recovery fails, Pacemaker reports that resource as down – based on which alerts or notifications can be generated. If necessary, the ScaleGrid support team will handle the recovery of this node.
  • In this case, there is no impact on the availability of MySQL services.
Scenario 3 – Network Partition – Network Connectivity Breaks Down Between Master and Slave Nodes

This is a classical problem in any distributed system where each node thinks the other nodes are down, while in reality, only the network communication between the nodes is broken. This scenario is more commonly known as split-brain scenario, and if not handled properly, can lead to more than one node claiming to be a master MySQL which in turn leads to data inconsistencies and corruption.

Let’s use an example to review how our framework deals with split-brain scenarios in the cluster. We assume that due to network issues, the cluster has partitioned into two groups – master in one group and 2 slaves in the other group, and we will denote this as [(M), (S1,S2)].

  • Corosync detects that the master node is not able to communicate with the slave nodes, and the slave nodes can communicate with each other, but not with the master.
  • The master node will not be able to commit any transactions as the semisynchronous replication expects acknowledgement from at least one of the slaves before the master can commit. At the same time, Pacemaker shuts down MySQL on the master node due to lack of quorum based on the Pacemaker setting ‘no-quorum-policy = stop’. Quorum here means a majority of the nodes, or two out of three in a 3-node cluster setup. Since there is only one master node running in this partition of the cluster, the no-quorum-policy setting is triggered leading to the shutdown of the MySQL master.
  • Now, Pacemaker on the partition [(S1), (S2)] detects that there is no master available in the cluster and initiates a promotion process. Assuming that S1 is up to date with the master (as guaranteed by semisynchronous replication), it is then promoted as the new master.
  • Application traffic will be redirected to this new master MySQL node and the slave S2 will start replicating from the new master.
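The quorum rule at work in this scenario can be sketched as a simple majority check, assuming the 3-node setup described above (illustrative code, not Pacemaker's implementation).

```java
// Sketch of the majority-quorum rule behind 'no-quorum-policy = stop': a
// partition may keep resources running only if it holds a strict majority
// of the cluster's nodes.
public class Quorum {
    public static boolean hasQuorum(int nodesInPartition, int clusterSize) {
        return nodesInPartition > clusterSize / 2;
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(1, 3)); // false: the lone master shuts down MySQL
        System.out.println(hasQuorum(2, 3)); // true: the slave partition may promote
    }
}
```

In the [(M), (S1,S2)] partition, the master's side holds 1 of 3 nodes and loses quorum, while the slave side holds 2 of 3 and may safely elect a new master, which prevents two masters from running at once.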

Thus, we see that the MySQL HA framework handles split-brain scenarios effectively, ensuring both data consistency and availability in the event the network connectivity breaks between master and slave nodes.

This concludes our 3-part blog series on the MySQL High Availability (HA) framework using semisynchronous replication and the Corosync plus Pacemaker stack. At ScaleGrid, we offer highly available hosting for MySQL on AWS and MySQL on Azure that is implemented based on the concepts explained in this blog series. Please visit the ScaleGrid Console for a free trial of our solutions.

Exporting Masked and De-Identified Data from MySQL

In all likelihood your MySQL database contains valuable and sensitive information. Within that database, MySQL protects that data using features such as encryption, access controls, auditing, views, and more. However, in many cases you may need to share some of this data, but must at the same time protect that sensitive information. …

How to implement a database job queue using SKIP LOCKED

Introduction

In this article, we are going to see how we can implement a database job queue using SKIP LOCKED. I decided to write this article while answering this Stack Overflow question asked by Rafael Winterhalter. Since SKIP LOCKED is a lesser-known SQL feature, it’s a good opportunity to show you how to use it and why you should employ it, especially when implementing a job queue task.

Domain Model

Let’s assume we have the following Post entity which has a status Enum property looking as follows: The PostStatus Enum encapsulates the... Read More

The post How to implement a database job queue using SKIP LOCKED appeared first on Vlad Mihalcea.

PHP JWT & REST API Authentication Tutorial: Login and Signup

In this tutorial, we'll learn how to add JWT authentication to our REST API PHP application. We'll see what JWT is and how it works. We'll also see how to get the authorization header in PHP. We'll create REST API endpoints for allowing users to log in and sign up to access protected resources.

What is JWT

JWT stands for JSON Web Token. A JWT is composed of encoded user information that can be used to authenticate users and exchange information between clients and servers. When building a REST API, instead of the server sessions commonly used in PHP apps, we use tokens, which are sent in HTTP headers from the server to clients, where they are persisted (usually using local storage) and then attached to every outgoing request originating from the client to the server. The server checks the token and allows or denies access to the requested resource.

RESTful APIs are stateless. This means that requests from clients should contain all the necessary information required to process the request. If you are building a REST API application using PHP, you are not going to use the $_SESSION variable to save data about the client's session. This means we cannot access the state of a client (such as login state) on the server. To solve this, the client is responsible for persisting the state locally and sending it to the server with each request. Since this important information is now persisted in the client's local storage, we need to protect it from eavesdropping. Enter JWTs.

A JWT token is simply a JSON object that holds information about the user. For example:

{ "user": "bob", "email": "", "access_token": "at145451sd451sd4e5r4", "expire_at": "11245454" }

Such a token could be tampered with to get access to protected resources. For example, a malicious user could change the previous token as follows to access admin-only resources on the server:

{ "user": "administrator", "email": "" }

To prevent this, JWTs need to be signed by the server.
If the token is changed on the client side, the token's signature will no longer be valid and the server will deny access to the requested resource.

How JWT Works

JWT tokens simply encode the user's information, such as an identifier, username and email. When a user successfully logs in, the server produces and sends a JWT token back to the client. This JWT token is persisted by the client using the browser's local storage or cookies and attached to every outgoing request, so if the user requests access to certain protected resources, the token needs to be checked first by the server to allow or deny access.

What is PHP-JWT

php-jwt is a PHP library that allows you to encode and decode JSON Web Tokens (JWT) in PHP, conforming to RFC 7519.

Prerequisites

You must have the following prerequisites to be able to follow this tutorial from scratch: you need PHP 7, Composer and the MySQL database system installed on your development environment, and you need basic knowledge of PHP and SQL.

Creating the MySQL Database and Table(s)

If you have the prerequisites, let's get started by creating the MySQL database. We'll be using the MySQL client installed with the server. Open a terminal and run the following command to invoke the client:

$ mysql -u root -p

You need to enter your MySQL password when prompted. Next, let's create a database using the following SQL instruction:

mysql> create database db;

Note: Here we assume you have a MySQL user called root. You need to change that to the name of an existing MySQL user. You can also use phpMyAdmin or any MySQL client you are comfortable with to create the database and SQL tables.
Let's now select the db database and create a Users table that will hold the users of our application:

mysql> use db;
mysql> CREATE TABLE IF NOT EXISTS `Users` (
  `id` INT AUTO_INCREMENT,
  `first_name` VARCHAR(150) NOT NULL,
  `last_name` VARCHAR(150) NOT NULL,
  `email` VARCHAR(255),
  `password` VARCHAR(255),
  PRIMARY KEY (`id`)
);

Creating the Project Directory Structure

Let's create a simple directory structure for our project. In your terminal, navigate to your working directory and create a folder for the project:

$ mkdir php-jwt-example
$ cd php-jwt-example
$ mkdir api && cd api
$ mkdir config

We first created the project's directory. Next, we created an api folder. Inside it, we created a config folder.

Connecting to your MySQL Database in PHP

Navigate to the config folder and create a database.php file with the following code (note that $db_name must match the db database created above):

<?php
// used to get a mysql database connection
class DatabaseService{

    private $db_host = "localhost";
    private $db_name = "db";
    private $db_user = "root";
    private $db_password = "";
    private $connection;

    public function getConnection(){
        $this->connection = null;
        try{
            $this->connection = new PDO("mysql:host=" . $this->db_host . ";dbname=" . $this->db_name, $this->db_user, $this->db_password);
        }catch(PDOException $exception){
            echo "Connection failed: " . $exception->getMessage();
        }
        return $this->connection;
    }
}
?>

Installing php-jwt

Let's now proceed to install the php-jwt library using Composer. In your terminal, run the following command from the root of your project's directory:

$ composer require firebase/php-jwt

This will download the php-jwt library into a vendor folder.
You can require the php-jwt library to encode and decode JWT tokens using the following code: <?php require "vendor/autoload.php"; use \Firebase\JWT\JWT; Adding the User Registration API Endpoint Inside the api folder, create a register.php file and add the following code to create a new user in the MySQL database: <?php include_once './config/database.php'; header("Access-Control-Allow-Origin: * "); header("Content-Type: application/json; charset=UTF-8"); header("Access-Control-Allow-Methods: POST"); header("Access-Control-Max-Age: 3600"); header("Access-Control-Allow-Headers: Content-Type, Access-Control-Allow-Headers, Authorization, X-Requested-With"); $firstName = ''; $lastName = ''; $email = ''; $password = ''; $conn = null; $databaseService = new DatabaseService(); $conn = $databaseService->getConnection(); $data = json_decode(file_get_contents("php://input")); $firstName = $data->first_name; $lastName = $data->last_name; $email = $data->email; $password = $data->password; $table_name = 'Users'; $query = "INSERT INTO " . $table_name . 
" SET first_name = :firstname, last_name = :lastname, email = :email, password = :password"; $stmt = $conn->prepare($query); $stmt->bindParam(':firstname', $firstName); $stmt->bindParam(':lastname', $lastName); $stmt->bindParam(':email', $email); $password_hash = password_hash($password, PASSWORD_BCRYPT); $stmt->bindParam(':password', $password_hash); if($stmt->execute()){ http_response_code(200); echo json_encode(array("message" => "User was successfully registered.")); } else{ http_response_code(400); echo json_encode(array("message" => "Unable to register the user.")); } ?> Adding the User Login API Endpoint Inside the api folder, create a login.php file and add the following code to check the user credentials and return a JWT token to the client: <?php include_once './config/database.php'; require "../vendor/autoload.php"; use \Firebase\JWT\JWT; header("Access-Control-Allow-Origin: *"); header("Content-Type: application/json; charset=UTF-8"); header("Access-Control-Allow-Methods: POST"); header("Access-Control-Max-Age: 3600"); header("Access-Control-Allow-Headers: Content-Type, Access-Control-Allow-Headers, Authorization, X-Requested-With"); $email = ''; $password = ''; $databaseService = new DatabaseService(); $conn = $databaseService->getConnection(); $data = json_decode(file_get_contents("php://input")); $email = $data->email; $password = $data->password; $table_name = 'Users'; $query = "SELECT id, first_name, last_name, password FROM " . $table_name . " WHERE email = ? 
LIMIT 0,1"; $stmt = $conn->prepare( $query ); $stmt->bindParam(1, $email); $stmt->execute(); $num = $stmt->rowCount(); if($num > 0){ $row = $stmt->fetch(PDO::FETCH_ASSOC); $id = $row['id']; $firstname = $row['first_name']; $lastname = $row['last_name']; $password2 = $row['password']; if(password_verify($password, $password2)) { $secret_key = "YOUR_SECRET_KEY"; $issuer_claim = "THE_ISSUER"; $audience_claim = "THE_AUDIENCE"; $issuedat_claim = TIME_IN_SECONDS; // issued at $notbefore_claim = TIME_IN_SECONDS; //not before $token = array( "iss" => $issuer_claim, "aud" => $audience_claim, "iat" => $issuedat_claim, "nbf" => $notbefore_claim, "data" => array( "id" => $id, "firstname" => $firstname, "lastname" => $lastname, "email" => $email )); http_response_code(200); $jwt = JWT::encode($token, $secret_key); echo json_encode( array( "message" => "Successful login.", "jwt" => $jwt )); } else{ http_response_code(401); echo json_encode(array("message" => "Login failed.", "password" => $password, "password2" => $password2)); } } ?> We now have two restful endpoints for registering and log users in. At this point, you can use a REST client like Postman to intercat with the API. First, start your PHP server using the following command: $ php -S A development server will be running from the address. Let's now, create a user in the database by sending a POST request to the api/register.php endpoint with a JSON body that contains the first_name, last_name, email and password: You should get an 200 HTTP response with a User was successfully registered. message. Next, you need to send a POST request to the /api/login.php endpoint with a JSON body that contains the email and password used for registering the user: You should get a Successful login message with a JWT token. The JWT token needs to be persisted in your browser's local storage or cookies using JavaScript then attached to each send HTTP request to access a protected resource on your PHP server. 
Protecting an API Endpoint Using JWT

Let's now see how we can protect our server endpoints using JWT tokens. A JWT token is sent by the client with every request to a protected endpoint. The server needs to decode the JWT and check that it's valid before allowing access to the endpoint. Inside the api folder, create a protected.php file and add the following code:

```php
<?php
include_once './config/database.php';
require "../vendor/autoload.php";
use \Firebase\JWT\JWT;

header("Access-Control-Allow-Origin: *");
header("Content-Type: application/json; charset=UTF-8");
header("Access-Control-Allow-Methods: POST");
header("Access-Control-Max-Age: 3600");
header("Access-Control-Allow-Headers: Content-Type, Access-Control-Allow-Headers, Authorization, X-Requested-With");

$secret_key = "YOUR_SECRET_KEY";
$jwt = null;

$databaseService = new DatabaseService();
$conn = $databaseService->getConnection();

$data = json_decode(file_get_contents("php://input"));

$authHeader = $_SERVER['HTTP_AUTHORIZATION'];
$arr = explode(" ", $authHeader);
$jwt = $arr[1];

if($jwt){
    try {
        $decoded = JWT::decode($jwt, $secret_key, array('HS256'));
        // Access is granted. Add the code of the protected operation here.
        echo json_encode(array("message" => "Access granted."));
    }
    catch (Exception $e){
        http_response_code(401);
        echo json_encode(array(
            "message" => "Access denied.",
            "error" => $e->getMessage()
        ));
    }
}
?>
```

You can now send a POST request with an Authorization header in the following format:

JWT <YOUR_JWT_TOKEN_HERE>

Or also using the bearer format:

Bearer <YOUR_JWT_TOKEN_HERE>

Conclusion

In this tutorial, we've seen how to implement JWT authentication for a REST API in PHP and MySQL.
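On the server side, JWT::decode() reverses the encoding step: it recomputes the HMAC signature with the shared secret, rejects the token if it does not match, and validates time-based claims such as nbf. A hypothetical standard-library sketch of that verification step (Python, for illustration only — protected.php relies on firebase/php-jwt for this; the helper names are made up):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_decode(s: str) -> bytes:
    # Re-add the '=' padding that JWT strips off
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def jwt_verify_hs256(token: str, secret: str) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")
    # Recompute the signature over "<header>.<payload>" with the shared secret
    expected = hmac.new(secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("Access denied: bad signature")
    payload = json.loads(b64url_decode(payload_b64))
    # Honour the 'nbf' (not before) claim set by the login endpoint
    if "nbf" in payload and time.time() < payload["nbf"]:
        raise ValueError("Access denied: token not yet valid")
    return payload

# The Authorization header is parsed the same way protected.php does it:
auth_header = "Bearer abc.def.ghi"
scheme, jwt_token = auth_header.split(" ", 1)
```

Note that the scheme prefix (JWT or Bearer) is simply discarded by the split, which is why both header formats work with the PHP code above.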

Shinguz: MariaDB Prepared Statements, Transactions and Multi-Row Inserts

Last week at the MariaDB/MySQL Developer Training, one participant asked some tricky questions to which I did not know the answers off the top of my head.

The MariaDB documentation was also not very detailed on these points (here and here).

So time to do some experiments:

Prepared Statements and Multi-Row Inserts

```sql
SQL> PREPARE stmt1 FROM 'INSERT INTO `test`.`test` (`data`) VALUES (?), (?), (?)';
Statement prepared

SQL> SET @d1 = 'Bli';
SQL> SET @d2 = 'Bla';
SQL> SET @d3 = 'Blub';
SQL> EXECUTE stmt1 USING @d1, @d2, @d3;
Query OK, 3 rows affected (0.010 sec)
Records: 3  Duplicates: 0  Warnings: 0

SQL> DEALLOCATE PREPARE stmt1;
SQL> SELECT * FROM test;
+----+------+---------------------+
| id | data | ts                  |
+----+------+---------------------+
|  1 | Bli  | 2019-04-15 17:26:22 |
|  2 | Bla  | 2019-04-15 17:26:22 |
|  3 | Blub | 2019-04-15 17:26:22 |
+----+------+---------------------+
```
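The same multi-row prepared INSERT can be driven from application code. A minimal sketch in Python (using the stdlib sqlite3 module so the example is self-contained — with MariaDB you would use a MySQL driver instead, but the placeholder pattern is identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT)")

# One prepared multi-row INSERT: three placeholders, three values, one round trip
conn.execute("INSERT INTO test (data) VALUES (?), (?), (?)", ("Bli", "Bla", "Blub"))
conn.commit()

rows = conn.execute("SELECT id, data FROM test ORDER BY id").fetchall()
print(rows)  # [(1, 'Bli'), (2, 'Bla'), (3, 'Blub')]
```

As in the SQL session above, all three rows are inserted by a single statement execution.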
Prepared Statements and Transactions

```sql
SQL> SET SESSION autocommit=Off;
SQL> START TRANSACTION;
SQL> PREPARE stmt2 FROM 'INSERT INTO `test`.`test` (`data`) VALUES (?)';
Statement prepared

SQL> SET @d1 = 'BliTrx';
SQL> EXECUTE stmt2 USING @d1;
Query OK, 1 row affected (0.000 sec)

SQL> SET @d1 = 'BlaTrx';
SQL> EXECUTE stmt2 USING @d1;
Query OK, 1 row affected (0.000 sec)

SQL> COMMIT;

-- Theoretically we should do a START TRANSACTION; here again...

SQL> SET @d1 = 'BlubTrx';
SQL> EXECUTE stmt2 USING @d1;
Query OK, 1 row affected (0.000 sec)

SQL> ROLLBACK;
SQL> DEALLOCATE PREPARE stmt2;
SQL> SELECT * FROM test;
+----+--------+---------------------+
| id | data   | ts                  |
+----+--------+---------------------+
| 10 | BliTrx | 2019-04-15 17:33:30 |
| 11 | BlaTrx | 2019-04-15 17:33:39 |
+----+--------+---------------------+
```
Prepared Statements and Transactions and Multi-Row Inserts

```sql
SQL> SET SESSION autocommit=Off;
SQL> START TRANSACTION;
SQL> PREPARE stmt3 FROM 'INSERT INTO `test`.`test` (`data`) VALUES (?), (?), (?)';
Statement prepared

SQL> SET @d1 = 'Bli1Trx';
SQL> SET @d2 = 'Bla1Trx';
SQL> SET @d3 = 'Blub1Trx';
SQL> EXECUTE stmt3 USING @d1, @d2, @d3;
Query OK, 3 rows affected (0.000 sec)

SQL> COMMIT;

-- Theoretically we should do a START TRANSACTION; here again...

SQL> SET @d1 = 'Bli2Trx';
SQL> SET @d2 = 'Bla2Trx';
SQL> SET @d3 = 'Blub2Trx';
SQL> EXECUTE stmt3 USING @d1, @d2, @d3;
Query OK, 3 rows affected (0.000 sec)

SQL> ROLLBACK;

-- Theoretically we should do a START TRANSACTION; here again...

SQL> SET @d1 = 'Bli3Trx';
SQL> SET @d2 = 'Bla3Trx';
SQL> SET @d3 = 'Blub3Trx';
SQL> EXECUTE stmt3 USING @d1, @d2, @d3;
Query OK, 3 rows affected (0.001 sec)

SQL> COMMIT;
SQL> DEALLOCATE PREPARE stmt3;
SQL> SELECT * FROM test;
+----+----------+---------------------+
| id | data     | ts                  |
+----+----------+---------------------+
|  1 | Bli1Trx  | 2019-04-15 17:37:50 |
|  2 | Bla1Trx  | 2019-04-15 17:37:50 |
|  3 | Blub1Trx | 2019-04-15 17:37:50 |
|  7 | Bli3Trx  | 2019-04-15 17:38:38 |
|  8 | Bla3Trx  | 2019-04-15 17:38:38 |
|  9 | Blub3Trx | 2019-04-15 17:38:38 |
+----+----------+---------------------+
```
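The commit/rollback behaviour shown in the session above is easy to reproduce from application code as well. A small Python sketch (again using stdlib sqlite3 as a stand-in for MariaDB, with the transaction boundaries managed explicitly; the values mirror the session above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions ourselves, like autocommit=Off
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT)")

stmt = "INSERT INTO test (data) VALUES (?), (?), (?)"

conn.execute("BEGIN")
conn.execute(stmt, ("Bli1Trx", "Bla1Trx", "Blub1Trx"))
conn.execute("COMMIT")    # batch 1 is persisted

conn.execute("BEGIN")
conn.execute(stmt, ("Bli2Trx", "Bla2Trx", "Blub2Trx"))
conn.execute("ROLLBACK")  # batch 2 is discarded; the statement stays usable

conn.execute("BEGIN")
conn.execute(stmt, ("Bli3Trx", "Bla3Trx", "Blub3Trx"))
conn.execute("COMMIT")    # batch 3 is persisted

print([r[0] for r in conn.execute("SELECT data FROM test ORDER BY id")])
```

As in the MariaDB session, only the committed batches survive, and the prepared statement can be executed again after a ROLLBACK without being re-prepared.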

Everything seems to work as expected. Now we know it for sure!
