Planet MySQL

Improvements to ROLLUP in MySQL

ROLLUP has been supported in MySQL for some time now. But until now, its use came with two implementation restrictions: it could not be combined with DISTINCT or ORDER BY in the same query expression. Starting in 8.0.12, these limitations have been lifted.…
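For example, a rollup query can now be ordered directly. A minimal sketch, assuming a hypothetical sales table with year and amount columns (GROUPING() keeps the grand-total row at the end):

SELECT year, SUM(amount) AS total
FROM sales
GROUP BY year WITH ROLLUP
ORDER BY GROUPING(year), total;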


Southern California Linux Expo (ScaLE) with MySQL, Mar 7-10, 2019

This week the Southern California Linux Expo 2019 (ScaLE) will be held in Pasadena, US, where you can find MySQL & Oracle Cloud representatives. Please find details about our presence below:

We are happy to invite you to stop by our shared booth and attend the talks. We look forward to talking to you at ScaLE!

Percona XtraBackup 8.0.5 Is Now Available

Percona is glad to announce the release of Percona XtraBackup 8.0.5 on March 4, 2019. Downloads are available from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

Percona XtraBackup 8.0.5 introduces support for undo tablespaces created using the new syntax (CREATE UNDO TABLESPACE) available since MySQL 8.0.14. Percona XtraBackup also supports the binary log encryption introduced in MySQL 8.0.14.
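For reference, this is the MySQL 8.0.14 undo tablespace syntax that these backups now cover (the tablespace and datafile names below are illustrative; the datafile must use the .ibu extension):

CREATE UNDO TABLESPACE undo_003 ADD DATAFILE 'undo_003.ibu';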

Two new options were added to xbstream. Use the --decompress option with xbstream to decompress individual qpress files. With the --decompress-threads option, specify the number of threads to apply when decompressing. Thanks to Rauli Ikonen for this contribution.

This release of Percona XtraBackup is a General Availability release ready for use in a production environment.

All Percona software is open-source and free.

Please note the following about this release:

  • The deprecated innobackupex has been removed. Use the xtrabackup command to back up your instances: $ xtrabackup --backup --target-dir=/data/backup
  • When migrating from earlier database server versions, back up and restore using Percona XtraBackup 2.4, and then use mysql_upgrade from MySQL 8.0.x.
  • If using yum or apt repositories to install Percona XtraBackup 8.0.5, ensure that you have enabled the new tools repository. You can do this with the percona-release enable tools release command and then install the percona-xtrabackup-80 package.
New Features
  • PXB-1548: Percona XtraBackup enables updating the ib_buffer_pool file with the latest pages present in the buffer pool using the --dump-innodb-buffer-pool option. Thanks to Marcelo Altmann for the contribution.
  • PXB-1768: Added support for undo tablespaces created with the new MySQL 8.0.14 syntax.
  • PXB-1781: Added support for binary log encryption introduced in MySQL 8.0.14.
  • PXB-1797: For xbstream, two new options were added. The --decompress option enables xbstream to decompress individual qpress files. The --decompress-threads option controls the number of threads to apply when decompressing. Thanks to Rauli Ikonen for this contribution.
Bugs Fixed
  • Using --lock-ddl-per-table caused the server to scan all records of partitioned tables which could lead to the “out of memory” error. Bugs fixed PXB-1691 and PXB-1698.
  • When Percona XtraBackup was run with the --slave-info option, incorrect coordinates were written to the xtrabackup_slave_info file. Bug fixed PXB-1737.
  • Percona XtraBackup could crash at the prepare stage when making an incremental backup if the variable innodb-rollback-segments was changed after starting the MySQL Server. Bug fixed PXB-1785.
  • The full backup could fail when Percona Server was started with the --innodb-encrypt-tables parameter. Bug fixed PXB-1793.

Other bugs fixed: PXB-1632, PXB-1715, PXB-1770, PXB-1771, PXB-1773.

Upcoming Webinar Wed 3/6: High Availability and Disaster Recovery in Amazon RDS

Join Percona CEO Peter Zaitsev as he presents High Availability and Disaster Recovery in Amazon RDS on Wednesday, March 6th, 2019, at 11:00 AM PST (UTC-8) / 2:00 PM EST (UTC-5).

Register Now

In this hour-long webinar, Peter describes the differences between high availability (HA) and disaster recovery (DR). Afterward, Peter will go through scenarios detailing how each is handled manually and in Amazon RDS.

He will review the pros and cons of managing HA and DR in the traditional database environment as well in the cloud. Having full control of these areas is daunting. However, Amazon RDS makes meeting these needs easier and more efficient.

Regardless of which path you choose, monitoring your environment is vital. Peter’s talk will make that message clear. A discussion of metrics you should regularly review to keep your environment working correctly and performing optimally concludes the webinar.

To learn more, register for Peter's webinar on High Availability and Disaster Recovery in Amazon RDS.

2019 Database Trends – SQL vs. NoSQL, Top Databases, Single vs. Multiple Database Use

Wondering which databases are trending in 2019? We asked hundreds of developers, engineers, software architects, dev teams, and IT leaders at DeveloperWeek to discover the current NoSQL vs. SQL usage, most popular databases, important metrics to track, and their most time-consuming database management tasks. Get the latest insights on MySQL, MongoDB, PostgreSQL, Redis, and many others to see which database management systems are most favored this year.

SQL vs. NoSQL

As any database administrator knows, the first question you have to ask yourself is whether to use a SQL or NoSQL database for your application. What’s the difference between the two?

SQL Databases

SQL databases, also known as relational databases, define and manipulate data based on structured query language (SQL). They are the most popularly used and are useful for handling structured data that organizes elements of data and standardizes how they relate to one another and to different properties.

NoSQL Databases

NoSQL databases, also known as non-relational databases, allow you to store and retrieve unstructured data using a dynamic schema. NoSQL is popular for its flexible ability to create a unique structure, and the data structure can be document, graph, column, or even key-value organized.

SQL has had a large lead over the non-relational alternatives for decades, but NoSQL is quickly closing the gap with popular databases such as MongoDB, Redis, and Cassandra. Though many organizations are choosing to migrate from legacy databases, such as Oracle, not all are moving to NoSQL. Based on our findings, SQL still holds 60%, with rising demand for systems such as PostgreSQL:

SQL Database Use: 60.48%
NoSQL Database Use: 39.52%

Most Popular Databases

So, which databases are most popular in 2019? Knowing that SQL was used by over three-fifths of respondents, you might assume Oracle stole the show. Guess again. MySQL dominated this report with 38.9% use, followed by MongoDB at 24.6%, PostgreSQL at 17.4%, Redis at 8.4%, and Cassandra at 3.0%. Oracle trailed behind at just 1.8% of these respondents, and CouchDB, Berkeley DB, Microsoft SQL Server, Redshift, Firebase, Elasticsearch, and InfluxDB users combined made up our Other category at 2.4%.

While these numbers might shock, there's no mistaking the rise in popularity of MySQL, MongoDB, and PostgreSQL. So how does this survey compare to the best-known source for database management system trends? The DB-Engines Ranking – Trend Popularity report places these leaders in the top five, but Oracle keeps hold at number one and Microsoft SQL Server at number three.

While we expected to see a much higher presence of Oracle database users, their representation was low at the world’s largest developer expo.

Single Database vs. Multi-Database Use

Multi-database type use has exploded over the past decade, compared to the traditional strategy of putting all of your eggs in one basket. How much so? Almost half of the organizations we spoke with use more than one type of database to power their applications rather than a single database! 44.3% reported using multiple databases, while 55.7% are operating with one:

SQL & NoSQL Multiple Database Combinations

So, knowing that almost half of our respondents are combining multiple databases to support their products, what types of database management systems are they using together? This one is less of a shocker: 75.6% of multiple database type use is made up of a combination of both SQL and NoSQL databases. This solidifies the case that, for many organizations, one size does not fit all. While you might have a preference between SQL and NoSQL, there's no denying the fact that each offers clear advantages over the other. Instead of limiting your organization to one database type, evolve (or develop) your data strategy for compatibility so that these powerful database management systems can complement each other and fill the gaps in your data needs!

SQL + NoSQL Database Use: 75.6%
SQL + SQL Database Use: 14.6%
NoSQL + NoSQL Database Use: 9.8%

Most Popular Multiple Database Type Combinations

If you're a single database type user who's considering adding another database type to your mix, this section might be of high interest: which databases, SQL and NoSQL alike, are most popularly used together?

The clear winner with over 1/3 of multiple database type use is the combination of MySQL and MongoDB. While MongoDB is often considered an alternative to MySQL, the two databases do work well together when properly designed. The second most popular combination was MySQL and PostgreSQL together. These two SQL databases are clear competitors, but can be jointly used to store different data sets. As you can see in the above section graph, the 9.76% representation of MySQL and PostgreSQL comprises a large majority of the SQL + SQL use in multiple databases.

MySQL + MongoDB: 34.15%
MySQL + PostgreSQL: 9.76%
MongoDB + PostgreSQL: 7.32%
MongoDB + Redis: 7.32%
MySQL + MongoDB + PostgreSQL: 4.88%
MySQL + MongoDB + PostgreSQL + Redis: 4.88%

Most Time-Consuming Database Management Task

So, now that we know which database management systems, types, and use combinations are most popular, let’s take a look at what’s eating up our time on the database management front. As anyone who’s managed a database before knows, there are countless tasks involved with maintaining a healthy production deployment. So, we were not surprised to see such a diverse response in our most time-consuming database management task question.

Monitoring came in at number one with 12.6% of our respondents, barely breaking ahead of backups, managing disk space, scaling, and joining tables, which all tied for number two with 11.6% each. Standalone at number three was maintaining and redistributing changes between views and stored programs at 8.7%, and again a tie at number four with 7.2% each for cleaning and database setup. Upgrades came in at number five with 6.5%, and a dozen other tasks made up the 11.6% Other category, including migrations, queries, comparing, tuning, and replication.

Most Important Metric Tracked For Database Performance

While we saw a wide variety of responses for the most important database management task, the most important metric to track for performance had three significant leaders.

Query response time was not only the most tracked metric, but also the majority, with 51.8% of responses! We expected this to lead, as it came in at 30.8% in a Most Time-Consuming PostgreSQL Management Task report we compiled in October of 2018, but it significantly increased when we expanded this question to all database management systems. Query speed is an extremely important metric to track on a continuous basis so you can identify slow-running queries that could be affecting your application performance. Many DBAs use a slow query analyzer tool to identify problem queries, see which sort of query each is associated with, understand their queries by time range, and find the top queries causing read load in their system in order to identify queries that are not indexed.
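In MySQL, the raw material for this kind of analysis is the slow query log, which can be enabled at runtime; a minimal sketch, with an illustrative threshold and file path:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- log queries slower than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';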

Coming in at number two was reliability, with 18.2% of our respondents. Needless to say, while outages are less common than slow queries, if your database goes down, it will have the most serious impact on your performance. That's why it's critically important to implement a high availability framework for your production deployments to keep your databases online if there's an outage in one of your datacenters.

Memory then came in at number three with 8.2% of responses. The more memory you have available, the better your database should perform. Both understanding and monitoring memory usage should be high on your list, as insufficient or exhausted memory will cause your database to read and write data to your disk which is dramatically slower.

Thanks to the hundreds of participants who contributed to the cloud database trends report at DeveloperWeek 2019! We are excited to share these insights, and hope to hear your thoughts below in our comments.

Your SQL IS NOT JavaScript (nor PHP)

IS NULL, IS NOT NULL, truthiness among other assertions on MySQL and PostgreSQL

People like to complain about JavaScript and how one thing can be equal to another, i.e. null == undefined evaluates to true unless you use the triple equals ===. Other dynamically typed languages have their peculiarities too, such as Ruby, where 0 is truthy; the reason is that Ruby considers 0 a value, and any value evaluates to true.

How about SQL? The answer is… it depends.

Which database are you using?

Some may be more forgiving, like MySQL, which does casts for you all over the place; others are more strict, like PostgreSQL, where you can only compare the truthiness of values of the same type.

Assertion                 MySQL   PostgreSQL
('A' = TRUE) IS TRUE      0       invalid input syntax for type boolean: “A”
('A' IS TRUE) IS TRUE     0       invalid input syntax for type boolean: “A”
(1 = TRUE) IS TRUE        1       operator does not exist: integer = boolean
(1 IS TRUE) IS TRUE       1       argument of IS TRUE must be type boolean, not type integer
('1' IS TRUE) IS TRUE     1       1
('0' IS FALSE) IS TRUE    1       1
(1 = '1') IS TRUE         1       1
(0 = '0') IS TRUE         1       1

Beyond the implications of wrong type comparisons, as you can see in the first four rows where MySQL evaluates the values as true or false, you should also worry about what a column being NULL may imply when comparing two columns.

Suddenly, knowing whether it is true or false doesn't matter, because you have cornered yourself with a third possible value: NULL. By definition NULL is neither a value nor a state; it should be considered “garbage”, and no column where you know its value and type would be garbage.

In this front both databases operate the same way:

Assertion                   SQL
(NULL = NULL) IS TRUE       0
(NULL = NULL) IS FALSE      0
(NULL IS NULL) IS TRUE      1
(0 = NULL) IS NULL          1
(1 = NULL) IS NULL          1
('A' = NULL) IS NULL        1
(TRUE = NULL) IS NULL       1
(FALSE = NULL) IS NULL      1
(0 IS NULL) IS TRUE         0
(0 IS NOT NULL) IS TRUE     0
(0 = NULL) IS TRUE          0
(0 = NULL) IS FALSE         0
(0 = NULL) IS NOT TRUE      1
(0 = NULL) IS NOT FALSE     1
('NULL' = NULL) IS TRUE     0
('NULL' IS NULL) IS TRUE    0

The highlighted rows are assumptions that people usually expect to behave differently, for example that zero is not equal to NULL. That catches many people by surprise. It shouldn't catch you by surprise, though, because 0 is a value. The old default behavior of MySQL of casting NULL to 0 on an INTEGER NOT NULL column without a DEFAULT value taught a whole generation of developers that this assumption is true. The same applies when casting a string-based NOT NULL column that has no default value to the empty string.
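That legacy behavior can still be reproduced by disabling strict mode; a minimal sketch (the table name is illustrative, and the conversion only happens for multi-row inserts):

SET sql_mode = '';
CREATE TABLE t (i INT NOT NULL);
-- Without strict mode, the NULL in a multi-row insert is converted to the
-- column's implicit default (0 for integers), with a warning:
INSERT INTO t VALUES (NULL), (2);
SELECT * FROM t; -- returns 0 and 2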

Is that a spaceship operator?

While writing this post and taking a peek into the MySQL documentation, I noticed for the first time that a NULL-safe equality operator, <=>, exists. Unfortunately, the MySQL website only shows the documentation from 5.5 to 8.0, so I can't be certain whether it existed in prior releases.

SELECT 1 IS NOT NULL, 1 = NULL, 1 <=> NULL;

Assertion       MySQL
1 IS NOT NULL   1
1 = NULL        NULL
1 <=> NULL      0

The case where MySQL thinks it is PHP

Another behavior I discovered when doing some JOINs was string and integer comparison. Both databases evaluate SELECT 1 = '1'; as true. However, MySQL takes it a step further:

SELECT 1 = '1a', '1' * 3, '1abc' + 4;

Assertion    MySQL   PostgreSQL
1 = '1a'     1       invalid input syntax for integer: “1a”
'1' * 3     3       3
'1abc' + 4   5       invalid input syntax for integer: “1abc”

Thus MySQL behaves like PHP. PHP is expected to change this behavior soon with this RFC. PostgreSQL, in this case, is forgiving in casting a string to an integer, but only if the quotes contain nothing but an integer. If you mix the integer with other characters, it throws an error, as you can see above.

I Know!

No, I don’t. I thought I knew enough SQL. Apparently, I was wrong.

Regardless of the language you are using, you should be aware that assertions that are true in your language may not be true in SQL, and vice versa.

MySQL Replication Setup

  • Replication is used to replicate data from the master node to a slave node (replica).
  • By default, replication is asynchronous.
  • It uses the binary log to read data from the master node and the relay log on the slave (replica).
  • Depending on the configuration, you can replicate all databases, selected databases, or even selected tables within a database.

Replication Methods:

  • File-based Replication:

The traditional method is based on replicating events from the master’s binary log, and requires the log files and positions in them to be synchronized between master and slave.

  • GTID Replication:

The newer method based on global transaction identifiers (GTIDs) is transactional and therefore does not require working with log files or positions within these files. Replication using GTIDs guarantees consistency between master and slave as long as all transactions committed on the master have also been applied on the slave.
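That guarantee can be verified directly on the slave; for example, GTID_SUBSET() tests whether a GTID set from the master is contained in the slave's executed set (the UUID and transaction range below are placeholders):

-- Returns 1 if everything in the given set has been applied on the slave
SELECT GTID_SUBSET('3e11fa47-71ca-11e1-9e33-c80aa9429562:1-77',
                   @@GLOBAL.gtid_executed);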

  Configuration variables for replication:

MASTER Conf:

server_id=145

gtid-mode=on

enforce-gtid-consistency=1

log-slave-updates=1

log-bin=/mysql/log/binary/mysql-bin145

relay-log=/mysql/log/relay/relay-bin145

relay_log_recovery= on

master_info_repository = TABLE

relay_log_info_repository = TABLE

SLAVE conf:

server_id=146

gtid-mode=on

enforce-gtid-consistency=1

log-slave-updates=1

log-bin=/mysql/log/binary/mysql-bin146

relay-log=/mysql/log/relay/relay-bin146

relay_log_recovery= on

master_info_repository = TABLE

relay_log_info_repository = TABLE

# don’t replicate the mysql database

replicate_wild_ignore_table = mysql.%

read-only

skip-slave-start

 

Replication setup:
  1. Create a replication user on MASTER with replication privileges:

CREATE USER IF NOT EXISTS 'rpluser'@'%' IDENTIFIED BY 'rpluser@1234';

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'rpluser'@'%';

  2. On SLAVE, set up replication as follows:

File-based Replication:

CHANGE MASTER TO MASTER_HOST='<MASTER_IP>', MASTER_USER='rpluser', MASTER_PASSWORD='rpluser@1234', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=690;

GTID Replication:

CHANGE MASTER TO MASTER_HOST='<MASTER_IP>', MASTER_USER='rpluser', MASTER_PASSWORD='rpluser@1234', MASTER_PORT=3306, MASTER_AUTO_POSITION=1;

  3. Start the slave:

START SLAVE;

  4. Check the slave status:

SHOW SLAVE STATUS;

The Slave_IO_Running and Slave_SQL_Running column values should be 'Yes'.

Ref: https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-howto.html

 

MySQL 5.7 Binary install on Linux

In the generic binary installation method, we extract the compressed tar files of MySQL and can then customize the MySQL setup as per our requirements, such as using a different data directory, log directory, etc.

MySQL Generic Binaries can be used to install MySQL on Unix/Linux platforms.

MySQL installation has a dependency on the libaio library. Make sure it is present on the server before installing MySQL.

  • Create a mysql user and group, which we will use as the service account for the MySQL server.

shell> groupadd mysql

shell> useradd -r -g mysql -s /bin/false mysql

  • Extract the MySQL binaries tarball to the MySQL base directory, for example the /mysql directory.

  • Create the required directories and set appropriate permissions.

shell> cd mysql

shell> mkdir data mysql-files log tmp socket pid

shell> chown -R mysql:mysql .

shell> chmod 750 mysql-files

Sample my.cnf

  • Add/update the following variables to the appropriate configuration section of my.cnf.

[mysqld]

#GENERAL

user = mysql

port = 3306

server_id = 1

skip-name-resolve

bind-address= 0.0.0.0

default_storage_engine = InnoDB

character_set_server = utf8

socket = /mysql/socket/mysql.sock

pid-file = /mysql/pid/mysqld.pid

# DATA STORAGE #

basedir= /mysql

datadir = /mysql/data

#INNODB

innodb_file_per_table = 1

innodb_buffer_pool_size = 512M

innodb_data_file_path = ibdata1:100M;ibdata2:100M:autoextend

innodb_log_buffer_size = 16M

innodb_log_file_size = 100M

innodb-log-files-in-group = 3

innodb_flush_method = O_DIRECT

innodb_file_format = Barracuda

innodb_tmpdir= /mysql/tmp

#MyISAM

key_buffer_size = 64M

#Logging

log_error = /mysql/log/mysqld.log

master_info_repository = TABLE

relay_log_info_repository = TABLE

log-bin = /mysql/log/binary/mysql-bin1

relay-log = /mysql/log/relay/relay-bin1

relay_log_recovery = on

log-slave-updates = 1

expire_logs_days = 15

gtid-mode = on

enforce-gtid-consistency = 1

binlog_format = row

binlog_row_image = minimal

# General logs (only enable for debugging – it uses too much I/O)

#general-log = on

#general-log-file = /mysql/log/general-query.log

# Slow query logs (optional)

slow_query_log = on

long_query_time= 3

slow_query_log_file = /mysql/log/slow-query.log

max_allowed_packet=300M

[mysql]

socket = /mysql/socket/mysql.sock

[client]

socket = /mysql/socket/mysql.sock

port = 3306

 

  • Initialize the MySQL server with the my.cnf configuration:

shell> bin/mysqld --defaults-file=/etc/my.cnf --initialize --user=mysql --basedir=/mysql/ --datadir=/mysql/data

NOTE: --defaults-file must always be the first option when using command-line options.

START MySQL

shell> bin/mysqld --defaults-file=/etc/my.cnf --user=mysql

NOTE: If this does not work, try to start MySQL with mysqld_safe; this is for System V platforms, which need mysqld_safe to run mysqld.

shell> bin/mysqld_safe --defaults-file=/etc/my.cnf --user=mysql

MySQL SHUTDOWN

shell> mysqladmin -u root -p shutdown

Run MySQL as a service

shell> cp support-files/mysql.server /etc/init.d/mysql

MySQL START/STOP/STATUS

shell> /etc/init.d/mysql start/stop/status

Ref: https://dev.mysql.com/doc/refman/5.7/en/binary-installation.html

 

MySQL PITR The Fastest Way With DevOps

Point-in-time recovery is a nightmare for DBAs if the MySQL clusters are self-managed. It was 10 PM; after dinner I was simply watching some shows on YouTube. Then my phone rang, with a customer on the other side. Due to some bad queries, one of the main tables got updated without a WHERE clause. Suddenly everyone joined the call, asking me to bring the data back. That day it took 6 to 8 hours to bring the data back. Yes, every DBA makes one or two big mistakes, and in my career I would say this was that day. So here is my MySQL PITR, the fastest way, with DevOps.

Where did I fail in this DR setup?
  • PITR starts with the last full backup + binlogs.
  • I missed adding --master-data in my backup script, so I didn't know where to start applying binlogs.
  • No delayed replica. I got the call within 10 minutes of the data being messed up, but all of my replicas sync in real time, so it affected all of them.
  • And the effort: launching a new server, installing MySQL, and configuring the MySQL parameters took 15-20 minutes.
  • Restoring the logical backup of a pretty huge database took a few hours.
  • Then copying the binlogs (or reading the binlogs from the remote server), getting the start position, finding the stop position, and executing them on the DR server. Again, the effort matters here.

After that incident, I compared my scenario with AWS RDS. How simple is that? In just a few clicks it provisions an instance and applies the binlogs. But it also fails in one case: what happens if you need to restore up to a position? It's not possible; you have the option to select the time, not the binlog position. But anyhow, I like this RDS option (in CloudSQL it sucks). So I built this simplified one-click PITR with the help of RunDeck.

DR setup:

I made some modifications to my DR site. My entire infrastructure is in GCP, and backup files are synced to a GCS bucket.

  • Set up a replica with a 15-minute delay.
  • Replaced the logical backup with Percona XtraBackup.
  • An image (in AWS terms, an AMI) is ready with MySQL and optimized MySQL parameters.
  • Finally, a RunDeck job which will do this magic.

Before you start implementing this solution, you need to set up the following things.

  1. Set up a backup job with Percona XtraBackup along with incremental backups. (The naming convention must be the same as I mentioned in that blog.)
  2. Set up RunDeck and add your DR server in RunDeck. (Read my RunDeck kickstart series.)
  3. Make your DR server able to pull the binlog files from the master. (You can use read-binlog-from-remote-server, SCP, or something like that.)
Percona XtraBackup:

My PITR depends on my XtraBackup setup. I have configured my backup job with:

  • 12.00 AM full backup (runtime 15 minutes)
  • Every hour, an incremental backup (runtime 10 minutes)

The backup process takes 10-15 minutes to complete. If I want to restore till 1.04 AM, then I should restore the FULL backup (12 AM), but the 1 AM incremental backup is still going on at that point, so I should not use it to perform PITR. Then what should we do?

  1. Restore 12AM full backup
  2. Apply binlogs after 12AM to 1.04AM

Then we don't need to bother about the ongoing incremental backup.

Read here: Automation Script For Percona Xtrabackup FULL/Incremental

What backup files need to be restored?

Now, in my scripting, based on the above scenario, I have added a condition for which backup files need to be restored. To be on the safe side, I considered 15 minutes as my backup completion time.

1. If hour != 00 (not equal to 12 AM), then check if minutes > 15. Now I can use the FULL backup + the last incremental backups.

ex: 2019-03-04 02:20:30
hour = 02 (!= 00)
minutes = 20 (> 15)
Restore: FULL Backup + 01 Inc + 02 Inc

2. If hour != 00 (not equal to 12 AM), then check if minutes < 15. The incremental backup for the current hour is still going on, so we should avoid it and use the FULL backup + the incremental backups up to the previous hour (current hour - 1).

ex: 2019-03-04 05:10:30
hour = 05 (!= 00)
minutes = 10 (< 15)
Restore: FULL backup + Inc 1 to Inc 4

3. If hour = 00 and minutes < 15, then the FULL backup process is going on at this time, so we should not use this backup. In this case we should sync yesterday's FULL backup + yesterday's Inc 1 to Inc 23.

So this is my IF condition to select which files need to be synced.

d='2019-03-04 20:42:53'
s_d=`echo $d | awk -F ' ' '{print $1}'`
s_h=`echo $d | awk -F ' ' '{print $2}' | awk -F ':' '{print $1}'`
s_m=`echo $d | awk -F ' ' '{print $2}' | awk -F ':' '{print $2}'`
if [ $s_h -ne 00 ]
then
  if [ $s_m -gt 15 ]
  then
    echo "sync FULL + actual hour"
  else
    if [ $s_h -eq 01 ]
    then
      echo "sync actual full only"
    else
      echo "sync FULL + hour -1"
    fi
  fi
else
  if [ $s_m -gt 15 ]
  then
    echo "sync actual full only"
  else
    echo "last day full + 23inc"
  fi
fi

Transfer.sh:

For my database infra, I have configured transfer.sh file sharing, because if your MySQL is down, then we can't use read-from-remote-server. SCP also needs shell-based access, and we configured SSH session recording with 2FA, so SCP will not work. It's not a big deal; you can read how I configured transfer.sh for my internal network at the link below.

CloudOps At Scale: Secure access patterns using Bastion Host and Transfer.Sh

Again, it's your choice how to bring the binlog files to the DR server.

Get the Binlogs:

Now there's another question: what and how many binlog files do I need to copy from the master?

It's actually from your last backup (full or incremental) to the next backup hour.

XtraBackup stores the info about the binlog file name and position. For this PITR, we should restore the FULL backup + Inc 1 to Inc 5. Once we have executed the xtrabackup prepare command (restore), the xtrabackup_binlog_info file contains the exact binlog file and position at the time of that backup. So we need to copy from that backup's binlog file through all the other files created within our PITR time.

Steps involved:
  1. Once we have decided the time, download the backup files from the GCS bucket.
  2. Restore the XtraBackup backup.
  3. Point the MySQL datadir to the restored XtraBackup directory.
  4. Get the necessary binlog files from the master server.
  5. Decode the binlog files using the mysqlbinlog utility.
  6. Restore the decoded binlog file.
On DR Server:

Before running this job, please make sure of the below things:

  1. The master and DR servers are added to the RunDeck server.
  2. The RunDeck user on the DR and master servers should have root access.
  3. Install wget on the DR server.
  4. The RunDeck user's private key must be located in the DR server's RunDeck user home directory (/home/rundeck/.ssh/id_rsa). This is for logging in to the master server without a password and exporting the binlog files.
  5. I used transfer.sh, so I created an alias that runs the curl upload command when calling transfer <filename>. You can use transfer.sh or bring your own copy mechanism.

If you are not using transfer.sh then ignore this step.

On Master:

Replace 10.10.10.10 with your transfer.sh server IP.

vi /home/rundeck/.bashrc

transfer() {
  curl --progress-bar --upload-file "$1" http://10.10.10.10/$(basename $1) | tee /dev/null;
}

alias transfer=transfer

Save and close.

Add the rundeck user to the mysql group:

usermod -aG mysql rundeck

Let's create the RunDeck job:
  1. Go to RunDeck –> Jobs –> New Job
  2. JobName –> Point In Time Recovery
  3. In the Options section, add an option.
  4. Option Name/Option Label –> datetime
  5. Under the input type, select DATE
  6. Date Format: YYYY-MM-DD HH:mm:ss
  7. Required: Yes

Under Workflow –> Node steps, click on Script, then copy and paste the below shell script.

Step 1: Download Backup files from GCS

d='@option.datetime@'
echo $d
s_d=`echo $d | awk -F ' ' '{print $1}'`
s_h=`echo $d | awk -F ' ' '{print $2}' | awk -F ':' '{print $1}'`
s_m=`echo $d | awk -F ' ' '{print $2}' | awk -F ':' '{print $2}'`
if [ $s_h -ne 00 ]
then
  if [ $s_m -gt 15 ]
  then
    echo "sync FULL + actual hour"
    mkdir -p /mysqldata/FULL
    gsutil -m rsync -r gs://xtrabackup/$s_d/FULL/ /mysqldata/FULL/
    for i in $(seq 1 $s_h); do
      echo "inc"$i
      mkdir -p /mysqldata/inc$i
      gsutil -m rsync -r gs://xtrabackup/$s_d/inc$i/ /mysqldata/inc$i/
    done
  else
    if [ $s_h -eq 01 ]
    then
      echo "sync actual full only"
      mkdir -p /mysqldata/FULL
      gsutil -m rsync -r gs://xtrabackup/$s_d/FULL/ /mysqldata/FULL/
    else
      echo "sync FULL + hour -1"
      inc=$(expr $s_h - 1)
      mkdir -p /mysqldata/FULL
      gsutil -m rsync -r gs://xtrabackup/$s_d/FULL/ /mysqldata/FULL/
      for i in $(seq 1 $inc); do
        mkdir -p /mysqldata/inc$i
        echo "inc"$i
        gsutil -m rsync -r gs://xtrabackup/$s_d/inc$i/ /mysqldata/inc$i/
      done
    fi
  fi
else
  if [ $s_m -gt 15 ]
  then
    echo "sync actual full only"
    mkdir -p /mysqldata/FULL
    gsutil -m rsync -r gs://xtrabackup/$s_d/FULL/ /mysqldata/FULL/
  else
    echo "last day full + 23inc"
    yesterday=`date -d "$s_d -1 days" +%Y-%m-%d`
    mkdir -p /mysqldata/FULL
    gsutil -m rsync -r gs://xtrabackup/$yesterday/FULL/ /mysqldata/FULL/
    for i in $(seq 1 23); do
      mkdir -p /mysqldata/inc$i
      echo "inc"$i
      gsutil -m rsync -r gs://xtrabackup/$yesterday/inc$i/ /mysqldata/inc$i/
    done
  fi
fi

If you are using AWS, Azure, or FTP, then replace the gsutil -m rsync -r gs://xtrabackup/$s_d/FULL/ /mysqldata/FULL/ lines with your own commands.

Also replace /mysqldata with the location where you need to download the backup files.

Step 2: Restore the XtraBackup

BACKUP_DIR='/mysqldata'
echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Decompressing the FULL backup" >> $BACKUP_DIR/xtrabackup-restore.log
xtrabackup --decompress --remove-original --parallel=30 --target-dir=$BACKUP_DIR/FULL
echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Decompressing Done !!!" >> $BACKUP_DIR/xtrabackup-restore.log
echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Preparing FULL Backup ..." >> $BACKUP_DIR/xtrabackup-restore.log
xtrabackup --prepare --apply-log-only --target-dir=$BACKUP_DIR/FULL
echo `date '+%Y-%m-%d %H:%M:%S:%s'`": FULL Backup Preparation Done!!!" >> $BACKUP_DIR/xtrabackup-restore.log
P=1
while [ -d $BACKUP_DIR/inc$P ] && [ -d $BACKUP_DIR/inc$(($P+1)) ]
do
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Decompressing incremental:$P" >> $BACKUP_DIR/xtrabackup-restore.log
  xtrabackup --decompress --remove-original --parallel=30 --target-dir=$BACKUP_DIR/inc$P
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Decompressing incremental:$P Done !!!" >> $BACKUP_DIR/xtrabackup-restore.log
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Preparing incremental:$P" >> $BACKUP_DIR/xtrabackup-restore.log
  xtrabackup --prepare --apply-log-only --target-dir=$BACKUP_DIR/FULL --incremental-dir=$BACKUP_DIR/inc$P
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": incremental:$P Preparation Done!!!" >> $BACKUP_DIR/xtrabackup-restore.log
  P=$(($P+1))
done
if [ -d $BACKUP_DIR/inc$P ]
then
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Decompressing the last incremental:$P" >> $BACKUP_DIR/xtrabackup-restore.log
  xtrabackup --decompress --remove-original --parallel=30 --target-dir=$BACKUP_DIR/inc$P
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Decompressing the last incremental:$P Done !!!" >> $BACKUP_DIR/xtrabackup-restore.log
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Preparing the last incremental:$P" >> $BACKUP_DIR/xtrabackup-restore.log
  xtrabackup --prepare --target-dir=$BACKUP_DIR/FULL --incremental-dir=$BACKUP_DIR/inc$P
  echo `date '+%Y-%m-%d %H:%M:%S:%s'`": Last incremental:$P Preparation Done!!!" >> $BACKUP_DIR/xtrabackup-restore.log
fi

BACKUP_DIR='/mysqldata' – replace /mysqldata with the location on the DR server where the backups were downloaded in step 1. I have a 40-core CPU, so I used 30 parallel threads; you can adjust --parallel=30 based on your DR server's CPU.

Step 3: Start MySQL with Restored Data

BACKUP_DIR='/mysqldata'
DATADIR='/mysqldata/data'
sudo service mysql stop
sudo mv $BACKUP_DIR/FULL $BACKUP_DIR/data
sudo sed -i.bak "s|.*datadir.*|datadir=$DATADIR|" /etc/my.cnf
sudo semanage fcontext -a -t mysqld_db_t "/mysqldata/data(/.*)?"
sudo restorecon -R -v /mysqldata/data
sudo chown -R mysql:mysql /mysqldata/data
sudo chmod 750 /mysqldata/data
sudo service mysql start

Again, replace /mysqldata with your backup file location. The restored backup will be in the directory called FULL; I'm renaming it to data.

Step 4: Download binlog files from the master

d='@option.datetime@'
DATADIR='/mysqldata/data'
[ -e /tmp/out ] && rm /tmp/out
start_binlog=`head -n 1 $DATADIR/xtrabackup_binlog_info | awk -F ' ' '{print $1}'`
stop_binlog=`ssh 10.10.10.40 "find /mysql-binlog/ -type f -newermt \"$d\" -exec basename {} \; | sort | head -1"`
files=$(ssh 10.10.10.40 "seq -w `echo $start_binlog | awk -F'.' '{print $2}'` `echo $stop_binlog | awk -F'.' '{print $2}'`")
for x in `echo $files`; do ssh 10.10.10.40 "transfer /mysql-binlog/mysql-bin.$x" ; done >> /tmp/out
binlogfiles=`perl -pe 's#(?<=.)(?=http://)#\n#g' /tmp/out`
for x in `echo $binlogfiles`; do wget -P /mysqldata/binlogdump $x ; done
  • Replace /mysqldata/data with your MySQL data directory.
  • Replace 10.10.10.40 with your master server IP.
  • Replace /mysql-binlog/ with your master server's binlog location.
  • transfer /mysql-binlog/mysql-bin.$x runs the transfer.sh alias and uploads the binlog file to the transfer.sh server. If you want to use your own copy process, replace transfer and the /mysql-binlog/ binlog location. $x is the file, so don't replace that.
  • /mysqldata/binlogdump is the location on the DR server where the binlog files are downloaded.
Step 5: Decode the binlog files

d='@option.datetime@'
DATADIR='/mysqldata/data'
mkdir -p /mysqldata/binlogdump
binlogfilename=`head -n 1 $DATADIR/xtrabackup_binlog_info | awk -F ' ' '{print $1}'`
binlogposition=`head -n 1 $DATADIR/xtrabackup_binlog_info | awk -F ' ' '{print $2}'`
files=`ls /mysqldata/binlogdump/`
cd /mysqldata/binlogdump
mysqlbinlog -d Eztaxi --start-position="${binlogposition}" --stop-datetime="${d}" ${files} --disable-log-bin > mysqldata-delta.sql
  • Replace /mysqldata/data with your MySQL data directory.
  • /mysqldata/binlogdump is the downloaded binlog file location.
Step 6: Restore the binlog file to MySQL

mysql -u root -p'password' < /mysqldata/binlogdump/mysqldata-delta.sql
  • Replace p'password' with your MySQL root password. You can use a parameterized password in RunDeck; see here.
  • /mysqldata/binlogdump/ is the location of the decoded binlog file.

Steps are done.

Now, under Matched Nodes, select the DR server and click on the Create button.

Trigger the PITR:

Click on the Date Picker Icon and select the date and time for your PITR.

Now click on the Run Job Now button and go for a cup of coffee.

Here my job execution completed in 18 minutes.

Further improvements and development:
  • In my case, I have my DR server ready with MySQL installed (as an image). Before triggering this job, I'll launch a VM with this image and validate the connectivity between RunDeck and the GCS bucket. I'm planning to use a Terraform template, which will also be a part of this RunDeck job.
  • My complete setup is in GCP; you can perform the same on AWS, Azure, or even on-prem, and comment below how it goes.
  • I have hardcoded the MySQL data directory, the binlog location, and everything else. If you have enough time, use Options in RunDeck to get these values as input during job execution.
  • Finally, just like RDS PITR, I never gave an option to restore the binlogs up to a position. But we can achieve this in step 5: just add a variable called position and give your position number. You can use Options to get this value as an input, and replace --stop-datetime="${d}" with --stop-position="${pos}".

If you have any difficulties in understanding the steps and scripts, please comment below.

Want to learn more basics of RunDeck setup? Here you go.

Happy Disaster Recovery and PITR :)

Using SQLAlchemy with MySQL 8

A few months ago, I wrote about using the Django framework with MySQL 8. There are also other Python frameworks that are worth considering. In this blog, I will look at using SQLAlchemy with MySQL 8.

In order for you to be able to use MySQL 8 with SQLAlchemy, you need three pieces of software: MySQL Server, MySQL Connector/Python, and SQLAlchemy. I will go through the installations, then I will look at a code example.

Information

The examples in this blog use MySQL Server 8.0.15, MySQL Connector/Python 8.0.15, and SQLAlchemy 1.2.18. It should be possible to reuse the instructions with other MySQL versions as well, except that in older MySQL versions you need to explicitly use the utf8mb4 character set.

If you do not want to read through the whole blog, the main thing for using MySQL and MySQL Connector/Python with SQLAlchemy is to create the engine with dialect set to mysql and the driver to mysqlconnector:

engine = sqlalchemy.create_engine(
    'mysql+mysqlconnector://<user>:<password>@<host>:<port>/<default_db>...')

Installing MySQL Server

There are several ways to install MySQL Server and which one is the best depends on your circumstances and preferences. For the sake of this blog, I will show how MySQL Server can be installed on Oracle Linux/RHEL/CentOS 7 using RPMs and on Microsoft Windows using MySQL Installer. For more options, see the installation chapter in the reference manual. Let’s look at the Linux installation first.

RPM Install on Enterprise Linux

MySQL provides repositories for several Linux distributions including the Oracle Linux/RHEL/CentOS family. This makes it easy to install MySQL. The step to install the repository definition is:

shell$ sudo yum install https://dev.mysql.com/get/mysql80-community-release-el7-1.noarch.rpm
Loaded plugins: langpacks, ulninfo
Repository ol7_developer_EPEL is listed more than once in the configuration
mysql80-community-release-el7-1.noarch.rpm                           |  25 kB  00:00:00
Examining /var/tmp/yum-root-Ts4OzC/mysql80-community-release-el7-1.noarch.rpm: mysql80-community-release-el7-1.noarch
Marking /var/tmp/yum-root-Ts4OzC/mysql80-community-release-el7-1.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package mysql80-community-release.noarch 0:el7-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================
 Package                    Arch    Version  Repository                                          Size
======================================================================================================
Installing:
 mysql80-community-release  noarch  el7-1    /mysql80-community-release-el7-1.noarch             31 k

Transaction Summary
======================================================================================================
Install  1 Package

Total size: 31 k
Installed size: 31 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : mysql80-community-release-el7-1.noarch                                             1/1
  Verifying  : mysql80-community-release-el7-1.noarch                                             1/1

Installed:
  mysql80-community-release.noarch 0:el7-1

Complete!

Now, you can install MySQL Server. There are several RPMs to choose from and which you need depends on which other features you need to use. A common set of RPMs can be installed as:

shell$ sudo yum install mysql-community-server mysql-community-libs \
       mysql-community-libs-compat mysql-community-common mysql-community-client
...

Note

If you have another MySQL installation, it will be upgraded to the latest release (at the time of writing 8.0.15).

On the first start, the data directory will be initialized:

shell$ sudo systemctl start mysqld

To keep a fresh installation secure, a random password has been set for the root user. This can be found from the MySQL error log:

shell$ sudo grep password /var/log/mysqld.log
2018-11-05T08:05:09.985857Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: KWNfuA!1r:PF

Use this password to connect to MySQL and update the password (please use a strong password):

shell$ mysql --user=root --password
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 19
Server version: 8.0.15 MySQL Community Server - GPL

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> ALTER USER root@localhost IDENTIFIED BY 'Kij0@jDi~Faf4';
Query OK, 0 rows affected (0.13 sec)

MySQL is now ready for use. Before continuing, I will show an example of installing MySQL on Microsoft Windows.

Microsoft Windows

On Microsoft Windows an easy way to install MySQL is to use the MySQL Installer. The installer can be downloaded from the MySQL download site. The MySQL Installer can be used to install most MySQL products. If you start MySQL Installer for the first time, you will be taken directly to the screen to choose the products to install; if you already have installed products, you will need to choose to add new products.

On the Select Products and Features screen, choose to install MySQL Server 8.0 (MySQL Installer will list the latest release from the list of available products):

Installing MySQL Server from MySQL Installer – The screen shot is for 8.0.13, but other than the version number, it is the same for 8.0.15.

Optionally, you can filter the list of products. Feel free to choose other products you want. MySQL Notifier can be useful for starting and stopping MySQL, if you do not plan to have MySQL running at all times. You can also install MySQL Connector/Python this way, however for this blog a different method will be used.

Follow the installation wizard. For this blog, the default choices will work, though during the configuration you may want to ensure Open Windows Firewall ports for network access is unchecked unless you need remote access.

Preparing MySQL Server

While MySQL is now ready to work with SQLAlchemy, you will likely want to do a few more preparation steps. Here creating the MySQL user and schema (database) used by your application will be covered.

An example of creating the user pyuser@localhost, giving it all privileges on the sqlalchemy schema, and creating the sqlalchemy schema is:

mysql> CREATE USER pyuser@localhost IDENTIFIED BY 'Py@pp4Demo';
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT ALL ON sqlalchemy.* TO pyuser@localhost;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE SCHEMA sqlalchemy;
Query OK, 1 row affected (0.01 sec)

This will allow the pyuser user to connect from the same host as the MySQL Server installation, authenticating with the password Py@pp4Demo.

Installing MySQL Connector/Python and SQLAlchemy

Both MySQL Connector/Python and SQLAlchemy can be installed in a platform independent way using the pip command. Since Python 2.7 is soon end of life, I will assume Python 3.6 in this blog. (MySQL Connector/Python 8.0.13 and later also supports Python 3.7.)

If you do not have Python 3.6 installed on Oracle Linux/RHEL/CentOS 7, you can easily install it, for example from the EPEL repository. Assuming you have configured the EPEL repository, the following steps install Python 3.6, enable pip, and update pip to the latest version:

shell$ yum install python36
shell$ python3.6 -m ensurepip
shell$ python3.6 -m pip install --upgrade pip

You can now use python3.6 to invoke Python 3.6. In the following, replace python with python3.6 if you have installed Python 3.6 in this way.

To install the latest MySQL Connector/Python release (currently 8.0.15):

PS> python -m pip install mysql-connector-python
Collecting mysql-connector-python
  Using cached https://files.pythonhosted.org/packages/31/45/ef8cf013918108f508a1a1bb5539abaff5f78f3a569f7fa30232967713c9/mysql_connector_python-8.0.15-cp36-cp36m-win_amd64.whl
Collecting protobuf>=3.0.0 (from mysql-connector-python)
  Downloading https://files.pythonhosted.org/packages/5d/5c/476f473c2efc0a8d9fd7185e6c08dcbd21c469698e2a80487fa054b8c5ba/protobuf-3.7.0-cp36-cp36m-win_amd64.whl (1.1MB)
    100% |████████████████████████████████| 1.1MB 6.6MB/s
Requirement already satisfied: six>=1.9 in c:\users\myuser\appdata\local\programs\python\python36\lib\site-packages (from protobuf>=3.0.0->mysql-connector-python) (1.11.0)
Requirement already satisfied: setuptools in c:\users\myuser\appdata\local\programs\python\python36\lib\site-packages (from protobuf>=3.0.0->mysql-connector-python) (28.8.0)
Installing collected packages: protobuf, mysql-connector-python
Successfully installed mysql-connector-python-8.0.15 protobuf-3.7.0

Similar for installing SQLAlchemy:

PS> python -m pip install SQLAlchemy
Collecting SQLAlchemy
  Downloading https://files.pythonhosted.org/packages/21/ed/7eb53305b43ca51774a458d7c292f3bc7664d7a9bbb5bac4149fa34756b9/SQLAlchemy-1.2.18.tar.gz (5.7MB)
    100% |████████████████████████████████| 5.7MB 3.3MB/s
Installing collected packages: SQLAlchemy
  Running setup.py install for SQLAlchemy ... done

That’s it. Now you are ready to use SQLAlchemy with MySQL Connector/Python 8 and MySQL Server 8.

SQLAlchemy Example

With MySQL and SQLAlchemy installed, it is straightforward to use SQLAlchemy in your application. Here, I will just show a very simple example based on the tutorial in the official SQLAlchemy manual.

The first step is to import the necessary parts of SQLAlchemy. In this example, only sqlalchemy itself and declarative_base from sqlalchemy.ext.declarative are required:

import sqlalchemy
from sqlalchemy.ext.declarative import declarative_base

The second step is to define MySQL as the engine and that you want to connect using MySQL Connector/Python:

# Define the MySQL engine using MySQL Connector/Python
engine = sqlalchemy.create_engine(
    'mysql+mysqlconnector://pyuser:Py@pp4Demo@localhost:3306/sqlalchemy',
    echo=True)

The definition of MySQL and MySQL Connector/Python happens in line 6: mysql defines that you are using MySQL as the database (the dialect), and mysqlconnector tells that you want to use MySQL Connector/Python as the driver. The rest of the line defines the connection options. In this case you have user:password@host:port/default_db. You can add more options if you need them. So, in this case the following options have been set:

  • User: pyuser
  • Password: Py@pp4Demo
  • Host: localhost
  • Port: 3306
  • Default database: sqlalchemy

Avoid

Never hard code the connection parameters into your application. Particularly the password is an absolute no-go. It is done here to keep the example simple, but doing so in an actual application makes deployments hard and poses a severe security issue.

See also the documentation of the engine configuration in the SQLAlchemy manual for more information.

The argument echo=True makes SQLAlchemy print each SQL statement it executes. This can be useful when testing.

The third step is to define and create a table – in this example the users table:

# Define and create the table
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String(length=50))
    fullname = sqlalchemy.Column(sqlalchemy.String(length=50))
    nickname = sqlalchemy.Column(sqlalchemy.String(length=50))

    def __repr__(self):
        return "<User(name='{0}', fullname='{1}', nickname='{2}')>".format(
            self.name, self.fullname, self.nickname)

Base.metadata.create_all(engine)

MySQL requires that you specify the maximum number of characters for varchar columns, which is the data type used when specifying sqlalchemy.String(). So, the length argument is passed. (The length argument is also the first argument to sqlalchemy.String(), so you do not need to specify explicitly that it is the length.)

The call to Base.metadata.create_all() tells SQLAlchemy to create the underlying database table, if it does not already exist. Otherwise the existing table will be used.

The fourth step is to add a user. This requires a session. Once the session is created, you can add the user to it:

# Create a session
Session = sqlalchemy.orm.sessionmaker()
Session.configure(bind=engine)
session = Session()

# Add a user
jwk_user = User(name='jesper', fullname='Jesper Wisborg Krogh',
                nickname='🐬')
session.add(jwk_user)
session.commit()

There are two things I would like you to pay attention to here. When defining the jwk_user in line 31, the nickname is set to the dolphin emoji. This emoji requires four bytes in UTF-8 (0xF09F90AC). In older versions of MySQL (5.7 and earlier), you would need to explicitly change the character set to handle four-byte UTF-8 characters. In MySQL 8, however, the four-byte variant of UTF-8 (utf8mb4) is the default, so the dolphin emoji will work out of the box.
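You can confirm the byte sequence from within MySQL itself; a quick check (any client with a utf8mb4 connection character set will do):

SELECT HEX(CONVERT('🐬' USING utf8mb4)); -- returns F09F90AC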

Tip

If you are interested in more information about the recommendation of which character set to use in MySQL, I wrote a blog about that last year: Which Character Set Should You Use in MySQL?

The other thing is that in order to persist the new user, you need to call session.commit(). The session works as a transaction here, so the changes will not be persisted until they are explicitly committed.
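On the MySQL side this maps to an ordinary transaction; roughly speaking, the session ends up issuing the equivalent of:

START TRANSACTION;
INSERT INTO users (name, fullname, nickname)
VALUES ('jesper', 'Jesper Wisborg Krogh', '🐬');
COMMIT;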

The fifth – and final – step is to query the data just saved:

# Query the user
our_user = session.query(User).filter_by(name='jesper').first()

print('\nOur User:')
print(our_user)
print('Nick name in hex: {0}'.format(our_user.nickname.encode('utf-8')))

The query finds the first user with the name set to “jesper”. Because the dolphin emoji tends not to be displayed correctly in many shells, the byte sequence in hex is also printed, so it can be confirmed it is indeed the dolphin emoji that was retrieved.

The output of the entire program is:

2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine {}
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine {}
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine {}
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine {}
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine {}
2019-03-03 14:02:57,265 INFO sqlalchemy.engine.base.Engine DESCRIBE `users`
2019-03-03 14:02:57,280 INFO sqlalchemy.engine.base.Engine {}
2019-03-03 14:02:57,280 INFO sqlalchemy.engine.base.Engine ROLLBACK
2019-03-03 14:02:57,280 INFO sqlalchemy.engine.base.Engine
CREATE TABLE users (
        id INTEGER NOT NULL AUTO_INCREMENT,
        name VARCHAR(50),
        fullname VARCHAR(50),
        nickname VARCHAR(50),
        PRIMARY KEY (id)
)
2019-03-03 14:02:57,280 INFO sqlalchemy.engine.base.Engine {}
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine COMMIT
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (%(name)s, %(fullname)s, %(nickname)s)
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine {'name': 'jesper', 'fullname': 'Jesper Wisborg Krogh', 'nickname': '🐬'}
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine COMMIT
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name = %(name_1)s
 LIMIT %(param_1)s
2019-03-03 14:02:57,312 INFO sqlalchemy.engine.base.Engine {'name_1': 'jesper', 'param_1': 1}

Our User:
<User(name='jesper', fullname='Jesper Wisborg Krogh', nickname='🐬')>
Nick name in hex: b'\xf0\x9f\x90\xac'

As it can be seen from the last two lines, the dolphin emoji was saved and retrieved correctly.

Complete Example Code

For completeness, here is the entire example program:

import sqlalchemy
from sqlalchemy.ext.declarative import declarative_base

# Define the MySQL engine using MySQL Connector/Python
engine = sqlalchemy.create_engine(
    'mysql+mysqlconnector://pyuser:Py@pp4Demo@localhost:3306/sqlalchemy',
    echo=True)

# Define and create the table
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String(length=50))
    fullname = sqlalchemy.Column(sqlalchemy.String(length=50))
    nickname = sqlalchemy.Column(sqlalchemy.String(length=50))

    def __repr__(self):
        return "<User(name='{0}', fullname='{1}', nickname='{2}')>".format(
            self.name, self.fullname, self.nickname)

Base.metadata.create_all(engine)

# Create a session
Session = sqlalchemy.orm.sessionmaker()
Session.configure(bind=engine)
session = Session()

# Add a user
jwk_user = User(name='jesper', fullname='Jesper Wisborg Krogh',
                nickname='🐬')
session.add(jwk_user)
session.commit()

# Query the user
our_user = session.query(User).filter_by(name='jesper').first()
print('\nOur User:')
print(our_user)
print('Nick name in hex: {0}'.format(our_user.nickname.encode('utf-8')))

Enjoy using MySQL, MySQL Connector/Python, and SQLAlchemy.

The Format for Timestamps in MySQL Logs

MySQL changed the timestamp format in the log files in MySQL 5.7. Since then, I have a few times seen questions about the new format and how to change the time zone that is used, most recently in a comment to my blog about log_slow_extra in 8.0.14, where the question was what the T and Z in the timestamp (for example 2019-01-31T07:24:06.100447Z) mean. In this blog, I will discuss the timestamp format and show you how you can change the time zone used.

Examples of the timestamps from the MySQL error log when restarting MySQL.
The MySQL 5.7 Change

In MySQL 5.7 it was decided to make two changes to the timestamps, affecting the format and the time zone:

  • The format was changed to use the ISO 8601 / RFC 3339 format: YYYY-MM-DDThh:mm:ss.uuuuuu plus a tail value to signify the time zone.
  • The time zone used for the timestamps defaults to UTC (in 5.6 it defaulted to system time).

Tip

A couple of useful free resources, if you want to know more about the ISO 8601 / RFC 3339 format, are the RFC specification and Wikipedia. The actual ISO 8601 specification can be purchased.

So there are two slightly different ways the timestamps can be displayed. The following two examples show the default, where the timestamp is in UTC, and a case where it is in UTC+11:00 (Australian Eastern Daylight Time):

2019-03-02T02:53:39.781592Z 0 [System] [MY-010116] [Server] C:\Program Files\MySQL\MySQL Server 8.0\bin\mysqld.exe (mysqld 8.0.15) starting as process 1516
2019-03-02T13:54:17.155810+11:00 0 [System] [MY-010116] [Server] C:\Program Files\MySQL\MySQL Server 8.0\bin\mysqld.exe (mysqld 8.0.15) starting as process 2812

Tip

In MySQL 8.0, you can also change the format itself, e.g. to JSON. See the MySQL reference manual on the error log format for more information.
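For instance, a sketch of one way to switch the error log to JSON output (the JSON sink ships as a loadable component, so it must be installed first):

INSTALL COMPONENT 'file://component_log_sink_json';
SET PERSIST log_error_services = 'log_filter_internal; log_sink_json';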

Changing the Time Zone

How do you change between the UTC and system time zones? You do that using the log_timestamps option which can take one of two values:

  • UTC: Use UTC as the time zone. This is the default.
  • SYSTEM: Use the same time zone as the operating system.

The option affects the error log, the slow query log, and the general query log. For the slow and general query logs, though, it only applies when logging to a file.
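For example, a minimal sketch of switching to the system time zone and back (SET PERSIST requires MySQL 8.0; on 5.7 use SET GLOBAL and persist the change in the configuration file):

SET PERSIST log_timestamps = 'SYSTEM';  -- use the operating system time zone
SET PERSIST log_timestamps = 'UTC';     -- revert to the default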

From XKCD comic 1179.

What Do T and Z Mean?

Back to the T and Z in the timestamps – what do they mean? The T is simply used as a separator between the date and time parts. Think of it as T for Time. The Z means Zulu – another word for the UTC time zone. That is:

  • T: Separator between the date and time parts.
  • Z: Zulu – the UTC time zone.
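If you keep the default UTC timestamps but want to read a particular entry in local time, CONVERT_TZ() can help. A quick sketch using explicit offsets (named time zones also work if the time zone tables are loaded):

SELECT CONVERT_TZ('2019-01-31 07:24:06.100447', '+00:00', '+11:00') AS local_time;
-- Expected: 2019-01-31 18:24:06.100447, the same instant expressed in UTC+11:00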

Thanks for reading.

Shinguz: MariaDB and MySQL consulting by plane

Since January 2019, FromDual has been trying to actively contribute a little bit against global warming too.

The best thing for the climate would be to NOT travel to the customer at all! For these cases we have our FromDual remote-DBA services for MariaDB and MySQL.

But sometimes customers want or need us on-site for our FromDual in-house trainings or our FromDual on-site consulting engagements. In these cases we try to travel by train. After walking or cycling, travelling by train is the most climate-friendly way to travel:


But some customers are located more than 7 to 8 hours away by train. For these customers we have to take the plane, which is not good for the climate at all. But at least we will compensate for our CO2 emissions via MyClimate.org:

Tags:  consulting training remote-dba climate mysql mariadb services

A Review of the New Analytic Window Functions in MySQL 8.0

Data is captured and stored for a variety of reasons. Hours beyond count (and even more budget) are invested in collecting, ingesting, structuring, validating, and ultimately storing data; to say that it is a valuable asset is an understatement. In this day and age it may, in fact, be our most precious commodity.

Some data is used strictly as an archive, perhaps to record or track events that happened in the past. But the other side of that coin is that historical data has value as the basis for future decisions and endeavors.

  • What day to have our sale on? (Planning for future sales based on how we did in the past.)
  • Which salesperson performed the best in quarter one? (Looking back, whom can we reward for their efforts?)
  • Which restaurant is frequented the most in the middle of July? (The travel season is upon us... Who can we sell our foodstuffs and goods to?)

You get the picture. Using data on hand is integral for any organization.

Many companies build, base, and provide services with data. They depend on it.

Several months back, depending on when you are reading this, I began walking for exercise, in earnest, to lose weight, get a handle on my health, and to seek a daily bit of solitude from this busy world we live in.

I used a mobile pedometer app to track my hikes, even considering which shoes I wore, as I have a tendency to be ultra-picky when it comes to footwear.

While this data is not nearly as important as that mentioned in those scenarios above, for me, a key element in learning anything, is using something I am interested in, can relate to, and understand.

Window Functions have been on my radar to explore for a long while now, so I thought to try my hand at a couple of them in this post. Since they are newly supported in MySQL 8 (visit this Severalnines blog I wrote about MySQL 8 upgrades and new additions, where I mention them briefly), that is the ecosystem I will use here. Be forewarned, I am not a window analytical function guru.

What is a MySQL Window Function?

The MySQL documentation defines them as such: "A window function performs an aggregate-like operation on a set of query rows. However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row."
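To make that definition concrete, here is a minimal sketch (using the hiking_stats table introduced in the next section) contrasting a plain aggregate with the same aggregate used over a window:

-- Aggregate: all rows collapse into a single result row
SELECT AVG(burned_calories) AS avg_calories
FROM hiking_stats;

-- Window function: every row is kept and the average is appended to each one
SELECT day_walked, burned_calories,
       AVG(burned_calories) OVER () AS avg_calories
FROM hiking_stats;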

Data Set and Setup for This Post

I store the captured data from my walks in this table:

mysql> DESC hiking_stats;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| day_walked      | date         | YES  |     | NULL    |       |
| burned_calories | decimal(4,1) | YES  |     | NULL    |       |
| distance_walked | decimal(4,2) | YES  |     | NULL    |       |
| time_walking    | time         | YES  |     | NULL    |       |
| pace            | decimal(2,1) | YES  |     | NULL    |       |
| shoes_worn      | text         | YES  |     | NULL    |       |
| trail_hiked     | text         | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+
7 rows in set (0.01 sec)

There is close to 90 days' worth of data here:

mysql> SELECT COUNT(*) FROM hiking_stats;
+----------+
| COUNT(*) |
+----------+
|       84 |
+----------+
1 row in set (0.00 sec)

I'll admit, I am finicky about my footwear, so let's determine which pair of shoes I favored most:

mysql> SELECT DISTINCT shoes_worn, COUNT(*)
    -> FROM hiking_stats
    -> GROUP BY shoes_worn;
+---------------------------------------+----------+
| shoes_worn                            | COUNT(*) |
+---------------------------------------+----------+
| New Balance Trail Runners-All Terrain |       30 |
| Oboz Sawtooth Low                     |       47 |
| Keen Koven WP(keen-dry)               |        6 |
| New Balance 510v2                     |        1 |
+---------------------------------------+----------+
4 rows in set (0.00 sec)

To provide a better, more manageable on-screen demonstration, I will limit the remaining query results to just those for the favorite pair of shoes I wore 47 times.

I also have a trail_hiked column, and since I was in 'ultra exercise mode' during this almost three-month period, I even counted calories while push-mowing the yard:

mysql> SELECT DISTINCT trail_hiked, COUNT(*)
    -> FROM hiking_stats
    -> GROUP BY trail_hiked;
+------------------------+----------+
| trail_hiked            | COUNT(*) |
+------------------------+----------+
| Yard Mowing            |       14 |
| Sandy Trail-Drive      |       20 |
| West Boundary          |       29 |
| House-Power Line Route |       10 |
| Tree Trail-extended    |       11 |
+------------------------+----------+
5 rows in set (0.01 sec)

Yet, to even further limit the data set, I will filter out those rows as well:

mysql> SELECT COUNT(*)
    -> FROM hiking_stats
    -> WHERE shoes_worn = 'Oboz Sawtooth Low'
    -> AND
    -> trail_hiked <> 'Yard Mowing';
+----------+
| COUNT(*) |
+----------+
|       40 |
+----------+
1 row in set (0.01 sec)

For the sake of simplicity and ease of use, I will create a VIEW of columns to work with:

mysql> CREATE VIEW vw_fav_shoe_stats AS
    -> (SELECT day_walked, burned_calories, distance_walked, time_walking, pace, trail_hiked
    -> FROM hiking_stats
    -> WHERE shoes_worn = 'Oboz Sawtooth Low'
    -> AND trail_hiked <> 'Yard Mowing');
Query OK, 0 rows affected (0.19 sec)

Leaving me with this set of data:

mysql> SELECT * FROM vw_fav_shoe_stats;
+------------+-----------------+-----------------+--------------+------+------------------------+
| day_walked | burned_calories | distance_walked | time_walking | pace | trail_hiked            |
+------------+-----------------+-----------------+--------------+------+------------------------+
| 2018-06-03 |           389.6 |            4.11 |     01:13:19 |  3.4 | Sandy Trail-Drive      |
| 2018-06-04 |           394.6 |            4.26 |     01:14:15 |  3.4 | Sandy Trail-Drive      |
| 2018-06-06 |           384.6 |            4.10 |     01:13:14 |  3.4 | Sandy Trail-Drive      |
| 2018-06-07 |           382.7 |            4.12 |     01:12:52 |  3.4 | Sandy Trail-Drive      |
| 2018-06-17 |           296.3 |            2.82 |     00:55:45 |  3.0 | West Boundary          |
| 2018-06-18 |           314.7 |            3.08 |     00:59:13 |  3.1 | West Boundary          |
| 2018-06-20 |           338.5 |            3.27 |     01:03:42 |  3.1 | West Boundary          |
| 2018-06-21 |           339.5 |            3.40 |     01:03:54 |  3.2 | West Boundary          |
| 2018-06-24 |           392.4 |            3.76 |     01:13:51 |  3.1 | House-Power Line Route |
| 2018-06-25 |           362.1 |            3.72 |     01:08:09 |  3.3 | West Boundary          |
| 2018-06-26 |           380.5 |            3.94 |     01:11:36 |  3.3 | West Boundary          |
| 2018-07-03 |           323.7 |            3.29 |     01:00:55 |  3.2 | West Boundary          |
| 2018-07-04 |           342.8 |            3.47 |     01:04:31 |  3.2 | West Boundary          |
| 2018-07-06 |           375.7 |            3.80 |     01:10:42 |  3.2 | West Boundary          |
| 2018-07-07 |           347.6 |            3.40 |     01:05:25 |  3.1 | Sandy Trail-Drive      |
| 2018-07-08 |           351.6 |            3.58 |     01:06:09 |  3.2 | West Boundary          |
| 2018-07-09 |           336.0 |            3.28 |     01:03:13 |  3.1 | West Boundary          |
| 2018-07-11 |           375.2 |            3.81 |     01:10:37 |  3.2 | West Boundary          |
| 2018-07-12 |           325.9 |            3.28 |     01:01:20 |  3.2 | West Boundary          |
| 2018-07-15 |           382.9 |            3.91 |     01:12:03 |  3.3 | House-Power Line Route |
| 2018-07-16 |           368.6 |            3.72 |     01:09:22 |  3.2 | West Boundary          |
| 2018-07-17 |           339.4 |            3.46 |     01:03:52 |  3.3 | West Boundary          |
| 2018-07-18 |           368.1 |            3.72 |     01:08:28 |  3.3 | West Boundary          |
| 2018-07-19 |           339.2 |            3.44 |     01:03:06 |  3.3 | West Boundary          |
| 2018-07-22 |           378.3 |            3.76 |     01:10:22 |  3.2 | West Boundary          |
| 2018-07-23 |           322.9 |            3.28 |     01:00:03 |  3.3 | West Boundary          |
| 2018-07-24 |           386.4 |            3.81 |     01:11:53 |  3.2 | West Boundary          |
| 2018-07-25 |           379.9 |            3.83 |     01:10:39 |  3.3 | West Boundary          |
| 2018-07-27 |           378.3 |            3.73 |     01:10:21 |  3.2 | West Boundary          |
| 2018-07-28 |           337.4 |            3.39 |     01:02:45 |  3.2 | Sandy Trail-Drive      |
| 2018-07-29 |           348.7 |            3.50 |     01:04:52 |  3.2 | West Boundary          |
| 2018-07-30 |           361.6 |            3.69 |     01:07:15 |  3.3 | West Boundary          |
| 2018-07-31 |           359.9 |            3.66 |     01:06:57 |  3.3 | West Boundary          |
| 2018-08-01 |           336.1 |            3.37 |     01:01:48 |  3.3 | West Boundary          |
| 2018-08-03 |           259.9 |            2.57 |     00:47:47 |  3.2 | West Boundary          |
| 2018-08-05 |           341.2 |            3.37 |     01:02:44 |  3.2 | West Boundary          |
| 2018-08-06 |           357.7 |            3.64 |     01:05:46 |  3.3 | West Boundary          |
| 2018-08-17 |           184.2 |            1.89 |     00:39:00 |  2.9 | Tree Trail-extended    |
| 2018-08-18 |           242.9 |            2.53 |     00:51:25 |  3.0 | Tree Trail-extended    |
| 2018-08-30 |           204.4 |            1.95 |     00:37:35 |  3.1 | House-Power Line Route |
+------------+-----------------+-----------------+--------------+------+------------------------+
40 rows in set (0.00 sec)

The first window function I will look at is ROW_NUMBER().

Suppose I want a result set ordered by the burned_calories column for the month of 'July'.

Of course, I can retrieve that data with this query:

mysql> SELECT day_walked, burned_calories, trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> ORDER BY burned_calories DESC;
+------------+-----------------+------------------------+
| day_walked | burned_calories | trail_hiked            |
+------------+-----------------+------------------------+
| 2018-07-24 |           386.4 | West Boundary          |
| 2018-07-15 |           382.9 | House-Power Line Route |
| 2018-07-25 |           379.9 | West Boundary          |
| 2018-07-22 |           378.3 | West Boundary          |
| 2018-07-27 |           378.3 | West Boundary          |
| 2018-07-06 |           375.7 | West Boundary          |
| 2018-07-11 |           375.2 | West Boundary          |
| 2018-07-16 |           368.6 | West Boundary          |
| 2018-07-18 |           368.1 | West Boundary          |
| 2018-07-30 |           361.6 | West Boundary          |
| 2018-07-31 |           359.9 | West Boundary          |
| 2018-07-08 |           351.6 | West Boundary          |
| 2018-07-29 |           348.7 | West Boundary          |
| 2018-07-07 |           347.6 | Sandy Trail-Drive      |
| 2018-07-04 |           342.8 | West Boundary          |
| 2018-07-17 |           339.4 | West Boundary          |
| 2018-07-19 |           339.2 | West Boundary          |
| 2018-07-28 |           337.4 | Sandy Trail-Drive      |
| 2018-07-09 |           336.0 | West Boundary          |
| 2018-07-12 |           325.9 | West Boundary          |
| 2018-07-03 |           323.7 | West Boundary          |
| 2018-07-23 |           322.9 | West Boundary          |
+------------+-----------------+------------------------+
22 rows in set (0.01 sec)

Yet, for whatever reason (maybe personal satisfaction), I want to award a ranking among the returned rows, beginning with 1 for the highest burned_calories count, all the way to (n) rows in the result set.

ROW_NUMBER() can handle this, no problem at all:

mysql> SELECT day_walked, burned_calories,
    -> ROW_NUMBER() OVER(ORDER BY burned_calories DESC)
    -> AS position, trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July';
+------------+-----------------+----------+------------------------+
| day_walked | burned_calories | position | trail_hiked            |
+------------+-----------------+----------+------------------------+
| 2018-07-24 |           386.4 |        1 | West Boundary          |
| 2018-07-15 |           382.9 |        2 | House-Power Line Route |
| 2018-07-25 |           379.9 |        3 | West Boundary          |
| 2018-07-22 |           378.3 |        4 | West Boundary          |
| 2018-07-27 |           378.3 |        5 | West Boundary          |
| 2018-07-06 |           375.7 |        6 | West Boundary          |
| 2018-07-11 |           375.2 |        7 | West Boundary          |
| 2018-07-16 |           368.6 |        8 | West Boundary          |
| 2018-07-18 |           368.1 |        9 | West Boundary          |
| 2018-07-30 |           361.6 |       10 | West Boundary          |
| 2018-07-31 |           359.9 |       11 | West Boundary          |
| 2018-07-08 |           351.6 |       12 | West Boundary          |
| 2018-07-29 |           348.7 |       13 | West Boundary          |
| 2018-07-07 |           347.6 |       14 | Sandy Trail-Drive      |
| 2018-07-04 |           342.8 |       15 | West Boundary          |
| 2018-07-17 |           339.4 |       16 | West Boundary          |
| 2018-07-19 |           339.2 |       17 | West Boundary          |
| 2018-07-28 |           337.4 |       18 | Sandy Trail-Drive      |
| 2018-07-09 |           336.0 |       19 | West Boundary          |
| 2018-07-12 |           325.9 |       20 | West Boundary          |
| 2018-07-03 |           323.7 |       21 | West Boundary          |
| 2018-07-23 |           322.9 |       22 | West Boundary          |
+------------+-----------------+----------+------------------------+
22 rows in set (0.00 sec)

You can see the row with the burned_calories amount of 386.4 has position 1, while the row with value 322.9 has 22, the lowest amount among the returned rows.

I'll use ROW_NUMBER() for something a bit more interesting as we progress. Only when I learned about it used in that context did I truly realize some of its real power.

Up next, let's visit the RANK() window function to provide a different sort of 'ranking' among the rows. We will still target the burned_calories column value. And, while RANK() is similar to ROW_NUMBER() in that both rank rows, it introduces a subtle difference in certain circumstances.

I will limit the number of rows even further by keeping only records from the month of 'July' and targeting a specific trail:

mysql> SELECT day_walked, burned_calories,
    -> RANK() OVER(ORDER BY burned_calories DESC) AS position,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> AND trail_hiked = 'West Boundary';
+------------+-----------------+----------+---------------+
| day_walked | burned_calories | position | trail_hiked   |
+------------+-----------------+----------+---------------+
| 2018-07-24 |           386.4 |        1 | West Boundary |
| 2018-07-25 |           379.9 |        2 | West Boundary |
| 2018-07-22 |           378.3 |        3 | West Boundary |
| 2018-07-27 |           378.3 |        3 | West Boundary |
| 2018-07-06 |           375.7 |        5 | West Boundary |
| 2018-07-11 |           375.2 |        6 | West Boundary |
| 2018-07-16 |           368.6 |        7 | West Boundary |
| 2018-07-18 |           368.1 |        8 | West Boundary |
| 2018-07-30 |           361.6 |        9 | West Boundary |
| 2018-07-31 |           359.9 |       10 | West Boundary |
| 2018-07-08 |           351.6 |       11 | West Boundary |
| 2018-07-29 |           348.7 |       12 | West Boundary |
| 2018-07-04 |           342.8 |       13 | West Boundary |
| 2018-07-17 |           339.4 |       14 | West Boundary |
| 2018-07-19 |           339.2 |       15 | West Boundary |
| 2018-07-09 |           336.0 |       16 | West Boundary |
| 2018-07-12 |           325.9 |       17 | West Boundary |
| 2018-07-03 |           323.7 |       18 | West Boundary |
| 2018-07-23 |           322.9 |       19 | West Boundary |
+------------+-----------------+----------+---------------+
19 rows in set (0.01 sec)

Notice anything odd here? Different from ROW_NUMBER()?

Check out the position values for the rows of '2018-07-22' and '2018-07-27'. They are tied at 3rd.

With good reason, since the burned_calories value of 378.3 is present in both rows.

How would ROW_NUMBER() rank them?

Let's find out:

mysql> SELECT day_walked, burned_calories,
    -> ROW_NUMBER() OVER(ORDER BY burned_calories DESC) AS position,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> AND trail_hiked = 'West Boundary';
+------------+-----------------+----------+---------------+
| day_walked | burned_calories | position | trail_hiked   |
+------------+-----------------+----------+---------------+
| 2018-07-24 |           386.4 |        1 | West Boundary |
| 2018-07-25 |           379.9 |        2 | West Boundary |
| 2018-07-22 |           378.3 |        3 | West Boundary |
| 2018-07-27 |           378.3 |        4 | West Boundary |
| 2018-07-06 |           375.7 |        5 | West Boundary |
| 2018-07-11 |           375.2 |        6 | West Boundary |
| 2018-07-16 |           368.6 |        7 | West Boundary |
| 2018-07-18 |           368.1 |        8 | West Boundary |
| 2018-07-30 |           361.6 |        9 | West Boundary |
| 2018-07-31 |           359.9 |       10 | West Boundary |
| 2018-07-08 |           351.6 |       11 | West Boundary |
| 2018-07-29 |           348.7 |       12 | West Boundary |
| 2018-07-04 |           342.8 |       13 | West Boundary |
| 2018-07-17 |           339.4 |       14 | West Boundary |
| 2018-07-19 |           339.2 |       15 | West Boundary |
| 2018-07-09 |           336.0 |       16 | West Boundary |
| 2018-07-12 |           325.9 |       17 | West Boundary |
| 2018-07-03 |           323.7 |       18 | West Boundary |
| 2018-07-23 |           322.9 |       19 | West Boundary |
+------------+-----------------+----------+---------------+
19 rows in set (0.06 sec)

Hmmm...

No ties in the position column numbering this time.

But, who gets precedence?

To my knowledge, for a predictable ordering you will likely have to determine it by some additional means within the query (e.g. the time_walking column in this case?).
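For instance, a sketch of one such tie-breaker, assuming time_walking is an acceptable secondary ordering:

SELECT day_walked, burned_calories,
       ROW_NUMBER() OVER(ORDER BY burned_calories DESC, time_walking ASC) AS position,
       trail_hiked
FROM vw_fav_shoe_stats
WHERE MONTHNAME(day_walked) = 'July'
AND trail_hiked = 'West Boundary';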

But we are not done yet with ranking options. Here is DENSE_RANK():

mysql> SELECT day_walked, burned_calories,
    -> DENSE_RANK() OVER(ORDER BY burned_calories DESC) AS position,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> AND trail_hiked = 'West Boundary';
+------------+-----------------+----------+---------------+
| day_walked | burned_calories | position | trail_hiked   |
+------------+-----------------+----------+---------------+
| 2018-07-24 |           386.4 |        1 | West Boundary |
| 2018-07-25 |           379.9 |        2 | West Boundary |
| 2018-07-22 |           378.3 |        3 | West Boundary |
| 2018-07-27 |           378.3 |        3 | West Boundary |
| 2018-07-06 |           375.7 |        4 | West Boundary |
| 2018-07-11 |           375.2 |        5 | West Boundary |
| 2018-07-16 |           368.6 |        6 | West Boundary |
| 2018-07-18 |           368.1 |        7 | West Boundary |
| 2018-07-30 |           361.6 |        8 | West Boundary |
| 2018-07-31 |           359.9 |        9 | West Boundary |
| 2018-07-08 |           351.6 |       10 | West Boundary |
| 2018-07-29 |           348.7 |       11 | West Boundary |
| 2018-07-04 |           342.8 |       12 | West Boundary |
| 2018-07-17 |           339.4 |       13 | West Boundary |
| 2018-07-19 |           339.2 |       14 | West Boundary |
| 2018-07-09 |           336.0 |       15 | West Boundary |
| 2018-07-12 |           325.9 |       16 | West Boundary |
| 2018-07-03 |           323.7 |       17 | West Boundary |
| 2018-07-23 |           322.9 |       18 | West Boundary |
+------------+-----------------+----------+---------------+
19 rows in set (0.00 sec)

The tie remains; however, the numbering differs in how the subsequent rows are counted, continuing through the remaining results.

Where RANK() continued the count with 5 after the tie, DENSE_RANK() picks up at the next number, which is 4 in this instance, since the tie happened at row 3.
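Since all three functions can appear in the same query, a quick sketch computing them side by side makes the difference easy to eyeball (the WINDOW clause simply avoids repeating the specification three times):

SELECT day_walked, burned_calories,
       ROW_NUMBER() OVER w AS row_num,
       RANK() OVER w AS rnk,
       DENSE_RANK() OVER w AS dense_rnk
FROM vw_fav_shoe_stats
WHERE MONTHNAME(day_walked) = 'July'
AND trail_hiked = 'West Boundary'
WINDOW w AS (ORDER BY burned_calories DESC);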

I'll be the first to admit, these various row ranking patterns are quite interesting, but, how can you use them for a meaningful result set?

A Bonus Thought

I have to give credit where credit is due. I learned a great deal about window functions from a wonderful series on YouTube, and one video in particular inspired me for this next example. Please keep in mind that although the examples in that series are demonstrated with a non-open-source database system (don't toss the digital rotten fruits and veggies at me), there is a ton to learn from the videos overall.

I see a pattern in most of the query results so far that I want to explore. I will not filter by any month or trail.

What I want to know is: on which consecutive days did I burn more than 350 calories? Better yet, which groups of those days?

Here is the base query I will start with and build on:

mysql> SELECT day_walked, burned_calories,
    -> ROW_NUMBER() OVER(ORDER BY day_walked ASC) AS positional_bound,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE burned_calories > 350;
+------------+-----------------+------------------+------------------------+
| day_walked | burned_calories | positional_bound | trail_hiked            |
+------------+-----------------+------------------+------------------------+
| 2018-06-03 |           389.6 |                1 | Sandy Trail-Drive      |
| 2018-06-04 |           394.6 |                2 | Sandy Trail-Drive      |
| 2018-06-06 |           384.6 |                3 | Sandy Trail-Drive      |
| 2018-06-07 |           382.7 |                4 | Sandy Trail-Drive      |
| 2018-06-24 |           392.4 |                5 | House-Power Line Route |
| 2018-06-25 |           362.1 |                6 | West Boundary          |
| 2018-06-26 |           380.5 |                7 | West Boundary          |
| 2018-07-06 |           375.7 |                8 | West Boundary          |
| 2018-07-08 |           351.6 |                9 | West Boundary          |
| 2018-07-11 |           375.2 |               10 | West Boundary          |
| 2018-07-15 |           382.9 |               11 | House-Power Line Route |
| 2018-07-16 |           368.6 |               12 | West Boundary          |
| 2018-07-18 |           368.1 |               13 | West Boundary          |
| 2018-07-22 |           378.3 |               14 | West Boundary          |
| 2018-07-24 |           386.4 |               15 | West Boundary          |
| 2018-07-25 |           379.9 |               16 | West Boundary          |
| 2018-07-27 |           378.3 |               17 | West Boundary          |
| 2018-07-30 |           361.6 |               18 | West Boundary          |
| 2018-07-31 |           359.9 |               19 | West Boundary          |
| 2018-08-06 |           357.7 |               20 | West Boundary          |
+------------+-----------------+------------------+------------------------+
20 rows in set (0.00 sec)

We've seen ROW_NUMBER() already; however, now it really comes into play.

To make this work (in MySQL at least) I had to use the DATE_SUB() function, since with this technique (often called 'gaps and islands') we are essentially subtracting a number, the value provided by ROW_NUMBER(), from the day_walked date column of the same row, which in turn provides a date itself via the calculation:

mysql> SELECT day_walked AS day_of_walk,
    -> DATE_SUB(day_walked, INTERVAL ROW_NUMBER() OVER(ORDER BY day_walked ASC) DAY) AS positional_bound,
    -> burned_calories,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE burned_calories > 350;
+-------------+------------------+-----------------+------------------------+
| day_of_walk | positional_bound | burned_calories | trail_hiked            |
+-------------+------------------+-----------------+------------------------+
| 2018-06-03  | 2018-06-02       |           389.6 | Sandy Trail-Drive      |
| 2018-06-04  | 2018-06-02       |           394.6 | Sandy Trail-Drive      |
| 2018-06-06  | 2018-06-03       |           384.6 | Sandy Trail-Drive      |
| 2018-06-07  | 2018-06-03       |           382.7 | Sandy Trail-Drive      |
| 2018-06-24  | 2018-06-19       |           392.4 | House-Power Line Route |
| 2018-06-25  | 2018-06-19       |           362.1 | West Boundary          |
| 2018-06-26  | 2018-06-19       |           380.5 | West Boundary          |
| 2018-07-06  | 2018-06-28       |           375.7 | West Boundary          |
| 2018-07-08  | 2018-06-29       |           351.6 | West Boundary          |
| 2018-07-11  | 2018-07-01       |           375.2 | West Boundary          |
| 2018-07-15  | 2018-07-04       |           382.9 | House-Power Line Route |
| 2018-07-16  | 2018-07-04       |           368.6 | West Boundary          |
| 2018-07-18  | 2018-07-05       |           368.1 | West Boundary          |
| 2018-07-22  | 2018-07-08       |           378.3 | West Boundary          |
| 2018-07-24  | 2018-07-09       |           386.4 | West Boundary          |
| 2018-07-25  | 2018-07-09       |           379.9 | West Boundary          |
| 2018-07-27  | 2018-07-10       |           378.3 | West Boundary          |
| 2018-07-30  | 2018-07-12       |           361.6 | West Boundary          |
| 2018-07-31  | 2018-07-12       |           359.9 | West Boundary          |
| 2018-08-06  | 2018-07-17       |           357.7 | West Boundary          |
+-------------+------------------+-----------------+------------------------+
20 rows in set (0.00 sec)

However, without DATE_SUB(), you wind up with this (or at least I did):

mysql> SELECT day_walked AS day_of_walk,
    -> day_walked - ROW_NUMBER() OVER(ORDER BY day_walked ASC) AS positional_bound,
    -> burned_calories,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE burned_calories > 350;
+-------------+------------------+-----------------+------------------------+
| day_of_walk | positional_bound | burned_calories | trail_hiked            |
+-------------+------------------+-----------------+------------------------+
| 2018-06-03  |         20180602 |           389.6 | Sandy Trail-Drive      |
| 2018-06-04  |         20180602 |           394.6 | Sandy Trail-Drive      |
| 2018-06-06  |         20180603 |           384.6 | Sandy Trail-Drive      |
| 2018-06-07  |         20180603 |           382.7 | Sandy Trail-Drive      |
| 2018-06-24  |         20180619 |           392.4 | House-Power Line Route |
| 2018-06-25  |         20180619 |           362.1 | West Boundary          |
| 2018-06-26  |         20180619 |           380.5 | West Boundary          |
| 2018-07-06  |         20180698 |           375.7 | West Boundary          |
| 2018-07-08  |         20180699 |           351.6 | West Boundary          |
| 2018-07-11  |         20180701 |           375.2 | West Boundary          |
| 2018-07-15  |         20180704 |           382.9 | House-Power Line Route |
| 2018-07-16  |         20180704 |           368.6 | West Boundary          |
| 2018-07-18  |         20180705 |           368.1 | West Boundary          |
| 2018-07-22  |         20180708 |           378.3 | West Boundary          |
| 2018-07-24  |         20180709 |           386.4 | West Boundary          |
| 2018-07-25  |         20180709 |           379.9 | West Boundary          |
| 2018-07-27  |         20180710 |           378.3 | West Boundary          |
| 2018-07-30  |         20180712 |           361.6 | West Boundary          |
| 2018-07-31  |         20180712 |           359.9 | West Boundary          |
| 2018-08-06  |         20180786 |           357.7 | West Boundary          |
+-------------+------------------+-----------------+------------------------+
20 rows in set (0.04 sec)

Hey, that doesn't look so bad really.

What gives?

Eh, the row with a positional_bound value of '20180698'...

Wait a minute, this is supposed to calculate a date value by subtracting the number ROW_NUMBER() provides from the day_of_walk column.

Correct.

I don't know about you, but I am not aware of a month with 98 days!

But, if there is one, bring on the extra paychecks!
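The reason: the bare - operator coerces the date into a number such as 20180706, so the subtraction has no calendar awareness and happily crosses month boundaries. A two-line sketch of the difference:

SELECT DATE('2018-07-06') - 8 AS numeric_subtraction,              -- 20180698
       DATE_SUB('2018-07-06', INTERVAL 8 DAY) AS date_subtraction; -- 2018-06-28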

All fun aside, this obviously was incorrect and prompted me to (eventually) use DATE_SUB(), which provides a correct result set, allowing me to run this query:

mysql> SELECT MIN(t.day_of_walk),
    -> MAX(t.day_of_walk),
    -> COUNT(*) AS num_of_hikes
    -> FROM (SELECT day_walked AS day_of_walk,
    -> DATE_SUB(day_walked, INTERVAL ROW_NUMBER() OVER(ORDER BY day_walked ASC) DAY) AS positional_bound
    -> FROM vw_fav_shoe_stats
    -> WHERE burned_calories > 350) AS t
    -> GROUP BY t.positional_bound
    -> ORDER BY 1;
+--------------------+--------------------+--------------+
| MIN(t.day_of_walk) | MAX(t.day_of_walk) | num_of_hikes |
+--------------------+--------------------+--------------+
| 2018-06-03         | 2018-06-04         |            2 |
| 2018-06-06         | 2018-06-07         |            2 |
| 2018-06-24         | 2018-06-26         |            3 |
| 2018-07-06         | 2018-07-06         |            1 |
| 2018-07-08         | 2018-07-08         |            1 |
| 2018-07-11         | 2018-07-11         |            1 |
| 2018-07-15         | 2018-07-16         |            2 |
| 2018-07-18         | 2018-07-18         |            1 |
| 2018-07-22         | 2018-07-22         |            1 |
| 2018-07-24         | 2018-07-25         |            2 |
| 2018-07-27         | 2018-07-27         |            1 |
| 2018-07-30         | 2018-07-31         |            2 |
| 2018-08-06         | 2018-08-06         |            1 |
+--------------------+--------------------+--------------+
13 rows in set (0.12 sec)

Basically, I have wrapped the result set provided by that analytical query in a derived table, and queried it for a start and end date along with a count of what I have labeled num_of_hikes, then grouped on the positional_bound column, ultimately providing groups of consecutive days where I burned more than 350 calories.
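As a side note, since MySQL 8.0 also supports common table expressions, the same query can be phrased with a CTE instead of an inline derived table, which some may find easier to read (a sketch equivalent to the query above):

WITH t AS (
    SELECT day_walked AS day_of_walk,
           DATE_SUB(day_walked, INTERVAL ROW_NUMBER() OVER(ORDER BY day_walked ASC) DAY) AS positional_bound
    FROM vw_fav_shoe_stats
    WHERE burned_calories > 350
)
SELECT MIN(t.day_of_walk), MAX(t.day_of_walk), COUNT(*) AS num_of_hikes
FROM t
GROUP BY t.positional_bound
ORDER BY 1;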

You can see that the date range of 2018-06-24 to 2018-06-26 resulted in 3 consecutive days meeting the calorie-burned criterion of 350 from the WHERE clause.

Not too bad if I do say so myself, but definitely a record I want to try and best!

Conclusion

Window functions are in a world and league of their own. I have not even scratched the surface, having covered only three of them in a high-level, introductory, and perhaps trivial sense. However, hopefully through this post you find that you can query for quite interesting and potentially insightful data with a bare-minimal use of them.

Thank you for reading.

Tags:  MySQL analytics window functions

Percona XtraDB Cluster 5.6.43-28.32 Is Now Available

Percona is glad to announce the release of Percona XtraDB Cluster 5.6.43-28.32 on February 28, 2019. Binaries are available from the downloads section or from our software repositories.

This release of Percona XtraDB Cluster includes the support of Ubuntu 18.10 (Cosmic Cuttlefish). Percona XtraDB Cluster 5.6.43-28.32 is now the current release.

All Percona software is open-source and free.

Bugs Fixed
  • PXC-2388: In some cases, DROP FUNCTION function_name was not replicated.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona XtraDB Cluster 5.7.25-31.35 Is Now Available

Percona is glad to announce the release of Percona XtraDB Cluster 5.7.25-31.35 on February 28, 2019. Binaries are available from the downloads section or from our software repositories.

This release of Percona XtraDB Cluster includes the support of Ubuntu 18.10 (Cosmic Cuttlefish). Percona XtraDB Cluster 5.7.25-31.35 is now the current release.

All Percona software is open-source and free.

Bugs Fixed
  • PXC-2346: mysqld could crash when executing mysqldump --single-transaction while the binary log is disabled. This problem was also reported in PXC-1711, PXC-2371, and PXC-2419.
  • PXC-2388: In some cases, DROP FUNCTION function_name was not replicated.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

How to Install LEMP on Ubuntu 18.04

The LEMP stack is a web development platform for hosting static and dynamic websites. It is a combination of free and open-source software including Linux, Nginx, MySQL, and PHP. It is an alternative to the highly popular LAMP stack, the only difference being having Nginx as a web server instead of Apache. Requirements: For the […]

How to Install Cachet Status Page System on CentOS 7

Cachet is a beautiful and powerful open source status page system written in PHP that allows you to better communicate downtime and system failures to your customers, teams, and shareholders. In this tutorial, we will install the Cachet status page system using PHP, Nginx, MySQL, and Composer on the CentOS 7 system.

Giving back to the Community -- ChickTech Austin

I am doing something this weekend that is way out of my comfort zone, and I want to encourage you to do something similar. On Saturday I will be teaching a pop-up workshop for ChickTech Austin titled An Introduction to Databases with MySQL.

Why is this out of my comfort zone? I regularly speak to computer professionals about databases and programming and have done so for years. But this Saturday the class is made up of young women between the ages of fourteen and eighteen. For us oldsters, the class is made up of people half the age of MySQL, Python, and Java. Their parents are probably younger than Structured Query Language! But these young folks are going to be inheriting our code bases, built on technologies used today and integrated with future innovations.

Another point of reference: the Hudson continuous integration tool came out fourteen years ago, and it seems like the CI tools world has really blossomed since. That is the entire life span of some folks in the class.

Teaching up-and-coming developers how to use Structured Query Language and other long-lived technologies is vital. These cogs of our infrastructure have lasted because they provide utility and fulfill a needed function. SQL may have many oddities, but it has lasted so very long because it is so very useful. Yet fewer developers each year seem to get any formal training in SQL, sets, relational calculus, symbolic logic, or the basics of relational databases, while the cries for help on sites like Quora and Stack Overflow for relatively simple SQL and database questions seem to grow exponentially.

What I am asking (or pleading with) you to do is pass on your knowledge of the basics to others. You should be able to speak about something basic you do on a regular basis to a novice for at least five minutes. Find a local user group, Meetup, or other organization and offer them a short presentation. Can't find one? Start one! Feynman said that teaching really sharpens your own skill set.

Real novices need very clear examples. Saying "Oh, it is like a Generator in PHP" is not going to help them, frame-of-reference-wise. Take something you do every day (use the vi editor, explain a query, use CSS) and write a very simple guide and/or cheat sheet. There is an audience for your information.

Or contact local groups and ask what subjects they would want as a presentation. A simple 'How I set up PHP with Apache' covering the steps you took and what you discovered along the way may be old hat to you, but there are folks stumbling around who would welcome your guidance. Organizers of meetups and user groups are dying for presentations, and even a five-to-ten-minute talk is welcome. Ask your local groups what they would like to see, and if you do not know the subject, plunge in and build your own skills. Nobody is expecting perfection, and often talking about how to dig yourself out of the potholes you smacked into is the real value of your presentation.

I wish there were a formalized way for anyone with X years of experience to share their knowledge, but there is not. So I am asking you to get up off your backside, find a group near you, and offer your expertise. Think of this as meta-documentation for the future generations who will have to suffer with your code and systems long after you have logged off for the last time.

And if you want to talk about an introduction to databases, I will send you my materials and help you prep for the presentation.

How to concatenate strings in MySQL and MariaDB

Tower Bridge concatenates two parts of London, and it looks like the || operator

The standard way to concatenate strings in MySQL is:

mysql> SET @var1 := 'Planet ';
Query OK, 0 rows affected (0.00 sec)

mysql> SET @var2 := 'Earth';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT CONCAT(@var1, @var2);
+----------------------+
| CONCAT(@var1, @var2) |
+----------------------+
| Planet Earth         |
+----------------------+

All examples on this page will work without changes in MariaDB.

Dealing with NULL

But what if one of the variables is NULL? This is the case, for example, if a variable was never set. Well, in that case CONCAT() will return NULL.

Normally this is not the desired behaviour. Usually we want NULL to be treated as an empty string instead. We can use CONCAT_WS(). This function adds a separator between the other arguments, which is useful, for example, to put a comma between list items or spaces in a human-readable message. If we don't need a separator, we can just use an empty string as the first argument.

mysql> SELECT CONCAT(first_name, last_name) FROM employee;
+-------------------------------+
| CONCAT(first_name, last_name) |
+-------------------------------+
| NULL                          |
+-------------------------------+
1 row in set (0.00 sec)

mysql> SELECT CONCAT_WS('', first_name, last_name) FROM employee;
+--------------------------------------+
| CONCAT_WS('', first_name, last_name) |
+--------------------------------------+
| Doe                                  |
+--------------------------------------+
1 row in set (0.00 sec)

We can consider CONCAT_WS() a NULL-safe concatenator.
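It also skips NULL arguments after the separator rather than propagating them, which makes it handy for building comma-separated lists. A quick sketch with made-up values:

SELECT CONCAT_WS(', ', 'Athos', NULL, 'Aramis') AS musketeers;
-- Returns: 'Athos, Aramis'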

But we can also replace NULLs with some default values using IFNULL():

mysql> SELECT CONCAT_WS(' ', IFNULL(first_name, ''), last_name)
    -> FROM employee;
+---------------------------------------------------+
| CONCAT_WS(' ', IFNULL(first_name, ''), last_name) |
+---------------------------------------------------+
|  Doe                                              |
+---------------------------------------------------+

Oracle compatibility

For Oracle compatibility, MySQL also allows you to use || as a concatenation operator, after setting sql_mode to PIPES_AS_CONCAT:

mysql> SET sql_mode := 'PIPES_AS_CONCAT';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'A' || 'B', 'A' || NULL;
+------------+-------------+
| 'A' || 'B' | 'A' || NULL |
+------------+-------------+
| AB         | NULL        |
+------------+-------------+

Note that this behaviour is different from Oracle's. In Oracle, NULL sometimes means NULL, but in some contexts it is treated as an empty string; Oracle's concatenation operator only returns NULL if both operands are NULL. More info in the Oracle documentation: Nulls and Concatenation Operator.

It is also worth noting that, if we do this, we cannot use || as an OR operator, and its operands are always treated as strings:

mysql> SELECT 1 || 0;
+--------+
| 1 || 0 |
+--------+
| 10     |
+--------+

If you want to use this operator in a stored procedure or a trigger, just be sure to set the proper sql_mode:

SET sql_mode := 'PIPES_AS_CONCAT';
CREATE PROCEDURE procedure_name()
...

In fact, stored procedures and triggers run with the sql_mode that was in effect at the time of their creation.
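A complete sketch of that behaviour, with a hypothetical procedure name:

SET sql_mode := 'PIPES_AS_CONCAT';

CREATE PROCEDURE hello_concat()
SELECT 'Hello' || ', ' || 'world!' AS greeting;

-- The procedure captured PIPES_AS_CONCAT at creation time,
-- so it keeps concatenating even after the session mode changes:
SET sql_mode := '';
CALL hello_concat();  -- still returns 'Hello, world!'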

Toodle pip,
Federico

Photo credit: (C) Graham Hogg and licensed for reuse under this Creative Commons Licence


The post How to concatenate strings in MySQL and MariaDB appeared first on Federico Razzoli.
