Planet MySQL

Percona Live Europe Social

One for the road…

The social events at Percona Live Europe provide the community with more time to catch up with old friends and make new contacts. The formal sessions provided lots of opportunities for exchanging notes, experiences and ideas. Lunches and coffee breaks proved to be busy too. Even so, what’s better than chilling out over a beer or two (we were in Dublin after all) and enjoying the city nightlife in good company?

Percona Live Europe made it easy for us to get together each evening. A welcome reception (after tutorials) at Sinnott’s Pub in the heart of the city hosted a lively crowd. The Community Dinner at the Mercantile Bar, another lively city center hostelry, was a sell-out. Our closing reception was held at the conference venue, which had proven to be an excellent base.

Many delegates took the chance to enjoy the best of Dublin’s hospitality late into the night. It’s a credit to their stamina – and the fantastic conference agenda – that the opening keynotes on both Tuesday and Wednesday were very well attended.

In case you think we might have been prioritizing the Guinness, though, there was the little matter of the lightning talks at the Community Dinner. Seven community-minded generous souls gave up some of their valuable socializing time to share insights into matters open source. Thank you again to Renato Losio of Funambol, Anirban Rahut of Facebook, Federico Razzoli of Catawiki, Dana Van Aken of Carnegie Mellon University, Toshaan Bharvani of VanTosh, Balys Kriksciunas of Hostinger International and Vishal Loel of Lazada.

More about the lightning talks can be seen on the Percona Live Europe website.

Many of the conference treats – coffee, cakes, community dinner – are sponsored and thanks are due once more to our sponsors who helped make Percona Live Europe the worthwhile, enjoyable event that it was.

And so Percona Live Europe drew to a close. Delegates from 43 countries headed home armed with new knowledge, new ideas and new friends. I’ve put together this video to give a taste of the Percona Live social meetups. Tempted to join us in 2018?

Sláinte!

ClickHouse MySQL Silicon Valley Meetup Wednesday, October 25 at Uber Engineering with Percona’s CTO Vadim Tkachenko

I will be presenting at the ClickHouse MySQL Silicon Valley Meetup on Wednesday, October 25, 2017, at 6:30 PM.

ClickHouse is a real-time analytical database system. Even though it’s only celebrating one year as open source software, it has already proved itself ready for serious workloads. We will talk about ClickHouse in general, some internals and why it is so fast. ClickHouse works in conjunction with MySQL – traditionally weak for analytical workloads – and this presentation demonstrates how to make the two systems work together.

My talk will cover how we can improve the experience with real-time analytics using ClickHouse, and how we can integrate ClickHouse with MySQL.
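To give a flavor of the integration part: recent ClickHouse versions can query MySQL tables directly via the mysql() table function, which is one simple way to combine the two systems. A minimal sketch (host, schema, table and credentials below are placeholders, not anything from the talk):

-- Hypothetical example: aggregate rows living in MySQL from the ClickHouse side.
SELECT status, count(*) AS orders
FROM mysql('mysql-host:3306', 'shop', 'orders', 'ro_user', 'ro_password')
GROUP BY status
ORDER BY orders DESC;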

I want to thank our friends at Uber Engineering who agreed to host this event.

Please join us here: https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/243887397/.

Vadim Tkachenko, Percona CTO

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks.

Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products.

He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High Performance Group within the official MySQL support team.

2017 MySQL Community Team Award

The 2017 MySQL Community Team Award was presented last night at the MySQL Community Reception. As part of this event, the MySQL Community Team made a presentation to Alex Rubin of Percona for solving a bug that was reported over a dozen years before. The initial bug report, filed September 12, 2002, was MySQL Does Not Make Toast, but Alex demonstrated how to solve the problem in Fixing MySQL Bug#2: now MySQL makes toast!. For this contribution and many years of long, hard work in the MySQL Community, the MySQL Community Team is providing this award to recognize this achievement (and we spent a surprisingly large percentage of the team budget on this award).

MySQL Performance : 2.1M QPS on 8.0-rc

The first release candidate of MySQL 8.0 is here, and I'm happy to share a few performance stories about it. This article will be about the "most simple" one -- our in-memory Read-Only performance ;-))

However, the test workload was used here for two reasons: to show the Read-Only improvements coming with MySQL 8.0, and to validate the New Sysbench itself.


Going ahead to the second point, the main worry about the New Sysbench was about its LUA overhead (the previous version 0.5 was running slower than the old 0.4 due to LUA) -- long story short, I can confirm now that the New Sysbench is running as fast as the oldest "most lightweight" Sysbench binary I have in use ! so, KUDOS Alex !!! ;-))

While regarding the improvements coming with MySQL 8.0 on Read-Only workloads I'd mention :
  • several "overheads" were fixed
  • the most notable one is related to UTF8, of course
  • however, even latin1 related functions were improved a little bit
  • but this was only about "overheads", and nothing about "scalability"
  • because the main "scalability" gap was already made with MySQL 5.7 two years ago ;-))
  • so, our main merit with MySQL 8.0 here will be rather NOT TO LOSE the already obtained gain !
  • (agreed, it sounds very odd, but if you just look at the list of all the new features coming with 8.0 you can imagine our code path is not going to be shorter, right ? ;-))
  • so the fair test here will be to compare 8.0 vs 5.7 and 5.6 with latin1 encoding
  • (for UTF8 the winner is 8.0 and from very far, which you already know)

The most "sensible" RO workload in Sysbench is Point-Selects, so here is my test scenario:
  • workload : New Sysbench RO point-selects
  • data volume : 8 tables of 10M rows each
  • encoding : latin1
  • user load levels : 1, 2, 4, .. 1024
  • engines : MySQL 8.0, MySQL 5.7, MySQL 5.6
  • server : 96cores-HT 4CPU sockets 2.2Ghz (Broadwell), OL7.3

and here is the result :


Observations :
  • 2.1M SQL Query/sec for MySQL 8.0 -- our current new Max QPS record obtained in MySQL history !
  • which is great, no doubt !
  • however, there is a small but clearly visible QPS regression on lower load levels..
  • which is not really cool (even if it could be easily explained by the increased code path + new DD + etc.. etc..)
  • well, I'm adding it to my list of "low load" performance issues and will investigate it later..

So far, the 2.1M QPS result was obtained on the "old" Broadwell CPU, and there are no bottlenecks observed (only potential overheads), so for the moment I have no idea what to expect on the same workload on a 4CPU-socket Skylake -- I will share the results once I have such a box in my hands ;-))

Then, many people are constantly asking me how to reproduce the presented results, so the following are all the exact details you may need:

1) install and start your MySQL 8.0 instance (rc or current labs release)

2) the config settings I've used are here :

[mysqld]
# general
max_connections=4000
table_open_cache=8000
table_open_cache_instances=16
back_log=1500
default_password_lifetime=0
ssl=0
performance_schema=OFF
max_prepared_stmt_count=128000
skip_log_bin=1
character_set_server=latin1
collation_server=latin1_swedish_ci
transaction_isolation=REPEATABLE-READ
# files
innodb_file_per_table
innodb_log_file_size=1024M
innodb_log_files_in_group=3
innodb_open_files=4000
# buffers
innodb_buffer_pool_size=32000M
innodb_buffer_pool_instances=16
innodb_log_buffer_size=64M
# tune
innodb_doublewrite=0
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=1
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
join_buffer_size=32K
sort_buffer_size=32K
innodb_use_native_aio=1
innodb_stats_persistent=1
innodb_spin_wait_delay=6
innodb_max_purge_lag_delay=300000
innodb_max_purge_lag=0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_checksum_algorithm=none
innodb_io_capacity=4000
innodb_io_capacity_max=20000
innodb_lru_scan_depth=9000
innodb_change_buffering=none
innodb_read_only=0
innodb_page_cleaners=4
innodb_undo_log_truncate=off
# perf special
innodb_adaptive_flushing=1
innodb_flush_neighbors=0
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_purge_threads=4
innodb_adaptive_hash_index=0
# monitoring
innodb_monitor_enable='%'
NOTE: yes, I know, PFS is turned OFF, the results comparing PFS=on/off are coming later, no worry ;-))

4) download the tarball with my scripts + sysbench binary + my.conf, and then untar it into "/" directory

5) this will create /BMK directory with all the stuff inside, so edit the ".bench" file to provide the account details to connect to your MySQL instance (user, passwd, host, port)

6) create "sysbench" database
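For example, from the mysql client:

mysql> CREATE DATABASE sysbench;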

7) run the load data script :

$ cd /BMK
$ sh sb_exec/sb11-Prepare_10M_8tab-InnoDB.sh

this will load 8 tables with 10M rows each

8) run ANALYZE on each table within sysbench database to be sure your stats for these tables are up-to-date
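For example, assuming the default sysbench table names sbtest1 .. sbtest8 (an assumption -- check your schema), a single statement covers all of them:

mysql> ANALYZE TABLE sysbench.sbtest1, sysbench.sbtest2, sysbench.sbtest3, sysbench.sbtest4,
       sysbench.sbtest5, sysbench.sbtest6, sysbench.sbtest7, sysbench.sbtest8;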

9) run the test :

$ cd /BMK
$ for nn in 1 2 4 8 16 32 64 128 256 512 1024
  do
    sh sb_exec/sb11-OLTP_RO_10M_8tab-uniform-ps-p_sel1-notrx.sh $nn 300
    sleep 60
  done
10) enjoy ;-))

stay tuned, more to come..

Rgds,
-Dimitri

MyRocks Metrics Now in PMM 1.3.0

One of the most exciting features shipped in the Percona Monitoring and Management 1.3.0 (PMM) release is support for MyRocks metrics via a new Metrics Monitor dashboard titled MySQL MyRocks Metrics. The support in PMM follows the recent Percona Server for MySQL release 5.7.19 from September 6, where Percona delivered an EXPERIMENTAL version of MyRocks for non-Production usage.

The MyRocks storage engine from Facebook is based on RocksDB, a persistent key-value store for fast storage environments. MyRocks is optimized for fast storage and combines outstanding space and write efficiency with acceptable read performance. As a result, MyRocks has the following advantages compared to other storage engines (if your workload uses fast storage, such as SSD):

  • Requires less storage space
  • Provides more storage endurance
  • Ensures better IO capacity
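For those who want to experiment right away, defining a MyRocks table only requires the ENGINE clause once the storage engine is installed and enabled; a minimal sketch with a made-up table (see the dashboard graphs below for what to watch while loading it):

-- Hypothetical table; any columns work, MyRocks just needs ENGINE=ROCKSDB.
CREATE TABLE rocks_demo (
  id BIGINT NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=ROCKSDB;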
MyRocks Database Operations

This graph helps you visualize MyRocks database operations for the Next and Seek attributes:

MyRocks Cache Activity

We also have a graph to help you visualize the count of Hits and Misses on the MyRocks cache:

MyRocks Cache Data Bytes Read/Write

Finally, another important MyRocks graph will help you understand the volume of data read and written to the MyRocks cache:

Please note that the MyRocks storage engine is not suitable (yet) for production workloads, but if you are testing this technology take a moment to install PMM in order to take advantage of our new MySQL MyRocks Metrics dashboard!

In PMM, you can view the metrics provided by the information schema, as well as various data reported by the RocksDB engine status for your MySQL database instance.
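If you want to look at the raw sources behind the dashboard yourself, the MyRocks plugin exposes engine status and statistics tables; the exact names below are as shipped in recent MyRocks builds, so treat them as an assumption for your version:

SHOW ENGINE ROCKSDB STATUS;
SELECT * FROM INFORMATION_SCHEMA.ROCKSDB_DBSTATS;
SELECT * FROM INFORMATION_SCHEMA.ROCKSDB_CFSTATS;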

Webinar October 4, 2017: Databases in the Hosted Cloud

Join Percona’s Chief Evangelist, Colin Charles as he presents Databases in the Hosted Cloud on Wednesday, October 4, 2017, at 7:00 am PDT / 10:00 am EDT (UTC-7).

Register Now

Today you can use hosted MySQL/MariaDB/Percona Server for MySQL/PostgreSQL in several “cloud providers” as a database as a service (DBaaS). Learn the differences, the access methods and the level of control you have for the various public databases in the hosted cloud offerings:

  • Amazon RDS including Aurora
  • Google Cloud SQL
  • Rackspace OpenStack DBaaS
  • Oracle Cloud’s MySQL Service

The administration tools and ideologies behind each are completely different, and you are in a “locked-down” environment. Some considerations include:

  • Different backup strategies
  • Planning for multiple data centers for availability
  • Where do you host your application?
  • How do you get the most performance out of the solution?
  • What does this all cost?
  • Monitoring

Growth topics include:

  • How do you move from one DBaaS to another?
  • How do you move from a DBaaS to your own hosted platform?

Register for the webinar here.

Colin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team for MariaDB Server in 2009, has worked on MySQL since 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He’s well known within many open source communities and has spoken on the conference circuit.

 

MySQL Shell with Command Completion

CLI interfaces are usually boring, ASCII-ish functional interfaces that are about as exciting as paint drying or the end user license agreement for your rice steamer. They get the job done, but with no excitement. The new MySQL Shell (mysqlsh instead of mysql at the command line) is a great new tool, but like its predecessor it is not exactly visually dynamic.

Until Now.

At labs.mysql.com there is a new version of the MySQL Shell that adds some new functionality and some visual enticements. I was in a session at Oracle OpenWorld and was impressed by not only the visually stunning upgrade but by the fact that we now get command auto-completion!

You can log in as you did with the old shell, but then you see that little bit of information in the command line. Pretty, for a CLI.

Security too

So connect to a database and then check your status. And pow! You're connected with SSL. I am running the MySQL 8.0.3 RC and the labs release of the shell, and I did not set up SSL/TLS, but there it is.

Command Completion

And there is even command completion. Few of us can remember all of MySQL's syntax plus all the table and schema info. Take this example: I switched into SQL mode (note the notice about loading the auto-completion information). For the first query I typed SELECT * FROM and then hit tab. Auto-completion wisely asked if I wanted all 752 possibilities. Then I entered SELECT * FROM ci before hitting tab, and I was given two options, one for CIPHER and the other for the table city.



Please Test

Please download this prospective new shell from Labs.MySQL.Com, kick the tires, and send feedback.

Paramiko 2.x Update Tutorial

MySQL Workbench has tunneling capabilities, and these depend on third-party libraries.

One required library is Paramiko, a Python module used to create an SSH connection to the destination server. MySQL Workbench uses Paramiko 1.8.x, and 1.8.x depends on the pycrypto library. But because recent Python distributions replaced pycrypto with a different cryptography module, this causes problems for software (like Workbench) that depends on the older (1.8.x) Paramiko library, problems such as MySQL BUG#87381 (http://bugs.mysql.com/bug.php?id=87381).

For details on upgrading the Paramiko library, see our previous blog post titled Paramiko Update Tutorial. The procedure to upgrade to Paramiko 2.x is the same as described in the tutorial, except 2.x requires these additional dependencies: cryptography, bcrypt, and nacl. These should also be placed next to the paramiko folder as described in that tutorial.

We hope this helps!

Big Dataset: All Reddit Comments – Analyzing with ClickHouse

In this blog, I’ll use ClickHouse and Tabix to look at a new very large dataset for research.

It is hard to come across interesting datasets, especially a big one (and by big I mean one billion rows or more). Previously, I’ve used the on-time airline performance data available from the Bureau of Transportation Statistics. Another recent example is the NYC Taxi and Uber Trips data, with over one billion records.

However, today I want to mention an interesting dataset I found recently that has been available since 2015. This is Reddit’s comments and submissions dataset, made possible thanks to Reddit’s generous API. The dataset was first mentioned in “I have every publicly available Reddit comment for research,” and currently you can find it at pushshift.io. However, there is no guarantee that pushshift.io will provide this dataset in the future. I think it would be valuable if Amazon or another cloud provider made this dataset available for researchers, just as Amazon provides https://aws.amazon.com/public-datasets/.

The dataset contains 2.86 billion records through the end of 2016 and is 709GB in size, uncompressed. This dataset is valuable for a variety of research scenarios, from simple stats to natural language processing and machine learning.
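To give an idea of what loading this into ClickHouse can look like, here is a hypothetical minimal table definition covering just the columns used in the queries below; the real table can carry many more fields from the JSON dumps, and the (old-style) MergeTree parameters are just one reasonable choice:

CREATE TABLE commententry1 (
    created_date Date,
    author String,
    subreddit String,
    link_id String,
    body String
) ENGINE = MergeTree(created_date, (subreddit, created_date), 8192);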

Now let’s see what simple info we can collect from this dataset using ClickHouse and https://tabix.io/, a GUI tool for ClickHouse. In this first round, we’ll figure out some basic stats, like the number of comments per month, the number of authors per month and the number of subreddits. I also added the average number of comments left per post.

Queries to achieve this:

SELECT toYYYYMM(created_date) dt, count(*) comments
FROM commententry1
GROUP BY dt ORDER BY dt
;;
SELECT toYYYYMM(created_date) dt, count(DISTINCT author) authors
FROM commententry1
GROUP BY dt ORDER BY dt
;;
SELECT toYYYYMM(created_date) dt, count(DISTINCT subreddit) subreddits
FROM commententry1
GROUP BY dt ORDER BY dt
;;
SELECT toYYYYMM(created_date) dt, count(*)/count(distinct link_id) comments_per_post
FROM commententry1
GROUP BY dt ORDER BY dt

And the graphical result:


It’s impressive to see the constant growth in comments (to 70 mln per month by the end of 2016) and authors (to 3.5 mln for the same time period). Something interesting is happening with subreddits, which jump up and down. It’s interesting to see that the average count of comments per post stays stable, with a slight decline to 13 comments/post by the end of 2016.

Now let’s check most popular subreddits:

SELECT subreddit, count(*) cnt
FROM commententry1
GROUP BY subreddit
ORDER BY cnt DESC
LIMIT 100
DRAW_TREEMAP
{
    path:'subreddit.cnt'
}

and using a treemap (available in Tabix.io):

We can measure subreddits that get the biggest increase in comments in 2016 compared to 2015:

SELECT subreddit, cntnew - cntold diff
FROM
    (SELECT subreddit, count(*) cntnew FROM commententry1 WHERE toYear(created_date)=2016 GROUP BY subreddit)
ALL INNER JOIN
    (SELECT subreddit, count(*) cntold FROM commententry1 WHERE toYear(created_date)=2015 GROUP BY subreddit)
USING (subreddit)
ORDER BY diff DESC
LIMIT 50
DRAW_TREEMAP
{
    path:'subreddit.diff'
}

Obviously, Reddit was affected by the United States Presidential Election 2016, but not just that. The gaming community saw an increase in Overwatch, PokemonGO and Dark Souls 3.

Now we can try to run our own DB-Ranking, but only based on Reddit comments. This is how I can do this for MySQL, PostgreSQL and MongoDB:

SELECT toStartOfQuarter(created_date) Quarter,
    sum(if(positionCaseInsensitive(body,'mysql')>0,1,0)) mysql,
    sum(if(positionCaseInsensitive(body,'postgres')>0,1,0)) postgres,
    sum(if(positionCaseInsensitive(body,'mongodb')>0,1,0)) mongodb
FROM commententry1
GROUP BY Quarter
ORDER BY Quarter;

I would say the result is aligned with https://db-engines.com/en/ranking, where MySQL is the most popular of the three, followed by PostgreSQL and then MongoDB. There is an interesting spike for PostgreSQL in the second quarter of 2015, caused by a bot in “leagueoflegends” tournaments. The bot was actively announcing in comments that it is powered by PostgreSQL, like this: http://reddit.com/r/leagueoflegends/comments/37cvc3/c/crln2ef.

To highlight more ClickHouse features: along with standard SQL functions, it provides a variety of statistical functions (for example, Quantile calculations). We can try to see the distribution of the number of comments left by authors:

SELECT
    quantileExact(0.1)(cnt),
    quantileExact(0.2)(cnt),
    quantileExact(0.3)(cnt),
    quantileExact(0.4)(cnt),
    quantileExact(0.5)(cnt),
    quantileExact(0.6)(cnt),
    quantileExact(0.7)(cnt),
    quantileExact(0.8)(cnt),
    quantileExact(0.9)(cnt),
    quantileExact(0.99)(cnt)
FROM
(
    SELECT author, count(*) AS cnt
    FROM commententry1
    WHERE author != '[deleted]'
    GROUP BY author
)

The result is:

quantileExact(0.1)(cnt)  - 1
quantileExact(0.2)(cnt)  - 1
quantileExact(0.3)(cnt)  - 1
quantileExact(0.4)(cnt)  - 2
quantileExact(0.5)(cnt)  - 4
quantileExact(0.6)(cnt)  - 7
quantileExact(0.7)(cnt)  - 16
quantileExact(0.8)(cnt)  - 42
quantileExact(0.9)(cnt)  - 160
quantileExact(0.99)(cnt) - 2271

This means that 30% of authors left only one comment, and 50% of authors left four comments or fewer.
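The same numbers can be cross-checked directly with a conditional aggregate; a quick sketch using ClickHouse's countIf():

SELECT
    round(countIf(cnt = 1) / count(*), 2) AS share_one_comment,
    round(countIf(cnt <= 4) / count(*), 2) AS share_four_or_fewer
FROM
(
    SELECT author, count(*) AS cnt
    FROM commententry1
    WHERE author != '[deleted]'
    GROUP BY author
);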

In general, ClickHouse was a pleasure to use when running analytical queries. However, I should note that the missing support for WINDOW functions is a huge limitation. Even MySQL 8.0, which was recently released as an RC, provides support for WINDOW functions. I hope ClickHouse will implement them as well.

Histogram statistics in MySQL

As of MySQL 8.0.3, you now have the ability to create histogram statistics in order to provide more statistics to the optimizer. In this blog post, we will have a look at how you can create histogram statistics, and we will explain when it might be useful to have histogram statistics.…
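The core mechanics, in short, look like this in MySQL 8.0.3 (the table and column here are made up for illustration):

ANALYZE TABLE orders UPDATE HISTOGRAM ON amount WITH 32 BUCKETS;

SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE TABLE_NAME = 'orders';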

Getting Started with MariaDB Galera and MariaDB MaxScale on CentOS

Introduction

A good thing about Galera as a High Availability solution for MariaDB is that it is rather easy to set up and use, as far as High Availability can be easy. Also, both MariaDB and Galera are well documented. Using MariaDB MaxScale for failover with a MariaDB Galera Cluster is also rather straightforward and well documented. The issue is that there are three technologies at play here, and although they are all well documented and reasonably easy to use, when it comes to using them all together things turn slightly more difficult.

This blogpost aims at showing all three technologies at play together, starting from scratch and creating a MariaDB Galera Cluster based on MariaDB Galera 10.2.7 and MaxScale 2.1. The operating system we are going to use for this example is CentOS version 7.2. Before we get started I also want to mention that we are looking at a minimal initial installation here; we will not look at many fancy features in either product.

Target system

Also, before we start, let's have a look at what we aim for in terms of a completed system. We will build a MariaDB Galera Cluster with 3 nodes and a fourth node for MaxScale. The cluster is built on three separate virtual machines in this case, with MaxScale on a fourth, and the IP addresses of all the machines in this setup are:

IP Address      Node use
192.168.0.180   MariaDB MaxScale node
192.168.0.181   MariaDB Galera Cluster node 1
192.168.0.182   MariaDB Galera Cluster node 2
192.168.0.183   MariaDB Galera Cluster node 3
Installing MariaDB Galera Cluster

In this section we will set up the MariaDB Galera Cluster from start to finish. As said above we will for the most part look at only the basic settings for the cluster to get started, no advanced settings will be used and for a production environment, you want to fine tune this.

Linux settings

There are just a few things that we need to adjust in the standard Linux installation before we commence: disable SELinux and the Linux firewall (which is firewalld in CentOS and RedHat 7.0 and up, not iptables) and also set the hostname.

Disable SELinux

For all intents and purposes, running with SELinux enabled in a production environment is often a good idea. For the purposes of testing as we are doing here, though, we do not want SELinux around at all. For this, make sure that your SELinux configuration, in the file /etc/selinux/config, looks something like this:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted


The change here is the SELINUX setting of course.

Disable firewalld

Firewalld is a standard service that is disabled using the systemctl command:

$ sudo systemctl disable firewalld
Set hostname

This is really simple, and we do this to be able to tell from the MariaDB command prompt which server we are connecting to when we use MariaDB MaxScale. On each node run something like this:

$ sudo hostname node181

And be sure to name all the nodes appropriately, and differently :-)

Rebooting and checking the new settings

At this point it is best to reboot to ensure that your settings are enabled. So reboot now and then check the status of SELinux and firewalld:

$ sestatus
SELinux status:                 disabled
$ systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

Aug 10 12:24:09 localhost.localdomain systemd[1]: Stopped firewalld - dynamic firewall daemon.


Make sure that you disable SELinux and firewalld on all four machines we are using if you are following this example.

Software installation

Before we install the software we need to set up the MariaDB repository on all 4 servers:

$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash


Having run this on the four servers, let us now go on with installing MariaDB Galera Server on the three nodes where this is appropriate, in this case nodes 192.168.0.181, 192.168.0.182 and 192.168.0.183. On these three nodes run this:

$ sudo yum -y install MariaDB-server


When this is completed, we should have MariaDB Server installed. The next thing to do then is to install MariaDB MaxScale on the 192.168.0.180 box:

$ sudo yum -y install maxscale


Now, only one thing remains to install, which strictly speaking is optional, but it is used when we test what we are setting up here: the MariaDB client programs on the machine we run MariaDB MaxScale on. So on 192.168.0.180 run:

$ sudo yum -y install MariaDB-client


With that in place we are ready to get to the next step, which is to configure a MariaDB Galera Cluster.

Setting up MariaDB Galera Cluster

Before we start up MariaDB, we need to configure the cluster. This is not complicated, but there are a few settings that need to be in place. Again, note that what we are setting up here is the bare minimum required to get started; in real life there are a bunch more parameters you would want to set up. Also, beyond Galera I am more or less leaving MariaDB as it is, again with the exception of a few things that Galera requires. All in all, we are not configuring any InnoDB cache or metadata cache or defining a non-default name for the cluster.

We have to edit the file /etc/my.cnf.d/server.cnf and adjust the Galera-specific settings on the nodes 192.168.0.181, 192.168.0.182 and 192.168.0.183. Edit the [galera] section to look like this on all three nodes:

[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.0.181,192.168.0.182,192.168.0.183
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2


Let's have a look at these settings now and see what they do:

  • wsrep_on - This is a session level flag to indicate whether the operations we are performing will be replicated, much like how the sql_log_bin setting controls whether statements are written to the binlog when using MariaDB Replication. The default is ON, but we set it anyway, to be safe.
  • wsrep_provider - This points to the location of the Galera library. Although MariaDB is set up to use Galera from scratch, you still have to point to the Galera library. This is installed as part of the MariaDB-Server installation above.
  • wsrep_cluster_address - This is where we define the nodes in the cluster, in general you don't need to list all nodes and new nodes can be added later to a running cluster, but in this case we know what the cluster looks like so we set it up here.
  • binlog_format - Although the binlog, in terms of the actual binlog files, isn't used by Galera, the binlog facility is, and for Galera to work you have to run with row format in the binlog.
  • default_storage_engine - Again, this is the default value, but just to be safe let's set up MariaDB Server to explicitly use the InnoDB Storage Engine, which is the only engine supported by Galera.
  • innodb_autoinc_lock_mode - This setting defines how the InnoDB Storage Engine generates values for auto_increment columns. Using mode 2 here is very important for Galera to work properly. In short, mode 2 causes much less locking during auto_increment generation and hence doesn't interfere with other locking. Values other than 2 can cause deadlocks and other issues with Galera.
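Once the cluster is up (see the next section), you can verify that these settings took effect from the MariaDB command line, for example:

SHOW GLOBAL VARIABLES LIKE 'wsrep_on';
SHOW GLOBAL VARIABLES LIKE 'innodb_autoinc_lock_mode';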

With these settings in place, we are ready to start the cluster.

Starting MariaDB Cluster

The way starting a cluster from scratch works is that we run a process called a bootstrap, and the reason this is a bit different from the usual MariaDB startup is that, for HA reasons, a node in a cluster attaches to one or more other nodes in the cluster, but for the first node this is not possible. This is not complicated though: there is a script included with MariaDB Server that manages this, but note that this script is only to be used when the first node in a cluster is started with no existing nodes in it. In this case, on 192.168.0.181 run:

$ sudo galera_new_cluster


With this in place, we should have a MariaDB server running. Let's have a look:

$ ps -f -u mysql | more
UID        PID  PPID  C STIME TTY          TIME CMD
mysql     3472     1  0 14:42 ?        00:00:00 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1


As you can see, the galera_new_cluster script has started MariaDB, but with some additional parameters, notably --wsrep-new-cluster. Before we continue, let's also look at the status of the cluster from the command line:

$ mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 10.2.7-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+
1 row in set (0.00 sec)


This shows that we have a running cluster (if not, the value of wsrep_cluster_size would have been 0), but there is just one node. Let's now start another node, and note that although we are starting a cluster from scratch, only the first node needs bootstrapping, so here, on 192.168.0.182, we start MariaDB in the usual fashion:

$ sudo systemctl start mariadb.service


We should now have 2 nodes running in the cluster. Let's check it out from the MariaDB command line on 192.168.0.181:

MariaDB [(none)]> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
1 row in set (0.00 sec)


Yes, we have 2 nodes now. So then, on 192.168.0.183, start MariaDB and we have a complete 3-node cluster running.

$ sudo systemctl start mariadb.service


And on 192.168.0.181, let's verify that we are done, so far:
 

MariaDB [(none)]> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)

Getting Started with MariaDB MaxScale

MariaDB MaxScale will in this scenario act as a router to the cluster, ensuring that traffic is directed to the appropriate server. There are a few ways of dealing with this, and MaxScale also has a lot of options, but again we will be dealing with setting up the bare minimum for the use case we have in mind.
Before we get started though, we need to set up the MariaDB servers to work with MariaDB MaxScale, and there are a few reasons for this. One reason is that MaxScale monitors the cluster out-of-band, which means that the cluster is constantly monitored, even if there are no user connections. Another reason is that when there is a connection from a client through MaxScale to the MariaDB cluster, it is MaxScale that does the user authentication, and authentication data is picked up from the MariaDB cluster.

Setting up MariaDB for MariaDB MaxScale

First we need to set up a user that MariaDB MaxScale uses to attach to the cluster to get authentication data. On 192.168.0.181, using the MariaDB command line as the database root user:

$ mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 10.2.7-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create user 'myuser'@'192.168.0.180' identified by 'mypwd';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> grant select on mysql.user to 'myuser'@'192.168.0.180';
Query OK, 0 rows affected (0.01 sec)


Following this, we need some extra privileges for table and database level grants:

MariaDB [(none)]> grant select on mysql.db to 'myuser'@'192.168.0.180';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> grant select on mysql.tables_priv to 'myuser'@'192.168.0.180';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> grant show databases on *.* to 'myuser'@'192.168.0.180';
Query OK, 0 rows affected (0.00 sec)


Note that the above commands need only be run on one of the servers in the cluster (say, 192.168.0.181) as these grants are replicated to all servers by virtue of Galera.
With this in place, we are ready to configure MariaDB MaxScale.

Configuring MariaDB MaxScale

The configuration for MariaDB MaxScale is in the file /etc/maxscale.cnf, and in this case we will create a new configuration from scratch instead of amending the existing one. I will explain the important aspects of this, but before that, this is what it looks like; again, no fancy configuration, just the basics to get going:
 

# Globals
[maxscale]
threads=1

# Servers
[server1]
type=server
address=192.168.0.181
port=3306
protocol=MySQLBackend

[server2]
type=server
address=192.168.0.182
port=3306
protocol=MySQLBackend

[server3]
type=server
address=192.168.0.183
port=3306
protocol=MySQLBackend

# Monitoring for the servers
[Galera Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=myuser
passwd=mypwd
monitor_interval=1000

# Galera router service
[Galera Service]
type=service
router=readwritesplit
servers=server1,server2,server3
user=myuser
passwd=mypwd

# MaxAdmin Service
[MaxAdmin Service]
type=service
router=cli

# Galera cluster listener
[Galera Listener]
type=listener
service=Galera Service
protocol=MySQLClient
port=3306

# MaxAdmin listener
[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
socket=default

MariaDB MaxScale configuration file format

The format of the MariaDB MaxScale configuration file is, as you see above, similar to the one used by MariaDB Server. There are a few differences though. One is that MariaDB MaxScale does more or less everything through plugins, and this is reflected in the configuration file. Each instance of a plugin has a separate section, and the name of the section is used when referenced from some other plugin, so the section names are not fixed but rather are used to name an instance of a service. This is true except in the case of global settings, which are not related to any particular plugin and are placed in the [maxscale] section.
A related setting is the type setting, which defines what type of plugin the section relates to: currently one of filter, listener, monitor, server or service. In addition, all plugins have a setting that defines the name of the plugin to load, i.e. the name of the shared object file that MariaDB MaxScale will load.

Global settings

In this case I have only one global setting, which sets the number of MariaDB MaxScale threads; here it is set to 1, which is the default. There might be many reasons to have a higher setting here, but for this simple example 1 thread is enough.

Servers

The settings should be fairly obvious, with the protocol setting defining which protocol plugin is being used here; MySQLBackend is the only option so far.

Monitors

A monitor is the plugin that checks the status of servers, and the important setting here is module, which is set to galeramon in this case. There are a few different monitors available for different types of backend server setups, but in this case we are using Galera, so galeramon is what we want to use.
The user and password settings define how the monitor connects to the backend servers to get the current status, and monitor_interval defines how often, in milliseconds, we connect to the servers and check status.

Services

Here we define two services: one is our main service, and the other an administrative service. The interesting service here is the Galera Service one, and the first thing we need to look at is which router we will use, a router here being the actual implementation of the router plugin; there are several to choose from. Another thing we have to define is which servers make up the cluster that we are working with here. Finally we set a user and password that MariaDB MaxScale uses to connect to the servers in the cluster to get authentication data.
We are for now ignoring the management MariaDB MaxScale services.

Listeners

The last type of plugin we define is the listener; these are the plugins that implement the actual protocol that listens for client connections. The interesting listener here is Galera Listener, which listens for MariaDB client protocol connections; again we are ignoring the MariaDB MaxScale management listener.
An important aspect of defining a listener is to use the correct service parameter, which is set to the service this listener connects to when there is a new connection. In this case we connect to the Galera Service defined above.
Note that we have the listener running on the MariaDB Server default port of 3306; this is deliberate and works fine, as MariaDB MaxScale runs on a separate server from the ones where MariaDB Server runs.

Starting and testing MariaDB MaxScale

With the configuration described above in place, we are ready to start MaxScale on 192.168.0.180. To do this, just run:

$ sudo systemctl start maxscale.service


And before I leave you, let's test a few things. First, let's connect to the cluster through MariaDB MaxScale:

 

$ mysql -h 192.168.0.180 -u myuser -pmypwd
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 4668
Server version: 10.0.0 2.1.5-maxscale MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>

You can see that we are connected to MariaDB MaxScale now, but to which server in the MariaDB Galera Cluster? Let's check!

MySQL [(none)]> show variables like 'hostname';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| hostname      | node181 |
+---------------+---------+
1 row in set (0.00 sec)


It's the 181 server, it seems. Let's then try one more thing before we are done for today: let's stop the MariaDB server on 192.168.0.181 and see what happens. On 192.168.0.181 run:

$ sudo systemctl stop mariadb.service


And then we go back to our command prompt on 192.168.0.180 and see what happens when we access MariaDB Server from there.

$ mysql -h 192.168.0.180 -u myuser -pmypwd
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 4668
Server version: 10.0.0 2.1.5-maxscale MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> show variables like 'hostname';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| hostname      | node182 |
+---------------+---------+
1 row in set (0.00 sec)


With that we are done for now. A follow-up blogpost or two are planned to show some more advanced settings.

Happy SQL'ing
/Karlsson

MariaDB Galera Cluster is a great way to implement High Availability with MariaDB and combining it with MariaDB MaxScale creates a great High Availability Cluster with transparent failover. Getting started with any HA solution is mostly a bit confusing as there are so many components that work together and each has to be configured to work with the other. This blogpost shows how to get started with MariaDB Galera Cluster with MariaDB MaxScale on CentOS / RHEL 7.


MySQL Partitioning [Quick Overview]

This blog post is an overview of MySQL Partitioning and its operations. We will explore the following points for MySQL Partitioning:

About Partitioning:

  • It’s a plugin named “partition”.
  • Partitioning support is built into MySQL binary distributions. If you are building MySQL from source, the build must be configured with -DWITH_PARTITION_STORAGE_ENGINE.
  • To disable partitioning, start the server with --skip-partition [this will not allow any access/manipulation of partitioned table data].
  • For MySQL commercial editions, partitioning is supported by the MySQL Enterprise and MySQL Cluster editions; it is not supported by the MySQL Standard edition.
  • MySQL Community binaries provided by Oracle include partitioning support.
  • No special configuration is needed in the my.cnf file.
  • MySQL partitioning is not supported for the MERGE, CSV, or FEDERATED storage engines.
  • Metadata information about partitioned tables is in INFORMATION_SCHEMA.PARTITIONS.
  • For the InnoDB storage engine, data and indexes are partitioned together.
  • For MyISAM tables, data and indexes can be stored across different disks/locations.
SHOW PLUGINS;

SELECT PLUGIN_NAME, PLUGIN_VERSION, PLUGIN_STATUS
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME='partition';

Partition Types:

RANGE Partitioning: Partitioning based on ranges of values in a column. For example, partitioning data for each year.

Example:

CREATE TABLE pt_range (
    inc_id BIGINT,
    inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=INNODB
PARTITION BY RANGE( YEAR(created_date) ) (
    PARTITION p0 VALUES LESS THAN (2000),
    PARTITION p1 VALUES LESS THAN (2003),
    PARTITION p2 VALUES LESS THAN (2006),
    PARTITION p3 VALUES LESS THAN (2009),
    PARTITION p4 VALUES LESS THAN (2012),
    PARTITION p5 VALUES LESS THAN (2015),
    PARTITION p6 VALUES LESS THAN (2018),
    PARTITION p7 VALUES LESS THAN MAXVALUE
);

NOTE:
If you are using a primary key in a partitioned table, then the partitioning column must be part of the primary key, otherwise table creation will fail with the following error:
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table’s partitioning function

For date and time columns, only the DATE and DATETIME datatypes are supported for RANGE partitioning. For example, using the TIMESTAMP datatype in the partitioning column will throw the following error:
ERROR 1486 (HY000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed.
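One documented way around this, if you need to partition on a TIMESTAMP column, is RANGE partitioning on UNIX_TIMESTAMP(), the one timezone-dependent expression the server permits; a small sketch with a hypothetical table:

CREATE TABLE pt_ts (
    inc_id BIGINT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (inc_id, created_at)
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(created_at) ) (
    PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2015-01-01 00:00:00') ),
    PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-01-01 00:00:00') ),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);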

With Primary key:

CREATE TABLE pt_range (
    inc_id bigint NOT NULL AUTO_INCREMENT,
    inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (inc_id, created_date)
) ENGINE=INNODB
PARTITION BY RANGE( YEAR(created_date) ) (
    PARTITION p0 VALUES LESS THAN (2000),
    PARTITION p1 VALUES LESS THAN (2003),
    PARTITION p2 VALUES LESS THAN (2006),
    PARTITION p3 VALUES LESS THAN (2009),
    PARTITION p4 VALUES LESS THAN (2012),
    PARTITION p5 VALUES LESS THAN (2015),
    PARTITION p6 VALUES LESS THAN (2018),
    PARTITION p7 VALUES LESS THAN MAXVALUE
);

 

LIST Partitioning: Partitioning of data on the basis of a list of values in the partition column. For example, a column with status_id values.

Example:

CREATE TABLE pt_list (
    inc_id bigint NOT NULL AUTO_INCREMENT,
    Inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status_id tinyint NOT NULL,
    PRIMARY KEY (inc_id, created_date, status_id)
)
PARTITION BY LIST (status_id) (
    PARTITION p1 VALUES IN (1,2),
    PARTITION p2 VALUES IN (3),
    PARTITION p3 VALUES IN (4,5)
);

Key point:  

  • Partitioning column must be an integer type.

 

LIST COLUMNS Partitioning: This is a variant of LIST partitioning that allows the use of columns of types other than integer types as partitioning columns, as well as the use of multiple columns as partitioning keys.

Example:

CREATE TABLE pt_list_column (
    inc_id bigint NOT NULL,
    inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    dept_name varchar(100) NOT NULL DEFAULT 'other',
    KEY (dept_name)
)
PARTITION BY LIST COLUMNS (dept_name) (
    PARTITION p1 VALUES IN ('support','development'),
    PARTITION p2 VALUES IN ('sales'),
    PARTITION p3 VALUES IN ('other','admin')
);

On Disk:

pt_list_column.frm
pt_list_column#P#p1.ibd
pt_list_column#P#p2.ibd
pt_list_column#P#p3.ibd

Key point:  

  • Supported as of MySQL 5.6.
  • The ENUM datatype is not supported for the partitioning column.

 

HASH Partitioning: MySQL hashing distributes the data evenly over the given number of partitions, based on the specified partitioning column.

Example:

CREATE TABLE pt_hash (
    inc_id bigint NOT NULL AUTO_INCREMENT,
    inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status_id tinyint NOT NULL,
    PRIMARY KEY (inc_id, created_date, status_id)
)
PARTITION BY HASH (status_id)
PARTITIONS 5;

On Disk:

pt_hash#P#p0.ibd
pt_hash#P#p1.ibd
pt_hash#P#p2.ibd
pt_hash#P#p3.ibd
pt_hash#P#p4.ibd

Key Points:

  • If you do not include a PARTITIONS clause, the number of partitions defaults to 1.

 

LINEAR HASH Partitioning: Linear hashing utilizes a linear powers-of-two algorithm whereas regular hashing employs the modulus of the hashing function’s value.

Example:

CREATE TABLE pt_linear_hash (
    inc_id bigint NOT NULL AUTO_INCREMENT,
    inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status_id tinyint NOT NULL,
    PRIMARY KEY (inc_id, created_date, status_id)
)
PARTITION BY LINEAR HASH (YEAR(created_date))
PARTITIONS 12;

 

KEY Partitioning: Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server.

Example:

CREATE TABLE pt_key (
    inc_id bigint NOT NULL AUTO_INCREMENT,
    inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status_id tinyint(5) NOT NULL,
    PRIMARY KEY (inc_id, created_date, status_id)
)
PARTITION BY KEY (status_id)
PARTITIONS 5;

Key point:

  • By default, partitioning is based on the primary key. If there is no primary key, an available unique key is used.

 

Subpartitioning: [Composite Partitioning ]

Example:

CREATE TABLE pt_sub (
    inc_id bigint NOT NULL AUTO_INCREMENT,
    inc_tital VARCHAR(250) NOT NULL,
    inc_desc TEXT,
    owner VARCHAR(250),
    created_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status_id tinyint NOT NULL,
    PRIMARY KEY (inc_id, created_date, status_id)
)
PARTITION BY LIST (status_id)
SUBPARTITION BY HASH ( YEAR(created_date) )
SUBPARTITIONS 12 (
    PARTITION p1 VALUES IN (1,2) ENGINE = InnoDB,
    PARTITION p2 VALUES IN (3) ENGINE = InnoDB,
    PARTITION p3 VALUES IN (4,5) ENGINE = InnoDB
);

On Disk:

pt_sub.frm
pt_sub#P#p1#SP#p1sp0.ibd  pt_sub#P#p1#SP#p1sp1.ibd  pt_sub#P#p1#SP#p1sp2.ibd   pt_sub#P#p1#SP#p1sp3.ibd
pt_sub#P#p1#SP#p1sp4.ibd  pt_sub#P#p1#SP#p1sp5.ibd  pt_sub#P#p1#SP#p1sp6.ibd   pt_sub#P#p1#SP#p1sp7.ibd
pt_sub#P#p1#SP#p1sp8.ibd  pt_sub#P#p1#SP#p1sp9.ibd  pt_sub#P#p1#SP#p1sp10.ibd  pt_sub#P#p1#SP#p1sp11.ibd
pt_sub#P#p2#SP#p2sp0.ibd  pt_sub#P#p2#SP#p2sp1.ibd  pt_sub#P#p2#SP#p2sp2.ibd   pt_sub#P#p2#SP#p2sp3.ibd
pt_sub#P#p2#SP#p2sp4.ibd  pt_sub#P#p2#SP#p2sp5.ibd  pt_sub#P#p2#SP#p2sp6.ibd   pt_sub#P#p2#SP#p2sp7.ibd
pt_sub#P#p2#SP#p2sp8.ibd  pt_sub#P#p2#SP#p2sp9.ibd  pt_sub#P#p2#SP#p2sp10.ibd  pt_sub#P#p2#SP#p2sp11.ibd
pt_sub#P#p3#SP#p3sp0.ibd  pt_sub#P#p3#SP#p3sp1.ibd  pt_sub#P#p3#SP#p3sp2.ibd   pt_sub#P#p3#SP#p3sp3.ibd
pt_sub#P#p3#SP#p3sp4.ibd  pt_sub#P#p3#SP#p3sp5.ibd  pt_sub#P#p3#SP#p3sp6.ibd   pt_sub#P#p3#SP#p3sp7.ibd
pt_sub#P#p3#SP#p3sp8.ibd  pt_sub#P#p3#SP#p3sp9.ibd  pt_sub#P#p3#SP#p3sp10.ibd  pt_sub#P#p3#SP#p3sp11.ibd

Partition table info and SELECT Operations:

SELECT TABLE_NAME, PARTITION_NAME, SUBPARTITION_NAME, PARTITION_METHOD, SUBPARTITION_METHOD
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA='db_name';

Sample data:

Partition details:

PARTITION BY RANGE( YEAR(created_date) ) (
    PARTITION p0 VALUES LESS THAN (2000),
    PARTITION p1 VALUES LESS THAN (2003),
    PARTITION p2 VALUES LESS THAN (2006),
    PARTITION p3 VALUES LESS THAN (2009),
    PARTITION p4 VALUES LESS THAN (2012),
    PARTITION p5 VALUES LESS THAN (2015),
    PARTITION p6 VALUES LESS THAN (2018),
    PARTITION p7 VALUES LESS THAN MAXVALUE
);

mysql> select * from pt_range;
+--------+-------------------+-----------------+-------+---------------------+
| inc_id | Inc_tital         | inc_desc        | owner | created_date        |
+--------+-------------------+-----------------+-------+---------------------+
|      1 | login issue       | unable to login | lalit | 1999-06-26 12:33:21 |
|      2 | print issue       | unable to print | XYZ   | 2000-09-28 12:33:21 |
|      3 | print issue again |                 | maren | 2001-10-26 02:23:21 |
|      4 | scan issue        | NULL            | maren | 2005-06-26 02:23:21 |
|      5 | scan issue        | NULL            | abc   | 2010-07-26 02:23:21 |
|      6 | scan issue        | NULL            | abc   | 2013-07-26 02:23:21 |
|      7 | scan issue        | NULL            | abc   | 2015-07-26 02:23:21 |
|      8 | urgent            | NULL            | NULL  | 2016-07-26 02:23:21 |
|      9 | test              | NULL            | NULL  | 2017-09-28 12:38:35 |
+--------+-------------------+-----------------+-------+---------------------+
9 rows in set (0.00 sec)

mysql> SELECT * FROM pt_range PARTITION (p5);
+--------+------------+----------+-------+---------------------+
| inc_id | Inc_tital  | inc_desc | owner | created_date        |
+--------+------------+----------+-------+---------------------+
|      6 | scan issue | NULL     | abc   | 2013-07-26 02:23:21 |
+--------+------------+----------+-------+---------------------+
1 row in set (0.00 sec)

mysql> SELECT count(*) FROM pt_range WHERE YEAR(created_date)='2000';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT * FROM pt_range PARTITION (p1);
+--------+-------------------+-----------------+-------+---------------------+
| inc_id | Inc_tital         | inc_desc        | owner | created_date        |
+--------+-------------------+-----------------+-------+---------------------+
|      2 | print issue       | unable to print | XYZ   | 2000-09-28 12:33:21 |
|      3 | print issue again |                 | maren | 2001-10-26 02:23:21 |
+--------+-------------------+-----------------+-------+---------------------+
2 rows in set (0.00 sec)

SELECT Analysis using EXPLAIN:

mysql> explain SELECT * FROM pt_range WHERE YEAR(created_date)='2000';
+----+-------------+----------+----------------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions           | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+----------------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | pt_range | p0,p1,p3,p4,p5,p6,p7 | ALL  | NULL          | NULL | NULL    | NULL |    6 |   100.00 | Using where |
+----+-------------+----------+----------------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain SELECT * FROM pt_range PARTITION (p1) WHERE YEAR(created_date)='2000';
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | pt_range | p1         | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

NULL Column value and MySQL Partition:

Ideally, it’s better to avoid NULL values in partition columns by enforcing a NOT NULL constraint wherever possible.

If NULL is allowed for the partition column, partitioning will handle NULL values as follows:

RANGE Partitioning: for the partition column, a NULL value will be inserted into the lowest-value partition. For example, with

PARTITION p0 VALUES LESS THAN (2000),
PARTITION p1 VALUES LESS THAN (2003),
PARTITION p2 VALUES LESS THAN (2006)

a NULL value will go into partition p0.

LIST Partitioning: NULL should be part of one of the partition value lists. If NULL is not included in any partition list, an insert operation will fail with an error. For example, with

PARTITION BY LIST (status_id) (
PARTITION p1 VALUES IN (1,2) ...

INSERT INTO pt_list(status_id) values(NULL);
ERROR 1504 (HY000): Table has no partition for value NULL

ALTER TABLE pt_list
REORGANIZE PARTITION p1 INTO (PARTITION p1 VALUES IN (1,2,NULL));

Now NULL values will go into the p1 partition.

HASH and KEY Partitioning: for both types of partitioning, NULL is treated as 0 (zero). Hence NULL values will go to the very first partition (e.g. p0).

Partition table management:

Adding new partition:

CREATE TABLE `pt_list` (
  `inc_id` bigint(20) NOT NULL,
  `Inc_tital` varchar(250) NOT NULL,
  `inc_desc` text,
  `owner` varchar(250) DEFAULT NULL,
  `created_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status_id` tinyint(4) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY LIST (status_id)
(PARTITION p1 VALUES IN (1,2) ENGINE = InnoDB,
 PARTITION p2 VALUES IN (3) ENGINE = InnoDB,
 PARTITION p3 VALUES IN (4,5) ENGINE = InnoDB);

mysql> ALTER TABLE pt_list ADD PARTITION (PARTITION p4 VALUES IN (6,7,8));
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table pt_list\G
*************************** 1. row ***************************
       Table: pt_list
Create Table: CREATE TABLE `pt_list` (
  `inc_id` bigint(20) NOT NULL,
  `Inc_tital` varchar(250) NOT NULL,
  `inc_desc` text,
  `owner` varchar(250) DEFAULT NULL,
  `created_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status_id` tinyint(4) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (status_id)
(PARTITION p1 VALUES IN (1,2) ENGINE = InnoDB,
 PARTITION p2 VALUES IN (3) ENGINE = InnoDB,
 PARTITION p3 VALUES IN (4,5) ENGINE = InnoDB,
 PARTITION p4 VALUES IN (6,7,8) ENGINE = InnoDB) */
1 row in set (0.00 sec)

Alter/Change Partition:

mysql> ALTER TABLE pt_list REORGANIZE PARTITION p1 INTO (PARTITION p1 VALUES IN (1,2,NULL));
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table pt_list\G
*************************** 1. row ***************************
       Table: pt_list
Create Table: CREATE TABLE `pt_list` (
  `inc_id` bigint(20) NOT NULL,
  `Inc_tital` varchar(250) NOT NULL,
  `inc_desc` text,
  `owner` varchar(250) DEFAULT NULL,
  `created_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status_id` tinyint(4) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (status_id)
(PARTITION p1 VALUES IN (NULL,1,2) ENGINE = InnoDB,
 PARTITION p2 VALUES IN (3) ENGINE = InnoDB,
 PARTITION p3 VALUES IN (4,5) ENGINE = InnoDB,
 PARTITION p4 VALUES IN (6,7,8) ENGINE = InnoDB) */
1 row in set (0.00 sec)

 

DROP Partition:

mysql> show create table pt_range\G
*************************** 1. row ***************************
       Table: pt_range
Create Table: CREATE TABLE `pt_range` (
  `inc_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `Inc_tital` varchar(250) NOT NULL,
  `inc_desc` text,
  `owner` varchar(250) DEFAULT NULL,
  `created_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`inc_id`,`created_date`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( YEAR(created_date))
(PARTITION p0 VALUES LESS THAN (2000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2003) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (2006) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (2009) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (2012) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (2015) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (2018) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)

mysql> select * from pt_range;
+--------+-------------------+-----------------+-------+---------------------+
| inc_id | Inc_tital         | inc_desc        | owner | created_date        |
+--------+-------------------+-----------------+-------+---------------------+
|      1 | login issue       | unable to login | lalit | 1999-06-26 12:33:21 |
|      2 | print issue       | unable to print | XYZ   | 2000-09-28 12:33:21 |
|      3 | print issue again |                 | maren | 2001-10-26 02:23:21 |
|      4 | scan issue        | NULL            | maren | 2005-06-26 02:23:21 |
|      5 | scan issue        | NULL            | abc   | 2010-07-26 02:23:21 |
|      6 | scan issue        | NULL            | abc   | 2013-07-26 02:23:21 |
|      7 | scan issue        | NULL            | abc   | 2015-07-26 02:23:21 |
|      8 | urgent            | NULL            | NULL  | 2016-07-26 02:23:21 |
|      9 | test              | NULL            | NULL  | 2017-09-28 12:38:35 |
+--------+-------------------+-----------------+-------+---------------------+
9 rows in set (0.00 sec)

mysql> ALTER TABLE pt_range DROP PARTITION p2;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from pt_range;
+--------+-------------------+-----------------+-------+---------------------+
| inc_id | Inc_tital         | inc_desc        | owner | created_date        |
+--------+-------------------+-----------------+-------+---------------------+
|      1 | login issue       | unable to login | lalit | 1999-06-26 12:33:21 |
|      2 | print issue       | unable to print | XYZ   | 2000-09-28 12:33:21 |
|      3 | print issue again |                 | maren | 2001-10-26 02:23:21 |
|      5 | scan issue        | NULL            | abc   | 2010-07-26 02:23:21 |
|      6 | scan issue        | NULL            | abc   | 2013-07-26 02:23:21 |
|      7 | scan issue        | NULL            | abc   | 2015-07-26 02:23:21 |
|      8 | urgent            | NULL            | NULL  | 2016-07-26 02:23:21 |
|      9 | test              | NULL            | NULL  | 2017-09-28 12:38:35 |
+--------+-------------------+-----------------+-------+---------------------+
8 rows in set (0.00 sec)

mysql> show create table pt_range\G
*************************** 1. row ***************************
       Table: pt_range
Create Table: CREATE TABLE `pt_range` (
  `inc_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `Inc_tital` varchar(250) NOT NULL,
  `inc_desc` text,
  `owner` varchar(250) DEFAULT NULL,
  `created_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`inc_id`,`created_date`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( YEAR(created_date))
(PARTITION p0 VALUES LESS THAN (2000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2003) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (2009) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (2012) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (2015) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (2018) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)

Note that DROP PARTITION also deletes the rows stored in that partition: the row with inc_id 4 (created in 2005, stored in p2) is gone after the ALTER.

Advantages of partitioning:

  • Allows you to store more data efficiently.
  • Unused portions of data can easily be removed from a table (e.g., by dropping a partition).
  • Because data is partitioned at the physical level, read/write operations can be faster than on a comparable non-partitioned table.

Refer to the official MySQL PARTITIONING_DOC for more…


One Million Tables in MySQL 8.0

In my previous blog post, I talked about new general tablespaces in MySQL 8.0. Recently MySQL 8.0.3-rc was released, which includes a new data dictionary. My goal is to create one million tables in MySQL and test the performance.

Background questions

Q: Why a million tables in MySQL? Is it even realistic? How does this happen?

Usually, millions of tables in MySQL is a result of “a schema per customer” Software as a Service (SaaS) approach. For the purposes of customer data isolation (security) and logical data partitioning (performance), each “customer” has a dedicated schema. You can think of a WordPress hosting service (or any CMS based hosting) where each customer has their own dedicated schema. With 10K customers per MySQL server, we could end up with millions of tables.

Q: Should you design an application with >1 million tables?

Having separate tables is one of the easiest designs for a multi-tenant or SaaS application, and makes it easy to shard and re-distribute your workload between servers. In fact, the table-per-customer or schema-per-customer design has the quickest time-to-market, which is why we see it a lot in consulting. In this post, we are not aiming to cover the merits of whether you should do this (if your application has high churn or millions of free users, for example, it might not be a good idea). Instead, we will focus on whether the new data dictionary provides relief for a historical pain point.

Q: Why is one million tables a problem?

The main issue results from the fact that MySQL needs to open (and eventually close) the table structure file (FRM file). With one million tables, we are talking about at least one million files. MySQL originally addressed this with table_open_cache and table_definition_cache. However, the maximum value for table_open_cache is 524288, and it is split into 16 partitions by default (to reduce contention), so it is not ideal. MySQL 8.0 has removed FRM files for InnoDB and now allows you to create general tablespaces. I’ve demonstrated how we can create a tablespace per customer in MySQL 8.0, which is ideal for the “schema-per-customer” approach (we can move/migrate one customer’s data to a new server by importing/exporting the tablespace).
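As a quick sanity check (a generic snippet, not from the original post), you can confirm these cache limits on a running server:

mysql> SHOW VARIABLES LIKE 'table_open_cache%';
mysql> SHOW VARIABLES LIKE 'table_definition_cache';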

One million tables in MySQL 5.7

Recently, I created a test with one million tables. The test creates 10K databases, each containing 100 tables. To use a standard benchmark, I employed the sysbench table structure.

mysql> select count(*) from information_schema.schemata where schema_name like 'test_sbtest%';
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from information_schema.tables where table_schema like 'test_sbtest%';
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (4.61 sec)

This also creates huge overhead: with one million tables, we have ~two million files. The .frm and .ibd files together sum up to 175G:

# du -sh /ssd/mysql_57
175G    /ssd/mysql_57

Next, I used a sysbench Lua script to insert one row into a randomly chosen table:

pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"
if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end

function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   local oltp_tables_count = 100
   local oltp_db_count = 10000
   table_name = "test_sbtest_" .. sb_rand_uniform(1, oltp_db_count) .. ".sbtest" .. sb_rand_uniform(1, oltp_tables_count)
   -- the assignment of i appears lost in the original formatting; restored
   -- here following the standard oltp_legacy insert.lua
   i = sb_rand(1, oltp_table_size)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[###########-###########-###########-###########-###########]])
   rs = db_query("INSERT INTO " .. table_name .. " (id, k, c, pad) VALUES " .. string.format("(%d, %d, '%s', '%s')", i, k_val, c_val, pad_val))
end

With:

local oltp_tables_count = 100
local oltp_db_count = 10000

Sysbench will choose one table randomly out of one million. With oltp_tables_count = 1 and oltp_db_count = 100, it will only choose the first table (sbtest1) out of the first 100 databases (randomly).

As expected, MySQL 5.7 has a huge performance degradation when going across one million tables. When running a script that only inserts data into 100 random tables, we can see ~150K transactions per second. When the data is inserted in one million tables (chosen randomly) performance drops to 2K (!) transactions per second:

Insert into 100 random tables:

SQL statistics:
    queries performed:
        read:    0
        write:   16879188
        other:   0
        total:   16879188
    transactions: 16879188 (140611.72 per sec.)
    queries:      16879188 (140611.72 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects:     0 (0.00 per sec.)

Insert into one million random tables:

SQL statistics:
    queries performed:
        read:    0
        write:   243533
        other:   0
        total:   243533
    transactions: 243533 (2029.21 per sec.)
    queries:      243533 (2029.21 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects:     0 (0.00 per sec.)

This is expected. Here I’m testing the worst-case scenario, where we can’t keep all table open handlers and table definitions in cache (memory), since table_open_cache and table_definition_cache both have a limit of 524288.

Also, we can normally expect a huge skew in table access. There may be only 20% active customers (the 80-20 rule), meaning we can expect active access to only ~2K databases. In addition, there will be old or unused tables, so we can expect around 100K or fewer actively accessed tables.

Hardware and config files

The above results are from this server:

Processors   | 64xGenuine Intel(R) CPU @ 2.00GHz
Memory Total | 251.8G
Disk         | Samsung 950 Pro PCIE SSD (nvme)

Sysbench script:

sysbench $conn --report-interval=1 --num-threads=32 --max-requests=0 --max-time=600 --test=/root/drupal_demo/insert_custom.lua run

My.cnf:

innodb_buffer_pool_size = 100G
innodb_io_capacity=20000
innodb_flush_log_at_trx_commit = 0
innodb_log_file_size = 2G
innodb_flush_method=O_DIRECT_NO_FSYNC
skip-log-bin
open_files_limit=1000000
table_open_cache=524288
table_definition_cache=524288

One million tables in MySQL 8.0 + general tablespaces

In MySQL 8.0 it is easy and logical to create one general tablespace per schema (hosting all of that schema’s tables). In MySQL 5.7, general tablespaces are also available – but there are still .frm files.

I’ve used the following script to create tables in one schema, all in one tablespace:

mysql test -e "CREATE TABLESPACE t ADD DATAFILE 't.ibd' engine=InnoDB;"
for i in {1..10000}
do
   mysql test -e "create table ab$i(i int) tablespace t"
done

The new MySQL 8.0.3-rc also uses the new data dictionary, so all MyISAM tables in the mysql schema are removed and all metadata is stored in an additional mysql.ibd file.

Creating one million tables

Creating InnoDB tables fast enough can be a task by itself. Stewart Smith published a blog post a while ago where he focused on optimizing time to create 30K tables in MySQL.

The problem is that after creating each .ibd file, MySQL needs to “fsync” it. However, when creating a table inside an existing general tablespace, there is no fsync. I’ve created a simple script to create tables in parallel, one thread per database:

#!/bin/bash
function do_db {
   mysql -vvv -e "create database $db";
   mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
   for i in {1..100}
   do
      table="CREATE TABLE sbtest$i (
             id int(10) unsigned NOT NULL AUTO_INCREMENT,
             k int(10) unsigned NOT NULL DEFAULT '0',
             c varchar(120) NOT NULL DEFAULT '',
             pad varchar(60) NOT NULL DEFAULT '',
             PRIMARY KEY (id),
             KEY k_1 (k)
             ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
      mysql $db -e "$table"
   done
}
c=0
for m in {1..100}
do
   for i in {1..100}
   do
      let c=$c+1
      echo $c
      db="test_sbtest_$c"
      do_db &
   done
   wait
done

That script works perfectly in MySQL 8.0.1-dmr and creates one million tables in 25 minutes and 28 seconds (1528 seconds). That is ~654 tables per second, significantly faster than the ~30 tables per second in Stewart’s original test, and 2x faster than a test where all fsyncs were artificially disabled using the libeat-my-data library.

Unfortunately, a regression was introduced in MySQL 8.0.3-rc. In 8.0.3-rc I can see heavy mutex contention, and the table creation time grew from ~25 minutes to ~280 minutes. I’ve filed a bug report: performance regression: “create table” speed and scalability in 8.0.3.

Size on disk

With general tablespaces and no .frm files, the size on disk decreased:

# du -h -d1 /ssd/
147G    /ssd/mysql_801
119G    /ssd/mysql_803
175G    /ssd/mysql_57

Please note though that in MySQL 8.0.3-rc, with new native data dictionary, the size on disk increased as it needs to write additional information (Serialized Dictionary Information, SDI) to the tablespace files:

InnoDB: Serialized Dictionary Information (SDI) is now present in all InnoDB tablespace files except for temporary tablespace and undo tablespace files. SDI is serialized metadata for schema, table, and tablespace objects. The presence of SDI data provides metadata redundancy. ... The inclusion of SDI data in tablespace files increases tablespace file size. An SDI record requires a single index page, which is 16k in size by default. However, SDI data is compressed when it is stored to reduce the storage footprint.

The general mysql data dictionary in MySQL 8.0.3 is 6.6Gb:

6.6G /ssd/mysql/mysql.ibd

Benchmarking the insert speed in MySQL 8.0 

I repeated the same test I did for MySQL 5.7 in MySQL 8.0.3-rc (and in 8.0.1-dmr), but using general tablespaces. I created 10K databases (= 10K tablespace files); each database has 100 tables and resides in its own tablespace.

There are two new tablespace level caches we can use in MySQL 8.0: tablespace_definition_cache and schema_definition_cache:

tablespace_definition_cache = 15000
schema_definition_cache = 524288
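Both variables appear to be dynamic in MySQL 8.0, so (as a sketch, not from the original post) they can also be set at runtime:

mysql> SET GLOBAL tablespace_definition_cache = 15000;
mysql> SET GLOBAL schema_definition_cache = 524288;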

Unfortunately, with one million random table accesses in MySQL 8.0 (both 8.0.1 and 8.0.3), we can still see that it stalls on opening tables (even with no .frm files and general tablespaces):

mysql> select conn_id, current_statement, state, statement_latency, lock_latency from sys.processlist where current_statement is not null and conn_id <> CONNECTION_ID();
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
| conn_id | current_statement                                                 | state          | statement_latency | lock_latency |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
|     199 | INSERT INTO test_sbtest_9749.s ... 8079-53209333270-93105555128') | Opening tables | 4.45 ms           | 0 ps         |
|     198 | INSERT INTO test_sbtest_1863.s ... 9574-29782886623-39251573705') | Opening tables | 9.95 ms           | 5.67 ms      |
|     189 | INSERT INTO test_sbtest_3948.s ... 9365-63502117132-66650064067') | Opening tables | 16.29 ms          | 15.38 ms     |
|     190 | INSERT INTO test_sbtest_6885.s ... 8436-41291265610-60894472357') | Opening tables | 13.78 ms          | 9.52 ms      |
|     191 | INSERT INTO test_sbtest_247.sb ... 7467-89459234028-92064334941') | Opening tables | 8.36 ms           | 3.18 ms      |
|     192 | INSERT INTO test_sbtest_9689.s ... 8058-74586985382-00185651578') | Opening tables | 6.89 ms           | 0 ps         |
|     193 | INSERT INTO test_sbtest_8777.s ... 1900-02582963670-01868315060') | Opening tables | 7.09 ms           | 5.70 ms      |
|     194 | INSERT INTO test_sbtest_9972.s ... 9057-89011320723-95018545652') | Opening tables | 9.44 ms           | 9.35 ms      |
|     195 | INSERT INTO test_sbtest_6977.s ... 7902-29158428721-66447528241') | Opening tables | 7.82 ms           | 789.00 us    |
|     196 | INSERT INTO test_sbtest_129.sb ... 2091-86346366083-87657045906') | Opening tables | 13.01 ms          | 7.30 ms      |
|     197 | INSERT INTO test_sbtest_1418.s ... 6581-90894769279-68213053531') | Opening tables | 16.35 ms          | 10.07 ms     |
|     208 | INSERT INTO test_sbtest_4757.s ... 4592-86183240946-83973365617') | Opening tables | 8.66 ms           | 2.84 ms      |
|     207 | INSERT INTO test_sbtest_2152.s ... 5459-55779113235-07063155183') | Opening tables | 11.08 ms          | 3.89 ms      |
|     212 | INSERT INTO test_sbtest_7623.s ... 0354-58204256630-57234862746') | Opening tables | 8.67 ms           | 2.80 ms      |
|     215 | INSERT INTO test_sbtest_5216.s ... 9161-37142478639-26288001648') | Opening tables | 9.72 ms           | 3.92 ms      |
|     210 | INSERT INTO test_sbtest_8007.s ... 2999-90116450579-85010442132') | Opening tables | 1.33 ms           | 0 ps         |
|     203 | INSERT INTO test_sbtest_7173.s ... 2718-12894934801-25331023143') | Opening tables | 358.09 us         | 0 ps         |
|     209 | INSERT INTO test_sbtest_1118.s ... 8361-98642762543-17027080501') | Opening tables | 3.32 ms           | 0 ps         |
|     219 | INSERT INTO test_sbtest_5039.s ... 1740-21004115002-49204432949') | Opening tables | 8.56 ms           | 8.44 ms      |
|     202 | INSERT INTO test_sbtest_8322.s ... 8686-46403563348-31237202393') | Opening tables | 1.19 ms           | 0 ps         |
|     205 | INSERT INTO test_sbtest_1563.s ... 6753-76124087654-01753008993') | Opening tables | 9.62 ms           | 2.76 ms      |
|     213 | INSERT INTO test_sbtest_5817.s ... 2771-82142650177-00423653942') | Opening tables | 17.21 ms          | 16.47 ms     |
|     216 | INSERT INTO test_sbtest_238.sb ... 5343-25703812276-82353892989') | Opening tables | 7.24 ms           | 7.20 ms      |
|     200 | INSERT INTO test_sbtest_2637.s ... 8022-62207583903-44136028229') | Opening tables | 7.52 ms           | 7.39 ms      |
|     204 | INSERT INTO test_sbtest_9289.s ... 2786-22417080232-11687891881') | Opening tables | 10.75 ms          | 9.01 ms      |
|     201 | INSERT INTO test_sbtest_6573.s ... 0106-91679428362-14852851066') | Opening tables | 8.43 ms           | 7.03 ms      |
|     217 | INSERT INTO test_sbtest_1071.s ... 9465-09453525844-02377557541') | Opening tables | 8.42 ms           | 7.49 ms      |
|     206 | INSERT INTO test_sbtest_9588.s ... 8804-20770286377-79085399594') | Opening tables | 8.02 ms           | 7.50 ms      |
|     211 | INSERT INTO test_sbtest_4657.s ... 4758-53442917995-98424096745') | Opening tables | 16.62 ms          | 9.76 ms      |
|     218 | INSERT INTO test_sbtest_9672.s ... 1537-13189199316-54071282928') | Opening tables | 10.01 ms          | 7.41 ms      |
|     214 | INSERT INTO test_sbtest_1391.s ... 9241-84702335152-38653248940') | Opening tables | 21.34 ms          | 15.54 ms     |
|     220 | INSERT INTO test_sbtest_6542.s ... 7778-65788940102-87075246009') | Opening tables | 2.96 ms           | 0 ps         |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
32 rows in set (0.11 sec)

And transactions per second drop to ~2K.

Here I expected different behavior. With the .frm files gone and with tablespace_definition_cache set to more than 10K (we have only 10K tablespace files), I expected that MySQL would not have to open and close files. It looks like this is not the case.

I can also see the table openings (counted since the server started):

mysql> show global status like '%open%';
+----------------------------+-----------+
| Variable_name              | Value     |
+----------------------------+-----------+
| Com_ha_open                | 0         |
| Com_show_open_tables       | 0         |
| Innodb_num_open_files      | 10040     |
| Open_files                 | 0         |
| Open_streams               | 0         |
| Open_table_definitions     | 524288    |
| Open_tables                | 499794    |
| Opened_files               | 22        |
| Opened_table_definitions   | 1220904   |
| Opened_tables              | 2254648   |
| Slave_open_temp_tables     | 0         |
| Table_open_cache_hits      | 256866421 |
| Table_open_cache_misses    | 2254643   |
| Table_open_cache_overflows | 1254766   |
+----------------------------+-----------+

This is easier to see on the graphs from PMM. Insert per second for the two runs (both running 16 threads):

  1. The first run is 10K random databases/tablespaces and one table (sysbench is choosing table#1 from a randomly chosen list of 10K databases). This way there is also no contention on the tablespace file.
  2. The second run is a randomly chosen table from a list of one million tables.

As we can see, the first run is doing 50K-100K inserts/second. The second run is limited to only ~2.5K inserts per second:

“Table open cache misses” grows significantly after the start of the second benchmark run:

As we can see, MySQL performs ~1.1K table definition openings per second and has ~2K table cache misses due to the overflow:

When inserting into only 1K random tables (one specific table in a random database, so that we almost guarantee each thread always writes to a different tablespace file), the table_open_cache warms up quickly. After a couple of seconds, the sysbench test starts showing > 100K tps. The processlist looks much better (compare the statement latency and lock latency to the output above):

mysql> select conn_id, current_statement, state, statement_latency, lock_latency from sys.processlist where current_statement is not null and conn_id <> CONNECTION_ID();
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
| conn_id | current_statement                                                 | state          | statement_latency | lock_latency |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
|     253 | INSERT INTO test_sbtest_3293.s ... 2282-95400708146-84684851551') | starting       | 22.72 us          | 0 ps         |
|     254 | INSERT INTO test_sbtest_3802.s ... 4030-35983148190-23616685226') | update         | 62.88 us          | 45.00 us     |
|     255 | INSERT INTO test_sbtest_5290.s ... 2361-58374942527-86207214617') | Opening tables | 36.07 us          | 0 ps         |
|     256 | INSERT INTO test_sbtest_5684.s ... 4717-34992549120-04746631452') | Opening tables | 37.61 us          | 37.00 us     |
|     257 | INSERT INTO test_sbtest_5088.s ... 5637-75275906887-76112520982') | starting       | 22.97 us          | 0 ps         |
|     258 | INSERT INTO test_sbtest_1375.s ... 8592-24036624620-65536442287') | query end      | 98.66 us          | 35.00 us     |
|     259 | INSERT INTO test_sbtest_8764.s ... 8566-02569157908-49891861265') | Opening tables | 47.13 us          | 37.00 us     |
|     260 | INSERT INTO test_sbtest_560.sb ... 2605-08226572929-25889530906') | query end      | 155.64 us         | 38.00 us     |
|     261 | INSERT INTO test_sbtest_7776.s ... 0243-86335905542-37976752368') | System lock    | 46.68 us          | 32.00 us     |
|     262 | INSERT INTO test_sbtest_6551.s ... 5496-19983185638-75401382079') | update         | 74.07 us          | 40.00 us     |
|     263 | INSERT INTO test_sbtest_7765.s ... 5428-29707353898-77023627427') | update         | 71.35 us          | 45.00 us     |
|     265 | INSERT INTO test_sbtest_5771.s ... 7065-03531013976-67381721569') | query end      | 138.42 us         | 39.00 us     |
|     266 | INSERT INTO test_sbtest_8603.s ... 7158-66470411444-47085285977') | update         | 64.00 us          | 36.00 us     |
|     267 | INSERT INTO test_sbtest_3983.s ... 5039-55965227945-22430910215') | update         | 21.04 ms          | 39.00 us     |
|     268 | INSERT INTO test_sbtest_8186.s ... 5418-65389322831-81706268892') | query end      | 113.58 us         | 37.00 us     |
|     269 | INSERT INTO test_sbtest_1373.s ... 1399-08304962595-55155170406') | update         | 131.97 us         | 59.00 us     |
|     270 | INSERT INTO test_sbtest_7624.s ... 0589-64243675321-62971916496') | query end      | 120.47 us         | 38.00 us     |
|     271 | INSERT INTO test_sbtest_8201.s ... 6888-31692084119-80855845726') | query end      | 109.97 us         | 37.00 us     |
|     272 | INSERT INTO test_sbtest_7054.s ... 3674-32329064814-59707699237') | update         | 67.99 us          | 35.00 us     |
|     273 | INSERT INTO test_sbtest_3019.s ... 1740-35410584680-96109859552') | update         | 5.21 ms           | 33.00 us     |
|     275 | INSERT INTO test_sbtest_7657.s ... 4985-72017519764-59842283878') | update         | 88.91 us          | 48.00 us     |
|     274 | INSERT INTO test_sbtest_8606.s ... 0580-38496560423-65038119567') | freeing items  | NULL              | 37.00 us     |
|     276 | INSERT INTO test_sbtest_9349.s ... 0295-94997123247-88008705118') | starting       | 25.74 us          | 0 ps         |
|     277 | INSERT INTO test_sbtest_3552.s ... 2080-59650597118-53885660147') | starting       | 32.23 us          | 0 ps         |
|     278 | INSERT INTO test_sbtest_3832.s ... 1580-27778606266-19414961452') | freeing items  | 194.14 us         | 51.00 us     |
|     279 | INSERT INTO test_sbtest_7685.s ... 0234-22016898044-97277319766') | update         | 62.66 us          | 40.00 us     |
|     280 | INSERT INTO test_sbtest_6026.s ... 2629-36599580811-97852201188') | Opening tables | 49.41 us          | 37.00 us     |
|     281 | INSERT INTO test_sbtest_8273.s ... 7957-39977507737-37560332932') | update         | 92.56 us          | 36.00 us     |
|     283 | INSERT INTO test_sbtest_8584.s ... 7604-24831943860-69537745471') | starting       | 31.20 us          | 0 ps         |
|     284 | INSERT INTO test_sbtest_3787.s ... 1644-40368085836-11529677841') | update         | 100.41 us         | 40.00 us     |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
30 rows in set (0.10 sec)

What about 100K random tables? That should fit into the table_open_cache. At the same time, the default 16 table_open_cache_instances split the ~500K table_open_cache, so each bucket holds only ~30K entries. To fix that, I set table_open_cache_instances = 4 and was able to get ~50K tps on average. However, contention inside the table_open_cache still seems to stall queries (see the sketch below):
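For reference, here is a my.cnf sketch of that change (the arithmetic: 524288 / 16 ≈ 32K entries per bucket by default, versus 524288 / 4 ≈ 131K with four instances; table_open_cache_instances is not dynamic, so this requires a restart):

table_open_cache = 524288
table_open_cache_instances = 4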

There is only a very limited number of table openings:

Conclusion

MySQL 8.0 general tablespaces look very promising. It is finally possible to create one million tables in MySQL without needing to create two million files. MySQL 8 can actually handle many tables very well, as long as table cache misses are kept to a minimum.

At the same time, the problem with “Opening tables” (in the worst-case scenario test) still persists in MySQL 8.0.3-rc and limits the throughput. I expected that MySQL would not have to open/close table structure files anymore. I also hope the CREATE TABLE regression bug is fixed in the next MySQL 8.0 version.

I’ve not tested other new features in the new data dictionary in 8.0.3-rc: i.e., atomic DDL (InnoDB now supports atomic DDL, which ensures that DDL operations are either committed in their entirety or rolled back in case of an unplanned server stoppage). That is the topic of the next blog post.

MySQL Enterprise Monitor 4.0.0 GA has been released

We are very happy to announce the general availability of MySQL Enterprise Monitor, Version 4.0. MySQL Enterprise Monitor is the best-in-class tool for monitoring and management of your MySQL assets and is included with your MySQL Enterprise Edition and MySQL Enterprise Carrier Grade subscriptions.

You can find more information on the contents of this release in the change log.

Highlights of MySQL Enterprise Monitor 4.0 include:

  • Modern look and feel: a redesigned user interface delivers a vastly improved overall user experience. The visible changes--the layout, the icons, and the overall aesthetics--provide a more natural and intuitive experience. Views dynamically change and adjust to your current context and the assets you've selected, everything from individual MySQL instances or hosts, to your custom Groups, to your complex replication and clustered topologies. Additional enhancements include a much more responsive UI and a greater capacity to scale, allowing you to more effectively manage thousands of MySQL related assets.
  • MySQL Cluster monitoring: we now auto-discover your MySQL Cluster installations and give you visibility into the performance, availability, and health of each MySQL instance and NDB process, as well as the health of the MySQL Cluster instance as a single logical system. The Overview dashboard displays detailed instrumentation available for each MySQL Cluster and the Topology dashboard displays the current configuration of your MySQL Cluster enabling you to quickly see the status of the MySQL Cluster instance as a whole and each individual process. The Topology dashboard allows you to easily see how your MySQL Cluster installations are currently functioning.
  • A User Statistics report: provides an easy way to monitor MySQL resource utilization broken down by user.

You will find binaries for the new release on My Oracle Support. Choose the "Patches & Updates" tab, and then choose the "Product or Family (Advanced Search)" side tab in the "Patch Search" portlet.

You will also find the binaries on the Oracle Software Delivery Cloud soon.  Type "MySQL Enterprise Monitor 4.0.0" in the search box, or enter a license name to find Enterprise Monitor along with other MySQL products: "MySQL Enterprise Edition" or "MySQL Cluster Carrier Edition".  Then select your platform.

Please open a bug or a ticket on My Oracle Support to report problems, request features, or give us general feedback about how this release meets your needs.

If you are not a MySQL Enterprise customer and want to try the Monitor and Query Analyzer using our 30-day free customer trial, go to http://www.mysql.com/trials, or contact Sales at http://www.mysql.com/about/contact.

Thanks and Happy Monitoring!

- The MySQL Enterprise Tools Development Team

Useful URLs

This Week in Data with Colin Charles 8: Percona Live Europe 2017 Is a Wrap!

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Europe 2017 Dublin

We’ve spent a lot of time in the last few months organizing Percona Live Europe Dublin. I want to thank all the speakers, sponsors and attendees for helping us to pull off yet another great event. While we’ll provide some perspectives, thoughts and feedback soon, all the early mornings, jam-packed meetings and the 4 am bedtimes mean I’ll probably talk about this event in my next column!

In the meantime, save the date for Percona Live Santa Clara, April 23-25 2018. The call for papers will open in October 2017.

Releases

Link List

Upcoming appearances

Percona’s website keeps track of community events, so check out where to listen to a Perconian speak. My upcoming appearances are:

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Percona Monitoring and Management 1.3.1 Is Now Available

Percona announces the release of Percona Monitoring and Management 1.3.1. This release only contains bug fixes related to usability.

For install and upgrade instructions, see Deploying Percona Monitoring and Management.

Bug fixes
  • PMM-1271: In QAN, when the user selected a database host with no queries, the query monitor could still show metrics.
  • PMM-1512: When clicking the QAN button in Grafana, QAN would open the home page. Now, QAN opens and automatically selects the database host and time range active in Grafana.
  • PMM-1523: User-defined Prometheus memory settings were not honored, potentially causing performance issues in high load environments.

Other bug fixes in this release: PMM-1452, PMM-1515.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

We’re always happy to help! Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, please use the PMM project in JIRA.

How to set up and use MariaDB Connector C

I just wanted to provide a quick tutorial on using MariaDB’s Connector C.

I downloaded the latest Connector C (3.0.2), running against MariaDB 10.1.28, and was able to get it working with a simple C program using the following steps:

1. Downloaded the Connector C .msi file (32-bit, since my VS is 32-bit), extracted, and installed, which placed it at:

C:\Program Files (x86)\MariaDB\MariaDB Connector C\lib

2. You need to add the directory that contains libmariadb.dll to your LIB environment variable. In my case, that meant:

Control Panel -> System -> Advanced System Settings -> Environment Variables -> Choose "LIB" from under "System variables" -> then add the Connector C lib path, like: C:\Program Files (x86)\MariaDB\MariaDB Connector C\lib
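Alternatively (my addition, not part of the original steps), the same change can be made from a command prompt with the built-in setx utility; setx writes the user-level variable, so open a new prompt afterwards for it to take effect:

setx LIB "%LIB%;C:\Program Files (x86)\MariaDB\MariaDB Connector C\lib"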

3. Then just run the following command, where my c/c++ program name is “mysql1.c”:

cl /I "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include" /I "C:\Program Files (x86)\mariadb-10.1.25\include\mysql" /I "C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include" /MD "C:\Program Files (x86)\MariaDB\MariaDB Connector C\lib\libmariadb.lib" mysql1.c

Note that the first path includes the normal C headers like stdio.h, the second is for mysql.h, the third is for windows.h, and the last is for the Connector C .lib file.

Here is the actual session output:

C:\chris\mysql1> cl /I "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include" /I "C:\Program Files (x86)\mariadb-10.1.25\include\mysql" /I "C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include" /MD "C:\Program Files (x86)\MariaDB\MariaDB Connector C\lib\libmariadb.lib" mysql1.c

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

mysql1.c

Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation. All rights reserved.

/out:mysql1.exe
C:\chris\mysql1\libmariadb.lib
mysql1.obj

4. If all is successful, as above, you can invoke your newly created C/C++ program (mysql1.exe in this case; mine just obtains the version number from the instance):

C:\chris\mysql1> mysql1

MySQL Version = 100128

For reference, here is my mysql1.c code:

#include <stdio.h>
#include <windows.h>
#include <mysql.h>

MYSQL *conn;
int version = 1;

int main(int argc, char *argv[])
{
    conn = mysql_init(NULL);                   /* allocate a connection handle */
    mysql_real_connect(conn, "localhost", "root", "xxxxx",
                       "test", 3316, NULL, 0); /* connect to the instance on port 3316 */
    version = mysql_get_server_version(conn);  /* e.g., 100128 for 10.1.28 */
    printf("\nMySQL Version = %d\n", version);
    mysql_close(conn);
    return 0;
}
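One caveat (my addition, not part of the original example): mysql_real_connect() returns NULL on failure, so a minimal guard avoids calling into a dead connection:

if (mysql_real_connect(conn, "localhost", "root", "xxxxx",
                       "test", 3316, NULL, 0) == NULL) {
    /* mysql_error() returns the client's last error message */
    fprintf(stderr, "Connection failed: %s\n", mysql_error(conn));
    mysql_close(conn);
    return 1;
}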

Previous related posts, if interested:

  1. Creating a basic C/C++ Program to Interact with MySQL and MariaDB
  2. Common Errors and Resolutions for Building your own MySQL or MariaDB C/C++ Program on Windows
  3. Setting Up Connector/C and SkySQL C Connector for MySQL and MariaDB

I hope this helps.

Watch the Replay: MySQL on Docker - Understanding the Basics

Thanks to everyone who joined us this week as we broadcast our MySQL on Docker webinar live from the Percona Live Conference in Dublin!

Our colleague Ashraf Sharif discussed how Docker containers work, through to running a simple MySQL container, as well as the ClusterControl Docker image (amongst other things).

If you missed the session or would like to watch it again, it’s now online for viewing.

Watch replay

Related resources:
  • MySQL on Docker - Blog Series
  • Severalnines on Docker Hub
  • ClusterControl Docker Image

Here’s the full agenda of the topics that were covered during the webinar. The content is aimed at MySQL users who are Docker beginners and who would like to understand the basics of running a MySQL container on Docker.

  • Docker and its components
  • Concept and terminology
  • How a Docker container works
  • Advantages and disadvantages
  • Stateless vs stateful
  • Docker images for MySQL
  • Running a simple MySQL container
  • The ClusterControl Docker image
  • Severalnines on Docker Hub

Watch replay

And if you’re not following our Docker blog series yet, we encourage you to do so: MySQL on Docker.


Handling Encrypted InnoDB Backups

In this post, we are going to see how we can back up encrypted tables using Xtrabackup. InnoDB supports data encryption for InnoDB tables stored in file-per-table tablespaces. For an application to access an encrypted tablespace, InnoDB uses the master encryption key to decrypt the tablespace key. The master encryption key is stored in a keyring file, in the location specified by the keyring_file_data configuration option. We have already discussed enabling encrypted tablespaces. Here, we will try full and incremental backups of an encrypted tablespace.

Percona XtraBackup supports encrypted InnoDB tablespace backups. While taking a backup, we have to add --keyring-file-data and --server-id. After the backup completes, we have to use the same options to prepare it. Below is an example with an encrypted table.
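Since the original screenshot of the example table isn't reproduced here, a minimal sketch of an encrypted table (the table name is illustrative; it assumes the keyring plugin is already loaded) would be:

mysql> CREATE TABLE enc_test (
    ->   id INT PRIMARY KEY,
    ->   secret VARCHAR(100)
    -> ) ENGINE=InnoDB ENCRYPTION='Y';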

A backup can be taken without --keyring-file-data and --server-id, and the same goes for the apply-log step. But after restoring such a backup, we can't access the encrypted table; the server throws an error saying it can't find the master key from the keyring.

FULL BACKUP

So, proper encrypted backup should be taken by specifying the path to keyring file and server-id. 

innobackupex --user=root --password=pass --safe-slave-backup --keyring-file-data=/var/lib/mysql_keyring/keyring --server-id=176457 /path/to/backup

Once the backup is complete, we have to prepare it by passing the same options. The keyring file data must be specified, but not the server-id, as that can be obtained from backup-my.cnf. Xtrabackup does not copy the keyring file into the backup directory, so in order to prepare the backup, you must make a copy of the keyring file yourself.

innobackupex --apply-log --use-memory=20G --keyring-file-data=/var/lib/mysql_keyring/keyring /path/to/backup

After the logs are applied, you should see a message like the following:

InnoDB: Shutdown completed; log sequence number 21647652
170528 19:23:15 completed OK!

Once the backup is prepared, it can be restored using either --move-back or --copy-back. If the keyring has been rotated since the backup, we have to use the same keyring file that was used to take and prepare the backup. After a successful restore, we can see the data.

If the master key file is deleted or moved, the encrypted table can't be accessed, so the keyring file must be stored safely.

INCREMENTAL BACKUP

To perform an incremental backup, we first need a full backup. The xtrabackup binary writes a file called xtrabackup_checkpoints in the backup directory.
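For illustration, the contents of xtrabackup_checkpoints after a full backup look roughly like this (the LSN values here are only indicative, reusing the number from the prepare output above):

backup_type = full-backuped
from_lsn = 0
to_lsn = 21647652
last_lsn = 21647652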

The incremental backup will be based on that full backup. Here too, xtrabackup will not copy the keyring file into the backup directory; in order to prepare the backup, you must make a copy of the keyring file yourself. If the keyring hasn't been rotated, you can use the same copy you backed up with the base backup. If the keyring has been rotated, you'll need to back it up as well, otherwise you won't be able to prepare the backup.

innobackupex --user=root --password=pass --incremental --incremental-basedir=/path/to/base/backup/ --keyring-file-data=/var/lib/mysql_keyring/keyring --server-id=176457 /path/to/inc/backup

Now the incremental backup directory will contain the delta files, which represent the changes.

The apply-log phase for an incremental backup doesn't work the same way as for a full backup. Here, you must use --apply-log-only to skip the rollback phase. First prepare the base backup, then apply the incremental backup on top of it.

innobackupex --user=root --password=pass --apply-log-only --use-memory=10G --keyring-file-data=/var/lib/mysql_keyring/keyring/ /path/to/base/backup/

After preparing the base backup, the incremental backup should be applied on top of it:

innobackupex --user=root --password=pass --apply-log-only --incremental-dir=/path/to/inc/backup --keyring-file-data=/var/lib/mysql_keyring/keyring/ /path/to/base/backup/

Once this succeeds, the backup can be restored. If there is any issue accessing the encrypted tables, the keyring file has probably been rotated, removed, or moved. So, it is necessary to check the encrypted tables after the restoration.


Lesson 05: Basic SQL (in MySQL)

Notes/errata/updates for Chapter 5:
See the official book errata at http://tahaghoghi.com/LearningMySQL/errata.php – Chapter 5 is pages 135-177.

Notes:
If you did not load the sample databases in Chapter 3 as per page 102, you can download the file to import from http://tahaghoghi.com/LearningMySQL/Downloads/Files/Data/SQL_files_with_foreign_key_references/music.sql

On page 162, it says “In Chapter 6, we’ll discuss how to load data from existing tables or from external data sources.” but this content is in Chapter 8.

On page 169, there is an example of “DELETE FROM played;” which deletes all the rows of a table. This will not work if safe_updates is enabled. I always advise that if you truly want to do that, ALWAYS use a WHERE clause, for example: DELETE FROM played WHERE 1=1;. In this way, your query is self-documenting – it tells the reader that you intended to delete all rows. If you just have “DELETE FROM played”, it is possible you meant to put a WHERE clause in but forgot.

Similarly, on page 171, it shows an example where an UPDATE with no WHERE clause is used to update all rows. I prefer the syntax UPDATE tbl SET column=value WHERE 1=1 – this broadcasts your intent, as in the example below.
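For instance, against the sample music database (the TRIM() update is purely illustrative):

-- The WHERE 1=1 documents that every row is meant to be affected
DELETE FROM played WHERE 1=1;
UPDATE artist SET artist_name = TRIM(artist_name) WHERE 1=1;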

Topics covered:
How to get table metadata with SHOW
Basic SELECT statements
– Choosing columns
– Filtering with WHERE clauses
– ORDER BY
– LIMIT
Joining tables with INNER JOIN
INSERT and INSERT IGNORE
DELETE
UPDATE
TRUNCATE

Reference/Quick Links for MySQL Marinate
