Planet MySQL

Announcing Galera Cluster Security Release for MySQL 5.5.59, 5.6.39, 5.7.21 with Galera 3.23.

Codership is pleased to announce the release of Galera Replication library 3.23, implementing wsrep API version 25.

This release incorporates all changes up to MySQL 5.7.21, MySQL 5.6.39 and MySQL 5.5.59, including fixes for several vulnerabilities reported by Oracle here.

New features and notable fixes in Galera replication since the last binary release by Codership (3.22):

 

Notable bug fixes in MySQL 5.7.21:

 

Notable bug fixes in 5.6.39 :

 

READ THE RELEASE NOTES:

Galera Replication library 3.23

MySQL-wsrep 5.5.59

MySQL-wsrep 5.6.39

MySQL-wsrep 5.7.21

 

 

 

This Week in Data with Colin Charles 27: Percona Live Tutorials Released and a Comprehensive Review of the FOSDEM MySQL DevRoom

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Santa Clara 2018 update: tutorials have been announced. The committee rated over 300 talks, and easily 70% of the schedule should go live next week as well. In practice, then, you should see about 50 talks announced next week. There’s been great competition: we only have 70 slots in total, so about 1 in 5 talks get picked — talk about a competitive ratio.

FOSDEM

FOSDEM was truly awesome last week. From a Percona standpoint, we had a lot of excellent booth traffic (being outside of the PostgreSQL room on Saturday, and not too far out from the MySQL room on Sunday). We gave away bottle openers — useful in Brussels with all the beer; we tried a new design with a magnet to attach it to your fridge — stickers, some brochures, but most of all we had plenty of great conversations. There was quite a crowd from Percona, and it was excellent to see the MySQL & Friends DevRoom almost constantly full! A few of us brave souls managed to stay there the whole day, barely with any breaks, so as to enjoy all the talks.

I find the quality of talks to be extremely high. And when it comes to a community run event, with all content picked by an independent program committee, FOSDEM really sets the bar high. There is plenty of competition to get a good talk in, and I enjoyed everything we picked (yes, I was on the committee too). We’ve had plenty of events in the ecosystem that sort of had “MySQL” or related days, but FOSDEM might be the only one that has really survived. I understand we will have a day of some sort at SCALE16x, but even that has been scaled down. So if you care about the MySQL ecosystem, you will really want to ensure that you are at FOSDEM next year.

This year, we started with the usual MySQL Day on Friday. I could not be present, as I was at the CentOS Dojo, giving a presentation. So, the highlight of Friday for me? The community dinner. Over 80 people showed up; I know there was a waiting list, and lots of people were trying to get tickets at the last minute. Many missed out too; sorry, better luck next year; and also, hopefully, we will get a larger venue going forward. I really thank the organizers for this — we affectionately refer to them as the Belconians (i.e. a handful of Perconians based in Belgium). The conversation, the food, the drink — they were all excellent. It’s good to see representation from all parts of the community: MySQL, Percona, MariaDB, Pythian, and others. So thank you again, Liz, Dimitri, Tom, and Kenny in absentia. I think Tjerk also deserves special mention for always helping (this year with the drinks).

As for FOSDEM itself, beyond the booth, I think the most interesting stuff was the talks. There are video recordings and slides of pretty much all talks, but I will also give you the “Cliff’s Notes” of them here.

MySQL DevRoom talk quick summaries
Beyond WHERE and GROUP BY – Sergei Golubchik
  • EXCEPT is in MariaDB Server 10.3
  • recursive CTEs are good for hierarchical data, graphs, data generation, Turing complete (you can use it to solve Sudoku even)
  • non-recursive CTEs can be an alternative syntax for subqueries in the FROM clause
  • Window functions:
    • Normal: one result per row, depend on that row only
    • Aggregate: one result per group, depending on the whole group
    • Window: one result per row, depending on the whole group
  • System versioned tables with AS OF
  • Aggregate stored functions
MySQL 8.0 Performance: InnoDB Re-Design – Dimitri Kravtchuk
  • Contention-Aware Transaction Scheduling (CATS), since 8.0.3. Not all transactions are equal; FIFO may not be optimal, so unblock the most blocking transactions first
  • CATS (VATS) had a few issues, and there were bugs (they thought everything worked since MariaDB Server had implemented it). They spent about 9 months before fixing everything.
  • Where does CATS help? Workloads hitting row lock contentions. You can monitor via SHOW ENGINE INNODB MUTEX.
  • the main problem is the difference between repeatable read and read committed transaction isolation on the same workload. You really need to understand your workload when it comes to VATS.
MySQL 8.0 Roles – Giuseppe Maxia
  • Created like a user, granted like privileges. You need to activate them to use them.
  • Before roles, you created a user, then grant, grant, and more grants… Add another user? Same deal. Lots of repetitive work and a lot of chances to make mistakes.
  • Faster user administration – define a role, assign it many times. Centralized grant handling – grant and revoke privileges to roles, add/edit all user profiles.
  • You need to remember to set the default role.
  • A user can have many roles; default role can be a list of roles.
  • Roles are users without a login – roles are saved in user tables. This is useful from an account lock/unlock perspective.
  • You can grant a user to a user
  • SET ROLE is for session management; SET DEFAULT ROLE is a permanent assignment of a role for a user. SET ROLE DEFAULT means assign the default role for this user for this session (a minimal sketch follows this list)
  • The role_edges table reports which roles are assigned to which users. default_roles keeps track of the current default roles assigned to users. A default role may not exist.
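To make the role workflow above concrete, here is a minimal sketch using Connector/Python; the role and user names (app_read, alice), the password and the connection settings are invented for illustration, not taken from the talk.

import mysql.connector

# Connect as an account that can create users and roles
cnx = mysql.connector.connect(host="127.0.0.1", port=3306,
                              user="root", password="secret")
cur = cnx.cursor()

for stmt in (
    "CREATE ROLE IF NOT EXISTS 'app_read'",
    "GRANT SELECT ON app.* TO 'app_read'",              # grant privileges to the role once
    "CREATE USER IF NOT EXISTS 'alice'@'%' IDENTIFIED BY 'alice_pw'",
    "GRANT 'app_read' TO 'alice'@'%'",                  # assign the role to the user
    "SET DEFAULT ROLE 'app_read' TO 'alice'@'%'",       # otherwise alice must run SET ROLE herself
):
    cur.execute(stmt)

cur.close()
cnx.close()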
Histogram support in MySQL 8.0 – Øystein Grøvlen
  • You can now do ANALYZE TABLE table UPDATE HISTOGRAM ON column WITH n BUCKETS; (see the sketch after this list)
  • New storage engine API for sampling (default implementation is full table scan even when sampling)
  • Histogram is stored in a JSON column in the data dictionary. Grab this from the INFORMATION_SCHEMA.
  • Histograms are useful for columns that are not the first column of any index, and used in WHERE conditions of JOIN queries, queries with IN-subqueries, ORDER BY … LIMIT queries. Best fit: low cardinality columns (e.g. gender, orderStatus, dayOfWeek, enums), columns with uneven distribution (skew), stable distribution (do not change much over time)
  • How many buckets? equi-height, 100 buckets should be enough.
  • Histograms are stored in the data dictionary, so will persist over restarts of course.
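As a rough illustration of the ANALYZE TABLE step and of reading the histogram back from the data dictionary (both mentioned in the bullets above), here is a sketch using Connector/Python; the shop.orders table, the order_status column, the bucket count and the connection settings are invented for the example.

import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", port=3306,
                              user="root", password="secret", database="shop")
cur = cnx.cursor()

# Build an equi-height histogram on a low-cardinality column
cur.execute("ANALYZE TABLE orders UPDATE HISTOGRAM ON order_status WITH 100 BUCKETS")
cur.fetchall()  # ANALYZE TABLE returns a status row that must be consumed

# The histogram is stored as JSON in the data dictionary, exposed via INFORMATION_SCHEMA
cur.execute("""SELECT schema_name, table_name, column_name,
                      histogram->>'$."histogram-type"' AS histogram_type
               FROM information_schema.column_statistics""")
for row in cur.fetchall():
    print(row)

cur.close()
cnx.close()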
Let’s talk database optimizers – Vicențiu Ciorbaru
TLS for MySQL at Large Scale – Jaime Crespo
  • Literally took 3 lines in the my.cnf to turn on TLS
  • https://dbtree.wikimedia.org
  • They wanted to do a data centre failover and wanted to ensure replication would be encrypted.
  • They didn’t have proper orchestration in place (MySQL could have this too). Every time OpenSSL or MySQL had to be upgraded, the daemon needed restarting. If there was an incompatible change, you had to sync master/replicas too.
  • The automation and orchestration that Wikipedia uses: https://fosdem.org/2018/schedule/event/cumin_automation/ (it is called Cumin: https://wikitech.wikimedia.org/wiki/Cumin)
  • Server support was poor (OpenSSL), so they had to deploy their own wmf-mysql and wmf-mariadb builds
  • Currently using MariaDB 10.0, and looking to migrate to MariaDB 10.1
  • Client library pain they’ve had
  • TLSv1.2 from the beginning (2015).
  • 20-50x slower for actual connecting; the impact is less than 5% for the actual query performance. Just fix client libraries, make them use persistent connections. They are now very interested in ProxySQL for this purpose.
  • https://grafana.wikimedia.org/?orgId=1
  • Monty asks, would a double certificate help? Jaime says sure. But he may not actually use double certificates; might not solve CA issues, and the goal is not to restart the server.
  • Monty wonders why not to upgrade to 10.2? “Let’s talk outside because it’s a much larger question.”
MySQL InnoDB Cluster – Miguel Araújo
  • group replication: update everywhere (multi-master), virtually synchronous replication, automatic server failover, distributed recovery, group reconfiguration, GCS (implementation of Paxos – group communication system). HA is a critical factor.
  • mysqlsh: interactive and batch operations. Document store (CRUD and relational access)
  • admin API in mysqlsh: create & manage clusters, hide complexity of configuration/provisioning/orchestration of the InnoDB clusters. Works with JavaScript and Python
  • Usability. HA out of the box.
  • It’s easy to join a new node; new node goes into recovery mode (and as long as you have all the binary logs, this is easy; otherwise start from a backup)
  • SET PERSIST – run a command remotely, and the configuration is persisted in the server
  • Network flapping? Group replication will just reject the node from the cluster if it’s flapping too often
Why we’re excited about MySQL 8 – Peter Zaitsev
  • Native data dictionary – atomic, crash safe, DDLs, no more MyISAM system table requirements
  • Fast INFORMATION_SCHEMA
  • utf8mb4 as default character set
  • Security: roles, breakdown of SUPER privileges, password history, faster cached-SHA2 authentication (default), builds using OpenSSL (like Percona Server), skip grants blocks remote connections, logs now encrypted when tablespace encryption enabled
  • Persistent AUTO_INCREMENT
  • auto-managed undo tablespaces – do not use system table space for undo space. Automatically reclaim space on disks.
  • Self-tuning, limited to InnoDB (innodb_dedicated_server to auto-tune)
  • partial in-place update for JSON – update a field in a JSON object without a full rewrite. Good for counters/statuses/timestamps. Update/removal of an element is supported
  • Invisible indexes – test impact of dropping indexes before actually dropping them. Maintained but unused by the optimizer. If not needed or used, then drop away.
  • TmpTable Storage Engine – more efficient storage engine for internal temporary tables. Efficient storage for VARCHAR and VARBINARY columns. Good for GROUP BY queries. Doesn’t support BLOB/TEXT columns yet (this reverts to InnoDB temp table now)
  • Backup locks – prevent operations which may result in inconsistent backups. LOCK INSTANCE FOR BACKUP (something Percona Server has had before)
  • Optimizer histograms – detailed statistics on columns, not just indexes
  • improved cost model for the optimizer – www.unofficialmysqlguide.com
  • Performance Schema – faster (via “fake” indexes), error instrumentation, response time histograms (global & per query), digest summaries
  • select * from sys.session – fast potential replacement for show processlist
  • RESTART (command)
  • SET PERSIST – e.g. change the buffer pool size, and this helps during a restart
  • assumes default storage is SSD now
  • binary log on by default, log_slave_updates enabled by default, and log expires after 30 days by default
  • query cache removed. Look at ProxySQL or some other caching solution
  • native partitioning only – remove partitions from MyISAM or convert to InnoDB
  • resource groups – isolation and better performance (map queries to specific CPU cores; can jail your costly queries, like analytical queries)
  • Feature Requests: better single thread performance, no parallel query support
MySQL Test Framework for Support and Bugs Work – Sveta Smirnova
  • MTR allows you to add multiple connections
  • has commands for flow control
ProxySQL – GTID Consistent Reads – René Cannaò, Nick Vyzas
  • threshold is configurable in increments of 1 second. Replication lag can be monitored with ProxySQL. Want to ensure you don’t have stale reads.
  • Why is GTID important? To guarantee consistency. Auto positioning for restructuring topologies.
  • --session-track-gtids is an important feature which allows sending the GTID for a transaction on the OK packet. Not available in MariaDB.
  • There is a ProxySQL Binlog Reader now – GTID information about a MySQL server to all connected ProxySQL instances. Lightweight process to run on your MySQL server.
  • ProxySQL can be configured to enforce GTID consistency for reads on any hostgroup/replication hostgroup.
  • Live demo by René
Turbocharging MySQL with Vitess – Sugu Sougoumarane
  • trend for the cloud: container instances, short-lived containers, tolerate neighbors, discoverability. No good tools yet for Kubernetes.
  • non-ideal options: application sharing, NoSQL, paid solutions, NewSQL (CockroachDB, TiDB, Yugabyte)
  • Vitess: leverage MySQL at massive scale, opensource, 8+ years of work, and multiple production examples
  • Square uses Vitess for Square Cash application.
  • Can MySQL run on Docker? Absolutely, many of the companies do huge QPS on Docker.
  • YouTube does a major re-shard every 2-3 months. No one notices nowadays when that happens.
  • app server connects to vtgate, and only underneath it’s a bunch of smaller databases with vttablet + mysqld. The lockserver is what makes it run well in the cloud.
  • pluggable architecture with no compromise on performance: monitoring, health check, ACLs, tracing, more.
  • at most, it adds about 2ms overhead to connections
  • Go coding standards are enforced, unit tests with strict coverage requirements, end-to-end tests, Travis, CodeClimate and Netlify. Readability is king.
  • On February 5 2018, it will be a CNCF project. One year of due diligence. They said there was nothing to compare it with. Looked at maturity and contributors. It’s becoming a truly community-owned project! (CNCF to Host Vitess is already live as of now)
  • roadmap: full cross-shard queries, migration tools, simplify configurability, documentation.
  • full MySQL protocol, but a limited query set – they want to get it to a point where it accepts a full MySQL query.
Orchestrator on Raft – Shlomi Noach
  • Raft: guaranteed to be in-order replication log, an increasing index. This is how nodes choose a leader based on who has the higher index. Get periodic snapshots (node runs a full backup).
  • HashiCorp raft, a Golang raft implementation, used by Consul
  • orchestrator manages topology for HA topologies; also want orchestrator to be highly available. Now with orchestrator/raft, remove the MySQL backend dependency, and you can have data center fencing too. Now you get: better cross-DC deploys, DC-local KV control, and also Kubernetes friendly.
  • n-orchestrator nodes, each node still runs its own backend (either MySQL or SQLite). Orchestrator provides the communication for SQLite between the nodes. Only one (the Raft leader) will handle failovers
  • implementation & deployment @ Github – one node per DC (deployed at 3 different DCs). 1-second raft polling interval. 2 major DCs, one in the cloud. Step-down, raft-yield, SQLite-backed log store, and still a MySQL backend (SQLite backend use case is in the works)
  • They patched the HashiCorp raft library. The library doesn’t care about the identity of nodes, with Github they do want to control the identity of the leader. There is an “active” data center, and locality is important. This is what they mean by raft-yield (picking a candidate leader).
  • The ability for a leader to step down is also something they had to patch.
  • HashiCorp Raft only supports LMDB and another database, so the replication log is now kept in a relational SQLite backed log store. Another patch.
  • once orchestrator can’t run its own self-health check, it recognizes this. The application can tell raft now that it’s stepping down. Takes 5 seconds to step down, and raft then promotes another orchestrator node to be the leader. This is their patch.
  • can also grab leadership
  • DC fencing handles network partitioning.
  • orchestrator is Consul-aware. Upon failover, orchestrator updates Consul KV with the identity of the promoted master.
  • considerations to watch out for: what happens if, upon replay of the Raft log, you hit two failovers for the same cluster? NOW() and otherwise time-based assumptions. Reapplying snapshot/log upon startup
  • roadmap: use Kubernetes (cluster IP based configuration in progress, already container friendly via auto-re-provisioning of nodes via Raft)
MyRocks Roadmaps – Yoshinori Matsunobu
  • Facebook has a large User Database (UDB). Social graph, massively sharded, low latency, automated operations, pure flash storage (constrained by space, not CPU/IOPS)
  • They have a record cache in-front of MySQL – Tao for reads. If cache misses, then it hits the database. And all write requests go thru MySQL. UDB has to be fast to ensure a good user experience.
  • Facebook also runs 2 instances of MySQL on the same machine, because CPU usage wasn’t high, and the space savings were awesome.
  • design decisions: clustered index (same as InnoDB), slower for reads, faster for writes (bloom filters, column family), support for transactions including consistency between binlog and MyRocks. Faster data loading/deletes/replication, dynamic options (instead of having to restart mysqld), TTL (comparable to HBase TTL feature, specify the TTL, any data older than time, can be removed), online logical (for recovery purposes) & binary backup (for creating replicas)
  • Pros: smaller space, better cache hit rate, writes are faster so you get faster replication, much smaller bytes written
  • Cons: no support for statement-based replication, gap locks, foreign keys, full-text indexes or spatial indexes. Need to use case sensitive collations for performance. Reads are slower, especially if the data fits in memory. Dependent on file system and OS; lack of solid direct I/O (uses buffered I/O). You need a kernel newer than 4.6. Too many tuning options beyond buffer pool such as bloom filter, compactions, etc.
  • https://twitter.com/deniszh/status/960163082642382849
  • Completed InnoDB to MyRocks migration. Saved 50% space in UDB compared to compressed InnoDB.
  • Roadmaps: getting in MariaDB and Percona Server for MySQL. Read Mark’s blog for matching read performance vs InnoDB. Supporting mixed engines. Better replication and bigger instance sizes.
  • mixed engines: InnoDB and MyRocks on the same instance, though a single transaction does not overlap engines. Plan to extend xtrabackup to integrate myrocks_hotbackup. Backport gtid_pos_auto_engines from MariaDB?
  • Removing the engine log: writes currently go to both the binlog and the engine log, which requires 2PC and ordered commits. Use one log? Either the binlog (or a binlog-like service) or the RocksDB WAL? They rely on the binlog now (semi-sync, binlog consumers); need to determine how much performance is gained by no longer writing to the WAL.
  • Parallel replication apply is important in MySQL 8
  • support bigger instance sizes: a shared-nothing database is not a general purpose database. Today you can get 256GB+ RAM and 10TB+ flash on commodity servers. Why not run one big instance and put everything there? Bigger instances may help general purpose small-mid applications. Then you don’t have to worry about sharding. Atomic transactions, joins and secondary keys will just work. Amazon Aurora today supports a 60TB instance!
  • today: you can start deploying slaves with consistency check. Many status counters for instance monitoring.
ProxySQL internals – René Cannaò
  • reduce latency, scale, maximize throughput. A single instance can handle hundreds of thousands of connections and thousands of backend servers.
  • threading models: one thread per connection (blocking I/O), thread pooling (non-blocking I/O, scalable).
  • ProxySQL thread pool implementation: known as “MySQL threads”, fixed number of worker threads (configurable), all threads listen on the same port(s), client connections are not shared between threads, all threads perform their own network I/O, and it uses poll() (does that scale? True, but there is a reason why poll over epoll)
  • threads never share client connections – no need for synchronization, thread contention is reduced, each thread calls poll(). Possibly imbalanced load as a con (one thread may have way more connections than another). Is it really a problem? Most of the time, no; connections will automatically balance.
  • poll() is O(N), epoll() is O(1). poll() is faster than epoll() for fewer connections (around 1000), but performance degrades when there are a lot of connections. So by default it uses poll() instead of epoll(); at around 50,000 connections performance degrades badly, which is why ProxySQL has auxiliary threads.
  • MySQL_Session() is implemented as a state machine. Stores metadata associated with the client session (running timers, default hostgroup, etc.)
MySQL Point-in-time recovery like a rockstar – Frederic Descamps
Releases
  • Percona Monitoring and Management 1.7.0 (PMM) – This release features improved support for external services, which enables a PMM Server to store and display metrics for any available Prometheus exporter. For example, you could deploy the postgres_exporter and use PMM’s external services feature to store PostgreSQL metrics in PMM. Immediately, you’ll see these new metrics in the Advanced Data Exploration dashboard. Then you could leverage many of the pre-developed PostgreSQL dashboards available on Grafana.com, and with a minimal amount of edits have a working PostgreSQL dashboard in PMM!
  • MariaDB Server 10.1.31 – usual updates to storage engines, and a handful of bug fixes.
Link List
Upcoming appearances
  • SCALE16x – Pasadena, California, USA – March 8-11 2018
Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

How to Enable Binary Logging on an Amazon RDS Read Replica

One of the more common struggles I’ve had to assist with in regard to Amazon RDS is enabling binary logging on read replicas, or forming multi-tier replication on instances running version 5.6 or later, after seeing that multi-tier replication is not supported in version 5.5 (for a reason that will become clear by the end of this post).

First off, let’s have a look at the topology that I have in place in my AWS account. As you’ll see below, I have a master, blog1, and a read replica that I created via the AWS console called blog2. You’ll also notice that, despite being supported, if I select instance actions while blog2 is highlighted, the option to create a read replica is grayed out.

Further, if we use the MySQL CLI to connect to blog2 and check the global variables for log_bin and binlog_format, you’ll see that binary logging is off and binlog_format is set to statement. This is strange considering that the parameter group that’s assigned to blog2 is the same as blog1 where binary logging is enabled and the format is set to mixed. In fact, when you try to change the binary logging format in the MySQL RDS parameter group you’ll see that statement isn’t even an option.

mysql> show global variables like 'log_bin'; show global variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | OFF   |
+---------------+-------+
1 row in set (0.06 sec)

+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+
1 row in set (0.05 sec)

So what gives? The status of the variables in our instance clearly don’t reflect the intended configuration. The answer here is to enable automatic backups for blog2. When you first create a read replica in RDS you won’t have the option to enable automatic backups and, much like this situation, you’ll have to go back and modify the read replica instance after its creation to enable automatic backups. The supporting documentation for this can be found here. The appropriate text is as follows:

When creating a Read Replica, there are a few things to consider. First, you must enable automatic backups on the source DB instance by setting the backup retention period to a value other than 0. This requirement also applies to a Read Replica that is the source DB instance for another Read Replica. For MySQL DB instances, automatic backups are supported only for Read Replicas running MySQL 5.6 and later, but not for MySQL versions 5.5. To enable automatic backups on an Amazon RDS MySQL version 5.6 and later Read Replica, first create the Read Replica, then modify the Read Replica to enable automatic backups.

After you enable automatic backups by modifying your read replica instance to have a backup retention period greater than 0 days, you’ll find that log_bin and binlog_format align themselves with the configuration specified in your parameter group dynamically, without requiring the RDS instance to be restarted. You will also be able to create a read replica from your read replica instance with no further modification requirements.
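For reference, the same change can be scripted. Below is a rough sketch with boto3; it assumes configured AWS credentials, the blog2 replica from this post, an arbitrary region and an arbitrary 7-day retention period.

import boto3

rds = boto3.client("rds", region_name="us-east-1")  # region is an assumption

# A backup retention period greater than 0 enables automatic backups,
# which is what allows RDS to turn on binary logging for the replica.
rds.modify_db_instance(
    DBInstanceIdentifier="blog2",
    BackupRetentionPeriod=7,
    ApplyImmediately=True,
)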

Conclusion

To enable binary logging on an AWS RDS Read Replica, you must set backup retention period to something greater than 0 days. I like to explain this to people by saying that they need to look at this from the Database-as-a-Service (DBaaS) perspective. If you have a DBaaS instance that has no backup recovery requirements and has no read replicas, there is no reason for the cloud provider to waste clock cycles and storage on binary logs that won’t be used. By enabling the backup requirement, you’ve forced a position where binary logs are justified.

Fsync Performance on Storage Devices

While preparing a post on the design of ZFS based servers for use with MySQL, I stumbled on the topic of fsync call performance. The fsync call is very expensive, but it is essential to databases as it allows for durability (the “D” of the ACID acronym).

Let’s first review the type of disk IO operations executed by InnoDB in MySQL. I’ll assume the default InnoDB variable values.

The first and most obvious type of IO is page reads and writes from the tablespaces. The pages are most often read one at a time, as 16KB random read operations. Writes to the tablespaces are also typically 16KB random operations, but they are done in batches. After every batch, fsync is called on the tablespace file handle.

To avoid partially written pages in the tablespaces (a source of data corruption), InnoDB performs a doublewrite. During a doublewrite operation, a batch of dirty pages, from 1 to about 100 pages, is first written sequentially to the doublewrite buffer and fsynced. The doublewrite buffer is a fixed area of the ibdata1 file, or a specific file with the latest Percona Server for MySQL 5.7. Only then do the writes to the tablespaces of the previous paragraph occur.

That leaves us with the writes to the InnoDB log files. During those writes, the transaction information — a kind of binary diff of the affected pages — is written to the log files and then the log file is fsynced. The duration of the fsync call can be a major contributor to the COMMIT latency.

Because the fsync call takes time, it greatly affects the performance of MySQL, which is why you’ll notice there are many status variables that relate to fsyncs. To overcome the inherent limitations of the storage devices, group commit allows multiple simultaneous transactions to fsync the log file once for all the transactions waiting for the fsync. There is no need for a transaction to call fsync for a write operation that another transaction already forced to disk. A series of write transactions sent over a single database connection cannot benefit from group commit.

Fsync Results

In order to evaluate the fsync performance, I used the following Python script:

#!/usr/bin/python
import os, sys, mmap

# Open a file with O_DIRECT to bypass the OS page cache
fd = os.open("testfile", os.O_RDWR | os.O_CREAT | os.O_DIRECT)

# mmap gives a page-aligned 512-byte buffer, which O_DIRECT requires
m = mmap.mmap(-1, 512)

for i in range(1, 1000):
    os.lseek(fd, 0, os.SEEK_SET)  # always rewrite the first sector
    m[1] = "1"
    os.write(fd, m)
    os.fsync(fd)

# Close opened file
os.close(fd)

The script opens a file with the O_DIRECT flag, writes to it and fsyncs it 1000 times, and closes the file. I added O_DIRECT after an internal discussion with my colleagues, but it doesn’t change the results and it doesn’t remove the need for calling fsync. We’ll discuss the impacts of O_DIRECT in more detail after we review the results. The script is called with the time command like below:

root@lab:/tmp/testfsync# time python /root/fsync.py

real    0m18.320s
user    0m0.060s
sys     0m0.096s

In the above example using a 7.2k rpm drive, the fsync rate is about 56/s for a latency of 18ms. A 7.2k RPM drive performs 120 rotations per second, so on average the fsyncs require a bit more than two rotations to complete. The filesystem appears to make very little difference: ext4 and XFS show similar results. That means that if MySQL uses such a storage device for the InnoDB log files, the latency of each transaction is at least 18ms. If the application workload requires 200 write transactions per second, they’ll need to be executed using at least four database connections.

So, let’s begin with rotational devices. These are becoming a bit less common now with databases, especially without a raid controller. I could only find a few.

Drive               RPM   Rate  Latency  Notes
WDC WD2500BJKT      5400  22/s  45 ms    Laptop SATA from 2009
ST2000LM003         5400  15/s  66 ms    USB-3 portable drive
ST3750528AS         7200  40/s  25 ms    Desktop grade SATA
WD2502ABYS-18B7A0   7200  56/s  18 ms    Desktop grade SATA
HUA723020ALA641     7200  50/s  20 ms    Enterprise grade SATA, md mirror
Dell SAS unknown    7200  58/s  17 ms    Behind Perc ctrl but no write cache
HDWE150             7200  43/s  23 ms    Recent Desktop grade SATA, 5TB

 

I unfortunately didn’t have access to any 10k or 15k RPM drives that were not behind a raid controller with a write cache. If you have access to such drives, run the above script a few times and send me your results; that would help create a more complete picture! So, we can see a correlation between the rotational speed and the fsync rate, which makes sense. The faster a disk turns, the faster it can fsync. The fsync call saves the data and then updates the metadata, so the heads need to move: that’s probably the main explanation for the remaining disparity. A good point: all drives appear to fully comply with the SATA flush command, even though they all have an enabled write cache. Disabling the drives’ write caches made no difference.

With the above numbers, the possible transaction rates in fully ACID mode are pretty depressing. But those were rotational drives; what about SSDs? SSDs are memory devices and are much faster for random IO operations. They are extremely fast for reads, and good for writes. But as you will see below, not that great for fsyncs.

Drive                  Rate     Latency  Notes
SAMSUNG MZ7LN512       160/s    6.3 ms   Consumer grade SATA
Crucial_CT480M500SSD1  108/s    9.3 ms   Consumer grade SATA
Intel 520              2031/s   0.49 ms  Consumer grade SATA
SAMSUNG MZVPV512HDGL   104/s    9.6 ms   Consumer grade NVMe
Samsung SSD 960 PRO    267/s    3.8 ms   High-end consumer grade NVMe
Intel PC-3100          1274/s   0.79 ms  Low-end consumer grade NVMe (cheat?)
Intel 750              2038/s   0.49 ms  High-end consumer grade NVMe
Intel PC-3700          7380/s   0.14 ms  High-end enterprise-grade NVMe

 

Again, this is a small sample of the devices I have access to. All SSD/flash devices have write caches, but only the high-end devices have capacitors to flush their write cache to the flash in case of a power loss. The PC-3100 device is actually in my home server, and it is obviously cheating. If you look at the card specs on the Intel website, it doesn’t have the “Enhanced Power Loss Data Protection” and “End-to-End Data Protection” features. The much more expensive PC-3700 does. I use the PC-3100 as a ZFS L2ARC device, so I am good. In general, the performance of a flash device varies a bit more than rotational devices, since factors like the number of recent writes and the filling factor come into play.

Even when using a high-end NVMe device like the PC-3700, you can’t reach 10k fully ACID transactions per second at low thread concurrency. How do you reach the higher levels? The answer here is the good old raid controller with a protected write cache. The write cache is basically using DRAM memory protected from power loss by a battery. SAN controllers have similar caches. The writes to the InnoDB log files are sequential writes interleaved with fsyncs. The raid controller concatenates the sequential writes, eventually writing one big chunk on disk and… ignoring the fsyncs. Here’s the result from the only device I had access to:

Drive               Rate      Latency  Notes
Dell Perc with BBU  23000/s   0.04 ms  Array of 7.2k rpm drives

 

That’s extremely fast but, of course, it is memory. I modified the script to loop 10k times instead of 1k. In theory, something like a single slave thread doing simple transactions could reach a rate of 20k/s or more while being fully ACID.

Discussion

We must always consider the results we got in the previous section in the context of a given application. For example, a server using an Intel PC-3700 NVMe card can do more than 7000 fully ACID transactions per second while being fully durable, provided those transactions are issued by a sufficient number of threads. Adding threads will not allow scaling infinitely: at some point, other bottlenecks like mutex contention or page flushing will dominate.

We often say that Galera-based cluster solutions like Percona XtraDB Cluster (PXC) add latency to transactions, since they involve communication over the network. With the Galera protocol, a commit operation returns only when all the nodes have received the data. Thus, it is a good practice to relax the local durability and set innodb_flush_log_at_trx_commit to 0 or 2. On a local network, the ping time is always below 1ms and often below 0.1ms, so the latency added by the network is often smaller than the local fsync latency.

About fdatasync

The fsync system call is not the only one that persists data to disk. There is also the fdatasync call. fdatasync persists the data to disk but does not update metadata such as the file size and last update time. In other words, it performs one write operation instead of two. In the Python script, if I replace os.fsync with os.fdatasync (the modified loop is sketched after the table below), here are the results for a subset of devices:

Drive                  RPM   Rate    Latency  Notes
ST2000LM003            5400  72/s    13 ms    USB-3 portable drive
WD2502ABYS-18B7A0      7200  118/s   8.5 ms   Desktop grade SATA
SAMSUNG MZ7LN512       N/A   333/s   3.0 ms   Consumer grade SATA
Crucial_CT480M500SSD1  N/A   213/s   4.7 ms   Consumer grade SATA
Samsung SSD 960 PRO    N/A   714/s   1.4 ms   High-end consumer grade NVMe
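For completeness, here is the modified loop referenced above; it is the same script with os.fdatasync swapped in, under the same assumptions as the original.

#!/usr/bin/python
import os, mmap

fd = os.open("testfile", os.O_RDWR | os.O_CREAT | os.O_DIRECT)
m = mmap.mmap(-1, 512)

for i in range(1, 1000):
    os.lseek(fd, 0, os.SEEK_SET)
    m[1] = "1"
    os.write(fd, m)
    os.fdatasync(fd)  # flush the data, skip non-essential metadata

os.close(fd)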

 

In all cases, the resulting rates have more than doubled. The fdatasync call has a troubled history, as there were issues with it many years ago. Because of those issues, InnoDB never uses fdatasync, only fsync. You can find the following comment in the InnoDB source file os/os0file.cc:

/* We let O_SYNC only affect log files; note that we map O_DSYNC to O_SYNC because the datasync options seemed to corrupt files in 2001 in both Linux and Solaris */

2001 is a long time ago. Given the above results, maybe we should reconsider the use of fdatasync. From the Linux man page on fdatasync, you find:

fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. For example, changes to st_atime or st_mtime (respectively, time of last access and time of last modification; see stat(2)) do not require flushing because they are not necessary for a subsequent data read to be handled correctly. On the other hand, a change to the file size (st_size, as made by say ftruncate(2)), would require a metadata flush.

So, even with fdatasync, operations like extending an InnoDB tablespace will update the metadata correctly. This appears to be an interesting low-hanging fruit in terms of MySQL performance. In fact, WebScaleSQL already has fdatasync available.

O_DIRECT

Why do we need fsync or fdatasync with O_DIRECT? With O_DIRECT, the OS is not buffering anything along the way, so the data should be persisted, right? Actually, the OS is not buffering, but the device very likely is. Here are a few results to highlight the point, using a 7.2k rpm SATA drive:

Test                                       Rate     Latency
O_DIRECT, drive write cache enabled        4651/s   0.22 ms
O_DIRECT, drive write cache disabled       101/s    9.9 ms
ASYNC + fdatasync, write cache enabled     119/s    8.4 ms
ASYNC + fdatasync, write cache disabled    117/s    8.5 ms

 

The drive write cache was enabled/disabled using the hdparm command. Clearly, there’s no way the drive can persist 4651 writes per second. O_DIRECT doesn’t send the SATA flush command to the disk, so we are only writing to the drive write cache. If the drive write cache is disabled, the rate falls to a more reasonable value of 101/s. What is interesting — and I don’t really understand why — is that opening the file in async mode and performing fdatasync is significantly faster. As expected, the presence of the drive write cache has no impact on ASYNC + fdatasync: when the fdatasync call occurs, the data is still in the OS file cache.

If you want to use only O_DIRECT, you should make sure all the storage write caches are crash safe. That’s why MySQL adds a fsync call after a write to a file opened with O_DIRECT.

ZFS

These days, I find it difficult to write a blog post without talking about ZFS. How does ZFS handle fsyncs and fdatasyncs? ZFS, like a database, performs write-ahead logging in the ZIL. That means calls like fsync and fdatasync return when the data has been persisted to the ZIL, not when it has reached the actual filesystem. The real write operation is done a few seconds later by a background thread. That means the added write for the metadata does not impact performance right away. My home server uses ZFS over a pair of 7.2k RPM drives and doesn’t have a SLOG device. The ZIL is thus stored on the 7.2k RPM drives. The results are the following:

Test            RPM   Rate    Latency
ZFS fsync       7200  104/s   9.6 ms
ZFS fdatasync   7200  107/s   9.3 ms

 

Remember that with ZFS, you need to disable the O_DIRECT mode. The fdatasync rate appears to be slightly faster, but the difference is not really significant. With ZFS, the fsync/fdatasync performance relates to where the ZIL is stored. If there is no SLOG device, the ZIL is stored with the data, and thus the persistence performance of the devices used for the data matters. If there is a SLOG device, the persistence performance is governed by the specs of the device(s) on which the SLOG is located. That’s a very important aspect we have to consider when designing a MySQL server that will use ZFS. The design of such a server will be discussed in more detail in a future post.

Tutorial Schedule for Percona Live 2018 Is Live

Percona has revealed the line-up of in-depth tutorials for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, Calif. Secure your spot now with Advanced Registration prices (available until March 4, 2018). Sponsorship opportunities for the conference are still available.

Percona Live 2018 Open Source Database Conference is the premier open source database event. The theme for the upcoming conference is “Championing Open Source Databases,” with a range of topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations and Business/Case Studies. 

Tutorials take place throughout the day on April 23, 2018. Tutorials provide practical, in-depth knowledge of critical open source database issues. Topics include:

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference will be held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of the Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Sponsorships

Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – SolarWinds, TwinDB, Yelp
  • Media Sponsors – Datanami, EnterpriseTech, HPCWire, ODBMS.org

MySQL Cluster 7.5 inside and out

A new book on MySQL Cluster is out. It is called MySQL Cluster 7.5 inside and out.
It is on its way out to all internet sites. Currently it is accessible on adlibris.se, on
BoD's bookshop and now also on cdon.com. These sites are all in Swedish, but with Google
Translate that should be possible to overcome. It will hit most other international book
stores within a week or less.

It is currently available for order as a printed book; it will become available as an
eBook in 1-2 weeks. The printed book is in bound format since I wanted to make it
possible to read it frequently. It is 640 pages long.

I started development on NDB as my Ph.D. research. During the first years I collected requirements
and tried to understand how database internals work. In 1996 the development started.
In 1998 I wrote my Ph.D. thesis, which described most of the ideas used in the early versions of
the NDB implementation.

The idea of writing a book about MySQL Cluster has been coming up for me every now
and then for more than 10 years. However, each time I felt it was more important
to focus on developing one more feature.

In 2015 I decided that it was more important to write down a description of the features in
NDB. So in 2016 I started writing this book. As usual with book projects, it took a lot longer
than expected.

At about the same time, Jesper Wisborg Krogh and Mikiya Okuno also started writing a
book about MySQL Cluster. Their book is called Pro MySQL NDB Cluster.

So the good news is that we now have two very good books about MySQL Cluster.
Jesper and Mikiya's book is written from the perspective of the DBA who has
decided to use NDB.

My book explains why NDB was developed and describes a great number of applications
where it fits in. It also compares NDB to other clustering solutions for MySQL.

I wanted to make sure that the first step to install and get started with MySQL Cluster
isn't a showstopper, so I described in some detail how to install it and get up and running
on various platforms. This includes a chapter on MySQL Cluster and Docker. Later
there is also a chapter on using NDB in the cloud.

Next it goes through NDB from an SQL point of view and describes all the things that
are good to understand when working with MySQL Cluster. It goes through the direct
NDB APIs (C++ API, Java API and a Node.js API). It goes through how to import
and export data to/from NDB.

It explains the various ways you can replicate between clusters using MySQL Cluster.
It also explains why those solutions exist and what problems they are trying to solve.

We have developed quite a few ndbinfo tables that can be used to gain an understanding
of the cluster in operation. These tables and their purpose are explained.

Next I dive into some internals, describing the software architecture, the message flows
and the restarts in NDB. I provide some advice on how to optimise restart times.

Next I dive deep into the configuration of MySQL Cluster, both the cluster configuration
and the MySQL servers. I provide detailed descriptions of how to configure for optimal
performance. I also provide details on the memory impact of many configuration parameters.
The configuration chapters include detailed descriptions of how to set up an optimal
execution environment for NDB, including details on how to set up the Linux
infrastructure for optimal performance.

Finally I go through the testing frameworks that we make use of. I go through in detail
the benchmark framework I have developed for more than 10 years, called dbt2-0.37.50,
which can be used to benchmark with sysbench, DBT2 and flexAsynch.

Lastly, the history of MySQL Cluster is provided.


MySQL Router 8.0.4-rc has been released

Dear MySQL users,

MySQL Router 8.0.4-rc is the first release candidate for the MySQL Router
8.0 series.

The MySQL Router is a new building block for high availability solutions
based on MySQL InnoDB clusters.

By taking advantage of the new Group Replication technology, and
combined with the MySQL Shell, InnoDB clusters provide an integrated
solution for high availability and scalability for InnoDB based MySQL
databases, that does not require advanced MySQL expertise.

The deployment of applications with high availability requirements is
greatly simplified by MySQL Router. MySQL client connections are
transparently routed to online members of an InnoDB cluster, with MySQL
server outages and cluster reconfigurations being automatically handled
by the Router.

To download MySQL Router 8.0.4-rc, see the “Development Releases” tab at
http://dev.mysql.com/downloads/router. Package binaries are available
for several platforms and also as a source code download.

Documentation for MySQL Router can be found at
http://dev.mysql.com/doc/mysql-router/en/

Enjoy!

Changes in MySQL Router 8.0.4 (2018-02-07, Release Candidate)

Functionality Added or Changed

* Bootstrapping now accepts the --config option and reads the [logger] level option's definition. For example, to enable bootstrap's debugging mode: [logger] level = DEBUG (Bug #27158098)
* The default ttl metadata option (Time To Live, in seconds) changed from 300 to 5. (Bug #26990955, Bug #88140)
* The new connect_timeout and read_timeout options were added. These are defined under the [DEFAULT] namespace and affect internal operations, such as metadata server connections. (Bug #26877946)
* Bootstrap now accepts any member of an InnoDB cluster and automatically finds and reconnects to a writable primary. (Bug #25489509)
* The optional routing_strategy configuration option was added. The available values are first-available, next-available, round-robin, and round-robin-with-fallback. Previously, these strategies were described as scheduling modes by the mode configuration option, where the read-write mode defaults to the first-available strategy and the read-only mode defaults to the round-robin strategy. This preserves previous behavior for these modes. (Bug #86261, Bug #26045094, Bug #25852803)

Bugs Fixed

* With logging_folder undefined during bootstrap, all logs were written to STDERR. Now, normal bootstrap logs are written to STDOUT and debug bootstrap logs are written to STDERR. (Bug #27232410)
* Errors were changed to warnings for the following conditions: when Router could not connect to a particular metadata server, and when Router could not update the default metadata cache replicaset. Under these conditions, Router does not stop running because there are multiple metadata servers and replicasets. (Bug #27226627)
* Configuring MySQL Router with sockets would create a socket that was only accessible by the MySQL Router user. (Bug #27179456, Bug #88667)
* The commercial .deb packages were missing the mysqlrouter_plugin_info tool. (Bug #27122367)
* The apt purge process did not remove the /var/{lib,log,run}/mysqlrouter directories. (Bug #26955232)
* Bootstrapping would fail when connecting to a MySQL Server under high load if an associated bootstrap query took longer than 5 seconds. The read timeout was increased from 5 to 30 seconds. In addition, command line options were added to change the connect and read timeout values. (Bug #26877946)
* Improved error text when bootstrapping against a MySQL Server 8.0 instance that was not part of an InnoDB cluster. (Bug #26846040)
* Router assumed that a socket resulting from accept()ing a socket would always be blocking. On Solaris and Windows this assumption is not valid, and this resulted in broken connections with large result sets. (Bug #26834769)
* It was difficult to distinguish the "Too many connections" error between MySQL Server and MySQL Router, so the Router variant now reads as "Too many connections to MySQL Router". (Bug #26593909)
* The bundled README.txt was missing Protobuf and Rapid JSON references. (Bug #25619654)
* Some builds were missing the sample configuration file, including the Solaris and Oracle Linux binaries. (Bug #25530691)
* Router would check IPv4 or IPv6 addresses, but not both. Now it goes through the list of addresses and first tries to bind to an IPv4 address; if that fails, it goes through the same address list and tries to bind to an IPv6 address. (Bug #25127667)
* The generated error message from passing in an empty file to --master-key-file (or using an empty mysqlrouter.key) was improved. (Bug #25111926)
* Defining multiple logger sections in the configuration file would emit an unclear error. Defining multiple logger sections is not allowed. (Bug #25095565)
* Where destinations=metadata-cache, the role attribute was not used or validated; only the mode configuration option was used. (Bug #25061854)
* Failed bootstrap commands would leave a generated mysqlrouter.conf.tmp file on the system. (Bug #24930442)
* On Ubuntu Linux, documentation related files were installed under both /usr/share/mysql-router/docs and /usr/share/doc/mysql-router. Now they are only installed under /usr/share/doc/mysql-router for community builds and /usr/share/doc/mysql-router-commercial for commercial builds. (Bug #24765509)
* The maximum number of concurrent client connections was increased from about 500 to over 5000, a limit now dependent on the operating system. To achieve this, select() based fd event calls were replaced by poll() (or WSAPoll() on Windows). (Bug #22661705, Bug #80260)
* The --ssl-key and --ssl-cert optional bootstrap command-line options were added. They directly use their MySQL client counterparts, and specify the client-side certificate and private key to facilitate client-side authentication. This is useful when the root account used during bootstrap was created with REQUIRE X509, which requires the client to authenticate itself when logging in.

On behalf of Oracle MySQL Release Team
Nawaz Nazeer Ahamed

ProxySQL Query Cache: What It Is, How It Works

In this blog post, I’ll present the ProxySQL query cache functionality. This is a query caching mechanism on top of ProxySQL. As there are already many how-tos regarding the ProxySQL prerequisites and installation process, we are going to skip these steps. For those who are already familiar with ProxySQL query cache configuration, let’s go directly to the query rules and the performance results.

Before talking about the ProxySQL query cache, let’s take a look at other caching mechanisms available for MySQL environments.

MySQL query cache is a query caching mechanism – deprecated as of MySQL 5.7.20 and removed in MySQL 8.0 – on top of MySQL itself (based on the official MySQL documentation).

The MySQL query cache stores the text of a SELECT statement together with the corresponding result sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.

Although MySQL query cache is supposed to improve performance, there are cases where MySQL query cache is not scaling well and can degrade performance due to its locking and invalidation algorithms.

You can find a really interesting post regarding MySQL query cache here.

There is also another method to cache results in a MySQL environment. It’s external caching (i.e., Memcached, Redis, etc.), but this also has some drawbacks. Introducing such a mechanism requires some changes on the application side.

But what is ProxySQL query cache if there are already the MySQL query cache and other external caching mechanisms? At the moment, although we’ve done some tests, we are not going to compare ProxySQL query cache performance against other caching mechanisms. We’ll address this in a future blog post. We will only focus on ProxySQL itself.

What is ProxySQL Query Cache

ProxySQL query cache is an in-memory key-value storage that uses:

  • as key: a combination of username, schema and query text. It is a hash derived from username, schema name and the query itself. Combining these ensures that users access only their resultsets and for the correct schema.
  • as value: the resultset returned by the backend (mysqld or another proxy).

There is some more metadata stored for each resultset:

  • length: length of the resultset
  • expire_ms: defines when the entry will expire
  • access_ms: records the last time an entry was accessed
  • ref_count: a reference count to identify resultset currently in use

Based on the configuration, the resultsets are cached on the wire while queries are executed, and the resultset is returned to the application. If the application re-executes the same query within the time slot defined by “expire_ms”, the resultset is returned by the embedded ProxySQL query cache.

The only way to invalidate entries from the ProxySQL query cache is through a time-to-live in milliseconds. This is in contrast to MySQL query cache, where the query cache gets invalidated each time a table gets updated. At the moment, it is only possible to tune the total amount of memory used by the query cache, using the variable “mysql-query_cache_size_MB”. The current implementation of mysql-query_cache_size_MB doesn’t impose a hard limit. Instead, it is used as an argument by the purging thread.

It’s obvious that it’s not easy to directly compare these two cache mechanisms, as each of them has its own way to invalidate results. Please also note a significant difference between MySQL and ProxySQL when query cache is enabled. ProxySQL query cache may serve stale resultsets due to the way it invalidates cached data (cached data are valid for the interval specified by “cache_ttl”, while MySQL’s cached data get invalidated each time data change). Every query that is cached may return stale data, and this may or may not be acceptable by the application.

How it Works

Before doing any benchmarks, I will try to give you a short description of how ProxySQL query cache gets enabled. Unlike MySQL query cache, where a common caching space exists for all tables, in ProxySQL query cache we have to define what traffic gets cached. This is done by defining query rules that match the traffic that is going to be cached and setting a “cache_ttl” for the cached results. There are many ways to define matches for incoming traffic, either by query or by digest using patterns. All we need to cache a resultset is to define the matching criteria and the TTL. If a query passes the matching criteria, the resultset is cached so that subsequent requests are served directly from the ProxySQL instance instead of querying the hostgroup DB nodes (as long as cache_ttl has not expired).
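As a preview of the configuration shown later, a caching rule is essentially a row in the mysql_query_rules admin table with a cache_ttl value. Below is a rough sketch using Connector/Python against the admin interface; the admin credentials and port (admin/admin on 6032) are the ProxySQL defaults and may differ in your setup, the rule_id and the 5000 ms TTL are arbitrary, and the digest value is just an example of the kind reported by stats_mysql_query_digest.

import mysql.connector

# ProxySQL admin interface (defaults shown; adjust for your installation)
admin = mysql.connector.connect(host="127.0.0.1", port=6032,
                                user="admin", password="admin")
cur = admin.cursor()

# Cache the matching digest for 5 seconds (cache_ttl is in milliseconds)
cur.execute("""INSERT INTO mysql_query_rules (rule_id, active, digest, cache_ttl, apply)
               VALUES (5, 1, '0xBF001A0C13781C1D', 5000, 1)""")
cur.execute("LOAD MYSQL QUERY RULES TO RUNTIME")  # apply without restarting ProxySQL
cur.execute("SAVE MYSQL QUERY RULES TO DISK")     # persist across restarts

cur.close()
admin.close()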

Let’s use an example to make it clear how ProxySQL query cache is enabled.

In our setup we have three backend DB servers in a master-slave topology, with Percona Server for MySQL 5.7 and a ProxySQL ver. 1.4.3 instance with sysbench 1.0 installed. Backend servers are within the same reader hostgroup, and traffic is balanced among these servers using the same priority.

As I’ve already said, we won’t look at the ProxySQL installation. There are many topologies you can implement: deploying ProxySQL on each application server, thus removing the “single point of failure” weakness, for example. But in our case, we will just present the ProxySQL query cache using a single instance. In general, you would expect to have better performance with the ProxySQL instance closer to the application.

Configuration

With the ProxySQL instance up and running, let’s confirm that all servers are OK. Querying ProxySQL admin shows that all servers are ONLINE:

Admin> select * from mysql_servers;
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname  | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2            | 10.0.2.12 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | 10.0.2.11 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 2            | 10.0.2.13 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 2            | 10.0.2.11 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+

As you can see, there are two hostgroups: the “1” used for WRITES and the “2” used for READS.

A few random connections prove that traffic is correctly routed to the DB backends:

[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db1-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db2-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db1-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db3-atsaloux |
+--------------+

Let’s first take a look at some statistics. Before using sysbench, the “stats_mysql_query_digest” table (where digests are stored) is empty:

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
Empty set (0.00 sec)

The “stats_mysql_query_digest” table contains statistics related to the queries routed through the ProxySQL server. How many times each query was executed and the total execution time are two of the several statistics provided.

Before doing any benchmarks, I had to create some data. The following sysbench commands were used for selects by PK or by RANGE. For simplicity, we are not going to execute the benchmarks inside transactions, although ProxySQL query cache is effective there too.

--threads will be adjusted for each benchmark:

sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=1 --oltp-simple-ranges=0 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run

sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=0 --oltp-simple-ranges=1 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run

Before running the full benchmark, a simple sysbench run was used to collect the query digests that are needed for the ProxySQL query cache configuration.

After running the first benchmark with the ProxySQL query cache disabled, I queried the “stats_mysql_query_digest” table again and got the following results, which show all executed queries.

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
+------------+-------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time    | hostgroup | digest             | digest_text                                    |
+------------+-------------+-----------+--------------------+------------------------------------------------+
| 301536     | 20962929791 | 2         | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=?               |
| 3269       | 30200073    | 2         | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+-------------+-----------+--------------------+------------------------------------------------+
2 rows in set (0.01 sec)

Add mysql_query_rules To Be Cached

The output above provides all the information needed to enable the ProxySQL query cache. What we need to do now is add the query rules that match the resultsets that should be cached. In this case we match on the query digests gathered above and use a cache_ttl of 5000 ms. Taking this into consideration, we added the following rules:

Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (1,1,'0xBF001A0C13781C1D',5000,1);
Query OK, 1 row affected (0.00 sec)

Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (2,1,'0x290B92FD743826DA',5000,1);
Query OK, 1 row affected (0.00 sec)

We shouldn’t forget that we must load query rules at runtime. If we don’t want to lose these rules (i.e., after a ProxySQL restart), we should also save to disk:

Admin> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Now let’s reset the stats_mysql_query_digest results:

Admin> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1; -- we reset the counters
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

And re-run the benchmarks with query cache enabled. To confirm what traffic was cached, we have to query the stats_mysql_query_digest once again:

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time   | hostgroup | digest             | digest_text                                    |
+------------+------------+-----------+--------------------+------------------------------------------------+
| 108681     | 6304585632 | 2         | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
| 343277     | 0          | -1        | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+------------+-----------+--------------------+------------------------------------------------+
2 rows in set (0.00 sec)

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+-----------+-----------+--------------------+----------------------------------+
| count_star | sum_time  | hostgroup | digest             | digest_text                      |
+------------+-----------+-----------+--------------------+----------------------------------+
| 79629      | 857050510 | 2         | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
| 441194     | 0         | -1        | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
+------------+-----------+-----------+--------------------+----------------------------------+
2 rows in set (0.00 sec)

Cached queries are the ones marked with the special hostgroup -1 (this means that these queries were not sent to any hostgroup), and the total execution time for the cached queries is 0 (this means that the request was served within the same event loop).
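If you want another view on cache activity, ProxySQL also exposes query cache counters (entries, memory usage, GET/SET counts, purges) in the stats_mysql_global table; a quick check, assuming a reasonably recent 1.4.x release, looks like this:

Admin> SELECT * FROM stats_mysql_global WHERE variable_name LIKE 'Query_Cache%';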

Below you can see the benchmark results. Let’s look at what happens for selects by PK and selects by RANGE:

 

Points of Interest
  • In all cases, when threads and backend servers increase, ProxySQL performs better. This is achieved thanks to its connection pooling and multiplexing capabilities.
  • Enabling ProxySQL query cache provides a significant performance boost.
  • ProxySQL query cache can achieve a ~2X performance boost at a minimum.
  • This boost can be considerably valuable in cases where MySQL performance may fall to 50% (e.g., selects by RANGE).
  • We shouldn’t forget that results are affected by hardware specs as well, but it’s obvious that ProxySQL with query cache enabled gives a really high throughput.
ProxySQL Query Cache Limitations

Current known limitations:

  • It is not possible to define query cache invalidation other than with cache_ttl.
  • There is no way to enforce query cache purge.
  • mysql-query_cache_size_MB is not strictly enforced, but only used as a metric to trigger automatic purging of expired entries.
  • Although access_ms is recorded, it is not used as a metric to expire unused entries when mysql-query_cache_size_MB is reached.
  • Query cache does not support prepared statements.
  • Query cache may serve stale data.
Conclusion

ProxySQL is generally a very powerful and easy-to-use tool.

With regard to the query cache, it seems to scale very well and achieves a significant performance boost. Although complex configurations (not only for the query cache) can add some extra overhead, they are easy to maintain.

cache_ttl can be a limitation, but if it’s correctly configured in conjunction with max_replication_lag when configuring nodes in hostgroups, it does not add any significant drawback. In any case, it depends on whether or not serving potentially stale results is acceptable to the application.

Updated: Become a ClusterControl DBA - Deploying your Databases and Clusters

We have received some nice feedback on our product ClusterControl, especially about how easy it is to install and get going. Installing new software is one thing, but using it properly is another.

It is not uncommon to be impatient when testing new software: you would rather toy around with an exciting new application than read the documentation before getting started. That is a bit unfortunate, as you may miss important features or misunderstand how to use them.

This blog series covers all the basic operations of ClusterControl for MySQL, MongoDB & PostgreSQL with examples on how to make the most of your setup. It provides you with a deep dive on different topics to save you time.

These are the topics covered in this series:

  • Deploying the first clusters
  • Adding your existing infrastructure
  • Performance and health monitoring
  • Making your components HA
  • Workflow management
  • Safeguarding your data
  • Protecting your data
  • In depth use case

In today’s post, we’ll cover installing ClusterControl and deploying your first clusters.

Preparations

In this series, we will make use of a set of Vagrant boxes but you can use your own infrastructure if you like. In case you do want to test it with Vagrant, we made an example setup available from the following Github repository: https://github.com/severalnines/vagrant

Clone the repo to your own machine:

$ git clone git@github.com:severalnines/vagrant.git

The topology of the vagrant nodes is as follows:

  • vm1: clustercontrol
  • vm2: database node1
  • vm3: database node2
  • vm4: database node3

You can easily add additional nodes if you like by changing the following line:

4.times do |n|

The Vagrant file is configured to automatically install ClusterControl on the first node and forward the user interface of ClusterControl to port 8080 on your host that runs Vagrant. So if your host’s ip address is 192.168.1.10, you will find the ClusterControl UI here: http://192.168.1.10:8080/clustercontrol/
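Assuming you go the Vagrant route, bringing the whole environment up is then just a matter of (a sketch; the exact directory containing the Vagrantfile depends on the repository layout):

$ cd vagrant        # or the subdirectory that holds the Vagrantfile
$ vagrant up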

Installing ClusterControl

You can skip this section if you chose to use the Vagrant file, since it performs the installation automatically. Installing ClusterControl manually is straightforward anyway and will take less than five minutes.

With the package installation, all you have to do is to issue the following three commands on the ClusterControl node to get it installed:

$ wget http://www.severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
$ ./install-cc # as root or sudo user

That’s it: it can’t get easier than this. If the installation script has not encountered any issues, then ClusterControl should be installed and up and running. You can now log into ClusterControl on the following URL: http://192.168.1.210/clustercontrol

After creating an administrator account and logging in, you will be prompted to add your first cluster.

Deploy a Galera cluster

You will be prompted to create a new database server/cluster or import an existing (i.e., already deployed) server or cluster:

We are going to deploy a Galera cluster. There are two sections that need to be filled in. The first tab is related to SSH and general settings:

To allow ClusterControl to install the Galera nodes, we use the root user that was granted SSH access by the Vagrant bootstrap scripts. If you chose to use your own infrastructure, you must enter a user here that is allowed passwordless SSH to the nodes that ClusterControl will control. Just keep in mind that you have to set up passwordless SSH from ClusterControl to all database nodes yourself beforehand.
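Setting up passwordless SSH is usually a two-command job on the ClusterControl host; a minimal sketch, where the user and node name are placeholders for your own environment:

$ ssh-keygen -t rsa                  # accept the defaults
$ ssh-copy-id root@<database-node>   # repeat for every database node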

Also make sure you disable AppArmor/SELinux. See here why.

Then, proceed to the second stage and specify the database related information and the target hosts:

ClusterControl will immediately perform some sanity checks each time you press Enter when adding a node. You can see the host summary by hovering over each defined node. Once everything is green, meaning ClusterControl has connectivity to all nodes, you can click Deploy. A job will be spawned to build the new cluster. The nice thing is that you can keep track of the progress of this job by clicking on Activity -> Jobs -> Create Cluster -> Full Job Details:

Once the job has finished, you have just created your first cluster. The cluster overview should look like this:

In the nodes tab, you can perform just about any operation you would normally do on a cluster. The query monitor gives you a good overview of both running and top queries. The performance tab will help you keep a close eye on the performance of your cluster and also features the advisors that help you act proactively on trends in the data. The backup tab enables you to easily schedule backups and store them on local or cloud storage. The manage tab enables you to expand your cluster or make it highly available for your applications through a load balancer.

All this functionality will be covered in later blog posts in this series.

Deploy a MySQL Replication Cluster

Deploying a MySQL Replication setup is similar to Galera database deployment, except that it has an additional tab in the deployment dialog where you can define the replication topology:

You can set up standard master-slave replication, as well as master-master replication. In the case of the latter, only one master remains writable at a time. Keep in mind that, unlike Galera, master-master replication doesn't come with conflict resolution or guaranteed data consistency. Use this setup with caution, or look into Galera cluster. Once everything is green and you have clicked Deploy, a job will be spawned to build the new cluster.

Again, the deployment progress is available under Activity -> Jobs.

To scale out with additional slaves (read copies), simply use the “Add Node” option in the cluster list:

After adding the slave node, ClusterControl will provision the slave with a copy of the data from its master using Xtrabackup or from any existing PITR compatible backups for that cluster.

Deploy PostgreSQL Replication

ClusterControl supports the deployment of PostgreSQL version 9.x and higher. The steps are similar to the MySQL Replication deployment: at the end of the deployment dialog, you define the database topology when adding the nodes:

Similar to MySQL Replication, once the deployment completes, you can scale out by adding replication slaves to the cluster. The step is as simple as selecting the master and filling in the FQDN for the new slave:

ClusterControl will then perform the necessary data staging from the chosen master using pg_basebackup, configure the replication user and enable the streaming replication. The PostgreSQL cluster overview gives you some insight into your setup:

Just like with the Galera and MySQL cluster overviews, you can find all the necessary tabs and functions here: the query monitor, performance, backup tabs all enable you to do the necessary operations.

Deploy a MongoDB Replica Set

Deploying a new MongoDB Replica Set is similar to the other clusters. From the Deploy Database Cluster dialog, pick MongoDB ReplicaSet, define the preferred database options and add the database nodes:

You can choose to install either Percona Server for MongoDB from Percona or MongoDB Server from MongoDB, Inc. (formerly 10gen). You also need to specify the MongoDB admin user and password, since ClusterControl will by default deploy a MongoDB cluster with authentication enabled.

After installing the cluster, you can add an additional slave or arbiter node into the replica set using the "Add Node" menu under the same dropdown from the cluster overview:

After adding the slave or arbiter to the replica set, a job will be spawned. Once this job has finished it will take a short while before MongoDB adds it to the cluster and it becomes visible in the cluster overview:

Final thoughts

With these examples we have shown you how easy it is to set up different clusters from scratch in only a couple of minutes. The beauty of using this Vagrant setup is that, as easily as you spawn this environment, you can also take it down and spawn it again. Impress your colleagues by showing how quickly you can set up a working environment.

Of course it would be equally interesting to add existing hosts and already-deployed clusters into ClusterControl, and that’s what we'll cover next time.

Tags:  clustercontrol deployment installation MariaDB MongoDB MySQL polyglot persistence postgres PostgreSQL

Announcing Experimental Percona Monitoring and Management (PMM) Functionality via Percona Labs

In this blog post, we’ll introduce how you can look at some experimental Percona Monitoring and Management (PMM) features using Percona Labs builds on GitHub.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. The software builds located there are not officially released software and aren't covered by Percona support or services agreements, but these handy utilities can help you save time and effort.

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

This month we’re announcing access to Percona Labs builds of Percona Monitoring and Management so that you can experiment with new functionality that’s not yet in our mainline product. You can identify the unique builds at:

https://hub.docker.com/r/perconalab/pmm-server/tags/

Most of the entries here are the pre-release candidate images we use for QA, and they follow a format of all integers (for example “201802061627”). You’re fine to use these images, but they aren’t the ones that have the experimental functionality.

Today we have two builds of note (these DO have the experimental functionality):

  • 1.6.0-prom2.1
  • 1.5.3-prometheus2

We’re highlighting Prometheus 2.1 on top of our January 1.6 release (1.6.0-prom2.1), available in Docker format. Some of the reasons you might want to deploy this experimental build to take advantage of the Prometheus 2 benefits are:

  • Reduced CPU usage by Prometheus, meaning you can add more hosts to your PMM Server
  • Performance improvements, meaning dashboards load faster
  • Reduced disk I/O, disk space usage

Please keep in mind that this is a Percona Labs build (see our note above), so please also note the following two points:

  • Support is available from our Percona Monitoring and Management Forums
  • Upgrades might not work – don’t count on upgrading out of this version to a newer release (although it’s not guaranteed to block upgrades)
How to Deploy an Experimental Build from Percona Labs

The great news is that you can follow our Deployment Instructions for Docker, and the only change is where you specify a different Docker container to pull. For example, the standard way to deploy the latest stable PMM Server release with Docker is:

docker pull percona/pmm-server:latest

To use the Percona Labs build 1.6.0-prom2.1 with Prometheus 2.1, execute the following:

docker pull perconalab/pmm-server:1.6.0-prom2.1
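From there, starting the server follows the same two-step pattern as the official Docker deployment instructions, just pointing at the Percona Labs image; treat this as a sketch of the documented defaults rather than canonical commands:

docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana --name pmm-data perconalab/pmm-server:1.6.0-prom2.1 /bin/true
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server --restart always perconalab/pmm-server:1.6.0-prom2.1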

Please share your feedback on this build on our Percona Monitoring and Management Forums.

If you’re looking to deploy Percona’s officially released PMM Server (not the Percona Labs release, but our mainline version which currently is release 1.7) into a production environment, I encourage you to consider a Percona Support contract, which includes PMM at no additional charge!

MySQL Shell 8.0.4: Introducing “Upgrade checker” utility

MySQL 8.0 brings a lot of exciting new features and improvements. To make sure that your 5.7 system is ready for an upgrade there are certain steps you should take, described in our documentation: upgrade prerequisites. To make this process as quick and easy as possible we are introducing in MySQL Shell version 8.0.4 a new utility called “Upgrade checker” (UC).…
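In practice the checker is exposed through the Shell's util global object; a quick sketch (the account and port are placeholders):

$ mysqlsh
mysql-js> util.checkForServerUpgrade('root@localhost:3306')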

MariaDB Server 10.1.31 and MariaDB Galera Cluster 10.0.34 now available

MariaDB Server 10.1.31 and MariaDB Galera Cluster 10.0.34 now available dbart Tue, 02/06/2018 - 13:11

The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.1.31 and MariaDB Galera Cluster 10.0.34. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.1.31

Release Notes Changelog About MariaDB Server 10.1

Download MariaDB Galera Cluster 10.0.34

Release Notes Changelog About MariaDB Galera Cluster


Product Management Newsletter January 2018

 

Welcome to the January 2018 Continuent Product Management newsletter. It’s the start of the year, and so a good opportunity to look forward, as well as back a little to see how we did. Let’s start with the immediate future first.

  • Tungsten Clustering 6.0 is Coming!
  • Tungsten Replicator 6.0 is Also Coming!
  • Looking at the Year Ahead
  • Tungsten Backup, Tungsten Connector
  • Tungsten GUI
  • End-of-Life Policy
  • Release Schedule
  • Internal Tweaks
Tungsten Clustering 6.0 is Coming!

The development and restructuring of the product has taken a year to come to fruition, as there are quite a lot of different components, but the new version of Tungsten Clustering 6.0 is due out in February and we’re really pleased with the result.

The focus of this release is to unify the components that previously made up our multimaster clustering solution, which is the new name for what we used to call Multi-Site/Multi-Master (MSMM).

In previous releases, MSMM was built like this:

  • Continuent Tungsten on each cluster
  • Tungsten Replicator to replicate data between each cluster

Two separate products, two sets of tools and command-lines to handle, and the replicators that handled the cross-site replication were completely independent of the clustering.

In the new multimaster clustering solution, everything is installed as one package. The cross-site replicators are controlled and managed from within cctrl and the rest of the clustering toolkit, and it’s finally possible to get a single unified view of the cluster, state and progress. The connector is properly aware of the relationship, as is cctrl, and recovery and management of the entire process is all within a single environment.

There are some other tweaks and improvements, including the package name, which is now Tungsten Clustering, and not Continuent Tungsten. We’ve got some webinars, deep dives, blog posts and other material to cover all the changes.

Tungsten Replicator 6.0 is Also Coming!

We’ve also made improvements to the replicator, although to be fair it had quite a few improvements and expansions over the past year. We’ve built on that by taking all of that functionality and making some additional improvements, including:

  • Re-introduction of the support for writing data into PostgreSQL.
  • Significant expansion of the Kafka applier to provide more detailed information in the messages, including whole transaction information so that you can verify message states.
  • Support for replicating data from databases that don’t have explicit database primary keys to analytics targets.

There are some other improvements too, so look out for a blog post and release notes to understand everything we’ve added.

Looking at the Year Ahead

Back in October we had an internal meeting where we looked ahead to the next three years of work and focus, and we’re beginning to execute on that.

Here are some highlights of the things we are planning and working on during 2018:

  • Two new products (see the more extensive notes on Tungsten Backup and Tungsten Connector below).
  • More appliers for the replicator, including new analytics and data warehouse platforms, in-memory databases, and desktop and corporate targets.
  • Expansion of our new DDL translation functionality for heterogeneous replication.
  • Better tools for schema changes and migration within Tungsten Clustering, particularly for multimaster.
  • Better tools to monitor and identify performance and replication issues.
  • Easier cloud deployments for some of the most popular cloud environments.
  • Faster replicator startup.
  • Faster clustering switches and failovers.
  • THL compression and encryption at rest.

These are some highlights, and I’m going to keep some items in reserve while we sort everything out.

Something here you would like to see but I haven’t mentioned? Please get in touch with me at  mc.brown@continuent.com

Tungsten Backup, Tungsten Connector

Percona Live in Dublin was where we announced two new products that we are working on, Tungsten Backup and Tungsten Connector.

Tungsten Backup is going to be a one-stop solution for handling and automating your backup and restore requirements. It will be based on our existing logging technology, and will use that information to be able to handle:

  • Automated backups
  • Automatic physical and logical backups
  • Backup/restore validation and verification
  • Point-in-time recovery
  • Per database, table and date data recovery

We’ll also have the ability to restore and extract information and apply it back to any target. Want to pull out a year's worth of data and apply it into Hadoop? We can do that. Or how about extracting a list of changes to your database and applying it into Elastic for some auditing? More information will be coming soon.

Tungsten Connector will be a standalone version of the connector we use within the main clustering product. This means you will gain much of the same functionality, including packet inspection, query distribution, load balancing, but you will be able to use it with other database deployments.

In fact, you’ll be able to use it with a variety of databases, so if you want to place the connector in front of your Oracle databases, or PostgreSQL, or even a Vertica cluster, you can do so. We’ll have more use cases, deployment examples, and capabilities to share soon, so look out for the blog posts and webinars that will surely be coming down the line.

Tungsten GUI

We have a surprise to come, in that we’ve been developing a GUI that enables full control of your Tungsten Clustering deployments, all through a web browser, while also providing a single view of your cluster state and performance.

Having worked on it for a few months, we are surprisingly close to a release, although from a practical level it's probably going to be the beginning of the second quarter of this year when we finally make it available.

We are, however, very pleased with the result and right now it’s going through a combination of UI development, user testing, security and QA, to ensure that the product is ready to be in the hands of our customers.

A GUI is something we’ve wished for here at Continuent for many years, and we finally have one that provides the ideal combination of functionality and information. Believe me when I say the excitement in our respective distributed offices is intense!

End-of-Life Policy

Back in November last year we made the decision to have a more formal end-of-life policy for our existing products.

There’s a longer, more detailed description that will be going out, but at the moment the basic approach is for Continuent to support its products for three years from the launch date of the original major version. So, for example, 6.0 will be coming out soon, and we’ll support it right round until February 2021. All 6.x versions will have the same EOL date.

This goes hand-in-hand with our new release schedule, which we’ve actually had in place for the last 18 months or so now. That stipulates that we have a new major release every year. So by the time 6.x support ends in 2021, we will already have released v7 in 2019, v8 in 2020 and v9 in 2021.

Look out for a more detailed description of the policy soon.

Release Schedule

As mentioned above, we changed to a more aggressive release schedule in 2016 so that we could ensure that we get new features, functionality and bug fixes into your hands as quickly as possible. The official policy is:

  • One major release version every 12 months (e.g. v6.x, v7.x)
  • One minor (feature) release version every 3 months (e.g. v6.1, v6.2)
  • One bug fix release every 6 weeks (e.g. v6.1.1)

How did we do in 2017 on those goals? Here’s a list of the releases:

5.0.1 23 February 2017

5.1.0 26 April 2017

5.1.1 23 May 2017

5.2.0 19 July 2017

5.2.1 21 September 2017

5.2.2 22 October 2017

5.3.0 12 December 2017

Actually not too bad. Towards the end of the year the releases started to spread out a bit, but we still had a release of some kind almost every two months or less.

Internal Tweaks

As anybody who has worked in software development will tell you, sometimes you have to stop, take stock of what you have and where you are, and determine how you are going to make and produce everything that you've promised over the coming months.

We decided back in a meeting almost exactly a year ago that we need to make some changes to our internal systems. We have, for example, been using Jenkins as our build and test environment for a while, but we’ve begun to outgrow it, especially in light of new products and the new release schedule.

So, once our 6.0 release goes out the door we are going to be doing some housekeeping. We’re migrating to a new QA environment, we’re going to be making some changes to some of our internal processes, and we’re going to work hard to ensure that the features and functionality we’ve got scheduled over the next few years will get ordered properly.

That does mean we’re going to be quieter during February and March than usual, but it will all be for a good cause in the long run.

Follow-Up

If you have any questions from the newsletter, please feel free to get in touch with me directly (mc.brown@continuent.com) or use the usual sales@continuent.com or support@continuent.com addresses and we’ll do our best to answer your questions (secret plans and hidden projects notwithstanding!).

 

Shinguz: Advanced MySQL Enterprise Training by FromDual

Due to increasing demand, FromDual has developed an Advanced MySQL Enterprise Training for DBAs and DevOps. After testing this training extensively with some selected customers last year, we are offering this MySQL Enterprise Training in 2018 to a broader audience.

The MySQL Enterprise Training addresses MySQL DBAs and DevOps engineers who are already familiar with MySQL and are now approaching the challenge of operating a serious MySQL enterprise infrastructure.

You can find the topics of the 3-day MySQL Enterprise Training here.

You also have the opportunity to add 2 extra days of MySQL Performance Tuning from the Advanced MySQL Training.

We would be pleased to hold this training in-house in your company or at the location of one of our training partners in Essen, Berlin and Cologne (Germany).

For any questions please contact us by email.

Taxonomy upgrade extras:  training enterprise mysql advanced

How caching_sha2_password leaks passwords

Oracle recently announced a new authentication plugin: caching_sha2_password. This was added in 8.0.4, the second release candidate for MySQL 8.0. The new plugin has also been made the default (this can be configured by changing default_authentication_plugin).
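For reference, the default can be switched in my.cnf; a minimal sketch (reverting to the old plugin is shown purely as an example):

[mysqld]
default_authentication_plugin = mysql_native_password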

Why? Phasing out SHA1

As Oracle said in the blog post announcing this change, they want to move to a more secure hashing algorithm (SHA256). Which I think is a good reason to do this.

Adding salt

Adding a salt means that identical passwords for different users produce different hashes. Again, a good reason to do this.

Performance

Their earlier attempt at this resulted in sha256_password, but that plugin made authentication slower. Without persistent connections this is a serious limitation. So again, a good reason.

What's wrong? If you don't use SSL/TLS it gives your password away.

To protect against sending the password in cleartext over an insecure connection it encrypts the password before sending it. It does this by using public key cryptography. It encrypts the password with the public key of the server. Then the server can decrypt it with its private key. So far so good.

But the problem is how MySQL gets the public key from the server. There is --get-server-public-key, which requests the key from the server. But it does so over an insecure connection, so this isn't safe.

An attacker could do a Man-in-the-Middle attack and give you their public key... and then the attacker can decrypt your password and proxy the connection.

The second option is to use --server-public-key-path=file_name. But then you somehow need to collect all public keys from all your servers and securely distribute them to your clients. And you might want to renew these keys every year... this seems like an operational nightmare to me.

Also depending on what connector you use these options may not be available.

If you use SSL/TLS things are not much better.

With default settings mysqld generates self-signed X509 certificates and enables SSL/TLS. And the default ssl-mode is PREFERRED. This is better than the previous defaults as it guards against passive attacks. However, it does NOT protect against active attacks (MitM attacks), as MySQL won't verify whether the certificate is signed by a known CA. By default it also doesn't verify whether the hostname matches the certificate.

So if someone hijacks your connection and knows how to do an SSL handshake, the caching_sha2_password plugin will hand over the password in clear text.

Can we use it in a secure way?

Use SSL/TLS and set ssl-mode to VERIFY_IDENTITY (or at least VERIFY_CA). Note that this requires you to configure MySQL with certificates which are signed by your CA and matches the hostnames of your servers.
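On the client side that could look something like the following; the CA path, hostname and account are placeholders:

mysql --ssl-mode=VERIFY_IDENTITY --ssl-ca=/etc/mysql/ca.pem -h db1.example.com -u app -p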

In case you only need localhost connections: configure MySQL to only listen on local-loopback and you're done.

Staying with mysql_native_password seems also to be an acceptable option for now. Note that sha256_password has many of the same issues and should also be avoided without strict SSL/TLS settings.
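If you do stay on mysql_native_password for now, you can pin it per account; a sketch with placeholder credentials:

CREATE USER 'app'@'%' IDENTIFIED WITH mysql_native_password BY 'choose-a-strong-password';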

I initially reported this to Oracle in Bug #79944 on 13 January 2016 for the sha256_password plugin in 5.7.10.

About READ UNCOMMITTED

Transaction isolation levels are the least understood feature of relational databases. Most developers don’t know them and just use the default one. Actually, a fair number of them even believe they use MySQL without transactions.

Amongst isolation levels, READ UNCOMMITTED is the least understood. So here’s a quick note about what it is and why – if you know about it – you probably have false beliefs.

Basically, READ UNCOMMITTED is totally inconsistent. It sees changes (new, deleted, modified rows) made by other transactions that haven’t committed yet. And those transactions may still fail, which leads READ UNCOMMITTED to see something that will never actually happen.

Despite this, it is extremely useful in some cases. For example:

  • To run SELECTs which read a huge amount of rows for analytics.
  • To DELETE rows not used by anyone (archived historical contents).
  • To INSERT rows when no one else can possibly insert new rows.

When using the default isolation level, or even READ COMMITTED, if your transaction involves many rows (say, millions), your statements will probably be slow, will probably use a lot of CPU time, and may also cause replication lag.

READ UNCOMMITTED is not magic and cannot make a slow query fast. But it can save a lot of transaction log processing, making a query a bit faster and avoiding the consumption of many resources for no reason.
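Switching it on for a single reporting session is a one-liner; a minimal sketch with made-up table and column names:

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;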

That said, its drawbacks should also be noted:

  • Bugs. Precisely because it is the least used (tested) isolation level.
  • It would be a mistake to assume that READ UNCOMMITTED is lock-free. When trying to modify a locked row, it’s queued. When writing anything, it puts a lock. In some cases (like the ones listed above) this is irrelevant, but you need to understand this.
  • Selecting all rows from a table while it’s being intensively INSERTed into is slower with READ UNCOMMITTED.

One last important detail. When I explain isolation levels, people tend to think that READ UNCOMMITTED doesn’t take snapshots. This is wrong: it takes a snapshot for every statement it runs, but those snapshots include uncommitted changes.

This may sound slightly obscure. To understand better, create a test table and run something like:

SELECT a FROM my_test WHERE a > SLEEP(a);

The query will evaluate each row, and the evaluation will cause a wait of some seconds. So you have time to add a new row from another connection. The row will not appear in the results of this query: it’s not in the snapshot that was taken when the query was issued. However, if you run the query again, the new row will be there, because a new snapshot is taken.

Links for the curious:

Federico

Percona Monitoring Plugins 1.1.8 Release Is Now Available

Percona announces the release of Percona Monitoring Plugins 1.1.8.

Changelog

  • Add MySQL 5.7 support
  • Changed a canary check to use timestamp.now() and return a timedelta.seconds
  • Remove an additional condition for the Dictionary memory allocated
  • Fixed a false-positive problem when the calculated delay was less than 0 and the -m was not set.
  • Fixed the problem where slaves would alert due to deadlocks on the master.
  • If using pt-heartbeat, get_slave_status was only called when the -s option is set to MASTER
  • Disabled UNK alerts by default (it is possible to enable them explicitly).
  • A fix was added for MySQL Multi-Source replication.
  • The graph Percona InnoDB Memory Allocation showed zeroes for the
    metrics Total memory (data source item nl) and Dictionary memory
    (data source item nm) when used for MySQL 5.7.18, because the syntax
    of SHOW ENGINE INNODB STATUS has changed in MySQL 5.7 (see https://dev.mysql.com/doc/refman/5.7/en/innodb-standard-monitor.html).
  • The graph Percona InnoDB I/O Pending showed NaN for the metrics
    Pending Log Writes (data source item hn) and Pending Chkp Writes
    (data source item hk) when used for MySQL 5.7.18, because the syntax
    of SHOW ENGINE INNODB STATUS has changed in MySQL 5.7 (see https://dev.mysql.com/doc/refman/5.7/en/innodb-standard-monitor.html).
  • Added server @@hostname as a possible match to avoid DNS lookups while allowing hostname-match.

A new tarball is available from the downloads area or in packages from our software repositories. The plugins are fully supported for customers with a Percona Support contract, and free installation services are provided as part of some contracts. You can find links to the documentation, forums and more at the project homepage.

About Percona Monitoring Plugins
Percona Monitoring Plugins are monitoring and graphing components designed to integrate seamlessly with widely deployed solutions such as Nagios, Cacti and Zabbix.

Four Ways to Execute MySQL GROUP BY

In this blog post, I’ll look into four ways to execute MySQL GROUP BY. 

In my previous blog post, we learned that indexes or other means of finding data might not be the most expensive part of query execution. For example, MySQL GROUP BY could potentially be responsible for 90% or more of the query execution time. 

The main complexity when executing GROUP BY is computing aggregate functions in a GROUP BY statement. How this works is shown in the documentation for UDF Aggregate Functions. As we see, the requirement is that UDF functions get all values that constitute the single group one after another. That way, it can compute the aggregate function value for the single group before moving to another group.

The problem, of course, is that in most cases the source data values aren’t grouped. Values coming from a variety of groups follow one another during processing. As such, we need a special step to handle MySQL GROUP BY.

Let’s look at the same table we looked at before:

mysql> show create table tbl\G
*************************** 1. row ***************************
       Table: tbl
Create Table: CREATE TABLE `tbl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `g` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

And some GROUP BY statements executed in different ways:

1: Index Ordered GROUP BY in MySQL

mysql> select k, count(*) c from tbl group by k order by k limit 5;
+---+---+
| k | c |
+---+---+
| 2 | 3 |
| 4 | 1 |
| 5 | 2 |
| 8 | 1 |
| 9 | 1 |
+---+---+
5 rows in set (0.00 sec)

mysql> explain select k, count(*) c from tbl group by k order by k limit 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: index
possible_keys: k
          key: k
      key_len: 4
          ref: NULL
         rows: 5
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

In this case, we have an index on the column we use for GROUP BY. This way, we can just scan data group by group and perform GROUP BY on the fly (inexpensively).

It works especially well when we use LIMIT to restrict the number of groups we retrieve or when a “covering index” is in use, as a sequential index-only scan is a very fast operation.

If you have a small number of groups though, and no covering index, index order scans can cause a lot of IO. So this might not be the optimal plan.

2: External Sort GROUP BY in MySQL

mysql> explain select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 998490
     filtered: 100.00
        Extra: Using filesort
1 row in set, 1 warning (0.00 sec)

mysql> select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5;
+---+---+
| g | c |
+---+---+
| 0 | 1 |
| 1 | 2 |
| 4 | 1 |
| 5 | 1 |
| 6 | 2 |
+---+---+
5 rows in set (0.88 sec)

If we do not have an index that allows us to scan the data in group order, we can instead get data sorted through an external sort (also referred to as “filesort” in MySQL).

You may notice I’m using an SQL_BIG_RESULT hint here to get this plan. Otherwise, MySQL won’t choose it.

In general, MySQL prefers to use this plan only if we have a large number of groups, because in this case sorting is more efficient than having a temporary table (which we will talk about next).

3: Temporary Table GROUP BY in MySQL

mysql> explain select  g, sum(g) s from tbl group by g limit 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 998490
     filtered: 100.00
        Extra: Using temporary
1 row in set, 1 warning (0.00 sec)

mysql> select  g, sum(g) s from tbl group by g order by null limit 5;
+---+------+
| g | s    |
+---+------+
| 0 |    0 |
| 1 |    2 |
| 4 |    4 |
| 5 |    5 |
| 6 |   12 |
+---+------+
5 rows in set (7.75 sec)

In this case, MySQL also does a full table scan. But instead of running additional sort passes, it creates a temporary table instead. This temporary table contains one row per group, and with each incoming row the value for the corresponding group is updated. Lots of updates! While this might be reasonable in-memory, it becomes very expensive if the resulting table is so large that updates are going to cause a lot of disk IO. In this case, external sort plans are usually better.

Note that while MySQL selects this plan by default for this use case, if we do not supply any hints it is almost 10x slower than the plan we get using the SQL_BIG_RESULT hint.

You may notice I added “ORDER BY NULL” to this query. This is to show you the “clean” temporary-table-only plan. Without it, we get this plan:

mysql> explain select  g, sum(g) s from tbl group by g limit 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 998490
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

In it, we get the “worst of both worlds” with Using Temporary Table and filesort.  

MySQL 5.7 always returns GROUP BY results sorted in group order, even if the query doesn’t require it (which can then require an expensive additional sort pass). ORDER BY NULL signals that the application doesn’t need this.

You should note that in some cases – such as JOIN queries with aggregate functions accessing columns from different tables – using temporary tables for GROUP BY might be the only option.

If you want to force MySQL to use a plan that does temporary tables for GROUP BY, you can use the SQL_SMALL_RESULT  hint.

4:  Index Skip-Scan-Based GROUP BY in MySQL

The previous three GROUP BY execution methods apply to all aggregate functions. Some of them, however, have a fourth method.

mysql> explain select k,max(id) from tbl group by k\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: range
possible_keys: k
          key: k
      key_len: 4
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using index for group-by
1 row in set, 1 warning (0.00 sec)

mysql> select k,max(id) from tbl group by k;
+---+---------+
| k | max(id) |
+---+---------+
| 0 | 2340920 |
| 1 | 2340916 |
| 2 | 2340932 |
| 3 | 2340928 |
| 4 | 2340924 |
+---+---------+
5 rows in set (0.00 sec)

This method applies only to very special aggregate functions: MIN() and MAX(). These do not really need to go through all the rows in the group to compute the value at all.

They can just jump to the minimum or maximum group value in the group directly (if there is such an index).

How can you find MAX(ID) value for each group if the index is only built on (K) column? This is an InnoDB table. Remember InnoDB tables effectively append the PRIMARY KEY to all indexes. (K) becomes (K,ID), allowing us to use Skip-Scan optimization for this query.

This optimization is only enabled if there is a large number of rows per group. Otherwise, MySQL prefers more conventional means to execute this query (like Index Ordered GROUP BY detailed in approach #1).

While we’re on MIN()/MAX() aggregate functions, other optimizations apply to them as well. For example, if you have an aggregate function with no GROUP BY (effectively one group for the whole table), MySQL fetches those values from indexes during the statistics analysis phase and avoids reading the table during the execution stage altogether:

mysql> explain select max(k) from tbl\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Select tables optimized away
1 row in set, 1 warning (0.00 sec)

Filtering and Group By

We have looked at four ways MySQL executes GROUP BY.  For simplicity, I used GROUP BY on the whole table, with no filtering applied. The same concepts apply when you have a WHERE clause:

mysql> explain select  g, sum(g) s from tbl where k>4 group by g order by NULL limit 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: range
possible_keys: k
          key: k
      key_len: 4
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using temporary
1 row in set, 1 warning (0.00 sec)

For this case, we use the range on the K column for data filtering/lookup and perform the GROUP BY using a temporary table.

In some cases, the methods do not conflict. In others, however, we have to choose either to use one index for GROUP BY or another index for filtering:

mysql> alter table tbl add key(g);
Query OK, 0 rows affected (4.17 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select  g, sum(g) s from tbl where k>1 group by g limit 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: index
possible_keys: k,g
          key: g
      key_len: 4
          ref: NULL
         rows: 16
     filtered: 50.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select  g, sum(g) s from tbl where k>4 group by g limit 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: range
possible_keys: k,g
          key: k
      key_len: 4
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

Depending on specific constants used in this query, we can see that we either use an index ordered scan for GROUP BY (and  “give up”  benefiting from the index to resolve the WHERE clause), or use an index to resolve the WHERE clause (but use a temporary table to resolve GROUP BY).

In my experience, this is where MySQL GROUP BY does not always make the right choice. You might need to use FORCE INDEX to execute queries the way you want them to.
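For example, to push the last query above back to the index ordered GROUP BY plan, a hint along these lines should work (a sketch; whether it is actually faster depends on the constants and the data distribution):

mysql> select g, sum(g) s from tbl force index (g) where k>4 group by g limit 5;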

Summary

I hope this article provides a good overview of how to execute MySQL GROUP BY.  In my next blog post, we will look into techniques you can use to optimize GROUP BY queries.
