Planet MySQL

How VividCortex Uses the New Generated Virtual Columns Feature in MySQL

In an industry as fast-growing and interconnected as database technology, it’s exciting to track how innovations in one platform can ignite beneficial, rippling effects on other, surrounding systems. At VividCortex we frequently find ourselves faced with opportunities to improve our monitoring solutions based on how database technologies (such as MySQL, Redis, PostgreSQL, etc.) develop and integrate new upgrades.

When those platforms that we monitor -- already powerful, tried and true -- equip themselves with new features, we work to discover how VividCortex can leverage those features and make our own technology even better. In the case of MySQL 5.7.8’s recent introduction of generated virtual columns, we found the opportunity to use a new feature to make our queries simpler and more elegant, with results that are significantly faster and more efficient in their use of space.


Before we had access to MySQL’s generated virtual columns, we were already using a table with an ID column for metric IDs. When we read from that table, we were typically interested in a huge number of those metric IDs. Unfortunately, because those IDs are generated from a hash, reading them wasn’t as simple as selecting a particular range. Instead, we needed to generate a huge list of IDs and put them in IN (...) clauses when we queried.

We developed a way to decrease the number of IDs generated and lighten the load of the process: instead of specifying the raw IDs themselves -- often a number in the thousands -- we found a satisfactory solution in selecting IDs that have a certain hash result (specifically, a modulo result). In other words, instead of using

SELECT * FROM table WHERE metric IN (_, _, _, ... [hundreds or thousands more])

we can use

SELECT * from table WHERE metric % 100 IN (1, 2, 3, 4)

In this expression, we’re only interested in metric IDs that have a remainder of 1, 2, 3, or 4 after dividing by 100. This makes our queries much easier to handle… but it also means we have an indexing problem.

On one hand, our metric ID is part of our primary key, so specifying the IDs directly in the IN clause would be very fast, as we can look up records directly in the primary key. However, with our modulo approach, we’d have to scan through and evaluate the expression for each and every ID -- a slow, granular process that forces MySQL to examine far more rows than necessary. This is where generated virtual columns come in.

As of MySQL 5.7.8, users have been able to create secondary indexes on generated virtual columns -- for us, that means we can add a virtual generated column for our modulo result (metric % 100) which, significantly, uses no space directly. With this power at our fingertips, we updated one of our indexes to use the generated column -- something that was previously not possible. We also updated that index to include another column that we needed for our query, so it became a covering index (read about covering indexes in Baron Schwartz’s post about exploiting MySQL index optimizations).
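As a sketch, the change looks something like this (the table and column names here are hypothetical, since the post doesn’t show the real schema):

```sql
-- Hypothetical table; "metric" is the hashed metric ID from the post.
-- The VIRTUAL column itself stores nothing; only its index consumes space.
ALTER TABLE metrics_data
  ADD COLUMN metric_mod INT AS (metric % 100) VIRTUAL,
  ADD INDEX idx_metric_mod (metric_mod, metric);

-- The huge IN (...) list of raw IDs becomes a short list of remainders,
-- now resolvable through the secondary index:
SELECT * FROM metrics_data WHERE metric_mod IN (1, 2, 3, 4);
```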

Of special interest to VividCortex: once we started exploring generated virtual columns, we found it especially helpful to look at the differences in EXPLAIN plans on our Query Details page. Rather than manually experimenting with different queries and exhaustively checking latencies and other details, the information was all there, readily accessible in VividCortex.

The final result is that our queries got a lot simpler and more efficient, mainly due to MySQL's new virtual generated column support. MySQL became more flexible and powerful, VividCortex was able to leverage that power, and, as a result, when customers use our product they’ll find a more streamlined solution that makes minimal demands on their systems’ space and time. If you’d like to see VividCortex in action on your own systems, be sure to request a demo.

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL High Availability: The Road Ahead for Percona and XtraDB Cluster

This blog post discusses what is going on in the MySQL high availability market, and what Percona’s plans are for helping customers with high availability solutions.

One thing I like to tell people is that you shouldn’t view Percona as a “software” company, but as a “solution” company. Our goal has always been to provide the best solution that meets each customer’s situation, rather than push our own software, regardless of whether it is the best fit or not. As a result, we have customers running all kinds of MySQL “flavors”: MySQL, MariaDB, Percona Server, Amazon RDS and Google Cloud SQL. We’re happy to help customers be successful with the technology of their choice, and advise them on alternatives when we see a better fit.

One area where I have been increasingly uneasy is our advanced high availability support with Percona XtraDB Cluster and other Galera-based technologies. In 2011, when we started working on Percona XtraDB Cluster together with Codership, we needed to find a way to arrange investment into the development of Galera technology to bring it to market. So we made a deal, which, while providing needed development resources, also required us to price Percona XtraDB Cluster support as a very expensive add-on option. While this made sense at the time, it also meant few companies could afford XtraDB Cluster support from Percona, especially at large scale.

As a few years passed, Galera technology became the mainstream high-end high availability option. In addition to being available in Percona XtraDB Cluster, it has been included in MariaDB, as well as in Galera Cluster for MySQL. Additionally, an alternative technology to solve the same problem – MySQL Group Replication – started to be developed by the MySQL team at Oracle. With all these changes, it was impossible for us to provide affordable support for Percona XtraDB Cluster due to our previous commercial agreement with Codership, which reflected a very different market situation than the one we now face.

As a result, over a year ago we exited our support partnership agreement with Codership and moved the support and development function in-house. These changes have proven to be positive for our customers, allowing us to better focus on their priorities and provide better response time for issues, as these no longer require partner escalation.

Today we’re taking the next natural step – we will no longer require customers to purchase Percona XtraDB Cluster as a separate add-on. Percona will include support for XtraDB Cluster and other Galera-based replication technologies in our Enterprise and Premier support levels, as well as our Percona Care and Managed Services subscriptions. Furthermore, we are going to support Oracle’s MySQL Group Replication technology at no additional cost too, once it becomes generally available, so our customers have access to the best high availability technology for their deployment.

As part of this change, you will also see us focusing on hardening XtraDB Cluster and Galera technology, making it better suited for demanding business workloads, as well as more secure and easier to use. All of our changes will be available as 100% open source solutions and will also be contributed back to the Galera development team to incorporate into their code base if they wish.

I believe making the Galera code better is the most appropriate action for us at this point!


Installing Nginx with PHP 7 and MySQL 5.7 (LEMP) on Ubuntu 16.04 LTS

Nginx (pronounced "engine x") is a free, open-source, high-performance HTTP server. Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption. This tutorial shows how you can install Nginx on an Ubuntu 16.04 server with PHP 7 support (through PHP-FPM) and MySQL support (LEMP = Linux + Nginx + MySQL + PHP).

MariaDB MaxScale 1.4.3 GA is available for download

Wed, 2016-05-04 07:17 by Johan

We are pleased to announce that MariaDB MaxScale 1.4.3 GA is now available for download!

If MariaDB MaxScale is new to you, we recommend reading this page first.

1.4.3 is a bugfix release: it brings no new features, but fixes certain issues found in 1.4.2.

MariaDB MaxScale 1.4 brings:

  1. The Firewall Filter has been extended and can now be used for either black-listing or white-listing queries. In addition, it is capable of logging both queries that match and queries that do not match.
  2. Client-side SSL has been available in MariaDB MaxScale for a while, but it has been somewhat unreliable. We now believe that client side SSL is fully functional and usable.

Additional improvements:

  • POSIX Extended Regular Expression Syntax can now be used in conjunction with qlafilter, topfilter and namedserverfilter.
  • Improved user grant detection.
  • Improved password encryption.

The release notes can be found here and the binaries can be downloaded here.

In case you want to build the binaries yourself, the source can be found at GitHub, tagged with 1.4.3.

We hope you will download and use this release, and we encourage you to create a bug report in Jira for any bugs you might encounter.

Tags: MaxScale

About the Author

Johan Wikman is a senior developer working on MaxScale at MariaDB Corporation. 


MariaDB Galera Cluster 10.0.25 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB Galera Cluster 10.0.25. See the release notes and changelog for details on this release.

  • Download MariaDB Galera Cluster 10.0.25
  • Release Notes
  • Changelog
  • What is MariaDB Galera Cluster?
  • MariaDB APT and YUM Repository Configuration Generator

Thanks, and enjoy MariaDB!



MySQL 5.7: initial flushing analysis and why Performance Schema data is incomplete

In this post, we’ll examine why, in an initial flushing analysis, we find that Performance Schema data is incomplete.

Having shown the performance impact of Percona Server 5.7 patches, we can now discuss their technical reasoning and details. Let’s revisit the MySQL 5.7.11 performance schema synch wait graph from the previous post, for the case of unlimited InnoDB concurrency:

First of all, this graph is a little “nicer” than reality, which limits its diagnostic value. There are two reasons for this. The first one is that page cleaner worker threads are invisible to Performance Schema (see bug 79894). This alone limits PFS value in 5.7 if, for example, one tries to select only the events in the page cleaner threads, or monitors low concurrency where the cleaner thread count is a non-negligible part of the total thread count.

To understand the second reason, let’s look into PMP for the same setting. Note that selected intermediate stack frames were removed for clarity, especially in the InnoDB mutex implementation.

660 pthread_cond_wait,enter(ib0mutex.h:850),buf_dblwr_write_single_page(ib0mutex.h:850),buf_flush_write_block_low(,buf_flush_page(,buf_flush_single_page_from_LRU(,buf_LRU_get_free_block(,...
631 pthread_cond_wait,buf_dblwr_write_single_page(,buf_flush_write_block_low(,buf_flush_page(,buf_flush_single_page_from_LRU(,buf_LRU_get_free_block(,...
337 pthread_cond_wait,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),get_next_redo_rseg(,trx_assign_rseg_low(,trx_set_rw_mode(,lock_table(,...
324 libaio::??(,LinuxAIOHandler::collect(,LinuxAIOHandler::poll(,...
241 pthread_cond_wait,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),trx_write_serialisation_history(,trx_commit_low(,...
147 pthread_cond_wait,enter(ib0mutex.h:850),trx_undo_assign_undo(ib0mutex.h:850),trx_undo_report_row_operation(,...
112 pthread_cond_wait,mtr_t::s_lock(sync0rw.ic:433),btr_cur_search_to_nth_level(,...
 83 poll(,Protocol_classic::get_command(,do_command(,handle_connection(,...
 64 pthread_cond_wait,Per_thread_connection_handler::block_until_new_connection(thr_cond.h:136),...

The top wait in both PMP and the graph is the 660 samples of mutex enter waits in buf_dblwr_write_single_page, which is the doublewrite mutex. Now try to find the nearly as hot 631 samples of event waits in buf_dblwr_write_single_page in the PFS output. You won’t find them, because InnoDB OS event waits are not annotated in Performance Schema. In most cases this is correct, as OS event waits tend to be used when there is no work to do. The thread waits for work to appear, or for time to pass. But in the report above, the waiting thread is blocked from proceeding with useful work (see bug 80979).
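For readers unfamiliar with PMP (the “poor man’s profiler”), aggregated stack output like the above can be produced with the classic gdb one-liner pattern. This is a generic sketch, not Percona’s exact tooling, and the helper names are made up:

```shell
# Sample all thread stacks of a process once and aggregate identical
# stacks, hottest first. Usage: pmp <pid-of-mysqld>
pmp() {
  gdb -batch -ex 'set pagination 0' -ex 'thread apply all bt' -p "$1" 2>/dev/null |
    pmp_aggregate
}

# Aggregation stage, split out so canned gdb output can be fed to it too.
# On a frame line like "#0  0x... in pthread_cond_wait ()", awk's $4 is
# the function name.
pmp_aggregate() {
  awk '
    /^Thread/ { if (s != "") print s; s = "" }   # new thread: flush stack
    /^#/      { s = (s == "" ? $4 : s "," $4) }  # append function name
    END       { if (s != "") print s }
  ' | sort | uniq -c | sort -rn
}
```

The aggregation stage is what turns hundreds of raw backtraces into the ranked list shown above.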

Now that we’ve shown the two reasons why PFS data is not telling the whole server story, let’s take PMP data instead and consider how to proceed. Those top two PMP waits suggest 1) the server is performing a lot of single page flushes, and 2) those single page flushes have their concurrency limited by the eight doublewrite single-page flush slots available, and that the wait for a free slot to appear is significant.

Two options become apparent at this point: either make the single-page flush doublewrite more parallel or reduce the single-page flushing in the first place. We’re big fans of the latter option since version 5.6 performance work, where we configured Percona Server to not perform single-page flushes at all by introducing the innodb_empty_free_list_algorithm option, with the “backoff” default.

The next post in the series will describe how we removed single-page flushing in 5.7.


My slides about MySQL Performance from #PerconaLive Apr.2016 US

As promised, here are my slides from the Percona Live conference in the US, Apr. 2016:

  • MySQL 5.7 Performance & Scalability Benchmarks (PDF)
  • MySQL 5.7 Demystified Tuning (PDF)

Feel free to ask any questions, request more details, etc.

Also, not really related to MySQL, but as I was asked so many times "how did you manage to project your slides from your Mac, but drive and annotate them via iPad?" - here is a short HOWTO:
  • you need the Keynote app installed on both your Mac and your iPad
  • create your own WiFi network on your Mac (MenuBar->WiFi->Create Network...)
  • once done, connect your iPad to this WiFi network
  • (having your own network gets rid of any potential sync issues, removes any dependency on WiFi availability in the room, and also lets you walk far away from your Mac while still keeping control of your slides ;-))
  • then start your Keynote presentation on your Mac
  • after that, open the Keynote app on your iPad
    • tap on Keynote Remote
    • select your Mac from the list of available devices
    • and you have your currently projected slides in your hands ;-))
    • you can then select a preferred layout: current slide, current + next, current + notes, etc.
    • AND on any slide you can invoke annotation and draw over the slide with pencils of different colors to point at one part of your slides or another
    • (of course, the drawing remains only during annotation and does not destroy your slides ;-))
  • have fun! ;-))

What else to say? The conference was really great, and I can only admit that Percona is doing it better and better from year to year.. A huge number of very interesting talks, great technical content almost everywhere, a lot of innovation, new ideas, deep discussions, etc. etc.. -- you don't know what you missed if you were not there ;-))

Well, time for the rest now, and as a final point - a "Bloody Cheesecake" on my departure from SFO Airport
(for those who understand ;-))



Best Practices for Configuring Optimal MySQL Memory Usage

In this blog post, we’ll discuss some of the best practices for configuring optimal MySQL memory usage.

Correctly configuring the use of available memory resources is one of the most important things you have to get right with MySQL for optimal performance and stability. As of MySQL 5.7, the default configuration uses a very limited amount of memory – leaving defaults is one of the worst things you can do. But configuring it incorrectly can result in even worse performance (or even crashes).

The first rule of configuring MySQL memory usage is you never want your MySQL to cause the operating system to swap. Even minor swapping activity can dramatically reduce MySQL performance. Note the keyword “activity” here. It is fine to have some used space in your swap file, as there are probably parts of the operating system that are unused when MySQL is running, and it’s a good idea to swap them out. What you don’t want is constant swapping going on during the operation, which is easily seen in the “si” and “so” columns in the vmstat output.

Example: No Significant Swapping

Example: Heavy Swapping Going On

If you’re running Percona Monitoring and Management, you can also look into the Swap Activity graph in System Overview Dashboard.

If you have spikes of more than 1MB/sec, or constant swap activity, you might need to revisit your memory configuration.
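A quick way to watch for that activity from a shell, as a sketch (the filter assumes the default vmstat layout, where si and so are the 7th and 8th columns; the helper name is made up):

```shell
# Print a line whenever swap-in (si) or swap-out (so) is nonzero.
# Live usage would be: vmstat 1 | swap_activity
swap_activity() {
  awk 'NR > 2 && ($7 + $8) > 0 { print "swap activity: si=" $7 " so=" $8 }'
}
```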

MySQL memory allocation is complicated. There are global buffers, per-connection buffers (which depend on the workload), and some uncontrolled memory allocations (e.g., inside stored procedures), all contributing to the difficulty of computing how much memory MySQL will really use for your workload. It is better to check it by looking at the virtual memory size (VSZ) that MySQL uses. You can get it from “top”, or by running ps aux | grep mysqld.

mysql     3939 30.3 53.4 11635184 8748364 ?    Sl   Apr08 9106:41 /usr/sbin/mysqld

The 5th column here shows VSZ usage (about 11GB).

Note that the VSZ is likely to change over time. It is often a good idea to plot it in your monitoring system and set an alert to ping you when it hits a specified threshold. Don’t let the mysqld process VSZ exceed 90% of the system memory (less if you’re running more than just MySQL on the system).
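A minimal sketch of such an alert, using the 90% threshold above (the helper name and wiring are made up for illustration):

```shell
# Compare a mysqld VSZ reading (in KB) against total memory (in KB)
# and warn when it crosses the given threshold percentage.
check_vsz() {
  pct=$(( $1 * 100 / $2 ))
  if [ "$pct" -ge "$3" ]; then
    echo "ALERT: mysqld VSZ at ${pct}% of RAM"
  else
    echo "OK: mysqld VSZ at ${pct}% of RAM"
  fi
}

# On a live system it might be fed like this:
# check_vsz "$(ps -o vsz= -p "$(pidof mysqld)")" \
#           "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 90
```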

It’s a good idea to start on the safe side by conservatively setting your global and per connections buffers, and then increase them as you go. Many can be set online, including innodb_buffer_pool_size in MySQL 5.7.

So how do you decide how much memory to allocate to MySQL versus everything else? In most cases you shouldn’t commit more than 90% of your physical memory to MySQL, as you need to have some reserved for the operating system and things like caching binary log files, temporary sort files, etc.

There are cases when MySQL should use significantly less than 90% of memory:

  • If there are other important processes running on the same server, either all the time or periodically. If you have heavy batch jobs run from cron, which require a lot of memory, you’ll need to account for that.
  • If you want to use OS caching for some storage engines. With InnoDB, we recommend innodb_flush_method=O_DIRECT  in most cases, which won’t use Operating System File Cache. However, there have been cases when using buffered IO with InnoDB made sense. If you’re still running MyISAM, you will need OS cache for the “data” part of your tables. With TokuDB, using OS cache is also a good idea for some workloads.
  • If your workload places significant demands on the operating system cache: MyISAM on-disk temporary tables, sort files, and some other temporary files that MySQL creates need to be well-cached for optimal performance.

Once you know how much memory you want the MySQL process to have as a whole, you’ll need to think about what that memory should be used for inside MySQL. The first part of memory usage in MySQL is workload related – if you have many connections active at the same time that run heavy selects using a lot of memory for sorting or temporary tables, you might need a lot of memory (especially if Performance Schema is enabled). In other cases this amount of memory is minimal. You’ll generally need somewhere between 1 and 10GB for this purpose.

Another thing you need to account for is memory fragmentation. Depending on the memory allocation library you’re using (glibc, TCMalloc, jemalloc, etc.), the operating system settings such as Transparent Huge Pages (THP) and workload may show memory usage to grow over time (until it reaches some steady state). Memory fragmentation can also account for 10% or more of additional memory usage.

Finally, let’s think about various global buffers and caching. In typical cases, you mainly have innodb_buffer_pool_size to worry about. But you might also need to consider key_buffer_size, tokudb_cache_size and query_cache_size, as well as table_cache and table_open_cache. These are also responsible for global memory allocation, even though they are not counted in bytes. Performance Schema may also take a lot of memory, especially if you have a large number of connections or tables in the system.

When you specify the size of the buffers and caches, you should determine what you’re specifying. For innodb_buffer_pool_size, remember there is another 5-10% of memory that is allocated for additional data structures – and this number is larger if you’re using compression or set innodb_page_size smaller than 16K. For tokudb_cache_size, it’s important to remember that the setting specified is a guide, not a “hard” limit: the cache size can actually grow slightly larger than the specified limit.

For systems with large amounts of memory, the database cache is going to be by far the largest memory consumer, and you’re going to allocate most of your memory to it. When you add extra memory to the system, it is typically to increase the database cache size.

Let’s do some math for a specific example. Assume you have a system (physical or virtual) with 16GB of memory. We are only running MySQL on this system, with the InnoDB storage engine and innodb_flush_method=O_DIRECT, so we can allocate 90% (or 14.4GB) of memory to MySQL. For our workload, we assume connection handling and other MySQL connection-based overhead will take up 1GB (leaving 13.4GB). 0.4GB is likely to be consumed by various other global buffers (innodb_log_buffer_size, table caches, other miscellaneous needs, etc.), which now leaves 13GB. Considering the 5-7% overhead that the InnoDB buffer pool has, a sensible setting is innodb_buffer_pool_size=12G – which we very commonly see working well for systems with 16GB of memory.
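The same arithmetic, written out (every percentage here is the post’s rule of thumb, not a hard limit):

```shell
# 16GB host running only MySQL, InnoDB with O_DIRECT -- the post's example.
awk 'BEGIN {
  total        = 16.0           # GB of physical RAM
  mysql_total  = total * 0.90   # leave ~10% for the OS       -> 14.4GB
  connections  = 1.0            # connection-based overhead   -> 13.4GB left
  other_global = 0.4            # log buffer, table caches... -> 13.0GB left
  left = mysql_total - connections - other_global
  bp   = left / 1.07            # ~7% buffer pool bookkeeping overhead
  printf "innodb_buffer_pool_size ~ %.0fG\n", bp
}'
```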

Now that we have configured MySQL memory usage, we also should look at the OS configuration. The first question to ask is if we don’t want MySQL to swap, should we even have the swap file enabled?  In most cases, the answer is yes – you want to have the swap file enabled (strive for 4GB minimum, and no less than 25% of memory installed) for two reasons:

  • The operating system is quite likely to have some portions that are unused when it is running as a database server. It is better to let it swap those out instead of forcing it to keep it in memory.
  • If you’ve made a mistake in the MySQL configuration, or you have some rogue process taking much more memory than expected, it is usually much better to lose some performance to swapping than to have MySQL killed by an out of memory (OOM) error – potentially causing downtime.

As we only want the swap file used in emergencies, such as when there is no memory available or to swap out idle processes, we want to reduce the operating system’s tendency to swap (echo 1 > /proc/sys/vm/swappiness). Without this configuration setting you might find the OS swapping out portions of MySQL just because it feels it needs to increase the amount of available file cache (which is almost always the wrong choice for MySQL).

The next thing to look at in OS configuration is the Out Of Memory (OOM) killer. You may have seen a message like this in your kernel log file:

Apr 24 02:43:18 db01 kernel: Out of memory: Kill process 22211 (mysqld) score 986 or sacrifice child

When MySQL itself is at fault, it’s a pretty rational thing to do. However, it’s also possible the real problem was one of the batch activities you’re running: scripts, backups, etc. In this case, you probably want those processes to be terminated if the system does not have enough memory, rather than MySQL.

To make MySQL a less likely candidate to be killed by the OOM killer, you can adjust the behavior to make MySQL less preferable with the following:

echo '-800' > /proc/$(pidof mysqld)/oom_score_adj

This will make the Linux kernel prefer killing other heavy memory consumers first.

Finally, on a system with more than one CPU socket, you should care about NUMA when it comes to MySQL memory allocation. In newer MySQL versions, you want to enable innodb_numa_interleave=1. In older versions you can either manually run numactl --interleave=all before you start the MySQL server, or use the numa_interleave configuration option in Percona Server.



MariaDB Security at the Open Source Conference in Rome

Tue, 2016-05-03 12:35 by maria-luisaraviol

At the 2016 Open Source Conference in Rome, held a few weeks ago, MariaDB was present along with many of the most relevant players in the Open Source community, including the Italian Public Administration, which is required by law to give preference to free software and open source software. The goal of the Italian Open Source Observatory is to promote the use of open source software in Italian public administrations: it monitors and collects information on cases of open source adoption and best practices related to open technology usage within institutions.

Our focus at the conference this year was in particular on the important new MariaDB security features, part of our open source offering starting with the release of MariaDB 10.1. It’s extremely important that these features are accessible to all MariaDB users, but they’re especially relevant for the Italian Public Administration, since it collects, manages and shares sensitive data about Italian citizens. This data needs to be protected and secured from any kind of unauthorized access.

With MariaDB 10.1’s security capabilities, the Italian Public Administration can access database security features (e.g., firewalls, encryption and database activity logging) while still relying on an open source solution, in compliance with Italian law.

Every organization is concerned about threats to its data and the consequences that such threats represent. Every citizen wants to be sure that their privacy is protected and that private data (e.g., health, insurance, family and financial data) is safely managed, transferred and accessed. But it’s also important that all of the services provided by the Public Administration through its web portals remain accessible, and that those web sites and their databases are protected against attacks that aim to deny those services.

Protecting data through encryption is essential, but it’s also important to make every effort possible to prevent data from being improperly accessed in the first place. MariaDB provides an excellent set of features to support all of the new requirements in terms of database authentication, like Single Sign-On, which is preferred in most enterprises.

Private and public companies—even though they may have different objectives—also need to secure their services whenever they are provided over the Internet. This can be handled by MariaDB through the firewall features provided with MariaDB MaxScale, which can prevent attacks derived from unauthorized access, as well as denial of service attacks and SQL injections.

See also our blog on MariaDB Database Security.

Tags: Security

About the Author

Maria-Luisa Raviol is a Senior Sales Engineer with over 20 years of industry experience.


Solving the Facebook-OSC non-atomic table swap problem

We present a way to use an atomic, blocking table swap in the Facebook Online-Schema-Change solution, as well as in another, rumored Online-Schema-rumored-Change solution.

Quick recap (really quick)

pt-online-schema-change and facebook-osc are two popular online-schema-change solutions for MySQL. They both use triggers, but in different ways. The Percona tool uses synchronous table updates, such that any INSERT|UPDATE|DELETE on the modified table causes an INSERT|UPDATE|DELETE on a ghost table. In the Facebook tool, all of these cause an INSERT on a changelog table, which is then iterated and read, with entries applied to the ghost table.

The TL;DR is that DMLs on the table propagate synchronously, within the same transaction, in the Percona tool, and asynchronously, with lag, in the Facebook tool.

What's the problem with the table swap?

In the Percona tool, once the logic is satisfied that the copy is complete, we issue this query:

RENAME TABLE tbl TO tbl_old, tbl_new TO tbl;

This is an atomic, two table RENAME operation.

However with the asynchronous nature of the Facebook tool, such a RENAME would be a mistake. We must first block writes to the modified table, then make sure we have iterated the changelog table to the point of lock, apply those changes onto the ghost table, and only then do the swap.

The problem is: you cannot RENAME TABLES while one of them is LOCKed.

This is silly, and inconsistent, because:

> LOCK TABLES tbl WRITE;
Query OK, 0 rows affected (0.00 sec)

> RENAME TABLE tbl TO tbl_old, tbl_new TO tbl;
ERROR 1192 (HY000): Can't execute the given command because you have active locked tables or an active transaction

> ALTER TABLE tbl RENAME TO tbl_old;
Query OK, 0 rows affected (0.00 sec)

Why would the RENAME fail where the ALTER works?

Small thing, but critical to the operation of the online-schema-change. From the Facebook OSC documentation:

Since alter table causes an implicit commit in innodb, innodb locks get released after the first alter table. So any transaction that sneaks in after the first alter table and before the second alter table gets a 'table not found' error. The second alter table is expected to be very fast though because copytable is not visible to other transactions and so there is no need to wait.

What the FB solution means

It means for a very brief duration, the table is not there. Your app will get errors.

Of course, we should be able to handle errors anytime, anywhere. But the honest truth is: we (as in the world) do not. Many apps will fail ungracefully should they get a table not found error.

An atomic swap, by comparison, would make for briefly blocking operations, keeping the app ignorant of the swap.


Rumor has it that we at GitHub are developing a new, triggerless, Online Schema Change tool. It is rumored to be based off binary logs and is rumored to have lots of interesting rumored implications.

Such rumored implementation would have to be asynchronous by nature, or so rumors say. And as such, it would fall for the same non-atomic table swap problem.


Once we heard it was rumored we were working on a triggerless online schema change tool, we realized we would have to solve the non-atomic swap problem. What we did was to gossip about it amongst ourselves, which led to three different rumors of a solution, eventually manifested as three different working solutions. All three solutions make for blocking queries on the app's side. I will present one of these solutions here, based on voluntary locks.

The idea is to make a table appear locked without actually issuing a LOCK TABLE statement, such that we can run a RENAME TABLE operation that waits until we say it's good to complete.

Let's assume:

  • Our table is tbl
  • Ghost table (table onto which we've actually made the changes) is tbl_new
  • Our app continuously writes to tbl
  • We have 3 connections on our tables, aptly named #1, #2, #3

We issue the following, in this order:

  1. #1:
    SELECT GET_LOCK('ding', 0);

    Lock acquired, no problems

  2. #2:
    SELECT RELEASE_LOCK('ding') FROM tbl WHERE GET_LOCK('ding', 999999)>=0 LIMIT 1;

    Ignore the RELEASE_LOCK for now; it is merely cleanup. The query attempts to read one row from tbl where GET_LOCK('ding', 999999)>=0. But 'ding' is locked, hence the entire query blocks.
    Otherwise, other queries on tbl (both reads and writes) are running fine.

  3. #3:
    RENAME TABLE tbl TO tbl_old, tbl_new TO tbl;

    Now the magic begins. The RENAME operation cannot proceed while queries are executing on tbl. In particular, it waits on #2 to complete. But #2 is blocked on #1, so it does not complete. Our RENAME is also blocked!
    There are two further consequences that work to our advantage:

    • Any further incoming INSERT|UPDATE|DELETE on tbl is now unable to proceed; such queries will now wait for the RENAME to complete. So no further updates on tbl are being applied. The app is blocked
    • tbl_new is not locked! And this is because of how RENAME works internally. Since it couldn't satisfy the first clause, it doesn't even check the second, and does not place a LOCK on tbl_new.
  4. OSC:
    Now that no further writes are possible on tbl, we satisfy ourselves that we've iterated to the last of the changelog entries and have applied changes to tbl_new. Exactly how we satisfy ourselves is a matter of implementation. Rumor is that we use a rumor that the last entry was handled in our rumored solution. That last part is actually not a pun.
    We are now content that all changes have been applied to tbl_new.
  5. #1:
    SELECT RELEASE_LOCK('ding');

    Ding! Connection #2 gets released, reads some row from tbl (but no one is actually interested in the result set) and completes. The #3 RENAME is no longer blocked by anything. It executes. The tables are swapped. Once they are swapped, any INSERT|UPDATE|DELETEs that were pending on tbl are released, and the app is unblocked.
    The atomic swap is complete.


Agony. This workaround is agonizing. Is agonization a word? Rumored to be written in Go, our OSC has this implemented via goroutines, and the code is one of those atrocities you are ashamed to look at. Well, it's OK under the circumstances. But really, implementing this is painful, and actually more complicated than the above description. Why is that?

In the above we have two blocking operations: #2 and #3. We must not proceed to #3 before #2 is in place, and we must not proceed to OSC completion before #3 is in place. But how does our code know that it's being blocked? If it's blocked, it can't tell us anything, because it's blocked. If it's not blocked yet, it can tell us it's not blocked yet, but what we're really interested in is the moment it becomes blocked.

And preferably the exact time, or near exact, because once we start blocking, the app suffers. Connections accumulate. We really want to make the swap as quick as possible (and rumor has it we have a rollback & retry mechanism for this operation if it exceeds X seconds).

Unfortunately, the solution involves polling. That is, once we issue #2 (asynchronously, right? It's blocking), we aggressively poll SHOW PROCESSLIST and look for that blocked query. And the same for #3. Polling is a form of necessary ugliness in this flow.
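To make the choreography concrete, here is a database-free sketch of the blocking chain. Our tool is rumored to be in Go, but the pattern is shown here in Python threads, with locks standing in for GET_LOCK('ding') and for "queries executing on tbl"; all names and event strings are illustrative, not the actual implementation:

```python
import threading
import time

events = []  # ordered record of what happened

ding = threading.Lock()         # stands in for the voluntary lock GET_LOCK('ding', ...)
tbl_queries = threading.Lock()  # stands in for "a query is executing on tbl"

def conn2():
    # #2: the blocking SELECT; it keeps a query open on tbl while waiting on 'ding'
    with tbl_queries:
        with ding:  # blocks here until #1 releases the voluntary lock
            events.append("#2 released")

def conn3():
    # #3: RENAME TABLE; it cannot proceed while #2's query is running on tbl
    with tbl_queries:
        events.append("#3 rename done")

ding.acquire()  # #1: SELECT GET_LOCK('ding', 0) -- lock acquired, no problems
t2 = threading.Thread(target=conn2)
t2.start()
time.sleep(0.2)  # let #2 open its query on tbl and block on 'ding'
t3 = threading.Thread(target=conn3)
t3.start()
time.sleep(0.2)  # #3 is now queued behind #2
events.append("changelog drained")  # step 4: OSC applies the last changes to tbl_new
ding.release()  # step 5: ding! the whole chain unblocks, in order
t2.join()
t3.join()
print(events)  # ['changelog drained', '#2 released', '#3 rename done']
```

The sleeps are the toy equivalent of the processlist polling described above: in real life you cannot peek at a lock's waiters, so you watch SHOW PROCESSLIST instead.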

Other solutions

The other two solutions do not use a voluntary lock. Instead:

  1. Use a LOCK on yet another table, plus a query involving that table JOINed with tbl
  2. A SELECT ... FOR UPDATE on yet another table, followed by a SELECT on the locked row of that table JOINed with tbl.

We leave the implementation as an exercise for the reader.

Can't we just make the RENAME work under LOCK?

Yeah. That's what the Facebook people said. "Hey, we can just fix this".

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 5.7 using vagrant

Hey folks.
Today I'm going to write a quick post showing an easy way to set up a VM with mysql 5.7 using vagrant.

Vagrant is a command-line utility that provides tools to manage virtual machine software, such as virtualbox. You can find more details about Vagrant in its documentation. Click the installation link if you have any questions about installing it.

I wrote a script that is available on github. It creates a virtual machine running centos 7 and installs mysql 5.7 on it:

git clone
cd mysql-vagrant
vagrant up
vagrant ssh mysql57
mysql

If you look at the files cloned from git, you will see a file called This file will configure the mysql repository, install the mysql 5.7 package, and reset the root user's password.

If you have any questions, just post them in the comments.


MySQL 5.7 using vagrant

Hi there.
Today I’ll write a quick post to show how to easily configure a brand new mysql 5.7 machine using vagrant.
Vagrant is a command line utility that provides tools to manage virtual machines for different virtualization technologies, for example virtualbox. You can see more details about it here. Navigate to installation if you want to know how to install it.

I’ve created a vagrant script to boot a centos 7 + mysql 5.7 machine. You can find it on github. See how you can get it:

git clone
cd mysql-vagrant
vagrant up
vagrant ssh mysql57
mysql

If you look at the files you have cloned from git, there is a file called This file will configure the mysql yum repository, install mysql 5.7 and reset the password for the root user.

If you have any questions, please comment.


Reserved words usage in MySQL

It is not uncommon to come across MySQL databases where reserved words are in use as identifiers for any kind of database objects.

Perhaps when the application schema was implemented, the words were not reserved yet, and they became reserved later on a subsequent MySQL release.

It is a good practice to check reserved words usage prior to doing any database upgrades, as any newly reserved keywords will cause syntax errors on the new version.

This is usually not a problem if proper quoting is used for referencing the objects, as described on the official manual page.
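Quoting is mechanical and easy to automate. As a sketch (Python used for illustration; the helper name is made up), this is all it amounts to: wrap the identifier in backticks and double any backtick embedded in it:

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

# A reserved word becomes a usable identifier once quoted:
print(quote_identifier("interval"))  # `interval`
print(quote_identifier("odd`name"))  # `odd``name`
```

So CREATE TABLE `interval` (...) parses fine even though INTERVAL is a reserved word.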

The actual steps to do this depend on the environment; for example, the following can be configured to tell Hibernate to escape identifiers:

<property name="hibernate.globally_quoted_identifiers" value="true"/>

This does not appear to be documented properly (there is an open bug unresolved at the time of this writing).

However, we cannot assume that all application code is properly escaped to deal with these kinds of issues.

So what are the symptoms?

Error 1064 will be reported while trying to use a reserved word:

mysql> CREATE TABLE interval (begin INT, end INT);
ERROR 1064 (42000): You have an error in your SQL syntax ... near 'interval (begin INT, end INT)'

How can we check for reserved words?

The following procedure can help you find out if any particular MySQL version’s reserved words are in use:

  1. Using the list on the corresponding manual page, create a text file with one reserved word on each line
  2. Load the data into a temporary table:
     USE test;
     CREATE TABLE reserved_words (word VARCHAR(50));
     LOAD DATA INFILE 'reserved_words.txt' INTO TABLE test.reserved_words;
  3. Check for any column names using reserved keywords:
     SELECT table_schema, table_name, column_name, ordinal_position
     FROM information_schema.columns
     WHERE table_schema NOT IN ( 'mysql', 'information_schema', 'performance_schema' )
     AND column_name = ANY ( SELECT * FROM test.reserved_words )
     ORDER BY 1,2,4;
  4. Check for any table names using reserved keywords:
     SELECT table_schema, table_name
     FROM information_schema.tables
     WHERE table_schema NOT IN ( 'mysql', 'information_schema', 'performance_schema' )
     AND table_name = ANY ( SELECT * FROM test.reserved_words );
  5. Check for any procedures or functions:
     SELECT routine_schema, routine_name, routine_type
     FROM information_schema.routines
     WHERE routine_schema NOT IN ( 'mysql', 'information_schema', 'performance_schema' )
     AND routine_name = ANY ( SELECT * FROM test.reserved_words );
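The same comparison can also be scripted outside the server. A sketch in Python, assuming you have already pulled identifier names from information_schema and reserved words from the manual (the set below is a tiny sample, not the full list):

```python
# Tiny sample only; build the real set from the reserved-words page of the manual.
RESERVED = {"interval", "generated", "stored", "virtual", "select"}

def find_reserved(identifiers, reserved=RESERVED):
    """Return identifiers that collide with reserved words (case-insensitive)."""
    return sorted(name for name in identifiers if name.lower() in reserved)

columns = ["id", "Interval", "begin", "stored"]
print(find_reserved(columns))  # ['Interval', 'stored']
```

This is handy in a pre-upgrade checklist, where you diff the reserved-word lists of the old and new MySQL versions and scan the schema once per new word set.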

I hope this post helps you avoid one of the many issues you may encounter during the challenging task of database upgrades.


Taking the MySQL document store for a spin

This is not a comprehensive review, nor a user guide. It's a step-by-step account of my initial impressions while trying the new MySQL XProtocol and the document store capabilities. In fact, I am barely scratching the surface here: more articles will come as time allows.

MySQL 5.7 has been GA for several months, as it was released in October 2015. Among the many features and improvements, I was surprised to see the MySQL team emphasizing the JSON data type. While it is an interesting feature per se, I failed to see the reason why so many articles and conference talks were focused around this single feature. Everything became clear when, with the release of MySQL 5.7.12, the MySQL team announced a new release model.


In MySQL 5.7.12, we get the usual MySQL server, which shouldn't have new features. However, in an attempt to combine the stability of the server with a more dynamic release cycle, the server ships with a new plugin, unimaginatively named X-Plugin, which supports an alternative communication protocol, named X-Protocol.

In short, the X-Protocol extends and replaces the traditional client/server protocol, by allowing asynchronous communication to the server, using different API calls, which are available, as of today, in Javascript, Python, C#, and Java, with more languages to come.

The reason for this decision is easy to see. Many developers struggle with relational tables and SQL, while they understand structures made of arrays and associative arrays (maps.) This is also one of the reasons for the recent popularity of NoSQL databases, where schemas and tables are replaced by collections of documents or similar schema-less structures. With this new release, MySQL wants to offer the best of two worlds, by allowing developers to use the database with the tools they feel most comfortable with.

To use the new plugin, you need two components:

  • The plugin itself, which ships with the server package, but is not enabled by default;
  • The MySQL shell, a new command line tool that you have to download and install separately and will allow you to use Javascript or Python with the database.

As a QA professional, I am a bit concerned about this mix of GA and alpha features (the MySQL shell is defined as alpha software, and the shell itself says "development preview" in its help). Theoretically, the two worlds should be separated. If you don't install the plugin, the server should work as usual. But practice and experience tell me that there are dangers waiting for a chance to destroy our data. If you want a single piece of advice to summarize this article, DON'T USE the new MySQL shell with a production server. That said, let's start a quick tour.


You need to install the shell, which comes in a package that is separate from the rest of the MySQL products. The manual shows how to install it on OSX or Linux. The only mention that this product could be dangerous to use is a note reminding the user to enable the MySQL Preview Packages when installing from a Linux repository. The procedure, on any operating system, will install libraries and executables globally. Unlike the server package, it is not possible to install it in a user-defined directory, the way you can install the server with MySQL Sandbox. In this context, the standard Oracle disclaimer may have a meaning that goes beyond a regular CYA.

Next, you need to enable the plugin. You can do it in three ways:


$ mysqlsh --classic -u youruser -p --dba enableXProtocol
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating a Classic Session to youruser@localhost:3306
Enter password:
No default schema selected.

enableXProtocol: Installing plugin mysqlx...
enableXProtocol: done


Start the server with the plugin-load option. This will enable the plugin, although it does not seem to work the way it should.


Enable the plugin with a SQL command.

mysql> install plugin mysqlx soname '';

I prefer method #3 because it is the only one that does not have side effects or invite misunderstanding. The issue that hit me when I tried method #1 for the first time is that calling mysqlsh --classic uses the client/server protocol on port 3306 (or whatever port you defined for the database), while subsequent calls will use the X-Protocol on port 33060.

Alternatives. Using Docker

If what I said previously has made you cautious and you have decided not to use the shell on your main computer (as you should), there are alternative ways. If you have a data center at your disposal, just fire up a virtual machine and play with it. However, be aware that the MySQL shell does not install on Ubuntu 15.04 and 16.04.

A lightweight method to try on the new shell without endangering your production server is to use a Docker image for MySQL, or a combination of MySQL Sandbox and Docker.

In Docker, the MySQL shell does not ship together with the server. It requires a separate image. A quick guide is available in a recent article. I don't like the current approach: having two images is a waste of space. It would be acceptable if the images were based on a slim Linux distribution, such as Alpine. Since they run on OracleLinux instead, you need to download two beefy images to start testing. With a fast internet connection this should not be a problem, but if you live in a place where 3 Mbps is the norm, or if you are traveling, this could become an annoyance. Once you have pulled the images, you can use them at will, even without an internet connection.

The above mentioned quick guide suggests using docker run --link to connect the two containers. I recommend a different approach, as the link option is now considered legacy.

$ docker network create mynet
$ docker network ls
NETWORK ID          NAME                DRIVER
a64b55fb7c92        bridge              bridge
0b8a52002dfd        none                null
cc775ec7edab        host                host
edcc36be21e5        mynet               bridge

$ docker run --name mybox -e MYSQL_ROOT_PASSWORD=secret -d --net mynet mysql/mysql-server:5.7.12 \

$ docker run -it --net mynet mysql/shell -u root -h mybox -p
Creating an X Session to root@mybox:33060
Enter password:
No default schema selected.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.

The first command creates a network (called mynet).

The second command creates the server container, which is launched using the network mynet and with the plugin-load option (which seems to work well with the docker image). When you use a docker network, the container name is recognized by the network as a host name, and can be reached by other members of the network. This is much cleaner than using --link.

The third command runs the MySQL shell, using the same network. This allows us to use the container name (mybox) without any other options.

Running the MySQL Javascript shell

My favorite setup for this test is a mix of MySQL Sandbox for the server and Docker for the shell. This way I can use the alpha shell without polluting my Linux host and use a feature rich MySQL Sandbox to control the server.

Here is what I do:

$ make_sandbox 5.7.12 -- --no_show -c general_log=1 -c general_log_file=general.log

I start a sandbox with MySQL 5.7.12 (tarball expanded and renamed into /opt/mysql/5.7.12), with the general log enabled. We need this to peek under the hood when we use the document store.

Next, we load the sample world_x database from the MySQL documentation page.

$ ~/sandboxes/msb_5_7_12/use -e 'source world_x.sql'

Finally, we enable the plugin.

$ ~/sandboxes/msb_5_7_12/use -e "install plugin mysqlx soname ''"

Now we can connect the shell:

$ docker run -it --net host mysql/shell -u msandbox -pmsandbox world_x
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.

What have we done? We use the network named 'host', a standard Docker network that lets a container share the host's environment. We don't need to specify a port, since the shell assumes 33060 (enabled by the X-Plugin). The username and password are the usual ones for a sandbox. We land inside a Javascript shell, where we can communicate with the database server using an alternative syntax. Let's see what we have:

  • We have an "X-Session" using port 33060 and working on database world_x;
  • There is a help, same as in the MySQL client;
  • The database world_x is accessible through the variable db.
  • Note: all the commands used below are the same for Python and Javascript. There are differences only when using the language extensively.

With the above elements, we can try getting data from the database.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

mysql-js> db.tables
{
    "City": <Table:City>,
    "Country": <Table:Country>,
    "CountryLanguage": <Table:CountryLanguage>
}

What does it mean? Let's abandon the Javascript shell and look at the traditional client:

mysql [localhost] {msandbox} (world_x) > show tables;
+-------------------+
| Tables_in_world_x |
+-------------------+
| City              |
| Country           |
| CountryInfo       |
| CountryLanguage   |
+-------------------+
4 rows in set (0.00 sec)

Here we see 4 tables, while the Javascript console lists only 3. However, the fourth table has the same name as the "collection." Let's have a look:

mysql [localhost] {msandbox} (world_x) > desc CountryInfo;
+-------+-------------+------+-----+---------+------------------+
| Field | Type        | Null | Key | Default | Extra            |
+-------+-------------+------+-----+---------+------------------+
| doc   | json        | YES  |     | NULL    |                  |
| _id   | varchar(32) | YES  |     | NULL    | STORED GENERATED |
+-------+-------------+------+-----+---------+------------------+
2 rows in set (0.00 sec)

mysql [localhost] {msandbox} (world_x) > show create table CountryInfo\G
*************************** 1. row ***************************
Table: CountryInfo
Create Table: CREATE TABLE `CountryInfo` (
`doc` json DEFAULT NULL,
`_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED
1 row in set (0.00 sec)

Look what we got! A JSON column with a dynamic index implemented as a virtual column. Now we can appreciate why the JSON data type was such an important thing.
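To see what that generated column computes, here is a rough Python analogue of json_unquote(json_extract(doc, '$._id')), supporting only simple '$.a.b' paths (the function name and the path handling are simplified assumptions, not MySQL's implementation):

```python
import json

def json_extract(doc, path):
    """Simplified analogue of MySQL's JSON_EXTRACT for '$.a.b' style paths."""
    value = json.loads(doc)
    for key in path[2:].split("."):  # drop the leading '$.' and walk the keys
        value = value[key]
    return value

doc = '{"_id": "USA", "demographics": {"Population": 278357000}}'
print(json_extract(doc, "$._id"))                      # USA
print(json_extract(doc, "$.demographics.Population"))  # 278357000
```

The STORED GENERATED column simply materializes this extraction for every row, which is what makes the _id indexable.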

Back to the Javascript shell, let's get something from the database. (You can get all the commands I am using, and much more, from the manual.)

mysql-js> db.collections.CountryInfo.find("_id='USA'")
{
    "GNP": 8510700,
    "IndepYear": 1776,
    "Name": "United States",
    "_id": "USA",
    "demographics": {
        "LifeExpectancy": 77.0999984741211,
        "Population": 278357000
    },
    "geography": {
        "Continent": "North America",
        "Region": "North America",
        "SurfaceArea": 9363520
    },
    "government": {
        "GovernmentForm": "Federal Republic",
        "HeadOfState": "George W. Bush"
    }
}
1 document in set (0.00 sec)

Apart from the feeling of being back in the good old times when MySQL was still playing with IPO dreams (look at the HeadOfState field in the above data), this record is a straightforward JSON document, where data that should belong to different normalized tables are bundled together in this unified view. So, we are really querying a Table that contains JSON data associated with an _id. We know because the general log lists what happens after our simple query:

SELECT doc FROM `world_x`.`CountryInfo` WHERE (`_id` = 'USA')

Let's try a more complex query. We want all countries in Oceania with a population of more than 150,000 people, and whose Head of State is Elisabeth II. The query is a bit intimidating, albeit eerily familiar:

mysql-js> db.collections.CountryInfo.find("government.HeadOfState='Elisabeth II' AND geography.Continent = 'Oceania' AND demographics.Population > 150000").fields(["Name", "demographics.Population","geography.Continent"])
{
    "Name": "Australia",
    "demographics.Population": 18886000,
    "geography.Continent": "Oceania"
},
{
    "Name": "New Zealand",
    "demographics.Population": 3862000,
    "geography.Continent": "Oceania"
},
{
    "Name": "Papua New Guinea",
    "demographics.Population": 4807000,
    "geography.Continent": "Oceania"
},
{
    "Name": "Solomon Islands",
    "demographics.Population": 444000,
    "geography.Continent": "Oceania"
}
4 documents in set (0.00 sec)

Here is the corresponding SQL query recorded in the general log:

SELECT JSON_OBJECT( \
'Name', JSON_EXTRACT(doc,'$.Name'),'demographics.Population', \
JSON_EXTRACT(doc,'$.demographics.Population'),'geography.Continent', \
JSON_EXTRACT(doc,'$.geography.Continent') \
) AS doc
FROM `world_x`.`CountryInfo` \
WHERE (((JSON_EXTRACT(doc,'$.government.HeadOfState') = 'Elisabeth II') \
AND (JSON_EXTRACT(doc,'$.geography.Continent') = 'Oceania')) \
AND (JSON_EXTRACT(doc,'$.demographics.Population') > 150000))

I am not sure which one I prefer. The SQL looks strange, with all those JSON functions, while the Javascript command seems more readable (I had never thought I would say what I have just said!)
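The mapping from dotted field names to that SQL is mechanical. A hypothetical Python sketch of how a fields() list could expand into the JSON_OBJECT projection seen in the log (the function name is invented; the shell's real translator is internal):

```python
def fields_to_projection(fields):
    """Build a JSON_OBJECT(...) AS doc projection from dotted field names."""
    pairs = ", ".join(
        "'{0}', JSON_EXTRACT(doc,'$.{0}')".format(f) for f in fields
    )
    return "JSON_OBJECT({}) AS doc".format(pairs)

print(fields_to_projection(["Name", "demographics.Population"]))
```

Dotted paths pass through unchanged because JSON path syntax uses the same dots, which is why the translation can be this direct.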

Enough with reading data. I want to manipulate some. I'll start by creating a new collection.

mysql-js> db.createCollection('somethingNew')

And the general log shows what should not be a surprise, as we have seen a similar structure for CountryInfo:

CREATE TABLE `world_x`.`somethingNew` (doc JSON, \
_id VARCHAR(32) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) STORED \

Now, to the data manipulation:

mysql-js> mynew=db.getCollection('somethingNew')

The variable mynew can access the new collection. It's a shortcut to avoid typing db.collections.somethingNew.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> mynew.find()
Empty set (0.00 sec)

As expected, there is nothing inside the new collection. Now we enter a very minimal record.

mysql-js> mynew.add({Name:'Joe'})
Query OK, 1 item affected (0.01 sec)

mysql-js> mynew.find()
{
    "Name": "Joe",
    "_id": "e09ef177c50fe6110100b8aeed734276"
}
1 document in set (0.00 sec)

The collection contains more than what we have inserted. There is an apparently auto-generated _id field. Looking at the general log, we see that the data includes the new field.

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Joe\",\"_id\":\"e09ef177c50fe6110100b8aeed734276\"}')

As you can see, an _id field was added automatically. We could override that behavior by providing our own value:

mysql-js> mynew.add({_id: "a dummy string", Name:"Frank", country: "UK"})

The data inserted now includes the _id field with our manual value. The general log says:

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Frank\",\"_id\":\"a dummy string\",\"country\":\"UK\"}')

The value of _id, however, must be unique, or the engine will generate an error:

mysql-js> mynew.add({_id: "a dummy string", Name:"Sam", country: "USA"})
MySQL Error (5116): Document contains a field value that is not unique but required to be

If all this gives you a sense of deja-vu, you're right. This feels and smells a lot like MongoDB, and I am sure it isn't a coincidence.
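Note that the _id must exist before the INSERT reaches the server: the general log above already shows it inside the document, so the shell mints it client-side. One plausible way to generate such a 32-character hex id (an assumption for illustration; the shell's actual scheme is not documented here):

```python
import uuid

def make_doc_id():
    """Mint a 32-char lowercase hex id, MongoDB-style client-side generation."""
    return uuid.uuid4().hex

doc_id = make_doc_id()
print(len(doc_id))  # 32
```

Whatever the scheme, the server only enforces uniqueness; it is the client's job to make collisions unlikely.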

Synchronizing operations

As our last attempt for the day, we will see what happens when we manipulate data in SQL and then retrieve it in Javascript or Python.

We leave the JS console open, and we do something in SQL

mysql [localhost] {msandbox} (world_x) > drop table somethingNew;
Query OK, 0 rows affected (0.01 sec)

What does it look like on the other side?

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> db.getCollections()
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}

Oops! mysqlsh didn't get the memo! It still considers somethingNew to be available.

mysql-js> db.collections.somethingNew.find()
MySQL Error (1146): Table 'world_x.somethingNew' doesn't exist

We need to refresh the connection. Unlike the SQL client, you need to specify the connection parameters.

mysql-js> \connect msandbox:msandbox@localhost:33060/world_x
Closing old connection...
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

We can see the same happening when we create a new table in SQL. The session in mysqlsh keeps showing the cached contents, and we need to refresh the session to see the changes. Looking at the general log, there are no changes when we issue commands asking for metadata, such as db.collections or db.tables. Instead, when we refresh the session, we see this:

SELECT table_name, COUNT(table_name) c FROM information_schema.columns \
WHERE ((column_name = 'doc' and data_type = 'json')
OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) \
AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2

The first query lists all tables that contain a JSON document and a generated _id (these are the collections). The second one lists all tables. Then the shell removes from the table list all the ones that were in the collections list.
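That detection rule can be restated in a few lines. A Python sketch of the classification (the column tuples are a simplified stand-in for the information_schema rows the shell actually reads):

```python
ID_EXPR = "json_unquote(json_extract(`doc`,'$._id'))"

def is_collection(columns):
    """columns: iterable of (column_name, data_type, generation_expression)."""
    has_doc = any(n == "doc" and t == "json" for n, t, _ in columns)
    has_id = any(n == "_id" and g == ID_EXPR for n, _, g in columns)
    # Only the presence of the two matching columns is required;
    # extra columns do not disqualify a table.
    return has_doc and has_id

strangedoc = [
    ("doc", "json", None),
    ("_id", "varchar", ID_EXPR),
    ("secret_stash", "varchar", None),
]
print(is_collection(strangedoc))  # True
```

The rule only requires the two matching columns to exist, which is exactly the loophole exploited below.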

Given the way it is done, we can cheat the system easily by creating something that looks like a collection, but has extra fields:

CREATE TABLE strangedoc (
  doc JSON,
  _id VARCHAR(32) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) STORED,
  secret_stash varchar(200),
  more_secret_info mediumtext
) CHARSET utf8mb4 ENGINE=InnoDB;

mysql [localhost] {msandbox} (world_x) > insert into strangedoc (doc,secret_stash,more_secret_info) values \
('{"_id": "abc", "name": "Susan"}', \
'and now for something completely different', \
'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.');
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (world_x) > select * from strangedoc\G
*************************** 1. row ***************************
doc: {"_id": "abc", "name": "Susan"}
_id: abc
secret_stash: and now for something completely different
more_secret_info: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
1 row in set (0.00 sec)

And the Javascript console will be unaware of the extra material:

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "strangedoc": <Collection:strangedoc>
}
mysql-js> db.strangedoc.find()
{
    "_id": "abc",
    "name": "Susan"
}
1 document in set (0.00 sec)

We can add contents to the collection in Javascript, and the database server won't protest (provided that the extra fields are nullable or have a default value). Is it a bug or a feature?

Parting thoughts

As I said at the beginning, this is a very simple exploration. More work is required to test the full potential of the new model. My impressions are mildly positive. On one hand, it's an exciting environment, which promises to become more useful as more programming languages are supported and the shell and server software become better coordinated. On the other hand, there are many bugs, and the software is still very green. It will require more iterations from the community and the development team before it can be trusted with important data.


Modelling databases in a web browser

Mon, 2016-05-02 13:50 by colin

When folk in the MariaDB world think about a GUI front-end, they tend to think of HeidiSQL, which has been shipping as part of MariaDB Server since 5.2.7. If one has a MariaDB Enterprise subscription, you get the Visual Query Editor which is SQLyog by Webyog. In fact the Knowledge Base lists a fairly lengthy list of Graphical and Enhanced Clients.

This past week I got a quick demo of Vertabelo. It is a SaaS-based database design tool that lets you model within a web browser. They do have an on-site option too, which costs a bit more. We don't have a direct offering for this, but I've always thought this is the kind of complementary product today's developers will ask for: data modelling with a friendly UI, and chat with people about a query (so if you, the DBA, find that a query looks odd, you can immediately start a chat with the developer to get it fixed, and all of this gets logged). One of the cool things is the live validation (I saw this and enjoyed the hints it provides). And naturally there is also "reverse engineering" between PostgreSQL, SQL Server, Oracle, DB2, and others. The feature list is wide and varied.

I'm hoping to see more support of MariaDB Server within their offering, after all they did talk about the history of MySQL as well as Problems With Naming a Product: SOLVED in the past.

Tags: Developer

About the Author

Colin Charles has been the Chief Evangelist for MariaDB since 2009, with work ranging from speaking engagements to consultancy and engineering around MariaDB. He lives in Kuala Lumpur, Malaysia, had worked at MySQL since 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and projects. He's well known on the conference track, having spoken at many conferences over the course of his career.


MariaDB in Google Summer of Code 2016

And for the fourth year in a row, the MariaDB Foundation participates in the Google Summer of Code! The MariaDB Organization in GSoC is an umbrella organization for all projects that belong to the MariaDB ecosystem, be it MariaDB Server, MariaDB Connectors, or MariaDB MaxScale. The complete list of our suggested project ideas is in MariaDB […]

The post MariaDB in Google Summer of Code 2016 appeared first on


Ubuntu 16.04 LAMP server tutorial with Apache 2.4, PHP 7 and MariaDB (instead of MySQL)

LAMP is short for Linux, Apache, MySQL, PHP. This tutorial shows how you can install an Apache 2.4 web server on an Ubuntu 16.04 (Xenial Xerus) server with PHP 7 support (mod_php) and MySQL / MariaDB. Additionally, I will install phpMyAdmin to make MySQL administration easier. A LAMP setup is the perfect basis for CMS systems like Joomla, WordPress or Drupal.

Fun with Bugs #42 - Bugs Fixed in MySQL 5.7.12

MySQL 5.7.12 was released more than 2 weeks ago. The new features introduced there in the form of "rapid plugins" are widely discussed, but I am more interested in the bugs reported by MySQL Community users that were fixed there. Unfortunately, I have not seen MySQL Community Release Notes by Morgan (like this) for quite some time, so I have to continue describing key bug fixes and naming the people who reported and verified bugs in my "Fun with Bugs" series.

As usual, let's start with InnoDB bugs fixed:
  • Bug #80070 - "allocated_size and file_size differ if create general tablespace outside datadir". It was reported by my former colleague from Percona Shahriyar Rzayev and verified by Bogdan Kecman. Nice to see more people from Oracle involved in processing community bug reports!
  • Bug #79185 - "Innodb freeze running REPLACE statements". This bug (that affected many users, also on versions 5.5.x and 5.6.x, and was a kind of a regression) was reported by Will Bryant and verified (probably) and fixed by Shaohua Wang. The fix is also included into versions 5.5.49 and 5.6.30.
  • Bug #73816 - "MySQL instance stalling 'doing SYNC index'". It was reported by Denis Jedig, and a lot of additional evidence was provided by my former colleague Aurimas Mikalauskas. This bug was fixed (and probably verified) by Shaohua Wang.
  • Bug #79200 - "InnoDB: "data directory" option of create table fails with pwrite() OS error 22", is a widely noted regression (I've seen customer issue with a potentially related MariaDB problem this week). This bug was reported by Frank Ullrich and verified by Bogdan Kecman. It is also fixed in MySQL 5.6.30.
  • Bug #79725 - "Check algorithm=innodb on crc32 checksum mismatch before crc32(big endian)". This bug was created to track the patch contributed by Daniel Black at GitHub. It was verified by Umesh.
Next, let's review replication bugs fixed in 5.7.12:
  • Bug #79504 - "STOP SLAVE IO THREAD prints wrong LOST CONNECTION message in error log file". It was reported by Venkatesh Duggirala.
  • Bug #78722 - "Relay log info currently_executing_gtid is not properly initialized or protected". This bug was reported by Pedro Gomes. It contains a nice simple test case and fix suggested.
  • Bug #78445 is private. So, I can only quote the release notes:
    "RESET SLAVE ALL could delete a channel even when master_pos_wait and wait_until_sql_thread_after_gtid were still waiting for binlog to be applied. This could cause a MySQL server exit when the functions tried to access the channel that was deleted. Now, a channel reference counter was added that is increased if the channel should not be deleted when the functions are running. RESET SLAVE ALL will wait for no reference, and then it will delete the channel."
    I am not sure this crash is a "security" bug of any kind, but what do I know...
  • Bug #78352 - "Slow startup of 5.7.x slave with relay_log_recovery = ON and many relay logs". I reported it based on regression comparing to 5.6.x reported by a customer of Percona, and verified by Umesh. Nice to see it fixed, as it was really annoying for almost anyone who upgraded production replication setup to 5.7.
  • Bug #78133 - "Slave_worker::write_info() incorrect DBUG_ENTER (contribution)". This bug was created to track the patch contributed by Stewart Smith at GitHub. It was verified by Umesh.
  • Bug #77740 - "silent failure to start if mysql.gtids_executed gets HA_ERR_LOCK_WAIT_TIMEOUT ". It was reported and verified by Shane Bester.
  • Bug #77237 - "Multi-threaded slave log spamming on failure". This bug was reported by Davi Arnaut and verified by Umesh. Fix is also included in MySQL 5.6.30.
  • Bug #78963 - "super_read_only aborts STOP SLAVE if relay_log_info_repository=TABLE, dbg crash". It was reported by my former colleague in Percona Laurynas Biveinis and verified by Umesh. Check also related Bug #79328 - "super_read_only broken as a server option".
  • Bug #77684 - "DROP TABLE IF EXISTS may brake replication if slave has replication filters". This bug was reported by my former colleague in Percona Fernando Laudares Camargos for MySQL 5.6.x and verified by Umesh. MySQL 5.6.30 also got this fixed.
We all remember that Performance Schema is perfect and the greatest thing since sliced bread, but sometimes bugs are noted even there. Check Bug #79934 - "i_perfschema.table_leak random result failure", reported and verified by Magnus Blåudd. Another example is Bug #79784 - "update setup_instruments do not affect the global mutex/rwlock/cond", reported by Zhang Yingqiang and verified by Umesh. The latter, IMHO, is related to (or a superset of) my good old report, Bug #68097 - "Manual does not explain that some P_S instruments must be enabled at startup", which has remained open as a feature request (after some changes in the manual) for more than 3 years already. I truly hope 5.7.12 fixed this for the general case - it's truly important to be able to enable instruments dynamically if we expect Performance Schema to be used as a main tool for troubleshooting.

I'd also want to highlight a couple of fixes related to optimizer:
  • Bug #77209 - "Update may use index merge without any reason (increasing chances for deadlock)". It was reported and verified by my former colleague from Oracle, Andrii Nikitin. MySQL 5.6.30 also includes the fix.
  • Bug #72858 - "EXPLAIN .. SELECT .. FOR UPDATE takes locks". This bug was reported by my former colleague in Percona (and, I hope, my colleague again soon) Justin Swanhart, who has a birthday today. Happy Birthday to you, Justin! The bug was verified by Umesh and is also fixed in MySQL 5.6.30. Justin had reported another bug fixed in 5.7.12, Bug #69375 - "LOAD DATA INFILE claims to be holding 'System Lock' in processlist".
Several more bugs reported by community were also fixed, but they were in the areas (or for platforms) I am not particularly interested in.

To summarize, MySQL 5.7.12 contains important bug fixes in replication and InnoDB, and it makes sense to consider upgrading even if you do not care about any "rapid plugins", X protocol, encryption of data at rest, MySQL Keyring or other "cool" new shiny features.

MariaDB 10.0.25 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.25. See the release notes and changelog for details on this release.
  • Download MariaDB 10.0.25
  • Release Notes
  • Changelog
  • What is MariaDB 10.0?
  • MariaDB APT and YUM Repository Configuration Generator
Thanks, and enjoy MariaDB!



Being Naive About Folder Paths

Recently I worked on a feature that involved displaying a list of Folders using an auto-complete field. A simple enough task, but one that had a bit of complexity behind it. For starters, each organization in our system has their own "folder hierarchy", or a tree structure of folders, that could be one to many layers deep. The second thing being that the UI we wanted -- auto-complete -- meant that we would need to "search" through the folders to find a match (based on the user's input) as well as display the path information (where the folder lives) so that the user can differentiate folders if they happen to have the same or similar name. Since it was auto-complete, the searching also needed to be fast or else the user experience would be annoying.

The folder tree structure was stored in a MySQL DB table in a very typical self-referencing schema for representing hierarchical data. Each row of data had a column that referenced the "parent" folder id, and if there wasn't one, then it was a "root" folder. Each row in the table could be represented by an object in Java called Folder. Getting folders by ID and getting them in a tree-like structure was pretty straightforward. However, the path-string wasn't part of the deal.
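A minimal sketch of that kind of self-referencing schema (the table and column names here are illustrative, not the actual production schema):

```sql
-- Hypothetical self-referencing folder table: a NULL parent_id marks a root folder.
CREATE TABLE folder (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  org_id    BIGINT UNSIGNED NOT NULL,
  parent_id BIGINT UNSIGNED NULL,
  name      VARCHAR(255) NOT NULL,
  KEY idx_org_parent (org_id, parent_id)
);

-- Fetching one level of children is a single indexed query...
SELECT id, name FROM folder WHERE parent_id = 42;
-- ...but building the full path string means walking up the tree one
-- parent at a time, since MySQL 5.7 has no recursive CTEs.
```

This is exactly why the path-string "wasn't part of the deal": the schema only knows about immediate parents, not full paths.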

My first thought on this was that the paths would need to be pre-computed ahead of time so that they wouldn't need to be recomputed every time a user typed a letter, an expensive and slow operation. One way to do it would be to have some application code insert into a table containing the folder paths whenever a folder was saved. Our resident MySQL expert, Ike Walker, suggested I look into the closure table pattern, discussed in Chapter 3, Naive Trees, of Bill Karwin's book, "SQL Antipatterns". Yes, this was exactly what I was thinking of! However, as I thought about it more, there were some things I didn't quite like about it. First, there was already a lot of existing data. Using closure tables would mean that I would need to script something to populate the table storing the folder paths, and run it at least once. The fact that I would need to script something like that also made me think about how this table would get updated. In the book "SQL Antipatterns", a discussion thread example is used. In the case of a discussion thread, individual comments have the same kind of tree structure that folders might have, but comments and replies typically don't move around, whereas folders do. The stored folder paths would need to be kept in sync with the underlying tree, including whole sections of a folder's branch whenever it moved. The thought of having to write this updating code made me think twice about my initial approach.
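For reference, the closure table pattern from Karwin's book stores one row for every ancestor/descendant pair rather than just the parent link; a rough sketch, with illustrative names:

```sql
-- Closure table: one row per (ancestor, descendant) pair,
-- including a depth-0 self-row for every folder.
CREATE TABLE folder_paths (
  ancestor   BIGINT UNSIGNED NOT NULL,
  descendant BIGINT UNSIGNED NOT NULL,
  depth      INT NOT NULL,
  PRIMARY KEY (ancestor, descendant)
);

-- All descendants of folder 42, at any depth, in one indexed query:
SELECT d.id, d.name
FROM folder d
JOIN folder_paths p ON p.descendant = d.id
WHERE p.ancestor = 42;
```

The read side is cheap, but moving a folder means rewriting the pair rows for its entire subtree, which is exactly the maintenance cost that gave me pause.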

Caching to the rescue

So what does a developer do when there's a problem with performance? Cache. So I thought: how can caching help me here? As I pointed out earlier, there was already a Java object -- Folder -- and a way to get the folders in a tree structure. If the Folder had a field to store the path, then I'm halfway there, right? But wait, doesn't that mean I'd be storing the path in the DB again? Well, not if I set it on a transient field and set it only when the tree structure is getting built up. Hold on -- doesn't that mean that the paths still need to be calculated every time we search the tree? Here's where the caching bit comes in. The folder tree would be traversed once and as it is recursing, the paths get computed and stored on a transient field of the Folder object. At the same time, the Folders are stored in a flat List that can be used for iterating and searching, etc. Next, the method which retrieves the folder list is cached and can be cached with the Organization's ID in the cache-key so that each list can be kept separately. To make sure we don't cache large lists of heavy Folder objects, I cached only lists of Folder IDs (not objects) and made the Folder objects also cached by their ID. Okay, then what about the updating part I talked about? The cached list of Folders would be invalidated whenever a Folder update occurred. It means that the paths would need to be re-computed again, but only once until the next update and that seemed reasonable enough.

In the end, I didn't use the closure table, although I'm keen on using it if the opportunity arises. One of the keys to coming up with the solution was knowing not only the data structure but also the nature of the data (how it's used and how often it tends to change). In all fairness, Bill Karwin also points out frequent updating as a weakness/pain-point of the closure table pattern. So there you have it, my friends: use those closure tables if you can, and if you can't, then hopefully the approach I used here will give you some ideas!
