Planet MySQL

MariaDB 10.3.4 and MariaDB Connector/C 3.0.3 and 2.3.5 now available

The MariaDB project is pleased to announce the availability of MariaDB 10.3.4, the second beta release in the MariaDB 10.3 series, as well as MariaDB Connector/C 3.0.3, the latest stable release in the MariaDB Connector/C 3.0 series, and MariaDB Connector/C 2.3.5, the latest stable release in the MariaDB Connector/C 2.3 series. See the release notes […]

The post MariaDB 10.3.4 and MariaDB Connector/C 3.0.3 and 2.3.5 now available appeared first on MariaDB.org.

A tale of Corrupt InnoDB table, MySQL crash & recovery

I’m going to tell you a story about a crashing MySQL, a corrupted InnoDB table and, finally, the recovery by restoring the table. We will see how our database administrator detected the issue and what he did to resolve it. A day in a MySQL Database Consultant’s life was taking its shape while a friend called […]

New Video - Ten Tips to Secure MySQL & MariaDB

This video, based on last week’s blog “Ten Tips to Achieve MySQL and MariaDB Security”, walks you through ten different items to keep in mind when deploying a MySQL or MariaDB database to production.

Database security is an essential part of any system. With more and more news reports of widespread data breaches coming in from around the world, there is no better time to check your environments and make sure you have implemented these basic steps to remain secure.

ClusterControl for Database Security

ClusterControl provides advanced deployment, monitoring and management features to ensure your databases and their data are secure. It ensures that your open source database deployments always adhere to basic security model setups for each technology.

ClusterControl provides the Package Summary Operational Report that shows you how many technology and security patches are available for upgrade, and it can even execute the upgrades for you!

Related resources  Ten Tips to Achieve MySQL & MariaDB Security  ClusterControl Database Security  How to Secure Your Open Source Databases with ClusterControl

In addition ClusterControl offers…

  • Secure Deployments
    Every technology has its own unique security features and ClusterControl ensures that what should be enabled is enabled during deployment. This eliminates the risk of human error which could otherwise result in leaving the database vulnerable because of a security setting oversight.
  • Communication Security
    ClusterControl provides the ability to install a purchased or self-signed SSL certificate to encrypt the communications between the server and the client. Replication traffic within a Galera Cluster can also be encrypted. Keys for these certificates are entered into and managed by ClusterControl.
  • Backup Security
    Backups are encrypted at rest using the AES-256 CBC algorithm. An auto-generated key is stored in the cluster's configuration file under /etc/cmon.d. The backup files are also transferred in encrypted format. Users can now secure their backups for offsite or cloud storage with the flip of a checkbox. This feature is available for select backup methods for MySQL, MongoDB & PostgreSQL.
  • User Management
    ClusterControl’s advanced user management features allow you to restrict read or write access to your data at the database or table level (a plain-SQL illustration of this kind of restriction follows this list). ClusterControl also provides advisors that check that all of your users have proper passwords, and even comes with checks to make sure no part of your database is open to the public.
  • Reports & Auditing
    ClusterControl provides reporting and audit tools to ensure you remain compliant, whether it is to an industry standard or to your own requirements. It also provides several Developer Studio Advisors that check your database environment to ensure that it is secure. You can even create your own security advisors to automate your own best practices. In addition, several Operational Reports found in ClusterControl can provide you with information you need to know to ensure your database environment is secure.
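
At the database level, the kind of restriction described under User Management ultimately maps to ordinary MySQL privileges. A minimal illustration, with made-up user, host and schema names (this is not something ClusterControl generates for you):

CREATE USER 'report_ro'@'10.0.0.%' IDENTIFIED BY 'choose-a-strong-password';
GRANT SELECT ON analytics.* TO 'report_ro'@'10.0.0.%';
-- Read-only access to one schema, from one network range, with no global privileges.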

Download ClusterControl today to take advantage of these database security features.

Tags:  MySQL MariaDB security encryption user management

Now with System Versioned Tables - Announcing MariaDB Server 10.3 Second Beta

Now with System Versioned Tables - Announcing MariaDB Server 10.3 Second Beta RalfGebhardt Thu, 01/18/2018 - 07:19

We are happy to announce the second beta release of MariaDB Server 10.3, the fastest growing open source relational database. Beta is an important time in our release cycle and we encourage you to download this release today! Please note that we do not recommend running beta releases in production.

MariaDB Server 10.2 added enhancements like Window Functions, Common Table Expressions, JSON functions and CHECK constraints. MariaDB Server 10.3 is the next evolution. For MariaDB Server 10.3 a lot of effort has been spent on database compatibility enhancements, especially for stored routines. This will allow easier migration of stored functions and better usability of stored functions in general.

With System Versioned Tables we are adding a powerful new enhancement to MariaDB Server: the ability to process temporal data. This opens up new use cases for MariaDB Server, like retrospective and trend data analysis, forensic discovery, or data auditing. System Versioned Tables can be used for compliance, audit, risk analysis, or position analysis.

Enabling the System Versioned Tables feature is as easy as altering an existing table:

ALTER TABLE products ADD SYSTEM VERSIONING;

or when creating a new table:

CREATE TABLE products ( pname VARCHAR(30), price decimal(8,2) ) WITH SYSTEM VERSIONING;

System versioned tables store timestamps recording when data was added and when it was updated or deleted. This allows you to query the data "as of" a given time, or to compare the data "as of" different dates and times.

SELECT * FROM products FOR SYSTEM_TIME AS OF TIMESTAMP @t1;
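
Beyond a single point in time, the 10.3 syntax also supports querying a range of row versions or the complete history; a minimal sketch, assuming @t1 and @t2 hold two timestamps:

SELECT * FROM products FOR SYSTEM_TIME BETWEEN @t1 AND @t2;
SELECT * FROM products FOR SYSTEM_TIME ALL;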

Now, with MariaDB Server 10.3.4 beta, several significant features and enhancements are available for our users and customers, including:

  • Temporal Data Processing
    • System Versioned Tables store information relating to past and present time
  • Database Compatibility Enhancements
    • PL/SQL Compatibility for MariaDB Stored Functions: The server now understands a subset of Oracle's PL/SQL language instead of the traditional MariaDB syntax for stored routines
    • New option for CURSOR in stored routines: A CURSOR can now have parameters used by the associated query
    • New data types for stored routines: ROW data type, TYPE OF and ROW TYPE OF anchored data types
    • Generation of unique primary keys by SEQUENCES: As an alternative to AUTO_INCREMENT, it is now possible to define named sequence objects to create a sequence of numeric values (see the example after this list)
    • Operations over result sets with INTERSECT and EXCEPT: In addition to the already existing UNION, intersection and subtraction of result sets are now possible
    • Define Columns to be invisible: Columns can now be defined as invisible. There are three levels of invisibility: user defined, system level and completely invisible
    • Window Function Enhancement: percentile and median window functions have been added
  • User Flexibility
    • User Defined Aggregate Functions: In addition to creating SQL functions it is now also possible to create aggregate functions
    • Lifted limitations for updates and deletes: A DELETE statement can now delete from a table that is also used in the WHERE clause, and UPDATE can use the same table as both source and target
  • Performance/Storage Enhancements
  • Storage Engine Enhancements
    • Spider Storage Engine: The partitioning storage engine has been updated to the newest release of the Spider Storage engine to support new Spider features including direct join support, direct update and delete, direct aggregates
    • Proxy Layer Support for MariaDB Server: Client/server authentication via a proxy like MariaDB MaxScale, using the server's proxy protocol support
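
As a quick illustration of the SEQUENCE feature listed above (the sequence name here is just an example):

CREATE SEQUENCE order_seq START WITH 100 INCREMENT BY 1;
SELECT NEXT VALUE FOR order_seq;  -- 100
SELECT NEXT VALUE FOR order_seq;  -- 101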

Try out MariaDB Server Beta software and share your feedback!

Download MariaDB Server 10.3.4 Beta

Release Notes | Changelog | What is MariaDB Server 10.3?

 


Replication Will Not Start On RDS – MariaDB 10.2

A brief account of a recent encounter with a replication issue on RDS MariaDB 10.2.

Problem Statement:

After the upgrade of replicas to MariaDB 10.2 on RDS, once replication is stopped manually through “call mysql.rds_stop_replication;” or fails due to some error, it cannot be started again using “call mysql.rds_start_replication;”, and there is no straightforward way or documented process to start replication back. Also, most of the replication-related RDS commands, like skipping errors etc., will not work. This is due to the implementation of replication handling in RDS.

Summary:

On Jan 5, 2018, RDS announced support for MariaDB 10.2. Release Notes From Amazon

To test the release we tried upgrading one of our read replicas used for development and testing on RDS from 10.1.23 to 10.2.11.

Reproducible Test Case:

### I have a running slave upgraded to MariaDB 10.2

mysql> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.2.11-MariaDB |
+-----------------+

mysql> pager grep -i 'Running:\|Seconds'
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0

### Stopping replication manually

mysql> call mysql.rds_stop_replication;
+----------------------------------------------------------------------------+
| Message                                                                    |
+----------------------------------------------------------------------------+
| Slave has encountered an error. Run SHOW SLAVE STATUS\G; to see the error. |
+----------------------------------------------------------------------------+
1 row in set (1.39 sec)

Query OK, 0 rows affected (1.54 sec)

### RDS reported an error here because, even after RDS performs “STOP SLAVE”, it still sees threads owned by “system user”, which we detail in a later section.

### Let’s check the slave is actually stopped.

mysql> pager grep -i 'Running:\|Seconds'
mysql> show slave status\G
Slave_IO_Running: No
Slave_SQL_Running: No
Seconds_Behind_Master: NULL

### The slave is stopped. Let’s try to start it back up.

mysql> call mysql.rds_start_replication;
Query OK, 0 rows affected (0.48 sec)

### The call returns OK. Let’s check the status.

mysql> pager grep -i 'Running:\|Seconds'
mysql> show slave status\G
Slave_IO_Running: No
Slave_SQL_Running: No
Seconds_Behind_Master: NULL

### The call returned OK, but nothing happened.

If “call mysql.rds_start_replication” fails or does nothing, there is no other documented way to start replication back up on RDS. Let’s take a deeper look at the RDS implementation.

RDS Implementation:

As most of us know, we are not granted SUPER privileges on RDS, so to manage replication we have to use the procedures provided by RDS. This is how the procedure is implemented:

Procedure: mysql.rds_start_replication

CREATE DEFINER=`rdsadmin`@`localhost` PROCEDURE `rds_start_replication`()
BEGIN
  DECLARE v_mysql_version VARCHAR(20);
  DECLARE v_threads_running INT;
  DECLARE v_called_by_user VARCHAR(50);
  DECLARE v_sleep int;
  DECLARE sql_logging BOOLEAN;
  select @@sql_log_bin into sql_logging;
  Select user() into v_called_by_user;
  Select version() into v_mysql_version;
  SELECT COUNT(1) into v_threads_running FROM information_schema.processlist WHERE user = 'system user';
  if v_threads_running = 0 then
    set @@sql_log_bin=off;
    update mysql.rds_replication_status set called_by_user=v_called_by_user, action='start slave', mysql_version=v_mysql_version where action is not null;
    commit;
    select sleep(1) into v_sleep;
    START SLAVE;
    SELECT COUNT(1) into v_threads_running FROM information_schema.processlist WHERE user = 'system user';
    if v_threads_running = 2 then
      insert into mysql.rds_history (called_by_user,action,mysql_version) values (v_called_by_user,'start slave', v_mysql_version);
      commit;
      Select 'Slave running normally.' as Message;
    else
      Select 'Slave has encountered an error. Run SHOW SLAVE STATUS\\G; to see the error.' as Message;
    end if;
  else
    if v_threads_running = 2 then
      Select 'Slave may already running. Call rds_stop_replication to stop replication;' as Message;
    end if;
  end if;
  set @@sql_log_bin=sql_logging;
END

Procedure In Words:

– Get the count of threads being executed by the user ‘system user’.

– Start replication only if that count is 0.

This is how replication traditionally worked: while replication is running there are two threads owned by ‘system user’, the IO thread and the SQL thread, and none when it is stopped.

So what has changed in MariaDB 10.2?

mysql> show processlist;
+----+-------------+------+------+---------+------+--------------------------+------+----------+
| Id | User        | Host | db   | Command | Time | State                    | Info | Progress |
+----+-------------+------+------+---------+------+--------------------------+------+----------+
|  1 | system user |      | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL |    0.000 |
|  2 | system user |      | NULL | Daemon  | NULL | InnoDB purge worker      | NULL |    0.000 |
|  3 | system user |      | NULL | Daemon  | NULL | InnoDB purge worker      | NULL |    0.000 |
|  4 | system user |      | NULL | Daemon  | NULL | InnoDB purge worker      | NULL |    0.000 |
|  5 | system user |      | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL |    0.000 |
+----+-------------+------+------+---------+------+--------------------------+------+----------+

MariaDB 10.2 turned background processes (the InnoDB purge threads and the InnoDB shutdown handler) into threads executed by ‘system user’.

By default, innodb_purge_threads = 4, plus one shutdown handler thread, so there are always five threads running as ‘system user’.
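
To see why the procedure's check fails, you can run the same count it uses internally; on a 10.2 instance with replication stopped you would still expect it to return 5 rather than 0:

mysql> SELECT COUNT(1) FROM information_schema.processlist WHERE user = 'system user';
+----------+
| COUNT(1) |
+----------+
|        5 |
+----------+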

Issue:

As most of the replication-handling procedures on RDS (rds_stop_replication, rds_start_replication, rds_skip_repl_error, etc.) operate based on the count of threads run by “system user”, this update broke the whole implementation of RDS replication handling.

I honestly believe RDS should implement more robust validations and fix this bug soon.

Temporary Hack:

Though it’s not documented, after reading the procedure I tried a hack of sorts and succeeded in starting the stopped replication again.

  • Disable read-only on the replica.
  • update mysql.rds_replication_status set action=’start slave’;
  • Wait for the next minute to tick over; replication will be started.
mysql> update mysql.rds_replication_status set action='start slave';
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from mysql.rds_replication_status;
+----+---------------------+-------------------------+-------------+-----------------+-------------+-------------+
| id | action_timestamp    | called_by_user          | action      | mysql_version   | master_host | master_port |
+----+---------------------+-------------------------+-------------+-----------------+-------------+-------------+
|  1 | 2018-01-18 08:29:55 | mydbops@122.166.223.194 | start slave | 10.2.11-MariaDB | NULL        | NULL        |
+----+---------------------+-------------------------+-------------+-----------------+-------------+-------------+

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2018-01-18 08:30:04 |
+---------------------+

mysql> pager grep "Running:\|Seconds"
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0

### Voila

Donkey System

Donkey system is a fully automatic MySQL database change system.
It is a great help both for business releases and for the company’s automated operations and maintenance.

Donkey.pptx
Donkey_intro.pdf

Meltdown vs storage

tl;dr - sysbench fileio throughput for ext4 drops by more than 20% from Linux 4.8 to 4.13

I shared results from sysbench with a cached database to show a small impact from the Meltdown patch in Ubuntu 16.04. Then I repeated the test for an IO-bound configuration using a 200mb buffer pool for InnoDB and a database that is ~1.5gb.

The results for read-only tests looked similar to what I saw previously so I won't share them. The results for write-heavy tests were odd as QPS for the kernel without the patch (4.8.0-36) were much better than for the kernel with the patch (4.13.0-26).

The next step was to use sysbench fileio to determine whether storage performance was OK and it was similar for 4.8 and 4.13 with read-only and write-only tests. But throughput with 4.8 was better than 4.13 for a mixed test that does reads and writes.

Configuration


I used a NUC7i5bnh server with a Samsung 960 EVO SSD that uses NVMe. The OS is Ubuntu 16.04 with the HWE kernels -- either 4.13.0-26 that has the Meltdown fix or 4.8.0-36 that does not. For the 4.13 kernel I repeat the test with PTI enabled and disabled. The test uses sysbench with one 2gb file, O_DIRECT and 4 client threads. The server has 2 cores and 4 HW threads. The filesystem is ext4.

I used these command lines for sysbench:
sysbench fileio --file-num=1 --file-test-mode=rndrw --file-extra-flags=direct \
    --max-requests=0 --num-threads=4 --max-time=60 prepare
sysbench fileio --file-num=1 --file-test-mode=rndrw --file-extra-flags=direct \
    --max-requests=0 --num-threads=4 --max-time=60 run

And I see this:
cat /sys/block/nvme0n1/queue/write_cache
write back

Results

The next step was to understand the impact of the filesystem mount options. I used ext4 for these tests and don't have much experience with it. The table has the throughput in MB/s from sysbench fileio that does reads and writes. I noticed a few things:
  1. Throughput is much worse with the nobarrier mount option. I don't know whether this is expected.
  2. There is a small difference in performance from enabling the Meltdown fix - about 3%
  3. There is a big difference in performance between the 4.8 and 4.13 kernels, whether or not PTI is enabled for the 4.13 kernel. I get about 25% more throughput with the 4.8 kernel.

4.13    4.13    4.8    mount options
pti=on  pti=off no-pti
100     104     137     nobarrier,data=ordered,discard,noauto,dioread_nolock
 93     119     128     nobarrier,data=ordered,discard,noauto
226     235     275     data=ordered,discard,noauto
233     239     299     data=ordered,discard,noauto,dioread_nolock

Is it the kernel?

I am curious about what happened between 4.8 and 4.13 to explain the 25% loss of IO throughput.

I have another set of Intel NUC servers that use Ubuntu 16.04 without the HWE kernels -- 4.4.0-109 with the Meltdown fix and 4.4.0-38 without the Meltdown fix. These servers still use XFS. I get ~2% more throughput with the 4.4.0-38 kernel than the 4.4.0-109 kernel (whether or not PTI is enabled).

The loss in sysbench fileio throughput does not reproduce for XFS. The filesystem mount options are "noatime,nodiratime,discard,noauto" and tests were run with /sys/block/nvme0n1/queue/write_cache set to write back and write through. The table below has MB/s of IO throughput.

4.13    4.13    4.8
pti=on  pti=off no-pti
225     229     232     write_cache="write back"
125     168     138     write_cache="write through"

More debugging

This is vmstat output from the sysbench test and the values for wa are over 40 for the 4.13 kernel but less than 10 for the 4.8 kernel. The ratio of cs per IO operation is similar for 4.13 and 4.8.

# vmstat from 4.13 with pti=off
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  4      0 15065620 299600 830564    0    0 64768 43940 7071 21629  1  6 42 51  0
 0  4      0 15065000 300168 830512    0    0 67728 45972 7312 22816  1  3 44 52  0
 2  2      0 15064380 300752 830564    0    0 69856 47516 7584 23657  1  5 43 51  0
 0  2      0 15063884 301288 830524    0    0 64688 43924 7003 21745  0  4 43 52  0

# vmstat from 4.8 with pti=on
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  4      0 14998364 384536 818532    0    0 142080 96484 15538 38791  1  6  9 84  0
 0  4      0 14997868 385132 818248    0    0 144096 97788 15828 39576  1  7 10 83  0
 1  4      0 14997248 385704 818488    0    0 151360 102796 16533 41417  2  9  9 81  0
 0  4      0 14997124 385704 818660    0    0 140240 95140 15301 38219  1  7 11 82  0

FOSDEM MySQL Community Dinner – Friday 2 Feb 2018 – Tickets Now Available!

FOSDEM is back in town, folks, and as usual, there will be a MySQL and Friends Devroom. At this point, we can’t really remember what came first, was it the annual dinner or the devroom? Our memories of the past few years are particularly hazy on that…  Can’t imagine why.

Following a great tradition of dining together in style with members from all over the community, we have rented a by now familiar private space at ICAB; more detailed directions are below.

A couple of things changed compared to the last couple of years. The community dinner will take place on Friday night 2 February, following the Pre FOSDEM MySQL Day, both in the same location.

Book your tickets now!!!

The listed ticket price includes a selection of Belgian Speciality Beers and food, which will be typical Belgian style (no ‘Italian’ Pizza!)

If you have any dietary requirements (vegetarian, vegan, gluten-free,…) please let us know ahead of time when buying the ticket as we need to inform the caterer who will make the necessary accommodations for you.

We’re looking forward to meeting you all again at Fosdem and the Community Dinner. See you then!

Party-Squad – Dimitri Vanoverbeke, Tom De Cooman, Liz van Dijk, (Kenny Gryp)

Sponsors

Once again, we want to thank our generous sponsors, whose help makes this affordable at such a great price. Community Sponsors:

Other Sponsors:

ICAB Brussels – Business and Technology Incubator

Wondering how to get there?

The venue itself is located very close to the VUB. A map is displayed below to help you plan your route.

The total distance from the heart of Brussels, Grand Place, is about 20 minutes by public transport, along the route of metro line 5 (closest stop: Petillon). Be sure to use Google/Apple/your preferred maps app to determine the best way to get there, though, as both tram and bus lines are also an option.

Shinguz: Advanced MySQL and MariaDB training in Cologne 2018

End of February, from February 26 to March 2 (5 days), FromDual offers an additional training for DBAs and DevOps: our most visited Advanced MySQL and MariaDB training.

This training is held in the training facilities of the FromDual training partner GFU Cyrus GmbH in Cologne-Deutz (Germany).

There are already enough registrations, so the training is certain to take place. There are still free places for at least 3 additional participants.

The training is in German.

You can find the details of this 5-day MySQL/MariaDB training here.

If you have any question please do not hesitate to contact us.

Taxonomy upgrade extras:  training advanced cologne

Shinguz: Oracle releases MySQL security vulnerability fixes 2018-01

As in every quarter of the year, Oracle yesterday released its recommendations for MySQL security updates. In Oracle terminology, this is called the Critical Patch Update (CPU) Advisory.

This CPU is published for all Oracle products. But FromDual is only interested in MySQL related topics. So let us concentrate on those.

This time 25 fixes with a maximum score of 8.1 (out of 10.0) were published.

6 of these 25 vulnerabilities are exploitable remotely over the network without authentication (no user credentials required)!

The following MySQL products are affected:

  • MySQL Enterprise Monitor (3.3.6.3293 and before, 3.4.4.4226 and before, 4.0.0.5135 and before)
  • MySQL Connector/Net (6.9.9 and before, 6.10.4 and before)
  • MySQL Connector/ODBC (5.3.9 and before)
  • MySQL Server (5.5.58 and before, 5.6.38 and before, 5.7.19 and before)

It is recommended to upgrade your MySQL products to close the security vulnerabilities.

FromDual upgrade decision aid

Because such security updates are published quarterly and some of our customers have dozens to hundreds of MySQL installations this would end up in a never ending story where you are continuously upgrading MySQL database servers and other products.

This led to the idea of creating an upgrade decision aid to decide whether you have to upgrade to this CPU or not.

The following questions can be asked:

  • How exposed is your database?
    Databases can be located in various network segments. It is not recommended to expose databases directly to the internet. Databases are either installed in demilitarized zones (DMZ) with no direct access from the internet, or in the company's private network (only company employees should be able to access the database), or even in specialized secure networks (only a limited number of specific employees can access this network).
  • How critical are your data?
    Some data are more interesting or critical, some data are less so. Interesting data include: user data (user names and passwords), customer data (profiles, preferences, etc.), financial data (credit cards) and health care data (medical data). Systems containing such data are more critical than others. You can also ask: how severe is it if such data leaks?
  • How broad is the user base able to access the database?
    How many employees do you have in your company? How many contractors do you have in your company? How many employees have physical access to the database server? How good is the mood of those people?
    How good are the user credentials to protect your database? Do you have shared passwords or no passwords at all? Do you have an account management (expiring old accounts, rotate passwords from time to time)?
    How much do you trust your users? Do you trust all your employees? Do you trust only admins? Or do you not even trust your admins?
  • How severe are the security vulnerabilities?
    You can define a threshold of severity of the vulnerabilities above you want to take actions. According to your criticality you can take actions for example as follows: Greater or equal than 7.5 if you have less critical data. Greater or equal than 6.0 if you have critical data.
  • Can the vulnerability be exploited remotely (over the network), and does it require user authentication to exploit? What products (MySQL Enterprise Monitor, MySQL Server, MySQL Connectors) and what modules (Apache/Tomcat, .Net Connector, Partitioning, Stored Procedures, InnoDB, DDL, GIS, Optimizer, ODBC, Replication, DML, Performance Schema) are affected?

Depending on your readiness to take a risk, you now have answers to decide whether you have to take action or not.

Some examples
  • Situation: Your database is exposed directly to the internet or you forgot to install some firewall rules to protect your MySQL port.
    Analysis: You are probably affected by CVE-2018-2696 and CVE-2017-3737 (score 5.9 and 7.5). So you passed the threshold for non-critical data (7.5) and nearly passed the threshold for critical data (6.0). These vulnerabilities allow attacks over the network without user authentication.
    Action: Immediate upgrade is recommended. Mid-term action: Install firewall rules to protect your MySQL port from remote access and/or do not expose databases directly to the internet.
  • Situation: Your database is located in the intranet zone. You have slack user/password policies and you have many employees and also many contractors from foreign countries working on various projects. And you have very sensitive/interesting financial data stored in your database.
    Analysis: Many people, not all of them are really trusted, have network access to the database. It is quite possible that passwords have been shared or people have passwords for projects they are not working for any more. You are affected by nearly all of the vulnerabilities (network).
    Action: You should plan an upgrade soon. Mid-term action: Try to restrict access to the databases and implement some password policy rules (no shared passwords, password expiration, account locking etc.).
  • Situation: Your highly critical databases are located in a specially secured network and only applications, Linux admins and DBAs have access to this network. And you completely trust those people.
    Analysis: Your threshold is 6.0 and (unauthenticated) attack over the network is not possible. There are some vulnerabilities that affect you, but the database is only accessed by an application, so those vulnerabilities cannot be exploited easily.
    Action: You can possibly ignore this CPU for the MySQL database this time. But you have a vulnerability in the .Net Connector (Connector/Net). If an attacker exploits the vulnerability in the Connector, he can possibly get access to the data. So you have to upgrade the Connector of the application accessing the database.

If you follow the ideas of this aid you will probably have one or two upgrades a year. And this you should do anyway just to stay up to date...

See also Common Vulnerability Scoring System Version 3.0 Calculator.

Taxonomy upgrade extras:  cpu security mysql upgrade

Updating/Deleting Rows From Clickhouse (Part 2)

In this post, we’ll look at updating and deleting rows with ClickHouse. It’s the second of two parts.

In the first part of this post, we described the high-level overview of implementing incremental refresh on a ClickHouse table as an alternative support for UPDATE/DELETE. In this part, we will show you the actual steps and sample code.

Prepare Changelog Table

First, we create the changelog table below. This can be stored on any other MySQL instance separate from the source of our analytics table. When we run the change capture script, it will record the data on this table that we can consume later with the incremental refresh script:

CREATE TABLE `clickhouse_changelog` (
  `db` varchar(255) NOT NULL DEFAULT '',
  `tbl` varchar(255) NOT NULL DEFAULT '',
  `created_at` date NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `log_file` varchar(255) NOT NULL,
  `log_pos` int(10) unsigned NOT NULL,
  PRIMARY KEY (`db`,`tbl`,`created_at`),
  KEY `log_file` (`log_file`,`log_pos`)
) ENGINE=InnoDB;

Create ClickHouse Table

Next, let’s create the target ClickHouse table. Remember that the corresponding MySQL table is the one below:

CREATE TABLE `hits` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `type` varchar(100) DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `location_id` int(11) NOT NULL,
  `created_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `created_at` (`created_at`)
) ENGINE=InnoDB;

Converting this table to ClickHouse looks like below, with the addition of a “created_day” column that serves as the partitioning key:

CREATE TABLE hits (
  id Int32,
  created_day Date,
  type String,
  user_id Int32,
  location_id Int32,
  created_at Int32
) ENGINE = MergeTree
PARTITION BY toMonday(created_day)
ORDER BY (created_at, id)
SETTINGS index_granularity = 8192;

Run Changelog Capture

Once the tables are ready, run the change capture script. An example script can be found in this gist; it is written in Python and uses the python-mysql-replication library. This library acts as a replication client, continuously downloading the binary logs from the source and sifting through them to find any UPDATE/DELETE executed against our source table.

There are a few configuration options that need to be customized in the script.

  • LOG_DB_HOST: The MySQL host where we created the clickhouse_changelog table.
  • LOG_DB_NAME: The database name where the clickhouse_changelog table is created.
  • SRC_DB_HOST: The MySQL host where we will be downloading binary logs from. This can be either a primary or a secondary/replica, as long as it's the same server where our raw table is also located.
  • MYSQL_USER: MySQL username.
  • MYSQL_PASS: MySQL password.
  • TABLE: The table we want to watch for changes.

When the script is successfully configured and running, the clickhouse_changelog table should start populating with data like below:

mysql> select * from mydb.clickhouse_changelog;
+------+------+------------+---------------------+------------------+----------+
| db   | tbl  | created_at | updated_at          | log_file         | log_pos  |
+------+------+------------+---------------------+------------------+----------+
| mydb | hits | 2014-06-02 | 2017-12-23 17:19:33 | mysql-bin.016353 | 18876747 |
| mydb | hits | 2014-06-09 | 2017-12-23 22:10:29 | mysql-bin.016414 |  1595314 |
| mydb | hits | 2014-06-16 | 2017-12-23 02:59:37 | mysql-bin.016166 | 33999981 |
| mydb | hits | 2014-06-23 | 2017-12-23 18:09:33 | mysql-bin.016363 | 84498477 |
| mydb | hits | 2014-06-30 | 2017-12-23 06:08:59 | mysql-bin.016204 | 23792406 |
| mydb | hits | 2014-08-04 | 2017-12-23 18:09:33 | mysql-bin.016363 | 84499495 |
| mydb | hits | 2014-08-18 | 2017-12-23 18:09:33 | mysql-bin.016363 | 84500523 |
| mydb | hits | 2014-09-01 | 2017-12-23 06:09:19 | mysql-bin.016204 | 27120145 |
+------+------+------------+---------------------+------------------+----------+

Full Table Import

So we have our changelog capture in place; the next step is to initially populate the ClickHouse table from MySQL. Normally, we can easily do this with a mysqldump into a tab-separated format, but remember we have to transform the created_at column from MySQL into ClickHouse’s Date format to be used as the partitioning key.

A simple way to do this is with a short set of shell commands like below:

SQL=$(cat <<EOF
SELECT id, DATE_FORMAT(created_at, "%Y-%m-%d"), type, user_id, location_id, UNIX_TIMESTAMP(created_at)
FROM hits
EOF
)
mysql -h source_db_host mydb -BNe "$SQL" > hist.txt
cat hist.txt | clickhouse-client -d mydb --query="INSERT INTO hits FORMAT TabSeparated"
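
After the import, a quick row-count comparison between the two systems is a cheap sanity check (not part of the original walkthrough, just a suggestion):

-- In MySQL:
SELECT COUNT(*) FROM hits;
-- In ClickHouse:
SELECT count() FROM hits;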

One thing to note about this process is that the MySQL client buffers the results for the whole query, so it could eat up all the memory on the server you run this from if the table is really large. To avoid this, chunk the table into several million rows at a time. Since we already have the changelog capture running and in place from the previous step, you do not need to worry about any changes between chunks; we will consolidate those changes during the incremental refreshes.

Incremental Refresh

After initially populating the ClickHouse table, we then set up our continuous incremental refresh using a separate script. A template script we use for the table on our example can be found in this gist.

What this script does is twofold:

  • Determines the list of weeks recently modified based on clickhouse_changelog, dumps the rows for those weeks and re-imports them to ClickHouse (a conceptual sketch follows this list).
  • If the current week is not on the list of those with modifications, it also checks for new rows based on the auto-incrementing primary key and appends them to the ClickHouse table.
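
Conceptually, the week-level refresh in that script boils down to dropping the affected partition and re-inserting the freshly dumped rows. A rough sketch, assuming the partition is addressed by its toMonday(created_day) value (see the gist for the actual implementation):

ALTER TABLE hits DROP PARTITION '2017-12-18';
-- then re-import that week's rows, e.g. with clickhouse-client and INSERT INTO hits FORMAT TabSeparated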

An example output of this script would be:

ubuntu@mysql~/clickhouse$ bash clickhouse-incr-refresh.sh hits
2017-12-24_00_20_19 incr-refresh Starting changelog processing for hits
2017-12-24_00_20_19 incr-refresh Current week is: 2017-12-18
2017-12-24_00_20_19 incr-refresh Processing week: 2017-12-18
2017-12-24_00_20_20 incr-refresh Changelog import for hits complete
ubuntu@mysql~/clickhouse$ bash clickhouse-incr-refresh.sh hits
2017-12-24_00_20_33 incr-refresh Starting changelog processing for hits
2017-12-24_00_20_33 incr-refresh Current week is: 2017-12-18
2017-12-24_00_20_33 incr-refresh Weeks is empty, nothing to do
2017-12-24_00_20_33 incr-refresh Changelog import for hits complete
2017-12-24_00_20_33 incr-refresh Inserting new records for hits > id: 5213438
2017-12-24_00_20_33 incr-refresh No new rows found
2017-12-24_00_20_33 incr-refresh Incremental import for hits complete
ubuntu@mysql~/clickhouse$

Note that, on step 4, if you imported a really large table and the changelog had accumulated a large number of changes to refresh, the initial incremental execution might take some time. After that, though, it should be faster. This script can be run every minute, or at longer or shorter intervals, depending on how often you want the ClickHouse table to be refreshed.

To wrap up, here is a query from MySQL on the same table, versus ClickHouse.

mysql> SELECT COUNT(DISTINCT user_id) FROM hits WHERE created_at
    -> BETWEEN '2016-01-01 00:00:00' AND '2017-01-01 00:00:00';
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                 3023028 |
+-------------------------+
1 row in set (25.89 sec)

:) SELECT COUNT(DISTINCT user_id) FROM hits WHERE created_at BETWEEN 1451606400 AND 1483228800;

SELECT COUNTDistinct(user_id)
FROM hits
WHERE (created_at >= 1451606400) AND (created_at <= 1483228800)

┌─uniqExact(user_id)─┐
│            3023028 │
└────────────────────┘

1 rows in set. Elapsed: 0.491 sec. Processed 35.82 million rows, 286.59 MB (73.01 million rows/s., 584.06 MB/s.)

Enjoy!

Fun with Bugs #60 - On Some Memory Leaks, Replication and Other Bugs Fixed in MySQL 5.7.21

Oracle formally released MySQL 5.7.21 yesterday. I do not bother any more to study MySQL release notes carefully and completely, but during a quick review today I noted several interesting items I'd like you to pay attention to.

I am historically interested in InnoDB implementation details, so I could not miss Bug #87619 - "InnoDB partition table will lock into the near record as a condition in the use". This was a regression bug in 5.7+, probably caused by the new implementation of partitioning in InnoDB.

Another interesting bug is Bug #86927 - "Renaming a partitioned table does not update mysql.innodb_table_stats.", by Jean-François Gagné. It was yet another bug in InnoDB's persistent statistics (which I truly hate). What makes it especially interesting to me, though, is that it's the first public bug report I've noted that mentions a MySQL 9.0.0 release as a target for the fix:
"Fixed as of the upcoming 5.7.21, 8.0.4, 9.0.0 release"
So, it's clear that back in October 2017 Oracle already had a separate branch for the upcoming MySQL 9.0.x! It also probably means that MySQL 8.0.x GA is coming really soon.

There are bug reports that are worth reading for technical reasons, and others only if you want to have some fun. Bug #86607 - "InnoDB crashed when master thread evict dict_table_t object" is a good example that covers both cases. Good to know the crash is fixed, but, please, make sure to read all the comments there.

In this release I've noted fixes to several public bugs reported by Shane Bester. The first one of them is Bug #86573 - "foreign key cascades use excessive memory". Check how he used memory instrumentation in Performance Schema to demonstrate the problem! In Bug #86482 - "innodb leaks memory, performance_schema file_instances #sql-ib3129987-252773.ibd", he used a similar approach to show a potential memory leak in the Performance Schema itself! Yet another bug that mentions 9.0.0 as a target version for the fix, among others...

Bug #78048 - "INNODB Full text Case sensitive not working", is here both because I recently started to notice problems related to InnoDB FULLTEXT indexing, again (first time was soon after it was introduced), and because it has an MTR  test case contributed by Sveta Smirnova.


XA transactions support has always been problematic in MySQL (the still "Verified" Bug #87526 by Sveta Smirnova is one of the recent examples of how incomplete or useless it can be; see also MDEV-14593). Check the following bugs fixed in MySQL 5.7.21 if you use XA transactions:
  • Bug #87393 - "xa rollback with wrong xid will be recorded into the binlog". It was reported by HongXiang Jiang, who had also contributed a patch.
  • Bug #83295 - "replication error occurs, use xa transaction(one phase)". Yet another XA transactions problem reported by Hiroyuki Itoh and then confirmed by many affected users. Nice to see it fixed.
There are many fixes in MySQL 5.7.21 related to memory leaks. Two bug reports of this kind were from Przemyslaw Malkowski:
  • Bug #85371 - "Memory leak in multi-source replication when binlog_rows_query_log_events=1". Again, memory instrumentation of Performance Schema was used to demonstrate the problem. Vlad Lesin, also from Percona, contributed the patch for this bug.
  • Bug #85251 - "Memory leak in master-master GTID replication with sync_relay_log_info". Here Vlad Lesin, who had contributed the patch, also used Massif for the detailed analysis.
To summarize, I am starting to miss memory instrumentation in Performance Schema in MariaDB 10.x... This is a really useful feature.

I usually care about optimizer bugs, and these two attracted my attention:
  • Bug #87207 - "select distinct with secondary key for 'Using index for group-by' bad results". This nice optimizer regression bug was found by Shane Bester. As a workaround, while you do not use 5.7.21, you can try to set optimizer_switch='use_index_extensions=off'. I'd keep it that way by default...
  • Bug #72854 - "Extremely slow performance with outer joins and join buffer". I am happy to see this old optimizer bug reported by Sergey Petrunya from MariaDB finally fixed.
You can find a lot more details, including the usual references to MySQL bug reports that are still private, in the Release Notes. Keep reading and consider upgrading :)

Webinar January 18, 2018: MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM) Part 2

Join Percona’s Product Manager Michael Coburn as he presents MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM) Part 2 on Thursday, January 18, 2018, at 11:00 am PST / 2:00 pm EST (UTC-8).

Register Now

Tags: Percona Monitoring and Management, PMM, Monitoring, MySQL, Performance, Optimization, DBA, SysAdmin, DevOps
Experience Level: Expert

Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering your applications need to handle heavy traffic loads while remaining responsive and stable, so that you can deliver an excellent user experience. Furthermore, DBAs are also expected to find cost-efficient means of solving these issues.

In this webinar — the second part of a two-part series — Michael discusses how you can optimize and troubleshoot MySQL performance, and demonstrates how Percona Monitoring and Management (PMM) enables you to solve these challenges using free and open source software. We will look at specific, common MySQL problems and review the essential components in PMM that allow you to diagnose and resolve them.

By the end of this webinar, you will have a better understanding of how you can troubleshoot MySQL problems in your database.

Register for the webinar now.

Michael Coburn, Product Manager

Michael joined Percona as a Consultant in 2012 and progressed through various roles including Managing Consultant, Principal Architect, Technical Account Manager, and Technical Support Engineer. He is now the Product Manager for Percona Monitoring and Management.

MySQL Workbench on Slack

Now that we have the new MySQL community workspace on Slack, we have opened a channel for our users there. If you have questions or problems with Workbench, this is the place to be.

This channel will soon replace our old #workbench IRC channel.

MySQL on Docker - How to Containerize Your Database - New Whitepaper

Severalnines is happy to announce that our new whitepaper “MySQL on Docker - How to Containerize Your Database” is now available to download for free!

While the idea of containers has been around since the early days of Unix, Docker made waves in 2013 when it hit the market with its innovative solution. Docker allows you to add your stacks and applications to containers where they share a common operating system kernel. This lets you have a lightweight virtualized system with almost zero overhead. Docker also lets you bring containers up or down in seconds, making for rapid deployment of your stack.

Download whitepaper

Severalnines has been experimenting with and writing about how to utilize Docker for MySQL in our MySQL on Docker Blog Series since 2014.

This new white paper is the culmination of years of work by our team trying to understand how best to deploy and manage MySQL on Docker while utilizing the advanced monitoring and management features found in ClusterControl.

The topics covered in this white paper are...

  • An Introduction to Docker
  • MySQL Docker Images
  • Networking in Docker
  • Understanding MySQL Containers & Volume
  • Monitoring and Management of MySQL in Docker
    • Docker Security
    • Backup and Restores
  • Running ClusterControl on Docker

If your organization is taking, or plans to take, advantage of the latest in Docker container technology in conjunction with its open source MySQL databases, this whitepaper will help you better understand what you need to do to get started.

ClusterControl Single Console for Your Entire Database Infrastructure Find out what else is new in ClusterControl Install ClusterControl for FREE ClusterControl on Docker Related resources  Download the Whitepaper  ClusterControl on Docker  MySQL Docker Containers - Understanding the Basics  Running Galera Cluster in Production with ClusterControl on Kubernetes

ClusterControl provides advanced management and monitoring functionality to get your MySQL replication and clustered instances up-and-running using proven methodologies that you can depend on to work. Used in conjunction with other orchestration tools for deployment to the containers, ClusterControl makes managing your open source databases easy with point-and-click interfaces and no need to have specialized knowledge about the technology.

ClusterControl delivers on an array of features to help manage and monitor your open source database environments:

  • Management & Monitoring: ClusterControl provides management features to repair and recover broken nodes, as well as test and automate MySQL upgrades.
  • Advanced Monitoring: ClusterControl provides a unified view of all MySQL nodes and clusters across all your data centers and lets you drill down into individual nodes for more detailed statistics.
  • Automatic Failure Detection and Handling: ClusterControl takes care of your replication clusters' health. If a master failure is detected, ClusterControl automatically promotes one of the available slaves to ensure your cluster is always up.

Learn more about how ClusterControl can enhance performance here or pull the Docker Image here.

Tags:  docker MySQL container how to intro mysql on docker database containers containerisation

List of Conferences & Events w/ MySQL, January - March 2018!

As a tradition (and as a follow-up to the previous announcements posted on Oct 23 & Sep 15, 2017), we would like to inform you about the conferences & events we - the MySQL Community Team - are attending together with MySQL experts in the first quarter of 2018. Please be aware that the list is not final; over time more events could be added or some of them removed.

January 2018:  

  • OpenSource Conference (OSC) Osaka, Japan, January 26-27, 2018
    • We are happy to announce MySQL attendance and a MySQL session at the OSC event in Osaka, Japan. Please do not miss us at the MySQL booth in the exhibition area, as well as the MySQL talk. Please find more information at the OSC website.

February 2018:  

  • FOSDEM 2018, Brussels, Belgium, February 3-4, 2018
    • MySQL Community Team is going to be again part of FOSDEM this year. Please mark your calendars. There will be the second edition of the "MySQL Day" just before FOSDEM as well as “MySQL and Friends” dev room scheduled as part of FOSDEM conference & also the "MySQL Community Dinner" which is still TBC. Please find more details/schedule/content & logistics below:    
  • SunShine PHP, Miami, Florida, February 8-10, 2018
    • We are happy to announce our attendance at SunShine PHP this year again. Please do not miss the MySQL talk (listed below) & visit our booth.
      • Talk: "MySQL 8.0" by David Stokes, scheduled for Fri, Feb 9, 2018 @10:15-11:15am, see the schedule.
  • PHP UK, London, UK, February 14-16, 2018
    • We are happy to announce that we are having a talk at PHP UK this February. Please find more details below.
      • "MySQL 8 - A New Beginning", given by David Stokes, the MySQL Community Manager, scheduled for Fri, Feb 16, 2018 @11:30 - 12:30pm in Side Track 1, see more details in the schedule.  
  • ​​Korean Community Day (KCD), Seoul, Korea, February 24, 2018  
    • We are happy to announce MySQL Community Team's involvement in the Korean Community Day as Session sponsor. Please find our local staff there to talk about MySQL topics.​
  • PyCon Philippines, Manila, Philippines, February 24-25, 2018
    • ​MySQL Community team is going to become a Premium partner of this conference with a talk on "MySQL 8.0 and InnoDB Cluster live demo" given by Ronen Baram, the MySQL Principal Sales Consultant. Please watch the official PyCon schedule for potential changes/more details.​

March 2018:

  • ConFoo Montreal, CA, March 7-9, 2018
    • We are happy to announce that we have two accepted MySQL related topics at ConFoo Montreal, both given by David Stokes, the MySQL Community Manager, see details below:
      • “Advanced MySQL Query Tuning” scheduled for Thu, March 8, 2018 @15:00-15:45 pm
      • “MySQL Without The SQL” scheduled for Fri, March 9, 2018 @11:00-11:45 am.
  • Southern California Linux Expo (SCaLE), Pasadena, US, March 8-11, 2018​
    • We are going to be a Gold sponsor of this show. Do not miss the opportunity to talk to our MySQL Community Manager, David Stokes, in the exhibition area at the MySQL booth.
  • FOSSASIA, Singapore, March 22-25, 2018
    • As a tradition, we are going to be part of the biggest Free Open Source Summit in Asia as a Bronze sponsor. Come visit our MySQL booth in the expo area, and do not miss the session(s) on MySQL topics, which we strongly believe will be accepted. For further updates watch the conference website.

... to be continued...

 

ProxySQL Firewalling

In this blog post, we’ll look at ProxySQL firewalling (how to use ProxySQL as a firewall).

Not long ago we had an internal discussion about security, and how to enforce a stricter set of rules to prevent malicious acts and block other undesired queries. ProxySQL came up as a possible tool that could help us in achieving what we were looking for. Last year I wrote about how to use ProxySQL to stop a single query.

That approach may be good for a few queries and as a temporary solution. But what can we do when we really want to use ProxySQL as an SQL-based firewall? And more importantly, how do we do it right?

First of all, let us define what “right” can be in this context.

By “right” I mean an approach that allows us to have rules that match as specifically as possible while impacting the production system as little as possible.

To make this clearer, let us assume I have three schemas:

  • Shakila
  • World
  • Windmills

I want to have my firewall block/allow SQL access independently by each schema, user, eventually by source, and so on.

There are a few cases where this is not realistic, like SaaS setups where each schema represents a customer. In this case, the application will issue exactly the same kind of SQL, just pointing to different schemas depending on the customer.

Using ProxySQL

Anyhow… ProxySQL allows you to manage query firewalling in a very simple and efficient way using the query rules.

In the mysql_query_rules table, we can define a lot of important things – one being setting our SQL firewall.

How?

Let us take a look at the mysql_query_rules table:

CREATE TABLE mysql_query_rules (
    rule_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 0,
    username VARCHAR,
    schemaname VARCHAR,
    flagIN INT NOT NULL DEFAULT 0,
    client_addr VARCHAR,
    proxy_addr VARCHAR,
    proxy_port INT,
    digest VARCHAR,
    match_digest VARCHAR,
    match_pattern VARCHAR,
    negate_match_pattern INT CHECK (negate_match_pattern IN (0,1)) NOT NULL DEFAULT 0,
    re_modifiers VARCHAR DEFAULT 'CASELESS',
    flagOUT INT,
    replace_pattern VARCHAR,
    destination_hostgroup INT DEFAULT NULL,
    cache_ttl INT CHECK(cache_ttl > 0),
    reconnect INT CHECK (reconnect IN (0,1)) DEFAULT NULL,
    timeout INT UNSIGNED,
    retries INT CHECK (retries>=0 AND retries <=1000),
    delay INT UNSIGNED,
    next_query_flagIN INT UNSIGNED,
    mirror_flagOUT INT UNSIGNED,
    mirror_hostgroup INT UNSIGNED,
    error_msg VARCHAR,
    OK_msg VARCHAR,
    sticky_conn INT CHECK (sticky_conn IN (0,1)),
    multiplex INT CHECK (multiplex IN (0,1,2)),
    log INT CHECK (log IN (0,1)),
    apply INT CHECK(apply IN (0,1)) NOT NULL DEFAULT 0,
    comment VARCHAR)

We can define rules around almost everything: connection source, port, destination IP/Port, user, schema, SQL text or any combination of them.

Given we may have quite a large set of queries to manage, I prefer to logically create “areas” around which to add the rules that manage SQL access.

For instance, I may decide to allow a specific set of SELECTs to my schema windmills, but nothing more.

Given that, I allocate the set of rule IDs from 100 to 1100 to my schema, and add my rules in three groups.

  1. The exception that will bypass the firewall
  2. The blocking rule(s) (the firewall)
  3. The managing rules (post-processing, like sharding and so on)

There is a simple thing to keep in mind when you design rules for firewalling: do you need post-processing of the query or not?

In the case that you DON’T need post-processing, the rule can simply apply and exit the QueryProcessor. That is probably the most common scenario, and the read/write split can be defined in the exception rules by assigning each rule to the desired HostGroup.

If you DO need post-processing, the rule MUST have apply=0 and the FLAGOUT must be defined. That allows you to have additional actions once the query is beyond the firewall. An example is in case of sharding, where you need to process the sharding key/comment or whatever.

I will use the simple firewall scenario, given this is the topic of the current article.

The rules

Let us start with the easy one, set 2, the blocking rule:

insert into mysql_query_rules (rule_id,username,schemaname,match_digest,error_msg,active,apply) values(1000,'pxc_test','windmills','.','You cannot pass.....I am a servant of the Secret Fire, wielder of the flame of Anor,. You cannot pass.',1, 1);

In this query rule, I had defined the following:

  • User connecting
  • Schema name
  • Any query
  • Message to report
  • Rule_id

That rule will block ANY query that tries to access the schema windmills from application user pxc_test.

Now in set 1, I will add all the rules I want to let pass. I will show only one here, but all of them can be found on GitHub (https://github.com/Tusamarco/blogs/tree/master/proxysql_firewall).

insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,schemaname,active,retries,apply,flagout,match_digest) values(101,6033,'pxc_test',52,'windmills',1,3,1,1000,'SELECT wmillAUTOINC.id,wmillAUTOINC.millid,wmillAUTOINC.location FROM wmillAUTOINC WHERE wmillAUTOINC.millid=.* and wmillAUTOINC.active=.*');

That is quite simple and straightforward, but there is an important element that you must note. In this rule, apply must always have a value of 1, to allow the query rule to bypass the firewall without further delay.

(Side Note: if you need post-processing, the flagout needs to have a value (like flagout=1000) and apply must be =0. That allows the query to jump to set 3, the managing rules.)

This is it: ProxySQL will go on to the managing rules as soon as it finds a matching rule that allows the application to access my database/schema, or it will exit if apply=1.

A graph will help to understand better:

Rule set 3 has the standard query rules to manage what to do with the incoming connection, like sharding or redirecting SELECT FOR UPDATE, and so on:

insert into mysql_query_rules (rule_id,proxy_port,schemaname,username,destination_hostgroup,active,retries,match_digest,apply,flagin) values(1040,6033,'windmills','pxc_test',50,1,3,'^SELECT.*FOR UPDATE',1,1000);

Please note the presence of the flagin, which matches the flagout above.

Setting rules, sometimes thousands of them, can be very confusing. It is very important to plan correctly what should be an excluding rule and what should not. Do not rush, take your time and identify the queries you need to manage carefully.

Once more ProxySQL can help us. Querying the table stats_mysql_query_digest tells us exactly what queries were sent to ProxySQL:

(admin@127.0.0.1) [main]> select hostgroup,schemaname,digest,digest_text,count_star from stats_mysql_query_digest where schemaname='windmills' order by count_star desc;

The above query shows us all the queries hitting the windmills schema. From there we can decide which queries we want to pass and which not.

>select hostgroup,schemaname,digest,digest_text,count_star from stats_mysql_query_digest where schemaname='windmills' order by count_star desc limit 1\G
*************************** 1. row ***************************
  hostgroup: 50
 schemaname: windmills
     digest: 0x18CA8FF2C9C53276
digest_text: SHOW GLOBAL STATUS
 count_star: 141

Once we have our rule set done (check GitHub for an example), we are ready to check how our firewall works.

To begin with, I suggest you keep all the exceptions (in set 1) with active=0, just to test the firewall.

For instance, my application generates the following exception:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You cannot pass.....I am a servant of the Secret Fire, wielder of the flame of Anor,. You cannot pass.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4187)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4119)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2758)
    at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1612)
    at net.tc.stresstool.statistics.providers.MySQLStatus.getStatus(MySQLStatus.java:48)
    at net.tc.stresstool.statistics.providers.MySQLSuper.collectStatistics(MySQLSuper.java:92)
    at net.tc.stresstool.statistics.StatCollector.collectStatistics(StatCollector.java:258)
    at net.tc.stresstool.StressTool.<init>(StressTool.java:198)
    at net.tc.stresstool.StressTool.main(StressTool.java:282)

Activating the exception rules will instead allow your application to work as usual.
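As a sketch (assuming the exceptions live in the rule_id range below 1000, as in the allocation described earlier), activating them and then verifying which rules actually match is straightforward; the stats_mysql_query_rules table keeps a hit counter per rule:

UPDATE mysql_query_rules SET active=1 WHERE rule_id >= 100 AND rule_id < 1000;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

-- after running some traffic, check that the exceptions (and not the
-- blocking rule 1000) are the ones collecting the hits
SELECT rule_id, hits FROM stats_mysql_query_rules ORDER BY hits DESC;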

What is the impact?

First, let’s define the baseline by running the application without any blocking rules, with only the r/w split (set 3) in place.

Queries/sec (global):

Using two application servers:

  • Server A: Total Execution time = 213
  • Server B: Total Execution time = 209

Queries/sec per server

As we can see, queries are almost equally distributed.

QueryProcessor time taken/Query processed total

All queries are processed by QueryProcessor in ~148ms AVG (total).

QueryProcessor efficiency per query

The cost of a single query, measured in nanoseconds, averages around 10µs.

Use match_digest

Once we’ve defined the baseline, we can go ahead and activate all the rules using match_digest. Run the same tests again and…:

Using two application servers:

  • Server A: Total Execution time = 207
  • Server B: Total Execution time = 204

First of all, we notice that the execution time did not increase. This is mainly because we have spare CPU cycles available on the ProxySQL box:

Here we have a bit of imbalance. We will investigate that separately, but all in all, the time/effort looks OK:

Here is the first thing to notice: compared to the baseline we defined, using the rules with match_digest significantly increased the QueryProcessor execution time, to 458ms:

Also notice that, while we are still in the range of nanoseconds, the cost of processing a query is now three times that of the baseline. Not too much on its own, but if you add a stone to another stone and another and another… you end up building a huge wall.

So, what to do? Up to now, we have seen that managing a firewall with ProxySQL is easy and can be configured at a very detailed level – but the cost may not be what we expect it to be.

What can be done? Use DIGEST instead

The secret is not to use match_digest (which implies interpreting the string with a regular expression) but to use the DIGEST of the query (which is calculated in advance and remains constant for that query).
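As an illustration only (the rule_id is hypothetical and the digest value is a placeholder to be replaced with the one reported by stats_mysql_query_digest), the earlier exception rewritten to match on the digest column instead of match_digest would look like this:

insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,schemaname,active,retries,apply,digest)
values (201,6033,'pxc_test',52,'windmills',1,3,1,'0xREPLACE_WITH_DIGEST');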

Let us see what happens if we run the same load using DIGEST in the MYSQL_QUERY_RULES table:

Using two application servers:

  • Server A: Total Execution time = 213
  • Server B: Total Execution time = 209

No, this is not a cut-and-paste mistake. I got more or less the same execution time as without the rules, down to the second (the milliseconds differ, though):

Again, there is some imbalance, but it is a minor thing:

And we drop to 61ms for the execution of all queries. Note that the efficiency of the QueryProcessor improves from 148ms AVG to 61ms AVG.

Why? Because our DIGEST-based rules also carry the instructions for the read/write split, so requests can exit the QueryProcessor with all the information required at this stage (much more efficient).

Finally, when using the DIGEST, the cost per query drops to 4µs, which is… LOW!

That’s it! ProxySQL using the DIGEST field from mysql_query_rules performs much better given that it doesn’t need to analyze the whole SQL string with regular expressions – it just matches the DIGEST.

Conclusions

ProxySQL can be used effectively as an SQL firewall, but some best practices should be taken into consideration. First of all, use specific rules and be precise about what should be filtered or allowed: filter by schema, user, IP/port, or a combination of them. Always try to avoid match_digest and use digest instead; that allows ProxySQL to skip the call to the regular expression library and is far more efficient. Use stats_mysql_query_digest to identify the correct DIGEST.

On that note, it would be nice to have a GUI that allows us to manage these rules. That would make using ProxySQL much easier, and the creation and maintenance of rule chains friendlier.

Backup and data streaming with xbstream, tar, socat, and netcat

On April 4th, 2012, Xtrabackup 2.0 was released as GA by Percona, along with a new streaming feature called xbstream. This new tool allowed for compression and parallelism of streaming backups when running xtrabackup or innobackupex, without having to stream using tar, pipe to gzip or pigz, and then pipe to netcat or socat to get the backup to the recipient server. This simplified the command structure a great deal and quickly became the preferred way of streaming backups from an origin server to its destination.

In recent months we’ve had discussions internally as to whether xbstream would be a better way of streaming large amounts of data between servers for use cases outside of xtrabackup. And which is better, socat or netcat? So I decided to put this to the test.

In order to test this, I created two m5.xlarge EC2 instances, as this provided an “up to 10 gigabit” level of network performance. I also put both instances in the same availability zone to reduce the chance of poor networking skewing my results. Once this was done, I installed Percona XtraDB Server 5.6 and Xtrabackup 2.4.9, and created a simple database with a data set size of 90GB.

For my first test, I streamed a backup of the entire data set using both the xbstream and tar streaming methods. Compression was not used, so as to evaluate the streaming methods on equal terms. Both socat and netcat were evaluated.
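The listings below only show the sending side. As a rough sketch, the receiving side could look like the following; the port, target directory, and exact netcat syntax are assumptions, since they are not part of the captured output:

# netcat listener piping into xbstream extraction (some nc builds want -l -p 10001)
nc -l 10001 | xbstream -x -C /data/restore

# socat equivalent
socat -u TCP-LISTEN:10001,reuseaddr stdio | xbstream -x -C /data/restore

# for the tar-based streams, unpack with tar instead (-i ignores zero blocks in the streamed archive)
nc -l 10001 | tar -xif - -C /data/restore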

XBSTREAM / NETCAT TESTS

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=1 ./ | nc 172.31.55.250 10001
171228 15:11:13 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 15:25:22 completed OK!

real    14m9.385s
user    3m27.392s
sys     3m34.420s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=2 ./ | nc 172.31.55.250 10001
171228 15:38:50 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 15:50:42 completed OK!

real    11m51.915s
user    3m31.808s
sys     3m34.740s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=4 ./ | nc 172.31.55.250 10001
171228 15:38:50 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:07:28 completed OK!

real    11m51.923s
user    3m27.836s
sys     3m30.088s

XBSTREAM / SOCAT TESTS

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=1 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 16:13:51 innobackupex: Starting the backup operation
.......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:26:55 completed OK!

real    13m3.911s
user    3m8.208s
sys     2m35.160s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=2 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 16:28:16 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:40:08 completed OK!

real    11m51.984s
user    3m8.148s
sys     2m28.860s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=4 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 16:44:54 innobackupex: Starting the backup operation
.......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:56:46 completed OK!

real    11m51.916s
user    3m7.460s
sys     2m24.968s

TAR / NETCAT TEST

[root@ip-172-31-54-219 ~]# time innobackupex --stream=tar --parallel=1 ./ | nc 172.31.55.250 10001
171228 17:02:26 innobackupex: Starting the backup operation
.......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 17:16:09 completed OK!

real    13m42.910s
user    3m19.696s
sys     3m47.672s

TAR / SOCAT TEST

[root@ip-172-31-54-219 ~]# time innobackupex --stream=tar --parallel=1 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 17:19:59 innobackupex: Starting the backup operation
......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 17:33:03 completed OK!

real    13m3.940s
user    2m59.468s
sys     2m29.388s

Here is a summary of the output noted above, in seconds (rounded from the timings shown):

Method               1 thread   2 threads   4 threads
xbstream / netcat    849        712         712
xbstream / socat     784        712         712
tar / netcat         823        -           -
tar / socat          784        -           -

You’ll notice that the xbstream method outperformed the tar method once we started introducing parallel threads. You may also note that performance gains ended after 2 threads were in use, likely because we hit a networking bottleneck. Another interesting thing to note is that with a single thread, socat outperformed netcat, but with multiple threads they were about equal.

So what does this mean for moving data outside of xtrabackup / innobackupex? For my next test, I decided to focus on just the large data files that I created in the test schema directory, the main reason being that xbstream can handle files, but not directories, and cannot act recursively. First I used xbstream and then tried again using tar. Again, compression was not used, so we could look at just the streaming method. Both netcat and socat were evaluated.

XBSTREAM / NETCAT TESTS

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 1 ./t* | nc 172.31.55.250 10001

real    12m25.439s
user    0m20.928s
sys     3m43.492s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 2 ./t* | nc 172.31.55.250 10001

real    12m28.086s
user    0m22.996s
sys     3m50.972s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 4 ./t* | nc 172.31.55.250 10001

real    13m15.775s
user    0m21.460s
sys     3m50.336s

XBSTREAM / SOCAT TESTS

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 1 ./t* | socat -u stdio TCP:172.31.55.250:10001

real    11m47.781s
user    0m17.132s
sys     2m38.168s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 2 ./t* | socat -u stdio TCP:172.31.55.250:10001

real    11m47.707s
user    0m15.816s
sys     2m22.884s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 4 ./t* | socat -u stdio TCP:172.31.55.250:10001

real    11m47.805s
user    0m16.796s
sys     2m36.588s

TAR / NETCAT TEST

[root@ip-172-31-54-219 streamtest]# time tar -cf - ./t* | nc 172.31.55.250 10001

real    11m47.942s
user    0m5.260s
sys     2m32.048s

TAR / SOCAT TEST

[root@ip-172-31-54-219 streamtest]# time tar -cf - ./t* | socat -u stdio TCP:172.31.55.250:10001

real    11m47.914s
user    0m4.860s
sys     1m37.632s

Here is a summary of the output noted above, in seconds (rounded from the timings shown):

Method               1 thread   2 threads   4 threads
xbstream / netcat    745        748         796
xbstream / socat     708        708         708
tar / netcat         708        -           -
tar / socat          708        -           -

In this test we can see that almost all the methods worked equally well, the only less efficient one being the xbstream / netcat combination. Keep in mind that changing the number of parallel threads with the xbstream -p option didn't really have an effect, because xbstream will not leverage parallel threads on its own. It needs to work with another tool, like xtrabackup, that can take advantage of the parallelism.

CONCLUSION

When working with xtrabackup / innobackupex, it looks like xbstream and socat are the way to go. If you're streaming backups and not taking advantage of multiple threads, you should consider them.

For large data copies from one server to another, it looks like you're safe using xbstream or tar, so long as the combination of xbstream and netcat is avoided. Considering that xbstream will not work with directories or act recursively natively, it may just be easier to stick with tar.
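For that last case, a minimal sketch of a whole-directory copy with tar and socat, with illustrative host, port and paths:

# receiving side: listen on the chosen port and unpack into the target directory
socat -u TCP-LISTEN:10001,reuseaddr stdio | tar -xf - -C /data/copy

# sending side: archive the directory and stream it across
tar -cf - -C /data source_dir | socat -u stdio TCP:dest.example.com:10001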

Announcing MySQL Server 5.7.21, 5.6.39, and 5.5.59

MySQL Server 5.7.21, 5.6.39, and 5.5.59, new versions of the popular Open Source Database Management System, have been released. These releases are recommended for use on production systems. For an overview of what’s new, please see http://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html, http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html, and http://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html. For information on installing the release on new servers, please see the MySQL installation documentation at […]

Sneak Peek of the Percona Live 2018 Open Source Database Conference Breakout Sessions!

Take a look at the sneak peek of the breakout sessions for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, California. Early Bird registration discounts are available until February 4, 2018, and sponsorship opportunities are still available.

Conference breakout sessions will feature a range of in-depth talks related to each of the key areas. Breakout session examples include:

  • Database Security as a Function: Scaling to Your Organization’s Needs – Laine Campbell, Fastly
  • How to Use JSON in MySQL Wrong – Bill Karwin, Square
  • Scaling a High Traffic Database: Moving Tables Across Clusters – Bryana Knight, GitHub
  • MySQL: How to Save Bandwidth – Georgi Kodinov, Oracle
  • MyRocks Roadmaps and Production Deployment at Facebook – Yoshinori Matsunobu, Facebook
  • Securing Your Data on PostgreSQL – Payal Singh, OmniTI Computer Consulting, Inc.
  • The Accidental DBA – Jenni Snyder, Yelp
  • How Microsoft Built MySQL, PostgreSQL and MariaDB for the Cloud – Jun Su, Microsoft
  • MongoDB Cluster Topology, Management and Optimization – Steven Wang, Tesla
  • Ghostferry: A Data Migration Tool for Incompatible Cloud Platforms – Shuhao Wu, Shopify, Inc.

Percona Live Open Source Database Conference 2018 is the premier open source database event. The theme for the upcoming conference is “Championing Open Source Databases,” with a range of topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations and Business/Case Studies. A major conference focus will be providing strategies to help attendees meet their business goals by deploying the right mix of database solutions to obtain the performance they need while managing complexity.

Hyatt Regency Santa Clara & The Santa Clara Convention Center

The Percona Live 2018 Open Source Database Conference will be held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway, Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff, and three great restaurants offering Tuscan cuisine, classic American fare, or tantalizing sushi. Staying for a couple of extra days? Take time to enjoy the Bay Area and spend a day in San Francisco, located only an hour away. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Sponsorships

Sponsorship opportunities for the Percona Live 2018 Open Source Database Conference are available and offer the chance to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

 
