Planet MySQL

How to Install TaskBoard on CentOS 7

TaskBoard is a free and open source application to keep track of the tasks that need to be done. It requires minimal dependencies to work. The database is stored in SQLite, which eliminates the need for MySQL or any other database server.

How to Find Values of User Variables With gdb

In a comment on my announcement of the previous post, Shane Bester kindly suggested considering pretty printing the information about user variables from gdb. I tried to do this tonight, after a long working day, while working with the same Percona Server 5.7.x on CentOS 6.9, and found several interesting details to share even before getting to the pretty printing part (which I'd surely try to avoid doing with Python anyway, as I am lazy and have not been in the mood to use that programming language for a decade already). So, I decided to share them in a separate post.

To begin with, I checked whether using gdb for this task is actually needed for anyone else but those poor souls who have to study core dumps after a crash. This time I do not want Mark Leith to step in with a comment that I had better use <some cool P_S feature> (instead of attaching gdb to a properly working production server), so I did my homework...

One can surely use the user_variables_by_thread table in the Performance Schema to get these details if the server still works, P_S is enabled and it's possible to connect to it and run SQL queries. There is one minor problem (the type of the user variable is missing there) that I was happy to report immediately as Bug #87341. By the way, I had found an old feature request, Bug #20307, that I verified 11 years ago and that should now just be properly closed because of this nice feature of current MySQL.
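For the live-server case the query is straightforward; here is a minimal sketch (the processlist id 16 below matches the session used later in this post, so adjust it to your own):

SELECT t.PROCESSLIST_ID, v.VARIABLE_NAME, v.VARIABLE_VALUE
  FROM performance_schema.user_variables_by_thread v
  JOIN performance_schema.threads t USING (THREAD_ID)
 WHERE t.PROCESSLIST_ID = 16;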

So, there is a (small) reason to use gdb even before any crashes, and I did it immediately after setting values for a few user variables in the connection with id 16, with the following statements:
set @a := 'aaa', @b := 254, @c := sysdate();
set @e = 10.5;

I also want to show how one may use the (relatively) new column in the threads table of Performance Schema while studying a live server with gdb. This was also a hint from Mark Leith, in a comment on my older post, to check the THREAD_OS_ID column:
mysql> select * from threads\G
...
*************************** 27. row ***************************
          THREAD_ID: 42
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 16
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
   PROCESSLIST_TIME: 195
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 2224
...

My first naive attempt to check the details about user variables with gdb ended up as follows:
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fc5997bc700 (LWP 2224))]#0  0x00007fc5cc299383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->m_thread_id
$1 = 16
(gdb) p do_command::thd->user_vars
$2 = {key_offset = 0, key_length = 0, blength = 8, records = 4, flags = 0,
  array = {buffer = 0x7fc5aabf1560 "\001", elements = 4, max_element = 16,
    alloc_increment = 32, size_of_element = 16, m_psi_key = 38},
  get_key = 0xc63630 <get_var_key(user_var_entry*, size_t*, my_bool)>,
  free = 0xc636c0 <free_user_var(user_var_entry*)>, charset = 0x1ded740,
  hash_function = 0xeb6990 <cset_hash_sort_adapter>, m_psi_key = 38}

I see 4 elements for the thread I am interested in. Looks good. I also assume from the above that they are of user_var_entry type. So, given some buffer, I try to cast it to this type and check (that was stupid, especially the later pointer increment attempt - if that were an array of structures, imagine how long it would take to find some variable by name!):
(gdb) p do_command::thd->user_vars->array->buffer
$3 = (uchar *) 0x7fc5aabf1560 "\001"
(gdb) p (user_var_entry *)(do_command::thd->user_vars->array->buffer)
$4 = (user_var_entry *) 0x7fc5aabf1560
(gdb) p sizeof(user_var_entry)
$5 = 104
(gdb) set $vars = (user_var_entry *)(do_command::thd->user_vars->array->buffer)
(gdb) p $vars
$6 = (user_var_entry *) 0x7fc5aabf1560
(gdb) p *($vars)
$7 = {static extra_size = 8, m_ptr = 0x1 <Address 0x1 out of bounds>,
  m_length = 140486683628224, m_type = 4294967295
, m_owner = 0x7fc59adf90e0,
  m_catalog = {str = 0xffffffff <Address 0xffffffff out of bounds>,
    length = 140486683628064}, entry_name = {
    m_str = 0xffffffff <Address 0xffffffff out of bounds>,
    m_length = 140486683627904}, collation = {collation = 0x0,
    derivation = DERIVATION_EXPLICIT, repertoire = 0}, update_query_id = 0,
  used_query_id = 0, unsigned_flag = false}
(gdb) set $vars = $vars + 1
(gdb) p *($vars)
$8 = {static extra_size = 8, m_ptr = 0x0, m_length = 0,
  m_type = STRING_RESULT, m_owner = 0x0, m_catalog = {str = 0x0, length = 0},
  entry_name = {m_str = 0x0, m_length = 0}, collation = {collation = 0x0,
    derivation = DERIVATION_EXPLICIT, repertoire = 0}, update_query_id = 0,
  used_query_id = 0, unsigned_flag = false}
...

So, while eventually I saw something expected (STRING_RESULT as the type) and the pointer trick "worked", the content I get is obviously bogus - addresses are out of bounds or zero etc. This leads us nowhere at best.

I had to dig into the code to see how this structure is actually used. This is easy with grep:
[root@centos openxs]# grep -rn user_vars git/mysql-server/*
git/mysql-server/sql/item_func.cc:6017:  HASH *hash= & thd->user_vars;
...
git/mysql-server/sql/sql_prepare.cc:2182:    entry= (user_var_entry*)my_hash_search(&thd->user_vars,...
git/mysql-server/storage/perfschema/table_uvar_by_thread.cc:76:    sql_uvar= reinterpret_cast<user_var_entry*> (my_hash_element(& thd->user_vars, index));
...
git/mysql-server/sql/sql_class.cc:1209:  my_hash_init(&user_vars, system_charset_info, USER_VARS_HASH_SIZE, 0, 0,
git/mysql-server/sql/sql_class.cc:1650:  my_hash_init(&user_vars, system_charset_info, USER_VARS_HASH_SIZE, 0, 0,
...
git/mysql-server/sql/rpl_binlog_sender.cc:654:    (user_var_entry*) my_hash_search(&m_thd->user_vars, (uchar*) name.str,
...
git/mysql-server/sql/sql_class.h:1576:  HASH    user_vars;                     // hash for user variables...
git/mysql-server/storage/perfschema/table_uvar_by_thread.cc:76:    sql_uvar= reinterpret_cast<user_var_entry*> (my_hash_element(& thd->user_vars, index));
[root@centos openxs]#

So, user_vars is actually a HASH, and we see functions to find an item in this hash by name, my_hash_search(), and to get an item by index, my_hash_element(). I immediately proceeded to use the latter, as I wanted to see all variables:
...
(gdb) thread 10
[Switching to thread 10 (Thread 0x7fc5997bc700 (LWP 2224))]#0  0x00007fc5cc299383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->m_thread_id
$1 = 16
(gdb) p do_command::thd->user_vars
$2 = {key_offset = 0, key_length = 0, blength = 8, records = 4, flags = 0,
  array = {buffer = 0x7fc5aabf1560 "\001", elements = 4, max_element = 16,
    alloc_increment = 32, size_of_element = 16, m_psi_key = 38},
  get_key = 0xc63630 <get_var_key(user_var_entry*, size_t*, my_bool)>,
  free = 0xc636c0 <free_user_var(user_var_entry*)>, charset = 0x1ded740,
  hash_function = 0xeb6990 <cset_hash_sort_adapter>, m_psi_key = 38}
(gdb) p my_hash_element(&(do_command::thd->user_vars), 1)
$3 = (uchar *) 0x7fc59adf90e0 "H\221ъ \305\177"
(gdb) set $uvar = (user_var_entry *)(my_hash_element(&(do_command::thd->user_vars), 1))
[New Thread 0x7fc59946f700 (LWP 2447)]
[Thread 0x7fc59973a700 (LWP 2396) exited]
[New Thread 0x7fc59932a700 (LWP 2448)]
(gdb) p $uvar
$4 = (user_var_entry *) 0x7fc59adf90e0
(gdb) p *$uvar
$5 = {static extra_size = 8, m_ptr = 0x7fc59adf9148 "aaa", m_length = 3,
  m_type = STRING_RESULT
, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x41272c275441434e <Address 0x41272c275441434e out of bounds>,
    length = 6075168230325441358}, entry_name = {m_str = 0x7fc59adf9150 "a",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 430,
  used_query_id = 426, unsigned_flag = false}
...
(gdb) set $uvar = (user_var_entry *)(my_hash_element(&(do_command::thd->user_vars), 3))
[Thread 0x7fc599636700 (LWP 2400) exited]
(gdb) p *$uvar
$7 = {static extra_size = 8, m_ptr = 0x7fc59adf91e8 "\376", m_length = 8,
  m_type = INT_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x655f79625f726573 <Address 0x655f79625f726573 out of bounds>,
    length = 7881702179129419126}, entry_name = {m_str = 0x7fc59adf91f0 "b",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 430,
  used_query_id = 426, unsigned_flag = false}
(gdb) set $uvar = (user_var_entry *)(my_hash_element(&(do_command::thd->user_vars), 4))
(gdb) p *$uvar
Cannot access memory at address 0x0
We can see that this experimenting was performed on a server under some load (threads are created and gone). We are able to get proper details by index, but we should take extra care to check addresses before accessing them, and should note that items are numbered starting from zero, not 1 (there is no item 4, we get a NULL pointer). If anyone cares, the item with index 0 was this:
$8 = {static extra_size = 8, m_ptr = 0x7fc5b23b41c0 "\002", m_length = 64,
  m_type = DECIMAL_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x7665602e60616d65 <Address 0x7665602e60616d65 out of bounds>,
    length = 7593481698565975653}, entry_name = {m_str = 0x7fc59adf9330 "e",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 444,
  used_query_id = 444, unsigned_flag = false}

As this is a hash table, the order of items is determined by the hash, not by the order of assignments to variables.

Now, if I know the name of the variable, I can access it in the same way it's done in the server code (one of the examples found by grep):
    entry= (user_var_entry*)my_hash_search(&thd->user_vars,
                                           (uchar*)lex->prepared_stmt_code.str,
                                           lex->prepared_stmt_code.length);

Basically, we need to pass the variable name and the length of that name:
(gdb) set $uvar=(user_var_entry *)(my_hash_search(&(do_command::thd->user_vars), "e", strlen("e")))
[Thread 0x7fc5995b4700 (LWP 2848) exited]
(gdb) p *$uvar
$1 = {static extra_size = 8, m_ptr = 0x7fc5b23b41c0 "\002", m_length = 64,
  m_type = DECIMAL_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x7665602e60616d65 <Address 0x7665602e60616d65 out of bounds>,
    length = 7593481698565975653}, entry_name = {m_str = 0x7fc59adf9330 "e",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 444,
  used_query_id = 444, unsigned_flag = false}

...
(gdb) p *((my_decimal *)$uvar->m_ptr)
$4 = {<st_decimal_t> = {intg = 2, frac = 1, len = 9, sign = 0 '\000',
    buf = 0x7fc5b23b41d8}, buffer = {10, 500000000, 0, 0, 0, 0, 257, 0,
    -1697445128}}

I highlighted some parts that should eventually help to pretty print decimal values. With plain strings it's also not that trivial:
...
$5 = {static extra_size = 8, m_ptr = 0x7fc59adf9148 "aaa", m_length = 3,
  m_type = STRING_RESULT, m_owner = 0x7fc59accc000, m_catalog = {
    str = 0x41272c275441434e <Address 0x41272c275441434e out of bounds>,
    length = 6075168230325441358}, entry_name = {m_str = 0x7fc59adf9150 "a",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 430,
  used_query_id = 426, unsigned_flag = false}
(gdb) p *$uvar->m_ptr
$6 = 97 'a'

We cannot simply print the pointer and hope for the best, as this is not a null-terminated string.

I am too tired to write a useful summary right now, but you can see that essentially it's easy (even if not trivial) to get all the details about the user variables defined in any session with gdb, both from a running process and from a core dump. Enjoy!

What’s New in MariaDB Connector/J 2.1

RalfGebhardt, Mon, 08/07/2017 - 13:27

We are pleased to announce the general availability (GA) of MariaDB Connector/J 2.1, the newest version of MariaDB Connector/J for Java 8+. 

MariaDB Connector/J 2.1 is fully compatible with MariaDB Connector/J 2.0, so version 2.0.3 was the last maintenance release for 2.0 and will be replaced by version 2.1 moving forward. MariaDB Connector/J 2.1 includes the following new enhancements:

Security

MariaDB Connector/J 2.1 adds SSL certificate hostname verification. When using SSL, the driver will check the hostname against the server's identity as presented in the server's certificate (checking subject alternative names or the certificate CN) to prevent man-in-the-middle attacks.

A new option "disableSslHostnameVerification" can be used to deactivate this new verification, for cases where the old connector behaviour is required.

High Availability

MariaDB Connector/J has provided its own high availability and load balancing functionality for Master-Slave and Multi-Master (Galera) environments since version 1.2.

With the new release, when configured for Multi-Master, the function “Connection.isValid()” not only validates the connection, it also validates the state of the Galera Cluster node (@@wsrep_cluster_status). If the cluster status of the node is not “Primary”, the connection is not considered valid and will be removed from the connection pool.
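For reference, the node state the connector now checks can also be inspected manually on any Galera node; a quick sketch:

SHOW STATUS LIKE 'wsrep_cluster_status';
-- A node that may serve traffic is expected to report the value 'Primary'.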

Performance

Batched INSERTs

Batches of INSERTs will now be used for better performance where possible. This enhancement requires MariaDB Server 10.2.7.

A new option "useBulkStmts" can be used to control the use of this protocol enhancement.

Updateable resultset

Updateable resultsets can now be used with “ResultSet.CONCUR_UPDATABLE”.


Example:

Statement stmt = con.createStatement(
                     ResultSet.TYPE_SCROLL_INSENSITIVE,
                     ResultSet.CONCUR_UPDATABLE);
ResultSet rs = stmt.executeQuery("SELECT age FROM TABLE2");
// rs will be scrollable, will not show changes made by others,
// and will be updatable
while (rs.next()) {
     // Retrieve by column index
     int newAge = rs.getInt(1) + 5;
     rs.updateDouble(1, newAge);
     rs.updateRow();
}

Download the MariaDB Connector now and learn about the newest evolution of MariaDB Connector/J 2.1.
 


MariaDB 10.0.32 and Connectors now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.32, as well as the recent releases of MariaDB Connector/ODBC 3.0.1 (beta), MariaDB Connector/J 2.1.0 and MariaDB Connector/J 1.6.3. See the release notes and changelog for details. Download MariaDB 10.0.32 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository […]

The post MariaDB 10.0.32 and Connectors now available appeared first on MariaDB.org.

MariaDB 10.0.32, and updated connectors now available

dbart, Mon, 08/07/2017 - 11:23

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.0.32. In the past week, MariaDB Connector/J and MariaDB Connector/ODBC both received updates. See the release notes and changelogs for details and visit mariadb.com/downloads to download.

Download MariaDB 10.0.32

Release Notes Changelog What is MariaDB 10.0?

Download MariaDB Connector/ODBC 3.0.1 Beta

Release Notes Changelog About MariaDB Connector/ODBC

Download MariaDB Connector/J 2.1.0 GA

Release Notes Changelog About MariaDB Connector/J

Download MariaDB Connector/J 1.6.3 GA

Release Notes Changelog About MariaDB Connector/J



How to Install MySQL Server with phpMyAdmin on FreeBSD 11

In this tutorial, we will install MySQL with phpMyAdmin along with the Apache web server and PHP 5.6. MySQL is a free and open source relational database management system. It stores data in tabular format and is one of the most popular ways of storing data in a database. phpMyAdmin is also a free and open source application used to administer a MySQL server instance through a rich graphical user interface.

An Adventure in InnoDB Table Compression (for read-only tables)

In my last post about big MySQL deployments, I quickly mentioned that InnoDB compression allows dividing disk usage by about 4.3 on a 200+ TiB dataset.  In this post, I will give more information about this specific use case of InnoDB table compression and I will share some statistics and learnings on this system and subject.  Note that I am not covering InnoDB page compression which is

More house chores: Cleaning up the binary log

Historically, the MySQL binary log has gone through a few revisions resulting in different versions of how the log is formatted. The following shows the version of the binary log and the MySQL version in which it appears. As you can see, version 4 is the current version of the binary log found in the current releases of MySQL.…

MySQL Group Replication: read your own write across the group

Today is my last day in Asia (China and Taiwan), and between sessions I had time to code a small proof-of-concept for people who are asking how they could read their own writes on all nodes and ensure read consistency with their last write.

The usual (and easier) answer to this particular question is to read on the same host you just wrote to. But MySQL currently provides all the elements needed to force a consistent read across all the nodes of a group.

Since MySQL 5.7.5 we have had session_track_gtids, and in 5.7.6 we also introduced the GTIDs context in the OK packet (session tracker) (WL#6128 and WL#6972). So we can use this implementation in combination with WAIT_FOR_EXECUTED_GTID_SET to force a consistent read.
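Stripped down to SQL, the idea looks roughly like this (a sketch only; the table, the GTID value and how the client reads the OK-packet tracker are illustrative, not taken from gr_ryow.c):

-- On the writing connection: ask the server to return the GTID of the
-- write in the OK packet (the client library exposes it via the
-- session-state tracker).
SET SESSION session_track_gtids = OWN_GTID;
INSERT INTO t1 (val) VALUES ('x');

-- On any other member of the group: wait until that GTID has been
-- applied locally before reading, e.g. with a 1 second timeout.
SELECT WAIT_FOR_EXECUTED_GTID_SET('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:23', 1);
SELECT val FROM t1;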

Currently this solution must be implemented in your application but a tool like ProxySQL might be able to implement this too.

I wrote a small C program using Connector/C to illustrate this: gr_ryow.c

Let’s see this small C program in action:

Of course we are aware of this request, and we might one day come up with an easier solution implemented only in the server or the router, but for the moment this is already an option for those willing to read their own writes across all members.

 

How to Find Values of Session Variables With gdb

Usually I make notes while working on customer issues, reading the Slack channels or even scrolling across my Facebook news feed. Some of them are of "ToDo" kind, and this week while watching how my colleagues work on a very complicated crash/bug I noted that one of the problems they were discussing was how to find out from the core dump if the session behind the crashing thread had mrr=ON in the optimizer_switch.

I had already written more than once about how to check MySQL threads one by one in gdb, depending on the version and your real goal. For this post I decided to concentrate on checking the values of session variables and, specifically, individual switchable optimizations. This is actually easy, compared to checking MySQL plugin variables...

I've used Percona Server 5.7.x and executed set session optimizer_switch='mrr=off'; in one of the sessions. I checked threads one by one in gdb until I found the one I needed, where the thd variable is defined in some frame:
...
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f7171fbc700 (LWP 2186))]#0  0x00007f71a4be3383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->m_thread_id
$1 = 7

Local session variables are kept in the field called variables in the THD structure:
(gdb) p do_command::thd->variables
$2 = {dynamic_variables_version = 61,
  dynamic_variables_ptr = 0x7f717359c820 "\001", dynamic_variables_head = 384,
  dynamic_variables_size = 392, dynamic_variables_allocs = 0x7f71735f9040,
  max_heap_table_size = 16777216, tmp_table_size = 16777216,
  long_query_time = 0, end_markers_in_json = 0 '\000',
  optimizer_switch = 523711, optimizer_trace = 0,
  optimizer_trace_features = 15, optimizer_trace_offset = -1,
  optimizer_trace_limit = 1, optimizer_trace_max_mem_size = 16384,
  sql_mode = 1075838976, option_bits = 2147748608,
...
(gdb) p do_command::thd->variables->optimizer_switch
$3 = 523711
...
(gdb) p global_system_variables->optimizer_switch
$9 = 523775

Now, this is surely cool, but how do I interpret the value I see, which must be some bitmask? I've printed the value of the global variable as well (it's in global_system_variables), so by comparing the values I can assume the mrr setting is controlled by the bit with value 64 in this bitmask. But we have to check this in the code.

Quick search with grep helps to find out that individual switches are defined in the sql/sql_const.h header file as follows:
/* @@optimizer_switch flags. These must be in sync with optimizer_switch_typelib */
#define OPTIMIZER_SWITCH_INDEX_MERGE               (1ULL << 0)
#define OPTIMIZER_SWITCH_INDEX_MERGE_UNION         (1ULL << 1)
#define OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION    (1ULL << 2)
#define OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT     (1ULL << 3)
#define OPTIMIZER_SWITCH_ENGINE_CONDITION_PUSHDOWN (1ULL << 4)
#define OPTIMIZER_SWITCH_INDEX_CONDITION_PUSHDOWN  (1ULL << 5)
/** If this is off, MRR is never used. */
#define OPTIMIZER_SWITCH_MRR                       (1ULL << 6)
/**
   If OPTIMIZER_SWITCH_MRR is on and this is on, MRR is used depending on a
   cost-based choice ("automatic"). If OPTIMIZER_SWITCH_MRR is on and this is
   off, MRR is "forced" (i.e. used as long as the storage engine is capable of
   doing it).
*/
#define OPTIMIZER_SWITCH_MRR_COST_BASED            (1ULL << 7)
#define OPTIMIZER_SWITCH_BNL                       (1ULL << 8)
#define OPTIMIZER_SWITCH_BKA                       (1ULL << 9)
#define OPTIMIZER_SWITCH_MATERIALIZATION           (1ULL << 10)
#define OPTIMIZER_SWITCH_SEMIJOIN                  (1ULL << 11)
#define OPTIMIZER_SWITCH_LOOSE_SCAN                (1ULL << 12)
#define OPTIMIZER_SWITCH_FIRSTMATCH                (1ULL << 13)
#define OPTIMIZER_SWITCH_DUPSWEEDOUT               (1ULL << 14)
#define OPTIMIZER_SWITCH_SUBQ_MAT_COST_BASED       (1ULL << 15)
#define OPTIMIZER_SWITCH_USE_INDEX_EXTENSIONS      (1ULL << 16)
#define OPTIMIZER_SWITCH_COND_FANOUT_FILTER        (1ULL << 17)
#define OPTIMIZER_SWITCH_DERIVED_MERGE             (1ULL << 18)
#define OPTIMIZER_SWITCH_LAST                      (1ULL << 19)

So, to check whether we have mrr set at the session level, we should do a bitwise AND with 1<<6:
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f7171e77700 (LWP 2257))]#0  0x00007f71a4be3383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->variables->optimizer_switch
$1 = 523775
(gdb) p do_command::thd->variables->optimizer_switch & (1 << 6)
$2 = 64
...
(gdb) p /t do_command::thd->variables->optimizer_switch
$6 = 1111111110111111111

Note that if you know the bit positions of individual switches by heart, printing with the /t modifier may quickly show you all the bits that are set.
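For comparison, while the server is alive and accepts connections the same check needs no gdb at all, since optimizer_switch is exposed at the SQL level as a string of flag=on/off pairs; a trivial sketch:

SELECT @@session.optimizer_switch LIKE '%mrr=on%' AS mrr_enabled;
SELECT @@global.optimizer_switch  LIKE '%mrr=on%' AS mrr_enabled_globally;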

MariaDB Server 10.2: JSON/GeoJSON & GIS

Bruno Šimić, Fri, 08/04/2017 - 13:02

Starting with MariaDB Server 10.2, new JSON/GeoJSON functionality was introduced. Virtual columns also received improvements. With version 10.2 tons of new functionality was added and many of the limitations previously experienced were lifted.

Here, I’ll show how these new MariaDB Server 10.2 features can empower your business.

Side note: One additional piece of information before we start. You cannot imagine how much I hated mathematics back in my school days. Especially geometry and all the Sin, Cos and Radians stuff. Now I’m older, (and eventually) smarter and want to thank my teachers for not giving up!

For today’s task, we’ll create a MariaDB table with JSON objects. It will contain public data about meteorite landings, to see how safe the area I (or you) live in really is. The dataset is publicly available from NASA’s open data portal.

A little bit of geometry background before we go any further.

Latitudes and longitudes are numerical values and they are expressed in degrees. Latitude defines how far north (positive) or south (negative) of the equator a point is located. All points on the equator have a latitude value of zero. The north pole has a latitude of 90, and the south pole has a latitude of -90. So, that’s why all northern hemisphere locations have positive latitude and southern-hemisphere locations have negative values.

By definition, the Greenwich Observatory near London has a longitude of zero. Negative longitude means that this point is west of Greenwich, positive means that the point is east.

We will provide geographic coordinates (latitude and longitude) and, by using GIS functions, we will get information about meteorite impacts within a circle of XY km (or miles) around those coordinates.
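For context, the distance expression used in the query later in this post is the spherical law of cosines, scaled by how many kilometers one degree of arc covers; a sketch in math form (with latitudes \varphi and longitudes \lambda in radians, and 111.045 km per degree):

d = 111.045 \cdot \frac{180}{\pi} \cdot \arccos\bigl(\cos\varphi_1 \cos\varphi_2 \cos(\lambda_1 - \lambda_2) + \sin\varphi_1 \sin\varphi_2\bigr)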

To get started, I created a table with the following properties:

CREATE TABLE gis_json (
  id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
  jsonfield VARCHAR(1024),
  name VARCHAR(255) as (JSON_VALUE(jsonfield,'$.name')),
  gis_point POINT as (ST_GeomFromGeoJSON(JSON_QUERY(jsonfield, '$.geolocation'))) PERSISTENT,
  rlat VARCHAR(20) as (JSON_VALUE(jsonfield,'$.reclat')),
  rlong VARCHAR(20) as (JSON_VALUE(jsonfield,'$.reclong')),
  KEY jsonkey (name),
  CHECK (JSON_VALID(jsonfield))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

I’m using jsonfield to save the actual JSON formatted data.

The field name is a virtual column, it’s matching a value from the JSON key ‘name’.

The field gis_point is a virtual column too, but here I’m using a persistent method to save its content. It is calculated from the JSON field ‘geolocation’ using GeoJSON function ST_GeomFromGeoJSON. All geolocation data in the JSON file are Point type.

I enabled a CHECK constraint to be sure that the JSON data I read is valid and properly formatted. I also defined an index on the field name.

Here is just a sample of one JSON data record:

{ "fall":"Fell", "geolocation":{ "type":"Point","coordinates":[50.775,6.08333] }, "Id":"1", "Mass":"21", "name":"Aachen", "nametype":"Valid", "recclass":"L5", "reclat":"50.775000", "reclong":"6.083330", "year":"1880-01-01" }

As you can see, some of the data is redundant, but I decided to keep this record for the sake of simplicity.

Now, we can create the database and the table, and insert the data:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 31
Server version: 10.2.6-MariaDB Homebrew

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database geodata;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> use geodata;
Database changed

MariaDB [geodata]> source json_gis_sample.sql;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.03 sec)
Query OK, 38401 rows affected (0.67 sec)
Records: 38401  Duplicates: 0  Warnings: 0

After we have imported the whole dataset (38401 records), we can execute our query:

SELECT name, impactyear, mass_g,
       Y(gis_point) as latit, X(gis_point) as longit,
       round(distance,2) as distance_km
FROM (
  SELECT name,
         JSON_VALUE(jsonfield,'$.year') as impactyear,
         JSON_VALUE(jsonfield,'$.mass') as mass_g,
         gis_point, r,
         units * DEGREES( ACOS( COS(RADIANS(latpoint))
                 * COS(RADIANS(X(gis_point)))
                 * COS(RADIANS(longpoint) - RADIANS(Y(gis_point)))
                 + SIN(RADIANS(latpoint))
                 * SIN(RADIANS(X(gis_point))))) AS distance
  FROM gis_json
  JOIN ( SELECT 48.139 AS latpoint, 11.566 AS longpoint,
                100.0 AS r, 111.045 AS units ) AS p
  WHERE MbrContains(GeomFromText(
          CONCAT('LINESTRING(',
                 latpoint-(r/units),' ',
                 longpoint-(r/(units*COS(RADIANS(latpoint)))), ',',
                 latpoint+(r/units),' ',
                 longpoint+(r/(units*COS(RADIANS(latpoint)))), ')')),
        gis_point)
) AS d
WHERE distance <= r
ORDER BY distance;

As you can see with impactyear and mass_g, we can grab the JSON data directly from the jsonfield, using the JSON_VALUE() function. I’m also using MbrContains, which is a Minimum Bounding Rectangle (MBR) geographical function.
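A quick way to convince yourself that the virtual columns and JSON_VALUE() do what you expect is to look at a single record (a sketch, assuming the gis_json table and data loaded above):

SELECT name,
       JSON_VALUE(jsonfield, '$.year') AS impactyear,
       JSON_VALUE(jsonfield, '$.mass') AS mass_g,
       ST_AsText(gis_point)            AS location
FROM gis_json
WHERE name = 'Aachen';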

The result shows that (statistically) Munich is a safe (and great!) place to live. It’s much easier to drown in good beer than to be hit by a meteor.   

You can adapt the following variables to fit your needs:

latpoint 48.139

longpoint 11.566

As a starting point, I used the center of Munich.

r 100 -> this is a distance from the starting point you want to search within. Since we are using a circle distance in the query, it’s the circle radius.

units 111.045 -> if you want to search in kilometers, one degree is 111.045 km. For miles, use 69.0. If you are at sea and want to know how far away Roman Abramovich's yacht is, you should use 60.0 as the units value (nautical miles). Please note that you should add or subtract around 1 mile from the value, depending on where Roman was standing (bow or stern!) when he shared his geo position with you.

To search within a broader radius of 250 km, I'll just change the r value from 100 to 250 and rerun the query:

+-----------------+------------+--------+----------+----------+-------------+
| name            | impactyear | mass_g | latit    | longit   | distance_km |
+-----------------+------------+--------+----------+----------+-------------+
| Mässing         | 1803-01-01 | 1600   | 12.61667 | 48.13333 | 77.86       |
| Schönenberg     | 1846-01-01 | 8000   | 10.46667 | 48.11667 | 81.52       |
| Eichstädt       | 1785-01-01 | 3000   | 11.21667 | 48.9     | 88.32       |
| Neuschwanstein  | 2002-01-01 | 6189   | 10.80833 | 47.525   | 88.54       |
| Mühlau          | 1877-01-01 | 5      | 11.41667 | 47.28333 | 95.67       |
| Unter-Mässing   | 1920-01-01 | 80000  | 11.33333 | 49.09028 | 107.01      |
| Mauerkirchen    | 1768-01-01 | 19000  | 13.13333 | 48.18333 | 116.20      |
| Ischgl          | 1976-01-01 | 724    | 10.27333 | 47.02633 | 156.97      |
| Prambachkirchen | 1932-01-01 | 2125   | 13.94083 | 48.3025  | 176.63      |
| Bohumilitz      | 1829-01-01 | 59000  | 13.76667 | 49.05    | 190.66      |
| Langwies        | 1985-01-01 | 16     | 9.71667  | 46.81667 | 202.04      |
| Teplá           | 1909-01-01 | 17000  | 12.86667 | 49.98333 | 225.60      |
| Barcis          | 1950-01-01 | 87     | 12.35    | 46.1     | 234.04      |
| Elbogen         | 1399-12-24 | 107000 | 12.73333 | 50.18333 | 242.31      |
| Pribram         | 1959-01-01 | 5555   | 14.03333 | 49.66667 | 247.39      |
| Ybbsitz         | 1977-01-01 | 15000  | 14.89    | 47.96    | 247.53      |
+-----------------+------------+--------+----------+----------+-------------+
16 rows in set (0.06 sec)


With this set of functions, you can start enjoying a non-relational structure combined with the benefits of a powerful relational database system.

As an example, these functions can be used to build geotargeting API services for your business, showing your customers where your POS is located or checking how far away your friends are who shared their position with you.

A nice Open Source Framework for mapping is OpenLayers. Additionally, you can find a big list of free JSON datasets that don’t require authentication here.

If you want to give this code a try, download the complete SQL file (DDL and data).

Want more? Join our webinar on August 10 to dive deeper on geospatial data with MariaDB. In the webinar, we will cover how to build location-based services with MariaDB Server, and SQL using spatial data types, indexes and functions. Register for the MariaDB geospatial webinar now.

Happy geo positioning!



Saturation Metrics in PMM 1.2.0

One of the new graphs added to Percona Monitoring and Management (PMM) is saturation metrics. This blog post explains how to use the information provided by these graphs.

You might have heard about Brendan Gregg’s USE Method  (Utilization-Saturation-Errors) as a way to analyze the performance of any system. Our goal in PMM is to support this method fully over time, and these graphs take us one step forward.

When it comes to utilization, there are many graphs available in PMM. There is the CPU Usage graph:

There is also Disk IO Utilization:

And there is Network Traffic:

If you would like to look at saturation-type metrics, there is the classical Load Average graph:

While Load Average is helpful for understanding system saturation in general, it does not really distinguish whether it is the CPU or the disk that is saturated. Load Average, as the name says, is already averaged, so we can't really observe short saturation spikes with it: it is averaged over at least one minute. Finally, the problem with Load Average is that it does not take the number of CPU cores/threads into account. Suppose I have a CPU-bound Load Average of 16, for example. That is quite a load and will cause high saturation and queueing if you have two CPU threads. But if you have 64 threads, then 16 becomes a trivial load with no saturation at all.

Let’s take a look at the Saturation Metrics graph:

It provides us with two metrics: one showing the CPU load and another showing the IO load. These values roughly correspond to the “r” and “b” columns in VMSTAT output:

These are sampled every second and then averaged over the reporting interval.

We also normalize the CPU load by dividing the raw number of runnable processes by the number of threads available. “Rocky” has 56 threads, which is why the normalized CPU load is about one even though the number of runnable processes shown by VMSTAT is around 50.
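In other words, a rough sketch of the normalization for the “Rocky” example above:

\text{normalized CPU load} = \frac{\text{runnable processes}}{\text{CPU threads}} \approx \frac{50}{56} \approx 0.9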

We do not normalize the IO load, as systems can have multiple IO devices, and the number of requests they can handle in parallel is largely unknown. If you want to understand specific IO device performance, you should check out the Disk Performance Dashboard.

Testing Saturation Metrics in Practice

Let’s see if saturation graphs indeed show us when CPU saturation is the issue. I will use a sysbench CPU test for illustration, run as:

sysbench cpu  --cpu-max-prime=100000 --threads=1 --time=60 run

This will use the given number of threads to execute compute jobs, each of which will compute the given number of prime numbers. If we have enough CPU resources available, with no saturation, the latency of executing such requests should be about the same. When we overload the system, so that there are not enough CPU execution units to process everything in parallel, the average latency should increase.

root@ts140i:/mnt/data# sysbench cpu  --cpu-max-prime=100000 --threads=1 --time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 100000

Initializing worker threads...

Threads started!

General statistics:
    total time:                          300.0234s
    total number of events:              12784

Latency (ms):
         min:                                 23.39
         avg:                                 23.47
         max:                                 28.07
         95th percentile:                     23.52
         sum:                             300018.06

As we can see with one thread working, the average time it takes to handle a single request is 23ms. Obviously, there is no saturation happening in this case:

“Ts140i” has four CPU cores, and as you can see the Normalized CPU Load stays below one. You may wonder why it isn't closer to 0.25 in this case, with one active thread and four cores available. The reason is that at exactly the time the metrics are being captured, there often happen to be an additional two to three threads active to facilitate the process. They are only active for a very few milliseconds at a time, so they do not produce much load, but they tend to skew the number a little bit.

Let's now run with four threads. The number of threads matches the number of CPU cores available (and they are true cores in this case, no hyperthreading). In this case, we don't expect much of an increase in the event processing time.

root@ts140i:/mnt/data# sysbench cpu  --cpu-max-prime=100000 --threads=4 --time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time

Prime numbers limit: 100000

Initializing worker threads...

Threads started!

General statistics:
    total time:                          300.0215s
    total number of events:              48285

Latency (ms):
         min:                                 24.19
         avg:                                 24.85
         max:                                 43.61
         95th percentile:                     24.83
         sum:                            1200033.93

As you can see, the test confirms the theory: average latency increased by just about 6%, with the Normalized CPU Load in the saturation metrics mostly hovering between 1 and 2:

Let’s now do the test with 16 threads, which is four times more than available CPU cores. We should see the latency dramatically increase due to CPU overload (or saturation). The same will happen to your CPU bound MySQL queries if you have more concurrency than CPUs available.

root@ts140i:/mnt/data# sysbench cpu  --cpu-max-prime=100000 --threads=16 --time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 16
Initializing random number generator from current time

Prime numbers limit: 100000

Initializing worker threads...

Threads started!

General statistics:
    total time:                          300.0570s
    total number of events:              48269

Latency (ms):
         min:                                 27.83
         avg:                                 99.44
         max:                                189.05
         95th percentile:                    121.08
         sum:                            4799856.52

We can see it takes about four times longer to process each request due to CPU overload and queueing. Let’s see what saturation metrics tell us:

As you can see, Normalized CPU Load floats between four and five on the graph, consistent with saturation we’re observing.

You may ask whether the CPU utilization graph helps us here. Not really. You will see 100% CPU usage for both the run with four threads and the run with 16 threads, while request latencies are completely different.

Summary

As we can see from our test, Normalized CPU Load is very helpful for understanding when the CPU is overloaded. An overloaded CPU causes response times to increase and performance to degrade. Furthermore, you can use it to (roughly) see how serious the overload is. As a rule of thumb, if you see Normalized CPU saturation over two, it indicates your CPUs are overloaded.

ProxySQL and MySQL 8.0.2 (la suite)

Recently I wrote a blog post on how to use ProxySQL with 8.0.2 dmr

That blog post was related to the change we implemented in Group Replication's monitoring (performance_schema tables) and how the sys schema should be changed to modify the view used by ProxySQL accordingly.

However, I used my lab machines, which I was also using for other purposes, and some changes I had made hid a problem that prevents ProxySQL from working with the new defaults in MySQL 8.0.2. Thank you Andy for discovering this.

In fact the new default collation (utf8mb4_0900_ai_ci) is not yet supported by ProxySQL (1.4.1 is the most recent release supporting MySQL Group Replication).

mysql> show global variables like '%colla%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database   | utf8mb4_0900_ai_ci |
| collation_server     | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)

You can get the list of supported collations in ProxySQL using this query in ProxySQL’s admin interface:

ProxySQL> select * from mysql_collations where charset='utf8mb4';
+-----+-----------------------+---------+---------+
| Id  | Collation             | Charset | Default |
+-----+-----------------------+---------+---------+
| 45  | utf8mb4_general_ci    | utf8mb4 | Yes     |
| 46  | utf8mb4_bin           | utf8mb4 |         |
| 224 | utf8mb4_unicode_ci    | utf8mb4 |         |
| 225 | utf8mb4_icelandic_ci  | utf8mb4 |         |
| 226 | utf8mb4_latvian_ci    | utf8mb4 |         |
| 227 | utf8mb4_romanian_ci   | utf8mb4 |         |
| 228 | utf8mb4_slovenian_ci  | utf8mb4 |         |
| 229 | utf8mb4_polish_ci     | utf8mb4 |         |
| 230 | utf8mb4_estonian_ci   | utf8mb4 |         |
| 231 | utf8mb4_spanish_ci    | utf8mb4 |         |
| 232 | utf8mb4_swedish_ci    | utf8mb4 |         |
| 233 | utf8mb4_turkish_ci    | utf8mb4 |         |
| 234 | utf8mb4_czech_ci      | utf8mb4 |         |
| 235 | utf8mb4_danish_ci     | utf8mb4 |         |
| 236 | utf8mb4_lithuanian_ci | utf8mb4 |         |
| 237 | utf8mb4_slovak_ci     | utf8mb4 |         |
| 238 | utf8mb4_spanish2_ci   | utf8mb4 |         |
| 239 | utf8mb4_roman_ci      | utf8mb4 |         |
| 240 | utf8mb4_persian_ci    | utf8mb4 |         |
| 241 | utf8mb4_esperanto_ci  | utf8mb4 |         |
| 242 | utf8mb4_hungarian_ci  | utf8mb4 |         |
| 243 | utf8mb4_sinhala_ci    | utf8mb4 |         |
+-----+-----------------------+---------+---------+
22 rows in set (0.00 sec)

If you leave the collation as it is, ProxySQL won’t see the nodes online (because it cannot connect to MySQL) and after a short time, MySQL will blacklist the connection attempt:

ProxySQL> select hostgroup_id, hostname, status from runtime_mysql_servers;
+--------------+--------------+---------+
| hostgroup_id | hostname     | status  |
+--------------+--------------+---------+
| 2            | 192.168.90.2 | SHUNNED |
| 2            | 192.168.90.4 | SHUNNED |
| 2            | 192.168.90.3 | SHUNNED |
+--------------+--------------+---------+
3 rows in set (0.01 sec)

In ProxySQL’s log file you will see similar entries:

2017-08-03 17:14:15 MySQL_Monitor.cpp:1347:monitor_ping(): [ERROR] Server 192.168.90.2:3306 missed 3 heartbeats, shunning it and killing all the connections
2017-08-03 17:14:15 MySQL_Monitor.cpp:1347:monitor_ping(): [ERROR] Server 192.168.90.3:3306 missed 3 heartbeats, shunning it and killing all the connections
2017-08-03 17:14:15 MySQL_Monitor.cpp:1347:monitor_ping(): [ERROR] Server 192.168.90.4:3306 missed 3 heartbeats, shunning it and killing all the connections

You can also check in the table mysql_server_ping_log in ProxySQL’s admin interface, and you will find entries like these:

*************************** 178. row ***************************
            hostname: 192.168.90.2
                port: 3306
       time_start_us: 1501818227091532
ping_success_time_us: 0
          ping_error: Host '192.168.90.2' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
*************************** 179. row ***************************
            hostname: 192.168.90.3
                port: 3306
       time_start_us: 1501818227103737
ping_success_time_us: 0
          ping_error: Can't initialize character set unknown (path: compiled_in)

So you will need to change the default collation. I added the following line to my.cnf (under the [mysqld] section) and restarted the node:

collation-server = utf8mb4_general_ci

This fixed the problem!

ProxySQL> select hostgroup_id, hostname, status from runtime_mysql_servers;
+--------------+--------------+--------+
| hostgroup_id | hostname     | status |
+--------------+--------------+--------+
| 2            | 192.168.90.3 | ONLINE |
| 3            | 192.168.90.4 | ONLINE |
| 3            | 192.168.90.2 | ONLINE |
+--------------+--------------+--------+
3 rows in set (0.00 sec)
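For completeness, the same defaults can also be changed at runtime before retrying from ProxySQL; a sketch (only new sessions pick up the change, and you still want my.cnf updated so it survives a restart):

SET GLOBAL character_set_server = 'utf8mb4';
SET GLOBAL collation_server     = 'utf8mb4_general_ci';
-- If the monitor connections were already blacklisted, clear the host cache:
FLUSH HOSTS;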

In conclusion, if you need to use ProxySQL with MySQL 8.0.2 dmr (and not only for Group Replication or InnoDB Cluster), you will need to change the default collation as long as the new one is not yet supported. Of course René is already aware of this.

Again, thank you René for the amazing work on ProxySQL, and thank you Andy for trying MySQL 8.0.2 dmr. Don't forget: your feedback is important and really appreciated!

Comparing high availability for Google Cloud SQL and Amazon RDS

Cloud SQL is Google Cloud’s DBaaS platform that supports MySQL and now PostgreSQL in beta. RDS is AWS’s DBaaS. They both have “high availability” options which provide multi-zone redundancy and an automated failover mechanism. While they sound exactly the same at a high level, they have some important differences! This post describes some of the bigger differences I’ve noticed. I’m also focusing on MySQL.

Implementation

Cloud SQL

Reference

Cloud SQL is implemented using semisynchronous replication. This isn’t surprising since Google contributed the semisynchronous replication support for MySQL.

You can read more about semisynchronous replication in MySQL in the reference manual.

RDS

Reference

The RDS information pages on the AWS web site don’t actually specify the implementation besides the fact that it’s synchronous. The fact that it’s implemented using DRBD has been mentioned in some talks and slide decks, like this one from the re:Invent 2016 Deep Dive on Amazon Aurora talk (available on YouTube).

Because replication happens at the block device level (below MySQL), the standby instance doesn’t actually have MySQL running!

Read scaling

Cloud SQL

Reference

You can use the failover replica on Cloud SQL to serve reads.

RDS

Reference

The high-availability feature is not a scaling solution for read-only scenarios; you cannot use a standby replica to serve read traffic.

You cannot serve read traffic from the standby replica.

Failover considerations

Cloud SQL

Reference

After the failover, the replica becomes the master, and Cloud SQL automatically creates a new failover replica in another zone. If you located your Cloud SQL instance to be near other resources, such as a Compute Engine instance, you can relocate your Cloud SQL instance back to its original zone when the zone becomes available. Otherwise, there is no need to relocate your instance after a failover.

During failover, all existing connections will be dropped. However, after failover, your applications can use the same IP address to connect to the new primary instance.

This Google Groups post by Jay on the Cloud SQL team also has some great answers.

4) My failover replica has an IP.. do I need to change my clients to use this IP, or will the old primary IP now start pointing at the failover replica? i.e. is this really a floating IP that gets moved?

No. There should be zero change required in your clients. After the failover, your client still connects to the old primary IP, which now points to the primary instance that has been moved to a healthy zone.

5) Can I use my failover replica as a read slave, or must it just sit idle until an event?

Yes. A failover replica is perfectly capable of being served as a read replica.

6) What happens to the old primary in a failover after it comes back. Does it become a failover replica for the new primary, or do I need to do something by hand?

The primary stays as primary before and after the failover process. It is just moved to a healthy zone. Therefore there is no such thing as “old primary comes back” as it always there, and there is nothing need to be done by hand.

7) How do I reset my original primary to be the real master after a failover event is complete?

Same as questions 6.

It’s not entirely clear how long failover time would be. The second post in that thread mentions InnoDB recovery, which could take a long time. However, the references pages say that the replica becomes the master, so I’m guessing that failover should be pretty quick.

RDS

Reference

Multi-AZ failover on RDS uses a DNS change to point to the standby instance. The reference page mentions 60-120 seconds of unavailability during the failover. Because the standby uses the same storage data as the primary, there will probably be InnoDB recovery, so failover time might be longer than that.

Backups

Cloud SQL

Reference

Backups must be done on the master instance.

RDS

Reference

Backups using the Multi-AZ configuration can be done on the standby because its storage data is the same as the master’s.

Sweet and sour can become bitter

 

Recently Fred published a post (http://lefred.be/content/mysql-group-replication-is-sweet-but-can-be-sour-if-you-misunderstand-it) in which he stated that the blog post I had published (http://www.tusacentral.net/joomla/index.php/mysql-blogs/191-group-replication-sweet-a-sour.html) contains a few “misunderstandings”.

 

All the people that know me also know I have no problem admitting when I make a mistake; in the end, making mistakes is a way of learning, and you should be wary of those who never make any.

But at the same time, I need to see proof of it, just as I provided proof of what I wrote with numbers and tests (all available on GitHub).

Let us put down the basis of the discussion with facts, not personal speculation or assertions.

Facts:

1) From the official MySQL documentation (https://dev.mysql.com/doc/refman/5.7/en/group-replication-use-cases.html):

Group Replication enables you to create fault-tolerant systems with redundancy by replicating the system state throughout a set of servers. Consequently, even if some of the servers fail, as long it is not all or a majority, the system is still available, and all it could have degraded performance or scalability, it is still available. Server failures are isolated and independent. They are tracked by a group membership service which relies on a distributed failure detector that is able to signal when any servers leave the group, either voluntarily or due to an unexpected halt. There is a distributed recovery procedure to ensure that when servers join the group they are brought up to date automatically. There is no need for server fail-over, and the multi-master update everywhere nature ensures that not even updates are blocked in the event of a single server failure. Therefore MySQL Group Replication guarantees that the database service is continuously available.

 

2) Still from MySQL (https://dev.mysql.com/worklog/task/?id=9838): Group Replication introduced flow control to avoid excessive buffering and to maintain group members reasonably close to one another. For several reasons, it's more efficient to keep buffering low but, as mentioned before, it is not a requirement that members are kept in sync for replication to work: once a slave becomes delayed it will just increase the number of pending transactions in the relay log.

The flow control mechanism enters the scene to bound the back-log members can accumulate, in terms of transactions to certify and transactions to apply. Once those bounds are exceeded, and while they remain exceeded, the flow-control mechanism limits the throughput of writer members to adjust to the capacity of the slowest members of the group. By imposing conservative bounds it tries to make sure all members are able to keep-up and can return safely to within the bounds defined by the user.

 

 

So given the above, I, as a “standard” user, read that as “when using MySQL and GR, and setting (eventually) the flow control correctly, I will have a data platform that is continuously available thanks to the use of GR”.

Cool, right? So what was I doing in my tests? Well, two main things: one was to see whether I can use GR for scaling reads, as I (and hundreds of customers) have been doing with PXC; the other was to see whether, in case of a crash of my master, I can safely fail over to another slave.

This doesn't mean I am comparing the way the two products work. I could not care less about that at this stage, and I am sure most customers will not care either. What they care about is what they can do SAFELY, and with what. All the mumbo-jumbo about the technical details (how sync or async I can be) is not their concern.

So the point was and is… given that I am used to product X, can I move to product Y, and if so, how, and what should I be aware of?

Once more I was trying to answer the question: “If GR and InnoDB Cluster are to work as an alternative to other (virtually) synchronous replication mechanisms, what changes or shifts must our customers consider if they want to move from one to the other?”

The outcome of the tests is meant to answer that, period.

 

Let us clarify two things more:

I am perfectly aware (Fred, talking to you) that GR uses a different mechanism for FC, and that the numbers set in group_replication_flow_control_certifier_threshold / group_replication_flow_control_applier_threshold are then used to calculate the quota. Still, they are user thresholds, and they specify “the number of waiting transactions in the applier/certifier queue that trigger flow control”, which connects to the final statement reported above: “By imposing conservative bounds it tries to make sure all members are able to keep-up and can return safely to within the bounds defined by the user.”
A bound that, per the manual, can go from 0 to 2147483648.
As such, setting it to 25 (I also tried 1000 or more, with even worse results) is perfectly legitimate, and I have some trouble considering it a mistake.
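For reference, this is how the thresholds in question are set (a sketch using the value 25 from my tests; adjust to taste):

SET GLOBAL group_replication_flow_control_certifier_threshold = 25;
SET GLOBAL group_replication_flow_control_applier_threshold   = 25;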

 

I was measuring the lag with the only tools MySQL/Oracle has given us; in the article I said I had used:
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-2),':',1),'-',-1) last_received
  FROM performance_schema.replication_connection_status
  WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

This uses the only information available at the moment regarding the incoming and the currently executed transactions.
Fred says that is not valid because of the certification, and that “a transaction listed on last_received_transaction_set is certified and queued to apply on this member. On the other members it may already be certified/applied or soon be.”
I agree that it may not be perfect, and I should have said: it is for sure the last applied, or soon-to-be-applied, transaction that a node can know about, given the certification on the master.
But here we are not talking about 1 or 10 entries; in most cases it is a lag of hundreds or thousands of entries. So even if this is not perfect and I may miss a couple of entries one way or another because they are still going through certification, I still think it is a valid way to evaluate the lag given the magnitudes we are talking about.

BTW, if this is not the right way… well, please help us and provide the right way to calculate the REAL lag between each node in a GR cluster and the writing master, with the most accurate precision.
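In the meantime, the only other backlog-related numbers I know of that are exposed over SQL come from the member statistics table; a sketch (on 5.7 this shows the local member only):

SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,   -- waiting for certification
       COUNT_TRANSACTIONS_CHECKED
FROM performance_schema.replication_group_member_stats;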

 

About the comment on the node dimensions: well, thanks, and yes, you are right, here I forgot to include the details.

The four nodes are VMs on separate hosts: gr1/gr2 were hosted on Host1 while gr3/gr4 were hosted on another host. Nothing else was running on the hosts while the test was running. The applications and monitoring were running on a third host. The exact dimensions are not really relevant given that all of them were the same.

I want to say that I could set up the perfect environment for GR or PXC or NDB or whatever you are testing, and show how cool it is.

But I think we should consider that real life is far from being like that, and that what we must see/test is how a product behaves in adverse, or at least challenging, conditions.

 

Given all the above, in my opinion the tests clearly showed that even if the Flow Control mechanism in GR is the coolest thing ever conceptualized, in the end it does not do what it is supposed to, no matter what.

Was I setting the thresholds wrong? Well, I am not sure about that, given the results I got.

If I have 4 nodes and 2 of them (so… no majority) are lagging behind by hundreds or even thousands of entries, while my threshold is on the order of hundreds or less, this means that the cool mechanism is not doing its job, period.

It is one thing to write down whatever about this or that, but another to do tests and provide numbers and evidence, not just words.

As I said in a comment (not published on Fred's blog), I am more than happy to perform more tests and to do them in conjunction with anyone else.

Then if I am wrong, I am wrong… but until I have different evidence, just saying a car can fly because it has very nice and colorful wings doesn't make it fly.

 


orchestrator/raft: Pre-Release 3.0

orchestrator 3.0 Pre-Release is now available. Most notable are Raft consensus, SQLite backend support, orchestrator-client no-binary-required client script.

TL;DR

You may now set up high availability for orchestrator via raft consensus, without the need to set up high availability for orchestrator's backend MySQL servers (such as Galera/InnoDB Cluster). In fact, you can run an orchestrator/raft setup using the embedded SQLite backend DB. Read on.

orchestrator still supports the existing shared backend DB paradigm; nothing dramatic changes if you upgrade to 3.0 and do not configure raft.

orchestrator/raft

Raft is a consensus protocol, supporting leader election and consensus across a distributed system.  In an orchestrator/raft setup orchestrator nodes talk to each other via raft protocol, form consensus and elect a leader. Each orchestrator node has its own dedicated backend database. The backend databases do not speak to each other; only the orchestrator nodes speak to each other.

No MySQL replication setup needed; the backend DBs act as standalone servers. In fact, the backend server doesn't have to be MySQL, and SQLite is supported. orchestrator now ships with SQLite embedded, no external dependency needed.

In an orchestrator/raft setup, all orchestrator nodes talk to each other. One and only one is elected as leader. To become a leader a node must be part of a quorum. On a 3 node setup, it takes 2 nodes to form a quorum. On a 5 node setup, it takes 3 nodes to form a quorum.

Only the leader will run failovers. This much is similar to the existing shared-backend DB setup. However, in an orchestrator/raft setup each node is independent, and each orchestrator node runs discoveries. This means a MySQL server in your topology will be routinely visited and probed not by one orchestrator node, but by all 3 (or 5, or what have you) nodes in your raft cluster.

Any communication to orchestrator must take place through the leader. One may not tamper directly with the backend DBs anymore, since the leader is the one authoritative entity to replicate and announce changes to its peer nodes. See orchestrator-client section following.

For details, please refer to the documentation.
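As a rough illustration (not from the original announcement), the raft-related part of a node's /etc/orchestrator.conf.json looks more or less like the fragment below; the key names and the default raft port are quoted from memory of the documentation, so verify them there before relying on them:

  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "<ip.or.fqdn.of.this.node>",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["<node1>", "<node2>", "<node3>"],

Each of the 3 (or 5) nodes lists the same RaftNodes and binds its own address.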

The orchestrator/raft setup comes to solve several issues, the most obvious being high availability for the orchestrator service: in a 3 node setup any single orchestrator node can go down and orchestrator will reliably continue probing, detecting failures and recovering from failures.

Another issue solved by orchestrator/raft is network isolation, in particular cross-DC isolation, also referred to as fencing. Some visualization will help describe the issue.

Consider this 3 data-center replication setup. The master, along with a few replicas, resides on DC1. Two additional DCs have intermediate masters, aka local-masters, that relay replication to local replicas.

We place 3 orchestrator nodes in a raft setup, each in a different DC. Note that traffic between orchestrator nodes is very low, and cross DC latencies still conveniently support the raft communication. Also note that backend DB writes have nothing to do with cross-DC traffic and are unaffected by latencies.

Consider what happens if DC1 gets network isolated: no traffic in or out of DC1.

Each orchestrator node operates independently, and each will see a different state. DC1's orchestrator will see all servers in DC2 and DC3 as dead, but figure the master itself is fine, along with its local DC1 replicas:

However both orchestrator nodes in DC2 and DC3 will see a different picture: they will see all DC1's servers as dead, with local masters in DC2 and DC3 having broken replication:

Who gets to choose?

In the orchestrator/raft setup, only the leader runs failovers. The leader must be part of a quorum. Hence the leader will be an orchestrator node in either DC2 or DC3. DC1's orchestrator will know it is isolated, that it isn't part of the quorum, hence will step down from leadership (that's the premise of the raft consensus protocol), hence will not run recoveries.

There will be no split brain in this scenario. The orchestrator leader, be it in DC2 or DC3, will act to recover and promote a master from within DC2 or DC3. A possible outcome would be:

What if you only have 2 data centers?

In such case it is advisable to put two orchestrator nodes, one in each of your DCs, and a third orchestrator node as a mediator, in a 3rd DC, or in a different availability zone. A cloud offering should do well:

The orchestrator/raft setup plays nice and allows one to nominate a preferred leader.

SQLite

Suggested and requested by many was the removal of orchestrator's own dependency on a MySQL backend. orchestrator now supports a SQLite backend.

SQLite is a transactional, relational, embedded database, and as of 3.0 it is embedded within orchestrator, no external dependency required.

SQLite doesn't replicate, doesn't support client/server protocol. As such, it cannot work as a shared database backend. SQLite is only available on:

  • A single node setup: good for local dev installations, testing server, CI servers (indeed, SQLite now runs in orchestrator's CI)
  • orchestrator/raft setup, where, as noted above, backend DBs do not communicate with each other in the first place and are each dedicated to their own orchestrator node.
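Configuration-wise, pointing a node at the embedded SQLite backend should be a matter of a couple of keys in the config file; the key names below are quoted from memory of the documentation, so double-check them there:

  "BackendDB": "sqlite",
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.sqlite3",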

It should be pointed out that SQLite is a great transactional database, but MySQL is more performant. Load on the backend DB is directly (and mostly linearly) affected by the number of probed servers: whether you have 50 servers in your topologies or 500 servers matters. The probing frequency of course also matters for the write frequency on your backend DB. I would suggest sticking with MySQL if you have thousands of probed servers; if dozens, SQLite should be good to go. In between is a gray zone, and in any case run your own experiments.

At this time SQLite is configured to commit to file; there is a different setup where SQLite keeps its data in-memory, which makes execution faster, with occasional dumps to disk required for durability. orchestrator may support this mode in the future.

orchestrator-client

You install orchestrator as a service on a few boxes; but then how do you access it from other hosts?

  • Either curl the orchestrator API
  • Or, as most do, install orchestrator-cli package, which includes the orchestrator binary, everywhere.

The latter implies:

  • Having the orchestrator binary installed everywhere, hence updated everywhere.
  • Having /etc/orchestrator.conf.json deployed everywhere, along with credentials.

The orchestrator/raft setup does not support running orchestrator in command-line mode. Reason: in this mode orchestrator talks directly to the shared backend DB. There is no shared backend DB in the orchestrator/raft setup, and all communication must go through the leader service. This is a change of paradigm.

So, back to curling the HTTP API. Enter orchestrator-client which mimics the command line interface, while running curl | jq requests against the HTTP API. orchestrator-client, however, is just a shell script.

orchestrator-client will work well on either orchestrator/raft or on your existing non-raft setups. If you like, you may replace your remote orchestrator installations and your /etc/orchestrator.conf.json deployments with this script. You will need to provide the script with a hint: the $ORCHESTRATOR_API environment variable should be set to point to the orchestrator HTTP API.

Here's the fun part:

  • You will either have a proxy on top of your orchestrator service cluster, and you would export ORCHESTRATOR_API=http://my.orchestrator.service/api
  • Or you will provide orchestrator-client with all orchestrator node identities, as in export ORCHESTRATOR_API="https://orchestrator.host1:3000/api https://orchestrator.host2:3000/api https://orchestrator.host3:3000/api" .
    orchestrator-client will figure the identity of the leader and will forward requests to the leader. At least scripting-wise, you will not require a proxy.
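To make that concrete, here is a hedged sketch of talking to the service; the endpoint paths and the -c command name are examples, check the documentation for the authoritative list:

export ORCHESTRATOR_API="https://orchestrator.host1:3000/api https://orchestrator.host2:3000/api https://orchestrator.host3:3000/api"

# orchestrator-client mimics the CLI, e.g. listing the known clusters:
orchestrator-client -c clusters

# or curl a node directly; a non-leader should answer differently on the leader check:
curl -s https://orchestrator.host1:3000/api/leader-check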
Status

orchestrator 3.0 is a Pre-Release. We are running a mostly-passive orchestrator/raft setup in production. It is mostly-passive in that it is not in charge of failovers yet. Otherwise it probes and analyzes our topologies, as well as runs failure detection. We will continue to improve operational aspects of the orchestrator/raft setup (see this issue).

 

Percona Server for MySQL 5.6.36-82.1 is Now Available

Percona is glad to announce the release of Percona Server for MySQL 5.6.36-82.1 on August 1, 2017 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.6.36, including all the bug fixes in it, Percona Server for MySQL 5.6.36-82.1 is the current GA release in the Percona Server for MySQL 5.6 series. All of Percona's software is open-source and free; all the details of the release can be found in the 5.6.36-82.1 milestone at Launchpad.

Please note that RHEL 5, CentOS 5 and Ubuntu versions 12.04 and older are not supported in future releases of Percona Server for MySQL, and no further packages will be added for these distributions.

New Features

  • Percona Server for MySQL can now be built with support of OpenSSL 1.1.
  • Percona Server for MySQL is now available on Debian 9 (stretch). The support only covers the amd64 architecture.
  • TokuDB now allows killing a query that is waiting for an FT locktree lock.
Bugs Fixed
  • Row counts in TokuDB could be lost intermittently after restarts. Bug fixed #2.
  • In TokuDB, two races in the fractal tree lock manager could significantly affect transactional throughput for some applications that used a small number of concurrent transactions. These races manifested as transactions unnecessarily waiting for an available lock. Bug fixed #3.
  • TokuDB could assert when opening a dictionary, writing no useful information to the error log. Bug fixed #23.
  • TokuDB could assert for various reasons deserializing nodes with no useful error output. Bug fixed #24.
  • Percona Server could crash when running a query over a partitioned table that uses an index to read a range of rows if this range was not covered by any existing partition. Bug fixed #1657941 (upstream #76418).
  • With two client connections to a server (debug server build), the server could crash after one of the clients set the global option userstat and flushed the client statistics (FLUSH CLIENT_STATISTICS) and then both clients were closed. Bug fixed #1661488.
  • TokuDB did not pass cmake flags on to snappy cmake. Bug fixed #41.
  • The progress status for partitioned TokuDB table ALTERs was misleading. Bug fixed #42.
  • When a client application connected to the Aurora cluster endpoint using SSL (the --ssl-verify-server-cert or --ssl-mode=VERIFY_IDENTITY option), wildcard and SAN enabled SSL certificates were ignored. See also the Compatibility Matrix. Note that the --ssl-verify-server-cert option is deprecated in Percona Server 5.7. Bug fixed #1673656 (upstream #68052).
  • Killing a stored procedure execution could result in an assert failure on a debug server build. Bug fixed #1689736 (upstream #86260).
  • It was not possible to build Percona Server on Debian 9 (stretch) due to issues with OpenSSL 1.1. Bug fixed #1702903 (upstream #83814).
  • The SET STATEMENT .. FOR statement changed the global instead of the session value of a variable if the statement occurred immediately after the SET GLOBAL or SHOW GLOBAL STATUS command. Bug fixed #1385352.
  • The synchronization between the LRU manager and page cleaner threads was not done at shutdown. Bug fixed #1689552.

Other bugs fixed: #6, #44, #65, #1160986, #1676740, #1689989, #1689998, #1690012, #1699788, and #1684601 (upstream #86016).

Compatibility Matrix

Feature                   | YaSSL | OpenSSL < 1.0.2 | OpenSSL >= 1.0.2
‘commonName’ validation   | Yes   | Yes             | Yes
SAN validation            | No    | Yes             | Yes
Wildcards support         | No    | No              | Yes

How to run multiple ProxySQL instances


Some time ago I wrote about the ability of running multiple ProxySQL instances listening on the same port(s).
This capability has been present since ProxySQL 1.3.0, but it is not much used.
Although the feature seems super interesting, how to use it is a bit unintuitive.
After a series of Q&A on Slack (mostly reported at the bottom of this blog post), I decided it is time to better describe this feature.
This blog post tries to describe how to use this feature, starting from the installation.

Note: this feature requires Linux kernel >= 3.9 .

Installation

In this example, installation is performed using deb packages from github.com repository:

root@nebula:~# wget -q https://github.com/sysown/proxysql/releases/download/v1.4.1/proxysql_1.4.1-ubuntu16_amd64.deb
root@nebula:~#
root@nebula:~# dpkg -i proxysql_1.4.1-ubuntu16_amd64.deb
Selecting previously unselected package proxysql.
(Reading database ... 269571 files and directories currently installed.)
Preparing to unpack .../proxysql_1.4.1-ubuntu16_amd64.deb ...
Unpacking proxysql (1.4.1) ...
Setting up proxysql (1.4.1) ...
Processing triggers for systemd (229-4ubuntu16) ...
Processing triggers for ureadahead (0.100.0-19) ...

During the installation, an empty datadir is created, as well as a minimal config file:

root@nebula:~# ls -l /var/lib/proxysql/
total 0
root@nebula:~# ls -l /etc/proxysql.cnf
-rw-r--r-- 1 root root 4517 ago 1 13:23 /etc/proxysql.cnf

Starting ProxySQL

When ProxySQL starts, it normally starts as a daemon. As part of this process, it checks if there is a pid file in the datadir and if another proxysql instance is running. If yes, it returns immediately.
Therefore running two instances of ProxySQL needs some extra configuration.
Although it is possible for two proxysql instances to share the same database file proxysql.db, it is a bit complex and we won't describe this scenario. We will instead run two proxysql instances, each with its own datadir and configuration file (a minimal configuration file!).

Preparing the datadirs

We will simply create two directories in /var/lib/proxysql , each directory representing the datadir of a process.

root@nebula:~# mkdir /var/lib/proxysql/proxy01
root@nebula:~# mkdir /var/lib/proxysql/proxy02

Next, we will create 2 config files, one in each datadir. The only thing we will configure is the listener port of the Admin interface, because otherwise we won't be able to specify to which ProxySQL Admin we connect.
Note that it is also possible to configure two different Unix domain sockets, but in this example we will use TCP sockets.

root@nebula:~# cat > /var/lib/proxysql/proxy01/proxysql.cnf << EOF
> datadir="/var/lib/proxysql/proxy01"
> admin_variables=
> {
> mysql_ifaces="0.0.0.0:6031"
> }
> EOF
root@nebula:~# cat > /var/lib/proxysql/proxy02/proxysql.cnf << EOF
> datadir="/var/lib/proxysql/proxy02"
> admin_variables=
> {
> mysql_ifaces="0.0.0.0:6032"
> }
> EOF

Verify the files:

root@nebula:~# cat /var/lib/proxysql/proxy01/proxysql.cnf
datadir="/var/lib/proxysql/proxy01"
admin_variables=
{
mysql_ifaces="0.0.0.0:6031"
}
root@nebula:~# cat /var/lib/proxysql/proxy02/proxysql.cnf
datadir="/var/lib/proxysql/proxy02"
admin_variables=
{
mysql_ifaces="0.0.0.0:6032"
}

Now we can start two proxysql instances, each one with its config file:

root@nebula:~# proxysql -c /var/lib/proxysql/proxy01/proxysql.cnf
root@nebula:~# proxysql -c /var/lib/proxysql/proxy02/proxysql.cnf

Next, let's check what is running:

root@nebula:~# ps -eF | grep proxysql | grep cnf
root 16383 1855 0 8307 7052 6 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy01/proxysql.cnf
root 16384 16383 0 24920 31956 5 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy01/proxysql.cnf
root 16410 1855 0 8307 3368 3 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy02/proxysql.cnf
root 16411 16410 0 24920 8044 2 20:32 ? 00:00:00 proxysql -c /var/lib/proxysql/proxy02/proxysql.cnf

Perfect, we have four proxysql processes! Remember, when starting as a daemon, proxysql immediately forks:

  • the child process is the real service
  • the parent process checks the exit status of the child process and, if the child crashes, restarts it immediately (in under 15ms).
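A quick (and slightly brutal) way to see the parent at work, reusing the PIDs from the example above, is to kill the child and watch a new one appear under the same parent. Do not try this on a box serving traffic, since the client connections of the killed child are lost:

root@nebula:~# kill -9 16384
root@nebula:~# ps -eF | grep proxysql | grep cnf    # a new child PID appears under parent 16383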

We can now verify that the two services are using a datadir each:

root@nebula:~# ls -l /var/lib/proxysql/
total 8
drwxr-xr-x 2 root root 4096 ago 1 20:32 proxy01
drwxr-xr-x 2 root root 4096 ago 1 20:32 proxy02
root@nebula:~# ls -l /var/lib/proxysql/proxy01/
total 108
-rw-r--r-- 1 root root 87 ago 1 20:30 proxysql.cnf
-rw------- 1 root root 98304 ago 1 20:32 proxysql.db
-rw------- 1 root root 1239 ago 1 20:32 proxysql.log
-rw-r--r-- 1 root root 6 ago 1 20:32 proxysql.pid
root@nebula:~# ls -l /var/lib/proxysql/proxy02/
total 108
-rw-r--r-- 1 root root 87 ago 1 20:30 proxysql.cnf
-rw------- 1 root root 98304 ago 1 20:32 proxysql.db
-rw------- 1 root root 1239 ago 1 20:32 proxysql.log
-rw-r--r-- 1 root root 6 ago 1 20:32 proxysql.pid

Verify listening processes

Next, let's check which ports are used by the two proxysql instances:

root@nebula:~# netstat -ntap | grep proxysql
tcp 0 0 0.0.0.0:6031 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16411/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 16384/proxysql

As expected, they are listening on port 6031 (pid 16384, proxysql1), port 6032 (pid 16411, proxysql2), and port 6033 (both pid 16384 and 16411).

But why is each proxysql process reported 4 times as listening on port 6033?
In proxysql, each MySQL Thread listens on that port. Because mysql-threads=4 by default, each proxysql process has 4 threads listening on port 6033. That is, multiple threads are listening on the same socket:

root@nebula:~# lsof -i:6033 -n
COMMAND    PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
proxysql 16384 root  15u IPv4 705075      0t0  TCP *:6033 (LISTEN)
proxysql 16384 root  16u IPv4 705076      0t0  TCP *:6033 (LISTEN)
proxysql 16384 root  17u IPv4 705077      0t0  TCP *:6033 (LISTEN)
proxysql 16384 root  18u IPv4 705078      0t0  TCP *:6033 (LISTEN)
proxysql 16411 root  15u IPv4 700302      0t0  TCP *:6033 (LISTEN)
proxysql 16411 root  16u IPv4 700303      0t0  TCP *:6033 (LISTEN)
proxysql 16411 root  17u IPv4 700304      0t0  TCP *:6033 (LISTEN)
proxysql 16411 root  18u IPv4 700305      0t0  TCP *:6033 (LISTEN)

As a result, 8 threads are listening on the same port, 4 threads for each proxysql process.

Create a user and a backend in ProxySQL

To test how connections are load balanced, we first need to create a user in ProxySQL to allow clients to connect.
We will also configure a backend.
Because each proxysql instance has its own configuration, both need to be configured.

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "INSERT INTO mysql_users (username,password) VALUES ('sbtest','sbtest'); LOAD MYSQL USERS TO RUNTIME;" ; done
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "INSERT INTO mysql_servers (hostname) VALUES ('127.0.0.1'); LOAD MYSQL SERVERS TO RUNTIME;" ; done
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.

Load balance traffic

It is now time to check how the kernel will load balance the traffic.
We will run very simple queries, specifically SELECT 1 using mysqlslap :

root@nebula:~# time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 1000 -q "SELECT 1"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
    Average number of seconds to run all queries: 0.000 seconds
    Minimum number of seconds to run all queries: 0.000 seconds
    Maximum number of seconds to run all queries: 0.004 seconds
    Number of clients running queries: 8
    Average number of queries per client: 1

real    0m1.131s
user    0m0.344s
sys     0m0.868s

How do we verify how many queries were executed and connections established on each proxysql?

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 4034           |
| Questions                   | 4033           |
+-----------------------------+----------------+
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 3967           |
| Questions                   | 3967           |
+-----------------------------+----------------+

The kernel load balanced the traffic almost evenly:

  • 50.41% of the connections to proxysql1
  • 49.59% of the connections to proxysql2
PROXYSQL PAUSE

Very old versions of ProxySQL (maybe 0.2 or 1.0) had a command to gracefully shut down the listener. As ProxySQL became more modular and complex, this feature was somehow broken, but it was finally restored in 1.4.1, following a feature request.
The command PROXYSQL PAUSE stops accepting new connections, while continuing to process the requests of the current connections.
Why is this feature important? If there is a need to restart ProxySQL (for example to perform an upgrade), it is possible to execute PROXYSQL PAUSE, wait for all the clients to disconnect, and finally stop the process, so that clients' connections are terminated gracefully.
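Putting it together, a graceful in-place upgrade of one of the two instances above could look like the following sketch (ports and credentials are the ones used throughout this post; the actual stop/upgrade step is left as a comment because it depends on how you run and package proxysql):

# stop accepting new connections on proxysql1 only
mysql -u admin -padmin -h 127.0.0.1 -P6031 -e "PROXYSQL PAUSE"

# wait for the existing clients to drain
mysql -u admin -padmin -h 127.0.0.1 -P6031 -e "SELECT * FROM stats_mysql_global WHERE Variable_Name='Client_Connections_connected'"

# when the counter reaches 0: stop this instance, upgrade the binary,
# start it again and issue PROXYSQL RESUME (proxysql2 keeps serving port 6033 meanwhile)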

Let's try this by running mysqlslap again for a longer period of time, and issuing PROXYSQL PAUSE on one of the proxies.

(Note: both proxies were erroneously restarted between the previous test and this one, therefore pids are changed and counters reset)

root@nebula:~# ( ( sleep 2 ; mysql -u admin -padmin -h 127.0.0.1 -P6031 -e "PROXYSQL PAUSE") & ) ; time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 300 -q "SELECT sleep(0.01)"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Benchmark
    Average number of seconds to run all queries: 0.011 seconds
    Minimum number of seconds to run all queries: 0.010 seconds
    Maximum number of seconds to run all queries: 0.111 seconds
    Number of clients running queries: 8
    Average number of queries per client: 1

real    0m3.763s
user    0m0.204s
sys     0m0.632s

Let's check where the queries were executed. We expect that proxysql1 received fewer connections, because its listener was stopped during the test:

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 644            |
| Questions                   | 644            |
+-----------------------------+----------------+
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 1757           |
| Questions                   | 1756           |
+-----------------------------+----------------+

One important thing to highlight is that mysqlslap didn't receive any errors: this is because when the listener is stopped, current connections are not terminated.

Now let's check listeners. Only one process is listening on port 6033:

root@nebula:~# netstat -ntlp | grep proxysql
tcp 0 0 0.0.0.0:6031 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql

That means that if we execute mysqlslap again, all traffic will go to proxysql2:

root@nebula:~# time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 300 -q "SELECT sleep(0.01)"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
    Average number of seconds to run all queries: 0.011 seconds
    Minimum number of seconds to run all queries: 0.011 seconds
    Maximum number of seconds to run all queries: 0.018 seconds
    Number of clients running queries: 8
    Average number of queries per client: 1

real    0m3.817s
user    0m0.220s
sys     0m0.648s
root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 644            |
| Questions                   | 644            |
+-----------------------------+----------------+
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 4158           |
| Questions                   | 4156           |
+-----------------------------+----------------+

Confirmed! All new requests went to proxysql2.

PROXYSQL RESUME

Surprise! :)
ProxySQL doesn't just have the command PROXYSQL PAUSE, but also the command PROXYSQL RESUME: all the MySQL Threads inside ProxySQL will start listening again on the listener port. Let's run mysqlslap again, and while it runs we execute PROXYSQL RESUME on proxysql1:

root@nebula:~# ( ( sleep 2 ; mysql -u admin -padmin -h 127.0.0.1 -P6031 -e "PROXYSQL RESUME") & ) ; time mysqlslap --create-schema=sbtest -u sbtest -psbtest -h 127.0.0.1 -P6033 -c 8 -i 300 -q "SELECT sleep(0.01)"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Benchmark
    Average number of seconds to run all queries: 0.012 seconds
    Minimum number of seconds to run all queries: 0.011 seconds
    Maximum number of seconds to run all queries: 0.017 seconds
    Number of clients running queries: 8
    Average number of queries per client: 1

real    0m3.839s
user    0m0.224s
sys     0m0.700s

Let's now check where connections were established and queries executed:

root@nebula:~# for i in 6031 6032 ; do mysql -u admin -padmin -h 127.0.0.1 -P$i -e "SELECT * FROM stats_mysql_global WHERE VARIABLE_NAME IN ('Client_Connections_created','Questions')" 2> /dev/null ; done
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 1210           |
| Questions                   | 1210           |
+-----------------------------+----------------+
+-----------------------------+----------------+
| Variable_Name               | Variable_Value |
+-----------------------------+----------------+
| Client_Connections_created  | 5993           |
| Questions                   | 5990           |
+-----------------------------+----------------+

As expected, now proxysql1 has resumed operation and processing requests!
And finally, netstat confirms that both proxies are listening on port 6033:

root@nebula:~# netstat -ntlp | grep proxysql
tcp 0 0 0.0.0.0:6031 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13909/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 13936/proxysql

Conclusion

Running multiple ProxySQL instances on the same box, all listening on the same port(s), is a great way to provide HA, to perform graceful upgrades, or to test new functionalities or configurations.
All of this is completely transparent to the application.

Q & A

Below is the Q&A that we had on Slack on this subject.

Q. Wait, "Multiple ProxySQL instances can listen on same TCP port" - really? How did you do that?
A. Yes, actually, it's very easy, you just need Linux kernel 3.9+. Yet, a lot of production systems use 2.6. (Example, CentOS 6.x used 2.6 and current CentOS 7 uses 3.10.)

Q. Wow. The kernel must round robin incoming connections to listening processes? And keeps track by source IP and TCP port number?
A. Yes. Once a connection is bound to a process, it stays locked there, until it is disconnected.

Q. So tell me again how I can use this in ProxySQL.
A. The new command "PROXYSQL PAUSE" allows a graceful shutdown. If you have 2 proxies, you can issue PROXYSQL PAUSE in one of them, and from that moment on, all new connections will go to the other proxy. When the first proxy has no more connections, you can shut it down.

Q. And my connecting applications will never know the difference! I could run /usr/bin/proxysql01 and /usr/bin/proxysql02 both on port 6033, then PAUSE proxysql01, shut it down, and upgrade just that proxysql01 binary, while traffic still routes through 02?
A. Yes you can. If you run them in the foreground, that is easy. If you run them in the background, you need to specify 2 different datadirs, because otherwise the 2nd won't start, because it will find a pid file.

Q. Right, data directory management ... though you could technically have both proxysql01 and proxysql02 share the same proxysql.db file? Admin them both separately, tell each to SAVE BLAH TO DISK, one config database, but then you are overwriting the db file so better be careful you know what you are doing!
A. Exactly. They can share the same proxysql.db, but using different database files makes things easier.

Q. I guess you better also run them on separate admin TCP ports -- else what a nightmare.
A. Yes. Technically you can use different Unix domain sockets, but it just makes management more complex. Using different datadirs and different admin ports is way easier. Something like 6301, 6302, etc.

Q. But no matter what, I have to remember to make any configuration changes twice, on both instances, (or more) right?
A. Yes, you have to manage each instance separately, and be careful to keep them in sync since they are listening on the same host on the same port. Linux is choosing which one gets each incoming proxysql connection and you cannot control that. If you have different RUNTIME configs on same port you will get unpredictable results.

Q. I guess I could carefully use different pid files but share one proxysql.db file, SAVE {CONFIG} TO DISK in my master admin instance, then LOAD {CONFIG} FROM DISK and LOAD {CONFIG} TO RUNTIME on the other instances, to make sure they all had the same changes.
A. Yes, that would work. Example, LOAD MYSQL QUERY RULES FROM DISK and LOAD MYSQL QUERY RULES TO RUNTIME would bring up changes into your secondary instance(s) after saving from primary.

Q. In summary, this feature seems powerful but could be tricky to manage.
A. Yes, for administration here you must be careful. It is only needed if you want to be able to upgrade proxysql in-place without losing service for even a few seconds. Remember you can still do a lot with proxysql in its more typical mode of one binary running and listening on one port. You can still accept thousands of connections and route them to mysql backends based on almost any criteria.
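To make the sync workflow above concrete, here is a minimal sketch for the shared proxysql.db case discussed in the Q&A, using the admin ports from the earlier examples and query rules as the example object:

# on the instance treated as "primary": persist the runtime config to the shared proxysql.db
mysql -u admin -padmin -h 127.0.0.1 -P6031 -e "SAVE MYSQL QUERY RULES TO DISK"

# on the other instance(s): reload from the shared file and activate
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "LOAD MYSQL QUERY RULES FROM DISK; LOAD MYSQL QUERY RULES TO RUNTIME"

If you keep separate proxysql.db files, as in the setup earlier in this post, the simplest approach is to apply every change to both admin interfaces, for example with the same for-loop pattern used above for mysql_users and mysql_servers.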

Releasing ProxySQL 1.4.1


ProxySQL is a high performance, high availability, protocol aware proxy for MySQL, with a GPL license!
Today I am excited to announce the release of ProxySQL 1.4.1, the first stable release of series 1.4.x .
The previous version of ProxySQL 1.4 (1.4.0) was released in April and was tested in various environments, leading to what today is a stable 1.4 release.
A list of features, improvements and changes introduced in 1.4 are listed here.
As announced previously, all ProxySQL 1.3.x versions will mostly be bug-fix releases and are unlikely to introduce new features: all the new features will go into ProxySQL 1.4.
If you are currently using ProxySQL 1.3 in production, make sure to test 1.4 before upgrading, and to perform a backup before the upgrade itself.
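A minimal pre-upgrade backup, assuming the default datadir and config file locations (/var/lib/proxysql and /etc/proxysql.cnf; adjust if yours differ), could be as simple as:

# back up the on-disk configuration database and the config file before upgrading
cp -a /var/lib/proxysql/proxysql.db /var/lib/proxysql/proxysql.db.pre-1.4.1
cp -a /etc/proxysql.cnf /etc/proxysql.cnf.pre-1.4.1

Ideally copy proxysql.db while proxysql is stopped, or right after persisting the runtime configuration to disk, so that the file is consistent.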

A special thanks to all the people who have reported and keep reporting bugs: this makes each version of ProxySQL better than the previous one.
Please report any bugs or feature requests on the GitHub issue tracker.

Thanks

Group Replication: The Sweet and the Sour

In this blog, we’ll look at group replication and how it deals with flow control (FC) and replication lag. 

Overview

In the last few months, we had two main actors in the MySQL ecosystem: ProxySQL and Group-Replication (with the evolution to InnoDB Cluster). 

While I have extensively covered the first, my last serious work on Group Replication dates back to some lab version years past.

Given that Oracle decided to declare it GA, and Percona’s decision to provide some level of Group Replication support, I decided it was time for me to take a look at it again.

We've already seen a lot of coverage of many Group Replication topics. There are articles about Group Replication and performance, Group Replication and basic functionalities (or the lack of them, like automatic node provisioning), Group Replication and ProxySQL, and so on.

But one question kept coming up over and over in my mind. If Group Replication and InnoDB Cluster have to work as an alternative to other (virtually) synchronous replication mechanisms, what changes do our customers need to consider if they want to move from one to the other?

Solutions using Galera (like Percona XtraDB Cluster) must take into account a central concept: clusters are data-centric. What matters is the data and the data state. Both must be the same on each node at any given time (commit/apply). To guarantee this, Percona XtraDB Cluster (and other solutions) use a set of data validation and Flow Control processes that work to ensure a consistent cluster data set on each node.

The upshot of this principle is that an application can query ANY node in a Percona XtraDB Cluster and get the same data, or write to ANY node and know that the data is visible everywhere in the cluster at (virtually) the same time.

Last but not least, inconsistent nodes should be excluded and either rebuilt or fixed before rejoining the cluster.

If you think about it, this is very useful. Guaranteeing consistency across nodes allows you to transparently split write/read operations, failover from one node to another with very few issues, and more.

When I conceived of this blog on Group Replication (or InnoDB Cluster), I put myself in the customer shoes. I asked myself: “Aside from all the other things we know (see above), what is the real impact of moving from Percona XtraDB Cluster to Group Replication/InnoDB Cluster for my application? Since Group Replication still (basically) uses replication with binlogs and relaylog, is there also a Flow Control mechanism?” An alarm bell started to ring in my mind.

My answer is: “Let’s do a proof of concept (PoC), and see what is really going on.”

The POC

I set up a simple set of servers using Group Replication, with a very basic application performing writes on a single writer node and (eventually) reads on the other nodes.

You can find the schema definition here. Mainly I used the four tables from my windmills test suite — nothing special or specifically designed for Group Replication. I've used this test a lot for Percona XtraDB Cluster in the past, so it was a perfect fit.

Test Definition

The application will do very simple work, and I wanted to test four main cases:

  1. One thread performing one insert at each transaction
  2. One thread performing 50 batched inserts at each transaction
  3. Eight threads performing one insert at each transaction
  4. Eight threads performing 50 batched inserts at each transaction

As you can see, a pretty simple set of operations. Then I decided to test it using the following four conditions on the servers:

  1. One slave worker FC as default
  2. One slave worker FC set to 25
  3. Eight slave workers FC as default
  4. Eight slave workers FC set to 25
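For completeness, here is a sketch of how the "eight slave workers" conditions can be set on a member; variable names are the standard MySQL 5.7 ones, host and credentials are placeholders, and the "FC set to 25" runs change the group_replication_flow_control_applier_threshold / group_replication_flow_control_certifier_threshold variables in the same way:

mysql -u root -p -h gr2 -e "
STOP GROUP_REPLICATION;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_preserve_commit_order = ON;
SET GLOBAL slave_parallel_workers = 8;
START GROUP_REPLICATION;"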

Again nothing weird or strange from my point of view. I used four nodes:

  1. Gr1 Writer
  2. Gr2 Reader
  3. Gr3 Reader minimal latency (~10ms)
  4. Gr4 Reader minimal latency (~10ms)

Finally, I had to be sure I measured the lag in a way that allowed me to reference it consistently on all nodes. 

I think we can safely say that the incoming GTID (Received_transaction_set from replication_connection_status) is definitely the last change applied on the master that the slave node knows about. More recent changes could have occurred, but network delay can prevent them from being "received." The other point of reference is GTID_EXECUTED, which refers to the latest GTID processed on the node itself.

The closest query that can track the distance will be:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-2),':',1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-2),':',1),'-',-1) last_received
  FROM performance_schema.replication_connection_status
  WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

Or in the case of a single worker:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received
  FROM performance_schema.replication_connection_status
  WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

The result will be something like this:

+---------------+
| last_executed |
+---------------+
| 23607         |
+---------------+
+---------------+
| last_received |
+---------------+
| 23607         |
+---------------+
+----------+
| real_lag |
+----------+
|        0 |
+----------+
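During the tests it is handy to sample this continuously. The loop below is a hypothetical wrapper (not part of the test suite linked below), with host and credentials as placeholders, that prints the single-worker version of the lag once per second:

while true ; do
  mysql -u monitor -pmonitor -h gr2 -N -e "
    select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-1),'-',-1);
    select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1)
      FROM performance_schema.replication_connection_status
      WHERE Channel_name = 'group_replication_applier';
    select (@last_rec - @last_exec) as real_lag;"
  sleep 1
done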

The whole set of tests can be found here, with all the commands you need to run the application (you can find it here) and replicate the tests. I will focus on the results (otherwise this blog post would be far too long), but I invite you to see the details.

The Results Efficiency on Writer by Execution Time and Rows/Sec

Using the raw data from the tests (Excel spreadsheet available here), I was interested in identifying if and how the Writer is affected by the use of Group Replication and flow control.

Reviewing the graph, we can see that the Writer has a linear increase in the execution time (when using default flow control) that matches the increase in the load. Nothing there is concerning, and all-in-all we see what is expected if the load is light. The volume of rows at the end justifies the execution time.

It’s a different scenario if we use flow control. The execution time increases significantly in both cases (single worker/multiple workers). In the worst case (eight threads, 50 inserts batch) it becomes four times higher than the same load without flow control.

What happens to the inserted rows? In the application, I traced the rows inserted/sec. It is easy to see what is going on there:

We can see that the Writer with flow control activated inserts less than a third of the rows it processes without flow control. 

We can definitely say that flow control has a significant impact on the Writer performance. To clarify, let’s look at this graph:

Without flow control, the Writer processes a high volume of rows in a limited amount of time (results from the test of eight workers, eight threads, 50 insert batch). With flow control, the situation changes drastically. The Writer takes a long time processing a significantly smaller number of rows/sec. In short, performance drops significantly.

But hey, I'm OK with that if it means having a consistent data set across all nodes. In the end, Percona XtraDB Cluster and similar solutions pay a significant performance price to match the data-centric principle.

Let's see what happens on the other nodes.

Entries Lag

Well, this scenario is not so good:

When NOT using flow control, the nodes lag behind the writer significantly. Remember that by default flow control in Group Replication is set to 25000 entries (I mean 25K of entries!!!).

What happens is that as soon as I put some salt (see load) on the Writer, the slave nodes start to lag. When using the default single worker, that has a significant impact. When using multiple workers, we see that the lag happens mainly on the node(s) with minimal (10ms) network latency. The sad thing is that it is not really going down compared to the single-threaded worker, indicating that a minimal latency of 10ms is enough to affect replication.

Time to activate the flow control and have no lag:

Unfortunately, this is not the case. As we can see, the lag of the single worker remains high for Gr2 (154 entries). When using multiple workers, the Gr3/Gr4 nodes perform much better, with significantly less lag (but still high, at ~1K entries).

It is important to remember that at this point the Writer is processing one-third or less of the rows it is normally able to. It is also important to note that I set the flow control entry limit to 25, and the Gr3 (and Gr4) nodes are still lagging more than 1K entries behind.

To clarify, let's check the two graphs below:

Using the Writer (Master) as a baseline in entry #N, without flow control, the nodes (slaves) using Group Replication start to significantly lag behind the writer (even with a light load).

The distance in this PoC ranged from very minimal (with 58 entries), up to much higher loads (3849 entries):

Using flow control, the Writer (Master) diverges less, as expected. But even though it takes a significant drop in performance (to one-third or less), the nodes still lag. The worst case is up to 1363 entries.

I need to underline here that we have no further way (that I am aware of, anyway) to tune the lag and prevent it from happening.

This means an application cannot transparently split writes/reads and expect consistency. The gap is too high.

A Graph That Tells Us a Story

I used Percona Monitoring and Management (PMM) to keep an eye on the nodes while doing the tests. One of the graphs really showed me that Group Replication still has some “limits” as the replication mechanism for a cluster:

This graph shows the MySQL queries executed on all the four nodes, in the testing using 8-50 threads-batch and flow control. 

As you can see, the Gr1 (Writer) is the first one to take off, followed by Gr2. Nodes Gr3 and Gr4 require a bit more, given the binlog transmission (and 10ms delay). Once the data is there, they match (inconsistently) the Gr2 node. This is an effect of flow control asking the Master to slow down. But as previously seen, the nodes will never match the Writer. When the load test is over, the nodes continue to process the queue for additional ~130 seconds. Considering that the whole load takes 420 seconds on the Writer, this means that one-third of the total time on the Writer is spent syncing the slave AFTERWARDS.

The above graph shows the same test without flow control. It is interesting to see how the Writer moved above 300 queries/sec, while Gr2 stayed around 200 and Gr3/4 far below. The Writer was able to process the whole load in ~120 seconds instead of 420, while Gr3/4 continued to process the load for an additional ~360 seconds.

This means that without flow control set, the nodes lag around 360 seconds behind the Master. With flow control set to 25, they lag 130 seconds.

This is a significant gap.

Conclusions

Going back to the reason why I started this PoC: it looks like my application(s) are not a good fit for Group Replication, given that I have set up Percona XtraDB Cluster to scale out the reads and to efficiently move my writer to another node when I need to.

Group Replication is still based on asynchronous replication (as my colleague Kenny said). It makes sense in many other cases, but it doesn’t compare to solutions based on virtually synchronous replication. It still requires a lot of refinement.

On the other hand, for applications that can afford to have a significant gap between writers and readers it is probably fine. But … doesn’t standard replication already cover that? 

Reviewing the Oracle documentation (https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html), I can see why Group Replication, as part of the InnoDB cluster, could help improve high availability when compared to standard replication.

But I also think it is important to understand that Group Replication (and derived solutions like InnoDB cluster) is not comparable with, or a replacement for, data-centric solutions such as Percona XtraDB Cluster. At least not up to now.

Good MySQL to everyone.
