Planet MySQL

Handling Large Data Volumes with MySQL and MariaDB

Most databases grow in size over time. The growth is not always fast enough to impact the performance of the database, but there are definitely cases where that happens. When it does, we often wonder what could be done to reduce that impact and how we can ensure smooth database operations when dealing with data on a large scale.

First of all, let’s try to define what a “large data volume” means. For MySQL or MariaDB, we are talking about uncompressed InnoDB data. InnoDB works in a way that strongly benefits from available memory - mainly the InnoDB buffer pool. As long as the data fits there, disk access is minimized to handling writes only - reads are served out of memory. What happens when the data outgrows memory? More and more data has to be read from disk when there’s a need to access rows which are not currently cached. As the amount of data increases, the workload switches from CPU-bound towards I/O-bound. It means that the bottleneck is no longer the CPU (which was the case when the data fit in memory - data access in memory is fast, data transformation and aggregation is slower) but rather the I/O subsystem (CPU operations on data are way faster than accessing data on disk). With the increased adoption of flash, I/O-bound workloads are not as terrible as they used to be in the times of spinning drives (random access is way faster with SSDs), but the performance hit is still there.

Another thing we have to keep in mind is that we typically only care about the active dataset. Sure, you may have terabytes of data in your schema, but if you only have to access the last 5GB, this is actually quite a good situation. Sure, it still poses operational challenges, but performance-wise it should still be ok.

Let’s just assume, for the purpose of this blog (and this is not a scientific definition), that by a large data volume we mean a case where the active data size significantly outgrows the size of the memory. It can be 100GB when you have 2GB of memory, it can be 20TB when you have 200GB of memory. The tipping point is that your workload becomes strictly I/O bound. Bear with us while we discuss some of the options that are available for MySQL and MariaDB.

Partitioning

The historical (but perfectly valid) approach to handling large volumes of data is to implement partitioning. The idea behind it is to split a table into partitions, a sort of sub-tables. The split happens according to rules defined by the user. Let’s take a look at some examples (the SQL examples are taken from the MySQL 8.0 documentation).

MySQL 8.0 comes with the following types of partitioning:

  • RANGE
  • LIST
  • COLUMNS
  • HASH
  • KEY

It can also create subpartitions. We are not going to rewrite the documentation here, but we would still like to give you some insight into how partitions work. To create partitions, you have to define the partitioning key. It can be a column or, in the case of RANGE or LIST, multiple columns that will be used to define how the data should be split into partitions.

HASH partitioning requires user to define a column, which will be hashed. Then, the data will be split into user-defined number of partitions based on that hash value:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

In this case the hash will be created based on the value returned by the YEAR() function on the ‘hired’ column.

KEY partitioning is similar, with the exception that the user only defines which column should be hashed and the rest is up to MySQL to handle.

While HASH and KEY partitions randomly distribute data across the number of partitions, RANGE and LIST let the user decide what to do. RANGE is commonly used with time or date:

CREATE TABLE quarterly_report_status (
    report_id INT NOT NULL,
    report_status VARCHAR(20) NOT NULL,
    report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
    PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
    PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
    PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
    PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
    PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
    PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
    PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
    PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
    PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
    PARTITION p9 VALUES LESS THAN (MAXVALUE)
);

It can also be used with other types of columns:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

LIST partitioning works based on a list of values that distributes the rows across multiple partitions:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY LIST(store_id) (
    PARTITION pNorth VALUES IN (3,5,6,9,17),
    PARTITION pEast VALUES IN (1,2,10,11,19,20),
    PARTITION pWest VALUES IN (4,12,13,14,18),
    PARTITION pCentral VALUES IN (7,8,15,16)
);

What is the point of using partitions, you may ask? The main point is that lookups are significantly faster than with a non-partitioned table. Let’s say that you want to search for rows which were created in a given month. If you have several years’ worth of data stored in the table, this will be a challenge - an index will have to be used and, as we know, indexes help to find rows, but accessing those rows will result in a bunch of random reads from the whole table. If you have partitions created on a year-month basis, MySQL can just read all the rows from that particular partition - no need for accessing the index, no need for doing random reads: just read all the data from the partition, sequentially, and we are all set.

Partitions are also very useful in dealing with data rotation. If MySQL can easily identify rows to delete and map them to a single partition, instead of running DELETE FROM table WHERE …, which will use an index to locate rows, you can truncate the partition. This is extremely useful with RANGE partitioning - sticking to the example above, if we want to keep data for 2 years only, we can easily create a cron job which will remove the old partition and create a new, empty one for the next month.
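
As a sketch, the periodic rotation for the quarterly_report_status table above boils down to two statements (the partition names and boundary date here are illustrative):

ALTER TABLE quarterly_report_status DROP PARTITION p0;

ALTER TABLE quarterly_report_status REORGANIZE PARTITION p9 INTO (
    PARTITION p10 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-04-01 00:00:00') ),
    PARTITION p9 VALUES LESS THAN (MAXVALUE)
);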

InnoDB Compression

If we have a large volume of data (not necessarily thinking about databases), the first thing that comes to mind is to compress it. There are numerous tools that provide an option to compress your files, significantly reducing their size. InnoDB also has an option for that - both MySQL and MariaDB support InnoDB compression. The main advantage of using compression is the reduction in I/O activity. Data, when compressed, is smaller, thus it is faster to read and to write. A typical InnoDB page is 16KB in size; for an SSD this means 4 I/O operations to read or write (SSDs typically use 4KB pages). If we manage to compress 16KB into 4KB, we have just reduced I/O operations by a factor of four. It does not really help much with the dataset-to-memory ratio. Actually, it may even make it worse - MySQL, in order to operate on the data, has to decompress the page, yet it reads the compressed page from disk. This results in the InnoDB buffer pool storing 4KB of compressed data and 16KB of uncompressed data. Of course, there are algorithms in place to remove unneeded data (the uncompressed page will be removed when possible, keeping only the compressed one in memory), but you cannot expect too much of an improvement in this area.
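
A minimal sketch of enabling InnoDB table compression (the table definition and KEY_BLOCK_SIZE are illustrative; innodb_file_per_table must be enabled):

CREATE TABLE access_log (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload TEXT
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

-- or compress an existing table
ALTER TABLE access_log ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;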

It is also important to keep in mind how compression works with regard to storage. Solid state drives are the norm for database servers these days and they have a couple of specific characteristics. They are fast, and they don’t care much whether traffic is sequential or random (even though they still prefer sequential access over random). They are expensive for large volumes. They suffer from wear-out, as they can handle a limited number of write cycles. Compression significantly helps here - by reducing the size of the data on disk, we reduce the cost of the storage layer for the database. By reducing the size of the data we write to disk, we increase the lifespan of the SSD.

Unfortunately, even if compression helps, for larger volumes of data it still may not be enough. Another step would be to look for something other than InnoDB.

MyRocks

MyRocks is a storage engine available for MySQL and MariaDB that is based on a different concept than InnoDB. My colleague, Sebastian Insausti, has a nice blog about using MyRocks with MariaDB. The gist is that, due to its design (it uses a Log Structured Merge tree, LSM), MyRocks is significantly better in terms of compression than InnoDB (which is based on a B+Tree structure). MyRocks is designed for handling large amounts of data and for reducing the number of writes. It originated at Facebook, where data volumes are large and requirements to access the data are high - thus SSD storage; still, on such a large scale every gain in compression is huge. MyRocks can deliver up to 2x better compression than InnoDB (which means you cut the number of servers in half). It is also designed to reduce write amplification (the number of writes required to handle a change of the row contents) - it requires 10x fewer writes than InnoDB. This, obviously, reduces I/O load but, even more importantly, it will increase the lifespan of an SSD ten times compared with handling the same load using InnoDB. From a performance standpoint, the smaller the data volume, the faster the access, so storage engines like this can also help to get the data out of the database faster (even though this was not the highest priority when designing MyRocks).
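
On MariaDB, trying MyRocks out can be as simple as the sketch below (assuming the MyRocks plugin package is installed; the table definition is illustrative):

INSTALL SONAME 'ha_rocksdb';

CREATE TABLE events (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload VARBINARY(1024)
) ENGINE=ROCKSDB;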

Columnar Datastores

At some point all we can do is admit that we cannot handle such a volume of data using MySQL. Sure, you can shard it, you can do different things, but eventually it just doesn’t make sense anymore. It is time to look for additional solutions. One of them would be to use columnar datastores - databases which are designed with big data analytics in mind. Sure, they will not help with OLTP-type traffic, but analytics are pretty much standard nowadays as companies try to be data-driven and make decisions based on exact numbers, not random data. There are numerous columnar datastores, but we would like to mention two of them here: MariaDB AX and ClickHouse. We have a couple of blogs explaining what MariaDB AX is and how MariaDB AX can be used. What’s important, MariaDB AX can be scaled up in the form of a cluster, improving the performance. ClickHouse is another option for running analytics - ClickHouse can easily be configured to replicate data from MySQL, as we discussed in one of our blog posts. It is fast, it is free, and it can also be used to form a cluster and to shard data for even better performance.

Conclusion

We hope that this blog post gave you insights into how large volumes of data can be handled in MySQL or MariaDB. Luckily, there are a couple of options at our disposal and, eventually, if we cannot really make it work, there are good alternatives.


SQL Foreign Key Constraint Tutorial With Example

SQL Foreign Key Constraint Tutorial With Example is today’s topic. A SQL foreign key is used to form a link between two tables, which makes it a referencing key. The foreign key is a constraint: a column or a combination of columns that is used to point to the primary key of another table. If a table has the primary key defined on any field(s), then no two records can have the same value in that field(s).

SQL Foreign Key Constraint

The primary purpose of a foreign key is to ensure that only values present in the referenced primary key column can appear in the referencing column. Because the foreign key is a referencing key in one table, each of its values must match an existing primary key value in the referenced table. This enforcement of the foreign key is known as referential integrity.

Let’s understand the use of a foreign key with an example.

We are going to create a foreign key with the CREATE keyword. See the following syntax.

Create table table_name(
    Column1 datatype,
    Column2 datatype,
    ...,
    Column(n) datatype,
    Constraint constraint_name FOREIGN KEY (column1, column2, ...)
        REFERENCES primary_key_table (column_list_of_primary_key_table)
);

#PARAMETERS
  1. Table_name: Name of the table
  2. Column(n) datatype: Name of the column associated with its datatype.
  3. Constraint: This is a keyword, and this is optional.
  4. Constraint name: It is a constraint name defined by a user.
  5. Primary_key_table: This is the table where the primary key resides.
  6. column_list_of_primary_key_table: List of columns that make up the primary key in the referenced table.

See the following query.

Create table customers ( Id integer primary key, Name varchar(20), Age integer, Address varchar(25), Salary integer);

In the above statement, we have created the customers table with Id as the primary key.

Let’s create a table that will have a foreign key.

Create table orders ( Id integer, Date date, Cust_id integer foreign key references Customers(Id), Amount integer, Primary key(Id));

In the above statement, we have created the orders table with Id as the primary key and Cust_id as a foreign key referring to the primary key of the customers table.

#Creating a foreign key using ALTER statement

This is used when a table was created without making any column a foreign key. Using the ALTER statement, we can create a foreign key. To do that, use the following syntax.

Alter table table_name ADD foreign key(column) references refer_table_name(Primary_key_of_refer_table);

#PARAMETERS
  • Table_name: Name of the table which will contain the foreign key.
  • ADD foreign key: Used for creating a foreign key.
  • Foreign key(Column): Used for creating a column which will be the foreign key.
  • Refer_table_name: Name of the table which will be referred.
  • Primary_key_of_refer_table: Primary key of the table which is referred.

Consider the previously created orders table. Suppose that the orders table doesn’t contain the foreign key. To add one, use the following statement.

Let’s create it.

Alter table orders ADD foreign key(Cust_id) REFERENCES CUSTOMERS(Id);

The above statement creates a foreign key on Cust_id, which refers to the primary key (Id) of the customers table.

#Dropping A Foreign Key

Suppose we want to delete a foreign key. To do that, use the following syntax.

ALTER TABLE table_name DROP foreign key foreign_key_name;

#PARAMETERS
  1. table_name: Name of the table that contains the foreign key.
  2. DROP: Name of the keyword which is used for dropping.
  3. Foreign key: Name of the keyword which will identify the foreign key.
  4. Foreign_key_name: Name of the foreign key.

Suppose we want to delete the previously created foreign key.

Use the following statements.

ALTER TABLE orders DROP foreign key cust_id;

The above statement deletes the foreign key cust_id from the orders table.
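
One caveat worth noting: if the foreign key was created without an explicit constraint name, MySQL generates one automatically (for example orders_ibfk_1), and that generated name is what DROP FOREIGN KEY expects. A lookup along these lines (the schema name is a placeholder) will reveal it:

SELECT CONSTRAINT_NAME
FROM information_schema.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_NAME = 'orders'
  AND CONSTRAINT_TYPE = 'FOREIGN KEY';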

Finally, SQL Foreign Key Constraint Tutorial With Example is over.

The post SQL Foreign Key Constraint Tutorial With Example appeared first on AppDividend.

Optimize Your MySQL Systems with Focused Monitoring

Author: Robert Agar

The working life of a DBA is often centered on optimizing their systems to provide users with the best performance. There are many aspects of a MySQL implementation that can be tuned to increase its functionality. These improvements will result in a more efficient system and enhanced user experience.

One of the problems facing the DBA responsible for instituting performance tuning is identifying the particular areas to address that will impact the system in a positive manner. Approaching this task in a haphazard fashion is not advisable. Randomly modifying parameters is just as likely to cause further issues as it is to lead to performance gains. Luckily, there is a solution to this dilemma.

Conducting focused monitoring can be instrumental in enabling the DBA to pinpoint the source of performance degradation. Armed with this knowledge, informed decisions can be made regarding how to tune the MySQL system to reach its peak efficiency. Let’s look at some methods that can be used to produce the insight required to address your performance issues in a structured way. They will increase the probability that your modifications deliver results.

How to Monitor a MySQL System

Here are some methods and tactics to use when monitoring your MySQL system.

Begin by Gathering Baseline Metrics

Attempting to address performance issues without an understanding of the current environment is not a productive way to go about monitoring your system. For this reason, it is imperative that you start your journey by accumulating baseline metrics from which future decisions can be made. Some of the specific metrics to concentrate on are:

  • Rows examined and sent;
  • Length of time for queries to run;
  • Transaction throughput;
  • Wait and thread states.
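
As a rough sketch, some of these baselines can be gathered by hand on MySQL 5.7/8.0 (assuming the sys schema is installed; the LIMIT is arbitrary):

-- throughput counters since server start
SHOW GLOBAL STATUS LIKE 'Questions';
SHOW GLOBAL STATUS LIKE 'Com_commit';

-- rows examined/sent and latency per normalized statement
-- (the view is already ordered by total latency, highest first)
SELECT query, exec_count, rows_sent_avg, rows_examined_avg, avg_latency
FROM sys.statement_analysis
LIMIT 10;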

Building a more complete picture of your system’s present state and levels of performance will allow you to make the right choices when tuning it.

Create a Monitoring Plan

Developing a good monitoring plan allows you to address these issues that impact your MySQL installation:

  • Performance – This may be your primary reason for being concerned with monitoring your system. Observing the system’s performance can alert you to bottlenecks or point you to SQL statements that can be written in a more efficient manner.

  • Security – Monitoring can alert you to security weaknesses and should be incorporated into your plan.

  • Growth – Your plan should include provisions to monitor user and data traffic. This allows you to be better prepared to address the growth necessary to handle the workload.

When Monitoring Should be Performed

Based on the critical nature of your database, monitoring in real-time may be essential. If a problem with the system could lead to serious consequences, you need to be alerted immediately so you can take corrective action. This is a vital component of any monitoring strategy.

You also want to be able to obtain a historical perspective regarding the metrics you are monitoring. Comparative analysis of regularly collected data and statistics can identify potential issues and allow you to proactively address them. This type of monitoring is also the best way to gauge the success of any tuning efforts that have been made by comparing performance before and after the change was implemented.

Some Specific Areas to Monitor

These aspects of your database form a great place to start monitoring your system.

  • Throughput – Tracking the number of requests your database handles daily is a key metric that can have an impact on the direction you take with your tuning efforts.

  • Execution time – Related to the number of requests is the time required to process them.

  • Utilization – An overall metric that reports on the availability of the database. This metric is important when considering the performance of the system.

The Importance of Your Monitoring Tools

The tremendous number of metrics available to a MySQL DBA makes it imperative that they have quality tools at their disposal. A good tool should offer the ability to create custom alerts which allow timely corrective action to be taken. It should present historical data for long-term analysis. Another important quality is the ability to generate reports and visually present data for better comprehension.

SQL Diagnostic Manager for MySQL (formerly Monyog) is a comprehensive monitoring solution for your MySQL databases. It provides customizable real-time monitoring and the ability to track all changes made to your MySQL configuration. You can create custom charts and dashboards that enable you to convey the data to all interested parties. It’s an excellent tool for discovering ways to optimize your system.

The post Optimize Your MySQL Systems with Focused Monitoring appeared first on Monyog Blog.

MySQL 8 support for custom SQL CHECK constraints

Introduction In this article, we are going to test the MySQL 8 implementation of custom SQL CHECK constraints. Although the CHECK clause is a standard SQL feature, prior to MySQL 8.0.16, the clause was parsed and ignored, so this functionality could only be emulated via BEFORE INSERT and UPDATE triggers. Custom SQL CHECK constraints As I explained in this article, custom SQL CHECK constraints are very useful to ensure non-nullability constraints for JPA entity subclass-specific attributes when using the SINGLE TABLE JPA inheritance strategy. To understand the problem, consider we have the... Read More
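
As a quick illustrative sketch of the 8.0.16+ behavior described above (the table and constraint names are my own):

CREATE TABLE post (
    id BIGINT NOT NULL PRIMARY KEY,
    title VARCHAR(100),
    CONSTRAINT post_title_not_empty CHECK (CHAR_LENGTH(title) > 0)
);

-- rejected on MySQL 8.0.16+, silently accepted by older versions that parsed and ignored CHECK
INSERT INTO post (id, title) VALUES (1, '');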

The post MySQL 8 support for custom SQL CHECK constraints appeared first on Vlad Mihalcea.

The People and Values Behind Great Software-as-a-Service (SaaS)

Great SaaS – starts with great software. My company has been working on this mission since 2004.

It took a few years to get our solution right – to offer a geo-distributed MySQL database backend for billion-dollar businesses. And it took a few more years to polish it.

Today, we continue to improve it day-in, day-out.

But there is no doubt the patience and perseverance pays off:

While the number of Tungsten-clustered databases in production is in the thousands, we average only one (1) new support request per day, and, because of this, we provide an average response time of less than three (3) minutes.

Not enough is said about the people and values behind great software; so that’s what I want to talk about here. Continuent is an agile, globally-distributed, remote team that is much greater than the sum of its parts. Our software helps global enterprises manage the risks associated with business-critical data, which requires a cutting-edge level of innovation and performance!

In case you missed it, Continuent falls under the category of “companies in the Open Source ecosystem” as described by our founder and CEO, Eero Teerikorpi:

Commercial software companies … monetizing their proprietary solutions by adding value on top of open source projects.

Thus, our culture is heavily influenced by its affiliation with Open Source, nicely described in this RedHat blog.

But what does this look like in practice?

Every six months, the whole company meets in-person in either the USA or Europe. At these company meetings, we reflect, get aligned for the future, and we get to know our teammates in-person (a strange and fun experience). 

Lake Champlain in Burlington, VT, USA: Spring 2019 Company Meeting

 

Since we actually see each other in person only once every six months, what actually motivates us in the day-to-day – and how do we work against the challenges of being a distributed SaaS team? Eero summed it up at our last company meeting:

Working for Continuent means the freedom to work 24/7.

We each take responsibility for the team’s mission, we do whatever it takes to provide increasing value; and in exchange, we get the ultimate flexibility. 

As a result, in my early twenties I was able to live in an RV and a sailboat and travel up and down the West Coast of the USA; now I find myself one of the luckiest moms in Tech and my company doesn’t need a formal policy to make it so!

How is this possible? Besides having a great product developed and supported through literally decades of experience, I believe it boils down to the following:

  • Growth mindset – We all strive to learn, innovate and find new ways to make things better. Nobody is put down for asking questions, and everyone is receptive to feedback. Things are pretty transparent; and this makes for a truly dynamic team. As Agustin Gallego said in his state of the community talk at Percona Live in Austin this year, “Do your homework, and no-one will ever refuse to help you.”
  • Trust/respect – We’re all passionate about our work, but it isn’t always easy to openly disagree with one another, especially when we all care so much. The beauty is that we have common ground in our mission to provide great SaaS; and this fluid collaboration wouldn’t be possible if we didn’t trust and respect our teammates and communicate in a way that shows this. 

At the end of the day, when we get to work with such awesome customers year after year, and hear what a difference our products make to them professionally and personally (it’s a matter of being able to sleep at night for some…), it’s all worth the effort!

If you’re interested in learning more, please visit this page for commercial-grade Tungsten Clustering, providing high availability, disaster recovery, and a host of other benefits especially for multi-region applications running on MySQL, MariaDB, or Percona Server; and this page for our Tungsten Replicator, good for analytics and other heterogeneous data integration for a variety of targets including Vertica, Redshift, Kafka, and more. 

MySQL: How to monitor MySQL Replication Lag in MilliSeconds With PMM and pt-heartbeat



There can be various requirements at the application end that need real-time slaves. Let me introduce a solution to a problem we used to face: how can we monitor whether our slaves are real-time or whether they are lagging, in milliseconds? Unfortunately, there is no built-in feature in MySQL to get replication lag in milliseconds.
However, there is a tool provided in Percona Toolkit named pt-heartbeat. It generates heartbeat events on the master, and the monitoring system can measure the time difference on the slave to calculate the lag.
How to deploy pt-heartbeat in your environment (assuming the OS is Ubuntu xx.xx):

Step 1: Download pt-heartbeat using the command below.

wget http://percona.com/get/pt-heartbeat -P /usr/bin/ && chmod +x /usr/bin/pt-heartbeat

Step 2: pt-heartbeat requires a database where it can create a table. Let's create it on the master.

mysql -uUser -pPassword -e "create database if not exists percona;"

Step 3: Create the heartbeat table in the percona database.

pt-heartbeat -D percona --create-table --check -uUser -pPassword --master-server-id ServerId

Step 4: Start the pt-heartbeat daemon.

pt-heartbeat -D percona --daemonize --update h=127.0.0.1,u=User,p=Password --utc

Step 5: Reconfigure the PMM client to monitor lag in milliseconds.

pmm-admin remove mysql:metrics

pmm-admin add mysql:metrics --user=monitor --password=xxxx -- --collect.heartbeat.database=percona --collect.heartbeat=true

Step 6: Use the Grafana panel JSON below to add a new graph to the dashboard.
add panel > Graph > Panel Title > More > Panel Json > Update
{
  "aliasColors": {},
  "bars": false,
  "dashLength": 10,
  "dashes": false,
  "datasource": "Prometheus",
  "description": "Requires pt-heartbeat\nThis graph requires mysqld_exporter with -collect.heartbeat=true and a pt-heartbeat daemon running on MySQL masters",
  "fill": 2,
  "gridPos": {
    "h": 7,
    "w": 12,
    "x": 12,
    "y": 3
  },
  "id": 41,
  "legend": {
    "alignAsTable": true,
    "avg": true,
    "current": false,
    "max": true,
    "min": true,
    "show": true,
    "total": false,
    "values": true
  },
  "lines": true,
  "linewidth": 2,
  "links": [],
  "nullPointMode": "null",
  "percentage": false,
  "pointradius": 5,
  "points": false,
  "renderer": "flot",
  "seriesOverrides": [
    {
      "alias": "Lag",
      "color": "#bf1b00"
    }
  ],
  "spaceLength": 10,
  "stack": false,
  "steppedLine": false,
  "targets": [
    {
      "expr": "max_over_time(mysql_heartbeat_now_timestamp_seconds{instance=\"$host\"}[$interval]) - \nmax_over_time(mysql_heartbeat_stored_timestamp_seconds{instance=\"$host\"}[$interval]) or \nmax_over_time(mysql_heartbeat_now_timestamp_seconds{instance=\"$host\"}[5m]) - \nmax_over_time(mysql_heartbeat_stored_timestamp_seconds{instance=\"$host\"}[5m])",
      "format": "time_series",
      "interval": "$interval",
      "intervalFactor": 1,
      "legendFormat": "Lag for server-id={{server_id}}",
      "refId": "A"
    }
  ],
  "thresholds": [],
  "timeFrom": null,
  "timeShift": null,
  "title": "MySQL Heartbeat Replication Delay",
  "tooltip": {
    "shared": true,
    "sort": 0,
    "value_type": "individual"
  },
  "type": "graph",
  "xaxis": {
    "buckets": null,
    "mode": "time",
    "name": null,
    "show": true,
    "values": []
  },
  "yaxes": [
    {
      "format": "short",
      "label": null,
      "logBase": 1,
      "max": null,
      "min": "0",
      "show": true
    },
    {
      "format": "short",
      "label": null,
      "logBase": 1,
      "max": null,
      "min": null,
      "show": false
    }
  ],
  "yaxis": {
    "align": false,
    "alignLevel": null
  }
}

Find, Monitor, and Analyze Problematic SQL Queries – SQL Diagnostic Manager for MySQL (formerly Monyog)

In week 3 of our series, Benefits of SQL Diagnostic Manager for MySQL (formerly Monyog), we discuss how to identify and analyze problematic SQL queries using SQL Diagnostic Manager for MySQL. If you missed it, feel free to read our previous post on Agentless Monitoring and Cloud Readiness.

Find Problematic SQL Queries

MySQL and MariaDB currently lack advanced tools for profiling SQL queries (such as the SQL Profiler of Microsoft’s SQL Server). While other monitoring tools for MySQL and MariaDB provide monitoring and advisory information on various system metrics, they do not pinpoint the problematic SQL queries. No amount of hardware upgrades and tuning of the parameters in the database server configuration file ‘my.cnf’ and the database server initialization file ‘my.ini’ can match the performance gains that are achievable when identifying and rewriting problematic queries and creating appropriate indexes.

SQL Diagnostic Manager for MySQL finds problematic SQL queries by:

  • Querying the MySQL Proxy application that clients and applications are configured to connect through
  • Analyzing slow query logs
  • Analyzing general query logs
  • Issuing the SHOW PROCESSLIST statement at regular intervals
  • Utilizing Performance Schema tables
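
For reference, the raw building blocks behind several of these methods look roughly like this on a plain MySQL or MariaDB server (the one-second threshold is only an example):

-- enable the slow query log and log statements slower than 1 second
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;

-- see what is executing right now
SHOW FULL PROCESSLIST;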

Additionally, the reports created by the above methods can be exported as comma-separated values (CSV) files. The export functionality means that it is possible to further customize the report using a spreadsheet, or to import the comma-separated values (CSV) output directly into a MySQL or MariaDB table for further analysis.

Monitor and Analyze SQL Queries

Identifying problematic SQL queries (that is, by looking at slow query logs, general query logs, the Performance Schema feature, the SHOW PROCESSLIST statement, and the MySQL Proxy application) is important. However, this analysis is usually performed well after the problematic SQL queries executed. Consequently, real-time notifications for long-running queries are needed. SQL Diagnostic Manager for MySQL continuously monitors SQL queries in real-time. It sends notifications (via email, simple network management protocol (SNMP), the collaboration platform Slack, the incident response platform PagerDuty, and the messaging logging standard syslog) for SQL queries that take more than a specified amount of time to execute. It is also possible to specify an option to automatically kill such queries.

Monitor Amazon RDS and its Operating System

SQL Diagnostic Manager for MySQL provides monitoring capabilities for Amazon RDS for MySQL, MariaDB, and Amazon Aurora and its operating system. It uses the application programming interface (API) of Amazon CloudWatch. It uses the different metrics for the operating system that are available with the application programming interface (API) to fetch and display the data. All of the operating system monitors for Amazon RDS are shown under the monitor RDS/Aurora Instance Metrics group in the Monitors page. The corresponding charts are available on the Dashboard page. Just enable system metric for the Amazon RDS Aurora instance to see the data for the operating system.

Read more in the full solution brief.

Find and fix MySQL performance problems on-premises and in the cloud with SQL Diagnostic Manager for MySQL.

The post Find, Monitor, and Analyze Problematic SQL Queries – SQL Diagnostic Manager for MySQL (formerly Monyog) appeared first on Monyog Blog.

Communications link failure MySQL JDBC with TLS

Ran into an interesting situation trying to configure a MySQL JDBC driver to connect over TLS (though the driver may call it SSL, TLS is the name for more recent versions of the protocol).

The error I was getting was pretty generic:

Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.

With the relevant parts of the stacktrace, also being non helpful:

at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:835)
at com.mysql.cj.jdbc.ConnectionImpl.(ConnectionImpl.java:455)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:240)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:199)

With a bit of googling you will find that there are tons of reasons why you might get this particular error message. Here are some of the most common ones:

  • Firewall is blocking the connection (it wasn't)
  • The MySQL server is not configured to allow remote connections (it did allow them)
  • Using an outdated JDBC driver (it wasn't)
  • Java trying to use IPv6, instead of IPv4 (it wasn't)

I had tested that I was able to connect from the app server to the MySQL server using the mysql command:

mysql --host=10.1.2.3 --user=user --password --ssl --ssl-ca=./ca.pem

So my JDBC connection string looked something like this:

jdbc:mysql://10.1.2.3:3306/dbname?useSSL=true&requireSSL=true&trustCertificateKeyStoreUrl=file:///path/to/mysql-ca-truststore.p12&trustCertificateKeyStoreType=PKCS12&trustCertificateKeyStorePassword=pwd&enabledTLSProtocols=TLSv1.2

I was certain that my trustCertificateKeyStoreUrl was valid because I had used the same file to successfully connect over TLS on another server.

If you look closely you can see I was trying to force the use of TLS 1.2 as the TLS protocol to avoid using older less secure versions of the TLS / SSL protocols with enabledTLSProtocols=TLSv1.2. After some testing I found that when I removed enabledTLSProtocols=TLSv1.2 from the JDBC connection string it worked. After I did this I wanted to see what protocol I was using, so I ran a query:

SHOW SESSION STATUS

This showed that the Ssl_version my connection was using was TLSv1.1. Next I queried to see what TLS protocols the server supports:

SHOW GLOBAL VARIABLES

This query shows that the tls_version enabled for this particular server was: TLSv1,TLSv1.1 - so TLSv1.2 was not supported by this MySQL server, and that is why I was getting a Communications link failure error message.
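
To avoid scrolling through the full output of those statements, the same information can be pulled with filtered versions:

SHOW SESSION STATUS LIKE 'Ssl_version';
SHOW GLOBAL VARIABLES LIKE 'tls_version';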

This is a MySQL install from an Ubuntu server using MySQL Community Edition. It turns out that MySQL supports two TLS/SSL implementations: the Community Edition is usually built with yaSSL (which only supports TLS 1.0 and TLS 1.1), while builds compiled with OpenSSL (such as the Enterprise Edition) support TLS 1.2 as well. This detail is important, and really easy to overlook.

This is probably a good reason to use MariaDB instead of MySQL, recent versions of MariaDB even support TLSv1.3.

Plot your Location MySQL Workbench and OpenStreetMap

MySQL has added a lot of functionality for Geographical Information System (GIS) data in the last two releases. This has given us better ways to save and explore GIS data. MySQL Workbench is an amazing tool that many do not realize has vastly upped its game for supporting GIS information.  But did you know you can use MySQL Workbench with OpenStreetMap to plot locations?

1. Get your Longitude and Latitude
This, believe it or not, is the hard part. Go to https://www.latlong.net/ and type in the name of your location. It will report back your longitude and your latitude. For the example I will be using the information for the city of Justin, Texas.


Using www.latlong.net to find the location of Justin, Texas
2. Create Table and Add Data
Now create a table using MySQL Workbench in your favorite test schema to store your data.

CREATE TABLE city (id int unsigned auto_increment primary key, 
             name char(25), 
  loc point);

Then add the information; as before, the example is for my home town.

INSERT INTO city (name,loc) VALUES 
('Justin', ST_GeomFromText('point(33.084843 -97.296127)',4326));


3. Query With Workbench
Now query your data from Workbench. If you only have the one record, SELECT * FROM city; will suffice, or use SELECT loc FROM city WHERE name='<yourcitynamehere>';

Running the query to get the longitude and latitude of Justin, Texas using MySQL Workbench. Note the 'BLOB' under the loc column
4. Open In a Browser

If you right click on the 'BLOB' icon under the 'loc' column, one of the choices is Show Point in Browser. Pick that choice and you will be redirected to OpenStreetMap.org.

The OpenStreetMap.org map of Justin, Texas
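
If you want the raw coordinates back in SQL rather than a map, something like this should work in MySQL 8.0 (ST_Latitude() and ST_Longitude() are available from 8.0.12 for points with a geographic SRID such as 4326):

SELECT name, ST_Latitude(loc) AS lat, ST_Longitude(loc) AS lng, ST_AsText(loc) FROM city;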

MySQL to the Cloud! Thoughts on Migrating Away from On-Premise

The CTO has issued the decree: “We are moving to the cloud!” Great, so now what do we do? When it comes to migrating from on-premise to the cloud, there are many factors to consider and decisions that need to be made. First (and probably most important) on that list: managed DBaaS, or set up and manage our own infrastructure?

Type of Cloud

Managed DBaaS options are great but come with some limitations.  The main two questions that should be considered here are around staff/experience and current architecture/database design.  In cases where there is a very limited database (or operations) team, a DBaaS is a great choice. Much of the operational infrastructure is already in place with general best practices in place.  However, a big tradeoff comes around flexibility. In cases where the current infrastructure is complex (for reasons right or wrong), a DBaaS is definitely the wrong choice as you will be extremely limited in what you can do. 

Here are some high-level considerations for determining if DBaaS is a good fit:

  • Simple schema
  • Standard existing architecture (master/slave, VIP failover)
  • Standard CRUD workload (no elaborate procedures or business logic in the database)
  • Fairly easy to shard/partition data (vertical and/or horizontal)
  • Management considerations (DBaaS is “managed”, but still requires manual work that is different from on-premise installations)
    • Minor upgrades/maintenance windows
    • Query analysis
    • Some standard operations (such as point-in-time recovery) are different in DBaaS

Naturally, this list isn’t all inclusive and the best fit will vary based on your specific use case, but it can serve as a good starting point.  Also, the complexity of managing a production-grade infrastructure in the cloud manually isn’t trivial and many current on-premise techniques will need to be modified.

Once you have landed on a destination, the most important phase begins.  Testing! I can’t emphasize enough how important testing is for migration into the cloud.  But what should be tested and how? Here are the main things that need to be tested thoroughly before the migration:

  • Schema
  • Workload
Schema Testing

This aspect of the testing should be fairly straightforward. Regardless of the specifics of your migration (DBaaS or self-managed), you’ll want to take a current snapshot of your database.  Ideally, this would be done using Percona’s Xtrabackup tool (procedure here).  However, mysqldump (or mydumper/myloader) is also a viable option if a binary snapshot isn’t supported by your target DBaaS.  The biggest downside to a logical dump is the time it takes to restore, but it does allow you to make tweaks to the schema prior to the reload.

One of the biggest issues when doing a restore into a DBaaS is having triggers or procedures defined by the root user.  In many cases (such as Amazon RDS), the root user isn’t available and this will cause the import to fail. As a best practice, you shouldn’t be creating procedures or triggers with the root user, but unfortunately, it isn’t uncommon.  Once you have verified that the snapshot is successfully restored in your target cloud environment, you can move onto the workload and performance testing.

Workload Testing

In many cases, these can be tested in parallel and both are equally important.  In preparation to test query compatibility and performance, you will want to capture live query traffic immediately following the snapshot used to test schema compatibility.  It is important to make sure the capture is done right after a snapshot to ensure an accurate representation of the schema in time that matches the traffic.  Testing concurrent write performance on a traffic set that will never lock or modify any rows won’t be realistic when compared with traffic that is actually modifying production data.

The best way to capture traffic is by using the slow query log (as it is most compatible with other tools and shares the most information).  Ideally, you can set the long_query_time = 0 to capture all query traffic, but your production traffic may make that challenging and adjustments may need to be made.

From Percona Toolkit, pt-upgrade is the primary tool to test that there are no differences in the workload.  The tool checks to make sure that the following are the same on both the current server and the new server:

  • Row Count
  • Row Data
  • Query Time (difference should be within an order of magnitude)
  • Warnings/errors
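
As a sketch, a pt-upgrade run against the captured slow log might look like this (host names and credentials are placeholders):

pt-upgrade /path/to/slow.log h=onprem-db,u=ptuser,p=secret h=cloud-db,u=ptuser,p=secret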

Assuming this tool shows no errors, you’ll want to ensure similar performance.  If your organization has a way to generate a realistic load (note that being realistic is important here), an extra step would be to compare performance metrics between the current environment and the target environment.  The easiest way to compare is to use a visualization tool such as Percona Monitoring and Management to ensure no detrimental metrics (lower throughput, major differences in IO/CPU, etc) are displayed.

Pre-Migration

Now that all the testing has happened, results have been signed off on, and the stakeholders are happy, it is time to actually move your data to the cloud.  The first step is to restore a snapshot of the current production. Again, this can be done using a logical snapshot (mysqldump) or a binary snapshot (xtrabackup if possible).  Just like setting up any other slave server, capturing the binlog coordinates is also needed.  

With the snapshot loaded in the target cloud server, it is time to set up replication from the current production environment to the new cloud server.  This is one very important step in the migration process – there needs to be a secure connection set up between the cloud server and production. This can be achieved in a few ways:

  • VPN tunnel to the cloud provider
    • This is the easiest method, assuming you have a VPN device available onsite
  • SSH tunnel
    • In this method, you create an SSH tunnel locally on the source server and connect to that port from the cloud server

In either scenario, you will use the replication coordinates from your snapshot and start replication from that point.  Again, various cloud deployments may be slightly different in terms of how this is set up (RDS, for example, uses the CALL mysql.rds_set_external_master procedure), but the concept is generally the same.

It is very important that this initial replication link between the cloud server and the current production database is encrypted.  Even with “non-critical” data, it is a best practice to ensure that your database traffic is encrypted if it needs to be sent over the WAN!
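
As an illustrative sketch (all values are placeholders), the replication setup on a self-managed cloud replica is a standard CHANGE MASTER TO with SSL enforced, using the coordinates captured with the snapshot:

CHANGE MASTER TO
    MASTER_HOST='onprem-db.example.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='repl_password',
    MASTER_LOG_FILE='mysql-bin.000123',
    MASTER_LOG_POS=4,
    MASTER_SSL=1;
START SLAVE;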

Cutover Time

Finally, a replica is in the cloud, is in sync, and you are ready to pull the trigger.  Through all of the testing, this should hopefully be a “trivial non-event”. In the case where the entire environment is moving to the cloud (application servers included), the new instances should already be configured to use the new cloud server as a data source.  For the migration to go live, traffic simply needs to hit the new application servers in the cloud.

In the other scenario of keeping the application servers local while just moving the database tier to the cloud, you would simply need to update your application servers to point to the new database server.  Like the replication piece, if the application servers will live outside of the cloud, SSL connectivity is still a best practice.    

Summary

Thankfully, this process isn’t too different from a standard migration to a new server.  The biggest differences could potentially be found when migrating from on-premise to a DBaaS solution, but even then the process for executing the migration is similar.  Also, planning for cloud component failure (which again is a standard best practice, but equally important in the cloud) is something that should be considered in the plan.

Setting World-Writable File Permissions Prior to Preparing the Backup Can Break It

It’s bad practice to provide world-writable access to critical files in Linux, though we’ve seen time and time again that this is done to conveniently share files with other users, applications, or services. But with Xtrabackup, preparing backups could go wrong if the backup configuration has world-writable file permissions.

Say you performed a backup on a MySQL instance configured with data-at-rest encryption using the keyring plugin. On the backup directory, the generated backup-my.cnf contains these instructions to load this plugin that will be used by Xtrabackup while preparing the backup:

backup-my.cnf

[mysqld]
innodb_checksum_algorithm=crc32
innodb_log_checksum_algorithm=strict_crc32
innodb_data_file_path=ibdata1:12M:autoextend
innodb_log_files_in_group=2
innodb_log_file_size=1073741824
innodb_fast_checksum=false
innodb_page_size=16384
innodb_log_block_size=512
innodb_undo_directory=./
innodb_undo_tablespaces=0
server_id=0
redo_log_version=1
plugin_load=keyring_file.so
server_uuid=00005726-0000-0000-0000-000000005726
master_key_id=1

Perhaps you wanted to share the backup with another user, but made a mistake of making the directory and its contents world-writable: chmod -R 777 /backup/mysql

When that user prepares the backup, the corresponding output will show that Xtrabackup ignored reading backup-my.cnf and so it doesn’t know that it has to load the keyring plugin to decrypt the .ibd files:

~$ xtrabackup --prepare --keyring-file-data=/backup/mysql/keyring --target-dir=/backup/mysql xtrabackup: [Warning] World-writable config file '/backup/mysql/backup-my.cnf' is ignored. xtrabackup: recognized server arguments: xtrabackup: [Warning] World-writable config file '/backup/mysql/backup-my.cnf' is ignored. xtrabackup: recognized client arguments: --prepare=1 --target-dir=/backup/mysql xtrabackup version 2.4.14 based on MySQL server 5.7.19 Linux (x86_64) (revision id: ef675d4) xtrabackup: cd to /backup/mysql/ xtrabackup: This target seems to be not prepared yet. InnoDB: Number of pools: 1 xtrabackup: xtrabackup_logfile detected: size=215089152, start_lsn=(3094928949) xtrabackup: using the following InnoDB configuration for recovery: xtrabackup: innodb_data_home_dir = . xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend xtrabackup: innodb_log_group_home_dir = . xtrabackup: innodb_log_files_in_group = 1 xtrabackup: innodb_log_file_size = 215089152 xtrabackup: [Warning] World-writable config file './backup-my.cnf' is ignored. xtrabackup: using the following InnoDB configuration for recovery: xtrabackup: innodb_data_home_dir = . xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend xtrabackup: innodb_log_group_home_dir = . xtrabackup: innodb_log_files_in_group = 1 xtrabackup: innodb_log_file_size = 215089152 xtrabackup: Starting InnoDB instance for recovery. xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter) InnoDB: PUNCH HOLE support available InnoDB: Mutexes and rw_locks use GCC atomic builtins InnoDB: Uses event mutexes InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier InnoDB: Compressed tables use zlib 1.2.8 InnoDB: Number of pools: 1 InnoDB: Using CPU crc32 instructions InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M InnoDB: Completed initialization of buffer pool InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). InnoDB: Highest supported file format is Barracuda. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest2.ibd can't be decrypted. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest1.ibd can't be decrypted. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest4.ibd can't be decrypted. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest3.ibd can't be decrypted. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest5.ibd can't be decrypted. 
InnoDB: Log scan progressed past the checkpoint lsn 3094928949 ** redacted ** InnoDB: Doing recovery: scanned up to log sequence number 3097681408 (1%) InnoDB: Doing recovery: scanned up to log sequence number 3102924288 (4%) InnoDB: Doing recovery: scanned up to log sequence number 3108167168 (6%) InnoDB: Doing recovery: scanned up to log sequence number 3113410048 (9%) InnoDB: Doing recovery: scanned up to log sequence number 3118652928 (12%) InnoDB: Doing recovery: scanned up to log sequence number 3123895808 (15%) InnoDB: Doing recovery: scanned up to log sequence number 3129138688 (17%) InnoDB: Doing recovery: scanned up to log sequence number 3134381568 (20%) InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percent: 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 ** redacted ** InnoDB: Doing recovery: scanned up to log sequence number 3265453568 (89%) InnoDB: Doing recovery: scanned up to log sequence number 3270696448 (91%) InnoDB: Doing recovery: scanned up to log sequence number 3275939328 (94%) InnoDB: Doing recovery: scanned up to log sequence number 3281182208 (97%) InnoDB: Doing recovery: scanned up to log sequence number 3286158358 (100%) InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percent: 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: xtrabackup: Last MySQL binlog file position 568369058, file name mysql-bin.000004 InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest1.ibd can't be decrypted. InnoDB: Removing missing table `sbtest/sbtest1` from InnoDB data dictionary. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest2.ibd can't be decrypted. InnoDB: Removing missing table `sbtest/sbtest2` from InnoDB data dictionary. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest3.ibd can't be decrypted. InnoDB: Removing missing table `sbtest/sbtest3` from InnoDB data dictionary. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest4.ibd can't be decrypted. InnoDB: Removing missing table `sbtest/sbtest4` from InnoDB data dictionary. InnoDB: Encryption can't find master key, please check the keyring plugin is loaded. InnoDB: Encryption information in datafile: ./sbtest/sbtest5.ibd can't be decrypted. InnoDB: Removing missing table `sbtest/sbtest5` from InnoDB data dictionary. InnoDB: Creating shared tablespace for temporary tables InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... InnoDB: File './ibtmp1' size is now 12 MB. InnoDB: 96 redo rollback segment(s) found. 1 redo rollback segment(s) are active. InnoDB: 32 non-redo rollback segment(s) are active. InnoDB: page_cleaner: 1000ms intended loop took 6627ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.) 
InnoDB: 5.7.19 started; log sequence number 3286158358 InnoDB: xtrabackup: Last MySQL binlog file position 568369058, file name mysql-bin.000004

Even if you fix the permissions on backup-my.cnf, if you try to prepare the same backup again, Xtrabackup will warn you that it has already prepared the backup.

~$ xtrabackup --prepare --keyring-file-data=/backup/mysql/keyring --target-dir=/backup/mysql xtrabackup: recognized server arguments: --innodb_checksum_algorithm=crc32 --innodb_log_checksum_algorithm=strict_crc32 --innodb_data_file_path=ibdata1:12M:autoextend --innodb_log_files_in_group=2 --innodb_log_file_size=1073741824 --innodb_fast_checksum=0 --innodb_page_size=16384 --innodb_log_block_size=512 --innodb_undo_directory=./ --innodb_undo_tablespaces=0 --server-id=0 --redo-log-version=1 xtrabackup: recognized client arguments: --innodb_checksum_algorithm=crc32 --innodb_log_checksum_algorithm=strict_crc32 --innodb_data_file_path=ibdata1:12M:autoextend --innodb_log_files_in_group=2 --innodb_log_file_size=1073741824 --innodb_fast_checksum=0 --innodb_page_size=16384 --innodb_log_block_size=512 --innodb_undo_directory=./ --innodb_undo_tablespaces=0 --server-id=0 --redo-log-version=1 --prepare=1 --target-dir=/backup/mysql xtrabackup version 2.4.14 based on MySQL server 5.7.19 Linux (x86_64) (revision id: ef675d4) xtrabackup: cd to /backup/mysql/ xtrabackup: This target seems to be already prepared. InnoDB: Number of pools: 1

This means that changes made while the backup was taking place will not be applied, and what you have restored is an inconsistent, potentially corrupt backup. You need to perform a full backup again and make sure that you do not place world/other-writable permissions on the backup this time around, so that you will not face the same issue.
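
If the goal is simply to share the backup with another user, a safer approach (the group name here is an assumption) avoids the world-writable bit entirely:

# give a dedicated group read access instead of chmod -R 777
chown -R mysql:backupusers /backup/mysql
chmod -R u=rwX,g=rX,o= /backup/mysql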

MySQL: How To Sync Specific Table(s) from Master to Slave





Most of us have run into errors like "Row not found" or "Duplicate row" on a slave in master-slave replication, and sometimes it is very difficult to find and fix the unsynced data, even when we know the table name(s).
There are a few recommended tools from Percona to check replication integrity and fix unsynced data:
1. pt-table-checksum: performs an online replication consistency check by executing checksum queries on the master, which produce different results on replicas that are inconsistent with the master.
2. pt-table-sync: synchronizes data efficiently between MySQL tables.
These tools work well when the amount of data is limited, but what if the table is large and we don't have time to analyze each row of the table with them, which is time consuming? We can achieve the same result with a few simple steps:
Let's assume a simple cluster (MySQL 5.7.XX) with master m1 and slave s1, using GTID-based replication, replicating a db (db_ankit) with 5 tables (a, b, c, d, e). Now, due to some issue, the data of table c in db_ankit on slave s1 got corrupted, and the table is huge in size, like > 500GB.
  1. On slave s1: start the slave with a replication filter to ignore table c in replication.
  2. From master m1: dump table c along with the replication coordinates or GTID. mydumper can be used with multiple threads for a faster dump, according to the server capacity.
  3. On slave s1: rename the current table c and restore the dumped table (myloader with multiple threads will save time if the dump was taken with mydumper).
  4. On slave s1: once the restore finishes, stop the slave and note down the master exec position and the executed GTID set.
  5. On slave s1: reset the slave and change the replication filter to replicate only table c.
  6. On slave s1: run CHANGE MASTER to start replication of table c from the coordinates of the dump taken in step 2.
  7. On slave s1: start the slave until the coordinates noted in step 4.
  8. On slave s1: when the replication lag becomes 0 and the SQL thread has stopped at the UNTIL position, remove the replication filter and start the slave without any restriction (a hedged SQL sketch of the key commands follows this list).
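For illustration only, here is a minimal, hedged SQL sketch of steps 1, 5, 6, 7 and 8 as they might look on slave s1. The host, user, password, binlog file names and positions are placeholders (they would come from the dump and from step 4), and this sketch assumes file/position coordinates; with GTID auto-positioning you would use MASTER_AUTO_POSITION and START SLAVE UNTIL SQL_AFTER_GTIDS instead.

-- Step 1: ignore table c while the dump/restore is in progress
STOP SLAVE;
CHANGE REPLICATION FILTER REPLICATE_IGNORE_TABLE = (db_ankit.c);
START SLAVE;

-- Steps 5-6: after the restore, replicate only table c from the dump coordinates
STOP SLAVE;
RESET SLAVE;
CHANGE REPLICATION FILTER REPLICATE_IGNORE_TABLE = (), REPLICATE_DO_TABLE = (db_ankit.c);
CHANGE MASTER TO
  MASTER_HOST = 'm1',                      -- placeholder
  MASTER_USER = 'repl',                    -- placeholder
  MASTER_PASSWORD = '***',                 -- placeholder
  MASTER_LOG_FILE = 'mysql-bin.000123',    -- placeholder: file noted in the dump
  MASTER_LOG_POS  = 4;                     -- placeholder: position noted in the dump

-- Step 7: catch table c up to the point where the rest of replication stopped (step 4)
START SLAVE UNTIL MASTER_LOG_FILE = 'mysql-bin.000200', MASTER_LOG_POS = 120;

-- Step 8: once caught up, clear the filter and resume normal replication
STOP SLAVE;
CHANGE REPLICATION FILTER REPLICATE_DO_TABLE = ();
START SLAVE;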


In this way with the help of multithreaded dump and restore one can sync specific table(s) from master.

MySQL Functional Index and use cases.

MySQL introduced functional indexes in MySQL 8.0.13. It is one of the much-needed features for query optimisation; we looked at histograms in my last blog. Let us explore the functional index and its use cases.

For the explanation below, I have used a production scenario with a 16-core CPU, 32GB of RAM and MySQL version 8.0.16 (the latest at the time of writing).

MySQL already supports indexing on columns or on prefixes of column values (by length).

Example: 

mysql>show create table app_user\G
*************************** 1. row ***************************
       Table: app_user
Create Table: CREATE TABLE `app_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ad_id` int(11) DEFAULT NULL,
  `source` varchar(32) DEFAULT NULL,
  `medium` varchar(32) DEFAULT NULL,
  `campaign` varchar(32) DEFAULT NULL,
  `timestamp` varchar(32) DEFAULT NULL,
  `createdOn` datetime DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_source` (`source`),     -------> Index on Column
  KEY `idx_medium` (`medium`(5))   -----> Index on Column Prefix
) ENGINE=InnoDB AUTO_INCREMENT=9349478 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

In MySQL 5.7, we can create an index on generated columns (computed from expressions).

From MySQL 8.0.13, this is even easier with the functional index feature. A functional index is also implemented as a hidden virtual column, so it inherits all restrictions that apply to generated columns.

Let’s see how it can ease DBA’s life.

The app_user table has around 9M records, with data starting from Sep 2018.

mysql>select count(*) from app_user;
+----------+
| count(*) |
+----------+
|  9280573 |
+----------+
1 row in set (1.96 sec)

mysql>select * from app_user limit 1\G
*************************** 1. row ***************************
       id: 1
    ad_id: 787977
   source: google-play
   medium: organic
 campaign:
timestamp: 2018-09-04T17:39:16+05:30
createdOn: 2018-09-04 12:09:20
1 row in set (0.00 sec)

Now let us consider a case where we query the records created in the month of May. I have used pager md5sum for easier result verification.

mysql>pager md5sum
PAGER set to 'md5sum'
mysql>select * from app_user where month(createdOn)=5;
7e9e2b7bc2e9bde15504f6c5658458ab -
74322 rows in set (5.01 sec)

It took 5 seconds to fetch 74322 records. Here is the explain plan for the above query

mysql>explain select * from app_user where month(createdOn)=5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: app_user
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 9176706
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

No index is used. Let us try adding an index on column createdOn to speed up this query.

mysql>alter table app_user add index idx_createdon(createdOn);
Query OK, 0 rows affected (44.55 sec)
Records: 0  Duplicates: 0  Warnings: 0

Here is the explain plan post indexing

mysql>explain select * from app_user where month(createdOn)=5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: app_user
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 9176706
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.01 sec)

Even after adding the index, the query still goes for a full table scan (the index is not used), because the MONTH() function in the WHERE clause masks the index usage. As a result, there is no improvement in query performance.

mysql> select * from app_user where month(createdOn)=5;
7e9e2b7bc2e9bde15504f6c5658458ab -
74322 rows in set (5.01 sec)

In this case, we need to either rewrite the query without the date function so that the index can be used, or create a virtual (generated) column for the expression used in the WHERE clause and index that column. MySQL 8.0 makes our work even simpler: we can create a functional index directly. A hedged sketch of the two pre-8.0.13 workarounds is shown below.
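For comparison, here is a minimal sketch of those two workarounds. The column and index names createdOn_month and idx_createdon_month are illustrative, not from the original post, and the range rewrite assumes May only appears in 2019 in this data set.

-- Workaround 1: rewrite the predicate as a range so the idx_createdon index added earlier can be used
SELECT * FROM app_user
WHERE createdOn >= '2019-05-01' AND createdOn < '2019-06-01';

-- Workaround 2 (MySQL 5.7 style): add a virtual generated column and index it
ALTER TABLE app_user
  ADD COLUMN createdOn_month TINYINT AS (MONTH(createdOn)) VIRTUAL,
  ADD INDEX idx_createdon_month (createdOn_month);

-- The query then filters on the generated column
SELECT * FROM app_user WHERE createdOn_month = 5;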

mysql>alter table app_user add index idx_month_createdon((month(createdOn)));
Query OK, 0 rows affected (1 min 17.37 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql>explain select * from app_user where month(createdOn)=5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: app_user
   partitions: NULL
         type: ref
possible_keys: idx_month_createdon
          key: idx_month_createdon
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

Now it is using an optimal (functional) index and query execution time is also reduced significantly.

mysql>select * from app_user where month(createdOn)=5;
7e9e2b7bc2e9bde15504f6c5658458ab -
74322 rows in set (0.29 sec)

But there are a few restrictions on creating a functional index.

1) Only functions permitted for generated columns (5.7) are permitted for functional key parts (see the hedged sketch after this list).

2) A primary key cannot be included in functional key parts.

3) Spatial and full-text indexes cannot have functional key parts.
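To make restriction 1 concrete, here is a small hedged sketch: non-deterministic functions such as NOW() are not permitted in generated columns, so an attempt like the one below is expected to be rejected by the server (the exact error text may vary by version, and idx_now is just an illustrative name).

-- Expected to fail: NOW() is non-deterministic, so it is not allowed
-- for generated columns and therefore not as a functional key part either
ALTER TABLE app_user ADD INDEX idx_now ((NOW()));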

To drop a column that is used by a functional index, we need to remove the index first and then drop the column, otherwise the server will throw an error.

Let us try to drop the column createdOn (contains functional index).

mysql> alter table app_user drop column createdOn;
ERROR 3755 (HY000): Cannot drop column 'createdOn' because it is used by a functional index. In order to drop the column, you must remove the functional index.
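The working order is therefore to drop the functional index first. A minimal sketch, reusing the index name created earlier in this post:

-- Drop the functional index first, then the column can be dropped
ALTER TABLE app_user DROP INDEX idx_month_createdon;
ALTER TABLE app_user DROP COLUMN createdOn;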

The functional index is an interesting feature in MySQL 8.0 and a must-try for DBAs.

Featured Image Courtesy Photo by Antoine Dautry on Unsplash

From an empty box to MySQL custom replication in 3 minutes

Starting with version 1.32.0, dbdeployer has the ability to download a selection of MySQL tarballs from several sources.

This means that, when working in an empty box, you can populate it with database servers using dbdeployer.

The “empty box” mentioned in the title is not really empty. It’s a Linux (or MacOS) host that is able to run a MySQL server. As such, it needs to have at least the prerequisites to run MySQL server (such as the libnuma and libaio packages), and a bash shell to run the scripts created by dbdeployer.

To try the thrill of an empty box that quickly becomes a working environment, we can use a docker image datacharmer/mysql-sb-base that I have created for this purpose.

$ docker pull datacharmer/mysql-sb-base
Using default tag: latest
latest: Pulling from datacharmer/mysql-sb-base
6b98dfc16071: Pull complete
4001a1209541: Pull complete
6319fc68c576: Pull complete
b24603670dc3: Pull complete
97f170c87c6f: Pull complete
b78c78fcfc94: Pull complete
379084573ce7: Pull complete
0afd193b699a: Pull complete
dfb4eecd399a: Pull complete
Digest: sha256:492c38b8662d393436141de5b3a9ad5b3994a7b095610b43896033fd740523ef
Status: Downloaded newer image for datacharmer/mysql-sb-base:latest

We can start a container from this image, and we won’t need anything else from the host computer.

$ docker run -ti --hostname dbtest datacharmer/mysql-sb-base bash
msandbox@dbtest:~$

The container runs as a regular user. Given that dbdeployer is designed specifically to run without root access (although it can run as root), this is the perfect scenario.

dbdeployer is already installed, but mysql is not.

msandbox@dbtest:~$ dbdeployer --version
dbdeployer version 1.34.0

msandbox@dbtest:~$ mysql
bash: mysql: command not found

Thus, we start getting our software from the locations that dbdeployer knows.

$ dbdeployer downloads list
Available tarballs
name OS version flavor size minimal
-------------------------------------------------------- ------- --------- -------- -------- ---------
tidb-master-linux-amd64.tar.gz Linux 3.0.0 tidb 26 MB
mysql-8.0.16-linux-glibc2.12-x86_64.tar.xz Linux 8.0.16 mysql 461 MB
mysql-8.0.16-linux-x86_64-minimal.tar.xz Linux 8.0.16 mysql 44 MB Y
mysql-5.7.26-linux-glibc2.12-x86_64.tar.gz Linux 5.7.26 mysql 645 MB
mysql-5.6.44-linux-glibc2.12-x86_64.tar.gz Linux 5.6.44 mysql 329 MB
mysql-5.5.62-linux-glibc2.12-x86_64.tar.gz Linux 5.5.62 mysql 199 MB
mysql-8.0.15-linux-glibc2.12-x86_64.tar.xz Linux 8.0.15 mysql 376 MB
mysql-8.0.13-linux-glibc2.12-x86_64.tar.xz Linux 8.0.13 mysql 394 MB
mysql-5.7.25-linux-glibc2.12-x86_64.tar.gz Linux 5.7.25 mysql 645 MB
mysql-5.6.43-linux-glibc2.12-x86_64.tar.gz Linux 5.6.43 mysql 329 MB
mysql-5.5.61-linux-glibc2.12-x86_64.tar.gz Linux 5.5.61 mysql 199 MB
mysql-5.1.73-linux-x86_64-glibc23.tar.gz Linux 5.1.73 mysql 134 MB
mysql-5.0.96.tar.xz Linux 5.0.96 mysql 5.5 MB Y
mysql-5.1.72.tar.xz Linux 5.1.72 mysql 10 MB Y
mysql-5.5.61.tar.xz Linux 5.5.61 mysql 6.6 MB Y
mysql-5.5.62.tar.xz Linux 5.5.62 mysql 6.6 MB Y
mysql-5.6.43.tar.xz Linux 5.6.43 mysql 9.0 MB Y
mysql-5.6.44.tar.xz Linux 5.6.44 mysql 9.1 MB Y
mysql-5.7.25.tar.xz Linux 5.7.25 mysql 23 MB Y
mysql-5.7.26.tar.xz Linux 5.7.26 mysql 23 MB Y
mysql-5.0.96-linux-x86_64-glibc23.tar.gz Linux 5.0.96 mysql 127 MB
mysql-4.1.22.tar.xz Linux 4.1.22 mysql 4.6 MB Y
mysql-cluster-gpl-7.6.10-linux-glibc2.12-x86_64.tar.gz Linux 7.6.10 ndb 916 MB
mysql-cluster-8.0.16-dmr-linux-glibc2.12-x86_64.tar.gz Linux 8.0.16 ndb 1.1 GB

The above command shows all the tarballs that are available for the current operating system. You see that in addition to vanilla MySQL, there are also NDB and TiDB packages.

We start by getting the latest MySQL version using the command get-unpack that is available since version 1.33.0. This command downloads the tarball, compares the checksum, and unpacks it into the expected place.

$ dbdeployer downloads get-unpack mysql-8.0.16-linux-x86_64-minimal.tar.xz
Downloading mysql-8.0.16-linux-x86_64-minimal.tar.xz
.... 44 MB
File /home/msandbox/mysql-8.0.16-linux-x86_64-minimal.tar.xz downloaded
Checksum matches
Unpacking tarball mysql-8.0.16-linux-x86_64-minimal.tar.xz to $HOME/opt/mysql/8.0.16
.........100.........200.219
Renaming directory /home/msandbox/opt/mysql/mysql-8.0.16-linux-x86_64-minimal to /home/msandbox/opt/mysql/8.0.16

The same operation for 5.7 gives us the second version available.

+ dbdeployer downloads get-unpack mysql-5.7.26.tar.xz
Downloading mysql-5.7.26.tar.xz
.. 23 MB
File /home/msandbox/mysql-5.7.26.tar.xz downloaded
Checksum matches
Unpacking tarball mysql-5.7.26.tar.xz to $HOME/opt/mysql/5.7.26
.........99
Renaming directory /home/msandbox/opt/mysql/mysql-5.7.26 to /home/msandbox/opt/mysql/5.7.26

Now there are two versions that can be used for operations.

$ dbdeployer versions
Basedir: /home/msandbox/opt/mysql
5.7.26 8.0.16

And we are going to deploy one sandbox from each version, because we want to put them in replication.

$ dbdeployer deploy single 5.7.26 --master
Creating directory /home/msandbox/sandboxes
Database installed in $HOME/sandboxes/msb_5_7_26
run 'dbdeployer usage single' for basic instructions'
. sandbox server started

$ dbdeployer deploy single 8.0.16 --master
Database installed in $HOME/sandboxes/msb_8_0_16
run 'dbdeployer usage single' for basic instructions'
.. sandbox server started


$ dbdeployer sandboxes --full-info
.------------.--------.---------.---------------.--------.-------.--------.
| name | type | version | ports | flavor | nodes | locked |
+------------+--------+---------+---------------+--------+-------+--------+
| msb_5_7_26 | single | 5.7.26 | [5726 ] | mysql | 0 | |
| msb_8_0_16 | single | 8.0.16 | [8016 18016 ] | mysql | 0 | |
'------------'--------'---------'---------------'--------'-------'--------'

These are our active assets. The sandboxes are independent, but each sandbox has the ability to become the receiver of replication. In this case we want to replicate from version 5.7 to version 8.0, as it is always recommended to replicate from an earlier to a later version.

$ ~/sandboxes/msb_8_0_16/replicate_from msb_5_7_26
Connecting to /home/msandbox/sandboxes/msb_5_7_26
--------------
CHANGE MASTER TO master_host="127.0.0.1",
master_port=5726,
master_user="rsandbox",
master_password="rsandbox"
, master_log_file="mysql-bin.000001", master_log_pos=4089
--------------

--------------
start slave
--------------

Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 4089
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 4089
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0


Replication is active. We can now quickly check that it is working:

$ ~/sandboxes/msb_5_7_26/use -e 'create table test.t1(id int not null primary key, msg1 varchar(50), msg2 varchar(50)) default charset=utf8mb4'

$ ~/sandboxes/msb_5_7_26/use -e 'insert into test.t1 values (1, @@version, @@server_uuid)'

We create a table in 5.7, taking care to use a character set that agrees with the 8.0 defaults (we could also use utf8, but utf8mb4 presents fewer potential problems). We fill the table with server-specific information (its version and UUID).


Now we can check that the slave is working


$ ~/sandboxes/msb_8_0_16/use -e 'SHOW SLAVE STATUS\G' | grep 'Running\|Master_\|Log_'
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 5726
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 4636
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 868
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 4636
Relay_Log_Space: 1072
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Master_SSL_Verify_Server_Cert: No
Master_Server_Id: 5726
Master_UUID: 00005726-0000-0000-0000-000000005726
Master_Info_File: mysql.slave_master_info
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Master_SSL_Crl:
Master_SSL_Crlpath:
Master_TLS_Version:
Master_public_key_path:

And finally we retrieve from the 8.0 slave the data that was created in 5.7

$ ~/sandboxes/msb_8_0_16/use -e 'show tables from test'
+----------------+
| Tables_in_test |
+----------------+
| t1 |
+----------------+

$ ~/sandboxes/msb_8_0_16/use -e 'select * from test.t1'
+----+------------+--------------------------------------+
| id | msg1 | msg2 |
+----+------------+--------------------------------------+
| 1 | 5.7.26-log | 00005726-0000-0000-0000-000000005726 |
+----+------------+--------------------------------------+

QED.

Quickly configure replication using DBdeployer [SandBox]

We might have different scenarios where we need a quick replication setup, either between the same MySQL version (like 8.0 --> 8.0) or between different MySQL versions (like 5.7 --> 8.0), to perform some testing.
In this blog post, I will explain how we can quickly create a replication lab setup using a virtual machine and the DBdeployer tool.
Let's see, step by step, how to create replication between the same version and between different versions of MySQL using DBdeployer.

Create CentOS VM

Please see my earlier blog post for instructions on how to create a CentOS virtual machine using Vagrant.

Install DBdeployer tool

-- Run the commands below to install the latest DBdeployer package.

[root@centos7-test-vm ~]# yum -y install wget
[root@centos7-test-vm ~]# VERSION=1.33.0
[root@centos7-test-vm ~]# OS=linux
[root@centos7-test-vm ~]# origin=https://github.com/datacharmer/dbdeployer/releases/download/v$VERSION
[root@centos7-test-vm ~]# wget $origin/dbdeployer-$VERSION.$OS.tar.gz
[root@centos7-test-vm ~]# tar -xzf dbdeployer-$VERSION.$OS.tar.gz
[root@centos7-test-vm ~]# chmod +x dbdeployer-$VERSION.$OS
[root@centos7-test-vm ~]# sudo mv dbdeployer-$VERSION.$OS /usr/local/bin/dbdeployer

Note: at the time of writing these steps, the latest DBdeployer release was 1.33.0. Make sure to check the latest release on the page below and adjust the value of "VERSION=XX.XX.XX" in the 2nd command accordingly.
https://github.com/datacharmer/dbdeployer/releases
-- Verify the installed dbdeployer version

[root@centos7-test-vm ~]# dbdeployer --version
dbdeployer version 1.33.0
[root@centos7-test-vm ~]#

Download and unpack Percona Server tarballs

-- Download the latest Percona Server 5.7 & 8.0 binary tarball packages from the links below, under the "Linux - Generic" section.
https://www.percona.com/downloads/Percona-Server-5.7/LATEST/binary/tarball/
https://www.percona.com/downloads/Percona-Server-8.0/LATEST/binary/tarball/
Note: if you are not sure which SSL package tarball you should download, please check this earlier blog post of mine.
[root@centos7-test-vm ~]# wget https://www.percona.com/downloads/Percona-Server-5.7/Percona-Server-5.7.26-29/binary/tarball/Percona-Server-5.7.26-29-Linux.x86_64.ssl101.tar.gz
[root@centos7-test-vm ~]# wget https://www.percona.com/downloads/Percona-Server-8.0/Percona-Server-8.0.15-6/binary/tarball/Percona-Server-8.0.15-6-Linux.x86_64.ssl.tar.gz 
-- Unpack tarball packages using the DBdeployer tool

[root@centos7-test-vm ~]# pwd /root
[root@centos7-test-vm ~]# ls -lth Percona*tar.gz -rw-r--r--. 1 root root  64M Jun  1 11:50 Percona-Server-5.7.26-29-Linux.x86_64.ssl101.tar.gz -rw-r--r--. 1 root root 687M May  3 12:50 Percona-Server-8.0.15-6-Linux.x86_64.ssl.tar.gz
[root@centos7-test-vm ~]# mkdir -p /root/opt/mysql
[root@centos7-test-vm ~]# dbdeployer unpack --prefix=ps Percona-Server-5.7.26-29-Linux.x86_64.ssl101.tar.gz
[root@centos7-test-vm ~]# dbdeployer unpack --prefix=ps Percona-Server-8.0.15-6-Linux.x86_64.ssl.tar.gz 
-- Check the list of installed/available tarball binary packages for DBdeployer.

[root@centos7-test-vm ~]# dbdeployer versions Basedir: /root/opt/mysql ps5.7.26  ps8.0.15 [root@centos7-test-vm ~]# 
Configure replication between the same Percona Server version (8.0 --> 8.0)

-- Run the command below to configure replication within Percona Server 8.0, i.e. the master and the 2 slaves will all be on the same version.
[root@centos7-test-vm ~]# dbdeployer deploy replication ps8.0
# ps8.0 => ps8.0.15
Installing and starting master .. sandbox server started
Installing and starting slave1 ... sandbox server started
Installing and starting slave2 ...... sandbox server started
$HOME/sandboxes/rsandbox_ps8_0_15/initialize_slaves
initializing slave 1
initializing slave 2
Replication directory installed in $HOME/sandboxes/rsandbox_ps8_0_15
run 'dbdeployer usage multiple' for basic instructions'
[root@centos7-test-vm ~]#
-- To access each node, go to the installed replication directory "$HOME/sandboxes/rsandbox_ps8_0_15" [root@centos7-test-vm ~]# cd $HOME/sandboxes/rsandbox_ps8_0_15 -------------------------------------------------------------- [root@centos7-test-vm rsandbox_ps8_0_15]# sh m Welcome to the MySQL monitor.  Commands end with ; or \g. Your MySQL connection id is 12 Server version: 8.0.15-6 Percona Server (GPL), Release 6, Revision 63abd08 * * master [localhost:20516] {msandbox} ((none)) > -------------------------------------------------------------- [root@centos7-test-vm rsandbox_ps8_0_15]# sh s1 Welcome to the MySQL monitor.  Commands end with ; or \g. Your MySQL connection id is 12 Server version: 8.0.15-6 Percona Server (GPL), Release 6, Revision 63abd08 * * slave1 [localhost:20517] {msandbox} ((none)) > -------------------------------------------------------------- [root@centos7-test-vm rsandbox_ps8_0_15]# sh s2 Welcome to the MySQL monitor.  Commands end with ; or \g. Your MySQL connection id is 12 Server version: 8.0.15-6 Percona Server (GPL), Release 6, Revision 63abd08 * * slave2 [localhost:20518] {msandbox} ((none)) >

-- Test replication using the test_replication script, which creates a test table, inserts some data on the master node, and then verifies that the data created on the master has been replicated to the slave nodes.
[root@centos7-test-vm rsandbox_ps8_0_15]# sh test_replication # master log: mysql-bin.000001 - Position: 14696 - Rows: 20 # Testing slave #1 ok - slave #1 acknowledged reception of transactions from master ok - slave #1 IO thread is running ok - slave #1 SQL thread is running ok - Table t1 found on slave #1 ok - Table t1 has 20 rows on #1 # Testing slave #2 ok - slave #2 acknowledged reception of transactions from master ok - slave #2 IO thread is running ok - slave #2 SQL thread is running ok - Table t1 found on slave #2 ok - Table t1 has 20 rows on #2 # Tests :    10 # failed:     0 (  0.0%) # PASSED:    10 (100.0%) # exit code: 0 [root@centos7-test-vm rsandbox_ps8_0_15]# 
-- Check replication status using the check_slaves script.

[root@centos7-test-vm rsandbox_ps8_0_15]# sh check_slaves master port    20516 - server_id    100              File: mysql-bin.000001          Position: 14696 Executed_Gtid_Set: slave1 port    20517 - server_id    200               Master_Log_File: mysql-bin.000001           Read_Master_Log_Pos: 14696              Slave_IO_Running: Yes             Slave_SQL_Running: Yes           Exec_Master_Log_Pos: 14696            Retrieved_Gtid_Set:             Executed_Gtid_Set:                 Auto_Position: 0 slave2 port    20518 - server_id    300               Master_Log_File: mysql-bin.000001           Read_Master_Log_Pos: 14696              Slave_IO_Running: Yes             Slave_SQL_Running: Yes           Exec_Master_Log_Pos: 14696            Retrieved_Gtid_Set:             Executed_Gtid_Set:                 Auto_Position: 0 [root@centos7-test-vm rsandbox_ps8_0_15]# 
Configure replication between different Percona Server versions (5.7 --> 8.0)

To configure replication between different MySQL versions using dbdeployer, we first need to deploy the individual MySQL 5.7 and 8.0 instances with dbdeployer, and then we can use dbdeployer's replicate_from script to configure the replication.
-- Deploy the individual/single instances of Percona server 5.7 and 8.0 versions using their unpacked binary packages. 
[root@centos7-test-vm ~]# dbdeployer versions Basedir: /root/opt/mysql ps5.7.26  ps8.0.15
[root@centos7-test-vm ~]# dbdeployer deploy single ps5.7.26 Database installed in $HOME/sandboxes/msb_ps5_7_26 run 'dbdeployer usage single' for basic instructions' .. sandbox server started
[root@centos7-test-vm ~]# dbdeployer deploy single ps8.0.15 Database installed in $HOME/sandboxes/msb_ps8_0_15 run 'dbdeployer usage single' for basic instructions' ...... sandbox server started
-- Jump to the installed database directory and then connect to the MySQL prompt of each instance.
[root@centos7-test-vm ~]# cd $HOME/sandboxes/msb_ps5_7_26
[root@centos7-test-vm msb_ps5_7_26]# sh use Welcome to the MySQL monitor.  Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.26-29 Percona Server (GPL), Release 29, Revision 11ad961 * * mysql [localhost:5726] {msandbox} ((none)) > select @@global.log_bin,@@global.server_id; +------------------+--------------------+ | @@global.log_bin | @@global.server_id | +------------------+--------------------+ |                0 |                  0 | +------------------+--------------------+ 1 row in set (0.00 sec)

[root@centos7-test-vm rsandbox_ps8_0_15]# cd $HOME/sandboxes/msb_ps8_0_15
[root@centos7-test-vm msb_ps8_0_15]# sh use Welcome to the MySQL monitor.  Commands end with ; or \g. Your MySQL connection id is 9 Server version: 8.0.15-6 Percona Server (GPL), Release 6, Revision 63abd08 * * mysql [localhost:8015] {msandbox} ((none)) > select @@global.log_bin,@@global.server_id; +------------------+--------------------+ | @@global.log_bin | @@global.server_id | +------------------+--------------------+ |                1 |                  1 | +------------------+--------------------+ 1 row in set (0.02 sec)
-- As we can see above, in MySQL 5.7 the binary log is disabled by default and the server-id is also not set for the instance. So, to set up replication, we need to enable the binary log on MySQL 5.7 and set a server-id. This is straightforward to change with dbdeployer using the add_option script, which adds the given parameters to the cnf file and then restarts the mysqld service.
[root@centos7-test-vm msb_ps5_7_26]# sh add_option log-bin server-id=10
# option 'log-bin' added to configuration file
# option 'server-id=10' added to configuration file
stop /root/sandboxes/msb_ps5_7_26
. sandbox server started
[root@centos7-test-vm msb_ps5_7_26]# sh use -e"select @@global.log_bin,@@global.server_id"
+------------------+--------------------+
| @@global.log_bin | @@global.server_id |
+------------------+--------------------+
|                1 |                 10 |
+------------------+--------------------+
[root@centos7-test-vm msb_ps5_7_26]#
-- Go to the installed database directory of Percona server 8.0 and then configure the replication with Percona Server 5.7 using the replicate_from script. 

[root@centos7-test-vm ~]# dbdeployer sandboxes msb_ps5_7_26             :   single         ps5.7.26   [5726 ] msb_ps8_0_15             :   single         ps8.0.15   [8015 18015 ]
[root@centos7-test-vm ~]# cd $HOME/sandboxes/msb_ps8_0_15
[root@centos7-test-vm msb_ps8_0_15]# sh replicate_from msb_ps5_7_26
Connecting to /root/sandboxes/msb_ps5_7_26
--------------
CHANGE MASTER TO master_host="127.0.0.1",
master_port=5726,
master_user="rsandbox",
master_password="rsandbox"
, master_log_file="centos7-test-vm-bin.000001", master_log_pos=154
--------------

-------------- start slave --------------

              Master_Log_File: centos7-test-vm-bin.000001
          Read_Master_Log_Pos: 154
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 154
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
[root@centos7-test-vm msb_ps8_0_15]#
-- Here we go, our replication setup between the different Percona Server versions 5.7 and 8.0 is ready. We can see from the active_replication file present in the Percona Server 8.0 directory that this sandbox is connected to Percona Server 5.7.

[root@centos7-test-vm msb_ps8_0_15]# cat active_replication
Sat Jun  1 13:18:01 UTC 2019
Connected to /root/sandboxes/msb_ps5_7_26
[root@centos7-test-vm msb_ps8_0_15]#
-- We can also test our replication setup by creating a test table and inserting some random data on the 5.7 master, then checking the data on the 8.0 slave and verifying the replication status.

MASTER: 5.7
mysql [localhost:5726] {msandbox} ((none)) > use test
Database changed
mysql [localhost:5726] {msandbox} (test) > create table testing (id int);
Query OK, 0 rows affected (0.03 sec)

mysql [localhost:5726] {msandbox} (test) > insert into testing values (1),(2),(3);
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost:5726] {msandbox} ((none)) > show master status\G
*************************** 1. row ***************************
             File: centos7-test-vm-bin.000001
         Position: 590
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
SLAVE: 8.0
mysql [localhost:8015] {msandbox} ((none)) > select * from test.testing;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)
mysql [localhost:8015] {msandbox} ((none)) > show slave status\G *************************** 1. row ***************************                Slave_IO_State: Waiting for master to send event                   Master_Host: 127.0.0.1                   Master_User: rsandbox                   Master_Port: 5726                 Connect_Retry: 60               Master_Log_File: centos7-test-vm-bin.000001           Read_Master_Log_Pos: 590                Relay_Log_File: centos7-test-vm-relay-bin.000002                 Relay_Log_Pos: 767         Relay_Master_Log_File: centos7-test-vm-bin.000001              Slave_IO_Running: Yes             Slave_SQL_Running: Yes               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 0                    Last_Error:                  Skip_Counter: 0           Exec_Master_Log_Pos: 590               Relay_Log_Space: 985               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 0                Last_SQL_Error:   Replicate_Ignore_Server_Ids:              Master_Server_Id: 10                   Master_UUID: 00005726-0000-0000-0000-000000005726              Master_Info_File: mysql.slave_master_info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp:                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set:             Executed_Gtid_Set:                 Auto_Position: 0          Replicate_Rewrite_DB:                  Channel_Name:            Master_TLS_Version:        Master_public_key_path:         Get_master_public_key: 0 1 row in set (0.00 sec)

I hope this blog post helps you play with dbdeployer and quickly create different replication setups inside your test VM.
*******

Photo by Gabriel Gusmao on Unsplash

TaskMax limit affects MySQL connections

Recently we were bitten by a systemd limit on the number of "Tasks" created per unit, i.e. per process. This count includes both kernel threads and user-space threads, with each thread counted individually.

I am writing this blog as a reference for anyone who might come across this limitation.

We have been actively working on migrating DB instances from one DC to a newly built DC. The instances in the newer DC were provisioned with the latest hardware and the latest Debian OS. Below is the detailed spec of the system.

RAM           : 244G
Core          : 44 Core
Hard disk     : SSD
IOPS          : 120K
OS            : Debian GNU/Linux 9 (stretch)
Kernel        : 4.9.0-8-amd64
MySQL Version : 5.6.43-84.3-log Percona Server (GPL)

We had set up MySQL on this machine with an optimal configuration to perform NFR (Non-Functional Requirement) testing. The test was performed using JMeter, and we monitored the performance of the DB server with PMM.

The connection limit (max_connections) on the DB servers was set to 15k. We were doing a series of tests, increasing the number of app boxes, which increases the number of threads connecting to the MySQL server. When the connections reached the 6.5k to 7k mark, we started to observe a few errors in the application log, mostly related to connection failures.

Previously we had hit a limitation with the open files limit (ulimit), which affected our connections and MySQL performance, and we blogged about it too.

As you can see in the graph below, there is a discontinuity at around 7K connections, which clearly shows that we are hitting a resource limit.

Error Description:

We started to debug the issue. In the first place we checked the MySQL error log, where we could see the error below.

2019-06-06 17:48:42 14704 [ERROR] Error log throttle: 19866 'Can't create thread to handle new connection' error(s) suppressed
2019-06-06 17:48:42 14704 [ERROR] Can't create thread to handle request (errno= 11)
2019-06-06 17:49:42 14704 [ERROR] Error log throttle: 17886 'Can't create thread to handle new connection' error(s) suppressed
2019-06-06 17:49:42 14704 [ERROR] Can't create thread to handle request (errno= 11)

The “errno=11” gives us a first clue that it is a system (kernel) related issue.

# perror 11
OS error code  11:  Resource temporarily unavailable

The task limit hit can be observed in Systemctl status for MySQL service.

# systemctl status mysql
● mysql.service - LSB: Start and stop the mysql (Percona Server) daemon
   Loaded: loaded (/etc/init.d/mysql; generated; vendor preset: enabled)
   Active: active (running) since Wed 2019-05-22 15:31:04 IST; 2 weeks 1 days ago
     Docs: man:systemd-sysv-generator(8)
  Process: 38489 ExecStop=/etc/init.d/mysql stop (code=exited, status=0/SUCCESS)
  Process: 38515 ExecStart=/etc/init.d/mysql start (code=exited, status=0/SUCCESS)
    Tasks: 6758 (limit: 6758)
   CGroup: /system.slice/mysql.service
           ├─14704 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-
           └─38542 /bin/sh /usr/bin/mysqld_safe

What is TaskMax?

Below is the definition from man page:

Specify the maximum number of tasks that may be created in the unit. This ensures that the number of tasks accounted for the unit stays below a specific limit. This either takes an absolute number of tasks or a percentage value that is taken relative to the configured maximum number of tasks on the system. If assigned the special value “infinity”, no tasks limit is applied. 
The system default for this setting may be controlled with DefaultTasksMax=N

Analysis

To reproduce the issue, we restarted the DB servers in order to flush the connections. As expected, we hit the same limitation at the same level of connections (6.5-7k).

After some analysis we found that this limitation was imposed by systemd: starting with its release v228, the task limit for all services is capped at a default value of 512.

We checked our systemd version as shown below and found it to be greater than v228.

/var/lib/mysql# systemd --version
systemd 232
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Fix:

Now let's see how to increase this limit, since it would be a blocker for a highly concurrent environment.

The Task limit can be increased both at the system level as well as at the service level.

System-level:

Increasing the limit at system level, affects all the process running.

Let us view the Task limit on process at OS level

# systemctl show --property=DefaultTasksMax DefaultTasksMax=6758

Increasing the limit at the OS level (systemd) applies to all running processes. Setting it at the system level is an online change: it does not require a restart and is applied immediately.

Edit the systemd conf file as below, with the required value of TasksMax.

#vi /etc/systemd/system.conf
DefaultTasksMax=10000

Then proceed to reload the system config as below

#systemctl daemon-reexec

Now you can see the increased limit for the mysql process, as below.

# systemctl show mysql | grep -i task
TasksCurrent=25
TasksAccounting=yes
TasksMax=10000

The downside of setting this at the system level is that it might cause a race with other multi-threaded processes if they run on the same machine as the DB. If it is a dedicated DB machine, this is the quickest way to fix the issue online.

Process-level

Now let's see how to set the limit at the process level: edit the service script of the process, say mysql.service, and add the lines below under the [Service] tag.

# systemctl edit mysql
(or)
# vi /etc/systemd/system/mysql.service.d/override.conf

[Service]
TasksMax=10000

# systemctl daemon-reload   (to reload)

Note that increasing the limit at the process (service) level requires a restart/reload of the service.

General guidelines:
  • Ensure the needed OS limits are set on the MySQL system. Do not trust kernel defaults; they are prone to change.
  • Compute your max_connections based on the number of connections per app box, and set TasksMax accordingly (a hedged SQL sketch follows this list).
  • Set the timeout values in the DB accordingly to close inactive client connections.
  • If you don't have control over the connections made by the application, I would recommend ProxySQL, which does connection pooling and multiplexing.
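For the two database-side items, here is a minimal SQL sketch; the timeout values are placeholders to adapt to your own capacity planning, not values from the original post.

-- Check the configured connection limit and current usage
SHOW GLOBAL VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- Close inactive client connections sooner (values are illustrative)
SET GLOBAL wait_timeout = 600;
SET GLOBAL interactive_timeout = 600;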

Featured Image Courtesy by Clint Adair on Unsplash

Galera Cluster 4 with MariaDB 10.4

Congratulations to Team MariaDB at MariaDB Corporation and MariaDB Foundation for releasing MariaDB 10.4.6 as Generally Available (GA) last week on 18 June 2019. This release is very exciting for Galera Cluster users as it comes with Galera 4 (it is now the first server to come with it!), with Galera wsrep library version 26.4.2.

What can Galera Cluster users expect from MariaDB 10.4? Some high level features include:

  • Streaming replication — a huge boost to large transaction support, since the node breaks transactions into fragments, replicates and certifies it across all secondary nodes while the transaction is still in progress. Read more about it in our dedicated documentation on streaming replication as well as a guide on using streaming replication (yes, you have to enable it first).
  • Galera System Tables — there are three new tables added to the mysql database: wsrep_cluster, wsrep_cluster_members, and wsrep_streaming_log. As a database administrator, you can see the cluster activity — again, please read the documentation on system tables, and note that if you do not have streaming replication enabled, you will not see anything in wsrep_streaming_log.
  • Synchronisation functions — these are SQL functions for use in wsrep synchronisation operations, like getting the GTID based on the last write or last seen transaction, as well as setting the node to wait for a specific GTID to replicate and apply before executing the next transaction (a hedged SQL sketch of these features follows this list).
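As a quick illustration, here is a minimal, hedged SQL sketch of these features as they are documented for MariaDB 10.4 with Galera 4; treat the session variables and function names as assumptions to verify against the documentation linked above, and the GTID argument is a placeholder.

-- Streaming replication: replicate a large transaction in 10,000-row fragments
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;

-- Galera system tables: inspect cluster membership and any in-flight streamed transactions
SELECT * FROM mysql.wsrep_cluster_members;
SELECT * FROM mysql.wsrep_streaming_log;

-- Synchronisation functions: capture the GTID of the last write on this node,
-- then have another session wait until that GTID has been applied locally
SELECT WSREP_LAST_WRITTEN_GTID();
SELECT WSREP_SYNC_WAIT_UPTO_GTID('<gtid-from-previous-call>');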

But that is not all — a recent presentation by our CEO, Seppo Jaakola, can also shed some light on the new features and the roadmap. Please read Galera 4 in MariaDB 10.4 for more information.

Both Team Codership and Team MariaDB have worked hard to ensure that there can be rolling upgrades performed from Galera Cluster in MariaDB Server 10.3 to MariaDB Server 10.4, and we highly recommend that you read the upgrade documentation: Upgrading from MariaDB 10.3 to MariaDB 10.4 with Galera Cluster.

So what are you waiting for? Give MariaDB Server 10.4 with Galera Cluster 4 a try (download it), and provide us some feedback. Bugs can of course be reported to the MariaDB Jira. We will monitor the maria-discuss and maria-developers mailing lists but don’t forget to ask specific Galera Cluster questions at our Google Group.

The '$' As The JSON Document

Recently on Stack Overflow there was a question about handling a JSON document stored in a MySQL database. The data looked like this:

[{"name":"cdennett","address":"123 street","Postcode":"ABCDE"}]

The data above is valid JSON. That data is in an array because it is surrounded by []s while objects are surrounded by {}s. And the author was trying to use the following JSON_TABLE function

SELECT people.* 
FROM t1, 
     JSON_TABLE(json_col, '$.people[*]' 
COLUMNS (
                name VARCHAR(40)  PATH '$.name',
                address VARCHAR(100) PATH '$.address')
     ) people;

Some of you who have been using the JSON data type are probably smiling at the code above. It is a simple problem caused by confusion about the path of the JSON document: the data is referenced the wrong way, or, to put it another way, the path to the document is wrong. And, sadly, you probably only run into this after getting confused and having to learn the proper way to look at the path of a JSON document.

'$' is Your Document

The '$' character refers to the entire document. If we use JSON_EXTRACT or the arrow operator, it is easy to retrieve the entire document.

 select json_extract(json_col,'$') from t1;
+----------------------------------------------------------------------+
| json_extract(json_col,'$')                                           |
+----------------------------------------------------------------------+
| [{"name": "cdennett", "address": "123 street", "Postcode": "ABCDE"}] |
+----------------------------------------------------------------------+

Or we could use the shortcut method  select json_col->"$" from t1; to get the same results.

Where this gets confusing is if you use a function like JSON_KEYS like such:

select JSON_KEYS(json_col) from t1x;
+---------------------+
| JSON_KEYS(json_col) |
+---------------------+
| NULL                |
+---------------------+

Why did we get a NULL and not the same output as from JSON_EXTRACT? Well, JSON_KEYS returns NULL if any argument is NULL, if the document in question is not an object, or if a given path does not locate an object. And an array is not an object.
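JSON_KEYS does accept an optional path argument, so pointing it at the object inside the array works. A minimal sketch against the same table t1 (the commented result is what the sample document implies):

-- Ask for the keys of the object at the first array position
SELECT JSON_KEYS(json_col, '$[0]') FROM t1;
-- expected: ["name", "address", "Postcode"]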

But other functions are not bothered by the fact that the document is not an object.

select json_pretty(json_col) from t1;
+------------------------------------------------------------------------------------------+
| json_pretty(json_col)                                                                    |
+------------------------------------------------------------------------------------------+
| [
  {
    "name": "cdennett",
    "address": "123 street",
    "Postcode": "ABCDE"
  }
] |
+------------------------------------------------------------------------------------------+

So if '$' is [{"name":"cdennett","address":"123 street","Postcode":"ABCDE"}] how do we 'peel off one onion layer' to get to the {"name":"cdennett","address":"123 street","Postcode":"ABCDE"}?

The content of the first array element is denoted as $[0].

'$' is [{"name":"cdennett","address":"123 street","Postcode":"ABCDE"}]
and
'$[0]' is {"name":"cdennett","address":"123 street","Postcode":"ABCDE"}
and
'$[0].name' is "cdennett"
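Walking those paths with JSON_EXTRACT makes the layers visible; a minimal sketch against the same table (the commented results follow from the sample document):

SELECT JSON_EXTRACT(json_col, '$[0]')      FROM t1;  -- {"name": "cdennett", "address": "123 street", "Postcode": "ABCDE"}
SELECT JSON_EXTRACT(json_col, '$[0].name') FROM t1;  -- "cdennett"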

So if we refer back to the code snippet from Stackoverflow, it becomes evident that the path was certainly not '$.people[*]' but '$[*]' or '$[0]'.
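So a corrected version of the original query would look something like the sketch below, assuming the same table and column names as in the question:

SELECT people.*
FROM t1,
     JSON_TABLE(json_col, '$[*]'
       COLUMNS (
         name    VARCHAR(40)  PATH '$.name',
         address VARCHAR(100) PATH '$.address')
     ) AS people;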

Conclusions

So we end up with two conclusions. The first is to remember that '$' refers to the entire document, and that walking down the structure of the document means walking down a path that starts at '$'. And second, you might want to consider not burying things in a top-level array.


Blog from the Top — The Changing Landscape of Open Source (MySQL) and Money

This is the first in a series of blog articles in which I will discuss the changing landscape of open source and money. Or, more specifically, open source databases and money. And even more specifically MySQL and its all variants (AWS Aurora, MariaDB, Percona Server, RDS/MySQL) and money. But before going too deep into what is changing, let’s review all the traditional business models in and around the MySQL marketplace.

In general, these are the following types of companies in the MySQL commercial ecosystem, sorted by total annual revenue and addressable market size:

  • Developers who do not aim to monetize the open source code, just provide value to others and hope to get development and other contributions in return. This is the purest form of open source. For example, all Apache-licensed projects fall into this category. One Apache-licensed project in the MySQL community was the Tungsten Replicator. The Tungsten Replicator project was ultimately closed and is now being offered via AWS as a low-cost AMI offering (more about that in future blog articles). By definition, this market has zero return, and the development must be supported by sponsorships or other contributions.
  • Developers who aim to monetize the open source code by offering consulting and support services for the open source software they have developed. As an example, this category includes various projects by Percona, such as Percona Server, XtraDB Backup, PMM for monitoring, and also their ‘fork’ projects from others, such as XtraDB Cluster (Galera). Naturally, Galera Cluster itself by Codership belongs in this category. The total market in this category is relatively small, a few tens of millions. Due to the lack of any real hook, besides excellent service itself, the companies in this space often struggle to maintain a growing, recurring revenue stream as the open source users tend to drop the support providers after their initial learning curve of these solutions has been covered. On that note, I recently spoke on this topic at the Percona Live Conference in Austin, Texas. You can watch the recording of my talk here:

https://continuent-videos.s3.amazonaws.com/Eero_Keynote_Percona_Live_Austin_2019.mp4

 

  • Commercial software companies who are monetizing their proprietary solutions by adding value on top of open source projects. The companies offering various recurring monthly or annual subscription models in this category include Severalnines, VividCortex, and ourselves, amongst others. This space has historically attracted the most VC money, reaching well over $500M during the past 10 years, as it has been the easiest business model to understand by the traditional VCs. Unfortunately, despite the promise of hundreds of millions of annual recurring revenue from this marketplace, many of these startups failed to generate enough revenue due to the natural rejection by many open source aficionados who prefer the, often sub-standard, ‘free’ open-source-based home-grown solution over the more polished commercial offerings. But to be fair, many of these VC funded companies failed because they were addressing the wrong problems and needs.
  • Developers who aim to monetize the open source code by offering enhanced, tested, and supported commercial versions of the core open source software. MariaDB is the leading contender in this category. MariaDB is available as open source, but several features and functionality are only available in the MariaDB Enterprise version. Naturally, Oracle is also approaching the market in the same way with its MySQL Enterprise edition. These solutions create the foundation for the whole market, but do not command the most revenue generated in the MySQL marketplace. Oracle and MariaDB combined are making in the low hundreds of millions of dollars from their own offerings, which is far less than the actual value they create.
  • Consulting companies, which are specialized organizations that bundle multiple open source projects to create more complete solutions. They aim to monetize these bundles by providing ad-hoc consulting and support services, or annual support subscriptions. Datavail, MariaDB, Percona and Pythian are the most well-known database-focused entities in this space, but practically all IT consulting companies around the world offer these services. The total market of these services is several hundreds of millions of dollars. Since, in many cases, these companies offer a DBA replacement solutions, this is a more ’sticky’ business model and creates a better overall, more steadily growing recurring revenue path. Then again, due the labor-intensive approach, the profitability of these services is not as high as the software subscription-focused offerings.
  • Last (but definitely not least!), come Cloud Providers who take over existing open source projects, modify them for their own needs, and offer these open source solution-as-a-services, often without contributions back to the project. A prime (pun intended) example of these providers is Amazon, which some people have called out for abusing the open source ecosystem. Amazon is making money hand-over-fist with its DBaaS offerings, especially Aurora. It is a multi-billion-dollar business. Maybe more money than the rest of the MySQL-related companies, combined, during the past 10 years. This topic was also covered during my keynote at the recent Percona Live Conference (embedded above).

Obviously, there are also other software companies, like Facebook and Google, who take over an existing open source project, modify it for their own needs and use the project as part of their own service. I will keep them out of the further analysis of the MySQL and Money, even though these companies are part of the ecosystem, especially with their software contributions back to the projects, and revenues of tens or hundreds of billions of dollars from their open source database platforms.

Interestingly enough, as you can see from the above list, companies that are actually doing core open source development have not fared nearly as well as those that take advantage of the solutions and market created by these open source offerings. Traditionally, it was the consulting companies taking the cream from the top, but increasingly it is the Cloud Providers.

In a future blog post, I will take a deeper dive into each of these paths that develop and monetize open source solutions. The focus will be on the commercial aspects, i.e. how to create a sustainable model to fund the development, and how to reap some financial benefits from these efforts when possible and feasible. As they say, money makes the world go around.

Open source won’t survive on good faith alone. Stay tuned.

Smooth sailing,

Eero Teerikorpi, Founder and CEO of Continuent

Image Source: Dreamstime
