Planet MySQL

Percona Blog Poll: How Do You Currently Host Applications and Databases?

Percona's latest blog poll asks how you currently host applications and databases. Select an option below, or leave a comment to clarify your deployment!

With the increased need for environments that respond more quickly to changing business demands, many enterprises are moving to the cloud and hosted deployments for applications and software in order to offload development and maintenance overhead to a third party. The database is no exception. Businesses are turning to using database as a service (DBaaS) to handle their data needs.

DBaaS provides some obvious benefits:

  • Offload physical infrastructure to another vendor. It is the responsibility of whoever is providing the DBaaS service to maintain the physical environment – including hardware, software and best practices.
  • Scalability. You can add or subtract capacity as needed by just contacting your vendor. Have a big event on the horizon? Order more servers!
  • Expense. Since you no longer have to shell out for operational costs or infrastructure upgrades (all handled by the vendor now), you can reduce capital and operational expenses – or at least reasonably plan on what they are going to be.

There are some potential disadvantages to a DBaaS as well:

  • Network performance issues. If your database is located off-premises, then it can be subject to network issues (or outages) that are beyond your control. These can translate into performance problems that impact the customer experience.
  • Loss of visibility. It’s harder (though not impossible) to always know what is happening with your data. Decisions around provisioning, storage and architecture are now in the hands of a third party.
  • Security and compliance. You are no longer totally in control of how secure or compliant your data is when using a DBaaS. This can be crucial if your business requires certain standards to operate in your market (healthcare, for example).

How are you hosting your database? On-premises? In the cloud? Which cloud? Is it co-located? Please answer using the poll below. Choose up to three answers. If you don’t see your solutions, use the comments to explain.

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

Thanks in advance for your responses – they will help the open source community determine how databases are being hosted.

FOSDEM MySQL & Friends Devroom – Call for Papers ends December 1st!

Hi all,

The FOSDEM MySQL & Friends devroom Committee is now complete, and you can find it at http://www.mysqlandfriends.eu/committee/

As you can see, some old friends are back on the Committee and we also have some new names. As usual, all three major MySQL distributions are represented, and several Community Members complete the list.

Back to the topic of this article: the CfP will end on December 1st, to give us time to review the proposed talks, check with the speakers and prepare the schedule.

Don’t hesitate too long, please submit using https://penta.fosdem.org/submission/FOSDEM18/

Sakila greetings!

lefred.

Fun with Bugs #56 - On Some Public Bugs Fixed in MySQL 5.7.20

While MySQL 8.0.x hardly has much impact on my regular work, the recent MySQL 5.7.20 release is something to check carefully. MySQL 5.7 is widely used in production, it serves as a base for Percona Server 5.7, some of its features may be merged into MariaDB 10.x, etc. So, here is my review of some community-reported bugs that were fixed in the recently released MySQL 5.7.20, based on the release notes.

Usually I start with InnoDB bug fixes, but in 5.7.20 several related fixes were made only to bugs reported internally. So, this time I have to start with partitioning:
  • Bug #86255 - The first one to write about, and the bug report is private... The second one (Bug #76418) is also private. All we have is this:
    "Partitioning: In certain cases when fetching heap records a partition ID could be set to zero. (Bug #86255, Bug #26034430)"

    "Partitioning: Queries involving NULL were not always handled correctly on tables that were partitioned by LIST. (Bug #76418, Bug #20748521)"
That's all, folks. I fail to understand why bugs with such descriptions can stay private after they are fixed. I have to admit: I do not get it. Moreover, I am not even going to try any longer. Lists with one item look stupid, but hiding such bugs is not much better, IMHO.

Luckily for me, there were several bug fixes related to replication:
  • Bug #85352 - "Replication regression with RBR and partitioned tables". This regression bug (compared to 5.5.x) was reported by Juan Arruti and immediately verified by Umesh Shastry. I do not know why it is listed under InnoDB in the release notes, as it can hardly be about native InnoDB partitioning, based on the versions affected.
  • Bug #86224 - "mysqlrplsync errors out because slave is using --super-read-only option". It was reported by Richard Morris and verified by Bogdan Kecman.
  • Bug #84731 - "Group Replication: mysql client connections hang during group replication start", was reported by Kenny Gryp and verified by Umesh Shastry. Another bug reported by Kenny was also fixed in 5.7.20, Bug #84798 - "Group Replication can use some verbosity in the error log".
  • Bug #86250 - "thd->ha_data[ht_arg->slot].ha_ptr_backup == __null || (thd->is_binlog_applier()". This debug assertion was reported by Roel Van de Paar and verified by Umesh Shastry.
  • Bug #85639 - "XA transactions are 'unsafe' for RPL using SBR". It was reported by João Gramacho.
  • Bug #86288 - "mysqlbinlog read-from-remote-server not honoring rewrite_db filtering", was reported by Julie Hergert.
Other bug fixes not to miss are:
  • Bug #85622 - "5.6: wrong results with materialization, views". It was reported by Shane Bester. Even though it was initially stated that only 5.6 is affected, the release notes say it was fixed in 5.7.20 and even 8.0.3.
  • Bug #82992 - "Some warnings appear in dump from mysqldump". This funny bug was found by Nong LO and verified by Sinisa Milivojevic.
  • Bug #81714 - "mysqldump get_view_structure does not free MYSQL_RES in one error path". This bug was reported by Laurynas Biveinis, but it was noticed and patched by Yura Sorokin, also from Percona.
  • Bug #83950 - "LOAD DATA INFILE fails with an escape character followed by a multi-byte one", was reported by yours truly and verified by Umesh Shastry. Unfortunately, the bug report itself does NOT say explicitly which versions got the fix.
  • Bug #79596 - "If client killed after ROLLBACK TO SAVEPOINT previous stmts committed". This regression and potential data corruption bug was reported by Sveta Smirnova, verified by Umesh Shastry and studied at the source code level by Zhai Weixiang. Nice to see it fixed!
That's all, a few build- and packaging-related bugs aside.

This was my very last attempt to do a detailed review of bug reports from the MySQL Community based on the official Release Notes. With private bugs, and very few fixes for things coming from the public bugs database in general, it seems to make zero sense to continue these. Authors of contributed patches are properly mentioned by Oracle, and we all know who verifies community bug reports... One Twitter message would be enough to fit everything I have to say, and any real details are better checked in the git commits.

It's time to get back to the roots of this series and maybe write about bugs just opened, bugs not properly handled, or just funny ones. I think that helped a lot back in 2013 to make MySQL 5.6 a release that was commonly accepted as a "good one". It's no longer fun, nor very useful, to report on Oracle's achievements in fixing public bugs, so I'd better switch to their problems.

A crashing bug in MySQL: the CREATE TABLE of death (more fun with InnoDB Persistent Statistics)

I ended one of my last posts - Fun with InnoDB Persistent Statistics - with a cryptic sentence: there is more to say about this but I will stop here for now.  What I did not share at the time is the existence of a crashing bug somehow related to what I found.  But let's start with some context. In Bug#86926, I found a way to put more than 64 characters in the field table_name of the

How to Choose the MySQL innodb_log_file_size

In this blog post, I’ll provide some guidance on how to choose the MySQL innodb_log_file_size.

Like many database management systems, MySQL uses logs to achieve data durability (when using the default InnoDB storage engine). This ensures that when a transaction is committed, data is not lost in the event of crash or power loss.

MySQL's InnoDB storage engine uses a fixed-size (circular) Redo log space. The size is controlled by innodb_log_file_size and innodb_log_files_in_group (default 2). You multiply those values to get the Redo log space that is available to use. While technically it shouldn't matter whether you change innodb_log_file_size or innodb_log_files_in_group to control the Redo space size, most people just work with innodb_log_file_size and leave innodb_log_files_in_group alone.
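As a quick check, you can compute the currently configured Redo space directly from those two variables. This is just a convenience query using standard MySQL syntax, not something specific to this post:

SELECT @@innodb_log_file_size * @@innodb_log_files_in_group / 1024 / 1024 AS redo_space_mb;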

Configuring InnoDB’s Redo space size is one of the most important configuration options for write-intensive workloads. However, it comes with trade-offs. The more Redo space you have configured, the better InnoDB can optimize write IO. However, increasing the Redo space also means longer recovery times when the system loses power or crashes for other reasons.  

It is not easy or straightforward to predict how much time a system crash recovery takes for a specific innodb_log_file_size value – it depends on the hardware, MySQL version and workload. It can vary widely (10 times difference or more, depending on the circumstances). However, around five minutes per 1GB of innodb_log_file_size is a decent ballpark number. If this is really important for your environment, I would recommend testing it by simulating a system crash under full load (after the database has completely warmed up).

While recovery time can be a guideline for the limit of the InnoDB Log File size, there are a couple of other ways you can look at this number – especially if you have Percona Monitoring and Management installed.

Check Percona Monitoring and Management’s “MySQL InnoDB Metrics” Dashboard. If you see a graph like this:

where Uncheckpointed Bytes is pushing very close to the Max Checkpoint Age, you can almost be sure your current innodb_log_file_size is limiting your system’s performance. Increasing it can provide substantial performance improvements.

If you see something like this instead:

where the number of Uncheckpointed Bytes is well below the Max Checkpoint Age, then increasing the log file size won’t give you a significant improvement.

Note: many MySQL settings are interconnected. While a specific log file size might be good enough for smaller innodb_buffer_pool_size, larger InnoDB Buffer Pool values might warrant larger log files for optimal performance.

Another thing to keep in mind: the recovery time we spoke about earlier really depends on the Uncheckpointed Bytes rather than the total log file size. If you do not see recovery time increasing with a larger innodb_log_file_size, check out the InnoDB Checkpoint Age graph – it might be that you just can't fully utilize large log files with your workload and configuration.

Another way to look at the log file size is in context of log space usage:

This graph shows the amount of Data Written to the InnoDB log files per hour, as well as the total size of the InnoDB log files. In the graph above, we have 2GB of log space and some 12GB written to the Log files per hour. This means we cycle through logs every ten minutes.
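If you don't have this graph handy, the same calculation can be sketched from the Innodb_os_log_written status counter. The snippet below is only an illustration and assumes MySQL 5.7, where status counters are exposed in performance_schema.global_status (older versions expose them in information_schema.global_status):

-- Sample the redo write counter, wait one minute, then sample again
SELECT variable_value + 0 INTO @redo_start
  FROM performance_schema.global_status
 WHERE variable_name = 'Innodb_os_log_written';
DO SLEEP(60);
SELECT variable_value + 0 INTO @redo_end
  FROM performance_schema.global_status
 WHERE variable_name = 'Innodb_os_log_written';
-- Total Redo space divided by bytes written per minute = estimated log cycle time in minutes
SELECT (@@innodb_log_file_size * @@innodb_log_files_in_group)
       / (@redo_end - @redo_start) AS log_cycle_minutes;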

InnoDB has to flush every dirty page in the buffer pool at least once per log file cycle time.

InnoDB gets better performance when it does that less frequently, and there is less wear and tear on SSD devices. I like to see this number at no less than 15 minutes. One hour is even better.  

Summary

Getting the innodb_log_file_size right is important for achieving a balance between reasonably fast crash recovery and good system performance. Remember, estimating your recovery time objective is not as trivial as you might imagine. I hope the techniques described in this post help you find the optimal value for your situation!

Webinar Thursday, October 19, 2017: What You Need to Get the Most Out of Indexes – Part 2

Join Percona's Senior Architect, Matthew Boehm, as he presents the What You Need to Get the Most Out of Indexes – Part 2 webinar on Thursday, October 19, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Register Now

Proper indexing is key to database performance. Finely tune your query writing and database performance with tips from the experts. MySQL offers a few different types of indexes and uses them in a variety of ways.

In this session you’ll learn:

  • How to use composite indexes
  • Other index usages besides lookup
  • How to find unoptimized queries
  • What is there beyond EXPLAIN?

Register for the webinar.

Matthew Boehm, Architect

Matthew joined Percona in the fall of 2012 as a MySQL consultant. His areas of knowledge include the traditional Linux/Apache/MySQL/PHP stack, memcached, MySQL Cluster, massive sharding topologies, PHP development and a bit of MySQL-C-API development. Previously, Matthew DBAed for the fifth largest MySQL installation at eBay/PayPal. He also hails from managed hosting environments. During his off-hours, Matthew is a nationally ranked, competitive West Coast Swing dancer and travels to competitions around the US. He enjoys working out, camping, biking and playing MMOs with his son.

Top Performance Metrics to Monitor in your MySQL Databases | Part-I

As tables increase in size and more and more users come online, it becomes increasingly necessary to fine-tune your database server from time to time. The secret to knowing what adjustments need to be made is to perform regular monitoring. Most databases offer dozens – if not hundreds – of performance metrics that you can assess.

In an earlier blog post, we showed a number of ways to determine the cause(s) of MySQL database performance issues using a couple of built-in tools. In this two-part blog series, we will focus on monitoring key performance metrics. In part one, we'll:

  • examine the benefits of performance monitoring,
  • outline the main performance metric categories
  • list the monitoring tools provided by MySQL, i.e.:
    • server variables
    • the Performance Schema
    • the sys Schema
  • learn how to monitor:
    • transaction throughput
    • query execution performance
Benefits of Following a Monitoring Plan

Most applications are backed by a database – including MySQL. To keep your databases and applications running smoothly, it’s crucial to monitor them effectively. A good database monitoring plan can help you stay on top of:

  • Performance: The main subject of this whitepaper, monitoring your database’s performance can help detect possible bottlenecks and other issues before catastrophe strikes. Beyond helping you avert emergencies, performance metrics can assist you in deciding whether a performance increase is necessary or feasible. For instance, by keeping a track record of query execution times you could spot any suboptimal SQL statements and come up with possible improvements.
  • Growth: Observe requirements in terms of users and traffic. Database usage needs tend to evolve faster than expected.
  • Security: Ensure adequate security measures have been applied.
Database Metric Types

Before we go through the process of identifying metrics to follow, perhaps we should start at the beginning and ask “what are metrics?”

Metrics capture a value pertaining to your systems at a specific point in time — for example, the number of users currently logged into the database.

Therefore, metrics are usually collected at regular intervals – such as once per second or once per minute – in order to monitor a system over time.

There are two important categories of metrics: those that are most useful in identifying problems and those whose primary value is in investigating problems. This whitepaper covers which data to collect so that you can:

  1. Recognize potential problems before they occur.
  2. Quickly investigate and get to the bottom of performance issues.

Beyond metrics, there are other types of database monitoring that will not be addressed here. These include the monitoring of events and security.

How often should Monitoring be performed?

How often you monitor different aspects of your database depends on how mission-critical it and the application(s) that it supports are. If a failure or disruption would result in serious impact on your business operations and/or organization, or perhaps even in catastrophe, then you need to be on top of both performance and security issues at all times. Luckily, you can reduce your need to constantly monitor your performance dashboard to a weekly inspection by setting up alerts to inform you of critical issues in real time.

Database performance metrics should be monitored by specialized tools that provide real-time and/or periodic alerts. Real-time alerts are a must for any mission-critical databases or databases with sensitive information that is susceptible to attack so that you can take care of urgent issues as soon as they occur. Real-time preventive measures can protect your database from certain types of attack, even before you have time to respond.

DBAs, IT operations staff, and users share responsibility for performance monitoring, since some of the factors that affect database performance lie beyond the database itself. It also makes sense to keep some of the application developers in the loop so that they can investigate the application side of things.

Although DBAs do not need to monitor the applications that interact with the database, it’s important that they possess a general understanding of how applications are implemented and their architecture.

Performance Metric Categories

The previous section described the two main uses of database metrics – problem identification and problem investigation. Likewise, there are two important categories of metrics that pertain to performance: work metrics and resource metrics. For each system that is part of your software infrastructure, consider which work metrics and resource metrics are applicable and available, and collect whatever you can. Even though you won’t need to monitor every metric, some may play a greater role once you’ve identified performance issues – i.e. during problem investigation.

The next two sections cover each of the two performance metric types in more detail.

Work Metrics

Work metrics gauge the top-level health of your database by measuring its useful output. Work metrics may be broken down into four subtypes:

  • Throughput: The amount of work the system is doing per unit of time. Throughput is usually recorded as an absolute number. Examples include the number of transactions or queries per second.
  • Success: Represents the proportion of work that was executed successfully, e.g. the number of successful queries.
  • Error: Captures the number of erroneous results, usually expressed as a rate of errors per unit of time, or normalized by the throughput to yield errors per unit of work. Error metrics are often captured separately from success metrics when there are several potential sources of error, some of which are more serious or actionable than others.
  • Performance: Quantifies how efficiently a component is doing its work. The most common performance metric is latency, which represents the time required to complete a unit of work. Latency can be expressed as an average or as a percentile, such as “99% of requests returned within 0.1s”.

The above metrics provide high-level but telling data that can help you quickly answer the most important questions about a system’s internal health and performance. That is to say:

  1. Is the database available and effectively doing what it was designed to do?
  2. How fast is it producing work?
  3. What is the quality of that work?

Below are examples of each of the four work metric subtypes from the Monyog monitoring tool:

Statement size and throughput


A list of select queries with their associated stats, including Average Latency, as well as Success and Error Counts

Resource Metrics

Resources are hardware, software, and network components that are required by the database to do its job. Some resources are low-level, such as physical components like CPU, memory, disks, and network interfaces. Higher-level resources, such as the query cache and database waits, can also be considered resources and therefore monitored.

Resource metrics are useful in helping you reconstruct a detailed picture of the database’s state, making them especially valuable for investigation and diagnosis of problems. Resource metrics cover four key areas:

  1. Utilization: The percentage of time that the database is busy, or the percentage of the database’s capacity that is in use.
  2. Saturation: A measure of the amount of requested work that the database cannot yet service, and waits in the queue.
  3. Errors: Represents internal errors that may or may not be observable in the database’s output.
  4. Availability: Denotes the percentage of time that the database responded to requests.

Here are some examples of resource metrics:

Resource metrics in Monyog

Monitoring Performance

Both Work and Resource metrics include two types of metrics as follows:

Work Metrics:

  • Database/Transaction/Query throughput
  • Query execution performance

Resource Metrics:

  • Connections
  • Buffer pool usage

The next several sections will focus on some of the best Work metrics to monitor. Resource metrics will be covered in Part 2.

Throughput

In general terms, throughput measures the speed of a database system, and is typically expressed as a number of transactions per second. Having said that, consider the following differences:

  • “write” transactions vs. “read” transactions
  • sustained rates vs. peak rates
  • a 10-byte row vs. a 1000-byte row

Due to these differences, it is best to measure:

  • Database Throughput: the database as a whole
  • Transaction Throughput: any operation
  • Query Throughput: query execution
Throughput Metrics in MySQL

Indeed, MySQL provides Throughput metrics for all of the above transaction types.
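For transaction throughput specifically, the commit and rollback counters are a reasonable proxy. The statements below are a sketch rather than an exhaustive list of counters; sampled at regular intervals, the deltas give you transactions per second:

SHOW GLOBAL STATUS LIKE 'Com_commit';     -- explicit COMMIT statements
SHOW GLOBAL STATUS LIKE 'Com_rollback';   -- explicit rollbacks
SHOW GLOBAL STATUS LIKE 'Handler_commit'; -- internal commits, including autocommitted statements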

The Questions and Queries Status Variables

There are two general MySQL Status Variables for measuring query execution: Questions and Queries. Of the two, the Questions metric is often easier to interpret thanks to its client-centric view; the Queries counter also counts statements executed as part of stored programs, as well as commands such as PREPARE and DEALLOCATE PREPARE that run as part of server-side prepared statements.

SHOW GLOBAL STATUS LIKE "Questions";
Variable_name    Value
----------------------
Questions        66

SHOW GLOBAL STATUS LIKE "Queries";
Variable_name    Value
----------------------
Queries          149

You can also monitor the breakdown of read and write commands to better understand your database’s workload and identify potential bottlenecks. Read queries are generally captured by the Com_select metric. Writes increment one of three status variables, depending on the statement type:

Writes = Com_insert + Com_update + Com_delete

SHOW GLOBAL STATUS LIKE "Com_select";
Variable_name    Value
----------------------
Com_select       49

When are MySQL Counters Incremented?

While the MySQL docs do list all of the various counter variables, they do not describe in detail when exactly each counter is incremented. This might seem like a trivial point, but really it is not, especially if you’re capturing metrics with high resolution in order to diagnose MySQL performance incidents.

For instance, if you count queries when they start, a spike in the number of queries in a given second could be due to an increase in traffic. However, if you count queries at completion, spikes could be caused by some critical resource becoming available, allowing many queries to complete. This often occurs with table-level locks or row-level locks on InnoDB.

In the case of MySQL, the Questions and Queries counters are incremented BEFORE the query is executed. As a result, you may see a very uniform rate of questions when in fact many queries were started but were not completing quickly because they were waiting on some resource.
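To make the "high resolution" point concrete, here is a minimal sketch that derives queries per second from two samples of the Questions counter (it assumes MySQL 5.7, where status counters live in performance_schema.global_status; a monitoring tool does the same thing continuously):

SELECT variable_value + 0 INTO @q1
  FROM performance_schema.global_status
 WHERE variable_name = 'Questions';
DO SLEEP(1);
SELECT variable_value + 0 INTO @q2
  FROM performance_schema.global_status
 WHERE variable_name = 'Questions';
SELECT @q2 - @q1 AS questions_per_second;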

The threads_running Status Variable

To check for unusual numbers of queries running concurrently and struggling to complete in time, it can be instructive to look at the Threads_running status variable.

SHOW GLOBAL STATUS LIKE "Threads_running";
Variable_name      Value
------------------------
Threads_running    29

A professional monitoring tool can present throughput metrics as a graph to make peaks and valleys more readily apparent:

Database Throughputs in Monyog


Transaction Throughput in Monyog

Query Execution Performance

Measuring query execution performance is all about finding those queries that take too long to identify the required data or bring the data back. One of the best metrics to gauge query speed is Latency. In terms of query execution, latency is simply the amount of time it takes a query to execute and return a result set, in other words, the time to make one round trip.

MySQL provides a few options for monitoring query latency, including built-in metrics and the performance schema. Enabled by default since MySQL 5.6.6, the tables of the performance_schema database within MySQL store low-level statistics about server events and query execution.

The Slow_queries Status Variable

The Slow_queries status variable stores the number of queries that have taken more than long_query_time seconds. What's great about this counter is that it increments regardless of whether the slow query log is enabled. That's a good thing, because the slow query log is disabled by default since logging can place a bit of a drag on performance.
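The threshold itself is controlled by the long_query_time server variable, which you can inspect and adjust with standard statements (the one-second value below is just an example; the change applies to new connections):

SHOW GLOBAL VARIABLES LIKE 'long_query_time';
SET GLOBAL long_query_time = 1;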

SHOW GLOBAL STATUS LIKE "Slow_queries";
Variable_name    Value
----------------------
Slow_queries     99

The Performance Schema

Many key metrics are contained in the performance schema’s events_statements_summary_by_digest table, which captures information about query volume, latency, errors, time spent waiting for locks, and index usage. These metrics and more are available for each SQL statement executed. Statements are presented in normalized form, meaning that data values are removed from the SQL and whitespace is standardized.

This query finds the top 10 statements by longest average run time:

SELECT substr(digest_text, 1, 50) AS digest_text_start
     , count_star
     , avg_timer_wait
FROM performance_schema.events_statements_summary_by_digest
ORDER BY avg_timer_wait DESC
LIMIT 10;

digest_text_start                                   count_star  avg_timer_wait
--------------------------------------------------------------------------------
SHOW FULL TABLES FROM `sakila`                      1           1110825767786
SHOW GLOBAL STATUS LIKE ?                           1           1038069287388
SELECT `digest_text`, `count_star`, `avg_timer_w    1           945742257586
SHOW FIELDS FROM `sakila` . `actor`                 1           611721261340
SELECT `digest_text` , `count_star` , `avg_timer_w  2           335116484794
SHOW FIELDS FROM `sakila` . `actor_info` SELECT `a  1           221773712160
SELECT NAME , TYPE FROM `mysql` . `proc` WHERE `Db  2           148939688506
SHOW FIELDS FROM `vehicles` . `vehiclemodelyear`    1           144172298718
SHOW SCHEMAS                                        2           132611131408
SHOW FIELDS FROM `sakila` . `customer`              1           99954017212

A Note about Event Timer Units

Performance Schema displays event timer information in picoseconds (trillionths of a second) to present timing data in a standard unit. In the following example, TIMER_WAIT values are divided by 1000000000000 to convert time data into seconds. Values are also truncated to 6 decimal places:

SELECT substr(digest_text, 1, 50) AS digest_text_start
     , count_star
     , TRUNCATE(avg_timer_wait/1000000000000,6)
FROM performance_schema.events_statements_summary_by_digest
ORDER BY avg_timer_wait DESC
LIMIT 10;

digest_text_start                                   count_star  avg_timer_wait
--------------------------------------------------------------------------------
SHOW FULL TABLES FROM `sakila`                      1           1.110825
SHOW GLOBAL STATUS LIKE ?                           1           1.038069
SELECT `digest_text`, `count_star`, `avg_timer_w    1           0.945742
etc…

Now we can easily see that the longest query took a little over 1 second to run.

The sys Schema

Rather than write SQL statements against the performance schema, it is generally easier to use the sys schema. It contains easily interpretable tables for inspecting your performance data.

The sys schema comes installed with MySQL starting with version 5.7.7, but users of earlier versions can also install it. For instance, to install the sys schema on MySQL 5.6, run the following commands:

git clone https://github.com/mysql/mysql-sys.git
cd mysql-sys/
mysql -u root -p < ./sys_56.sql

The sys schema provides an organized set of metrics in a more human-readable format, making the corresponding queries much simpler. For instance, to find the slowest statements (those in the 95th percentile by runtime), run the following query:

SELECT * FROM sys.statements_with_runtimes_in_95th_percentile;

Here again, a professional monitoring tool can really pay dividends by consolidating various performance metrics into one cohesive view:

Query Execution Performance in Monyog

Conclusion

This whitepaper is the first of a two-part series on the top MySQL performance metrics. In Part 1, we learned:

  • To keep your databases and applications running smoothly, it’s crucial to monitor them effectively.
  • There are two important categories of metrics: those that are most useful in identifying problems and those whose primary value is in investigating problems.
  • How often you monitor different aspects of your database depends on how mission-critical it and the application(s) that it supports are.
  • There are two important categories of metrics that pertain to performance: work metrics and resource metrics.
  • Both Work and Resource metrics include two types of metrics as follows:

Work Metrics:

    • Throughput
    • Query execution performance

Resource Metrics:

    • Connections
    • Buffer pool usage
  • There are two general MySQL Status Variables for measuring query execution: Questions and Queries. Of the two, the Questions metric is often easier to interpret thanks to its client-centric view.
  • MySQL provides a few options for monitoring query latency, including its built-in metrics and the performance schema. Enabled by default since MySQL 5.6.6, the tables of the performance_schema database within MySQL store low-level statistics about server events and query execution.
  • Many key metrics are contained in the performance schema's events_statements_summary_by_digest table, which captures information about query volume, latency, errors, time spent waiting for locks, and index usage.
  • The sys schema contains easily interpretable tables for inspecting your performance data.

Part two of the blog series will focus on Database Connections and Buffer Pool metrics.

Monyog is a MySQL monitoring tool that improves the database performance of your MySQL powered systems. Download your free trial.

The post Top Performance Metrics to Monitor in your MySQL Databases | Part-I appeared first on Monyog Blog.

MariaDB 5.5.58 and MariaDB Connector/ODBC 3.0.2 now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 5.5.58, as well as the recent release of MariaDB Connector/ODBC 3.0.2. These are both stable (GA) releases. See the release notes and changelog for details. Download MariaDB 5.5.58 Release Notes Changelog What is MariaDB 5.5? MariaDB APT and YUM Repository Configuration Generator Download […]

The post MariaDB 5.5.58 and MariaDB Connector/ODBC 3.0.2 now available appeared first on MariaDB.org.

Top Slowest Queries and their Associated Tables in MySQL

The following query gets data from performance_schema in MySQL and attempts to match each digest, via a regular expression, to the list of tables in the same schema.
SELECT d.*,
(SELECT group_concat(distinct TABLE_NAME) FROM information_schema.TABLES
WHERE table_schema = d.schema_name and d.digest_text regexp table_name) table_name
FROM performance_schema.events_statements_summary_by_digest d
WHERE d.DIGEST_TEXT regexp "^(SELECT|UPDATE|DELETE|REPLACE|INSERT|CREATE)"
and d.LAST_SEEN >= curdate() - interval 7 day
ORDER BY d.SUM_TIMER_WAIT DESC limit 10\G

Announcing MySQL Server 5.7.20, 5.6.38, and 5.5.58

MySQL Server 5.7.20, 5.6.38, and 5.5.58, new versions of the popular Open Source Database Management System, have been released. These releases are recommended for use on production systems. For an overview of what’s new, please see http://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html http://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html For information on installing the release on new servers, please see the MySQL installation documentation at […]

“Quick issue detection and excellent customer support”: Stefan Schiele

Stefan Schiele, CEO, SCT Schiele talks about how Monyog helped them run mission-critical applications smoothly and without errors.

SCT Schiele are specialists in e-commerce solutions in the B2B market. The company provides deep integrations into the merchandise business, thus enabling large clientele to increase their e-commerce revenue.

Enabling e-commerce

Today, every customer expects websites to load fast and applications to run without lag. When your applications perform poorly, your customers notice immediately. As Stefan Schiele puts it – "Nowadays everyone expects e-commerce systems to be available 24/7. Without continuous monitoring, this is simply unachievable."

SCT Schiele's data centre has a number of its own production servers, as well as servers on which their customers run and operate services; the company currently uses Oracle, InterSystems Caché, MySQL and MariaDB databases. To avoid transaction loss, Stefan's team is planning to migrate all databases to a MariaDB Galera Cluster over the next few months.

Issue Resolution through Deadlock Monitoring

When a section of their e-commerce solution moved to a MariaDB Galera cluster, they experienced some issues with the document detail view for orders, quotes and delivery notes. Sometimes the stock availability wasn't shown on the web page, although they received this information from the SAP system. This report was inserted with a combined update/select SQL statement into the table containing the positions of the document, and this statement failed, for an unknown reason, at a rate of 3 to 5 percent.

Stefan started trying out Monyog for its ability to deal with a MariaDB Galera Cluster set up as master/master replication.

Stefan says,

“Monyog almost instantly showed the main cause of the issues our systems were experiencing through monitoring of deadlocks. All of the deadlocks affected the table where the document positions were stored, and these deadlocks were what caused some of the update statements to fail.”

After storing the information for the document views from the SAP system in the database, they used several insert/select and update/select statements to write this information to another table. In fact, this led to millions of write operations per day on the same table, and this was what caused the deadlocks on this table. After Monyog helped them identify the main cause of the issues, it became a simple fix.

Customer satisfaction: Monyog >> SCT Schiele >> E-commerce clients

Stefan praised the Monyog customer support team for assisting him in finding the underlying issues quickly. In turn, this helped SCT Schiele provide better support for the e-commerce solutions of a large clientele.

Stefan says: “For an e-commerce solution that receives hundreds of thousands of page hits per day, quickly fixing deadlock issues made the application run smoothly without any errors. Therefore this led to customer satisfaction – and that’s important.”

The post “Quick issue detection and excellent customer support”: Stefan Schiele appeared first on Monyog Blog.

This Week in Data with Colin Charles 10: MariaDB and Upcoming Appearances

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Beyond spending time getting ready for Velocity and Open Source Summit Europe, this week there was some feature testing that compared MySQL and MariaDB. Naturally, a long report/blog post is coming soon. Stay tuned.

Releases

I reckon a lot of folks are swamped after Percona Live Europe Dublin and Oracle OpenWorld, so the releases in the MySQL universe are a bit quieter.

Link List

Upcoming Appearances

Percona’s website keeps track of community events, so check out where to see and listen to a Perconian speak. My upcoming appearances are:

Feedback

I was asked why there weren't many talks from the MariaDB Foundation / MariaDB Corporation at Percona Live Europe 2017. Simple answer: there were hardly any submissions. We had two talk submissions from one speaker from the MariaDB Foundation (we accepted the one on MariaDB 10.3). There was another talk submission from a speaker from MariaDB Corporation (again, accepted). We announced the second talk in the sneak preview, but the talk was canceled as the speaker was unable to attend. We did, however, have a broad range of talks about MariaDB, with many talks that discussed high availability, security, proxies and the like.

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

What’s New in MariaDB Connector/ODBC 3.0

RalfGebhardt | Fri, 10/13/2017 - 09:16

We are pleased to announce the general availability (GA) of MariaDB Connector/ODBC 3.0. MariaDB Connector/ODBC 3.0.2 is the newest version of MariaDB Connector/ODBC. This release is compatible with MariaDB Connector/ODBC 2.0 – no code changes necessary to upgrade.

MariaDB Connector/ODBC 3.0 is based on the new MariaDB Connector/C 3.0. It therefore takes advantage of the new security enhancements and plugins provided by MariaDB Connector/C.

Direct dependencies on internal functions of MariaDB Connector/C have been removed; only the functions provided by the documented API are used. This allows dynamic linking where preferred. Our own binaries still use static linking.

New features
  • Support of a new bulk load implementation (needs MariaDB Server 10.2)
  • SQLExecDirect prepares and executes a statement directly (needs MariaDB Server 10.2)
  • New SSL options based on the implementation in MariaDB Connector/C
    • GnuTLS
    • Windows SChannel: removes dependencies on external libraries
    • Windows SChannel: becomes the default for SSL on Windows 
    • TLSv1.1 and TLSv1.2 support
  • SSL option can now be defined in the setup dialog

Download the MariaDB Connector now and learn about the newest evolution of MariaDB Connector/ODBC 3.0.



Understanding MySQL innodb_flush_log_at_trx_commit variable

The main objective of this post is to understand why the innodb_flush_log_at_trx_commit variable is used, what values it accepts, how it impacts MySQL performance and data safety, how to change its value, and how to change the frequency of InnoDB log flushes.


Why innodb_flush_log_at_trx_commit?


In order to understand why we need the innodb_flush_log_at_trx_commit variable, we should know how InnoDB works. That is a huge and complex topic, well beyond the scope of this article, so here it is in simple terms:


  1. InnoDB performs most of its operations in memory (the InnoDB Buffer Pool)
  2. It writes all changes from memory to the transaction log (the InnoDB log file)
  3. From the transaction log, it flushes (writes) data to the storage disk (durable storage)


In order to achieve durability, we need to store every transaction's data on disk. But consider a busy system: if InnoDB tries to flush (write) data to a slow disk at every transaction commit, what will happen? So how do we manage this situation, where we need to persist every transaction's data while maintaining good system performance?


InnoDB provides a solution for this situation: based on your system, you can tell InnoDB when to flush (write) data to disk. For example, you can tell InnoDB to work in one of the following ways:

  1. Write to the log file and flush to disk at a specified interval, not at every transaction commit.
  2. Write to the log and flush to disk at every transaction commit.
  3. Write to the log at every transaction commit, but flush to disk only at an interval.


Based on your application's requirements, you can adjust these settings to balance performance and data safety. InnoDB provides a configurable variable to achieve this balance: innodb_flush_log_at_trx_commit. This variable controls how often the log buffer is flushed to disk.
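Before changing anything, you can check what your server is currently using (standard MySQL syntax):

SELECT @@GLOBAL.innodb_flush_log_at_trx_commit;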


What are the applicable values for innodb_flush_log_at_trx_commit and what is their impact?


InnoDB supports the following three values for the innodb_flush_log_at_trx_commit variable. Let's see how each value changes the flush-to-disk behaviour:


innodb_flush_log_at_trx_commit=0


When you set innodb_flush_log_at_trx_commit=0, InnoDB writes the modified data (in the InnoDB Buffer Pool) to the log file (ib_logfile) and flushes the log file (writes to disk) once per second, but it does nothing at transaction commit.


Here, if there is a power failure or system crash, the unflushed data is not recoverable, since it has been neither written to the log file nor stored on disk.


innodb_flush_log_at_trx_commit=1


When you set innodb_flush_log_at_trx_commit=1, InnoDB writes the log buffer to the transaction log and flushes it to durable storage at every transaction commit.


Here, at every transaction commit, InnoDB writes to the log and then to disk. With slower disk storage, this badly impacts performance: the number of InnoDB transactions per second is reduced.


innodb_flush_log_at_trx_commit=2


When you set innodb_flush_log_at_trx_commit=2, InnoDB writes the log buffer to the log file at every commit, but does not flush it to disk. InnoDB flushes the log to disk once per second.

With option 2, if only the MySQL server process crashes, the data is still available in the log file (via the operating system cache) and can be recovered; a power failure or operating system crash can still lose up to the last second of transactions.


So which option should you use for your application?


We have seen the behaviour of each value; now we need to choose a value based on the application's requirements.


If performance is the main goal, you can set the value to 2. InnoDB then flushes to disk once per second rather than at every transaction commit, which improves performance dramatically. If the MySQL process crashes, data can still be recovered from the transaction log (a power failure may still cost up to one second of transactions).


If data safety is the main goal, you can set the value to 1, so that InnoDB flushes to disk at every transaction commit. Performance may suffer in this case.


If you set the value to 0, InnoDB writes from the buffer to the log and flushes to disk once per second, and performs no flush at transaction commit. The problem with this option is that in the case of any power failure or system crash, there is a chance of losing up to one second of data.


Key points:


During the buffer-to-log operation, i.e. writing from the InnoDB buffer pool to the InnoDB transaction log file, data is simply moved to the operating system's cache and is not actually written to durable storage. So if you set innodb_flush_log_at_trx_commit to either 0 or 2, there is a possibility of losing up to one second of data.


If innodb_flush_log_at_trx_commit is set to 1, InnoDB compels the operating system to flush data to durable storage. Writing to disk is slow and is a blocking I/O operation, so with this option there is a chance that the number of transactions per second will be reduced.


Note that, by default, MySQL auto-commits transactions.
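This matters here because, with autocommit enabled and innodb_flush_log_at_trx_commit = 1, every single statement triggers a log flush. A minimal illustration using standard MySQL syntax:

SELECT @@autocommit;   -- 1 means every statement is committed (and flushed) on its own
SET autocommit = 0;    -- group work into explicit transactions and flush once per COMMIT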


Setting innodb_flush_log_at_trx_commit to 2 is useful when restoring huge databases; it can reduce restoration time dramatically. There are different opinions on this point, so it is better to test it yourself. In my experience, it really helped reduce restoration time.


Data Definition Language (DDL) changes flush the InnoDB log, independent of the innodb_flush_log_at_trx_commit setting.


Crash recovery works regardless of the innodb_flush_log_at_trx_commit setting; I am writing another article on InnoDB crash recovery.


How to configure the innodb_flush_log_at_trx_commit variable?


The scope of innodb_flush_log_at_trx_commit is GLOBAL, and it can be set dynamically without restarting the server.


Dynamically, on the command line, you can set innodb_flush_log_at_trx_commit as follows:


SET GLOBAL innodb_flush_log_at_trx_commit = 2;


In the configuration file, you can set it as follows:


[mysqld]
innodb_flush_log_at_trx_commit = 2


Note: changing the configuration file requires a server restart. Before making changes there, analyse the impact by setting the variable dynamically first.


If you try to set innodb_flush_log_at_trx_commit as a session-level variable, you will encounter the following error:


mysql> set session innodb_flush_log_at_trx_commit = 2;
ERROR 1229 (HY000): Variable 'innodb_flush_log_at_trx_commit' is a GLOBAL variable and should be set with SET GLOBAL
mysql>


How to change the frequency of InnoDB log file flushing?


The variable innodb_flush_log_at_timeout controls the frequency at which InnoDB flushes its log files. The frequency ranges from 1 to 2700 seconds, with a default value of 1.
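For example, to relax the background flush interval to five seconds (innodb_flush_log_at_timeout is a dynamic global variable, so no restart is needed):

SET GLOBAL innodb_flush_log_at_timeout = 5;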

Note that the higher this number, the higher the chance of losing data in case of a power failure or system crash.


For example, if you set this value to 5 seconds, then in case of a power failure you may lose up to 5 seconds of data.


In a replication topology, to maintain durability and consistency of data, you can leave the default value, i.e. innodb_flush_log_at_trx_commit = 1.


I hope this article helps you understand how InnoDB flushes data, how it impacts MySQL performance and data safety, and how you can configure this variable for maximum benefit.
In a future article we will look at how InnoDB crash recovery works and its configuration settings. If I missed something, or if you wish to share your thoughts on this article, please mention it in the comments section and I will edit the post after review.

How to Automate Galera Cluster Using the ClusterControl CLI

As sysadmins and developers, we spend a lot of our time in a terminal. So we brought ClusterControl to the terminal with our command line interface tool called s9s. s9s provides an easy interface to the ClusterControl RPC v2 API. You will find it very useful when working with large scale deployments, as the CLI will allow you to design more complex features and workflows.

This blog post showcases how to use s9s to automate the management of Galera Cluster for MySQL or MariaDB, as well as a simple master-slave replication setup.

Setup

You can find installation instructions for your particular OS in the documentation. What's important to note is that if you happen to use the latest s9s-tools from GitHub, there's a slight change in the way you create a user. The following command will work fine:

s9s user --create --generate-key --controller="https://localhost:9501" dba

In general, there are two steps required if you want to configure CLI locally on the ClusterControl host. First, you need to create a user and then make some changes in the configuration file - all the steps are included in the documentation.

Deployment

Once the CLI has been configured correctly and has SSH access to your target database hosts, you can start the deployment process. At the time of writing, you can use the CLI to deploy MySQL, MariaDB and PostgreSQL clusters. Let’s start with an example of how to deploy Percona XtraDB Cluster 5.7. A single command is required to do that.

s9s cluster --create --cluster-type=galera --nodes="10.0.0.226;10.0.0.227;10.0.0.228" --vendor=percona --provider-version=5.7 --db-admin-passwd="pass" --os-user=root --cluster-name="PXC_Cluster_57" --wait

The last option, "--wait", means that the command will wait until the job completes, showing its progress. You can skip it if you want – in that case, the s9s command will return to the shell immediately after it registers a new job in cmon. This is perfectly fine, as cmon is the process which handles the job itself. You can always check the progress of a job separately, using:

root@vagrant:~# s9s job --list -l
--------------------------------------------------------------------------------------
Create Galera Cluster
Installing MySQL on 10.0.0.226 [██▊       ]  26.09%
Created   : 2017-10-05 11:23:00    ID   : 1        Status : RUNNING
Started   : 2017-10-05 11:23:02    User : dba      Host   :
Ended     :                        Group: users
--------------------------------------------------------------------------------------
Total: 1

Let’s take a look at another example. This time we’ll create a new cluster, MySQL replication: simple master - slave pair. Again, a single command is enough:

root@vagrant:~# s9s cluster --create --nodes="10.0.0.229?master;10.0.0.230?slave" --vendor=percona --cluster-type=mysqlreplication --provider-version=5.7 --os-user=root --wait
Create MySQL Replication Cluster
/ Job  6 FINISHED   [██████████] 100% Cluster created

We can now verify that both clusters are up and running:

root@vagrant:~# s9s cluster --list --long
ID STATE   TYPE        OWNER GROUP NAME           COMMENT
 1 STARTED galera      dba   users PXC_Cluster_57 All nodes are operational.
 2 STARTED replication dba   users cluster_2      All nodes are operational.
Total: 2

Of course, all of this is also visible via the GUI:

Now, let’s add a ProxySQL loadbalancer:

root@vagrant:~# s9s cluster --add-node --nodes="proxysql://10.0.0.226" --cluster-id=1
WARNING: admin/admin
WARNING: proxy-monitor/proxy-monitor
Job with ID 7 registered.

This time we didn't use the '--wait' option, so if we want to check the progress, we have to do it on our own. Please note that we are using the job ID which was returned by the previous command, so we'll obtain information on this particular job only:

root@vagrant:~# s9s job --list --long --job-id=7
--------------------------------------------------------------------------------------
Add ProxySQL to Cluster
Waiting for ProxySQL [██████▋   ]  65.00%
Created   : 2017-10-06 14:09:11    ID   : 7        Status : RUNNING
Started   : 2017-10-06 14:09:12    User : dba      Host   :
Ended     :                        Group: users
--------------------------------------------------------------------------------------
Total: 7

Scaling out

Nodes can be added to our Galera cluster via a single command:

s9s cluster --add-node --nodes 10.0.0.229 --cluster-id 1
Job with ID 8 registered.
root@vagrant:~# s9s job --list --job-id=8
ID CID STATE  OWNER GROUP CREATED  RDY TITLE
 8   1 FAILED dba   users 14:15:52  0% Add Node to Cluster
Total: 8

Something went wrong. We can check what exactly happened:

root@vagrant:~# s9s job --log --job-id=8
addNode: Verifying job parameters.
10.0.0.229:3306: Adding host to cluster.
10.0.0.229:3306: Testing SSH to host.
10.0.0.229:3306: Installing node.
10.0.0.229:3306: Setup new node (installSoftware = true).
10.0.0.229:3306: Detected a running mysqld server. It must be uninstalled first, or you can also add it to ClusterControl.

Right, that IP is already used for our replication server. We should have used another, free IP. Let’s try that:

root@vagrant:~# s9s cluster --add-node --nodes 10.0.0.231 --cluster-id 1
Job with ID 9 registered.
root@vagrant:~# s9s job --list --job-id=9
ID CID STATE    OWNER GROUP CREATED  RDY  TITLE
 9   1 FINISHED dba   users 14:20:08 100% Add Node to Cluster
Total: 9

Managing

Let's say we want to take a backup of our replication master. We can do that from the GUI, but sometimes we may need to integrate it with external scripts. The ClusterControl CLI makes a perfect fit for such a case. Let's check what clusters we have:

root@vagrant:~# s9s cluster --list --long
ID STATE   TYPE        OWNER GROUP NAME           COMMENT
 1 STARTED galera      dba   users PXC_Cluster_57 All nodes are operational.
 2 STARTED replication dba   users cluster_2      All nodes are operational.
Total: 2

Then, let’s check the hosts in our replication cluster, with cluster ID 2:

root@vagrant:~# s9s nodes --list --long --cluster-id=2
STAT VERSION       CID CLUSTER   HOST       PORT COMMENT
soM- 5.7.19-17-log   2 cluster_2 10.0.0.229 3306 Up and running
soS- 5.7.19-17-log   2 cluster_2 10.0.0.230 3306 Up and running
coC- 1.4.3.2145      2 cluster_2 10.0.2.15  9500 Up and running

As we can see, there are three hosts that ClusterControl knows about - two of them are MySQL hosts (10.0.0.229 and 10.0.0.230), the third one is the ClusterControl instance itself. Let’s print only the relevant MySQL hosts:

root@vagrant:~# s9s nodes --list --long --cluster-id=2 10.0.0.2*
STAT VERSION       CID CLUSTER   HOST       PORT COMMENT
soM- 5.7.19-17-log   2 cluster_2 10.0.0.229 3306 Up and running
soS- 5.7.19-17-log   2 cluster_2 10.0.0.230 3306 Up and running
Total: 3

In the "STAT" column you can see some characters. For more information, we'd suggest looking at the manual page for s9s-nodes (man s9s-nodes). Here we'll just summarize the most important bits. The first character tells us about the type of the node: "s" means it's a regular MySQL node, "c" a ClusterControl controller. The second character describes the state of the node: "o" tells us it's online. The third character is the role of the node: "M" describes a master, "S" a slave, and "C" stands for controller. The final, fourth character tells us if the node is in maintenance mode: "-" means there's no maintenance scheduled; otherwise we'd see "M" here. So, from this data we can see that our master is the host with IP 10.0.0.229. Let's take a backup of it and store it on the controller.

root@vagrant:~# s9s backup --create --nodes=10.0.0.229 --cluster-id=2 --backup-method=xtrabackupfull --wait
Create Backup
| Job 12 FINISHED   [██████████] 100% Command ok

We can then verify if it indeed completed ok. Please note the “--backup-format” option which allows you to define which information should be printed:

root@vagrant:~# s9s backup --list --full --backup-format="Started: %B Completed: %E Method: %M Stored on: %S Size: %s %F\n" --cluster-id=2
Started: 15:29:11 Completed: 15:29:19 Method: xtrabackupfull Stored on: 10.0.0.229 Size: 543382 backup-full-2017-10-06_152911.xbstream.gz
Total 1

Monitoring

All databases have to be monitored. ClusterControl uses advisors to watch some of the metrics on both MySQL and the operating system. When a condition is met, a notification is sent. ClusterControl also provides an extensive set of graphs, both real-time and historical, for post-mortem analysis or capacity planning. Sometimes it would be great to have access to some of those metrics without having to go through the GUI. The ClusterControl CLI makes this possible through the s9s-node command. Information on how to do that can be found in the manual page of s9s-node. We'll show some examples of what you can do with the CLI.

First of all, let’s take a look at the “--node-format” option to “s9s node” command. As you can see, there are plenty of options to print interesting content.

root@vagrant:~# s9s node --list --node-format "%N %T %R %c cores %u%% CPU utilization %fmG of free memory, %tMB/s of net TX+RX, %M\n" "10.0.0.2*"
10.0.0.226 galera none 1 cores 13.823200% CPU utilization 0.503227G of free memory, 0.061036MB/s of net TX+RX, Up and running
10.0.0.227 galera none 1 cores 13.033900% CPU utilization 0.543209G of free memory, 0.053596MB/s of net TX+RX, Up and running
10.0.0.228 galera none 1 cores 12.929100% CPU utilization 0.541988G of free memory, 0.052066MB/s of net TX+RX, Up and running
10.0.0.226 proxysql 1 cores 13.823200% CPU utilization 0.503227G of free memory, 0.061036MB/s of net TX+RX, Process 'proxysql' is running.
10.0.0.231 galera none 1 cores 13.104700% CPU utilization 0.544048G of free memory, 0.045713MB/s of net TX+RX, Up and running
10.0.0.229 mysql master 1 cores 11.107300% CPU utilization 0.575871G of free memory, 0.035830MB/s of net TX+RX, Up and running
10.0.0.230 mysql slave 1 cores 9.861590% CPU utilization 0.580315G of free memory, 0.035451MB/s of net TX+RX, Up and running

With what we've shown here, you can probably imagine some cases for automation. For example, you could watch the CPU utilization of the nodes and, if some threshold is reached, execute another s9s job to spin up a new node in the Galera cluster. You could also monitor memory utilization and send alerts if it passes some threshold.
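As a rough illustration of that idea, the snippet below polls the per-node CPU utilization with the "%u" format specifier shown above and, if the busiest node crosses a threshold, starts an add-node job. The spare node IP, the threshold, the cluster ID and the "s9s cluster --add-node" invocation are assumptions made for this sketch, so check them against your own setup before relying on it.

# Sketch: scale out when CPU utilization gets too high (values are examples)
THRESHOLD=80
CLUSTER_ID=2
SPARE_NODE="10.0.0.240"   # hypothetical node prepared in advance

# highest CPU utilization among the matching nodes (%u prints the CPU percentage)
cpu=$(s9s node --list --node-format "%u\n" "10.0.0.2*" | sort -nr | head -1)

if [ "${cpu%.*}" -ge "$THRESHOLD" ]; then
    s9s cluster --add-node --cluster-id=$CLUSTER_ID --nodes="$SPARE_NODE" --wait
fi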

The CLI can do more than that. First of all, it is possible to check the graphs from within the command line. Of course, those are not as feature-rich as graphs in the GUI, but sometimes it’s enough just to see a graph to find an unexpected pattern and decide if it is worth further investigation.

root@vagrant:~# s9s node --stat --cluster-id=1 --begin="00:00" --end="14:00" --graph=load 10.0.0.231
root@vagrant:~# s9s node --stat --cluster-id=1 --begin="00:00" --end="14:00" --graph=sqlqueries 10.0.0.231

During emergency situations, you may want to check resource utilization across the cluster. You can create a top-like output that combines data from all of the cluster nodes:

root@vagrant:~# s9s process --top --cluster-id=1
PXC_Cluster_57 - 14:38:01                          All nodes are operational.
4 hosts, 7 cores, 2.2 us, 3.1 sy, 94.7 id, 0.0 wa, 0.0 st,
GiB Mem : 2.9 total, 0.2 free, 0.9 used, 0.2 buffers, 1.6 cached
GiB Swap: 3 total, 0 used, 3 free,

PID   USER       HOST       PR VIRT    RES    S %CPU  %MEM COMMAND
8331  root       10.0.2.15  20 743748  40948  S 10.28  5.40 cmon
26479 root       10.0.0.226 20 278532  6448   S  2.49  0.85 accounts-daemon
5466  root       10.0.0.226 20 95372   7132   R  1.72  0.94 sshd
651   root       10.0.0.227 20 278416  6184   S  1.37  0.82 accounts-daemon
716   root       10.0.0.228 20 278304  6052   S  1.35  0.80 accounts-daemon
22447 n/a        10.0.0.226 20 2744444 148820 S  1.20 19.63 mysqld
975   mysql      10.0.0.228 20 2733624 115212 S  1.18 15.20 mysqld
13691 n/a        10.0.0.227 20 2734104 130568 S  1.11 17.22 mysqld
22994 root       10.0.2.15  20 30400   9312   S  0.93  1.23 s9s
9115  root       10.0.0.227 20 95368   7192   S  0.68  0.95 sshd
23768 root       10.0.0.228 20 95372   7160   S  0.67  0.94 sshd
15690 mysql      10.0.2.15  20 1102012 209056 S  0.67 27.58 mysqld
11471 root       10.0.0.226 20 95372   7392   S  0.17  0.98 sshd
22086 vagrant    10.0.2.15  20 95372   4960   S  0.17  0.65 sshd
7282  root       10.0.0.226 20 0       0      S  0.09  0.00 kworker/u4:2
9003  root       10.0.0.226 20 0       0      S  0.09  0.00 kworker/u4:1
1195  root       10.0.0.227 20 0       0      S  0.09  0.00 kworker/u4:0
27240 root       10.0.0.227 20 0       0      S  0.09  0.00 kworker/1:1
9933  root       10.0.0.227 20 0       0      S  0.09  0.00 kworker/u4:2
16181 root       10.0.0.228 20 0       0      S  0.08  0.00 kworker/u4:1
1744  root       10.0.0.228 20 0       0      S  0.08  0.00 kworker/1:1
28506 root       10.0.0.228 20 95372   7348   S  0.08  0.97 sshd
691   messagebus 10.0.0.228 20 42896   3872   S  0.08  0.51 dbus-daemon
11892 root       10.0.2.15  20 0       0      S  0.08  0.00 kworker/0:2
15609 root       10.0.2.15  20 403548  12908  S  0.08  1.70 apache2
256   root       10.0.2.15  20 0       0      S  0.08  0.00 jbd2/dm-0-8
840   root       10.0.2.15  20 316200  1308   S  0.08  0.17 VBoxService
14694 root       10.0.0.227 20 95368   7200   S  0.00  0.95 sshd
12724 n/a        10.0.0.227 20 4508    1780   S  0.00  0.23 mysqld_safe
10974 root       10.0.0.227 20 95368   7400   S  0.00  0.98 sshd
14712 root       10.0.0.227 20 95368   7384   S  0.00  0.97 sshd
16952 root       10.0.0.227 20 95368   7344   S  0.00  0.97 sshd
17025 root       10.0.0.227 20 95368   7100   S  0.00  0.94 sshd
27075 root       10.0.0.227 20 0       0      S  0.00  0.00 kworker/u4:1
27169 root       10.0.0.227 20 0       0      S  0.00  0.00 kworker/0:0
881   root       10.0.0.227 20 37976   760    S  0.00  0.10 rpc.mountd
100   root       10.0.0.227 0  0       0      S  0.00  0.00 deferwq
102   root       10.0.0.227 0  0       0      S  0.00  0.00 bioset
11876 root       10.0.0.227 20 9588    2572   S  0.00  0.34 bash
11852 root       10.0.0.227 20 95368   7352   S  0.00  0.97 sshd
104   root       10.0.0.227 0  0       0      S  0.00  0.00 kworker/1:1H

At the top of the output, you'll see CPU and memory statistics aggregated across the whole cluster.

root@vagrant:~# s9s process --top --cluster-id=1
PXC_Cluster_57 - 14:38:01                          All nodes are operational.
4 hosts, 7 cores, 2.2 us, 3.1 sy, 94.7 id, 0.0 wa, 0.0 st,
GiB Mem : 2.9 total, 0.2 free, 0.9 used, 0.2 buffers, 1.6 cached
GiB Swap: 3 total, 0 used, 3 free,

Below you can find the list of processes from all of the nodes in the cluster.

PID   USER  HOST       PR VIRT    RES    S %CPU  %MEM COMMAND
8331  root  10.0.2.15  20 743748  40948  S 10.28  5.40 cmon
26479 root  10.0.0.226 20 278532  6448   S  2.49  0.85 accounts-daemon
5466  root  10.0.0.226 20 95372   7132   R  1.72  0.94 sshd
651   root  10.0.0.227 20 278416  6184   S  1.37  0.82 accounts-daemon
716   root  10.0.0.228 20 278304  6052   S  1.35  0.80 accounts-daemon
22447 n/a   10.0.0.226 20 2744444 148820 S  1.20 19.63 mysqld
975   mysql 10.0.0.228 20 2733624 115212 S  1.18 15.20 mysqld
13691 n/a   10.0.0.227 20 2734104 130568 S  1.11 17.22 mysqld

This can be extremely useful if you need to figure out what’s causing the load and which node is the most affected one.

Hopefully, the CLI tool makes it easier for you to integrate ClusterControl with external scripts and infrastructure orchestration tools. We hope you’ll enjoy using this tool and if you have any feedback on how to improve it, feel free to let us know.


Mitigating replication lag and reducing read load with freno

At GitHub, we use MySQL as the main database technology backing our services. We run classic MySQL master-replica setups, where writes go to the master, and replicas replay master’s changes asynchronously. To be able to serve our traffic we read data from the MySQL replicas. To scale our traffic we may add more replica servers. Reading from the master does not scale and we prefer to minimize master reads.

With asynchronous replication, changes we make to the master are not immediately reflected on replicas. Each replica pulls changes from its master and replays them as fast as it can. There is a nonzero delay between the point in time where changes are made visible on a master and the time where those changes are visible on some replica or on all replicas. This delay is the replication lag.

The higher the replication lag on a host, the more stale its data becomes. Serving traffic off a lagging replica leads to poor user experience, as someone may make a change and then not see it reflected. Our automation removes lagging replicas from the serving pool after a few seconds, but even those few seconds matter: we commonly expect sub-second replication lag.

Maintaining low replication lag is challenging. An occasional INSERT or UPDATE is nothing, but we routinely run massive updates to our databases. These could be batched jobs, cleanup tasks, schema changes, schema change follow-ups, or other operations that affect large datasets. Such large operations can easily introduce replication lag: while a replica is busy applying a change to some 100,000 rows, its data quickly becomes stale, and by the time it completes processing it's already lagging and requires even more time to catch up.

Running subtasks

To mitigate replication lag for large operations we use batching. We never apply a change to 100,000 rows all at once. Any big update is broken into small segments, subtasks, of some 50 or 100 rows each.

As an example, say our app needs to purge some rows that satisfy a condition from a very large table. Instead of running a single DELETE FROM my_table WHERE some_condition = 1, we break the query into smaller subqueries, each operating on a different slice of the table. In its purest form, we would get a sequence of queries such as:

DELETE FROM my_table WHERE some_condition = 1 AND (id >= 0 AND id < 100);
DELETE FROM my_table WHERE some_condition = 1 AND (id >= 100 AND id < 200);
DELETE FROM my_table WHERE some_condition = 1 AND (id >= 200 AND id < 300);
DELETE FROM my_table WHERE some_condition = 1 AND (id >= 300 AND id < 400);
DELETE FROM my_table WHERE some_condition = 1 AND (id >= 400 AND id < 500);
...
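If the batching is driven from a script rather than from application code, a minimal shell sketch of the same idea could look like the following (the database name, id bounds and batch size are illustrative, not values from the original post):

# Sketch: apply the DELETE in small id ranges instead of one big statement
MIN_ID=0
MAX_ID=500000     # assumed upper bound of the id column
STEP=100

for (( start=MIN_ID; start<MAX_ID; start+=STEP )); do
    end=$(( start + STEP ))
    mysql -e "DELETE FROM my_table WHERE some_condition = 1 AND (id >= ${start} AND id < ${end});" my_database
done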

These smaller queries can each be processed by a replica very quickly, making it available to process more events, some of which may be the normal site’s update traffic or the next segments.

However, the numbers still add up. On a busy hour a heavily loaded replica may still find it too difficult to manage both read traffic and massive changes coming from the replication stream.

Throttling

We recognize that most of the large-volume operations come from background jobs, such as archiving or schema migrations. There is no particular user or API request waiting on those operations to complete. It is fine for these operations to take a little while longer to finish.

In order to apply these large operations, we break them into smaller segments and throttle in between applying those segments. Each segment is small enough and safe to send down the replication stream, but a bunch of those segments can be too much for the replicas to handle. In between each segment we pause and ask: is replication happy and in good shape?

There is no direct mechanism in MySQL to do that. The closest would be semisynchronous replication, but even that doesn't guarantee that replication lag is caught up or stays within reasonable margins.

It is up to us to be able to identify our relevant production traffic serving replicas and ask: “What is your current lag?”

If lag on all relevant replicas is within good margins (we expect subsecond), we proceed to run the next segment, then ask again. If lag is higher than desired, we throttle: we stall the operation, and keep polling lag until we’re satisfied it is low enough for us.

This flow suggests:

  • The app needs a way to identify production serving replicas.
  • The app needs to be able to gather lag information from those replicas.
  • It needs to do so from whatever hosts it’s running on, and potentially from multiple hosts concurrently.
The old GitHub throttler

Our site runs Ruby on Rails. Over time, we built into our Rails app a series of abstractions to identify the set of replicas that were active in a cluster, ask them for their current replication delay value, and determine whether that value was low enough to continue writing on the master. In its simplest form, the API looks like this:

big_dataset.each_slice do |subset|
  GitHub::Throttler::MyCluster.throttle do
    write(subset)
  end
end

We would routinely inspect HAProxy for the list of relevant replicas and populate a special table with that list. The table was made available to the throttler. When .throttle was called, the throttler pulled the list of replicas and polled replication delay metrics from the MySQL servers.

pt-heartbeat, a common tool that is part of Percona Toolkit, inserts a timestamp into the master every 100ms. That timestamp is replicated along with the rest of the information to the replicas, and as a consequence, the following query returns the replication delay in seconds on each of the replicas:

select unix_timestamp(now(6)) - unix_timestamp(ts) as replication_delay from heartbeat order by ts desc limit 1
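For reference, the heartbeat table queried above is normally populated by a pt-heartbeat daemon writing to the master roughly every 100ms. A minimal invocation might look like this (the database name is an assumption, and connection options are omitted):

pt-heartbeat --update --interval=0.1 --database=percona --daemonize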

The aggregated value of replication delay was the maximum of the different metrics polled from each replica.

The throttler also had local configuration to determine when that aggregated value was low enough. If it wasn't, the code block above would sleep for a second and check the replication delay again; if it was, it would run the block, thus writing the subset.

Non-Ruby throttlers

As we’ve grown, we’ve introduced write workloads outside of our main Rails application, and they’ve needed to be throttled as well.

We routinely archive or purge old data via pt-archiver. This is a Perl script, and fortunately comes with its own implementation for replication lag based throttling. The tool crawls down the topology to find replicating servers, then periodically checks their replication lag.

Last year we introduced gh-ost, our schema migration tool. gh-ost, by definition, runs our most massive operations: it literally rebuilds and rewrites entire tables. Even if not throttled, some of our tables could take hours or days to rebuild. gh-ost is written in Go, and could not use the Ruby throttler implementation nor the Perl implementation. Nor did we wish for it to depend on either, as we created it as a general purpose, standalone solution to be used by the community. gh-ost runs its own throttling mechanism, checking first and foremost the very replica on which it operates, but then also the list of --throttle-control-replicas. gh-ost’s interactive commands allow us to change the list of throttle control replicas during runtime. We would compute the list dynamically when spawning gh-ost, and update, if needed, during migration.
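For illustration, a gh-ost invocation pointed at an explicit set of throttle control replicas might look roughly like this (host names, schema and table are placeholders, and many of the flags a real migration needs are omitted):

gh-ost \
  --host=replica-1.example.com \
  --database=my_schema \
  --table=my_table \
  --alter="ENGINE=InnoDB" \
  --throttle-control-replicas="replica-1.example.com:3306,replica-2.example.com:3306" \
  --execute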

Then Spokes brought more massive writes. As our infrastructure grew, more and more external Ruby and non-Ruby services began running massive operations on our database.

An operational anti-pattern

What used to work well when we were running exclusively Ruby on Rails code and in smaller scale didn’t work so well as we grew. We increasingly ran into operational issues with our throttling mechanisms.

We were running more and more throttling tasks, many in parallel. We were also provisioning, decommissioning, and refactoring our MySQL fleets. Our traffic grew substantially. We realized our throttling setups had limitations.

Different apps were getting the list of relevant replicas in different ways. While the Ruby throttler always kept an updated list, we’d need to educate pt-archiver and gh-ost about an initial list, duplicating the logic the Ruby throttler would use. And while the Ruby throttler found out in real time about list changes (provisioning, decommissioning servers in production), gh-ost had to be told about such changes, and pt-archiver’s list was immutable; we’d need to kill it and restart the operation for it to consider a different list. Other apps were mostly trying to operate similarly to the Ruby throttler, but never exactly.

As a result, different apps would react differently to ongoing changes. One app would be able to gain the upper hand over another, running its own massive operation while starving the other.

The databases team members would have more complicated playbooks, and would need to run manual commands when changing our rosters. More importantly, the database team had no direct control of the apps. We would be able to cheat the apps into throttling if we wanted to, but it was all-or-nothing: either we would throttle everything or we would throttle nothing. Occasionally we would like to prioritize one operation over another, but had no way to do that.

The Ruby throttler provided great metric collection, but the other tools either did not or didn't integrate well with the GitHub infrastructure, so we had no visibility into what was being throttled and why.

We were wasting resources. The Ruby throttler would probe the MySQL replicas synchronously and sequentially per server host. Each throttle check by the app would add latency to the operation merely by iterating over the MySQL fleet, a wasted effort if no actual lag was found. It invoked the check for each request, which implied dozens of calls per second, and that many duplicate calls were wasteful. As a result, we would see dozens or even hundreds of stale connections on our replica servers made by the throttler from various endpoints, either querying for lag or sleeping.

Introducing freno

We built freno, GitHub’s central throttling service, to replace all our existing throttling mechanisms and solve the operational issues and limitations described above.

freno is Spanish for brake, as in car brakes. The name throttled was already taken and freno was just the next most sensible name.

In its very basic essence, freno runs as a standalone service that understands replication lag status and can make recommendations to inquiring apps. However, let’s consider some of its key design and operational principles:

Continuous polls

freno continuously probes and queries the MySQL replicas. It does so independently of any app wanting to issue writes. It does so asynchronously to any app.

Knowledgeable

freno continuously updates the list of servers per cluster. Within a 10 second timeframe freno will recognize that servers were removed or added. In our infrastructure, freno polls our GLB servers (seen as HAProxy) to get the roster of production replicas.

An app doesn’t need to (and actually just doesn’t) know the identity of backend servers, their count or location. It just needs to know the cluster it wants to write to.

Highly available

Multiple freno services form a raft cluster. Depending on cluster size, some boxes can go down and the freno service will still be up, with a leader to serve traffic. Our proxies direct traffic to the freno leader (our proxy setup is highly available, but one may also detect the leader's identity directly).

Cooperative

freno use is voluntary. It is not a proxy to the MySQL servers, and it has no power over the applications. It provides recommendations to applications that are interested in replication lag status. Those applications are expected to cooperate with freno’s recommendations.

An app issues a HEAD request to freno, and gets a 200 OK when it is clear to write, or a different code when it should throttle.

Throttle control

We are able to throttle specific apps, for a predefined duration and to a predefined degree. For example, an engineer can issue the following in chat:

shlomi-noach
.freno throttle gh-ost ttl 120 ratio 0.5

Hubot
gh-ost: ratio 0.5, expires at 2017-08-20T08:01:34-07:00

We may altogether refuse an app’s requests or only let it operate in low volumes.

Metrics

freno records all requests by all apps. We are able to see which app requested writes to which cluster and when. We know when it was granted access and when it was throttled.

Visibility

We are able to query freno and tell which metric hasn’t been healthy in the past 10 minutes, or who requested writes to a particular cluster, or what is being forcefully throttled, or what would be the response right now for a pt-archiver request on a cluster.

miguelff
.freno sup

Hubot
All existing metrics:
mysql/clusterA lag: 0.095545s
mysql/clusterB lag: 0.070422s
mysql/clusterC lag: 0.04162s
mysql/clusterD lag: 0.39256s
...

shlomi-noach
.freno health 300

Hubot
all metrics were reported healthy in past 300 seconds

miguelff
.freno check clusterA as pt-archiver

Hubot
{
  "StatusCode": 200,
  "Value": 0.085648,
  "Threshold": 0.8,
  "Message": ""
}

freno brought a unified, auditable and controllable method for throttling our app writes. There is now a single, highly available, entity querying our MySQL servers.

Querying freno is as simple as running:

$ curl -s -I http://my.freno.com:9777/check/myscript/mysql/clusterB

where the myscript app requests access to the clusterB cluster. This makes it easy to use by any client.
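A script cooperating with freno can therefore be as simple as the sketch below: before each small segment it waits for a 200 response from the check endpoint used above (the chunking loop and the run_next_chunk function are hypothetical):

FRENO_URL="http://my.freno.com:9777/check/myscript/mysql/clusterB"

for chunk in $(seq 0 100 10000); do
    # wait until freno says it is clear to write
    until [ "$(curl -s -o /dev/null -w '%{http_code}' -I "$FRENO_URL")" = "200" ]; do
        sleep 1
    done
    run_next_chunk "$chunk"   # hypothetical function applying the next small segment
done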

Open source

freno has been open source since early in its development and is available under the MIT license.

While the simplest client may just run an HTTP GET request, we've also made available more elaborate clients:

freno was built for monitoring MySQL replication, but may be extended to collect and serve other metrics.

And then things took a turn

We wanted freno to serve those massive write operation jobs, typically initiated by background processes. However, we realized we could also use it to reduce read load from our masters.

Master reads

Reads from the master should generally be avoided, as they don’t scale well. There is only one master and it can only serve so many reads. Reads from the master are typically due to the consistent-read problem: a change has been made, and needs to be immediately visible in the next read. If the read goes to a replica, it may hit the replica too soon, before the change was replayed there.

There are various ways to solve the consistent-read problem, which include blocking on writes or blocking on reads. At GitHub, we have a peculiar common flow that can take advantage of freno.

Before freno, web and API GET requests were routed to the replicas only if the last write happened more than five seconds ago. This pseudo-arbitrary number had some sense: If the write was five seconds ago or more, we can safely read from the replica, because if the replica is lagging above five seconds, it means that there are worse problems to handle than a read inconsistency, like database availability being at risk.

When we introduced freno, we started using the information it knows about replication lag across the cluster to address this. Now, upon a GET request after a write, the app asks freno for the maximum replication lag across the cluster, and if the reported value is below the elapsed time since the last write (up to some granularity), the read is known to be safe and can be routed to a replica.
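A rough sketch of that routing decision is shown below; it assumes the check endpoint returns the same JSON payload shown in the chat example earlier when queried with GET, that jq is available, and that the application tracks the timestamp of the user's last write (the app and cluster names are placeholders):

last_write_ts=$1                                  # unix timestamp of the user's last write
elapsed=$(( $(date +%s) - last_write_ts ))

lag=$(curl -s http://my.freno.com:9777/check/myapp/mysql/clusterB | jq -r '.Value')

# route to a replica only if it is known to have caught up past that write
if awk -v lag="$lag" -v el="$elapsed" 'BEGIN { exit !(lag < el) }'; then
    echo "read from replica"
else
    echo "read from master"
fi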

By applying that strategy, we managed to route to the replicas ~30% of the requests that before were routed to the master. As a consequence, the number of selects and threads connected on the master was reduced considerably, leaving the master with more free capacity:

We applied similar strategies to other parts of the system: Another example is search indexing jobs.

We index in Elasticsearch when a certain domain object changes. A change is a write operation, and as indexing happens asynchronously, we sometimes need to re-hydrate data from the database.

The time it takes to process a given job since the last write is in the order of a hundred milliseconds. As replication lag is usually above that value, we were also reading from the master within indexing jobs to have a consistent view of the data that was written. This job was responsible for another 11% of the reads happening on the master.

To reduce reads on the master from indexing jobs, we used the replication delay reported by freno to delay the execution of each indexing job until the data has been replicated. To do it, we store in the job payload the timestamp at which the write operation that triggered the job occurred, and based on the replication delay reported by freno, we wait until we are sure the data was replicated. This happens in less than 600ms 95% of the time.

The above two scenarios account for ~800 rps to freno, but replication delay cannot grow faster than the clock, and we used this fact to optimize access and let freno scale to growing usage demands. We implemented client-side caching over memcache, reusing the same replication delay values for 20ms and adding a cushioning time to compensate for both freno's sampling rate and the caching TTL. This way, we capped traffic from the main application to freno at 50 rps.

Scaling freno

Seeing that freno now serves production traffic, not just backend processes, we are looking into its serving capacity. At this time a single freno process is well capable of serving what requests we’re sending its way, with much room to spare. If we need to scale it, we can bootstrap and run multiple freno clusters: either used by different apps and over different clusters (aka sharding), or just for sharing read load, much like we add replicas to share query read load. The replicas themselves can tolerate many freno clients, if needed. Recall that we went from dozens/hundreds of throttler connections to a single freno connection; there’s room to grow.

Motivated by the client-side caching applied in the web application, we now can tell freno to write results to memcached, which can be used to decouple freno’s probing load from the app’s requests load.

Looking into the future

freno & freno-client make it easy for apps to add throttling to massive operations. However, the engineer still needs to be aware that their operation should use throttling, and it takes a programmatic change to the code to call upon throttling. We’re looking into intelligently identifying queries: the engineer would run “normal” query code, and an adapter or proxy would identify whether the query requires throttling, and how so.

Identifying MySQL SSL communication using ngrep

Prior to MySQL 5.7, client communications with a MySQL instance were unencrypted by default. This plaintext capability allowed various tools, including pt-query-digest, to analyze TCP/IP traffic. Starting with MySQL 5.7, SSL is enabled by default and will be used in client communications if SSL is detected on the server.

We can detect and confirm this easily with the ngrep command.

Default client connection traffic (5.6)

On a MySQL 5.6 server we monitor the default MySQL port traffic.

mysql56$ sudo ngrep -d eth1 -wi -P ' ' -W single -l port 3306
interface: eth1 (192.168.42.0/255.255.255.0)
filter: ( port 3306 ) and ((ip || ip6) || (vlan && (ip || ip6)))
...

We connect to this server using TCP/IP.

host$ mysql -uexternal -p -h192.168.42.16
mysql56> select 'unencrypted';

We can observe the communication to and from the server (in this example 192.168.42.16) is plaintext.

...
#
T 192.168.42.1:47634 -> 192.168.42.16:3306 [AP]
select 'unencrypted'
#
T 192.168.42.16:3306 -> 192.168.42.1:47634 [AP]
! def unencrypted ! ! unencrypted
#

SSL System Variables (5.6 default)

A default 5.6 installation does not have SSL enabled as verified by system variables.

mysql56> SHOW VARIABLES LIKE '%ssl%';
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| have_openssl  | DISABLED |
| have_ssl      | DISABLED |
| ssl_ca        |          |
| ssl_capath    |          |
| ssl_cert      |          |
| ssl_cipher    |          |
| ssl_crl       |          |
| ssl_crlpath   |          |
| ssl_key       |          |
+---------------+----------+
9 rows in set (0.02 sec)

Default client connection traffic (5.7)

Running the same example client connection with MySQL 5.7 you will observe that communications to and from the server (in this example 192.168.42.17) are not in plaintext.

mysql57$ sudo ngrep -d eth1 -wi -P ' ' -W single -l port 3306
interface: eth1 (192.168.42.0/255.255.255.0)
filter: ( port 3306 ) and ((ip || ip6) || (vlan && (ip || ip6)))

host$ mysql -uexternal -p -h192.168.42.17
mysql57> select 'encrypted';

T 192.168.42.1:36781 -> 192.168.42.17:3306 [AP]
@ F l d iVr H b ^ s t Z ( 2d " ? | )
#
T 192.168.42.17:3306 -> 192.168.42.1:36781 [AP]
p% s` 3u5!%P] v= r # x E a y '! )Z 8 Js z. \t (r H@ 0 2 5k\

SSL System Variables (5.7 default)

A new MySQL 5.7 installation will have SSL enabled by default as seen in the MySQL variables.

mysql57> SHOW GLOBAL VARIABLES LIKE '%ssl%';
+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| have_openssl  | YES             |
| have_ssl      | YES             |
| ssl_ca        | ca.pem          |
| ssl_capath    |                 |
| ssl_cert      | server-cert.pem |
| ssl_cipher    |                 |
| ssl_crl       |                 |
| ssl_crlpath   |                 |
| ssl_key       | server-key.pem  |
+---------------+-----------------+
9 rows in set (0.00 sec)

-no-ssl Client connection traffic (5.7)

If you want to emulate the unencrypted behavior of MySQL 5.6 on any SSL-enabled MySQL version (e.g. MySQL 5.7), you can disable SSL for mysql client connections with the --ssl=0 option. In MySQL 5.7 this option is deprecated, and --ssl-mode=DISABLED should be used instead.
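For example, on a 5.7 client the non-deprecated form of the unencrypted test below would be (using the same host as in the other examples):

host$ mysql -uexternal -p -h192.168.42.17 --ssl-mode=DISABLED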

host$ mysql -uexternal -p -h192.168.42.17 --ssl=0
host> select '-ssl=0 unencrypted';

T 192.168.42.1:36785 -> 192.168.42.17:3306 [AP]
select '-ssl=0 unencrypted'
#
T 192.168.42.17:3306 -> 192.168.42.1:36785 [AP]
' def -ssl=0 unencrypted ! 3 -ssl=0 unencrypted

References

https://wiki.christophchamp.com/index.php?title=Ngrep
http://infoheap.com/ngrep-quick-start-guide/
Encrypted Connections - MySQL 5.7 Reference Manual
Implementing MySQL Security Features - Tutorial at Percona Live Europe 2017.

A Mystery with open_files_limit

In this blog, we'll look at a mystery around setting the open_files_limit variable in MySQL and Percona Server for MySQL.

MySQL Server needs file descriptors to run. It uses them to open new connections, store tables in the cache, create temporary tables to resolve complicated queries and access persistent ones. If mysqld is not able to open new files when needed, it can stop functioning correctly. A common symptom of this issue is error 24: “Too many open files.”

The number of file descriptors mysqld can open simultaneously is defined by the open_files_limit configuration option. You would expect it to work like any other MySQL Server option: set it in the configuration file, restart mysqld and use more or fewer descriptors. All other configuration variables work this way. But open_files_limit also depends on the operating system (OS) limits. This makes setting the variable more complicated.

As a user, when you start any application it cannot have limits greater than the limits defined by the operating system for the user in question. Therefore, you would intuitively expect mysqld to set open_files_limit to any value that is less than the OS limit. This is not the case, however. No matter what value you set for the open_files_limit variable, the OS limit is used unless it is set to "infinity".

sveta@Thinkie:~$ ulimit -n
32000
sveta@Thinkie:$ cat /etc/my.cnf
[mysqld]
open-files-limit=16000
...
sveta@Thinkie:$ ./bin/mysqld &
sveta@Thinkie:$ mysql -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.19-17-debug-log Source distribution

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select @@open_files_limit;
+--------------------+
| @@open_files_limit |
+--------------------+
|              32000 |
+--------------------+
1 row in set (0.00 sec)

The reason for this can be found in the code contained in the mysys/my_file.c file:

static uint set_max_open_files(uint max_file_limit)
{
  struct rlimit rlimit;
  uint old_cur;
  DBUG_ENTER("set_max_open_files");
  DBUG_PRINT("enter",("files: %u", max_file_limit));

  if (!getrlimit(RLIMIT_NOFILE,&rlimit))
  {
    old_cur= (uint) rlimit.rlim_cur;
    DBUG_PRINT("info", ("rlim_cur: %u  rlim_max: %u",
                        (uint) rlimit.rlim_cur,
                        (uint) rlimit.rlim_max));
    if (rlimit.rlim_cur == RLIM_INFINITY)
      rlimit.rlim_cur = max_file_limit;
    if (rlimit.rlim_cur >= max_file_limit)
      DBUG_RETURN(rlimit.rlim_cur);             /* purecov: inspected */
    rlimit.rlim_cur= rlimit.rlim_max= max_file_limit;
    if (setrlimit(RLIMIT_NOFILE, &rlimit))
      max_file_limit= old_cur;                  /* Use original value */
    else
    {
      rlimit.rlim_cur= 0;                       /* Safety if next call fails */
      (void) getrlimit(RLIMIT_NOFILE,&rlimit);
      DBUG_PRINT("info", ("rlim_cur: %u", (uint) rlimit.rlim_cur));
      if (rlimit.rlim_cur)                      /* If call didn't fail */
        max_file_limit= (uint) rlimit.rlim_cur;
    }
  }
  DBUG_PRINT("exit",("max_file_limit: %u", max_file_limit));
  DBUG_RETURN(max_file_limit);
}

Particularly these lines:

if (rlimit.rlim_cur >= max_file_limit)
  DBUG_RETURN(rlimit.rlim_cur);             /* purecov: inspected */

This code tells mysqld to take the maximum of the value specified in the open_files_limit variable and the soft system user limit.

I reported this behavior as documentation bug #87681.

mysqld_safe

mysqld_safe has its own open_files_limit option. This option allows you to overwrite the system soft limit any way you want. However, on:
  • Red Hat Enterprise Linux 7
  • Oracle Linux 7
  • CentOS 7
  • SUSE Linux Enterprise Server 12
  • Fedora 25 and 26
  • Debian 8 or higher
  • Ubuntu 16.04 LTS or higher

this option, as specified under the [mysqld_safe] header in the configuration file, is not used when you start mysqld as a service. To explain the reason for this behavior, we need to step back into history.

init.d

For a long time, many Linux operating systems used init.d to start certain commands together with the OS. The init daemon executes scripts (usually located in the directory /etc/init.d) at system startup, depending on the runlevel.

The different implementations of init.d vary, but they have known drawbacks. For example, init.d starts everything sequentially. This means a new process has to wait if another one has already started. This makes the startup process on multi-core machines slow. Another drawback related to our topic is that daemons started by init.d inherit OS limits from the root user. If a program needs to be run by another user, the switch needs to happen in the startup script itself. But the order of option files that such users read can be different, depending on whether they are logged in via the ssh, su or sudo commands.

MySQL Server

MySQL Server's startup sequence for the service is as follows:

  1. <Perform another job>
  2. Start mysqld_safe as mysql user: su - mysql -s /bin/bash -c "mysqld_safe > /dev/null &"

This behavior has existed at least since version 5.5.

Percona Server for MySQL

Before version 5.7, Percona Server for MySQL had a different startup sequence:

  1. <Perform another job>
  2. Start mysqld_safe as root and pass option --user=mysql to it: "${PERCONA_PREFIX}"/bin/mysqld_safe > /dev/null 2>&1 &

With this sequence, you only need to set a hard limit for the mysql user in the file /etc/security/limits.conf, and mysqld_safe will do the rest.

In version 5.7, Percona Server for MySQL backported the startup sequence from MySQL Server. Since then, setting a hard limit on the number of open files for the mysql user in /etc/security/limits.conf is not enough. You also need the line session required pam_limits.so in the file /etc/pam.d/common-session. This is needed because the startup sequence for the mysql user changed due to the design of init.d.
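As a sketch, the two pieces of configuration mentioned above could look like this (the limit value is only an example):

# /etc/security/limits.conf
mysql   soft   nofile   32000
mysql   hard   nofile   32000

# /etc/pam.d/common-session
session required pam_limits.so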

SystemD

Linux developers performed several trials to find a better startup solution than init.d. For MySQL and Percona Server for MySQL startup, the most important innovation is SystemD. SystemD is becoming more and more popular. Therefore MySQL and Percona Server for MySQL do not use init.d on Red Hat Enterprise Linux 7, Oracle Linux 7, CentOS 7, SUSE Linux Enterprise Server 12, Fedora 25 and 26, Debian 8 or higher, or Ubuntu 16.04 LTS or higher. Instead, they use SystemD.

What does this mean for MySQL users?

Scripts started by SystemD run as the required system user from the start. Therefore they do not inherit limits from the root user; they use their own limits specified in /etc/security/limits.conf. If you need your mysqld process limits to differ from the defaults for the mysql user, you need to set the LimitNOFILE option under the [Service] section in the service configuration file. Again, you cannot then lower this limit using the open_files_limit option, unless you set it to Infinity.
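A sketch of such an override follows; the unit name and the limit value are examples (on some distributions the unit is called mysqld rather than mysql):

# sudo systemctl edit mysql
# add to the override file:
[Service]
LimitNOFILE=32000
# then apply it:
# sudo systemctl daemon-reload
# sudo systemctl restart mysql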

Both packages

To make things more complex, Percona Server for MySQL packages for Ubuntu contain both the mysql.server script (used by init.d) and the service description for SystemD. In fact, SystemD is used after installation, but you might be confused when looking only at the package files.

Conclusion

You should set the open_files_limit variable together with the operating system limits. You should study how init.d or SystemD works if you see values that you don't expect.

How to change the open_files_limit variable?

Operating System: Red Hat Enterprise Linux 7, Oracle Linux 7, CentOS 7, SUSE Linux Enterprise Server 12, Fedora 25 and 26, Debian 8+, Ubuntu 16.04 LTS+
Startup daemon: SystemD
Where to put the configuration:
  • /etc/security/limits.conf and /etc/pam.d/common-session
  • Service configuration: sudo systemctl edit mysql
  • [mysqld] section of the configuration file

Operating System: Others
Startup daemon: init.d
Where to put the configuration:
  • /etc/security/limits.conf and /etc/pam.d/common-session
  • [mysqld_safe] section of the configuration file
  • [mysqld] section of the configuration file

 

Which values of the open_files_limit variable make sense?

Soft user limit   open_files_limit range
Infinity          Any
Positive          Greater than or equal to the soft user limit and smaller than the hard user limit

 

Lesson 06: Working with Database Structures

Notes/errata/updates for Chapter 6:
See the official book errata at http://tahaghoghi.com/LearningMySQL/errata.php – Chapter 6 includes pages 179 – 222.

Other notes:
At the end of the "Creating Tables" section (p.183-4), it says "We like using the underscore character to separate words, but that's just a matter of style and taste; you could use underscores or dashes, or omit the word-separating formatting altogether." While this is true, beware of using a dash: MySQL will try to interpret "two-words" as an expression, treating the dash as a minus sign. I recommend avoiding dashes for this reason (even though the book does this on page 215).

At the end of the “Collation and Character Sets” section (p.186), it says “When you’re creating a database, you can set the default character set and sort order for the database and its tables.” Note that the default character set for the server will set the default character set for any new databases created if a default character set is not specified; there is no change in existing databases. In turn, the default character set for the database sets the default character set for any new tables created but does not change any existing tables, and the default character set for a table determines the default character set for each column, which can be overridden by specifying a character set when defining a column.
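A quick way to see the cascade in action is to create a throwaway database and table and then inspect them; the names below are hypothetical:

mysql> CREATE DATABASE charset_demo CHARACTER SET utf8 COLLATE utf8_general_ci;
mysql> CREATE TABLE charset_demo.t1 (
    ->   id INT UNSIGNED NOT NULL,
    ->   name_latin1 VARCHAR(128) CHARACTER SET latin1,  -- column-level override
    ->   name_default VARCHAR(128),                      -- inherits the table/database default
    ->   PRIMARY KEY (id)
    -> );
mysql> SHOW CREATE TABLE charset_demo.t1\G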

Under the “Other Features” section it references a section called “Table Types”. This section is in chapter 7, p. 267.

Under the "Other Features" section it shows the SHOW CREATE TABLE example (p. 187). By default, MySQL shows you output in horizontal format, where one table row is shown as one row in the output. However, you can have MySQL show you output in vertical format, where each column is shown as its own row in the output. Instead of using ; to end a query, use \G.

Try it with:
SHOW CREATE TABLE artist;
vs
SHOW CREATE TABLE artist\G

And see the difference.

In the “Column Types” section on page 194, it says that “Only one TIMESTAMP column per table can be automatically set to the current date and time on insert or update.” This is not true as of MySQL version 5.6.5 and higher. As per the documentation at https://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html: “For any TIMESTAMP or DATETIME column in a table, you can assign the current timestamp as the default value, the auto-update value, or both.”
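For example, on 5.6.5 or later all of the following definitions are accepted in a single table (table and column names are made up for the illustration):

mysql> CREATE TABLE ts_demo (
    ->   id INT UNSIGNED NOT NULL,
    ->   created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    ->   updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    ->   changed_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    ->   PRIMARY KEY (id)
    -> );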

In the section called “The AUTO_INCREMENT Feature”, on page 211, it says “If, however, we delete all the data in the table, the counter is reset to 1.” The example shows the use of TRUNCATE TABLE. Note that if you deleted all the data in the table with DELETE, such as “DELETE FROM count WHERE 1=1;”, the counter is NOT reset.
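A quick way to see the difference for yourself, assuming the book's count table exists:

mysql> DELETE FROM count WHERE 1=1;   -- table is now empty, but the AUTO_INCREMENT counter keeps its value
mysql> TRUNCATE TABLE count;          -- table is empty AND the counter is reset to 1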

Supplemental material:
Data types:
Podcast on Strings
Podcast on Numeric data types
Podcast on ENUM, SET and different SQL modes
Podcast on Times and time zones

Topics covered:
How to CREATE, DROP and ALTER databases, tables, columns and indexes
Collations and character sets
Data types
AUTO_INCREMENT

Reference/Quick Links for MySQL Marinate
