Planet MySQL

Percona Monitoring and Management (PMM) 1.16.0 Is Now Available

PMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL, MongoDB, and PostgreSQL performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

While much of the team is working on longer-term projects, we were able to provide the following features:

  • MySQL and PostgreSQL support for all cloud DBaaS providers – Use PMM Server to gather Metrics and Queries from remote instances!
  • Query Analytics + Metric Series – See Database activity alongside queries
  • Collect local metrics using node_exporter + textfile collector

We addressed 11 new features and improvements, and fixed 21 bugs.

MySQL and PostgreSQL support for all cloud DBaaS providers

You’re now able to connect PMM Server to your MySQL and PostgreSQL instances, whether they run in a cloud DBaaS environment or you simply want database metrics without the OS metrics. This can help you get up and running with PMM using minimal configuration and zero client installation. Be aware there are limitations, however: no host-level dashboards will be populated for these nodes, since we don’t attempt to connect to the provider’s API, nor are we granted access to the instance in order to deploy an exporter.

How to use

Using the PMM Add Instance screen, you can now add instances from any cloud provider (AWS RDS and Aurora, Google Cloud SQL for MySQL, Azure Database for MySQL) and benefit from the same dashboards that you are already accustomed to. You’ll be able to collect Metrics and Queries from MySQL, and Metrics from PostgreSQL.  You can add remote instances by selecting the PMM Add Instance item in a PMM group of the system menu:

where you will then have the opportunity to add a Remote MySQL or Remote PostgreSQL instance:

You’ll add the instance by supplying just the Hostname, database Username and Password (and optional Port and Name):
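
Whatever the provider, the database user you supply needs enough privileges to read metrics (and, for MySQL, query data). PMM’s exact privilege requirements are in its documentation; as a rough, hypothetical sketch only (the user name, host pattern, password and grants below are illustrative, not taken from this release announcement), a MySQL monitoring user could look like this:

-- Hypothetical monitoring user; adjust the host pattern, password, and privileges
-- to your environment and to PMM's documented requirements.
CREATE USER 'pmm'@'%' IDENTIFIED BY 'a-strong-password';
GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO 'pmm'@'%';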

Also new in this release is the ability to display the nodes you’ve added on the RDS and Remote Instances screen:

Server activity metrics in the PMM Query Analytics dashboard

The Query Analytics dashboard now shows a summary of the selected host and database activity metrics in addition to the top ten queries listed in a summary table.  This brings a view of System Activity (CPU, Disk, and Network) and Database Server Activity (Connections, Queries per Second, and Threads Running) to help you better pinpoint query pileups and other bottlenecks:

Extending metrics with node_exporter textfile collector

While PMM provides an excellent solution for system monitoring, sometimes you may need a metric that’s not present in the list of node_exporter metrics out of the box. There is a simple method to extend the list of available metrics without modifying the node_exporter code: the textfile collector. We’ve enabled this collector by default, and it is deployed as part of linux:metrics in PMM Client.

The default directory for reading text files with the metrics is /usr/local/percona/pmm-client/textfile-collector, and the exporter reads files from it with the .prom extension. By default it contains an example file example.prom which has commented contents and can be used as a template.

You are responsible for running a cronjob or other regular process to generate the metric series data and write it to this directory.

Example – collecting docker container information

This example will show you how to collect the number of running and stopped docker containers on a host. It uses a crontab task, set with the following lines in the cron configuration file (e.g. in /etc/crontab):

# Truncate and rebuild each .prom file on every run so the exporter always reads a single, fresh value
*/1 * * * * root echo -n "" > /usr/local/percona/pmm-client/textfile-collector/docker_all.prom; docker ps -a -q | wc -l | xargs echo node_docker_containers_total >> /usr/local/percona/pmm-client/textfile-collector/docker_all.prom
*/1 * * * * root echo -n "" > /usr/local/percona/pmm-client/textfile-collector/docker_running.prom; docker ps -q | wc -l | xargs echo node_docker_containers_running_total >> /usr/local/percona/pmm-client/textfile-collector/docker_running.prom

The result of the commands is placed into the docker_all.prom and docker_running.prom files, which are read by the exporter to create two new metric series named node_docker_containers_total and node_docker_containers_running_total, which we’ll then plot on a graph:

New Features and Improvements
  • PMM-3195 Remove the light bulb
  • PMM-3194 Change link for “Where do I get the security credentials for my Amazon RDS DB instance?”
  • PMM-3189 Include Remote MySQL & PostgreSQL instance logs into PMM Server logs.zip system
  • PMM-3166 Convert status integers to strings on ProxySQL Overview Dashboard – Thanks,  Iwo Panowicz for  https://github.com/percona/grafana-dashboards/pull/239
  • PMM-3133 Include Metric Series on Query Analytics Dashboard
  • PMM-3078 Generate warning “how to troubleshoot postgresql:metrics” after failed pmm-admin add postgresql execution
  • PMM-3061 Provide Ability to Monitor Remote MySQL and PostgreSQL Instances
  • PMM-2888 Enable Textfile Collector by Default in node_exporter
  • PMM-2880 Use consistent favicon (Percona logo) across all distribution methods
  • PMM-2306 Configure EBS disk resize utility to run from crontab in PMM Server
  • PMM-1358 Improve Tooltips on Disk Space Dashboard – thanks, Corrado Pandiani for texts
Fixed Bugs
  • PMM-3202 Cannot add remote PostgreSQL to monitoring without specified dbname
  • PMM-3186 Strange “Quick ranges” tag appears when you hover over documentation links on PMM Add Instance screen
  • PMM-3182 Some sections for MongoDB are collapsed by default
  • PMM-3171 Remote RDS instance cannot be deleted
  • PMM-3159 Problem with enabling RDS instance
  • PMM-3127 “Expand all” button affects JSON in all queries instead of the selected one
  • PMM-3126 Last check displays locale format of the date
  • PMM-3097 Update home dashboard to support PostgreSQL nodes in Environment Overview
  • PMM-3091 postgres_exporter typo
  • PMM-3090 TLS handshake error in PostgreSQL metric
  • PMM-3088 It’s possible to downgrade PMM from Home dashboard
  • PMM-3072 Copy to clipboard is not visible for JSON in case of long queries
  • PMM-3038 Error adding MySQL queries when options for mysqld_exporters are used
  • PMM-3028 Mark points are hidden if an annotation isn’t added in advance
  • PMM-3027 Number of vCPUs for RDS is displayed incorrectly – report and proposal from Janos Ruszo
  • PMM-2762 Page refresh makes Search condition lost and shows all queries
  • PMM-2483 LVM in the PMM Server AMI is poorly configured/documented – reported by Olivier Mignault and a lot of people involved. Special thanks to Chris Schneider for checking with fix options
  • PMM-2003 Delete all info related to external exporters on pmm-admin list output
How to get PMM Server

PMM is available for installation using three methods:

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.

How To Best Use Percona Server Column Compression With Dictionary

Very often, database performance is affected by the inability to cache all the required data in memory. Disk IO, even when using the fastest devices, takes much more time than a memory access. With MySQL/InnoDB, the main memory cache is the InnoDB buffer pool. There are many strategies we can try to fit as much data as possible in the buffer pool, and one of them is data compression.

With regular MySQL, to compress InnoDB data you can either use “Barracuda page compression” or “transparent page compression with punch holes”. The use of the ZFS filesystem is another possibility, but it is external to MySQL and doesn’t help with caching. All these solutions are transparent, but often they also have performance and management implications. If you are using Percona Server for MySQL, you have yet another option, “column compression with dictionary”. This feature is certainly not receiving the attention it merits. I think it is really cool—let me show you why.

We all know what compression means, who has not zipped a file before attaching it to an email? Compression removes redundancy from a file. What about the dictionary? A compression dictionary is a way to seed the compressor with expected patterns, in order to improve the compression ratio. Because you can specify a dictionary, the scope of usefulness of column compression with the Percona Server for MySQL feature is greatly increased. In the following sections, we’ll review the impacts of a good dictionary, and devise a way to create a good one without any guessing.

A simple use case

A compression algorithm needs a minimal amount of data in order to achieve a reasonable compression ratio. Typically, if the object is below a few hundred bytes, there is rarely enough data to have repetitive patterns and when the compression header is added, the compressed data can end up larger than the original.

mysql> select length('Hi!'), length(compress('Hi!'));
+---------------+-------------------------+
| length('Hi!') | length(compress('Hi!')) |
+---------------+-------------------------+
|             3 |                      15 |
+---------------+-------------------------+
1 row in set (0.02 sec)

Compressing a string of three bytes results in a binary object of 15 bytes. That’s counterproductive.
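
The flip side is easy to check: a string with a lot of repetition compresses well even without a dictionary. As a quick experiment you can run yourself:

-- 300 bytes of highly repetitive data compresses to a small fraction of its size.
SELECT LENGTH(REPEAT('Hi!', 100)), LENGTH(COMPRESS(REPEAT('Hi!', 100)));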

In order to illustrate the potential of the dictionary, I used this dataset:

http://skeeto.s3.amazonaws.com/share/JEOPARDY_QUESTIONS1.json.gz

It is a set of 100k Jeopardy questions written in JSON. To load the data in MySQL, I created the following table:

mysql> show create table TestColCompression\G
*************************** 1. row ***************************
       Table: TestColCompression
Create Table: CREATE TABLE `TestColCompression` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `question` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=79977 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Then, I did some formatting to create insert statements:

zcat JEOPARDY_QUESTIONS1.json.gz | perl -p -e 's/\[\{/\{/g' | perl -p -e 's/\}, \{/\}\n\{/g' | perl -p -e "s/'/''/g" | \
(while read line; do echo "insert into TestColCompression (question) values ('$line');"; done )

And I executed the inserts. About 20% of the rows had some formatting issues but nevertheless, I ended up with close to 80k rows:

mysql> show table status\G
*************************** 1. row ***************************
           Name: TestColCompression
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 78110
 Avg_row_length: 316
    Data_length: 24690688
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: 79977
    Create_time: 2018-10-26 15:16:41
    Update_time: 2018-10-26 15:40:34
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

The average row length is 316 bytes for a total data size of 23.55MB. The question JSON objects are large enough to matter, but barely large enough for compression. Here are the first five rows:

mysql> select question from TestColCompression limit 5\G
*************************** 1. row ***************************
question: {"category": "HISTORY", "air_date": "2004-12-31", "question": "'For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory'", "value": "$200", "answer": "Copernicus", "round": "Jeopardy!", "show_number": "4680"}
*************************** 2. row ***************************
question: {"category": "ESPN's TOP 10 ALL-TIME ATHLETES", "air_date": "2004-12-31", "question": "'No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves'", "value": "$200", "answer": "Jim Thorpe", "round": "Jeopardy!", "show_number": "4680"}
*************************** 3. row ***************************
question: {"category": "EVERYBODY TALKS ABOUT IT...", "air_date": "2004-12-31", "question": "'The city of Yuma in this state has a record average of 4,055 hours of sunshine each year'", "value": "$200", "answer": "Arizona", "round": "Jeopardy!", "show_number": "4680"}
*************************** 4. row ***************************
question: {"category": "OLD FOLKS IN THEIR 30s", "air_date": "2009-05-08", "question": "'The district of conservative rep. Patrick McHenry in this state includes Mooresville, a home of NASCAR'", "value": "$800", "answer": "North Carolina", "round": "Jeopardy!", "show_number": "5690"}
*************************** 5. row ***************************
question: {"category": "MOVIES & TV", "air_date": "2009-05-08", "question": "'Tim Robbins played a public TV newsman in "Anchorman: The Legend of" him'", "value": "$800", "answer": "Ron Burgundy", "round": "Jeopardy!", "show_number": "5690"}

Let’s begin with a straight column compression without specifying a dictionary:

mysql> alter table TestColCompression modify question text COLUMN_FORMAT COMPRESSED;
Query OK, 79976 rows affected (4.25 sec)
Records: 79976  Duplicates: 0  Warnings: 0

mysql> analyze table TestColCompression;
+----------------------------+---------+----------+----------+
| Table                      | Op      | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| colcomp.TestColCompression | analyze | status   | OK       |
+----------------------------+---------+----------+----------+

mysql> show table status\G
*************************** 1. row ***************************
           Name: TestColCompression
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 78995
 Avg_row_length: 259
    Data_length: 20496384
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: 79977
    Create_time: 2018-10-26 15:47:56
    Update_time: 2018-10-26 15:47:56
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

As expected, the data didn’t compress much. The compression ratio is 0.82 (the average row length drops from 316 to 259 bytes), or a saving of about 18%. Since the JSON headers are always the same, and are present in all questions, we should at least use them for the dictionary. Trying a minimal dictionary made of the headers gives:

mysql> SET @dictionary_data = 'category' 'air_date' 'question' 'value' 'answer' 'round' 'show_number' ;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE COMPRESSION_DICTIONARY simple_dictionary (@dictionary_data);
Query OK, 0 rows affected (0.00 sec)

mysql> alter table TestColCompression modify question text COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY simple_dictionary;
Query OK, 79976 rows affected (4.72 sec)
Records: 79976  Duplicates: 0  Warnings: 0

mysql> analyze table TestColCompression;
+----------------------------+---------+----------+----------+
| Table                      | Op      | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| colcomp.TestColCompression | analyze | status   | OK       |
+----------------------------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> show table status\G
*************************** 1. row ***************************
           Name: TestColCompression
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 78786
 Avg_row_length: 246
    Data_length: 19447808
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: 79977
    Create_time: 2018-10-26 17:58:17
    Update_time: 2018-10-26 17:58:17
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

There is some progress: we now have a compression ratio of 0.79. Obviously we could do more, but without a tool we’d have to guess. A compressor like zlib builds a dictionary as part of its compression effort, so could we use that? Yes, but only if we can generate it correctly and access the result. That’s not readily available with the common compressors I know. Fortunately, someone else had the same issue and wrote a compressor able to save its dictionary. Please let me introduce femtozip.

Femtozip to the rescue

The tool, by itself, has no magic algorithm; it is based on zlib, from what I can understand from the code. Anyway, we won’t compress anything with it: we’ll use it to generate a good dictionary. To create a dictionary, the tool looks at a set of files and tries to find patterns between them. Using a single big file defeats the purpose. So, I generated one file per question with:

mkdir questions
cd questions
l=1; mysql -u blog -pblog colcomp -e 'select question from TestColCompression' | (while read line; do echo $line > ${l}; let l=l+1; done)

Then, I used the following command to generate a 1024-byte dictionary using all the files starting with “1”:

../femtozip/cpp/fzip/src/fzip --model ../questions_1s.mod --build --dictonly --maxdict 1024 1*
Building dictionary...

In about 10s the job was done. I tried with all the 80k files and… I had to kill the process after thirty minutes. Anyway, there are 11111 files starting with “1”, a very decent sample. Our generated dictionary looks like:

cat ../questions_1s.mod
", "air_date", "round": "Double Jeopardy!", "show_number": " of this for 00", "answer": "the 0", "question": "'e", "round": "Jeopardy!", "show_number": "r", "round": "{"category": "S", "air_date": "1998-s", "round": "Double Jeopardy!", "show_number": " of the ", "air_date": "2008-{"category": "THE {"category": "As", "round": "Jeopardy!", "show_number": "4", "question": "'Jeopardy!", "show_number": "2'", "value": "$1000", "answer": "7", "question": "'The ", "question": "'A'", "value": "$600", "answer": "9", "question": "'In ", "question": "'This 3", "question": "'2", "question": "'e'", "value": "$", "round": "Double Jeopardy!", "show_number": "4", "round": "Jeopardy!", "show_number": "4"'", "value": "$S", "air_date": "199", "round": "Double Jeopardy!", "show_number": "5s'", "value": "$", "round": "Double Jeopardy!", "show_number": "3", "round": "Jeopardy!", "show_number": "3", "round": "Jeopardy!", "show_number": "5'", "value": "$200", "answer": "'", "value": "$800", "answer": "'", "value": "$400", "answer": "

With some formatting, I was able to create a dictionary with the above data:

mysql> SET @dictionary_data = '", "air_date", "round": "Double Jeopardy!", "show_number": " of this for 00", "answer": "the 0", "question": "''e", "round": "Jeopardy!", "show_number": "r", "round": "{"category": "S", "air_date": "1998-s", "round": "Double Jeopardy!", "show_number": " of the ", "air_date": "2008-{"category": "THE {"category": "As", "round": "Jeopardy!", "show_number": "4", "question": "''Jeopardy!", "show_number": "2''", "value": "$1000", "answer": "7", "question": "''The ", "question": "''A''", "value": "$600", "answer": "9", "question": "''In ", "question": "''This 3", "question": "''2", "question": "''e''", "value": "$", "round": "Double Jeopardy!", "show_number": "4", "round": "Jeopardy!", "show_number": "4"''", "value": "$S", "air_date": "199", "round": "Double Jeopardy!", "show_number": "5s''", "value": "$", "round": "Double Jeopardy!", "show_number": "3", "round": "Jeopardy!", "show_number": "3", "round": "Jeopardy!", "show_number": "5''", "value": "$200", "answer": "''", "value": "$800", "answer": "''", "value": "$400", "answer": "' ;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE COMPRESSION_DICTIONARY femtozip_dictionary (@dictionary_data);
Query OK, 0 rows affected (0.00 sec)

And then, I altered the table to use the new dictionary:

mysql> alter table TestColCompression modify question text COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY femtozip_dictionary;
Query OK, 79976 rows affected (4.05 sec)
Records: 79976  Duplicates: 0  Warnings: 0

mysql> analyze table TestColCompression;
+----------------------------+---------+----------+----------+
| Table                      | Op      | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| colcomp.TestColCompression | analyze | status   | OK       |
+----------------------------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> show table status\G
*************************** 1. row ***************************
           Name: TestColCompression
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 79861
 Avg_row_length: 190
    Data_length: 15220736
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: 79977
    Create_time: 2018-10-26 17:56:09
    Update_time: 2018-10-26 17:56:09
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

That’s interesting: we are now achieving a ratio of 0.61, a significant improvement. I pushed my luck and tried with a 2048-byte dictionary. It further reduced the ratio to 0.57, but that was about the best I got; larger dictionaries didn’t lower the ratio below 0.57. Zlib supports dictionaries of up to 32KB.
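
If you want to check the numbers yourself while iterating on dictionaries, the same figures are available from information_schema (the colcomp schema name comes from the ANALYZE TABLE output above):

-- Pull the approximate row length and data size used for the ratios in this post.
SELECT table_name, row_format, avg_row_length, data_length
FROM information_schema.tables
WHERE table_schema = 'colcomp' AND table_name = 'TestColCompression';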

So, to recap:

  • column compression without dictionary, ratio of 0.82
  • column compression with simple dictionary, ratio of 0.79
  • column compression with a 1k dictionary from femtozip, ratio of 0.61
  • column compression with a 2k dictionary from femtozip, ratio of 0.57

The above example stores a JSON document in a text column. MySQL 5.7 includes a JSON data type which behaves a bit differently with regard to the dictionary: delimiting characters like ‘{}’ are removed in the on-disk representation of a JSON column. If you have TBs of data in similar tables, you should really consider column compression and a systematic way of determining the dictionary with femtozip. In addition to improving the compression, it is likely to be the least performance-impacting solution. Would it be interesting to generate a dictionary from existing data with a command like this one?

CREATE COMPRESSION_DICTIONARY_FROM_DATA A_good_dictionary (2048, select questions from TestColCompression limit 10000);

where the dictionary creation process would implicitly include steps similar to the ones I did with femtozip.

How To Implement Parent-Child Relationship in MySQL

If the first time you heard this phrase “parent-child relationship in MySQL”, you went “Ok, anyone else got cool made up names or are we to start brainstorming now”, then this article is for you. For the uninitiated, this might not make much sense. That’s perfectly fine because you do not necessarily need it everyday.

A parent-child relationship in MySQL is about establishing hierarchical dependencies between records in a database table. In MySQL terms, it means being able to say that “row 12 is the parent of row 14” and stuff like that. In business terms, it means establishing a dependency relationship between two similar entities. For example, keeping a record of all the people in a village and matching parents to their children.

So, whenever you hear that phrase, think about a real-world parent-child relationship. For the rest of this article, we are going to explore how to establish said relationships and possible use cases.

Prerequisites
  1. Basic knowledge of any SQL-based database.
  2. Basic knowledge of programming to help you use it in a real application
Why Should We Use Parent-Child Relationships

Let’s assume we are building a productivity tool like Slack and we want to ensure we follow database normalization principles and eliminate redundant data from our database.

Hypothetically, our database will at least have the following tables:

companies:
  id
  name
  subdomain
  user_id
  created_at

users:
  id
  company_id
  first_name
  last_name
  email
  password
  created_at

We have really simple tables now. Without thinking too much, you already know that a user will own a company, and multiple users can belong to a company. Right now, all is well and good here and we can proceed to build a great product… yay!!!

Issues will begin to arise when the user creates, say, 3 more companies (she has a salon, a restaurant and a cake shop). She needs those 3 company profiles to manage her different businesses and the different staff of those businesses. Now, she has to register 3 times, probably with the same credentials. Well, it’s just one person, so not much harm done, right?

id  company_id  first_name  last_name  email         password    created_at
1   10          Aba         Doe        me@email.com  ----------  2012
2   217         Abb         Doe        me@email.com  ----------  2015
3   219         Abc         Doe        me@email.com  ----------  2018

She then invites 10 staff each to these companies, which is not much if you think about it. But each of these 10 staff also belongs to at least one more company profile, either a local group, another side business or an open source project. Now, you have each of these staff registering twice with the same credentials. I’m sweating a little bit thinking about it, but maybe we do not have a crisis on our hands.

Now, scale this scenario up by 1000x and try to imagine what your database table looks like… Take it all in my friend… Take it all in.

Establishing Parent-Child Relationships

The first thing we need to do would be to add a parent_id column to the users table. That column will hold the id of the first record we created for the user. We can then afford to not ask the user to create an account every time they are invited to a company. They can join directly and maybe set their first name and last name for that company (you have to allow them to do that). Our users table can now look something like this:

id  company_id  first_name  last_name  email         password    parent_id  created_at
1   10          Aba         Doe        me@email.com  ----------  NULL       2012
2   217         Abb         Doe        NULL          NULL        1          2015
3   219         Abc         Doe        u@email.com   ----------  NULL       2016
4   411         Abd         Doe        NULL          NULL        2          2017
5   580         Abe         Doe        NULL          NULL        1          2018
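
A minimal sketch of the schema change that gets us there might look like the following (this assumes users.id is an integer primary key; the column position is just a choice, not something from the original table definition):

-- Hypothetical migration: each child record points at the user's original (parent) record.
ALTER TABLE users
  ADD COLUMN parent_id INT NULL AFTER password;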

This looks better than what we had above, yes? We have eliminated redundant data completely from our table. Querying the records for our table becomes easier. After users are logged in, you can run a query like this to fetch all the companies tied to the user:

SELECT * FROM `companies` WHERE `id` IN ( SELECT `company_id` FROM `users` WHERE `id` = 1 OR `parent_id` = 1 )

Then you can list out all the companies for the user to select, and you can pick the user’s profile based on the companies they selected. You can do a similar query for that selection this time around:

SELECT * FROM `users` WHERE (`company_id` = 217 AND `parent_id` = 1) OR (`company_id` = 217 AND `id` = 1);

We use two conditions because we are not absolutely sure if we are selecting a parent or a child.

A better option may be to select both the company and the user profile tied to it, so when the user clicks, you can just retrieve that record directly.

SELECT `companies`.`name`, `users`.`id`, `users`.`first_name`, `users`.`last_name`
FROM `companies`
LEFT JOIN `users` ON `users`.`company_id` = `companies`.`id`
WHERE `users`.`parent_id` = 1 OR `users`.`id` = 1

This will return the name of the company, id of the user and the name of the user. You can add this as a data-attribute to your HTML when you render the data and send it to your server when the user clicks on a company card, name or whatever you represent it with. You can also return more stuff depending on your own needs.

Do not forget to set limits where appropriate, as you do not want to overload your server with the task of fetching unnecessary results. Also, replace variables as they relate to your database tables.
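
Since every one of these lookups filters on parent_id, it is also worth making sure that column is indexed. A hedged sketch (the index name is illustrative, not from the original article):

-- Hypothetical index so parent/child lookups stay fast as the users table grows.
ALTER TABLE users ADD INDEX idx_users_parent_id (parent_id);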

Conclusion

We have explored a parent-child relationship at a very basic level. From the little table records we created, we can already see how this keeps our table a lot cleaner as we begin to add more users. Equally, we can already anticipate how much this will improve the experience of our users if all they have to do is just click a link and join a company.

Slack has a similar implementation for their authentication process. When next you use their company finder, understand that such a scenario could be implemented with a parent-child relationship.

Extending replication instrumentation: an insight on transaction retries

MySQL 8.0.13 improves replication lag monitoring by extending the instrumentation for transaction transient errors. These temporary errors, which include lock timeouts caused by client transactions executing concurrently as the slave is replicating, do not stop the applier thread: instead, they cause a transaction to retry.…

Slides and Workbooks From Oracle OpenWorld & CodeOne

First of all, thanks to everyone who attended my sessions at the recent Oracle OpenWorld and Code One in San Francisco. It was a great privilege to be allowed to make a number of presentations.

All of the workbooks and scripts from the hands-on labs (HOL) and the slides from the talks have been made available at OpenWorld’s and Code One’s session catalogues. You can download the files by searching the OpenWorld catalog for my name or the session number (see below). Click on the download icon for each of the presentations you are interested in.

Click on the download icon in the Oracle OpenWorld session catalog to download the presentations.

For the hands-on labs the downloadable file includes the workbook as well as the scripts related to the exercises. The workbook contains the instructions for setting up the system used for the exercises as well as the exercises themselves and some additional background information. For the talks, the download consists of a PDF version of the slides.

The three hands-on labs and three talks I had were:

  • DEV5957: Develop Python Applications with MySQL Connector/Python
    This covered MySQL Connector/Python in general from installation to best practices. The talk focused on using the connector with SQL tables using both SQL statements and NoSQL CRUD methods. If you are interested in how I installed MySQL Connector/Python on the iPad (the screenshot on the right-hand side of the slide showing the pip command), see my previous blog about installing MySQL Connector/Python in Pythonista 3.
  • DEV5959: Python and the MySQL Document Store
    This covered how to use MySQL Connector/Python (and a little of MySQL Shell in Python mode) with the MySQL JSON Document Store using the NoSQL API (the X DevAPI).
  • HOL1703: A Practical Introduction to the MySQL Document Store
    This hands-on lab used the MySQL Shell in Python mode to use the MySQL Document Store including an introduction to the CRUD methods. The lab also includes a comparison of implementing the same X DevAPI program using Python, Node.js, and Java.
  • HOL1706: Developing Modern Applications with the MySQL Document Store and NodeJS
    This lab is similar to HOL1703 except it mainly uses Node.js programs to use the MySQL Document Store.
  • HOL2986: Using MySQL Common Table Expressions and Window Functions
    An introduction to the new MySQL 8.0 query features: common table expressions (CTEs) and the window functions that can be used for analytic queries.
  • THT6703: NoSQL + SQL = MySQL
    A brief introduction to MySQL, MySQL 8, and how you can use it both as a NoSQL document store and a traditional SQL database.

Enjoy.

MySQL Shell 8.0.13 Prompt: Now with New Line Support

I have already blogged a couple of times about the MySQL Shell prompt. In the first blog, I wrote about how in general to configure it, and in the second blog, I showed how to install the necessary fonts to use the prompt with the Awesome and Powerline fonts.

In this blog, I will show a new feature of MySQL Shell 8.0.13 which adds support for having a line break in the prompt while still getting multi-line statements aligned correctly. I will first discuss why you may want to use the new feature, then go through the new templates that use this feature, and finally show my current favourite prompt.

Why Use a New Line in the Prompt?

You may ask why this feature is needed. The MySQL Shell prompt can show a lot of information. While this is very nice, it also makes the prompt rather long, and once you add your query/statement it requires a fair amount of real estate. Adding a new line in the prompt allows you to keep all the great information while still limiting the overall width of the prompt. The following figure shows a comparison of the single line prompt with a multi line prompt:

MySQL Shell 8.0.13: Comparing single and multi line prompts

Note that in the example, the multi line prompt includes the MySQL Server version which is not included in the single line prompt. It can of course be debated which of the two versions is preferable, but that is the beauty of MySQL Shell: you can choose the prompt that you prefer.

Now, let’s look at the new templates in MySQL Shell 8.0.13.

New Prompt Templates with Line Breaks

There are three new prompt templates in the MySQL Shell installation. The prompt templates are located in /usr/share/mysqlsh/prompt/ on Linux and C:\Program Files\MySQL\MySQL Shell 8.0\share\mysqlsh\prompt on Microsoft Windows. You may also want to review the blog Configuring the MySQL Shell Prompt for more about the prompt templates. The three new prompt templates are:

  • prompt_dbl_256.json: The prompt uses 256 indexed colours. This is a two line version of the default prompt both on Oracle Linux 7 and Microsoft Windows 10.
  • prompt_dbl_256pl.json: Similar to the prompt_dbl_256.json prompt but using the Powerline font.
  • prompt_dbl_256pl+aw.json: Additionally adding the Awesome font symbols.

If you compare the prompt_dbl_256.json template with the default prompt_256.json template, the difference is:

The difference between the prompt_256.json and prompt_dbl_256.json templates.

So a new line is created using the "break": true syntax. That’s it. The other change in the screenshot is just to make the second line of the prompt slightly indented.

This is great – if you want a new line before the end of the prompt, you can just use one of the new templates. However, personally I prefer to have my prompt slightly different.

My Favourite Prompt

My favourite prompt uses both the Awesome and Powerline fonts as well as adding a new line. I prefer to have the line break just before the language mode (the templates have the break after the mode). Additionally, I add the MySQL Server version (the value of the version system variable), and I use the Powerline separator on each new line.

The complete prompt definition is:

{
  "desc" : "256/24bit color terminal theme with MySQL prefix, default schema, host:port, ssl, + indicator for X protocol, MySQL version, new line, active mode. Requires Powerline patched font with awesome symbols (e.g. awesome-terminal-fonts)",
  "classes" : {
    "SQL" : { "fg" : "15", "bg" : "166" },
    "JS" : { "fg" : "0", "bg" : "221" },
    "Py" : { "fg" : "15", "bg" : "25" },
    "schema" : { "text": "\ue894 %schema%" },
    "noschema" : { "text": "" },
    "disconnected": { },
    "hostx" : { "text": "\ue895 %transport%+" },
    "hostc" : { "text": "\ue895 %transport%" },
    "SSLhostx" : { "text": "\ue895 %transport%+ \ue0a2" },
    "SSLhostc" : { "text": "\ue895 %transport% \ue0a2" },
    "production" : { "text": " PRODUCTION ", "bg": "red", "fg": "white" },
    "version": { "text": " %sysvar:version% ", "bg": "blue", "fg": 15 }
  },
  "variables" : {
    "is_production": {
      "match" : { "pattern": "*;%host%;*", "value": ";%env:PRODUCTION_SERVERS%;" },
      "if_true" : "production",
      "if_false" : ""
    },
    "transport" : {
      "match" : { "pattern": "%socket%", "value": "" },
      "if_true": "%host%:%port%",
      "if_false": "localhost"
    }
  },
  "symbols" : {
    "separator" : "\ue0b0",
    "separator2" : "\ue0b1",
    "ellipsis" : "\u2026"
  },
  "prompt" : {
    "text" : "\ue0b0 ",
    "cont_text" : " \ue0b0 ",
    "bg": "0"
  },
  "segments": [
    { "classes": ["disconnected%host%", "%is_production%"] },
    { "text": " My", "bg": 254, "fg": 23 },
    { "separator": "", "text": "SQL ", "bg": 254, "fg": 166 },
    { "classes": ["disconnected%host%", "%ssl%host%session%"], "shrink": "truncate_on_dot", "bg": 237, "fg": 15, "weight": 10, "padding" : 1 },
    { "classes": ["disconnected%host%", "version"] },
    { "classes": ["noschema%schema%", "schema"], "bg": 242, "fg": 15, "shrink": "ellipsize", "weight": -1, "padding" : 1 },
    { "text": " ", "bg": 0 },
    { "break": true },
    { "classes": ["%Mode%"], "text": "%Mode%", "padding" : 1 }
  ]
}

Examples of the resulting prompt are:

Example of the new prompt feature in MySQL Shell 8.0.13 with multi line queries.

Enjoy.

Percona Server for MySQL 8.0 Delivers Increased Reliability, Performance and Security

Percona released a Release Candidate (RC) version of Percona Server for MySQL 8.0, the company’s free, enhanced, drop-in replacement for MySQL Community Edition. Percona Server for MySQL 8.0 includes all the features of MySQL Community Edition 8.0, along with enterprise-class features from Percona that make it ideal for enterprise production environments. The latest release offers increased reliability, performance and security.

Percona Server for MySQL 8.0 General Availability (GA) will be available later this year. You can learn how to install the release candidate software here. Please note that this release candidate is not intended nor recommended for production environments.

MySQL databases remain a pillar of enterprise data centers. But as the amount of data collected continues to soar, and as the types of databases deployed for specific applications continue to expand, organizations require a powerful and cost-effective MySQL solution for their production environments. Percona meets this need with its mature, proven open source alternative to MySQL Community Edition. Percona also backs MySQL 8.0 with the support and consulting services enterprises need to achieve optimal performance and maximize the value they obtain from the software – all with lower cost and complexity.

With more than 4,550,000 downloads, Percona Server for MySQL offers self-tuning algorithms and support for extremely high-performance hardware, delivering excellent performance and reliability for production environments. Percona Server for MySQL is trusted by thousands of enterprises to provide better performance. The following capabilities are unique to Percona Server for MySQL 8.0:

  • Greater scalability and availability, enhanced backups, and increased visibility to improve performance, reliability and usability
  • Parallel doublewrite functionality for greatly improved write performance, resulting in increased speed
  • Additional write-optimized storage engine, MyRocks, which takes advantage of modern hardware and database technology to reduce storage requirements and maintenance costs, and to increase ROI in both on-premises and cloud-based applications, delivered with a MySQL-compatible interface.
  • Enhanced encryption functionality – including integration with Hashicorp Vault to simplify the management of encryption keys – for increased security
  • Advanced PAM-based authentication, audit logging, and threadpool scalability – enterprise-grade features available in Percona Server for MySQL without a commercial license

Percona Server for MySQL 8.0 also contains all the new features introduced in MySQL Community Edition 8.0, including:

  • Greater Reliability – A new transactional data dictionary makes recovery from failure easier, providing users with a higher level of comfort that data will not be lost. The transactional data dictionary is now crash-safe and centralized. In addition, support for atomic Data Definition Language (DDL) statements ensures that all operations associated with a DDL transaction are committed or rejected as a unit.
  • Enhanced Performance – New functions and expressions (along with Percona’s parallel doublewrite buffer) improve overall performance, allowing users to see their data more quickly. Enhanced JSON functionality improves document storage and query capabilities, and the addition of Window functions provides greater flexibility for aggregation queries. Common Table Expressions (CTE) enable improved query syntax for complex queries.
  • Increased Security – The ability to collect a typical series of permission grant statements for a user into a defined role and then apply that role to a user in MySQL makes the database environment more secure from outside attack by allowing administrators to better manage access permissions for authorized users. SQL Roles also enable administrators to more easily and efficiently determine a set of privileges for multiple associated users, saving time and reducing errors.
  • Expanded Queries – Percona Server for MySQL 8.0 provides support for spatial data types and indexes, as well as for Spatial Reference System (SRS), the industry-standard method for geospatial lookup.

Learn more about the Percona Server for MySQL 8.0 RC release here.

Percona XtraBackup 8.0-3-rc1 Is Available

Percona is glad to announce the release candidate of Percona XtraBackup 8.0-3-rc1 on October 31, 2018. You can download it from our download site and from our apt and yum repositories.

This is a Release Candidate quality release and it is not intended for production. If you want a high quality, Generally Available release, use the current stable version (the most recent stable version at the time of writing is 2.4.12 in the 2.4 series).

This release supports backing up and restoring MySQL 8.0 and Percona Server for MySQL 8.0.

Things to Note
  • innobackupex was previously deprecated and has been removed
  • Due to the new MySQL redo log and data dictionary formats the Percona XtraBackup 8.0.x versions will only be compatible with MySQL 8.0.x and the upcoming Percona Server for MySQL 8.0.x
  • For experimental migrations from earlier database server versions, you will need to back up and restore using XtraBackup 2.4 and then use mysql_upgrade from MySQL 8.0.x
Installation

As this is a release candidate, installation is performed by enabling the testing repository and installing the software via your package manager. For Debian based distributions see apt installation instructions, for RPM based distributions see yum installation instructions. Note that in both cases after installing the current percona-release package, you’ll need to enable the testing repository in order to install Percona XtraBackup 8.0.3-rc1.

Improvements
  • PXB-1655:  The --lock-ddl option is supported when backing up MySQL 8
Bugs Fixed
  • PXB-1678:  Incremental backup prepare run with the --apply-log-only option could roll back uncommitted transactions.
  • PXB-1672:  The MTS slave without GTID could be backed up when the --safe-slave-backup option was applied.

Release Candidate for Percona Server 8.0.12-2rc1 Is Available

Following the alpha release announced earlier, Percona announces the release candidate of Percona Server for MySQL 8.0.12-2rc1 on October 31, 2018. Download the latest version from the Percona website or from the Percona Software Repositories.

This release is based on MySQL 8.0.12 and includes all the bug fixes in it. It is a Release Candidate quality release and it is not intended for production. If you want a high quality, Generally Available release, use the current Stable version (the most recent stable release at the time of writing in the 5.7 series is 5.7.23-23).

Percona provides completely open-source and free software.

Installation

As this is a release candidate, installation is performed by enabling the testing repository and installing the software via your package manager.  For Debian based distributions see apt installation instructions, for RPM based distributions see yum installation instructions.  Note that in both cases after installing the current percona-release package, you’ll need to enable the testing repository in order to install Percona Server for MySQL 8.0.12-2rc1.  For manual installations you can download from the testing repository directly through our website.

New Features
  • #4550: Native Partitioning support for MyRocks storage engine
  • #3911: Native Partitioning support for TokuDB storage engine
  • #4946: Add an option to prevent implicit creation of column family in MyRocks
  • #4839: Better default configuration for MyRocks and TokuDB
  • InnoDB changed page tracking has been rewritten to account for redo logging changes in MySQL 8.0.11.  This fixes fast incremental backups for PS 8.0
  • #4434: TokuDB ROW_FORMAT clause has been removed, compression may be set by using the session variable tokudb_row_format instead.
Improvements
  • Several packaging changes to bring Percona packages more in line with upstream, including split repositories. As you’ll note from our instructions above we now ship a tool with our release packages to help manage this.
Bugs Fixed
  • #4785: Setting version_suffix to NULL could lead to handle_fatal_signal (sig=11) in Sys_var_version::global_value_ptr
  • #4788: Setting log_slow_verbosity and enabling the slow_query_log could lead to a server crash
  • #4947: Any index comment generated a new column family in MyRocks
  • #1107: Binlog could be corrupted when tmpdir got full
  • #1549: Server side prepared statements lead to a potential off-by-second timestamp on slaves
  • #4937: rocksdb_update_cf_options was useless when specified in my.cnf or on command line.
  • #4705: The server could crash on snapshot size check in RocksDB
  • #4791: SQL injection on slave due to non-quoting in binlogged ROLLBACK TO SAVEPOINT
  • #4953: rocksdb.truncate_table3 was unstable
Other bugs fixed:
  • #4811: 5.7 Merge and fixup for old DB-937 introduces possible regression
  • #4885: Using ALTER … ROW_FORMAT=TOKUDB_QUICKLZ leads to InnoDB: Assertion failure: ha_innodb.cc:12198:m_form->s->row_type == m_create_info->row_type
  • Numerous testsuite failures/crashes
Upcoming Features

MySQL Replication Recovery from BINLOG

MySQL Replication Recovery from BINLOG on MASTER

The recorded video on YouTube
https://youtu.be/STk0GThsRjc


Reference
https://lefred.be/content/howto-make-mysql-point-in-time-recovery-faster/

Background
In MySQL replication, the basic mechanism is that changes are written to the binary log on the MASTER, transported to the relay log on the SLAVE, and applied to the database on the SLAVE.

The binary log and the relay log used in replication have, to a certain extent, the same structure.

There are a few scenarios where we may consider using the binary log to recover data.
1. To speed up replication on start up (https://lefred.be/content/howto-make-mysql-point-in-time-recovery-faster/)
2. With asynchronous replication in a DR setup, if the MASTER's binlog files are available on the DR site (using storage replication or any other means), recovering the data from the binlog brings the slave in sync with the MASTER up to the last second. By using the MASTER's binlog, the SLAVE is able to recover all data from the MASTER without data loss.

Assumption
1. MASTER and SLAVE in GTID replication setup
2. Replication is done with CHANNEL - channel name is 'channel1'
    mysql> CHANGE MASTER to ..... for channel 'channel1';

3. The recovery channel is named as 'channel2'
4.  The binary log prefix is mysqllog (log-bin=mysqllog).
     With this binlog prefix, the binlog filenames in the datadir will be mysqllog.<number> and mysqllog.index
5.  The relay log prefix is relay (relay-log=relay).
     With this relay log prefix, the relay log filenames in the datadir will be relay-<channelname>.<number> and relay-<channelname>.index

Steps Description
1. Stop the slave on channel1 (stop slave for channel 'channel1')
2. Add more records on MASTER.  Those records will not be replicated to SLAVE but only on MASTER (also they are written to Binary Log)

Recovery Steps on Slave
3. Copy the binary log files from the MASTER (mysqllog.*) to somewhere on the SLAVE (e.g. /tmp/binlogs)
4. Rename the copies so they are named like relay logs for channel2, and create the index file:
cd /tmp/binlogs
for i in $(ls *.0*); do
  ext=$(echo $i | cut -d'.' -f2)
  cp $i relay-channel2.$ext
done
ls -1 relay-channel2.0* > relay-channel2.index

 
5. Copy the 'channel2' relay log files to Slave's datadir

6. Create the channel2 channel on the SLAVE and start the slave for channel 'channel2'.
On the SLAVE server:
mysql> CHANGE MASTER TO RELAY_LOG_FILE='relay-channel2.000001',
RELAY_LOG_POS=1, MASTER_HOST='dummy' FOR CHANNEL 'channel2';
mysql> start slave for channel 'channel2';
7. After recovery, stop the channel 'channel2' and clean up.
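
As a hedged sketch of that clean-up (these are standard MySQL multi-source replication statements; adjust to your setup):

-- Stop and remove the temporary recovery channel once its relay logs have been applied.
STOP SLAVE FOR CHANNEL 'channel2';
RESET SLAVE ALL FOR CHANNEL 'channel2';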

8. Resume channel 'channel1'; the slave is back in normal operation.

Fedora 29 Is Here and MySQL Is Ready

Fedora 29 was released earlier today, and we congratulate the Fedora community on the latest iteration of many people’s favourite Linux distro. Following our tradition of rapidly adding support for new Linux distro versions, we have added the following MySQL products to our official MySQL yum repos: MySQL Server (8.0.13 and 5.7.24) Connector C++ 8.0.13 Connector […]

Facebook open-sources new suite of Linux kernel components and tools

An integral part of Facebook’s engineering culture has always been our development work on open source solutions that solve real-world production issues and address key challenges in modern large-scale cloud computing. Today, we are announcing a suite of open source Linux kernel components and related tools that address critical fleet management issues. These include resource control, resource utilization, workload isolation, load balancing, measuring, monitoring, and much more.

Kernel and kernel application developers at Facebook partner with various internal teams to develop technologies that resolve issues and concerns in Facebook’s data centers — the same challenges that many throughout the industry share. The following products are now in production on a massive scale throughout all of Facebook’s infrastructure, as well as at many other organizations.

BPF

BPF is a highly flexible, efficient code execution engine in the Linux kernel that allows bytecode to run at various hook points, enabling safe and easy modifications of kernel behaviors with custom code. Although it’s been widely used for packet filtering, BPF’s instruction set is generic and flexible enough to support and allow for a wide variety of use cases beyond networking such as tracing and security (e.g., sandboxing).

At Facebook, ensuring fast, reliable access for our users is a top priority. To achieve this goal, our infrastructure engineers have developed traffic optimization systems in which BPF plays a prominent role. One such case is Katran, a software-based load-balancing solution with a completely reengineered forwarding plane that takes advantage of the BPF virtual machine. The Katran forwarding plane software library powers the network load balancer used in Facebook’s infrastructure and has helped improve the performance and scalability of network load balancing while drastically reducing inefficiencies.

Btrfs

Btrfs is a next-generation file system built with today’s data centers in mind. It is a copy-on-write (CoW) filesystem focused on advanced feature implementation, fault tolerance, repair, and easy administration. Btrfs is designed to address and manage large storage subsystems and supports features such as snapshots, online defragmentation, pooling, and integrated multiple device support.

Btrfs has played a role in increasing efficiency and resource utilization in Facebook’s data centers in a number of different applications. Recently, Btrfs helped eliminate priority inversions caused by the journaling behavior of the previous filesystem, when used for I/O control with cgroup2 (described below). Btrfs is the only filesystem implementation that currently works with resource isolation, and it’s now deployed on millions of servers, driving significant efficiency gains.

Netconsd

Netconsd is a UDP-based netconsole daemon that provides lightweight transport for Linux netconsole messages. It receives and processes log data from the Linux kernel and serves it up as structured data in a way that helps production engineers rapidly identify issues in the fleet.

At Facebook, netconsd provides vital data center statistics. It logs data continuously from millions of hosts and allows engineers to extract meaningful signals from the voluminous log data noise generated in the kernel, helping on-call production engineers rapidly identify and diagnose misbehaving services.

Cgroup2

Cgroup2 is the next-gen Linux kernel mechanism for grouping and structuring workloads, and controlling the amount of system resources assigned to each group. It has controllers for memory, I/O, central processing unit, and more. It also allows you to isolate workloads and prioritize and configure the distribution of resources for each one.

Resource control using cgroup2 is driving multi-tenancy improvements in Facebook’s data centers through better handling of memory overcommit and strategies such as load shedding when memory becomes scarce. It is improving resource utilization in Facebook’s fleet by isolating and protecting a system’s main workload from widely distributed system binaries and other system services that run on Facebook hosts.

The resources reserved for these system binaries were nicknamed the fbtax, which later became the name of the project to fix priority inversions and other resource distribution issues in Facebook’s fleet. The fbtax project demonstrated the possibility of comprehensive resource isolation while using operating system features such as memory management and file systems, opening the door to increased fleet efficiency by making workload stacking straightforward and safe. Check out the fbtax2 case study in the cgroup2 documentation for details.

PSI

Pressure Stall Information (PSI) provides for the first time a canonical way to quantify resource shortages with new pressure metrics for three major resources: memory, CPU, and I/O. These pressure metrics, in conjunction with other kernel and userspace tools that are part of this open source release, allow detection of resource shortages while they’re developing and responding intelligently. PSI stats provide early warning of impending resource shortages, enabling more proactive, granular, and nuanced responses.

We use PSI at Facebook in conjunction with cgroup2 to provide per-cgroup insight into resource use (and misuse) of a wide variety of different workloads, enabling increased utilization and reliability in situations in which resources are insufficient.

Visual showing how the various components and tools work together.

Oomd

Oomd is a userspace OOM (out-of-memory) process killer that acts with greater awareness of the context and priority of processes running on a system. It allows you to configure responses to OOM conditions, such as pausing or killing nonessentials, reallocating memory in the system, or other actions.

OOM killing traditionally happens inside the kernel. If a system runs out of physical memory, the Linux kernel is forced to OOM-kill one or more processes. This action is typically slow and painful because the kernel triggers only when the kernel itself can’t make forward progress: It lacks any way of knowing an application’s health, often resulting in thrashing behavior, in which the kernel thinks conditions are OK, but applications are suffering. On top of this behavior, configuring policy is complicated and inflexible. Oomd solves these problems in userspace by taking corrective action before an OOM occurs in kernel. A flexible plugin system that supports custom detection logic configures these actions. Oomd allows you to write custom protection rules for each workload.

In Facebook’s data centers, Oomd, in conjunction with PSI metrics and cgroup2, is increasing reliability and efficiency, driving large-capacity gains and significant increases in resource utilization.

Limitless possibilities

The use cases described here are just the beginning. The kernel components and tools included in this release can be adapted to solve a virtually limitless number of production problems. Facebook continues to develop and deploy solutions using these tools. We believe that this collection of services and products will be helpful for any developer building apps to serve millions of users on multiple platforms.

We are excited to release these solutions to the open source community and hope they will empower others to address similar production issues and develop innovative uses.

The post Facebook open-sources new suite of Linux kernel components and tools appeared first on Facebook Code.

20+ MongoDB Alternatives You Should Know About

As MongoDB® has changed its license from AGPL to SSPL, many are concerned by this change, and by how sudden it has been. Will SSPL be protective enough for MongoDB, or will the next change be to an altogether proprietary license? According to our poll, many are going to explore MongoDB alternatives. This blog post provides a brief outline of technologies to consider.

Open Source Data Stores
  • PostgreSQL is the darling of the open source database community. Especially if your concern is the license,  PostgreSQL’s permissive licence is hard to beat. PostgreSQL has powerful JSON Support, and there are many successful stories of migrating from MongoDB to PostgreSQL
  • Citus – While PostgreSQL is a powerful database, and you can store terabytes of data on a single cluster, at a larger scale you will need sharding. If so, consider the Citus PostgreSQL extension, or the DBaaS offering from the same guys.
  • TimescaleDB – If on the other hand you are storing time series data in MongoDB, then TimescaleDB might be a good fit.
  • ToroDB – If you would love to use PostgreSQL but need MongoDB wire protocol compatibility, take a look at ToroDB. While it can’t serve as a full drop-in replacement for MongoDB server just yet, the developer told me that with some work it is possible.
  • CockroachDB – While not based on the PostgreSQL codebase, CockroachDB is PostgreSQL wire protocol compatible and it is natively distributed, so you will not need to do manual sharding.
  • MySQL® is another feasible replacement. MySQL 5.7 and MySQL 8 have great support for JSON, and it continues to get better with every maintenance release (a short example follows this list). You can also consider MySQL Cluster for medium-sized sharded environments, as well as MariaDB and Percona Server for MySQL.
  • MySQL DocStore is a CRUD interface for JSON data stored in MySQL, and while it is not the same as MongoDB’s query language, it is much easier to transition to than SQL if you are coming from MongoDB.
  • Vitess Would you love to use MySQL but can’t stand manual sharding? Vitess is a powerful sharding engine for MySQL which will allow you to grow to great scale while using proven MySQL as a backend.
  • TiDB is another take on MySQL compatible sharding. This NewSQL engine is MySQL wire protocol compatible but underneath is a distributed database designed from the ground up.
  • CouchDB is a document database which speaks JSON natively.
  • CouchBase is another database engine to consider. While being a document-based database, CouchBase offers the N1QL language, which has an SQL look and feel.
  • ArangoDB is a multi-model database which can be used as a document store.
  • Elastic While not a perfect choice for every MongoDB workload, for workloads where document data is searched and analyzed ElasticSearch can be a great alternative.
  • Redis is another contender for some MongoDB workloads. Often used as a cache in front of MongoDB, it can also be used as a JSON store through extensions. While such extensions from RedisLabs are no longer open source, the GoodForm project provides open source alternatives.
  • ClickHouse may be a great contender for moving analytical workloads from MongoDB. Much faster, and with JSON support and Nested Data Structures, it can be a great choice for storing and analyzing document data.
  • Cassandra does not have a document data model, but it has proven to be extremely successful for building scalable distributed clusters. If this is your main use case for MongoDB, then you should consider Cassandra.
  • ScyllaDB is a protocol compatible Cassandra alternative which claims to offer much higher per node performance.
  • HBase is another option worth considering, especially if you already have a Hadoop/HDFS infrastructure.
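To make the MySQL option above a bit more concrete, here is a minimal sketch of document-style storage using the JSON data type available in MySQL 5.7 and later (the events table and its data are invented purely for illustration):

CREATE TABLE events (id INT AUTO_INCREMENT PRIMARY KEY, doc JSON);
INSERT INTO events (doc) VALUES ('{"type": "click", "user": {"id": 42, "country": "DE"}}');
-- ->> extracts a value from the JSON document and unquotes it
SELECT doc->>'$.user.country' AS country FROM events WHERE doc->>'$.type' = 'click';

If a field is queried often, it can also be materialized as a generated column, for example CAST(doc->>'$.user.id' AS UNSIGNED), and indexed like any other column.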
Public Cloud Document Stores

Most major cloud providers offer some variant of a native document database for you to consider.

  • Microsoft Azure Cosmos DB is an interesting engine that provides multiple NoSQL APIs, including for MongoDB and Cassandra.
  • Amazon DynamoDB supports key value and document based APIs. While not offering MongoDB compatibility, DynamoDB has been around for a long time, and is the most battle tested of the public cloud database offerings.
  • Google Cloud DataStore – Google Cloud offers a number of data storage options for you to consider, and Cloud DataStore offers the data model and query language most similar to MongoDB’s.

If you’re not ready for a major migration effort, there is one more solution for you – Percona Server for MongoDB.  Based on MongoDB Community, and enhanced by Percona with Enterprise Features, Percona Server for MongoDB offers 100% compatibility. As we wrote in a previous post, we commit to shipping a supported AGPL version until the situation around SSPL is clearly resolved.

Want help on deciding what is the best option for you, or with migration heavy lifting? Percona Professional Services can help!

Have an idea for another feasible MongoDB alternative? Please comment, and I will consider adding it to the list!


Image by JOSHUA COLEMAN on Unsplash

SQL Firewalling Made Easy with ClusterControl & ProxySQL

Reading the title of this blog post may raise some questions. SQL firewall - what is that? What does it do? Why would I need something like that in the first place? Well, the ability to block certain queries could come in handy in certain situations. When using ProxySQL in front of your database servers, the proxy is able to inspect all SQL statements being sent. ProxySQL has a sophisticated rules engine, and can match queries that are to be allowed, blocked, re-written on the fly or routed to a specific database server. Let’s go through some examples.

You have a dedicated slave which is used by developers to test their queries against production data. You want to make sure the developers can only connect to that particular host and execute only SELECT queries.

In another case, let’s say that you have encountered one too many accidents with people running schema changes, and you would like to limit which users can execute ALTER statements.

Finally, let’s think about a paranoid approach in which users are allowed to execute just a pre-defined whitelisted set of queries.

In our environment we have a replication setup with the master and two slaves.

In front of our databases, we have three ProxySQL nodes with Keepalived managing a Virtual IP. We also have a ProxySQL cluster configured (as explained in this previous blog), so we don’t have to worry about making configuration or query rule changes three times on all three ProxySQL nodes. For the query rules, a simple read-write split is set up:

Let’s take a look at how ProxySQL, with its extensive query rules mechanism, can help us to achieve our goals in all those three cases.

Locking user access to a single hostgroup

A dedicated slave available to developers - this is not an uncommon practice. As long as your developers can access production data (and if they are not allowed to, e.g., due to compliance reasons, data masking as explained in our ProxySQL tutorial may help), this can help them to test and optimize queries on a real-world data set. It may also help to verify data before executing some of the schema changes. For example, is my column really unique before adding a unique index?

With ProxySQL it is fairly easy to restrict access. For starters, let’s assume that the hostgroup 30 contains the slave we want developers to access.

We need a user which will be used by the developers to access that slave. If you have it already in ProxySQL, that’s fine. If not, you may either need to import it into ProxySQL (if it is created in MySQL but not in ProxySQL) or create it in both locations (if you’ll be creating a new user). Let’s go with the latter option and create a new user.

Let’s create a new user with limited privileges on both MySQL and ProxySQL. We will use it in query rules to identify traffic coming from the developers.
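ClusterControl handles this from its UI; purely for illustration, the equivalent statements might look roughly like this (the user name and password are just examples, matching the dev_test user referenced in the rule below):

-- on the MySQL master (replicates to the slaves)
CREATE USER 'dev_test'@'%' IDENTIFIED BY 'devpassw0rd';
GRANT SELECT ON *.* TO 'dev_test'@'%';

-- on the ProxySQL admin interface (port 6032)
INSERT INTO mysql_users (username, password, default_hostgroup, active)
VALUES ('dev_test', 'devpassw0rd', 30, 1);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;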

In this query rule we are going to redirect all of the queries which are executed by the dev_test user to hostgroup 30. We want this rule to be active, and it should be the final one to be parsed, therefore we enabled ‘Apply’. We also configured the RuleID to be smaller than the ID of the first existing rule, as we want this rule to be evaluated before the regular read/write split rules.
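ClusterControl builds this rule from the UI; expressed directly against the ProxySQL admin interface, it would look roughly like the following (the rule ID of 1 is only an example and simply has to be lower than the IDs of the existing read/write split rules):

INSERT INTO mysql_query_rules (rule_id, active, username, destination_hostgroup, apply)
VALUES (1, 1, 'dev_test', 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;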

As you can see, we used a username, but there are also other options.

If you can predict which development hosts will send the traffic to the database (for example, you can have developers use a specific proxy before they can reach the database), you can also use the “Client Address” option to match queries executed by that single host and redirect them to a correct hostgroup.

Disallowing user from executing certain queries

Now, let’s consider the case where we want to limit execution of some particular commands to a given user. This can be handy to ensure that only the right people can run performance-impacting queries like schema changes. ALTER is the statement we will use in this example. For starters, let’s add a new user which will be allowed to run schema changes. We will call it ‘admin_user’. Next, we need to create the required query rules.

We will create a query rule which uses the ‘.*ALTER TABLE.*’ regular expression to match the queries. This query rule should be executed before the other, read/write split rules. We assigned a rule ID of 20 to it. We also define an error message that will be returned to the client if this query rule is triggered. Once done, we proceed to another query rule.

Here we use the same regular expression to catch the query, but we don’t define any error text (which means the query will not return an error). We also define which user is allowed to execute it (admin_user in our case). We make sure this rule is checked before the previous one, so we assigned a lower rule ID of 19 to it.
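For reference, a hand-written equivalent of these two rules on the ProxySQL admin interface might look roughly like this (rule IDs, pattern, and error text as described above):

-- rule 19: admin_user may run ALTER TABLE, no error returned
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, apply)
VALUES (19, 1, 'admin_user', '.*ALTER TABLE.*', 1);
-- rule 20: everyone else gets an error instead
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (20, 1, '.*ALTER TABLE.*', 'You are not allowed to execute ALTER', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;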

Once these two query rules are in place, we can test how they work. Let’s try to log in as an application user and run an ALTER TABLE query:

root@vagrant:~# mysql -P6033 -usbtest -ppass -h10.0.0.111
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 43160
Server version: 5.5.30 (ProxySQL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use sbtest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> alter table sbtest1 add index (pad);
ERROR 1148 (42000): You are not allowed to execute ALTER
mysql> ^DBye

As expected, we couldn’t execute this query and we received an error message. Let’s now try to connect using our ‘admin_user’:

root@vagrant:~# mysql -P6033 -uadmin_user -ppass -h10.0.0.111
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 43180
Server version: 5.5.30 (ProxySQL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use sbtest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> alter table sbtest1 add index (pad);
Query OK, 0 rows affected (0.99 sec)
Records: 0  Duplicates: 0  Warnings: 0

We managed to execute the ALTER as we logged in using ‘admin_user’. This is a very simple way of ensuring that only appointed people can run schema changes on your databases.

Creating a whitelist of allowed queries

Finally, let’s consider a tightly locked-down environment where only predefined queries can be executed. ProxySQL can easily be utilized to implement such a setup.

First of all, we need to remove all existing query rules before we can implement what we need. Then, we need to create a catch-all query rule, which will block all the queries:


What remains is to create query rules for all of the queries which are allowed. You can create one rule per query, or use regular expressions if, for example, SELECTs are always OK to run. The only things you have to remember are that each whitelist rule’s ID has to be smaller than the ID of the catch-all rule, and that every allowed query eventually hits a rule with ‘Apply’ enabled. A sketch of such rules follows.
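As a rough sketch of such a setup on the ProxySQL admin interface (the rule IDs, the sbtest user, and the patterns are only examples), low-ID rules whitelist what is allowed and a high-ID catch-all rejects everything else:

-- whitelist: plain SELECTs from the application user are allowed
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, destination_hostgroup, apply)
VALUES (50, 1, 'sbtest', '^SELECT .*', 20, 1);
-- catch-all: anything not matched above is rejected with an error
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (100, 1, '.*', 'Query not in the whitelist', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;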

We hope that this blog post gave you some insight into how you can utilize ClusterControl and ProxySQL to improve security and ensure compliance of your databases.

Tags:  MySQL MariaDB database firewall proxysql

MySQL & MySQL Community team at conferences this week

As announced on October 18, we will be part of the following shows this week. Please come find MySQL staff there and talk to us about MySQL.

  • Madison PHP, Madison, US, November 2-3, 2018: We are a Community sponsor.
  • MOPCON 2018, Taipei, Taiwan, November 3-4, 2018: Do not forget to attend our sponsored MySQL talk given by Ivan Tu, the MySQL Principal Consultant Manager. Ivan will be talking about "The Mobile application supported by new generation MySQL 8.0" (scheduled for Nov 4, 11:05-11:45am, in the Big Data track). Ivan will be available before and after the talk to answer your questions about MySQL. Please do not miss Ivan's talk and the opportunity to discuss news about MySQL 8.0.

 

    On the consequences of sync_binlog != 1 (part #1)

    A well-known performance booster in MySQL is to set sync_binlog to 0.  However, this configuration alone comes with serious consequences on consistency and on durability (the C and D of ACID); I explore those in this series.  In this post, I give some background on the sync_binlog parameter and I explain part of the problem with setting it to 0 (or to a value different from 1).  The other

    MySQL on Docker: Running ProxySQL as Kubernetes Service

    When running distributed database clusters, it is quite common to front them with load balancers. The advantages are clear - load balancing, connection failover and decoupling of the application tier from the underlying database topologies. For more intelligent load balancing, a database-aware proxy like ProxySQL or MaxScale would be the way to go. In our previous blog, we showed you how to run ProxySQL as a helper container in Kubernetes. In this blog post, we’ll show you how to deploy ProxySQL as a Kubernetes service. We’ll use Wordpress as an example application and the database backend is running on a two-node MySQL Replication deployed using ClusterControl. The following diagram illustrates our infrastructure:

    Since we are going to deploy a setup similar to the one in that previous blog post, expect some duplication in parts of this post; we have kept it in so this post remains readable on its own.

    ProxySQL on Kubernetes

    Let’s start with a bit of recap. Designing a ProxySQL architecture is a subjective topic and highly dependent on the placement of the application, database containers as well as the role of ProxySQL itself. Ideally, we can configure ProxySQL to be managed by Kubernetes with two configurations:

    1. ProxySQL as a Kubernetes service (centralized deployment)
    2. ProxySQL as a helper container in a pod (distributed deployment)

    Both deployments can be distinguished easily by looking at the following diagram:

    This blog post will cover the first configuration - running ProxySQL as a Kubernetes service. The second configuration is already covered here. In contrast to the helper container approach, running as a service lets the ProxySQL pods live independently from the applications, and they can be easily scaled and clustered together with the help of Kubernetes ConfigMap. This is definitely a different clustering approach from ProxySQL's native clustering support, which relies on configuration checksums across ProxySQL instances (a.k.a. proxysql_servers). Check out this blog post if you want to learn about ProxySQL clustering made easy with ClusterControl.

    In Kubernetes, ProxySQL's multi-layer configuration system makes pod clustering possible with ConfigMap. However, there are a number of shortcomings and required workarounds to make it work as smoothly as ProxySQL's native clustering feature does. At the moment, signalling pods upon a ConfigMap update is a feature in the works. We will cover this topic in much greater detail in an upcoming blog post.

    Basically, we need to create ProxySQL pods and attach a Kubernetes service to be accessed by the other pods within the Kubernetes network or externally. Applications will then connect to the ProxySQL service via TCP/IP networking on the configured ports, which default to 6033 for MySQL load-balanced connections and 6032 for the ProxySQL administration console. With more than one replica, the connections to the pods will be load balanced automatically by the kube-proxy component running on every Kubernetes node.

    ProxySQL as Kubernetes Service

    In this setup, we run both ProxySQL and Wordpress as pods and services. The following diagram illustrates our high-level architecture:

    In this setup, we will deploy two pods and services - "wordpress" and "proxysql". We will merge Deployment and Service declaration in one YAML file per application and manage them as one unit. To keep the application containers' content persistent across multiple nodes, we have to use a clustered or remote file system, which in this case is NFS.

    Deploying ProxySQL as a service brings a couple of good things over the helper container approach:

    • Using the Kubernetes ConfigMap approach, ProxySQL can be clustered with an immutable configuration.
    • Kubernetes handles ProxySQL recovery and balances the connections to the instances automatically.
    • Single endpoint with Kubernetes Virtual IP address implementation called ClusterIP.
    • Centralized reverse proxy tier with shared nothing architecture.
    • Can be used with external applications outside Kubernetes.

    We will start the deployment as two replicas for ProxySQL and three for Wordpress to demonstrate running at scale and load-balancing capabilities that Kubernetes offers.

    Preparing the Database

    Create the wordpress database and user on the master and assign the correct privileges:

    mysql-master> CREATE DATABASE wordpress;
    mysql-master> CREATE USER wordpress@'%' IDENTIFIED BY 'passw0rd';
    mysql-master> GRANT ALL PRIVILEGES ON wordpress.* TO wordpress@'%';

    Also, create the ProxySQL monitoring user:

    mysql-master> CREATE USER proxysql@'%' IDENTIFIED BY 'proxysqlpassw0rd';

    Then, reload the grant table:

    mysql-master> FLUSH PRIVILEGES;

    ProxySQL Pod and Service Definition

    The next step is to prepare our ProxySQL deployment. Create a file called proxysql-rs-svc.yml and add the following lines:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: proxysql
      labels:
        app: proxysql
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: proxysql
          tier: frontend
      strategy:
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: proxysql
            tier: frontend
        spec:
          restartPolicy: Always
          containers:
          - image: severalnines/proxysql:1.4.12
            name: proxysql
            volumeMounts:
            - name: proxysql-config
              mountPath: /etc/proxysql.cnf
              subPath: proxysql.cnf
            ports:
            - containerPort: 6033
              name: proxysql-mysql
            - containerPort: 6032
              name: proxysql-admin
          volumes:
          - name: proxysql-config
            configMap:
              name: proxysql-configmap
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: proxysql
      labels:
        app: proxysql
        tier: frontend
    spec:
      type: NodePort
      ports:
      - nodePort: 30033
        port: 6033
        name: proxysql-mysql
      - nodePort: 30032
        port: 6032
        name: proxysql-admin
      selector:
        app: proxysql
        tier: frontend

    Let's see what those definitions are all about. The YAML consists of two resources combined in one file, separated by the "---" delimiter. The first resource is the Deployment, for which we define the following specification:

    spec:
      replicas: 2
      selector:
        matchLabels:
          app: proxysql
          tier: frontend
      strategy:
        type: RollingUpdate

    The above means we would like to deploy two ProxySQL pods as a ReplicaSet that matches containers labelled with "app=proxysql,tier=frontend". The deployment strategy specifies the strategy used to replace old pods by new ones. In this deployment, we picked RollingUpdate which means the pods will be updated in a rolling update fashion, one pod at a time.

    The next part is the container's template:

    - image: severalnines/proxysql:1.4.12
      name: proxysql
      volumeMounts:
      - name: proxysql-config
        mountPath: /etc/proxysql.cnf
        subPath: proxysql.cnf
      ports:
      - containerPort: 6033
        name: proxysql-mysql
      - containerPort: 6032
        name: proxysql-admin
    volumes:
    - name: proxysql-config
      configMap:
        name: proxysql-configmap

    In the spec.template.spec.containers.* section, we are telling Kubernetes to deploy ProxySQL using the severalnines/proxysql image, version 1.4.12. We also want Kubernetes to mount our custom, pre-configured configuration file and map it to /etc/proxysql.cnf inside the container. The running pods will publish two ports - 6033 and 6032. We also define the "volumes" section, where we instruct Kubernetes to mount the ConfigMap as a volume inside the ProxySQL pods, to be mounted by volumeMounts.

    The second resource is the service. A Kubernetes service is an abstraction layer which defines the logical set of pods and a policy by which to access them. In this section, we define the following:

    apiVersion: v1
    kind: Service
    metadata:
      name: proxysql
      labels:
        app: proxysql
        tier: frontend
    spec:
      type: NodePort
      ports:
      - nodePort: 30033
        port: 6033
        name: proxysql-mysql
      - nodePort: 30032
        port: 6032
        name: proxysql-admin
      selector:
        app: proxysql
        tier: frontend

    In this case, we want our ProxySQL to be accessed from the external network, thus NodePort is the chosen service type. This will publish the nodePort on every Kubernetes node in the cluster. The range of valid ports for a NodePort resource is 30000-32767. We chose port 30033 for MySQL load-balanced connections, which is mapped to port 6033 of the ProxySQL pods, and port 30032 for the ProxySQL administration console, mapped to 6032.

    Therefore, based on our YAML definition above, we have to prepare the following Kubernetes resource before we can begin to deploy the "proxysql" pod:

    • ConfigMap - To store ProxySQL configuration file as a volume so it can be mounted to multiple pods and can be remounted again if the pod is being rescheduled to the other Kubernetes node.
    Preparing ConfigMap for ProxySQL

    Similar to the previous blog post, we are going to use the ConfigMap approach to decouple the configuration file from the container, and also for scalability purposes. Take note that in this setup we consider our ProxySQL configuration to be immutable.

    Firstly, create the ProxySQL configuration file, proxysql.cnf and add the following lines:

    datadir="/var/lib/proxysql"
    admin_variables=
    {
        admin_credentials="proxysql-admin:adminpassw0rd"
        mysql_ifaces="0.0.0.0:6032"
        refresh_interval=2000
    }
    mysql_variables=
    {
        threads=4
        max_connections=2048
        default_query_delay=0
        default_query_timeout=36000000
        have_compress=true
        poll_timeout=2000
        interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
        default_schema="information_schema"
        stacksize=1048576
        server_version="5.1.30"
        connect_timeout_server=10000
        monitor_history=60000
        monitor_connect_interval=200000
        monitor_ping_interval=200000
        ping_interval_server_msec=10000
        ping_timeout_server=200
        commands_stats=true
        sessions_sort=true
        monitor_username="proxysql"
        monitor_password="proxysqlpassw0rd"
    }
    mysql_replication_hostgroups =
    (
        { writer_hostgroup=10, reader_hostgroup=20, comment="MySQL Replication 5.7" }
    )
    mysql_servers =
    (
        { address="192.168.55.171", port=3306, hostgroup=10, max_connections=100 },
        { address="192.168.55.172", port=3306, hostgroup=10, max_connections=100 },
        { address="192.168.55.171", port=3306, hostgroup=20, max_connections=100 },
        { address="192.168.55.172", port=3306, hostgroup=20, max_connections=100 }
    )
    mysql_users =
    (
        { username = "wordpress", password = "passw0rd", default_hostgroup = 10, active = 1 }
    )
    mysql_query_rules =
    (
        {
            rule_id=100
            active=1
            match_pattern="^SELECT .* FOR UPDATE"
            destination_hostgroup=10
            apply=1
        },
        {
            rule_id=200
            active=1
            match_pattern="^SELECT .*"
            destination_hostgroup=20
            apply=1
        },
        {
            rule_id=300
            active=1
            match_pattern=".*"
            destination_hostgroup=10
            apply=1
        }
    )

    Pay attention to the admin_variables.admin_credentials variable, where we used a non-default user, "proxysql-admin". ProxySQL reserves the default "admin" user for local connections via localhost only. Therefore, we have to use another user to access the ProxySQL instance remotely. Otherwise, you would get the following error:

    ERROR 1040 (42000): User 'admin' can only connect locally

    Our ProxySQL configuration is based on our two database servers running in MySQL Replication as summarized in the following Topology screenshot taken from ClusterControl:

    All writes should go to the master node, while reads are forwarded to hostgroup 20, as defined under the "mysql_query_rules" section. That's the basics of read/write splitting, and we want to take full advantage of it.

    Then, import the configuration file into ConfigMap:

    $ kubectl create configmap proxysql-configmap --from-file=proxysql.cnf
    configmap/proxysql-configmap created

    Verify if the ConfigMap is loaded into Kubernetes:

    $ kubectl get configmap
    NAME                 DATA   AGE
    proxysql-configmap   1      45s

    Wordpress Pod and Service Definition

    Now, paste the following lines into a file called wordpress-rs-svc.yml on the host where kubectl is configured:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: wordpress
      labels:
        app: wordpress
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: wordpress
          tier: frontend
      strategy:
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: wordpress
            tier: frontend
        spec:
          restartPolicy: Always
          containers:
          - image: wordpress:4.9-apache
            name: wordpress
            env:
            - name: WORDPRESS_DB_HOST
              value: proxysql:6033 # proxysql.default.svc.cluster.local:6033
            - name: WORDPRESS_DB_USER
              value: wordpress
            - name: WORDPRESS_DB_NAME
              value: wordpress
            - name: WORDPRESS_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-pass
                  key: password
            ports:
            - containerPort: 80
              name: wordpress
            volumeMounts:
            - name: wordpress-persistent-storage
              mountPath: /var/www/html
          volumes:
          - name: wordpress-persistent-storage
            persistentVolumeClaim:
              claimName: wp-pvc
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: wordpress
      labels:
        app: wordpress
        tier: frontend
    spec:
      type: NodePort
      ports:
      - name: wordpress
        nodePort: 30088
        port: 80
      selector:
        app: wordpress
        tier: frontend

    Similar to our ProxySQL definition, the YAML consists of two resources combined in one file, separated by the "---" delimiter. The first one is the Deployment resource, which will be deployed as a ReplicaSet, as shown under the "spec.*" section:

    spec:
      replicas: 3
      selector:
        matchLabels:
          app: wordpress
          tier: frontend
      strategy:
        type: RollingUpdate

    This section provides the Deployment specification - 3 pods to start, matching containers labelled with "app=wordpress,tier=frontend". The deployment strategy is RollingUpdate, which means Kubernetes will replace the pods in a rolling update fashion, the same as with our ProxySQL deployment.

    The next part is the "spec.template.spec.*" section:

    restartPolicy: Always
    containers:
    - image: wordpress:4.9-apache
      name: wordpress
      env:
      - name: WORDPRESS_DB_HOST
        value: proxysql:6033
      - name: WORDPRESS_DB_USER
        value: wordpress
      - name: WORDPRESS_DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: mysql-pass
            key: password
      ports:
      - containerPort: 80
        name: wordpress
      volumeMounts:
      - name: wordpress-persistent-storage
        mountPath: /var/www/html


    In this section, we are telling Kubernetes to deploy Wordpress 4.9 with the Apache web server, and we gave the container the name "wordpress". The container will be restarted whenever it goes down, regardless of its exit status. We also want Kubernetes to pass a number of environment variables:

    • WORDPRESS_DB_HOST - The MySQL database host. Since we are using ProxySQL as a service, the service name will be the value of metadata.name which is "proxysql". ProxySQL listens on port 6033 for MySQL load balanced connections while ProxySQL administration console is on 6032.
    • WORDPRESS_DB_USER - Specify the wordpress database user that was created under the "Preparing the Database" section.
    • WORDPRESS_DB_PASSWORD - The password for WORDPRESS_DB_USER. Since we do not want to expose the password in this file, we can hide it using Kubernetes Secrets. Here we instruct Kubernetes to read the "mysql-pass" Secret resource instead. The Secret has to be created in advance, before the pod deployment, as explained further down.

    We also want to publish port 80 of the pod for the end user. The Wordpress content stored inside /var/www/html in the container will be mounted into our persistent storage running on NFS. We will use the PersistentVolume and PersistentVolumeClaim resources for this purpose as shown under "Preparing Persistent Storage for Wordpress" section.

    After the "---" break line, we define another resource called Service:

    apiVersion: v1 kind: Service metadata: name: wordpress labels: app: wordpress tier: frontend spec: type: NodePort ports: - name: wordpress nodePort: 30088 port: 80 selector: app: wordpress tier: frontend

    In this configuration, we would like Kubernetes to create a service called "wordpress" that listens on port 30088 on every node (a.k.a. NodePort) for the external network and forwards the traffic to port 80 on all pods labelled with "app=wordpress,tier=frontend".

    Therefore, based on our YAML definition above, we have to prepare a number of Kubernetes resources before we can begin to deploy the "wordpress" pod and service:

    • PersistentVolume and PersistentVolumeClaim - To store the web contents of our Wordpress application, so that when the pod is rescheduled to another worker node, we won't lose the last changes.
    • Secrets - To hide the Wordpress database user password inside the YAML file.
    Preparing Persistent Storage for Wordpress

    A good persistent storage for Kubernetes should be accessible by all Kubernetes nodes in the cluster. For the sake of this blog post, we used NFS as the PersistentVolume (PV) provider because it's easy and supported out-of-the-box. The NFS server is located somewhere outside of our Kubernetes network (as shown in the first architecture diagram) and we have configured it to allow all Kubernetes nodes with the following line inside /etc/exports:

    /nfs 192.168.55.*(rw,sync,no_root_squash,no_all_squash)

    Take note that NFS client package must be installed on all Kubernetes nodes. Otherwise, Kubernetes wouldn't be able to mount the NFS correctly. On all nodes:

    $ sudo apt install nfs-common   # Ubuntu/Debian
    $ yum install nfs-utils         # RHEL/CentOS

    Also, make sure on the NFS server, the target directory exists:

    (nfs-server)$ mkdir /nfs/kubernetes/wordpress

    Then, create a file called wordpress-pv-pvc.yml and add the following lines:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: wp-pv
      labels:
        app: wordpress
    spec:
      accessModes:
        - ReadWriteOnce
      capacity:
        storage: 3Gi
      mountOptions:
        - hard
        - nfsvers=4.1
      nfs:
        path: /nfs/kubernetes/wordpress
        server: 192.168.55.200
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: wp-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 3Gi
      selector:
        matchLabels:
          app: wordpress
          tier: frontend

    In the above definition, we are telling Kubernetes to allocate 3GB of volume space on the NFS server for our Wordpress containers. Take note that for production usage, NFS should be configured with an automatic provisioner and a storage class.

    Create the PV and PVC resources:

    $ kubectl create -f wordpress-pv-pvc.yml

    Verify that those resources are created; the status must be "Bound":

    $ kubectl get pv,pvc
    NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
    persistentvolume/wp-pv   3Gi        RWO            Recycle          Bound    default/wp-pvc                           22h

    NAME                           STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/wp-pvc   Bound    wp-pv    3Gi        RWO                           22h

    Preparing Secrets for Wordpress

    Create a secret to be used by the Wordpress container for WORDPRESS_DB_PASSWORD environment variable. The reason is simply because we don't want to expose the password in clear text inside the YAML file.

    Create a secret resource called mysql-pass and pass the password accordingly:

    $ kubectl create secret generic mysql-pass --from-literal=password=passw0rd

    Verify that our secret is created:

    $ kubectl get secrets mysql-pass
    NAME         TYPE     DATA   AGE
    mysql-pass   Opaque   1      7h12m

    Deploying ProxySQL and Wordpress

    Finally, we can begin the deployment. Deploy ProxySQL first, followed by Wordpress:

    $ kubectl create -f proxysql-rs-svc.yml
    $ kubectl create -f wordpress-rs-svc.yml

    We can then list out all pods and services that have been created under "frontend" tier:

    $ kubectl get pods,services -l tier=frontend -o wide
    NAME                             READY   STATUS    RESTARTS   AGE   IP          NODE          NOMINATED NODE
    pod/proxysql-95b8d8446-qfbf2     1/1     Running   0          12m   10.36.0.2   kube2.local   <none>
    pod/proxysql-95b8d8446-vljlr     1/1     Running   0          12m   10.44.0.6   kube3.local   <none>
    pod/wordpress-59489d57b9-4dzvk   1/1     Running   0          37m   10.36.0.1   kube2.local   <none>
    pod/wordpress-59489d57b9-7d2jb   1/1     Running   0          30m   10.44.0.4   kube3.local   <none>
    pod/wordpress-59489d57b9-gw4p9   1/1     Running   0          30m   10.36.0.3   kube2.local   <none>

    NAME                TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE   SELECTOR
    service/proxysql    NodePort   10.108.195.54    <none>        6033:30033/TCP,6032:30032/TCP   10m   app=proxysql,tier=frontend
    service/wordpress   NodePort   10.109.144.234   <none>        80:30088/TCP                    37m   app=wordpress,tier=frontend

    The above output verifies our deployment architecture: we currently have three Wordpress pods, exposed publicly on port 30088, as well as our ProxySQL instances, exposed on ports 30033 and 30032 externally plus 6033 and 6032 internally.

    At this point, our architecture is looking something like this:

    Port 80 published by the Wordpress pods is now mapped to the outside world via port 30088. We can access our Wordpress site at http://{any_kubernetes_host}:30088/ and should be redirected to the Wordpress installation page. If we proceed with the installation, it would skip the database connection part and directly show this page:

    It indicates that our MySQL and ProxySQL settings are correctly configured inside the wp-config.php file. Otherwise, you would be redirected to the database configuration page.

    Our deployment is now complete.

    ProxySQL Pods and Service Management

    Failover and recovery are expected to be handled automatically by Kubernetes. For example, if a Kubernetes worker goes down, the pod will be recreated on the next available node after --pod-eviction-timeout (which defaults to 5 minutes). If the container crashes or is killed, Kubernetes will replace it almost instantly.
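    A simple way to see this in action is to delete one of the ProxySQL pods and watch Kubernetes replace it (the pod name below is taken from our earlier output and will be different in your cluster):

    $ kubectl delete pod proxysql-95b8d8446-qfbf2
    pod "proxysql-95b8d8446-qfbf2" deleted
    $ kubectl get pods -l app=proxysql -w

    Within moments a new pod with a different name shows up and moves through ContainerCreating to Running, while the proxysql service endpoint stays the same for the applications.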

    Some common management tasks are expected to be different when running within Kubernetes, as shown in the next sections.

    Connecting to ProxySQL

    While ProxySQL is exposed externally on port 30033 (MySQL) and 30032 (Admin), it is also accessible internally via the published ports, 6033 and 6032 respectively. Thus, to access the ProxySQL instances within the Kubernetes network, use the CLUSTER-IP, or the service name "proxysql" as the host value. For example, within Wordpress pod, you may access the ProxySQL admin console by using the following command:

    $ mysql -uproxysql-admin -p -hproxysql -P6032

    If you want to connect externally, use the port defined under the nodePort value in the service YAML and pick any of the Kubernetes nodes as the host value:

    $ mysql -uproxysql-admin -p -hkube3.local -P30032

    The same applies to the MySQL load-balanced connection on port 30033 (external) and 6033 (internal).
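    While connected to the admin interface, you can also verify that the read/write split defined in our ConfigMap is doing what we expect by checking which hostgroup served which query digest (the output is illustrative):

    SELECT hostgroup, count_star, digest_text
    FROM stats_mysql_query_digest
    ORDER BY count_star DESC
    LIMIT 5;

    SELECT statements should show up under hostgroup 20 (the readers), while writes land in hostgroup 10 (the writer).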

    Scaling Up and Down

    Scaling up is easy with Kubernetes:

    $ kubectl scale deployment proxysql --replicas=5
    deployment.extensions/proxysql scaled

    Verify the rollout status:

    $ kubectl rollout status deployment proxysql
    deployment "proxysql" successfully rolled out

    Scaling down is also similar. Here we want to revert back from 5 to 2 replicas:

    $ kubectl scale deployment proxysql --replicas=2
    deployment.extensions/proxysql scaled

    We can also look at the deployment events for ProxySQL to get a better picture of what has happened for this deployment by using the "describe" option:

    $ kubectl describe deployment proxysql
    ...
    Events:
      Type    Reason             Age    From                   Message
      ----    ------             ----   ----                   -------
      Normal  ScalingReplicaSet  20m    deployment-controller  Scaled up replica set proxysql-769895fbf7 to 1
      Normal  ScalingReplicaSet  20m    deployment-controller  Scaled down replica set proxysql-95b8d8446 to 1
      Normal  ScalingReplicaSet  20m    deployment-controller  Scaled up replica set proxysql-769895fbf7 to 2
      Normal  ScalingReplicaSet  20m    deployment-controller  Scaled down replica set proxysql-95b8d8446 to 0
      Normal  ScalingReplicaSet  7m10s  deployment-controller  Scaled up replica set proxysql-6c55f647cb to 1
      Normal  ScalingReplicaSet  7m     deployment-controller  Scaled down replica set proxysql-769895fbf7 to 1
      Normal  ScalingReplicaSet  7m     deployment-controller  Scaled up replica set proxysql-6c55f647cb to 2
      Normal  ScalingReplicaSet  6m53s  deployment-controller  Scaled down replica set proxysql-769895fbf7 to 0
      Normal  ScalingReplicaSet  54s    deployment-controller  Scaled up replica set proxysql-6c55f647cb to 5
      Normal  ScalingReplicaSet  21s    deployment-controller  Scaled down replica set proxysql-6c55f647cb to 2

    The connections to the pods will be load balanced automatically by Kubernetes.

    Configuration Changes

    One way to make configuration changes on our ProxySQL pods is by versioning our configuration using another ConfigMap name. Firstly, modify our configuration file directly via your favourite text editor:

    $ vim /root/proxysql.cnf

    Then, load it up into Kubernetes ConfigMap with a different name. In this example, we append "-v2" in the resource name:

    $ kubectl create configmap proxysql-configmap-v2 --from-file=proxysql.cnf

    Verify if the ConfigMap is loaded correctly:

    $ kubectl get configmap
    NAME                    DATA   AGE
    proxysql-configmap      1      3d15h
    proxysql-configmap-v2   1      19m

    Open the ProxySQL deployment file, proxysql-rs-svc.yml and change the following line under configMap section to the new version:

    volumes:
    - name: proxysql-config
      configMap:
        name: proxysql-configmap-v2   # change this line

    Then, apply the changes to our ProxySQL deployment:

    $ kubectl apply -f proxysql-rs-svc.yml
    deployment.apps/proxysql configured
    service/proxysql configured

    Verify the rollout by looking at the ReplicaSet events using the "describe" option:

    $ kubectl describe deployment proxysql
    ...
    Pod Template:
      Labels:  app=proxysql
               tier=frontend
      Containers:
       proxysql:
        Image:        severalnines/proxysql:1.4.12
        Ports:        6033/TCP, 6032/TCP
        Host Ports:   0/TCP, 0/TCP
        Environment:  <none>
        Mounts:
          /etc/proxysql.cnf from proxysql-config (rw)
      Volumes:
       proxysql-config:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      proxysql-configmap-v2
        Optional:  false
    Conditions:
      Type           Status  Reason
      ----           ------  ------
      Available      True    MinimumReplicasAvailable
      Progressing    True    NewReplicaSetAvailable
    OldReplicaSets:  <none>
    NewReplicaSet:   proxysql-769895fbf7 (2/2 replicas created)
    Events:
      Type    Reason             Age  From                   Message
      ----    ------             ---- ----                   -------
      Normal  ScalingReplicaSet  53s  deployment-controller  Scaled up replica set proxysql-769895fbf7 to 1
      Normal  ScalingReplicaSet  46s  deployment-controller  Scaled down replica set proxysql-95b8d8446 to 1
      Normal  ScalingReplicaSet  46s  deployment-controller  Scaled up replica set proxysql-769895fbf7 to 2
      Normal  ScalingReplicaSet  41s  deployment-controller  Scaled down replica set proxysql-95b8d8446 to 0

    Pay attention to the "Volumes" section with the new ConfigMap name. You can also see the deployment events at the bottom of the output. At this point, our new configuration has been loaded into all ProxySQL pods: Kubernetes scaled the old ProxySQL ReplicaSet down to 0 (obeying the RollingUpdate strategy) and brought the new one up to the desired state of 2 replicas.

    Final Thoughts

    Up to this point, we have covered a possible deployment approach for ProxySQL in Kubernetes. Running ProxySQL with the help of Kubernetes ConfigMap opens up a new possibility for ProxySQL clustering, one that is somewhat different from the native clustering support built into ProxySQL.

    In the upcoming blog post, we will explore ProxySQL Clustering using Kubernetes ConfigMap and how to do it the right way. Stay tuned!

    Tags:  proxysql docker kubernetes MySQL MariaDB

    Percona Live Europe Presents: ClickHouse at Messagebird: Analysing Billions of Events in Real-Time*

    We’ll look into how ClickHouse allows us to ingest a large amount of data and run complex analytical interactive queries at MessageBird. We also present the business needs that brought ClickHouse to our attention and detail the journey to its deployment. We cover the problems we faced, and how we dealt with them. We talk about our current Cloud production setup and how we deployed and use it.

    We are really enthusiastic to share a ClickHouse use case and how it helped us scale our analytics stack: the good, the bad, and the ugly.

    The talk should be useful to newcomers and to anyone wondering whether ClickHouse could be useful to them.

    What we’re looking forward to…

    There are many talks, but these are among the top ones we’re looking forward to in particular:

    Aleksandar Aleksandrov and Félix Mattrat

    The post Percona Live Europe Presents: ClickHouse at Messagebird: Analysing Billions of Events in Real-Time* appeared first on Percona Community Blog.

    MySQL 8 loves Javascript @jsfoo

    Presented at the JSFoo conference for the first time.

    The slides are uploaded here https://www.slideshare.net/SanjayManwani/mysql-8-loves-javascript via @SlideShare

    Quick impressions of the conference:

    1. Very buzzy; you could see the high energy levels. There was lots to learn about JavaScript, and the speakers were enthusiastic to present. 700 attendees.

    2. The focus was spot on. Security is something that everyone is worried about.

    3. Staff were always on the spot and helpful. The organization looked great, with big video boards announcing the next talk, etc.

    4. There was also informal communication between the speakers, with a WhatsApp group for speakers to interact with one another.

    5. There were a few things that I did point out to the organizers, e.g. the banners on the road outside were missing (disallowed because of a government conference happening in parallel), attracting more aspirational attendees, etc.

    All in all, a big thanks to the HasGeek organizers; I did enjoy the conference and would like to continue to be a part of the HasGeek conferences.

    (Photo courtesy:  official jsfoo photo tweeted by Zainab)
