Planet MySQL

Webinar Thursday, November 17: MySQL High Availability with Percona XtraDB Cluster 5.7

Join Percona’s Percona XtraDB Cluster Lead Software Engineer Krunal Bauskar for a webinar on Thursday, November 17, 2016, at 7:30 am PST on MySQL High Availability with Percona XtraDB Cluster 5.7.

Percona XtraDB Cluster 5.7 is our brand new MySQL 5.7 compatible Galera-based high availability (HA) solution. Whether you’re new to MySQL clustering technology, or experienced with Galera-based replication, this tutorial provides great insights into working with the software, including:

  • New and unique features in XtraDB Cluster 5.7, including Cluster Safe Mode, instrumentation with Performance Schema and extended support for encrypted tablespaces in a multi-master topology
  • Seamless integration with ProxySQL for better HA and read/write splitting
  • Improved security with native data at rest encryption and secure networking
  • Native integration with Docker, optimized for Container World
  • Monitoring with Percona Monitoring and Management (PMM)
  • Improved stability with many critical bug fixes and improved error messaging

This tutorial will demonstrate how to set up XtraDB Cluster, complete with High Availability Proxy and Monitoring, as well as perform the most important MySQL high availability management operations.

Register for this webinar here.

Krunal Bauskar, Percona XtraDB Cluster Lead Software Engineer

Krunal joined Percona in September 2015. Before joining Percona, he worked as part of the InnoDB team at MySQL/Oracle. He authored most of the temporary table revamp work, in addition to many other features. In the past, he worked with Yahoo! Labs researching big data issues, as well as for a database startup that is now part of Teradata. His interests mainly include data management at any scale – something he has been working on for more than a decade now.

Deploying and Monitoring MySQL and MongoDB clusters in the cloud with NinesControl

NinesControl is a new service from Severalnines which helps you deploy MySQL Galera and MongoDB clusters in the cloud. In this blog post we will show you how you can easily deploy and monitor your databases on AWS and DigitalOcean.

Deployment

At the moment of writing, NinesControl supports two cloud providers - Amazon Web Services and DigitalOcean. Before you attempt to deploy, you first need to configure access credentials for the cloud you’d like to run on. We covered this topic in a previous blog post.

Once that’s done, you should see the credentials defined for the chosen cloud provider in the “Cloud Accounts” tab.

You’ll see the screen below, as you do not have any clusters running yet:

You can click on “Deploy your first cluster” to start your first deployment. You will be presented with a screen like the one below - you can pick the cluster type you’d like to deploy and set some configuration settings like port, data directory and password. You can also set the number of nodes in the cluster and which database vendor you’d like to use.

For MongoDB, the deployment screen is fairly similar with some additional settings to configure.

Once you are done here, it’s time to move to the second step - picking the credentials to use to deploy your cluster. You have the option to pick either DigitalOcean or Amazon Web Services, and you can pick whichever credentials you have added to NinesControl. In our example we just have a single credential, but it’s perfectly ok to have more than one credential per cloud provider.

Once you’ve made your choice, proceed to the third and final step, in which you will pick what kind of VMs you’d like to use. This screen differs between AWS and DigitalOcean.

If you picked AWS, you will have the option to choose the operating system and VM size. You also need to pick the VPC in which you will deploy, as well as the subnet which will be used by your cluster. If you don’t see anything in the drop-down lists, you can click on the “[Add]” buttons and NinesControl will create the VPC and subnet for you. Finally, you need to set the volume size of the VMs. After that, you can trigger the deployment.

DigitalOcean uses a slightly different screen setup, but the idea is similar - you need to pick a region, an operating system and a droplet size.

Once you are done, click on “Deploy cluster” to start deployment.

The status of the deployment will be shown in the cluster list. You can also click on the status bar to see the full log of the deployment. Whenever you’d like to deploy a new cluster, you will have to click on the “Deploy cluster” button.

Monitoring

Once deployment completes, you’ll see a list of your clusters.

When you click on one of them, you’ll see a list of nodes in the cluster and cluster-wide metrics.

Of course, the metrics are cluster-dependent. Above is what you will see for a MySQL/MariaDB Galera cluster. MongoDB will present you with different graphs and metrics:

When you click on a node, you will be redirected to host statistics of that particular node - CPU, network, disk, RAM usage - all of those very important basics which tell you about node health:

As you can see, NinesControl not only allows you to deploy Galera and MongoDB clusters in a fast and efficient way, but also collects important metrics for you and shows them as graphs.

Give it a try and let us know what you think.

Tags: AWS, DigitalOcean, MySQL, MariaDB, MongoDB, Galera

Using Vault with MySQL

In my previous post I discussed using GPG to secure your database credentials. This relies on a local copy of your MySQL client config, but what if you want to keep the credentials stored safely along with other super secret information? Sure, GPG could still be used, but there must be an easier way to do this.

This post will look at a way to use Vault to store your credentials in a central location and use them to access your database. For those of you that have not yet come across Vault, it is a great way to manage your secrets – securing, storing and tightly controlling access. It has the added benefits of being able to handle leasing, key revocation, key rolling and auditing.

During this blog post we’ll accomplish the following tasks:

  1. Download the necessary software
  2. Get a free SAN certificate to use for Vault’s API and automate certificate renewal
  3. Configure Vault to run under a restricted user and secure access to its files and the API
  4. Create a policy for Vault to provide access control
  5. Enable TLS authentication for Vault and create a self-signed client certificate using OpenSSL to use with our client
  6. Add a new secret to Vault and gain access from a client using TLS authentication
  7. Enable automated, expiring MySQL grants

Before continuing onwards, I should drop in a quick note to say that the following is a quick example to show you how you can get Vault up and running and use it with MySQL; it is not a guide to a production setup, and it does not cover High Availability (HA) implementations, etc.

Download time

We will be using some tools in addition to Vault: Let’s Encrypt, OpenSSL and json_pp (a command line utility based on JSON::PP). For this post we’ll be using Ubuntu 16.04 LTS, and we’ll presume that these aren’t yet installed.

$ sudo apt-get install letsencrypt openssl libjson-pp-perl

If you haven’t already heard of Let’s Encrypt, it is a free, automated, and open Certificate Authority (CA) that enables you to secure your website or other services without paying for an SSL certificate; you can even create Subject Alternative Name (SAN) certificates to make your life even easier, allowing one certificate to be used for a number of different domains. The Electronic Frontier Foundation (EFF) provides Certbot, the recommended tool to manage your certificates, which is the new name for the letsencrypt software. If you don’t have letsencrypt/certbot in your package manager, you should be able to use the quick install method. We’ll be using json_pp to prettify the JSON output from the Vault API and openssl to create a client certificate.

We also need to download Vault, choosing the binary relevant for your Operating System and architecture. At the time of writing this, the latest version of Vault is 0.6.2, so the following steps may need adjusting if you use a different version.

# Download Vault (Linux x86_64), SHA256SUMS and signature
$ wget https://releases.hashicorp.com/vault/0.6.2/vault_0.6.2_linux_amd64.zip https://releases.hashicorp.com/vault/0.6.2/vault_0.6.2_SHA256SUMS.sig https://releases.hashicorp.com/vault/0.6.2/vault_0.6.2_SHA256SUMS

# Import the GPG key
$ gpg --keyserver pgp.mit.edu --recv-keys 51852D87348FFC4C

# Verify the checksums
$ gpg --verify vault_0.6.2_SHA256SUMS.sig
gpg: assuming signed data in `vault_0.6.2_SHA256SUMS'
gpg: Signature made Thu 06 Oct 2016 02:08:16 BST using RSA key ID 348FFC4C
gpg: Good signature from "HashiCorp Security <security@hashicorp.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 91A6 E7F8 5D05 C656 30BE  F189 5185 2D87 348F FC4C

# Verify the download
$ sha256sum --check <(fgrep vault_0.6.2_linux_amd64.zip vault_0.6.2_SHA256SUMS)
vault_0.6.2_linux_amd64.zip: OK

# Extract the binary
$ sudo unzip -j vault_0.6.2_linux_amd64.zip -d /usr/local/bin
Archive:  vault_0.6.2_linux_amd64.zip
  inflating: /usr/local/bin/vault

Let’s Encrypt… why not?

We want to be able to access Vault from wherever we are, and we can put additional security in place to prevent unauthorised access, so we need to get ourselves encrypted. The following example shows the setup on a public server, allowing the CA to authenticate your request. More information on the different methods can be found in the Certbot documentation.

$ sudo letsencrypt --webroot -w /home/www/vhosts/default/public -d myfirstdomain.com -d myseconddomain.com
#IMPORTANT NOTES:
# - Congratulations! Your certificate and chain have been saved at
#   /etc/letsencrypt/live/myfirstdomain.com/fullchain.pem. Your cert will
#   expire on 2017-01-29. To obtain a new or tweaked version of this
#   certificate in the future, simply run certbot again. To
#   non-interactively renew *all* of your certificates, run "certbot
#   renew"
# - If you like Certbot, please consider supporting our work by:
#
#   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
#   Donating to EFF:                    https://eff.org/donate-le
#

That’s all it takes to get a SAN SSL certificate! The server this was executed on has a public webserver serving the domains that the certificates were requested for. During the request process a file is placed in the specified webroot and is used to authenticate the domain(s) for the request. Essentially, the command said:

myfirstdomain.com and myseconddomain.com use /home/www/vhosts/default/public for the document root, so place your files there

The Let’s Encrypt CA issues short-lived certificates (90 days), so you need to keep renewing them, but don’t worry, as that is as easy as creating them in the first place! You can test that renewal works OK as follows (without --dry-run this would renew all certificates that you have):

$ sudo letsencrypt renew --dry-run
#
#-------------------------------------------------------------------------------
#Processing /etc/letsencrypt/renewal/myfirstdomain.com.conf
#-------------------------------------------------------------------------------
#** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
#**          (The test certificates below have not been saved.)
#
#Congratulations, all renewals succeeded. The following certs have been renewed:
#  /etc/letsencrypt/live/myfirstdomain.com/fullchain.pem (success)
#** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
#**          (The test certificates above have not been saved.)
#
#IMPORTANT NOTES:
# - Your account credentials have been saved in your Certbot
#   configuration directory at /etc/letsencrypt. You should make a
#   secure backup of this folder now. This configuration directory will
#   also contain certificates and private keys obtained by Certbot so
#   making regular backups of this folder is ideal.

Automating renewal

The test run for renewal worked fine, so we can now go and schedule this to take place automatically. I’m using systemd, so the following example uses timers, but cron or similar could be used too. Here’s how to make systemd run the scheduled renewal for you at 06:00 – the renewal process will automatically proceed for any previously-obtained certificates that expire in less than 30 days.

$ sudo cat <<EOF > /etc/systemd/system/cert-renewal.service
[Unit]
Description=SSL renewal

[Service]
Type=simple
ExecStart=/usr/bin/letsencrypt renew --quiet
User=root
Group=root
EOF

$ sudo cat <<EOF > /etc/systemd/system/cert-renewal.timer
[Unit]
Description=Automatic SSL renewal

[Timer]
OnCalendar=*-*-* 06:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF

$ sudo systemctl enable cert-renewal.timer
Created symlink from /etc/systemd/system/timers.target.wants/cert-renewal.timer to /etc/systemd/system/cert-renewal.timer.

$ sudo systemctl start cert-renewal.timer

$ sudo systemctl list-timers
NEXT                         LEFT     LAST  PASSED  UNIT                ACTIVATES
Tue 2016-11-01 06:00:00 UTC  6h left  n/a   n/a     cert-renewal.timer  cert-renewal.service

Getting started with Vault

Firstly, a quick reminder that this is not an in-depth review, how-to or necessarily best-practice Vault installation as that is beyond the scope of this post. It is just to get you going to test things out, so please read up on the Vault documentation if you want to use it more seriously.

Whilst there is a development server that you can fire up with the command vault server -dev to get yourself testing a little quicker, we’re going to take a little extra time and configure it ourselves and make the data persistent. Vault supports a number of backends for data storage, including Zookeeper, Amazon S3 and MySQL, however the 3 maintained by HashiCorp are consul, file and inmem. The memory storage backend does not provide persistent data, so whilst there could possibly be uses for this it is really only useful for development and testing – it is the storage backend used with the -dev option to the server command. Rather than tackle the installation and configuration of Consul during this post, we’ll use file storage instead.

Before starting the server we’ll create a config, which can be written in one of 2 formats – HCL (HashiCorp Configuration Language) or JSON (JavaScript Object Notation). We’ll use HCL as it is a little cleaner and saves us a little extra typing!

# Create a system user
$ sudo useradd -r -g daemon -d /usr/local/vault -m -s /sbin/nologin -c "Vault user" vault
$ id vault
uid=998(vault) gid=1(daemon) groups=1(daemon)

# Create the config directories and remove global access
$ sudo mkdir /etc/vault /etc/ssl/vault
$ sudo chown vault.root /etc/vault /etc/ssl/vault
$ sudo chmod 750 /etc/vault /etc/ssl/vault
$ sudo chmod 700 /usr/local/vault

# Copy the certificates and key
$ sudo cp -v /etc/letsencrypt/live/myfirstdomain.com/*pem /etc/ssl/vault
/etc/letsencrypt/live/myfirstdomain.com/cert.pem -> /etc/ssl/vault/cert.pem
/etc/letsencrypt/live/myfirstdomain.com/chain.pem -> /etc/ssl/vault/chain.pem
/etc/letsencrypt/live/myfirstdomain.com/fullchain.pem -> /etc/ssl/vault/fullchain.pem
/etc/letsencrypt/live/myfirstdomain.com/privkey.pem -> /etc/ssl/vault/privkey.pem

# Create a combined PEM certificate
$ sudo cat /etc/ssl/vault/{cert,fullchain}.pem | sudo tee /etc/ssl/vault/fullcert.pem >/dev/null

# Write the config to file
$ cat <<EOF | sudo tee /etc/vault/demo.hcl
listener "tcp" {
  address = "10.0.1.10:8200"
  tls_disable = 0
  tls_cert_file = "/etc/ssl/vault/fullcert.pem"
  tls_key_file = "/etc/ssl/vault/privkey.pem"
}

backend "file" {
  path = "/usr/local/vault/data"
}

disable_mlock = true
EOF

So, we’ve now set up a user and some directories to store the config, SSL certificate and key, and also the data, restricting access to the vault user. The config that we wrote specifies that we will use the file backend, storing data in /usr/local/vault/data, and the listener that will be providing TLS encryption using our certificate from Let’s Encrypt. The final setting, disable_mlock is not recommended for production and is being used to avoid some extra configuration during this post. More details about the other options available for configuration can be found in the Server Configuration section of the online documentation.

Please note that the Vault data directory should be kept secured, as it contains all of the keys and secrets. In the example, we have done this by placing it in the vault user’s home directory and only allowing the vault user access. You can take this further by restricting local access (via logins) and with access control lists.
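As a minimal sketch of what that extra hardening might look like (the "webapp" account below is purely hypothetical, and setfacl assumes ACL support is available on the filesystem):

$ sudo chown -R vault:daemon /usr/local/vault/data
$ sudo chmod -R 700 /usr/local/vault/data
# Explicitly deny a hypothetical local account any access via an ACL
$ sudo setfacl -m u:webapp:--- /usr/local/vault/data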

Starting Vault

Time to start the server and see if everything is looking good!

$ sudo -su vault vault server -config=/etc/vault/demo.hcl >/tmp/vault-debug.log 2>&1 &
$ jobs
[1]  + running    sudo -su vault vault server -config=/etc/vault/demo.hcl > /tmp/vault-debug.lo
$ VAULT_ADDR=https://myfirstdomain.com:8200 vault status
Error checking seal status: Error making API request.

URL: GET https://myfirstdomain.com:8200/v1/sys/seal-status
Code: 400. Errors:

* server is not yet initialized

Whilst it looks like something is wrong (we need to initialize the server), it does mean that everything is otherwise working as expected. So we’ll initialize Vault, which is a pretty simple task, but you do need to make a note of (and safely store) some of the information that the server gives you during initialization – the unseal keys and the initial root token. You should distribute these somewhere safe, but for now we’ll store them with the config.

# Change to the vault user
$ sudo su -l vault -s /bin/bash
(vault)$ export VAULT_ADDR=https://myfirstdomain.com:8200 VAULT_SSL=/etc/ssl/vault

# Initialize Vault and save the token and keys
(vault)$ vault init 2>&1 | egrep '^Unseal Key|Initial Root Token' >/etc/vault/keys.txt
(vault)$ chmod 600 /etc/vault/keys.txt

# Unseal Vault
(vault)$ egrep -m3 '^Unseal Key' /etc/vault/keys.txt | cut -f2- -d: | tr -d ' ' | while read key
do
  vault unseal -ca-cert=${VAULT_SSL}/fullchain.pem -client-cert=${VAULT_SSL}/client.pem -client-key=${VAULT_SSL}/privkey.pem ${key}
done
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0

# Check Vault status
(vault)$ vault status
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: 0.6.2
Cluster Name: vault-cluster-ebbd5ec7
Cluster ID: 61ae8f54-f420-09c1-90bb-60c9fbfa18a2

High-Availability Enabled: false

There we go: the vault is initialized, and the status command now returns details and confirmation that it is up and running. It is worth noting here that each time you start Vault it will be sealed, which means that it cannot be accessed until 3 unseal keys have been used with vault unseal – for additional security you would ensure that no single person knows any 3 keys, so that it always requires more than one person to (re)start the service.

Setting up a policy

Policies allow you to set access control restrictions that determine what data authenticated users have access to. Once again the documents used to write policies are in either HCL or JSON format. They are easy to write and apply; the only catch is that the policies associated with a token cannot be changed (added/removed) once the token has been issued – you need to revoke the token and issue a new one with the new policies. However, if you want to change the rules inside a policy, this can be done on the fly, as modifications apply on the next call to Vault.
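For example (a sketch only, not part of the original setup – the token value is a placeholder and the command names are those of the Vault 0.6.x CLI):

# Revoke the old token, then issue a new one that picks up the current set of policies
(vault)$ vault token-revoke <old-client-token>
(vault)$ vault token-create -policy="demo" -ttl=1h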

When we initialized the server we were given the initial root token, and we now need to use that in order to start configuring the server.

(vault)$ export VAULT_TOKEN=$(egrep '^Initial Root Token:' /etc/vault/keys.txt | cut -f2- -d: | tr -d ' ')

We will create a simple policy that allows us to read the MySQL secrets, but prevents access to the system information and commands.

(vault)$ cat <<EOF > /etc/vault/demo-policy.hcl
path "sys/*" {
  policy = "deny"
}

path "secret/mysql/*" {
  policy = "read"
  capabilities = ["list", "sudo"]
}
EOF

(vault)$ vault policy-write demo /etc/vault/demo-policy.hcl
Policy 'demo' written.

We have only added one policy here, but you should create as many policies as you need to suitably control access amongst the variety of humans and applications that may be using the service. As with any kind of data storage, planning how to store your data is important, as it will help you write more compact policies with the level of granularity that you require. Writing everything at the top level of secret/ will most likely bring you headaches, or long policy definitions! A sketch of one possible layout follows.
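As an illustration only (these paths are hypothetical and not part of the setup used in this post), secrets could be split by environment and application, so that a policy can then target a narrow prefix such as secret/mysql/prod/* while leaving staging data out of reach:

(vault)$ vault write secret/mysql/prod/billing user="billing" password="..." socket="/var/run/mysqld/mysqld.sock"
(vault)$ vault write secret/mysql/staging/billing user="billing_stg" password="..."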

TLS authentication for MySQL secrets

We’re getting close to adding our first secret to Vault, but first of all we need a way to authenticate our access. Vault provides an API for access to your stored secrets, along with a wealth of commands for direct use of the vault binary, as we are doing at the moment. We will now enable the cert authentication backend, which allows authentication using SSL/TLS client certificates.

(vault)$ vault auth-enable cert Successfully enabled 'cert' at 'cert'!

Generate a client certificate using OpenSSL

The TLS authentication backend accepts certificates that are either signed by a CA or self-signed, so let’s quickly create ourselves a self-signed SSL certificate using openssl to use for authentication.

# Create a working directory for SSL management and copy in the config
$ mkdir ~/.ssl && cd $_
$ cp /usr/lib/ssl/openssl.cnf .

# Create a 4096-bit CA
$ openssl genrsa -des3 -out ca.key 4096
Generating RSA private key, 4096 bit long modulus
...........++
..........................................................................++
e is 65537 (0x10001)
Enter pass phrase for ca.key:
Verifying - Enter pass phrase for ca.key:
$ openssl req -config ./openssl.cnf -new -x509 -days 365 -key ca.key -out ca.crt
Enter pass phrase for ca.key:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) [Some-Place]:
Organization Name (eg, company) [Percona]:
Organizational Unit Name (eg, section) [Demo]:
Common Name (e.g. server FQDN or YOUR name) [ceri]:
Email Address [thisisnotme@myfirstdomain.com]:

# Create a 4096-bit Client Key and CSR
$ openssl genrsa -des3 -out client.key 4096
Generating RSA private key, 4096 bit long modulus
......................++
..................................++
e is 65537 (0x10001)
Enter pass phrase for client.key:
Verifying - Enter pass phrase for client.key:
$ openssl req -config ./openssl.cnf -new -key client.key -out client.csr
Enter pass phrase for client.key:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) [Some-Place]:
Organization Name (eg, company) [Percona]:
Organizational Unit Name (eg, section) [Demo]:
Common Name (e.g. server FQDN or YOUR name) [ceri]:
Email Address [thisisnotme@myfirstdomain.com]:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

# Self-sign
$ openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt
Signature ok
subject=/C=GB/ST=Some-State/L=Some-Place/O=Percona/OU=Demo/CN=ceri/emailAddress=thisisnotme@myfirstdomain.com
Getting CA Private Key
Enter pass phrase for ca.key:

# Create an unencrypted copy of the client key
$ openssl rsa -in client.key -out privkey.pem
Enter pass phrase for client.key:
writing RSA key

# Copy the certificate for Vault access
$ sudo cp client.crt /etc/ssl/vault/user.pem

OK, there was quite a lot of information there. You can edit openssl.cnf to set reasonable defaults for yourself and save time. In brief, we have created our own CA, created a self-signed certificate, and then created a single PEM certificate with a decrypted key (this avoids having to specify the password to use it – you may wish to leave the password in place to add more security, assuming that your client application can request the password).

Adding an authorisation certificate to Vault

Now that we have created a certificate and a policy we now need to allow authentication to occur using the certificate. We will give the token a 1-hour expiration and allow access to the MySQL secrets via the demo policy that we created in the previous step.

(vault)$ vault write auth/cert/certs/demo display_name=demo policies=demo certificate=@${VAULT_SSL}/user.pem ttl=3600
Success! Data written to: auth/cert/certs/demo

$ curl --cert user.pem --key privkey.pem ${VAULT_ADDR}/v1/auth/cert/login -X POST
{"request_id":"d5715ce1-2c6c-20c8-83ef-ce6259ad9110","lease_id":"","renewable":false,"lease_duration":0,"data":null,"wrap_info":null,"warnings":null,"auth":{"client_token":"e3b98fac-2676-9f44-fdc2-41114360d2fd","accessor":"4c5b4eb5-4faf-0b01-b732-39d309afd216","policies":["default","demo"],"metadata":{"authority_key_id":"","cert_name":"demo","common_name":"thisisnotme@myfirstdomain.com","subject_key_id":""},"lease_duration":600,"renewable":true}}

Awesome! We requested our first client token using an SSL client certificate, we are logged in, and we were given our access token (client_token) in the response, which provides us with a lease (lease_duration) to go ahead and make requests as a client without reauthentication – but there is nothing in the vault right now.
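If you want to script this, a rough sketch along the following lines would capture the token for reuse in later API calls (this is an assumption on my part rather than part of the original setup, and the sed parsing is deliberately simple):

$ TOKEN=$(curl -s --cert user.pem --key privkey.pem -X POST ${VAULT_ADDR}/v1/auth/cert/login | sed 's/.*"client_token":"\([^"]*\)".*/\1/')
$ echo ${TOKEN}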

Ssshh!! It’s secret!

“The time has come,” the Vault master said, “to encrypt many things: our keys and passwords and top-secret notes, our MySQL DSNs and strings.”

Perhaps the easiest way to use Vault with your application is to store information there as you would do in a configuration file and read it when the application first requires it. An example of such information is the Data Source Name (DSN) for a MySQL connection, or perhaps the information needed to dynamically generate a .my.cnf. As this is about using Vault with MySQL we will do exactly that and store the user, password and connection method as our first secret, reading it back using the command line tool to check that it looks as expected.

(vault)$ vault write secret/mysql/test password="mysupersecretpassword" user="percona" socket="/var/run/mysqld/mysqld.sock"
Success! Data written to: secret/mysql/test

(vault)$ vault read secret/mysql/test
Key               Value
---               -----
refresh_interval  768h0m0s
password          mysupersecretpassword
socket            /var/run/mysqld/mysqld.sock
user              percona

A little while back (hopefully less than 1 hour ago!) we authenticated using cURL and gained a token, so now that we have something secret to read we can try it out. Fanfares and trumpets at the ready…

$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 2f1fb630-cbe9-a8c9-5931-515a12d79291' ${VAULT_ADDR}/v1/secret/mysql/test -X GET 2>/dev/null | json_pp
{
   "wrap_info" : null,
   "lease_id" : "",
   "request_id" : "c79033b1-f8f7-be89-4208-44d721a55804",
   "auth" : null,
   "data" : {
      "password" : "mysupersecretpassword",
      "socket" : "/var/run/mysqld/mysqld.sock",
      "user" : "percona"
   },
   "lease_duration" : 2764800,
   "renewable" : false,
   "warnings" : null
}

We did it! Now there is no longer any need to store passwords in your code or config files; you can just go and get them from Vault when you need them, such as when your application starts (holding them in memory), or on demand if your application can tolerate the additional latency. You would need to take further steps to make sure that your application is tolerant of Vault going down, as well as providing an HA setup of Vault to minimise the risk of the secrets being unavailable.
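As one possible sketch (assuming the token from the earlier login is in ${TOKEN}, and reusing the JSON::PP module we installed for json_pp), the secret could even be turned into a .my.cnf on the client:

$ curl -s --cert user.pem --key privkey.pem -H "X-Vault-Token: ${TOKEN}" ${VAULT_ADDR}/v1/secret/mysql/test |
    perl -MJSON::PP -0777 -ne '$d = decode_json($_)->{data}; print "[client]\nuser=$d->{user}\npassword=$d->{password}\nsocket=$d->{socket}\n"' > ~/.my.cnf
$ chmod 600 ~/.my.cnf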

It doesn’t stop here though…

On-demand MySQL grants

Vault acts like a virtual filesystem and uses the generic storage backend by default, mounted as /secret, but thanks to its powerful abstraction it is possible to use many other backends as mountpoints, such as an SQL database, AWS IAM, HSMs and much more. We have kept things simple and have been using the generic backend so far. You can view the available (mounted) backends using the mounts command:

(vault)$ vault mounts
Path     Type     Default TTL  Max TTL  Description
secret/  generic  system       system   generic secret storage
sys/     system   n/a          n/a      system endpoints used for control, policy and debugging

We are now going to enable the MySQL backend, add the management connection (which will use the auth_socket plugin) and then request a new MySQL user that will auto-expire!

# Create a dedicated MySQL user account
$ mysql -Bsse "CREATE USER vault@localhost IDENTIFIED WITH auth_socket; GRANT CREATE USER, SELECT, INSERT, UPDATE ON *.* TO vault@localhost WITH GRANT OPTION;"

# Enable the MySQL backend and set the connection details
(vault)$ vault mount mysql
(vault)$ vault write mysql/config/connection connection_url="vault:vault@unix(/var/run/mysqld/mysqld.sock)/"

Read access to this endpoint should be controlled via ACLs as it will return the connection
URL as it is, including passwords, if any.

# Write the template for the readonly role
(vault)$ vault write mysql/roles/readonly sql="CREATE USER '{{name}}'@'%' IDENTIFIED WITH mysql_native_password BY '{{password}}' PASSWORD EXPIRE INTERVAL 1 DAY; GRANT SELECT ON *.* TO '{{name}}'@'%';"
Success! Data written to: mysql/roles/readonly

# Set the lease on MySQL grants
(vault)$ vault write mysql/config/lease lease=1h lease_max=12h
Success! Data written to: mysql/config/lease

Here you can see that a template is created so that you can customise the grants per role. We created a readonly role, so it just has SELECT access. We have set an expiration on the account so that MySQL will automatically mark the password as expired and prevent access. This is not strictly necessary since Vault will remove the user accounts that it created as it expires the tokens, but by adding an extra level in MySQL it would allow you to set the lease, which seems to be global, in Vault to a little longer than required and vary it by role using MySQL password expiration. You could also use it as a way of tracking which Vault-generated MySQL accounts are going to expire soon. The important part is that you ensure that the application is tolerant of reauthentication, whether it would hand off work whilst doing so, accept added latency, or perhaps the process would terminate and respawn.
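If the application wants to keep using the same credentials rather than requesting new ones, the lease can also be renewed before it expires, up to the configured lease_max (a sketch only – the lease ID below is a placeholder for the kind of lease_id returned when credentials are issued):

(vault)$ vault renew mysql/creds/readonly/<lease-id>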

Now we will authenticate and request a user that we can use to connect to the database.

$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' ${VAULT_ADDR}/v1/auth/cert/login -X POST 2>/dev/null | json_pp
{
   "auth" : {
      "policies" : [
         "default",
         "demo"
      ],
      "accessor" : "2e6d4b95-3bf5-f459-cd27-f9e35b9bed16",
      "renewable" : true,
      "lease_duration" : 3600,
      "metadata" : {
         "common_name" : "thisisnotme@myfirstdomain.com",
         "cert_name" : "demo",
         "authority_key_id" : "",
         "subject_key_id" : ""
      },
      "client_token" : "018e6feb-65c4-49f2-ae30-e4fbba81e687"
   },
   "lease_id" : "",
   "wrap_info" : null,
   "renewable" : false,
   "data" : null,
   "request_id" : "f00fe669-4382-3f33-23ae-73cec0d02f39",
   "warnings" : null,
   "lease_duration" : 0
}

$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 018e6feb-65c4-49f2-ae30-e4fbba81e687' ${VAULT_ADDR}/v1/mysql/creds/readonly -X GET 2>/dev/null | json_pp
{
   "errors" : [
      "permission denied"
   ]
}

Oh, what happened? Well, remember the policy that we created earlier? We hadn’t allowed access to the MySQL role generator, so we need to update and apply the policy.

(vault)$ cat <<EOF | vault policy-write demo /dev/stdin
path "sys/*" {
  policy = "deny"
}

path "secret/mysql/*" {
  policy = "read"
  capabilities = ["list", "sudo"]
}

path "mysql/creds/readonly" {
  policy = "read"
  capabilities = ["list", "sudo"]
}
EOF
Policy 'demo' written.

Now that we have updated the policy to allow access to the readonly role (requests go via mysql/creds when requesting access) we can check that the policy has applied and whether we get a user account for MySQL.

# Request a user account
$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 018e6feb-65c4-49f2-ae30-e4fbba81e687' ${VAULT_ADDR}/v1/mysql/creds/readonly -X GET 2>/dev/null | json_pp
{
   "request_id" : "7b45c9a1-bc46-f410-7af2-18c8e91f43de",
   "lease_id" : "mysql/creds/readonly/c661426c-c739-5bdb-cb7a-f51f74e16634",
   "warnings" : null,
   "lease_duration" : 3600,
   "data" : {
      "password" : "099c8f2e-588d-80be-1e4c-3c2e20756ab4",
      "username" : "read-cert-401f2c"
   },
   "wrap_info" : null,
   "renewable" : true,
   "auth" : null
}

# Test MySQL access
$ mysql -h localhost -u read-cert-401f2c -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.7.14-8-log Percona Server (GPL), Release '8', Revision '1f84ccd'

Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show grants;
+-----------------------------------------------+
| Grants for read-cert-401f2c@%                 |
+-----------------------------------------------+
| GRANT SELECT ON *.* TO 'read-cert-401f2c'@'%' |
+-----------------------------------------------+
1 row in set (0.00 sec)

# Display the full account information
$ pt-show-grants --only='read-cert-401f2c'@'%'
-- Grants dumped by pt-show-grants
-- Dumped from server Localhost via UNIX socket, MySQL 5.7.14-8-log at 2016-11-08 23:28:37
-- Grants for 'read-cert-401f2c'@'%'
CREATE USER IF NOT EXISTS 'read-cert-401f2c'@'%';
ALTER USER 'read-cert-401f2c'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*FF157E33408E1FBE707B5FF89C87A2D14E8430C2' REQUIRE NONE PASSWORD EXPIRE INTERVAL 1 DAY ACCOUNT UNLOCK;
GRANT SELECT ON *.* TO 'read-cert-401f2c'@'%';

Hurrah! Now we don’t even need to go and create a user; the application can get one when it needs one. We’ve made the account auto-expire so that the credentials are only valid for 1 day, regardless of Vault expiration, and we’ve also reduced the amount of time that the token is valid, so we’ve done a pretty good job of limiting the window of opportunity for any rogue activity.
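And if anything ever does look suspicious, the dynamic credentials can be cut off early rather than waiting for the lease or password expiration. A sketch using the prefix form of revocation:

# Revoke every outstanding lease issued from the readonly role;
# Vault drops the MySQL accounts it created as the leases are revoked
(vault)$ vault revoke -prefix mysql/creds/readonly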

We’ve covered quite a lot in this post, some detail for which has been left out to keep us on track. The online documentation for OpenSSL, Let’s Encrypt and Vault are pretty good, so you should be able to take a deeper dive should you wish to. Hopefully, this post has given a good enough introduction to Vault to get you interested and looking to test it out, as well as bringing the great Let’s Encrypt service to your attention so that there’s very little reason to not provide a secure online experience for your readers, customers and services.

MongoDB Through a MySQL Lens

This blog post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.

Delving into NoSQL coming from an exclusively SQL background can seem like a daunting task. I have worked with SQL in both small MySQL environments and large Oracle SQL environments. 

When is it a good choice?

MongoDB is an incredibly robust, scalable, and operator-friendly database solution. MongoDB is a good choice when your developers will also be responsible for the database environment; in small shops and startups, this might be the case. MongoDB stores information in BSON (binary JSON), and uses JSON (JavaScript Object Notation) syntax to query and retrieve the information stored as BSON on the back end. JSON is easily relatable to other programming languages, and many developers will already have experience with it.

MongoDB is also a good option when you expect a great deal of your traffic to be writes. This is not to say that MySQL does not have good options when dealing with write-heavy environments, but MongoDB handles this with relative ease. Facebook designed the RocksDB storage engine for write-heavy environments, and it performs well (with benchmark testing demonstrating this).

MongoDB is a good choice when you need a schemaless, or schema-flexible, data structure. MongoDB handles changes to your data organization with relative ease and grace. This is the selling point of NoSQL solutions. There have been many improvements in the MySQL world that make online schema changes possible, but the ease at which this is done in MongoDB has not yet been matched. The ability to create records without defining structure gives MongoDB added flexibility.

Another reason to choose MongoDB is its functionality with regard to replication setup, built-in sharding, and automatic elections. Setting up a replicated environment in MongoDB is easy, and the automatic election process allows a secondary to take over in the event of a primary database failure. Built-in sharding allows for easy horizontal scaling, which can be more complicated to manage, set up and configure in a MySQL environment.
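To give a flavour of how little is involved, here is a minimal sketch (the hostnames, paths and replica set name are made up for illustration):

# On each of the three hosts, start mongod with the same replica set name
mongod --replSet rs0 --dbpath /data/db --fork --logpath /var/log/mongodb/rs0.log
# Then, from one of them, initiate the set
mongo --eval 'rs.initiate({_id: "rs0", members: [
  {_id: 0, host: "db1:27017"}, {_id: 1, host: "db2:27017"}, {_id: 2, host: "db3:27017"}]})'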

When should you choose something else?

MongoDB is a great choice for some use cases. It is also not a great choice for others. MongoDB might not be the right choice when your data is highly relational and structured. MongoDB does not support multi-document transactions, but there is atomicity at the document level. There are configuration considerations to make for a replicated environment with regard to write concern, but these come at the cost of performance. Write concern verifies that replicas have written the information. By default, MongoDB requests acknowledgment from the primary only, not from the replicas. This can lead to consistency issues if there is a problem with the replicas.
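For example, a single write can request acknowledgment from a majority of replica set members, trading latency for a stronger durability guarantee (a sketch using the mongo shell; the database and collection names are illustrative):

mongo --eval 'db.getSiblingDB("test").orders.insert(
  {item: "abc", qty: 1},
  {writeConcern: {w: "majority", wtimeout: 5000}})'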

How is the structure different?

Many concepts in the SQL world are relatable to the document structure of MongoDB. Let’s take a look at the high-level structure of a simple MongoDB environment to better understand how MongoDB is laid out.

The chart below relates MySQL terms to their MongoDB counterparts (a version of it can be found in MongoDB’s documentation):

  • database → database
  • table → collection
  • row → document
  • column → field
  • index → index
  • table joins → embedded documents and linking
  • primary key → the _id field
Another interesting note is the mongod process. This is a daemon that processes data requests, much the same as the mysqld process for MySQL. This is the process that listens for MongoDB requests, and manages access to the database. As with MySQL, there are a number of start-up options for the mongod process. One of the most important configuration options is --config which specifies a config file to use for your mongod instance. Slightly different from MySQL, this file uses YAML formatting. Below is an example config file for MongoDB. Please note this is to demonstrate formatting. It isn’t optimized for any production database.

By definition, MongoDB is a document store database. This chart gives you some idea of how that relates to the structure of MySQL or any SQL flavor. Instead of building a table and adding data, you can immediately insert documents into a collection without having to define a structure. This is one of the advantages in flexibility that MongoDB offers over MySQL. It is important to note that just because MongoDB offers this flexibility does not mean that organizing a highly functional production MongoDB database is effortless. Similar to choosing any database, thought should be put into the structure and goal of the database to avoid pitfalls down the line.

# mongod.conf, Percona Server for MongoDB
# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
  engine: rocksdb

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

processManagement:
  fork: true
  pidFilePath: /var/run/mongod.pid

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1

NOTE: YAML formatting does not handle tab. Use spaces to indent.

How is querying different?

Interacting with the database via the shell also offers something slightly different from SQL: MongoDB is queried using JSON. Again, this should be familiar to web developers, which is one of the appeals of using MongoDB. Below is an example of a query translated from SQL to MongoDB. We have a user table with just usernames and an associated ID.

In SQL:

select username from user where id = 2;

In MongoDB:

db.user.find({_id:2},{"username":1})

In the JSON format, we specify the user collection to query from, then the ID associated with the document we are interested in, and finally the field from which we want the value. The result of this query would be the username of the user that has an ID of 2.

Final thoughts

MongoDB is not a silver bullet for your MySQL woes. As both databases continue to evolve, their weaknesses and strengths slowly start to blend together. Do not let the flexibility of MongoDB’s structure fool you into thinking that you do not need a plan for your database environment. That will surely lead to headaches down the road. The flexibility should allow for dynamic, fast changes, not thoughtless ones. I encourage any MySQL user to get their hands on a MongoDB instance for testing purposes. MongoDB is a popular option in the e-commerce and gaming world because of its flexibility in schema design and its ability to scale horizontally with large amounts of data.

MySQL Connector/Node.js 1.0.5 m4 development has been released

MySQL Connector/Node.js is a new Node.js driver for use with the X DevAPI. This release, v1.0.5 M4, is the fourth development release of the MySQL Connector/Node.js 1.0 series.

The X DevAPI enables application developers to write code that combines the strengths of the relational and document models using a modern, NoSQL-like syntax that does not assume previous experience writing traditional SQL.

To learn more about how to write applications using the X DevAPI, see
http://dev.mysql.com/doc/x-devapi-userguide/en/index.html. For more information about how the X DevAPI is implemented in MySQL Connector/Node.js, and its usage, see http://dev.mysql.com/doc/dev/connector-nodejs.

Note

Please note that the X DevAPI requires MySQL Server version 5.7.12 or higher with the X Plugin enabled. For general documentation about how to get started using MySQL as a document store, see
http://dev.mysql.com/doc/refman/5.7/en/document-store.html.

Functionality Added or Changed

  • Added APIs for Transaction handling.
  • Added a Table creation API.

Enjoy and thanks for the support!

On behalf of the MySQL/Oracle Release Engineering Team
Piotr Obrzut

Bug fix – Error -1 from storage engine while creating table with non-existing datadir on MySQL version 5.6

According to this bug report http://bugs.mysql.com/bug.php?id=79151

when a client tries to create a new table with a non-existent datadir (or one to which write permission is denied), they get an error message like this: ERROR 1030 (HY000): Got error -1 from storage engine
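A rough sketch of how to reproduce this on 5.6 (the path is deliberately non-existent, and innodb_file_per_table is assumed to be enabled so that the DATA DIRECTORY clause takes effect):

$ mysql -e "CREATE TABLE test.t1 (id INT PRIMARY KEY) ENGINE=InnoDB DATA DIRECTORY='/nonexistent/dir';"
# fails with the unhelpful "ERROR 1030 (HY000): Got error -1 from storage engine" shown above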

As you can see, this is not an informative error message: we don't know what went wrong on the backend. In version 5.7 it seems this problem has already been resolved.

I discovered that, at a low level, the InnoDB storage engine tries to create the subdirectories and, when a problem occurs, it only partly checks the system error, in the file storage/innobase/fil/fil0fil.cc:

success = os_file_create_subdirs_if_needed(path);

The os_file_create_subdirs_if_needed function in the storage/innobase/os/os0file.cc file checks system errors but returns only a boolean value depending on the result:

/* Creates all missing subdirectories along the given path.
@return TRUE if call succeeded, FALSE otherwise */
UNIV_INTERN
ibool
os_file_create_subdirs_if_needed(
/*=============================*/
	const char*	path)	/*!< in: path name */
{
	if (srv_read_only_mode) {

		ib_logf(IB_LOG_LEVEL_ERROR,
			"read only mode set. Can't create subdirectories '%s'",
			path);

		return(FALSE);
	}

	char*	subdir = os_file_dirname(path);

	if (strlen(subdir) == 1
	    && (*subdir == OS_FILE_PATH_SEPARATOR || *subdir == '.')) {
		/* subdir is root or cwd, nothing to do */
		mem_free(subdir);

		return(TRUE);
	}

	/* Test if subdir exists */
	os_file_type_t	type;
	ibool		subdir_exists;
	ibool		success = os_file_status(subdir, &subdir_exists, &type);

	if (success && !subdir_exists) {

		/* subdir does not exist, create it */
		success = os_file_create_subdirs_if_needed(subdir);

		if (!success) {
			mem_free(subdir);

			return(FALSE);
		}

		success = os_file_create_directory(subdir, FALSE);
	}

	mem_free(subdir);

	return(success);
}

Depending on the value of the boolean variable success, the code in fil0fil.cc sends the user this unclear message. For this reason I added a few lines of code to fil0fil.cc to resolve the problem:

// check whether success is false and the global errno variable has a value
if (!success && errno) {
	// send the client a more specific message including the system error number
	my_error(ER_GET_ERRNO, MYF(0), errno);

	mem_free(path);

	return(DB_ERROR);
}


MySQL Support Engineer's Chronicles, Issue #3

The original idea of this series was to publish one post per week, but it seems every other week I have some special topic that well deserves a dedicated post. Last week I had no time to complete my writing because of a long (and, I hope, useful) howto post on replacing a corrupted partition using a non-corrupted one from another server in a replication setup. But I had links and notes collected in a draft that I am going to complete now.

First of all, during the previous week I had time to submit two more talks for the "MySQL and Friends Devroom" at FOSDEM 2017, "Applying profilers to MySQL" and "Background Story of a MySQL Bug". The call for papers is still open, as far as I understand, so I may come up with a few more ideas on what to talk about.

Strange to admit this, but sometimes I can be excited about something MySQL-related. Two weeks ago I added a note to myself about the great, detailed changelogs that MariaDB publishes, with proper links to GitHub commits. Check this one for 10.0.28, for example. I wish Oracle provided the same level of detail for MySQL releases as part of its official release notes.

Still, sometimes important changes happen in upstream MySQL, get merged, and details about the inherited incompatible change (and its real impact) are still missing in any release notes of any kind. Let's consider a recent example. Historically MySQL treated incorrect utf8 byte sequences differently for INSERT than for LOAD DATA: INSERT failed, while LOAD DATA just had the value truncated at the first incorrect character and continued. Eventually (in MySQL 5.6.32) this was fixed by upstream MySQL (it was also fixed in a wider context in MariaDB 10.2 as part of MDEV-9823). The MySQL 5.6.32 release notes say about the incompatible change:
  • "Incompatible Change: For multibyte character sets, LOAD DATA could fail to allocate space correctly and ignore input rows as a result. (Bug #76237, Bug #20683959, Bug #23080148)
    References: This issue is a regression of: Bug #14653594."
But it surely says nothing about the impact for replication or about another bug, Bug #78758, that is fixed in 5.6.32 (by making LOAD DATA fail with an error). It cost me some time to figure out all the details. Read MDEV-11217 for the historical details, a nice test case and different views on the decisions made. Note also that the error message about the bad utf8 character from LOAD DATA now looks (IMHO) weird enough, as it actually contains only the valid part of the string. See my MDEV-11216 about this.
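To make the historical difference concrete, here is a rough sketch (assuming strict SQL mode, a utf8 column and local_infile enabled; the pre-5.6.32 behaviour is described in the comments):

$ mysql test -e "CREATE TABLE t (c VARCHAR(10) CHARACTER SET utf8);"
# An INSERT of an invalid utf8 byte sequence fails outright under strict SQL mode
$ mysql test -e "INSERT INTO t VALUES (UNHEX('FF'));"
# ...whereas, before the fix, LOAD DATA silently truncated the value at the first bad byte and carried on
$ printf 'abc\xffdef\n' > /tmp/bad.txt
$ mysql test -e "LOAD DATA LOCAL INFILE '/tmp/bad.txt' INTO TABLE t; SHOW WARNINGS;"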

I am still having fun with InnoDB locks. This week I checked what locks are set when the same DELETE (for a table with a primary key and a unique secondary index) is executed twice in the same transaction. Check Bug #83640 and tell me how this sequence of locks set in one transaction may make any sense:
---TRANSACTION 636201, ACTIVE 202 sec
5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 1
MySQL thread id 1, OS thread handle 0x7f9e513a7700, query id 92 localhost root init
show engine innodb status
TABLE LOCK table `test`.`tu` trx id 636201 lock mode IX
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 3 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks gap before rec

My colleague Jan Lindström was also surprised, so we have MDEV-11215 as well, and a chance to see this studied and maybe changed by MariaDB engineers. Related problems were discussed in the past, see Bug #19762 and Bug #55717.
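For reference, here is a minimal sketch (not taken from the bug report itself) of the kind of scenario that produces a lock list like the one above; run the transaction interactively and check SHOW ENGINE INNODB STATUS while it is still open (the per-record lock details may require innodb_status_output_locks=ON):

$ mysql test -e "CREATE TABLE tu (id INT PRIMARY KEY, u INT, UNIQUE KEY u (u)) ENGINE=InnoDB; INSERT INTO tu VALUES (1, 1), (2, 2);"
# then, in an interactive mysql session:
#   BEGIN;
#   DELETE FROM tu WHERE u = 1;
#   DELETE FROM tu WHERE u = 1;
#   SHOW ENGINE INNODB STATUS\G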

Some days I keep wondering what is happening to XtraBackup these days at Percona. As far as I remember I was not able to reproduce lp:1461833 while working there, but the fact that the bug is still open and has received no comments since I re-opened it gives me no additional confidence.

I report bugs and missing details in MySQL way too often, even for my own liking. But the MySQL manual really misses many details needed to explain results that users see in production. This week I'd like to remind you about one of my bug reports about missing details in the MySQL documentation, Bug #77390, and my request there:
"Please, explain all metadata and InnoDB locks set by online ALTER, with examples and details enough to explain non-trivial cases..." Honestly, until this month I never noticed that a DDL log exists in MySQL. It is a binary file, ddl_log.log, that can be "dumped" into a somewhat readable form using a script by Mattias Jonsson from Bug #47343:
[openxs@fc23 5.7]$ perl ~/ddl_log_dump.pl data/ddl_log.log
Header: Num entries: 4 Name length: 512 Block length 4096
Entry 1 type i action s next 0
  name ./test/trange2
not 'e' entry (i)
Entry 2 type l action d next 0
  name ./test/#sql-trange2
not 'e' entry (l)
Entry 3 type l action d next 2
  name ./test/trange2#P#pmax
not 'e' entry (l)
This file may grow until MySQL server restart completes, but what's worse, when it grows over 4GB in size it becomes unusable and effectively blocks any further concurrent DDL until we get rid of it. I had a lot of fun reading the code and reporting related Bug #83708. Unlucky users who do a lot of partitioning-related DDL may find the situation less funny when they hit this bug.


I plan to describe what I had to work on this week soon, while I still remember all the relevant details and feelings. So, stay tuned!

Testing MySQL 8

MySQL 8 is now available for testing. The list of features is impressive and I am sure many PHP developers will be interested in having a true data dictionary, invisible indexes, and more. Right now it is a milestone release. I am still doing a lot of work with 5.7 and found a great way to have 8 around when I need it by using Docker. Follow the Docker install guide for your platform of choice and then run:

docker run --name mysql8 -e MYSQL_ROOT_PASSWORD=hidave -d mysql:8.0

Checking to See if it Really Works

docker ps -a
CONTAINER ID  IMAGE      COMMAND                 CREATED             STATUS             PORTS     NAMES
ae1824fc01b5  mysql:8.0  "docker-entrypoint.sh"  About a minute ago  Up About a minute  3306/tcp  mysql8

Note that our instance of MySQL 8 has the container id of ae1824fc01b5, and odds are your container id will be anything but ae1824fc01b5.

Connect to the Container

docker exec -it ae1824fc01b5 bash

Start up the MySQL command line interface:

# mysql -u root -p

Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 8.0.0-dmr MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
MySQL 8 is running, but there is no data loaded. To be able to do that from the host box, we will need to talk to the MySQL container, and to do that we need its IP address:

ip a show eth0
6: eth0@if7: mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.2/16 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe11:2/64 scope link
valid_lft forever preferred_lft forever
Exit the container. Now you can load your data; I used the world_x test database:

mysql -u root -p -h 172.17.0.2

Test!
#!/usr/bin/php
<?php
$mysqli = new mysqli("172.17.0.2", "root", "hidave", "world_x");
if ($mysqli->connect_errno) {
echo "Failed to connect to MySQL: (" . $mysqli->connect_errno . ") " . $mysqli->connect_error;
}

$res = $mysqli->query("SELECT Name FROM city ORDER BY Name limit 7");

echo "City names...\n";
for ($row_no = $res->num_rows - 1; $row_no >= 0; $row_no--) {
$res->data_seek($row_no);
$row = $res->fetch_assoc();
echo " Name = " . $row['Name'] . "\n";
}

Now for the first data out of MySQL 8 in the container:

shell> php test.php
City names...
Name = Abakan
Name = Abaetetuba
Name = Abadan
Name = Aba
Name = Aalborg
Name = Aachen
Name = A Coruña (La Coruña)
So the test environment is set up. Onto the tests of MySQL 8.

Is Your Query Cache Really Disabled?

This blog post was motivated by an internal discussion about how to fully disable query cache in MySQL.

According to the manual, we should be able to disable the “Query Cache” on the fly by changing query_cache_type to 0, but as we will show this is not fully true. This blog will show you how to properly disable the query cache, and how common practices might not be as good as we think.

Can we just disable it by changing variables, or does it require a restart to avoid the global mutex? Let’s see how it works.

Some Query Cache context

The query cache stores the text of a “Select” statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.

But cacheable queries take out an “exclusive lock” on MySQL’s query cache. In addition, any insert, update, delete or other modification to a table causes any relevant entries in the query cache to be flushed. If you see many sessions in the “Waiting for query cache lock” state in the processlist, you might be suffering from this exclusive lock. In this blog post, you can see how this global mutex can cause performance degradation under high concurrency.
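A quick way to check whether the cache is being used at all, and whether sessions are piling up on that lock, is to look at the Qcache status counters and the processlist (a simple sketch):

$ mysql -e "SHOW GLOBAL STATUS LIKE 'Qcache%';"
$ mysql -e "SHOW FULL PROCESSLIST;" | grep -c 'Waiting for query cache lock'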

If we are facing this situation, how can we disable it?

Disabling Query Cache

There are two options that you can change: query_cache_type and query_cache_size.

So if we change query_cache_size to “0”, does that mean the cache is disabled? Or do we also have to change query_cache_type? Or both? And does MySQL require a restart to avoid the global mutex?

The source code shows us this:

int Query_cache::send_result_to_client(THD *thd, const LEX_CSTRING &sql)
{
  ulonglong engine_data;
  Query_cache_query *query;
#ifndef EMBEDDED_LIBRARY
  Query_cache_block *first_result_block;
#endif
  Query_cache_block *result_block;
  Query_cache_block_table *block_table, *block_table_end;
  char *cache_key= NULL;
  size_t tot_length;
  Query_cache_query_flags flags;
  DBUG_ENTER("Query_cache::send_result_to_client");

  /*
    Testing 'query_cache_size' without a lock here is safe: the thing
    we may loose is that the query won't be served from cache, but we
    save on mutex locking in the case when query cache is disabled.

    See also a note on double-check locking usage above.
  */
  if (is_disabled() || thd->locked_tables_mode ||
      thd->variables.query_cache_type == 0 || query_cache_size == 0)
    goto err;
...

MySQL is going to check whether the query cache is enabled before it locks it. It checks four conditions, and if any one of them is true the cache is skipped. The last three are fairly obvious, but what is the is_disabled() function? Following the source code, we find the following in sql_cache.h:

void disable_query_cache(void) { m_query_cache_is_disabled= TRUE; }
...
bool is_disabled(void) { return m_query_cache_is_disabled; }

sql_cache.cc

void Query_cache::init()
{
  DBUG_ENTER("Query_cache::init");
  mysql_mutex_init(key_structure_guard_mutex,
                   &structure_guard_mutex, MY_MUTEX_INIT_FAST);
  mysql_cond_init(key_COND_cache_status_changed,
                  &COND_cache_status_changed);
  m_cache_lock_status= Query_cache::UNLOCKED;
  initialized = 1;
  /*
    If we explicitly turn off query cache from the command line query
    cache will be disabled for the reminder of the server life time.
    This is because we want to avoid locking the QC specific mutex if
    query cache isn't going to be used.
  */
  if (global_system_variables.query_cache_type == 0)
    query_cache.disable_query_cache();
  DBUG_VOID_RETURN;
}

If the global_system_variables.query_cache_type == 0 condition is true, it calls the disable_query_cache function, which sets m_query_cache_is_disabled = True, so is_disabled() will return “True”. That means that setting query_cache_type to 0 at runtime should eliminate the global mutex. Let’s run some tests to confirm this and see if the global mutex disappears after changing query_cache_type to 0.

Running tests

Context on the tests:

  • We ran simple OLTP tests using sysbench as follows:

sysbench --test="/usr/share/doc/sysbench/tests/db/oltp.lua" --report-interval=1 --max-time=120 --oltp-read-only=off --max-requests=0 --num-threads=4 --oltp-table-size=2000000 --mysql-host=localhost --mysql-db=test --db-driver=mysql --mysql-user=root run

  • Important portion of the my.cnf file:

query_cache_type = 1
query_cache_limit = 1M
query_cache_size = 1G
performance_schema_instrument='wait/synch/%Query_cache%=COUNTED'
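
With that performance_schema instrument enabled, a simple way to watch the query cache mutex between runs (a sketch, not part of the original test script) is:

-- Show how often the query cache locks have been taken so far;
-- the instrument names match the %Query_cache% pattern enabled in my.cnf
SELECT EVENT_NAME, COUNT_STAR
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE '%Query_cache%';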

So basically the tests were run for two minutes each while playing with query_cache_type and query_cache_size.
  1. Started MySQL with query_cache_type = 1 and query_cache_size = 1G.
  2. Changed query_cache_type to 0. As we can see, nothing changed: MySQL is still using the query cache.
  3. But when we stopped sysbench and started it again (closing and opening new connections), we could see there were no more inserts going into the query cache. We still see queries counted as “Not Cached”, which means changing query_cache_type applies only to new connections, and we can still see some mutex activity.
  4. Restarted MySQL with query_cache_type = 0 and query_cache_size = 0. Finally the query cache is disabled and all the mutex activity disappeared.
  5. Restarted MySQL with the query cache enabled.
  6. Changed query_cache_size to 0, and it almost worked: we could disable the query cache on the fly, but as we can see there is still some mutex activity.
  7. Changing query_cache_type to 0 and restarting sysbench does not have any effect on the mutex.

So the only way to stop any activity around the query cache is to restart MySQL with query_cache_type = 0 and query_cache_size = 0. Disabling it or even setting it to “0” at runtime does not completely stop the mutex activity.

But why do we still need query_cache_size while in theory query_cache_type should be enough?

As referenced above, the manual says if query_cache_type = 0:

Do not cache results in or retrieve results from the query cache. Note that this does not deallocate the query cache buffer. To do that, you should set query_cache_size to 0.

Based on our test, if we change query_cache_type to 0, it still hits the cache.

So you might think “well, I don’t enable the query cache and use defaults to keep it disabled.” Keep reading, because you might be wrong. According to the manual, starting from 5.6.8 query_cache_type=0 is set by default, but query_cache_size=1048576 (1MB). This means that if we keep the default configuration, we will still see activity in the query cache, as follows:

mysql -e "show global status like 'qca%';"
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1031320 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 0       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 423294  |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 1       |
+-------------------------+---------+

But if we just add query_cache_size=0 to my.cnf and check again (of course after restarting the server):

mysql -e "show global status like 'qca%';"
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+

We finally get no query cache related activity at all. How much overhead does this cause? We’re not entirely sure because we didn’t run benchmarks for it, but we prefer to see no activity when we don’t expect any.
Now we’re wondering whether this case warrants a bug report. Stay tuned, we will publish the results soon.

Digging more code

Let’s have a look at the store_query function. MySQL uses this function to store queries in the query cache. If we read the code, we can find this:

if (thd->locked_tables_mode || query_cache_size == 0)
  DBUG_VOID_RETURN;

It only checks query_cache_size; it does not check the type. store_query is called in handle_query, which also does not check query_cache_type.

Conclusion

There is some contradiction between checking the query cache and storing the data in the query cache, which needs further investigation. But as we can see, it is not possible to fully disable the query cache on the fly by changing query_cache_type and/or query_cache_size to 0. Based on the code and the tests, if you want to make sure the query cache is fully disabled, change both query_cache_size and query_cache_type to 0 and restart MySQL.

It is a known fact that the query cache can be a big point of contention. We are not trying to benchmark the performance overhead here, since that mostly depends on the workload type. However, we can still see some overhead if the query cache is not fully disabled when MySQL is started.

Log Buffer #494: A Carnival of the Vanities for DBAs

This Log Buffer Edition covers Oracle, SQL Server and MySQL.

Oracle:

Declarative REST Client Feign with Spring Boot

Sometimes it is necessary to put on your uber-geek hat and start using cryptic bits of code to retrieve information from an Oracle database.

Deployment of Oracle Applications and Database with OpenStack Murano

A current question on the OTN database forum asks: “What’s the difference between object and tablespace reorganization?” Here’s an analogy to address the question.

Bulk inserts in MongoDB

SQL Server:

TempDB growth due to version store on AlwaysOn secondary server

Power BI for Data Modelling

SSIS Package documentor

SQL Server Database Provisioning

When Stored Procedures Go Bad

MySQL:

Ready for the holiday shopping season? 20 tips to prepare your MariaDB database environment for Black Friday and Cyber Monday!

MySQL Cluster and real-time requirements

2-Phase Commit in NDBCluster

High Availability on a Shoestring Budget – Deploying a Minimal Two Node MySQL Galera Cluster

Separation of Query Server and Data Server

 

Database Daily Ops Series: GTID Replication

This post discusses ways of fixing broken GTID replication.

This blog series is all about the daily stories we have in Managed Services, dealing with customers’ environments (mostly when we need to quickly restore a service level within the SLA time).

One of the issues we encounter daily is replication using the GTID protocol. While there are a lot of blogs written about this subject, I would like to just highlight GTID replication operations, and the way you can deal with broken replication.

Most of the time we face way more complex scenarios than the one I’m about to present as an example, but the main goal of this blog is to quickly highlight the tools that can be used to fix issues and resume replication.

After reading this blog, you might ask yourself “Now, we know how to fix replication, but what about consistency?” The next blog will be entirely focused on that matter, data consistency!

Little less talk, little more action…

Replication is broken, and the SHOW SLAVE STATUS command output looks like below:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000005
          Read_Master_Log_Pos: 879
               Relay_Log_File: mysqld-relay-bin.000009
                Relay_Log_Pos: 736
        Relay_Master_Log_File: mysqld-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1062
                   Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 634
              Relay_Log_Space: 1155
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1062
               Last_SQL_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
                  Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 161108 16:47:53
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
                Auto_Position: 1
1 row in set (0.00 sec)
When a slave configured to replicate using the GTID protocol breaks, pay attention to the SHOW SLAVE STATUS command output. You will find the Retrieved_Gtid_Set and Executed_Gtid_Set in the listed columns. You can see that the last global transaction ID retrieved from the current master was not executed (it’s going to appear on the Retrieved_Gtid_Set but not on the Executed_Gtid_Set, following the GTID format).

That means that the slave has retrieved a transaction that, for some reason, it couldn’t execute. That’s the global transaction ID you need if you want to inject a fake transaction and get replication resumed. The fake transaction you inject takes the place of the one whose SQL cannot be executed due to the error shown in Last_Error from SHOW SLAVE STATUS.
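
For the sets shown above, you can let the server compute that gap for you with GTID_SUBTRACT, the same function used again in the mysqlslavetrx section below:

-- Subtract the executed set from the retrieved set for the problematic master UUID;
-- whatever is left is the transaction the slave pulled but could not apply
SELECT GTID_SUBTRACT(
  '46fdb7ad-5852-11e6-92c9-0800274fb806:1-4',
  '46fdb7ad-5852-11e6-92c9-0800274fb806:1-3'
) AS not_yet_executed;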

Let’s analyze it:

#: replication is broken due to error 1062: the primary key of a particular table is violated
                   Last_Errno: 1062
                   Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'

#: here you can identify the global transaction ID with problems, i.e. the one breaking the replication stream
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
As shown above, we can see clearly that the transaction causing issues is global transaction ID number 4, coming from the master with UUID 46fdb7ad-5852-11e6-92c9-0800274fb806. You can use SHOW RELAYLOG EVENTS to verify that it is that transaction’s query that is causing problems:

mysql> show relaylog events in 'mysqld-relay-bin.000009' from 736\G
*************************** 1. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 736
 Event_type: Gtid
  Server_id: 3
End_log_pos: 682
       Info: SET @@SESSION.GTID_NEXT= '46fdb7ad-5852-11e6-92c9-0800274fb806:4'
*************************** 2. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 784
 Event_type: Query
  Server_id: 3
End_log_pos: 755
       Info: BEGIN
*************************** 3. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 857
 Event_type: Query
  Server_id: 3
End_log_pos: 848
       Info: insert into wb.t1 set i=1
*************************** 4. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 950
 Event_type: Xid
  Server_id: 3
End_log_pos: 879
       Info: COMMIT /* xid=66 */
4 rows in set (0.00 sec)
Before fixing and resuming the replication stream, we need to check why that INSERT query breaks replication. Let’s SELECT the data and check the structure of table wb.t1:

mysql> select * from wb.t1;
+---+
| i |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> show create table wb.t1;
+-------+----------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                   |
+-------+----------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `i` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
It’s clear that something is wrong somewhere other than just the database. It’s time to fix and resume replication, and catch up with the master’s data. For GTID replication, you can use the tools below:
  • pt-slave-restart
  • mysqlslavetrx
  • inject a fake/empty transaction

pt-slave-restart

One of the easiest ways of resuming replication on slaves when replication is broken is using pt-slave-restart, which is part of Percona Toolkit. Once you find the above facts (mainly the master UUID of the problematic global transaction ID that broke slave replication), you can move forward using pt-slave-restart with the GTID flag --master-uuid. This passes the master’s UUID and skips all global transactions breaking replication on a specific slave server, as you can see below:

[root@dbops02 ~]# pt-slave-restart --master-uuid 46fdb7ad-5852-11e6-92c9-0800274fb806 --host=localhost -u root
2016-11-08T17:24:09 h=localhost,u=root mysqld-relay-bin.000009         736 1062
2016-11-08T17:24:25 h=localhost,u=root mysqld-relay-bin.000010         491 1062
2016-11-08T17:24:34 h=localhost,u=root mysqld-relay-bin.000010         736 1062
2016-11-08T17:24:35 h=localhost,u=root mysqld-relay-bin.000010         981 1062
2016-11-08T17:24:36 h=localhost,u=root mysqld-relay-bin.000010        1226 1062
With the resources provided by pt-slave-restart, together with the above info, replication should resume. If you don’t have the Percona Toolkit package setup on your servers, make sure you follow these steps. It’s easier if you add the Percona Repository to your servers (you can use the Package Manager to install it for Debian-based and for RedHat-based systems).

mysqlslavetrx

To use mysqlslavetrx (which is part of MySQL Utilities, developed by Oracle), I recommend you read the article written by Daniel Guzman and install MySQL Utilities on your database servers. Using it to skip problematic transactions and inject fake ones is pretty straightforward as well.

So, find the below on the slave side:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-8,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
Then use GTID_SUBTRACT, passing as the first parameter the full set you find in Retrieved_Gtid_Set and as the second the subset you grab from the Executed_Gtid_Set. Use the UUID of the direct master in the function to find the global transaction IDs that were retrieved but not executed on the slave:

#: the function below is pretty cool and will show you exactly which GTIDs the master has
#: that the slave is lacking (and vice-versa) - this helps when using mysqlslavetrx to bring
#: master and slave to a consistent state with the same binary log contents
mysql> SELECT GTID_SUBTRACT('46fdb7ad-5852-11e6-92c9-0800274fb806:1-13','46fdb7ad-5852-11e6-92c9-0800274fb806:1-8') gap;
*************************** 1. row ***************************
gap: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13
1 row in set (0.00 sec)
Now we can use mysqlslavetrx to insert fake transactions on the slave to resume replication, as below:

[root@dbops02 mysql-utilities-1.6.2]# mysqlslavetrx --gtid-set=46fdb7ad-5852-11e6-92c9-0800274fb806:9-13 --verbose --slaves=wb@localhost:3306
WARNING: Using a password on the command line interface can be insecure.
#
# GTID set to be skipped for each server:
# - localhost@3306: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13
#
# Injecting empty transactions for 'localhost:3306'...
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:9
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:10
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:11
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:12
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:13
#
#...done.
#

When you get back to the MySQL client on the slave, you’ll see that the retrieved and executed sets from SHOW SLAVE STATUS now point to the same position:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
Afterwards, make sure you start the slave (mysqlslavetrx won’t start replication for you the way the previous tool does).

Inject a Fake Transaction

Fake transactions are also called empty transactions. If a global transaction is breaking a slave, you can inject an empty transaction with that GTID, one that won’t affect data, to resume replication and carry on processing the data streaming from the master, a.k.a. replication! We need to make sure this is not going to affect future slaves, especially if this server becomes a new master after a failover/switchover process. You can get more information about Errant Transactions here and here.

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)

mysql> set gtid_next='46fdb7ad-5852-11e6-92c9-0800274fb806:14';
Query OK, 0 rows affected (0.00 sec)

mysql> begin; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> set gtid_next=automatic;
Query OK, 0 rows affected (0.00 sec)

Now, when you check retrieved and executed out of SHOW SLAVE STATUS, you can see the below:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
It’s time to start the slave (and be happy)!

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000005
          Read_Master_Log_Pos: 3329
               Relay_Log_File: mysqld-relay-bin.000011
                Relay_Log_Pos: 491
        Relay_Master_Log_File: mysqld-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3329
              Relay_Log_Space: 3486
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
                  Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
                Auto_Position: 1
1 row in set (0.00 sec)
Cheers!

Testing and verifying your MySQL backup strategy presentation

From the IAOUG Gold Coast OTN Day, Ronald Bradford gave a presentation on “Testing and Verifying your MySQL Backup Strategy”. Details in this presentation included:


Download PDF Presentation
  • Product options
    • mysqldump
    • mysqlpump
    • mydumper
    • Xtrabackup
    • MySQL Enterprise Backup
    • LVM/SAN Snapshot
    • Filesystem copy
  • Binary log backup options
  • B&R Strategy considerations
    • Time to backup
    • Time to restore
    • Consistency
    • Flexibility
    • Partial Capabilities
    • Cost
  • Technical Requirements
  • Testing & Verification
  • Using Failover

More information about mysqlpump available in MySQL 5.7 can be found at Introducing mysqlpump, playing with mysqlpump and mysqlpump reference manual.

Thanks to Pythian and more4apps for hosting the event.

Orchestrator and ProxySQL

In this blog post, I am going to show you how you can use Orchestrator and ProxySQL together.

In my previous blog post, I showed how to use bash scripts and move virtual IPs with Orchestrator. As in that post, I assume you already have Orchestrator working. If not, you can find the installation steps here.

In the case of a failover, Orchestrator changes the MySQL topology and promotes a new master. But who lets the application know about this change? This is where ProxySQL helps us.

ProxySQL

You can find the ProxySQL install steps here. In our test, we use the following topology:

For this topology, we need the following rules in ProxySQL:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.106',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.105',601,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.105',600,3306,1000,0);
INSERT INTO mysql_replication_hostgroups VALUES (600,601,'');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
insert into mysql_query_rules (username,destination_hostgroup,active) values('testuser_w',600,1);
insert into mysql_query_rules (username,destination_hostgroup,active) values('testuser_r',601,1);
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_w','Testpass1.',1,600,'test',1);
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_r','Testpass1.',1,601,'test',1);
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_rw','Testpass1.',1,600,'test',1);
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

See the connection pool:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.105 | 3306     | ONLINE | 4        | 0        | 4      | 0       | 2833    | 224351          | 0               | 3242       |
| 601       | 192.168.56.107 | 3306     | ONLINE | 1        | 1        | 11     | 0       | 275443  | 11785750        | 766914785       | 431        |
| 601       | 192.168.56.106 | 3306     | ONLINE | 1        | 1        | 10     | 0       | 262509  | 11182777        | 712120599       | 1343       |
| 601       | 192.168.56.105 | 3306     | ONLINE | 1        | 1        | 2      | 0       | 40598   | 1733059         | 111830195       | 3242       |
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

It shows us that “192.168.56.105” is in hostgroup 600, which means that server is the master.

How does ProxySQL decide who the new master is?

ProxySQL does not know what the topology looks like, which is really important. ProxySQL is monitoring the “read_only” variables on the MySQL servers, and the server where read_only=off is going to get the writes. If the old master went down and we changed our topology, we have to change the read_only variables on the new master. Of course, applications like MHA or Orchestrator can do that for us.
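
As an illustration (a sketch, not from the original post), after a failover you can verify the flag on the promoted MySQL server and then confirm from the ProxySQL admin interface which backend ended up in the writer hostgroup:

-- On the promoted MySQL server: confirm it now accepts writes
SELECT @@global.read_only;

-- On the ProxySQL admin interface (port 6032): see the current runtime layout
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id;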

We have two possibilities here: the master went down, or we want to promote a new master.

Master is down

If the master goes down, Orchestrator is going to change the topology and set read_only = OFF on the promoted master. ProxySQL is going to realize the master went down and send the write traffic to the server where read_only=OFF.

Let’s do a test. After we stopped MySQL on “192.168.56.105”, Orchestrator promoted “192.168.56.106” as the new master. ProxySQL is using it now as a master:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.106 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 790        |
| 601       | 192.168.56.107 | 3306     | ONLINE  | 0        | 0        | 13     | 0       | 277953  | 11894400        | 774312665       | 445        |
| 601       | 192.168.56.106 | 3306     | ONLINE  | 0        | 0        | 10     | 0       | 265056  | 11290802        | 718935768       | 790        |
| 601       | 192.168.56.105 | 3306     | SHUNNED | 0        | 0        | 2      | 0       | 42961   | 1833016         | 117959313       | 355        |
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

This happens quickly and does not require any application, VIP or DNS modification.

Promoting a new Master

When we perform a graceful-master-takeover with Orchestrator, it promotes a slave as a new master, removes the old master from the replicaset and sets read_only=ON.

From Orchestrator’s point of view, this is great. It promoted a slave as a new master, and the old master is no longer part of the replicaset. But as I mentioned earlier, ProxySQL does not know what the replicaset looks like.

It only knows that we changed the read_only variables on some servers. It is going to keep sending reads to the old master, which no longer has up-to-date data. This is not good at all.

We have two options to avoid this.

Remove master from read hostgroup

If the master is not part of the read hostgroup, ProxySQL won’t send any traffic there after we promote a new master. But in this case, if we lose the slaves, ProxySQL cannot redirect the reads to the master. If we have a lot of slaves, and replication stopped on the slaves because of an error or mistake, the master probably won’t be able to handle all the read traffic. But if we only have a few slaves, it would be good if the master could also handle reads when there is an issue on the slaves.
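
For reference, removing the master from the read hostgroup in this example topology would look roughly like the following, based on the rules defined earlier (192.168.56.105 is the master here):

-- Remove the master from the reader hostgroup (601) so it only receives writes via 600
DELETE FROM mysql_servers WHERE hostname = '192.168.56.105' AND hostgroup_id = 601;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;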

Using Scheduler

In this great blog post from Marco Tusa, we can see that ProxySQL can use “Schedulers”. We can use the same idea here as well. I wrote a script based on Marco’s that can recognize if the old master is no longer a part of the replicaset.

The script checks the following:

  • read_only=ON – the server is read-only (on the slave servers, this has to be ON)
  • repl_lag  is NULL – on the master, this should be NULL (if the seconds_behind_master is not defined, ProxySQL will report repl_lag is NULL)

If read_only=ON, the server is not the master at the moment. But if repl_lag is NULL as well, the server is not replicating from anywhere, so it probably was a master. It has to be removed from the hostgroup.

Adding a Scheduler

INSERT INTO scheduler (id,interval_ms,filename,arg1)
VALUES (10,2000,"/var/lib/proxysql/server_monitor.pl","-u=admin -p=admin -h=127.0.0.1 -G=601 -P=6032 --debug=0 --log=/var/lib/proxysql/server_check");
LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK;

The script has parameters like username, password or port. But we also have to define the read Hostgroup (-G).

Let’s see what happens with ProxySQL after we run the command orchestrator -c graceful-master-takeover -i rep1 -d rep2:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.106 | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 504        |
| 601       | 192.168.56.107 | 3306     | ONLINE       | 0        | 2        | 2      | 0       | 6784    | 238075          | 2175559         | 454        |
| 601       | 192.168.56.106 | 3306     | ONLINE       | 0        | 0        | 2      | 0       | 6761    | 237409          | 2147005         | 504        |
| 601       | 192.168.56.105 | 3306     | OFFLINE_HARD | 0        | 0        | 2      | 0       | 6170    | 216001          | 0               | 435        |
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

As we can see, the status changed to OFFLINE_HARD:

mysql> select * from mysql_servers;
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname       | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 601          | 192.168.56.107 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
| 601          | 192.168.56.106 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
| 9601         | 192.168.56.105 | 3306 | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              |         |
| 600          | 192.168.56.106 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.00 sec)

This is because we changed the “hostgroup_id” to 9601, which is what we wanted so that the old master won’t get any more traffic.

Conclusion

Because ProxySQL redirects the traffic based on the read_only variables, it is important to start the servers with read_only=ON (even on the master). In that case, we can avoid getting writes on many servers at the same time.

If we want to use graceful-master-takeover with Orchestrator, we have to use a scheduler that can remove the old master from the read hostgroup.

Announcing Galera Cluster 5.5.53 and 5.6.34 with Galera 3.19 with improvements to cluster restart

Codership is pleased to announce a new release of Galera Cluster for MySQL consisting of MySQL-wsrep 5.6.34 and Galera 3.19, wsrep API version 25.

This release includes enhancements to make cluster restarts faster and safer, which we will be blogging about over the coming days. It also incorporates all changes up to MySQL 5.6.34, including security fixes.

This and future releases will be available from http://www.galeracluster.com.
The source repositories and bug tracking are now on http://www.github.com/codership.

New features and notable fixes in the Galera replication library since last binary release by Codership (3.18):

  • A new wsrep provider option, gcache.recover, enables gcache to be recovered at restart. This allows the restarted node to serve IST to any other nodes that are also starting, which may speed up whole-cluster restarts.
  • An additional protection against restarting a cluster using a node that may not have all the transactions has been implemented. Unless a node was the last one to be shut down from the previous cluster, Galera will refuse to bootstrap from it until explicit action is taken by the operator.
  • Galera would refuse to start on PPC64EL (lp1633226, GAL-418)
  • Galera would crash or enter an infinite loop in the asio library (GAL-416)
  • It was not possible to set pc.wait_prim and pc.wait_prim_timeout options using wsrep_provider_options (GAL-360, codership/galera#360)
  • For security reasons, SSLv2, SSLv3, TLS 1.0 will no longer be negotiated during SSL handshake (codership/galera#428)

Notable bug fixes in MySQL-wsrep:

  • The ALTER USER statement is now replicated correctly (MW-305, codership/mysql-wsrep#292, lp1376269)
  • The wsrep_max_ws_rows could cause SELECT queries that use temporary tables to return an error (MW-309, lp1638138)

New features, notable changes and bug fixes in Oracle MySQL 5.6.34:

  • Incompatible change: The behavior of the –secure_file_priv option has changed and additional checks are performed on the value at server startup. The default value on most platforms has been changed to /var/lib/mysql-files. That directory is created at package installation time.
  • The yaSSL library has been upgraded to 2.4.2.

Galera Cluster is available as targeted packages and package repositories for a number of Linux distributions, including Ubuntu, Debian, Fedora, CentOS, RHEL, OpenSUSE and SLES. Obtaining packages using a package repository removes the need to download individual files and facilitates the deployment and upgrade of Galera nodes.

Known issues with this release:

  • If using the Ubuntu 16.04 Xenial package, the server cannot be bootstrapped using systemd. Please use the SysV init script with the ‘bootstrap’ option to bootstrap the node. Note that a server that has been started that way cannot be controlled via systemd and must be stopped using the SysV script. Normal server startup and shutdown is possible via systemd.

Checking if a Slave Has Applied a Transaction from the Master

In this blog post, we will discuss how we can verify if an application transaction executed on the master has been applied to the slaves.

In summary, it is a good practice to alleviate the load on the master by doing reads on slaves. In most cases it is acceptable to simply connect to the slaves and issue selects. But there are some cases where we need to ensure that the data we just wrote on the master has been applied on the slaves before we query it.

One way to do this is to use the built-in function MASTER_POS_WAIT. This function receives a binary log name and position. It will block the query until the slave applies transactions up to that point, or until it times out. Here is one example of how to use it:

-- insert our data on master
master [localhost] {msandbox} (test) > INSERT INTO test VALUES ();
Query OK, 1 row affected (0.00 sec)

-- get the binlog file and position from master
master [localhost] {msandbox} (test) > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 |     1591 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

-- connect on slave and run MASTER_POS_WAIT passing the binlog name and position
slave [localhost] {msandbox} ((none)) > SELECT NOW(); SELECT MASTER_POS_WAIT('mysql-bin.000005', 1591); SELECT NOW();
+---------------------+
| NOW()               |
+---------------------+
| 2016-10-20 18:24:24 |
+---------------------+
1 row in set (0.00 sec)

-- it will wait until the slave applies up to that point
+-------------------------------------------+
| MASTER_POS_WAIT('mysql-bin.000005', 1591) |
+-------------------------------------------+
|                                         1 |
+-------------------------------------------+
1 row in set (3.82 sec)

+---------------------+
| NOW()               |
+---------------------+
| 2016-10-20 18:24:28 |
+---------------------+
1 row in set (0.00 sec)

Blocking the connection until the slave is in sync with the coordinates passed as parameters to MASTER_POS_WAIT might not be acceptable for all applications, however.
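
One detail worth noting (not shown in the example above) is that MASTER_POS_WAIT also accepts a timeout in seconds and returns -1 when it expires, so the application can give up instead of blocking indefinitely:

-- Wait at most 5 seconds for the slave to reach the master coordinates;
-- returns -1 on timeout, NULL if the slave SQL thread is not running
SELECT MASTER_POS_WAIT('mysql-bin.000005', 1591, 5);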

As an alternative, MySQL 5.6+ can make use of the relay_log_info_repository configuration. If we set this variable to TABLE, MySQL stores the slave status information in the slave_relay_log_info table under the mysql database. We must also configure the sync_relay_log_info variable and set it to 1 in case we use non-transactional tables such as MyISAM; it forces slave_relay_log_info to sync after each statement. So edit my.cnf on the slaves:

relay_log_info_repository=TABLE
sync_relay_log_info=1

Now we can query slave_relay_log_info directly to see if the slave we are connected to has already applied the transaction we need:

master [localhost] {msandbox} (test) > INSERT INTO test VALUES (NULL);
Query OK, 1 row affected (0.00 sec)

master [localhost] {msandbox} (test) > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 |      366 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} ((none)) > SELECT COUNT(*) FROM mysql.slave_relay_log_info WHERE (Master_log_name > 'mysql-bin.000003') OR ( Master_log_name = 'mysql-bin.000003' AND Master_log_pos >= '366' );
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

Conclusion

You can use relay_log_info_repository as a replacement for MASTER_POS_WAIT to check if a slave has applied a particular transaction. Since it won’t block your thread (in case the slave is not in sync), you will be able to either abort the operation or disconnect and move to the next slave.

Ready for the holiday shopping season? 20 tips to prepare your MariaDB database environment for Black Friday and Cyber Monday!

james_mclaurin_g, Mon, 11/07/2016 - 23:41

Setting up a database environment for the expected loads that Black Friday or Cyber Monday brings can be a tricky situation. However, the suggestions in this blog center on the principles of Scalability, Capacity, Performance and High Availability (HA). For this post, I define each principle as:

  • Scalability is the ability to add capacity by adding resources

  • Capacity is the ability to handle load

  • Performance is tuning your database environment for increased capacity and scale

  • High Availability is serving requests with good performance

This blog post covers how to tune system variables in the MariaDB database environment for capacity, scalability and high availability.

Before deploying, do not accept these suggestions blindly. Each MariaDB environment is unique and requires additional thought before making any changes.  You will most likely need to adjust these settings for your specific use case and environment.

Things you need to know:

  • The MariaDB configuration file is located at /etc/my.cnf. Every time you modify this file, you will need to restart the MySQL service so the new changes can take effect.

Black Friday and Cyber Monday Tuning Recommendations

1. InnoDB Buffer Pool Size

The InnoDB buffer pool should generally be set to 60-80% of the available RAM when using InnoDB exclusively. Ideally, all InnoDB data and indexes should be able to fit within memory or at least the working data set.
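
A rough sanity check (a sketch, not from the original checklist) is to compare the configured buffer pool with the total size of your InnoDB data and indexes:

-- Configured buffer pool size in GB
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

-- Total InnoDB data + index size in GB (the working set may be smaller)
SELECT SUM(data_length + index_length) / 1024 / 1024 / 1024 AS innodb_data_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';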

More information:

2. InnoDB Logs

There are two general suggestions for InnoDB log file sizing: Set combined total size of InnoDB log files greater than 25-50% of the InnoDB buffer pool size or set combined InnoDB log file log size equal to one hour’s worth of log entries during peak load. Larger log files can lead to slower recovery in the event of a server crash. However, they also reduce the number of checkpoints needed and reduce disk I/O.

Evaluate the size of one hour’s worth of binary logs under operational load, then decide whether to increase the size of the InnoDB log files or not.
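
An alternative way to estimate this (a sketch, based on redo log activity rather than the binary logs mentioned above) is to sample Innodb_os_log_written over a known interval during peak load:

-- Sample the amount of redo written, wait a while, then sample again;
-- the delta scaled up to one hour is a reasonable target for the combined log size
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
-- ... wait e.g. 60 seconds during peak load, then run it again and compare

-- Combined log size = innodb_log_file_size * innodb_log_files_in_group
SHOW GLOBAL VARIABLES LIKE 'innodb_log_file%';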

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_log_file_size

3. InnoDB Log Buffer Size

A larger InnoDB log buffer size means less disk I/O for larger transactions. It is suggested to set this to 64M on all servers.

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_log_buffer_size

4. InnoDB Log Flush Interval

The innodb_flush_log_at_trx_commit variable controls when flushing of the log buffer to disk occurs. innodb_flush_log_at_trx_commit = 1 (default) flushes the log buffer to disk at each transaction commit. This is the safest, but also the least performant.  

innodb_flush_log_at_trx_commit = 0 flushes the log buffer to disk every second, but nothing on transaction commit. Up to one second of transactions (possibly more due to process scheduling) could be lost. If either MySQL or the server crashes, you can lose data. This is the fastest, but least safe option.

innodb_flush_log_at_trx_commit = 2 writes the log buffer out to file on each commit but flushes to disk every second. If the disk cache has a battery backup (for instance a battery backed cache raid controller) this is generally the best balance of performance and safety. A crash of MySQL should not lose data. A server crash or power outage could lose up to a second (possibly more due to process scheduling). A battery backed cache reduces this possibility.

It is suggested to use the first option for safety.

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_flush_log_at_trx_commit

5. InnoDB IO Capacity

innodb_io_capacity should be set to approximately the maximum number of IOPS the underlying storage can handle.

This was set to 400, and has been increased to 1000 in the new configuration. It is suggested to benchmark the storage to determine whether this value can be increased further.

More information:

  • https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_io_capacity

6. Thread Cache Size

It is suggested to monitor the value of Threads_created. If it continues increasing at more than a few threads per minute, increase the value of thread_cache_size.

The thread cache size is set to 200 in the new configuration.
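
To judge whether the cache is large enough, you can compare thread creation against total connections; thread_cache_size is dynamic, so it can be raised without a restart (a sketch):

-- If Threads_created keeps climbing relative to Connections,
-- the thread cache is too small
SHOW GLOBAL STATUS LIKE 'Threads_created';
SHOW GLOBAL STATUS LIKE 'Connections';

-- Apply the new value at runtime
SET GLOBAL thread_cache_size = 200;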

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#thread_cache_size

7. Table Cache, Table Definition Cache

The table_open_cache and table_definition_cache variables control the number of tables and definitions to keep open for all threads.

Monitor Open_tables, Open_table_definitions, Opened_tables, and Opened_table_definitions to determine the best value. The general suggestion is to set table_open_cache (and subsequently table_definition_cache) only high enough to reduce the rate of increase of the Opened_tables (and Opened_table_definitions, respectively) status value.

Both table open cache and definition cache have been set to 2048 in the new configuration.
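
The status counters mentioned above can be checked like this (a sketch):

-- Compare currently open vs. cumulatively opened tables and definitions;
-- a fast-growing Opened_tables suggests table_open_cache is too small
SHOW GLOBAL STATUS LIKE 'Open%table%';
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL VARIABLES LIKE 'table_definition_cache';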

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#table_open_cache

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#table_definition_cache

8. Query Cache

Generally, if the query cache hit rate is below 50%, it should be investigated whether a performance benefit is being seen or not by having query cache enabled. There is an overhead with query cache for each query.

The query cache is currently disabled. Due to the nature of the application and the ratio of writes to reads, it is unlikely the query cache will offer any performance improvements and could negatively impact performance.

More information:

  • https://mariadb.com/kb/en/mariadb/query-cache/

9. Temporary Tables, tmp_table_size, & max_heap_table_size

MySQL uses the lower of max_heap_table_size and tmp_table_size to limit the size of temporary tables in memory. These are per client variables. While having this value large can help reduce the number of temporary tables created on disk, it also raises the risk of reaching the server's memory capacity since this is per client. Generally 32M to 64M is the suggested value to begin with for both variables and tune as needed.

Temporary tables are often used for GROUP BY, ORDER BY, DISTINCT, UNION, sub queries, etc. Ideally, MySQL should create these in memory, and as few on disk as possible.

It is important to note that queries not using joins appropriately and creating large temporary tables can be a cause for higher number of temporary tables on disk. Another reason is the memory storage engine uses fixed length columns and assumes worst case scenario. If columns are not sized correctly (for example, a VARCHAR(255) for a short string), this influences the size of the table in memory and can cause it to go to disk earlier than it should. Also, temporary tables with blob and text columns will immediately go to disk as the memory storage engine does not support them.

Both have been set to 64M in the new configuration.
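
To see how often temporary tables actually spill to disk, compare the related status counters (a sketch):

-- Ratio of on-disk temporary tables to all implicit temporary tables
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Current per-session limits
SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';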

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#tmp_table_size

10. Warning Log Level

It is suggested to set this to log_warnings = 2. Doing so logs information about aborted connections and access-denied errors.

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#log_warnings

  • http://www.chriscalender.com/what-exactly-does-log_warnings2-log/

11. Max Connections

Determine an appropriate value for max_connections and change it. A recommended starting value would be 500; adjust up or down as needed by monitoring the Max_used_connections status variable.
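
For example (a sketch), check the high-water mark and apply the starting value at runtime:

-- Highest number of simultaneous connections seen since startup
SHOW GLOBAL STATUS LIKE 'Max_used_connections';

-- max_connections is dynamic, so the starting value can be applied without a restart
SET GLOBAL max_connections = 500;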

More information:

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#max_connections

12. Transaction Isolation

It is suggested to investigate the available transaction isolation levels, and determine the best transaction isolation for this server’s use case.

More information:

  • https://mariadb.com/kb/en/mariadb/set-transaction/

  • https://mariadb.com/kb/en/mariadb/server-system-variables/#tx_isolation

  • http://karlssonondatabases.blogspot.com/2012/08/the-real-differences-betweenread.html

  • https://www.facebook.com/notes/mysql-at-facebook/repeatable-read-versus-readcommitted-for-innodb/244956410932

13. Binary Log Format

It is recommended to use ROW binary log format for master-master replication.

More information:

14. Auto Increment Offsets

To help reduce the chances of collision between two masters being written to simultaneously, the auto_increment_increment and auto_increment_offset values need to be adjusted accordingly.

15. Sync Binlog

By default, flushing of the binlog to disk is handled by the OS. In the event of a server crash, it is possible to lose transactions from the binary log leading to replication being out sync. Setting sync_binlog = 1 causes the binlog file to be flushed on every commit.

This is slower, but the safest option.
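
A minimal sketch of applying and verifying this setting at runtime:

-- Flush the binary log to disk on every commit (safest, slowest)
SET GLOBAL sync_binlog = 1;
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';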

More information:

16. Crash Safe(r) Slaves

To help avoid replication errors after a slave crash, enable relay log recover and the syncing of the relay log and relay log info files to disk.

More information:

17. Log Slave Updates

To have chained replication (master -> slave-> slave), log slave updates needs to be enabled. This tells a slave to write replicated transactions to its own binary log, so that they can then be replicated to slaves off of it.

18. Read Only Slaves

Slaves should be read only to avoid data accidentally being written to them.

Note: Users with super privilege can still write when the server is read only.

19. Slave Net Timeout

The slave_net_timeout variable is the number of seconds the slave will wait for a packet from the master before trying to reconnect. The default is 3600 (1 hour). This means if the link goes down and isn’t detected, it could be up to an hour before the slave reconnects. This could lead to the slave suddenly being up to an hour behind the master.

It is suggested to set slave_net_timeout to a more reasonable value such as 30 or 60.
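
For example (a sketch), on each slave:

-- Reconnect after 60 seconds without traffic from the master instead of the 1-hour default
SET GLOBAL slave_net_timeout = 60;
SHOW GLOBAL VARIABLES LIKE 'slave_net_timeout';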

More information:

20. Learn More

Join our webinar on Thursday, November 10 at 10am PST and 10am CET on last minute preparations for peak traffic periods like Black Friday and Cyber Monday.

Register now for the 10am PST webinar

Register now for the 10am CET webinar

With all the excitement surrounding the perfect Thanksgiving meal, it's often easy to overlook how to prep your database environment for the biggest online shopping day of the year! Each year, more and more shoppers opt for online holiday deals, instead of the more traditional mall experience, which means that retailers must prepare for multiple days of high online traffic to their e-commerce sites. This year you’ll be prepared as I’m passing along a few tips to tune your database environment for some of the biggest online holiday shopping days - Black Friday and Cyber Monday!

Login or Register to post comments

Ready for the holiday shopping season? 20 tips to prepare your MariaDB database environment for Black Friday and Cyber Monday!

Mon, 2016-11-07 19:56James McLaurin

With all the excitement surrounding the perfect Thanksgiving meal, it's often easy to overlook how to prep your database environment for the biggest online shopping day of the year! Each year, more and more shoppers opt for online holiday deals, instead of the more traditional mall experience, which means that retailers must prepare for multiple days of high online traffic to their e-commerce sites. This year you’ll be prepared as I’m passing along a few tips to tune your database environment for some of the biggest online holiday shopping days - Black Friday and Cyber Monday!

Setting up a database environment for the expected loads that Black Friday or Cyber Monday brings can be a tricky situation.  However, the suggestions in this blog centers on the principles of Scalability, Capacity, Performance and High Availability (HA).  For this post, I define each principle as:

  • Scalability is the ability to add capacity by adding resources
  • Capacity is the ability to handle load
  • Performance is tuning your database environment for increased capacity and scale
  • High Availability is serving requests with good performance

This blog post covers how to tune system variable in the MariaDB database environment for capacity, scalability and high availability.  

Before deploying, do not accept these suggestions blindly. Each MariaDB environment is unique and requires additional thought before making any changes.  You will most likely need to adjust these settings for your specific use case and environment.

Things you need to know:

MariaDB configuration file is located in /etc/my.cnf. Every time you modify this file you will need to restart the MySQL service so the new changes can take effect.

Black Friday and Cyber Monday Tuning Recommendations

1. InnoDB Buffer Pool Size

The InnoDB buffer pool should generally be set to 60-80% of the available RAM when using InnoDB exclusively. Ideally, all InnoDB data and indexes should be able to fit within memory or at least the working data set.

More information:

https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#...

2. InnoDB Logs

There are two general suggestions for InnoDB log file sizing: Set combined total size of InnoDB log files greater than 25-50% of the InnoDB buffer pool size or set combined InnoDB log file log size equal to one hour’s worth of log entries during peak load. Larger log files can lead to slower recovery in the event of a server crash. However, they also reduce the number of checkpoints needed and reduce disk I/O.

Evaluate the size of one hour’s worth of binary logs under operational load, then decide whether to increase the size of the InnoDB log files or not.

More information:

https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#...

3. InnoDB Log Buffer Size

A larger InnoDB log buffer size means less disk I/O for larger transactions. It is suggested to set this to 64M on all servers.

More information:

https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#...

4. InnoDB Log Flush Interval

The innodb_flush_log_at_trx_commit variable controls when flushing of the log buffer to disk occurs. innodb_flush_log_at_trx_commit = 1 (default) flushes the log buffer to disk at each transaction commit. This is the safest, but also the least performant.  

innodb_flush_log_at_trx_commit = 0 flushes the log buffer to disk every second, but nothing on transaction commit. Up to one second (possibly more due to process scheduling) could be lost. If there’s any crash, MySQL or the server, can lose data. This is the fastest, but least safe option.

innodb_flush_log_at_trx_commit = 2 writes the log buffer out to file on each commit but flushes to disk every second. If the disk cache has a battery backup (for instance a battery backed cache raid controller) this is generally the best balance of performance and safety. A crash of MySQL should not lose data. A server crash or power outage could lose up to a second (possibly more due to process scheduling). A battery backed cache reduces this possibility.

It is suggested to use the first option for safety.

More information:

https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#...

5. InnoDB IO Capacity

innodb_io_capacity should be set to approximately the maximum number of IOPS the underlying storage can handle.

This was set to 400, and has been increased to 1000 in the new configuration. It is suggested to benchmark the storage to determine whether this value can be increased further.

More information:

https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#...

6. Thread Cache Size

It is suggested to monitor the value of Threads_created. If it continues increasing at more than a few threads per minute, increase the value of thread_cache_size.

The thread cache size is set to 200 in the new configuration.
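
In my.cnf terms:

    [mysqld]
    # Increase if the Threads_created status variable keeps climbing
    thread_cache_size = 200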

More information:

https://mariadb.com/kb/en/mariadb/server-system-variables/#thread_cache_...

7. Table Cache, Table Definition Cache

The table_open_cache and table_definition_cache variables control the number of tables and table definitions to keep open for all threads.

Monitor Open_tables, Open_table_definitions, Opened_tables, and Opened_table_definitions to determine the best values. The general suggestion is to set table_open_cache (and, correspondingly, table_definition_cache) only high enough to reduce the rate of increase of the Opened_tables (and Opened_table_definitions) status variables.

Both table open cache and definition cache have been set to 2048 in the new configuration.
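
The new configuration described above corresponds to:

    [mysqld]
    table_open_cache = 2048
    table_definition_cache = 2048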

More information:

https://mariadb.com/kb/en/mariadb/server-system-variables/#table_open_cache

https://mariadb.com/kb/en/mariadb/server-system-variables/#table_definit...

8. Query Cache

Generally, if the query cache hit rate is below 50%, investigate whether enabling the query cache actually provides a performance benefit, since the query cache adds overhead to every query.

The query cache is currently disabled. Due to the nature of the application and the ratio of writes to reads, it is unlikely the query cache will offer any performance improvements and could negatively impact performance.
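
To keep the query cache explicitly disabled, the my.cnf entries would be:

    [mysqld]
    query_cache_type = 0
    query_cache_size = 0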

More information:

https://mariadb.com/kb/en/mariadb/query-cache/

9. Temporary Tables, tmp_table_size, & max_heap_table_size

MySQL uses the lower of max_heap_table_size and tmp_table_size to limit the size of in-memory temporary tables. These are per-client variables. While a large value can help reduce the number of temporary tables created on disk, it also raises the risk of exhausting the server’s memory, since the limit applies per client. Generally, 32M to 64M is a suggested starting value for both variables; tune as needed.

Temporary tables are often used for GROUP BY, ORDER BY, DISTINCT, UNION, subqueries, and so on. Ideally, MySQL should create these in memory, with as few as possible going to disk.

It is important to note that queries that do not use joins appropriately and create large temporary tables can be one cause of a higher number of temporary tables on disk. Another is that the MEMORY storage engine uses fixed-length columns and assumes the worst-case scenario. If columns are not sized correctly (for example, a VARCHAR(255) for a short string), this inflates the size of the in-memory table and can cause it to spill to disk earlier than it should. Also, temporary tables with BLOB or TEXT columns immediately go to disk, as the MEMORY storage engine does not support those types.

Both have been set to 64M in the new configuration.
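
In my.cnf:

    [mysqld]
    tmp_table_size = 64M
    max_heap_table_size = 64M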

More information:

https://mariadb.com/kb/en/mariadb/server-system-variables/#tmp_table_size

10. Warning Log Level

It is suggested to set this to log_warnings = 2. Doing so logs information about aborted connections and access-denied errors.
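
In my.cnf:

    [mysqld]
    # 2 also logs aborted connections and access-denied errors
    log_warnings = 2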

More information:

https://mariadb.com/kb/en/mariadb/server-system-variables/#log_warnings

http://www.chriscalender.com/what-exactly-does-log_warnings2-log/

11. Max Connections

Determine an appropriate value for max_connections and change it. A recommended starting value is 500; adjust up or down as needed by monitoring the Max_used_connections status variable.
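
As a starting point:

    [mysqld]
    # Adjust based on the Max_used_connections status variable
    max_connections = 500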

More information:

https://mariadb.com/kb/en/mariadb/server-system-variables/#max_connections

12. Transaction Isolation

It is suggested to investigate the available transaction isolation levels, and determine the best transaction isolation for this server’s use case.
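
Once a level has been chosen, it can be set server-wide in my.cnf. READ-COMMITTED below is only a placeholder showing the syntax, not a recommendation for this workload:

    [mysqld]
    transaction-isolation = READ-COMMITTED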

More information:

https://mariadb.com/kb/en/mariadb/set-transaction/

https://mariadb.com/kb/en/mariadb/server-system-variables/#tx_isolation

http://karlssonondatabases.blogspot.com/2012/08/the-real-differences-bet...

https://www.facebook.com/notes/mysql-at-facebook/repeatable-read-versus-...

13. Binary Log Format

It is recommended to use ROW binary log format for master-master replication.
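
In my.cnf:

    [mysqld]
    binlog_format = ROW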

More information:

https://mariadb.com/kb/en/mariadb/binary-log-formats/

14. Auto Increment Offsets

To help reduce the chance of collisions between two masters being written to simultaneously, the auto_increment_increment and auto_increment_offset values need to be adjusted accordingly.
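
As a sketch for a hypothetical two-master setup, each master uses the same increment but a different offset:

    [mysqld]
    # Master 1
    auto_increment_increment = 2
    auto_increment_offset = 1
    # Master 2 would use auto_increment_offset = 2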

15. Sync Binlog

By default, flushing of the binlog to disk is handled by the OS. In the event of a server crash, it is possible to lose transactions from the binary log, leading to replication being out of sync. Setting sync_binlog = 1 causes the binlog file to be flushed on every commit.

This is slower, but the safest option.
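
In my.cnf:

    [mysqld]
    # Flush the binary log to disk on every commit
    sync_binlog = 1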

More information:

https://mariadb.com/kb/en/mariadb/replication-and-binary-log-server-syst...

16. Crash Safe(r) Slaves

To help avoid replication errors after a slave crash, enable relay log recovery and the syncing of the relay log and relay log info files to disk.
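
A sketch of the relevant my.cnf settings on the slave:

    [mysqld]
    relay_log_recovery = 1
    sync_relay_log = 1
    sync_relay_log_info = 1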

More information:

https://mariadb.com/kb/en/mariadb/replication-and-binary-log-server-syst...

17. Log Slave Updates

To have chained replication (master -> slave -> slave), log_slave_updates needs to be enabled. This tells a slave to write replicated transactions to its own binary log so that they can then be replicated to slaves downstream of it.

18. Read Only Slaves

Slaves should be read only to avoid data accidentally being written to them.

Note: Users with super privilege can still write when the server is read only.

19. Slave Net Timeout

The slave_net_timeout variable is the number of seconds the slave will wait for a packet from the master before trying to reconnect. The default is 3600 (1 hour). This means if the link goes down and isn’t detected, it could be up to an hour before the slave reconnects. This could lead to the slave suddenly being up to an hour behind the master.

It is suggested to set slave_net_timeout to a more reasonable value such as 30 or 60.
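
Putting sections 17-19 together, a slave’s my.cnf might include (values illustrative):

    [mysqld]
    log_slave_updates = 1
    read_only = 1
    # Reconnect after 60 seconds without a packet from the master
    slave_net_timeout = 60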

More information:

https://mariadb.com/kb/en/mariadb/replication-and-binary-log-server-syst...

20. Learn More

Join our webinar on Thursday, November 10 at 10am PST and 10am CET on last minute preparations for peak traffic periods like Black Friday and Cyber Monday.

Register now for the 10am PST webinar

Register now for the 10am CET webinar


MySQL Cluster and real-time requirements

This blog gives some background to the decisions made when designing the
storage engine NDB Cluster used in MySQL Cluster around how to support
real-time requirements (or as I sometime refer to it, predictable response
time requirements).

Requirement analysis

When analysing the requirements for NDB Cluster based on its usage in telecom databases, two things were important. The first requirement is that we need to be able to respond to queries within a few milliseconds (today even down to tens of microseconds). The second requirement is that we need to do this while at the same time supporting a mix of simple traffic queries combined with a number of more complex queries running at the same time.

The first requirement was the main requirement that led to NDB Cluster using a
main memory storage model with durability on disk using a REDO log and
various checkpoints. Today we also support storing non-indexed columns on
disk in combination with columns stored in main memory.

Potential solutions

The second requirement was a bit harder to handle. In an extremely large environment with many CPUs it can be solved by letting the traffic queries and the management queries run on different CPUs. This model will, however, not work at all in a confined environment with only 1-2 CPUs, and it is hard to put to work even in a large environment, since the load from management queries comes and goes quickly.

The next potential solution is to simply leave the problem to the OS. Modern operating systems use a time-sharing model, but each time quantum is fairly long compared to our requirement of responding within parts of a millisecond, so this model won’t work very well either.

Yet another possibility would be to use a real-time operating system, but this would
marginalise the product too much.

Most DBMSs today rely on the OS to handle response time requirements. As an example, if one uses MySQL/InnoDB and sends it a mix of queries, some traffic queries and some management queries, MySQL will use a different thread for each query. MySQL will deliver good throughput even with highly varying workloads, since the OS uses time-sharing to split the CPU fairly among the threads. However, it will not be able to meet response time requirements of parts of a millisecond under a mixed load of simple and complex queries.

AXE VM

So when designing NDB Cluster we wanted to avoid this problem. NDB was designed within Ericsson, where a real-time telecom switch, the AXE, had been developed in the 70s. The AXE is still in popular use today and new versions of it are still being developed. AXE had a solution to this problem, built around a message passing machine.

I spent a good deal of the 90s developing a virtual machine for AXE called AXE VM, which later turned into a real product called APZ VM (APZ is the name of the CPU subsystem in the AXE). This virtual machine was able to execute on any machine. The AXE VM used a model where execution was handled as the execution of signals. A signal is simply a message; it contains an address label, a signal number, and data of various sizes. A signal is executed inside a block. A block is a self-contained module: it owns all its data, and the only way to get at the data in a block is by sending a signal to that block.

So effectively the AXE VM implemented a real-time operating system inside a normal
operating system such as Windows, Linux, Solaris or Mac OS X.

The AXE VM also contained a lot of handling for the language used in AXE, called PLEX. This is no longer present in NDB, but NDB is still implemented using signals and blocks. The blocks are implemented in C++; in AXE VM it was possible to have such blocks, called simulated blocks. In NDB, all blocks are nowadays simulated blocks.

How does NDB solve the real-time problem

So how does this model enable response times down to parts of a millisecond, even in a highly loaded system? First of all, it is important to state that NDB does handle this. We have very demanding customers in the telecom, networking and financial sectors, and lately also in the storage world, who expect to run complex transactions involving tens of different key lookups and scan queries, and who expect these transactions to complete within a few milliseconds even at 90-95% load in the system.

As an example, in the financial sector missing the deadline might mean that you miss the opportunity to buy or sell some stock equity in real-time trading. In the telecom sector, your telephone call setup and other telco services depend on immediate responses to complex transactions.

At the same time, these systems also need to be able to analyse the data in real time. Those analytical queries obviously have less demanding response time requirements, but they are not allowed to impact the response time of the traffic queries.

The virtual machine model implements this by using a design technique where each
signal is only allowed to execute for a few microseconds. A typical key lookup
query in modern CPUs takes less than two microseconds to execute. Scanning a table
is divided up into scanning a few rows at a time where each such scan takes less
than ten microseconds. All other maintenance work to handle restarts, node failures,
aborts, creating new tables and so forth is similarly implemented with the same
requirements on signal execution.

So what this means is that a typical traffic transaction is normally handled by one key lookup or a short scan query, after which the response is sent back to the API node. A transaction consists of a number of such interactions, normally on the order of tens of queries. This means that each interaction needs to complete within 100-200 microseconds in order to deliver response times of a few milliseconds for the transaction.

NDB can meet this response time requirement even when 20-30 messages are queued up ahead of a given message, since each message takes only on the order of 1-2 microseconds to execute. Most of the time is still spent in the transporter layer, sending and receiving the messages.

A complex query will execute in this model by being split into many small signal
executions. Each time a signal is completed it will put itself back into the queue
of signals and wait for its next turn.

So traffic queries will always be able to meet strict response time requirements. Another nice property of this model is that it adapts to varying workloads within a few microseconds. If there are currently no traffic queries to execute, a complex query gets the CPU to itself, since its next signal will execute immediately after being put on the queue.

Handling memory allocation

One more important factor in ensuring that NDB can always operate in an optimal manner and deliver the expected throughput is that we control memory. All the memory is allocated at startup; this means that we cannot get into a situation where we oversubscribe the main memory of the machine. NDB even has a number of config parameters to ensure that the memory used by NDB data nodes is never paged out.

Locking of CPUs

One more way of ensuring that NDB always operates in an optimal manner is to control the placement of threads onto different CPUs.
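
As an illustrative sketch only (the parameter values are hypothetical and not taken from this post), the data node section of a cluster config.ini can pre-allocate memory, lock it against paging, and bind threads to CPUs along these lines:

    [ndbd default]
    # All DataMemory is allocated at data node startup
    DataMemory = 8G
    # Lock the data node memory so it is never paged out
    LockPagesInMainMemory = 1
    # Bind LDM and TC threads to dedicated CPUs
    ThreadConfig = ldm={count=2,cpubind=1,2},tc={count=1,cpubind=3}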

Behaviour of NDB at high load

There is one more very important aspect of this model. As load increases, two things happen. First, we execute more and more signals each time we have received a set of signals, which means that the overhead of collecting each signal decreases. Second, executing larger and larger sets of signals means that we send larger and larger packets, which means that the cost per packet decreases. Thus NDB data nodes actually execute more and more efficiently as load increases. This is a very important characteristic that avoids many overload problems.

Building a mixed environment for traffic and management queries

Finally, the separation of Data Server and Query Server functionality makes it possible to use different Query Servers for traffic queries than for complex queries. In the MySQL Cluster model this means that you can use one set of MySQL Servers in the cluster to handle short real-time queries and a different set of MySQL Servers to handle complex queries. Thus, in a proper configuration of the cluster, MySQL Cluster can meet real-time requirements even when operating through SQL queries.

Conclusion

The interface to the Data Server is, as you can now see, implemented on top of signals. The most common ones are TCKEYREQ, which implements all types of key lookups using the primary key, and SCAN_TABREQ, which implements all types of scan queries (including join queries that have been pushed down to the data nodes). There is a protocol to carry these signals; it currently uses TCP/IP sockets but has historically also been carried over SCI, InfiniBand and shared memory transporters.

So the separation of Data Server and Query Server functionality might mean that MySQL Cluster has a slightly longer minimum response time compared to a local storage engine in MySQL, but MySQL Cluster will continue to deliver low and predictable response times even with varying workloads and even when executing at very high loads.

One experiment done while developing the pushdown join functionality showed that the performance of those pushed-down joins was the same when executing in an otherwise idle cluster as when executing in a cluster that was performing 50,000 update queries per second.

NDB has been designed such that, with some work to configure it properly, it can be extremely reliable in delivering predictable response times. At the same time, we are working hard to make it easier and easier to configure, also for those who do not want to control every bell and whistle. One step in this direction is the introduction of the ability to read from backup replicas and the adaptive control of which threads help out with sending.
