Planet MySQL

Customizing Per-Process Metrics in PMM

If you have set up per-process metrics in Percona Monitoring and Management, you may have found yourself in need of tuning it further to not only group processes together, but to display some of them in isolation. In this blogpost we will explore how to modify the rules for grouping processes, so that you can make the most out of this awesome PMM integration.

Let’s say you have followed the link above on how to set up the per-process metrics integration on PMM, and you have imported the dashboard to show these metrics. You will see something like the following:

This is an internal testing server we use, in which you can see a high number of VBoxHeadless (29) and mysqld (99) processes running. All the metrics in the dashboard will be grouped by the name of the command used. But, what if we want to see metrics for only one of these processes in isolation? As things stand, we will not be able to do so. It may not make sense to do so in a testing environment, but if you are running multiple mysqld processes (or mongos, postgres, etc) bound to different ports, you may want to see metrics for each of them separately.

Modifying the configuration file

Enter all.yaml!

In the process-exporter documentation on using a configuration file, we can see the following:

The general format of the -config.path YAML file is a top-level process_names section, containing a list of name matchers. […] A process may only belong to one group: even if multiple items would match, the first one listed in the file wins.

This means that even if we have two rules that would match a process, only the first one will be taken into account. This will allow us to both list processes by themselves, and not miss any non-grouped process. How? Let’s imagine we have the following processes running:

mysqld --port=1
mysqld --port=2
mysqld --port=3
mysqld --port=4

If we wanted to tell the instances running on ports 1 and 2 apart from the other ones, we could use the following rules:

- name: "mysqld_port_1"  cmdline:  - '.*mysqld.*port=1.*' - name: "mysqld_port_2"  cmdline:  - '.*mysqld.*port=2.*' - name: "{{.Comm}}"  cmdline:  - '.+'

In cmdline we put the regular expression against which the running process's command line is matched. In this case we took advantage of the fact that the instances were started with different ports, but any difference in the command strings can be used. The last rule is the one that defaults to "anything else" (its regular expression matches any command).

The default rule at the end will make sure you don't miss any other process, so unless you want only some processes' metrics collected, you should always have a rule for it.
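
To make the "first match wins" behavior concrete, here is a small illustrative sketch (plain Node.js, not part of process-exporter itself; the rule names and patterns simply mirror the YAML above):

// first-match-wins.js - toy illustration of how grouping by rules works:
// the first rule whose regex matches the command line names the group.
const rules = [
  { name: 'mysqld_port_1', cmdline: /.*mysqld.*port=1.*/ },
  { name: 'mysqld_port_2', cmdline: /.*mysqld.*port=2.*/ },
  { name: 'everything_else', cmdline: /.+/ }, // stands in for the "{{.Comm}}" catch-all
];

function groupFor(cmdline) {
  // Return the name of the first rule whose regex matches the command line.
  const rule = rules.find((r) => r.cmdline.test(cmdline));
  return rule ? rule.name : null;
}

console.log(groupFor('mysqld --port=1')); // mysqld_port_1
console.log(groupFor('mysqld --port=3')); // everything_else (falls through to the catch-all)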

A real life working example of configuring per-process metrics

In case all this general information didn't make much sense yet, we will present a concrete example, hoping it will make everything fit together nicely.

In this example we want the mysqld instance using the mysql_sandbox16679.sock socket isolated from all the others, and the VM with an ID ending in 97eafa2795da listed on its own. All other processes are to be grouped together by the basename of the executable.

You can check the output from ps aux to see the full command used. For instance:

shell> ps aux | grep 97eafa2795da
agustin+ 27785  0.7 0.2 5619280 542536 ?      Sl Nov28 228:24 /usr/lib/virtualbox/VBoxHeadless --comment centos_node1_1543443575974_22181 --startvm a0151e29-35dd-4c14-8e37-97eafa2795da --vrde config

So, we can use the following regular expression for it (we use .* to match any string):

.*VBoxHeadless.*97eafa2795da.*

The same applies to the regular expression for the mysqld process.

The configuration file will end up looking like:

shell> cat /etc/process-exporter/all.yaml
process_names:
  - name: "Custom VBox"
    cmdline:
    - '.*VBoxHeadless.*97eafa2795da.*'
  - name: "Custom MySQL"
    cmdline:
    - '.*mysqld.*mysql_sandbox16679.sock.*'
  - name: "{{.Comm}}"
    cmdline:
    - '.+'

Let's restart the service so that the new configuration applies, and check the graphs after five minutes. Note that you may have to reload the page for the changes to show up.

shell> systemctl restart process-exporter

After refreshing, we will see the new list of processes in the drop-down list:

And after we select them, we will be able to see data for those processes in particular:

Thanks to the default configuration at the end, we are still capturing data from all the other mysqld processes. However, they will have their own group, as mentioned before:

 

Meet MariaDB 10.4 Support in MySQL Database Tools

We are glad to announce the release of the updated dbForge tools for MySQL. To ensure that the users of our MySQL tools can work with the most up-to-date database engines, we keep expanding connectivity options for our MySQL management tool. For this release, we have implemented support for the latest MariaDB 10.4 in the […]

Connectivity to MariaDB Keeps Extending in dbForge Studio for MySQL

We are thrilled to inform our MySQL users that the fresh new dbForge Studio for MySQL, v8.1, has just been rolled out! To ensure that the users of dbForge Studio for MySQL can work with the most up-to-date database engines, we keep expanding connectivity options for our MySQL management tool. In this version, we have […]

MySQL in the Oracle Cloud – IaaS Getting Started

In this blog post I'll be starting another series with MySQL on the Oracle Cloud, focusing this post on initial IaaS setup. In "the series" I'll review, at a high level, building an InnoDB Cluster on IaaS Compute in the Oracle Cloud, mostly focusing on certain cloud characteristics as they pertain to building, staging and managing… Read More »

Upcoming Webinar Thurs 1/17: How to Rock with MyRocks

Please join Percona’s Chief Technology Officer, Vadim Tkachenko, as he presents How to Rock with MyRocks on Thursday, January 17th at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

MyRocks is a new storage engine from Facebook and is available in Percona Server for MySQL. In what cases will you want to use it? We will check different workloads and when MyRocks is most suitable for you. Also, as for any new engine, it’s important to set it up and tune it properly. So, we will review the most important settings to pay attention to.

Register for this webinar to learn How to Rock with MyRocks.

Super Simple GraphQL with Node

GraphQL is a specification and therefore language agnostic. When it comes to GraphQL development with Node, there are various options available, ranging from graphql-js and express-graphql to apollo-server. In this tutorial, I'll be showing you how simple it is to get a fully featured GraphQL server up and running in Node.js with Apollo Server.

Since the launch of Apollo Server 2, creating a GraphQL server with Apollo Server has gotten a lot easier, not to mention the other features that came with it. For the purpose of demonstration, we'll be building a GraphQL server for a simple recipe app.

Prerequisites

This tutorial assumes the following:

  • Node.js and NPM installed on your computer
  • Basic knowledge of GraphQL
What is GraphQL?

GraphQL is a declarative data fetching specification and query language for APIs, created by Facebook. It is an effective alternative to REST, as it was created to overcome some of REST's shortcomings, such as under- and over-fetching.

How does it work?

Unlike REST, GraphQL uses one endpoint. So basically, we make one request to the endpoint and we'll get one response as JSON. This JSON response can contain as little or as much data as we want. Like REST, GraphQL can be operated over HTTP, though GraphQL is protocol agnostic.
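
For example, a client only ever needs that one endpoint; a minimal Node sketch of such a request might look like the following (the URL, path and query are placeholders based on the recipe server we build later in this post, and node-fetch is assumed to be installed):

// query-endpoint.js - every operation is an HTTP POST with a query string to one endpoint.
// Assumes `npm install node-fetch`; adjust the URL/path to wherever your GraphQL server listens.
const fetch = require('node-fetch');

async function run() {
  const response = await fetch('http://localhost:4000/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: '{ allRecipes { id title } }' }),
  });
  const { data } = await response.json();
  console.log(data); // JSON shaped exactly like the query asked for
}

run().catch(console.error);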

A typical GraphQL server is comprised of a schema and resolvers. A schema (or GraphQL schema) contains the type definitions that make up a GraphQL API. A type definition contains fields, each with what it is expected to return. Each field is mapped to a function on the GraphQL server called a resolver. Resolvers contain the implementation logic and return the data for a field. In other words, the schema contains the type definitions, while the resolvers contain the actual implementations.

Setting up the database

We’ll start by setting up our database. To keep things simple and straightforward, we’ll be using SQLite for our database. Also, we’ll be using Sequelize, which is an ORM for Node.js, to interact with our database. First, let's create a new project:

$ mkdir graphql-recipe-server
$ cd graphql-recipe-server
$ yarn init -y

Next, let's install Sequelize:

$ yarn add sequelize sequelize-cli sqlite3

In addition to installing Sequelize, we are also installing the sqlite3 package for Node.js. To help us scaffold our project, we'll be using the Sequelize CLI, which we are installing as well.

Let's scaffold our project with the CLI:

$ node_modules/.bin/sequelize init

This will create the following folders:

  • config: contains the config file, which tells Sequelize how to connect to our database.
  • models: contains all the models for our project; it also contains an index.js file which ties all the models together.
  • migrations: contains all migration files.
  • seeders: contains all seed files.

For the purpose of this tutorial, we won't be creating any seeders. Open config/config.json and replace its content with the following:

// config/config.json
{
  "development": {
    "dialect": "sqlite",
    "storage": "./database.sqlite"
  }
}

We set the dialect to sqlite and set the storage to point to a SQLite database file.

Next, we need to create the database file directly inside the project's root directory:

$ touch database.sqlite

Creating models and migrations

With the database set up out of the way, we can start creating the models for our project. Our recipe app will have two models: User and Recipe. We’ll be using the Sequelize CLI for this:

$ node_modules/.bin/sequelize model:create --name User --attributes name:string,email:string,password:string

This will create a user.js file inside the models directory and a corresponding migration file inside the migrations directory.

Since we don't want any of the fields on the User model to be nullable, we need to explicitly define that. Open migrations/XXXXXXXXXXXXXX-create-user.js and update the field definitions as below:

// migrations/XXXXXXXXXXXXXX-create-user.js
name: {
  allowNull: false,
  type: Sequelize.STRING
},
email: {
  allowNull: false,
  type: Sequelize.STRING
},
password: {
  allowNull: false,
  type: Sequelize.STRING
}

Then we'll do the same in the User model:

// models/user.js
name: {
  type: DataTypes.STRING,
  allowNull: false
},
email: {
  type: DataTypes.STRING,
  allowNull: false
},
password: {
  type: DataTypes.STRING,
  allowNull: false
}

Next, let's create the Recipe model:

$ node_modules/.bin/sequelize model:create --name Recipe --attributes title:string,ingredients:text,direction:text

Just as we did with the User model, we'll do the same for the Recipe model. Open migrations/XXXXXXXXXXXXXX-create-recipe.js and update the field definitions as below:

// migrations/XXXXXXXXXXXXXX-create-recipe.js
userId: {
  type: Sequelize.INTEGER.UNSIGNED,
  allowNull: false
},
title: {
  allowNull: false,
  type: Sequelize.STRING
},
ingredients: {
  allowNull: false,
  type: Sequelize.STRING
},
direction: {
  allowNull: false,
  type: Sequelize.STRING
},

You'll notice we have an additional field: userId, which would hold the ID of the user that created a recipe. More on this shortly.

Update the Recipe model as well:

// models/recipe.js
title: {
  type: DataTypes.STRING,
  allowNull: false
},
ingredients: {
  type: DataTypes.STRING,
  allowNull: false
},
direction: {
  type: DataTypes.STRING,
  allowNull: false
}

To wrap up with our models, let's define the relationship between them. You might have guessed, from the inclusion of the userId column in the recipes table, that we want to be able to associate a recipe with a user and vice versa. So, we want a one-to-many relationship between our models.

Open models/user.js and update the User.associate function as below:

// models/user.js
User.associate = function (models) {
  User.hasMany(models.Recipe)
}

We need to also define the inverse of the relationship on the Recipe model:

// models/recipe.js
Recipe.associate = function (models) {
  Recipe.belongsTo(models.User, { foreignKey: 'userId' })
}

By default, Sequelize builds the foreign key from the camel-cased model name and its primary key, so in our case it would expect the foreign key to be UserId. Since we named the column differently, we need to explicitly define the foreignKey on the association.
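
If you prefer, you can name the foreign key explicitly on the hasMany side as well, so both directions of the association agree on the column; a small optional sketch:

// models/user.js (optional variant)
User.associate = function (models) {
  // Explicitly point the association at the userId column on recipes.
  User.hasMany(models.Recipe, { foreignKey: 'userId' })
}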

Now, we can run the migrations:

$ node_modules/.bin/sequelize db:migrate

Creating the GraphQL server

Phew! Let's get to the GraphQL stuff. As earlier mentioned, we'll be using Apollo Server for building our GraphQL server. So, let's install it:

$ yarn add apollo-server graphql bcryptjs

Apollo Server requires graphql as a dependency, hence the need to install it as well. We also install bcryptjs, which we'll use to hash users' passwords later on.

With those installed, create a src directory, then within it, create an index.js file and paste the code below in it:

// src/index.js
const { ApolloServer } = require('apollo-server')
const typeDefs = require('./schema')
const resolvers = require('./resolvers')
const models = require('../models')

const server = new ApolloServer({
  typeDefs,
  resolvers,
  context: { models }
})

server
  .listen()
  .then(({ url }) => console.log('Server is running on localhost:4000'))

Here, we create a new instance of Apollo Server, passing to it our schema and resolvers (both of which we'll create shortly). We also pass the models as the context to Apollo Server. This will allow us to access the models from our resolvers.

Finally, we start the server.

Defining the GraphQL Schema

A GraphQL schema defines the functionality a GraphQL API exposes. Basically, a GraphQL schema is comprised of types. A type can define the structure of a domain-specific entity. In addition to types for our domain-specific entities, we can also define types for GraphQL operations, which in turn translate into the functionality the API will have. These operations are: queries, mutations and subscriptions. Queries are used to perform read operations (fetching data) on a GraphQL server. Mutations, on the other hand, are used to perform write operations (inserting, updating or deleting data). Subscriptions are completely different from these two, as they are used to add realtime functionality to a GraphQL server.

We'll be focusing only on queries and mutations in this tutorial.

Now that we understand what GraphQL schema is, let's create the schema for our app. Within the src directory, create a schema.js file and paste the code below in it:

// src/schema.js
const { gql } = require('apollo-server')

const typeDefs = gql`
  type User {
    id: Int!
    name: String!
    email: String!
    recipes: [Recipe!]!
  }

  type Recipe {
    id: Int!
    title: String!
    ingredients: String!
    direction: String!
    user: User!
  }

  type Query {
    user(id: Int!): User
    allRecipes: [Recipe!]!
    recipe(id: Int!): Recipe
  }

  type Mutation {
    createUser(name: String!, email: String!, password: String!): User!
    createRecipe(
      userId: Int!
      title: String!
      ingredients: String!
      direction: String!
    ): Recipe!
  }
`

module.exports = typeDefs

First, we pull in the gql package from apollo-server. Then we use it to define our schema. Ideally, we'd want our GraphQL schema to mirror our database schema as much as possible. So we define two types: User and Recipe, which correspond to our models. On the User type, in addition to the fields we have on the User model, we also define a recipes field, which will be used to retrieve the user's recipes. Similarly, on the Recipe type we define a user field, which will be used to get the user who created a recipe.

Next, we define three queries: for fetching a single user, for fetching all recipes that have been created, and for fetching a single recipe, respectively. The user and recipe queries can either return a user or recipe respectively, or return null if no match was found for the ID. The allRecipes query will always return an array of recipes, which might be empty if no recipe has been created yet.

Lastly, we define mutations for creating a new user as well as creating a new recipe. Both mutations return the created user and recipe respectively.

Tip: The ! denotes that a field is required, while [] denotes that the field will return an array of items.

Creating the resolvers

Resolvers define how the fields in a schema are resolved. In other words, our schema is useless without resolvers. Create a resolvers.js file inside the src directory and paste the following code in it:

// src/resolvers.js
const resolvers = {
  Query: {
    async user (root, { id }, { models }) {
      return models.User.findById(id)
    },
    async allRecipes (root, args, { models }) {
      return models.Recipe.findAll()
    },
    async recipe (root, { id }, { models }) {
      return models.Recipe.findById(id)
    }
  },
}

module.exports = resolvers

We start by creating the resolvers for our queries. Here, we are making use of the models to perform the necessary queries on the database and simply return the results.

Still inside src/resolvers.js, let's import bcryptjs at the top of the file:

// src/resolvers.js
const bcrypt = require('bcryptjs')

Then add the code below immediately after the Query object:

// src/resolvers.js
Mutation: {
  async createUser (root, { name, email, password }, { models }) {
    return models.User.create({
      name,
      email,
      password: await bcrypt.hash(password, 10)
    })
  },
  async createRecipe (root, { userId, title, ingredients, direction }, { models }) {
    return models.Recipe.create({ userId, title, ingredients, direction })
  }
},

The createUser mutation accepts the name, email and password of a user, and creates a new record in the database with the supplied details. We make sure to hash the password using the bcrypt package before persisting it to the database. It returns the newly created user. The createRecipe mutation accepts the ID of the user that's creating the recipe as well as the details for the recipe itself, persists them to the database and returns the newly created recipe.

To wrap up the resolvers, let's define how we want our custom fields (recipes on User and user on Recipe) to be resolved. Add the code below inside src/resolvers.js, just after the Mutation object:

// src/resolvers.js
User: {
  async recipes (user) {
    return user.getRecipes()
  }
},
Recipe: {
  async user (recipe) {
    return recipe.getUser()
  }
}

These use the methods (getRecipes(), getUser()) that Sequelize adds to our models because of the relationships we defined.

Testing our GraphQL server

It's time to test our GraphQL server out. First, we need to start the server with:

$ node src/index.js

This should be running on http://localhost:4000, and we should see GraphQL Playground running if we access it. Let's try creating a new recipe:

# create a new recipe
mutation {
  createRecipe(
    userId: 3
    title: "Sample 2"
    ingredients: "Salt, Pepper"
    direction: "Add salt, Add pepper"
  ) {
    id
    title
    ingredients
    direction
    user {
      id
      name
      email
    }
  }
}

We should see a result as below:

Conclusion

In this tutorial, we looked at how to create a GraphQL server in Node.js with Apollo Server. We also saw how to integrate a database with a GraphQL server using Sequelize.

Understanding Status of MariaDB Server JIRA Issues

In my previous blog post on MariaDB's JIRA, aimed at MySQL users who are familiar with the MySQL bugs database (but may be new to JIRA), I presented some details about the statuses that JIRA issues may have. There is no one-to-one correspondence with the MySQL bug statuses that I once described in detail here. In the case of MariaDB Server bugs ("JIRA issues"), one may have to check not only the "Status" field, but also the "Resolution" field and even the "Labels" field to quickly understand what the real status is and what MariaDB engineers decided or are waiting for. So, I think some additional clarifications may help MySQL users who check or report MariaDB bugs as well.

Let me present the details of this status correspondence in a simple table, where the first column contains MySQL's bug status, while the 3 other columns contain the content of the corresponding MariaDB Server JIRA issue's fields: "Status", "Resolution" and "Labels". There is also a "Comment" column with some explanation of what else is usually done in a JIRA issue when it gets this set of values, or what this may mean in the MySQL bugs database, etc. The most important MySQL bug statuses are taken from my earlier post (there are more of them, but the others are rarely used, especially since real work on bugs was moved into an internal bugs database by Oracle, or were removed since that post, as happened to "To be fixed later").

| MySQL Bug Status | MariaDB JIRA Status | MariaDB JIRA Resolution | MariaDB JIRA Label | Comment |
|---|---|---|---|---|
| Open | OPEN | Unresolved | | Typical status for a just reported bug |
| Closed | CLOSED | Fixed | | You should see the list of versions that got the fix in the Fix Version/s field |
| Duplicate | CLOSED | Duplicate | | So, in MariaDB it's "closed as a duplicate" |
| Analyzing | OPEN | Unresolved | | Usually the bug is assigned when some engineer is working on it, including the analysis stage |
| Verified | CONFIRMED | Unresolved | | CONFIRMED bugs are usually assigned in JIRA, while in MySQL "Verified" bugs are usually unassigned |
| Won't fix | CLOSED | Won't Fix | | Usually remains assigned |
| Can't repeat | CLOSED | Cannot reproduce | | Unlike in MySQL, usually means that both the engineer and the bug reporter are not able to reproduce this |
| No Feedback | CLOSED | Incomplete | need_feedback | As in MySQL, the bug should stay with the "need_feedback" label for some time before it's closed as incomplete |
| Need Feedback | OPEN | Unresolved | need_feedback | Usually in the last comment in the bug you can find out what kind of feedback is required. No automatic setting to "No Feedback" in 30 days |
| Not a Bug | CLOSED | Not a Bug | | |
| Unsupported | CLOSED | Won't Fix | | There is no special "Unsupported" status in MariaDB. Most likely, when there is a reason NOT to fix, it's stated in the comment |
In the table above you can click on some links to see the list of MariaDB bugs with the status discussed in that table row. This is how I am going to use this post from now on, as a quick search starting point :) It will also be mentioned on one of the slides of my upcoming FOSDEM 2019 talk.

AWS Lambda And Node.js Tutorial | Getting Started With Serverless

In this tutorial, AWS Lambda And Node.js Tutorial | Getting Started With Serverless, we will see how to configure the Serverless Framework on a Mac and get up and running with AWS Lambda and Node.js from scratch. We will look at how to deploy a simple aws-nodejs test application. But before we start, we need to understand some terminology. So let us cover that first.

Function As A Service (FAAS)

The architecture behind AWS Lambda is called either Serverless Computing or Function as a Service. It's groundbreaking technology because of the lack of servers.

But wait, that sounds strange. Well the code is not running on potatoes, is it!? Okay, that’s just a saying.

What’s going on here is that you, as a developer, don’t need to worry about the infrastructure your code is running on.

You deploy the code into the cloud, and it handles the creation of all needed resources by itself using the containers.

Every time an AWS Lambda Function is created, a container is spun up to serve it. It's not a Docker container, though, but a proprietary container built by AWS.

Principles of FaaS
  1. It completely abstracts the servers away from a developer.
  2. The billing is based on the consumption and executions of functions, not server instance sizes like Amazon EC2.
  3. Services that are event-driven and instantaneously scalable.

On a basic level, you can describe them as a way to run some code when "something" happens, that is, when an event occurs, like receiving an HTTP request from a client.

Well known FaaS Providers

AWS, Azure, and Google Cloud all provide this kind of serverless solution. A lot of innovation and R&D is still going on in this area, and things are rapidly improving and changing over time. The leading three in the industry are the following:

  1. AWS Lambda
  2. Azure Functions
  3. Cloud Functions
Serverless Framework

Serverless is useful to build web, mobile and IoT applications using AWS Lambda and API Gateway, Azure Functions, Google Cloud Functions, and more.

Serverless computing is a cloud-computing execution model in which the cloud provider acts as the server, dynamically managing the allocation of machine resources.

The pricing is entirely based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity used in traditional servers.

Serverless is the native architecture of the cloud that enables you to shift more of your operational responsibilities to AWS, letting you focus more on your main application logic and less on infrastructure.

Serverless allows you to build and run applications and services without thinking about servers. 

It eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning.

You can build on serverless for nearly any type of application or backend service, and everything required to run and scale your application with high availability is handled for you.

AWS Lambda And Node.js Tutorial

For this task, you need to create an AWS account. It provides a free tier with some limitations. You can create your free account here.

You will need Node.js installed on your local machine. I am using a MacBook Air, and my Node version is 11.3.0.

You also need to have an editor. I am using Visual Studio Code.

Step 1: Install Serverless Globally

The first step is to install serverless globally on your local machine. Type the following command to install it. If you are using Windows, then open CMD in administrator mode.

sudo npm install serverless -g

 

Ignore that yellow warning; we have successfully installed serverless on our machine.

Okay, now create a folder called node-lambda and go inside that folder.

Now, we can use the serverless package to create a project from a template specific to aws-nodejs. So, let us generate the aws-nodejs template for serverless.

serverless create -t aws-nodejs

 

Open the project in VSCode, and you will see a folder structure like the one in this image.

 

Step 2: Create Access Key and Secret For AWS

The next step is to create an access key and secret to connect serverless with our AWS account.

After your successful signup on AWS, you can create your access key and secret.

See the image below: when you are logged in, click on your name in the upper right corner; it opens a drop-down, and then click on My Security Credentials.

You will see this kind of page with a popup.

 

For this demo, let's go with Continue to Security Credentials. However, you can go with the IAM Users policy to create and give access to specific users and policies.

Next, go for Access keys (access key ID and secret access key) and create new keys. You can find both access key and access secret.

Now, you need to provide those keys to serverless to connect serverless and aws-nodejs.

For that, type the following command in your terminal, providing your access key and secret inside the command. If you have already done this, you do not need to do it again.

serverless config credentials --provider aws --key your aws key --secret your aws secret

In the above command, you need to place your access key instead of your aws key and place your access secret instead of your aws secret.

After hitting enter, if your credentials are right, you will see a success message like the one below.

 

Bingo, you got it right, and now your configuration step is over.

Step 3: Creating your First Function

Now, open the editor in which your project is open and go to the handler.js file.

Modify that code with the following code.

// handler.js
'use strict';

module.exports.hello = async (event, context) => {
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Expecto Petronum',
      input: event,
    }),
  };
};

So, when an HTTP request hits the AWS Lambda service URL, we will get a response containing the message 'Expecto Petronum'.

Now, go to this file called serverless.yml.

I have modified that file with the following code and removed the commented part in the file.

service: lambda-test
provider:
  name: aws
  runtime: nodejs8.10
functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: users/create
          method: get

What I have done is: first, I changed the service name to lambda-test, then I added a function called hello, which is implemented inside the handler.js file.

So, when a request hits the URL, this function will execute and return the response defined in it.

Okay, now all the coding is done.

We need to deploy this function to AWS. So type the following command.

serverless deploy

 

If this is your first deployment, then you will see something like this.

 

So, an endpoint is generated for our service. We can use that endpoint as an API or web service to get a JSON response, so it also becomes a BaaS (Backend as a Service).

Here, in the console, I can see the endpoint from which I can get the response. You will have your own endpoint, which you can access from your browser. So go to the browser, hit that endpoint, and you will see an output like this.

 

That is it guys, we have successfully deployed our first function as a service using AWS Lambda.
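
If you would rather call the generated endpoint from code instead of the browser, a minimal Node sketch looks like this (the URL below is only a placeholder; use the endpoint printed by serverless deploy):

// call-endpoint.js - fetch the Lambda endpoint and print the JSON body.
// Replace the placeholder URL with the endpoint from your own `serverless deploy` output.
const https = require('https');

const url = 'https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev/users/create';

https.get(url, (res) => {
  let body = '';
  res.on('data', (chunk) => (body += chunk));
  res.on('end', () => {
    const payload = JSON.parse(body);
    console.log(payload.message); // "Expecto Petronum"
  });
}).on('error', console.error);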

Step 4: Create a second function

Go to the project inside VSCode, open the serverless.yml file, and add another function called app.

service: lambda-test
provider:
  name: aws
  runtime: nodejs8.10
functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: users/create
          method: get
  app:
    handler: handler.app
    events:
      - http:
          path: app/get
          method: get

Also, we need to write the code inside the handler.js file. 

This app function decides what response to give when the URL is hit in the browser or the API endpoint is accessed. Let us add the new function inside the handler.js file.

// handler.js
'use strict';

module.exports.hello = async (event, context) => {
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Expecto Petronum',
      input: event,
    }),
  };
};

module.exports.app = async (event, context) => {
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: 'AppDividend API is accessed',
      input: event,
    }),
  };
};

Save the file and deploy the code.

serverless deploy

 

See the endpoints section: you now have two endpoints.

So, it is beneficial and easy to create a new function as a backend service, and you will be charged based on how many times your service has been accessed.

If you want to build an API and not manage any server, then this is the best way to use AWS Lambda: avoid managing your server and focus only on the code. The rest will be handled by the cloud service provider.

Finally, the AWS Lambda And Node.js Tutorial | Getting Started With Serverless post is over. Thanks for reading.

The post AWS Lambda And Node.js Tutorial | Getting Started With Serverless appeared first on AppDividend.

AWS Aurora MySQL – HA, DR, and Durability Explained in Simple Terms

It’s a few weeks after AWS re:Invent 2018 and my head is still spinning from all of the information released at this year’s conference. This year I was able to enjoy a few sessions focused on Aurora deep dives. In fact, I walked away from the conference realizing that my own understanding of High Availability (HA), Disaster Recovery (DR), and Durability in Aurora had been off for quite a while. Consequently, I decided to put this blog out there, both to collect the ideas in one place for myself, and to share them in general. Unlike some of our previous blogs, I’m not focused on analyzing Aurora performance or examining the architecture behind Aurora. Instead, I want to focus on how HA, DR, and Durability are defined and implemented within the Aurora ecosystem.  We’ll get just deep enough into the weeds to be able to examine these capabilities alone.

Aurora MySQL – What is it?

We’ll start with a simplified discussion of what Aurora is from a very high level.  In its simplest description, Aurora MySQL is made up of a MySQL-compatible compute layer and a multi-AZ (multi availability zone) storage layer. In the context of an HA discussion, it is important to start at this level, so we understand the redundancy that is built into the platform versus what is optional, or configurable.

Aurora Storage

The Aurora Storage layer presents a volume to the compute layer. This volume is built out in 10GB increments called protection groups.  Each protection group is built from six storage nodes, two from each of three availability zones (AZs).  These are represented in the diagram above in green.  When the compute layer—represented in blue—sends a write I/O to the storage layer, the data gets replicated six times across three AZs.

Durable by Default

In addition to the six-way replication, Aurora employs a 4-of-6 quorum for all write operations. This means that for each commit that happens at the database compute layer, the database node waits until it receives write acknowledgment from at least four out of six storage nodes. By receiving acknowledgment from four storage nodes, we know that the write has been saved in at least two AZs.  The storage layer itself has intelligence built-in to ensure that each of the six storage nodes has a copy of the data. This does not require any interaction with the compute tier. By ensuring that there are always at least four copies of data, across at least two datacenters (AZs), and ensuring that the storage nodes are self-healing and always maintain six copies, it can be said that the Aurora Storage platform has the characteristic of Durable by Default.  The Aurora storage architecture is the same no matter how large or small your Aurora compute architecture is.
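
To make the quorum rule concrete, here is a toy sketch of the acknowledgment check (an illustration only, not AWS code): a write counts as durable once any four of the six storage nodes, two per AZ, have acknowledged it, which necessarily spans at least two AZs.

// quorum.js - toy illustration of a 4-of-6 write quorum across three AZs.
const WRITE_QUORUM = 4;

// Six storage nodes, two per availability zone.
const nodes = [
  { id: 'az1-a', az: 'az1' }, { id: 'az1-b', az: 'az1' },
  { id: 'az2-a', az: 'az2' }, { id: 'az2-b', az: 'az2' },
  { id: 'az3-a', az: 'az3' }, { id: 'az3-b', az: 'az3' },
];

// acks: the set of node ids that acknowledged the write.
function isDurable(acks) {
  return acks.size >= WRITE_QUORUM;
}

const acks = new Set(['az1-a', 'az1-b', 'az2-a', 'az3-b']);
const azsCovered = new Set([...acks].map((id) => nodes.find((n) => n.id === id).az));
console.log(isDurable(acks));      // true - 4 of 6 nodes acknowledged
console.log(azsCovered.size >= 2); // true - any 4 acks span at least two AZs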

One might think that waiting to receive four acknowledgments represents a lot of I/O time and is therefore an expensive write operation.  However, Aurora database nodes do not behave the way a typical MySQL database instance would. Some of the round-trip execution time is mitigated by the way in which Aurora MySQL nodes write transactions to disk. For more information on exactly how this works, check out Amazon Senior Engineering Manager, Kamal Gupta’s deep-dive into Aurora MySQL from AWS re:Invent 2018.

HA and DR Options

While durability can be said to be a default characteristic to the platform, HA and DR are configurable capabilities. Let’s take a look at some of the HA and DR options available. Aurora databases are deployed as members of an Aurora DB Cluster. The cluster configuration is fairly flexible. Database nodes are given the roles of either Writer or Reader. In most cases, there will only be one Writer node. The Reader nodes are known as Aurora Replicas. A single Aurora Cluster may contain up to 15 Aurora Replicas. We’ll discuss a few common configurations and the associated levels of HA and DR which they provide. This is only a sample of possible configurations: it is not meant to represent an exhaustive list of the possible configuration options available on the Aurora platform.

Single-AZ, Single Instance Deployment

The most basic implementation of Aurora is a single compute instance in a single availability zone. The compute instance is monitored by the Aurora Cluster service and will be restarted if the database instance or compute VM has a failure. In this architecture, there is no redundancy at the compute level. Therefore, there is no database level HA or DR. The storage tier provides the same high level of durability described in the sections above. The image below is a view of what this configuration looks like in the AWS Console.

Single-AZ, Multi-Instance

HA can be added to a basic Aurora implementation by adding an Aurora Replica.  We increase our HA level by adding Aurora Replicas within the same AZ. If desired, the Aurora Replicas can be used to also service some of the read traffic for the Aurora Cluster. This configuration cannot be said to provide DR because there are no database nodes outside the single datacenter or AZ. If that datacenter were to fail, then database availability would be lost until it was manually restored in another datacenter (AZ). It’s important to note that while Aurora has a lot of built-in automation, you will only benefit from that automation if your base configuration facilitates a path for the automation to follow. If you have a single-AZ base deployment, then you will not have the benefit of automated Multi-AZ availability. However, as in the previous case, durability remains the same. Again, durability is a characteristic of the storage layer. The image below is a view of what this configuration looks like in the AWS Console. Note that the Writer and Reader are in the same AZ.

Multi-AZ Options

Building on our previous example, we can increase our level of HA and add partial DR capabilities to the configuration by adding more Aurora Replicas. At this point we will add one additional replica in the same AZ, bringing the local AZ replica count to three database instances. We will also add one replica in each of the two remaining regional AZs. Aurora provides the option to configure automated failover priority for the Aurora Replicas. Choosing your failover priority is best defined by the individual business needs. That said, one way to define the priority might be to set the first failover to the local-AZ replicas, and subsequent failover priority to the replicas in the other AZs. It is important to remember that AZs within a region are physical datacenters located within the same metro area. This configuration will provide protection for a disaster localized to the datacenter. It will not, however, provide protection for a city-wide disaster. The image below is a view of what this configuration looks like in the AWS Console. Note that we now have two Readers in the same AZ as the Writer and two Readers in two other AZs.

Cross-Region Options

The three configuration types we’ve discussed up to this point represent configuration options available within an AZ or metro area. There are also options available for cross-region replication in the form of both logical and physical replication.

Logical Replication

Aurora supports replication to up to five additional regions with logical replication.  It is important to note that, depending on the workload, logical replication across regions can be notably susceptible to replication lag.

Physical Replication

One of the many announcements to come out of re:Invent 2018 is a product called Aurora Global Database. This is Aurora’s implementation of cross-region physical replication. Amazon’s published details on the solution indicate that it is storage level replication implemented on dedicated cross-region infrastructure with sub-second latency. In general terms, the idea behind a cross-region architecture is that the second region could be an exact duplicate of the primary region. This means that the primary region can have up to 15 Aurora Replicas and the secondary region can also have up to 15 Aurora Replicas. There is one database instance in the secondary region in the role of writer for that region. This instance can be configured to take over as the master for both regions in the case of a regional failure. In this scenario the secondary region becomes primary, and the writer in that region becomes the primary database writer. This configuration provides protection in the case of a regional disaster. It’s going to take some time to test this, but at the moment this architecture appears to provide the most comprehensive combination of Durability, HA, and DR. The trade-offs have yet to be thoroughly explored.

Multi-Master Options

Amazon is in the process of building out a new capability called Aurora Multi-Master. Currently, this feature is in preview phase and has not been released for general availability. While there were a lot of talks at re:Invent 2018 which highlighted some of the components of this feature, there is still no affirmative date for release. Early analysis points to the feature being localized to the AZ. It is not known if cross-region Multi-Master will be supported, but it seems unlikely.

Summary

As a post re:Invent takeaway, what I learned was that there is an Aurora configuration to fit almost any workload that requires strong performance behind it. Not all heavy workloads also demand HA and DR. If this describes one of your workloads, then there is an Aurora configuration that fits your needs. On the flip side, it is also important to remember that while data durability is an intrinsic quality of Aurora, HA and DR are not. These are completely configurable. This means that the Aurora architect in your organization must put thought and due diligence into the way they design your Aurora deployment. While we all need to be conscious of costs, don’t let cost consciousness become a blinder to reality. Just because your environment is running in Aurora does not mean you automatically have HA and DR for your database. In Aurora, HA and DR are configuration options, and just like the on-premise world, viable HA and DR have additional costs associated with them.


LSM math: fixing mistakes in my last post

My last post explained the number of levels in an LSM that minimizes write amplification using 3 different estimates for the per-level write-amp. Assuming the per-level growth factor is w then the 3 estimates were approximately w, w+1 and w-1 and named LWA-1, LWA-2 and LWA-3 in the post.

I realized there was a mistake in that post in the analysis of LWA-3. The problem is that the per-level write-amp must be >= 1 (and really should be > 1), but the value of w-1 is <= 1 when the per-level growth factor is <= 2. By allowing the per-level write-amp to be < 1 it is easy to incorrectly show that a huge number of levels reduces write-amp, as I do for curve #3 in this graph. While I don't claim that (w-1) or (w-1)/2 can't be a useful estimate for per-level write-amp in some cases, it must be used with care.

Explaining LWA-3

The next challenge is to explain how LWA-3 is derived. That comes from equation 12 on page 9 of the Dostoevsky paper. Start with the (T-1)/(K+1) term; with K=1 this is (T-1)/2. T in the paper is the per-level growth factor, so this is the same as (w-1)/2. The paper mentions that this is derived using an arithmetic series but does not show the work. I show my work below but was not able to reproduce that result.

Assume that the per-level growth factor is w, all-to-all compaction is used and the LSM tree has at least 3 levels. When full, L1 has size 1, L2 has size w and L3 has size w*w. There are four derivations below - v1, v2, v3, v4. The results are either w/2 or (w+1)/2, which doesn't match (w-1)/2 from the paper. Fortunately, my previous post shows how to minimize total write-amp assuming the per-level write-amp is w/2 or (w+1)/2. I will contact the author to figure out what I am missing.

The analysis below is for merges from L1 to L2, but it holds for merges from Ln to Ln+1. I think that v1 and v2 are correct and their estimate for per-level write-amp is (w+1)/2. As explained below I don't think that v3 or v4 are correct, their estimate for per-level write-amp is w/2.

I have yet to explain how to get (w-1)/2.

v1

Assume that merges are triggered from Ln to Ln+1 when a level is full -- L1 has size 1, L2 has size w, L3 has size w*w. A level is empty immediately after it is merged into the next level. So L2 gets full, then is merged into L3 and becomes empty, then slowly gets larger as L1 is merged into it w times. The per-level write-amp from this is (w+1)/2.

* merges into L2 write output of size 1, 2, ..., w
* then L2 is full
* sum of that sequence -> w*(w+1)/2
* average value is sum/w -> (w+1)/2

1) Moving data of size 1 from L1 to L2 writes (w+1)/2 on average
2) Therefore per-level write-amp for L1 -> L2 is (w+1)/2

Note that per-level write-amp is (avg merge output to Ln / size of Ln-1)
* avg merge output to L2 is (w+1)/2
* size of Ln-1 is 1
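
Written out as the arithmetic series it comes from, the v1 estimate is:

\[
\frac{1}{w}\sum_{i=1}^{w} i \;=\; \frac{1}{w}\cdot\frac{w(w+1)}{2} \;=\; \frac{w+1}{2}
\]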

v2

Assume that merges are triggered from Ln to Ln+1 when a level is almost full -- L1 has size 1 * (w-1)/w, L2 has size w * (w-1)/w, L3 has size (w*w) * (w-1)/w. The trigger conditions can be reduced to L1 has size (w-1)/w, L2 has size (w-1) and L3 has size w*(w-1).

This assumes that w merges are done from L1 to L2 for L2 to go from empty to full. Each merge adds data of size (w-1)/w because L1:L2 merge is triggered when L1 has that much data. Thus L2 has size (w-1) after w merges into it at which point L2:L3 merge can be done. The per-level write-amp from this is the same as it was for v1.

* merges into L2 write output of size (w-1)/w * [1, 2, ..., w]
* then L2 is full
* sum of that sequence -> (w-1)/w * w*(w+1)/2 = (w-1)(w+1)/2
* average value is sum/w -> (w-1)(w+1)/(2*w)

As from v1, per-level write-amp is (avg merge output to Ln / size of Ln-1)
* avg merge output to L2 = (w-1)(w+1)/(2*w)
* size of L1 = (w-1)/w

start with: ( (w-1)(w+1)/(2*w) ) / ( (w-1)/w )
simplify to: (w+1)/2

v3

Merges are triggered the same as for v1 but I assume that only w-1 merges are done from Ln to Ln+1 rather than w. Ln+1 won't be full at the end of that, for example L2 would have size w-1 rather than the expected size w. But I was curious about the math. The per-level write-amp is w/2.

* merges into L2 write output of size 1, 2, ..., w-1
* sum of that sequence -> (w-1)*w/2
* average value is sum/(w-1) -> w/2

1) Moving data of size 1 from L1 to L2 writes w/2 on average
2) Therefore per-level write-amp for L1 -> L2 is w/2

v4

Merges are triggered the same as for v2. But as with v3, only w-1 merges are done into a level. Again I don't think this is correct because a level won't have enough data to trigger compaction at that point. The per-level write-amp here is the same as for v3.

* merges into L2 write output of size (w-1)/w * [1, 2, ..., w-1]
* sum of that sequence -> (w-1)/w * (w-1)*w/2 = (w-1)(w-1)/2
* average value is sum/(w-1) -> (w-1)/2

As from v1, per-level write-amp is (avg merge output to Ln / size of Ln-1)
* avg merge output to L2 = (w-1)/2
* size of L1 = (w-1)/w

start with: ( (w-1)/2 ) / ( (w-1)/w )
simplify to: w/2
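
Collecting the results in one place, the per-level write-amp estimates from the four derivations and from the paper are:

\[
\text{v1, v2: } \frac{w+1}{2} \qquad \text{v3, v4: } \frac{w}{2} \qquad \text{Dostoevsky, eq. 12 with } K=1\text{: } \frac{w-1}{2}
\]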



ProxySQL 1.4.13 and Updated proxysql-admin Tool

ProxySQL 1.4.13, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.13 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin – a tool, developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.13 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases. GitHub hosts the documentation in the wiki format.

Improvements
  • PSQLADM-53: Improved validation when --write-node is used with proxysql-admin
  • PSQLADM-122: galera/node monitor log now reports the count of async slave nodes that are online.
Bugs Fixed
  • PSQLADM-124: If the scheduler is configured with a --config-file that points to a file that doesn't exist, the ERR_FILE was pointing to /dev/null. As a result, the user would not be notified about the error.
  • PSQLADM-126: proxysql-admin could show an error when --syncusers was used and the mysql_users table was empty.
  • PSQLADM-127: proxysql_galera_checker could corrupt the scheduler configuration after restart
  • PSQLADM-129: Stopping or restarting ProxySQL can lead to multiple instances of proxysql_galera_checker running at the same time

ProxySQL is available under Open Source license GPLv3.

How to grant privileges to users in MySQL 8.0

It seems that this is a question that regularly shows up in forums or on Stack Overflow.

To start, let's highlight the fact that in MySQL 8.0 it is no longer possible to create a user directly from the GRANT command (ERROR 1410 (42000): You are not allowed to create a user with GRANT).

This means that to grant some privileges, the user must be created first.

Let’s create a user ‘user1‘ with ‘ChangeMe‘ as password that the user will have to change:

mysql> create user 'user1' identified by 'ChangeMe' password expire;
Query OK, 0 rows affected (1.35 sec)

Let’s try to connect to MySQL using that new created user:

$ mysql -u user1 -pChangeMe -h localhost
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 347
Server version: 8.0.13
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>

Nothing special, we are connected as expected… but didn't I explicitly expire the password?

Yes I did, let’s try any statement:

mysql> select now();
ERROR 1820 (HY000): You must reset your password using ALTER USER
statement before executing this statement.

We must change the password as expected. Let’s change it to ‘MySQL8isGreat‘:

mysql> set password='MySQL8isGreat';
Query OK, 0 rows affected (0.34 sec)

And now we can use MySQL and run any statement we are allowed to do (that we have the privileges for).

mysql> select now();
+---------------------+
| now() |
+---------------------+
| 2019-01-10 14:36:05 |
+---------------------+
1 row in set (0.00 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.20 sec)

It seems I don’t have access to many databases…

The default privilege is very limited:

mysql> show grants;
+-----------------------------------+
| Grants for user1@%                |
+-----------------------------------+
| GRANT USAGE ON *.* TO user1@%     |
+-----------------------------------+
1 row in set (0.00 sec)

It’s now time to grant more privileges to our user… but which privileges are available ?

In 8.0.13, there are currently 46 privileges!

To list them all, just run:

mysql> show privileges;
+----------------------------+---------------------------------------+-------------------------------------------------------+
| Privilege | Context | Comment |
+----------------------------+---------------------------------------+-------------------------------------------------------+
| Alter | Tables | To alter the table |
| Alter routine | Functions,Procedures | To alter or drop stored functions/procedures |
| Create | Databases,Tables,Indexes | To create new databases and tables |
| Create routine | Databases | To use CREATE FUNCTION/PROCEDURE |
| Create role | Server Admin | To create new roles |
| Create temporary tables | Databases | To use CREATE TEMPORARY TABLE |
| Create view | Tables | To create new views |
| Create user | Server Admin | To create new users |
| Delete | Tables | To delete existing rows |
| Drop | Databases,Tables | To drop databases, tables, and views |
| Drop role | Server Admin | To drop roles |
| Event | Server Admin | To create, alter, drop and execute events |
| Execute | Functions,Procedures | To execute stored routines |
| File | File access on server | To read and write files on the server |
| Grant option | Databases,Tables,Functions,Procedures | To give to other users those privileges you possess |
| Index | Tables | To create or drop indexes |
| Insert | Tables | To insert data into tables |
| Lock tables | Databases | To use LOCK TABLES (together with SELECT privilege) |
| Process | Server Admin | To view the plain text of currently executing queries |
| Proxy | Server Admin | To make proxy user possible |
| References | Databases,Tables | To have references on tables |
| Reload | Server Admin | To reload or refresh tables, logs and privileges |
| Replication client | Server Admin | To ask where the slave or master servers are |
| Replication slave | Server Admin | To read binary log events from the master |
| Select | Tables | To retrieve rows from table |
| Show databases | Server Admin | To see all databases with SHOW DATABASES |
| Show view | Tables | To see views with SHOW CREATE VIEW |
| Shutdown | Server Admin | To shut down the server |
| Super | Server Admin | To use KILL thread, SET GLOBAL, CHANGE MASTER, etc. |
| Trigger | Tables | To use triggers |
| Create tablespace | Server Admin | To create/alter/drop tablespaces |
| Update | Tables | To update existing rows |
| Usage | Server Admin | No privileges - allow connect only |
| XA_RECOVER_ADMIN | Server Admin | |
| SET_USER_ID | Server Admin | |
| ROLE_ADMIN | Server Admin | |
| RESOURCE_GROUP_USER | Server Admin | |
| RESOURCE_GROUP_ADMIN | Server Admin | |
| BINLOG_ADMIN | Server Admin | |
| SYSTEM_VARIABLES_ADMIN | Server Admin | |
| GROUP_REPLICATION_ADMIN | Server Admin | |
| CONNECTION_ADMIN | Server Admin | |
| REPLICATION_SLAVE_ADMIN | Server Admin | |
| ENCRYPTION_KEY_ADMIN | Server Admin | |
| BACKUP_ADMIN | Server Admin | |
| PERSIST_RO_VARIABLES_ADMIN | Server Admin | |
+----------------------------+---------------------------------------+-------------------------------------------------------+
46 rows in set (0.00 sec)

You can see that a new user doesn’t have access to the test database anymore:
mysql> use test;
ERROR 1044 (42000): Access denied for user 'user1'@'%' to database 'test'

Let's allow our user to create tables in the database user1 that we created for him, and also allow him to perform the following actions:

  • Alter
  • Create
  • Delete
  • Drop
  • Index
  • Insert
  • Select
  • Update
  • Trigger
  • Alter routine
  • Create routine
  • Execute
  • Create temporary tables
mysql> grant alter,create,delete,drop,index,insert,select,update,trigger,alter routine,
create routine, execute, create temporary tables on user1.* to 'user1';
Query OK, 0 rows affected (0.23 sec)

NO NEED TO RUN FLUSH PRIVILEGES !

And in the open session for user1, we can check the granted privileges:

mysql> show grants\G
******************** 1. row ********************
Grants for user1@%: GRANT USAGE ON *.* TO user1@%
******************** 2. row ********************
Grants for user1@%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
INDEX, ALTER, CREATE TEMPORARY TABLES, EXECUTE, CREATE ROUTINE, ALTER
ROUTINE, TRIGGER ON user1.* TO user1@%
2 rows in set (0.00 sec)

Now let's imagine we want multiple users to have access to the same database (mydatabase). Instead of specifying all the grants for each user, let's use a common role for all of them. We will call it 'developer_user':

mysql> create ROLE developer_user;
mysql> grant alter,create,delete,drop,index,insert,select,update,trigger,alter routine,
create routine, execute, create temporary tables on mydatabase.* to 'developer_user';
Query OK, 0 rows affected (0.12 sec)

Let’s grant the role to user1:

mysql> grant 'developer_user' to 'user1';
Query OK, 0 rows affected (0.16 sec)

Now back again in user1‘s session and let’s verify:

mysql> SELECT CURRENT_ROLE();
+----------------+
| CURRENT_ROLE() |
+----------------+
| NONE |
+----------------+
1 row in set (0.00 sec)

mysql> set role 'developer_user';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT CURRENT_ROLE();
+----------------------+
| CURRENT_ROLE() |
+----------------------+
| developer_user@% |
+----------------------+
1 row in set (0.00 sec)

mysql> show grants\G
******************** 1. row ********************
Grants for user1@%: GRANT USAGE ON *.* TO user1@%
******************** 2. row ********************
Grants for user1@%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE,
DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, EXECUTE,
CREATE ROUTINE, ALTER ROUTINE, TRIGGER ON mydatabase.* TO user1@%
******************** 3. row ********************
Grants for user1@%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE,
DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, EXECUTE,
CREATE ROUTINE, ALTER ROUTINE, TRIGGER ON user1.* TO user1@%
******************** 4. row ********************
Grants for user1@%: GRANT developer_user@% TO user1@%
4 rows in set (0.00 sec)

Now we would like the ‘developer_user‘ role to be set automatically every time user1 logs into MySQL:

mysql> set default role 'developer_user' to 'user1';
Query OK, 0 rows affected (0.22 sec)

Let’s also create a user2 having the default role:

mysql> create user 'user2' identified by 'DontChangeMe' default role 'developer_user';
Query OK, 0 rows affected (0.18 sec)

And we can immediately test it:

$ mysql -u user2 -pDontChangeMe -h localhost
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 352
Server version: 8.0.13 MySQL Community Server - GPL
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show grants\G
******************** 1. row ********************
Grants for user2@%: GRANT USAGE ON *.* TO user2@%
******************** 2. row ********************
Grants for user2@%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE,
DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, EXECUTE,
CREATE ROUTINE, ALTER ROUTINE, TRIGGER ON mydatabase.* TO user2@%
******************** 3. row ********************
Grants for user2@%: GRANT developer_user@% TO user2@%
3 rows in set (0.18 sec)

Summary

In summary: in MySQL 8.0 you can no longer create a user from GRANT, you don’t need to run the FLUSH PRIVILEGES command after a grant (this has been the case for a long time already, please forget about it!), you can use ROLES, and you have more password management options.

LSM math: revisiting the number of levels that minimizes write amplification

I previously used math to explain the number of levels that minimizes write amplification for an LSM tree with leveled compaction. My answer was one of ceil(ln(T)) or floor(ln(T)) assuming the LSM tree has total fanout = T where T is size(database) / size(memtable).

Then I heard from a coworker that the real answer is less than floor(ln(T)). Then I heard from Niv Dayan, first author of the Dostoevsky paper, that the real answer is larger than ceil(ln(T)) and the optimal per-level growth factor is ~2 rather than ~e.

All of our answers are correct. We have different answers because we use different functions to estimate the per-level write-amp. The graph of the functions for total write-amp using the different cost functions is here and you can see that the knee in the curve occurs at a different x value for two of the curves and the third curve doesn't appear to have a minimum.

While working on this I learned to love the Lambert W function. But I wonder whether I made the math below for LWA-2 harder than necessary. I am happy to be corrected. I appreciate the excellent advice on Quora: here, here and here. The online graphing calculator Desmos is another great resource.

Math

I use differentiable functions to express the total write-amp as a function of the number of levels, then determine the value (number of levels) at which the first derivative is zero as that might be the global minimum. Constants, variables and functions below include:
  • T - total fanout, = size(database) / size(memtable)
  • n - number of levels in the LSM tree
  • LWA, LWA-x - function for the per-level write-amp
  • TWA, TWA-x - function for the total write-amp, = n * LWA
  • w - per-level growth factor, = T^(1/n) for all levels to minimize write-amp
The function for total write-amp has the form: TWA = n * LWA where n is the number of levels and LWA is the per-level write-amp. LWA is a function of T and n. The goal is to determine the value of n at which TWA is minimized. While n must be an integer, the math here doesn't enforce that, so the result should be rounded up or down to an integer. T is a constant as I assume a given value for total fanout. Here I use T=1024.

I wrote above that the 3 different answers came from using 3 different estimates for the per-level write-amp and I label these LWA-1, LWA-2 and LWA-3. When w is the per-level growth factor then the per-level write-amp functions are:
  • LWA-1 = w -- I used this to find that the best n = ceil(ln(T)) or floor(ln(T))
  • LWA-2 = w + 1 -- with this the best n is less than that found with LWA-1
  • LWA-3 = (w - 1) / 2 -- with this the best n is greater than that found with LWA-1
I can also state the per-level write-amp functions directly in terms of T and n. I didn't do that above so it is easier to see the differences. A small numeric check follows the list below.
  • LWA-1 = T^(1/n)
  • LWA-2 = T^(1/n) + 1
  • LWA-3 = (T^(1/n) - 1) / 2
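To make the comparison concrete, here is a small numeric sketch (TypeScript, mine rather than from the post) that sweeps the number of levels for T=1024 and prints the three total write-amp estimates defined above:

// Sweep the number of levels n for total fanout T = 1024 and print the
// three total write-amp estimates TWA-x = n * LWA-x defined above.
const T = 1024;
for (let n = 2; n <= 14; n++) {
  const w = Math.pow(T, 1 / n);     // per-level growth factor, w = T^(1/n)
  const twa1 = n * w;               // LWA-1 = w
  const twa2 = n * (w + 1);         // LWA-2 = w + 1
  const twa3 = (n * (w - 1)) / 2;   // LWA-3 = (w - 1) / 2
  console.log(n, twa1.toFixed(1), twa2.toFixed(1), twa3.toFixed(1));
}
// TWA-1 bottoms out near n = 7 (ln(1024) ~ 6.93), TWA-2 near n = 5 (continuous
// minimum ~5.42), while TWA-3 keeps shrinking as n grows.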
Explaining LWA

First I explain LWA-1 and LWA-2. Compacting 1 SST from Ln to Ln+1 requires merging 1 SST from Ln with ~w SSTs from Ln+1 where w=10 by default with RocksDB. The output will be between w and w+1 SSTs. If the output is closer to w then LWA-1 is correct. If the output is closer to w+1 then LWA-2 is correct. This paper explains why the per level write-amp is likely to be less than w. Were I to use f*w where f < 1 for LWA-1 then the math still holds. Maybe that is a future blog post.

LWA-3 assumes that all-to-all compaction is used rather than some-to-some. I explain the difference here. RocksDB/LevelDB leveled uses some-to-some but all-to-all is interesting. With all-to-all, when compaction from Ln to Ln+1 finishes then Ln is empty and slowly gets full again after each merge into it. Assume the per-level growth factor is w and Ln-1, Ln and Ln+1 are full at sizes 1, w and w*w. Then Ln becomes full after w merges from Ln-1, and those merges write output of size 1, 2, ..., w-1, w. The sum of the first w integers is w(w+1)/2. Divide this by w to get the average -- (w+1)/2. However, above LWA-3 is (w-1)/2 not (w+1)/2. I will explain that in another blog post. Note that in LWA-3 the numerator, w-1, is more interesting than the denominator, 2. Dividing by any constant doesn't change where the minimum occurs, assuming there is a minimum, and that is visible on this graph that shows the impact of dividing by 2 on the total write-amp.

Read on to understand the impact of using w-1, w or w+1 as the function for per-level write-amp. The difference might be more significant than you expect. It surprised me.

Minimizing TWA

This graph shows the total write-amp for LWA-1, LWA-2 and LWA-3. I call the total write-amp TWA-1, TWA-2 and TWA-3. Two of the curves, for TWA-1 and TWA-2, appear to have a minimum. One occurs for x between 4 and 6, the other for x between 6 and 8. The third curve, for TWA-3, doesn't appear to have a minimum and is decreasing as x (number of levels) grows.

The next graph uses the first derivative for the total write-amp functions, so it is for TWA-1', TWA-2' and TWA-3'. A global minimum for TWA-x can occur when TWA-x' = 0 and from the graph TWA-1'=0 when x=6.931 and TWA-2'=0 when x=5.422 which matches the estimate from the previous paragraph. From the graph it appears that TWA-3' approaches zero as x gets large but is never equal to zero.

The next step is to use math to confirm what is visible on the graphs.

Min write-amp for LWA-1

See my previous post where I show that n = ln(T) minimizes total write-amp if n isn't limited to an integer, in which case the per-level growth factor is e. Since the number of levels must be an integer, one of ceil(ln(T)) or floor(ln(T)) minimizes total write-amp.

Min write-amp for LWA-2

I can reuse some of the math from my previous post. But this one is harder to solve.

# wa is the total write-amp
# n is the number of levels
# t is the total fanout
wa = n * ( t^(1/n) + 1 )
wa = n*t^(1/n) + n

# the difference between this and the previous post is '+1'
wa' = t^(1/n) + n * ln(t) * t^(1/n) * (-1) * (1/n^2) + 1
wa' = t^(1/n) - (1/n) * ln(t) * t^(1/n) + 1

At this point the only difference between this and the previous post is the '+1'. But wait, this is where it starts to get interesting.
# critical point for this occurs when wa' = 0
t^(1/n) - (1/n) * ln(t) * t^(1/n) + 1 = 0

# multiply by t^(-1/n)
1 - (1/n) * ln(t) + t^(-1/n) = 0

# move some terms to RHS
t^(-1/n) = (1/n) ln(t) - 1
# use ln on LHS and RHS to get rid of '^(1/n)'
ln ( t^(-1/n) ) = ln( (1/n) * ln(t) - 1 )
(-1/n) ln(t) = ln( (1/n) * ln(t) - 1 )

I got stuck here but eventually made progress.

# let a = (1/n) ln(t) and rewrite
-a = ln(a - 1)

# let x=a-1, a=x+1 and rewrite
-(x+1) = ln(x)

# do e^LHS = e^RHS
e^-(x+1) = e^ln(x)
e^-x * e^-1 = x

# multiply LHS and RHS by e^x
e^-1 = e^x * x

# e^-1 -> (1/e)
(1/e)  =  e^x * x

At last I can use Lambert W function!

# Given: e^x * x = K, then x = W(K)
x = W(e^-1) ~= 0.27846

# because a=x+1
a ~= 1.27846

# a = (1/n) ln(t) -> n = (1/a) ln(t), t=1024
n = 1/1.27846 * ln(1024)

# The value for n that minimizes total write-amp
# from the graph I claimed that n=5.422. this is close
n = 5.4217
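As a numeric cross-check on that value (a sketch of mine, not part of the original derivation), a few Newton steps on x * e^x = 1/e reproduce both W(1/e) and the resulting number of levels:

// Solve x * e^x = 1/e with Newton's method, i.e. compute x = W(1/e).
const K = Math.exp(-1);
let x = 0.5;                             // initial guess
for (let i = 0; i < 50; i++) {
  const f = x * Math.exp(x) - K;         // f(x) = x*e^x - K
  const fPrime = (x + 1) * Math.exp(x);  // f'(x) = (x+1)*e^x
  x = x - f / fPrime;
}
console.log(x);                          // ~0.27846
console.log(Math.log(1024) / (x + 1));   // ~5.4217 levels for t = 1024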

Min write-amp for LWA-3
Update-1 - I think I made a few mistakes here. So you can stop reading until update-2 arrives.

Update-2 - this post explains my mistake and uses math to estimate that per-level write-amp = (w+1)/2 when all-to-all compaction is used. I am still unable to derive (w-1)/2.

I started to work on this without paying attention to the curve for LWA-3'. From the graph it appears to converge to 0 while always staying below 0, so TWA-3 keeps decreasing as x, the number of levels, gets large. Therefore make the number of levels as large as possible, 2M or 2B, to minimize total write-amp, as visible in this graph.

But more levels in the LSM tree comes at a cost -- more read-amp. And the reduction in write-amp is small when the number of levels increases from 20 to 200 to 2000 to 2M. Again, this is visible in the graph. Besides, if you really want less write-amp then use tiered compaction rather than leveled with too many levels.

The other consideration is the minimal per-level growth factor that should be allowed. If the minimum per-level growth factor is 2, then for a total fanout of 1024 that occurs when the number of levels, n, is:

# assume total fanout is 1024
2^n = 1024
log2(2^n) = log2(1024)
n = log2(1024) = 10

Alas, the total fanout isn't always a power of 2. Given that the number of levels must be an integer, the goal is to use the largest number of levels for which the per-level growth factor is still >= 2, as sketched below. When x isn't limited to an integer there is no answer -- just make x as large as possible (1M, 1B, etc.), in which case the per-level growth factor converges to 1 but is always greater than 1.
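Here is a sketch of that bookkeeping; the total fanout value below is an arbitrary example of mine, not a number from the post:

// Take the largest integer level count whose per-level growth factor stays >= 2.
const T = 1200;                                   // example total fanout, not a power of 2
const n = Math.floor(Math.log(T) / Math.log(2));  // largest n with T^(1/n) >= 2
const w = Math.pow(T, 1 / n);                     // resulting per-level growth factor
console.log(n, w);                                // 10 levels, growth factor ~2.03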

The above can be repeated where the constraint is either the max number of levels or a different value for the min per-level growth factor (either <2 or >2). Regardless, if LWA-3 is the cost function then total write-amp is minimized by using as many levels as possible subject to these constraints.

Below is some math for LWA-3 and LWA-3'.

# wa is the total write-amp
# n is the number of levels
# t is the total fanout
wa = n * ( t^(1/n) - 1 ) / 2
wa = (n*t^(1/n) - n ) / 2

# the differences between this and the previous post are the '-1' and the division by 2
wa' = [ t^(1/n) + n * ln(t) * t^(1/n) * (-1) * (1/n^2) - 1 ] / 2
wa' = [ t^(1/n) - (1/n) * ln(t) * t^(1/n) - 1 ] / 2

# determine when wa' = 0
[ t^(1/n) - (1/n) * ln(t) * t^(1/n) - 1 ] / 2 = 0

# multiply LHS and RHS by 2
t^(1/n) - (1/n) * ln(t) * t^(1/n) - 1 = 0
# multiply LHS and RHS by t^(-1/n)
1 - (1/n) * ln(t) - t^(-1/n) = 0

# move last term to RHS
1 - (1/n) * ln(t) = t^(-1/n)

# probably a good idea to stop here
# LHS is likely to be <0 so can't use ln(LHS) = ln(RHS)

Date arithmetic with Julian days, BC dates, and Oracle rules

Here are routines that can handle date arithmetic on BC dates, Julian day functions, and simulation of Oracle's support of old-style-calendar dates -- including simulation of an Oracle bug. So the routines are good for extending the range of usable dates, compact storage, and import/export between DBMSs that have different rules.

If you need to refresh your understanding of dates, read our old-but-lovely article first: The Oracle Calendar.

I wrote the main routines with standard SQL so they should run on
any DBMS that supports the standard, but tested only with
MySQL and MariaDB.


ocelot_date_to_julianday
Return number of days since 4713-01-01, given yyyy-mm-dd [BC] date
ocelot_date_validate
Return okay or error, given yyyy-mm-dd BC|AD date which may be invalid
ocelot_date_datediff
Return number of days difference, given two yyyy-mm-dd [BC] dates
ocelot_date_test
Return 'OK' after a thorough test of the entire range of dates

All function arguments look like this:
yyyy-mm-dd [BC] ... CHAR|VARCHAR. yyyy-mm-dd is the standard date format for year, month, and day, optionally followed by a space and 'BC'. If 'BC' is missing, 'AD' is assumed. Must be between 4713-01-01 BC and 9999-12-31 for Julian-calendar dates, or between 4714-11-24 BC and 9999-12-31 for Gregorian-calendar dates. Routines will return bad results if dates are invalid; if there is any doubt, run ocelot_date_validate() first.
julian_day ... INTEGER. For an explanation of what a "Julian day number" is, see Wikipedia. Do not confuse with "Julian-calendar date" -- the name is similar but Julian days can be converted to or from dates in the Gregorian calendar too. Must be between 0 (which is 4713-01-01 BC) and a maximum (which is 9999-12-31).
'J' or 'G' or 'O' ... CHAR. This is an "options" flag. 'J' means use the Julian (old style) calendar. 'G' means use the Gregorian (new style) calendar. 'O' means use the Oracle rules, which we described in the earlier article. If options is not 'J' or 'G' or 'O', 'G' is assumed.



Example expressions:
#1 ocelot_date_to_julianday('0001-01-01','G') returns 1721426
#2 ocelot_date_to_julianday('0001-01-01','J') returns 1721424
#3 ocelot_date_to_julianday('4712-01-01 BC', 'O') returns 0
#4 ocelot_date_datediff('0001-01-01','0001-01-01 BC','G') returns 366
#5 ocelot_date_to_julianday('1492-10-12','J')%7; returns 4
/* Explanations: #3 returns 0 because there's a year 0000,
#4 returns 366 because 0001 BC is a leap year,
#5 returns weekday = 4 for the original Columbus Day
and he used a Julian calendar. */

The source code

The code is original but the general idea is not -- I gratefully acknowledge Peter Baum's 1998 article "Date Algorithms".

I use the Ocelot GUI (ocelotgui) when I write routines for MySQL/MariaDB. Since it recognizes all their syntax quirks it can give me hints when I'm typing something wrong, and saves me from the hassles of "delimiter". And it has a debugger. Version 1.0.8 was released yesterday for download via github.

I start with a standard 2-clause BSD license and then show the CREATE statements for each routine. To install: just cut-and-paste what follows this paragraph until the end of this section. If you are not using ocelotgui you will have to say DELIMITER // and put // at the end of each CREATE statement.

/*
Copyright (c) 2019 Ocelot Computer Services Inc.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

/*
ocelot_date_to_julianday(yyyy-mm-dd[ BC], J|G|O) Return number of days
------------------------
If J: will return 0 for '4713-01-01 BC', all calculations use Julian calendar
If G: will return 0 for '4714-11-24 BC', all calculations use Gregorian calendar
If O: will return 0 for '4712-01-01 BC', switch between calendars after 1582-10-04
*/
CREATE FUNCTION ocelot_date_to_julianday(in_date VARCHAR(25), options CHAR(1)) RETURNS DECIMAL(8)
LANGUAGE SQL DETERMINISTIC CONTAINS SQL
BEGIN
DECLARE year, month, day, century, leap INT;
DECLARE jd DOUBLE PRECISION;
DECLARE bc_as_char CHAR(2);
SET year = CAST(SUBSTRING(in_date FROM 1 FOR 4) AS DECIMAL(8));
SET month = CAST(SUBSTRING(in_date FROM 6 FOR 2) AS DECIMAL(8));
SET day = CAST(SUBSTRING(in_date FROM 9 FOR 2) AS DECIMAL(8));
SET bc_as_char = SUBSTRING(in_date FROM CHAR_LENGTH(in_date) - 1 FOR 2);
IF bc_as_char = 'BC' THEN
IF options = 'O' THEN SET year = 0 - year;
ELSE SET year = (0 - year) + 1; END IF;
END IF;
IF month <= 2 THEN
SET year = year - 1;
SET month = month + 12;
END IF;
IF options = 'G'
OR (options = 'O' AND SUBSTRING(in_date FROM 1 FOR 10) >= '1582-10-05' AND bc_as_char <> 'BC') THEN
SET century = FLOOR(year / 100.0);
SET leap = 2 - century + FLOOR(century / 4.0);
ELSE
SET leap = 0;
END IF;
SET jd = FLOOR(365.25 * (year + 4716)) + FLOOR(30.6001 * (month + 1)) + day + leap - 1524;
RETURN CAST(jd AS DECIMAL(8));
END;

/*
ocelot_date_validate (yyyy-mm-dd[ BC] date, J|G|O) Return 'OK' or 'Error ...'
--------------------
Possible errors:
Format of first parameter is not 'yyyy-mm-dd' or 'yyyy-mm-dd BC'.
Second parameter is not 'J' or 'G' or 'O'.
Minimum date = 4713-01-01 BC if J, 4712-01-01 BC if O, 4714-11-24 BC if G.
Maximum date = 9999-12-31.
If 'O': 0001-mm-dd BC, or between 1582-10-05 and 1582-10-14.
nnnn-02-29 if nnnn is not a leap year.
Month not between 1 and 12.
Day not between 1 and maximum for month.
Otherwise return 'OK'.
*/
CREATE FUNCTION ocelot_date_validate(in_date VARCHAR(25), options CHAR(1)) RETURNS VARCHAR(50)
LANGUAGE SQL DETERMINISTIC CONTAINS SQL
BEGIN
DECLARE year, month, day, leap_days DECIMAL(8);
DECLARE bc_or_ad VARCHAR(3) DEFAULT '';
IF options IS NULL
OR (options <> 'J' AND options <> 'G' AND options <> 'O') THEN
RETURN 'Error, Options must be J or G or O';
END IF;
IF in_date IS NULL
OR (CHAR_LENGTH(in_date) <> 10 AND CHAR_LENGTH(in_date) <> 13)
OR SUBSTRING(in_date FROM 1 FOR 1) NOT BETWEEN '0' AND '9'
OR SUBSTRING(in_date FROM 2 FOR 1) NOT BETWEEN '0' AND '9'
OR SUBSTRING(in_date FROM 3 FOR 1) NOT BETWEEN '0' AND '9'
OR SUBSTRING(in_date FROM 4 FOR 1) NOT BETWEEN '0' AND '9'
OR SUBSTRING(in_date FROM 5 FOR 1) <> '-'
OR SUBSTRING(in_date FROM 6 FOR 1) NOT BETWEEN '0' AND '9'
OR SUBSTRING(in_date FROM 7 FOR 1) NOT BETWEEN '0' AND '9'
OR SUBSTRING(in_date FROM 8 FOR 1) <> '-'
OR SUBSTRING(in_date FROM 9 FOR 1) NOT BETWEEN '0' AND '9'
OR SUBSTRING(in_date FROM 10 FOR 1) NOT BETWEEN '0' AND '9' THEN
RETURN 'Error, Date format is not nnnn-nn-nn';
END IF;
IF CHAR_LENGTH(in_date) = 13 THEN
SET bc_or_ad = SUBSTRING(in_date FROM 11 FOR 3);
IF bc_or_ad <> ' BC' THEN
RETURN 'Error, only space + BC is allowed after yyyy-mm-dd';
END IF;
END IF;
SET year = CAST(SUBSTRING(in_date FROM 1 FOR 4) AS DECIMAL(8));
SET month = CAST(SUBSTRING(in_date FROM 6 FOR 2) AS DECIMAL(8));
SET day = CAST(SUBSTRING(in_date FROM 9 FOR 2) AS DECIMAL(8));
IF year = 0 THEN
RETURN 'Error, year 0';
END IF;
IF bc_or_ad = ' BC' THEN
IF options = 'J' AND year > 4713 THEN
RETURN 'Error, minimum date = 4713-01-01 BC';
END IF;
IF options = 'O' AND year > 4712 THEN
RETURN 'Error, minimum date = 4712-01-01 BC';
END IF;
IF OPTIONS = 'G' THEN
IF year > 4714
OR (year = 4714 AND month < 11)
OR (year = 4714 AND month = 11 AND day < 24) THEN
RETURN 'Error, minimum date = 4714-11-24 BC';
END IF;
END IF;
END IF;
IF month < 1 OR month > 12 THEN RETURN 'Error, month not between 1 and 12'; END IF;
SET leap_days = 0;
IF month = 2 AND day = 29 THEN
IF bc_or_ad = ' BC' AND options <> 'O' THEN SET year = year - 1; END IF;
IF year % 4 = 0 THEN
IF options = 'J' OR (options = 'O' AND (bc_or_ad = ' BC' OR SUBSTRING(in_date FROM 1 FOR 10) < '1582-10-05')) THEN
SET leap_days = 1;
ELSE
IF year % 100 <> 0 OR year % 400 = 0 THEN
SET leap_days = 1;
END IF;
END IF;
END IF;
IF leap_days = 0 THEN RETURN 'Error, February 29, not a leap year'; END IF;
END IF;
IF month = 1 AND day > 31
OR month = 2 AND day - leap_days > 28
OR month = 3 AND day > 31
OR month = 4 AND day > 30
OR month = 5 AND day > 31
OR month = 6 AND day > 30
OR month = 7 AND day > 31
OR month = 8 AND day > 31
OR month = 9 AND day > 30
OR month = 10 AND day > 31
OR month = 11 AND day > 30
OR month = 12 AND day > 31 THEN
RETURN 'Error, day > maximum day in mnth';
END IF;
IF options = 'O'
AND bc_or_ad ' BC'
AND SUBSTRING(in_date FROM 1 FOR 10) BETWEEN '1582-10-05' AND '1582-10-14' THEN
RETURN 'Error, Date during Julian-to-Gregorian cutover';
END IF;
RETURN 'OK';
END;

/*
ocelot_date_datediff(date, date, J|G|O) Return number of days between two dates
--------------------
Results for positive Gregorian will be the same as MySQL/MariaDB datediff().
This is an extension of datediff() which works with BC Gregorian and other calendars.
Mostly it's just to show how easily a routine can be written if there is a
Julian-day function.
*/
CREATE FUNCTION ocelot_date_datediff(date_1 VARCHAR(25), date_2 VARCHAR(25), options CHAR(1)) RETURNS INT
LANGUAGE SQL DETERMINISTIC CONTAINS SQL
RETURN ocelot_date_to_julianday(date_1, options) - ocelot_date_to_julianday(date_2, options);

/*
ocelot_date_test(J|G|O) Test that all legal dates have the correct Julian day
----------------
You only need to run this once. The Julian day routine looks bizarre so this
test is here to give assurance that the ocelot_date_to_julianday function is okay.
Start with a counter integer = 0 and a yyyy-mm-dd BC date = the minimum for the calendar.
For each iteration of the loop, increment the counter and increment the date,
call ocelot_date_to_julianday and check that it returns a value equal to the counter.
Stop when date is 9999-12-31.
For Oracle emulation we do not check dates which are invalid due to cutover or bugs.
Bonus test: positive Gregorian dates must match MySQL|MariaDB datediff results.
Bonus test: check validity of each incremented date.
*/
CREATE FUNCTION ocelot_date_test(options CHAR(1)) RETURNS CHAR(50)
LANGUAGE SQL DETERMINISTIC CONTAINS SQL
BEGIN
DECLARE tmp VARCHAR(25);
DECLARE tmp_validity VARCHAR(50);
DECLARE year_as_char, month_as_char, day_as_char VARCHAR(25);
DECLARE year_as_int, month_as_int, day_as_int DECIMAL(8);
DECLARE ju, ju2 INT;
DECLARE bc_as_char VARCHAR(3) DEFAULT '';
DECLARE is_leap INT DEFAULT 1;
IF options = 'J' THEN
SET ju = 0; SET tmp = '4713-01-01 BC'; SET bc_as_char = ' BC'; SET is_leap = 1;
END IF;
IF options = 'G' THEN
SET ju = 0; SET tmp = '4714-11-24 BC'; SET bc_as_char = ' BC'; SET is_leap = 0;
END IF;
IF options = 'O' THEN
SET ju = 0; SET tmp = '4712-01-01 BC'; SET bc_as_char = ' BC'; SET is_leap = 1;
END IF;
WHILE tmp <> '10000-01-01' DO
IF options <> 'O'
OR SUBSTRING(tmp FROM 1 FOR 4) <> '0000'
OR bc_as_char <> ' BC' THEN
SET tmp_validity = ocelot_date_validate(tmp, options);
IF tmp_validity <> 'OK' THEN RETURN tmp_validity; END IF;
END IF;
SET ju2 = ocelot_date_to_julianday(tmp, options);
IF ju2 <> ju OR ju2 IS NULL THEN RETURN CONCAT('Fail ', tmp); END IF;

IF options = 'G' and bc_as_char <> ' BC' THEN
IF ju2 - 1721426 <> DATEDIFF(tmp,'0001-01-01') THEN
RETURN CONCAT('Difference from datediff() ', tmp);
END IF;
END IF;
SET year_as_char = SUBSTRING(tmp FROM 1 FOR 4);
SET month_as_char = SUBSTRING(tmp FROM 6 FOR 2);
SET day_as_char = SUBSTRING(tmp FROM 9 FOR 2);
SET year_as_int = CAST(year_as_char AS DECIMAL(8));
SET month_as_int = CAST(month_as_char AS DECIMAL(8));
SET day_as_int = CAST(day_as_char AS DECIMAL(8));
/* Increase day */
SET day_as_int = day_as_int + 1;
IF options = 'O' AND year_as_int = 1582 AND month_as_int = 10 AND day_as_int = 5 AND bc_as_char <> ' BC' THEN
SET day_as_int = day_as_int + 10;
END IF;
IF month_as_int = 1 AND day_as_int > 31
OR month_as_int = 2 AND day_as_int - is_leap > 28
OR month_as_int = 3 AND day_as_int > 31
OR month_as_int = 4 AND day_as_int > 30
OR month_as_int = 5 AND day_as_int > 31
OR month_as_int = 6 AND day_as_int > 30
OR month_as_int = 7 AND day_as_int > 31
OR month_as_int = 8 AND day_as_int > 31
OR month_as_int = 9 AND day_as_int > 30
OR month_as_int = 10 AND day_as_int > 31
OR month_as_int = 11 AND day_as_int > 30
OR month_as_int = 12 AND day_as_int > 31 THEN
/* Increase month */
SET day_as_int = 1;
SET month_as_int = month_as_int + 1;
IF month_as_int > 12 THEN
/* Increase year */
SET month_as_int = 1;
IF bc_as_char = ' BC' THEN SET year_as_int = year_as_int - 1;
ELSE SET year_as_int = year_as_int + 1; END IF;
IF (year_as_int = 0 AND (options = 'J' OR options = 'G'))
OR (year_as_int =-1 AND options = 'O') THEN
SET year_as_int = 1;
SET bc_as_char = '';
SET is_leap = 0;
END IF;
/* Recalculate is_leap */
BEGIN
DECLARE divisible_year_as_int INT;
SET divisible_year_as_int = year_as_int;
IF bc_as_char <> ' BC' OR options = 'O' THEN
SET divisible_year_as_int = year_as_int;
ELSE
SET divisible_year_as_int = year_as_int - 1;
END IF;
SET is_leap = 0;
IF divisible_year_as_int % 4 = 0 THEN
SET is_leap = 1;
IF options = 'G'
OR (options = 'O' AND bc_as_char <> ' BC' AND year_as_int > 1582) THEN
IF divisible_year_as_int % 100 = 0
AND divisible_year_as_int % 400 <> 0 THEN
SET is_leap = 0;
END IF;
END IF;
END IF;
END;
END IF;
END IF;
SET day_as_char = CAST(day_as_int AS CHAR);
IF LENGTH(day_as_char) = 1 THEN SET day_as_char = CONCAT('0', day_as_char); END IF;
SET month_as_char = CAST(month_as_int AS CHAR);
IF LENGTH(month_as_char) = 1 THEN SET month_as_char = CONCAT('0', month_as_char); END IF;
SET year_as_char = CAST(year_as_int AS CHAR);
WHILE LENGTH(year_as_char) < 4 DO SET year_as_char = CONCAT('0', year_as_char); END WHILE;
/* Assumed completion of the truncated routine: rebuild the date string, advance the counter, finish the loop. */
SET tmp = CONCAT(year_as_char, '-', month_as_char, '-', day_as_char, bc_as_char);
SET ju = ju + 1;
END WHILE;
RETURN 'OK';
END;

Belgium! PHP Benelux, Pre-FOSDEM MySQL Day, and FOSDEM

I am getting ready to hit the road for the first time in 2019.  And the first stop is Belgium.

PHP Benelux is the first stop, January 25 & 26, 2019 in Antwerp. This is my first time at this show, but I have heard others rave about it for years and I am looking forward to it.

Next is an event on February 1st that is quickly filling up: the Pre-FOSDEM MySQL Day. Come hear sessions from the MySQL engineering teams and some other great technical speakers, and talk to them! It is a great event, but please reserve your space ASAP as it is very popular!

  • 10:00–10:30 MySQL Shell – A devops-engineer day with MySQL’s development and administration tool by Miguel Araújo
  • 10:35–11:05 MySQL Shell : The best DBA tool? by Frédéric Descamps
  • 11:25–11:55 Tuning MySQL 8.0 InnoDB for high load by Dimitri Kravtchuk
  • 13:30–14:00 New index features in MySQL 8.0 by Erik Frøseth
  • 14:40–15:10 Regular expressions with full Unicode support by Martin Hansson
  • 16:50–17:20 8 Group replication features that will make you smile by Tiago Vale
  • 17:25–17:55 Document Store & PHP by David Stokes
And finally, on Feb 2nd - 3rd, there is the chaotic FOSDEM. No pre-registration, but we have a small room that gets packed quickly. And drop by the MySQL booth to talk MySQL and get some SWAG.

MySQL examples in Node.js

If you’re integrating your Node.js service with MySQL, you probably want to execute queries.

I’ve seen a fair bit of ugly code to do this, often using callbacks. I thought I would share some of my own patterns here. Maybe this is useful to others to read.

I don’t typically use ORMs. If you’re already happy with Sequelize, this article is probably not for you.

Prerequisites

The most popular NPM package for MySQL is mysql, but I use the mysql2 package instead.

The mysql2 package is also very popular, but it has a few advantages. The advantage I care about is support for promises. The authors of this package have teamed up with the authors of the mysql package to reduce duplicate effort, and mysql2 was made to be compatible with mysql, so for many people it’s a drop-in replacement.

Creating a pool

const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'root',
  database: 'test',
  waitForConnections: true,
  connectionLimit: 10,
  queueLimit: 0
});

A pool manages multiple connections. I create a pool once per node.js server.

Note: if you are migrating from the mysql to mysql2 package, you probably don’t want to require mysql2/promise. You can instead require mysql2 to get full backwards compatibility and use the promise() functions to get access to promisified versions of the APIs.

Note2: mysql2 uses the utf8mb4 charset by default. If you don’t know what this means trust that it’s the right choice. If you do know what this is, rejoice that somebody finally picked a sane default.

Executing a SELECT query

Generally I don’t need transactions for SELECT queries, because of this I can simply ask the pool to execute the query.

async function getBlogPost(id) {
  const result = await pool.query('SELECT * from posts WHERE id = ?', [id]);
  if (result[0].length < 1) {
    throw new Error('Post with this id was not found');
  }
  return result[0][0];
}

result[0][0] looks a bit weird, this is because the result of the query returns a tuple (array) with 2 items. The first item has the result of the query, and the second has the meta data. This subtle decision is my #1 complaint about the library because it makes a lot of things slightly uglier than they need to be.

So if we want just the first record of the result, we need to use result[0][0].
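For illustration, here is a minimal sketch (reusing the pool created earlier) of pulling that tuple apart:

// The promise API resolves to a [rows, fields] tuple.
async function tupleExample() {
  const result = await pool.query('SELECT * from posts');
  const rows = result[0];    // the records themselves
  const fields = result[1];  // column metadata
  // Destructuring keeps only the part you usually care about:
  const [onlyRows] = await pool.query('SELECT * from posts');
  return { rows, fields, onlyRows };
}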

Whenever I write a function that should return exactly 1 item, I will either return an item or throw an error. I don’t return undefined or null.

A SELECT query that returns multiple records is more elegant:

async function getAllBlogPost() {
  const result = await pool.query('SELECT * from posts');
  return result[0];
}

Whenever I write a function that returns 0 or more items, this function always returns an array with 0 or more items. If the collection is empty, I return an empty array.

Note: sometimes there is a distinction between an empty collection or a collection not existing. If that distinction exists, I do throw an error in the latter case.

Executing an INSERT query

Generally when doing INSERT queries, most people use the following syntax:

INSERT INTO posts (title, body) VALUES (?, ?)

MySQL also has a second system for inserting that is less popular, and looks more like an UPDATE statement:

INSERT INTO posts SET title = ?, body = ?

The second syntax is the one I use. A big advantage is that the ‘value’ is close to the name of the field. If you ever had to count question marks, you know why this is nice.

Naively you can execute this query as follows:

async function insertPost(title, body) {
  await pool.query(
    'INSERT INTO posts SET title = ?, body = ?',
    [title, body]
  );
}

But there is a nicer way to do this:

async function insertPost(title, body) {
  await pool.query(
    'INSERT INTO posts SET ?',
    { title, body }
  );
}

If you are used to MySQL prepared statements, you might wonder why this works.

The reason is that the placeholder ? gets special treatment when you pass objects or arrays to it.

Specifically, if you pass an array like this:

['foo', 'bar']

It expands to

'foo', 'bar'

And objects such as this:

{ foo: 'bar', gaz: 'zim' }

Expand to:

`foo` = 'bar', `gaz` = 'zim'

Here’s another neat example of this:

async function getByCategoryIds(ids) {
  // Pretend ids is [1, 2]
  const result = await pool.query(
    'SELECT * from posts WHERE category_id IN (?)',
    [ids],
  );
  return result[0];
}

This actually works. If you are well versed with how MySQL works you might scream at this point: “This doesn’t use real prepared statements”, and you would be right.

Prepared statements

Both the mysql and mysql2 package by default emulate prepared statements client-side. A lot of people feel that this is a really bad security practice.

I disagree, and might write about that in another article at one point. Ask most security experts though and they’ll tell you this is bad.

To do real prepared statements, you have to use the execute() function:

async function insertPost(title, body) {
  await pool.execute(
    'INSERT INTO posts SET title = ?, body = ?',
    [title, body]
  );
}

This uses real MySQL prepared statements, but unfortunately doesn’t let you work with arrays and objects.

Running multiple queries on a single connection

Every time you call .query() or .execute(), you might get a new connection from the pool. Sometimes it’s a good idea to use the same connection if you do multiple queries.

One reason might be that you have multiple replicated MySQL servers and you want to ensure that you’re running the queries on the same machine with the same state.

To do this, you need to grab a connection from the pool, and release it once you’re done.

The ideal pattern for me looks like this:

async function batchingThings() {
  const connection = await pool.getConnection();
  try {
    await connection.query('...');
    await connection.query('...');
  } finally {
    connection.release();
  }
}

Remember that if you run multiple queries on a single connection, they must be serialized; they must not run in parallel.
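A minimal sketch of that advice; the queries are placeholders, and the PoolConnection type annotation is an assumption based on mysql2's bundled typings:

import { PoolConnection } from 'mysql2/promise';

// Await each query in turn on the shared connection; per the advice above,
// don't fire them in parallel (for example via Promise.all) on one connection.
async function runSerialized(connection: PoolConnection) {
  await connection.query('SELECT 1');  // placeholder query
  await connection.query('SELECT 2');  // runs only after the first one completes
}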

Transactions

If you run multiple queries that change the state of the database, it’s often a very good idea to have them all succeed or fail as a group.

Transactions are used for that. After a transaction has started, it’s possible to roll back every query since the start of the transaction.

The pattern I use for this is similar to the last, but a bit more complex:

async function batchingThings() {
  const connection = await pool.getConnection();
  await connection.beginTransaction();
  try {
    await connection.query('...');
    await connection.query('...');
    await connection.commit();
  } catch (err) {
    await connection.rollback();
    // Throw the error again so others can catch it.
    throw err;
  } finally {
    connection.release();
  }
}

If my transaction code becomes too complex and I need to split it up over multiple functions, I pass the connection object around as the argument.

Typescript everything

I wanted my examples to be accessible, but all my code is written in TypeScript. Every MySQL table row has its own type and I’m as strict as I can be.

This is a really good idea. If there’s interest, I can write another post with TypeScript and mysql2 examples.
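In the meantime, here is a minimal sketch of the idea; the PostRow interface and its columns are made up for the example, and the RowDataPacket generic follows mysql2's bundled typings:

import { createPool } from 'mysql2/promise';
import { RowDataPacket } from 'mysql2';

// Hypothetical row shape for the posts table used throughout this article.
interface PostRow extends RowDataPacket {
  id: number;
  title: string;
  body: string;
}

const pool = createPool({ host: 'localhost', user: 'root', database: 'test' });

async function getBlogPost(id: number): Promise<PostRow> {
  // The generic types the first element of the [rows, fields] tuple.
  const [rows] = await pool.query<PostRow[]>('SELECT * from posts WHERE id = ?', [id]);
  if (rows.length < 1) {
    throw new Error('Post with this id was not found');
  }
  return rows[0];
}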

Questions / comments?
  1. You can reply to this tweet to automatically see your response here.
  2. If you’re a dev, you can also send a pull request and edit in your comment in this article.

Percona Toolkit 3.0.13 Is Now Available

Percona announces the release of Percona Toolkit 3.0.13 for January 9, 2019.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL®, MariaDB®, Percona Server for MongoDB and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.

This release includes the following changes:

Bug fixes:
  • PT-1673: pt-show-grants was incompatible with MariaDB 10+ (thanks Tim Birkett)
  • PT-1638: pt-online-schema-change was erroneously treating MariaDB 10.x as MySQL 8.0 and refusing to work with it, in order to stay out of the scope of the upstream bug #89441.
  • PT-1616: pt-table-checksum failed to resume on large tables with binary strings containing invalid UTF-8 characters.
  • PT-1573: pt-query-digest didn’t work in case of log_timestamps = SYSTEM my.cnf option.
  • PT-157: Specifying a non-primary key index with the ‘i’ part of the --source argument made pt-archiver ignore the --primary-key-only option.
Improvements:
  • PT-1340: pt-stalk no longer calls the mysqladmin debug command by default, to avoid flooding the error log. Setting the CMD_MYSQLADMIN="mysqladmin debug" environment variable reverts pt-stalk to the previous behavior.
  • PT-1637: A new --fail-on-stopped-replication option allows pt-table-checksum to detect failing slave nodes.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.
