Planet MySQL

MySQL Connector/Net 8.0.10-rc has been released

Dear MySQL users,

MySQL Connector/Net 8.0.10 is the first release candidate of MySQL
Connector/Net to add support for the new X DevAPI. The X DevAPI enables
application developers to write code that combines the strengths of the
relational and document models using a modern, NoSQL-like syntax that
does not assume previous experience writing traditional SQL.

To learn more about how to write applications using the X DevAPI, see
http://dev.mysql.com/doc/x-devapi-userguide/en/index.html. For more
information about how the X DevAPI is implemented in Connector/Net, see
http://dev.mysql.com/doc/dev/connector-net.

Please note that the X DevAPI requires MySQL Server version 5.7.12 or
higher with the X Plugin enabled. For general documentation
about how to get started using MySQL as a document store, see
http://dev.mysql.com/doc/refman/5.7/en/document-store.html.

To download MySQL Connector/Net 8.0.10-rc, see the “Development
Releases” tab at
http://dev.mysql.com/downloads/connector/net/

Changes in MySQL Connector/Net 8.0.10 (2018-01-30, Release Candidate)

Functionality Added or Changed

* The .NET Core 2.0 implementation now supports the following connection-string options: AutoEnlist, InteractiveSession, Logging, Replication, and UseUsageAdvisor. For more information about the options, see Connector/Net Connection-String Options Reference (http://dev.mysql.com/doc/connector-net/en/connector-net-connection-options.html). (Bug #27297337)

* X DevAPI: In the process of refining the definition of the X DevAPI to cover the most relevant usage scenarios, the following API components have been removed from the X DevAPI implementation for MySQL Connector/Net:
  + API components that support session configurations: the MySqlX.XDevAPI.Config namespace and all members of the namespace.
  + API components that support views: the CreateView(), DropView(), and ModifyView() methods from the MySqlX.XDevAPI.Schema class, and the ViewAlgorithm, ViewSqlSecurityEnum, and ViewCheckOptionEnum enumerations from the MySqlX.DataAccess namespace.
  Note: The Table.IsView property remains available for query operations.

* Support for .NET Core 2.0 and .NET Standard 2.0 has been added (.NET Core 1.1 support continues). With .NET Core 2.0, most of the common ADO.NET classes are available for use, such as:
  + System.Data.DataTable, System.Data.DataColumn, and System.Data.DataRow
  + System.Data.DataSet
  + System.Data.Common.DataAdapter

* Support for Entity Framework Core 2.0 has been added (Entity Framework 1.1 support continues). Currently, the MySQL Connector/Net implementation excludes the following 2.0 features:
  + Modeling: table splitting, owned types, model-level query filters, database scalar function mapping, self-contained type configuration for code first.
  + High performance: DbContext pooling and explicitly compiled queries.
  + Change tracking: attach can track a graph of new and existing entities.
  + Query: improved LINQ translation, group-join improvements, string interpolation in FromSql and ExecuteSqlCommand, new EF.Functions.Like().
  + Database management: pluralization hook for DbContext scaffolding.
  + Others: only one provider per model, consolidated logging and diagnostics.

* X DevAPI: MySQL Connector/Net now supports setting and releasing named transaction savepoints, which can be assigned a name explicitly or by default using the savepoint_(uuid) format. In addition, a transaction can be rolled back to a named savepoint. New methods were added to the MySqlX.XDevAPI.BaseSession class to implement the corresponding SQL statements using the X Protocol:
  + SetSavepoint() and SetSavepoint(name) correspond to the SAVEPOINT statement.
  + ReleaseSavepoint() corresponds to the RELEASE SAVEPOINT statement.
  + RollbackTo() corresponds to the ROLLBACK TO statement.
  All errors generated by MySQL when one of the new methods is called will be returned by MySQL Connector/Net.

* X DevAPI: The MySqlX.XDevAPI.CRUD.ModifyStatement.Patch method was added to enable the inclusion of JSON-like objects within Collection.Modify() operations that describe the changes to apply to all documents matching the condition.

* Support for the caching_sha2_password authentication plugin through the classic MySQL protocol was added. Support through the X Protocol is limited to secure connections only (sslmode=required). Caching SHA-2 pluggable authentication offers faster authentication than basic SHA-256 authentication. A new and related connection option, AllowPublicKeyRetrieval, was also added.
* X DevAPI: The MySqlX.XDevAPI.Collection.CreateIndex method implementation was modified to enable the inclusion of a JSON document that defines the index to be created. Index-definition details can include the fields affected, data types, and so on.

Bugs Fixed

* X DevAPI: When the PLAIN authentication option was used to make a secure connection, the database name was excluded from the authenticating data and the database value was not set. PLAIN authentication is the default option for connections made with TLS or Unix sockets. (Bug #27098974, Bug #88427)

* Boolean values within a JSON document were improperly stored as strings. (Bug #26837112)

* Invoking the MySql.Web.Security.MySqlWebSecurity.CreateUserAndAccount method with valid arguments, including additionalUserAttributes as an object with key/value pairs, returned an out-of-range exception. Thanks to Stein Setvik for contributing to the fix. (Bug #25046364)

* The default character set and encoding were not set properly when making a connection to MySQL 5.6 and 5.7 servers configured to use the utf8 character set. (Bug #23257011)

* SSL connections made to a single MySQL instance could not be disconnected and created repeatedly without restarting the client application to clear the half-open sockets. (Bug #20393654, Bug #75022)

NuGet packages are available at:

https://www.nuget.org/packages/MySql.Data/8.0.10-rc
https://www.nuget.org/packages/MySql.Web/8.0.10-rc
https://www.nuget.org/packages/MySql.Data.EntityFramework/8.0.10-rc
https://www.nuget.org/packages/MySql.Data.EntityFrameworkCore/8.0.10-rc
https://www.nuget.org/packages/MySql.Data.EntityFrameworkCore.Design/8.0.10-rc

Enjoy and thanks for the support!

On Behalf of the MySQL/ORACLE RE Team
Gipson Pulla

MariaDB Server 10.0.34 now available


The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.0.34. See the release notes and changelog for details and visit mariadb.com/downloads to download.

Download MariaDB Server 10.0.34

Release Notes Changelog What is MariaDB 10.0?



MySQL Query Performance: Not Just Indexes

In this blog post, I’ll look at whether optimizing indexing is always the key to improving MySQL query performance (spoiler, it isn’t).

As we look at MySQL query performance, our first concern is often whether a query is using the right indexes to retrieve the data. This is based on the assumption that finding the data is the most expensive operation – and the one you should focus on for MySQL query optimization. However, this is not always the case.

Let’s look at this query for illustration:

mysql> show create table tbl G *************************** 1. row ***************************       Table: tbl Create Table: CREATE TABLE `tbl` (  `id` int(11) NOT NULL AUTO_INCREMENT,  `k` int(11) NOT NULL DEFAULT '0',  `g` int(10) unsigned NOT NULL,  PRIMARY KEY (`id`),  KEY `k_1` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 G *************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: tbl   partitions: NULL         type: ALL possible_keys: k_1          key: NULL      key_len: NULL          ref: NULL         rows: 998490     filtered: 50.00        Extra: Using where; Using temporary; Using filesort 1 row in set, 1 warning (0.00 sec) mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7; +--------+----+ | g      | c  | +--------+----+ |  28846 |  8 | | 139660 |  8 | | 153286 |  8 | ... | 934984 |  8 | +--------+----+ 22 rows in set (6.80 sec)

Looking at this query, many might assume the main problem is that this query is doing a full table scan. One could wonder then, “Why does the MySQL optimizer not use index (k)?”  (It is because the clause is not selective enough, by the way.) This thought might cause someone to force using the index, and get even worse performance:

mysql> select g,count(*) c from tbl force index(k) where k<1000000 group by g having c>7; +--------+----+ | g      | c  | +--------+----+ |  28846 |  8 | | 139660 |  8 | ... | 934984 |  8 | +--------+----+ 22 rows in set (9.37 sec)

Or someone might extend the index on (k) to (k,g) to be a covering index for this query. This won’t improve performance either:

mysql> alter table tbl drop key k_1, add key(k,g); Query OK, 0 rows affected (5.35 sec) Records: 0  Duplicates: 0  Warnings: 0 mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 G *************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: tbl   partitions: NULL         type: range possible_keys: k          key: k      key_len: 4          ref: NULL         rows: 499245     filtered: 100.00        Extra: Using where; Using index; Using temporary; Using filesort 1 row in set, 1 warning (0.00 sec) mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7; +--------+----+ | g      | c  | +--------+----+ |  28846 |  8 | | 139660 |  8 | ... | 915436 |  8 | | 934984 |  8 | +--------+----+ 22 rows in set (6.80 sec)

This wasted effort is all due to focusing on the wrong thing: figuring out how we can find all the rows that match k<1000000 as soon as possible. This is not the problem in this case. In fact, the query that touches all the same columns but doesn’t use GROUP BY runs 10 times as fast:

mysql> select sum(g) from tbl where k<1000000; +--------------+ | sum(g)       | +--------------+ | 500383719481 | +--------------+ 1 row in set (0.68 sec)

For this particular query, whether or not it is using the index for lookup should not be the main question. Instead, we should look at how to optimize GROUP BY – which is responsible for some 90% of the query response time.

In my next blog post, I will write about how MySQL performs the GROUP BY operation to provide further help on optimizing these queries.

MySQL vs MariaDB vs Percona Server: Security Features Comparison

Security of data is critical for any organisation. It’s an important aspect that can heavily influence the design of the database environment. When deciding upon which MySQL flavour to use, you need to take into consideration the security features available from the different server vendors. In this blog post, we’ll come up with a short comparison of the latest versions of the MySQL Community Edition from Oracle, Percona Server and MariaDB:

mysqld Ver 5.7.20-19 for Linux on x86_64 (Percona Server (GPL), Release 19, Revision 3c5d3e5d53c)
mysqld Ver 5.7.21 for Linux on x86_64 (MySQL Community Server (GPL))
mysqld Ver 10.2.12-MariaDB for Linux on x86_64 (MariaDB Server)

We are going to use CentOS 7 as the operating system - please keep in mind that the results we present here may be slightly different on other distributions like Debian or Ubuntu. We’d also like to focus on the differences and will not cover the commonalities - Percona Server and MariaDB are flavors of MySQL, so some of the security features (e.g., what the access privileges of MySQL files look like) are shared among them.

Initial security

Users

Both Percona Server and MySQL Community Server come with a randomly generated temporary password for the root user. You need to check the contents of MySQL’s error log to find it:

2018-01-19T13:47:45.532148Z 1 [Note] A temporary password is generated for root@localhost: palwJu7uSL,g
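If you are unsure where the error log lives, a quick way to fish the password out is a simple grep; the log path below is an assumption based on a default RPM install, so adjust it to your configuration:

[root@localhost ~]# grep 'temporary password' /var/log/mysqld.log
2018-01-19T13:47:45.532148Z 1 [Note] A temporary password is generated for root@localhost: palwJu7uSL,g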

Once you log in, a password change is forced upon you:

[root@localhost ~]# mysql -uroot -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.21 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> select * from mysql.user; ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.

The password has to be strong enough; this is enforced by the validate_password plugin:

mysql> alter user root@localhost identified by 'password123.'; ERROR 1819 (HY000): Your password does not satisfy the current policy requirements mysql> alter user root@localhost identified by 'password123.A'; Query OK, 0 rows affected (0.00 sec)

MariaDB does not generate a random root password and it provides passwordless access to the root account from (and only from) localhost.

[root@localhost ~]# mysql Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 12 Server version: 10.2.12-MariaDB MariaDB Server Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> SELECT CURRENT_USER(); +----------------+ | CURRENT_USER() | +----------------+ | root@localhost | +----------------+ 1 row in set (0.00 sec)

This is not a big issue during the initial deployment phase, as the DBA is supposed to configure and secure access to the database later on (by running mysql_secure_installation, for example). The bigger problem here is that a good practice is not enforced by MariaDB. If you don’t have to set up a strong password for the root user, it may be that nobody changes it later and passwordless access remains, which then becomes a serious security threat.

Another aspect we’d like to look at is anonymous, passwordless access. Anonymous users allow anyone to get in; it doesn’t have to be a predefined user. If such access is passwordless, it means that anyone can connect to MySQL. Typically such an account has only the USAGE privilege, but even then it is possible to print the status (‘\s’), which contains information like the MySQL version, character set, etc. Additionally, if the ‘test’ schema is available, such a user has the ability to write to that schema.

Neither MySQL Community Server nor Percona Server has any anonymous users defined in MySQL:

mysql> select user, host, authentication_string from mysql.user;
+---------------+-----------+-------------------------------------------+
| user          | host      | authentication_string                     |
+---------------+-----------+-------------------------------------------+
| root          | localhost | *EB965412B594F67C8EB611810EF8D406F2CF42BD |
| mysql.session | localhost | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| mysql.sys     | localhost | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
+---------------+-----------+-------------------------------------------+
3 rows in set (0.00 sec)

On the other hand, MariaDB is open to anonymous, passwordless access.

MariaDB [(none)]> select user,host,password from mysql.user;
+------+-----------------------+----------+
| user | host                  | password |
+------+-----------------------+----------+
| root | localhost             |          |
| root | localhost.localdomain |          |
| root | 127.0.0.1             |          |
| root | ::1                   |          |
|      | localhost             |          |
|      | localhost.localdomain |          |
+------+-----------------------+----------+
6 rows in set (0.00 sec)

In addition to that, the ‘test’ schema is available - which allows anonymous users to issue writes to the database.

[root@localhost ~]# mysql -umyanonymoususer Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 14 Server version: 10.2.12-MariaDB MariaDB Server Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> use test; Database changed MariaDB [test]> CREATE TABLE mytab (a int); Query OK, 0 rows affected (0.01 sec) MariaDB [test]> INSERT INTO mytab VALUES (1), (2); Query OK, 2 rows affected (0.02 sec) Records: 2 Duplicates: 0 Warnings: 0 MariaDB [test]> SELECT * FROM mytab; +------+ | a | +------+ | 1 | | 2 | +------+ 2 rows in set (0.00 sec)

This poses a serious threat and needs to be sorted out; otherwise, it can easily be exploited to overload the server with writes.
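A minimal cleanup sketch, based on the anonymous accounts and the ‘test’ schema shown above (mysql_secure_installation automates the same steps), could look like this:

MariaDB [(none)]> DROP USER ''@'localhost';
MariaDB [(none)]> DROP USER ''@'localhost.localdomain';
MariaDB [(none)]> DROP DATABASE test;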

Data in transit security

MySQL Community Server and both of its forks support the use of SSL to encrypt data in transit. This is extremely important for Wide Area Networks, but it also shouldn’t be overlooked in a local network. SSL can be used both client- and server-side. Regarding server-side configuration (to encrypt traffic from master to slaves, for example), it looks identical across the board. There is a difference, though, when it comes to client-side SSL encryption, introduced in MySQL 5.7. Prior to 5.7, one had to generate SSL keys and CAs and define them in the configurations of both server and client, and this is still how MariaDB 10.2’s SSL setup works. In both MySQL Community Server 5.7 and Percona Server 5.7 (which is based on MySQL 5.7), there is no need to pre-generate keys. It is all done automatically, in the background. All you need to do is enable SSL on your client by setting the correct ‘--ssl-mode’. For MySQL’s CLI client, this is not even needed, as it enables SSL by default:

[root@localhost ~]# mysql -p -h127.0.0.1 Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 6 Server version: 5.7.21 MySQL Community Server (GPL) Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> \s -------------- mysql Ver 14.14 Distrib 5.7.21, for Linux (x86_64) using EditLine wrapper Connection id: 6 Current database: Current user: root@localhost SSL: Cipher in use is DHE-RSA-AES256-SHA Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 5.7.21 MySQL Community Server (GPL) Protocol version: 10 Connection: 127.0.0.1 via TCP/IP Server characterset: latin1 Db characterset: latin1 Client characterset: utf8 Conn. characterset: utf8 TCP port: 3306 Uptime: 2 days 21 hours 51 min 52 sec Threads: 1 Questions: 15 Slow queries: 0 Opens: 106 Flush tables: 1 Open tables: 99 Queries per second avg: 0.000 --------------

On the other hand, MariaDB requires additional configuration, as SSL is disabled by default:

[root@localhost ~]# mysql -h127.0.0.1 Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 18 Server version: 10.2.12-MariaDB MariaDB Server Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> \s -------------- mysql Ver 15.1 Distrib 10.2.12-MariaDB, for Linux (x86_64) using readline 5.1 Connection id: 18 Current database: Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server: MariaDB Server version: 10.2.12-MariaDB MariaDB Server Protocol version: 10 Connection: 127.0.0.1 via TCP/IP Server characterset: latin1 Db characterset: latin1 Client characterset: utf8 Conn. characterset: utf8 TCP port: 3306 Uptime: 2 days 22 hours 26 min 58 sec Threads: 7 Questions: 45 Slow queries: 0 Opens: 18 Flush tables: 1 Open tables: 12 Queries per second avg: 0.000 --------------
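For completeness, here is a minimal sketch of what enabling SSL on MariaDB 10.2 involves, assuming you have already generated a CA and the server/client certificates under a (hypothetical) /etc/my.cnf.d/ssl/ directory:

[mysqld]
ssl-ca=/etc/my.cnf.d/ssl/ca-cert.pem
ssl-cert=/etc/my.cnf.d/ssl/server-cert.pem
ssl-key=/etc/my.cnf.d/ssl/server-key.pem

[client]
ssl-ca=/etc/my.cnf.d/ssl/ca-cert.pem
ssl-cert=/etc/my.cnf.d/ssl/client-cert.pem
ssl-key=/etc/my.cnf.d/ssl/client-key.pem

After a server restart, the \s output above should report a cipher instead of "SSL: Not in use".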

Data at rest encryption

First of all, backups: there are freely available backup tools like xtrabackup or MariaDB Backup (which is a fork of xtrabackup). These allow you to create encrypted backups of all three MySQL flavors we discuss in this blog post.

All three flavours support encryption of the running database, but there are differences in what pieces of data are encrypted.

The MySQL Community Server supports encryption of InnoDB tablespaces only. Keys used for encryption are stored in files (which is not compliant with regulations - keys should be stored in a vault - something which MySQL Enterprise supports). Percona Server is based on MySQL Community Server, so it also supports encryption of InnoDB tablespaces. Recently, in Percona Server 5.7.20, support for encryption of general tablespaces (compared to only individual ones in previous versions and MySQL Community Edition) was added. Support for encryption of binary logs was also added. Percona Server comes with a keyring_vault plugin, which can be used to store keys in Hashicorp Vault server, making Percona Server 5.7.20 compliant with regulatory requirements regarding data at rest encryption.

MariaDB 10.2 has more advanced data-at-rest encryption support. In addition to tablespace and binary/relay log encryption, it has support for encrypting InnoDB redo logs. Currently, it is the more complete solution regarding data encryption.
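As an illustration, a minimal MariaDB 10.2 data-at-rest encryption setup with the file_key_management plugin might look like the snippet below; the key file path is hypothetical, and a file-based key store is the simplest option rather than necessarily the right one for production:

[mysqld]
plugin_load_add = file_key_management
file_key_management_filename = /etc/mysql/encryption/keyfile
innodb_encrypt_tables = ON
innodb_encrypt_log = ON
innodb_encryption_threads = 4
encrypt_binlog = ON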

Audit logging

All three MySQL flavors have support for audit logging. Their scope is pretty much comparable: connect and disconnect events, queries executed, tables accessed. The logs contain information about which user participated in such an event, from which host the user logged in, the time it happened, and similar info. Such events can also be logged via syslog and stored on an external log server to enable log analysis and parsing.

Data masking, SQL firewall

All of the discussed MySQL flavors work with tools that allow implementing data masking and can block SQL traffic based on rules. Data masking is a method of obfuscating data outside of the database, before it reaches the client. An example would be credit card data which is stored in plain text in the database, but when a developer wants to query such data, she will see ‘xxxxxxxx...’ instead of the numbers. The tools we are talking about here are ProxySQL and MaxScale. MaxScale is a product of MariaDB Corporation and is subscription-based. ProxySQL is a free-to-use database proxy. Both proxies can be used with any of the MySQL flavours.

That’s all for today folks. For further reading, check out these 10 tips for securing your MySQL and MariaDB databases.


How to Reset WordPress Admin Password via MySQL Command Prompt

Have you ever forgotten your WordPress administrator password and did not have access to your e-mail account, or haven’t configured your e-mail account properly in WordPress? Fear not, because in today’s tutorial we are going to learn how to reset the WordPress administrator password through the MySQL command prompt. Resetting the WordPress admin password via the MySQL command prompt is a fairly easy task; you will need to have the login credentials of your database and user, or have access to the wp-config.php file. Let’s get started! Resetting the WordPress admin password through MySQL: let’s begin by looking up the username and password you set […]
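The heart of the trick is a single UPDATE against the users table. A minimal sketch, assuming the default wp_ table prefix and an administrator login of 'admin' (both may differ in your installation; the database name comes from wp-config.php and is made up here):

mysql> USE wordpress;  -- hypothetical database name, see wp-config.php
mysql> SELECT ID, user_login FROM wp_users;  -- confirm the administrator account
mysql> UPDATE wp_users SET user_pass = MD5('NewStrongPassword!') WHERE user_login = 'admin';

WordPress accepts the MD5 value and replaces it with its own stronger hash the next time the user logs in.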

MySQL 8 Windowing Functions

Windowing functions are a way to group rows of data for analysis. This provides a 'window' to look at only the relevant data. Plus, there are about a dozen supporting functions to take advantage of all this. There is some logical carry-over from the aggregate (or GROUP BY) functions, but windowing functions open up some easy ways to dig statistically into your data.

Now for a contrived example.


mysql> Select row_number() over w as '#',
Name, Continent, Population,
sum(Population) over w as 'Pop'
from country where continent='South America'
window w as (partition by Continent Order by Continent);
+----+------------------+---------------+------------+-----------+
| # | Name | Continent | Population | Pop |
+----+------------------+---------------+------------+-----------+
| 1 | Argentina | South America | 37032000 | 345780000 |
| 2 | Bolivia | South America | 8329000 | 345780000 |
| 3 | Brazil | South America | 170115000 | 345780000 |
| 4 | Chile | South America | 15211000 | 345780000 |
| 5 | Colombia | South America | 42321000 | 345780000 |
| 6 | Ecuador | South America | 12646000 | 345780000 |
| 7 | Falkland Islands | South America | 2000 | 345780000 |
| 8 | French Guiana | South America | 181000 | 345780000 |
| 9 | Guyana | South America | 861000 | 345780000 |
| 10 | Peru | South America | 25662000 | 345780000 |
| 11 | Paraguay | South America | 5496000 | 345780000 |
| 12 | Suriname | South America | 417000 | 345780000 |
| 13 | Uruguay | South America | 3337000 | 345780000 |
| 14 | Venezuela | South America | 24170000 | 345780000 |
+----+------------------+---------------+------------+-----------+
14 rows in set (0.00 sec)


In the above example, we created a window named 'w' that allows us to 'peek' at the data arranged by Continent. I am cheating here by only looking at the continent of South America. You will get a much better look at all this if you remove the WHERE continent='South America' clause from the query, but I use it here for brevity.

And I used two window functions on the data from that window. The ROW_NUMBER() function provides a nice way to do exactly what its name says. And the SUM() function adds up all the Population values.

We can even set up ranks, or buckets, to divide up the data.



mysql> Select row_number() over w as '#', Name, Population,  ntile(5) over w as 'tile', sum(Population) over w as 'Pop'  from country where continent='South America'  window w as (partition by Continent Order by Continent);
+----+------------------+------------+------+-----------+
| # | Name | Population | tile | Pop |
+----+------------------+------------+------+-----------+
| 1 | Argentina | 37032000 | 1 | 345780000 |
| 2 | Bolivia | 8329000 | 1 | 345780000 |
| 3 | Brazil | 170115000 | 1 | 345780000 |
| 4 | Chile | 15211000 | 2 | 345780000 |
| 5 | Colombia | 42321000 | 2 | 345780000 |
| 6 | Ecuador | 12646000 | 2 | 345780000 |
| 7 | Falkland Islands | 2000 | 3 | 345780000 |
| 8 | French Guiana | 181000 | 3 | 345780000 |
| 9 | Guyana | 861000 | 3 | 345780000 |
| 10 | Peru | 25662000 | 4 | 345780000 |
| 11 | Paraguay | 5496000 | 4 | 345780000 |
| 12 | Suriname | 417000 | 4 | 345780000 |
| 13 | Uruguay | 3337000 | 5 | 345780000 |
| 14 | Venezuela | 24170000 | 5 | 345780000 |
+----+------------------+------------+------+-----------+
14 rows in set (0.00 sec)
mysql>

Here we used NTILE(5) to divide the results into five buckets. By the way, change the 5 to a 4 and you have quartiles, or to 100 for percentiles. This does not really mean much statistically, since the countries are arranged alphabetically.

So let's make this a little more statistically meaningful. Let's look at the population of South America with the largest countries first and broken into quartiles.


mysql> Select row_number() over w as '#', Name, Population, ntile(4) over w as 'tile',  sum(Population) over w as 'Pop'  from country where continent='South America'  window w as (partition by Continent Order by Population desc);
+----+------------------+------------+------+-----------+
| # | Name | Population | tile | Pop |
+----+------------------+------------+------+-----------+
| 1 | Brazil | 170115000 | 1 | 170115000 |
| 2 | Colombia | 42321000 | 1 | 212436000 |
| 3 | Argentina | 37032000 | 1 | 249468000 |
| 4 | Peru | 25662000 | 1 | 275130000 |
| 5 | Venezuela | 24170000 | 2 | 299300000 |
| 6 | Chile | 15211000 | 2 | 314511000 |
| 7 | Ecuador | 12646000 | 2 | 327157000 |
| 8 | Bolivia | 8329000 | 2 | 335486000 |
| 9 | Paraguay | 5496000 | 3 | 340982000 |
| 10 | Uruguay | 3337000 | 3 | 344319000 |
| 11 | Guyana | 861000 | 3 | 345180000 |
| 12 | Suriname | 417000 | 4 | 345597000 |
| 13 | French Guiana | 181000 | 4 | 345778000 |
| 14 | Falkland Islands | 2000 | 4 | 345780000 |
+----+------------------+------------+------+-----------+
14 rows in set (0.00 sec)

Now notice the Pop column: it has suddenly become a very useful running total.
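The running total appears because, once the window has an ORDER BY, its default frame only covers the rows from the start of the partition up to the current row. A sketch of the same window with the frame spelled out explicitly (equivalent to the default) makes this visible:

mysql> Select row_number() over w as '#', Name, Population,
    sum(Population) over w as 'Pop'
    from country where continent='South America'
    window w as (partition by Continent Order by Population desc
                 range between unbounded preceding and current row);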

I have only touched on a few of the new functions that support windowing, but there is much more of interest here.

MySQL Enterprise Monitor 4.0.3 has been released

We are pleased to announce that MySQL Enterprise Monitor 4.0.3 is now available for download on the My Oracle Support (MOS) web site. It will also be available for download via the Oracle Software Delivery Cloud in a few days. MySQL Enterprise Monitor is the best-in-class tool for monitoring and management of your MySQL assets and is included with your MySQL Enterprise Edition and MySQL Enterprise Carrier Grade subscriptions.

You can find more information on the contents of this release in the change log.

Highlights of MySQL Enterprise Monitor 4.0 include:

  • Modern look and feel: a redesigned user interface delivers a vastly improved overall user experience. The visible changes--the layout, the icons, and the overall aesthetics--provide a more natural and intuitive experience. Views dynamically change and adjust to your current context and the assets you've selected, everything from individual MySQL instances or hosts to your custom Groups, to your complex replication and clustered topologies. Additional enhancements include a much more responsive UI and a greater capacity to scale, allowing you to more effectively manage thousands of MySQL related assets.
  • MySQL Cluster monitoring: we now auto-discover your MySQL Cluster installations and give you visibility into the performance, availability, and health of each MySQL instance and NDB process, as well as the health of the MySQL Cluster instance as a single logical system. The Overview dashboard displays detailed instrumentation available for each MySQL Cluster and the Topology dashboard displays the current configuration of your MySQL Cluster enabling you to quickly see the status of the MySQL Cluster instance as a whole and each individual process. The Topology dashboard allows you to easily see how your MySQL Cluster installations are currently functioning.
  • A User Statistics report: provides an easy way to monitor MySQL resource utilization broken down by user.

You will find binaries for the new release on My Oracle Support. Choose the "Patches & Updates" tab, and then choose the "Product or Family (Advanced Search)" side tab in the "Patch Search" portlet.

You will also find the binaries on the Oracle Software Delivery Cloud soon.  Type "MySQL Enterprise Monitor" in the search box, or enter a license name to find Enterprise Monitor along with other MySQL products: "MySQL Enterprise Edition" or "MySQL Cluster Carrier Edition".  Then select your platform.

Please open a bug or a ticket on My Oracle Support to report problems, request features, or give us general feedback about how this release meets your needs.

If you are not a MySQL Enterprise customer and want to try the Monitor and Query Analyzer using our 30-day free customer trial, go to http://www.mysql.com/trials, or contact Sales at http://www.mysql.com/about/contact.

Thanks and Happy Monitoring!

- The MySQL Enterprise Tools Development Team


MySQL Enterprise Monitor 3.4.6 has been released

We are pleased to announce that MySQL Enterprise Monitor 3.4.6 is now available for download on the My Oracle Support (MOS) web site. This is a maintenance release that includes a few new features and fixes a number of bugs. You can find more information on the contents of this release in the change log.

You will find binaries for the new release on My Oracle Support. Choose the "Patches & Updates" tab, and then choose the "Product or Family (Advanced Search)" side tab in the "Patch Search" portlet.

Important: MySQL Enterprise Monitor (MEM) 4.0 offers many significant improvements over MEM 3.4 and we highly recommend that you consider upgrading. More information on MEM 4.0 is available here:

Please open a bug or a ticket on My Oracle Support to report problems, request features, or give us general feedback about how this release meets your needs.

If you are not a MySQL Enterprise customer and want to try the Monitor and Query Analyzer using our 30-day free customer trial, go to http://www.mysql.com/trials, or contact Sales at http://www.mysql.com/about/contact.

Thanks and Happy Monitoring!

- The MySQL Enterprise Tools Development Team


MySQL Enterprise Monitor 3.3.8 has been released

We are pleased to announce that MySQL Enterprise Monitor 3.3.8 is now available for download on the My Oracle Support (MOS) web site. This is a maintenance release that includes a few new features and fixes a number of bugs. You can find more information on the contents of this release in the change log.

You will find binaries for the new release on My Oracle Support. Choose the "Patches & Updates" tab, and then choose the "Product or Family (Advanced Search)" side tab in the "Patch Search" portlet.

Important: MySQL Enterprise Monitor (MEM) 4.0 offers many significant improvements over MEM 3.3 and 3.4, and we highly recommend that you consider upgrading. More information on MEM 4.0 is available here:

Please open a bug or a ticket on My Oracle Support to report problems, request features, or give us general feedback about how this release meets your needs.

If you are not a MySQL Enterprise customer and want to try the Monitor and Query Analyzer using our 30-day free customer trial, go to http://www.mysql.com/trials, or contact Sales at http://www.mysql.com/about/contact.

Thanks and Happy Monitoring!

- The MySQL Enterprise Tools Development Team


Export MySQL database table to CSV (delimited / Excel) file

Today let’s talk a little about converting a MySQL table to CSV (Excel). My friend was looking to export MySQL to Excel, and I saw a couple of questions about exporting MySQL tables to CSV on forums. Since I saw the question often, I thought of writing out all the ways I can think of for exporting […]
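One of the simplest server-side options is SELECT ... INTO OUTFILE, which writes a delimited file that Excel opens directly. A sketch with a made-up schema and table name, assuming the server is allowed to write to the target directory (check the secure_file_priv setting):

mysql> SELECT * FROM mydb.mytable
    -> INTO OUTFILE '/tmp/mytable.csv'
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n';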

Hostname support in Group Replication whitelist

Introduction

Since the release of Group Replication, one of its main security features has been IP Address Whitelisting. This allows us to control who can send requests to join a group.

In MySQL version 8.0.4, we added a capability that had been lacking since the beginning: the ability to configure IP Address Whitelisting with hostnames.…
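As a quick illustration (the hostname and networks below are made up, and the variable can only be changed while Group Replication is stopped on that member), the whitelist can now mix hostnames with IP addresses and CIDR ranges:

mysql> SET GLOBAL group_replication_ip_whitelist = 'node1.example.com,192.168.1.0/24,10.0.0.1';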

Using MySQL 5.7 Generated Columns to Increase Query Performance

In this blog post, we’ll look at ways you can use MySQL 5.7 generated columns (or virtual columns) to improve query performance.

Introduction

About two years ago I published a blog post about Generated (Virtual) Columns in MySQL 5.7. Since then, it’s been one of my favorite features in the MySQL 5.7 release. The reason is simple: with the help of virtual columns, we can create fine-grained indexes that can significantly increase query performance. I’m going to show you some tricks that can potentially fix slow reporting queries with GROUP BY and ORDER BY.

The Problem

Recently I was working with a customer who was struggling with this query:

SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', AVG(ExecutionTime) as 'Avg. Execution Time', COUNT(distinct AccountId) as 'No. Of Accounts', COUNT(distinct ParentAccountId) as 'No. Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY CONCAT(verb, ' - ', replace(url,'.xml','')) HAVING COUNT(*) >= 1 ;

The query was running for more than an hour and used all space in the tmp directory (with sort files).

The table looked like this:

CREATE TABLE `ApiLog` (
 `Id` int(11) NOT NULL AUTO_INCREMENT,
 `ts` timestamp DEFAULT CURRENT_TIMESTAMP,
 `ServerName` varchar(50) NOT NULL default '',
 `ServerIP` varchar(50) NOT NULL default '',
 `ClientIP` varchar(50) NOT NULL default '',
 `ExecutionTime` int(11) NOT NULL default 0,
 `URL` varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL,
 `Verb` varchar(16) NOT NULL,
 `AccountId` int(11) NOT NULL,
 `ParentAccountId` int(11) NOT NULL,
 `QueryString` varchar(3000) NOT NULL,
 `Request` text NOT NULL,
 `RequestHeaders` varchar(2000) NOT NULL,
 `Response` text NOT NULL,
 `ResponseHeaders` varchar(2000) NOT NULL,
 `ResponseCode` varchar(4000) NOT NULL,
 ... // other fields removed for simplicity
 PRIMARY KEY (`Id`),
 KEY `index_timestamp` (`ts`),
 ... // other indexes removed for simplicity
) ENGINE=InnoDB;

We found out the query was not using an index on the timestamp field (“ts”):

mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time', count(distinct AccountId) as 'No. Of Accounts', count(distinct ParentAccountId) as 'No. Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY CONCAT(verb, ' - ', replace(url,'.xml','')) HAVING COUNT(*) >= 1G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ApiLog partitions: NULL type: ALL possible_keys: ts key: NULL key_len: NULL ref: NULL rows: 22255292 filtered: 50.00 Extra: Using where; Using filesort 1 row in set, 1 warning (0.00 sec)

The reason for that is simple: the number of rows matching the filter condition was too large for an index scan to be efficient (or at least the optimizer thinks that):

mysql> select count(*) from ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' ; +----------+ | count(*) | +----------+ | 7948800 | +----------+ 1 row in set (2.68 sec)

Total number of rows: 21998514. The query needs to scan 36% of the total rows (7948800 / 21998514).

In this case, we have a number of approaches:

  1. Create a combined index on timestamp column + group by fields
  2. Create a covered index (including fields that are selected)
  3. Create an index on just GROUP BY fields
  4. Create an index for loose index scan

However, if we look closer at the “GROUP BY” part of the query, we quickly realize that none of those solutions will work. Here is our GROUP BY part:

GROUP BY CONCAT(verb, ' - ', replace(url,'.xml',''))

There are two problems here:

  1. It is using a calculated field, so MySQL can’t just scan the index on verb + url. It needs to first concatenate the two fields, and then group on the concatenated string. That means that the index won’t be used.
  2. The URL is declared as “varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL” and can’t be indexed in full (even with innodb_large_prefix=1  option, which is the default as we have utf8 enabled). We can only do a partial index, which won’t be helpful for GROUP BY optimization.

Here, I’m trying to add a full index on the URL with innodb_large_prefix=1:

mysql> alter table ApiLog add key verb_url(verb, url); ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes

Well, changing the "GROUP BY CONCAT(verb, ' - ', replace(url,'.xml',''))" to "GROUP BY verb, url" could help (assuming that we somehow trim the field definition from varchar(3000) to something smaller, which may or may not be possible). However, it will change the results as it will not remove the .xml extension from the URL field.

The Solution

The good news is that in MySQL 5.7 we have virtual columns. So we can create a virtual column on top of "CONCAT(verb, ' - ', replace(url,'.xml',''))". The best part: we do not have to perform a GROUP BY with the full string (potentially > 3000 bytes). We can use an MD5 hash (or a longer hash, e.g., SHA-1/SHA-2) for the purposes of the GROUP BY.

Here is the solution:

alter table ApiLog add verb_url_hash varbinary(16) GENERATED ALWAYS AS (unhex(md5(CONCAT(verb, ' - ', replace(url,'.xml',''))))) VIRTUAL; alter table ApiLog add key (verb_url_hash);

So what we did here is:

  1. Declared the virtual column with type varbinary(16)
  2. Created a virtual column on CONCAT(verb, ' - ', replace(url,'.xml','')), and used an MD5 hash on top, plus UNHEX() to convert the 32 hexadecimal characters to 16 binary bytes
  3. Created an index on top of the virtual column

Now we can change the query and GROUP BY verb_url_hash column:

mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time', count(distinct AccountId) as 'No. Of Accounts', count(distinct ParentAccountId) as 'No. Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY verb_url_hash HAVING COUNT(*) >= 1; ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'ApiLog.ApiLog.Verb' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

MySQL 5.7 has the ONLY_FULL_GROUP_BY SQL mode enabled by default, which we can disable for this query's session only.

Now the explain plan looks much better:

mysql> select @@sql_mode; +-------------------------------------------------------------------------------------------------------------------------------------------+ | @@sql_mode | +-------------------------------------------------------------------------------------------------------------------------------------------+ | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | +-------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> set sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'; Query OK, 0 rows affected (0.00 sec) mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time', count(distinct AccountId) as 'No. Of Accounts', count(distinct ParentAccountId) as 'No. Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY verb_url_hash HAVING COUNT(*) >= 1G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ApiLog partitions: NULL type: index possible_keys: ts,verb_url_hash key: verb_url_hash key_len: 19 ref: NULL rows: 22008891 filtered: 50.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)

MySQL will avoid any sorting, which is much faster. It will still have to eventually scan the whole table in the order of the index. The response time is significantly better: ~38 seconds as opposed to more than an hour.

Covered Index

Now we can attempt to do a covered index, which will be quite large:

mysql> alter table ApiLog add key covered_index (`verb_url_hash`,`ts`,`ExecutionTime`,`AccountId`,`ParentAccountId`, verb, url); Query OK, 0 rows affected (1 min 29.71 sec) Records: 0 Duplicates: 0 Warnings: 0

We had to add a “verb” and “url”, so beforehand I had to remove the COLLATE utf8mb4_unicode_ci from the table definition. Now explain shows that we’re using the index:

mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', AVG(ExecutionTime) as 'Avg. Execution Time', COUNT(distinct AccountId) as 'No. Of Accounts', COUNT(distinct ParentAccountId) as 'No. Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY verb_url_hash HAVING COUNT(*) >= 1G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ApiLog partitions: NULL type: index possible_keys: ts,verb_url_hash,covered_index key: covered_index key_len: 3057 ref: NULL rows: 22382136 filtered: 50.00 Extra: Using where; Using index 1 row in set, 1 warning (0.00 sec)

The response time dropped to ~12 seconds! However, the index size is significantly larger compared to just verb_url_hash (16 bytes per record).

Conclusion

MySQL 5.7 generated columns provide a valuable way to improve query performance. If you have an interesting case, please share in the comments.

Enterprises choose Galera Cluster. Read why.

“Our three megastores’ and webshop’s combined revenue was 432 million euros in the year 2017. Galera Cluster has been the cornerstone of the system, secured our sales and supported our growth.”

“MySQL master/slave setup was just too fragile and needed regular work to fix each time the slave got out of sync. Now with Galera we’ve found that even if we take a node offline for a while, it re-syncs automatically and without intervention.”

“The big plus is that all nodes have exact the same information, and no node is running for even a fraction of a second behind. This will also mean that no information will be lost!”

 

READ MORE CASE STUDIES AND TESTIMONIALS

orchestrator 3.0.6: faster crash detection & recoveries, auto Pseudo-GTID, semi-sync and more

orchestrator 3.0.6 is released and includes some exciting improvements and features. It quickly follows up on 3.0.5 released recently, and this post gives a breakdown of some notable changes:

Faster failure detection

Recall that orchestrator uses a holistic approach for failure detection: it reads state not only from the failed server (e.g. master) but also from its replicas. orchestrator now detects failure faster than before:

  • A detection cycle has been eliminated, leading to quicker resolution of a failure. On our setup, where we poll servers every 5sec, failure detection time dropped from 7-10sec to 3-5sec while keeping reliability; the reduction in time does not lead to increased false positives.
    Side note: you may see increased not-quite-failure analysis such as "I can't see the master" (UnreachableMaster).
  • Better handling of network scenarios where packets are dropped. Instead of hanging until the TCP timeout, orchestrator now observes server discovery asynchronously. We have specialized failover tests that simulate dropped packets. The change reduces detection time by some 5sec.
Faster master recoveries

Promoting a new master is a complex task which attempts to promote the best replica out of the pool of replicas. It's not always the most up-to-date replica. The choice varies depending on replica configuration, version, and state.

With recent changes, orchestrator is able to recognize, early on, that the replica it would like to promote as master is ideal. Assuming that is the case, orchestrator is able to immediately promote it (i.e. run hooks, set read_only=0, etc.), and run the rest of the failover logic, i.e. the rewiring of replicas under the newly promoted master, asynchronously.

This allows the promoted server to take writes sooner, even while its replicas are not yet connected. It also means external hooks are executed sooner.

Between faster detection and faster recoveries, we're looking at some 10sec reduction in overall recovery time: from moment of crash to moment where a new master accepts writes. We stand now at < 20sec in almost all cases, and < 15s in optimal cases. Those times are measured on our failover tests.

We are working on reducing failover time unrelated to orchestrator and hope to update soon.

Automated Pseudo-GTID

As a reminder, Pseudo-GTID is an alternative to GTID, without the kind of commitment you make with GTID. It provides the same "point your replica under any other server" behavior that GTID allows.

There are still many setups out there where GTID is not (yet?) deployed and enabled. However, Pseudo-GTID is often misunderstood, and though I've blogged about and presented Pseudo-GTID many times in the past, I still find myself explaining to people that the setup is simple and does not involve changes to one's topologies.

Well, it just got simpler. orchestrator is now able to automatically inject Pseudo-GTID for you.

Say the word: "AutoPseudoGTID": true, grant the necessary privilege, and your non-GTID topology is suddenly supercharged with magical Pseudo-GTID tokens that provide you with:

  • Arbitrary relocation of replicas
  • Automated or manual failovers (masters and intermediate masters)
  • Vendor freedom: runs on Oracle MySQL, Percona Server, MariaDB, or all of the above at the very same time.
  • Version freedom (still on 5.5? No problem. Oh, this gets you crash-safe replication as extra bonus, too)

Auto-Pseudo-GTID further simplifies the infrastructure in that you no longer need to take care of injecting Pseudo-GTID onto the master as well as handle master identity changes. No more event_scheduler to enable/disable nor services to start/stop.
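For illustration, this is roughly what the relevant fragment of the orchestrator configuration file looks like (orchestrator.conf.json is a common name; the rest of the file is whatever you already have):

{
  "AutoPseudoGTID": true
}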

More and more setups are moving to GTID. We may, too! But I find it peculiar that Pseudo-GTID was suggested 4 years ago, when 5.6 GTID was already released, and still many setups are not yet running GTID. Please try it out! Read more.

Semi-sync support

Semi-sync has been internally supported via a specialized patch contributed by Vitess, to flag a server as semi-sync-able and handle enablement of semi-sync upon master failover.

orchestrator now supports semi-sync more generically. You may use orchestrator to enable/disable semi-sync master/replica side, via orchestrator -c enable-semi-sync-master, orchestrator -c enable-semi-sync-replica, orchestrator -c disable-semi-sync-master, orchestrator -c disable-semi-sync-replica commands (or API equivalent).

The API will also tell you whether semi-sync is enabled on instances. Noteworthy that configured != enabled. A server can be configured with rpl_semi_sync_master_enabled=ON, but if no semi-sync replicas are found, the Rpl_semi_sync_master_status state is OFF.
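You can see that distinction directly on the master with plain MySQL commands, independently of orchestrator, assuming the semi-sync plugins are installed:

mysql> SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_enabled';
mysql> SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';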

More

UI changes, removal of prepared statements, documentation updates, raft updates...

orchestrator is free and open source and released under the Apache 2 license. It is authored at and used by GitHub.

I'll be presenting orchestrator/raft in FOSDEM next week, at the MySQL and Friends Room.

MaxScale Data Archiving with Filters (MQ & Tee)

Introduction –

MaxScale is an excellent proxy from MariaDB Corporation, providing high availability, real-time read/write splitting with replication, Galera Cluster, Amazon RDS and Amazon Aurora backends, binlog streaming, and many more advanced features. In this blog we will discuss one such feature.

In this blog post, I am going to share my recent activity with MaxScale. We had to help one of our clients archive only the CREATE & INSERT statements for a specific table into an archive server.

Problem Statement –

Our client has only a standalone (master) server and an archive server. They need to archive one table: on the production server the table is affected by all queries, while on the archive server it should be affected only by the live CREATE & INSERT statements; the remaining SQL statements have to be filtered out. Hope it makes sense!

How to achieve this ? Is that possible ?

Yes, we can achieve this via the Maxscale 2.1 MQ & Tee filters(mirror).

Architecture –

The client connects to MaxScale; the Tee filter pipes all the connections from the master (standalone) to the archive server via the MQ filter. The MQ filter matches the configured pattern and allows only the needed SQL statements through to the archive server.

The queries routed to the archive servers are also transformed into a canonical format and sent to a RabbitMQ broker for analysis.

Below, I am sharing my testing environment, the steps, and the results.

Requirements for Testing –

Mariadb Servers ( 2 )

Production Server – 192.168.33.16

Archive Server – 192.168.33.17

Maxscale & Rabbit MQ Server 

Proxy Server – 192.168.33.13

Steps & Results –

MariaDB Server Installation –

I installed the MariaDB server on both the production & archive environments using the MariaDB repository.

[root@mydbopslabs16 vagrant]# cat /etc/yum.repos.d/MariaDB.repo [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.1/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1

Installing the MariaDB server,

yum install MariaDB-server MariaDB-client

Starting the MariaDB server,

systemctl start mariadb

Version – 10.1.30

Configuring RabbitMQ & Maxscale –

Now, let’s  start with Maxscale & RabbitMQ Server configuration on node 192.168.33.13,

I have directly installed the latest Maxscale using the RPM file,

yum install https://downloads.mariadb.com/MaxScale/2.1.13/rhel/7/x86_64/maxscale-2.1.13-1.rhel.7.x86_64.rpm service maxscale start

To install the RabbitMQ server, we need to configure the EPEL repository,

Installing epel repo,

yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Installing dependencies,

yum install http://packages.erlang-solutions.com/erlang-solutions-1.0-1.noarch.rpm

Installing RabbitMQ server,

yum install https://www.rabbitmq.com/releases/rabbitmq-server/v3.5.6/rabbitmq-server-3.5.6-1.noarch.rpm

It will take some time.

Starting the RabbitMQ Server,

systemctl start rabbitmq-server

Adding user & permissions,

rabbitmqctl add_user "sakthi_labs" "sakthi@labs"
rabbitmqctl set_permissions "sakthi_labs" ".*" ".*" ".*"

[root@mydbopslabs13 vagrant]# rabbitmqctl list_queues
Listing queues ...

Now we have installed all the needed components. Finally, we need to prepare the MaxScale configuration file. Below is the set of configurations needed for a smooth process.

[root@mydbopslabs13 vagrant]# cat /etc/maxscale.cnf
# The production and archive servers
[production]
type=server
address=192.168.33.16
port=3306
protocol=MySQLBackend

[archive]
type=server
address=192.168.33.17
port=3306
protocol=MySQLBackend

[MySQL Monitor]
type=monitor
module=mysqlmon
servers=production, archive
user=maxscale
passwd=maxscale@123
monitor_interval=10000

[Production]
type=service
router=readconnroute
servers=production
user=maxscale
passwd=maxscale@123
filters=Tee

[Archive]
type=service
router=readconnroute
servers=archive
user=maxscale
passwd=maxscale@123
filters=MQ Filter

[Production Listener]
type=listener
service=Production
protocol=MySQLClient
port=4000

[Archive Listener]
type=listener
service=Archive
protocol=MySQLClient
port=4001

[Tee]
type=filter
module=tee
service=Archive
match=match=\(insert\)\|\(create\)

[MQ Filter]
type=filter
module=mqfilter
hostname=192.168.33.13
port=5672
username=sakthi_labs
password=sakthi@labs
exchange=msg-ex-1
queue=msg-queue-1
key=MaxScale

[MaxAdmin Service]
type=service
router=cli

[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
socket=default

Note – Make sure to create the users with the needed privileges for MaxScale access on both the production & archive environments. Refer to the MariaDB official page here for user administration.
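For illustration only, the maxscale user from the configuration above would need grants along these lines on both backends (the exact list varies by MaxScale version and the modules in use; the MariaDB documentation is authoritative):

CREATE USER 'maxscale'@'192.168.33.13' IDENTIFIED BY 'maxscale@123';
GRANT SELECT ON mysql.user TO 'maxscale'@'192.168.33.13';
GRANT SELECT ON mysql.db TO 'maxscale'@'192.168.33.13';
GRANT SELECT ON mysql.tables_priv TO 'maxscale'@'192.168.33.13';
GRANT SHOW DATABASES ON *.* TO 'maxscale'@'192.168.33.13';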

Now we are done with the configuration.

[root@mydbopslabs13 vagrant]# maxadmin
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
production         | 192.168.33.16   |  3306 |           0 | Running
archive            | 192.168.33.17   |  3306 |           0 | Running
-------------------+-----------------+-------+-------------+--------------------

 

Journey to testing phase –

In the configuration file, I have configured the match filter with CREATE & INSERT statements.

match=match=\(insert\)\|\(create\)

So, it should allow the CREATE & INSERT statements through to the archive as well as to production. All remaining statements will affect only the production server.

On the MaxScale host, I am creating the database and table through the production listener port (4000).

[root@mydbopslabs13 vagrant]# hostname -I 192.168.33.13 [root@mydbopslabs13 vagrant]# mysql -h192.168.33.13 -uarchive_labs -p'mydbops@123' -P4000 -e "create database data_archivelab" [root@mydbopslabs13 vagrant]# mysql -h192.168.33.13 -uarchive_labs -p'mydbops@123' -P4000 -e "use data_archivelab; create table archive_test(id int, name varchar(16));" [root@mydbopslabs13 vagrant]# rabbitmqctl list_queues Listing queues ... msg-queue-1 2 ( queue status )

Here archive_labs is the user which I created for the MaxScale connection.

In production host,

[root@mydbopslabs16 vagrant]# hostname -I 192.168.33.16 [root@mydbopslabs16 vagrant]# mysql -u'archive' -p'archive@123' -e "show schemas like 'data_archivelab'" -s -N data_archivelab [root@mydbopslabs16 vagrant]# mysql -u'archive' -p'archive@123' -e "use data_archivelab; show tables" -s -N archive_test

On the archive host,

[root@mydbopslabs17 init.d]# hostname -I
192.168.33.17
[root@mydbopslabs17 init.d]# mysql -u'archive' -p'archive@123' -e "show schemas like 'data_archivelab'" -s -N
data_archivelab
[root@mydbopslabs17 init.d]# mysql -u'archive' -p'archive@123' -e "use data_archivelab; show tables" -s -N
archive_test

Well, it seems the database and table have been created on both the production and archive servers.

Now, let us test with an INSERT statement.

On the MaxScale host,

[root@mydbopslabs13 vagrant]# hostname -I
192.168.33.13
[root@mydbopslabs13 vagrant]# mysql -h192.168.33.13 -uarchive_labs -p'mydbops@123' -P4000 -e "use data_archivelab; insert into archive_test values(1,'MySQL'),(2,'MariaDB'),(3,'Percona'),(4,'Mydbops');"

On the production host,

[root@mydbopslabs16 vagrant]# hostname -I
192.168.33.16
[root@mydbopslabs16 vagrant]# mysql -u'archive' -p'archive@123' -e "use data_archivelab; select * from archive_test" -s -N
1       MySQL
2       MariaDB
3       Percona
4       Mydbops

On the archive host,

[root@mydbopslabs17 init.d]# hostname -I
192.168.33.17
[root@mydbopslabs17 init.d]# mysql -u'archive' -p'archive@123' -e "use data_archivelab; select * from archive_test" -s -N
1       MySQL
2       MariaDB
3       Percona
4       Mydbops

Perfect. The results show that the CREATE and INSERT statements updated the records on both the production and archive servers.

Finally, I am going to delete some records; let us see what happens.

On the MaxScale host,

[root@mydbopslabs13 vagrant]# hostname -I
192.168.33.13
[root@mydbopslabs13 vagrant]# mysql -h192.168.33.13 -uarchive_labs -p'mydbops@123' -P4000 -e "use data_archivelab; delete from archive_test where id in (1,3);"

On the production host,

[root@mydbopslabs16 vagrant]# hostname -I
192.168.33.16
[root@mydbopslabs16 vagrant]# mysql -u'archive' -p'archive@123' -e "use data_archivelab; select * from archive_test" -s -N
2       MariaDB
4       Mydbops

On the archive host,

[root@mydbopslabs17 init.d]# hostname -I
192.168.33.17
[root@mydbopslabs17 init.d]# mysql -u'archive' -p'archive@123' -e "use data_archivelab; select * from archive_test" -s -N
1       MySQL
2       MariaDB
3       Percona
4       Mydbops

 

Well, things are clear now. The DELETE statement does not affect the archive environment, while it still affects production.

I hope this post is helpful for understanding the MaxScale data archiving process. This architecture can be implemented in scenarios where all data must be preserved without any deletes or modifications.

This completes the data archiving process with MaxScale; the remaining features will be covered in future blog posts.

Thank you !!!

On InnoDB's Online DDL

I am completing my preparations for the upcoming FOSDEM talk, and one of the last things I wanted to do as part of them is a quick review of known bugs and problems in the current (as in MySQL 5.7 GA) implementation of so-called "online" DDL for InnoDB tables.

In my previous post I already shared my view on another important InnoDB feature, persistent statistics. Unlike that one, I do not really hate online DDL; I just try to avoid it if possible and use tools like pt-online-schema-change or gh-ost instead. Not because it is not documented properly (the documentation is quite detailed, though there are still things to clarify) or does not work as designed, but mostly because the term "online" (if we understand it as "not blocking", that is, the application keeps read/write access to the table being changed) is a bit misleading (it is more like "less blocking" or "blocking for shorter periods of time", faster and in-place, sometimes), and because it does not work the way one might expect in any kind of replication setup.

To be more specific:
  • Replication ignores LOCK=NONE :) The slave will only start to apply the "concurrent" DML after the ALTER commits, and this leads to a huge replication lag.
  • In too many cases the entire table is rebuilt (data are (re-)written), in place or by creating a copy, while notable writes in the process of running ALTER TABLE are really required only if we are introducing stricter constraints (and even in this case we could just validate the table, return an error if some row does not satisfy the new constraint (too long to fit, for example), and then change the metadata if all rows are OK) or adding new indexes (which in any case can not be used until they are built).
  • The online log has to be kept (in memory or in a temporary file). There is one such log file for each index being created or table being altered. The manual says:
    "This log file stores data inserted, updated, or deleted in the table during the DDL operation. The temporary log file is extended when needed by the value of innodb_sort_buffer_size, up to the maximum specified by innodb_online_alter_log_max_size. If a temporary log file exceeds the upper size limit, the ALTER TABLE operation fails and all uncommitted concurrent DML operations are rolled back. Thus, a large value for this option allows more DML to happen during an online DDL operation, but also extends the period of time at the end of the DDL operation when the table is locked to apply the data from the log." The problem is that the size depends on the concurrent DML workload and is hard to predict. Note also "when the table is locked" above to understand how much "online" this really is... (a minimal example of requesting an online ALTER, with this log size capped explicitly, follows this list)
There are also bugs, and I'd like to discuss some of them:
  • Bug #82997, "Online DDL fails with". There are not enough public details to be sure what exactly it fails with, but maybe the problems (several are reported) happen when the altered table has a generated column. If this is really so, the bug may be fixed in MySQL 5.7.19+.
  • Bug #73196, "Allow ALTER TABLE to run concurrently on master and slave". I can not put this better than Andrew Morgan did it in this verified feature request:
    "With online ALTER TABLE it is possible for the DDL operation to run for many hours while still processing DML on that same table. The ALTER TABLE is not started on the slave until after it has completed on the master and it will again take many hours to run on the slave. While the DDL runs on the slave, it is not possible for it to process any transactions which followed the ALTER TABLE on the master as they may be dependent on the changes that were made to the table's schema. This means that the slave will lag the master by many hours while the ALTER TABLE runs and then while it catches up on the backlog of DML sent from the master while that was happening."Both pt-osc and gh-ost resolve this problem, as they take replication topology into account and can throttle changes on master if needed. See also this documentation request by Daniël van Eeden, Bug #77619 , that lists more limitations of "online" DDL, and check how it helped to clarify them here.
  • Bug #67286, "InnoDB Online DDL hangs". It ended up as "Not a bug", but there is a good explanation of exclusive metadata lock set by the "online" ALTER in the comments:
    "The final (short) phase of ALTER where the internal data dictionary is updated requires exclusive access. That's why the ALTER was blocked by the active transaction having a shared lock on the table."I once studied similar (and even simpler) case in a lot of details with gdb, see this blog post. I've clearly see MDL_EXCLUSIVE lock request for simple ALTER TABLE ... STATS_AUTO_RECALC=1 that (according to the manual) "permits concurrent DML". Other manual page clarifies:
    "In most cases, an online DDL operation on a table waits for currently executing transactions that are accessing the table to commit or roll back because it requires exclusive access to the table for a brief period while the DDL statement is being prepared. Likewise, the online DDL operation requires exclusive access to the table for a brief time before finishing. Thus, an online DDL statement also waits for transactions that are started while the DDL is in progress to commit or roll back before completing."Dear MySQL Oracle developers, just remove "In most cases" (or clarify it), and this would be fair enough!
  • Bug #84004, "Manual misses details on MDL locks set and released for online ALTER TABLE". That's my documentation request I filed after spending some time tracing metadata locks usage in gdb. My request is simple (typos corrected):
    "Describe all kinds of metadata locks used by MySQL, their interactions and order of acquisition and release for most important SQL statements, including (but not limited to) all kinds of online ALTER TABLE statements for InnoDB tables."
  • Bug #68498, "can online ddl for innodb be more online?". This report by Mark Callaghan that refers to this detailed study is still "Verified". Based on the comments to that blog post, it is "enough online", but the details of implementation were not clearly documented at the moment. Check for the details and clarifications in the comments!
  • Bug #72109, "Avoid table rebuild when adding or removing of auto_increment settings". The bug report from Simon Mudd is still "Verified".
  • Bug #57583, "fast index create not used during "alter table foo engine=innodb"". The bug is still "Verified" and I can not tell from the manual if this is implemented in MySQL 5.7 or not.
  • Bug #83557, "Can't use LOCK=NONE to drop columns in table with virtual columns" - nice "Verified" bug report by Monty Solomon.
  • Bug #70790, "ALTER TABLE REBUILD PARTITION SHOULD NOT PREVENT DML IN UNAFFECTED PARTITIONS". My former colleague in Oracle Arnaud Adant simply asked to provide proper and reasonable support of online DDL for partitioned tables. This bug report is still "Verified", but at least we have a separate manual page now that explains the details and limitations of online DDL with partitioned tables (most of Arnaud's requests are still NOT implemented).
  • Bug #81819, "ALTER TABLE...LOCK=NONE is not allowed when FULLTEXT INDEX exists". As Marko Mäkelä explains in the last comment of this "Verified" feature request:
    "However, LOCK=NONE is never supported when a FULLTEXT INDEX exists on the table. Similarly, LOCK=NONE is not supported when SPATIAL INDEX (introduced in MySQL 5.7) exist. Speaking as the author of WL#6255 which implemented ALTER TABLE...LOCK=NONE for InnoDB B-tree indexes in MySQL 5.6, I share the bug reporter's disappointment."
To summarize, online DDL in MySQL 5.7 is surely far above and beyond "fast index creation", but there is still a lot of room for improvements. A real solution (one that allows ALTER TABLE to be performed fast and without unnecessary changes/writes to data in far more cases) may come with the real data dictionary in MySQL and support for multiple table versions there, or from ideas like those implemented in MDEV-11369, "Instant add column for InnoDB", and expressed in MDEV-11424, "Instant ALTER TABLE of failure-free record format changes". Until all that is implemented, I'd prefer to rely on good old tools like pt-osc.

In any case, we are speaking about backward-incompatible changes to the way MySQL works and stores data now.

Replicating into Elasticsearch Webinar Followup

We had a great webinar on Wednesday looking at how we can use Tungsten Replicator for moving data into Elasticsearch, whether that’s for analytics, searching, or reporting.

You can go ahead and watch the video recording of that session here.

We had one question on that session, which I wanted to answer in full:

Can UPDATE and DELETE be converted to INSERT?

One of the interesting issues with replicating data between databases is that we don't always want the information in the same format when it goes over to the other side. Typically, when replicating within a homogeneous environment, the reason we are using replication is that we want an identical copy on the other side of the process. In a heterogeneous environment, especially when we move the data out to an analytics environment like Elasticsearch, we might not. That covers a whole range of aspects, from the datatypes down to the individual columns.

When it comes to the point of this question, what we've really got is an auditing scenario rather than a strict data-replication one, and that sets up a different set of problems.

The simple answer is that currently, no, we don’t. 

The longer answer is that we want to do this correctly and it needs some careful considerations depending on your target environment.

Usually, we want to update the target database with a like-for-like operation, and we do that by making use of a primary key or other identifier to update the right information. There can only be one primary key or identifier, and if we are tracking modifications rather than the current record state, we must not use that primary ID. It also means that we don't normally track which operation created an entry; we just modify the record accordingly, and recording the operation would require an additional field. So, for the above request to make sense within a typical replication deployment, we need to do two things:

  1. Stop using a primary key as the record identifier
  2. Record the operation information (and changes) in the target. For most situations, this means you want to know the old and new data, for example, in an update operation.

Fortunately, within Elasticsearch we can do this a bit more easily: first by using the auto-generated ID, and second by using the document structure within Elasticsearch to model the old/new information, or by adding a new field to say what the operation was.

We haven't implemented that within Elasticsearch yet, but we have learnt some good practices for handling that type of change within our Kafka applier. Those changes will be in a release shortly, and we'll be adding them to the Elasticsearch applier soon after.

Have many more questions about Elasticsearch or Tungsten Replicator? Feel free to go ahead and ask!

 

 

 

On InnoDB's Persistent Optimizer Statistics

As I put it in a recent Facebook post, one of the MySQL features that I truly hate is InnoDB's persistent statistics. I think I should clarify this statement. It's great to have a way to save statistics between server restarts, to have better control over the way they are estimated (even on a per-table basis), to set them explicitly, and to check them with a simple SELECT. These are all great additions since MySQL 5.6.2+ that I truly appreciate (even if I may not be happy with some implementation details). They helped to make plans for queries against InnoDB tables more predictable and allow (with some effort applied) the MySQL query optimizer to really work as an "optimizer" instead of a "randomizer" or "pessimizer" (as some people called it) for InnoDB tables.

What I hate about it mostly is the way innodb_stats_auto_recalc is implemented, and the fact that it is enabled by default since MySQL 5.6.6+ or so. Here is why:
  1. Even if one enables automatic statistics recalculation, she can not be sure that statistics is correct and up to date. One still really has to run ANALYZE TABLE every time after substantial changes of data to be sure, and this comes with a cost (that Percona tried to finally overcome with the fix to lp:1704195 that appeared in their Percona Server 5.7.20-18+). Or enjoy whatever bits of statistics (taken in the process of background recalculation) may be present at the moment and the resulting execution plans...
  2. The details of automatic statistics recalculation are not properly documented (except in some comments to some bug reports). This is changing for the better with time (thanks to continued pressure from the MySQL community, including yours truly, in the form of bug reports), but most MySQL users are still far from understanding why something happens or does NOT happen when this feature is involved.
  3. The implementation introduced a background thread (that does dirty reads) to do the recalculation, and separate transactions against the InnoDB tables where statistics are stored. This complicates the implementation and analysis in gdb etc., and introduced more bugs related to coordinating the work performed by this thread with other background and user threads.
  4. Recently, nobody from Oracle seems to care much about fixing bugs related to this feature.
Let me try to illustrate and prove the points above with some MySQL bug reports (as usual). Many of these bugs are still "Verified" and not fixed as of the most recent release of the current GA version, MySQL 5.7. The order is somewhat random:
  • Bug #70741, "InnoDB background stats thread is not properly documented" - that's one of my requests to improve documentation. Some more details were added and the bug is closed, but make sure to read the entire comment "[26 Nov 2013 13:41] Vasil Dimov" if you want to understand better how it all works.
  • Bug #70617, "Default persistent stats can cause unexpected long query times" - this is one of the bugs that led me to file the previous documentation request. Check the comments Vasil Dimov made there before closing it as "Not a bug"... His comments are the best documentation of the way the feature is implemented that I've seen in public. Make your own conclusions.
  • Bug #78289, "Bad execution plan with innodb_stats_persistent enabled" - note that the problem started after pt-osc was applied (to overcome the problems with another feature I hate, "online" ALTER TABLE, most likely). This utility applies RENAME to the altered table at the last stage, and as a result the statistics just disappear and you either have to wait until they are calculated again in the background, or run ANALYZE... Surely this is "Not a bug".
  • Bug #80178 and Bug #78066 are about cases when SHOW INDEXES may still give wrong results while (with persistent statistics automatic recalculation disabled) one expects the same values we see in the tables where statistics is stored, or just correct ones. Both bugs are still "Verified", even though from the comment in the latter one may assume that the problem may be fixed in recent MySQL 5.7.x.
  • Bug #75428, "InnoDB persistent statistics not persistent enough". The counter of updated rows since the last recalculation does not survive restarts, and the 10% threshold is not configurable, so if the server restarts often and the table is big enough, we may end up with statistics that are never updated. Still "Verified".
  • Bug #72368, "Empty/zero statistics for imported tablespace until explicit ANALYZE TABLE". Still "Verified", but may be fixed in versions newer than 5.6. Importing tablespace was NOT a reason for automatic statistics recalculation to ever happen for the table...
  • Bug #84940, "MySQL Server crash possibly introduced in InnoDB statistics calculation". This regression bug in 5.6.35 and 5.7.17 was quickly fixed in the next releases, but still caused troubles for some time.
  • Bug #82184, "Table status inconsistent, requires ANALYZE TABLE executed twice". As Andrii Nikitin stated himself, "Most probably second ANALYZE is needed to give some time to purge thread remove old versions of the rows.", in case the table has huge blobs. The bug is still "Verified".
  • Bug #71814, "Persistent stats activity conflicts with mysqldump import of same info". The bug is "Closed" without any reason stated in public (what a surprise...). Note the following comment by Shane Bester (who actually verified and explained the bug):
    "Personally, I don't like that mysqldump dumps the content of these tables that should be auto-generated."He had also suggested a workaround to disable persistent statistics (SET GLOBAL innodb_stats_auto_recalc=0; SET GLOBAL innodb_stats_persistent=0;) before importing a dump. The problem here is a race condition between the importing of mysql database and the background statistics thread that can insert rows into the table between the CREATE and LOCK TABLE in the dump. See Bug #80705, "Mysqlpump in default configuration does not work", also, with a clear request: "Do not dump innodb_index_stats or innodb_table_stats". Something to think about.
  • Bug #84654, "Cardinality reset to 0 with persistent statistics when AUTO_INCREMENT touched".  Still "Verified".
  • Bug #84287, "row inserts, statement updates, persistent stats lead to table scans+ lag slaves". It seems automatic recalculation of statistics on slave is not triggered by inserting more rows via row-based replication events. Still "Verified".
  • Bug #82969, "InnoDB statistics update may temporarily cause wrong index cardinalities". This bug (still "Verified") is my all-time favorite race condition in Oracle's implementation of persistent statistics (well, this one and Bug #82968, which is fixed at least in recent 5.7.x and in MariaDB).
 There are also bugs related to other details of InnoDB persistent statistics implementation:
  • Bug #78401, ""ANALYZE TABLE" may assign temporary values to table stats during its execution". Statistics is not updated atomically: it is first reset and then recalculated. Still "Verified".
  • Bug #86926, "The field table_name (varchar(64)) from mysql.innodb_table_stats can overflow." - this may not be enough for partitioned tables, as partition names may be longer. Still "Verified".
  • Bug #67179, "mysql system tables innodb_table_stats,slave_master_info not accessible on clean". This was a famous bug during the early days of MySQL 5.6 that affected many users who tried to upgrade. You may still need the file attached to it one day, to re-create missing InnoDB tables in the mysql database.
  • Bug #80986, "innodb complains about innodb_table_stats even if persistent stats disabled". Still "Verified".
  • Bug #86927, "Renaming a partitioned table does not update mysql.innodb_table_stats.". Fixed recently in MySQL 5.7.21 and 8.0.4.
  • Bug #84455 - the topic of this bug report is different and not relevant, but Shane Bester noted the following in the error log uploaded:
    [Warning] InnoDB: A transaction id in a record of table `mysql`.`innodb_table_stats`
    is newer than the system-wide maximum.
    This is both suspicious and scary. It may be related to the way the background thread works.
  • Bug #74747, "Failing assertion: index->id == btr_page_get_index_id(page) btr0cur.cc line 899". Yes, this is a debug assertion only provoked explicitly, but note what is written in the error log before it happens:
    InnoDB: Cannot save table statistics for table "db1"."t1": Too many concurrent transactions
    It means the background thread opens a separate transaction (no surprise, but still worth noting).
  • Bug #86702, "please disable persistent stats on the mysql.gtid_executed table". This is a valid and verified request to remove the related overhead for this "system" InnoDB table (as it was correctly done for several others).
So, the implementation of InnoDB's Persistent Optimizer Statistics is far from perfect or well documented. One may ask what I'd suggest instead. I often think and state that only engine-independent persistent statistics (in MariaDB style) should exist, and that they should be recalculated only by an explicit ANALYZE TABLE statement, maybe with more options to set the sample size and other details than we have now. No background threads, nothing automatic unless automated by the DBA (for this I'd appreciate a package like Oracle's dbms_stats).
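For illustration, this is roughly what the explicit, DBA-driven approach looks like with the options that already exist today (the table name t1 is just an example, not from any bug report above):

-- Keep persistent statistics for this table, but never recalculate them in the
-- background; sample more pages than the default for better estimates.
ALTER TABLE t1 STATS_PERSISTENT=1, STATS_AUTO_RECALC=0, STATS_SAMPLE_PAGES=64;

-- Recalculate only when the DBA decides the data has changed enough.
ANALYZE TABLE t1;

-- Saved statistics can then be checked with simple SELECTs.
SELECT * FROM mysql.innodb_table_stats WHERE table_name = 't1';
SELECT * FROM mysql.innodb_index_stats WHERE table_name = 't1';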

This kind of idea is usually not well accepted. One of the recent (valid) complaints by Domas here was "No, thanks, don't need more MDL holders." and "I prefer lockless versioned stats, when it gets to what I prefer."

Some of the problems mentioned above may be resolved in MySQL 8 (or not) with its atomic data dictionary operations. Another idea, presented in MDEV-15020, is to store statistics with the data in the same .ibd file. We shall see what happens, but the current implementation, even though it has improved a lot since the early MySQL 5.6 days, is hardly acceptable in the long term.

Spotify Top 200 in mySQL

I have been doing a lot of data analysis lately, trying to find answers to my company's pressing questions through data. Let's look at the past year of 2017 and answer some questions for people who like music.



artist is the artist name
track is the artist's track name
list_date is the chart date on which the track shows up in the top200
streams is the number of plays, counted according to Spotify's specific rules
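The actual DDL is not shown in this post; based on the columns used in the queries below, the table probably looks something like this (the column types and index here are assumptions, not the real schema):

-- Assumed schema, reconstructed from the queries below; types and the index
-- are guesses rather than the actual DDL.
CREATE TABLE spotify.top200 (
  country   varchar(2)   NOT NULL,
  list_date date         NOT NULL,
  pos       smallint     NOT NULL,
  artist    varchar(255) NOT NULL,
  track     varchar(255) NOT NULL,
  streams   int unsigned NOT NULL,
  KEY idx_country_date (country, list_date)
) ENGINE=InnoDB;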

Let's look at the data set

select count(*) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
+----------+
| count(*) |
+----------+
|    74142 |
+----------+
1 row in set (0.04 sec)


How many artists made it in the top200 for the United States?

mysql> select count(DISTINCT(artist)) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
+-------------------------+
| count(DISTINCT(artist)) |
+-------------------------+
|                     527 |
+-------------------------+
1 row in set (0.09 sec)

Wow, it's really hard to be a musician. Only 527 broke the top200.
How many tracks in 2017 broke the top200?
select count(DISTINCT(track)) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
+------------------------+
| count(DISTINCT(track)) |
+------------------------+
|                   1682 |
+------------------------+
For the entire year, 1682 songs largely defined the United States' listening habits.

Who showed up the most in the top200 for 2017?
mysql> select artist,count(*) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;
+------------------+------+
| artist           | CNT  |
+------------------+------+
| Drake            | 3204 |
| Lil Uzi Vert     | 1891 |
| Kendrick Lamar   | 1874 |
| Post Malone      | 1776 |
| Ed Sheeran       | 1581 |
| The Weeknd       | 1566 |
| Migos            | 1550 |
| Future           | 1536 |
| The Chainsmokers | 1503 |
| Kodak Black      | 1318 |
+------------------+------+
10 rows in set (0.16 sec)
Drake killed it, but Lil Uzi Vert is the star of the year, IMHO. Drake has a pedigree while Lil Uzi just started running.
Also, from these artists I can tell that hip hop dominated the US charts; let's verify this assumption.
mysql> select artist,SUM(streams) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;
+------------------+------------+
| artist           | CNT        |
+------------------+------------+
| Drake            | 1253877919 |
| Kendrick Lamar   | 1161624639 |
| Post Malone      |  954546910 |
| Lil Uzi Vert     |  818889040 |
| Ed Sheeran       |  714523363 |
| Migos            |  682008192 |
| Future           |  574005011 |
| The Chainsmokers |  557708920 |
| 21 Savage        |  472043174 |
| Khalid           |  463878924 |
+------------------+------------+
10 rows in set (0.48 sec)

Yup, hip hop dominated the top 10 by streams.
What about tracks? What are the top 10 tracks by streams?
select track,SUM(streams) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;
+-------------------+-----------+
| track             | CNT       |
+-------------------+-----------+
| HUMBLE.           | 340136186 |
| XO TOUR Llif3     | 314758565 |
| Congratulations   | 283551832 |
| Shape of You      | 280898054 |
| Unforgettable     | 261753940 |
| Mask Off          | 242524530 |
| Despacito - Remix | 241370570 |
| rockstar          | 225517132 |
| Location          | 224879215 |
| 1-800-273-8255    | 219689749 |
+-------------------+-----------+
10 rows in set (0.43 sec)
Which tracks and artists had the most time in the top200?
select artist,track,count(*) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 2 order by 3 DESC LIMIT 10;
+------------------+-------------------------------------+-----+
| artist           | track                               | CNT |
+------------------+-------------------------------------+-----+
| D.R.A.M.         | Broccoli (feat. Lil Yachty)         | 485 |
| French Montana   | Unforgettable                       | 417 |
| PnB Rock         | Selfish                             | 394 |
| Travis Scott     | goosebumps                          | 365 |
| Post Malone      | Go Flex                             | 365 |
| Childish Gambino | Redbone                             | 365 |
| Post Malone      | Congratulations                     | 365 |
| Post Malone      | White Iverson                       | 365 |
| Migos            | Bad and Boujee (feat. Lil Uzi Vert) | 364 |
| Bruno Mars       | That's What I Like                  | 364 |
+------------------+-------------------------------------+-----+
10 rows in set (0.20 sec)
Also from this data I can tell that Post Malone had a fantastic year!

So, more questions can be answered, like who held the number 1 position on the top200 the most?
select artist,track,count(*) AS CNT from spotify.top200 WHERE country='us' and pos=1 and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 2 order by 3 DESC LIMIT 10;
+----------------+-------------------------------------+-----+
| artist         | track                               | CNT |
+----------------+-------------------------------------+-----+
| Post Malone    | rockstar                            | 105 |
| Kendrick Lamar | HUMBLE.                             |  67 |
| Ed Sheeran     | Shape of You                        |  48 |
| Luis Fonsi     | Despacito - Remix                   |  47 |
| Migos          | Bad and Boujee (feat. Lil Uzi Vert) |  29 |
| 21 Savage      | Bank Account                        |  20 |
| Drake          | Passionfruit                        |  12 |
| Logic          | 1-800-273-8255                      |  10 |
| Taylor Swift   | Look What You Made Me Do            |  10 |
| French Montana | Unforgettable                       |   7 |
+----------------+-------------------------------------+-----+
10 rows in set (0.26 sec)

Wow, we can see here that Post Malone is the star!
In summary, pulling in public data sources and running simple queries can give clearer insight into the data and answer some pressing questions you may have.
With the schema above, what questions would you answer?
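For example, one more question the same schema can answer is which artists placed the most distinct tracks in the top200. A sketch of such a query (not run here, so no results are shown):

-- Hypothetical follow-up query: artists with the most distinct charting tracks
-- in the US during 2017.
SELECT artist, COUNT(DISTINCT track) AS tracks_charted
FROM spotify.top200
WHERE country = 'us'
  AND list_date >= '2017-01-01' AND list_date < '2018-01-01'
GROUP BY artist
ORDER BY tracks_charted DESC
LIMIT 10;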
