Planet MySQL

On Percona Community Blog

I liked Percona Community Blog from the beginning. First of all, the idea is great. There is no other community blog for the MySQL ecosystem.

Well, Oracle has its own planet.mysql.com – and I have to say, they are fair: as far as I know, they have never censored posts about MariaDB and Percona Server, nor opinions that they don't like. I have written harsh criticism of Oracle, sometimes using strong terms ("another dirty trick"), but they never censored me. I like to be fair regardless of who or what I'm talking about, so this is a good time to spend some good words about them. Not for the first time, anyway. That said, their blogroll only advertises a very small number of blogs. Very good ones of course (except for mine?), but this has the inevitable side effect of obscuring the rest of the world. If John Doe writes an enlightening post about MySQL, I'll never read it, because everything I need to know appears on Planet MySQL.

Percona Community Blog may have the same side effect… or maybe not, or at least the effect could be weaker. I saw outstanding content by my well-known friend JF, yes. But I also saw articles by people I don't know and have never seen on Planet MySQL. So I believe PCB is proving itself quite inclusive.

I have started to publish some content there. First, I used it to promote my talk at Percona Live Europe in Frankfurt, MariaDB System-Versioned Tables. Then I published an article on the same topic, Some notes on MariaDB system-versioned tables. Even if I'm not writing as much these days as I did some years ago, I believe that you will see more posts from me in the near future. PCB is a great place to publish stuff.

One could object that PCB contains the name of a private company and is hosted on its own website, so it is not genuinely a community project. Which is absolutely true. But if you want to see something better in the MySQL ecosystem, you will have to create it, because currently it doesn't exist.

So, is this blog going to die? Absolutely not. This is my personal space. Any third-party website, no matter how good, can disappear or delete our content, and there is nothing we can do about it. A personal space is there as long as you want it to be there. I don't know yet how I will decide what goes here and what goes on PCB; I'll have to think more about it.

Furthermore, being in several places is a form of redundancy, if we decide that our presence on the web is important to us. That is why I always keep my profiles on LinkedIn and Facebook a bit active, and a few days ago I even created a YouTube playlist with my webinar recordings – only three, two of which are in Italian, but still.

Well, enough babbling. Just a final word: if you have something interesting to say about open source databases, you should definitely propose it to PCB. Making it even more interesting is up to us!

Federico

Hangs when using trylock reader writer lock functions


The pthreads reader writer locks have a couple of problems that can lead to application hangs. The trylock functions can cause other threads to hang indefinitely in the rdlock or wrlock functions even after the lock is no longer held.  If these hanging threads hold other locks, the application can deadlock and grind to a halt.  This bug was reported in glibc bug 23844 and has existed since glibc 2.25 was released.  The 2.25 version includes a new reader writer lock implementation which replaces large critical sections with compare and exchange atomic instructions.  The code paths for an uncontended lock should be very fast.  Unfortunately, a couple of bugs that can hang the application need to be fixed.

The trywrlock function is a non-blocking function that tries to grab a write lock on a reader writer lock.  Unfortunately, this function is missing logic that the wrlock function has when the phase of the reader writer lock transitions from read phase to write phase.  This missing logic is intended to synchronize the state of the lock with other reader threads so that when the write lock is released, blocked reader threads are awakened.  Since the logic is missing, the reader threads are not awakened.

The tryrdlock function is a non-blocking function that tries to grab a read lock on a reader writer lock.  Unfortunately, this function is missing logic that the rdlock function has when the phase of the reader writer lock transitions from write phase to read phase.  This missing logic awakens reader threads that may be waiting for the read phase; the thread executing tryrdlock simply happened to perform the phase transition first.  Since the logic is missing, the reader threads are not awakened.

There are patches to the trywrlock and tryrdlock functions in the glibc bug report that fix these bugs.  Hopefully, these patches will be included in a new glibc release soon.

The MySQL server uses reader writer locks for various purposes.  I wonder if it is affected by these reader writer lock bugs.




Amazon RDS Aurora Serverless – The Basics

When I attended AWS re:Invent 2018, I saw a lot of attention from both customers and the AWS team on Amazon RDS Aurora Serverless. So I decided to take a deeper look at this technology and write a series of blog posts on the topic.

In this first post of the series, you will learn about Amazon Aurora Serverless basics and use cases. In later posts, I will share benchmark results and a more in-depth analysis.

What Amazon Aurora Serverless Is

A great source of information on this topic is How Amazon Aurora Serverless Works in the official AWS documentation. In that article, you learn what a Serverless deployment, as opposed to a provisioned deployment, means. Instead of specifying an instance size, you specify the minimum and maximum number of "Aurora Capacity Units" you would like to have.

Once you set up such an instance, it will automatically scale between its minimum and maximum capacity points. You can also scale it manually if you like.

One of the most interesting Aurora Serverless properties, in my opinion, is its ability to pause itself if it stays idle for a specified period of time.

This feature can save a lot of money for test/dev environments where load can be intermittent. Be careful, though, about using this for production-size databases, as waking up is far from instant: I've seen cases of it taking over 30 seconds in my experiments.

Another thing which may surprise you about Amazon Aurora Serverless, at the time of this writing, is that it is not very well coordinated with other Amazon RDS Aurora products – it is only available as a MySQL 5.6 based edition, it is not compatible with the recent parallel query innovations, and it comes with a list of other significant limitations. I'm sure Amazon will resolve these in due course, but for now you need to be aware of them.

A simple way to think about it is as follows: Amazon Aurora Serverless is a way to deploy Amazon Aurora so it scales automatically with load; can automatically pause when there is no load; and resume automatically when requests come in.

What Amazon Aurora Serverless is not

When I think about Serverless Computing, I think about elastic scalability across multiple servers and resource-usage-based pricing. DynamoDB, another database which Amazon advertises as Serverless, fits those criteria, while Amazon Aurora Serverless does not.

With Amazon Aurora Serverless, for better or for worse, you're still living in the "classical" instance world. Aurora Capacity Units (ACUs) are pretty much CPU and memory capacity. You still need to understand how many database connections you are allowed to have. You still need to monitor your CPU usage on the instance to understand when auto scaling will happen.

Amazon Aurora Serverless also does not have any magic to scale you beyond single-instance performance, which you can get with provisioned Amazon Aurora.

Summary

I'm excited about the new possibilities Amazon Aurora Serverless offers. As long as you do not expect magic and understand that this is one of the newest products in the Amazon Aurora family, you should definitely give it a try for applications where it fits.

If you’re hungry for more information about Amazon Aurora Serverless and can’t wait for the next articles in this series, this article by Jeremy Daly contains a lot of great information.


Photo by Emily Hon on Unsplash


Fun with Bugs #75 - On MySQL Bug Reports I am Subscribed to, Part XII

From the lack of comments on my previous post it seems everything is clear with ERROR 1213 in different kinds and forks of MySQL. I may still write a post or two about MyRocks or TokuDB deadlocks one day, but let's get back to my main topic of MySQL bugs. Today I continue my series of posts about community bug reports I am subscribed to with a review of bugs reported in November, 2018, starting from the oldest and skipping those MySQL 8 regression bugs I've already commented on. I also skip documentation bugs, which should be the topic of a separate post one day (to better illustrate my earlier statements).

These are the most interesting bug reports from Community members in November 2018:
  • Bug #93139 - "mysqldump temporary views missing definer". This bug, reported by Nikolai Ikhalainen from Percona, looks like a regression (that can appear in the somewhat unusual case of a missing root user) in all versions starting from 5.6. There is no regression tag, surely. Also, for some reason I do not see 8.0.x listed as an affected version, while from the text it seems MySQL 8 is also affected.
  • Bug #93165 - "Memory leak in sync_latch_meta_init() after mysqld shutdown detected by ASan". This bug was reported by Yura Sorokin from Percona, who also made important statement in his last comment (that I totally agree with):
    "In commit https://github.com/mysql/mysql-server/commit/e93e8db42d89154b37f63772ce68c1efda637609 you literally made 14 MTR test cases ignore ALL memory problems detected by ASan, not only those which you consider 'OK' when you terminate the process with the call to 'exit()'. In other words, new memory leaks introduced in FUTURE commits may not be detected because of those changes. Address Sanitizer is a very powerful tool and its coverage should be constantly extending rather than shrinking."
  • Bug #93196 - "DD crashes on assert if ha_commit_trans() returns error". It seems Vlad Lesin from Percona spent notable time testing everything related to the new MySQL 8 data dictionary (maybe while Percona worked on their Percona Server for MySQL 8.0, which should also support MyRocks, provide native partitioning for it, and integrate properly with the data dictionary). See also his Bug #93250 - "the result of tc_log->commit() is ignored in trans_commit_stmt()".
  • Bug #93241 - "Query against full text index with ORDER BY silently fails". Nice finding by Jonathan Balinski, with detailed test cases and comments added by Shane Bester. One more confirmation that FULLTEXT indexes in InnoDB are still problematic.
  • Bug #93276 - "Crash when calling mysql_real_connect() in loop". Nice regression in C API (since 8.0.4!) noted by Reggie Burnett and still not fixed.
  • Bug #93321 - "Assertion `rc == TYPE_OK' failed". Last but not least, yet another debug assertion (and error in non-debug builds) found in MySQL 8.0.13 by Roel Van de Paar from Percona. You already know where QA for MySQL happens to a large extent, don't you?
  • Bug #93361 - "memory/performance_schema/table_handles have memory leak!". It's currently in "Need Feedback" status and may end up as not a bug, but I've never seen 9G of memory used for just one Performance Schema table so far. It's impressive.
  • Bug #93365 - "Query on performance_schema.data_locks causes replication issues". Probably the first case when it was proved that query to some Performance Schema table may block some important server activity. Nice finding by Daniël van Eeden.
  • Bug #93395 - "ALTER USER succeeds on master but fails on slave." Yet another way to break replication was found by Jean-François Gagné. See also his Bug #93397 - "Replication does not start if restart MySQL after init without start slave."
  • Bug #93423 - "binlog_row_image=full not always honored for binlog_format=MIXED". For some reason this bug (with a clear test case) reported by James Lawrie is still "Open".
  • Bug #93430 - "Inconsistent output of SHOW STATUS LIKE 'Handler_read_key';". This weird inconsistency was found by Przemysław Skibiński from Percona.
Thinking about the future of MySQL 8 somewhere in Greenwich...

To summarize this review:
  1. I obviously pay a lot of attention to bug reports from Percona engineers.
  2. It seems memory problems detected by ASan in some MTR test cases are deliberately ignored instead of being properly fixed.
  3. There are still many surprises waiting for early adopters of MySQL 8.0 GA :) 
That's all I have to say about specific MySQL bugs in 2018. Next "Fun with Bugs" post, if any, will appear only next year. I am already subscribed to 11 bugs reported in December 2018. Stay tuned!

Group Replication: A member in “RECOVERING” state is part of the primary partition

If you are using MySQL InnoDB Cluster (Group Replication) with ProxySQL, you should be familiar with the 2 functions and 1 view required in the sys schema that ProxySQL uses to see whether a node is online or partitioned, and whether it is lagging or not (see link1 and link2).

I recently received a very valuable contribution from Bruce DeFrang that fixes a bug in one of the functions that were added to sys.

In fact, Bruce discovered that when a node was in RECOVERING state, it was not counted as part of the Primary Partition. This could lead to the only ONLINE Primary Master being considered partitioned, and therefore ProxySQL would not consider the node a valid candidate for routing queries to it.
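To illustrate the idea behind the fix, here is a hedged sketch against performance_schema (this is not Bruce's actual patch nor the sys function itself): a member in RECOVERING state has already joined the group and must be counted together with ONLINE members when sizing the primary partition.

-- Hypothetical illustration only: count the members that belong to the group,
-- treating RECOVERING nodes as part of the primary partition along with ONLINE ones.
SELECT COUNT(*) AS members_in_primary_partition
  FROM performance_schema.replication_group_members
 WHERE MEMBER_STATE IN ('ONLINE', 'RECOVERING');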

I have already updated the original gist with this addition, so if you are linking to it somewhere, you now have the fixed version.

For the others, here is the file: addtion_to_sys_8.0.2.sql

The same file is of course valid for all MySQL >= 8.0.2.

In conclusion, thank you Bruce for considering MySQL Group Replication, for sharing your comments with me, and for contributing your fix back.

LSM math - size of search space for LSM tree configuration

I have written before and will write again about using 3-tuples to explain the shape of an LSM tree. This makes it easier to explain the configurations supported today and configurations we might want to support tomorrow, in addition to traditional tiered and leveled compaction. The summary is that an LSM tree has N levels labeled from L1 to Ln, and Lmax is another name for Ln. There is one 3-tuple per level and the components of the 3-tuple are (type, fanout, runs) for Lk (level k) where:
  • type is Tiered or Leveled and explains compaction into that level
  • fanout is the size of a sorted run in Lk relative to a sorted run from Lk-1, a real and >= 1
  • runs is the number of sorted runs in that level, an integer and >= 1
Given the above, how many valid configurations exist for an LSM tree? There are additional constraints that can be imposed on the 3-tuple, but I will ignore most of them except for limiting fanout and runs to be <= 20. The answer is easy - there are an infinite number of configurations, because fanout is a real.
The question is more interesting when fanout is limited to an integer and the number of levels is limited to between 1 and 10. I am doing this to explain the size of the search space but I don't think that fanout should be limited to an integer.
There are approximately 2^11 configurations when considering only the compaction type, which has 2 values, and 1 to 10 levels, because there are 2^N configurations of compaction types for a tree with N levels and the sum 2^1 + 2^2 + ... + 2^9 + 2^10 = 2^11 - 2.
But when type, fanout and runs are all considered, then there are 2 x 20 x 20 = 800 choices per level and 800^N combinations for an LSM tree with N levels. Considering LSM trees with 1 to 10 levels, the number of valid configurations is the sum 800^1 + 800^2 + ... + 800^9 + 800^10. That is a large number of configurations if exhaustive search were to be used to find the best configuration. Note that I don't think exhaustive search should be used.
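For a sense of scale, that geometric series has a simple closed form (my arithmetic, not from the original post):

\[ \sum_{k=1}^{10} 800^k = \frac{800^{11} - 800}{799} \approx 1.1 \times 10^{29} \]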

Replicating data into Clickhouse

Clickhouse is a relatively new analytics and data warehouse engine that provides very fast insertion and analysis of data. Like most analytics platforms it's built on column-oriented storage, and unlike many alternatives it is completely open source. It's also exceedingly fast, even on relatively modest platforms.

Clickhouse does have some differences from other environments; for example, data that has been inserted cannot easily be updated, and it supports a number of different storage and table engine formats that are used to store and index the information. So how do we get data into it from our MySQL transactional store?

Well, you can do dumps and loads, or you can use Tungsten Replicator to do it for you. The techniques I'm going to describe here are not in an active release, but they use the same principles as other parts of our data loading.

We're going to use the CSV-based batch loading system that is employed by our Hadoop, Vertica and Amazon Redshift appliers to get the data in. Ordinarily we would run a materialization step that merges and updates the data from the staging tables, which import the raw change information, turning it into 'base' or carbon-copy tables. We can't do that with Clickhouse, as the data cannot be modified once imported, but we can still use the information that gets imported.

If you are familiar with the way we load data in this method, you will know that we import information using a CSV file and each row of the file is either an INSERT or a DELETE, with an UPDATE operation being simulated by a DELETE followed by an INSERT. Because all rows are also tagged with date, time, and transaction ID information, we can always identify the latest update.

Finally, one other thing to note about the Clickhouse environment is that the data types are defined slightly differently. In most databases we are familiar with INT, or LONG or VARCHAR. Within Clickhouse, the datatypes you use for table fields more closely match the types in C, such as Int32 or Int64. That means creating a simple table uses a definition like this:

CREATE TABLE sales.stage_xxx_msg (
    tungsten_opcode String,
    tungsten_seqno Int32,
    tungsten_row_id Int32,
    tungsten_commit_timestamp String,
    id Int32,
    msg String
) ENGINE = Log;

You can also see we don't have a timestamp datatype, or CHAR/VARCHAR, just String.
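As a side note, because every staged row carries the opcode and the Tungsten sequence and row numbers, the current state of each row can still be reconstructed at query time inside Clickhouse. Here is a hedged sketch of the idea; this is not something the replicator does for you, and it assumes 'I' marks inserts and 'D' marks deletes, with the arbitrary multiplier simply ordering changes within one transaction.

-- Hypothetical query: for each id, take the values from the most recent change
-- (highest seqno, then row id) and keep the row only if that last change was an insert.
SELECT
    id,
    argMax(msg, toInt64(tungsten_seqno) * 1000000 + tungsten_row_id) AS latest_msg
FROM sales.stage_xxx_msg
GROUP BY id
HAVING argMax(tungsten_opcode, toInt64(tungsten_seqno) * 1000000 + tungsten_row_id) = 'I';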

With all that in mind, let’s try loading some data into Clickhouse using Tungsten Replicator!

First, a basic MySQL extraction recipe:

tools/tpm configure alpha \
    --disable-relay-logs=true \
    --enable-heterogeneous-service=true \
    --install-directory=/opt/continuent \
    --master=ubuntuheterosrc \
    --mysql-allow-intensive-checks=true \
    --replication-password=Tamsin \
    --replication-user=root \
    --skip-validation-check=MySQLMyISAMCheck

We're going to use a fairly standard replicator install, extracting from a basic MySQL 5.7 server and inserting the change data into Clickhouse.

For the Clickhouse side, we’ll use the batch applier with a different, custom, template:

tools/tpm configure alpha \
    --batch-enabled=true \
    --batch-load-template=clickhouse \
    --datasource-mysql-conf=/dev/null \
    --datasource-type=file \
    --install-directory=/opt/continuent \
    --master=ubuntuheterosrc \
    --members=clickhouse2 \
    --property=replicator.datasource.global.csvType=vertica \
    --replication-password=password \
    --replication-port=8123 \
    --replication-user=tungsten \
    --skip-validation-check=InstallerMasterSlaveCheck \
    --start-and-report=true

That's it! We make one other change from other installations: because we cannot update information in Clickhouse, rather than using Clickhouse to store the Replicator status information, we use the File datasource type, which stores that information in a file on the local filesystem.

To generate this information I'll generate about 18,000 transactions of data, a mixture of INSERT, DELETE and UPDATE operations, and we'll load this into MySQL in tandem across 20 threads.

Let's run the load and check Clickhouse:

clickhouse2 :) select * from stage_xxx_msg limit 10;

SELECT * FROM stage_xxx_msg LIMIT 10

┌─tungsten_opcode─┬─tungsten_seqno─┬─tungsten_row_id─┬─tungsten_commit_timestamp─┬─id─┬─msg──────────────────┐
│ I │ 15 │ 1 │ 2018-12-12 09:48:17.000 │ 9 │ 4qwciTQiKdSrZKCwflf1 │
│ I │ 16 │ 2 │ 2018-12-12 09:48:17.000 │ 10 │ Qorw8T10xLwt7R0h7PsD │
│ I │ 17 │ 3 │ 2018-12-12 09:48:17.000 │ 11 │ hx2QIasJGShory3Xv907 │
│ I │ 19 │ 1 │ 2018-12-12 09:48:17.000 │ 12 │ oMxnT7RhLWpvQSGYtE6V │
│ I │ 20 │ 2 │ 2018-12-12 09:48:17.000 │ 13 │ fEuDvFWyanb1bV9Hq8iM │
│ I │ 23 │ 1 │ 2018-12-12 09:48:17.000 │ 14 │ oLVGsNjMPfWcxnRMkpKI │
│ I │ 25 │ 2 │ 2018-12-12 09:48:17.000 │ 15 │ w3rYUrzxXjb3o9iTHtnS │
│ I │ 27 │ 3 │ 2018-12-12 09:48:17.000 │ 16 │ aDFjRpTOK6ruj3JaX2Na │
│ I │ 30 │ 4 │ 2018-12-12 09:48:17.000 │ 17 │ SXDxPemQ5YI33iT1MVoZ │
│ I │ 32 │ 5 │ 2018-12-12 09:48:17.000 │ 18 │ 8Ta8C0fjIMRYEfVZBZjE │
└─────────────────┴────────────────┴─────────────────┴───────────────────────────┴────┴──────────────────────┘

10 rows in set. Elapsed: 0.005 sec.

Analysing the overall times, I processed 358,980 transactions through MySQL and into Clickhouse using relatively modest virtual machines on my laptop, and it took 538 seconds. That's about 670 transactions a second. Bear in mind we're committing every 100 rows here; larger commit intervals would probably be quicker overall. This is using the default settings, and I know from past testing and imports that I can go much faster.

I’d count that as a success!

Bear in mind we're also writing to separate databases and tables here, but with the adddbname filter and the modified applier we can insert all of that data into a single table. So if you are concentrating data into a single database/table combination, you can do this in one step with Tungsten Replicator.

As I said before, Clickhouse is not currently a supported target for the Replicator, but if you are interested please get in touch!

Some Notes on MariaDB system-versioned Tables

As mentioned in a previous post, I gave a talk at Percona Live Europe 2018 about system-versioned tables. This is a new MariaDB 10.3 feature, which consists of preserving old versions of a table's rows. Each version has two timestamps that indicate the start (INSERT, UPDATE) of the validity of that version, and its end (DELETE, UPDATE). As a result, the user is able to query these tables as they appeared at a point in the past, or to see how data evolved over a certain time range. An alternative name for this feature is temporal table, and I will use it in the rest of this text.

In this post, I want to talk a bit about temporal table best practices. Some of the information that I will provide is not present in the documentation; while it is based on my experience and tests, there could be errors. My suggestions for good practices are also based on my experience and opinions, and I don't consider them universal truths. If you have different opinions, I hope that you will share them in the comments or in a separate blog post.

Create temporal columns

It is possible – but optional – to create the columns that contain the timestamps of rows. Since there is no special term for them, I call them temporal columns. MariaDB allows us to give them any name we like, so I like to use the names valid_from and valid_to, which seem to be some sort of de facto standard in data warehousing. Whichever names you decide to use, I advise you to use them for all your temporal columns and for nothing else, so that the meaning will be clear.
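For reference, here is a minimal sketch of how such a table could be declared; the table and column definitions are my own example, not from the original post.

-- Hypothetical example: explicitly named temporal columns on a system-versioned table.
CREATE TABLE temporal_table (
    id INT PRIMARY KEY,
    data VARCHAR(100),
    valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
    valid_to   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH SYSTEM VERSIONING;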

Temporal columns are generated columns, meaning that their values are generated by MariaDB and cannot be modified by the user. They are also invisible columns, which means that they can only be read by mentioning them explicitly. In other words, the following query will not return those columns:

SELECT * FROM temporal_table;

Also, that query will only show current versions of the rows. In this way, if we make a table temporal, existing applications and queries will continue to work as before.

But we can still read old versions and obtain their timestamps with a query like this:

SELECT *, valid_from, valid_to
    FROM temporal_table FOR SYSTEM_TIME ALL
    WHERE valid_from < NOW() - INTERVAL 1 MONTH;

If we don't create these columns, we will not be able to read the timestamps of current and old row versions. We will still be able to read data from a point in time or from a time range by using some special syntax. However, I believe that using the familiar WHERE syntax is easier and more expressive than using syntactic sugar.
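For completeness, a hedged sketch of that special syntax (the timestamps are made up):

-- The table as it appeared at a point in the past.
SELECT * FROM temporal_table FOR SYSTEM_TIME AS OF TIMESTAMP '2018-11-01 00:00:00';

-- Row versions that were valid at some moment within a time range.
SELECT * FROM temporal_table FOR SYSTEM_TIME BETWEEN '2018-11-01 00:00:00' AND '2018-12-01 00:00:00';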

Primary keys

For performance reasons, InnoDB tables should always have a primary key, and normally it shouldn’t be updated. Temporal tables provide another reason to follow this golden rule – even on storage engines that are not organised by primary key, like MyISAM.

The reason is easy to demonstrate with an example:

SELECT id, valid_from, valid_to FROM t FOR SYSTEM_TIME ALL WHERE id IN (500, 501);
+-----+----------------------------+----------------------------+
| id  | valid_from                 | valid_to                   |
+-----+----------------------------+----------------------------+
| 500 | 2018-12-09 12:22:45.000001 | 2018-12-09 12:23:03.000001 |
| 501 | 2018-12-09 12:23:03.000001 | 2038-01-19 03:14:07.999999 |
+-----+----------------------------+----------------------------+

What do these results mean? Maybe row 500 has been deleted and row 501 has been added. Or maybe row 500 has been modified, and its id became 501. The timestamps suggest that the latter hypothesis is more likely, but there is no way to know that for sure.

That is why, in my opinion, we need to be able to assume that UPDATEs never touch primary key values.

Indexes

Currently, the documentation says nothing about how temporal columns are indexed. However, my conclusion is that the valid_to column is appended to UNIQUE indexes and the primary key. My opinion is based on the results of some EXPLAIN commands, like the following:

EXPLAIN SELECT email, valid_to FROM customer ORDER BY email \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
         type: index
possible_keys: NULL
          key: unq_email
      key_len: 59
          ref: NULL
         rows: 4
        Extra: Using where; Using index

This means that the query only reads from a UNIQUE index, and not from table data – therefore, the index also contains the valid_to column. The optimizer is also able to use the index for sorting, which confirms that email is the first column (as expected). In this way, UNIQUE indexes don't prevent the same value from appearing multiple times, but each occurrence will be valid at a different point in time.

It can be a good idea to include valid_to or valid_from in some regular indexes, to optimize queries that use such columns for filtering results.
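For example, a hedged sketch (the index name and columns are just an illustration):

-- Hypothetical index to speed up queries that filter row versions by their validity start.
ALTER TABLE customer ADD INDEX idx_email_valid_from (email, valid_from);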

Transaction-safe temporal tables

Temporal columns contain timestamps that indicate when a row was INSERTed, UPDATEd, or DELETEd. So, when autocommit is not enabled, temporal columns don't match the COMMIT time. For most use cases this behaviour is desirable, or at least acceptable. But there are cases when we want to see only committed data, to avoid data inconsistencies that were never seen by applications.

To do so, we can create a transaction-precise temporal table. This only works with InnoDB – not with RocksDB or TokuDB, even though they support transactions. A transaction-precise temporal table doesn't contain timestamps; instead, it contains the IDs of the transactions that created and deleted each row version. If you know PostgreSQL, you are probably familiar with the xmin and xmax columns – it's basically the same idea, except that in Postgres, at some point, autovacuum will make old row versions disappear. Because of the similarity, for transaction-precise temporal tables I like to call the temporal columns xmin and xmax.
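A hedged sketch of what such a table could look like; the table is my own example, following the xmin/xmax naming convention above.

-- Hypothetical example: BIGINT UNSIGNED temporal columns make the table
-- transaction-precise, so xmin/xmax hold transaction IDs instead of timestamps.
CREATE TABLE orders (
    id INT PRIMARY KEY,
    status VARCHAR(20),
    xmin BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
    xmax BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (xmin, xmax)
) WITH SYSTEM VERSIONING;

-- Rows as they were visible to a given transaction ID.
SELECT * FROM orders FOR SYSTEM_TIME AS OF TRANSACTION 12345;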

From this short description, the astute reader may already see a couple of problems with this approach:

  • Temporal tables are based on transaction IDs or on timestamps, not both. There is no way to run a transaction-precise query to extract data that was present one hour ago. But think about it: even if it were possible, it would be problematic at best, because transactions are meant to be concurrent.
  • Transaction IDs are written in the binary log, but such information is typically only accessible to DBAs. An analyst (someone who is typically interested in temporal tables) has no access to transaction IDs.

A partial workaround would be to query tables with columns like created_at and modified_at. We can run queries like this:

SELECT created_at, xmin FROM some_table WHERE created_at >= '2018-05-05 16:00:00' ORDER BY created_at LIMIT 1;

This will return the timestamp of the first row created since ‘2018-05-05 16:00:00’, as well as the id of the transaction which inserted it.

While this approach could give us the information we need with a reasonable extra work, it’s possible that we don’t have such columns, or that rows are not inserted often enough in tables that have them.

In this case, we can occasionally write the current timestamp and the current transaction ID to a table; see the sketch after the following list. This should allow us to associate a transaction with the timestamp we are interested in. We cannot log every transaction ID for performance reasons, so we can use two different approaches:

  • Write the transaction id and the timestamp periodically, for example each minute. This will not create performance problems. On the other hand, we are arbitrarily deciding the granularity of our “log”. This could be acceptable or not.
  • Write this information when certain events happen, for example when a product is purchased, or when a user changes their password. This will give us a very precise way to see the data as they appeared during critical events, but it will not allow us to investigate other types of events with the same precision.
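Here is the promised sketch. It is only an illustration of mine, not from the original post: the log table is itself transaction-precise, so the ROW START column of each log row records the ID of the transaction that inserted it, next to a wall-clock timestamp.

-- Hypothetical mapping between wall-clock time and transaction IDs.
CREATE TABLE trx_timestamp_log (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    logged_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
    xmin BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
    xmax BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (xmin, xmax)
) WITH SYSTEM VERSIONING;

-- Run periodically, or inside the transaction of a critical business event.
INSERT INTO trx_timestamp_log () VALUES ();

-- Find the transaction ID closest to the point in time we care about.
SELECT xmin, logged_at
FROM trx_timestamp_log
WHERE logged_at >= '2018-05-05 16:00:00'
ORDER BY logged_at
LIMIT 1;
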
Partitioning

If we look at older implementations of temporal tables in the world of proprietary databases (Db2, SQL Server, Oracle), they generally store historical data in a separate physical table or partition, sometimes called a history table. In MariaDB this doesn't happen automatically or by default, leaving the choice to the user. However, in the general case it seems to me a good idea to create one or more partitions to store historical rows. The main reason is that a query rarely has to read both historical and current data, so reading only one partition is an interesting optimization.
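A hedged sketch of how that can be declared (the partition names are mine):

-- Hypothetical example: historical rows go to p_hist, current rows stay in p_cur,
-- so queries that only touch current data read a single partition.
ALTER TABLE temporal_table
    PARTITION BY SYSTEM_TIME (
        PARTITION p_hist HISTORY,
        PARTITION p_cur CURRENT
    );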

Excluding columns from versioning

MariaDB allows us to exclude some columns from versioning. This means that if we update the values of those columns only, the current row version is updated in place rather than a new one being created. This is probably useful if a column is frequently updated and we don't care about those changes. However, if we update several columns with one statement and only a subset of them is excluded from versioning, a new row version is still created. All in all, the partial exclusion of some columns could be more confusing than useful in several cases.
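A hedged example of the syntax (the table and column names are mine):

-- Hypothetical example: updates that touch only last_seen do not create a new row version.
CREATE TABLE app_user (
    id INT PRIMARY KEY,
    email VARCHAR(100),
    last_seen TIMESTAMP WITHOUT SYSTEM VERSIONING
) WITH SYSTEM VERSIONING;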

Replication

10.3 is a stable version, but it is still recent. Some of us only adopt a new major version after some years, and we may even have reasons to stick with an old version. Furthermore, of course, many of us use MySQL, and MariaDB is not a drop-in replacement.

But we can still enjoy temporal tables by adding a MariaDB 10.3 slave. I attached such a slave to older MariaDB versions, and to MySQL 5.6. In all tests, the feature behaved as expected.

Initially, I was worried about replication lags. I assumed that, if replication lags, the slave applies the changes with a delay, and the timestamps in the tables are delayed accordingly. I am glad to say that I was wrong: the timestamps in temporal tables seem to match the ones in the binary log, so replication lags don’t affect their correctness.

This is true both with row-based replication and with statement-based replication.

A small caveat about temporal tables is that the version timestamps are only precise to the second. The fractional part should be ignored. You may have noticed this in the example earlier in this post.

Backups

For backups you will need to use mariabackup instead of xtrabackup.

mysqldump can be used, not necessarily from a MariaDB distribution. However, it treats temporal tables as regular tables and does not back up historical data. This is a consequence of a design choice: we cannot insert rows with timestamps in the past, which makes temporal tables much more reliable. Also, temporal tables are likely to be (or become) quite big, so a dump is probably not the best way to back them up.


Photo by Ashim D’Silva on Unsplash

The post Some Notes on MariaDB system-versioned Tables appeared first on Percona Community Blog.

Convert Class Components to Functional Components in a React Project (Solution to Code Challenge #14)

Last week on the code challenge we set out to refactor some class components in a create-react-app project to functional components using react hooks.

In this post, we shall complete the challenge. On completing it, you should be able to write React components with state and lifecycle behaviour entirely as JavaScript functions.

Are you yet to complete the challenge? Just fork this codesandbox and get started. You can look through these posts by Chris and Peter for guidance.

You can also search Twitter for the hashtag #ScotchCodeChallenge to see some of the amazing entries, as well as the comment section of the challenge post.

The Challenge

We were provided with a simple React app on codesandbox with only the required dependencies installed. In the app we have 6 individual components, all written as class components, which required conversion to functional components.

To use React Hooks, you must be running React version 16.7 or newer.

Brief

React introduced Hooks in the alpha version of React 16.7. How do hooks work? Two new APIs were exposed to handle state and lifecycle methods (which are the core features of class components). These APIs are useState and useEffect, which handle state and lifecycle behaviour respectively.

Next, we shall explore the usage of these two hooks. Hooks let you use the features of class components inside functional components.

The Solution

If you would rather not walk through the process of re-writing the components, you can go straight to the solution codesandbox here: https://codesandbox.io/s/mq297nro3x


Looking at the individual components from first to last in src/components we have:

One.js

This was previously written as a class component with a default export.

import React, { Component } from "react";

class One extends Component {
  state = { count: 0 };

  increase = () => {
    this.setState({ count: this.state.count + 1 });
  };

  render() {
    return (
      <div style={{ marginBottom: "50px" }}>
        <h2>Challenge 1</h2>
        <p>Count is: {this.state.count}</p>
        <button onClick={this.increase}>Increase Count!</button>
      </div>
    );
  }
}

export default One;

As can be seen, we have a state variable, count, which is used to build a simple counter. Using the useState hook, we shall re-write the component as a function.

import React, { useState } from "react";

const One = () => {
  const [count, setCount] = useState(0);

  return (
    <div style={{ marginBottom: "50px" }}>
      <h2>Challenge 1</h2>
      <p>Count is: {count}</p>
      <button onClick={() => setCount(count + 1)}>Increase Count!</button>
    </div>
  );
};

export default One;

We can see this is much cleaner and more condensed. Here we simply destructured the pair returned by the useState hook, assigning count to the state value and setCount to the function that updates it.

The initial value of the state variable is passed to useState, and the value of count and the setCount updater can then be used in the rendered component.

Two.js

As in One.js, we have a single state variable, only this time it's a string.

import React, { Component } from "react";

class Two extends Component {
  state = { activeUser: "Chris" };

  changeUser = () => {
    this.setState({ activeUser: "Bolingo!" });
  };

  render() {
    return (
      <div style={{ marginBottom: "50px" }}>
        <h2>Challenge 2</h2>
        <p>Active User is: {this.state.activeUser}</p>
        <button onClick={this.changeUser}>Change Me!</button>
      </div>
    );
  }
}

export default Two;

In a similar fashion we utilize useState to include the state variable in a functional component as seen below:

import React, { useState } from "react";

const Two = () => {
  const [activeUser, changeUser] = useState("Chris");

  const newName = () => changeUser("Bolingoli!");

  return (
    <div style={{ marginBottom: "50px" }}>
      <h2>Challenge 2</h2>
      <p>Active User is: {activeUser}</p>
      <button onClick={newName}>Change Me!</button>
    </div>
  );
};

export default Two;

In this component, we created a new function to handle the name change using the changeUser function destructured from useState.

Next, let's look at handling multiple state values.

Three.js

In this component, we have 3 state variables which change within the component. useState can be called (and destructured) once for each individual variable. With this, we can handle all state variables at once, and on each render the useState calls run in the order in which they are listed.

import React, { Component } from "react";

class Three extends Component {
  state = { year: 1995, type: "Mercedes", used: true };

  swapCar = () => {
    this.setState({ year: 2018, type: "BMW", used: false });
  };

  render() {
    return (
      <div style={{ marginBottom: "50px" }}>
        <h2>Challenge 3</h2>
        <h3>Car Spec is:</h3>
        <ul>
          <li>{this.state.type}</li>
          <li>{this.state.year}</li>
          <li>{this.state.used ? "Used Car" : "Brand New!"}</li>
        </ul>
        <button onClick={this.swapCar}>Swap Car!</button>
      </div>
    );
  }
}

export default Three;

Refactoring this we have:

import React, { useState } from "react";

function Three() {
  const [year, changeYear] = useState(1995);
  const [type, changeType] = useState("Mercedes");
  const [used, changeCondition] = useState(true);

  const swapCar = () => {
    changeYear(2018);
    changeType("BMW");
    changeCondition(false);
  };

  return (
    <div style={{ marginBottom: "50px" }}>
      <h2>Challenge 3</h2>
      <h3>Car Spec is:</h3>
      <ul>
        <li>{type}</li>
        <li>{year}</li>
        <li>{used ? "Used Car" : "Brand New!"}</li>
      </ul>
      <button onClick={swapCar}>Swap Car!</button>
    </div>
  );
}

export default Three;

Similarly, we created a function swapCar to handle all changes.

Four.js

In this component, componentDidMount was used to update the value of a state variable 5 seconds after the component mounts. Here we need the useEffect hook to handle the lifecycle method. Originally we had:

import React, { Component } from "react";

class Four extends Component {
  state = { message: "What's happening this week?" };

  componentDidMount() {
    setTimeout(() => {
      this.setState({ message: "I only know it's gon be lit!!" });
    }, 5000);
  }

  render() {
    return (
      <div style={{ marginBottom: "50px" }}>
        <h2>Challenge 4</h2>
        <p>Status: {this.state.message}</p>
      </div>
    );
  }
}

export default Four;

With useEffect and useState we have:

import React, { useState, useEffect } from "react";

const Four = () => {
  const [message, newMessage] = useState("What's happening this week?");

  useEffect(() => {
    setTimeout(() => {
      newMessage("I only know it's gon be lit!!");
    }, 5000);
  }, []);

  return (
    <div style={{ marginBottom: "50px" }}>
      <h2>Challenge 4</h2>
      <p>Status: {message}</p>
    </div>
  );
};

export default Four;

In useEffect, we used the setTimeout function to delay the execution of the newMessage method which updates the state. Notice the empty array passed as a second parameter to useEffect: this makes the function run only once (on the first render). The function passed to useEffect re-runs every time a value in that dependency array changes. This should give you insights on how to handle updates in the component.
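
To make the dependency array behaviour more concrete, here is a small additional sketch of my own (the Ticker component and its state values are made up for illustration and are not part of the challenge files): the effect depends on count, so it re-runs only when count changes, not when unrelated state changes.

import React, { useState, useEffect } from "react";

// Hypothetical component, for illustration only: the effect depends on `count`,
// so it runs after the first render and again every time `count` changes,
// but not when unrelated state (like `text`) changes.
const Ticker = () => {
  const [count, setCount] = useState(0);
  const [text, setText] = useState("");

  useEffect(() => {
    document.title = `Clicked ${count} times`; // re-runs only when `count` changes
  }, [count]);

  return (
    <div>
      <button onClick={() => setCount(count + 1)}>Clicked {count} times</button>
      <input value={text} onChange={e => setText(e.target.value)} />
    </div>
  );
};

export default Ticker;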

Five.js

Here we are required to convert a class component which conditionally renders another component using ternary operators, to a functional component.

import React, { Component } from "react";
import Little from "./Little";

class Five extends Component {
  state = { showText: true };

  showLittle = () => {
    this.setState({ showText: !this.state.showText });
  };

  render() {
    return (
      <div style={{ marginBottom: "50px" }}>
        <h2>Challenge 5</h2>
        <h3>Here below lies little text in a box</h3>
        <button onClick={this.showLittle}>Click to toggle Little</button>
        {this.state.showText ? <Little /> : ""}
      </div>
    );
  }
}

export default Five;

Converting to a functional component we have:

import React, { useState } from "react";
import Little from "./Little";

const Five = () => {
  const [showText, toggleShowText] = useState(true);

  const showLittle = () => {
    toggleShowText(!showText);
  };

  return (
    <div style={{ marginBottom: "50px" }}>
      <h2>Challenge 5</h2>
      <h3>Here below lies little text in a box</h3>
      <button onClick={showLittle}>Click to toggle Little</button>
      {showText ? <Little /> : ""}
    </div>
  );
};

export default Five;

useState was employed once more to create and update a state value. Also, a component Little.js was imported and conditionally rendered once the state is toggled using the created button. Let's look at the last component and its behavior on toggle.

Little.js

We have a simple class component which utilizes the componentWillUnmount method to alert users. This is just a simple depiction of the method; other cleanup actions can be carried out in it.

import React, { Component } from "react";

class Little extends Component {
  componentWillUnmount() {
    alert("Goodbye!!");
  }

  render() {
    return (
      <div style={{ marginBottom: "50px", border: "1px solid black" }}>
        <h5> Hi I'm Little and its nice to meet you!!!</h5>
      </div>
    );
  }
}

export default Little;

Converting this to a functional component and retaining the features of the lifecycle method we have:

import React, { useEffect } from "react";

const Little = () => {
  useEffect(() => {
    return () => {
      alert("Goodbye!!");
    };
  });

  return (
    <div style={{ marginBottom: "50px", border: "1px solid black" }}>
      <h5> Hi I'm Little and its nice to meet you!!!</h5>
    </div>
  );
};

export default Little;

To get the behaviour of the componentWillUnmount method, simply return a cleanup function from the useEffect hook. Now, once we click the button in Five.js to toggle Little.js, the function returned in the useEffect hook of Little.js will be called before the component unmounts.

Conclusion

In this post, we saw how to use React Hooks to convert class components to functional components while retaining features like state and lifecycle behaviour. Try your hand at other lifecycle methods, as well as using both useEffect and useState in a single component; a small sketch of that combination follows.
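
As a starting point for that exercise, here is a minimal sketch of my own (a hypothetical Clock component, not one of the challenge files) that combines useState and useEffect, including a cleanup function, in a single functional component:

import React, { useState, useEffect } from "react";

// Hypothetical example: a simple clock that stores the current time in state
// and uses useEffect both to start an interval and to clean it up on unmount.
const Clock = () => {
  const [now, setNow] = useState(new Date());

  useEffect(() => {
    const id = setInterval(() => setNow(new Date()), 1000);
    return () => clearInterval(id); // behaves like componentWillUnmount
  }, []); // empty array: set up once, clean up on unmount

  return <p>The time is {now.toLocaleTimeString()}</p>;
};

export default Clock;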

Shinguz: To NULL, or not to NULL, that is the question!

As we already stated in earlier articles in this blog [1 and 2] it is a good idea to use NULL values properly in MariaDB and MySQL.

One of my mantras in MariaDB performance tuning is: smaller tables lead to faster queries! One consequence of this is to store NULL values instead of some dummy values in the columns if the value is not known (NULL: undefined/unknown).

To show how this helps with the space used by a table, we created a little example:

CREATE TABLE big_null1 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NOT NULL
, c02 VARCHAR(32) NOT NULL
, c03 VARCHAR(32) NOT NULL
, c04 VARCHAR(32) NOT NULL
, c05 VARCHAR(32) NOT NULL
, c06 VARCHAR(32) NOT NULL
, c07 VARCHAR(32) NOT NULL
, c08 VARCHAR(32) NOT NULL
, c09 VARCHAR(32) NOT NULL
, c10 VARCHAR(32) NOT NULL
, c11 VARCHAR(32) NOT NULL
, c12 VARCHAR(32) NOT NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

CREATE TABLE big_null2 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NOT NULL
, c02 VARCHAR(32) NOT NULL
, c03 VARCHAR(32) NOT NULL
, c04 VARCHAR(32) NOT NULL
, c05 VARCHAR(32) NOT NULL
, c06 VARCHAR(32) NOT NULL
, c07 VARCHAR(32) NOT NULL
, c08 VARCHAR(32) NOT NULL
, c09 VARCHAR(32) NOT NULL
, c10 VARCHAR(32) NOT NULL
, c11 VARCHAR(32) NOT NULL
, c12 VARCHAR(32) NOT NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

Now we fill the tables with default values (an empty string or dummy values) because we do not yet know the contents:

INSERT INTO big_null1 VALUES (NULL, '', '', '', '', '', '', '', '', '', '', '', '');
INSERT INTO big_null1 SELECT NULL, '', '', '', '', '', '', '', '', '', '', '', '' FROM big_null1;
... up to 1 Mio rows

INSERT INTO big_null2 VALUES (NULL, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.');
INSERT INTO big_null2 SELECT NULL, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.' FROM big_null2;
... up to 1 Mio rows

ANALYZE TABLE big_null1;
ANALYZE TABLE big_null2;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
+------------+------------+----------------+-------------+--------------+-----------+

The opposite example is a table which allows NULL values for unknown fields:

CREATE TABLE big_null3 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NULL
, c02 VARCHAR(32) NULL
, c03 VARCHAR(32) NULL
, c04 VARCHAR(32) NULL
, c05 VARCHAR(32) NULL
, c06 VARCHAR(32) NULL
, c07 VARCHAR(32) NULL
, c08 VARCHAR(32) NULL
, c09 VARCHAR(32) NULL
, c10 VARCHAR(32) NULL
, c11 VARCHAR(32) NULL
, c12 VARCHAR(32) NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

This table is also filled with unknown values, but this time with NULL instead of an empty string:

INSERT INTO big_null3 (id) VALUES (NULL);
INSERT INTO big_null3 (id) SELECT NULL FROM big_null3;
... up to 1 Mio rows

ANALYZE TABLE big_null3;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
+------------+------------+----------------+-------------+--------------+-----------+

We see that this table already uses much less space when we make correct use of NULL values...

So let us do some simple query run time tests:

Query                                      big_null1   big_null2   big_null3

SELECT * FROM big_nullx                    1.1 s       1.3 s       0.9 s

SELECT * FROM big_nullx AS t1
  JOIN big_nullx AS t2 ON t2.id = t1.id
  JOIN big_nullx AS t3 ON t1.id = t3.id    5.0 s       5.7 s       4.2 s

One piece of advice I give is to fill the columns with NULL values if possible. So let us try this advice as well:

CREATE TABLE big_null4 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c02 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c03 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c04 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c05 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c06 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c07 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c08 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c09 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c10 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c11 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c12 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

INSERT INTO big_null4 (id) VALUES (NULL);
INSERT INTO big_null4 (id) SELECT NULL FROM big_null4;
... up to 1 Mio rows

ANALYZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |     998533 |            383 |   382599168 |    118358016 |   6291456 |
+------------+------------+----------------+-------------+--------------+-----------+

So, following my advice, we fill the columns with NULL values:

UPDATE big_null4
   SET c01 = NULL, c02 = NULL, c03 = NULL, c04 = NULL, c05 = NULL, c06 = NULL
     , c07 = NULL, c08 = NULL, c09 = NULL, c10 = NULL, c11 = NULL, c12 = NULL;

ANALYZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |    1047285 |            364 |   381779968 |    126222336 |  33554432 |
+------------+------------+----------------+-------------+--------------+-----------+

It seems like we do not see the effect yet. So let's optimize the table to reclaim the space:

OPTIMIZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |    1047180 |             30 |    32030720 |     39370752 |   4194304 |
+------------+------------+----------------+-------------+--------------+-----------+

And there you see, we get much of the space back... NULL is a good thing!

LSM math - how many levels minimizes write amplification?

How do you configure an LSM tree with leveled compaction to minimize write amplification? For a given number of levels, write-amp is minimal when the same fanout (growth factor) is used between all levels, but that does not tell you how many levels to use. In this post I answer that question.
  1. The number of levels that minimizes write-amp is one of ceil(ln(T)) or floor(ln(T)) where T is the total fanout -- sizeof(database) / sizeof(memtable)
  2. When #1 is done, the per-level fanout is e when the number of levels is ln(T), and a value close to e when the number of levels is restricted to an integer.

Introduction

I don't recall reading this result elsewhere, but I am happy to update this post with a link to such a result. I was encouraged to answer this after a discussion with the RocksDB team and thank Siying Dong for stating #2 above while leaving the math to me. I assume the original LSM paper didn't address this problem because that system used a fixed number of levels.
One result from the original LSM paper and updated by me is that write-amp is minimized when the per-level growth factor is constant. Sometimes I use fanout or per-level fanout rather than per-level growth factor. In RocksDB the option name is max_bytes_for_level_multiplier. Yes, this can be confusing. The default fanout in RocksDB is 10.
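
As a quick numeric illustration of that result (a toy example of mine, using the post's own assumption that total write-amp is the sum of the per-level fanouts while their product equals the fixed total fanout): with two levels and a total fanout of 100, the even split wins.

// Toy check of "constant per-level growth factor minimizes write-amp".
// Model: total write-amp = sum of per-level fanouts, with the product of the
// per-level fanouts fixed at the total fanout (here 100, over 2 levels).
const totalWriteAmp = (fanouts: number[]): number =>
  fanouts.reduce((sum, f) => sum + f, 0);

console.log(totalWriteAmp([10, 10])); // 20  <- even fanout (10 * 10 = 100)
console.log(totalWriteAmp([25, 4]));  // 29  <- uneven fanout (25 * 4 = 100)
console.log(totalWriteAmp([50, 2]));  // 52  <- more uneven, even worse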

Math

I solve this for pure-leveled compaction, which differs from what RocksDB calls leveled. In pure-leveled, all levels use leveled compaction. In RocksDB leveled, the first level, L0, uses tiered compaction and the other levels use leveled. I started to explain this here, where I claim that RocksDB leveled is really tiered+leveled. But I am not asking for them to change the name.

Assumptions:
  • the LSM tree uses pure-leveled compaction, and compaction from the memtable into the first level of the LSM tree is also leveled
  • total fanout is T and is size(Lmax) / size(memtable) where Lmax is the max level of the LSM tree
  • workload is update-only so the number of keys in the database is fixed
  • workload has no write skew and all keys are equally likely to be updated
  • per-level write-amp == per-level growth factor. In practice and in theory the per-level write-amp tends to be less than the per-level growth factor.
  • total write-amp is the sum of per-level write-amp. I ignore write-amp from the WAL. 

Specify function for write-amp and determine critical points

# wa is the total write-amp
# n is the number of levels
# per-level fanout is the nth root of the total fanout (t, written T above)
# per-level fanout = per-level write-amp
# therefore wa = number of levels * per-level fanout
wa = n * t^(1/n)

# given the function for write-amp as wa = a * b
# ... then below is a' * b + a * b'
a = n, b = t^(1/n)
wa' = t^(1/n) + n * ln(t) * t^(1/n) * (-1) * (1/n^2)

# which simplifies to
wa' = t^(1/n) - (1/n) * ln(t) * t^(1/n)

# critical point for this occurs when wa' = 0
t^(1/n) - (1/n) * ln(t) * t^(1/n) = 0
t^(1/n) = (1/n) * ln(t) * t^(1/n)
1 = (1/n) * ln(t)
n = ln(t)

When t = 1024, n = ln(1024) ~= 6.93. In this case write-amp is minimized when 7 levels are used, although 6 isn't a bad choice.

Assuming the cost function is convex (see below) the critical point is the minimum for write-amp. However, n must be an integer so the number of levels that minimizes write-amp is one of: ceil(ln(t)) or floor(ln(t)).

The graph for wa when t=1024 can be viewed thanks to Desmos. The function looks convex and I show below that it is.
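
For readers who prefer to check the numbers rather than the graph, here is a small sketch of mine that evaluates wa = n * t^(1/n) for t = 1024 over integer level counts:

// Write-amp model from above: wa(n) = n * t^(1/n), where t is the total fanout.
const writeAmp = (n: number, t: number): number => n * Math.pow(t, 1 / n);

const t = 1024;
for (let n = 4; n <= 10; n++) {
  console.log(n, writeAmp(n, t).toFixed(2));
}
// The minimum over integers lands at n = 7 (wa ~= 18.84), with n = 6 close behind
// (wa ~= 19.05), bracketing the real-valued optimum at ln(1024) ~= 6.93 levels
// where wa = e * ln(t) ~= 18.84.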

Determine whether critical point is a min or max

The critical point found above is a minimum for wa if wa is convex so we must show that the second derivative is positive.

wa = n * t ^ (1/n)
wa' = t^(1/n) - (1/n) * ln(t) * t^(1/n)
wa' = t^(1/n) * (1 - (1/n) * ln(t))

# assuming wa' is a * b then wa'' is a' * b + a * b' 
a  = t^(1/n)
a' = ln(t) * t^(1/n) * -1 * (1/n^2)
a' = - ln(t) * t^(1/n) * (1/n^2)

b  = 1 - (1/n) * ln(t)
b' = (1/n^2) * ln(t)

# a' * b 
- ln(t) * t^(1/n) * (1/n^2)         --> called x below
+ ln(t) * ln(t) * (1/n^3) * t^(1/n) --> called y below

# b' * a
t^(1/n) * (1/n^2) * ln(t)           --> called z below

# therefore wa'' = x + y + z
# note that x, y and z all contain: t^(1/n), 1/n and ln(t)
wa'' = t^(1/n) * (1/n) * ln(t) * (-(1/n) + (ln(t) * 1/n^2) + (1/n))
wa'' = t^(1/n) * (1/n) * ln(t) * ( ln(t) * 1/n^2 )
wa'' = t^(1/n) * 1/n^3 * ln(t)^2

Therefore wa'' is positive, wa is convex, and the critical point is a minimum value for wa.

Solve for per-level fanout

The next step is to determine the value of the per-level fanout when write-amp is minimized. If the number of levels doesn't have to be an integer then this occurs when ln(t) levels are used and below I show that the per-level fanout is e in that case. When the number of levels is limited to an integer then the per-level fanout that minimizes write-amp is a value that is close to e.

# total write-amp is number of levels * per-level fanout
wa = n * t^(1/n)

# The per-level fanout is t^(1/n) and wa is minimized when n = ln(t)
# Therefore we show that t^(1/n) = e when n = ln(t)
# Show that t^(1 / ln(t)) = e by taking ln of both sides:
ln( t^(1 / ln(t)) ) = (1 / ln(t)) * ln(t) = 1
# and ln(x) = 1 only for x = e, so t^(1 / ln(t)) = e

When t = 1024, ln(t) ~= 6.93. With 7 levels the per-level fanout is t^(1/7) ~= 2.69, while e ~= 2.72.
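
And a quick numeric check of the per-level fanout claim, under the same assumptions:

// Per-level fanout when t = 1024 and the integer level count n = 7 is used.
const total = 1024;
const levels = Math.ceil(Math.log(total));        // ln(1024) ~= 6.93 -> 7 levels
console.log(Math.pow(total, 1 / levels).toFixed(2)); // ~2.69, per-level fanout
console.log(Math.E.toFixed(2));                      // ~2.72, the unconstrained optimum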



OpenSource Conference Tokyo w/ MySQL

As announced on Oct 18, 2018, we would like to remind you about the Open Source Conference in Tokyo, Japan, which will be held tomorrow, December 14, 2018. Please come to hear the MySQL talk on "State of Dolphin" by Yoshiaki Yamasaki, MySQL Senior Sales Consultant for the Asia Pacific and Japan region. His talk will cover the general product updates. You can also find our team at the MySQL booth in the expo area.

We are looking forward to seeing and talking to you at this show!

 

    Tungsten Clustering 6.0.4 and Tungsten Replicator 6.0.4 Released

    Continuent is pleased to announce that Tungsten Clustering 6.0.4 and Tungsten Replicator 6.0.4 are now available!

    Our v6.0.4 release fixes a number of bugs and introduces some new features, with improvements across the board in a variety of different components.

    Some of the key improvements include:

    • When installing from an RPM, the installation would automatically restart the Connector during the installation. This behavior can now be controlled by setting the parameter --no-connectors within the configuration to prevent tpm from restarting the Connectors during the automated update processing.
    • Cross-site Replicators within a composite multimaster deployment can now be configured to point to slaves by default, and to prefer slaves over masters during operation using the new --policy-relay-from-slave=true option
    • You can now enable an option so that when a site comes back online, the Connector will disconnect all connections that couldn’t get to their preferred site so that they will then reconnect to the expected site with the appropriate affinity (--connector-reset-when-affinity-back=true)

    Some of the highlights common to both products:

    • When using tpm diag, the command would fail to parse net-ssh options
    • The Net::SSH internal options have been updated to reflect changes in the latest Net::SSH release

    Fixes specific to Tungsten Clustering:

    • When performing a tpm update in a cluster with an active witness, ensure that the host with the witness is restarted correctly
    • In a composite multimaster deployment, once a datasource has been welcomed to the cluster, ensure that individual clusters within the composite agree on the overall state of all clusters
    • Because long service names can cause formatting issues, a new option, --cctrl-column-width has been added which can be used to configure the minimum column width used to display information
    • The Connector has been modified to get the driver and JDBC URL of the datasource from the Connector-specific configuration, overriding the information normally distributed to it by the manager. This prevents the Connector from using incorrect settings, or empty values.
    • Ensure that datasources are fenced correctly when a Replicator fails
    • Ensure that standby datasources are displayed within cctrl correctly

    • The tungsten_prep_upgrade command could fail if there were certain special characters within the tpm options
    • The check_tungsten_progress command could fail within Composite Multimaster deployments because there is no single default service
    • Tab completion within cctrl would not always work in all cases, especially when the -multi option was in effect

    Fixes specific to Tungsten Replicator:

    • The trepctl command previously required the -service option to be the first option on the command-line. The option can now be placed in any position on the command-line.
    • If no service is specified when using trepctl and multiple services are configured, an error would be reported, but no list of potential services would be provided. This has been updated so that trepctl will output the list of available services and potential commands.
    • Heartbeats would be inserted into the replication flow using UTC even if the Replicator had been configured to use a different timezone

    Full release notes are available:

    https://docs.continuent.com/tungsten-clustering-6.0/release-notes-6-0-4.html

    https://docs.continuent.com/tungsten-replicator-6.0/release-notes-6-0-4.html

    AWS Elastic Block Storage (EBS) – Can We Get It Truly Elastic?

    At AWS Re:Invent 2018 there were many great announcements of AWS New Services and New Features, but one basic feature that I’ve been waiting for years to be released is still nowhere to be  found.

    AWS Elastic Block Storage (EBS) is great, and it has gotten better through the years, adding different storage types and features like Provisioned IOPS. However, it still has one very basic and inconvenient requirement: I have to decide in advance how much space I need to allocate, and pay for all of that allocated space whether I use it or not.

    It would be so much better if AWS would allow true consumption model pricing with EBS, where you pay for the storage used, not the storage allocated. This is already the case for S3,  RDS, and even EC2 instances (with Unlimited Option on T2/T3 Instances), not to mention Serverless focused services.

    For example, I would love to be able to create a 1TB EBS volume but only pay for 10GB of storage if I only use this amount of space.

    Modern storage subsystems do a good job differentiating between the space available on the block device and what’s being used by user files and filesystem metadata. The space that’s not allocated any more can be TRIMmed. This is a basic requirement for working well on flash storage, and as modern EC2 instances already provision EBS storage as emulated NVMe devices, I would imagine Amazon could hook into such functionality to track space actually used.

    For us at Percona this would make shipping applications on AWS Marketplace much more convenient. Right now, for Percona Monitoring and Management (PMM) we have to choose how much space to allocate to the EBS volume by default: either it is expensive to run because we pay for a large, mostly unused EBS volume, or we set a very limited default capacity that requires user action to resize the EBS volume later. Consumption-based EBS pricing would solve this dilemma.

    This problem seems to be well recognized and understood. For example Pure Storage Cloud Block Storage (currently in Beta) is  expected to have such a feature.

    I hope with its insane customer focus AWS will add this feature in the future, but currently we have to get by without it.


    Image: Arnold Reinhold [CC BY-SA 2.5], via Wikimedia Commons

    Upcoming Webinar Wed 12/12: MySQL 8 for Developers

    Please join Percona’s CEO Peter Zaitsev as he presents MySQL 8 for Developers on Wednesday, December 12th, 2018 at 11:00 AM PST (UTC-8) / 2:00 PM EST (UTC-5).

    Register Now

    There are many great new features in MySQL 8, but how exactly can they help your application? This session takes a practical look at MySQL 8 features. It also details which limitations of previous MySQL versions are overcome by MySQL 8. Lastly, it discusses what you can do with MySQL 8 that you could not have done before.

    Register for MySQL 8 for Developers to learn how MySQL’s new features can help your application and more.

    Percona XtraBackup 8.0.4 Is Now Available

    Percona is glad to announce the release of Percona XtraBackup 8.0.4 on December 10, 2018. You can download it from our download site and apt and yum repositories.

    Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

    This release of Percona XtraBackup is a General Availability release ready for use in a production environment.

    Please note the following about this release:

    • The deprecated innobackupex has been removed. Use the xtrabackup command to back up your instances: $ xtrabackup --backup --target-dir=/data/backup
    • When migrating from earlier database server versions, back up and restore using XtraBackup 2.4, and then use mysql_upgrade from MySQL 8.0.x
    • If using yum or apt repositories to install Percona XtraBackup 8.0.4, ensure that you have enabled the new tools repository. You can do this with the percona-release enable tools release command and then install the percona-xtrabackup-80 package.

    All Percona software is open-source and free. We are grateful to the community for the invaluable contributions to Percona XtraBackup. We would especially like to highlight the input of Alexey Kopytov who has been actively offering improvements and submitting bug reports for Percona XtraBackup.

    New Features
    • Percona XtraBackup 8.0.4 is based on MySQL 8.0.13 and fully supports Percona Server for MySQL 8.0 series and MySQL 8.0 series.
    Bugs Fixed
    • PXB-1699: xtrabackup --prepare could fail on backups of MySQL 8.0.13 databases
    • PXB-1704: xtrabackup --prepare could hang while performing insert buffer merge
    • PXB-1668: When the --throttle option was used, the applied value was different from the one specified by the user (off by one error)
    • PXB-1679: PXB could crash when ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL

    Importing Data by Mask

    Introduction: In this article, we will show how to perform routine data import from multiple files by a certain mask with the help of the Data Import functionality of dbForge Studio for MySQL, and how to schedule the recurring execution of the import with Microsoft Task Scheduler. Scenario: Suppose we need to simultaneously import multiple daily […]
