Planet MySQL

Druid @ Airbnb Data Platform

How Druid enables real-time and batch analytics at Airbnb, and the role it plays in our analytics system architecture

By Pala Muthiah and Jinyang Li

Wikipedia: A druid was a member of the high-ranking professional class in ancient Celtic cultures. It is believed that druids perform secret rituals in forests, not unlike this unique Experience on Airbnb!

Airbnb serves millions of guests and hosts in our community. Every second, their activities on Airbnb.com, such as searching, booking, and messaging, generate a huge amount of data we anonymize and use to improve the community’s experience on our platform.

The Data Platform Team at Airbnb strives to leverage this data to improve our customers’ experiences and optimize Airbnb’s business. Our mission is to provide infrastructure to collect, organize, and process this deluge of data (all in privacy-safe ways), and empower various organizations across Airbnb to derive necessary analytics and make data-informed decisions from it.

The primary way high-level analytics is exposed and shared within the company is through various dashboards. A lot of people use these dashboards every day to make various decisions. Dashboards also allow real-time tracking and monitoring of various aspects of our business and systems. As a result, the timeliness of these dashboards is critical to the daily operation of Airbnb. However, we are faced with three challenges:

First, it would take a long time to aggregate data in the warehouse and generate the necessary data for these dashboards using systems like Hive and Presto at query time. Hive and Presto have to read all the data and aggregate it on demand, so all the necessary computation is invoked at query time. Even if those engines are used to pre-compute the aggregations and store them, the storage format is not optimized for the repeated slicing and dicing of data that analytics queries demand.

Second, the system needs to be reliable and scalable. It powers core analytics use cases at Airbnb, so any downtime will have a severe impact on the business and its employees. Also, the volume of data, queries, and users continues to grow, and our analytics system should be able to cope with the increasing demand.

Third, we need a system that integrates well with our data infrastructure that is based on open source frameworks. For example, most of our datasets are stored in Hadoop, and we use Kafka and Spark Streaming to process our data streams.

This is where Druid comes in.

Advantages of Druid

Fast Query Time

With predefined data sources and pre-computed aggregations, Druid offers sub-second query latency. The dashboards built on top of Druid can be noticeably faster than those built on other systems. Compared to Hive and Presto, Druid can be an order of magnitude faster.
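To make this concrete, here is a minimal sketch of the kind of aggregation query a dashboard might send to a Druid Broker over Druid's native JSON-over-HTTP API. The broker hostname, data source, and metric names are illustrative assumptions, not details from Airbnb's setup.

import requests

# Hypothetical Broker endpoint and data source; adjust for your own cluster.
BROKER_URL = "http://druid-broker.example.com:8082/druid/v2/"

query = {
    "queryType": "timeseries",
    "dataSource": "bookings",            # pre-aggregated data source (assumed name)
    "granularity": "hour",
    "intervals": ["2018-11-01/2018-11-02"],
    "filter": {"type": "selector", "dimension": "country", "value": "US"},
    "aggregations": [
        {"type": "longSum", "name": "total_bookings", "fieldName": "bookings"}
    ],
}

# The Broker fans the query out to the serving nodes and merges the results.
response = requests.post(BROKER_URL, json=query)
response.raise_for_status()
for row in response.json():
    print(row["timestamp"], row["result"]["total_bookings"])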

Architecture that Offers Reliability and Scalability

Druid architecture is well separated out into different components for ingestion, serving, and overall coordination. We have found this componentized architecture to be reliable and stable for our workload, and it has allowed us to scale the system easily as needed.

Druid’s architecture, which keeps data in deep storage for long-term persistence while Historical nodes cache it temporarily for serving, has worked well for us. Keeping the analytical data permanently in S3 gives us disaster recovery for free and allows us to easily manage upgrades and upkeep of cluster hardware (e.g. easily switch node types to take advantage of the latest hardware).

Integration with Open Source Frameworks

Druid also integrates smoothly with the open source data infrastructure that’s primarily based on Hadoop and Kafka:

  1. Druid’s API allows us to easily ingest data from Hadoop for batch analytics
  2. Druid enables real-time analytics via stream processing engines. Druid provides a streaming client API, Tranquility, that is integrated with streaming engines such as Samza and Storm, and can be integrated with any other JVM-based streaming engine. At Airbnb, streaming ingestion of data into Druid for real-time analytics is implemented through a Spark Streaming job that employs the Tranquility client.
  3. Druid is well integrated with Apache Superset, an open source data visualization system developed and open sourced by Airbnb. Superset serves as the interface for users to compose and execute analytics queries on Druid and visualize the results.
How Airbnb Uses Druid: Dual Cluster Configuration

At Airbnb, two Druid clusters are running in production. Two separate clusters allow dedicated support for different uses, even though a single Druid cluster can handle more data sources than we need. In total we have 4 Brokers, 2 Overlords, 2 Coordinators, 8 Middle Managers, and 40 Historical nodes. In addition, our clusters are supported by one MySQL server and one ZooKeeper cluster with 5 nodes. The Druid clusters are relatively small and low cost compared with other service clusters like HDFS and Presto.

Of the two Druid clusters, one is dedicated to centralized critical metrics services. With the goal of serving all the dashboards at Airbnb, users can easily define their metrics through simple YAML files. Users can view their dashboards and metrics on Superset without knowing anything about Druid.

All the batch jobs are scheduled with Airflow, ingesting data from our Hadoop cluster.

All the real-time and other data sources for self-service users are handled by the other Druid cluster. Real-time data is ingested through a Spark Streaming + Tranquility client setup.

Improving Druid Usage at Airbnb

While Druid provides many powerful, widely applicable features that satisfy most businesses, we did implement a few features within or on top of Druid to better serve our special use cases.

A Framework for Instantly Answering Ad-hoc Analytics Queries

Airbnb has a large number of data scientists embedded in different business teams. Each of them may have ad-hoc questions about the business that need insight derived from the data, which often requires arbitrary ways to aggregate data.

To meet this need, we have built a self-service system on top of Druid that allows individual teams to easily define how the data their application or service produces should be aggregated and exposed as a Druid data source. Data scientists and analysts can then query Druid to answer ad-hoc questions.

Users define their data source with a simple configuration file. Real-time data from Kafka and batch data from HDFS/S3 are then ingested according to the config file.

Druid aggregates its real-time data over windows of 5 minutes, plus 1-minute latency from pipelines.

Real-time ingestion into Druid empowers us to offer a number of sophisticated functionalities to our users. One interesting use case for real-time ingestion is anomaly detection: with data ingested and aggregated in Druid within minutes, we can very quickly detect anything in production that does not conform to an expected pattern.
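As an illustration of the general idea (not Airbnb's actual detector), a simple check could compare the latest aggregated value returned by Druid against a rolling baseline of recent windows; the names and thresholds below are made up for the sketch.

from statistics import mean, stdev

def detect_anomaly(history, latest, z_threshold=3.0):
    """Flag `latest` if it is more than z_threshold standard deviations
    away from the mean of the recent history. Purely illustrative."""
    if len(history) < 10:
        return False  # not enough data for a meaningful baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

# Example: per-5-minute event counts fetched from Druid, plus the newest window.
recent_counts = [120, 118, 125, 119, 121, 117, 123, 122, 120, 119]
latest_count = 40   # e.g. a sudden drop in events
if detect_anomaly(recent_counts, latest_count):
    print("Anomaly detected: latest value deviates from the recent baseline")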

Integration with Presto

Druid has a mature query mechanism with a JSON-over-HTTP RESTful API, in addition to SQL query support in recent versions. However, one limitation of Druid is that it does not yet allow cross-data-source queries (simplistically speaking, join queries); all aggregate queries are limited to a single data source. At Airbnb, however, we do have scenarios where multiple data sources with overlapping dimensions need to be joined together for certain queries. The alternative, keeping all the data in one single data source, is not optimal in our scenario for various reasons, including the cadence of data generation and the fact that the data comes from different sources (e.g. different services produce the data). However, the need for cross-data-source queries is real and has recently become a hard requirement.

To cater to these scenarios, we have developed an in-house solution based on Presto. Specifically, we introduced a Presto connector for Druid that can push queries down to Druid for individual data sources, and can then retrieve and join the data to complete the execution of the cross-data-source query. The details of the implementation are still evolving and are out of scope for this article; we will provide more details in a separate post in the future.

Improve backfill performance

The secret behind Druid's fast queries is that the cost is paid at ingestion time: every data segment needs to be ingested via MapReduce jobs before it is available for queries. This works great as a write-once-read-many model, as long as the framework only needs to ingest new data on a daily basis.

However, problems arise when the owner of a data source wants to redesign it and regenerate historical data. That means data from the past years needs to be re-ingested into Druid to replace the old data. This requires a very large ingestion job with a long-running MapReduce task, making it expensive, especially when errors happen in the middle of re-ingestion.

One potential solution is to split the large ingestion into several requests in order to achieve better reliability. However, query results will then be inconsistent, since they will be computed from a mix of existing old data and newly ingested data. Backfill jobs are actually more frequent than we expected, as user requirements and ingestion framework functionality evolve, making backfill performance a pain point that begs improvement.

To solve this, we designed a solution that keeps all newly ingested segments inactive until they are explicitly activated. This enables the ingestion framework to split the source data into smaller intervals of acceptable size. The framework then ingests these intervals in parallel (as parallel as the YARN cluster's resources allow). Since the newly ingested data is still inactive, the segments stay hidden in the background, and there is no mix of different data versions when computing results for queries executed while the backfill ingestion is still in progress. When we activate the latest version of segments for the data source, it is refreshed to the new version without downtime. Split and refresh greatly improved backfill performance: backfills that used to run for more than a day now finish in one hour.
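A rough sketch of the split-and-activate flow is shown below. Because the inactive-segment behavior is our in-house extension rather than stock Druid, submit_ingestion and activate_segments are hypothetical placeholders for the framework's own calls; only the interval-splitting and parallel-submission logic is concrete.

from concurrent.futures import ThreadPoolExecutor
from datetime import timedelta

def month_intervals(start, end):
    """Split [start, end) into month-sized ISO intervals, e.g. '2017-01-01/2017-02-01'."""
    intervals = []
    current = start
    while current < end:
        nxt = (current.replace(day=1) + timedelta(days=32)).replace(day=1)
        nxt = min(nxt, end)
        intervals.append("{}/{}".format(current.isoformat(), nxt.isoformat()))
        current = nxt
    return intervals

def backfill(datasource, start, end, submit_ingestion, activate_segments, parallelism=8):
    # submit_ingestion and activate_segments are hypothetical framework calls.
    intervals = month_intervals(start, end)
    # Ingest the smaller intervals in parallel; the new segments stay inactive for now.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(lambda iv: submit_ingestion(datasource, iv, active=False), intervals))
    # Only after every interval succeeds, switch queries over to the new version.
    activate_segments(datasource, intervals)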

Monitoring and operation

We monitor Druid continuously for reliable service and best performance. Druid is robust and resilient to node failures. Most node failures are transparent and unnoticeable to users. Even if a role that is a single point of failure (like the Coordinator, Overlord, or even ZooKeeper) fails, the Druid cluster is still able to provide query service to users. However, to honor our SLA with users, any service outage should be caught in time, or even before the failure happens.

Like other clusters, we monitor every machine in the Druid clusters by collecting machine statistics and raising an alert if any instance reaches its capacity or goes into a bad state. To monitor overall cluster availability, we ingest one piece of canary data into Druid every 30 minutes, and every 5 minutes we check that the query result from each Broker node matches the latest ingested canary data. Any degradation in service, including query, ingestion, or downstream HDFS instability, can be detected within the SLA.
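A minimal sketch of such a canary check might look like the following; the Broker hosts, canary data source, and query shape are assumptions for illustration rather than our production code.

import requests

# Hypothetical Broker endpoints and canary data source.
BROKERS = [
    "http://druid-broker-1.example.com:8082",
    "http://druid-broker-2.example.com:8082",
]
CANARY_QUERY = {
    "queryType": "timeseries",
    "dataSource": "canary",              # assumed canary data source
    "granularity": "all",
    "intervals": ["2018-01-01T00:00:00/2100-01-01T00:00:00"],
    "aggregations": [{"type": "longMax", "name": "latest_value", "fieldName": "value"}],
}

def check_brokers(expected_value):
    """Return the brokers whose canary result does not match the last ingested value."""
    failing = []
    for broker in BROKERS:
        try:
            rows = requests.post(broker + "/druid/v2/", json=CANARY_QUERY, timeout=10).json()
            observed = rows[0]["result"]["latest_value"] if rows else None
        except requests.RequestException:
            observed = None
        if observed != expected_value:
            failing.append(broker)
    return failing

# e.g. raise an alert if check_brokers(latest_canary_value) returns a non-empty list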

Druid has been running at Airbnb for years, and it is one of the systems with the lowest maintenance cost. Druid's multi-role design makes operations easy and reliable. Cluster administrators can adjust the cluster configuration and add or remove nodes based on the monitoring metrics. As data grows in our Druid cluster, we can keep adding Historical node capacity to cache and serve the larger amount of data easily. If the real-time ingestion workload shows an uptick, we can easily add Middle Manager nodes accordingly. Similarly, if more capacity is needed to handle queries, we can increase the Broker node count. Thanks to Druid's decoupled architecture, we were able to perform a large operation that migrated all data in deep storage from HDFS to S3 onto a newly rebuilt cluster, with only minutes of downtime.

Challenges and future improvements

While Druid has served us well in our data platform architecture, there are new challenges as our usage of Druid grows within the company.

One of the issues we deal with is the growth in the number of segment files produced every day that need to be loaded into the cluster. Segment files are the basic storage unit of Druid data and contain the pre-aggregated data ready for serving. At Airbnb, we encounter scenarios where a large number of our data sources sometimes need to be recomputed entirely, resulting in a large number of segment files that need to be loaded onto the cluster at once. Currently, ingested segments are loaded by the Coordinator sequentially, in a single thread, centrally. As more and more segments are produced, the Coordinator is unable to keep up, and we see increasing delay between the time an ingestion job completes and the time the data becomes available for querying (after being loaded by the Coordinator). Sometimes the delay can be hours long.

The usual solution is to increase the target segment size and thus reduce the segment count. However, in our usage, the input volume of data needed to produce a larger segment (by a Hadoop worker running an ingestion task) is so high that the Hadoop job would run too long crunching that data, and would often fail for various reasons.

We are currently exploring various solutions, including compacting segments right after ingestion and before they are handed off to the Coordinator, and different configurations to increase the segment size without jeopardizing ingestion job stability where possible.

Conclusion

Druid is a big data analytics engine designed for scalability, maintainability, and performance. Its well-factored architecture allows easy management and scaling of a Druid deployment, and its optimized storage format enables low-latency analytics queries. We have successfully deployed Druid at Airbnb for our use cases and see continued growth in its footprint as our user base and use cases grow.

Airbnb’s Data Platform team is always looking for good engineers with relevant skills! If you are enthusiastic about building out data infrastructure like this one and interested in joining the team, please check out our open positions and send your application!

Druid @ Airbnb Data Platform was originally published in Airbnb Engineering & Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

Build a React To-Do App with React Hooks (No Class Components)

Yes, a to-do app: it seems like something we’ve all done in the past. The great thing about a to-do app is that it covers all four aspects of CRUD: Create, Read, Update, Delete. As developers, we need to know CRUD, and a to-do app is the best way to start learning it or to refresh some of the things we may be a little rusty on.

React is the frontend JavaScript library we are going to be using today. Whether you are new to React or are an avid React user, we can all agree on one thing: React is pretty great.

We've already done a React To-Do App here at Scotch, but that was with class components. Today we will have no class components, as we'll be using React Hooks, React’s newest feature. Let’s get started.

https://codesandbox.io/s/oj3qm2zq06

React Hooks

React is always improving and growing. The latest update is React 16.7, so new it’s in alpha stages. In this upgrade, we are given the power of React Hooks.

React Hooks allow for functional components to have a state and utilize lifecycle methods.

No longer do we need to rely on just class components for that functionality.

You can learn all about React Hooks here or visit React’s docs on Hooks here.

Starting a React App

Navigate to the place you would like your new application to be located and type:

npx create-react-app react-to-do

Note: Running npx before the command allows it to run even if create-react-app is not already installed globally on your machine.

Sit back and relax, React is taking care of the initial build for your new application.

Once it is done, you will need to navigate into the project:

cd react-to-do

and then to run the project:

npm run start

and navigate to http://localhost:3000/ to see the spinning React logo in your browser.

Styling Your Application

Jump into your src/App.css file and add in the three classes we will be using throughout our app. Styling won't be the focus of this app, so we'll keep this short and sweet.

src/App.css

.app {
  background: #209cee;
  padding: 30px;
  height: 100vh;
}

.todo-list {
  background: #e8e8e8;
  border-radius: 4px;
  padding: 5px;
  max-width: 400px;
}

.todo {
  background: #fff;
  box-shadow: 1px 1px 1px rgba(0, 0, 0, 0.15);
  padding: 3px 10px;
  font-size: 12px;
  margin-bottom: 6px;
  border-radius: 3px;
  display: flex;
  align-items: center;
  justify-content: space-between;
}

Reading a List of Items. cRud.

With your application running and the styling ready to be used, let’s start on the Read part of CRUD. We’ll want to make a list of things, just so we can Read/view the list.

Adding in State

Go into your src/App.js file and let’s add a state to our component. We are going to be using React Hooks so state will look a little different than what you're used to in classes.

src/App.js

function App() {
  const [todos, setTodos] = useState([
    { text: "Learn about React" },
    { text: "Meet friend for lunch" },
    { text: "Build really cool todo app" }
  ]);

  // we'll render our todos here ...

  // return <div></div>
}

The component, as we can see, is a functional component. In past versions of React, function components were unable to handle state, but now, by using Hooks, they can.

  • The first parameter, todos is what we are going to name our state.
  • The second parameter, setTodos is what we are going to use to set the state.

We've got a writeup on array destructuring if you want to know more info about that [todos, setTodos] syntax here.

The hook of useState is what React uses to "hook" into the state or lifecycle of the component. We then create an array of objects and we have the beginnings of our state.

Comparing to a Class Component

Let's take a quick detour and see how this would've been done with classes:

class App extends Component {
  state = {
    todos: [
      { text: "Learn about React" },
      { text: "Meet friend for lunch" },
      { text: "Build really cool todo app" }
    ]
  };

  setTodos = todos => this.setState({ todos });

  render() {
    return <div></div>;
  }
}

A lot more typing. React Hooks let us make that really clean! We'll continue with our functional component version from above.

Our "Todo" Component

We will want to create a component that we can use later on in the return of the main App component. We will call it Todo; it receives a todo prop and shows the text part of the todo (todo.text), like so:

const Todo = ({ todo }) => <div className="todo">{todo.text}</div>;

Let’s see how we will use that in our App component.

Using Our Todo Variable to Get a List Returned

Go down to the return part of the App component and remove almost everything. We want to empty out that return part especially so that the spinning logo we saw earlier when we navigated to http://localhost:3000, goes away and we have our list being displayed on the page.

By using the JavaScript method, map(), we are able to create a new array of items by mapping over the todo items from state and displaying them by index.

Let’s create a pretty list of items:

src/App.js

return (
  <div className="app">
    <div className="todo-list">
      {todos.map((todo, index) => (
        <Todo key={index} index={index} todo={todo} />
      ))}
    </div>
  </div>
);

Navigate to your browser and you should see the three to-do items from our state rendered as a list.

Creating New Items to the To-Do List - Crud

Want to create a new item to the list? What if we forgot to add something to our to-do list and don’t want to forget that thing? Let’s give our application the power to Create a new item for our to-do app.

While in the src/App.js file, we are going to want to add a couple of things. At the top we are going to add another component, we’ll call it TodoForm. In this component we want it to:

  • Start with an empty state for an input field.
  • Be able to update the form by setting the state.
  • Handle the submit.
Setting our Empty State for the Form Input

Remember, we are using React Hooks so state is going to be a little different. To set our state we are going to want to write it like so:

const [value, setValue] = useState("");

The first is the "value" and the second is how we are going to be setting the state. The state starts off empty and as we add things to our state, it will add it to our list of to-do items.

We will want to add in a handleSubmit variable that can handle our addTodo function (we will make that function soon) and add the item to the list. If nothing is in the input box and the user presses “enter”, we want it to not do anything (i.e., not add in an empty tile to the list).

Adding that functionality into a form that has an input box, we should have our code look like this:

src/App.js

function TodoForm({ addTodo }) {
  const [value, setValue] = useState("");

  const handleSubmit = e => {
    e.preventDefault();
    if (!value) return;
    addTodo(value);
    setValue("");
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        type="text"
        className="input"
        value={value}
        onChange={e => setValue(e.target.value)}
      />
    </form>
  );
}

The addTodo function I told you about earlier? Let’s go ahead and build that now. Staying within App.js, under the state of the App component, the function should be able to grab the existing list of items, add on the new item, and display that new list.

src/App.js

const addTodo = text => {
  const newTodos = [...todos, { text }];
  setTodos(newTodos);
};

Notice the lack of this.state? With React Hooks, we no longer need it. Could you still write this with classes and this.state? Sure, of course. But Hooks allow for less typing and cleaner code, since the state variable is referenced directly.

See that spread operator? The three dots before todos essentially "copy" the list for us so that we are able to add the new to-do item onto it. Then, using the setter we created earlier, we set the state with setTodos.

By using the TodoForm down in the return of the App component, we will see that input box pop up now. The entire src/App.js file should look like this so far:

src/App.js

import React, { useState } from "react";
import "./App.css";

const Todo = ({ todo }) => <div className="todo">{todo.text}</div>;

function TodoForm({ addTodo }) {
  const [value, setValue] = useState("");

  const handleSubmit = e => {
    e.preventDefault();
    if (!value) return;
    addTodo(value);
    setValue("");
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        type="text"
        className="input"
        value={value}
        onChange={e => setValue(e.target.value)}
      />
    </form>
  );
}

function App() {
  const [todos, setTodos] = useState([
    { text: "Learn about React" },
    { text: "Meet friend for lunch" },
    { text: "Build really cool todo app" }
  ]);

  const addTodo = text => {
    const newTodos = [...todos, { text }];
    setTodos(newTodos);
  };

  return (
    <div className="app">
      <div className="todo-list">
        {todos.map((todo, index) => (
          <Todo key={index} index={index} todo={todo} />
        ))}
        <TodoForm addTodo={addTodo} />
      </div>
    </div>
  );
}

export default App;

Go to your browser and play around. You can now add in a to-do item to your list!

Updating Items in the To-Do List to be Completed - crUd

How would we want to update our to-do application? Maybe let’s have the functionality of being able to cross off an item. Let’s get started on that code!

Updating our State

Our state in the App component needs a little extra so that the completed status of an item can change. We will add another key/value pair to each object in our list. By adding an isCompleted: false value, we start each item as not completed and will, when prompted, change that to true.

src/App.js

const [todos, setTodos] = useState([
  { text: "Learn about React", isCompleted: false },
  { text: "Meet friend for lunch", isCompleted: false },
  { text: "Build really cool todo app", isCompleted: false }
]);

We will need a function like addTodo, but this one will be able to “complete” an item. We will do some similar things to what we did in addTodo, like using the spread operator to grab the current list of items. In this function, we will change the isCompleted status to true so that we know the item has been completed, and then set the state to the newTodos.

src/App.js

const completeTodo = index => {
  const newTodos = [...todos];
  newTodos[index].isCompleted = true;
  setTodos(newTodos);
};

By using completeTodo in the Todo function, we are able to fire off that functionality. When the “Complete” button is clicked, it will add the textDecoration styling and cross out the item. We are using a ternary operator, a concise way of writing an if/else statement. This is our way of completing an item on the list and “updating” the list. The code should look as follows:

src/App.js

function Todo({ todo, index, completeTodo }) {
  return (
    <div
      className="todo"
      style={{ textDecoration: todo.isCompleted ? "line-through" : "" }}
    >
      {todo.text}
      <div>
        <button onClick={() => completeTodo(index)}>Complete</button>
      </div>
    </div>
  );
}

Dive down to the return of the App component and we’ll add in the following line:

completeTodo={completeTodo}

to look like this in the code:

src/App.js

<div className="todo-list">
  {todos.map((todo, index) => (
    <Todo
      key={index}
      index={index}
      todo={todo}
      completeTodo={completeTodo}
    />
  ))}
</div>

Returning to the browser, the item’s text should appear crossed out after its “Complete” button is clicked.

Now we can read our list, add to our list, and update the completed status of each item. What’s left? Delete an item.

Deleting a To-Do Item - cruD

So you’ve completed an item on your to-do list, the day is over and you want to delete it from your list to be able to start over tomorrow. We are going to want to delete that item completely. Let’s see how we can get that going.

By adding just a couple lines, we will be able to add in the functionality of deleting an item.

We will go ahead and build the removeTodo function so that when we click on an “X” to delete an item, that will be fired off. That function will be located by the others underneath the state of the App component.

In this removeTodo function, we will again be using the spread operator but once we grab that current list, we will be "splicing" the chosen index off of the array of items. Once that is removed, we will return the new state by setting it with setTodos to be newTodos.

src/App.js

const removeTodo = index => {
  const newTodos = [...todos];
  newTodos.splice(index, 1);
  setTodos(newTodos);
};

In your Todo function, you will want to add in this line:

<button onClick={() => removeTodo(index)}>x</button>

like this:

src/App.js

function Todo({ todo, index, completeTodo, removeTodo }) {
  return (
    <div
      className="todo"
      style={{ textDecoration: todo.isCompleted ? "line-through" : "" }}
    >
      {todo.text}
      <div>
        <button onClick={() => completeTodo(index)}>Complete</button>
        <button onClick={() => removeTodo(index)}>x</button>
      </div>
    </div>
  );
}

You’ll see as well that we are bringing in removeTodo at the top and then using it in the onClick of the “X”.

By adding removeTodo to the Todo element in the return of the App component, our “delete” will be fully functional. Add it in here:

src/App.js

<Todo
  key={index}
  index={index}
  todo={todo}
  completeTodo={completeTodo}
  removeTodo={removeTodo}
/>

With that added in, go to your browser and you’ll see a button with an "X" that when clicked, deletes the item completely.

The Final Product

The entire src/App.js file should look like this in the end:

src/App.js

import React, { useState } from "react";
import "./App.css";

function Todo({ todo, index, completeTodo, removeTodo }) {
  return (
    <div
      className="todo"
      style={{ textDecoration: todo.isCompleted ? "line-through" : "" }}
    >
      {todo.text}
      <div>
        <button onClick={() => completeTodo(index)}>Complete</button>
        <button onClick={() => removeTodo(index)}>x</button>
      </div>
    </div>
  );
}

function TodoForm({ addTodo }) {
  const [value, setValue] = useState("");

  const handleSubmit = e => {
    e.preventDefault();
    if (!value) return;
    addTodo(value);
    setValue("");
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        type="text"
        className="input"
        value={value}
        onChange={e => setValue(e.target.value)}
      />
    </form>
  );
}

function App() {
  const [todos, setTodos] = useState([
    { text: "Learn about React", isCompleted: false },
    { text: "Meet friend for lunch", isCompleted: false },
    { text: "Build really cool todo app", isCompleted: false }
  ]);

  const addTodo = text => {
    const newTodos = [...todos, { text }];
    setTodos(newTodos);
  };

  const completeTodo = index => {
    const newTodos = [...todos];
    newTodos[index].isCompleted = true;
    setTodos(newTodos);
  };

  const removeTodo = index => {
    const newTodos = [...todos];
    newTodos.splice(index, 1);
    setTodos(newTodos);
  };

  return (
    <div className="app">
      <div className="todo-list">
        {todos.map((todo, index) => (
          <Todo
            key={index}
            index={index}
            todo={todo}
            completeTodo={completeTodo}
            removeTodo={removeTodo}
          />
        ))}
        <TodoForm addTodo={addTodo} />
      </div>
    </div>
  );
}

export default App;

What have we learned?

A to-do app can be a great reminder or starting point when it comes to CRUD in web development. Being able to read information, create new information, update existing information, and delete said information can be powerful in any application.

React Hooks are great. They allow for a more straight-forward way of coding and can make your code clear and concise.

Now go have fun adding all your many to-do items to your newly built to-do app. And then have even more fun crossing them off when you finish them!

Happy coding!

How to install Percona Server 5.7 on Debian/Ubuntu without a root password prompt

In the last few months, I had to install Percona Server 5.7 (PS5.7) on Debian a few times.  I was regularly annoyed by apt-get -y install percona-server-server-5.7 prompting me for a password.  But that annoyance did not push me to investigate the subject in detail: it was always a quick manual fix and Googling did not lead to a straightforward solution.  However in the last days, I had to

Importing Data from MongoDB to MySQL using JavaScript

MySQL Shell 8.0.13 (GA) introduced a new feature to import JSON documents to MySQL in an easy way. The basics of this new feature were described in a previous blog post. Here, we provide more details about this feature, focusing on a practical use case that may be of interest for some readers: How to import JSON data from MongoDB to MySQL.…
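For readers who want to try it right away, here is a minimal sketch of that flow using MySQL Shell's Python mode (8.0.13 or later); the database, collection, and file names are placeholders rather than values from the original article.

# First export the MongoDB collection as JSON documents, e.g.:
#   mongoexport --db shop --collection products --out products.json
# Then, from MySQL Shell in Python mode (\py), import the documents:
util.import_json("products.json", {
    "schema": "shop",          # target MySQL schema (placeholder name)
    "collection": "products"   # import as a document-store collection;
                               # use "table": "products" for a plain table instead
})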

Auto-Refreshing Reports in MySQL Shell

MySQL Shell makes it easy to develop tools you can use for example to generate reports. In a previous blog, I showed how to use external modules in MySQL Shell. In this blog, I will take it one step further and use the curses Python module to create auto-refreshing reports. The first example will be kept very simple to show the idea, then a more realistic example will be shown where the top N files sorted by I/O will be returned.

Note: Out of the box, this does not work on Windows as Python does not ship with the curses library.

Basic Example

As a basic example, consider the query SELECT NOW(). This returns the date and time. Let’s query that every second for 10 seconds, then return to the prompt. The easiest approach is to look at the example and then discuss what it does:

import curses
from time import sleep

def clock():
    # Setup curses
    stdscr = curses.initscr()
    curses.noecho()
    curses.cbreak()
    stdscr.keypad(True)

    # Define the query
    query = session.sql("SELECT NOW()")

    # Clear screen
    stdscr.clear()

    # Run the clock for 10 seconds
    i = 0
    while i < 10:
        i = i + 1
        result = query.execute()
        now = result.fetch_one()
        stdscr.addstr(0, 0, "Now")
        stdscr.addstr(1, 0, "-------------------")
        stdscr.addstr(2, 0, str(now[0]))
        stdscr.refresh()
        sleep(1)

    # Reset the curses behaviour and finish
    curses.nocbreak()
    stdscr.keypad(False)
    curses.echo()
    curses.endwin()

clock()

Tip: As MySQL Shell treats an empty line as the termination of a multi line block of code, ensure you have some white space on the blank lines in the definition of clock() if you are copying and pasting the code.

Inside the clock() function, curses is first set up to initialize the screen, not to echo key inputs, and to react to keys instantly (without waiting for Enter to be hit). The latter is not needed here as there is nothing checking for key inputs; however, in many cases (like the iotop example later in the blog), you want to listen for single key inputs, for example to stop the execution. The counterparts of these steps are done at the end of the function to clean up.

Next, the query that will be executed is defined. Here I take advantage of the X DevAPI’s support for preparing a query once and re-using it later. This way, the query is only defined in a single spot. Then the screen is cleared and everything is ready for the loop that will do the main part of the job.

The loop in this example is very simple. It just goes through 10 iterations with a one-second sleep at the end of each loop. In a real case, you may want to make this more resilient to delays or add other kinds of logic. The query is executed and the single row in the result is fetched. Finally, the addstr() curses method is used to print the output in the desired location (the upper left corner in this case).

When you execute the function, you will get a result like in the following screen shot:

Using MySQL Shell as a clock

This is all fine, but using MySQL Shell to show a clock is hardly its most interesting use. Let’s look at an example that is more useful in the real world.

MySQL iotop

If you are a Linux user, you may know the iotop utility which allows you to monitor I/O activity in a similar way to what the top command does for CPU and memory usage of processes. Let’s implement a basic MySQL my_iotop module with a global_by_file_by_bytes function that displays the N (default 10) MySQL files that have had the most I/O, refreshes at a specified interval (default 10 seconds), and continues until you hit the q key.

This may sound like a complex task, however most of the steps are the same as in the previous example. The top N files with respect to I/O can be found using the io_global_by_file_by_bytes view from the sys schema, which sorts by the total number of bytes read and written for each file in descending order. So, only a simple single-view query is needed. To keep the output handling in this example from becoming too complex, fixed-width columns are used and file paths longer than 50 characters are truncated.

The only things required beyond the previous example are adding the limit on the number of files to include and detecting when q is pressed. The limit is easy to add using the select() CRUD method.

Another thing that is worth doing is to include the feature in an external module. This makes it easier to reuse. This requires you to add the directory where you will save your modules to the ~/.mysqlsh/mysqlshrc.py file, for example:

import sys
sys.path.append("/home/jesper/.mysqlsh/lib")

See also my previous blog on using external modules for more information.

In this example the file with the source code is called my_iotop.py stored in the directory added to the mysqlshrc.py file.

Warning: The following code is devoid of error handling. If you intend to use this in production please validate the input and check for errors.

The complete source is:

import curses
from datetime import datetime
from time import sleep

def global_by_file_by_bytes(session, max_files=10, delay=10):
    # Define the output format
    fmt_header = "| {0:50s} | {1:12s} | {2:13s} | {3:13s} | {4:12s} " \
        + "| {5:13s} | {6:13s} | {7:13s} | {8:8s} |"
    header = fmt_header.format("File", "# Reads", "Bytes Read", "Avg. Read",
                               "# Writes", "Bytes Write", "Avg. Write",
                               "Bytes Total", "Write %")
    bar = "+" + "-" * 52 + "+" + "-" * 14 + "+" + "-" * 15 + "+" + "-" * 15 \
        + "+" + "-" * 14 + "+" + "-" * 15 + "+" + "-" * 15 + "+" + "-" * 15 \
        + "+" + "-" * 10 + "+"
    fmt_row = "| {0:50.50s} | {1:12d} | {2:13s} | {3:13s} | {4:12d} " \
        + "| {5:13s} | {6:13s} | {7:13s} | {8:8s} |"

    # Setup curses
    stdscr = curses.initscr()
    curses.start_color()
    curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
    curses.noecho()
    curses.cbreak()
    # Listen for 1/10th of a second at a time
    curses.halfdelay(1)
    stdscr.keypad(True)

    # Define the query
    sys_schema = session.get_schema("sys")
    table = sys_schema.get_table("io_global_by_file_by_bytes")
    query = table.select().limit(max_files)

    # Clear screen
    stdscr.clear()

    # Run the query and generate the report
    keep_running = True
    while keep_running:
        time = datetime.now()
        result = query.execute()
        stdscr.addstr(0, 0, time.strftime('%A %-d %B %H:%M:%S'),
                      curses.color_pair(1))
        stdscr.addstr(2, 0, bar)
        stdscr.addstr(3, 0, header)
        stdscr.addstr(4, 0, bar)

        # Print the rows in the result
        line = 5
        for row in result.fetch_all():
            stdscr.addstr(line, 0, fmt_row.format(*row))
            line = line + 1
        stdscr.addstr(line, 0, bar)
        stdscr.refresh()

        # Wait until delay seconds have passed while listening for the q key
        while (datetime.now() - time).total_seconds() < delay:
            c = stdscr.getch()
            if c == ord("q"):
                keep_running = False
                break

    # Reset the curses behaviour and finish
    curses.nocbreak()
    stdscr.keypad(False)
    curses.echo()
    curses.endwin()

The example uses a few more of the curses features, which I will not go into detail about. I recommend reading the Python documentation and the Curses Programming with Python tutorial if you are interested in learning more about using curses.

You start the report by calling the global_by_file_by_bytes() function. The session for the connection is required as an argument. Optional arguments are the delay between each iteration (delay) and the maximum number of files to include in the report (max_files). An example using a delay of 5 seconds is:

mysql-py> import my_iotop
mysql-py> my_iotop.global_by_file_by_bytes(session, delay=5)

While the implementation shown here is quite rough around the edges, it does show the potential. And remember, you have the full Python language available for manipulating the data. Click on the video below to see an example of the report.

https://mysql.wisborg.dk/wp-content/uploads/2018/11/shell_my_iotop.mp4

Innotop for MySQL 8.0

Recently, I did some maintenance on Innotop and I merged several pull requests including one to support MySQL 8.0 (thank you yoku0825 for the contribution).

As you know, Innotop is written in Perl, and to be able to use it with MySQL 8.0 you need a Perl DBD driver for MySQL that is also compatible with MySQL 8.0 (a driver that supports the new default authentication plugin, caching_sha2_password).

If you are using Fedora 29, I created such a package and you can download it here: perl-DBD-MySQL-4.046-5.mysql8.fc29.x86_64

We can see the difference between the default package and this one.

Old package:

$ ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so
...
libmariadb.so.3 => /lib64/libmariadb.so.3 (0x00007f08f026d000)
...
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f08f0033000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f08efd57000)
...

New package:

$ ldd /usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so
...
libmysqlclient.so.21 => /usr/lib64/mysql/libmysqlclient.so.21 (0x00007fa3ad611000)
...
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fa3ad1ae000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fa3ad1a6000)
...

Now you can use Innotop with MySQL 8.0 again!


Thank you again to all contributors and don’t hesitate to submit your patches.

Percona Server for MySQL 5.7.23-24 Is Now Available

Percona announces the release of Percona Server for MySQL 5.7.23-24 on November 12, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.23, including all the bug fixes in it. Percona Server for MySQL 5.7.23-24 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

This release introduces InnoDB encryption improvements and merges upstream MyRocks changes. Also, we’ve improved the usage of column families in MyRocks. The InnoDB encryption improvements are in Alpha quality and we don’t recommend that they are used in production.

New Features

Bugs Fixed
  • #4723: PURGE CHANGED_PAGE_BITMAPS did not work when innodb_data_home_dir was used
  • #4937: rocksdb_update_cf_options was ignored when specified in my.cnf or on the command line
  • #1107: The binlog could be corrupted when tmpdir got full
  • #4834: The encrypted system tablespace could have an empty uuid

Other bugs fixed

  • #4106: “Assertion `log.getting_synced’ failed in rocksdb::DBImpl::MarkLogsSynced(uint64_t, bool, const rocksdb::Status&)“
  • #4930: “main.percona_log_slow_innodb: Result content mismatch”
  • #4811: “5.7 Merge and fixup for old DB-937 introduces possible regression”
  • #4705: “crash on snapshot size check in RocksDB”

Find the release notes for Percona Server for MySQL 5.7.23-24 in our online documentation. Report bugs in the Jira bug tracker.

Vitess Weekly Digest 11/12/18

We continue the digest from the Slack discussions for Sep 22 2018 to Oct 5 2018. We've fallen slightly behind on this, but will catch up again soon.

Enable VtGateExecute in vtctld Arsalan [Sep 22nd] Hi, I want to query vitess on vtgate but I have below error. How can i fix this problem?
vitess@vtctldv3-hrl74:/$ vtctlclient -server 10.5.61.20:16999  VtGateExecute -server 10.5.61.21:16991 "show tables"
E0923 05:14:55.169771    1102 main.go:61] Remote error: rpc error: code = Unknown desc = query commands are disabled (set the -enable_queries flag to enable)
sougou you need to add this flag to vtctld:  -enable_queries         if set, allows vtgate and vttablet queries. May have security implications, as the queries will be run from this process.
Multi-column Primary Key sjmudd [Sep 26th] Question: if a mysql table has a multi-column primary key, can Vitess handle this ? I vaguely remember the answer might be “no”. but looking in vschema_test.go I think that may not be the case.
sougou multi-column pk should work. What we don't fully support yet are multi-column vindexes. there's partial support for multi-column lookup vindexes, where we write multiple columns, but only use the first column for computing the keyspace id.
Vitess wins InfoWorld Bossie award sougou [Sep 26th] "The best open source software for data storage and analytics": https://www.infoworld.com/article/3306454/big-data/the-best-open-source-software-for-data-storage-and-analytics.html#slide9
Also, @acharis and @tpetr have been accepted to present vitess at kubecon: https://kccna18.sched.com/event/Gras/vitess-at-hubspot-how-we-moved-hundreds-of-mysql-databases-into-kubernetes-tom-petr-alexander-charis-hubspot
Splitting sharded and unsharded tables Mark Solters [Sep 27th] If one were to shard foo/0 to baz/{-80,80-}, would the SplitClone foo/0 syntax have to be modified to reflect that the destination shards are no longer in the source keyspace? Inspecting the command seems to suggest that there is no “destination shards” option for SplitClone.  In this case, my question is better phrased as “How can we horizontally reshard one keyspace into another?”  It seems the missing part is convincing foo/0 that it “overlaps” with baz/-80,80-. I think the correct answer here is that I should be vertically sharding.
sougou Yeah. you first have to vertical shard from foo/0 to baz/0, and then reshard baz to baz/{...}
Mark Solters I see. Is there a guide or set of instructions for the vertical sharding procedure?
sougou It's one of the top items that we need to document. In the meantime, you can look at my scripts. Steps are similar to horizontal sharding: https://gist.github.com/sougou/e6259e958b5707d5888a5ac406418cc3
Mark Solters Very interesting! I’ve basically done these steps minus this CreateKeyspace --served_from 'master:test_keyspace,replica:test_keyspace,rdonly:test_keyspace' target Is that step strictly necessary?  If I have already copied over the relevant CopySchemaShard, and baz already has a functional master etc. can I simply run VerticalCloneSplit? i ask because i notice VerticalCloneSplit specifies the destination but not the source.
(The thread was forgotten at this point) Answer: The specific step is strictly necessary for SplitClone. It's currently a rigid workflow that we plan to make more flexible in the future.
Show tables of a keyspace mgersh [Sep 27th] is there a query to show all tables for a certain keyspace? we are using the jdbc driver "show tables from $keyspace" just returns the default keyspace's tables
leoxlin If you connect to a specific Keyspace, you can use SHOW TABLES;
If pods crash, check resource limits
mshivanna [Oct 1st] hi we recently spun up vitess cluster now we want to import a mysql dump. what is the right way to do it? We execed into the master node (vttablet-100) mysql container and did a usual mysql import but the vttablet pod kept crashin after sometime. The dump is 13Gb in size. 
sougou Can you look at the reason why the pod was terminated? Most likely, you ran out of some resource.
mshivanna yeah i have increased the cpu its not failing quite often but its failing will increase again. will update here thank you @sougou
Ameet presents at Velocity conference, NY ameet [Oct 2nd] Hi Vitessians, I just finished my talk about _Moving Slack's database to Vitess_ at Velocity Conference, NY. Here's the link to the talk and slides: https://conferences.oreilly.com/velocity/vl-ny/user/proposal/status/69885 The response was great. There was lots of interest about Vitess.
Improved control on resharding young [Oct 5th] Is there an open issue for encapsulating resetting vreplication in one command? It's quite annoying to do manually.
sougou i'm working on a PR to perform high level commands for canceling resharding etc.
UPDATE:  these features are now done.
Understanding keyspace ids and keyranges Mark Solters [Oct 5th] given a shard like 80- what is the smallest keyspace ID that would be mapped to this shard? is it 0x80? 0x81? 0x800? 0x8000? (i am trying to construct a numeric_static_map) i know that we use a syntax that makes the RHS zeros “optional” but I also think that the center of the keyspace can’t be as small a number as 0x80. (!) is what we write as 0x80 actually base-10 9223372036854775808?
sougou the keyspace id is a bunch of bits (left justified) there is theoretically no limit to the number of bytes it can have it may get confusing if you try to see it as a number because numbers are right justified in some cases, we 'encode' a number as a keyspace id. If so, we set the length as 64 bits. it's just a convention. One could have used 128 bits or 32 bits, or whatever.
Mark Solters hmm, OK so how then does one construct a numeric_static_map? the vitess repo contains a numeric_static_map_test.json where it is simply mapping some example primary keys to integers like 1 and 2, which if i'm reading this correctly, are the keyspace IDs im basically confused about how this map here translates to shards: https://github.com/vitessio/vitess/blob/master/data/test/vtgate/numeric_static_map_test.json {     "1": 1,     "3": 2 }
sougou Oops. Sorry about the tangent. Yeah. In your case, you're right. the 922.. number is the lowest number for shard 80- this is because numeric_static_map converts the number into a 64-bit keyspace id.
Mark Solters yes, and i take your point about the question being ambiguous without something grounding the bit size hmmmm so, for this (trivial) example to work i guess id have to construct a shard like -000000000000002 which would have an effective key size of 1
sougou right. that shard can have 2 ids (0 & 1)
Meaning of target in the Go driver derekperkins [Oct 5th] @sougou what is a target in the go driver? https://github.com/vitessio/vitess/blob/master/go/vt/vitessdriver/driver.go#L129
sougou best documentation is the test https://github.com/vitessio/vitess/blob/master/go/vt/vitessdriver/driver_test.go
derekperkins ah, thanks and leaving it empty presumably points to master?
sougou yeah well if you didn't override the default tablet type in vtgate

Another two shows this week with MySQL!!!

We are happy to confirm that we are ready for another two shows taking place this week: one in the US, the second in Europe. Please find the details below:

  • PHP[World]
    • Place: Washington DC, US
    • Date: November 14-15, 2018
    • The MySQL Community team is a Workshop sponsor here, with David Stokes, the MySQL Community Manager, as the MySQL representative on site. 
  • BGOUG
    • Place: Pravets, Bulgaria
    • Date: November 16, 2018
    • MySQL has sponsored & attended this show for years. This year Georgi Kodinov, the Senior Software Developer Manager for MySQL, will give a talk on "Data Masking in MySQL Enterprise 5.7 and 8". The talk is scheduled for Friday, Nov 16 @17:30-18:15 in the DB session. See the full schedule here.

We are looking forward to seeing & talking to you there!!

Madrid MUG Meetup this week

We are happy to announce that this Thursday, November 15 at 7:00 p.m., there will be a MySQL meetup in Madrid, Spain. Please find more details below:  
  • Date: Nov 15, 2018
  • Time: 7:00-9:00pm
  • Place: booking.com @ Torre Europa Paseo de la Castellana 95, Planta 11 · Madrid
  • Agenda:
    • Morgan Tocker: "TiDB: Distributed, horizontally scalable, MySQL compatible”
    • Simon Mudd: "Vitess and MySQL Cluster"

More information & registration can be found on the Meetup website: https://www.meetup.com/Madrid-MySQL-users-group/events/255291731/

Do not forget to register on the page if you are coming.

MySQL @DOAG 2018

DOAG 2018 Conference + Exhibition will be held November 20 - 23, 2018 in Nuremberg. Participants will have the opportunity to attend more than 400 talks by top international speakers, plus a wide choice of workshops and community activities. This is a great opportunity to expand your knowledge and benefit from the know-how of the Oracle community. 
As in past years, we will be present with more than 10 MySQL-specific talks from various MySQL customers and MySQL experts. 
During the 3 conference days, you'll have the chance to meet the MySQL and Cloud experts from 9am-12am at the Cloud Café at the Oracle booth (3rd floor, 320), or you can arrange a meeting with us: mysql-sales_de@oracle.com.

Catching up with Kettle REMIX

Dear Kettle and Neo4j friends,

Since I joined the Neo4j team in April I haven’t given you any updates despite the fact that a lot of activity has been taking place in both the Neo4j and Kettle realms.

First and foremost, you can grab the cool Neo4j plugins from neo4j.kettle.be (the plugin in the marketplace is always out of date since it takes weeks to update the metadata).

Then based on valuable feedback from community members we’ve updated the DataSet plugin (including unit testing) to include relative paths for filenames (for easier git support), to avoid modifying transformation metadata and to set custom variables or parameters.

I’ve also created a plugin to debug transformations and jobs a bit easier.  You can do things like set specific logging levels on steps (or only for a few rows) and work with zoom levels.

Then, back on the subject of Neo4j, I’ve created a plugin to log the execution results of transformations and jobs (and a bit of their metadata) to Neo4j.

Graph of a transformation executing a bunch of steps. Metadata on the left, logging nodes on the right.

Those working with Azure might enjoy the Event Hubs plugins for a bit of data streaming action in Kettle.

The Kettle Needful Things plugin aims to fix bugs and solve silly problems in Kettle.  For now it sets the correct local metastore on Carte servers AND… features a new launcher script called Maitre. Maitre supports transformations and jobs, as well as local, remote and clustered execution.

The Kettle Environment plugin aims to take a stab at lifecycle management by allowing you to define a list of Environments:

In each Environment you can set all sorts of metadata but also the location of the Kettle and MetaStore home folders.

Finally, because downloading, patching, installing and configuring all this is a lot of work, I’ve created an automated process which does this for you on a daily basis (for testing), so you can download Kettle Community Edition version 8.1.0.0 patched to 8.1.0.4, with all the extra plugins above, in its 1GB glory at: remix.kettle.be

To get it on your machine simply run:

wget remix.kettle.be -O remix.zip

You can also try these plugins (except for Needful Things and Environment) on my sandbox WebSpoon server.  You can easily run your own WebSpoon from the Docker container, which is also updated daily.

If you have suggestions, bugs, or rants, please feel free to leave them here or in the respective GitHub projects.  Any feedback is, as always, more than welcome. In fact, thank you all for the feedback given so far.  It’s making all the difference.  If you feel the need to contribute more opinions on the subject of Kettle, feel free to send me a mail (mattcasters at gmail dot com) to join our kettle-community Slack channel.

Enjoy!

Matt

MySQL, MariaDB & Friends: the Committee

We have sent requests and also received candidatures for the MySQL, MariaDB & Friends Devroom Committee. Now we are happy to announce the names of the people who will rate your submissions:

Now that the Committee is known, don’t hesitate to submit your talk!

See you soon at FOSDEM !

Reminder: MySQL User Group NL and Madrid MySQL User Group Meetups presenting Scaling MySQL this week (Monday/Thursday)

This is a quick reminder that this week there are 2 MySQL User Group Meetings (Amsterdam and Madrid) where we will be talking about Scaling MySQL. Guest speaker will be Morgan Tocker from PingCap who will talk about TiDB. More information: MySQL User Group NL  meetup taking place on Monday (presenting: Morgan, Daniël and Simon) … Continue reading Reminder: MySQL User Group NL and Madrid MySQL User Group Meetups presenting Scaling MySQL this week (Monday/Thursday)

Tracking Foreign Keys

The other day, I was reading a blog by Magnus Hagander about tracking foreign keys throughout a schema in PostgreSQL. I thought it was a good idea, so I decided to look at how you can track foreign keys in MySQL.

The way I decided to do it was to start out with a table, then find all tables referencing that table by a foreign key. From this basic (and, as will be shown, simple) query, it is possible to create a chain of relations. The key table for the queries is information_schema.KEY_COLUMN_USAGE, which has information about all the foreign key relationships.

However, before getting that far, it is necessary to have some tables to work with.

Sample Schema

A small schema, but with relatively complex foreign key relations, can be created with the following statements:

DROP SCHEMA IF EXISTS db1;
CREATE SCHEMA db1;

CREATE TABLE db1.a (
  a_id int,
  PRIMARY KEY (a_id)
);

CREATE TABLE db1.b (
  b_id1 int,
  b_id2 int,
  a_id int,
  PRIMARY KEY (b_id1, b_id2),
  FOREIGN KEY (a_id) REFERENCES a (a_id)
);

CREATE TABLE db1.c (
  c_id int,
  b_id1 int,
  b_id2 int,
  PRIMARY KEY (c_id),
  FOREIGN KEY (b_id1, b_id2) REFERENCES b (b_id1, b_id2)
);

CREATE TABLE db1.d (
  d_id int,
  a_id int,
  c_id int,
  PRIMARY KEY (d_id),
  FOREIGN KEY (a_id) REFERENCES a (a_id),
  FOREIGN KEY (c_id) REFERENCES c (c_id)
);

CREATE TABLE db1.e (
  e_id int,
  a_id int,
  PRIMARY KEY (e_id),
  FOREIGN KEY (a_id) REFERENCES a (a_id)
);

CREATE TABLE db1.f (
  f_id int,
  b_id1 int,
  b_id2 int,
  c_id int,
  PRIMARY KEY (f_id),
  FOREIGN KEY (b_id1, b_id2) REFERENCES b (b_id1, b_id2),
  FOREIGN KEY (c_id) REFERENCES c (c_id)
);

The base table is a. There are three tables, b, d, and e, with a direct foreign key to table a.  Tables c and f in turn reference table b through a two-column foreign key, and tables d and f reference table c. So there are quite a few paths to get to table a from another table. Before looking at how the relationships can be found from information_schema.KEY_COLUMN_USAGE, it is worth creating a visual representation of the schema.

MySQL Workbench EER Diagrams

A long-standing feature of MySQL Workbench is its ability to create enhanced entity–relationship (EER) diagrams. An EER diagram shows a box with information about the columns and indexes of each table in a schema. Additionally, there are lines connecting tables related by foreign keys. So an EER diagram includes exactly what we are looking for – the chain of foreign keys.

You can create an EER diagram by clicking on Database and then choosing Reverse Engineer … from the menu in MySQL Workbench:

Choose Reverse Engineer in the MySQL Workbench menu.

Alternatively use CTRL + R. You can do this from the homepage or from a database connection. Either way, you are taken to the connection options screen where you can choose an already defined connection or create a new one – this works the same as when you connect to a MySQL instance to execute queries:

Specify the connection options for the MySQL instance with the schema for the EER diagram.

When you continue, MySQL Workbench will connect to MySQL Server and get a list of the schemas available. Once you get to the Select Schemas page, you can choose the schema or schemas you want to create the EER diagram for. In this case choose the db1 schema (unless you created the tables in another schema):

Choose the schemas to import.

For this example, you can use the defaults for the rest of the screens. On the Select Objects screen, you can optionally choose to select a subset of tables for the diagram. On the same screen, you choose whether you want to place the imported objects into a diagram (enabled by default); you want to do that for this example.

Tip: If MySQL Workbench crashes when creating the diagram, try opening Edit → Configuration… → Modelling in the menu and checking the Force use of software based rendering for EER diagrams option.

At the end, you have the diagram. You can move the tables around to place them as you like. One example of the diagram is:

MySQL Workbench EER Diagram

This makes it easy to see the relations between the tables.

But what do you do if you want to analyze the relationships in a program, or for some other reason need the relationships in a text format? Let’s look at that.

Querying the Foreign Key Relationship

As mentioned, the base table for looking at foreign key relations is the information_schema.KEY_COLUMN_USAGE table. It has the following definition:

mysql> DESC information_schema.KEY_COLUMN_USAGE;
+-------------------------------+------------------+------+-----+---------+-------+
| Field                         | Type             | Null | Key | Default | Extra |
+-------------------------------+------------------+------+-----+---------+-------+
| CONSTRAINT_CATALOG            | varchar(64)      | YES  |     | NULL    |       |
| CONSTRAINT_SCHEMA             | varchar(64)      | YES  |     | NULL    |       |
| CONSTRAINT_NAME               | varchar(64)      | NO   |     | NULL    |       |
| TABLE_CATALOG                 | varchar(64)      | YES  |     | NULL    |       |
| TABLE_SCHEMA                  | varchar(64)      | YES  |     | NULL    |       |
| TABLE_NAME                    | varchar(64)      | YES  |     | NULL    |       |
| COLUMN_NAME                   | varchar(64)      | YES  |     | NULL    |       |
| ORDINAL_POSITION              | int(10) unsigned | NO   |     | NULL    |       |
| POSITION_IN_UNIQUE_CONSTRAINT | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_TABLE_SCHEMA       | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_TABLE_NAME         | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_COLUMN_NAME        | binary(0)        | YES  |     | NULL    |       |
+-------------------------------+------------------+------+-----+---------+-------+
12 rows in set (0.01 sec)

In MySQL 8.0 this is a view on the new data dictionary, so it is effectively a plain InnoDB query and fast to query. In MySQL 5.7 and earlier, querying it requires opening the tables, which can be slow since all tables must be opened. If you have many tables and they are not cached in the table caches yet, querying KEY_COLUMN_USAGE can be slow and impact the general performance.

Basic Query – Single Column per Foreign Key

The three columns prefixed with REFERENCED_ contain the information about a foreign key. For example, for the tables used in this blog, if you want to know which tables have a direct foreign key to table a in the db1 schema, you can query KEY_COLUMN_USAGE with a WHERE clause on REFERENCED_TABLE_SCHEMA and REFERENCED_TABLE_NAME like:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
              REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'a';
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | REFERENCED_TABLE_SCHEMA | REFERENCED_TABLE_NAME | REFERENCED_COLUMN_NAME |
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
| db1          | b          | a_id        | db1                     | a                     | a_id                   |
| db1          | d          | a_id        | db1                     | a                     | a_id                   |
| db1          | e          | a_id        | db1                     | a                     | a_id                   |
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
3 rows in set (0.01 sec)

So, the tables b, d, and e have a foreign key to a_id in the db1.a table, and the column name in each of the three tables is also called a_id. This is just as expected.

The query works great for finding the immediate relations where the foreign key only includes a single column. However, for cases where there are multiple columns in the foreign key, there will be two rows for each referencing table. So what to do?

Basic Query – Multiple Columns per Foreign Key

To avoid having one row per column in a multi-column foreign key, you need to perform an aggregation. You can, for example, use GROUP_CONCAT() to generate a single value combining the column names (a sketch of that variant follows below). In MySQL 8.0, you can also consider creating a JSON array by using the JSON_ARRAYAGG() function:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME,
              JSON_ARRAYAGG(COLUMN_NAME) ColumnNames
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'b'
        GROUP BY TABLE_SCHEMA, TABLE_NAME;
+--------------+------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | ColumnNames        |
+--------------+------------+--------------------+
| db1          | c          | ["b_id1", "b_id2"] |
| db1          | f          | ["b_id1", "b_id2"] |
+--------------+------------+--------------------+
2 rows in set (0.01 sec)

This queries the foreign keys to the b table. The c and f tables have a foreign key using the b_id1 and b_id2 columns.
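For reference, the GROUP_CONCAT() variant mentioned above could look roughly like this (my own sketch, not from the original query); it returns the column names as a comma-separated string instead of a JSON array, so for the c and f tables the ColumnNames value becomes b_id1,b_id2:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME,
              GROUP_CONCAT(COLUMN_NAME ORDER BY ORDINAL_POSITION) AS ColumnNames
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'b'
        GROUP BY TABLE_SCHEMA, TABLE_NAME;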

The result of the JSON_ARRAYAGG() query also means that the c and f tables are related to the a table through the b table. Would it not be great if there was a single query that could provide the foreign key chains? Well, in MySQL 8 you can get this using a common table expression (CTE).

Querying Foreign Key Chains – Step by Step

Tip: If you are just interested in the final query, skip to the next subsection.

The query will use a recursive common table expression. This requires a seed query and a recursive query (that works on the rows generated in the previous iteration). A good seed query is similar to what we had for the basic query. However, to make it possible to aggregate all of the steps in the chain, the chain will be generated as a JSON array with each part of the chain being a JSON object. The seed query becomes:

mysql> SELECT 1 AS Level, TABLE_SCHEMA, TABLE_NAME,
              JSON_ARRAY(
                 JSON_OBJECT('Level', 1,
                             'TableSchema', TABLE_SCHEMA,
                             'TableName', TABLE_NAME,
                             'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                 )
              ) AS ReferenceChain
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'a'
        GROUP BY TABLE_SCHEMA, TABLE_NAME;
+-------+--------------+------------+---------------------------------------------------------------------------------+
| Level | TABLE_SCHEMA | TABLE_NAME | ReferenceChain                                                                  |
+-------+--------------+------------+---------------------------------------------------------------------------------+
|     1 | db1          | b          | [{"Level": 1, "TableName": "b", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
|     1 | db1          | d          | [{"Level": 1, "TableName": "d", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
|     1 | db1          | e          | [{"Level": 1, "TableName": "e", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
+-------+--------------+------------+---------------------------------------------------------------------------------+
3 rows in set (0.01 sec)

Now, you can take each of these relations and look for tables having a foreign key to them, and so forth. That is the recursive part of the query. There is one complication though: GROUP BY is not allowed in the recursive part. The workaround is to use a subquery:

mysql> WITH RECURSIVE `reference` (Level, TABLE_SCHEMA, TABLE_NAME, ReferenceChain) AS (
          SELECT 1, TABLE_SCHEMA, TABLE_NAME,
                 JSON_ARRAY(
                    JSON_OBJECT('Level', 1,
                                'TableSchema', TABLE_SCHEMA,
                                'TableName', TABLE_NAME,
                                'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                    )
                 )
            FROM information_schema.KEY_COLUMN_USAGE
           WHERE REFERENCED_TABLE_SCHEMA = 'db1'
                 AND REFERENCED_TABLE_NAME = 'a'
           GROUP BY TABLE_SCHEMA, TABLE_NAME
           UNION
          SELECT r.Level+1, ko.TABLE_SCHEMA, ko.TABLE_NAME,
                 JSON_ARRAY_APPEND(
                    ReferenceChain,
                    '$',
                    (SELECT JSON_OBJECT('Level', r.Level+1,
                                        'TableSchema', TABLE_SCHEMA,
                                        'TableName', TABLE_NAME,
                                        'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                            )
                       FROM information_schema.KEY_COLUMN_USAGE ki
                      WHERE ki.TABLE_SCHEMA = ko.TABLE_SCHEMA
                            AND TABLE_NAME = ko.TABLE_NAME
                            AND ki.REFERENCED_TABLE_SCHEMA = ko.REFERENCED_TABLE_SCHEMA
                            AND ki.REFERENCED_TABLE_NAME = ko.REFERENCED_TABLE_NAME)
                 )
            FROM `reference` r
                 INNER JOIN information_schema.KEY_COLUMN_USAGE ko
                         ON ko.REFERENCED_TABLE_SCHEMA = r.TABLE_SCHEMA
                            AND ko.REFERENCED_TABLE_NAME = r.TABLE_NAME
       )
       SELECT Level, TABLE_SCHEMA, TABLE_NAME,
              JSON_PRETTY(ReferenceChain) AS ReferenceChain
         FROM `reference`\G
*************************** 1. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: b
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" } ]
*************************** 2. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: d
ReferenceChain: [ { "Level": 1, "TableName": "d", "ColumnNames": [ "a_id" ], "TableSchema": "db1" } ]
*************************** 3. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: e
ReferenceChain: [ { "Level": 1, "TableName": "e", "ColumnNames": [ "a_id" ], "TableSchema": "db1" } ]
*************************** 4. row ***************************
         Level: 2
  TABLE_SCHEMA: db1
    TABLE_NAME: c
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" }, { "Level": 2, "TableName": "c", "ColumnNames": [ "b_id1", "b_id2" ], "TableSchema": "db1" } ]
*************************** 5. row ***************************
         Level: 2
  TABLE_SCHEMA: db1
    TABLE_NAME: f
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" }, { "Level": 2, "TableName": "f", "ColumnNames": [ "b_id1", "b_id2" ], "TableSchema": "db1" } ]
*************************** 6. row ***************************
         Level: 3
  TABLE_SCHEMA: db1
    TABLE_NAME: d
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" }, { "Level": 2, "TableName": "c", "ColumnNames": [ "b_id1", "b_id2" ], "TableSchema": "db1" }, { "Level": 3, "TableName": "d", "ColumnNames": [ "c_id" ], "TableSchema": "db1" } ]
*************************** 7. row ***************************
         Level: 3
  TABLE_SCHEMA: db1
    TABLE_NAME: f
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" }, { "Level": 2, "TableName": "c", "ColumnNames": [ "b_id1", "b_id2" ], "TableSchema": "db1" }, { "Level": 3, "TableName": "f", "ColumnNames": [ "c_id" ], "TableSchema": "db1" } ]
7 rows in set (0.02 sec)

Here the JSON_ARRAY_APPEND() function is used to add the next part of the chain to ReferenceChain. The query relies on the fact that a UNION is a UNION DISTINCT by default, so for the cases where there are two columns in the foreign key, the second (duplicate) row is automatically filtered out. In the main query, JSON_PRETTY() is used to make the JSON document easier to read. If you are using the query in an application, this is not needed.

You can stop here. The result is correct. However, you may find there are more rows than you would expect. For example, the chain a → b is there on its own (1st row) even though there are also tables with foreign keys to b. If you want to include subchains in the result, then you are all set. If you want to filter out chains that are part of another chain, a little more work is needed.

To filter out chains that are also included in subsequent rows, it is one way or another necessary to keep track of whether a row has any child rows (i.e. whether a subsequent row is generated based on it). One way to do this is to keep a serialized form of the chain; however, the disadvantage is that you do not know how long a string is needed to store it (and the string length must be specified in the seed query). Another option is to generate an ID for each row – for example using the UUID() function – and have the rows generated from a row reference their parent row. This is the option used here.

A disadvantage of this approach is that for tables with more than one column in the foreign key, the two rows generated are no longer identical, so it is necessary to handle this in the main query. However, it is now easy to include only the ends of the chains, as these will not have another row with the parent ID set to the row’s ID. To find them, use a LEFT OUTER JOIN and look for rows where the optional row returns a NULL ID (that is, a row was not found).

Final Query

The final query thus becomes:

mysql> WITH RECURSIVE `reference` (ID, ParentID, Level, TABLE_SCHEMA, TABLE_NAME, ReferenceChain) AS (
          SELECT UUID(), CAST(NULL AS CHAR(36)), 1, TABLE_SCHEMA, TABLE_NAME,
                 JSON_ARRAY(
                    JSON_OBJECT('Level', 1,
                                'TableSchema', TABLE_SCHEMA,
                                'TableName', TABLE_NAME,
                                'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                    )
                 )
            FROM information_schema.KEY_COLUMN_USAGE
           WHERE REFERENCED_TABLE_SCHEMA = 'db1'
                 AND REFERENCED_TABLE_NAME = 'a'
           GROUP BY TABLE_SCHEMA, TABLE_NAME
           UNION
          SELECT UUID(), r.ID, r.Level+1, ko.TABLE_SCHEMA, ko.TABLE_NAME,
                 JSON_ARRAY_APPEND(
                    ReferenceChain,
                    '$',
                    (SELECT JSON_OBJECT('Level', r.Level+1,
                                        'TableSchema', TABLE_SCHEMA,
                                        'TableName', TABLE_NAME,
                                        'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                            )
                       FROM information_schema.KEY_COLUMN_USAGE ki
                      WHERE ki.TABLE_SCHEMA = ko.TABLE_SCHEMA
                            AND TABLE_NAME = ko.TABLE_NAME
                            AND ki.REFERENCED_TABLE_SCHEMA = ko.REFERENCED_TABLE_SCHEMA
                            AND ki.REFERENCED_TABLE_NAME = ko.REFERENCED_TABLE_NAME)
                 )
            FROM `reference` r
                 INNER JOIN information_schema.KEY_COLUMN_USAGE ko
                         ON ko.REFERENCED_TABLE_SCHEMA = r.TABLE_SCHEMA
                            AND ko.REFERENCED_TABLE_NAME = r.TABLE_NAME
       )
       SELECT DISTINCT r.Level AS NumLevels,
              JSON_PRETTY(r.ReferenceChain) AS ReferenceChain
         FROM `reference` r
              LEFT OUTER JOIN `reference` rc ON rc.ParentID = r.ID
        WHERE rc.ID IS NULL\G
*************************** 1. row ***************************
     NumLevels: 1
ReferenceChain: [ { "Level": 1, "TableName": "d", "ColumnNames": [ "a_id" ], "TableSchema": "db1" } ]
*************************** 2. row ***************************
     NumLevels: 1
ReferenceChain: [ { "Level": 1, "TableName": "e", "ColumnNames": [ "a_id" ], "TableSchema": "db1" } ]
*************************** 3. row ***************************
     NumLevels: 2
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" }, { "Level": 2, "TableName": "f", "ColumnNames": [ "b_id1", "b_id2" ], "TableSchema": "db1" } ]
*************************** 4. row ***************************
     NumLevels: 3
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" }, { "Level": 2, "TableName": "c", "ColumnNames": [ "b_id1", "b_id2" ], "TableSchema": "db1" }, { "Level": 3, "TableName": "d", "ColumnNames": [ "c_id" ], "TableSchema": "db1" } ]
*************************** 5. row ***************************
     NumLevels: 3
ReferenceChain: [ { "Level": 1, "TableName": "b", "ColumnNames": [ "a_id" ], "TableSchema": "db1" }, { "Level": 2, "TableName": "c", "ColumnNames": [ "b_id1", "b_id2" ], "TableSchema": "db1" }, { "Level": 3, "TableName": "f", "ColumnNames": [ "c_id" ], "TableSchema": "db1" } ]
5 rows in set (0.03 sec)

The DISTINCT in the main part of the query ensures that duplicates due to multiple columns in the foreign key are filtered out.

Note: One thing this version of the query does not handle is circular key relations. For example, if you add a c_id column to a with a foreign key to the c table, an infinite number of chains will be created. So there needs to be a condition that detects when a loop is being created. That is an exercise for the reader – or for a later blog.
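As a starting point for that exercise (my own untested sketch, not part of the original post), one possibility is to skip any table that is already present in the chain built so far, for example by adding a condition like this to the WHERE clause of the recursive part of the CTE:

-- Possible loop guard for the recursive member (sketch only):
WHERE NOT JSON_CONTAINS(
             r.ReferenceChain,
             JSON_OBJECT('TableSchema', ko.TABLE_SCHEMA, 'TableName', ko.TABLE_NAME)
          )

The containment test it relies on can be verified on its own; the following returns 1, meaning the table is already in the chain:

mysql> SELECT JSON_CONTAINS(
                 '[{"Level": 1, "TableName": "b", "TableSchema": "db1"}]',
                 JSON_OBJECT('TableSchema', 'db1', 'TableName', 'b')
              ) AS already_in_chain;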

Thus, this schema has five unique chains leading to the a table. You can also verify this from the EER diagram – for reference, here it is again:

MySQL Workbench EER Diagram

C-Level MySQL

The old adage that time is money does not give any hints about buying more time, but a MySQL Enterprise subscription does allow you to purchase time. Your technical staff has to do more with less time than ever before, and operational delays add up extremely quickly. Having the right tools when things go critical means staff can do what they need to do, when they need to do it.

So how does MySQL Enterprise Edition buy you more time?

When things inevitably go wrong, searching for free answers on web sites full of dubious or flat-out wrong solutions is a waste of precious time. Informational websites are fine sources of general information for researching technical issues, but not when mission-critical services are degraded or down. An online recommendation to do what a very large enterprise does may not make sense for your company if you are not doing exactly what they are doing, at their scale. Getting the right help at crucial moments gets your business back on track faster.

A MySQL Enterprise Support contract brings you the best technical support for the MySQL database products, delivered by highly trained Oracle MySQL engineers. You get the right answers to your questions.

Tools That Really Help

MySQL Enterprise Monitor is included in Enterprise Edition and is the best tool to monitor all your instances and report issues before they escalate. MEM can make recommendations about your schemas and systems to achieve better performance and provide a better understanding of what is really happening to your data. Fixing potential issues before they can blossom is much less expensive. And you get proactive monitoring of your database instances that keeps your staff informed about how your data is doing, letting them concentrate on other issues.

Keeping your data safe is a major concern these days in a world with ever-shrinking privacy. MySQL Enterprise Backup ensures that your data is properly archived in case of accidents by staff or hardware failures. Data losses are almost inevitable, and you need to be able to quickly restore that data; MySQL Enterprise Backup is designed to do just that.

By default, MySQL 8.0 encrypts data between server and clients with TLS. But to keep your on-disk data safe from prying eyes, it helps to have at-rest encryption paired with Oracle Key Vault to manage encryption keys. The auditors doing your PCI or GDPR inspections will know Oracle Key Vault from thousands of other customers and will also recognize Oracle Audit Vault for its deep reporting capabilities. Enterprise Authentication can mask designated fields from those not authorized to view that data. And Enterprise Firewall learns the patterns of your Structured Query Language traffic and can block malicious queries before they reach your database server. Outside software bolt-ons to manage your database security can be found, but they are not integrated to work together for your protection, and managing all those disparate pieces drains your staff’s time, is often laborious to maintain, and can draw unwanted attention from your auditors.

You will receive regular updates of the MySQL Enterprise software directly from Oracle, where the code is written. It is easy to miss software releases in the open source world and find yourself vulnerable to a security exploit or missing features from later releases; a vulnerability becomes publicly known once the patch for it is released.

Conclusion

A MySQL Enterprise Subscription allows you and your staff to concentrate on the issues your business is facing.  It provides the tools you need for oversight, management, security, and safety of your critical data and “buys you more time” to work on delivering value to your enterprise.

A Tale of Two JSON Implementations - MySQL and MariaDB

JSON has proven to be a very important data format with immense popularity. A good part of my time for the last two or so years has been dedicated to this area, and I even wrote a book on the subject. This is a comparison of the implementations of handling JSON data in MySQL and MariaDB. I had requests from the community and customers for this evaluation.


JSON Data Types Are Not All Equal
MySQL added a JSON data type in version 5.7 and it has proven to be very popular.  MariaDB has had JSON support since version 10.0.16, but its JSON type is actually an alias for the longtext data type, so that statement-based replication from MySQL to MariaDB is possible.

MySQL stores JSON documents converted to an internal format that permits quick read access to document elements.  The binary format is structured to enable the server to look up sub-objects or nested values directly by key or array index without reading all values before or after them in the document.  From a practical standpoint, the big thing most people notice is that the data is alphabetized.

MariaDB does no such optimization, and its documentation claims that the performance is at least equivalent.

JSON Tables

The first comparison is 'how hard is it to put non-JSON data into a JSON column?'  The standard (see below under 'extra'), by the way, states that the data must be in the UTF8MB4 character set. And what is going on 'underneath the cover'?

The first step is to create tables to hold JSON data.


MySQL:

create table t1 (j json);
Query OK, 0 rows affected (0.1826 sec)

 MySQL  localhost:33060+ ssl  json  SQL > show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                   |
+-------+----------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `j` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.0149 sec)

 MySQL  localhost:33060+ ssl  json  SQL > desc t1;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| j     | json | YES  |     | NULL    |       |
+-------+------+------+-----+---------+-------+
1 row in set (0.0028 sec)


MariaDB:

MariaDB [json]> create table t2 (j json);
Query OK, 0 rows affected (0.046 sec)

MariaDB [json]> show create table t2;
+-------+----------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                     |
+-------+----------------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
 `j` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

MariaDB [json]> desc t2;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| j     | longtext | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
1 row in set (0.001 sec)

MariaDB [json]>




Conclusion: Both use UTF8MB4 and the underlying tables are roughly equivalent.

Checking Constraints

Ensuring that only JSON gets into the JSON columns is essential. MySQL does this by default for the JSON data type but MariaDB does not.

MySQL:

insert into t1 values('junk');

ERROR: 3140: Invalid JSON text: "Invalid value." at position 0 in value for column 't1.j'.

MariaDB:



MariaDB [json]> insert into t2 values ('junk');
Oops! We now have a non-JSON value in the table.

To be fair, MariaDB does let you establish a constraint check on the column to avoid this issue.


MariaDB [json]> drop table t2;
Query OK, 0 rows affected (0.046 sec)
MariaDB [json]> create table t2 (j json, check (json_valid(j)));
Query OK, 0 rows affected (0.049 sec)

MariaDB [json]> insert into t2 values ('junk');
ERROR 4025 (23000): CONSTRAINT `CONSTRAINT_1` failed for `json`.`t2`
Conclusion: MySQL does as advertised by rejecting non-JSON data by default while MariaDB can do so with a little extra work.
Loading Data
The only major complaint about JSON data is that there is just so much of it. So having a way to bulk load is important.

MySQL:

MySQL's new shell (mysqlsh) has Python, JavaScript, and SQL modes.  It is very easy to use either the Python or JavaScript modes to write a quick script to read bulk data sets line by line.  Giuseppe 'Datacharmer' Maxia has a great example of converting data from MongoDB to MySQL using the shell, and I have used that example code extensively in the past. But now 8.0.13 has a bulk loader built into the shell.
The New MySQL Shell's utility to bulk load JSON
This utility function allows JSON data to be stored in a JSON document collection (MySQL Document Store) or in a JSON column of a structured table. Very handy.


MariaDB:

MariaDB does not have an equivalent to the new MySQL Shell, nor does it have a bulk loader utility for JSON data. 

I tried to use the CONNECT storage engine (not installed by default) and follow the examples on the CONNECT JSON Table Type page, without luck. The CONNECT engine is supposed to have the capability to auto-discover a table structure and define the table itself.  Mea culpa for not getting this to work, and I would encourage those who do use this feature to send me pointers, PLEASE! But after several hours of tedious attempts to follow the examples exactly, it was time to move on to something else.

Conclusion: MySQL does better loading data, especially in bulk.
JSON Functions
Both databases have functions and there are some differences.


  1. Functions That Create JSON Values
    1. JSON_ARRAY, JSON_OBJECT and JSON_QUOTE  are found in both and work the same.
  2. Functions That Search JSON Values
    1. JSON_CONTAINS, JSON_CONTAINS_PATH, JSON_EXTRACT, JSON_KEYS, and JSON_SEARCH are found in both and work the same.  However, only MySQL has the -> and ->> shortcuts for JSON_EXTRACT and JSON_UNQUOTE(JSON_EXTRACT()). 
  3. Functions That Modify JSON Values
    1. JSON_ARRAY_APPEND, JSON_ARRAY_INSERT, JSON_INSERT, JSON_REMOVE, JSON_REPLACE, JSON_SET, and JSON_UNQUOTE are in both and work the same.  
    2. MariaDB has JSON_MERGE, which has been deprecated in MySQL 8.0.3 and replaced with JSON_MERGE_PRESERVE & JSON_MERGE_PATCH.   MySQL 8.0 supports the JSON Merge Patch format defined in RFC 7396.
  4. Functions That Return JSON Value Attributes
    1. JSON_DEPTH, JSON_LENGTH, JSON_TYPE, and JSON_VALID are found in both and work the same.
  5. Table Functions
    1. JSON_TABLE which allows you to use unstructured JSON data in a structured temporary table is in MySQL only.
  6. JSON Utility Functions
    1. JSON_PRETTY, JSON_STORAGE_FREE, and JSON_STORAGE_SIZE are only in MySQL.
  7. Other Functions
    1. JSON_ARRAYAGG and JSON_OBJECTAGG are only in MySQL and are very handy for turning non JSON data into JSON.
    2. JSON_VALUE and JSON_QUERY are MariaDB specific and return an object/array or a scalar respectively.

Conclusion:  I tested both databases' functions and found they worked as advertised.  JSON_PRETTY is much missed by my eyes when dealing with documents with several levels of nesting in MariaDB.  The merging functions are richer in MySQL, especially for those who need to follow the merge patch standard. And JSON_TABLE is a valuable tool when trying to treat unstructured data as structured data temporarily, especially combined with the CTEs and window functions that were introduced in MySQL 8.   
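To give an idea of what JSON_TABLE makes possible (a minimal sketch of my own, not taken from the comparison above), here is how a small JSON array can be treated as a regular result set in MySQL 8.0:

mysql> SELECT jt.name, jt.qty
         FROM JSON_TABLE(
                '[{"name": "apple", "qty": 3}, {"name": "pear", "qty": 7}]',
                '$[*]' COLUMNS (
                   name VARCHAR(20) PATH '$.name',
                   qty  INT PATH '$.qty'
                )
              ) AS jt;

Each element of the array becomes a row, so the JSON data can then be joined, filtered, and aggregated like any other table.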
Updating Data
Updating data can be expensive and JSON data can be expansive.
MySQL:
MySQL 5.7 required a complete rewrite of the document.  If this is something you do a lot, then you need to consider upgrading to MySQL 8.
In MySQL 8.0, the optimizer can perform a partial, in-place update of a JSON column instead of removing the old document and writing the new document in its entirety to the column. But there are conditions to this:
  1. It has to be a JSON column.
  2. The UPDATE statement uses one of the three functions JSON_SET(), JSON_REPLACE(), or JSON_REMOVE() to update the column; a direct assignment like UPDATE mytable SET jcol = '{"a": 10, "b": 25}' does not qualify.
  3. The input column and the target column must be the same column.
  4. All changes replace existing array or object values with new ones, and do not add any new elements to the parent object or array.
  5. The value being replaced must be at least as large as the replacement value. In other words, the new value cannot be any larger than the old one (an exception to this requirement occurs when a previous partial update has left sufficient space for the larger value; you can use the JSON_STORAGE_FREE() function to see how much space has been freed by any partial update).
If you can follow those rules, the optimizer will do partial rewrites.
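As a small illustration of rule 2 and of JSON_STORAGE_FREE() (my own sketch reusing the hypothetical mytable/jcol names from above; the id column is equally made up):

-- Eligible for a partial, in-place update: JSON_SET() replacing an existing,
-- no-larger value in the same column.
UPDATE mytable SET jcol = JSON_SET(jcol, '$.a', 9) WHERE id = 1;

-- How much space previous partial updates have freed in the stored document:
SELECT JSON_STORAGE_SIZE(jcol) AS total_bytes,
       JSON_STORAGE_FREE(jcol) AS freed_bytes
  FROM mytable
 WHERE id = 1;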
MariaDB:
I searched the MariaDB docs and Jira but found no mention of partial JSON column updates.  If anyone has links, please send them to me. So it appears that MariaDB does a full rewrite.
Conclusion:  MySQL is more efficient here.
Replication

Efficient replication is a must, and that goes double for JSON, with the potential of very large document payloads having to cross network connections.
MySQL:
In MySQL 5.7 an update to a JSON column was written to the binary log as the complete document. In MySQL 8.0, it is possible to log partial updates to JSON documents. In statement-based replication, JSON partial updates are always logged as partial updates.
But in row-based replication they are logged as complete documents unless you enable the logging of partial updates by setting binlog_row_value_options=PARTIAL_JSON. Please note that when the replication master has this variable set, partial updates received from that master are handled and applied by a replication slave regardless of the slave's own setting for the variable.
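For reference, enabling it could look like this (the variable name and value come from the paragraph above; whether you set it dynamically, per session, or in my.cnf is up to you, and you should check the documentation for your exact version):

SET GLOBAL binlog_row_value_options = 'PARTIAL_JSON';
SHOW GLOBAL VARIABLES LIKE 'binlog_row_value_options';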

MariaDB:

I searched the MariaDB docs and Jira but found no mention of partial JSON Replication updates.  If anyone has links, please send them to me. So expect the full document to be replicated.

Conclusion: MySQL can be more efficient.

X DevAPI / Document Store

Only MySQL has the X DevAPI and the underlying Document Store, and it is proving to be very popular with customers.  The ability to use a database as both a NoSQL document store and a relational database is popular.  Not having embedded strings in their code, and getting help from their IDEs, is driving developers to this approach. 

MariaDB’s CTO said at Zendcon that they will support it if customers demand it.

Conclusion:  MySQL is the sole choice here.


Overall Conclusion

I have been using MySQL's JSON data type since the 5.7 DMRs and know it well.  MariaDB's implementation seems very familiar and works as expected.  MySQL is superior in partial updates of data and replication, functions such as JSON_TABLE, the X DevAPI, and bulk loading of data. 

Extra
The IETF's standard is here and makes pretty quick reading.

MySQL Replication Data Recovery using 'mysqlbinlog' - Part II

MySQL Replication Data Recovery using 'mysqlbinlog' - Part II

The previous post (PART-I)
http://mysqlhk.blogspot.com/2018/10/mysql-replication-recovery-from-binlog.html

It describes replication recovery from the binlog by treating the binlog files as relay logs.  With the relay log mechanism, when the server starts up, the SQL_THREAD applier applies the data to the database.    Check the Part I post for details.

Part II is about using the MySQL utility "mysqlbinlog" to dump the content from binlog files and apply the SQL to the Database.

Documentation
https://dev.mysql.com/doc/refman/8.0/en/mysqlbinlog.html
https://dev.mysql.com/doc/refman/8.0/en/point-in-time-recovery.html

The following sections describe the tutorial for Replication Data Recovery using 'mysqlbinlog'. 

The tutorial includes the following topics (note: using GTID replication):
1. Initialize the MySQL instances (3316, 3326) and configure replication (Master and Slave)
2. Create a database and table (test1.mytable1) with 16 rows
3. Stop the IO_THREAD on the Slave Server to simulate the situation where data cannot be transported to the relay log on the Slave Server.  Create more data on the Master and use "FLUSH LOGS;" to switch log files on the Master Server.
4. Check the replication status on the Slave Server for the last retrieved position and file on the Master.
5. Use the 'mysqlbinlog' utility to extract SQL from the Master's binlog files, starting from the position and file registered in the slave status.
6. Apply the SQL to the Slave Server to recover the data and check the slave status

7. Resume replication: "START SLAVE IO_THREAD FOR CHANNEL '<channel name>';"

Installation & Configuration

Assumptions:
1. basedir = /usr/local/mysql   : The MySQL Binary Package
2. datadir = /home/mysql/data/server1 : The Server 1 Data Directory
3. datadir = /home/mysql/data/server2 : The Server 2 Data Directory
4. Using GTID and Channel for the replication setup
5. OS user for executing the commands - OS user: 'mysql' (do not use root)
6. Configuration Files   (/home/mysql/data/my1.cnf & /home/mysql/data/my2.cnf)

/home/mysql/data/my1.cnf
[mysqld]
server-id=1
datadir=/home/mysql/data/server1
basedir=/usr/local/mysql

port=3316
socket=/home/mysql/data/server1/mysqld.sock

mysqlx-port=33160
mysqlx-socket=/home/mysql/data/server1/mysqlx.sock

log-error=/home/mysql/data/server1/mysqld.error
log-bin=mysqllog
relay-log=relay.bin

gtid-mode=on
enforce-gtid-consistency
master_info_repository=TABLE
relay_log_info_repository=TABLE


/home/mysql/data/my2.cnf
[mysqld]
server-id=2
datadir=/home/mysql/data/server2
basedir=/usr/local/mysql

port=3326
socket=/home/mysql/data/server2/mysqld.sock

mysqlx-port=33260
mysqlx-socket=/home/mysql/data/server2/mysqlx.sock

log-error=/home/mysql/data/server2/mysqld.error
log-bin=mysqllog
relay-log=relay.bin

gtid-mode=on
enforce-gtid-consistency
master_info_repository=TABLE
relay_log_info_repository=TABLE


STEP 1 : Initialize MySQL Instances and Configure Replication

Initialize Data Folder for 2 instances

# /usr/local/mysql/bin/mysqld --defaults-file=/home/mysql/data/my1.cnf --initialize-insecure
# /usr/local/mysql/bin/mysqld --defaults-file=/home/mysql/data/my2.cnf --initialize-insecure

Start up MySQL Instances
# /usr/local/mysql/bin/mysqld --defaults-file=/home/mysql/data/my1.cnf &
# /usr/local/mysql/bin/mysqld --defaults-file=/home/mysql/data/my2.cnf &

 Configure Replication User on Master and Slave
# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316 -e "
drop user if exists repl@'localhost';
create user repl@'localhost' identified with mysql_native_password by 'repl';
grant replication slave on *.* to repl@'localhost';
"

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "
drop user if exists repl@'localhost';
create user repl@'localhost' identified with mysql_native_password by 'repl';
grant replication slave on *.* to repl@'localhost';
"

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316 << EOL3316
reset master;
reset slave;
EOL3316

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 << EOL3326
reset master;
reset slave;
EOL3326

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 << EOL1

change master to
master_host='127.0.0.1',
master_user='repl',
master_password='repl',
master_port=3316,
master_auto_position=1
for channel 'channel1';

start slave for channel 'channel1';

show slave status for channel 'channel1'\G

EOL1

STEP 2 : Create Database and Table (test1.mytable1) with 16 rows of data

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316

mysql> create database if not exists test1;
mysql> create table if not exists test1.mytable1 (f1 int not null auto_increment primary key, f2 varchar(20));

mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');



The data should now be replicated to the Slave.  To show the data on the Slave and check the replication status, execute:

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "select count(*) from test1.mytable1;"

 /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "show slave status for channel 'channel1'\G"

STEP 3 : Stop the IO_THREAD on Slave Server

This is to simulate the connection between master and slave being disconnected, so there are no more new updates to the relay log on the Slave Server.   To do this:

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "stop slave IO_THREAD for channel 'channel1';"

Create 16 more rows on the Master Server; they are not replicated to the Slave Server.

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316

mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');

At this point, Master Server has 32 rows.  However the Slave Server has ONLY 16 rows.

Executing "FLUSH LOGS" to flush the binlog and switch to new logs.  Inserting 16 more rows.
# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316

mysql> FLUSH LOGS;
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');

mysql> FLUSH LOGS;
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');

After executing "FLUSH LOGS;" twice, three binlog files have been created in the datadir "/home/mysql/data/server1" of the Master Server (the list can also be cross-checked from the Master itself, as shown after the file list below).  They are
mysqllog.000001
mysqllog.000002
mysqllog.000003
mysqllog.index
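As an optional cross-check (a minimal sketch; it reuses the same Master connection as above), the Master can list its own binlog files:

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316 -e "show binary logs;"

The output should list mysqllog.000001 through mysqllog.000003 together with their sizes.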

Number of ROWS on MASTER : 48
Number of ROWS on SLAVE : 16

***************************************************************************************************************
***  To recover the DATA from BINLOG of the Master Server to Slave Server   
***************************************************************************************************************

STEP 4 : Check the status on the Slave Server to find the last binlog position read and the last binlog file on the Master Server that was being used.

Assuming we have the binlog files from the MASTER's datadir "/home/mysql/data/server1":
-- mysqllog.000001
-- mysqllog.000002
-- mysqllog.000003
-- mysqllog.index

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "show slave status\G"
E.g. the output of the command may look like this:

 *************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 3316
                Connect_Retry: 60
              Master_Log_File: mysqllog.000001
          Read_Master_Log_Pos: 5422
 
              Relay_Log_File: relay-channel1.000002
                Relay_Log_Pos: 5634
        Relay_Master_Log_File: mysqllog.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
...

...
       Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18
            Executed_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: channel1
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0

 **************************************************
The Master Log File : mysqllog.000001
and Master Log Pos : 5422

This is the last position that the Slave IO_THREAD has read from the Master binlog file.
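For convenience (a minimal sketch, not required for the procedure), the two fields can be pulled out of the status output directly:

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "show slave status for channel 'channel1'\G" | grep -E "Master_Log_File|Read_Master_Log_Pos"

Note that the pattern also matches Relay_Master_Log_File; the value to use is the one labelled Master_Log_File.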

STEP 5 : Use 'mysqlbinlog' utility to extract SQL from MASTER Binlog files
Note : There are 3 binlog files from datadir "/home/mysql/data/server1"
-- mysqllog.000001
-- mysqllog.000002
-- mysqllog.000003
-- mysqllog.index

The command must include ALL binlog files, starting from the one indicated by the Slave Status field "Master_Log_File".

# cd /home/mysql/data/server1/
# /usr/local/mysql/bin/mysqlbinlog --start-position=5422 mysqllog.000001 mysqllog.000002 mysqllog.000003 > /tmp/my.sql

The output '/tmp/my.sql' contains ALL SQL statements and data after the transaction at position = 5422.  Listing mysqllog.000001 FIRST in the command is important, because it is the Master Log File reported in the Slave Status.
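Before applying the file, it can be useful to peek at what was extracted (a minimal sketch; the GTID_NEXT lines appear because GTID mode is enabled in this setup):

# head -n 30 /tmp/my.sql
# grep -c "SET @@SESSION.GTID_NEXT" /tmp/my.sql

The grep count gives a rough idea of how many transactions will be replayed on the Slave.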

STEP 6 : Apply the SQL to the Slave Server and Check the Slave Status

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 < /tmp/my.sql

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "show slave status\G"
E.g. the output of the command may look like this:

*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 3316
                Connect_Retry: 60
              Master_Log_File: mysqllog.000001
          Read_Master_Log_Pos: 5422
               Relay_Log_File: relay-channel1.000002
                Relay_Log_Pos: 5634
        Relay_Master_Log_File: mysqllog.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
...
...
           Retrieved_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18
            Executed_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-54
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: channel1
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0

************************************************************
Comparing this output with the previous Slave Status, the Master_Log_File and Read_Master_Log_Pos do not change.  But the Retrieved_Gtid_Set and Executed_Gtid_Set indicate that the extra transactions from the Master have now been applied.

BEFORE data recovery from BINLOG
           Retrieved_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18
            Executed_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18

AFTER data recovery from BINLOG 
           Retrieved_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18
            Executed_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-54

The outcome of the recovery is that the missing transactions from the Master have been applied to the Slave.
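To confirm this on the Slave itself (a minimal sketch; it reuses the Slave connection from the steps above), query the executed GTID set and the row count directly:

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "select @@global.gtid_executed; select count(*) from test1.mytable1;"

The GTID set should match the Executed_Gtid_Set shown above, and the row count should now equal the count on the Master.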




STEP 7 :  Resume Replication

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "START SLAVE IO_THREAD FOR CHANNEL 'channel1';"

Because the channel uses GTID auto-positioning (Auto_Position: 1), the Slave requests only transactions it has not yet executed, so the transactions already applied from the extracted binlog SQL are not fetched again.  Create 16 extra rows on the MASTER and check the data on the SLAVE:
# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316

mysql> FLUSH LOGS;
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');
mysql>insert into test1.mytable1 (f2) values ('aaaaaaaaaaaaaaa');


Check the Slave Status
# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "show slave status\G"
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 3316
                Connect_Retry: 60
              Master_Log_File: mysqllog.000003
          Read_Master_Log_Pos: 10689
               Relay_Log_File: relay-channel1.000003
                Relay_Log_Pos: 5700
        Relay_Master_Log_File: mysqllog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
...
           Retrieved_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18:55-72
            Executed_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-72
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: channel1
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0
*********************************************************************
The Replication Channel has been resumed.  The IO and SQL threads are up and running:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

The Master_Log_File and Read_Master_Log_Pos now show the new binlog file name and position:
               Master_Log_File: mysqllog.000003
          Read_Master_Log_Pos: 10689

The Retrieved_Gtid_Set and Executed_Gtid_Set on the Slave Server have been updated with the latest information:
           Retrieved_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-18:55-72
            Executed_Gtid_Set: 98424208-e3f5-11e8-bf4f-0800271b198a:1-72


The Master and Slave Servers are UP and RUNNING.
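As a final sanity check (a minimal sketch; it simply repeats the count comparison used earlier), the row counts on the Master and the Slave should now be identical:

# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3316 -e "select count(*) from test1.mytable1;"
# /usr/local/mysql/bin/mysql -uroot -h127.0.0.1 -P3326 -e "select count(*) from test1.mytable1;"

Both commands should return the same number.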


Oracle Recognizes Percona Fixes in MySQL 8.0


An Oracle engineer thanked two Percona engineers by name, along with engineers from Facebook and elsewhere, for their recent MySQL 8.0 source code contributions. Oracle incorporated their work into its latest MySQL production release (8.0.13).

Percona’s Zsolt Parragi authored a patch for a rare replication bug that left locked mutexes in production builds following debug crashes (bug #89421). Yura Sorokin authored a patch to fix wrong file I/O statistics in the MySQL Performance Schema (bug #90264). Percona CTO Vadim Tkachenko cited both patches as examples of Percona’s continuing contributions to the open source community, one of Percona’s core ideals since the company’s founding in 2006.

In the past three years alone, Percona has reported over 600 bugs in the MySQL server.  In most of these bug reports, Percona provided Oracle engineers with reproducible test cases, along with detailed stack traces and other information useful for analyzing and fixing the bug.  During that same period, Oracle accepted at least 20 patches authored by Percona engineers into its MySQL code base.

Over its 12-year history, Percona engineers have created numerous open source projects that have won widespread community adoption.  These include Percona Server for MySQL, an enhanced version of the flagship MySQL database; Percona XtraDB Cluster, a high-availability database solution; Percona Server for MongoDB®, an enhanced fork of the MongoDB® database; Percona XtraBackup, a database backup tool; Percona Toolkit, a suite of utilities for database administrators; and, most recently, Percona Monitoring and Management (PMM), a GUI tool providing visibility into database performance.
