Stockholm, Seoul, Tokyo

MySQL

Percona sessions at the MySQL conference

Planet MySQL - Mon, 2010/03/15 - 11:10pm

Many Percona employees will be at the 2010 MySQL conference. We’ll be giving a lot of informative technical talks on various topics. Here’s a list:

In addition, you’ll be able to visit us at our booth in the expo hall, where you can play Stump The Experts with your tough problems. And Daniel Nichter will be staffing the Maatkit booth, so you can ask him all about Maatkit. If Maatkit is of interest to you, you might also want to attend the Maatkit BoF session.

We look forward to seeing you there. It’s going to be a great conference!

Entry posted by Baron Schwartz | No comment

Add to: | | | |


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Denormalization - Examples Needed

Planet MySQL - Mon, 2010/03/15 - 4:16pm
Dear MySQL Community

I would like to ask for your help as I am writing a report for use cases for denormalization.
Could you please add a comment if you have used denormalization in the past to help solve a certain problem. If you could add what was the problem, how did you apply denormalization and how effective (or not) was it?

Thank you in advance.
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Dbspj update, teaser for UC

Planet MySQL - Mon, 2010/03/15 - 3:07pm
5 months more...

reminder, what is Dbspj:
- It's a new feature for MySQL Cluster
- It gives the possibility to push-down SQL joins, i.e to evaluate the joins inside the data nodes.

latest and greatest:
- last 5 months spent on SQL integration
- big effort having it used only for cases that we actually support
- lots of testing using RQG
- a significant part of bugs found last couple weeks, has not been related to spj, but in fact "ordinary" optimizer bugs
- we think that it's quite usable now.

and the numbers:
- my plan was to present TPC-W as "realistic" numbers
- now they look more like fantastic numbers
- if i could, i would add a "25x" to my title
- visit my UC presentation (link) to learn more

and fyi: we also plan to provide a feature preview source (or binary) release for



(i will of course also disclose all information provided in UC presentation after the event)
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Percona-XtraDB version 9.1

Planet MySQL - Mon, 2010/03/15 - 5:13am

Dear Community,

We are announcing today the new version 9.1 of XtraDB storage engine.

The name of binaries has changed to Percona-XtraDB. It is applicable to all packages including RPM, DEB and tar.gz packages.

New features in version 9.1:

  • MySQL 5.1.43 is taken as the basis
  • packages name changed to Percona-XtraDB
  • Enabled support of SSL
  • Enabled profiling
  • Added script to sort LRU dump
  • New supported platforts are added. The full list includes:
    • CentOS 5 (x86_64 and i386)
    • CenOS 4 (x86_64 and i386)
    • Debian lenny (x86_64 and i386)
    • Debian etch (x86_64 and i386)
    • Ubuntu  Jaunty (x86_64 and i386)
    • Ubuntu Intrepid (x86_64 and i386)
    • Ubuntu Hardy (x86_64 and i386)
    • FreeBSD 8 (x86_64 and i386)
    • OpenSolaris (x86_64)

Fixed bugs:

  • Bug #506894: buf_flush_LRU_recommendation() is too optimistic
  • Fixed mysql-tests:
    • mysql
    • mysql_upgrade
    • ssl tests
    • enabled rpl_killed_ddl and innodb-autoinc tests

Percona-XtraDB obsoletes mysql-server packages, so upgrade is pretty straightforward.
Instead of updating currently installed packages Percona-XtraDB should be installed. It will replace currently installed mysql-server, mysql-client, etc.

Centos platform.
1. If you didn't install yet Percona YUM repo, please do it. It is available both for x86_64 and i386.

2. Install Percona-XtraDB

PLAIN TEXT CODE:
  1. # yum install Percona-XtraDB-server Percona-XtraDB-client

Debian platform.

1. Add Percona repository to the sources.list

PLAIN TEXT CODE:
  1. deb http://repo.percona.com/apt lenny main
  2. deb-src http://repo.percona.com/apt lenny main

Instead of "lenny" put the name of your distribution.
2. Update the repository with "apt-get update"
3. Install Percona-XtraDB package

PLAIN TEXT CODE:
  1. # apt-get install percona-xtradb-server percona-xtradb-client

The binaries for supported platforms are located on http://www.percona.com/percona-builds/Percona-XtraDB/Percona-XtraDB-5.1.43-9.1/ .
The latest source code of XtraDB, including development branch you can find on LaunchPAD.

Please report any bugs found on Bugs in Percona XtraDB server.
For general questions use our Pecona-discussions group, and for development question Percona-dev group.

For support, commercial and sponsorship inquiries contact Percona

Entry posted by Aleksandr Kuzminsky | 8 comments

Add to: | | | |


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Response to Community Feedback

Planet MySQL - Mon, 2010/03/15 - 4:11am

In our first alpha of InfiniDB 1.1, we’ve responded to a number of feedback items from our community, and we’d like to thank those of you who have been working with us on these and other things. Specifically, two recurring comments from the community, with respect to our 1.0 version, were: (1) CREATE TABLE statements for wide tables sometimes take a long time to complete; (2) The amount of space used by empty and/or small tables is excessive.


First, a little background as tRead More...
PlanetMySQL Voting: Vote UP / Vote DOWN

Categories: MySQL

Postgres at MySQL Conference?

Planet MySQL - Mon, 2010/03/15 - 3:49am
During the MySQL conference Call for Papers there was some talk of getting one or two Postgres sessions into the mix, as a lot of MySQL users seem to have questions about Postgres these days. Alas, looking through the MySQLcon schedule I don't see any on there. I've also looked through the BOF's and nothing about Postgres to be found there either. So, maybe no one is interested in Postgres after all.

However I held a Postgres BOF at MySQLcon last year and we got a handful of people, and since I am going to be at MySQLcon again this year, I might as well host one again. I think it's too late to schedule one formally, but I can put some info on the schedule sheets once I'm at the conference; if you are interested in learning some more about Postgres, please keep an eye out.
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Thoughts on Thoughts on Drizzle :)

Planet MySQL - Mon, 2010/03/15 - 2:25am

Mark has some good thoughts on drizzle. I think they’re all valid… and have some extra thoughts too:

“I have problems to solve today”. This is (of course) an active concern in my brain… If we don’t have something out that solves some set of problems with reasonable stability and reliability (and soon), then we are failing. I feel we’re getting there, and will have a solid foundation to build upon.

Drizzle replication, MySQL replication: “I can’t compare the two until Drizzle replication is running in production.“. Completely agree. We need to only say replication is stable and reliable when it really is. Realistic test suites are needed. Very defensive programming of the replication system is needed (you want to know when something has gone wrong). We also need to have it constantly be verifying the right thing is going on. We want our problems to be user visible, not silent and invisible. Having high standards will hopefully pay off when people start running it in production….

3 byte int: “Does this mean that some of my tables will grow from 3GB to 4GB on disk?” I think we’re moving the responsibility down to the engines. The 3 byte int type says two things: use less storage space, limit the maximum value. Often you want the former, not the latter. There are many ways to more efficiently pack integers for storage when they are usually smaller than the maximum you want. The protobuf library does a good job of it.

I think it is the job of storage engines to do better here. Once you’re in memory, 3 byte numbers are horrible to work with.. copy out of a row buffer, convert into a 32bit number and then do foo. Modern CPUs favor 32 or 64bit alignment of data a *lot*. 3byte numbers do not align to 32 or 64bits very well… making things much slower for the common case of using cached data.

“I need stored procedures. They are required for high-performance OLTP as they minimize transaction duration for multi-statement transactions.” The reduction of network round trips is always crucial. I think a lot of round trips could go away if you could issue multiple statements at once (not via semicolon separating them, by protocol awesomeness).

There should be a way to send a set of statements that should be executed. There should also be a way to specify that if no error occurred, commit. This could then be (in the common case) a single round trip to the database. You then only have to make round-trips when what statement to issue next depends on the result of a previous one. The next step being to reduce these round trips… which can either be solved by executing something inside the database server (e.g. stored procedures) or something closer to the database server so that the round trips aren’t as large. This would be where Gearman enters.

I’m interested to see where these two approaches (issuing in batches and executing closer to the DB server) fall down… I know that latency may not be as good… but throughput should be a lot better.

I take heart with “I have yet to use them in MySQL” though. I have my own theories as to why this is… my biggest thought is that it’s because the many, many programmers writing SQL that Mark sees aren’t SQL Stored Procedure programmers. They spend their days in a couple of languages (maybe Perl, Python, PHP, Java, C, C++) and never programmed SQL:2003 Stored Procedures and it just doesn’t come as quickly (or as bug free) as writing code in the languages you use every day.

“Long Running insert, update and delete statements consume too many resources in InnoDB.” I wonder if this desire for MyISAM could be filled by PBXT or BlitzDB? The main reason that MyISAM is currently a temporary table only engine is that MyISAM and the server core were never that well separated.

My ultimate wish is that all engine authors take the approach of that there is an API to their engine and the Storage Engine is merely glue between the database server and their API.

The BlitzDB engine has this, Innobase partially does (and my Embedded InnoDB work goes the whole way) and MySQL Cluster is likely the oldest example.

As a side note, the BlitzDB plugin should go into the main Drizzle tree fairly soon. One of the joys of having an optional plugin that doesn’t touch the core of the server is that we can do this without much worry at all.

“Does Drizzle build on Windows?” Well… no. Funnily enough though, it is increasingly easy to make a Windows port. All the platform specific things are increasingly just plugins. The build system is a sticker… and no, we’re not going to switch to CMake. The C stands for something, and it’s something that even I may not print here… (I had never thought that being able to open up automake generated Makefiles and look at them would be a feature).

This next Drizzle milestone release should be exciting though…

I look forward to having Drizzle widely deployed and relied upon… I think we’ll do well..


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

MySQL Cluster 7.0

Planet MySQL - Mon, 2010/03/15 - 1:00am
MySQL Cluster 7.0 (7.0.13 GA, published on Monday, 15 Mar 2010)
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

15 months – And it is done

Planet MySQL - Sun, 2010/03/14 - 11:28pm
Finally: Not quite 65 million years in the making (Jurassic Park, hint, hint), but it took about 15 months to get my first book to the printer. A few days ago Udo – my co-author – and I approved the final version of the MySQL Admin Cookbook for publishing. From what I see the book has not been added consistently to the online book stores around the net, but I will most certainly put links on here
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Writing another book: Pentaho Kettle Solutions

Planet MySQL - Sun, 2010/03/14 - 9:35pm
Last year, at about this time of the year, I was well involved in the process of writing the book Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL" for Wiley. To date, "Pentaho Solutions" is still the only all-round book on the open source Pentaho Business Intelligence suite.

It was an extremely interesting project to participate in, full of new experiences. Although the act of writing was time consuming and at times very trying for me as well as my family, it was completely worth it. I have none but happy memories of the collaboration with my full co-author Jos van Dongen, our technical editors Jens Bleuel, Jeroen Kuiper, Tom Barber and Tomas Morgner, several of the Pentaho Developers, and last but not least, the team at Wiley, in particular Robert Elliot and Sara Shlaer.

When the book was finally published, late August 2009, I was very proud - as a matter of fact, I still am :) Both Jos and I have been rewarded with a lot of positive feedback, and so far, book sales are meeting the expectations of the publisher. We've had mostly positive reviews on places like Amazon, and elsewhere on the web. I'd like to use this opportunity to thank everybody that took the time to review the book: Thank you all - it is very rewarding to get this kind of feedback, and I appreciate it enourmously that you all took the time to spread the word. Beer is on me next time we meet :)

Announcing "Pentaho Kettle Solutions"
In the autumn of 2009, just a month after "Pentaho Solutions" was published, Wiley contacted Jos and me to find out if we were interested in writing a more specialized book on ETL and data integration using Pentaho. I felt honoured, and took the fact that Wiley, an experienced and well-reknowned publisher in the field of data warehousing and business intelligence, voiced interested in another Pentaho book by Jos an me as a token of confidence and encouragement that I value greatly. (For Pentaho Solutions, we heard that Wiley was interested, but we contacted them.) At the same time, I admit I had my share of doubts, having the memories of what it took to write Pentaho Solutions still fresh in my mind.

As it happens, Jos and I both attended the 2009 Pentaho Community Meeting, and there we seized the opportunity to talk to Matt Casters, chief Pentaho Data Integration and founding developer of Kettle (a.k.a. Pentaho Data Integration). Both Jos and I didn't expect Matt to be able to free up any time in his ever busy schedule to help us to write the new book. Needless to say, he made us both very happy when he rather liked the idea, and expressed immediate interest in becoming a full co-author!

Together, the three of us made a detailed outline and wrote a formal proposal for Wiley. Our proposal was accepted in December 2009, and we have been writing since, focusing on the forthcoming Kettle version, Kettle 4.0 . The tentative title of the book is Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration. It is planned to be published in September 2010, and it will have approximately 750 pages.



Our working copy of the outline is quite detailed but may still change in the future, which is why I won't publish it here until we finished our first draft of the book. I am 99% confident that the top level of the outline is stable, and I have no reservation in releasing that already:

  • Part I: Getting Started

    • ETL Primer

    • Kettle Concepts

    • Installation and Configuration

    • Sample ETL Solution


  • Part II: ETL Subsystems

    • Overview of the 34 Subsystems of ETL

    • Data Extraction

    • Cleansing and Conforming

    • Handling Dimension Tables

    • Fact Tables

    • Loading OLAP Cubes


  • Part III: Management and Deployment

    • Testing and Debugging

    • Scheduling and Monitoring

    • Versioning and Migration

    • Lineage and Auditing

    • Securing your Environment

    • Documenting


  • Part IV: Performance and Scalability

    • Performance Tuning

    • Parallization and Partitioning

    • Dynamic Clustering in the Cloud

    • Realtime and Streaming data


  • Part V: Integrating and Extending Kettle

    • Pentaho BI Integration

    • Third-party Kettle Integration

    • Extending Kettle


  • Part VI: Advanced Topics

    • Webservices and Web APIs

    • Complex File Handling

    • Data Vault Management

    • Working with ERP Systems



Feel free to ask me any questions about this new book. If you're interested, stay tuned - I will probably be posting 2 or 3 updates as we go.
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

O’Gara Cloud Computing Article Off Base

Planet MySQL - Sun, 2010/03/14 - 8:30pm

Maureen O’Gara, self-described as “the most read technology reporter for the past 20 years”, has written an article about Drizzle at Rackspace for one of Sys-con’s online zines called Cloud Computing Journal, of which she is an editor.

I tried commenting on Maureen’s article on their website, but the login system is apparently borked, at least for registered users who use OpenID, which it wants to still have a separate user ID and login. Note to sys-con.com: OpenID is designed so that users don’t have to remember yet another login for your website.

Besides having little patience for content-sparse websites that simply provide an online haven for dozens of Flash advertisements per web page, the article had some serious problems with it, not the least of which was using large chunks of my Happiness is a Warm Cloud article without citation. Very professional.

OK, to start with, let’s take this quote from the article:

Drizzle runs the risk of not being as stable as MySQL, because the Drizzle team is taking things out and putting other stuff in. Of course it may be successful in trying to create a product that’s more stable than MySQL. But creating a stable DBMS engine is something that has always taken years and years.

This is just about the most naïve explanation for whether a product will or will not be stable that I’ve ever read. If Maureen had bothered to email or call any one of the core Drizzle developers, they’d have been happy to tell her what is and is not stable about Drizzle, and why. Drizzle has not changed the underlying storage engines, so the InnoDB storage engine in Drizzle is the same plugin as available in MySQL (version 1.0.6).

The pieces of MySQL which were removed from Drizzle happen to be the parts of MySQL which have had the most stability issues — namely the additional features added to MySQL 5.0: stored procedures, views, triggers, stored functions, the INFORMATION_SCHEMA implementation, and server-side cursors and prepared statements. In addition to these removed features of MySQL, Drizzle also has no built-in Query Cache, does not support anything other than UTF-8 character sets, and has removed the MySQL replication system and binary logging — moving a rewrite of these pieces out into the plugin ecosystem.

The pieces that were added to Drizzle have mostly been done by adding plugins that provide functionality. Maureen, the reason this was done was precisely to allow for greater stability of the kernel by segregating new features and functionality into the plugin ecosystem, where they can be properly versioned and quarantined, therefore increasing kernel stability. It’s pretty much the biggest principle of Drizzle’s design…

The core developers of Drizzle (and much of the Drizzle community) would also have been happy to tell Maureen how the Drizzle team defines “stability”: when the community says Drizzle is stable — simple as that.

OK, so the next thing I took objection to is the following line:

Half of Rackspace’s customers are on MySQL so there’ll be some donkey-style nosing to get them to migrate.

I think my Rackspace colleagues might have quite a bit to say about the above. I haven’t seen any Rackers talking about mass migration from MySQL to Drizzle. As far as I have seen, the plan is to provide Drizzle as an additional service to Rackspace customers.

Rackspace evidently wants its new boys, who were not the core pillars of the MySQL engineering team, to hitch MySQL, er, Drizzle to Cassandra

MySQL != Drizzle. Implying that the two are equal do a disservice to both, as they have very different target markets and developer audiences.

The smart money is betting that even if a good number of high-volume web sites go down this route, an even higher number such as Facebook and Google will continue with relational databases, primarily MySQL.

Again, probably best to do your homework on this one, too. Facebook runs an amalgamation of a custom MySQL version and storage engines, distributed key-value stores, and Memcached servers. I would think that Facebook moving to Drizzle would be one tough migration. Thousands (tens of thousands?) of MySQL servers all running custom software and integrated into their caching layers is a huge barrier to entry, and not one I would expect a large site like Facebook to casually undertake. But, the same could be said about a move to SQL Server or Oracle, for that matter, and has little to do with Drizzle.

Google is moving away from using MySQL entirely. Mark Callaghan, previously at Google, has moved over to Facebook (possibly because of this trend at Google to get rid of MySQL), and Anthony Curtis, formerly of MySQL, then Google, left Google partially because of this reason.

OK, so the next quote got me really fired up because it demonstrates a complete lack of understanding (maybe not Maureen’s, but the unnamed source it’s from at least):

Somebody – sorry we forget who exactly – claimed that as GPL 2 code Drizzle “severely limits revenue opportunities. For Rackspace, the opportunity to have some key Drizzle developers on its payrolls basically comes down to a promotional benefit, trying to position Rackspace as particularly Drizzle-savvy in the eyes of the community and currying favor for its seemingly generous contributions. What’s unclear is whether they may develop some Drizzle-related functionality that they will then not release as open source and just rent out to Rackspace hosting customers…that would be a way for them to differentiate themselves from competitors and GPLv2 would in principle allow this.”

A few points to make about the above quote.

First, name your source. I find it difficult to believe that the most-read technology writer would not write down a source. Is it the same person you deliberately left out of a quote from my Happiness article? (why did you do that, btw?).

Second, the MySQL server source code is licensed under the GPL 2, and so is Drizzle’s kernel, because it is a derivative work of the MySQL server.

Let me be clear: Developers who contribute code to Drizzle do so under the GPLv2 if that contribution is in the Drizzle kernel. If the code contribution is a plugin, the contributor is free to pick whatever license they choose.

Third, licensing has little if anything to do with revenue at all. The license is besides the point. There are two things which dictate the company’s revenue derivation from software:

  1. Copyright ownership
  2. Principles of the Company

Drizzle, Rackspace, or any company a Drizzle contributor works for, does not have the copyright ownership of the MySQL source code, from which Drizzle’s kernel is derived. Oracle does. Therefore, companies do not have any right to re-sell Drizzle (under any license) without explicit permission from Oracle. Period. Has nothing to do with the GPLv2.

That said, contributors do have the right to make money on plugins built for the Drizzle server, and Rackspace, while not having expressed any interest to yours truly in doing so, has the right like any other Drizzle contributor, to make money on plugins its contributors create for Drizzle.

It is my knowledge (after actually having talked to Rackspace managers and decision makers), that Rackspace is not interested in getting into the business of selling commercial Drizzle plugins. Their core direction is to create value for their customers, and I fail to see how getting into the commercial software sales business meets that goal.

Next time, please feel free to contact myself or any other Drizzle contributor to get the low-down on Drizzle-related stuff. We’ll be nice. I promise.


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

[MySQL][Spider][VP]Spider-2.16 Vartical Partitioning-0.9 released

Planet MySQL - Sun, 2010/03/14 - 8:04pm
I'm pleased to announce the release of Spider storage engine version 2.16(beta) and Vertical Partitioning storage engine version 0.9(beta).
Spider is a Storage Engine for database sharding.
http://spiderformysql.com/
Vertical Partitioning is a Storage Engine for vertical partitioning for a table.
http://launchpad.net/vpformysql

The main changes in this version are following.
(This release for Vertical Partitioning is a bug fix release)
- Add table parameter "semi_split_read".
- Add server parameter "spider_semi_split_read".
  This parameters are for searching performance improvement.

Please see "99_change_logs.txt" in the download documents for checking other changes.

Enjoy!
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Thoughts on Drizzle

Planet MySQL - Sun, 2010/03/14 - 7:16pm
I wish the case for Drizzle could be made without bashing MySQL. Sometimes it is, but too often it isn't. I guess this is karma.

This isn't a rant against Drizzle. This is a rant against pulling up Drizzle by pushing down MySQL. I occasionally have negative things to say about MySQL, but I usually say them to get the problems fixed. We have lots of complaints about MySQL because we use it in production.

What have I learned about the Drizzle vision?
  • Drizzle will re-think everything
    • Alas, I have problems to solve today. While I am passionate about doing things correctly, I am also aware that compromises must be made to get things done. Some of those compromises turn out to be mistakes. It isn't always possible to know which compromises will turn out to be a mistake. Nor is it always possible to identify the right thing.
  • You hate MySQL replication? You now love Drizzle
    • I love what Drizzle might do for replication. I love what MySQL is doing with replication. I can't compare the two until Drizzle replication is running in production.
  • MySQL has a data type called a 3-byte integer. Think about that for a moment. On today’s server hardware that does not make a whole lot of sense
    • I thought about it. Does this mean that some of my tables will grow from 3G to 4G on disk? I won't be happy if that is the result.
  • No triggers or stored procedures. That stuff is bloat as done in MySQL, and Drizzle has other ways to deal with these needs. These capabilities can be added in later as needed such that they are done right. 
    • I need stored procedures. They are required for high-performance OLTP as they minimize transaction duration for multi-statement transactions. Alas, I have yet to use them in MySQL.
  • MyISAM is gone. Long live the Queen! 
    • Alas, I need MyISAM. Long-running insert, update and delete statements consume too many resources in InnoDB. Such statements are used for reporting jobs on slaves and in that case I want to use InnoDB for production tables and MyISAM for transient tables.
  • Ever tried to compile MySQL from source. Hah! Yeah, drizzle builds like butter.
    • I have no problems building MySQL from source. I have had more problems building Drizzle because it has a few more dependencies (google protobufs, libdrizzle). But both are easy to build and nobody cares too much in either case with one exception. Does Drizzle build on Windows?

PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Drizzle on the Rackspace Cloud Blog

Planet MySQL - Sun, 2010/03/14 - 7:30am

Adrian has talked about a few of us Drizzle Hackers joining Rackspace over at the Rackspace Cloud Blog.


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Drizzle Developer Day 2010

Planet MySQL - Sun, 2010/03/14 - 6:46am

Hi one and all!

Interested in database systems? Interested because you use them? Because you manage them? Write SQL that goes to them? Or are you one of the people of questionable sanity like myself who develops them?

Well… do we have the offer for you.

Friday, April 16th. Right after the MySQL Conference and Expo at the Santa Clara Convention Center, you can come along to the Drizzle Developer Day.

You will want to add your name to this wiki page: http://drizzle.org/wiki/Drizzle_Developer_Day_2010_signup

Suggest topics over at:
http://drizzle.org/wiki/Drizzle_Developer_Day_2010

Hope to see you there!


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Revisiting and defending my Tweets from NoSQL Live in Boston

Planet MySQL - Sat, 2010/03/13 - 9:38pm
Some people have been confused by my Twitter stream from last week's NoSQL Live in Boston conference.  I've never been very good at staying "on message", or adhering to a set of "talking points".  Some people thought I was there to "defend the status quo", or to defend and promote memcached, or memcached+mysql.

True, I was there in part to teach about and promote memcached, and especially Gear6 Memcached.  I have a great employer, but they are not sending me to conferences because they are a charity.  I taught the memcached  breakout session, and also worked the "hallway track", giving a number of people a crash course in what memcache is and what makes it useful, and handed out a pile of business cards.

As for my tweets and pithy statements... Well, some over-simplification has to happen to reduce a concept to 140 characters

My statement, "NoSQL as cheaper to deploy, manage, and maintain is a myth. it costs just as much, if not more", which I said that into the mike at the start of the scaling panel, was very popular to tweet and retweet.

It's something that is a "Jedi Truth", it's true from a "certain point of view".  When you add together the costs of the much shallower "bench" of hirable operational experience, the increased "minimum useful hardware footprint"  (which seems to be about at least 5 full storage nodes), and the evolving maturity of the client libraries, and such, NoSQL is not going to save you money.  Until an important threshold is reached.  When you scale and/or your data representation "impedance mismatch" hits that nasty inflection point, where the "buck for bang" curve suddenly starts to rise hard, and it looks like you will need to start spending infinite amounts of money to keep growing.  Then the NoSQL approach does become cheaper, because it's actually doable.  And it's probably wise to start considering, researching, and then migrating to NoSQL before you hit that wall.

My statement "people have been wanting DBA-less databases about as long as they have wanted programmer-less programming languages" was also popular.  I stand by it.  NoSQL doesn't crack this nut, nothing ever well.  Some NoSQL solutions look like they are "DBA-less", such as AWS SDB, AWS RDB, FluidDB.  Those systems are not DBA-less, they have DBAs, just that the cost of the DBA is "hidden" in the per-drink rental cost of those systems, instead of sitting on your balance sheet as a salary.

The statement "Twitter is using Cassandra because bursty writes are cheap, compared to others" is something I said not because I knew it, but because I just learned it, and I was a bit surprised by it.  I think that the original statement was by Ryan King of Twitter, who was also on the scaling panel.

My statement "Memcached should be integrated into all NoSQL stores" is something I also firmly believe.  The very-high-performance in-memory distributed key value store is a very useful building block for larger systems, and I think that whatever larger NoSQL systems we end up will use it as a component of their internal implementations.

The statement "being able to drop nodes as important as being able to add, because scalability is pointless w/o reliability" was also by Ryan King.  I tweeted it because it is very much something worth broadcasting and remembering it.  It has a little more context in his next statement "The first day we stood up our #cassandra cluster, 2 nodes had hdd die... Clients never noticed."  Machines fail.  And as they get faster and cheaper, and as clusters get bigger, machine failure must become something that must not be any sort of emergency.

My statement "open source means folks dont need a standards body" was a extreme simplication of part of Sandro Hawke's talk.  I tweeted it because it was something I've felt to be mostly true enough for a long time, and it was nice to see someone else recognize it.  As Sandro stated later in twitter, "I think I added an important "sometimes"!".  And he is correct.  It's not true as an absolute statement.



All in all, NoSQL Live was a very good conference.  I felt that the speakers taught and learned, all the other attendees taught and learned, and the networking and hallway track was first rate.  Thanks to 10gen for organizing it, and being entirely fair to their "competition" in the rapidly growing and evolving NoSQL space.
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Recent Work on Improving Drizzle’s Storage Engine API

Planet MySQL - Sat, 2010/03/13 - 8:07am

Over the past six weeks or so, I have been working on cleaning up the pluggable storage engine API in Drizzle.  I’d like to describe some of this work and talk a bit about the next steps I’m taking in the coming months as we roll towards implementing Log Shipping in Drizzle.

First, how did it come about that I started working on the storage engine API?

From Commands to Transactions

Well, it really goes back to my work on Drizzle’s replication system.  I had implemented a simple, fast, and extensible log which stored records of the data changes made to a server.  Originally, the log was called the Command Log, because the Google Protobuffer messages it contained were called message::Commands.  The API  for implementing replication plugins was very simple and within a month or so of debuting the API, quite a few replication plugins had been built, including one replicating to Memcached, a prototype one replicating to Gearman, and a filtering replicator plugin.

In addition, Marcus Eriksson had created the RabbitReplication project which could replicate from Drizzle to other data stores, including Cassandra and Project Voldemort.  However, Marcus did not actually implement any C/C++ plugins using the Drizzle replication API.  Instead, RabbitReplication simply read the new Command Log, which due to it simply being a file full of Google Protobuffer messages, was quick and easy to read into memory using a variety of different programming languages.  RabbitReplication is written in Java, and it was great to see other programming languages be able to read Drizzle’s replication log so easily.  Marcus later coded up a C++ TransactionApplier plugin which replaces the Drizzle replication log and instead replicates the GPB messages directly to RabbitMQ.

And there, you’ll note that one of the plugins involved in Drizzle’s replication system is called TransactionApplier.  It used to be called CommandApplier. That was because the GPB Command messages were individual row change events for the most part.  However, I made a series of changes to the replication API and now the GPB messages sent through the APIs are of class message::Transaction.  message::Transaction objects contain a transaction context, with information about the transaction’s start and end time, it’s transaction identifer, along with a series of message::Statement objects, each of which representing a part of the data changes that the SQL transaction made.

Thus, the Command Log now turned into the Transaction Log, and everywhere the term Command was used now was replaced with the terms Transaction and Statement (depending on whether you were talking about the entire Transaction or a piece of it). Log entries were now written at COMMIT to the Transaction Log and were not written if no COMMIT occurred1.

After finishing this work to make the transaction log write Transaction messages at commit time, I was keen to begin coding up the publisher and subscriber plugins which represent a node in the replication environment. However, Brian had asked me to delay working on other replication features and ensure that the replication API could support fully distributed transactions via the X/Open XA distributed transaction protocol. XA support had been removed from Drizzle when the MySQL binlog and original replication system was ripped out and needed some TLC. Fair enough, I said. So, off I went to work on XA.

If Only It Were Simple…

As anyone who has worked on the MySQL source code or developed storage engines for MySQL knows, working with the MySQL pluggable storage engine API is sometimes not the easiest or most straightforward thing. I think the biggest problem with the MySQL storage engine API is that, due to understandable historical reasons, it’s an API that was designed with the MyISAM and HEAP storage engines in mind. Much of the transactional pieces of the API seem to be a bolted-on afterthought and can be very confusing to work with.

As an example, Paul McCullagh, developer of the transactional storage engine PBXT, recently emailed the mysql internals mailing list asking how the storage engine could tell when a SQL statement started and ended. You would think that such a seemingly basic functionality would have a simple answer. You’d be wrong. Monty Widenius answered like this:

Why not simply have a counter in your transaction object for how start_stmt – reset(); When this is 0 then you know stmnt ended.

In Maria we count number of calls to external_lock() and when the sum goes to 0 we know the transaction has ended.

To this, Mark Callaghan responded:

Why does the solution need to be so obscure?

Monty answered (emphasis mine):

Historic reasons.

MySQL never kept a count of which handlers are used by a transaction, only which tables.

So the original logic was that external_lock(lock/unlock) is called for each usage of the table, which is normally more than enough information for a handler to know when a statement starts/ends.

The one case this didn’t work was in the case someone does lock tables as then external_lock is not called per statement. It was to satisfy this case that we added a call to start_stmt() for each table.

It’s of course possible to change things so that start_stmt() / end_stmt() would be called once per used handler, but this would be yet another overhead for the upper level to do which the current handlers that tracks call to external_lock() doesn’t need.

Well, in Drizzle-land, we aren’t beholden to “historic reasons” So, after looking through the in-need-of-attention transaction processing code in the kernel, I decided that I would clean up the API so that storage engines did not have to jump through hoops to notify the kernel they participate in a transaction or just to figure out when a statement and a transaction started and ended.

The resulting changes to the API are quite dramatic I think, but I’ll leave it to the storage engine developers to tell me if the changes are good or not. The following is a summary of the changes to the storage engine API that I committed in the last few weeks.

plugin::StorageEngine Split Into Subclasses

The very first thing I did was to split the enormous base plugin class for a storage engine, plugin::StorageEngine, into two other subclasses containing transactional elements. plugin::TransactionalStorageEngine is now the base class for all storage engines which implement SQL transactions:

/** * A type of storage engine which supports SQL transactions. * * This class adds the SQL transactional API to the regular * storage engine. In other words, it adds support for the * following SQL statements: * * START TRANSACTION; * COMMIT; * ROLLBACK; * ROLLBACK TO SAVEPOINT; * SET SAVEPOINT; * RELEASE SAVEPOINT; */ class TransactionalStorageEngine :public StorageEngine { public: TransactionalStorageEngine(const std::string name_arg, const std::bitset<HTON_BIT_SIZE> &flags_arg= HTON_NO_FLAGS);   virtual ~TransactionalStorageEngine(); ... private: void setTransactionReadWrite(Session& session);   /* * Indicates to a storage engine the start of a * new SQL transaction. This is called ONLY in the following * scenarios: * * 1) An explicit BEGIN WORK/START TRANSACTION is called * 2) After an explicit COMMIT AND CHAIN is called * 3) After an explicit ROLLBACK AND RELEASE is called * 4) When in AUTOCOMMIT mode and directly before a new * SQL statement is started. */ virtual int doStartTransaction(Session *session, start_transaction_option_t options) { (void) session; (void) options; return 0; }   /** * Implementing classes should override these to provide savepoint * functionality. */ virtual int doSetSavepoint(Session *session, NamedSavepoint &savepoint)= 0; virtual int doRollbackToSavepoint(Session *session, NamedSavepoint &savepoint)= 0; virtual int doReleaseSavepoint(Session *session, NamedSavepoint &savepoint)= 0;   /** * Commits either the "statement transaction" or the "normal transaction". * * @param[in] The Session * @param[in] true if it's a real commit, that makes persistent changes * false if it's not in fact a commit but an end of the * statement that is part of the transaction. * @note * * 'normal_transaction' is also false in auto-commit mode where 'end of statement' * and 'real commit' mean the same event. */ virtual int doCommit(Session *session, bool normal_transaction)= 0;   /** * Rolls back either the "statement transaction" or the "normal transaction". * * @param[in] The Session * @param[in] true if it's a real commit, that makes persistent changes * false if it's not in fact a commit but an end of the * statement that is part of the transaction. * @note * * 'normal_transaction' is also false in auto-commit mode where 'end of statement' * and 'real commit' mean the same event. */ virtual int doRollback(Session *session, bool normal_transaction)= 0; virtual int doReleaseTemporaryLatches(Session *session) { (void) session; return 0; } virtual int doStartConsistentSnapshot(Session *session) { (void) session; return 0; } };

As you can see, plugin::TransactionalStorageEngine inherits from plugin::StorageEngine and extends it with a series of private pure virtual methods that implement the SQL transaction parts of a query — doCommit(), doRollback(), etc. Implementing classes simply inherit from plugin::TransactionalStorageEngine and implement their internal transaction processing in these private methods.

In addition to the SQL transaction, however, is the concept of an XA transaction, which is for distributed transaction coordination. The XA protocol is a two-phase commit protocol because it implements a PREPARE step before a COMMIT occurs. This XA API is exposed via two other classes, plugin::XaResourceManager and plugin::XaStorageEngine. plugin::XaResourceManager derived classes implement the resource manager API of the XA protocol. plugin::XaStorageEngine is a storage engine subclass which, while also implementing SQL transactions, also implements XA transactions.

Here is the plugin::XaResourceManager class:

/** * An abstract interface class which exposes the participation * of implementing classes in distributed transactions in the XA protocol. */ class XaResourceManager { public: XaResourceManager() {} virtual ~XaResourceManager() {} ... private: /** * Does the COMMIT stage of the two-phase commit. */ virtual int doXaCommit(Session *session, bool normal_transaction)= 0; /** * Does the ROLLBACK stage of the two-phase commit. */ virtual int doXaRollback(Session *session, bool normal_transaction)= 0; /** * Does the PREPARE stage of the two-phase commit. */ virtual int doXaPrepare(Session *session, bool normal_transaction)= 0; /** * Rolls back a transaction identified by a XID. */ virtual int doXaRollbackXid(XID *xid)= 0; /** * Commits a transaction identified by a XID. */ virtual int doXaCommitXid(XID *xid)= 0; /** * Notifies the transaction manager of any transactions * which had been marked prepared but not committed at * crash time or that have been heurtistically completed * by the storage engine. * * @param[out] Reference to a vector of XIDs to add to * * @retval * Returns the number of transactions left to recover * for this engine. */ virtual int doXaRecover(XID * append_to, size_t len)= 0; };

and here is the plugin::XaStorageEngine class:

/** * A type of storage engine which supports distributed * transactions in the XA protocol. */ class XaStorageEngine :public TransactionalStorageEngine, public XaResourceManager { public: XaStorageEngine(const std::string name_arg, const std::bitset<HTON_BIT_SIZE> &flags_arg= HTON_NO_FLAGS);   virtual ~XaStorageEngine(); ... };

Pretty clear. A plugin::XaStorageEngine inherits from both plugin::TransactionStorageEngine and plugin::XaResourceManager because it implements both SQL transactions and XA transactions. The InnobaseEngine is a plugin which inherits from plugin::XaStorageEngine because InnoDB supports SQL transactions as well as XA.

Explicit Statement and Transaction Boundaries

The second major change I made addressed the problem that Mark Callaghan noted in asking why finding out when a statement starts and ends was so obscure. I added two new methods to plugin::StorageEngine called doStartStatement() and doEndStatement(). The kernel now explicitly tells storage engines when a SQL statement starts and ends. This happens before any calls to Cursor::external_lock() happen, and there are no exception cases. In addition, the kernel now always tells transactional storage engines when a new SQL transaction is starting. It does this via an explicit call to plugin::TransactionalStorageEngine::doStartTransaction(). No exceptions, and yes, even for DDL operations.

What this means is that for a transactional storage engine, it no longer needs to “count the calls to Cursor::external_lock()” in order to know when a statement or transaction starts and ends. For a SQL transaction, this means that there is a clear code call path and there is no need for the storage engine to track whether the session is in AUTOCOMMIT mode or not. The kernel does all that work for the storage engine. Imagine a Session executes a single INSERT statement against an InnoDB table while in AUTOCOMMIT mode. This is what the call path looks like:

drizzled::Statement::Insert::execute() | -> drizzled::mysql_lock_tables() | -> drizzled::TransactionServices::registerResourceForTransaction() | -> drizzled::plugin::TransactionalStorageEngine::startTransaction() | -> InnobaseEngine::doStartTransaction() | -> drizzled::plugin::StorageEngine::startStatement() | -> InnobaseEngine::doStartStatement() | -> drizzled::plugin::StorageEngine::getCursor() | -> drizzled::Cursor::write_row() | -> InnobaseCursor::write_row() | -> drizzled::TransactionServices::autocommitOrRollback() | -> drizzled::plugin::TransactionStorageEngine::commit() | -> InnobaseEngine::doCommit()

I think this will come as a welcome change to storage engine developers working with Drizzle.

No More Need for Engine to Call trans_register_ha()

There was an interesting comment in the original documentation for the transaction processing code. It read:

Roles and responsibilities
————————–

The server has no way to know that an engine participates in
the statement and a transaction has been started
in it unless the engine says so. Thus, in order to be
a part of a transaction, the engine must “register” itself.
This is done by invoking trans_register_ha() server call.
Normally the engine registers itself whenever handler::external_lock()
is called. trans_register_ha() can be invoked many times: if
an engine is already registered, the call does nothing.
In case autocommit is not set, the engine must register itself
twice — both in the statement list and in the normal transaction
list.

That comment, and I’ve read it dozens of times, always seemed strange to me. I mean, does the server really not know that an engine participates in a statement or transaction unless the engine tells it? Of course not.

So, I removed the need for a storage engine to “register itself” with the kernel. Now, the transaction manager inside the Drizzle kernel (implemented in the TransactionServices component) automatically monitors which engines are participating in an SQL transaction and the engine doesn’t need to do anything to register itself.

In addition, due to the break-up of the plugin::StorageEngine class and the XA API into plugin::XaResourceManager, Drizzle’s transaction manager can now coordinate XA transactions from plugins other than storage engines. Yep, that’s right. Any plugin which implements plugin::XaResourceManager can participate in an XA transaction and Drizzle will act as the transaction manager. What’s the first plugin that will do this? Drizzle’s transaction log. The transaction log isn’t a storage engine, but it is able to participate in an XA transaction, so it will implement plugin::XaResourceManager but not plugin::StorageEngine.

Performance Impact of Code Changes

So, that “yet another overhead” Monty talked about in the quote above? There wasn’t any noticeable impact in performance or scalability at all. So much for optimize-first coding.

What’s Next?

The next thing I’m working on is removing the notion of the “statement transaction”, which is also a historical by-product, this time because of BerkeleyDB. Gee, I’ve got a lot of work ahead of me…

[1] Actually, there is a way that a transaction that was rolled back can get written to the transaction log. For bulk operations, the server can cut a Transaction message into multiple segments, and if the SQL transaction is rolled back, a special RollbackStatement message is written to the transaction log.


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Index only

Planet MySQL - Sat, 2010/03/13 - 3:29am
A problem with SQL is SQL. It is easy to write queries that require random IO in the worst case. It is usually easy to find queries that do too much random IO on a NoSQL system as you must code the extra data fetches manually.

Digg has begun to write about their reasons for migrating from MySQL to Cassandra. They provide an excellent summary and then describe a performance problem fixed by the migration. I think Cassandra and a few other members of the NoSQL family are amazing technology but I don't think a migration was needed to fix this performance problem. A better index on the Diggs table would have done that. Others have said the same thing. Maybe I don't have all of the details. I can only go on what was written in the blog.

You can learn more about the power of indexes at the MySQL conference.

The Diggs table was the source of the problem:
CREATE TABLE Diggs (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  PRIMARY KEY (id),
  KEY user  (userid),
  KEY item  (itemid)
) ENGINE=InnoDB;It supported an important query that was too slow. A simple form of this query is:
SELECT digdate, id
FROM Diggs
WHERE userid in (10, 20, 30) AND itemid = 50
ORDER BY digdate DESC, id DESC LIMIT 4;This query requires too much random IO because it isn't index only. The query can use either the index on itemid or the index on userid. In both cases it will scan more entries than it needs to from the secondary index and then lookup the remaining columns from the primary index. Each lookup on the primary index can do one disk seek. On my test server the plan for this query is:
id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    Diggs    ref    user,item    item    5    const    8960    Using where; Using filesortAfter running the query I ran SHOW SESSION STATUS LIKE "Handler_read%" and the result from that is below. The query scanned 5000 entries from the secondary index and would have done more than 5000 disk seeks in the worst case to lookup columns from the primary key index.

Variable_name    Value
Handler_read_first    0
Handler_read_key    3
Handler_read_next    5000
Handler_read_prev    0
Handler_read_rnd    0
Handler_read_rnd_next    0The query is much faster for a table with different indexes
CREATE TABLE DiggsFast (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  PRIMARY KEY (itemid,userid,digdate,id),
  UNIQUE KEY (id)
) ENGINE=InnoDB;
The query has a better plan:
id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    DiggsFast    range    PRIMARY    PRIMARY    8    NULL    149    Using where; Using index; Using filesort It also is much better in reality. The output from SHOW SESSION STATUS LIKE "Handler_read%" is listed below. With a better index the query scans 150 entries from 5 range scans of the index. It should do about 5 disk seeks in the worst case. It is also index only so it doesn't have to lookup other columns after the index scan. Although that doesn't matter much in this case because the query uses the primary key index which has all columns for an InnoDB table. This query will be much faster than the previous one (5 disk seeks versus 5000).

Note that this query uses the first two columns in the primary index for the predicates on itemid and userid. InnoDB stores all columns in the primary key index entries so any query that uses the PK index is index only.
Variable_name    Value
Handler_read_first    0
Handler_read_key    5
Handler_read_next    150
Handler_read_prev    0
Handler_read_rnd    0
Handler_read_rnd_next    0 UPDATE

I created another variant of the Diggs table that uses a secondary index for the query. InnoDB includes all columns from a PK index in the secondary index to serve as the pointer to the row. Note there is a difference between being 'in the index' and being indexed.
CREATE TABLE DiggsFast2 (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  KEY itemuserdig (itemid,userid,digdate),
  PRIMARY KEY (id)
) ENGINE=InnoDB;From the query plan:
id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    DiggsFast2    range    itemuserdig    itemuserdig    10    NULL    150    Using where; Using index; Using filesort
digdate    idAnd SHOW SESSION LIKE "Handler_read%"
Variable_name    Value
Handler_read_first    0
Handler_read_key    5
Handler_read_next    150
Handler_read_prev    0
Handler_read_rnd    0
Handler_read_rnd_next    0
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Understanding Drizzle user authentication options &#8211; Part 2

Planet MySQL - Fri, 2010/03/12 - 11:45pm

A key differentiator in Drizzle from it’s original MySQL roots is user based authentication. Gone is the host/user and schema/table/column model that was stored in the MyISAM based mysql.user table.

Authentication is now completely pluggable, leveraging existing systems such as PAM, LDAP via PAM and Http authentication.

In this post I’ll talk about HTTP authentication which requires an external http server to implement successfully. You can look at Part 1 for PAM authentication.

Compiling for http auth support

By default during compilation you may find.

checking for libcurl... no configure: WARNING: libcurl development lib not found: not building auth_http plugin. On Debian this is found in libcurl4-gnutls-dev. On RedHat it's in libcurl-devel.

In my case I needed:

$ sudo yum install curl-devel

NOTE: Bug #527255 talks about issues of the message being incorrect for libcurl-devel however this appears it may be valid in Fedora Installs

After successfully installing the necessary pre-requisite you should see.

checking for libcurl... yes checking how to link with libcurl... -lcurl checking if libcurl has CURLOPT_USERNAME... no HTTP Authentication

We need to enable the plugin at server startup.

$ sbin/drizzled --mysql-protocol-port=3399 --plugin_add=auth_http &

You need to ensure the auth_http plugin is active by checking the data dictionary plugin table.

drizzle> select * from data_dictionary.plugins where plugin_name='auth_http'; +-------------+----------------+-----------+-------------+ | PLUGIN_NAME | PLUGIN_TYPE | IS_ACTIVE | MODULE_NAME | +-------------+----------------+-----------+-------------+ | auth_http | Authentication | TRUE | | +-------------+----------------+-----------+-------------+

The auth_http plugin also has the following system variables.

drizzle> SHOW GLOBAL VARIABLES LIKE '%http%'; +------------------+-------------------+ | Variable_name | Value | +------------------+-------------------+ | auth_http_enable | OFF | | auth_http_url | http://localhost/ | +------------------+-------------------+ 2 rows in set (0 sec)

In order to configure Http authentication, you need to have the following settings added to your drizzled.cnf file. For example:

$ cat etc/drizzled.cnf [drizzled] auth_http_enable=TRUE auth_http_url=http://thedrizzler.com/auth

NOTE: Replace the domain name with something you have, even localhost.

A Drizzle restart gives us

$ bin/drizzle -e "SHOW GLOBAL VARIABLES LIKE 'auth_http%'" +------------------+-----------------------------+ | Variable_name | Value | +------------------+-----------------------------+ | auth_http_enable | ON | | auth_http_url | http://thedrizzler.com/auth | +------------------+-----------------------------+

By default, currently if the settings result in an invalid url, then account validation does not fail and you can still login. It is recommended that you always configure pam authentication as well as a fall back.

$ wget -O tmp http://thedrizzler.com/auth --17:32:32-- http://thedrizzler.com/auth Resolving thedrizzler.com... 208.43.73.220 Connecting to thedrizzler.com|208.43.73.220|:80... connected. HTTP request sent, awaiting response... 404 Not Found 17:32:32 ERROR 404: Not Found. $ bin/drizzle drizzle > exit Configuring passwords

To correctly configured your web server to perform the HTTP auth, you can use this Apache syntax as an example.

The following is added to the VirtualHost entry in your web browser.

<Directory /var/www/drizzle/auth> AllowOverride FileInfo All AuthConfig AuthType Basic AuthName "Drizzle Access Only" AuthUserFile /home/drizzle/.authentication Require valid-user </Directory> $ sudo su - $ mkdir /var/www/drizzle/auth $ touch /var/www/drizzle/auth/index.htm $ apachectl graceful

We check we now need permissions for the URL.

$ wget -O tmp http://thedrizzler.com/auth --17:35:48-- http://thedrizzler.com/auth Resolving thedrizzler.com... 208.43.73.220 Connecting to thedrizzler.com|208.43.73.220|:80... connected. HTTP request sent, awaiting response... 401 Authorization Required Authorization failed.

You need to create the username/password for access.

$ htpasswd -cb /home/drizzle/.authentication testuser sakila $ cat /home/drizzle/.authentication testuser:85/7CbdeVql4E

Confirm that the http auth with correct user/password works.

$ wget -O tmp http://thedrizzler.com/auth --user=testuser --password=sakila --17:37:45-- http://thedrizzler.com/auth Resolving thedrizzler.com... 208.43.73.220 Connecting to thedrizzler.com|208.43.73.220|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Drizzle HTTP Authentication in action

By default we now can’t login

$ bin/drizzle ERROR 1045 (28000): Access denied for user ''@'127.0.0.1' (using password: NO) $ bin/drizzle --user=testuser --password=sakila999 ERROR 1045 (28000): Access denied for user 'testuser'@'127.0.0.1' (using password: YES) $ bin/drizzle --user=testuser --password=sakila Welcome to the Drizzle client.. Commands end with ; or \g. Your Drizzle connection id is 6 Server version: 7 Source distribution (trunk) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. drizzle>
PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL

Understanding Drizzle user authentication options &#8211; Part 1

Planet MySQL - Fri, 2010/03/12 - 10:46pm

A key differentiator in Drizzle from it’s original MySQL roots is user based authentication. Gone is the host/user and schema/table/column model that was stored in the MyISAM based mysql.user table.

Authentication is now completely pluggable, leveraging existing systems such as PAM, LDAP via PAM and Http authentication.

In this post I’ll talk about PAM authentication which is effectively your current Linux based user security.

This information is based on the current build 1317.

Compiling for PAM support

Your Drizzle environment needs to be compiled with PAM support. You would have received the following warning during a configure.

$ ./configure ... checking for libpam... no configure: WARNING: Couldn't find PAM development support, pam_auth will not be built. On Debian, libpam is in libpam0g-dev. On RedHat it's in pam-devel.

The solution is provided in the warning message which is another great thing about Drizzle. The pre checks for dependencies and the optional messages like these far exceed the MySQL equivalent compilation process. In my case:

$ sudo yum install pam-devel

When correctly configured, it should look like:

checking for libpam... yes checking how to link with libpam... -lpam Working with PAM

You need to enable the PAM authentication plugin at drizzled startup.

sbin/drizzled --plugin_add=auth_pam &

Unfortunately connecting fails to work with

time sbin/drizzle --user=testuser --password=***** --port=4427 real 0m0.003s user 0m0.003s sys 0m0.001s

A look into the source at src/drizzle-2010.03.1317/plugin/auth_pam/auth_pam.cc shows a needed config file

117 retval= pam_start("check_user", userinfo.name, &conv_info, &pamh); Configuring PAM

In order to enable PAM with Drizzle you need to have the following system configuration.

$ cat /etc/pam.d/check_user auth required pam_unix.so account required pam_unix.so $ time sbin/drizzle --user=testuser --password=***** --port=4427 ERROR 1045 (28000): Access denied for user 'testuser'@'127.0.0.1' (using password: YES) real 0m2.055s user 0m0.002s sys 0m0.002s

This did some validation but still failed.

It seems Bug #484069 may fix this problem, however this is not currently in the main line!

Stay Tuned!

You first need to ensure the pam plugin is active by checking the data dictionary plugin table.

drizzle> select * from data_dictionary.plugins where plugin_name=’pam’; +————-+—————-+—————+—————+————————–+—————-+ | PLUGIN_NAME | PLUGIN_VERSION | PLUGIN_STATUS | PLUGIN_AUTHOR | PLUGIN_DESCRIPTION | PLUGIN_LICENSE | +————-+—————-+—————+—————+————————–+—————-+ | pam | 0.1 | ACTIVE | Brian Aker | PAM based authenication. | GPL | +————-+—————-+—————+—————+————————–+—————-+ 1 row in set (0 sec)

-->


PlanetMySQL Voting: Vote UP / Vote DOWN
Categories: MySQL