Whilst supporting customers at SkySQL I often have to load gigabytes of SQL data into MySQL servers to run tests. This process can be slow especially for InnoDB because in a standard dump file every insert is a transaction and every transaction has to be synchronised to disk for crash safety. The thing is, most of the time I don’t care if the machine I’m using crashes whilst I’m loading this data into the server. There are of course many ways around this, such as editing the SQL files and wrapping transactions around batches of inserts and editing the configuration files to disable all the syncing involved. But I don’t want one configuration to load in data and then another to play with the data, so this is where libeatmydata comes in.
libeatmydata is a preloaded library that disables disk syncing functionality from doing just that. The OS will decide when to sync the data to disk. This is great for loading in an SQL dump file, taking single insert dumps on default configuration down from hours to minutes. But you wouldn’t want to do it during the production running of your server because power failure would certainly lose you some data.
So, how do you use libeatmydata with MySQL? Simple, this is the command to start it:
LD_PRELOAD=/usr/lib/libeatmydata.so mysqld
Then you can load in your dump file, shutdown mysqld safely and start it up again without libeatmydata.
A great application I could see for this is scripting the startup of slaves, feeding a dump file into the server with libeatmydata and then restarting without this once the slave is ready.
UPDATE
Kristian Nielsen asked in the comments how much faster it is, so I have run a quick benchmark to find out. In this test I am using a 218MB test file of single row inserts I had generated for an old support issue. I am also using a clean MySQL 5.1.51 installation (cleaned on each run) on my i7 based laptop:
Vanilla MySQL 5.1.51
real 166m19.504s
user 0m23.891s
sys 0m6.084s
MySQL 5.1.51 with --sync-binlog=0 --innodb_flush_log_at_trx_commit=0
real 5m33.578s
user 0m11.096s
sys 0m3.215s
MySQL 5.1.51 with libeatmydata
real 3m14.123s
user 0m10.932s
sys 0m3.108s
How much faster is this than something like this?
ReplyDeletemysqld --sync-binlog=0 --innodb_flush_log_at_trx_commit=0
The problem with the libeatmydata approach is that in case of power failure
(or even mysqld crash, depending on what libeatmydata does), you can end up
with InnoDB unable to recover at all. InnoDB relies on fsync() to know when
tablespace data is flushed to disk so that it can make checkpoints and safely
discard old parts of its recovery log.
You should use the eatmydata command instead of using LD_PRELOAD as the library might be located on some other location (like it is on Ubuntu 10.10).
ReplyDeleteUsing mysqld with sync_binlog and innodb_flush_log_at_trx_commit will still use fsync for some things like frm syncing and probably also for xa transactions.
$ strace -fe sync eatmydata sync
Process 3959 attached
Process 3959 detached
--- SIGCHLD (Child exited) @ 0 (0) ---
Process 3960 attached
Process 3960 detached
--- SIGCHLD (Child exited) @ 0 (0) ---
$ strace -fe sync sync
sync() = 0
kristiannielsen: the point I am trying to make is whilst importing a dump file into a new mysqld I don't really mind that InnoDB won't recover from a power failure, I can just start again. If mysqld crashes, well, in my job that is a bonus ;)
ReplyDeleteI've not benchmarked one vs. the other yet, but using libeatmydata feels faster in my opinion.
Daniël: the executable isn't in the bzr version I have, I don't use Ubuntu and I suspect they create or package one (or a script or something).
kristiannielsen: I've updated this post with a quick benchmark
ReplyDeleteAndrew,
ReplyDeleteWhat hardware configuration did you run it on ?
I would expect fsync() to add a lot on systems without BBU but a lot less on real servers and BTW do you also strip O_DIRECT flag using this library some way ?
I guess this is the main advantage as for transaction commit related overhead it is a lot
easier to change innodb_flush_log_at_trx_commit variable.
Also also did you check nosync option in Innodb ?
Hey Peter,
ReplyDeleteThanks for the comment.
I used a Lenovo T410, i7 M620 CPU, 4GB RAM (I forget the speed/type), ST9500420AS 7200RPM HDD (I totally agree on your BBU comment). Running Fedora 15 pre-release at the time (currently testing Ubuntu 11.04 on it). The my.cnf file only had the datadir in it, so all defaults.
I just checked the source and libeatmydata strips O_SYNC and O_DSYNC during open() but doesn't touch O_DIRECT.
I did not check the nosync option in InnoDB. That is a good point. Unfortunately the test data I used is destroyed now. If I find time I'll try and start a new benchmark.
The reason I don't strip out O_DIRECT is that using O_DIRECT places IO alignment requirements on the application, and if we stripped out O_DIRECT it means that an app could violate these rules and not get errors. I didn't want libeatmydata to possibly hide bugs.
ReplyDeleteFor innodb, nosync is an option. But libeatmydata also catches other sync operations in the database server (and for mysql test-run as I originally wrote libeatmydata for, NDB processes).