The ec2-consistent-snapshot software tries its best to flush and lock a MySQL database on an EC2 instance while it initiates the EBS snapshot, and for many environments it does a pretty good job.
However, there are situations where the database may spend time performing crash recovery from the log file when it is started from a copy of the snapshot. We are seeing this behavior at CampusExplorer.com where the database is constantly active and we have innodb_log_file_size set (probably too) high. The delay is doubtless exacerbated by the fact that the blocks on the new EBS volume are being recovered from S3 as it is being built from the snapshot.
Google has created an innodb_disallow_writes MySQL patch which I think points out the problem we may be hitting.
“Note that it is not sufficient to run FLUSH TABLES WITH READ LOCK as there are background IO threads used by InnoDB that may still do IO.”
It would be very nice to have this patch incorporated in MySQL on Ubuntu. It looks like the OurDelta folks have already incorporated the patch. [Update: See rsimmons’ comment below which explains why this particular patch might not be the answer.]
In any case, when we bring up a database using an EBS volume created from an EBS snapshot of an active database, it can take up to 45 minutes recovering before it lets normal clients connect. This is too long for us so we’re trying a new approach.
The ec2-consistent-snapshot now has a --mysql-stop option which shuts down the MySQL server, initiates the snapshot, and then restarts the database. Our hope is that this will get us a snapshot which can be restored and run without delay. If any MySQL experts can point out the potential flaws in this, please do.
Since we obviously can’t stop and start our production database every hour, we are performing this snapshot activity on a replication slave that is dedicated to snapshots and backups.
We continue to perform occasional snapshots on the production database EBS volume just to help keep it reliable per Amazon’s instructions, but we don’t expect to be able to restore it without crash recovery.
If you’d like to test the new --mysql-stop option, please upgrade your ec2-consistent-snapshot package from the Alestic PPA and let me know how it goes.


It would be great to get an option for mysql_socket. I have a non-standard socket location in my setup and I've had to modify ec2-consistent-snapshot so that mysql_host was "localhost:mysql_socket=/path/to/mysql.sock". I don't know enough Perl, or I'd do it myself.
Thanks for all the great tools and articles
-David
I don't think there's a way for you to avoid doing a recovery or clean shutdown. If I understand correctly, the innodb log is basically a way of borrowing against the future, by doing fast sequential writes to the end of a log file instead of random writes to the data files. But eventually the writes need to be applied to the data files. A typical busy mysql server will have plenty of dirty blocks in the buffer pool, which means they have been written to the log but not the data files. At some point you will need to apply these changes to the data files, whether through clean shutdown or recovery. As far as I know the standard options for a _hot_ consistent backup are:
1) filesystem level snapshot and then recovery. you can do the recovery on a spare host immediately after the snapshot so you can quickly restore later in case of disaster
2) percona xtrabackup. this cleverly avoids needs a FS level snapshot, but requires doing a "log apply" step (basically recovery) before being able to use the backup.
You should definitely experiment with reducing innodb_log_file_size as far as you can before it has a negative performance impact. Your recovery time should be directly proportional to it.
As far as I understand, the innodb_disallow_writes patch is just an alternative when you can't do a filesystem level snapshot. It allows you to get a clean copy of the data and logs without shutting down the server, but since it doesn't flush dirty pages you would still need to do a recovery after copying the files.
Performing backups from a slave is fine, but there are some bugs I have encountered with replication that can cause the slave to accumulate small differences from the master over time. So I would recommend checking for that (mk-table-checksum) and/or periodically resyncing the slave from the master.
David: Can you just set the complete value using the --mysql-host option? Please add comments to this ticket: https://bugs.launchpad.net/ec2-consistent-snapshot/+bug/481477
rsimmons: Thanks. A lot of useful information! We'll try reducing innodb_log_file_size and will keep an eye on the replication issues which we've also heard about.
Without the work Eric has done, a lot of us would not be using AWS as much as we do because Eric has made it so much easier.
One thing I am still struggling with is how to set up some form of MySQL replication, Multi-Master and/or Master-Slave configuration on EC2.
It would be wonderful if you could share some of the techniques / best practices you have discovered if not some code!
In any case, thanks for all the stuff you do to help all of us.
Rob
Rob: I learned most of what I know about setting up master-slave replication here:
http://dev.mysql.com/doc/refman/5.0/en/replication-howto.html