ec2-consistent-snapshot on GitHub and v0.43 Released

The source for ec2-conssitent-snapshot has historically been available here:

ec2-consistent-snapshot on Launchpad.net using Bazaar

For your convenience, it is now also available here:

ec2-consistent-snapshot on GitHub using Git

You are welcome to fork ec2-consistent snapshot under the liberal terms of the Apache License, Version 2.0.

I welcome patch submissions, especially if:

New Release of ec2-consistent-snapshot and Screencast by Ahmed Kamal

ec2-consistent-snapshot is a tool that uses the Amazon EC2 API to initiate a snapshot of an EBS volume with some additional work to help ensure that an XFS file system and/or MySQL database are in a consistent state on that snapshot.

Ahmed Kamal pointed out to me yesterday that we can save lots of trouble installing ec2-consistent-snapshot by adding a dependency on the new libnet-amazon-ec2-perl package in Ubuntu instead of forcing people to install the Net::Amazon::EC2 Perl package through CPAN (not easy for the uninitiated).

I released a new version of ec2-consistent-snapshot which has this new dependency and updated documentation. Installing this software on Ubuntu 10.04 Lucid, 10.10 Maverick, and the upcoming Natty release is now as easy as:

ec2-consistent-snapshot: New release 0.35

ec2-consistent-snapshot version 0.35 has been released on the Alestic PPA. This software is a wrapper around the EBS create-snapshot API call and can be used to help ensure that the file system and any MySQL database on the EBS volume are in a consistent state, suitable for restoring at a later time.

The most important change in this release is a fix for a defect that has been nagging many folks for months. In rare situations, the create-snapshot API call itself took longer than 10 seconds to return from the EC2 web service at Amazon. The software did not trap the alarm correctly and exited without unfreezing the XFS file system which forced us to add an awkward unfreeze command in all cron jobs.

Improving Security on EC2 With AWS Identity and Access Management (IAM)

A few hours ago, Amazon launched a public preview of AWS Identity and Access Management (IAM) which is a powerful feature if you have a number of developers who need to access and to manage resources for an AWS account. A unique IAM user can be created for each developer and specific permissions can be doled out as needed.

You can also create IAM users for system functions, dramatically increasing the security of your AWS account in the event a server is compromised. That benefit is the focus of this article using an example frequently cited by EC2 users: Automating EBS snapshots on a local EC2 instance without putting the keys to your AWS kingdom on the file system.

Before the release of AWS IAM, if you wanted to create EBS snapshots in a local cron job on an EC2 instance, you needed to put the master AWS credentials in the file system on that instance. If those AWS credentials were compromised, the attacker could perform all sorts of havoc with resources in your AWS account and charges to your credit card.

With the launch of AWS IAM, we can create a system IAM user with its own AWS keys and all it is allowed to do is… create EBS snapshots! These keys are placed on the instance and used in the snapshot cron job. Now, an attacker can do very little damage with those keys if they are compromised, and we all feel much safer.

The AWS IAM documentation is required reading and a great reference. This article is only intended to serve as a practical introduction to one simple application of IAM.

These instructions assume you are running Ubuntu 10.04 (Lucid) on both your local system and on Amazon EC2. Adjust as appropriate for other distributions and releases.

IAM Installation

Ubuntu does not yet have an official software package for AWS IAM, so we need to download the IAM command line toolkit from Amazon. This can be done on any machine including your local desktop. The IAM command line tools require Java so we need to make sure that is installed as well.

Eventually, you’ll want to install this software somewhere more permanent, but for this demo, we’ll just use it from a subdirectory.

sudo apt-get install openjdk-6-jre unzip
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
wget http://awsiammedia.s3.amazonaws.com/public/tools/cli/latest/IAMCli.zip
unzip IAMCli.zip
export AWS_IAM_HOME=$(echo $(pwd)/IAMCli-*)
export PATH=$PATH:$AWS_IAM_HOME/bin

The AWS IAM tools require you to save your AWS account’s main access key id and AWS secret access key in yet another file format. Create this AWS credential file as, say, $HOME/.aws-credentials-master.txt in the following format (replacing the values with your own credentials):

AWSAccessKeyId=YOURACCESSKEYIDHERE
AWSSecretKey=YOURSECRETKEYHERE

Note: The above is the sample content of a file you are creating, and not shell commands to run.

Protect the above file and set an environment variable to tell IAM where to find it:

export AWS_CREDENTIAL_FILE=$HOME/.aws-credentials-master.txt
chmod 600 $AWS_CREDENTIAL_FILE

We can now use the iam-* command line tools to create and manage AWS IAM users, groups, and policies.

Create IAM User

How you manage your users and groups is sure to be a personal preference that is fine tuned over time, but for the purposes of this demo, I’ll propose that for tracking purposes we put non-human users into a new group named “system”.

iam-groupcreate -g system

Create the snapshotter system user, saving the keys to a file:

user=snapshotter
iam-usercreate -u $user -g system -k |
  tee $HOME/.aws-keys-$user.txt
chmod 600 $HOME/.aws-keys-$user.txt

You will want to have this snapshotter keys file on the EC2 instance, so copy it there:

rsync -Paz $HOME/.aws-keys-$user.txt REMOTEUSER@REMOTESYSTEM:

Allow IAM user snapshotter to create EBS snapshots of any EBS volume:

iam-useraddpolicy \
  -p allow-create-snapshot \
  -e Allow \
  -u $user \
  -a ec2:CreateSnapshot \
  -r '*'

There’s a lot of preparatory and other commands in this article, but take a second to focus on the fact that the core, functional steps are simply the iam-usercreate and iam-useraddpolicy commands above. Two commands and you have a new AWS IAM user with restricted access to your AWS account.

Create EBS Snapshot

For the purposes of this demo, we’ll assume you’re using the ec2-consistent-snapshot tool to create EBS snapshots with a consistent file system and perhaps a consistent MySQL database. (If you’re not using this tool, then you could have simply used ec2-create-snapshot from any computer without having to go through the trouble of creating a new IAM user.)

Make sure you have the latest ec2-consistent-snapshot software installed on the EC2 instance:

sudo add-apt-repository ppa:alestic/ppa
sudo apt-get install ec2-consistent-snapshot

Create the snapshot on the EC2 instance. Adjust options to fit your local EBS volume mount points and MySQL database setup.

sudo ec2-consistent-snapshot \
  --aws-credentials-file $HOME/.aws-keys-snapshotter.txt \
  --xfs-filesystem /YOURMOUNTPOINT \
  YOURVOLUMEID

Follow similar steps to create users and set policies for other system activities you perform on your EC2 instances. IAM can control access to many different AWS resource types, API calls, specific resources, and has even more fine tuned control parameters including time-based restrictions.

The release of AWS Identity and Access Management alleviates one of the biggest concerns security-conscious folks used to have when they started using AWS with a single key that gave complete access and control over all resources. Now the control is entirely in your hands.

Cleanup

If you have followed the steps in this demo and you wish to undo most of what was done, here are some steps for reference.

Delete the IAM user and the IAM group:

iam-userdel -u $user -r
iam-groupdel -g system

Wipe the credentials and keys files and remove the downloaded and unzipped IAM command line toolkit:

sudo apt-get install wipe
wipe  $HOME/.aws-credentials-master.txt \
      $HOME/.aws-keys-$user.txt
rm    IAMCli.zip
rm -r $AWS_IAM_HOME

Make sure to wipe the snapshotter key file on the remote EC2 instance as well.

Support

If you’re looking for help with AWS IAM, there is a new AWS IAM forum dedicated to the topic.

[Update 2010-11-19: Fix path where new zip file is expanded]

ec2-consistent-snapshot release 0.1-9

Thanks to everybody who submitted bug reports and feature requests for ec2-consistent-snapshot, software which can be used to create consistent EBS snapshots on Amazon EC2 especially for use with XFS and/or MySQL.

A new release (0.1-9) has been published to the Alestic PPA. This release provides the following fixes and enhancements:

  • Read MySQL “host” parameter from .my.cnf if provided. (closes: bug#485978)

  • Support quoted values in .my.cnf (closes: bug#454184)

  • Require Net::Amazon::EC2 version 0.11 or higher. (closes: bug#490686)

  • Require xfsprogs package. (closes: bug#493420)

  • Replace “/etc/init.d/mysql stop” with “mysqladmin shutdown”. (closes: bug#497557)

  • Document –description option which was added earlier. (closes: bug#487692)

If you already have ec2-consistent-snapshot installed, you can upgrade using commands like:

sudo apt-get update
sudo apt-get install ec2-consistent-snapshot

If you don’t yet have the Alestic PPA set up, run these commands first:

codename=$(lsb_release -cs)
echo "deb http://ppa.launchpad.net/alestic/ppa/ubuntu $codename main"|
  sudo tee /etc/apt/sources.list.d/alestic-ppa.list    
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys BE09C571

If you find bugs or think of feature requests, please submit them.

New --mysql-stop option for ec2-consistent-snapshot

The ec2-consistent-snapshot software tries its best to flush and lock a MySQL database on an EC2 instance while it initiates the EBS snapshot, and for many environments it does a pretty good job.

However, there are situations where the database may spend time performing crash recovery from the log file when it is started from a copy of the snapshot. We are seeing this behavior at CampusExplorer.com where the database is constantly active and we have innodb_log_file_size set (probably too) high. The delay is doubtless exacerbated by the fact that the blocks on the new EBS volume are being recovered from S3 as it is being built from the snapshot.

Google has created an innodb_disallow_writes MySQL patch which I think points out the problem we may be hitting.

“Note that it is not sufficient to run FLUSH TABLES WITH READ LOCK as there are background IO threads used by InnoDB that may still do IO."

It would be very nice to have this patch incorporated in MySQL on Ubuntu. It looks like the OurDelta folks have already incorporated the patch. [Update: See rsimmons' comment below which explains why this particular patch might not be the answer.]

In any case, when we bring up a database using an EBS volume created from an EBS snapshot of an active database, it can take up to 45 minutes recovering before it lets normal clients connect. This is too long for us so we’re trying a new approach.

The ec2-consistent-snapshot now has a --mysql-stop option which shuts down the MySQL server, initiates the snapshot, and then restarts the database. Our hope is that this will get us a snapshot which can be restored and run without delay. If any MySQL experts can point out the potential flaws in this, please do.

Since we obviously can’t stop and start our production database every hour, we are performing this snapshot activity on a replication slave that is dedicated to snapshots and backups.

We continue to perform occasional snapshots on the production database EBS volume just to help keep it reliable per Amazon’s instructions, but we don’t expect to be able to restore it without crash recovery.

If you’d like to test the new --mysql-stop option, please upgrade your ec2-consistent-snapshot package from the Alestic PPA and let me know how it goes.

Creating Consistent EBS Snapshots with MySQL and XFS on EC2

In the article Running MySQL on Amazon EC2 with Elastic Block Store I describe the principles involved in using EBS on EC2. Though originally published in 2008, it is still relevant today and is worth reviewing to get context for this article.

In the above tutorial, I included a sample script which followed the basic instructions in the article to initiate EBS snapshots of an XFS file system containing a MySQL database. For the most part this script worked for basic installations with low volume.

Over the last year as I and my co-workers have been using this code in production systems, we identified a number of ways it could be improved. Or, put another way, some serious issues came up when the idealistic world view of the original simplistic script met the complexities which can and do arise in the brutal real world.

We gradually improved the code over the course of the year, until the point where it has been running smoothly on production systems with no serious issues. This doesn’t mean that there aren’t any areas left for improvement, but does seem like it’s ready for the general public to give it a try.

The name of the new program is ec2-consistent-snapshot.

Features

Here are some of the ways in which the ec2-consistent-snapshot program has improved over the original:

  • Command line options for passing in AWS keys, MySQL access information, and more.

  • Can be run with or without a MySQL database on the file system. This lets you use the command to initiate snapshots for any EBS volume.

  • Can be used with or without XFS file systems, though if you don’t use XFS, you run the risk of not having a consistent file system on EBS volume restore.

  • Instead of using the painfully slow ec2-create-snapshot command written in Java, this Perl program accesses the EC2 API directly with orders of magnitude speed improvement.

  • A preliminary FLUSH is performed on the MySQL database before the FLUSH WITH READ LOCK. This preparation reduces the total time the tables are locked.

  • A preliminary sync is performed on the XFS file system before the xfs_freeze. This preparation reduces the total time the file system is locked.

  • The MySQL LOCK now has timeouts and retries around it. This prevents horrible blocking interactions between the database lock, long running queries, and normal transactions. The length of the timeout and the number of retries are configurable with command line options.

  • The MySQL FLUSH is done in such a way that the statement does not propagate through to slave databases, negatively impacting their performance and/or causing negative blocking interactions with long running queries.

  • Cleans up MySQL and XFS locks if it is interrupted, if a timeout happens, or if other errors occur. This prevents a number of serious locking issues when things go wrong with the environment or EC2 API.

  • Can snapshot EBS volumes in a region other than the default (e.g., eu-west-1).

  • Can initiate snapshots of multiple EBS volumes at the same time while everything is consistently locked. This has been used to create consistent snapshots of RAIDed EBS volumes.

Installation

On Ubuntu, you can install the ec2-consistent-snapshot package using the new Alestic PPA (personal package archive) hosted on Launchpad.net. Here are the steps to set up access to packages in the Alestic PPA and install the software package and its dependencies:

sudo add-apt-repository ppa:alestic &&
sudo apt-get update &&
sudo apt-get install -y ec2-consistent-snapshot

Now you can read the documentation using:

man ec2-consistent-snapshot

and run the ec2-consistent-snapshot command itself.

Feedback

If you find any problems with ec2-consistent-snapshot, please create bug reports in launchpad. The same mechanism can be used to submit ideas for improvement, which are especially welcomed if you include a patch.

Other questions and feedback are accepted in the comments section for this article. If you’re reading this on a planet, please click through on the title to read the comments.

[Update 2011-02-09: Simplify install instructions to not require use of CPAN.]