ec2-consistent-snapshot release 0.1-9

Thanks to everybody who submitted bug reports and feature requests for ec2-consistent-snapshot, software which can be used to create consistent EBS snapshots on Amazon EC2, especially for use with XFS and/or MySQL.

A new release (0.1-9) has been published to the Alestic PPA. This release provides the following fixes and enhancements:

  • Read MySQL “host” parameter from .my.cnf if provided. (closes: bug#485978)

  • Support quoted values in .my.cnf. (closes: bug#454184)

  • Require Net::Amazon::EC2 version 0.11 or higher. (closes: bug#490686)

  • Require xfsprogs package. (closes: bug#493420)

  • Replace “/etc/init.d/mysql stop” with “mysqladmin shutdown”. (closes: bug#497557)

  • Document the --description option which was added earlier. (closes: bug#487692)

If you already have ec2-consistent-snapshot installed, you can upgrade using commands like:

sudo apt-get update
sudo apt-get install ec2-consistent-snapshot

If you don’t yet have the Alestic PPA set up, run these commands first:

codename=$(lsb_release -cs)
echo "deb http://ppa.launchpad.net/alestic/ppa/ubuntu $codename main"|
  sudo tee /etc/apt/sources.list.d/alestic-ppa.list    
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys BE09C571

If you find bugs or think of feature requests, please submit them.

Listing Recent Prices for EC2 Spot Instances

The new spot instances on EC2 are a great way to get some extra compute power at a price you can live with, especially if you are flexible on exactly when the instances run. On the other hand, maybe you won’t get the compute power if the spot instance price never drops to the max price you are willing to pay.

The best way to approach auction-type situations like this is often simply to bid the maximum price you can afford. Your instance(s) will run if and when the spot instance price drops to or below that price, and you will often be charged less than your maximum, depending on what other users are bidding for their instances.

Though I don’t recommend trying to chase the spot instance price around, it is natural to be curious about what others have been paying and whether or not you might have a chance to get in with your bid.

The ec2-describe-spot-price-history command lists the historical values of the spot instance price for all sizes and types of instances. This is useful for studying fluctuations over time, but sometimes you’ll just want to know what the latest spot instance price is, even though this is not necessarily the price you would get charged if you opened new spot instance requests.

The following command shows the most recent spot instance price for each of the Linux/UNIX instance types:

ec2-describe-spot-price-history -d Linux/UNIX |
  sort -rk3 |    # most recent timestamps first
  perl -ane 'print "$F[1]\t$F[3]\n" unless $seen{$F[3]}++' |  # first (latest) price per instance type
  sort -n        # order by price

To check the spot price in a different region, simply add an option like

--region us-west-1

or

--region eu-west-1

As of the writing of this article, the prices returned for the us-east-1 region are as follows:

0.026	m1.small
0.05	c1.medium
0.11	m1.large
0.25	c1.xlarge
0.265	m1.xlarge
0.443	m2.2xlarge
0.997	m2.4xlarge

The exact spot instance prices will vary significantly from these samples over time and likely even during the course of a day. In fact, the spot instance price may occasionally exceed the on demand (standard) EC2 instance price.

Who’s going to be the first person to provide handy, real time, spot price history graphs by EC2 region, instance class (Linux/Windows), and instance type with correct X-axis date/time scaling?

Good luck bidding!

Increasing Root Disk Size of an "EBS Boot" AMI on EC2

This article is about running a new EC2 instance with a larger boot volume than the default. You can also resize a running EC2 instance.

Amazon EC2’s new EBS Boot feature not only provides persistent root disks for instances, but also supports root disks larger than the previous limit of 10GB under S3 based AMIs.

Since EBS boot AMIs are implemented by creating a snapshot, the AMI publisher controls the default size of the root disk through the size of the snapshot. There are a number of factors which go into deciding the default root disk size of an EBS boot AMI and some of them conflict.

On the one hand, you want to give users enough free space to run their applications, but on the other hand, you don’t want to increase the cost of running the instance too much. EBS volumes run $0.10 to $0.11 per GB per month depending on the region, or about $10/month for 100GB and $100/month for 1TB.

I suspect the answer to this problem might be for AMI publishers to provide a reasonably low default, perhaps 10GB as per the old standard or 15GB following in the footsteps of Amazon’s first EBS Boot AMIs. This would add $1.00 to $1.50 per month to running the instance, which seems negligible for most purposes. Note: There are also IO charges and charges for EBS snapshots, but those are more affected by usage and less by the size of the original volume.

For applications where the EBS boot AMI’s default size is not sufficient, users can increase the root disk size at run time all the way up to 1 TB. Here’s a quick overview of how to do this.

Example

The following demonstrates how to run Amazon’s getting-started-with-ebs-boot AMI, increasing the root disk from the default of 15 GB up to 100 GB.

Before we start, let’s check to see the default size of the root disk in the target AMI and what the device name is:

$ ec2-describe-images ami-b232d0db
IMAGE	ami-b232d0db	amazon/getting-started-with-ebs-boot	amazon	available	public		i386	machine	aki-94c527fd	ari-96c527ff		ebs
BLOCKDEVICEMAPPING	/dev/sda1		snap-a08912c9	15	

We can see the EBS snapshot id snap-a08912c9 and the fact that it is 15 GB attached to /dev/sda1. If we start an instance of this AMI it will have a 15 GB EBS volume as the root disk and we won’t be able to change it once it’s running.

Now let’s run the EBS boot AMI, but we’ll override the default size, specifying 100 GB for the root disk device (/dev/sda1 as seen above):

ec2-run-instances \
  --key KEYPAIR \
  --block-device-mapping /dev/sda1=:100 \
  ami-b232d0db

If we check the EBS volume mapped to the new instance we’ll see that it is 100 GB, but when we ssh to the instance and check the root file system size we’ll notice that it is only showing 15 GB:

$ df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              15G  1.6G   13G  12% /
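
To confirm that the underlying EBS volume really is 100 GB, you can check from your local system; the size appears on the VOLUME line of the output:

$ ec2-describe-volumes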

There’s one step left. We need to resize the file system so that it fills up the entire 100 GB EBS volume. Here’s the magic command for ext3. In my early tests it took 2-3 minutes to run. [Update: For Ubuntu 11.04 and later, this step is performed automatically when the AMI is booted and you don’t need to run it manually.]

$ sudo resize2fs /dev/sda1
resize2fs 1.40.4 (31-Dec-2007)
Filesystem at /dev/sda1 is mounted on /; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 7
Performing an on-line resize of /dev/sda1 to 26214400 (4k) blocks.
The filesystem on /dev/sda1 is now 26214400 blocks long.

Finally, we can check to make sure that we’re running on a bigger file system:

$ df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              99G  1.6G   92G   2% /

Note: The output reflects “99” instead of “100” because of slight differences in how df and EBS calculate “GB” (e.g., 1024 MB vs 1000 MB).

XFS

If it were possible to create an EBS boot AMI with an XFS root file system, then the resizing would be near instantaneous using commands like the following. [Update: For Ubuntu 11.04 and later, this step is performed automatically when the AMI is booted and you don’t need to run it manually.]

sudo apt-get update && sudo apt-get install -y xfsprogs
sudo xfs_growfs /

The Ubuntu kernels built for EC2 by Canonical have XFS support built in, so XFS based EBS boot AMIs might be possible. This would also allow for more consistent EBS snapshots.

Toolset

Make sure you are running the latest version of the ec2-run-instances command. The current version can be determined with the command

ec2-version

To use EBS boot features, the version should be at least 1.3-45772.

[Updated 2009-12-11: Switch instructions to default us-east-1 since all regions now support this feature.]
[Updated 2011-12-13: Note that file system resize is done automatically on boot in Ubuntu 11.04 and later.]

Ubuntu Karmic Desktop on EC2

As Thilo Maier pointed out in comments on my request for UDS input, I have been publishing both server and desktop AMIs for running Ubuntu on EC2 up through Jaunty, but the official Karmic AMIs on EC2 only support server installations by default.

Ubuntu makes it pretty easy to install the desktop software on a server, and NX from NoMachine makes it pretty easy to access that desktop remotely, with near real-time interactivity even over slowish connections.

Here’s a quick guide to setting this up, starting with an Ubuntu 9.10 Karmic AMI on Amazon EC2:

  1. Create a user-data script which installs runurl (not on Karmic AMIs by default) and then runs a simple desktop and NX server installation script. Examine the desktop script to see what it’s doing to install the software.

     cat <<EOM >install-desktop
     #!/bin/bash -ex
     wget -qO/usr/bin/runurl run.alestic.com/runurl
     chmod 755 /usr/bin/runurl
     runurl run.alestic.com/install/desktop
     EOM
    
  2. Start an instance on EC2 telling it to run the above user-data script on first boot. The following example uses the current 32-bit Karmic server AMI. Make sure you’re using the latest AMI id.

     ec2-run-instances                   \
       --key YOURKEY                     \
       --user-data-file install-desktop  \
       ami-1515f67c
    
  3. Connect to the new instance and wait for it to complete the desktop software installation (when sshd is restarted). This takes about 30 minutes on an m1.small instance and 10 minutes on a c1.medium instance. Then generate and set a secure password for the ubuntu user using copy/paste from the pwgen output. Save the secure password so you can enter it into the NX client later.

     ssh -i YOURKEY.pem ubuntu@THEHOST
     tail -f /var/log/syslog | egrep --line-buffered user-data:
     pwgen -s 16 1
     sudo passwd ubuntu
    

    If anybody knows how to use ssh keys with NX, I’d love to do this instead of using passwords.

  4. Back on your local system, install and run the NX client. For computers not running Ubuntu, download the appropriate software from NoMachine.

     wget http://64.34.161.181/download/3.4.0/Linux/nxclient_3.4.0-5_i386.deb
     sudo dpkg -i nxclient_3.4.0-5_i386.deb
     /usr/NX/bin/nxclient --wizard &
    

    Point the NX Client to the external hostname of your EC2 instance. Enter the Login “ubuntu” and the Password from above. Choose the “Gnome” desktop.

If all goes well, you should have a complete and fresh Ubuntu desktop filling most of your screen, available for you to mess around with and then throw away.

ec2-terminate-instances INSTANCEID

If you want to have a persistent desktop with protection from crashes, you’ll need to learn how to do things like placing critical directories on EBS volumes.

If you’d like to run KDE on EC2, replace the package “ubuntu-desktop” with “kubuntu-desktop” in the installation script.

Ubuntu Developer Summit - EC2 Lucid

For the last year I have been working with Canonical and the Ubuntu server team, helping to migrate over to a more official process what I’ve been doing for the community in supporting Ubuntu on EC2. The Ubuntu 9.10 Karmic EC2 images are a fantastic result of this team’s work: An Ubuntu image running on real Ubuntu kernels with official support available.

For the next two days (Nov 14-15) I will be participating in the Ubuntu Developer Summit (UDS) in Dallas, Texas. Developers from around the world will be gathering this week to scope and define the next version of Ubuntu which will be released in April 2010.

As part of this, I think it would be helpful to gather input from the community. Please use the comment section of this article to share what you would like to see happen with the direction of Ubuntu on EC2 in the coming release(s).

Are there any features which you find missing? Functionality which would be helpful? Problems which keep getting in your way?

Feel free to brainstorm and toss out ideas big and small, even if they are not completely thought through or if they would also take help from Amazon to complete. It may already be too late to start off some of the proposals for the Lucid release cycle, but having ideas to think about for future releases never hurts.

New --mysql-stop option for ec2-consistent-snapshot

The ec2-consistent-snapshot software tries its best to flush and lock a MySQL database on an EC2 instance while it initiates the EBS snapshot, and for many environments it does a pretty good job.

However, there are situations where the database may spend time performing crash recovery from the log file when it is started from a copy of the snapshot. We are seeing this behavior at CampusExplorer.com where the database is constantly active and we have innodb_log_file_size set (probably too) high. The delay is doubtless exacerbated by the fact that the blocks on the new EBS volume are being recovered from S3 as it is being built from the snapshot.

Google has created an innodb_disallow_writes MySQL patch which I think points out the problem we may be hitting.

“Note that it is not sufficient to run FLUSH TABLES WITH READ LOCK as there are background IO threads used by InnoDB that may still do IO.”

It would be very nice to have this patch incorporated in MySQL on Ubuntu. It looks like the OurDelta folks have already incorporated the patch. [Update: See rsimmons' comment below which explains why this particular patch might not be the answer.]

In any case, when we bring up a database using an EBS volume created from an EBS snapshot of an active database, it can spend up to 45 minutes in crash recovery before it lets normal clients connect. This is too long for us, so we’re trying a new approach.

ec2-consistent-snapshot now has a --mysql-stop option which shuts down the MySQL server, initiates the snapshot, and then restarts the database. Our hope is that this will get us a snapshot which can be restored and run without delay. If any MySQL experts can point out the potential flaws in this, please do.
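
Here’s a hypothetical invocation (the volume id and description are placeholders; check the man page for exact option names and defaults):

ec2-consistent-snapshot \
  --mysql-stop \
  --description "hourly snapshot of dedicated slave" \
  vol-xxxxxxxx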

Since we obviously can’t stop and start our production database every hour, we are performing this snapshot activity on a replication slave that is dedicated to snapshots and backups.

We continue to perform occasional snapshots on the production database EBS volume just to help keep it reliable per Amazon’s instructions, but we don’t expect to be able to restore it without crash recovery.

If you’d like to test the new --mysql-stop option, please upgrade your ec2-consistent-snapshot package from the Alestic PPA and let me know how it goes.

Understanding Access Credentials for AWS/EC2

Amazon Web Services (AWS) has a dizzying proliferation of credentials, keys, ids, usernames, certificates, passwords, and codes which are used to access and control various account and service features and functionality. I have never met an AWS user who, when they started, did not have trouble figuring out which ones to use when and where, much less why.

Amazon is fairly consistent across the documentation and interfaces in the specific terms they use for the different credentials, but nowhere have I found these all listed in one place. (Update: Shlomo pointed out Mitch Garnaat’s article on this topic which, upon examination, may even have been my subconscious inspiration for this. Mitch goes into a lot of great detail in his two part post.)

Pay close attention to the exact names so that you use the right credentials in the right places.

(1) AWS Email Address and (2) Password. This pair is used to log in to your AWS account on the AWS web site. Through this web site you can access and change information about your account including billing information. You can view the account activity. You can control many of the AWS services through the AWS console. And, you can generate and view a number of the other important access keys listed in this article. You may also be able to order products from Amazon.com with this account, so be careful. You should obviously protect your password. What you do with your email address is your business. Both of these values may be changed as needed.

(3) MFA Authentication Code. If you have ordered and activated a multi-factor authentication device, then parts of the AWS site will be protected not only by the email address and password described above, but also by an authentication code. This is a 6 digit code displayed on your device which changes every 30 seconds or so. The AWS web site will prompt you for this code after you successfully enter your email address and password.

(4) AWS Account Number. This is a 12 digit number separated with dashes in the form 1234-5678-9012. You can find your account number under your name on the top right of most pages on the AWS web site (when you are logged in). This number is not secret and may be available to other users in certain circumstances. I don’t know of any situation where you would use the number in this format with dashes, but it is needed to create the next identifier:

(5) AWS User ID. This is a 12 digit number with no dashes. In fact, it is simply the previously mentioned AWS Account Number with the dashes removed (e.g., 123456789012). Your User ID is needed by some API and command line tools, for example when bundling a new image with ec2-bundle-vol. It can also be entered in to the ElasticFox plugin to help display the owners of public AMIs. Again, your User ID does not need to be kept private. It is shown to others when you publish an AMI and make it public, though it might take some detective work to figure out who the number really belongs to if you don’t publicize that, too.

(6) AWS Access Key ID and (7) Secret Access Key. This is the first of two pairs of credentials which can be used to access and control basic AWS services through the API including EC2, S3, SimpleDB, CloudFront, SQS, EMR, RDS, etc. Some interfaces use this pair, and some use the next pair below. Pay close attention to the names requested. The Access Key ID is 20 alpha-numeric characters like 022QF06E7MXBSH9DHM02 and is not secret; it is available to others in some situations. The Secret Access Key is 40 alpha-numeric-slash-plus characters like kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct and must be kept very secret.

You can change your Access Key ID and Secret Access Key if necessary. In fact, Amazon recommends regular rotation of these keys by generating a new pair, switching applications to use the new pair, and deactivating the old pair. If you forget either of these, they are both available from AWS.

(8) X.509 Certificate and (9) Private Key. This is the second pair of credentials that can be used to access the AWS API (SOAP only). The EC2 command line tools generally need these as might certain 3rd party services (assuming you trust them completely). These are also used to perform various tasks for AWS like encrypting and signing new AMIs when you build them. These are the largest credentials, taking the form of short text files with long names like cert-OHA6ZEBHMCGZ66CODVHEKKVOCYWISYCS.pem and pk-OHA6ZEBHMCGZ66CODVHEKKVOCYWISYCS.pem respectively.

The Certificate is supposedly not secret, though I haven’t found any reason to publicize it. The Private Key should obviously be kept private. Amazon keeps a copy of the Certificate so they can confirm your requests, but they do not store your Private Key, so don’t lose it after you generate it. Two of these pairs can be associated with your account at any one time, so they can be rotated as often as you rotate the Access Key ID and Secret Access Key. Keep records of all historical keys in case you have a need for them, like unencrypting an old AMI bundle which you created with an old Certificate.
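
For example, the EC2 API command line tools typically pick up this pair from environment variables (the paths below are illustrative):

export EC2_CERT=$HOME/.ec2/cert-OHA6ZEBHMCGZ66CODVHEKKVOCYWISYCS.pem
export EC2_PRIVATE_KEY=$HOME/.ec2/pk-OHA6ZEBHMCGZ66CODVHEKKVOCYWISYCS.pem
ec2-describe-instances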

(10) Linux username. When you ssh to a new EC2 instance you need to connect as a user that already exists on that system. For almost all public AMIs, this is the root user, but on Ubuntu AMIs published by Canonical, you need to connect using the ubuntu user. Once you gain access to the system, you can create your own users.

(11) public ssh key and (12) private ssh key. These are often referred to as a keypair in EC2. The ssh keys are used to make sure that only you can access your EC2 instances. When you run an instance, you specify the name of the keypair and the corresponding public key is provided to that instance. When you ssh to the above username on the instance, you specify the private key so the instance can authenticate you and let you in.

You can have multiple ssh keypairs associated with a single AWS account; they are created through the API or with tools like the ec2-add-keypair command. The private key must be protected as anybody with this key can log in to your instances. You generally never see or deal with the public key as EC2 keeps this copy and provides it to the instances. You download the private key and save it when it is generated; Amazon does not keep a record of it.
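
Here’s a minimal sketch of the keypair lifecycle with the command line tools (the key name, AMI id, and hostname are placeholders):

ec2-add-keypair mykey | grep -v ^KEYPAIR > mykey.pem  # keep only the private key block
chmod 600 mykey.pem
ec2-run-instances --key mykey ami-xxxxxxxx
ssh -i mykey.pem ubuntu@HOSTNAME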

(13) ssh host key. Just to make things interesting, each EC2 instance which you run will have its own ssh host key. This is a private file generated by the host on first boot which is used to protect your ssh connection to the instance so it cannot be intercepted and read by other people. In order to make sure that your connection is secure, you need to verify that the (14) ssh host key fingerprint, which is provided to you on your first ssh attempt, matches the fingerprint listed in the console output of the EC2 instance.
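
One way to perform this check, assuming the AMI prints its host key fingerprints to the console on first boot (many public AMIs do):

ec2-get-console-output i-xxxxxxxx | grep -i fingerprint

Compare the fingerprint listed there with the one ssh shows when it asks whether to trust the new host.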

At this point you may be less or more confused, but here are two rules which may help:

A. Create a file or folder where you jot down and save all of the credentials associated with each AWS account. You don’t want to lose any of these, especially the ones which Amazon does not store for you. Consider encrypting this information with (yet another) secure passphrase.

B. Pay close attention to what credentials are being asked for by different tools. A “secret access key” is different from a “private key” is different from a “private ssh key”. Use the above list to help sort things out.

Use the comment section below to discuss what I missed in this overview.

How *Not* to Upgrade to Ubuntu 9.10 Karmic on Amazon EC2

WARNING!

Though most Ubuntu 9.04 Jaunty systems can upgrade to 9.10 Karmic in place, this is not possible on EC2 and should not be attempted. If you do try this, your system will become unusable on reboot and there will be no recovery and no access to any of the data on the boot disk or ephemeral storage.

Here’s why:

  • Ubuntu 9.10 Karmic has a version of udev which requires a newer kernel than you would be running for Ubuntu 9.04 Jaunty (especially on EC2).

  • You cannot upgrade the kernel used by a running instance on Amazon EC2 (not even rebooting).

  • When an EC2 instance cannot boot (as in the case of the udev/kernel mismatch) your only option is to terminate it, losing the local storage.

How To Upgrade

In order to upgrade to Karmic you will need to start a new EC2 instance running a fresh copy of the appropriate Karmic AMI. I post the latest AMI ids for Karmic in the second table on https://alestic.com/.

Keep your old instance(s) running while you configure and test the new Karmic instances. EC2 makes it easy to have multiple sets of servers running in parallel instead of upgrading in place. When you are confident your new servers are functioning properly, you can discard the old ones.

The Ubuntu 9.10 Karmic AMIs released by Canonical have a number of differences from the community Ubuntu AMIs which have been published on https://alestic.com.

One of the biggest differences is that you will ssh to ubuntu@ instead of to root@ on your instance. You can then sudo to perform commands as the root user. Back in April I wrote a guide about Using sudo, ssh, rsync on the Official Ubuntu Images for EC2.

The Ubuntu server team has put a lot of work into making Ubuntu 9.10 Karmic function beautifully on Amazon EC2 and it’s been a pleasure to have a small part in the process. I’m already using the Karmic AMIs on EC2 for one of my production processes. Please give these AMIs a spin and give feedback.

1 TB of Memory in 1 Minute with 1 Command

Amazon Web Services just announced the release of two new instance types for EC2. These new types have 34.2 GB and 68.4 GB of RAM with a decent amount of CPU capacity on modern CPUs to go along with it.

Others have already done a great job of describing the instance types:

Jeff Barr’s AWS blog

RightScale’s blog

but when it comes to flexing the raw power at my fingertips with AWS, sometimes I can’t help myself. So…

sitting on my couch with my laptop watching an episode of “Lie to me” on TiVo I just typed:

ec2-run-instances            \
  --instance-type m2.4xlarge \
  --key KEYPAIR              \
  --instance-count 19        \
  ami-e6f6158f

and in under a minute and about $45 later, I had ssh access to well over 1 TB (1,000 GB) of free memory. To be sure, it was spread over 19 Ubuntu servers, but still, there’s gotta be something I can do with that, no?

Here are the results on a single one of these servers running Ubuntu 8.04 Hardy:

root@domU-12-31-39-08-7F-51:~# free
             total       used       free     shared    buffers     cached
Mem:      71687580    1521464   70166116          0       2632      17704
-/+ buffers/cache:    1501128   70186452
Swap:            0          0          0

root@domU-12-31-39-08-7F-51:~# free -g
             total       used       free     shared    buffers     cached
Mem:            68          1         66          0          0          0
-/+ buffers/cache:          1         66
Swap:            0          0          0

Wait, I’d better do whatever I’m gonna do quick or I’m going to be charged another $45.60 for the next hour’s worth of fun!

Ok, time to cut my losses:

ec2-describe-instances | 
  egrep m2.4xlarge | 
  cut -f2 | 
  xargs ec2-terminate-instances

In case you didn’t feel like spending $2.40 to find out what CPUs are on one of these beasts, here’s the /proc/cpuinfo entry for one of the ones I ran:

vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 2666.760
cache size	: 8192 KB
bogomips	: 5203.00

And remember that there are 8 of these on the m2.4xlarge instance size. (Exact CPUs not guaranteed, your results may vary, etc.)

Amazon Web Services seems to keep releasing new features in advance of when our growing startup needs them. As we start to think about whether we are going to need to trim some tables or split up the database, here comes an instance type that will let us grow a lot longer just focusing on our core business challenges instead of on the infrastructure.

New Releases of Ubuntu and Debian Images for Amazon EC2 (Kernel, Security, PPA, runurl, Tools)

New updates have been released for the Ubuntu and Debian AMIs (EC2 images) published on:

https://alestic.com

The following notes apply to this release:

  • The images have been upgraded to use the newest 2.6.21 kernel, ramdisk, and kernel modules from Amazon. This fixes a serious security hole in the previous 2.6.21 kernel.

  • The Alestic PPA (personal package archive) has been added to the Ubuntu AMIs. This makes it easy to install software packages listed in this PPA, including ec2-consistent-snapshot.

  • The runurl package from the Alestic PPA has been pre-installed on the Ubuntu AMIs. This can be a handy tool for setting up new instances with user-data scripts.

  • The EC2 AMI tools have been upgraded to version 1.3-34544.

  • The ec2-ami-tools package version has been pinned so it does not get downgraded if the official Ubuntu archives still have older versions.

  • All packages have been upgraded to their respective latest versions.

  • The Ubuntu Karmic images were not updated and have been removed from the listings at the top of https://alestic.com. If you would like to use Ubuntu Karmic Beta, please test with the AMIs published by Canonical listed a bit lower down on the page.

Please give these new images a spin and let us know if you run into any problems.

Enjoy

Encrypting Ephemeral Storage and EBS Volumes on Amazon EC2

Over the years, Amazon has repeatedly recommended that customers who care about the security of their data should consider encrypting information stored on disks, whether ephemeral storage (/mnt) or EBS volumes. This, even though they take pains to ensure that disk blocks are wiped between uses by different customers, and they implement policies which restrict access to disks even by their own employees.

There are a few levels where encryption can take place:

  1. File level. This includes tools like GnuPG, freely available on Ubuntu in the gnupg package. If you use this approach, make sure that you don’t store the unencrypted information on the disk before encrypting it.

  2. File system level. This includes useful packages like encfs which transparently encrypt files before saving to disk, presenting the unencrypted contents in a virtual file system. This can even be used on top of an s3fs file system, letting you store encrypted data on S3 with ease. (A quick example follows this list.)

  3. Block device level. You can place any file system you’d like on top of the encrypted block interface and neither your application nor your file system realize that the hardware disk never sees unencrypted data.
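
As a quick illustration of the file system level option, here is a minimal encfs session (encfs requires absolute paths and interactively prompts for configuration and a passphrase on first use):

sudo apt-get install -y encfs
mkdir $HOME/.encrypted $HOME/private
encfs $HOME/.encrypted $HOME/private
echo secret > $HOME/private/test.txt  # stored encrypted under ~/.encrypted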

The rest of this article presents a simple way to set up a level of encryption at the block device level using cryptsetup/LUKS. It has been tested on the 32-bit Ubuntu 9.04 Jaunty server AMI listed on https://alestic.com and should work on other Ubuntu AMIs and even other distros with minor changes.

This walkthrough uses the /mnt ephemeral storage, but you can replace /mnt and /dev/sda2 with appropriate mount point and device for 64-bit instance types or EBS volumes.

Setup

Install tools and kernel modules:

sudo apt-get update
sudo apt-get install -y cryptsetup xfsprogs
for i in sha256 dm_crypt xfs; do 
  sudo modprobe $i
  echo $i | sudo tee -a /etc/modules
done

Before you continue, make sure there is nothing valuable on /mnt because we’re going to replace it!

sudo umount /mnt
sudo chmod 000 /mnt

Encrypt the disk and create your favorite file system on it:

sudo luksformat -t xfs /dev/sda2
sudo cryptsetup luksOpen /dev/sda2 crypt-sda2

Remember your passphrase! It is not recoverable!

Update /etc/fstab and replace the /mnt line (or create a new line for an EBS volume):

fstabentry='/dev/mapper/crypt-sda2 /mnt xfs noauto 0 0'
sudo perl -pi -e "s%^.* /mnt .*%$fstabentry%" /etc/fstab

Mount the file system on the encrypted block device:

sudo mount /mnt

You’re now free to place files on /mnt, knowing that the content will be encrypted before it is written to the hardware disk.

After reboot, /mnt will appear empty until you re-mount the encrypted partition, entering your passphrase:

sudo cryptsetup luksOpen /dev/sda2 crypt-sda2
sudo mount /mnt

Notes

See “man cryptsetup” for info on adding keys and getting information from the LUKS disk header.

It is possible to auto-mount the encrypted disk on reboot if you are willing to put your passphrase in the root partition (which almost defeats the point of encryption). See the documentation on crypttab and consider adding a line like:

crypt-sda2 /dev/sda2 /PASSPHRASEFILE luks

Study the cryptsetup documentation carefully so that you understand what is going on. Keeping your data private is important, but it’s also important that you know how to get it back in the case of problems.

This article does not attempt to cover all of the possible security considerations you might need to take into account for data leakage on disks. For example, sensitive information might be stored in /tmp, /etc, or log files on the root disk. If you have swap enabled, anything in memory could be saved in the clear to disk whenever the operating system feels like it.

How do you solve your data security challenges on EC2?

This article was based on a post made on the EC2 Ubuntu group.

Creating Consistent EBS Snapshots with MySQL and XFS on EC2

In the article Running MySQL on Amazon EC2 with Elastic Block Store I describe the principles involved in using EBS on EC2. Though originally published in 2008, it is still relevant today and is worth reviewing to get context for this article.

In the above tutorial, I included a sample script which followed the basic instructions in the article to initiate EBS snapshots of an XFS file system containing a MySQL database. For the most part this script worked for basic installations with low volume.

Over the last year as I and my co-workers have been using this code in production systems, we identified a number of ways it could be improved. Or, put another way, some serious issues came up when the idealistic world view of the original simplistic script met the complexities which can and do arise in the brutal real world.

We gradually improved the code over the course of the year, to the point where it has been running smoothly on production systems with no serious issues. This doesn’t mean that there aren’t any areas left for improvement, but it does seem ready for the general public to give it a try.

The name of the new program is ec2-consistent-snapshot.

Features

Here are some of the ways in which the ec2-consistent-snapshot program has improved over the original:

  • Command line options for passing in AWS keys, MySQL access information, and more.

  • Can be run with or without a MySQL database on the file system. This lets you use the command to initiate snapshots for any EBS volume.

  • Can be used with or without XFS file systems, though if you don’t use XFS, you run the risk of not having a consistent file system when you restore an EBS volume.

  • Instead of using the painfully slow ec2-create-snapshot command written in Java, this Perl program accesses the EC2 API directly with orders of magnitude speed improvement.

  • A preliminary FLUSH is performed on the MySQL database before the FLUSH TABLES WITH READ LOCK. This preparation reduces the total time the tables are locked. (A conceptual sketch of the locking sequence appears after this list.)

  • A preliminary sync is performed on the XFS file system before the xfs_freeze. This preparation reduces the total time the file system is locked.

  • The MySQL LOCK now has timeouts and retries around it. This prevents horrible blocking interactions between the database lock, long running queries, and normal transactions. The length of the timeout and the number of retries are configurable with command line options.

  • The MySQL FLUSH is done in such a way that the statement does not propagate through to slave databases, negatively impacting their performance and/or causing negative blocking interactions with long running queries.

  • Cleans up MySQL and XFS locks if it is interrupted, if a timeout happens, or if other errors occur. This prevents a number of serious locking issues when things go wrong with the environment or EC2 API.

  • Can snapshot EBS volumes in a region other than the default (e.g., eu-west-1).

  • Can initiate snapshots of multiple EBS volumes at the same time while everything is consistently locked. This has been used to create consistent snapshots of RAIDed EBS volumes.
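
Conceptually, the MySQL side of the sequence looks something like the following interactive sketch. This is only a sketch: the real program performs these steps over a single connection from Perl, calls the EC2 API directly instead of running ec2-create-snapshot, and adds timeouts and retries; the mount point and volume id are placeholders. The LOCAL keyword keeps the FLUSH statements out of the binary log so they are not replayed on slaves:

mysql> FLUSH LOCAL TABLES;
mysql> FLUSH LOCAL TABLES WITH READ LOCK;
mysql> system sudo xfs_freeze -f /vol
mysql> system ec2-create-snapshot vol-xxxxxxxx
mysql> system sudo xfs_freeze -u /vol
mysql> UNLOCK TABLES;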

Installation

On Ubuntu, you can install the ec2-consistent-snapshot package using the new Alestic PPA (personal package archive) hosted on Launchpad.net. Here are the steps to set up access to packages in the Alestic PPA and install the software package and its dependencies:

sudo add-apt-repository ppa:alestic &&
sudo apt-get update &&
sudo apt-get install -y ec2-consistent-snapshot

Now you can read the documentation using:

man ec2-consistent-snapshot

and run the ec2-consistent-snapshot command itself.

Feedback

If you find any problems with ec2-consistent-snapshot, please create bug reports in launchpad. The same mechanism can be used to submit ideas for improvement, which are especially welcomed if you include a patch.

Other questions and feedback are accepted in the comments section for this article. If you’re reading this on a planet, please click through on the title to read the comments.

[Update 2011-02-09: Simplify install instructions to not require use of CPAN.]

Hidden Dangers in Creating Public EBS Snapshots on EC2

Amazon EC2 recently released a feature which lets you share an EBS snapshot so that other accounts can access it. The snapshot can be shared with specific individual accounts or with the public at large.

You should obviously be careful what files you put on a shared EBS snapshot because other people are going to be able to read them. What may not be so obvious is that you also need to be wary of what files are not currently on the snapshot but once were.

For example, if you copied some files onto the EBS volume, then realized a few contained sensitive information, you might think it’s sufficient to delete the private files and continue on to create a public EBS snapshot of the volume.

The problem with this is that EBS is an elastic block store device, not an interface at the file system level. Any block which was once written to on the block device will be available on the shared EBS volume, even if it is not being used by a visible file on the file system.

Since popular Linux file systems do not generally wipe data when a file is deleted, it is often possible to recover the contents of the deleted files. Even attempting to overwrite a file may, depending on the application, leave the original content available on the disk.
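
One partial mitigation, if you must share a volume that once held sensitive files, is to overwrite the free space before taking the snapshot. This is a sketch, not a guarantee (and writing all those blocks can increase your snapshot storage); /vol is a placeholder for the volume’s mount point:

sudo dd if=/dev/zero of=/vol/zerofill bs=1M  # runs until the disk is full
sudo rm /vol/zerofill
sync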

This means any content that touched your EBS volume at any point may still be available to users of your shared EBS snapshot.

To be clear: I do not consider this to be a security flaw in EC2 or EBS. It is merely a security risk for people who do not understand and take precautions against the combination of interactions with file systems, block devices, EBS volumes, and snapshots.

$100 Reward

To demonstrate the security risk, I have created a simple challenge with a tangible reward. Here is a public EBS snapshot:

snap-d53484bc

This EBS snapshot contains two files. The first file is README-1.txt which has nothing sensitive in it but will let you know that you’ve got the right device mounted on your EC2 instance.

The second file created on the source EBS volume contained an Amazon.com gift certificate for $100. I deleted this second file, then took an EBS snapshot of the volume and released it to the public.

The first person who successfully recovers the deleted file on this shared EBS snapshot and enters the gift certificate code into their Amazon.com account will win the $100 prize. Subsequent solvers will get a notice from Amazon that the certificate has already been redeemed, but you still get credit for solving it and helping demonstrate the risks.

Feel free to post a comment on this blog entry if you recovered the deleted file on the shared EBS snapshot. Recipes for doing so are welcomed even if you were not the first. I tested this, so know it’s possible and that the deleted file is still accessible (but I did not redeem the gift certificate, of course).
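
If you want a starting point, here’s the general shape of an attempt (the availability zone, volume id, instance id, and device are placeholders):

ec2-create-volume --snapshot snap-d53484bc -z us-east-1a
ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf
# then, on the instance, scan the raw blocks for readable text:
sudo strings /dev/sdf | less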

Good luck!

Solving: "I can't connect to my server on Amazon EC2"

Help! I can’t connect to my EC2 instance!
Woah! My box just stopped talking to me!
Hey! I can’t access the server!

These and other variations on the connectivity theme are some of the most common problems raised on the Amazon EC2 forum.

The EC2 community and Amazon employees do a valiant job helping users track down and solve these issues despite the facts that (1) there are hundreds of reasons why a server or service might not be accessible, (2) connectivity is one of the harder problems to diagnose, especially without being hands-on, and (3) users complaining about a problem generally don’t provide the clues necessary to solve the issue (because the ones who knew what those clues were probably solved it themselves and didn’t post).

This article is an attempt to provide some general assistance to folks who are experiencing connectivity issues with Amazon EC2. Please post additional help in the comments; this document will be updated over time.

Questions

First off, you should understand that it’s ok to ask for help. When you do, though, you should provide as many details as possible about what you are trying to do and what results you are seeing. It also helps if you drop some clues about your level of expertise. A person using Linux for the first time is likely to make different mistakes on EC2 than a person who is having problems connecting to a custom AMI they built from scratch.

The more specific you can be about your problem and the more information you can provide, the more likely somebody will be able to help. Here are some common questions which are important to have answered for connectivity problems on EC2:

  1. When you say you “can’t connect” what application are you trying to use and on what port? For example: “ssh to port 22” or “accessing port 80 with Firefox”. If you don’t know what a port is, then provide as many details as possible about the application you’re using and what command or steps you are taking to initiate the connection.

  2. What, specifically, happens when you try to connect? Does it hang for a long time and eventually time out? Do you get an error message? What is the exact text you see? (Copy and paste, don’t summarize.)

  3. What is the AMI id which the instance is running? If it is not a public AMI, then what is the AMI id of the public AMI it is based on?

  4. What Linux distro and release is the instance running? E.g., Ubuntu 9.04 Jaunty, Debian etch, Fedora 8, CentOS 5.

  5. What is the instance id of the instance you are trying to contact? Providing this can let Amazon employees take a look at the internals of what might be going on.

  6. What are the internal and external IP addresses and/or host names for the instance you are trying to reach? Providing this information is, in effect, giving permission to the community to try to contact your server over the network so that they can gather information about connectivity and help solve your problem.

  7. Have you ever been able to contact this instance in the past? How recently?

  8. How long has the instance been running?

  9. Have you ever been able to contact another instance of the same AMI?

  10. Is there a difference in connectivity when you try from another EC2 instance instead of from the Internet?

  11. What were you doing when the connectivity stopped?

  12. What is the console output of the instance? You can get this through an API client or a command like:

     ec2-get-console-output INSTANCE_ID
    

There are so many reasons that connectivity might be down to a remote server or service that it would be impossible to get a significant percentage of them listed in one article. I’ll start by listing some of the more common problems here; please add to the comments as you run into or remember others.

You

By far the most common cause of the problem is you (the person experiencing the problem) and that’s ok. We all make mistakes. It’s important, though, that you start with this attitude: open to the possibilities that you typed something wrong, forgot a step, or didn’t quite understand the complex instructions. Ninety percent of the people reading this paragraph think I’m talking to somebody else; oddly, they also think this sentence is not about them.

Here are some of the most common reasons folks (including me) can’t connect to their Amazon EC2 instance. Really.

  • You’re not connecting to the right instance or to the instance you think you’re trying to connect to. Servers on EC2 are identified by opaque instance ids like i-ae1df2c6 and opaque host names like ec2-75-101-182-20.compute-1.amazonaws.com. It’s easy for anybody to get these confused or mistype them.

  • The instance you’re trying to connect to has not completed the boot process yet. Though some AMIs are ready to connect in under a minute, others can take 10+ minutes.

  • The instance you’re trying to connect to has been terminated. (Did you just shut down what you thought was a different instance?)

  • The service you are trying to reach on the instance is not running on that instance.

  • The service you are trying to reach on the instance is not listening on that port or that network interface.

  • You did not open the port in the security group.

  • You did not start the instance with the correct security group.

  • You did not start the instance with the same ssh keypair as you are using to access it.

  • Your local firewall is preventing you from getting out to that port on any server outside your network. Talk to your local network administrators.

  • Your firewall on the instance is preventing access to the service. Try shutting down iptables temporarily to see if that helps.

You “experts” laugh when you read these, but if you’re having trouble reaching a server, I recommend you go through each one carefully and double check that your assumptions are correct and the world is really as you remember it. Remember: We all make mistakes. A lot of these come from personal experience.

If you’re not quite sure what terms like “security group” and “keypair” mean in the EC2 context, I recommend going back and reading some introductory material. These are important concepts for beginners.

ssh

The ssh connectivity problems generally fall into two major buckets:

  1. ssh is not accessible, or

  2. ssh is rejecting the connection due to a failure to authenticate or authorize

You can find out which type of problem you have by using a command like

telnet HOSTNAME 22

If this connects, then ssh is running and accepting connections on port 22. (Hit [Enter] a couple times to disconnect from the telnet session). If you don’t connect, then it’s important to note if the attempt basically hung forever or if you got a “Connection refused” type of message immediately. (Hit [Ctrl]-[C] to stop the telnet command.)

If the connection attempt hangs, then there might be a problem with the security group, iptables, or your instance might not be running at that IP address.

If the telnet connection attempt gets rejected, then there might be a problem with iptables, ssh configuration, ssh not running on the instance, or perhaps it’s listening on a different port if the admin likes to configure things a bit more securely. The console output can be helpful in determining if sshd was started at boot.

If you can get connected to the ssh port with telnet, then you need to start debugging why ssh is not letting you in. The most important information can be gathered by running the ssh connection attempt in verbose (-v) mode:

ssh -v -i KEYPAIR.pem USERNAME@HOSTNAME

The complete output of this command can be very helpful to post when asking for help.

The most common problems with ssh relate to:

  • Forgetting to specify -i KEYPAIR.pem in the ssh command

  • Not starting the instance specifying a keypair

  • Using a different keypair than the one which was used to start the instance

  • Not ssh’ing with the correct username. Amazon Linux instances require a first ssh connection with ec2-user@... while images published by Canonical require a first connection with ubuntu@... and others might need you to ssh using root@...

  • Not having the correct ownership or mode on the .ssh directory or authorized_keys file (typical settings are shown after this list).

  • Not having the correct Allow* or *Authentication settings in /etc/ssh/sshd_config
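
For the ownership and mode issue mentioned in the list above, the typical safe settings on the instance are:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys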

Apache

Web servers are much easier to connect to than other applications because there is generally no authentication and authorization involved to get a basic web page. If you can’t reach your web server on EC2, then it’s generally one of the simple problems described above like using the wrong IP address, trying to reach a terminated instance, or not having the web port opened in the security group.

MySQL

The most common problem specific to MySQL connectivity on EC2 is the fact that MySQL is configured securely by default to not allow access by remote hosts. If you need to allow a connection from your other instances running in EC2, then edit /etc/mysql/my.cnf and replace this line:

bind-address            = 127.0.0.1

with

bind-address            = 0.0.0.0

and restart the mysqld server.
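
On Ubuntu, for example:

sudo /etc/init.d/mysql restart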

IMPORTANT! You should not open the MySQL port in the EC2 security group. You only want your own EC2 instances to connect to the database and the default security group allows your EC2 instances to connect to any port on your other EC2 instances. If you open up the port to the public, then your database will be attacked by the Internet at large.

If you need to talk to your MySQL database running on EC2 from a server running outside EC2, then do it over a secure channel like an ssh tunnel or openvpn. You don’t need the MySQL port open in the security group to do this. The MySQL protocol is not by itself encrypted and your usernames and passwords would be sent in the clear for anybody else to intercept if you didn’t talk over a secure channel.

Custom AMIs

If you are building your own custom AMIs from scratch, then there are a number of complicated barriers to getting network and ssh connectivity working. Unfortunately it is nearly impossible to debug these problems since you don’t have access to the machine to see what went wrong. Console output is your only friend in these cases.

Here are some examples of odd things which others in the EC2 community have run into and solved:

  • Make sure you start networking on instance boot. It should come up with DHCP on eth0.

  • Make sure your Linux distro does not save the MAC address somewhere, preventing the network from functioning in the next instance. Ubuntu stores this in the /etc/udev/rules.d/70-persistent-net.rules file and Debian stores this in the /etc/udev/rules.d/z25_persistent-net.rules file. (A cleanup sketch follows this list.)

  • Make sure your image downloads the ssh keypair and installs it in authorized_keys.

  • Make sure you have the right devices created and file systems mounted.

  • Make sure you’re using a udev version lower than v144, as higher versions are incompatible with Amazon’s 2.6.21 kernel.

  • Make sure you’re using the right libc6 and related configurations including /lib/tls
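
For the persistent MAC address item above, one approach is to delete the generated rules just before bundling the image; the file is recreated automatically on the next boot:

sudo rm -f /etc/udev/rules.d/70-persistent-net.rules   # Ubuntu
sudo rm -f /etc/udev/rules.d/z25_persistent-net.rules  # Debian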

Amazon

I realize this was your first thought, but it’s such a rare cause, I’ve put it here at the end. Sometimes there are problems with Amazon EC2. The hardware running your instance may fail or the networks might have temporary glitches. There are a couple different classes of problems here:

  1. Small scale problems local to the hardware running your instance. Though these are rare for any single instance, they are happening all the time for some customer somewhere given that AWS has hundreds of thousands of customers. Amazon often sends you an email when they notice that an instance is starting to have problems, and you should move to a new instance as soon as possible. If the failure happens without the warning, the only solution is to move to a new instance anyway, so you should always be prepared to do this.

  2. Large scale problems which affect a large number of customers simultaneously. These are very rare, and generally don’t affect more than a single availability zone given the way that Amazon has spread out the risk in their architecture.

You can check the AWS service health dashboard to see if Amazon is aware of any widespread problems with the EC2 service. If there are problems with a specific availability zone, you may want to move your servers to a different availability zone until the issues get resolved.

First Responses

For general cases where you can’t immediately figure out what went wrong with the connectivity, here are two things which are almost always recommended on EC2: reboot the instance and replace the instance.

Reboot your EC2 instance using the EC2 Console, another API client, or a command like:

    ec2-reboot-instances INSTANCE_ID

After giving it sufficient time to come up, see if that fixed the connectivity problem. Do not reboot your instance if you currently have a working ssh connection to it, but other ssh connections are failing!

If you have a production service running on Amazon EC2 and you lose connectivity to an instance, then I recommend your first reaction be to kick off a replacement instance so that it boots and configures itself while you investigate the original issue. If you don’t solve the problem by the time the replacement is ready, simply switch over to the new server. You may want to continue investigating what happened with the old server, though I generally don’t care what the problem was unless it happens more than once or twice in a short time period.

If your installation environment does not allow you to easily start replacement instances, then you should reconsider how you are using EC2 and work to improve this.

Seeking Help

If the above did not help you solve your problem reaching your EC2 instance, you may want to reach out to the community including some AWS employees on the EC2 forum.

Amazon also has premium AWS support available.

Requests for connectivity help by posting a comment on this particular thread will not be published or answered. Please only post a comment if you have corrections or additional information to share for users experiencing problems. I do occasionally receive and respond to questions posted on other articles, but for this topic, please use the EC2 forum.

[Update 2011-10-01: Amazon Linux requires ssh using ec2-user@...]

runurl - A Tool and Approach for Simplifying user-data Scripts on EC2

Many Ubuntu and Debian images for Amazon EC2 include a hook where scripts passed as user-data will be run as root on the first boot.

At Campus Explorer, we’ve been experimenting with an approach where the actual user-data is a very short script which downloads and runs other scripts. This idea is not new, but I have simplified the process by creating a small tool named runurl which adds a lot of flexibility and convenience when configuring new servers.

Usage

The basic synopsis looks like:

runurl URL [ARGS]...

The first argument to the runurl command is the URL of a script or program which should be run. All following options and arguments are passed verbatim to the program as its options and arguments. The exit code of runurl is the exit code of the program.

The runurl command is a very short and simple script, but it makes the user-data startup scripts even shorter and simpler themselves.

Example 1

If the following content is stored at http://run.alestic.com/demo/echo

#!/bin/bash
echo "$@"

then this command:

runurl run.alestic.com/demo/echo "hello, world"

will itself output:

hello, world

You can specify the “http://” in the URLs, but since runurl uses wget to download them, the prefix is not necessary and the code might be easier to read without it.

Example 2

Here’s a more substantial sample user-data script which invokes a number of other remote scripts to upgrade the Ubuntu packages, install the munin monitoring software, and install and run the Folding@Home application using origami, with credit going to Team Ubuntu. It finally sends an email back home announcing that the instance is active.

This sample assumes that runurl is installed on the AMI (e.g., Ubuntu AMIs published on https://alestic.com). For other AMIs, see below for additional commands to add to the start of the script.

#!/bin/bash -ex
runurl run.alestic.com/apt/upgrade
runurl run.alestic.com/install/munin
cd /root
runurl run.alestic.com/install/folding@home -u ec2 -t 45104 -b small
runurl run.alestic.com/email/start youremail@example.com

Note that the last command passes a parameter to the script, identifying where the email should be sent. Please change this if you test the script.

With the above content stored in a file named folding.user-data, you could start 5 new c1.medium instances running the Folding@Home software using the command:

ec2-run-instances \
  --user-data-file folding.user-data \
  --key [KEYPAIR] \
  --instance-type c1.medium \
  --instance-count 5 \
  ami-ed46a784

You can log on to an instance and monitor the installation with

tail -f /var/log/syslog

Once the Folding@Home application is running, you can monitor its progress with:

/root/origami/origami status

and after 15 minutes, check out the Munin system stats at

http://ec2-HOSTNAME/munin/

Expiring URLs

One of the problems with normal user-data scripts is that the contents exist as long as the instance is running and any user on the instance can read the contents of the user-data. This puts any private or confidential information in the user-data at risk.

If you put your actual startup code in private S3 buckets, you can pass runurl a URL to the contents, where the URL expires shortly after it is run. Or, the script could even delete the contents itself if you set it up correctly. This reduces the exposure to the time it takes for the instance to start up and does not let anybody else access the URL during that time.
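
For the curious, here’s a rough sketch of how such an expiring URL can be generated using S3’s query-string authentication (the bucket, key, and credentials are placeholders; tools like boto or s3cmd handle the details more robustly):

BUCKET=my-private-bucket
KEY=install/secret-script
ACCESS_KEY_ID=MYACCESSKEYID
SECRET_ACCESS_KEY=MYSECRETACCESSKEY
EXPIRES=$(( $(date +%s) + 300 ))    # URL valid for five minutes
SIGNATURE=$(printf "GET\n\n\n%s\n/%s/%s" "$EXPIRES" "$BUCKET" "$KEY" |
  openssl dgst -sha1 -hmac "$SECRET_ACCESS_KEY" -binary |
  openssl base64 |
  sed -e 's/+/%2B/g' -e 's,/,%2F,g' -e 's/=/%3D/g')    # URL-encode the signature
echo "https://s3.amazonaws.com/$BUCKET/$KEY?AWSAccessKeyId=$ACCESS_KEY_ID&Expires=$EXPIRES&Signature=$SIGNATURE"

The resulting URL can then be passed to runurl in the short user-data script.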

Updating

Another benefit of keeping the actual startup code separate from the user-data content itself is that you can modify the startup code stored at the URL without modifying the user-data content.

This can be useful with services like EC2 Auto Scaling, where the specified user-data cannot be dynamically changed in a launch configuration without creating a whole new launch configuration.

If you modify the runurl scripts, the next server to be launched will automatically pick up the new instructions.

Bootstrapping

The runurl tool is pre-installed in the latest Ubuntu AMIs published on https://alestic.com. If you are using an Ubuntu image which does not include this software, you can install it from the Alestic PPA using the following commands at the top of your user-data script:

sudo add-apt-repository ppa:alestic/ppa &&
sudo apt-get update &&
sudo apt-get install -y runurl

If you are using an Ubuntu release without the add-apt-repository command or a Linux distro other than Ubuntu, you can install runurl using the following commands:

sudo wget -qO/usr/bin/runurl run.alestic.com/runurl
sudo chmod 755 /usr/bin/runurl

The subsequent commands in the user-data script can then use the runurl command as demonstrated in the above example.

SSL

To improve your certainty that you are talking to the right server and getting the right data, you could use SSL (https) in your URLs. If you are talking to S3 buckets, however, you’ll need to use the old path-style S3 bucket access like:

runurl https://s3.amazonaws.com/run.alestic.com/demo/echo "hello, mars"

This is probably not as critical when accessing it from an EC2 instance as you’re operating over Amazon’s trusted network.

Caveats

There are a number of things which can go wrong when using a tool like runurl. Here are some to think about:

  • Only run content which you control or completely trust.

  • Just because you like the content of a URL when you look at it in your browser does not mean that it will still look like that when your instance goes to run it. It could change at any point to something that is broken or even malicious unless it is under your control.

  • If you depend on this approach for serious applications, you need to make sure that the content you are downloading is coming from a reliable server. S3 is reasonable (with retries) but you also need to consider the DNS server if you are depending on a non-AWS hostname to access the S3 bucket.

The name run.alestic.com points to an S3 bucket, but the DNS for this name is not redundant or worthy of use by applications with serious uptime requirements. This particular service should be considered my playground for ideas and there is no commitment on my part to make sure that it is up or that the content remains stable.

If you like what you see, please feel free to copy any of the open source content on run.alestic.com and store it on your own reliable and trusted servers. It is all published under the Apache2 license.

Project

I’m using this simple script as an opportunity to come up to speed with hosting projects on Launchpad. You can access the source code and submit bugs at

https://launchpad.net/runurl

You can also use launchpad and bazaar to branch the source into parallel projects and/or submit requests to merge patches into the main development branch.

[Update 2009-10-11: Document use of Alestic PPA]
[Update 2010-01-25: Simplify bootstrap instructions for Ubuntu]
[Update 2010-08-17: Switch to using “add-apt-repository” for bootstrap instructions]