Encrypting Ephemeral Storage and EBS Volumes on Amazon EC2

Over the years, Amazon has repeatedly recommended that customers who care about the security of their data consider encrypting information stored on disks, whether ephemeral storage (/mnt) or EBS volumes. They make this recommendation even though they take pains to ensure that disk blocks are wiped between uses by different customers and implement policies which restrict access to disks even by their own employees.

There are a few levels where encryption can take place:

  1. File level. This includes tools like GnuPG, freely available on Ubuntu in the gnupg package. If you use this approach, make sure that you don’t store the unencrypted information on the disk before encrypting it (see the example after this list).

  2. File system level. This includes useful packages like encfs which transparently encrypts files before saving them to disk, presenting the unencrypted contents in a virtual file system. It can even be used on top of an s3fs file system, letting you store encrypted data on S3 with ease.

  3. Block device level. You can place any file system you’d like on top of the encrypted block device, and neither your application nor your file system needs to know that the hardware disk never sees unencrypted data.
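
For the file level approach, here is a minimal sketch, assuming a hypothetical data directory and output path; piping the archive straight into GnuPG keeps the plaintext from ever being written to disk:

# Hypothetical paths; the plaintext never touches the disk because it is piped into gpg
tar -czf - /path/to/sensitive-data |
  gpg --symmetric --cipher-algo AES256 --output /mnt/sensitive-data.tar.gz.gpg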

The rest of this article presents a simple way to set up encryption at the block device level using cryptsetup/LUKS. It has been tested on the 32-bit Ubuntu 9.04 Jaunty server AMI listed on https://alestic.com and should work on other Ubuntu AMIs and even other distros with minor changes.

This walkthrough uses the /mnt ephemeral storage, but you can replace /mnt and /dev/sda2 with the appropriate mount point and device when working with 64-bit instance types or EBS volumes.

Setup

Install tools and kernel modules:

sudo apt-get update
sudo apt-get install -y cryptsetup xfsprogs
for i in sha256 dm_crypt xfs; do 
  sudo modprobe $i
  echo $i | sudo tee -a /etc/modules
done

Before you continue, make sure there is nothing valuable on /mnt because we’re going to replace it!

sudo umount /mnt
sudo chmod 000 /mnt

Encrypt the disk and create your favorite file system on it:

sudo luksformat -t xfs /dev/sda2
sudo cryptsetup luksOpen /dev/sda2 crypt-sda2

Remember your passphrase! It is not recoverable!

Update /etc/fstab and replace the /mnt line (or create a new line for an EBS volume):

fstabentry='/dev/mapper/crypt-sda2 /mnt xfs noauto 0 0'
sudo perl -pi -e "s%^.* /mnt .*%$fstabentry%" /etc/fstab

Mount the file system on the encrypted block device:

sudo mount /mnt

You’re now free to place files on /mnt knowing that the content will be encrypted before it is written to the hardware disk.

After reboot, /mnt will appear empty until you re-mount the encrypted partition, entering your passphrase:

sudo cryptsetup luksOpen /dev/sda2 crypt-sda2
sudo mount /mnt

Notes

See “man cryptsetup” for info on adding keys and getting information from the LUKS disk header.

It is possible to auto-mount the encrypted disk on reboot if you are willing to put your passphrase in the root partition (which almost defeats the point of encryption). See the documentation on crypttab and consider adding a line like:

crypt-sda2 /dev/sda2 /PASSPHRASEFILE luks

Study the cryptsetup documentation carefully so that you understand what is going on. Keeping your data private is important, but it’s also important that you know how to get it back in the case of problems.

This article does not attempt to cover all of the possible security considerations you might need to take into account for data leakage on disks. For example, sensitive information might be stored in /tmp, /etc, or log files on the root disk. If you have swap enabled, anything in memory could be saved in the clear to disk whenever the operating system feels like it.
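
If swap is a concern, a quick hedged check is to list any active swap devices and turn them off until you have set up encrypted swap:

swapon -s        # list active swap devices, if any
sudo swapoff -a  # disable swap for the running system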

How do you solve your data security challenges on EC2?

This article was based on a post made on the EC2 Ubuntu group.

Creating Consistent EBS Snapshots with MySQL and XFS on EC2

In the article Running MySQL on Amazon EC2 with Elastic Block Store I describe the principles involved in using EBS on EC2. Though originally published in 2008, it is still relevant today and is worth reviewing to get context for this article.

In the above tutorial, I included a sample script which followed the basic instructions in the article to initiate EBS snapshots of an XFS file system containing a MySQL database. For the most part this script worked for basic installations with low volume.

Over the last year, as my co-workers and I have been using this code in production systems, we identified a number of ways it could be improved. Or, put another way, some serious issues came up when the idealistic world view of the original simplistic script met the complexities which can and do arise in the brutal real world.

We gradually improved the code over the course of the year, to the point where it has been running smoothly on production systems with no serious issues. This doesn’t mean that there aren’t any areas left for improvement, but it does seem ready for the general public to give it a try.

The name of the new program is ec2-consistent-snapshot.

Features

Here are some of the ways in which the ec2-consistent-snapshot program has improved over the original:

  • Command line options for passing in AWS keys, MySQL access information, and more.

  • Can be run with or without a MySQL database on the file system. This lets you use the command to initiate snapshots for any EBS volume.

  • Can be used with or without XFS file systems, though if you don’t use XFS, you run the risk of not having a consistent file system when you restore the EBS volume.

  • Instead of using the painfully slow ec2-create-snapshot command written in Java, this Perl program accesses the EC2 API directly with orders of magnitude speed improvement.

  • A preliminary FLUSH is performed on the MySQL database before the FLUSH TABLES WITH READ LOCK. This preparation reduces the total time the tables are locked (the overall sequence is sketched after this list).

  • A preliminary sync is performed on the XFS file system before the xfs_freeze. This preparation reduces the total time the file system is locked.

  • The MySQL LOCK now has timeouts and retries around it. This prevents horrible blocking interactions between the database lock, long running queries, and normal transactions. The length of the timeout and the number of retries are configurable with command line options.

  • The MySQL FLUSH is done in such a way that the statement does not propagate through to slave databases, where it could negatively impact their performance and/or cause blocking interactions with long running queries.

  • Cleans up MySQL and XFS locks if it is interrupted, if a timeout happens, or if other errors occur. This prevents a number of serious locking issues when things go wrong with the environment or EC2 API.

  • Can snapshot EBS volumes in a region other than the default (e.g., eu-west-1).

  • Can initiate snapshots of multiple EBS volumes at the same time while everything is consistently locked. This has been used to create consistent snapshots of RAIDed EBS volumes.
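
For context, here is a rough sketch of the lock, freeze, and snapshot sequence that ec2-consistent-snapshot automates; the mount point and volume id are placeholders, and the real program adds timeouts, retries, direct API calls, and cleanup on errors:

# Illustrative sequence only; do not rely on this in place of ec2-consistent-snapshot
mysql -u root -p <<'EOF'
FLUSH LOCAL TABLES;
FLUSH LOCAL TABLES WITH READ LOCK;
system sudo sync
system sudo xfs_freeze -f /vol
system ec2-create-snapshot vol-XXXXXXXX
system sudo xfs_freeze -u /vol
UNLOCK TABLES;
EOF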

Installation

On Ubuntu, you can install the ec2-consistent-snapshot package using the new Alestic PPA (personal package archive) hosted on Launchpad.net. Here are the steps to set up access to packages in the Alestic PPA and install the software package and its dependencies:

sudo add-apt-repository ppa:alestic &&
sudo apt-get update &&
sudo apt-get install -y ec2-consistent-snapshot

Now you can read the documentation using:

man ec2-consistent-snapshot

and run the ec2-consistent-snapshot command itself.
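
A typical invocation for an XFS file system mounted at /vol and holding a MySQL database might look roughly like the following; the option names are from memory and the volume id is a placeholder, so confirm the details against the man page:

sudo ec2-consistent-snapshot \
  --mysql \
  --xfs-filesystem /vol \
  --description "nightly backup" \
  vol-XXXXXXXX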

Feedback

If you find any problems with ec2-consistent-snapshot, please create bug reports in launchpad. The same mechanism can be used to submit ideas for improvement, which are especially welcomed if you include a patch.

Other questions and feedback are accepted in the comments section for this article. If you’re reading this on a planet, please click through on the title to read the comments.

[Update 2011-02-09: Simplify install instructions to not require use of CPAN.]

Hidden Dangers in Creating Public EBS Snapshots on EC2

Amazon EC2 recently released a feature which lets you share an EBS snapshot so that other accounts can access it. The snapshot can be shared with specific individual accounts or with the public at large.

You should obviously be careful what files you put on a shared EBS snapshot because other people are going to be able to read them. What may not be so obvious is that you also need to be wary of what files are not currently on the snapshot but once were.

For example, if you copied some files onto the EBS volume, then realized a few contained sensitive information, you might think it’s sufficient to delete the private files and continue on to create a public EBS snapshot of the volume.

The problem with this is that EBS is a block storage device, not an interface at the file system level. Any block which was ever written on the device will be available on volumes created from the shared EBS snapshot, even if it is no longer used by a visible file on the file system.

Since popular Linux file systems do not generally wipe data when a file is deleted, it is often possible to recover the contents of the deleted files. Even attempting to overwrite a file may, depending on the application, leave the original content available on the disk.
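
To make the risk concrete, here is one hedged way a curious user might scan a volume created from a shared snapshot for leftover data; the device name is a placeholder for wherever the volume was attached:

# Scan the raw block device (not the mounted file system) for readable remnants
sudo strings /dev/sdf | less

# or search for a specific pattern
sudo strings /dev/sdf | grep -i -A2 'gift'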

This means any content that touched your EBS volume at any point may still be available to users of your shared EBS snapshot.

To be clear: I do not consider this to be a security flaw in EC2 or EBS. It is merely a security risk for people who do not understand and take precautions against the combination of interactions with file systems, block devices, EBS volumes, and snapshots.

$100 Reward

To demonstrate the security risk, I have created a simple challenge with a tangible reward. Here is a public EBS snapshot:

snap-d53484bc

This EBS snapshot contains two files. The first file is README-1.txt which has nothing sensitive in it but will let you know that you’ve got the right device mounted on your EC2 instance.

The second file created on the source EBS volume contained an Amazon.com gift certificate for $100. I deleted this second file, then took an EBS snapshot of the volume and released it to the public.

The first person who successfully recovers the deleted file on this shared EBS snapshot and enters the gift certificate code into their Amazon.com account will win the $100 prize. Subsequent solvers will get a notice from Amazon that the certificate has already been redeemed, but you still get credit for solving it and helping demonstrate the risks.

Feel free to post a comment on this blog entry if you recovered the deleted file on the shared EBS snapshot. Recipes for doing so are welcomed even if you were not the first. I tested this, so I know it’s possible and that the deleted file is still accessible (but I did not redeem the gift certificate, of course).

Good luck!

Solving: "I can't connect to my server on Amazon EC2"

Help! I can’t connect to my EC2 instance!
Woah! My box just stopped talking to me!
Hey! I can’t access the server!

These and other variations on the connectivity theme are some of the most common problems raised on the Amazon EC2 forum.

The EC2 community and Amazon employees do a valiant job helping users track down and solve these issues despite the facts that (1) there are hundreds of reasons why a server or service might not be accessible, (2) connectivity is one of the harder problems to diagnose, especially without being hands-on, and (3) users complaining about a problem generally don’t provide the clues necessary to solve the issue (because the ones who knew what those clues were probably solved it themselves and didn’t post).

This article is an attempt to provide some general assistance to folks who are experiencing connectivity issues with Amazon EC2. Please post additional help in the comments; this document will be updated over time.

Questions

First off, you should understand that it’s ok to ask for help. When you do, though, you should provide as many details as possible about what you are trying to do and what results you are seeing. It also helps if you drop some clues about your level of expertise. A person using Linux for the first time is likely to make different mistakes on EC2 than a person who is having problems connecting to a custom AMI they built from scratch.

The more specific you can be about your problem and the more information you can provide, the more likely somebody will be able to help. Here are some common questions which are important to have answered for connectivity problems on EC2:

  1. When you say you “can’t connect” what application are you trying to use and on what port? For example: “ssh to port 22” or “accessing port 80 with Firefox”. If you don’t know what a port is, then provide as many details as possible about the application you’re using and what command or steps you are taking to initiate the connection.

  2. What, specifically, happens when you try to connect? Does it hang for a long time and eventually time out? Do you get an error message? What is the exact text you see? (Copy and paste, don’t summarize.)

  3. What is the AMI id which the instance is running? If it is not a public AMI, then what is the AMI id of the public AMI it is based on?

  4. What Linux distro and release is the instance running? E.g., Ubuntu 9.04 Jaunty, Debian etch, Fedora 8, CentOS 5.

  5. What is the instance id of the instance you are trying to contact? Providing this can let Amazon employees take a look at the internals of what might be going on.

  6. What are the internal and external IP addresses and/or host names for the instance you are trying to reach? Providing this information is, in effect, giving permission to the community to try to contact your server over the network so that they can gather information about connectivity and help solve your problem.

  7. Have you ever been able to contact this instance in the past? How recently?

  8. How long has the instance been running?

  9. Have you ever been able to contact another instance of the same AMI?

  10. Is there a difference in connectivity when you try from another EC2 instance instead of from the Internet?

  11. What were you doing when the connectivity stopped?

  12. What is the console output of the instance? You can get this through an API client or a command like:

     ec2-get-console-output INSTANCE_ID
    

There are so many reasons that connectivity might be down to a remote server or service that it would be impossible to get a significant percentage of them listed in one article. I’ll start by listing some of the more common problems here; please add to the comments as you run into or remember others.

You

By far the most common cause of the problem is you (the person experiencing the problem) and that’s ok. We all make mistakes. It’s important, though, that you start with this attitude: open to the possibilities that you typed something wrong, forgot a step, or didn’t quite understand the complex instructions. Ninety percent of the people reading this paragraph think I’m talking to somebody else; oddly, they also think this sentence is not about them.

Here are some of the most common reasons folks (including me) can’t connect to their Amazon EC2 instance. Really.

  • You’re not connecting to the right instance or to the instance you think you’re trying to connect to. Servers on EC2 are identified by opaque instance ids like i-ae1df2c6 and opaque host names like ec2-75-101-182-20.compute-1.amazonaws.com. It’s easy for anybody to get these confused or mistype them.

  • The instance you’re trying to connect to has not completed the boot process yet. Though some AMIs are ready to connect in under a minute, others can take 10+ minutes.

  • The instance you’re trying to connect to has been terminated. (Did you just shut down what you thought was a different instance?)

  • The service you are trying to reach on the instance is not running on that instance.

  • The service you are trying to reach on the instance is not listening on that port or that network interface.

  • You did not open the port in the security group.

  • You did not start the instance with the correct security group.

  • You did not start the instance with the same ssh keypair as you are using to access it.

  • Your local firewall is preventing you from getting out to that port on any server outside your network. Talk to your local network administrators.

  • Your firewall on the instance is preventing access to the service. Try shutting down iptables temporarily to see if that helps.

You “experts” laugh when you read these, but if you’re having trouble reaching a server, I recommend you go through each one carefully and double check that your assumptions are correct and the world is really as you remember it. Remember: We all make mistakes. A lot of these come from personal experience.

If you’re not quite sure what terms like “security group” and “keypair” mean in the EC2 context, I recommend going back and reading some introductory material. These are important concepts for beginners.

ssh

The ssh connectivity problems generally fall into a couple of major buckets:

  1. ssh is not accessible, or

  2. ssh is rejecting the connection due to a failure to authenticate or authorize

You can find out which type of problem you have by using a command like:

telnet HOSTNAME 22

If this connects, then ssh is running and accepting connections on port 22. (Hit [Enter] a couple times to disconnect from the telnet session). If you don’t connect, then it’s important to note if the attempt basically hung forever or if you got a “Connection refused” type of message immediately. (Hit [Ctrl]-[C] to stop the telnet command.)

If the connection attempt hangs, then there might be a problem with the security group, iptables, or your instance might not be running at that IP address.

If the telnet connection attempt gets rejected, then there might be a problem with iptables, ssh configuration, ssh not running on the instance, or perhaps it’s listening on a different port if the admin likes to configure things a bit more securely. The console output can be helpful in determining if sshd was started at boot.

If you can get connected to the ssh port with telnet, then you need to start debugging why ssh is not letting you in. The most important information can be gathered by running the ssh connection attempt in verbose (-v) mode:

ssh -v -i KEYPAIR.pem USERNAME@HOSTNAME

The complete output of this command can be very helpful to post when asking for help.

The most common problems with ssh relate to:

  • Forgetting to specify -i KEYPAIR.pem in the ssh command

  • Not starting the instance specifying a keypair

  • Using a different keypair than the one which was used to start the instance

  • Not ssh’ing with the correct username. Amazon Linux instances require a first ssh connection with ec2-user@... while images published by Canonical require a first connection with ubuntu@... and others might need you to ssh using root@...

  • Not having the correct ownership or mode on the .ssh directory or authorized_keys file.

  • Not having the correct Allow* or *Authentication settings in /etc/ssh/sshd_config

Apache

Web servers are much easier to connect to than other applications because there is generally no authentication and authorization involved to get a basic web page. If you can’t reach your web server on EC2, then it’s generally one of the simple problems described above like using the wrong IP address, trying to reach a terminated instance, or not having the web port opened in the security group.

MySQL

The most common problem specific to MySQL connectivity on EC2 is the fact that MySQL is configured securely by default to not allow access by remote hosts. If you need to allow a connection from your other instances running in EC2, then edit /etc/mysql/my.cnf and replace this line:

bind-address            = 127.0.0.1

with

bind-address            = 0.0.0.0

and restart the mysqld server.

IMPORTANT! You should not open the MySQL port in the EC2 security group. You only want your own EC2 instances to connect to the database and the default security group allows your EC2 instances to connect to any port on your other EC2 instances. If you open up the port to the public, then your database will be attacked by the Internet at large.

If you need to talk to your MySQL database running on EC2 from a server running outside EC2, then do it over a secure channel like an ssh tunnel or openvpn. You don’t need the MySQL port open in the security group to do this. The MySQL protocol is not by itself encrypted and your usernames and passwords would be sent in the clear for anybody else to intercept if you didn’t talk over a secure channel.
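
As a hedged illustration of the ssh tunnel approach (key file, user, and host name are placeholders), you can forward a local port to the remote MySQL server and connect through it:

# Forward local port 3307 to MySQL on the EC2 instance over ssh
ssh -i KEYPAIR.pem -N -L 3307:127.0.0.1:3306 ubuntu@HOSTNAME &

# Connect through the tunnel from the local machine
mysql --host=127.0.0.1 --port=3307 --protocol=TCP -u USERNAME -p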

Custom AMIs

If you are building your own custom AMIs from scratch, then there are a number of complicated barriers to getting network and ssh connectivity working. Unfortunately it is nearly impossible to debug these problems since you don’t have access to the machine to see what went wrong. Console output is your only friend in these cases.

Here are some examples of odd things which others in the EC2 community have run into and solved:

  • Make sure you start networking on instance boot. It should come up with DHCP on eth0.

  • Make sure your Linux distro does not save the MAC address somewhere, preventing the network from functioning in the next instance. Ubuntu stores this in the /etc/udev/rules.d/70-persistent-net.rules file and Debian stores this in the /etc/udev/rules.d/z25_persistent-net.rules file (see the cleanup example after this list).

  • Make sure your image downloads the ssh keypair and installs it in authorized_keys.

  • Make sure you have the right devices created and file systems mounted.

  • Make sure you’re using a udev version lower than 144, as higher versions are incompatible with Amazon’s 2.6.21 kernel.

  • Make sure you’re using the right libc6 and related configurations including /lib/tls
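
For the MAC address issue mentioned above, a minimal cleanup step on an Ubuntu-based image might look like this before bundling; the path is distro-specific, so adjust it for Debian or other distros:

# Remove the cached MAC-to-interface mapping so eth0 comes up cleanly on the next instance
sudo rm -f /etc/udev/rules.d/70-persistent-net.rules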

Amazon

I realize this was your first thought, but it’s such a rare cause, I’ve put it here at the end. Sometimes there are problems with Amazon EC2. The hardware running your instance may fail or the networks might have temporary glitches. There are a couple different classes of problems here:

  1. Small scale problems local to the hardware running your instance. Though these are rare for any single instance, they are happening all the time for some customer somewhere given that AWS has hundreds of thousands of customers. Amazon often sends you an email when they notice that an instance is starting to have problems, and you should move to a new instance as soon as possible. If the failure happens without warning, the only solution is to move to a new instance anyway, so you should always be prepared to do this.

  2. Large scale problems which affect a large number of customers simultaneously. These are very rare, and generally don’t affect more than a single availability zone given the way that Amazon has spread out the risk in their architecture.

You can check the AWS service health dashboard to see if Amazon is aware of any widespread problems with the EC2 service. If there are problems with a specific availability zone, you may want to move your servers to a different availability zone until the issues get resolved.

First Responses

For general cases where you can’t immediately figure out what went wrong with the connectivity, here are two things which are almost always recommended on EC2: reboot the instance and replace the instance.

Reboot your EC2 instance using the EC2 Console, another API client, or a command like:

    ec2-reboot-instances INSTANCE_ID

After giving it sufficient time to come up, see if that fixed the connectivity problem. Do not reboot your instance if you currently have a working ssh connection to it, but other ssh connections are failing!

If you have a production service running on Amazon EC2 and you lose connectivity to an instance, then I recommend your first reaction be to kick off a replacement instance so that it boots and configures itself while you investigate the original issue. If you don’t solve the problem by the time the replacement is ready, simply switch over to the new server. You may want to continue investigating what happened with the old server, though I generally don’t care what the problem was unless it happens more than once or twice in a short time period.

If your installation environment does not allow you to easily start replacement instances, then you should reconsider how you are using EC2 and work to improve this.

Seeking Help

If the above did not help you solve your problem reaching your EC2 instance, you may want to reach out to the community including some AWS employees on the EC2 forum.

Amazon also has premium AWS support available.

Requests for connectivity help by posting a comment on this particular thread will not be published or answered. Please only post a comment if you have corrections or additional information to share for users experiencing problems. I do occasionally receive and respond to questions posted on other articles, but for this topic, please use the EC2 forum.

[Update 2011-10-01: Amazon Linux requires ssh using ec2-user@...]

runurl - A Tool and Approach for Simplifying user-data Scripts on EC2

Many Ubuntu and Debian images for Amazon EC2 include a hook where scripts passed as user-data will be run as root on the first boot.

At Campus Explorer, we’ve been experimenting with an approach where the actual user-data is a very short script which downloads and runs other scripts. This idea is not new, but I have simplified the process by creating a small tool named runurl which adds a lot of flexibility and convenience when configuring new servers.

Usage

The basic synopsis looks like:

runurl URL [ARGS]...

The first argument to the runurl command is the URL of a script or program which should be run. All following options and arguments are passed verbatim to the program as its options and arguments. The exit code of runurl is the exit code of the program.

The runurl command is a very short and simple script, but it makes the user-data startup scripts even shorter and simpler themselves.

Example 1

If the following content is stored at http://run.alestic.com/demo/echo

#!/bin/bash
echo "$@"

then this command:

runurl run.alestic.com/demo/echo "hello, world"

will itself output:

hello, world

You can specify the “http://” in the URLs, but since runurl uses wget to download them, the prefix is not necessary and the code might be easier to read without it.

Example 2

Here’s a more substantial sample user-data script which invokes a number of other remote scripts to upgrade the Ubuntu packages, install the munin monitoring software, and install and run the Folding@Home application using origami with credit going to Team Ubuntu. It finally sends an email back home reporting that the instance is active.

This sample assumes that runurl is installed on the AMI (e.g., Ubuntu AMIs published on https://alestic.com). For other AMIs, see below for additional commands to add to the start of the script.

#!/bin/bash -ex
runurl run.alestic.com/apt/upgrade
runurl run.alestic.com/install/munin
cd /root
runurl run.alestic.com/install/folding@home -u ec2 -t 45104 -b small
runurl run.alestic.com/email/start youremail@example.com

Note that the last command passes a parameter to the script, identifying where the email should be sent. Please change this if you test the script.

With the above content stored in a file named folding.user-data, you could start 5 new c1.medium instances running the Folding@Home software using the command:

ec2-run-instances \
  --user-data-file folding.user-data \
  --key [KEYPAIR] \
  --instance-type c1.medium \
  --instance-count 5 \
  ami-ed46a784

You can log on to an instance and monitor the installation with

tail -f /var/log/syslog

Once the Folding@Home application is running, you can monitor its progress with:

/root/origami/origami status

and after 15 minutes, check out the Munin system stats at

http://ec2-HOSTNAME/munin/

Expiring URLs

One of the problems with normal user-data scripts is that the contents exist as long as the instance is running and any user on the instance can read the contents of the user-data. This puts any private or confidential information in the user-data at risk.

If you put your actual startup code in private S3 buckets, you can pass runurl a URL to the contents, where the URL expires shortly after it is run. Or, the script could even delete the contents itself if you set it up correctly. This reduces the exposure to the time it takes for the instance to start up and does not let anybody else access the URL during that time.
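
As a hedged example using a current aws-cli (the bucket and key names are placeholders), you could generate a temporary URL for a script stored in a private S3 bucket and pass it to runurl:

# Generate a URL that expires in 10 minutes and run the script it points to
url=$(aws s3 presign s3://my-private-bucket/startup/configure-server --expires-in 600)
runurl "$url"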

Updating

Another benefit of keeping the actual startup code separate from the user-data content itself is that you can modify the startup code stored at the URL without modifying the user-data content.

This can be useful with services like EC2 Auto Scaling, where the specified user-data cannot be dynamically changed in a launch configuration without creating a whole new launch configuration.

If you modify the runurl scripts, the next server to be launched will automatically pick up the new instructions.

Bootstrapping

The runurl tool is pre-installed in the latest Ubuntu AMIs published on https://alestic.com. If you are using an Ubuntu image which does not include this software, you can install it from the Alestic PPA using the following commands at the top of your user-data script:

sudo add-apt-repository ppa:alestic/ppa &&
sudo apt-get update &&
sudo apt-get install -y runurl

If you are using an Ubuntu release without the add-apt-repository command or a Linux distro other than Ubuntu, you can install runurl using the following commands:

sudo wget -qO/usr/bin/runurl run.alestic.com/runurl
sudo chmod 755 /usr/bin/runurl

The subsequent commands in the user-data script can then use the runurl command as demonstrated in the above example.

SSL

To improve your certainty that you are talking to the right server and getting the right data, you could use SSL (https) in your URLs. If you are talking to S3 buckets, however, you’ll need to use the old-style S3 bucket access like:

runurl https://s3.amazonaws.com/run.alestic.com/demo/echo "hello, mars"

This is probably not as critical when accessing it from an EC2 instance as you’re operating over Amazon’s trusted network.

Caveats

There are a number of things which can go wrong when using a tool like runurl. Here are some to think about:

  • Only run content which you control or completely trust.

  • Just because you like the content of a URL when you look at it in your browser does not mean that it will still look like that when your instance goes to run it. It could change at any point to something that is broken or even malicious unless it is under your control.

  • If you depend on this approach for serious applications, you need to make sure that the content you are downloading is coming from a reliable server. S3 is reasonable (with retries) but you also need to consider the DNS server if you are depending on a non-AWS hostname to access the S3 bucket.

The name run.alestic.com points to an S3 bucket, but the DNS for this name is not redundant or worthy of use by applications with serious uptime requirements. This particular service should be considered my playground for ideas and there is no commitment on my part to make sure that it is up or that the content remains stable.

If you like what you see, please feel free to copy any of the open source content on run.alestic.com and store it on your own reliable and trusted servers. It is all published under the Apache2 license.

Project

I’m using this simple script as an opportunity to come up to speed with hosting projects on Launchpad. You can access the source code and submit bugs at

https://launchpad.net/runurl

You can also use launchpad and bazaar to branch the source into parallel projects and/or submit requests to merge patches into the main development branch.

[Update 2009-10-11: Document use of Alestic PPA]
[Update 2010-01-25: Simplify bootstrap instructions for Ubuntu]
[Update 2010-08-17: Switch to using “add-apt-repository” for bootstrap instructions]

Presentation: Building Custom Linux Images for Amazon EC2

At the end of July, I gave a presentation at O’Reilly’s Open Source Convention (OSCON 2009) in San Jose. The slides from the presentation have been made available on the OSCON web site in ODP and PDF formats (look for links towards the top of the page):

Building Custom Linux Images for Amazon EC2

Bonus: For folks who live around Los Angeles (or who want to fly in?), I will be giving an extended version of this talk at the UUASC-LA (Unix Users Association of Southern California, Los Angeles Chapter) meeting on Thursday, September 3, 2009. The presentation is free and open to all. Please read the directions and instructions on the UUASC web site carefully.

New Releases of Ubuntu and Debian Images for Amazon EC2 (Tools, Security)

New updates have been released for the Ubuntu and Debian AMIs (EC2 images) published on:

https://alestic.com

The following notes apply to this release:

  • The EC2 AMI tools have been upgraded to version 1.3-34544. Note that an “apt-get upgrade” will downgrade the EC2 AMI tools because the versions of the ec2-ami-tools package currently in the Ubuntu Hardy, Intrepid, Jaunty archives are outdated. If you have an easy solution to this, please let us know.

  • The ssh host key regeneration has been moved to run after the RNG is seeded. This improves security, especially for folks who are verifying the ssh host key on the first connect by comparing the fingerprint to the value in the instance console output. Thanks to Andrew Becherer for suggesting this improvement.

  • The Ubuntu Karmic Alpha images were not updated. Due to a new kernel requirement, Karmic will no longer run on Amazon’s 2.6.21 kernel. Once Canonical releases an Ubuntu kernel with the appropriate features, the Karmic series may be resumed. Running “apt-get upgrade” on an existing Karmic AMI will cause it to be inaccessible after rebooting.

  • The Debian Etch desktop images were not updated because apparently LaTeX refuses to build a format from a source file which is more than five years old. If there is anybody who wants Etch desktops and is willing to investigate, please contact me, otherwise I plan to discontinue support for this series. If you’re using Debian, I’d encourage you to upgrade to Lenny “stable” anyway.

Please give these new images a spin and let us know if you run into any problems.

Enjoy

EBS Snapshots of a MySQL Slave Database on EC2

At our company, CampusExplorer.com, we regularly snapshot the EBS volume which holds our MySQL database using the basic procedure I outlined in the article “Running MySQL on Amazon EC2 with Elastic Block Store”, though the snapshot code has been significantly improved through our experience in the last year.

As others have reported, we also found that the background EBS snapshot process on EC2 increased IO wait on the source EBS volume holding the production database, which had a negative impact on the performance of the production web site itself. So, we moved the frequent snapshot process to a slave database which is always completely up to date with the production master database.

Aside: Having complete backups is not the only reason to do EBS snapshots. They also increase the reliability and durability of the EBS volume itself, so we still run occasional snapshots on the production database, just during off-peak hours.

Steve Caldwell (chief tech at CampusExplorer.com) has automated some great push button EC2 system launches and configurations including EBS database volumes created from snapshots, attached, mounted, etc. This is convenient to do things like setting up a temporary development system for a contractor, starting a staging environment for QA, or running some database intensive reports on near-production data.

skip-slave-start

When EBS volumes were created from the snapshots of the slave database, Steve found that as soon as the MySQL server came up, it started replication from the master thinking it was a slave database. In some cases this is fine, but in others we want to see the database in exactly the state it was in at the time of the snapshot. We tried running “STOP SLAVE” just before creating the snapshot, but replication still resumed when the server started.

We finally found the solution in the option --skip-slave-start which tells MySQL not to start the replication process when the server comes up.

I haven’t looked into how to pass options to mysqld when running /etc/init.d/mysql so we settled on adding this line to the [mysqld] section of /etc/mysql/my.cnf by default on our non-slave servers:

skip-slave-start

On a system which we know should start as a slave, we omit this directive. On a system we want to turn into a slave, we simply run “START SLAVE” and remove this line for future restarts. If the binary logs available on the master go back to the point where the snapshot was created, then replication begins and it eventually catches up to the present.
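
As a hedged illustration (credentials omitted), turning such a server into a live slave and confirming that replication has resumed can be as simple as:

# Resume replication from the point captured in the snapshot
mysql -u root -p -e 'START SLAVE'

# Confirm the IO and SQL threads are running and watch the lag shrink
mysql -u root -p -e 'SHOW SLAVE STATUS\G' | egrep 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'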

[Update 2009-08-06: Clarify skip-slave-start meaning]

Matching EC2 Availability Zones Across AWS Accounts

Summary: EC2 availability zone names in different accounts do not match to the same underlying physical infrastructure. This article explains a trick which can be used to figure out how to match availability zone names between different accounts.

Background

As of the updating of this article (2014-11-03) Amazon EC2 (Elastic Compute Cloud) has over thirty different availability zones in eleven regions. A region can be thought of as a specific area of the world. An availability zone can be thought of roughly as a data center, defined such that no single failure scenario should affect two availability zones.

The current regions are:

  • us-east-1
  • us-west-1
  • us-west-2
  • us-gov-west-1
  • eu-central-1
  • eu-west-1
  • ap-northeast-1
  • ap-southeast-1
  • ap-southeast-2
  • sa-east-1
  • cn-north-1

The availability zones in those regions are given the region name plus simple letters appended. For example:

  • us-east-1a
  • us-east-1b
  • us-east-1c
  • us-east-1d
  • us-east-1e

and

  • eu-west-1a
  • eu-west-1b

When you start EC2 instances, you can specify an availability zone or let Amazon pick one for you. You always have a default region which can be overridden when starting instances.

Balancing

In order to prevent an overloading of a single availability zone when everybody tries to run their instances in us-east-1a, Amazon has added a layer of indirection so that each account’s availability zones can map to different physical data center equivalents.

For example, zone us-east-1a in your account might be the same as zone us-east-1c in my account and us-east-1d in a third person’s account.

In fact, given the way that Amazon has set this up, I would not be surprised if Amazon occasionally reassigns availability zone names which you are not currently using. For example, Amazon added a fourth availability zone in the us-east-1 region, but I suspect this might not be us-east-1d in all accounts (especially new ones).

Identification

On occasion, users want to know whether instances in different accounts are running in the same availability zone. Or, users might want to know which availability zone is the one where people are currently experiencing a particular problem.

You’ll often see users say that there is a problem in zone us-east-1a, but this isn’t very helpful to other users because (as described above) that name only has significance within the original user’s account.

What would be helpful is a unique identifier which maps to the underlying physical infrastructure (e.g., data center) and can be mapped to the different availability zone names in each account.

I believe that Amazon may have inadvertently let slip a way to obtain this in the current implementation of the reserved instance offering ids. In my experiments so far, these ids seem to be tied to something outside of any one account, and the same id maps to different availability zone names in different accounts.

To demonstrate this I’ve arbitrarily chosen the reserved instance offerings for m3.medium, Linux/UNIX, Heavy Utilization, one year, non-Marketplace, non-VPC.

To list the mappings for a single account, you can use a command like:

aws ec2 describe-regions --output text --query 'Regions[*].[RegionName]' |
  while read region; do
    aws ec2 describe-reserved-instances-offerings --region $region --output text --query 'ReservedInstancesOfferings[*].[AvailabilityZone,ReservedInstancesOfferingId,InstanceType,OfferingType,ProductDescription,InstanceTenancy,Marketplace,Duration]'| 
      egrep 'm3\.medium.Heavy Utilization.Linux/UNIX.default.False.31536000' | 
      cut -f1,2
done | sort

The describe-reserved-instances-offerings API call can be very slow, so wait patiently for the output.

Examples

Here are the mappings for one of my accounts (let’s call it “Blue”):

ap-northeast-1b	6126282e-74c0-42db-a3c0-cfb4535203df
ap-northeast-1c	f72822dd-b4ac-431e-a8a7-74eb6cb3712d
ap-southeast-1a	ca82f875-3397-46e2-915e-1f0149617db3
ap-southeast-1b	528964f4-71a0-4209-8be7-81efc91e0742
ap-southeast-2a	97875095-d596-4535-a355-3edd28a6c404
ap-southeast-2b	31a8bf89-1ec8-4bfd-b8d9-f044d4ce37bb
eu-central-1a	223aa78f-7609-4d55-adf4-26523b76497c
eu-central-1b	0f756f35-6d7d-40e8-99d3-da56ad5e79ca
eu-west-1a	3a02219b-7eca-42f1-967b-513dc8faea80
eu-west-1b	b90c4fcf-4b9a-404b-adb0-378b0bb1d15b
eu-west-1c	d538b1e9-12c4-4b65-8ad9-fd138fa0ffb7
sa-east-1a	ada8467a-6227-4a7c-a08c-0bf7fa1a9b78
sa-east-1b	bfe7cebc-d6ce-4186-b405-16c798408f8d
us-east-1a	f0e32143-5358-4e0c-a809-b7ebb6f76f94
us-east-1c	451c0ebc-e07f-4ff1-a8c5-936965408bee
us-east-1d	afd143f6-2f95-4db4-86ac-b1210bef0254
us-east-1e	a19441e7-7e25-4d4c-bce1-71a7bced2c9a
us-west-1a	0c31ec4b-6dbf-4775-9736-137e6cae5e35
us-west-1c	b5677909-0292-4d17-b394-3286a9a10119
us-west-2a	3a32d7f4-a14a-4d25-9617-b7e2f98c3986
us-west-2b	882971b6-0f4e-454c-b72f-9b3177090aba
us-west-2c	a7ecdbc4-2441-4375-84c7-37400deccaed

Here are the mappings for a different account (let’s call it “Red”):

ap-northeast-1b	6126282e-74c0-42db-a3c0-cfb4535203df
ap-northeast-1c	f72822dd-b4ac-431e-a8a7-74eb6cb3712d
ap-southeast-1a	528964f4-71a0-4209-8be7-81efc91e0742
ap-southeast-1b	ca82f875-3397-46e2-915e-1f0149617db3
ap-southeast-2a	31a8bf89-1ec8-4bfd-b8d9-f044d4ce37bb
ap-southeast-2b	97875095-d596-4535-a355-3edd28a6c404
eu-central-1a	223aa78f-7609-4d55-adf4-26523b76497c
eu-central-1b	0f756f35-6d7d-40e8-99d3-da56ad5e79ca
eu-west-1a	b90c4fcf-4b9a-404b-adb0-378b0bb1d15b
eu-west-1b	3a02219b-7eca-42f1-967b-513dc8faea80
eu-west-1c	d538b1e9-12c4-4b65-8ad9-fd138fa0ffb7
sa-east-1a	bfe7cebc-d6ce-4186-b405-16c798408f8d
sa-east-1b	ada8467a-6227-4a7c-a08c-0bf7fa1a9b78
us-east-1a	451c0ebc-e07f-4ff1-a8c5-936965408bee
us-east-1b	f0e32143-5358-4e0c-a809-b7ebb6f76f94
us-east-1d	afd143f6-2f95-4db4-86ac-b1210bef0254
us-east-1e	a19441e7-7e25-4d4c-bce1-71a7bced2c9a
us-west-1b	0c31ec4b-6dbf-4775-9736-137e6cae5e35
us-west-1c	b5677909-0292-4d17-b394-3286a9a10119
us-west-2a	3a32d7f4-a14a-4d25-9617-b7e2f98c3986
us-west-2b	882971b6-0f4e-454c-b72f-9b3177090aba
us-west-2c	a7ecdbc4-2441-4375-84c7-37400deccaed

From this, I theorize that availability zone us-east-1a in account Blue is the same as availability zone us-east-1b in account Red, while us-east-1d happens to map to the same infrastructure in both accounts.
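
If you save each account’s output to a file (the file names below are assumptions; each line is a zone name, a tab, and an offering id), a quick way to line up matching zones across the two accounts is:

# Sort each account's list by offering id, then join on that field
sort -k2 blue.txt > blue.sorted
sort -k2 red.txt  > red.sorted

# Output: offering id, Blue zone name, Red zone name
join -1 2 -2 2 -o 0,1.1,2.1 blue.sorted red.sorted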

Caveats

Please note that this approach is not a documented feature of Amazon EC2. I may be misinterpreting what I am seeing and the mappings may be completely random for different accounts.

Amazon could at any time restructure how these values work so that the described offering ids cannot be used between accounts or do not map to any common infrastructure.

Use at your own risk and please post a comment if you find out any further data to support or disprove this theory.

Not all accounts have access to all regions or availability zones in those regions.

Not all accounts have access to reserved instances in the availability zones they have access to.

The Reserved Instance offering ids change from time to time over the years for unknown and unpublished reasons.

History

[Update 2011-04-25: Tweaked command line to exclude more instance types Amazon has added. Updated info for current ids in current regions/zones.]

[Update 2011-12-23: Tweaked command line to exclude more instance types Amazon has added. Updated info for current ids in current regions/zones.]

[Update 2014-11-03: Updated to use new aws-cli command syntax]

Does Your Product Help Users Build AMIs for Amazon EC2?

I will be speaking at the O’Reilly Open Source Convention (OSCON 2009) next week, giving a presentation on building custom images for Amazon EC2.

Though the presentation is primarily about doing it yourself with the practical portion focusing on Ubuntu images, I wanted to mention some of the third party services which can be used to build images of any kind for EC2.

If you run, use, or are aware of one of these AMI building services, please send me a message or post a comment on this entry.

And, for folks who still haven’t yet signed up for OSCON, here’s a code that gets you 30% off of registration: os09sphoc

Creating a New Image for EC2 by Rebundling a Running Instance

NOTE: This is an article from 2009, back before EBS boot instances were available on Amazon EC2. I recommend you use EBS boot instances which make it trivial to create new AMIs (single command/API call). Please stop reading this article now and convert to EBS boot AMIs!

When you start up an instance (server) on Amazon EC2, you need to pick the image or AMI (Amazon Machine Image) to run. This determines the Linux distribution and version as well as the initial software installed and how it is configured.

There are a number of public images to choose from with EC2, including the Ubuntu and Debian images published on https://alestic.com, but sometimes it is appropriate to create your own private or public images. There are two primary ways to create an image for EC2:

  1. Create an EC2 image from scratch. This process lets you control every detail of what goes into the image and is the easiest way to automate image creation.

  2. Rebundle a running EC2 instance into a new image. This approach is the topic of the rest of this article.

After you rebundle a running instance to create a new image, you can then run new EC2 instances of that image. Each instance starts off looking exactly like the original instance as far as the files on the disk go (with a few exceptions).

This guide is primarily written in the context of running Ubuntu on EC2, but the concepts should apply without too much changing on Debian and other Linux distributions.

To use this rebundling approach, you start by running an instance of an image that (1) is as close as possible to the image you want to create, and (2) is published by a source you trust. You then proceed to install software and configure that instance so that it contains exactly what you want to be available on new instances right down to the startup scripts.

The next step is to bundle the instance’s disk image into a new AMI, but before we get to that, it is important to understand a few things about security.

Security

If you are creating a new EC2 image, you need to be very careful what pieces of information you inadvertently leave on the image, especially if you have the goal of publishing it as a public AMI. Anybody who runs an instance of that AMI will have access to the files you included in the bundle, and there is no way to modify an AMI after it has been created (though you can delete it).

For example, you don’t want to leave your AWS certificate or private key on the disk. You’ll even want to clear out the shell history file in case you had typed secret information in commands or in setting environment variables.

You also want to consider the security concerns from the perspective of the people who run the new image. For example, you don’t want to leave any passwords active on accounts. You should also make sure you don’t include your public ssh key in authorized_keys files. Leaving a back door into other people’s servers is in poor taste even if you have no intention of ever using it.

Here are some sample commands, but only you can decide if this wipes out too much or what other files you need to exclude depending on how you set up and used the instance you are bundling:

sudo rm -f /root/.*hist* $HOME/.*hist*
sudo rm -f /var/log/*.gz
sudo find /var/log -name mysql -prune -o -type f -print | 
  while read i; do sudo cp /dev/null $i; done

Whole directories can be excluded from the image using the --exclude option of the ec2-bundle-vol command (see below).

Rebundling

Now we’re ready to bundle the actual EC2 image (AMI). To start, you need to copy your certificate and key to the instance ephemeral storage. Adjust the sample command to use the appropriate keypair file for authentication and the appropriate location of your certificate and private key files. If you are not running a modern Ubuntu image, then change remoteuser to “root”.

remotehost=<ec2-instance-hostname>
remoteuser=ubuntu

rsync \
  --rsh="ssh -i KEYPAIR.pem" \
  --rsync-path="sudo rsync" \
  PATHTOKEYS/{cert,pk}-*.pem \
  $remoteuser@$remotehost:/mnt/

Set up some environment variables for convenience in the following commands. A single S3 bucket can be used for multiple AMIs. The manifest prefix should be descriptive, especially if you plan to publish the AMI publicly, as it is the only piece of documentation many users will see when they look through AMI lists. At a minimum, I recommend including the Linux distribution (e.g., “ubuntu”), the architecture (e.g., “i386” or “32”), and the date (e.g., “20090621”), as well as some tag that indicates the special nature of the image (e.g., “desktop” or “lamp”).

bucket=<your-bucket-name>
prefix=<descriptive-image-title>

On the EC2 instance itself, you also set up some environment variables to help the bundle and upload commands. You can find these values in your EC2 account.

export AWS_USER_ID=<your-value>
export AWS_ACCESS_KEY_ID=<your-value>
export AWS_SECRET_ACCESS_KEY=<your-value>

if [ $(uname -m) = 'x86_64' ]; then
  arch=x86_64
else
  arch=i386
fi

Bundle the files on the current instance into a copy of the image under /mnt:

sudo -E ec2-bundle-vol \
  -r $arch \
  -d /mnt \
  -p $prefix \
  -u $AWS_USER_ID \
  -k /mnt/pk-*.pem \
  -c /mnt/cert-*.pem \
  -s 10240 \
  -e /mnt,/root/.ssh,/home/ubuntu/.ssh

Upload the bundle to a bucket on S3:

ec2-upload-bundle \
   -b $bucket \
   -m /mnt/$prefix.manifest.xml \
   -a $AWS_ACCESS_KEY_ID \
   -s $AWS_SECRET_ACCESS_KEY

Now that the AMI files have been uploaded to S3, you register the image as a new AMI. This is done back on your local system (with the API tools installed):

ec2-register \
  --name "$bucket/$prefix" \
  $bucket/$prefix.manifest.xml

The output of this command is the new AMI id which is used to run new instances of that image.
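
For example, a hedged launch of the newly registered image might look like the following; substitute the AMI id printed by ec2-register and your own keypair:

ec2-run-instances \
  --key KEYPAIR \
  --instance-type m1.small \
  ami-XXXXXXXX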

It is important to use the same account access information for the ec2-bundle-vol and ec2-register commands even though they are run on different systems. If you don’t, you’ll get an error indicating you don’t have the rights to register the image.

Public Images

By default, the new EC2 image is private, which means it can only be seen and run by the user who created it. You can share access with another individual account or with the public.

To let another EC2 user run the image without giving access to the world:

ec2-modify-image-attribute -l -a <other-user-id> <ami-id>

To let all other EC2 users run instances of your image:

ec2-modify-image-attribute -l -a all <ami-id>

Cost

AWS will charge you standard S3 charges for the stored AMI files which comes out to $0.15 per GB per month. Note, however, that the bundling process uses sparse files and compression, so the final storage size is generally very small and your resulting cost may only be pennies per month.

The AMI owner incurs no charge when users run the image in new instances. The users who run the AMI are responsible for the standard hourly instance charges.

Cleanup

Before removing any public image, please consider the impact this might have on people who depend on that image to run their business. Once you publish an AMI, there is no way to tell how many users are regularly creating instances of that AMI and expecting it to stay available. There is also no way to communicate with these users to let them know that the image is going away.

If you decide you want to remove an image anyway, here are the steps to take.

Deregister the AMI:

ec2-deregister ami-XXX

Delete the AMI bundle in S3:

ec2-delete-bundle \
  --access-key $AWS_ACCESS_KEY_ID \
  --secret-key $AWS_SECRET_ACCESS_KEY \
  --bucket $bucket \
  --prefix $prefix

[Update 2009-09-12: Security tweak for running under non-root.] [Update 2010-02-01: Update to use latest API/AMI tools and work for Ubuntu 9.10 Karmic.]

New Releases of Ubuntu Images for Amazon EC2 2009-06-23 (Karmic Koala Alpha released)

Ubuntu Karmic Koala Alpha is being developed and will be released as Ubuntu 9.10 in October. If you want to play around with Karmic Alpha on Amazon EC2, I have published new AMIs in the US and EU regions for 32- and 64-bit:

https://alestic.com

A Karmic desktop image for EC2 is also available if you wish to monitor progress in that area.

Warning! Karmic is an unstable alpha developer version and is not intended for use in anything resembling a production environment.

Please note that we are still defaulting to Amazon’s 2.6.21fc8 kernel which, though functional and stable, is getting older and older for each new release of Ubuntu. One effect of this is that AppArmor will not be enabled, though this should not affect the functionality of any software.

Enjoy!

Using RAID on EC2 EBS Volumes to Break the 1TB Barrier and Increase Performance

Amazon EC2 currently has a limit of 1,000 GB (1 TB) for EBS volumes (Elastic Block Store). It is possible to create file systems larger than this limit using RAID 0 across multiple EBS volumes. Using RAID 0 can also improve the performance of the file system, reducing total IO wait, as demonstrated in a number of published EBS performance tests.

The following instructions walk through one way to set up RAID 0 across multiple EBS volumes. Note that there is a limit on the size of a file system on 32-bit instances, but on 64-bit instances file systems can get unreasonably large. This test was run with 40 EBS volumes of 1,000 GB each for a total of 40,000 GB (40 TB) in the resulting file system.

Actual command line output showing the size of the RAID:

# df /vol
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0             41942906368      1312 41942905056   1% /vol

# df -h /vol
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0               40T  1.3M   40T   1% /vol

These commands can run in less than 10 minutes, and this could probably be reduced further by parallelizing the creation and attachment of the EBS volumes.

Note that the default limit is 20 EBS volumes per EC2 account. You can request an increase from Amazon if you need more.

Caution: 40 TB of EBS storage on EC2 will cost $4,000 per month plus usage charges.

Instructions

Start a 64-bit instance (say, Ubuntu 8.04 Hardy from https://alestic.com). Use your own KEYPAIR:

ec2-run-instances \
  --key KEYPAIR \
  --instance-type c1.xlarge \
  --availability-zone us-east-1a \
  ami-0772946e

Configurable parameters (set on both local host and on EC2 instance):

instanceid=i-XXXXXXXX
volumes=40
size=1000
mountpoint=/vol

On the local host (with EC2 API tools installed)…

Create and attach EBS volumes:

devices=$(perl -e 'for$i("h".."k"){for$j("",1..15){print"/dev/sd$i$j\n"}}'|
           head -$volumes)
devicearray=($devices)
volumeids=
i=1
while [ $i -le $volumes ]; do
  volumeid=$(ec2-create-volume -z us-east-1a --size $size | cut -f2)
  echo "$i: created  $volumeid"
  device=${devicearray[$(($i-1))]}
  ec2-attach-volume -d $device -i $instanceid $volumeid
  volumeids="$volumeids $volumeid"
  let i=i+1
done
echo "volumeids='$volumeids'"

On the EC2 instance (after setting parameters as above)…

Install software:

sudo apt-get update &&
sudo apt-get install -y mdadm xfsprogs

Set up the RAID 0 device:

devices=$(perl -e 'for$i("h".."k"){for$j("",1..15){print"/dev/sd$i$j\n"}}'|
           head -$volumes)

yes | sudo mdadm \
  --create /dev/md0 \
  --level 0 \
  --metadata=1.1 \
  --chunk 256 \
  --raid-devices $volumes \
  $devices

echo DEVICE $devices       | sudo tee    /etc/mdadm.conf
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
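
Optionally, check that the array was assembled with all of the expected devices before creating the file system:

cat /proc/mdstat
sudo mdadm --detail /dev/md0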

Create the file system (pick your preferred file system type):

sudo mkfs.xfs /dev/md0

Mount:

echo "/dev/md0 $mountpoint xfs noatime 0 0" | sudo tee -a /etc/fstab
sudo mkdir $mountpoint
sudo mount $mountpoint

Check it out:

df -h $mountpoint

When you’re done with it and want to destroy the data and stop paying for storage, tear it down:

sudo umount $mountpoint
sudo mdadm --stop /dev/md0

Terminate the instance:

sudo shutdown -h now

On the local host (with EC2 API tools installed)…

Detach and delete volumes:

for volumeid in $volumeids; do
  ec2-detach-volume $volumeid
done

for volumeid in $volumeids; do
  ec2-delete-volume $volumeid
done
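
If the delete commands complain that a volume is still attached, one approach (a sketch, not from the original instructions) is to wait for each volume to report an "available" status before deleting it:

for volumeid in $volumeids; do
  # Poll until the volume has finished detaching
  until ec2-describe-volumes $volumeid | grep -q available; do
    sleep 5
  done
  ec2-delete-volume $volumeid
done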

Credits

This article was originally posted on the EC2 Ubuntu group.

Thanks to M. David Peterson for the basic mdadm instructions.

[Update 2012-01-21: Added --chunk 256 based on community-recognized best practices.]

New Releases of Ubuntu and Debian Images for Amazon EC2 2009-06-14 (Reliability and Security)

New updates have been released for the Ubuntu and Debian AMIs (EC2 images) published on:

https://alestic.com

The following improvements are included in this release:

  • Ubuntu 9.04 Jaunty now uses an Ubuntu mirror inside of EC2 hosted by RightScale. This dramatically improves the performance of updates and upgrades. Hardy and Intrepid were already using the mirrors inside EC2.

  • The Hardy, Intrepid, and Jaunty images have been enhanced to add failover for Ubuntu archive mirror hosts across availability zones (data centers). This change lets an Ubuntu instance perform package updates and upgrades even if one or two of the EC2 availability zones are completely unavailable.

  • The denyhosts package is now installed on desktop images for improved security. The Amazon abuse team has identified the Ubuntu desktop images as a source of compromised systems. The cause for this is believed to be insecure passwords set by users, since the desktop images have PasswordAuthentication enabled by default so that the NX client can connect. The denyhosts package blocks ssh attacks by adding remote systems to /etc/hosts.deny if they keep failing password logins.

    The published Ubuntu and Debian server images continue to have PasswordAuthentication turned off by default for improved security. If you choose to turn this on, I recommend installing a package like denyhosts and using software like the following to generate secure passwords:

      sudo apt-get install pwgen
      pwgen -s 10 1
    
  • The EC2 AMI tools have been upgraded to version 1.3-31780.

  • All software packages have been updated to versions current as of 2009-06-14.

Community support for Ubuntu on EC2 is available in this group:

http://groups.google.com/group/ec2ubuntu

Community support for Debian on EC2 is available in this group:

http://groups.google.com/group/ec2debian

The 32-bit Debian squeeze images and the 32-bit Debian etch desktop image have not been updated yet due to problems with initial package installation. Images will be released when these issues are resolved.

The following enhancements have been made to the ec2ubuntu-build-ami software, which is used to build Ubuntu and Debian images for EC2.

  • New --kernel and --ramdisk options have been added to specify AKI and ARI. If you specify a different kernel, you should also specify kernel modules with --package or install them with the --script option.

  • Support has been removed for Ubuntu Edgy, Feisty, and Gutsy. These releases have reached their end of life, and dropping them improves the clarity of the code.

  • A typo in $originaldir has been fixed for folks who were using the --script option.

  • A typo in /dev/ptmx has been fixed, though it apparently had no effect given how these images are built.

Thanks to Stephen Parkes and Paul Dowman for submitting patches.

Enjoy!

Using Elastic IP to Identify Internal Instances on Amazon EC2

Elastic IP

Amazon EC2 supports Elastic IP Addresses to implement the effect of having a static IP address for public servers running on EC2. You can point the Elastic IP at any of your EC2 instances, changing the active instance at any time, without changing the IP address seen by the public outside of EC2.

This is a valuable feature for things like web and email servers, especially if you need to replace a failing server or upgrade or downgrade the hardware capabilities of the server, but read on for an insider’s secret way to use Elastic IP addresses for non-public servers.

Internal Servers

Not all servers should be publicly accessible. For example, you may have an internal EC2 instance which hosts your database server, accessed by other application instances inside EC2. You want to architect your installation so that you can replace the database server (instance failure, resizing, etc.) while making it easy for all of your application servers to start using the new instance.

There are a number of design approaches which people have used to accomplish this, including:

  1. Hard code the internal IP address into the applications and modify it whenever the internal server changes to a new instance (ugh and ouch).

  2. Run your own DNS server (or use an external DNS service) and change the IP address of the internal hostname to the new internal IP address (extra work and potentially extra failover time waiting for DNS propagation).

  3. Store the internal IP address in something like SimpleDB and change it when you want to point to a new EC2 instance (extra work and requires extra coding for clients to keep checking the SimpleDB mapping).

The following approach is the one I use and is the topic of the rest of this article:

  4. Assign an Elastic IP address to the internal instance and use the external Elastic IP DNS name. To switch servers, simply re-assign the Elastic IP to a new EC2 instance.

This last option uses a little-known feature of the Elastic IP Address system as implemented by Amazon EC2:

When an EC2 instance queries the external DNS name of an Elastic IP, the EC2 DNS server returns the internal IP address of the instance to which the Elastic IP address is currently assigned.

You may need to read that a couple of times to grasp the implications, as it is non-obvious that an “external” name will return an “internal” address.

Setting Up

You can create an Elastic IP address in a number of ways, including the EC2 Console or the EC2 API command line tools. For example:

$ ec2-allocate-address 
ADDRESS	75.101.137.243

The address returned at this point is the external Elastic IP address. You don’t want to use this external IP address directly for internal server access since you would be charged for network traffic.

The next step is to assign the Elastic IP address to an EC2 instance (which is going to be your internal server):

$ ec2-associate-address -i i-07612d6e 75.101.137.243
ADDRESS	75.101.137.243	i-07612d6e

Once the Elastic IP has been assigned to an instance, you can describe that instance to find the external DNS name (which will include the external Elastic IP address in it):

$ ec2-describe-instances i-07612d6e | egrep ^INSTANCE | cut -f4
ec2-75-101-137-243.compute-1.amazonaws.com

This is the permanent external DNS name for that Elastic IP address no matter how many times you change the instance to which it is assigned. If you query this DNS name from outside of EC2, it will resolve to the external IP address as shown above:

$ dig +short ec2-75-101-137-243.compute-1.amazonaws.com
75.101.137.243

However, if you query this DNS name from inside an EC2 instance, it will resolve to the internal IP address for the instance to which it is currently assigned:

$ dig +short ec2-75-101-137-243.compute-1.amazonaws.com
10.254.171.132

You can now use this external DNS name in your applications on EC2 instances to communicate with the server over the internal EC2 network and you won’t be charged for the network traffic as long as you’re in the same EC2 availability zone.

Changing Servers

If you ever need to move the service to a new EC2 instance, simply reassign the Elastic IP address to the new EC2 instance:

$ ec2-associate-address -i i-3b783452 75.101.137.243
ADDRESS	75.101.137.243	i-3b783452     

and the original external DNS name will immediately resolve to the internal IP address of the new instance:

$ dig +short ec2-75-101-137-243.compute-1.amazonaws.com
10.190.134.5

Existing connections will fail and new connections to the external DNS name will automatically be opened on the new instance, using either the public IP address or the private IP address depending on where the client is when requesting DNS resolution.

Using CNAME

It is not entirely intuitive to have your application use names like ec2-75-101-137-243.compute-1.amazonaws.com, but you can make it clearer by creating a permanent entry in your DNS which points to that name with a CNAME alias. For example, using bind:

db.example.com.    CNAME    ec2-75-101-137-243.compute-1.amazonaws.com.

You can then use db.example.com to refer to the server internally and still not have to update your DNS when you change instances.
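
For example, an application server inside EC2 might connect to a MySQL database on the internal instance through the alias (the user and database names here are just placeholders):

mysql -h db.example.com -u appuser -p mydatabase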

Further Notes

Even though you are using an Elastic IP address, you don’t need (and often don’t want) to allow external users to be able to access your internal servers. For example, it is just asking for trouble to expose a MySQL server to the Internet. Keep the security groups tight so that the internal servers and services can only be accessed from your other EC2 instances.
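
For example, assuming hypothetical security group names like "appserver" and "dbserver" along with your own AWS account number, you could allow only your application servers to reach the MySQL port on the database instance:

ec2-authorize dbserver -P tcp -p 3306 -o appserver -u YOUR_AWS_ACCOUNT_ID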

Open TCP connections to the original server will not survive when the Elastic IP address is assigned to a new EC2 instance. Some applications and clients will automatically attempt to re-open a failed connection, getting through to the new server on the new internal IP address, but other applications may need to be kicked or signaled so they attempt a new connection to the server.

When using this approach, you need one Elastic IP address for each internal server which needs to be addressed. AWS accounts default to a limit of 5 Elastic IP addresses, but you can request an increased limit.

How do you solve the problem of connecting internal EC2 servers to each other?

Update 2009-07-20: Correct example host name.
Update 2012-03-06: Here’s the original forum post from Amazon that revealed this trick: Elastic internal IP address
Update 2012-04-02: Use different internal IP address for new instance example.