Creating a New Image for EC2 by Rebundling a Running Instance


When you start up an instance (server) on Amazon EC2, you need to pick the image or AMI (Amazon Machine Image) to run. This determines the Linux distribution and version as well as the initial software installed and how it is configured.

There are a number of public images to choose from with EC2, including the Ubuntu and Debian images published on http://alestic.com, but sometimes it is appropriate to create your own private or public image. There are two primary ways to create an image for EC2:

  1. Create an EC2 image from scratch. This process lets you control every detail of what goes into the image and is the easiest way to automate image creation.

  2. Rebundle a running EC2 instance into a new image. This approach is the topic of the rest of this article.

After you rebundle a running instance to create a new image, you can then run new EC2 instances of that image. Each instance starts off looking exactly like the original instance as far as the files on the disk go (with a few exceptions).

This guide is primarily written in the context of running Ubuntu on EC2, but the concepts should apply without much change to Debian and other Linux distributions.

To use this rebundling approach, you start by running an instance of an image that (1) is as close as possible to the image you want to create, and (2) is published by a source you trust. You then proceed to install software and configure that instance so that it contains exactly what you want to be available on new instances right down to the startup scripts.

The next step is to bundle the instance’s disk image into a new AMI, but before we get to that, it is important to understand a few things about security.

Security

If you are creating a new EC2 image, you need to be very careful what pieces of information you inadvertently leave on the image, especially if you have the goal of publishing it as a public AMI. Anybody who runs an instance of that AMI will have access to the files you included in the bundle, and there is no way to modify an AMI after it has been created (though you can delete it).

For example, you don’t want to leave your AWS certificate or private key on the disk. You’ll even want to clear out the shell history file in case you had typed secret information in commands or in setting environment variables.

You also want to consider the security concerns from the perspective of the people who run the new image. For example, you don’t want to leave any passwords active on accounts. You should also make sure you don’t include your public ssh key in authorized_keys files. Leaving a back door into other people’s servers is in poor taste even if you have no intention of ever using it.
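One way to check for active passwords is to look at the password field in /etc/shadow. The following is a minimal sketch rather than part of the original procedure: the function name accounts_with_passwords is made up for illustration, and it assumes the standard shadow(5) format, in which a password field starting with "!" or "*" means password login is disabled.

```shell
#!/bin/bash
# Print the accounts whose password field could be used to log in.
# Takes a file in shadow(5) format; defaults to /etc/shadow (needs root to read).
# Hypothetical helper for illustration, not part of any EC2 tool.
accounts_with_passwords() {
  local shadow_file="${1:-/etc/shadow}"
  # A second field starting with "!" or "*" means password login is disabled.
  # Anything else (a hash, or an empty field) is reported.
  awk -F: '$2 !~ /^[!*]/ { print $1 }' "$shadow_file"
}

# Example on the instance (commented out because it needs root):
# sudo cat /etc/shadow | accounts_with_passwords /dev/stdin
```

If the function prints nothing, no account has a usable password. Any account it does print is worth locking or reviewing before you bundle.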

Here are some sample commands, but only you can decide if this wipes out too much or what other files you need to exclude depending on how you set up and used the instance you are bundling:

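# Run the /root glob in a root shell, since a non-root user cannot read /root.
sudo sh -c 'rm -f /root/.*hist*'
sudo rm -f "$HOME"/.*hist*
sudo rm -f /var/log/*.gz
sudo find /var/log -name mysql -prune -o -type f -print |
  while read -r i; do sudo cp /dev/null "$i"; done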

Whole directories can be excluded from the image using the --exclude option of the ec2-bundle-vol command (see below).
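Before bundling, it can help to list the files that commonly hold secrets so you can decide whether to delete them or exclude their directories. This is only a sketch: the function name list_sensitive_files is hypothetical, and the file name patterns are illustrative assumptions that will not catch everything on your instance.

```shell
#!/bin/bash
# List files under a directory whose names suggest keys, certificates,
# ssh access, or shell history. The patterns are examples, not a complete list.
list_sensitive_files() {
  local root="${1:-/}"
  # -xdev keeps find on one filesystem, so a separately mounted /mnt is skipped.
  find "$root" -xdev -type f \
    \( -name '*.pem' -o -name 'authorized_keys' -o -name '.*hist*' \) \
    -print 2>/dev/null | sort
}

# Example on the instance (run as root to see every directory):
# list_sensitive_files /
```

Review the output by hand. Some matches may be files you want to keep, and secrets stored under other names will not show up at all.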

Rebundling

Now we’re ready to bundle the actual EC2 image (AMI). To start, you need to copy your certificate and key to the instance ephemeral storage. Adjust the sample command to use the appropriate keypair file for authentication and the appropriate location of your certificate and private key files. If you are not running a modern Ubuntu image, then change remoteuser to “root”.

remotehost=<ec2-instance-hostname>
remoteuser=ubuntu

rsync                           \
  --rsh="ssh -i KEYPAIR.pem"     \
  --rsync-path="sudo rsync"    \
  PATHTOKEYS/{cert,pk}-*.pem    \
  $remoteuser@$remotehost:/mnt/

Set up some environment variables for convenience in the following commands. Set them both on the EC2 instance and on your local system, since commands in both places use them. A single S3 bucket can be used for multiple AMIs. The manifest prefix should be descriptive, especially if you plan to publish the AMI publicly, as it is the only piece of documentation many users will see when they look through AMI lists. At a minimum, I recommend including the Linux distribution (e.g., “ubuntu”), the architecture (e.g., “i386” or “32”), and the date (e.g., “20090621”), as well as some tag that indicates the special nature of the image (e.g., “desktop” or “lamp”).

bucket=<your-bucket-name>
prefix=<descriptive-image-title>
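As an illustration of the naming advice, a prefix could be assembled from its parts like this. The helper name make_prefix, the field order, and the "-" separator are assumptions made for this sketch; any consistent scheme that includes the same pieces of information would do.

```shell
#!/bin/bash
# Build a manifest prefix from distribution, architecture, tag, and date.
# The date defaults to today in YYYYMMDD form when the fourth argument is omitted.
# Hypothetical helper; the layout is one possible convention, not a requirement.
make_prefix() {
  local distro="$1" arch="$2" tag="$3" stamp="${4:-$(date +%Y%m%d)}"
  printf '%s-%s-%s-%s\n' "$distro" "$arch" "$tag" "$stamp"
}

prefix=$(make_prefix ubuntu-9.10 i386 lamp 20090621)
echo "$prefix"   # prints ubuntu-9.10-i386-lamp-20090621
```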

On the EC2 instance itself, you also set up some environment variables to help the bundle and upload commands. You can find these values in your EC2 account.

export AWS_USER_ID=<your-value>
export AWS_ACCESS_KEY_ID=<your-value>
export AWS_SECRET_ACCESS_KEY=<your-value>

if [ "$(uname -m)" = 'x86_64' ]; then
  arch=x86_64
else
  arch=i386
fi

Bundle the files on the current instance into a copy of the image under /mnt:

sudo -E ec2-bundle-vol           \
  -r $arch                       \
  -d /mnt                        \
  -p $prefix                     \
  -u $AWS_USER_ID                \
  -k /mnt/pk-*.pem               \
  -c /mnt/cert-*.pem             \
  -s 10240                       \
  -e /mnt,/root/.ssh,/home/ubuntu/.ssh

Upload the bundle to a bucket on S3:

ec2-upload-bundle                \
    -b $bucket                   \
    -m /mnt/$prefix.manifest.xml \
    -a $AWS_ACCESS_KEY_ID        \
    -s $AWS_SECRET_ACCESS_KEY

Now that the AMI files have been uploaded to S3, you register the image as a new AMI. This is done back on your local system (with the API tools installed):

ec2-register \
  --name "$bucket/$prefix" \
  $bucket/$prefix.manifest.xml

The output of this command is the new AMI id which is used to run new instances of that image.
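If you want to capture the AMI id in a variable for later commands, a small filter can pull it out of the command output. This sketch assumes the output contains an id of the form "ami-" followed by hexadecimal digits; the helper name extract_ami_id and the sample line with its made-up id are illustrative only.

```shell
#!/bin/bash
# Read text on stdin and print the first thing that looks like an AMI id.
# Assumes ids have the form "ami-" plus at least eight hex digits.
extract_ami_id() {
  grep -o 'ami-[0-9a-f]\{8,\}' | head -n 1
}

# Sample input line with a made-up id, standing in for real ec2-register output.
printf 'IMAGE\tami-1a2b3c4d\n' | extract_ami_id   # prints ami-1a2b3c4d

# Possible use on your local system (commented out, needs the API tools):
# ami=$(ec2-register --name "$bucket/$prefix" "$bucket/$prefix.manifest.xml" | extract_ami_id)
```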

It is important to use the same account access information for the ec2-bundle-vol and ec2-register commands even though they are run on different systems. If you don’t, you’ll get an error indicating you don’t have the rights to register the image.

Public Images

By default, the new EC2 image is private, which means it can only be seen and run by the user who created it. You can share access with another individual account or with the public.

To let another EC2 user run the image without giving access to the world:

ec2-modify-image-attribute -l -a <other-user-id> <ami-id>

To let all other EC2 users run instances of your image:

ec2-modify-image-attribute -l -a all <ami-id>

Cost

AWS will charge you standard S3 rates for the stored AMI files, which come out to $0.15 per GB per month. Note, however, that the bundling process uses sparse files and compression, so the final storage size is generally very small and your resulting cost may only be pennies per month.
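To get a rough figure for your own bundle, you can multiply its size by the storage rate. This is a back-of-the-envelope sketch: the function name monthly_s3_cost is hypothetical, it treats 1 GB as 1024^3 bytes (an assumption about how storage is metered), and it ignores request and transfer charges.

```shell
#!/bin/bash
# Estimate the monthly S3 storage cost in USD for a number of bytes.
# The rate defaults to the 0.15 USD per GB per month quoted in this article.
monthly_s3_cost() {
  local bytes="$1" rate="${2:-0.15}"
  awk -v b="$bytes" -v r="$rate" \
    'BEGIN { printf "%.4f\n", b / (1024 * 1024 * 1024) * r }'
}

# A 200 MB bundle (209715200 bytes):
monthly_s3_cost 209715200   # prints 0.0293

# On the instance, the size could be measured with GNU du, assuming the
# uploaded files are the ones matching /mnt/$prefix.* (commented out):
# monthly_s3_cost "$(du -cb /mnt/"$prefix".* | tail -n 1 | cut -f1)"
```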

The AMI owner incurs no charge when users run the image in new instances. The users who run the AMI are responsible for the standard hourly instance charges.

Cleanup

Before removing any public image, please consider the impact this might have on people who depend on that image to run their business. Once you publish an AMI, there is no way to tell how many users are regularly creating instances of that AMI and expecting it to stay available. There is also no way to communicate with these users to let them know that the image is going away.

If you decide you want to remove an image anyway, here are the steps to take.

Deregister the AMI:

ec2-deregister ami-XXX

Delete the AMI bundle in S3:

ec2-delete-bundle \
  --access-key $AWS_ACCESS_KEY_ID \
  --secret-key $AWS_SECRET_ACCESS_KEY \
  --bucket $bucket \
  --prefix $prefix

[Update 2009-09-12: Security tweak for running under non-root.] [Update 2010-02-01: Update to use latest API/AMI tools and work for Ubuntu 9.10 Karmic.]

15 Comments

Excellent information. I'm preparing to release several AMIs based on my work with Chapter Three in developing ready-to-go installations of the Drupal CMS (and associated stack) for clients. Will keep this on file before I send anything out.

Now that AWS is offering reserved instances at extremely competitive prices, with the right backup and restore structure it's possible to deliver really great value for clients who are ready to take the leap into the cloud.

One question, given that there's no way to update an AMI, what's your recommended process for versioning releases? Most alestic AMI's come with a datestamp, which is great, but how do you keep track of all this internally?

Anyway, I have to tip my hat to you, sir. Without this kind of trailblazing work (and all the free AMIs from alestic) none of my contributions would be possible.

My problem with learning how to bundle or rebundle is that there are no docs *anywhere* I can find that provide enough details about what to include and not include. It's a nightmare trying to figure out how the basics of bundling work, let alone trying to hammer out how to deal with the Alestic layer applied on top of it in order to accomplish a rebundle.

1. For bundling, the default list of directories excluded by ec2-bundle-vol includes /dev. Does one really not need to include this? Will Amazon automatically create a /dev directory and fill it with all necessary devices (/dev/null, /dev/random, /dev/zero)? If /dev needs to be bundled, does bundling actually manage to correctly "copy" special devices like /dev/random, and won't bundling /dev include all the harddrive mount sources (/dev/sda1, etc), which would conflict with the separate block device config given to ec2-bundle-vol?

2. From the point of view of Alestic rebundling: how to make sure that the bundled image will start over "fresh", including Alestic startup script(s)? Specific example is that after rebundling, my rebundled AMI must be capable of using the Alestic functionality for executing the userdata script on first boot of all instances.

3. Finally, what about the creation/population of /root/authorized_keys? Is that something that an Alestic-included script does, or is that something Amazon does in the underlying instance launching? And if it's Amazon, is it done via a first-boot script I can locate on the filesystem, or does Amazon create that file from outside the OS (perhaps by the VM layer manually mounting the drive to create it).

Add on the fact that the kernel is somehow magically located external to the actual filesystem via the VM layer, and the confusion just continues. :|

frickenate: I'm confused about why you have so many questions about rebundling after reading an article which describes exactly how to rebundle. Why not try the steps listed, review the results, and then post questions you have? The best forum for general EC2 questions is the Amazon EC2 forum.

Eric,

First off, thanks for all your hard work...I use a few different AMI's that you have put together and I really appreciate it.

I have been rebundling your AMI's following the steps you've outlined above, but I've run into an issue I'm hoping you can shed some light on for me..

when I rebundle an AMI and launch a new instance from it, none of the user-data scripts are run....I can confirm this by checking in the syslog...I went a step further and setup a script to be run at startup in init.d/ as outlined here:
http://snippets.dzone.com/posts/show/6200

But again, this script doesnt run when i rebundle the AMI and launch a new instance from it.


Do you have any pointers?

Thanks!
R

robbucci: The fact that your init.d script is not running implies that the problem is deeper than the user-data script itself.

@Eric, Nice article!
@Robbucci, I'm also facing the same problem while re-bundling eucalyptus images (Ubuntu). Whenever I'm running my user-scripts with originally bundled images, it worked but re-bundling as mentioned with this article as well as scripts given by UEC [ https://help.ubuntu.com/community/UEC/BundlingImages ], never worked for me. I think it is access-permission-issue for user-script to run in super-user mode. Not sure though!

Cheers,
Dipak Chirmade

Eric,

First, your website is *awesome*. There is sooo much good concise perfectly-working information here. Lots of signal. Very little noise. Except for this paragraph you're reading right now perhaps :-P

My question: the new Canonical Karmic amis come with euca2ools instead of ami tools. The euca2ools are muuuch faster than Amazon's ec2 tools, though they feel a little beta, but: have you tried rebundling karmic using them? Could you get an instance to start from the resulting image afterwards? I found I could bundle easily enough, register the image, but starting the image using ec2-run-instances, the image looked like it was starting, and then immediately terminated. Ideas?

hughperkins: euca2ools is faster than the API tools, but not the AMI tools given the different type of work they are doing. That aside, I haven't been able to use euca2ools to bundle/upload/register successfully yet due to a bug. You can install the ec2-ami-tools package on Karmic (multiverse) and use them instead.

Above you suggest running the EXPORT for the secret access key right before bundling

But - that will leave it in the ubuntu user history I think. So, clearing the shell history should be the last step before bundling.

In the latest version of the ec2 API tools (ec2-api-tools-1.3-46266), at least, the ec2-register command also seems to require the name (-n) parameter to be passed in.

Following these instruction I hit a minor snag with the new instance, the /tmp dir is only writable by root, so commands like "crontab -e" fail.

on the original instance:

drwxrwxrwt 6 root root 4096 2010-02-01 18:20 tmp

on the instance booted from the bundled AMI:

drwxr-xr-x 4 root root 4096 2010-02-01 18:21 tmp

obviously this is easy to fix manually, but is there something missing from the bundling process or the firstboot script?

pwolanin: As I point out under "SECURITY", you have to be careful not to leave any private information on an AMI if you are going to make it public and clearing the history is one part of that. Each publisher will need to evaluate their own case and take the necessary steps.

If you are going to make an AMI public, for security and general image freshness I recommend building it from scratch instead of rebundling a running instance. Here are a couple articles about that:

http://alestic.com/2010/01/ec2-ebs-boot-ubuntu
http://alestic.com/2010/01/vmbuilder-ebs-boot-ami

Thanks also for pointing out the -n parameter. I'll take another run through the tutorial and update it.

pwolanin: The AMIs I built and published under the "alestic" name included a startup trigger which set up /tmp with the correct permissions. The Canonical (e.g., Karmic) AMIs do not have this, so you may not want to exclude (-e) the /tmp directory when bundling. I'll take it out of the example commands.

interestingly - on 9.10 at least, I had to use "history -c" to clear my history (not sure where it's being stored). The commands above cleared what's in my home directory, but various private keys were still visible.

Also, the .pem uploads should go to /mnt not /tmp if that's no longer in the -e list.

e.g.
scp -i KEYPAIR.pem \
/{cert,pk}-*.pem \
$remoteuser@$remotehost:/mnt


and then:

/tmp/cert-*.pem --> /mnt/cert-*.pem

pwolanin: History is stored in $HOME/.bash_history (eventually). I've updated the commands to work with the latest Ubuntu 9.10 Karmic AMI as a base, and the latest API/AMI tools. Thanks again for your testing and feedback.
