Keeping File Ownership (UIDs) Consistent when Using EBS on EC2

Persistent storage on Amazon EC2 is accomplished through the use of Elastic Block Store (EBS) volumes. EBS is basically a storage area network (SAN) and can be thought of as an on-demand, virtual, redundant hard drive plugged in to the server with super-powers like snapshot/restore.

An EBS volume can be detached from one EC2 instance and attached to another. You can create a snapshot of an EBS volume and create new volumes from the snapshot to attach to other instances. Though this flexibility provides some useful abilities, it also presents some challenges.

In particular, the files stored on the EBS volume will be owned by specific numeric UIDs (users) and GIDs (groups). When you fire up and configure a new instance, the UIDs and GIDs on the EBS volume may not exactly match the numeric ids of the users and groups on the new instance, depending on how you set it up.

For example, when you install the MySQL software, the package will generally create a new “mysql” user with the next available UID. If you don’t create the various users in exactly the same order on new instances, you may end up with your database files owned by the “postfix” user instead of the “mysql” user. It’s happened to me and I’m not the only one.

There is a discussion about this topic on the ec2ubuntu Google Group and it has also been raised on Canonical’s EC2 beta mailing list.

Here are some of the different approaches to avoiding or fixing this problem:

  1. Bundle your own AMIs and always run instances of the same AMI when attaching EBS volumes with files. This works if you already have to bundle your AMIs for other reasons, but I often recommend against AMI rebundling because of the efforts involved, lack of reproducibility, and maintenance problems when the base image gets updated or has bugs fixed.

  2. Automate the creation of users and installation of packages in exactly the same order every time. This is likely to give you the same UID/GID values for each user, but it starts to get messy if you end up with an order mixing human users and software package users.

  3. Create all users/groups with hardcoded UIDs/GIDs before installing software packages. If you automate the creation of users and groups you can force the “mysql” and “postfix” users to have a specific UID value. Then you install the MySQL and Postfix packages and the software will use the users which already exist on the system. We ended up following this approach with our EC2 servers at CampusExplorer.com.

  4. Correct the ownership of files after mounting the EBS volume. This feels a bit messy to me, but it might be the only option in some cases. I must admit that I’ve done this manually a number of times, but only after finding problems like MySQL not starting because the files aren’t owned by the correct user. For example, say you needed to change files currently owned by “postfix” to be correctly owned by “mysql”:

     find /vol -user postfix -print0 | xargs -0 chown mysql
    

    If you are changing ownership of files after mounting the EBS volume, make sure you do it in an order which does not lose information. For example, if you have to swap “postfix” and “mysql” users, you’ll need to use a temporary third UID as a placeholder.

  5. On the ec2ubuntu Google group it was suggested that a central authority might be a way to solve the problem. I’ve never used this approach on Linux and am not sure how much work it would be setting up a reliable service like this on EC2.
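As a rough illustration of approach 3, here is a minimal sketch of pinning account ids before any packages are installed. The helper names (run, ensure_account) and the id values are my own placeholders, not anything from a real setup, and the sketch assumes the standard groupadd/useradd tools found on Ubuntu and Debian.

```shell
#!/bin/bash
# Sketch only: create service accounts with fixed numeric ids *before*
# installing the packages that would otherwise pick the next free UID.
# The ids used in the usage example (5001, 5002) are arbitrary placeholders.

# run CMD...: execute CMD, or just print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# ensure_account NAME ID: create group and user NAME with the same numeric
# id, unless the user already exists. An existing user with a different
# UID is reported and treated as a failure rather than silently changed.
ensure_account() {
  local name=$1 id=$2 current
  if current=$(id -u "$name" 2>/dev/null); then
    if [ "$current" != "$id" ]; then
      echo "WARNING: $name already has UID $current (wanted $id)" >&2
      return 1
    fi
    return 0
  fi
  run groupadd --gid "$id" "$name"
  run useradd --system --uid "$id" --gid "$id" \
    --no-create-home --shell /bin/false "$name"
}

# Usage, as root, before installing the packages:
#   ensure_account mysql 5001
#   ensure_account postfix 5002
#   apt-get install -y mysql-server postfix
```

Running it first with DRY_RUN=1 prints the commands without touching the system, which is a cheap way to review what would happen on a fresh instance.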

No matter what approach you use, it might be a good idea to add in some checks after you mount an EBS volume to make sure that the files are owned by the appropriate users. For example, you might verify that the mysql directory is owned by the mysql user.
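One way such a check might look is sketched here. The function names are made up for illustration, the path in the usage comment is an assumption about where the volume is mounted, and stat -c is the GNU coreutils form (it will not work with BSD stat).

```shell
#!/bin/bash
# Sketch only: fail early if a path on the mounted volume has the wrong owner.

# owner_of PATH: print the user name that owns PATH (GNU stat syntax).
owner_of() {
  stat -c %U "$1"
}

# check_owner PATH USER: succeed only if PATH is owned by USER.
# Returns 1 on a mismatch and 2 if PATH cannot be examined at all.
check_owner() {
  local path=$1 want=$2 have
  have=$(owner_of "$path") || return 2
  if [ "$have" != "$want" ]; then
    echo "ERROR: $path is owned by $have, expected $want" >&2
    return 1
  fi
}

# Usage after mounting (the path is an assumption, adjust to your layout):
#   check_owner /vol/lib/mysql mysql || exit 1
```

If the numeric UID on the volume has no matching account on the instance, GNU stat prints UNKNOWN, so the check fails in that case as well.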

Solving this problem is something that I have only begun to work on, so I would appreciate any comments, pointers, and solutions that you may have.

Tip: Get Startup Time of EC2 Instance from meta-data

Dmitriy Samovskiy discovered that the startup time of an EC2 instance (not the latest boot time) is hidden in the “Last-Modified” header of the EC2 meta-data response. You can only query this from the instance itself, but this should perform better than querying the EC2 API, especially if you tend to use Amazon’s Java command line tools.

For example:

HEAD http://169.254.169.254/latest/meta-data/local-ipv4  | 
  egrep ^Last-Modified: | cut -f2- -d' '

Dmitriy has published a short bash script to calculate the instance run time using this trick:

http://somic.org/2009/06/04/how-long-ago-was-this-ec2-instance-started/

As he points out, this is not documented by AWS, so be careful assuming it will always behave this way.
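This is not Dmitriy's script, just a minimal sketch of the same idea. It swaps in curl for the HEAD command used above and assumes GNU date (for the -d option); the function names are my own.

```shell
#!/bin/bash
# Sketch only: how long ago was this instance started, in seconds?

# launch_header: print the Last-Modified value from the meta-data service.
# Only works when run on an EC2 instance.
launch_header() {
  curl -sI http://169.254.169.254/latest/meta-data/local-ipv4 |
    tr -d '\r' |
    sed -n 's/^[Ll]ast-[Mm]odified: //p'
}

# elapsed_seconds START END: seconds between two timestamps in any
# format that GNU date -d understands (including HTTP dates and "now").
elapsed_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Usage on an instance:
#   elapsed_seconds "$(launch_header)" now
```

The same caveat applies: the header's meaning is undocumented, so treat the result as a hint rather than a guarantee.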

Automate EC2 Instance Setup with user-data Scripts

user-data Scripts

The Ubuntu and Debian EC2 images published on https://alestic.com allow you to send in a startup script using the EC2 user-data parameter when you run a new instance. This functionality is useful for automating the installation and configuration of software on EC2 instances.

The basic rule followed by the image is:

If the instance user-data starts with the two characters #! then the instance runs it as the root user on the first boot.

The “user-data script” is run late in the startup process, so you can assume that networking and other system services are functional.

If you start an EC2 instance with any user-data which does not start with #! the image simply ignores it and allows your own software to access and use the data as it sees fit.
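The rule is simple enough to sketch in a few lines. This is an illustration of the behavior described above, not the code the images actually run, and the function names are invented for the example.

```shell
#!/bin/bash
# Sketch only: decide whether saved user-data should be run as a script.

# is_user_data_script FILE: succeed if FILE begins with the two bytes "#!".
is_user_data_script() {
  [ "$(head -c 2 "$1")" = '#!' ]
}

# run_user_data FILE: run FILE if it looks like a script, otherwise
# leave it alone so other software can use the data as it sees fit.
run_user_data() {
  local file=$1
  if is_user_data_script "$file"; then
    chmod 700 "$file"
    "$file"
  else
    echo "user-data is not a script, ignoring it" >&2
  fi
}

# Usage on an instance (the temporary path is an assumption):
#   curl -s http://169.254.169.254/latest/user-data > /tmp/user-data
#   run_user_data /tmp/user-data
```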

This same user-data startup script functionality has been copied in the Ubuntu images published by Canonical, and your existing user-data script should be portable across images with little change. Read a comparison of the Alestic and Canonical EC2 images.

Example

Here is a sample user-data script which sets up an Ubuntu LAMP server on a new EC2 instance:

#!/bin/bash
set -e -x
export DEBIAN_FRONTEND=noninteractive
apt-get update && apt-get upgrade -y
tasksel install lamp-server
echo "Please remember to set the MySQL root password!"

Save this to a file named, say, install-lamp and then pass it to a new EC2 instance, say, Ubuntu 9.04 Jaunty:

ec2-run-instances --key KEYPAIR --user-data-file install-lamp ami-bf5eb9d6

Please see https://alestic.com for the latest AMI ids for Ubuntu and Debian.

Note: This simplistic user-data script is for demonstration purposes only. Though it does set up a fully functional LAMP server which may be as good as some public LAMP AMIs, it does not take into account important design issues like database persistence. Read Running MySQL on Amazon EC2 with Elastic Block Store.

Debugging

Since you are passing code to the new EC2 instance, there is a very small chance that you may have made a mistake in writing the software. Well maybe not you, but somebody else out there might not be perfect, so I have to write this for them.

The stdout and stderr of your user-data script are written to /var/log/syslog, where you can review them for success and failure messages. The log contains both what you echo directly in the script and the output of the programs you run.

Tip: If you add set -x at the top of a bash script, then it will output every command executed. If you add set -e to the script, then the user-data script will exit on the first command which does not succeed. These help you quickly identify where problems might have started.
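A tiny demonstration of the difference set -e makes, runnable anywhere bash is installed:

```shell
#!/bin/bash
# Without set -e the script keeps going after a failing command.
bash -c 'echo one; false; echo two'
# prints "one" and then "two"

# With set -e it stops at the first failure and exits non-zero.
bash -c 'set -e; echo one; false; echo two' || echo "exit status: $?"
# prints "one" and then "exit status: 1"
```

In a user-data script this means a failed apt-get stops the run right there, and the last lines in /var/log/syslog point at the command that broke.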

Limitations

Amazon EC2 limits the size of user-data to 16KB. If your startup instructions are larger than this limit, you can write a user-data script which downloads the full program(s) from somewhere else like S3 and runs them.
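Before launching, it can be handy to confirm that a script fits under the limit. Here is a minimal sketch; the function name and variable are my own, and 16384 bytes is simply 16KB as quoted above.

```shell
#!/bin/bash
# Sketch only: check a user-data file against the 16KB limit.

USER_DATA_LIMIT=16384   # 16 * 1024 bytes

# fits_user_data FILE: succeed if FILE is no larger than the limit.
# Returns 2 if FILE cannot be read.
fits_user_data() {
  local size
  size=$(wc -c < "$1") || return 2
  size=$((size))        # strip any padding some wc versions add
  [ "$size" -le "$USER_DATA_LIMIT" ]
}

# Usage:
#   fits_user_data install-lamp || echo "too big, host it on S3 instead"
```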

Though a shell is a handy tool for writing scripts to install and configure software, the user-data script can be written in any language which supports the shebang (#!) mechanism for running programs. This includes bash, Perl, Python, Ruby, tcl, awk, sed, vim, make, or any other language you can find pre-installed on the image.

If you want to use another language, a user-data script written in bash could install the language, install the program, and then run it.

Security

Setting up a new EC2 instance often requires installing private information like EC2 keys and certificates (e.g., to make AWS API calls). You should be aware that if you pass secrets in the user-data parameter, the complete input is available to any user or process running on the instance.

There is no way to change the instance user-data after instance startup, so anybody who has access to the instance can simply request http://169.254.169.254/latest/user-data

Depending on what software you install on your instance, even Internet users may be able to exploit holes to get at your user-data. For example, if your web server lets users specify a URL to upload a file, they might be able to enter the above URL and then read the contents.

Alternatives

Though user-data scripts are my favorite method to set up EC2 instances, it’s not always the appropriate approach. Alternatives include:

  1. Manually ssh in to the instance and enter commands to install and configure software.

  2. Automatically ssh in to the instance with automated commands to install and configure software.

  3. Install and configure software using (1) or (2) and then rebundle the instance to create a new AMI. Use the new image when running instances.

  4. Build your own EC2 images from scratch.

The ssh options have the benefit of not putting any private information into the user-data accessible from the instance. They have the disadvantage of needing to monitor new instances waiting for the ssh server to accept connections; this complicates the startup process compared to user-data scripts.
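The waiting part of the ssh approach usually boils down to a retry loop. Here is a minimal sketch; wait_for is a made-up helper name, and the ssh invocation in the usage comment is only an example (KEYPAIR and HOSTNAME are placeholders).

```shell
#!/bin/bash
# Sketch only: retry a command until it succeeds or we run out of attempts.

# wait_for TRIES DELAY COMMAND...: run COMMAND up to TRIES times,
# sleeping DELAY seconds between attempts. Succeeds as soon as
# COMMAND succeeds, fails if every attempt fails.
wait_for() {
  local tries=$1 delay=$2 i=1
  shift 2
  while ! "$@"; do
    [ "$i" -ge "$tries" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage: wait up to about five minutes for sshd on a new instance.
#   wait_for 30 10 ssh -i KEYPAIR.pem -o ConnectTimeout=5 ubuntu@HOSTNAME true
```

Note that the first successful connection will still stop at the host key prompt unless you have dealt with that separately.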

The rebundled AMI approach and building your own AMI approach are useful when the installation and configuration of your required software take a very long time or can’t be done with automated processes (less common than you might think). A big drawback of creating your own AMIs is maintaining them, keeping up with security patches and other enhancements and fixes which might be applied by the base image maintainers.

Software

Note to AMI authors: If you wish to add to your EC2 images the same ability to run user-data scripts, feel free to include the following code and make it run on image startup:

http://ec2-run-user-data.notlong.com

Credits

Thanks to RightScale for the original idea of EC2 images with user-data startup hooks. RightScale has advanced startup plugins which include scripts, software packages, and attachments, all of which integrate with the RightScale service.

Thanks to Kim Scheibel and Jorge Oliveira who submitted code used in the original ec2-run-user-data script.

What do you use EC2 user-data for?

Updated Tutorial: Running MySQL on Amazon EC2 with EBS (now supports AppArmor)

The following tutorial (originally published in Aug ‘08) has been extensively updated today:

Running MySQL on Amazon EC2 with Elastic Block Store (EBS)

This tutorial explains one approach to using Amazon’s persistent storage mechanism as the backing for a database and includes pointers on how to create snapshots for secure backups.

The primary goal of the updates was to put forth an approach which works not only on the current Ubuntu and Debian AMIs published on https://alestic.com but also with new AMIs which use the Canonical kernels as well as the new Ubuntu AMIs published by Canonical.

Ubuntu AMIs which use the new Canonical kernels may have AppArmor enabled. The original tutorial required workarounds to function in this environment, but the new tutorial keeps files right where MySQL and the AppArmor configuration expect them to be, while at the same time keeping them on the EBS volume.

There is also a plethora of “sudo”s spread around the tutorial so that it will work if you connected to your instance using a normal, non-root user, as is required by the Canonical AMIs.

I have tested these instructions on a few different AMIs. Please let me know if you run into any problems or have suggestions for improvement.

=> Go read the tutorial

Amazon Launches CloudWatch Monitoring Service for EC2

A few hours ago, Amazon launched a monitoring service for EC2 instances which they are calling CloudWatch. The service costs 1.5 cents per hour per EC2 instance (of any size) which comes out to $10.95 per month for an instance running 24x7.
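The monthly figure is just the hourly rate averaged over a year. A quick sketch of the arithmetic, with a made-up function name, assuming a 365-day year:

```shell
#!/bin/bash
# monthly_cost RATE: average monthly cost in dollars for an hourly RATE,
# assuming the instance runs 24 hours a day, 365 days a year.
monthly_cost() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 24 * 365 / 12 }'
}

monthly_cost 0.015   # prints 10.95
```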

The concurrently announced Load Balancing and Auto Scaling services are powerful, but I’m not so sure that CloudWatch is going to be useful by itself.

My initial impression on using CloudWatch is that it is hard enough to set up and use that most folks are going to get lost figuring out how to get regular, useful information out of it. Some of this could be alleviated by improved documentation, but I still think the direct, raw usage has a small target audience.

Most users on EC2 should be able to get by with free monitoring packages like munin. Since munin is running on the instance itself, it has access to many more metrics than CloudWatch. Plus it provides pretty graphs which are much easier on the eye than the raw CloudWatch output.

Munin is also trivial to set up on Ubuntu. It takes one command:

sudo apt-get install munin munin-node apache2

Wait 10 minutes for it to start collecting data, then point your browser at http://HOSTNAME/munin

There is a bit more work to do if you want to collect all of your munin data for multiple servers in a central location or to create summary charts combining metrics, but you can get a lot of value from just the above.

Reasons you might end up using CloudWatch include:

  1. You are using Amazon’s new EC2 Auto Scaling feature which requires CloudWatch. In this case, you shouldn’t have to worry about the gory details since Auto Scaling will take care of the monitoring for you.

  2. You need access to accurate network and disk IO numeric values measured in the same way that Amazon uses to charge you. E.g., you might be running sets of instances for clients and want to pass on EC2 charges to them.

  3. You are using a lot of EC2 instances in a large organization and have the time and expertise to implement data collection with CloudWatch for presenting in your own internal reports.

  4. You are creating some tools to help other people use the CloudWatch service more easily and with pretty graphs.

On that last point, I think there is an interesting opportunity for somebody to write munin plugins for CloudWatch. It looks like the monitoring data is available on a near-real time basis, and with a bit of state-keeping it should be possible to get graphs which closely represent Amazon’s monitoring records.

I’ve posted some of my feedback from testing CloudWatch on the EC2 forum.

If you’ve had a chance to check out CloudWatch, what is your opinion?

Credits and ThankYou's

With the conversion of the web site format for Alestic.com from a single page to more of a blog, the “Credits” section got lost, so I figured I’d post it here to thank some of the many folks who have been involved in providing input (directly or indirectly) into the original, community Ubuntu and Debian AMIs published on https://alestic.com

Escaping Restrictive/Untrusted Networks with OpenVPN on EC2

Perhaps you are behind a corporate firewall which does not allow you to access certain types of resources on the Internet. Or, perhaps you are accessing the Internet over an open wifi where you do not trust your network traffic to your fellow wifi users or the admins running the local network.

These instructions guide you in setting up an OpenVPN server on an EC2 instance, sending all your network traffic through a secure channel to port 80 on the EC2 instance and from there out to the Internet.

EC2 Instance

Run the latest Ubuntu 9.10 Karmic image. You can find the most current AMI id in a table on https://alestic.com

ec2-run-instances --key <KEYPAIR> ami-1515f67c

Make a note of the instance id (e.g., i-6fceba06). Watch the status using a command like this (replace with your own instance id):

ec2-describe-instances <INSTANCEID>

Repeat the describe instances command until it shows that the instance is “running” and make a note of the external hostname (e.g., ec2-75-101-179-94.compute-1.amazonaws.com).

Connect to the instance using the external hostname you noted.

remotehost=<HOSTNAME>
ssh -i <KEYPAIR>.pem ubuntu@$remotehost

OpenVPN Server

Upgrade the EC2 instance and install the necessary OpenVPN software:

sudo apt-get update &&
sudo apt-get upgrade -y &&
sudo apt-get install -y openvpn
sudo modprobe iptable_nat
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
sudo iptables -t nat -A POSTROUTING -s 10.4.0.1/24 -o eth0 -j MASQUERADE

Generate a secret key for secure communication with OpenVPN:

sudo openvpn --genkey --secret ovpn.key

Start the OpenVPN server on the EC2 instance. We are (ab)using port 80 because most closed networks will allow traffic to this port ;-)

sudo openvpn                     \
  --proto tcp-server             \
  --port 80                      \
  --dev tun1                     \
  --secret ovpn.key              \
  --ifconfig 10.4.0.1 10.4.0.2   \
  --daemon

OpenVPN Client

Back on the local (non-EC2) workstation, set up the software:

sudo apt-get install -y openvpn
sudo modprobe tun # if necessary
sudo iptables -I OUTPUT -o tun+ -j ACCEPT
sudo iptables -I INPUT -i tun+ -j ACCEPT

Download the secret key from the EC2 instance:

ssh -i <KEYPAIR>.pem ubuntu@$remotehost 'sudo cat ovpn.key' > ovpn.key
chmod 600 ovpn.key

Start the OpenVPN client:

sudo openvpn                    \
  --proto tcp-client            \
  --remote $remotehost          \
  --port 80                     \
  --dev tun1                    \
  --secret ovpn.key             \
  --redirect-gateway def1       \
  --ifconfig 10.4.0.2 10.4.0.1  \
  --daemon

Edit /etc/resolv.conf and set it so that DNS is resolved by the EC2 name server:

sudo mv /etc/resolv.conf /etc/resolv.conf.save
echo "nameserver 172.16.0.23" | sudo tee /etc/resolv.conf

You should now be able to access any Internet resource securely and without restriction.

Teardown

When you are done with this OpenVPN tunnel, remember to shut down the EC2 instance and restore the DNS configuration:

sudo killall openvpn
ec2-terminate-instances <INSTANCEID>
sudo mv /etc/resolv.conf.save /etc/resolv.conf

If you have ways to improve this approach, please leave a comment.

Disclaimer

These instructions are not intended to assist in illegal activities. If you are breaking the laws or rules of your government or college or company or ISP, then you should understand the security implications of the above steps better than I do and be willing to accept the consequences of your actions.

[Update 2009-01-15: Upgrade instructions for Karmic]

Using sudo, ssh, rsync on the Official Ubuntu Images for EC2

The official Ubuntu images for EC2 do not allow ssh directly to the root account, but instead provide access through a normal “ubuntu” user account. This practice fits the standard Ubuntu security model available in other environments and, admittedly, can take a bit of getting used to if you are not familiar with it.

This document describes how to work inside this environment using the “ubuntu” user and the sudo utility to execute commands as the root user when necessary.

Official Ubuntu Images for Amazon EC2 from Canonical

Canonical has released official Ubuntu images for EC2 for Ubuntu 9.10 Karmic.

The primary technical benefit brought by Canonical's involvement in building official Ubuntu images is that custom kernels can be built for EC2 through a relationship with Amazon. This means that the Ubuntu images can now run on more modern Ubuntu kernels instead of on Amazon's older, Fedora kernels.

Other differences are listed below:

                  Alestic.com Ubuntu images           Canonical Ubuntu images
Kernel            2.6.21                              Karmic: 2.6.31
Releases          9.04 Jaunty                         9.10 Karmic
                  8.10 Intrepid
                  8.04 Hardy (LTS)
                  7.10 Gutsy (obsolete)
                  7.04 Feisty (obsolete)
                  6.10 Edgy (obsolete)
                  6.06 Dapper (LTS)
Flavors           server                              server
                  desktop
ssh access        ssh to root                         ssh to "ubuntu" with sudo to root
Apt Sources       main                                main
                  restricted                          restricted
                  universe                            universe
                  multiverse
                  Alestic PPA
Apt Mirror        Jaunty, Intrepid, Hardy:            US: us.ec2.archive.ubuntu.com
                  ec2-us-east-mirror.rightscale.com   EU: eu.ec2.archive.ubuntu.com
                  (load balanced with failover)
                  Others: us.archive.ubuntu.com
Default runlevel  runlevel 4                          runlevel 2
Tools             Amazon EC2 AMI tools installed      euca2ools installed
                  runurl installed                    Amazon tools available (multiverse)
                                                      runurl available through Alestic PPA

Items listed are likely to change as images are enhanced. This table may or may not be updated to match. Please leave comments if you notice or question other differences.

Note: There are some older (2009-04) Canonical AMIs floating around for Hardy and Intrepid. These have not been maintained and are not recommended at this point.

Updated 2009-06-15: Alestic.com Jaunty is using an Ubuntu mirror inside EC2. Alestic.com images are using a load-balanced mirror with failover between EC2 availability zones.

Updated 2009-06-25: Alestic.com published Karmic (Alpha) but later withdrew.

Updated 2009-10-29: Canonical released Karmic. None of the images currently has RightScale support built in, but RightScale has their own Ubuntu AMIs.

New releases of Ubuntu AMIs for Amazon EC2 2009-04-23 (Jaunty released)

As you may have heard, Ubuntu 9.04 Jaunty has been officially released by Ubuntu today, right on schedule:

http://ubuntu.com

Matching updates have been released for the Ubuntu 9.04 Jaunty AMIs listed on:

https://alestic.com

Please note that we are still defaulting to Amazon’s 2.6.21fc8 kernel which is getting older and older for each new release of Ubuntu. Please do let the group know if you find incompatibilities with Ubuntu Jaunty other than the known problem that AppArmor is not enabled.

You might be able to run the 9.04 Jaunty image with the official Ubuntu 2.6.27 kernel (for Intrepid) which is currently in release candidate state from Canonical.

For what it’s worth, I still run Ubuntu 8.04 LTS Hardy on Amazon EC2 personally and for my company.

New releases of Ubuntu AMIs for Amazon EC2 2009-04-18 (XFS fixes)

New updates have been released for all* of the Ubuntu and Debian AMIs listed on:

https://alestic.com

The primary enhancements in this release are:

  • The images which were experiencing problems with XFS and the Amazon 2.6.21fc8 kernel have been fixed by installing an XFS kernel module which matches Amazon’s kernel. This includes Ubuntu Intrepid, Ubuntu Jaunty, Debian Lenny, and Debian Squeeze.

  • The Ubuntu 9.04 Jaunty image is using release candidate software. The official Jaunty release is expected April 23.

  • At the request of the Amazon security folks, ssh PasswordAuthentication has been disabled by default on the server images. Even though the base images have passwords disabled on the root account, some folks may be creating accounts with poor passwords susceptible to attacks. The desktop images require password authentication for NX (as far as I know) so please use secure passwords.

  • The desktop images have been upgraded to a recent version of NX Free Edition software.

  • This is the last published image for Ubuntu 7.10 Gutsy. This version has reached its end of life on April 18 and should not be used any more unless you really need to test something on Gutsy and you aren’t going to leave it running long (no security patches available).

All of the AMIs are available in both the US and European regions.

Notes:

  • The Ubuntu 6.10 Edgy, 7.04 Feisty, and 7.10 Gutsy AMIs are obsolete and unsupported. Running these images introduces a security risk as no security patches are being produced any more by Ubuntu.

New releases of Ubuntu Jaunty AMIs for Amazon EC2 2009-03-29

New updates have been released for the Ubuntu Jaunty AMIs on

https://alestic.com

Jaunty recently moved from “alpha” to “beta” in preparation for its official release as Ubuntu 9.04 next month.

For details on what is new in Jaunty, see:

http://www.ubuntu.com/testing/jaunty/beta

This is beta software and is not suitable for production use.

All of the AMIs are available in both the US and European regions.

New releases of Ubuntu AMIs for Amazon EC2 2009-02-16 (EC2 mirrors)

New updates have been released for all* of the Ubuntu and Debian AMIs listed on:

https://alestic.com

The primary enhancements in this release are:

  • Ubuntu Hardy and Intrepid have new apt sources.list pointing to the local EC2 mirrors provided by RightScale. Please let me know if you have any problems with updates.

  • Debian “lenny” has been released as the new “stable”. Debian “squeeze” is the new “testing”, so the latest Debian mapping is as follows:

    squeeze - “testing”
    lenny - “stable”
    etch - “oldstable”

As always, “sid” is “unstable” and I can’t imagine why you would want to run this on EC2 unless you’re a Debian developer, in which case you should probably build your own AMIs.

When I run “squeeze” it thinks that it is “lenny” (lsb_release -a). I assume that this is because it has just been branched from lenny but it’s possible that I didn’t build it correctly. Let me know if you have further information on this.

Notes:

  • The Ubuntu 6.10 Edgy and 7.04 Feisty AMIs are obsolete, unsupported, and are not updated.

  • The AMIs are in the process of being copied to eu-west-1 (Europe). Documentation will be updated soon.

New releases of Ubuntu AMIs for Amazon EC2 2008-12-22

New updates have been released for all* of the Ubuntu and Debian AMIs listed on:

https://alestic.com

The primary enhancements in this release are:

  • The EC2 AMI tools have been upgraded to 1.3-30748. This adds support for EC2 regions including the new eu-west-1 European region.

  • AMIs have been created for Ubuntu Jaunty Jackalope alpha (planned for release 2009-04). This is alpha software and is not suitable for production use.

All of the AMIs are available in both the US and European regions.

  • The Ubuntu 6.10 Edgy and 7.04 Feisty AMIs are obsolete and unsupported.

Ubuntu AMIs available in Europe (eu-west-1)

The Ubuntu and Debian images listed on https://alestic.com are now available in both the US (us-east-1) and Europe (eu-west-1) EC2 regions.

Click on the “Europe” tab at the top of the table to see the new AMI ids for Europe.

Only the most recent images have been copied over to the Europe region. Let me know if you have specific older images which you would like to run in Europe.