Keeping File Ownership (UIDs) Consistent when Using EBS on EC2

| 2 Comments

Persistent storage on Amazon EC2 is accomplished through the use of Elastic Block Store (EBS) volumes. EBS is basically a storage area network (SAN) and can be thought of as an on-demand, virtual, redundant hard drive plugged in to the server with super-powers like snapshot/restore.

An EBS volume can be detached from one EC2 instance and attached to another. You can create a snapshot of an EBS volume and create new volumes from the snapshot to attach to other instances. Though this flexibility provides some useful abilities, it also presents some challenges.

In particular, the files stored on the EBS volume will be owned by specific numeric UIDs (users) and GIDs (groups). When you fire up and configure a new instance, the UIDs and GIDs on the EBS volume may not exactly match the numeric ids of the users and groups on the new instance, depending on how you set it up.

For example, when you install the MySQL software, the package will generally create a new “mysql” user with the next available UID. If you don’t create the various users in exactly the same order on new instances, you may end up with your database files owned by the “postfix” user instead of the “mysql” user. It’s happened to me and I’m not the only one.

There is a discussion about this topic on the ec2ubuntu Google Group and it has also been raised on Canonical’s EC2 beta mailing list.

Here are some of the different approaches to avoiding or fixing this problem:

  1. Bundle your own AMIs and always run instances of the same AMI when attaching EBS volumes with files. This works if you already have to bundle your AMIs for other reasons, but I often recommend against AMI rebundling because of the efforts involved, lack of reproducibility, and maintenance problems when the base image gets updated or has bugs fixed.

  2. Automate the creation of users and installation of packages in exactly the same order every time. This is likely to give you the same UID/GID values for each user, but it starts to get messy if you end up with an order mixing human users and software package users:

  3. Create all users/groups with hardcoded UIDs/GIDs before installing software packages. If you automate the creation of users and groups you can force the “mysql” and “postfix” users to have a specific UID value. Then you install the MySQL and Postfix packages and the software will use the users which already exist on the system. We ended up following this approach with our EC2 servers at CampusExplorer.com

  4. Correct the ownership of files after mounting the EBS volume. This feels a bit messy to me, but it might be the only option in some cases. I must admit that I’ve done this manually a number of times, but only after finding problems like MySQL not starting because the files aren’t owned by the correct user. For example, say you needed to change files currently owned by “postfix” to be correclty owned by “mysql”:

    find /vol -user postfix -print0 | xargs -0 chown mysql
    

    If you are changing ownership of files after mounting the EBS volume, make sure you do it in an order which does not lose information. For example, if you have to swap “postfix” and “mysql” users, you’ll need to use a temporary third UID as a placeholder.

  5. On the ec2ubuntu Google group it was suggested that a central authority might be a way to solve the problem. I’ve never used this approach on Linux and am not sure how much work it would be setting up a reliable service like this on EC2.

No matter what approach you use, it might be a good idea to add in some checks after you mount an EBS volume to make sure that the files are owned by the appropriate users. For example, you might verify that the mysql directory is owned by the mysql user

Solving this problem is something that I have only begun to work on, so I would appreciate any comments, pointers, and solutions that you may have.

2 Comments

Thanks for the great blog!

Have you come up with a reliable (and preferably easy to install / maintain) solution for synchronising UIDs/GIDs across instances?

arran:

Great question! In the blog article above, I list 5 approaches for keeping UIDs/GIDs synchronized across instances.

Leave a comment

Ubuntu AMIs

Ubuntu AMIs for EC2:


More Entries

Replacing a CloudFront Distribution to "Invalidate" All Objects
I was chatting with Kevin Boyd (aka Beryllium) on the ##aws Freenode IRC channel about the challenge of invalidating a…
Email Alerts for AWS Billing Alarms
using CloudWatch and SNS to send yourself email messages when AWS costs accrue past limits you define The Amazon documentation…
Cost of Transitioning S3 Objects to Glacier
how I was surprised by a large AWS charge and how to calculate the break-even point Glacier Archival of S3…
Running Ubuntu on Amazon EC2 in Sydney, Australia
Amazon has announced a new AWS region in Sydney, Australia with the name ap-southeast-2. The official Ubuntu AMI lookup pages…
Save Money by Giving Away Unused Heavy Utilization Reserved Instances
You may be able to save on future EC2 expenses by selling an unused Reserved Instance for less than its…
Installing AWS Command Line Tools from Amazon Downloads
When you need an AWS command line toolset not provided by Ubuntu packages, you can download the tools directly from…
Convert Running EC2 Instance to EBS-Optimized Instance with Provisioned IOPS EBS Volumes
Amazon just announced two related features for getting super-fast, consistent performance with EBS volumes: (1) Provisioned IOPS EBS volumes, and…
Which EC2 Availability Zone is Affected by an Outage?
Did you know that Amazon includes status messages about the health of availability zones in the output of the ec2-describe-availability-zones…
Installing AWS Command Line Tools Using Ubuntu Packages
Here are the steps for installing the AWS command line tools that are currently available as Ubuntu packages. These include:…
Ubuntu Developer Summit, May 2012 (Oakland)
I will be attending the Ubuntu Developer Summit (UDS) next week in Oakland, CA.  This event brings people from around…
Uploading Known ssh Host Key in EC2 user-data Script
The ssh protocol uses two different keys to keep you secure: The user ssh key is the one we normally…
Seeding Torrents with Amazon S3 and s3cmd on Ubuntu
Amazon Web Services is such a huge, complex service with so many products and features that sometimes very simple but…
CloudCamp
There are a number of CloudCamp events coming up in cities around the world. These are free events, organized around…
Use the Same Architecture (64-bit) on All EC2 Instance Types
A few hours ago, Amazon AWS announced that all EC2 instance types can now run 64-bit AMIs. Though t1.micro, m1.small,…
ec2-consistent-snapshot on GitHub and v0.43 Released
The source for ec2-conssitent-snapshot has historically been available here: ec2-consistent-snapshot on Launchpad.net using Bazaar For your convenience, it is now…
You Should Use EBS Boot Instances on Amazon EC2
EBS boot vs. instance-store If you are just getting started with Amazon EC2, then use EBS boot instances and stop…
Retrieve Public ssh Key From EC2
A serverfault poster had a problem that I thought was a cool challenge. I had so much fun coming up…
Running EC2 Instances on a Recurring Schedule with Auto Scaling
Do you want to run short jobs on Amazon EC2 on a recurring schedule, but don’t want to pay for…
AWS Virtual MFA and the Google Authenticator for Android
Amazon just announced that the AWS MFA (multi-factor authentication) now supports virtual or software MFA devices in addition to the…
Updated EBS boot AMIs for Ubuntu 8.04 Hardy on Amazon EC2 (2011-10-06)
Canonical has released updated instance-store AMIs for Ubuntu 8.04 LTS Hardy on Amazon EC2. Read Ben Howard’s announcement on the…