Automate EC2 Instance Setup with user-data Scripts


user-data Scripts

The Ubuntu and Debian EC2 images published on http://alestic.com allow you to send in a startup script using the EC2 user-data parameter when you run a new instance. This functionality is useful for automating the installation and configuration of software on EC2 instances.

The basic rule followed by the image is:

If the instance user-data starts with the two characters #! then the instance runs it as the root user on the first boot.

The “user-data script” is run late in the startup process, so you can assume that networking and other system services are functional.

If you start an EC2 instance with user-data that does not start with #!, the image simply ignores it, leaving your own software free to access and use the data as it sees fit.
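The rule above can be sketched as a small shell function. This is only an illustration of the check, not the code the images actually run, and the function name is made up for this example.

```shell
# Hypothetical helper: succeed if the file's first two characters are "#!".
# This mirrors the rule described above; it is not the images' actual code.
is_user_data_script() {
    first_line=
    # Read only the first line; "|| true" tolerates an empty file or a
    # first line with no trailing newline.
    IFS= read -r first_line < "$1" || true
    case $first_line in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

For example, `is_user_data_script install-lamp && echo "would be run on first boot"` lets you check a file locally before passing it to a new instance.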

This same user-data startup script functionality has been copied in the Ubuntu images published by Canonical, and your existing user-data script should be portable across images with little change. Read a comparison of the Alestic and Canonical EC2 images.

Example

Here is a sample user-data script which sets up an Ubuntu LAMP server on a new EC2 instance:

#!/bin/bash
set -e -x
export DEBIAN_FRONTEND=noninteractive
apt-get update && apt-get upgrade -y
tasksel install lamp-server
echo "Please remember to set the MySQL root password!"

Save this to a file named, say, install-lamp and then pass it to a new EC2 instance, say, Ubuntu 9.04 Jaunty:

ec2-run-instances --key KEYPAIR --user-data-file install-lamp ami-bf5eb9d6

Please see http://alestic.com for the latest AMI ids for Ubuntu and Debian.

Note: This simplistic user-data script is for demonstration purposes only. Though it does set up a fully functional LAMP server which may be as good as some public LAMP AMIs, it does not take into account important design issues like database persistence. Read Running MySQL on Amazon EC2 with Elastic Block Store.

Debugging

Since you are passing code to the new EC2 instance, there is a very small chance that you may have made a mistake in writing the software. Well maybe not you, but somebody else out there might not be perfect, so I have to write this for them.

The stdout and stderr of your user-data script are written to /var/log/syslog, where you can review them for success and failure messages. The log contains both the things you echo directly in the script and the output of the programs you run.
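To pull just the relevant lines out of the log, something like the following may help. The default search pattern is an assumption: it matches the S71ec2-run-user-data tag that appears in a reader's syslog excerpt in the comments, but the tag may differ on other images, so adjust it to whatever your syslog shows. The function name is made up for this example.

```shell
# Hypothetical helper: print log lines that mention the given pattern.
# usage: user_data_log [LOGFILE] [PATTERN]
# Defaults: /var/log/syslog and the pattern "user-data" (an assumption).
user_data_log() {
    grep -- "${2:-user-data}" "${1:-/var/log/syslog}"
}
```

After logging in to the instance, `user_data_log | tail -n 50` would show the most recent matching lines.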

Tip: If you add set -x at the top of a bash script, then it will output every command executed. If you add set -e to the script, then the user-data script will exit on the first command which does not succeed. These help you quickly identify where problems might have started.
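Here is a tiny demonstration of the two options working together. It is a throwaway example, not part of any real setup script; the function name is made up.

```shell
# Demonstration only: run a tiny script in a child shell with both options set.
demo_set_e_x() {
    sh -c '
        set -e -x
        echo "step 1"
        false                      # fails, so set -e stops the script here
        echo "step 2 (not reached)"
    '
}
```

Calling `demo_set_e_x` prints "step 1" on stdout, traces each executed command on stderr with a leading +, and returns a non-zero exit status because the script stopped at the failing command. In a real user-data script the same trace ends up in the log, so the last traced command is usually the one that failed.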

Limitations

Amazon EC2 limits the size of user-data to 16KB. If your startup instructions are larger than this limit, you can write a user-data script which downloads the full program(s) from somewhere else like S3 and runs them.
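Both halves of that advice can be sketched in shell: check that a file fits under the limit before launching, and generate a small bootstrap script that downloads and runs the full program. The function names are made up, the limit is assumed to mean 16384 bytes of raw user-data, and the generated script assumes curl is installed on the image and that the downloaded program starts with its own #! line.

```shell
# Assumption: the 16KB limit is taken as 16384 bytes of raw user-data.
USER_DATA_LIMIT=16384

# Hypothetical helper: succeed if the file is small enough to pass as user-data.
user_data_fits() {
    [ -f "$1" ] || return 1
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -le "$USER_DATA_LIMIT" ]
}

# Hypothetical helper: print a tiny user-data script that downloads the real
# setup program from the URL given as $1 and runs it.
# The URL is single-quoted in the output, so it must not contain a single quote.
make_bootstrap() {
    cat <<EOF
#!/bin/bash
set -e -x
setup=\$(mktemp)
curl --silent --fail --output "\$setup" '$1'
chmod +x "\$setup"
"\$setup"
EOF
}
```

A possible workflow, with URL standing in for wherever you host the full program: run `make_bootstrap "URL" > bootstrap`, confirm `user_data_fits bootstrap`, then pass it along with `ec2-run-instances --key KEYPAIR --user-data-file bootstrap ami-bf5eb9d6`.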

Though a shell is a handy tool for writing scripts to install and configure software, the user-data script can be written in any language which supports the shebang (#!) mechanism for running programs. This includes bash, Perl, Python, Ruby, tcl, awk, sed, vim, make, or any other language you can find pre-installed on the image.

If you want to use another language, a user-data script written in bash could install the language, install the program, and then run it.

Security

Setting up a new EC2 instance often requires installing private information like EC2 keys and certificates (e.g., to make AWS API calls). You should be aware that if you pass secrets in the user-data parameter, the complete input is available to any user or process running on the instance.

There is no way to change or remove the user-data while the instance is running, so anybody who has access to the instance can simply request http://169.254.169.254/latest/user-data

Depending on what software you install on your instance, even Internet users may be able to exploit holes to get at your user-data. For example, if your web server lets users specify a URL to upload a file, they might be able to enter the above URL and then read the contents.
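To see exactly what is exposed, you can fetch the user-data yourself from a shell on the instance, logged in as any user. This is a minimal sketch that assumes curl is installed on the image; the function name is made up.

```shell
# The user-data URL from the text above; override it only for testing.
USER_DATA_URL=${USER_DATA_URL:-http://169.254.169.254/latest/user-data}

# Hypothetical helper: print the instance's user-data to stdout.
# --fail makes curl exit non-zero on an HTTP error instead of printing
# the error page.
show_user_data() {
    curl --silent --fail "$USER_DATA_URL"
}
```

If running `show_user_data` as an unprivileged user prints your secrets, then so can any other process on the instance that is able to make the same request.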

Alternatives

Though user-data scripts are my favorite method to set up EC2 instances, they are not always the appropriate approach. Alternatives include:

  1. Manually ssh in to the instance and enter commands to install and configure software.

  2. Automatically ssh in to the instance with automated commands to install and configure software.

  3. Install and configure software using (1) or (2) and then rebundle the instance to create a new AMI. Use the new image when running instances.

  4. Build your own EC2 images from scratch.

The ssh options have the benefit of not putting any private information into the user-data accessible from the instance. They have the disadvantage of needing to monitor new instances waiting for the ssh server to accept connections; this complicates the startup process compared to user-data scripts.
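The waiting step for the ssh approaches might look like the retry loop below. It is a generic sketch, not something from the Alestic tooling; the function name, attempt count, and delay are all arbitrary choices for illustration.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Do not sleep after the final failed attempt.
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `retry_until 30 10 ssh -o BatchMode=yes -o ConnectTimeout=5 USER@HOST true` would try for up to roughly five minutes, where USER and HOST are placeholders for the login name and the new instance's address. Only once it succeeds can the automated setup commands be sent, which is the extra moving part that user-data scripts avoid.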

The rebundled AMI approach and building your own AMI approach are useful when the installation and configuration of your required software take a very long time or can’t be done with automated processes (less common than you might think). A big drawback of creating your own AMIs is maintaining them, keeping up with security patches and other enhancements and fixes which might be applied by the base image maintainers.

Software

Note to AMI authors: If you wish to add to your EC2 images the same ability to run user-data scripts, feel free to include the following code and make it run on image startup:

http://ec2-run-user-data.notlong.com

Credits

Thanks to RightScale for the original idea of EC2 images with user-data startup hooks. RightScale has advanced startup plugins which include scripts, software packages, and attachments, all of which integrate with the RightScale service.

Thanks to Kim Scheibel and Jorge Oliveira who submitted code used in the original ec2-run-user-data script.

What do you use EC2 user-data for?

12 Comments

Extremely useful, thanks for the tip! And for the AMIs as well, of course.

Would it be possible for the script to somehow permanently disable or block the 169.254.169.254 server after it has grabbed the data?

tlrobinson: Interesting idea. Most of the startup use of the meta-data should be completed by the time the user-data script is run. However:

1. If you blocked 169.254.169.254 with, say, iptables, you'd need to make sure that it isn't blocked on reboot as the startup scripts would need to check meta-data on each reboot to see if they need to run certain procedures.

2. You'd need to make sure that access is blocked again, even though the user-data is not run on reboots.

3. There may be some other software you run on the instance which uses the meta-data and it may not be easy to identify it.

Another method is to pass some initial semi-private info to the instance via userdata, but to do a global change of passwords / credentials manually immediately after login (in one step).

I'm new to this, but...

Seems like if you were willing to burn your keys/certs into a private image (and, btw, how BAD is this?), you could write a script that took an url to a private s3 object and downloaded THAT and executed THAT as a script. This technique is mentioned above with regard to the 16K size limitation, but it could also be used to hide the script itself and any data in it. I think. Maybe. Ja?


Trying to run a user-data script with Debian 5.0 Lenny from Alestic, I get this error:

Aug 28 20:08:35 ip-10-245-215-80 S71ec2-run-user-data: Retrieving user-data
Aug 28 20:08:35 ip-10-245-215-80 S71ec2-run-user-data: curl: (22) The requested URL returned error: 404
Aug 28 20:08:35 ip-10-245-215-80 S71ec2-run-user-data: No user-data available


If I check the user-data with ec2-metadata, it is available.

Any ideas ?

Ian: You don't even need to put your access keys on the image to download a private S3 object. You can build S3 URLs which give access to a specific key for a specific amount of time. We use this in user-data scripts to pass in secret information which we don't want to be forever available to everybody on the instance.

neoecos: That's an odd one. If you can access the user-data, then the startup script should also be able to. If you can ever reliably reproduce this, please post to the ec2ubuntu Google group with the steps.

Eric: Thanks for this and many other great resources! Sorry if this is a silly question, but I'm rather new to EC2 and I can't seem to find *any* real details about the --user-data and --user-data-file options. Based on this article, I understand that I can do at least that much, but I'd like to know what's happening "under the hood." Where does the script "live" on the instance? You mention security concerns, but I don't fully understand, because I don't know how/where instance users might access my user data script. Is there a more detailed resource out there that I'm completely missing? Thanks again!

jamie: There isn't a whole lot of documentation available on user-data, but it is a fairly simple concept. Amazon talks about it a bit in the developer guide:

http://docs.amazonwebservices.com/AWSEC2/latest/DeveloperGuide/index.html?AESDG-chapter-instancedata.html

The user-data is available to any user or process that can make a network connection from the instance.

Is it possible to run a user-defined script on an already existing instance? I have tried modifying the instance attribute (user data) and then starting the instance, but it is not working.
Please suggest.

poulomi:

It is possible to change the content of the user-data on an instance while the instance is in the "stopped" state. However, since the user-data script is only run on the *first* boot of an instance, your new user-data content will not be executed.

If you want code to be run on every boot of an instance, you should insert it into the standard boot time scripts that are available with your Linux distro.
