Uploading Known ssh Host Key in EC2 user-data Script

The ssh protocol uses two different keys to keep you secure:

  1. The user ssh key is the one we normally think of. This authenticates us to the remote host, proving that we are who we say we are and allowing us to log in.

  2. The ssh host key gets less attention, but is also important. This authenticates the remote host to our local computer, proving that the encrypted ssh session ends at the host we intended and not at somebody listening in between.

Every time you see a prompt like the following, ssh is checking the host key and asking you to make sure that your session is going to be encrypted securely.

The authenticity of host 'ec2-...' can't be established.
ECDSA key fingerprint is ca:79:72:ea:23:94:5e:f5:f0:b8:c0:5a:17:8c:6f:a8.
Are you sure you want to continue connecting (yes/no)? 

If you answer “yes” without verifying that the remote ssh host key fingerprint is the same, then you are basically saying:

I don’t need this ssh session encrypted. It’s fine for any man-in-the-middle to intercept the communication.

Ouch! (But a lot of people do this.)

Retrieve Public ssh Key From EC2

A serverfault poster had a problem that I thought was a cool challenge. I had so much fun coming up with this answer, I figured I’d share it here as it demonstrates a few handy features of EC2.

Challenge

The basic need is to get the public ssh key from a keypair that exists inside of EC2. You don’t have access to the private key at the moment (but somebody else does or you will at a different location).

The AWS console and EC2 API do not let you ask for the public ssh key associated with a keypair. However, EC2 does pass the public ssh key to a new EC2 instance when you run it with a specific keypair.

The problem is that we don’t currently have the private key, so we can’t log in to the EC2 instance to get the public key. (Besides, if we did have the private key, we could extract the public key from it directly.)

Solution

I proposed creating a user-data script that sends the public ssh key to the EC2 instance console output. You can retrieve the console output without logging in to the EC2 instance.
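A minimal sketch of that idea follows. It assumes the instance metadata service exposes the key at the path shown in the comments and that the image sends user-data script output to the console. The marker strings and function names are made up for illustration.

```shell
#!/bin/bash
# Sketch only. The marker strings and function names are hypothetical.
BEGIN_MARK='-----BEGIN-EC2-PUBLIC-SSH-KEY-----'
END_MARK='-----END-EC2-PUBLIC-SSH-KEY-----'

# On the instance: wrap the key read from stdin in easy-to-find markers.
wrap_key() {
  printf '%s\n' "$BEGIN_MARK"
  printf '%s\n' "$(cat)"   # normalizes the trailing newline of the key
  printf '%s\n' "$END_MARK"
}

# On your own machine: pull the key back out of console output read from stdin.
extract_key() {
  sed -n "/$BEGIN_MARK/,/$END_MARK/p" | sed '1d;$d'
}

# Assumed user-data script body (the metadata path is an assumption):
#   curl -s http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key | wrap_key
# Assumed retrieval once the console output is available:
#   ec2-get-console-output INSTANCEID | extract_key
```

If the image prefixes console lines (for example with a syslog tag), the extracted line would still carry that prefix and would need trimming.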

Running EC2 Instances on a Recurring Schedule with Auto Scaling

Do you want to run short jobs on Amazon EC2 on a recurring schedule, but don’t want to pay for an instance running all the time?

Would you like to do this using standard Amazon AWS services without needing an external server to run and terminate the instance?

Amazon EC2 Auto Scaling is normally used to keep a reasonable number of instances running to handle measured or expected load (e.g., web site traffic, queue processing).

In this article I walk through the steps to create an Auto Scaling configuration that runs an instance on a recurring schedule (e.g., four times a day), starts a pre-defined task, and lets that instance shut itself down when it is finished. We tweak the Auto Scaling group so that this costs as little instance run time as possible, even though we may not be able to predict in advance exactly how long the job will take to complete.

Here’s a high level overview for folks familiar with Auto Scaling:

Logging user-data Script Output on EC2 Instances

Real time access to user-data script output

The early implementations of user-data scripts on EC2 automatically sent all output of the script (stdout and stderr) to /var/log/syslog as well as to the EC2 console output, to help monitor the startup progress and to debug when things went wrong.

The recent Ubuntu AMIs still send user-data script output to the console output, so you can view it remotely, but it is no longer available in syslog on the instance. The console output is only updated a few minutes after the instance boots, reboots, or terminates, which forces you to wait to see the output of the user-data script and misses any output that comes out after the snapshot.

Here is an example written in bash that demonstrates how to send all user-data script stdout and stderr automatically, transparently, and simultaneously to three locations:

Ubuntu Karmic Desktop on EC2

As Thilo Maier pointed out in comments on my request for UDS input, I have been publishing both server and desktop AMIs for running Ubuntu on EC2 up through Jaunty, but the official Karmic AMIs on EC2 only support server installations by default.

Ubuntu makes it pretty easy to install the desktop software on a server, and NX from NoMachine makes it pretty easy to access that desktop remotely, with near real-time interactivity even over slowish connections.

Here’s a quick guide to setting this up, starting with an Ubuntu 9.10 Karmic AMI on Amazon EC2:

  1. Create a user-data script which installs runurl (not on Karmic AMIs by default) and then runs a simple desktop and NX server installation script. Examine the desktop script to see what it’s doing to install the software.

     cat <<EOM >install-desktop
     #!/bin/bash -ex
     wget -qO/usr/bin/runurl run.alestic.com/runurl
     chmod 755 /usr/bin/runurl
     runurl run.alestic.com/install/desktop
     EOM
    
  2. Start an instance on EC2 telling it to run the above user-data script on first boot. The following example uses the current 32-bit Karmic server AMI. Make sure you’re using the latest AMI id.

     ec2-run-instances                   \
       --key YOURKEY                     \
       --user-data-file install-desktop  \
       ami-1515f67c
    
  3. Connect to the new instance and wait for it to complete the desktop software installation (when sshd is restarted). This takes about 30 minutes on an m1.small instance and 10 minutes on a c1.medium instance. Then generate and set a secure password for the ubuntu user using copy/paste from the pwgen output. Save the secure password so you can enter it into the NX client later.

     ssh -i YOURKEY.pem ubuntu@THEHOST
     tail -f /var/log/syslog | egrep --line-buffered user-data:
     pwgen -s 16 1
     sudo passwd ubuntu
    

    If anybody knows how to use ssh keys with NX, I’d love to do this instead of using passwords.

  4. Back on your local system, install and run the NX client. For computers not running Ubuntu, download the appropriate software from NoMachine.

     wget http://64.34.161.181/download/3.4.0/Linux/nxclient_3.4.0-5_i386.deb
     sudo dpkg -i nxclient_3.4.0-5_i386.deb
     /usr/NX/bin/nxclient --wizard &
    

    Point the NX Client to the external hostname of your EC2 instance. Enter the Login “ubuntu” and the Password from above. Choose the “Gnome” desktop.

If all goes well, you should have a complete and fresh Ubuntu desktop filling most of your screen, available for you to mess around with and then throw away.

ec2-terminate-instances INSTANCEID

If you want to have a persistent desktop with protection from crashes, you’ll need to learn how to do things like placing critical directories on EBS volumes.

If you’d like to run KDE on EC2, replace the package “ubuntu-desktop” with “kubuntu-desktop” in the installation script.

runurl - A Tool and Approach for Simplifying user-data Scripts on EC2

Many Ubuntu and Debian images for Amazon EC2 include a hook where scripts passed as user-data will be run as root on the first boot.

At Campus Explorer, we’ve been experimenting with an approach where the actual user-data is a very short script which downloads and runs other scripts. This idea is not new, but I have simplified the process by creating a small tool named runurl which adds a lot of flexibility and convenience when configuring new servers.

Usage

The basic synopsis looks like:

runurl URL [ARGS]...

The first argument to the runurl command is the URL of a script or program which should be run. All following options and arguments are passed verbatim to the program as its options and arguments. The exit code of runurl is the exit code of the program.
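To make that contract concrete, here is a hypothetical sketch of a function with the same behavior. It is not the real runurl source; the names fetch_url and run_from_url are invented, and wget is assumed only because the article uses it elsewhere.

```shell
#!/bin/bash
# Hypothetical sketch of the behavior described above; NOT the real runurl source.

# Download a URL to stdout. wget is an assumption based on the article's own usage.
fetch_url() {
  wget -qO- "$1"
}

# Usage: run_from_url URL [ARGS]...
run_from_url() {
  [ $# -ge 1 ] || { echo "usage: run_from_url URL [ARGS]..." >&2; return 2; }
  local url=$1
  shift
  local tmp rc=0
  tmp=$(mktemp) || return 1
  if ! fetch_url "$url" >"$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 700 "$tmp"
  "$tmp" "$@" || rc=$?   # remaining arguments are passed through verbatim
  rm -f "$tmp"
  return "$rc"           # exit code of the downloaded program
}
```

Keeping the download step in its own function makes it easy to swap in a different downloader or to stub it out when trying the function locally.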

The runurl command is a very short and simple script, but it makes the user-data startup scripts even shorter and simpler themselves.

Example 1

If the following content is stored at http://run.alestic.com/demo/echo

#!/bin/bash
echo "$@"

then this command:

runurl run.alestic.com/demo/echo "hello, world"

will itself output:

hello, world

You can specify the “http://” in the URLs, but since it’s using wget to download them, the specifier is not necessary and the code might be easier to read without it.

Example 2

Here’s a more substantial sample user-data script which invokes a number of other remote scripts to upgrade the Ubuntu packages, install the munin monitoring software, and install and run the Folding@Home application using origami, with credit going to Team Ubuntu. Finally, it sends an email back home to report that it’s active.

This sample assumes that runurl is installed on the AMI (e.g., Ubuntu AMIs published on https://alestic.com). For other AMIs, see below for additional commands to add to the start of the script.

#!/bin/bash -ex
runurl run.alestic.com/apt/upgrade
runurl run.alestic.com/install/munin
cd /root
runurl run.alestic.com/install/folding@home -u ec2 -t 45104 -b small
runurl run.alestic.com/email/start youremail@example.com

Note that the last command passes a parameter to the script, identifying where the email should be sent. Please change this if you test the script.

With the above content stored in a file named folding.user-data, you could start 5 new c1.medium instances running the Folding@Home software using the command:

ec2-run-instances \
  --user-data-file folding.user-data \
  --key [KEYPAIR] \
  --instance-type c1.medium \
  --instance-count 5 \
  ami-ed46a784

You can log on to an instance and monitor the installation with

tail -f /var/log/syslog

Once the Folding@Home application is running, you can monitor its progress with:

/root/origami/origami status

and after 15 minutes, check out the Munin system stats at

http://ec2-HOSTNAME/munin/

Expiring URLs

One of the problems with normal user-data scripts is that the contents exist as long as the instance is running and any user on the instance can read the contents of the user-data. This puts any private or confidential information in the user-data at risk.

If you put your actual startup code in private S3 buckets, you can pass runurl a URL to the contents, where the URL expires shortly after it is run. Or, the script could even delete the contents itself if you set it up correctly. This reduces the exposure to the time it takes for the instance to start up and does not let anybody else access the URL during that time.
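One way this might look, as a sketch: generate the user-data from a short-lived URL just before launching. The helper name make_user_data is hypothetical, and the commented presign call assumes the modern AWS CLI, which this article does not otherwise use.

```shell
#!/bin/bash
# Sketch only: make_user_data is a hypothetical helper.
# Signed URLs contain characters like ? and &, so the URL is shell-quoted
# with printf %q before being written into the generated script.
make_user_data() {
  local url=$1
  printf '#!/bin/bash -ex\n'
  printf 'runurl %q\n' "$url"
}

# Assumed usage (bucket, key, and expiry are placeholders):
#   url=$(aws s3 presign s3://YOURBUCKET/startup-script --expires-in 300)
#   make_user_data "$url" > startup.user-data
#   ec2-run-instances --key YOURKEY --user-data-file startup.user-data AMIID
```

The expiry needs to be long enough to cover the time between generating the URL and the instance reaching the point in its boot where user-data is run.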

Updating

Another benefit of keeping the actual startup code separate from the user-data content itself is that you can modify the startup code stored at the URL without modifying the user-data content.

This can be useful with services like EC2 Auto Scaling, where the specified user-data cannot be dynamically changed in a launch configuration without creating a whole new launch configuration.

If you modify the runurl scripts, the next server to be launched will automatically pick up the new instructions.

Bootstrapping

The runurl tool is pre-installed in the latest Ubuntu AMIs published on https://alestic.com. If you are using an Ubuntu image which does not include this software, you can install it from the Alestic PPA using the following commands at the top of your user-data script:

sudo add-apt-repository ppa:alestic/ppa &&
sudo apt-get update &&
sudo apt-get install -y runurl

If you are using an Ubuntu release without the add-apt-repository command or a Linux distro other than Ubuntu, you can install runurl using the following commands:

sudo wget -qO/usr/bin/runurl run.alestic.com/runurl
sudo chmod 755 /usr/bin/runurl

The subsequent commands in the user-data script can then use the runurl command as demonstrated in the above example.

SSL

To improve your certainty that you are talking to the right server and getting the right data, you could use SSL (https) in your URLs. If you are talking to S3 buckets, however, you’ll need to use the old path-style S3 bucket access, like:

runurl https://s3.amazonaws.com/run.alestic.com/demo/echo "hello, mars"

This is probably not as critical when accessing it from an EC2 instance as you’re operating over Amazon’s trusted network.

Caveats

There are a number of things which can go wrong when using a tool like runurl. Here are some to think about:

  • Only run content which you control or completely trust.

  • Just because you like the content of a URL when you look at it in your browser does not mean that it will still look like that when your instance goes to run it. It could change at any point to something that is broken or even malicious unless it is under your control.

  • If you depend on this approach for serious applications, you need to make sure that the content you are downloading is coming from a reliable server. S3 is reasonable (with retries) but you also need to consider the DNS server if you are depending on a non-AWS hostname to access the S3 bucket.
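One possible hedge against content changing underneath you, sketched here under assumptions: download the script first, compare it against a checksum you recorded when you reviewed it, and only then run it. The helper run_if_matches is hypothetical and not part of runurl; it assumes sha256sum is available and that the downloaded file is a bash script.

```shell
#!/bin/bash
# Sketch only: run_if_matches is a hypothetical helper, not part of runurl.
# Usage: run_if_matches FILE EXPECTED_SHA256 [ARGS]...
run_if_matches() {
  local file=$1 expected=$2
  shift 2
  local actual
  actual=$(sha256sum "$file" | cut -d' ' -f1) || return 1
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file; refusing to run" >&2
    return 1
  fi
  bash "$file" "$@"   # assumes the reviewed content is a bash script
}
```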

The name run.alestic.com points to an S3 bucket, but the DNS for this name is not redundant or worthy of use by applications with serious uptime requirements. This particular service should be considered my playground for ideas and there is no commitment on my part to make sure that it is up or that the content remains stable.

If you like what you see, please feel free to copy any of the open source content on run.alestic.com and store it on your own reliable and trusted servers. It is all published under the Apache2 license.

Project

I’m using this simple script as an opportunity to come up to speed with hosting projects on Launchpad. You can access the source code and submit bugs at

https://launchpad.net/runurl

You can also use Launchpad and Bazaar to branch the source into parallel projects and/or submit requests to merge patches into the main development branch.

[Update 2009-10-11: Document use of Alestic PPA]
[Update 2010-01-25: Simplify bootstrap instructions for Ubuntu]
[Update 2010-08-17: Switch to using “add-apt-repository” for bootstrap instructions]

Automate EC2 Instance Setup with user-data Scripts

user-data Scripts

The Ubuntu and Debian EC2 images published on https://alestic.com allow you to send in a startup script using the EC2 user-data parameter when you run a new instance. This functionality is useful for automating the installation and configuration of software on EC2 instances.

The basic rule followed by the image is:

If the instance user-data starts with the two characters #! then the instance runs it as the root user on the first boot.

The “user-data script” is run late in the startup process, so you can assume that networking and other system services are functional.

If you start an EC2 instance with any user-data which does not start with #! the image simply ignores it and allows your own software to access and use the data as it sees fit.
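The rule can be sketched as a small check. This is an illustration of the rule only, not the code shipped in the images; the function name is_user_data_script and the temporary file path in the comments are assumptions.

```shell
#!/bin/bash
# Sketch of the "#!" rule described above; not the code shipped in the images.

# Succeeds only when FILE begins with the two characters "#!".
is_user_data_script() {
  [ "$(head -c 2 "$1")" = '#!' ]
}

# Assumed on-instance usage (needs root and the EC2 metadata service):
#   curl -s -o /tmp/user-data http://169.254.169.254/latest/user-data
#   if is_user_data_script /tmp/user-data; then
#     chmod 700 /tmp/user-data && /tmp/user-data
#   fi
```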

This same user-data startup script functionality has been copied in the Ubuntu images published by Canonical, and your existing user-data script should be portable across images with little change. Read a comparison of the Alestic and Canonical EC2 images.

Example

Here is a sample user-data script which sets up an Ubuntu LAMP server on a new EC2 instance:

#!/bin/bash
set -e -x
export DEBIAN_FRONTEND=noninteractive
apt-get update && apt-get upgrade -y
tasksel install lamp-server
echo "Please remember to set the MySQL root password!"

Save this to a file named, say, install-lamp and then pass it to a new EC2 instance, say, Ubuntu 9.04 Jaunty:

ec2-run-instances --key KEYPAIR --user-data-file install-lamp ami-bf5eb9d6

Please see https://alestic.com for the latest AMI ids for Ubuntu and Debian.

Note: This simplistic user-data script is for demonstration purposes only. Though it does set up a fully functional LAMP server which may be as good as some public LAMP AMIs, it does not take into account important design issues like database persistence. Read Running MySQL on Amazon EC2 with Elastic Block Store.

Debugging

Since you are passing code to the new EC2 instance, there is a very small chance that you may have made a mistake in writing the software. Well maybe not you, but somebody else out there might not be perfect, so I have to write this for them.

The stdout and stderr of your user-data script are written to /var/log/syslog and you can review this for any success and failure messages. It will contain both things you echo directly in the script as well as output from programs you run.

Tip: If you add set -x at the top of a bash script, then it will output every command executed. If you add set -e to the script, then the user-data script will exit on the first command which does not succeed. These help you quickly identify where problems might have started.
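A small demonstration of both options together, as a sketch you can try on any machine with bash. The names demo_script and run_demo are made up for this example.

```shell
#!/bin/bash
# Sketch: a tiny script whose second command fails.
demo_script='set -e -x
echo "step 1"
false
echo "step 2"'

# Run the demo in a child bash so the failure does not end the calling shell.
run_demo() {
  bash -c "$demo_script" 2>&1
}

# Calling run_demo prints trace lines (prefixed with "+") for the echo and
# false commands, prints "step 1", and stops before "step 2" with a
# non-zero exit status.
```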

Limitations

Amazon EC2 limits the size of user-data to 16KB. If your startup instructions are larger than this limit, you can write a user-data script which downloads the full program(s) from somewhere else like S3 and runs them.
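A quick pre-flight check can catch an oversized script before you launch anything. This is a sketch; check_user_data_size is a hypothetical helper, and 16384 bytes is simply the 16KB figure quoted above.

```shell
#!/bin/bash
# Sketch only: check_user_data_size is a hypothetical pre-flight helper.
# 16384 bytes is the 16KB limit quoted above.
USER_DATA_LIMIT=16384

check_user_data_size() {
  local file=$1 size
  [ -r "$file" ] || return 2
  size=$(( $(wc -c <"$file") ))
  if [ "$size" -gt "$USER_DATA_LIMIT" ]; then
    echo "$file is $size bytes; limit is $USER_DATA_LIMIT" >&2
    return 1
  fi
}

# Assumed usage:
#   check_user_data_size install-lamp &&
#     ec2-run-instances --key KEYPAIR --user-data-file install-lamp ami-bf5eb9d6
```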

Though a shell is a handy tool for writing scripts to install and configure software, the user-data script can be written in any language which supports the shebang (#!) mechanism for running programs. This includes bash, Perl, Python, Ruby, tcl, awk, sed, vim, make, or any other language you can find pre-installed on the image.

If you want to use another language, a user-data script written in bash could install the language, install the program, and then run it.

Security

Setting up a new EC2 instance often requires installing private information like EC2 keys and certificates (e.g., to make AWS API calls). You should be aware that if you pass secrets in the user-data parameter, the complete input is available to any user or process running on the instance.

There is no way to change the instance user-data after instance startup, so anybody who has access to the instance can simply request http://169.254.169.254/latest/user-data

Depending on what software you install on your instance, even Internet users may be able to exploit holes to get at your user-data. For example, if your web server lets users specify a URL to upload a file, they might be able to enter the above URL and then read the contents.

Alternatives

Though user-data scripts are my favorite method to set up EC2 instances, it’s not always the appropriate approach. Alternatives include:

  1. Manually ssh in to the instance and enter commands to install and configure software.

  2. Automatically ssh in to the instance with automated commands to install and configure software.

  3. Install and configure software using (1) or (2) and then rebundle the instance to create a new AMI. Use the new image when running instances.

  4. Build your own EC2 images from scratch.

The ssh options have the benefit of not putting any private information into the user-data accessible from the instance. They have the disadvantage of needing to monitor new instances waiting for the ssh server to accept connections; this complicates the startup process compared to user-data scripts.
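The waiting that the ssh approach requires might be sketched as a simple retry loop. The helper wait_until is hypothetical, and the ssh options in the usage comment are one reasonable choice rather than a recommendation from the original article.

```shell
#!/bin/bash
# Sketch only: wait_until is a hypothetical helper.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS]...
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Assumed usage: try every 10 seconds for up to 5 minutes. BatchMode=yes makes
# ssh fail instead of prompting, so this assumes the instance's host key is
# already in your known_hosts file.
#   wait_until 30 10 ssh -o BatchMode=yes -o ConnectTimeout=5 \
#     -i YOURKEY.pem ubuntu@THEHOST true
```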

The rebundled AMI approach and building your own AMI approach are useful when the installation and configuration of your required software take a very long time or can’t be done with automated processes (less common than you might think). A big drawback of creating your own AMIs is maintaining them, keeping up with security patches and other enhancements and fixes which might be applied by the base image maintainers.

Software

Note to AMI authors: If you wish to add to your EC2 images the same ability to run user-data scripts, feel free to include the following code and make it run on image startup:

http://ec2-run-user-data.notlong.com

Credits

Thanks to RightScale for the original idea of EC2 images with user-data startup hooks. RightScale has advanced startup plugins which include scripts, software packages, and attachments, all of which integrate with the RightScale service.

Thanks to Kim Scheibel and Jorge Oliveira who submitted code used in the original ec2-run-user-data script.

What do you use EC2 user-data for?

New releases of Ubuntu AMIs for Amazon EC2 2008-05-17 (startup hooks)

New updates have been released for all of the Ubuntu AMIs listed on:

https://alestic.com

Though this release comes only 3 days after the previous one, it is, surprisingly, not to fix bugs, but rather to add one of the most demanded features: startup hooks.

Thanks to code submitted by Kim Scheibel and Jorge Oliveira (with a little mangling from me) it is now easy to type a single command (or push a button in Elasticfox) and have an Ubuntu instance start up and immediately install, configure, and run software without any additional manual intervention.

Simply pass a script (starting with #!) as the instance user-data and it will be run automatically on the first boot. If you want it to be run on every boot, see the comments at the top of this file:

http://ec2-run-user-data.notlong.com

For example, to start a Hardy LAMP server, you could create a script named “install-lamp-server” with the contents:

#!/bin/bash
export DEBIAN_PRIORITY=critical
apt-get update
apt-get upgrade -y
apt-get install -y lamp-server^
echo "Please remember to set the MySQL root password!"

Then using the latest Ubuntu 8.04 Hardy base install AMIID (from https://alestic.com) run a command like:

ec2-run-instances \
  --user-data-file install-lamp-server \
  --key IDENTITY \
  AMIID

A couple minutes later, you should be able to connect to the server’s external hostname with a web browser. To see the progress of your user-data script on the instance:

tail -f /var/log/syslog

Output from the user-data script is also available in the EC2 instance console output for convenient remote debugging.

You can write the user-data script in any language that happens to be on the base AMI (bash, perl, python, ruby, awk, …) as long as the program file starts with: #!

Note that there is a size limit on user-data in EC2, but the user-data script may download additional files from S3 or other locations, so this shouldn’t be too constraining.

Now let the competition begin for coolest and most useful instance user-data scripts!

Enjoy