Fixing Files on the Root EBS Volume of an EC2 Instance

You can examine and edit files on the root EBS volume of an EC2 instance even if you find yourself in what you might have considered a disastrous situation, such as:

  • You lost your ssh key or forgot your password

  • You made a mistake editing the /etc/sudoers file and can no longer gain root access with sudo to fix it

  • Your long-running instance is hung for some reason, cannot be contacted, and fails to boot properly

  • You need to recover files off of the instance but cannot get to it

On a physical computer sitting at your desk, you could simply boot the system with a CD or USB stick, mount the hard drive, check out and fix the files, then reboot the computer to be back in business.

A remote EC2 instance, however, seems distant and inaccessible when you are in one of these situations. Fortunately, AWS provides us with the power and flexibility to be able to recover a system like this, provided that we are running EBS boot instances and not instance-store.

The approach on EC2 is somewhat similar to the physical solution, but we’re going to move and mount the faulty “hard drive” (root EBS volume) to a different instance, fix it, then move it back.

In some situations, it might simply be easier to start a new EC2 instance and throw away the bad one, but if you really want to fix your files, here is the approach that has worked for many:
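In outline, and only as a sketch (this uses the classic EC2 API tools; the instance and volume IDs below are hypothetical, and the helper instance must be in the same availability zone as the broken volume):

ec2-stop-instances i-broken1                      # stop (do not terminate!) the broken instance
ec2-detach-volume vol-root123                     # detach its root EBS volume
ec2-attach-volume vol-root123 -i i-helper1 -d /dev/sdf   # attach it to a working instance

# On the helper instance: mount the volume, fix the files, unmount
sudo mkdir -p /fix
sudo mount /dev/sdf /fix          # may show up as /dev/xvdf on newer kernels
sudo vi /fix/etc/sudoers          # ...or whatever repair you actually need
sudo umount /fix

# Move the volume back and boot the original instance again
ec2-detach-volume vol-root123
ec2-attach-volume vol-root123 -i i-broken1 -d /dev/sda1
ec2-start-instances i-broken1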

New Release of ec2-consistent-snapshot and Screencast by Ahmed Kamal

ec2-consistent-snapshot is a tool that uses the Amazon EC2 API to initiate a snapshot of an EBS volume with some additional work to help ensure that an XFS file system and/or MySQL database are in a consistent state on that snapshot.

Ahmed Kamal pointed out to me yesterday that we can save lots of trouble installing ec2-consistent-snapshot by adding a dependency on the new libnet-amazon-ec2-perl package in Ubuntu instead of forcing people to install the Net::Amazon::EC2 Perl package through CPAN (not easy for the uninitiated).

I released a new version of ec2-consistent-snapshot which has this new dependency and updated documentation. Installing this software on Ubuntu 10.04 Lucid, 10.10 Maverick, and the upcoming Natty release is now as easy as:
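Roughly, on a stock Ubuntu system:

sudo add-apt-repository ppa:alestic   # on Lucid, add-apt-repository comes from python-software-properties
sudo apt-get update
sudo apt-get install -y ec2-consistent-snapshot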

A Simpler Way To Replace Instance Hardware on EC2

A while back I wrote an article describing a way to move the root EBS volume from one running instance to another. I pitched this as a way to replace the hardware for your instance in the event of failures.

Since then, I have come to the realization that there is a much simpler method to move your instance to new hardware, and I have been using this new method for months whenever I run into problems that I suspect might be attributed to the underlying hardware.

This method is so simple that I am almost embarrassed about having written the previous article, but I’ll point out below at least one benefit that still exists with the more complicated approach.

I now use this process as the second step–after a simple reboot–when I am experiencing odd problems like not being able to connect to a long-running EC2 instance. (The zeroth step is to start running and setting up a replacement instance in the event that steps one and two do not produce the desired results.)

Here goes…
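A minimal sketch of the stop/start approach, assuming that is the simple method being referred to here (the instance ID is hypothetical): stopping an EBS boot instance and starting it again generally brings it back up on different underlying hardware.

ec2-stop-instances i-12345678
# wait until ec2-describe-instances shows the instance as "stopped", then:
ec2-start-instances i-12345678

Note that the instance comes back with a new public IP address and public DNS name unless you have an Elastic IP address associated with it.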

Moving an EC2 Instance to a Larger (or Smaller) Instance Type

When you discover that the entry level t1.micro instance size is simply not cutting it for your growing application needs, you may want to try upgrading it to a larger instance type, perhaps an m1.small or even a c1.medium.

Instead of starting a new instance and having to configure it from scratch, you may be able to simply resize the existing instance by asking Amazon to move it to better hardware for you. Of course, since this is AWS, you don’t have to actually talk to anybody–just type a few commands and the job is done automatically.
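A hedged sketch of what those commands might look like with the classic EC2 API tools (the instance ID and target type are placeholders; the instance must be stopped while its type is changed):

ec2-stop-instances i-12345678
ec2-modify-instance-attribute i-12345678 --instance-type c1.medium
ec2-start-instances i-12345678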

Constraints

Before you try this approach, note that there are some conditions:

Logging user-data Script Output on EC2 Instances

Real-time access to user-data script output

The early implementations of user-data scripts on EC2 automatically sent all output of the script (stdout and stderr) to /var/log/syslog as well as to the EC2 console output, to help monitor the startup progress and to debug when things went wrong.

The recent Ubuntu AMIs still send user-data script output to the console output, so you can view it remotely, but it is no longer available in syslog on the instance. The console output is only updated a few minutes after the instance boots, reboots, or terminates, which forces you to wait to see the output of the user-data script and misses any output produced after that snapshot of the console is taken.

Here is an example written in bash that demonstrates how to send all user-data script stdout and stderr automatically, transparently, and simultaneously to three locations:
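A minimal sketch of one common way to do this, placed at the top of the user-data script (the log file name is an arbitrary choice):

#!/bin/bash -ex
# Duplicate all stdout/stderr of the rest of this script to:
#   1. a log file on the instance,
#   2. syslog (via logger, tagged "user-data"), and
#   3. /dev/console, which ends up in the EC2 console output.
exec > >(tee /var/log/user-data.log | logger -t user-data -s 2>/dev/console) 2>&1

echo "BEGIN user-data"
date
# ... the rest of your user-data script ...
echo "END user-data"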

Boot EC2 Instance With ssh on Port 80

In a thread on the EC2 forum, Marko describes a situation where an outbound firewall prevents ssh connections to port 22, the default sshd port on all EC2 instances.

In that thread, Shlomo Swidler proposes creating a user-data script that changes sshd to listen on a port the firewall permits.

Here’s a simple example of a user-data script that does just that. Most outbound firewalls allow traffic to port 80 (web/HTTP), so I use it in this example.

The first step is to create a file containing the user-data script:
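For example (a sketch only; the file name is arbitrary, and the perl expression assumes the stock Ubuntu sshd_config):

cat > user-data-ssh-port-80.sh <<'EOF'
#!/bin/bash
# Make sshd listen on port 80 instead of the default 22, then restart it.
perl -pi -e 's/^#?Port .*/Port 80/' /etc/ssh/sshd_config
service ssh restart || /etc/init.d/ssh restart
EOF

You would then launch the instance with something like ec2-run-instances --user-data-file user-data-ssh-port-80.sh, make sure the instance’s security group allows inbound TCP on port 80, and connect with ssh -p 80.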

Copying EBS Boot AMIs Between EC2 Regions

Update: Since this article was written, Amazon has released the ability to copy EBS boot AMIs between regions using the web console, command line, and API. You may still find information of use in this article, but Amazon has solved some of the harder parts for you.
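For reference, the built-in copy now looks roughly like this with aws-cli (the AMI ID, regions, and name are placeholders):

aws ec2 copy-image \
  --source-region us-east-1 \
  --source-image-id ami-12345678 \
  --region eu-west-1 \
  --name "my-ami-copy"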

Using Amazon EC2, you created an EBS boot AMI and it’s working fine, but now you want to run instances of that AMI in a different EC2 region. Since AMIs are region specific, you need a copy of the image in each region where instances are required.

This article presents one method you can use to copy an EBS boot AMI from one EC2 region to another.

Ubuntu 9.04 Jaunty End Of Life

Ubuntu 9.04 Jaunty has reached EOL (End Of Life). It is no longer supported by Ubuntu with security updates and patches. You have known this day was coming for 1.5 years, as all non-LTS Ubuntu releases are supported for only 18 months.

I have no plans to delete the Ubuntu 9.04 Jaunty AMIs for EC2 published under the Alestic name in the foreseeable future, but I request, recommend, and urge you to please stop using them and upgrade to an officially supported, actively maintained release of Ubuntu on EC2 like 10.04 LTS Lucid or 10.10 Maverick.

EBS Boot Instance Stop+Start Begins a New Hour of Charges on EC2

Scott Moser and I were wondering if stopping and starting an EBS boot instance on EC2 would begin a new hour’s worth of charges or if AWS would not increase your costs if the stop/start were done a few minutes apart in the same hour.

For some reason, I had assumed that it would start a new hour of fees, possibly because of my experience with the somewhat unrelated process of terminating old instances and starting new ones. However, we decided it would be easy to test, so here are the results.

I tested with an Ubuntu 10.10 Maverick 32-bit server EBS boot AMI on the m1.small instance type in the ap-southeast-1 (Singapore) region. The AMI should have no effect on charges, so these results should apply to any OS you run on EC2.

I used an AWS account that did not have any EC2 instance fees in the Singapore region that month (Scott’s idea) so that this activity would be easy to see as the only charges on that account.
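A sketch of what such a test looks like (the instance ID is hypothetical): a few stop/start cycles well within a single clock hour, followed by a check of the account activity page.

ec2-stop-instances --region ap-southeast-1 i-12345678
# wait for the instance to reach "stopped", then a few minutes later:
ec2-start-instances --region ap-southeast-1 i-12345678
# repeat a couple of times within the same hour, then check the AWS
# account activity page for the number of instance hours billed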

Uploading Personal ssh Keys to Amazon EC2

[Update 2015-06-16: Upgrade to latest aws-cli command syntax]

Amazon recently launched the ability to upload your own ssh public key to EC2 so that it can be passed to new instances when they are launched. Prior to this you always had to use an ssh keypair that was generated by Amazon.

The benefits of using your own ssh key include:

  • Amazon never sees the private part of the ssh key (with Amazon-generated keypairs, they promise they do not save a copy after you download it, and we all trust them on this)

  • The private part of the ssh key is never transmitted over the network (though it always goes over an encrypted connection and we mostly trust this)

  • You can now upload the same public ssh key to all EC2 regions, so you no longer have to keep track of a separate ssh key for each region.

  • You can use your default personal ssh key with brand new EC2 instances, so you no longer have to remember to specify options like -i EC2KEYPAIR in every ssh, scp, rsync command.

If you haven’t yet created an ssh key for your local system, it can be done with the command:
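A hedged sketch, using the updated aws-cli syntax (the key name and region are placeholders; older aws-cli versions accept file:// here, newer ones may want fileb://):

# Create a personal ssh key if you do not already have one
ssh-keygen -t rsa -f ~/.ssh/id_rsa

# Upload the public half to an EC2 region (repeat for each region you use)
aws ec2 import-key-pair \
  --region us-east-1 \
  --key-name "$USER" \
  --public-key-material "file://$HOME/.ssh/id_rsa.pub"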

Ubuntu 10.10 Maverick Released for Amazon EC2

Does anybody really need me to tell them that you can now run a copy of the newly released Ubuntu 10.10 Maverick on Amazon EC2 with official AMIs published by Canonical?

Or, by now, perhaps you have come to expect–like I have–that the smoothly oiled machine will naturally pump out Ubuntu AMIs for EC2 on the same pre-scheduled date that the larger Ubuntu machine churns out yet another smooth launch of yet another clean Ubuntu release.

The bigger question, I guess, might be:

Should I upgrade to Ubuntu 10.10 on EC2?

ec2-consistent-snapshot: New release 0.35

ec2-consistent-snapshot version 0.35 has been released on the Alestic PPA. This software is a wrapper around the EBS create-snapshot API call and can be used to help ensure that the file system and any MySQL database on the EBS volume are in a consistent state, suitable for restoring at a later time.

The most important change in this release is a fix for a defect that has been nagging many folks for months. In rare situations, the create-snapshot API call itself took longer than 10 seconds to return from the EC2 web service at Amazon. The software did not trap the resulting alarm correctly and exited without unfreezing the XFS file system, which forced us to add an awkward unfreeze command to all cron jobs.
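To illustrate the sort of workaround that used to be tacked onto cron jobs (the mount point and volume ID are hypothetical): a defensive unfreeze run if the snapshot command failed, in case it had left the file system frozen.

ec2-consistent-snapshot --xfs-filesystem /vol vol-12345678 || xfs_freeze -u /vol

With 0.35, the alarm is handled and the file system is unfrozen even when the API call is slow, so this extra step should no longer be necessary.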

mountall Bug and Workaround in Ubuntu 10.04 Lucid on EC2

The Ubuntu 10.04 Lucid AMIs for Amazon EC2 dated 20100923 have a known bug which causes the mountall process to spin, consuming CPU, when the instance is rebooted.

You can observe this by starting a Lucid instance, running sudo reboot, and then running top after reconnecting.

Cpu(s): 38.5%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 61.5%st
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 49 root      20   0  4128 1180  920 R 38.6  0.1   0:08.57 mountall

New Release of Ubuntu AMIs Solves t1.micro Rebooting Issue

Canonical has released an updated series of Ubuntu AMIs for EC2. When starting new EC2 instances, you should use the latest AMI ids to pick up kernel security fixes. If you have Ubuntu 10.04 running on a t1.micro instance type, you should at least upgrade the software packages to get the patch for the rebooting issue:
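A hedged sketch of the upgrade (the original article may have named the specific package that carries the fix, which is not reproduced here):

sudo apt-get update
sudo apt-get upgrade -y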

Automatic Termination of Temporary Instances on Amazon EC2

I frequently fire up a temporary Ubuntu server on Amazon EC2 to test out some package feature, installation process, or other capability where I’m willing to pay a few pennies for a clean install and spare CPU.

I occasionally forget that I started an instance and leave it running for longer than I intended, turning my decision to spend ten cents into a cost of dollars. In one case, I ended up paying several hundred dollars for a super-sized instance I forgot I had running. Yes, ouch.

Because of this pain, I have a habit now of pre-scheduling the termination of my temporary instances immediately after creating them. I used to do this on the instance itself with a command like:

echo "sudo halt" | at now + 55 min

However, this only terminates the instance if its root disk is instance-store (S3-based AMI). I generally run EBS boot instances now, and a shutdown or halt only “stops” an EBS boot instance by default, which leaves you paying for the EBS boot volume at, say, $1.50/month.

So, my common practice these days is to pre-schedule an instance termination call, generally from my local laptop, using a command like:

echo "ec2kill i-eb89bb81" | at now + 55 min

The at utility runs the commands from stdin with the exact same environment ($PATH, EC2 keys, current working directory, umask, etc.) as the current shell. There are a number of different ways to specify the schedule time, and the documentation is sparse, but at tells you when it plans to run the commands so you can double-check it.

After the command runs, at sends you its output in an email. This gives you an indication of whether the termination succeeded or whether you need to follow up manually.

Here’s an example email I got from the above at command:

Subject: Output from your job      114
Date: Mon, 20 Sep 2010 14:01:05 -0700 (PDT)

INSTANCE	i-eb89bb81	running	shutting-down

I already have a personal custom command which starts an instance, waits for it to move to running, waits for the ssh server to accept connections, then connects with ssh. I think I’ll add a --temporary option to do this termination scheduling for me as well.

You can get a list of the currently scheduled at jobs with:

at -l

You can see the commands in a specific job id with:

at -c [JOBID]

If you decide along the way that the instance should not be temporary and you want to cancel the scheduled termination, you can delete a given at job with a command like:

at -d [JOBID]

I’ve been thinking of writing something simple that would regularly monitor my AWS/EC2 resources (instances, EBS volumes, EBS snapshots, AMIs, etc.) and alert me if it detects patterns that may indicate I am spending money where I may not have intended to.

How do you monitor and clean up temporary resources on Amazon AWS/EC2?