February 2011 Archives

You can examine and edit files on the root EBS volume of an EC2 instance even if you are in what you might consider a disastrous situation, such as:

  • You lost your ssh key or forgot your password

  • You made a mistake editing the /etc/sudoers file and can no longer gain root access with sudo to fix it

  • Your long running instance is hung for some reason, cannot be contacted, and fails to boot properly

  • You need to recover files off of the instance but cannot get to it

On a physical computer sitting at your desk, you could simply boot the system with a CD or USB stick, mount the hard drive, check out and fix the files, then reboot the computer to be back in business.

A remote EC2 instance, however, seems distant and inaccessible when you are in one of these situations. Fortunately, AWS provides us with the power and flexibility to be able to recover a system like this, provided that we are running EBS boot instances and not instance-store.

The approach on EC2 is somewhat similar to the physical solution, but we’re going to move and mount the faulty “hard drive” (root EBS volume) to a different instance, fix it, then move it back.

In some situations, it might simply be easier to start a new EC2 instance and throw away the bad one, but if you really want to fix your files, here is the approach that has worked for many:

Set Up

Identify the original instance (A) and its root EBS volume, which contains the broken files you want to view and edit.

instance_a=i-XXXXXXXX

volume=$(ec2-describe-instances $instance_a |
  egrep '^BLOCKDEVICE./dev/sda1' | cut -f3)

Identify the second EC2 instance (B) that you will use to fix the files on the original EBS volume. This instance must be running in the same availability zone as instance A so that it can have the EBS volume attached to it. If you don’t have an instance already running, start a temporary one.

instance_b=i-YYYYYYYY
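
If you do not already have a suitable instance running, a minimal sketch for launching a temporary instance B looks like the following. The AMI id is the Ubuntu 10.04 Lucid AMI used later in this archive, and the availability zone shown is only a placeholder; it must match instance A's zone (visible in the ec2-describe-instances output for instance A).

# Launch a temporary helper instance in the same availability zone as instance A.
# ami-3e02f257 is the Lucid AMI used later in this archive; us-east-1a is a
# placeholder for instance A's actual availability zone.
instance_b=$(ec2-run-instances --instance-type t1.micro --key $USER \
  --availability-zone us-east-1a ami-3e02f257 |
  egrep ^INSTANCE | cut -f2)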

Stop the broken instance A (waiting for it to come to a complete stop), detach the root EBS volume from the instance (waiting for it to be detached), then attach the volume to instance B on an unused device.

ec2-stop-instances $instance_a
ec2-detach-volume $volume
ec2-attach-volume --instance $instance_b --device /dev/sdj $volume
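
Note that ec2-stop-instances and ec2-detach-volume return before the operations actually complete. A minimal way to wait for instance A to stop before detaching, assuming the standard ec2-api-tools text output (state appears on the INSTANCE line), is:

# Poll until instance A reports the "stopped" state before detaching its root volume
while ! ec2-describe-instances $instance_a | egrep -q '^INSTANCE.*stopped'; do
  sleep 5
done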

ssh to instance B and mount the volume so that you can access its file system.

ssh ...instance b...

sudo mkdir -pm 000 /vol-a
sudo mount /dev/sdj /vol-a

Fix It

At this point your entire root file system from instance A is available for viewing and editing under /vol-a on instance B. For example, you may want to:

  • Put the correct ssh keys in /vol-a/home/ubuntu/.ssh/authorized_keys

  • Edit and fix /vol-a/etc/sudoers

  • Look for error messages in /vol-a/var/log/syslog

  • Copy important files out of /vol-a/

Note: The UIDs on the two instances may not be identical, so take care if you are creating, editing, or copying files that belong to non-root users. For example, the mysql user on instance A may have the same UID as the postfix user on instance B, which could cause problems if you chown files under one name and then move the volume back to A.
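
One quick way to spot such a mismatch before changing ownership is to compare the numeric UIDs on both file systems (mysql is just an example user):

# Compare the mysql user's UID on the attached volume vs. on instance B
grep '^mysql:' /vol-a/etc/passwd /etc/passwd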

Wrap Up

After you are done and you are happy with the files under /vol-a, unmount the file system (still on instance B):

sudo umount /vol-a
sudo rmdir /vol-a

Now, back on your system with ec2-api-tools, continue moving the EBS volume back to its home on the original instance A and start the instance again:

ec2-detach-volume $volume
ec2-attach-volume --instance $instance_a --device /dev/sda1 $volume
ec2-start-instances $instance_a

Hopefully, you fixed the problem, instance A comes up just fine, and you can accomplish what you originally set out to do. If not, you may need to continue repeating these steps until you have it working.

Note: If you had an Elastic IP address assigned to instance A when you stopped it, you’ll need to reassociate it after starting it up again.
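
For example, using the same command shown later in this archive (with $ipaddress standing in for your Elastic IP address):

# Re-associate the Elastic IP address with instance A after it starts
ec2-associate-address --instance $instance_a $ipaddress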

Remember! If your instance B was temporarily started just for this process, don’t forget to terminate it now.
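
Terminating a temporary instance B is a single command (only run this if you started it just for this repair and nothing else is using it):

# Clean up the temporary helper instance
ec2-terminate-instances $instance_b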

ec2-consistent-snapshot is a tool that uses the Amazon EC2 API to initiate a snapshot of an EBS volume with some additional work to help ensure that an XFS file system and/or MySQL database are in a consistent state on that snapshot.

Ahmed Kamal pointed out to me yesterday that we can save lots of trouble installing ec2-consistent-snapshot by adding a dependency on the new libnet-amazon-ec2-perl package in Ubuntu instead of forcing people to install the Net::Amazon::EC2 Perl package through CPAN (not easy for the uninitiated).

I released a new version of ec2-consistent-snapshot which has this new dependency and updated documentation. Installing this software on Ubuntu 10.04 Lucid, 10.10 Maverick, and the upcoming Natty release is now as easy as:

sudo add-apt-repository ppa:alestic &&
sudo apt-get update &&
sudo apt-get install ec2-consistent-snapshot

Once it is installed, you can read the documentation with the command:

man ec2-consistent-snapshot
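
As a rough illustration of how the tool is typically invoked for a MySQL database on an XFS file system (the option names below are from memory; treat them as assumptions and verify against the man page):

# Illustrative only: snapshot the EBS volume holding the XFS file system at /vol,
# flushing and locking MySQL while the snapshot is initiated.
# vol-XXXXXXXX is a placeholder for your EBS volume id.
ec2-consistent-snapshot \
  --mysql \
  --xfs-filesystem /vol \
  --description "MySQL backup" \
  vol-XXXXXXXX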

Screencast

Ahmed also just released a screencast which includes a quick demo of using ec2-consistent-snapshot:

Ubuntu Cloud, Run and Backup LAMP like a pro

This screencast walks through the steps I outlined in this article on the AWS site:

Running MySQL on Amazon EC2 with EBS

Though that paper is old in cloud years, it is still pretty much how I and many others run MySQL on EC2.

A while back I wrote an article describing a way to move the root EBS volume from one running instance to another. I pitched this as a way to replace the hardware for your instance in the event of failures.

Since then, I have come to the realization that there is a much simpler method to move your instance to new hardware, and I have been using this new method for months when I run into issues that I suspect might be attributed to underlying hardware issues.

This method is so simple that I am almost embarrassed about having written the previous article, but I’ll point out below at least one benefit that still exists with the more complicated approach.

I now use this process as the second step—after a simple reboot—when I am experiencing odd problems like not being able to connect to a long running EC2 instance. (The zeroth step is to start running and setting up a replacement instance in the event that steps one and two do not produce the desired results.)

Here goes…

Method

To move your EBS boot instance to new hardware on EC2:

  1. Stop the EC2 instance

    ec2-stop-instances $instanceid
    
  2. Start the EC2 instance

    ec2-start-instances $instanceid
    
  3. (optional) If you had an Elastic IP address associated with the instance, re-associate it:

    ec2-associate-address --instance $instanceid $ipaddress
    

It’s that simple. In my experience I almost always get new hardware for my instance by performing these steps. But…

Caveats

Some things to consider when using this approach:

  1. Make sure you “stop” the instance and not “terminate” it. Terminating an instance generally loses all disk-based information.

  2. This will only work with EBS boot instances. S3-based (instance-store) instances cannot be stopped.

  3. Stopping an EBS boot instance preserves files on attached EBS volumes, but all information on ephemeral instance-store disks will be lost (e.g., /mnt).

  4. There is a small chance that you will get the exact same hardware after starting the instance again. If the internal IP address is the same before and after, or if you continue observing what you sincerely believe is a host system issue, you may want to run the process again.

  5. There will be a short outage while your instance is stopped and started. In my experience this lasts roughly as long as it takes a normal system to boot.

  6. There is a risk that after stopping the instance, you will not be able to start it again because that availability zone no longer has capacity for instances of that type.

I ran into this last issue recently when I stopped an m2.4xlarge instance in a us-east-1 availability zone. Upon attempting to start the instance, I received the error that instances of that type were not currently available in that zone. I ended up having to start a replacement instance from scratch in another us-east-1 availability zone which worked out fine, but I would have preferred to keep my instances closer to each other. Eventually instances freed up and I moved the server back to its home zone.

If I had used the more complicated approach to move the root EBS volume to a new instance I would have made sure that there was an instance of the right type available before stopping the original instance.

When you discover that the entry level t1.micro instance size is simply not cutting it for your growing application needs, you may want to try upgrading it to a larger instance type, perhaps an m1.small or even a c1.medium.

Instead of starting a new instance and having to configure it from scratch, you may be able to simply resize the existing instance by asking Amazon to move it to better hardware for you. Of course, since this is AWS, you don’t have to actually talk to anybody—just type a few commands and the job is done automatically.

Constraints

Before you try this approach, note that there are some conditions:

  1. You must be running an EBS boot instance (not instance-store or S3-based AMI). Any files on ephemeral storage (e.g., /mnt) will be lost.

  2. You can only move to a different instance type of the same architecture (32-bit or 64-bit); a quick way to check the architecture is sketched after this list. Update: Always use 64-bit.

  3. The private and initial public IP addresses of the instance will be different when it is running on the new hardware. Use an Elastic IP Address to keep the public IP address the same.

  4. There will be a short outage while the instance is moved to new hardware (roughly equivalent to the reboot time of normal hardware).
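
For the architecture constraint above, a quick check is to log in to the instance and look at what the kernel reports:

# x86_64 means a 64-bit instance; i686 means 32-bit
uname -m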

Process

Run a new t1.micro Ubuntu 10.04 Lucid instance to be our demo. I recommend uploading your own ssh key first.

instance_id=$(ec2-run-instances --instance-type t1.micro --key $USER ami-3e02f257 |
  egrep ^INSTANCE | cut -f2)

Wait until it is running and perhaps log in to install software or touch some files so you know it’s your instance. When you are ready:

Step 1 - stop the t1.micro instance:

ec2-stop-instances $instance_id

At this point in a normal environment, you might want to create an EBS snapshot AMI of the instance for backup purposes in the event that anything goes wrong. (See: ec2-create-image)
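
A minimal sketch of that backup step, using a hypothetical image name:

# Optional: create an EBS snapshot AMI of the stopped instance before resizing it
# ("demo-before-resize" is just an example name)
ec2-create-image --name demo-before-resize $instance_id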

Step 2 - While the EBS boot instance is stopped, switch the instance type from t1.micro to m1.small:

ec2-modify-instance-attribute --instance-type m1.small $instance_id

Step 3 - Start the instance using its new m1.small type:

ec2-start-instances $instance_id

Wait until it is running (again), then log in to verify that it is the same instance with all your software and data. If you were using an Elastic IP address with the instance, it would need to be reassociated after the instance is started.

Eventually, you’ll discover that an m1.small isn’t all that powerful for moderate loads either, so you’ll want to upgrade to a c1.medium, which for many purposes is a great size. It offers 5X the CPU of an m1.small for only 2X the price.

Unfortunately, with a 32-bit instance, c1.medium is as high as you can go as of this writing. You’ll need to switch over to a new instance running a 64-bit AMI if you want to go larger.

If you were following along with this example, don’t forget to clean up after yourself:

ec2-terminate-instances $instance_id

Bonus

Though we often think “scaling” means moving to larger/faster/more hardware, Amazon EC2 has shown us that it is equally valuable to be able to scale down when we no longer need the extra capacity. The above approach can be used to move your instances to smaller instance types to reduce costs.

If you run this demo, you will be charged for 1 hour of t1.micro instance time plus 1 hour of m1.small instance time, plus fractions of a penny in EBS volume and IO charges. That’s slightly more than a dime.

My company has used this technique a number of times as a way of scaling up and down certain services that did not have a load balanced auto scaling architecture set up. It’s a fantastic way to temporarily increase memory on a system while you debug a new memory issue, and then scale back down after you resolve it.

Update 2012-03-08: 64-bit architecture is now available on all instance types
