Move a Running EBS Boot Instance to New Hardware on Amazon EC2


NOTE: Though this method works and there is useful information in this article about things you can do with EBS boot instances, there is a simpler way to move an instance to new hardware.

Amazon EC2 has been experiencing some power issues in a portion of one of their many data centers. Even though the relative percentage of people affected might be small, when you have as many customers as AWS does, a small fraction can still be a large absolute number of customers who are affected.

Naturally, some customers will be upset about not having access to their systems, but in the time it takes to write a complaint, you might be able to move your server to new hardware within EC2 and go on with your business.

First, let's assume that you are running an EBS boot instance. If you didn’t think they were the way to go before reading this article, I expect to convince you with this one example (and there are a number of other benefits).

Setup

For this demo, I’m going to start an instance. In your situation this instance represents the currently running server that you are depending on and which has valuable software, configuration, and perhaps data on its EBS volume(s). I’m also going to drop a note on the EBS root disk on my running instance so that I know it is the one I wanted to preserve. Again, this is just setting things up for the demo:

# SKIP THIS ENTIRE SECTION IF YOU ALREADY HAVE AN INSTANCE RUNNING
keypair=YOURKEYPAIR
sshkey=.ssh/$keypair.pem # or wherever you keep it
region=us-west-1 # pick your region
zone=${region}a # pick your availability zone
type=m1.small # pick your size
amiid=ami-cb97c68e # Ubuntu 10.04 Lucid, 32-bit, EBS boot in that region
oldinstanceid=$(ec2-run-instances \
  --key $keypair \
  --region $region \
  --availability-zone $zone \
  --instance-type $type \
  $amiid |
  egrep ^INSTANCE | cut -f2)
echo "instanceid=$oldinstanceid"

while host=$(ec2-describe-instances --region $region "$oldinstanceid" |
  egrep ^INSTANCE | cut -f4) && test -z "$host"; do echo -n .; sleep 1; done
echo host=$host

echo "save the volume" | ssh -i $sshkey ubuntu@$host tee README.txt

Moving to a New Instance

Now, we pretend that the instance created above has failed in some way and we can no longer access it. Here’s how we get our server running on a new instance:

  1. If you can, stop the old instance. This increases your chance of keeping the file system consistent on the EBS volume. If the instance has really failed and this step does not work, then skip it.

    ec2-stop-instances --region $region $oldinstanceid
    
  2. Run a new instance with the same startup parameters as your old instance.

    newinstanceid=$(ec2-run-instances \
      --key $keypair \
      --region $region \
      --availability-zone $zone \
      --instance-type $type \
      $amiid |
      egrep ^INSTANCE | cut -f2)
    echo "newinstanceid=$newinstanceid"
    
  3. Wait until the new instance is running and then stop (not terminate) the new instance and detach the EBS boot volume from it. Delete the volume as it has nothing of importance, having just been created.

    ec2-stop-instances --region $region $newinstanceid
    newebsroot=$(ec2-describe-instances --region $region $newinstanceid |
      grep ^BLOCKDEVICE | grep /dev/sda1 | cut -f3)
    ec2-detach-volume --force --region $region $newebsroot
    ec2-delete-volume --region $region $newebsroot
    
  4. Detach the old (valuable) EBS root volume from the old (broken) instance.

    oldebsroot=$(ec2-describe-instances --region $region $oldinstanceid |
      grep ^BLOCKDEVICE | grep /dev/sda1 | cut -f3)
    ec2-detach-volume --force --region $region $oldebsroot
    
  5. Attach the old (valuable) EBS root volume to the new (stopped) instance.

    ec2-attach-volume \
      --region $region \
      -d /dev/sda1 \
      -i $newinstanceid \
      $oldebsroot
    

    If you had multiple EBS volumes attached to the old instance, you would move each one over in a similar manner.

  6. Restart the new instance which is now going to boot with the original volume.

    ec2-start-instances --region $region $newinstanceid
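
Steps 3 and 6 assume that each state change has finished before the next command runs, but the commands as shown do not wait. A minimal sketch of a polling helper follows. It assumes the instance state is the sixth tab-separated field of the INSTANCE line printed by ec2-describe-instances (the same output the setup loop parses for the host name). The function names instance_state and wait_for_state are hypothetical, not part of the EC2 tools.

```shell
# Hypothetical helpers; assume $region is set as in the setup section.

# Print the state column (assumed to be field 6 of the INSTANCE line)
# from ec2-describe-instances output supplied on stdin.
instance_state() {
  grep '^INSTANCE' | cut -f6
}

# Poll once a second until instance $1 reports state $2.
wait_for_state() {
  while test "$(ec2-describe-instances --region "$region" "$1" |
    instance_state)" != "$2"
  do
    printf .
    sleep 1
  done
  echo " $1 is $2"
}

# Usage (not run here):
#   wait_for_state $newinstanceid running   # before stopping it in step 3
#   wait_for_state $newinstanceid stopped   # before detaching its root volume
```

If the old instance has really failed, it may never reach the stopped state, so it may be wise not to wait on it indefinitely.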
    

Voila! You have moved your server from an old, perhaps broken instance to new (or at least different) hardware keeping the same file system, and it took only a few minutes! If you’d like, ssh to the new instance and make sure that your valuable information is still there.
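
The check suggested above can be scripted. This is only a sketch: it reuses the host lookup from the setup section and assumes README.txt was written to the ubuntu user's home directory as in the demo. The function name check_readme is hypothetical.

```shell
# Hypothetical helper: look up the public host name of instance $1
# (field 4 of the INSTANCE line, as in the setup loop) and print the
# demo note that was saved on the original root volume.
# Assumes $region and $sshkey are set as in the setup section.
check_readme() {
  host=$(ec2-describe-instances --region "$region" "$1" |
    grep '^INSTANCE' | cut -f4)
  echo "host=$host"
  ssh -i "$sshkey" "ubuntu@$host" cat README.txt
}

# Usage (not run here):
#   check_readme $newinstanceid
# In the demo, the note written during setup was "save the volume".
```

The new instance gets a different host name from the old one, so ssh will likely ask you to confirm the host key the first time you connect.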

If you had an Elastic IP address associated with the old instance, you would move it to the new instance.
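
A sketch of moving the address with the same command line tools, assuming you know the Elastic IP in question. The function name move_elastic_ip is hypothetical and the address in the usage line is a placeholder.

```shell
# Hypothetical helper: move Elastic IP $1 to instance $2.
# Assumes $region is set as in the setup section.
move_elastic_ip() {
  # Release the address from the old instance first. Failure is
  # ignored in case the address is already disassociated.
  ec2-disassociate-address --region "$region" "$1" || true
  # Point the address at the new instance.
  ec2-associate-address --region "$region" -i "$2" "$1"
}

# Usage (not run here; 203.0.113.10 is a placeholder address):
#   move_elastic_ip 203.0.113.10 $newinstanceid
```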

Cleanup

You may terminate the old instance if you are comfortable that you won’t need it any more. If you were following this demo as an exercise, you should also terminate the new instance. Since you manually attached the old volume to the new instance yourself, it will not be deleted automatically when the instance is terminated. You can modify the instance attributes to change the delete-on-termination flag for the volume or simply delete it manually.

# BEWARE! Don't copy these blindly, but think about what you should do
ec2-terminate-instances --region $region $oldinstanceid
ec2-terminate-instances --region $region $newinstanceid
ec2-delete-volume --region $region $oldebsroot
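
As an alternative to deleting the volume manually, the delete-on-termination flag mentioned above can be set before you terminate the new instance. This is a sketch under one assumption you should verify against your version of the tools: that --block-device-mapping accepts the form device=:flag with the volume id left empty. The function name delete_root_on_termination is hypothetical.

```shell
# Hypothetical helper: mark the root volume (/dev/sda1) of instance $1
# to be deleted automatically when the instance is terminated.
# Assumption: the mapping syntax is device=[volume-id]:true|false.
# Assumes $region is set as in the setup section.
delete_root_on_termination() {
  ec2-modify-instance-attribute --region "$region" \
    --block-device-mapping "/dev/sda1=:true" "$1"
}

# Usage (not run here):
#   delete_root_on_termination $newinstanceid
```

The same BEWARE applies here: once the flag is set, terminating the instance takes the volume with it.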

Tips

The above process can also be used when your instance is running fine, but you want to move to a different instance type (size) of the same architecture. For example, you could move from m1.small up to c1.medium, or from m2.4xlarge down to c1.xlarge. Update: I wasn’t thinking clearly when I wrote that last sentence. It is possible to change the instance type much more easily: Simply stop the instance, use ec2-modify-instance-attribute, and start it up again. Read more about this method.
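
The stop, modify, start sequence from the update might look like the sketch below. It assumes the instance state is the sixth tab-separated field of the INSTANCE line printed by ec2-describe-instances. The function name change_instance_type is hypothetical.

```shell
# Hypothetical helper: change instance $1 to instance type $2.
# Assumes $region is set as in the setup section.
change_instance_type() {
  ec2-stop-instances --region "$region" "$1"
  # The type can only be changed once the instance is fully stopped.
  # Assumption: the state is field 6 of the INSTANCE line.
  while test "$(ec2-describe-instances --region "$region" "$1" |
    grep '^INSTANCE' | cut -f6)" != "stopped"
  do
    sleep 1
  done
  ec2-modify-instance-attribute --region "$region" --instance-type "$2" "$1"
  ec2-start-instances --region "$region" "$1"
}

# Usage (not run here):
#   change_instance_type $oldinstanceid c1.medium
```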

You can also resize the root disk of a running EC2 instance using the same basic principle of swapping out an EBS root volume on a running instance.

[Update 2011-02-07: Point out simpler method to move to new hardware.]
[Update 2012-01-08: Link to article on changing instance type.]

15 Comments

As tested by Twitter user @schmidtcw, this technique works for Windows instances as well.

http://twitter.com/schmidtcw/statuses/13432697433

Instead of going to the trouble of creating a new instance and moving the boot volume, can't you just stop the old instance, change the instance type, and restart? It seems that should move you to new hardware even faster?

You should also note that both of these processes will lose all the data on your ephemeral disks.

-tom

Tom: Yes, I caught this myself and updated the document before I received your comment, but thanks for pointing it out. Your observation on the local storage is also a good one. I tend not to think about it much since most everything I do is on EBS volumes these days.

Impeccable timing. I received a notice yesterday about a hardware failure on one of my instances. Thanks to all your other tutorials and this helpful reminder, I was up and running on a different instance in no time.

The only catch for me was that I couldn't get the instance to stop, nor could I detach the volumes. Fortunately I was able to create snapshots and go from there.

jedwood: ec2-detach-volume --force usually gets the job done, but I have seen cases where hardware failure prevents detaching. If you go the snapshot route, it might be easier to just register the snapshot as a new AMI and run an instance of it. My approach above (transferring the volume without a snapshot) is optimized for time to recovery, but snapshot AMIs are a bit easier.

As per my earlier comment, I am running a Drupal site on an EBS Lucid instance. When I try to relaunch an AMI created from the original instance, nothing works: Apache is not running, nor is MySQL, etc. Could it be because it's still transferring data from S3?

I can't debug your issue from here, but if you post complete instructions on how to reproduce your problem to http://groups.google.com/group/ec2ubuntu somebody might have some ideas.

Hi Eric,

I did a dry run of your steps here. If I am able to stop the system, then I can detach all EBS volumes, including the root volume. But if I do not stop the system, I am only able to detach the non-root EBS volumes, which makes sense.

So, if I have a system which suddenly becomes inaccessible and I am not able to stop it, I should be able to detach the non-root EBS volumes. Then I can start a new system with my private root partition image and attach these non-root volumes to the new system.

In case I am not able to detach any of the volumes, then we need to recover from snapshots of the EBS volumes. The only issue here is that the system cannot be used for production, due to poor response time, until all the blocks are copied from the snapshots, which can take up to 4-5 hours for a 500GB volume.

Is there any better way to recover quickly if you are not able to salvage your EBS volumes from an inaccessible/hosed/failed system?


thanks,

anil

anil: If you set up the root EBS volume to persist after instance termination, then you might be able to terminate the instance and force detach the EBS root volume.

If your project requires rapid recovery times, you might want to keep a hot spare standing by, updating live from the master, and ready for a fast failover, perhaps even automated.

This doesn't seem to work for me. I'm able to get as far as attaching the old volume to the new instance - the instance even shows the root device /dev/sda1 as being attached to the volume. Booting the instance leaves it in "running" state, however sshd now rejects all connections. Sadface.

If you're moving to a t1.micro using the current Ubuntu 10.04 AMIs, you'll need to first apply a workaround as described in https://bugs.launchpad.net/bugs/634102

The fix for this bug is in the process of being released, but in the meantime we need to manually account for the fact that t1.micro instances do not have ephemeral storage.

This works perfectly. I was also able to use a modified version of these steps (simply creating a snapshot of the old EBS volume and then spawning a new volume in the proper zone) to "move" an instance from one availability zone to another (useful for the classic mistake of purchasing a reserved instance in a different AZ...)

Hi Eric, thank you for this post. I discovered that it's possible to do a lot of these steps directly in the AWS control panel (which might be easier for some users).

Boris: Yep. Most of the steps I describe using AWS command line tools can be done through the AWS console, third party web UIs, or through direct API calls in your favorite programming language. I prefer demonstrating with command line examples so folks can copy and paste, and because it is easier to automate procedures with commands or the corresponding direct API calls.

Hellmut:

Nice idea, thanks!
