On Amazon EC2, you can create a new EBS volume from an EBS snapshot using a command like this:
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --snapshot-id snap-d53484bc
This returns almost immediately with a volume ID, and you can attach that volume to an EC2 instance and mount it right away. The EBS documentation describes the magic behind the immediate availability of the data on the volume:
“New volumes created from existing Amazon S3 snapshots load lazily in the background. This means that once a volume is created from a snapshot, there is no need to wait for all of the data to transfer from Amazon S3 to your Amazon EBS volume before your attached instance can start accessing the volume and all of its data. If your instance accesses a piece of data which hasn’t yet been loaded, the volume will immediately download the requested data from Amazon S3, and then will continue loading the rest of the volume’s data in the background.”
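Because the volume can be attached as soon as it leaves the "creating" state, the create, wait, and attach steps are easy to script. Here is a minimal sketch as a shell function, assuming the AWS CLI is installed and configured. The function name `restore_volume_from_snapshot` is hypothetical, and the instance ID and device name in the usage line below are placeholders you would replace with your own:

```shell
#!/usr/bin/env bash
# Sketch: create an EBS volume from a snapshot, wait until it is
# "available", then attach it to an instance. Prints the new volume ID.
# All four arguments are supplied by the caller (placeholders in the usage).
restore_volume_from_snapshot() {
  local snapshot_id="$1" zone="$2" instance_id="$3" device="$4"
  local volume_id

  # create-volume returns quickly; --query/--output extract just the ID.
  volume_id=$(aws ec2 create-volume \
    --availability-zone "$zone" \
    --snapshot-id "$snapshot_id" \
    --query VolumeId \
    --output text) || return 1

  # The volume must reach the "available" state before it can be attached.
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # Attach it; the JSON response is discarded. The name the guest OS
  # shows may differ from the requested one (e.g. /dev/sdf vs /dev/xvdf).
  aws ec2 attach-volume \
    --volume-id "$volume_id" \
    --instance-id "$instance_id" \
    --device "$device" >/dev/null || return 1

  echo "$volume_id"
}
```

Usage might look like `restore_volume_from_snapshot snap-d53484bc us-east-1a i-0123456789abcdef0 /dev/sdf`, after which you mount the device as usual. Note that nothing here waits for the blocks themselves to be loaded; the volume is usable, but not yet fast.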
Lazy loading is a cool feature that lets you start using the volume quickly without waiting for the blocks to be completely populated. In practice, however, there is a period during which you experience high iowait whenever your application accesses disk blocks that still need to be loaded. This shows up as anything from mild to extreme slowness for minutes to hours after the EBS volume is created.
For some applications (like mine), the initial slowness is acceptable, knowing that the volume will eventually perform at normal EBS access speed once all blocks have been populated. As Clint Popetz pointed out on the ec2ubuntu group, other applications might need to know when the EBS volume has been populated and is ready for high-performance use.
Though there is no API status to poll and no trigger to notify you when all the blocks on the EBS volume have been populated, it occurred to me that there is a way to guarantee that you wait until the EBS volume is ready: simply request every block from the volume and throw the data away.
Here’s a simple command that reads all of the blocks on an EBS volume attached as /dev/xvdX or /dev/sdX (use whichever name the device appears under, substituting your device letter) and does nothing with them:
sudo dd if=/dev/xvdX of=/dev/null bs=10M
sudo dd if=/dev/sdX of=/dev/null bs=10M
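If you run this often, it may be worth wrapping the same `dd` read in a small function that refuses to run on a path it cannot read and reports how long the full pass took. This is only a sketch; `prewarm_device` is a hypothetical name, and it keeps the original 10 MiB block size:

```shell
#!/usr/bin/env bash
# Sketch: read every block of a device once and discard the data, so that
# lazily loaded EBS blocks are fetched before the application needs them.
prewarm_device() {
  local device="$1"
  local start

  # Bail out early on a wrong device name or missing permissions.
  if [ ! -r "$device" ]; then
    echo "cannot read $device (wrong name, or not running as root?)" >&2
    return 1
  fi

  start=$(date +%s)
  # Sequential 10 MiB reads; output goes to /dev/null, so nothing is written.
  dd if="$device" of=/dev/null bs=10M 2>/dev/null || return 1
  echo "read all of $device in $(( $(date +%s) - start ))s"
}
```

Since reading a raw device normally requires root, you would run this from a root shell (for example after `sudo -i`), as in `prewarm_device /dev/xvdf`.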
By the time this command is done, you know that all of the data blocks have been copied from the EBS snapshot in S3 to the EBS volume. At this point you should be able to start using the EBS volume in your application without the high iowait you would have experienced if you started earlier. Early reports from Clint are that this method works in practice.
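For an application that must not start until the volume is fully populated, the completion of `dd` itself can serve as the readiness signal. A minimal sketch, where `run_when_warm` is a hypothetical helper and the application command is whatever you pass after the device name:

```shell
#!/usr/bin/env bash
# Sketch: read the whole device, and only if that succeeds, run the
# remaining arguments as the application's start command.
run_when_warm() {
  local device="$1"
  shift

  # dd exits only after it has read the last block, so a successful exit
  # means every block has been requested from the volume at least once.
  if ! dd if="$device" of=/dev/null bs=10M 2>/dev/null; then
    echo "pre-warm of $device failed; not starting: $*" >&2
    return 1
  fi

  "$@"
}
```

For example, `run_when_warm /dev/xvdf /etc/init.d/mysql start` (a hypothetical service command) would hold off starting the database until the pass over the volume has finished.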
[Update 2012-04-02: We now have to support device names like sda1 and xvda1]
[Update 2019-08-07: New aws-cli command format]
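Because the same volume may show up as /dev/sda1 on one system and /dev/xvda1 on another, a script can check which name actually exists before running `dd`. The helper below is a hypothetical sketch that assumes the simple sdX to xvdX renaming; some kernels map device letters differently, so treat it as a starting point rather than a complete rule:

```shell
#!/usr/bin/env bash
# Sketch: given a name like "sdf" or "sda1", print whichever of
# /dev/sdf or /dev/xvdf (etc.) exists. The optional second argument
# overrides the /dev directory, which is handy for testing.
resolve_device() {
  local name="$1"
  local dir="${2:-/dev}"
  local candidate

  # Try the name as given first, then the xvd* spelling (sdf -> xvdf).
  for candidate in "$dir/$name" "$dir/xv${name#s}"; do
    if [ -e "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done

  echo "no device found for $name under $dir" >&2
  return 1
}
```

The result can then feed the read-everything command, as in `sudo dd if="$(resolve_device sdf)" of=/dev/null bs=10M`.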