Using RAID on EC2 EBS Volumes to Break the 1TB Barrier and Increase Performance

| 12 Comments

Amazon EC2 currently has a limit of 1,000 GB (1 TB) for EBS volumes (Elastic Block Store). It is possible to create file systems larger than this limit using RAID 0 across multiple EBS volumes. Using RAID 0 can also improve the performance of the file system reducing total IO wait as demonstrated in a number of published EBS performance tests.

The following instructions walk through one way to set up RAID 0 across multiple EBS volumes. Note that there is a limit on the size of a file system on 32-bit instances, but 64-bit instances can get unreasonably large. This test was run with 40 EBS volumes of 1,000 GB each for a total of 40,000 GB (40 TB) in the resulting file system.

Actual command line output showing the size of the RAID:

# df /vol
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0             41942906368      1312 41942905056   1% /vol

# df -h /vol
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0               40T  1.3M   40T   1% /vol

These commands can run in less than 10 minutes and this could probably be reduced further by parallelizing the creation and attaching of the EBS volumes.

Note that the default limit is 20 EBS volumes per EC2 account. You can request an increase from Amazon if you need more.

Caution: 40 TB of EBS storage on EC2 will cost $4,000 per month plus usage charges.

Instructions

Start a 64-bit instance (say, Ubuntu 8.04 Hardy from http://alestic.com). Use your own KEYPAIR:

ec2-run-instances   --key KEYPAIR   --instance-type c1.xlarge   --availability-zone us-east-1a   ami-0772946e

Configurable parameters (set on both local host and on EC2 instance):

instanceid=i-XXXXXXXX
volumes=40
size=1000
mountpoint=/vol

On the local host (with EC2 API tools installed)…

Create and attach EBS volumes:

devices=$(perl -e 'for$i("h".."k"){for$j("",1..15){print"/dev/sd$i$j\n"}}'|
           head -$volumes)
devicearray=($devices)
volumeids=
i=1
while [ $i -le $volumes ]; do
  volumeid=$(ec2-create-volume -z us-east-1a --size $size | cut -f2)
  echo "$i: created  $volumeid"
  device=${devicearray[$(($i-1))]}
  ec2-attach-volume -d $device -i $instanceid $volumeid
  volumeids="$volumeids $volumeid"
  let i=i+1
done
echo "volumeids='$volumeids'"

On the EC2 instance (after setting parameters as above)…

Install software:

sudo apt-get update &&
sudo apt-get install -y mdadm xfsprogs

Set up the RAID 0 device:

devices=$(perl -e 'for$i("h".."k"){for$j("",1..15){print"/dev/sd$i$j\n"}}'|
           head -$volumes)

yes | sudo mdadm   --create /dev/md0   --level 0   --metadata=1.1   --chunk 256   --raid-devices $volumes   $devices

echo DEVICE $devices       | sudo tee    /etc/mdadm.conf
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf

Create the file system (pick your preferred file system type)

sudo mkfs.xfs /dev/md0

Mount:

echo "/dev/md0 $mountpoint xfs noatime 0 0" | sudo tee -a /etc/fstab
sudo mkdir $mountpoint
sudo mount $mountpoint

Check it out:

df -h $mountpoint

When you’re done with it and want to destroy the data and stop paying for storage, tear it down:

sudo umount $mountpoint
sudo mdadm --stop /dev/md0

Terminate the instance:

sudo shutdown -h now

On the local host (with EC2 API tools installed)…

Detach and delete volumes:

for volumeid in $volumeids; do
  ec2-detach-volume $volumeid
done

for volumeid in $volumeids; do
  ec2-delete-volume $volumeid
done

Credits

This article was originally posted on the EC2 Ubuntu group.

Thanks to M. David Peterson for the basic mdadm instructions:

[Update 2012-01-21: Added —chunk 256 based on community recognized best practices.]

12 Comments

Lovely! I'm thinking on an incremental storage system using LVM2 (for adding storage capacity) and RAID (for IO optimization and fault tolerance).
Using resources Amazon provides, such as EBS, this storage system could be resized on-demand.
Have you ever experienced LVM and RAID implementations using EBS?
I'd love to share experiences on such implementations. :)

ramses: I've set up LVM on EBS as a test but not in production. With this approach, RAID isn't even needed as LVM can do the striping across EBS volumes.

To do a snapshot, can one just do "xfs_freeze -f /vol", then snap each EBS vol, and the unfreeze? Or does one need to run some other command?

Thanks,
Don

Don: Effectively, yes. You can read about EBS volumes and snapshots here:

http://ec2ebs-mysqlnotlong.com

I've also written a tool which can be used to generate consistent snapshots with RAID (especially when using XFS and optionally MySQL):

http://alestic.com/2009/09/ec2-consistent-snapshot

Tracking snapshots and reassembling the RAID array is left as an exercise for the reader and should be practiced before needed.

Hi Eric,

Thanks for all your useful articles!

I have posted a question in the AWS dev forum (http://developer.amazonwebservices.com/connect/thread.jspa?threadID=45444) regarding LVM over RAID setup on EC2 issues.

As it seems that you have been able to do it in a test environment I would like to know what I'm missing in order to make everything work after an image rebundling and rebooting.

Regards,
David

Is it possible to grow a RAID0 using EBS? My reading suggests you would have to set things up as a RAID1. I set-up a 2 TB RAID0 following the instructions here - is there anyway to grow this by adding additional 1 TB EBS vols?

basscakes:

That would be a question for a RAID forum. Since EBS volumes are just like hard drives to the RAID, you can do whatever is normally possible with your RAID software.

Hi Eric,

First of all, thanks for such a nice writeup and all the good stuff you have been giving back to community to help noobs like me to catch up to latest trends and tech.

Second, i tried a setup like this and after a reboot RAID would just disappear. I had discussed it with couple of people on ##aws and later someone else on ##aws asked you almost same question while i was away.

I have posted my research results at:
error404notfound.posterous.com/experience-with-raid-and-ephemeral-devices
If its any help, it would be a pleasure. Let me know if you figure out why this happens and what is a possible workaround for this.

Thanks

Hi Eric,

Have you or anyone else solved the reboot issue in Ubuntu Lucid? Seems to be an issue with UUIDs, but I cannot find a solution. On reboot, the instance is "stuck" with a console message and you are unable to ssh into the instance. Message is:

"the disk for/mnt/md0 is not ready yet or not present. Continue to wait; or Press S to skip mount or M for manual recover"

AItOawn3dGSacFn5XCITiDKWmwzrKAV5760yzyY:

What is the launchpad bug id for your reboot issue? If there isn't one, then it's a good bet nobody is working on it.

Hi Eric,

It did really shed light on how to scale EBS volumes, but my concern is about the case of EBS outages during RAID. Is it possible to take EBS snapshots of all the EBS volumes at same RAID state at that moment, then recover from the snapshots successfully...

Unni:

If you can quiesce and freeze the file system, flushing blocks to the RAID volumes, you can then snapshot all of the volumes at the same point in time. In theory, this should give you a consistent snapshot across volumes that could be re-assembled later.

My ec2-consistent-snapshot tool supports these actions and is used by some folks for creating consistent snapshots across RAID volumes:

http://alestic.com/2009/09/ec2-consistent-snapshot

I don't guarantee that this will work for your particular RAID setup, but it might be worth testing.

Leave a comment

Ubuntu AMIs

Ubuntu AMIs for EC2:


More Entries

When Are Your SSL Certificates Expiring on AWS?
If you uploaded SSL certificates to Amazon Web Services for ELB (Elastic Load Balancing) or CloudFront (CDN), then you will want to keep an eye on the expiration dates and…
Throw Away The Password To Your AWS Account
reduce the risk of losing control of your AWS account by not knowing the root account password As Amazon states, one of the best practices for using AWS is Don’t…
AWS Community Heroes Program
Amazon Web Services recently announced an AWS Community Heroes Program where they are starting to recognize publicly some of the many individuals around the world who contribute in so many…
EBS-SSD Boot AMIs For Ubuntu On Amazon EC2
With Amazon’s announcement that SSD is now available for EBS volumes, they have also declared this the recommended EBS volume type. The good folks at Canonical are now building Ubuntu…
EC2 create-image Does Not Fully "Stop" The Instance
The EC2 create-image API/command/console action is a convenient trigger to create an AMI from a running (or stopped) EBS boot instance. It takes a snapshot of the instance’s EBS volume(s)…
Finding the Region for an AWS Resource ID
use concurrent AWS command line requests to search the world for your instance, image, volume, snapshot, … Background Amazon EC2 and many other AWS services are divided up into various…
Changing The Default "ubuntu" Username On New EC2 Instances
configure your own ssh username in user-data The official Ubuntu AMIs create a default user with the username ubuntu which is used for the initial ssh access, i.e.: ssh ubuntu@<HOST>…
Default ssh Usernames For Connecting To EC2 Instances
Each AMI publisher on EC2 decides what user (or users) should have ssh access enabled by default and what ssh credentials should allow you to gain access as that user.…
New c3.* Instance Types on Amazon EC2 - Nice!
Worth switching. Amazon shared that the new c3.* instance types have been in high demand on EC2 since they were released. I finally had a minute to take a look…
Query EC2 Account Limits with AWS API
Here’s a useful tip mentioned in one of the sessions at AWS re:Invent this year. There is a little known API call that lets you query some of the EC2…
Using aws-cli --query Option To Simplify Output
My favorite session at AWS re:Invent was James Saryerwinnie’s clear, concise, and informative tour of the aws-cli (command line interface), which according to GitHub logs he is enhancing like crazy.…
Reset S3 Object Timestamp for Bucket Lifecycle Expiration
use aws-cli to extend expiration and restart the delete or archive countdown on objects in an S3 bucket Background S3 buckets allow you to specify lifecycle rules that tell AWS…
Installing aws-cli, the New AWS Command Line Tool
consistent control over more AWS services with aws-cli, a single, powerful command line tool from Amazon Readers of this tech blog know that I am a fan of the power…
Using An AWS CloudFormation Stack To Allow "-" Instead Of "+" In Gmail Email Addresses
Launch a CloudFormation template to set up a stack of AWS resources to fill a simple need: Supporting Gmail addresses with “-” instead of “+” separating the user name from…
New Options In ec2-expire-snapshots v0.11
The ec2-expire-snapshots program can be used to expire EBS snapshots in Amazon EC2 on a regular schedule that you define. It can be used as a companion to ec2-consistent-snapshot or…
Replacing a CloudFront Distribution to "Invalidate" All Objects
I was chatting with Kevin Boyd (aka Beryllium) on the ##aws Freenode IRC channel about the challenge of invalidating a large number of CloudFront objects (35,000) due to a problem…
Email Alerts for AWS Billing Alarms
using CloudWatch and SNS to send yourself email messages when AWS costs accrue past limits you define The Amazon documentation describes how to use the AWS console to monitor your…
Cost of Transitioning S3 Objects to Glacier
how I was surprised by a large AWS charge and how to calculate the break-even point Glacier Archival of S3 Objects Amazon recently introduced a fantastic new feature where S3…
Running Ubuntu on Amazon EC2 in Sydney, Australia
Amazon has announced a new AWS region in Sydney, Australia with the name ap-southeast-2. The official Ubuntu AMI lookup pages (1, 2) don’t seem to be showing the new location…
Save Money by Giving Away Unused Heavy Utilization Reserved Instances
You may be able to save on future EC2 expenses by selling an unused Reserved Instance for less than its true value or even $0.01, provided it is in the…
Installing AWS Command Line Tools from Amazon Downloads
This article describes how to install the old generation of AWS command line tools. For the most part, these have been replaced with the new AWS cli that is…
Convert Running EC2 Instance to EBS-Optimized Instance with Provisioned IOPS EBS Volumes
Amazon just announced two related features for getting super-fast, consistent performance with EBS volumes: (1) Provisioned IOPS EBS volumes, and (2) EBS-Optimized Instances. Starting new instances and EBS volumes with…
Which EC2 Availability Zone is Affected by an Outage?
Did you know that Amazon includes status messages about the health of availability zones in the output of the ec2-describe-availability-zones command, the associated API call, and the AWS console? Right…
Installing AWS Command Line Tools Using Ubuntu Packages
See also: Installing AWS Command Line Tools from Amazon Downloads Here are the steps for installing the AWS command line tools that are currently available as Ubuntu packages. These include:…
Ubuntu Developer Summit, May 2012 (Oakland)
I will be attending the Ubuntu Developer Summit (UDS) next week in Oakland, CA. ┬áThis event brings people from around the world together in one place every six months to…
Uploading Known ssh Host Key in EC2 user-data Script
The ssh protocol uses two different keys to keep you secure: The user ssh key is the one we normally think of. This authenticates us to the remote host, proving…
Seeding Torrents with Amazon S3 and s3cmd on Ubuntu
Amazon Web Services is such a huge, complex service with so many products and features that sometimes very simple but powerful features fall through the cracks when you’re reading the…
CloudCamp
There are a number of CloudCamp events coming up in cities around the world. These are free events, organized around the various concepts, technologies, and services that fall under the…
Use the Same Architecture (64-bit) on All EC2 Instance Types
A few hours ago, Amazon AWS announced that all EC2 instance types can now run 64-bit AMIs. Though t1.micro, m1.small, and c1.medium will continue to also support 32-bit AMIs, it…
ec2-consistent-snapshot on GitHub and v0.43 Released
The source for ec2-conssitent-snapshot has historically been available here: ec2-consistent-snapshot on Launchpad.net using Bazaar For your convenience, it is now also available here: ec2-consistent-snapshot on GitHub using Git You are…