Upper Limits on Number of Amazon EC2 Instances by Region

[Update: As predicted, these numbers are already out of date and Amazon has added more public IP address ranges for use by EC2 in various regions.]

Each standard Amazon EC2 instance has a public IP address. This is true for normal instances when they are first brought up and for instances which have had elastic IP addresses assigned to them. Your EC2 instance still has a public IP address even if you have configured the security group so that it cannot be contacted from the Internet, which happens to be the default setting for security groups.

Amazon has made public the EC2 IP address ranges that may be in use for each region.

From this information, we can calculate the absolute upper limit on the number of concurrently running standard EC2 instances that could possibly be supported in each region. At the time of this writing, I calculate the hard upper limits to be:
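
The arithmetic behind those limits is straightforward: a CIDR block with prefix length n contains 2^(32-n) addresses, so summing over the blocks published for a region gives that region's ceiling. Here is a minimal sketch in bash; the CIDR blocks below are illustrative placeholders, not Amazon's actual published ranges:

    #!/bin/bash
    # Sum the address space across the CIDR blocks published for one region.
    # These blocks are illustrative placeholders; substitute the ranges
    # Amazon actually publishes for the region you care about.
    total=0
    for cidr in 10.0.0.0/18 10.1.0.0/17 10.2.0.0/16; do
        prefix=${cidr#*/}
        total=$((total + 2 ** (32 - prefix)))
    done
    echo "Hard upper limit on public IPs (and thus standard instances): $total"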

Creating Public AMIs Securely for EC2

Last week, Amazon published a tutorial about best practices for creating public AMIs for use on EC2:

How To Share and Use Public AMIs in A Secure Manner

Though the general principles put forth in the tutorial are good, some of its specific recommendations for accomplishing those principles are flawed. (Comments here relate to the article update from June 7, 2011 3:45 AM GMT.)

The primary message of the article is that you should not publish private information on a public AMI. Excellent advice!

Unfortunately, the article seems to recommend, or at least to assume, that you are building the public AMI by taking a snapshot of a running instance. Though this method is an easy way to build an AMI and is fine for private AMIs, it is a dangerous approach for public AMIs because of how difficult it is to identify all the private information on a running system and to clear it in such a way that it does not leak into the public AMI.
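
If you do build a public AMI from a running system anyway, you need at minimum to scrub credentials and history before bundling. Here is a minimal, non-exhaustive sketch; the paths are typical for Ubuntu and may differ on other distributions:

    # Non-exhaustive cleanup before bundling a running system as a public AMI.
    # Run as root; paths are typical for Ubuntu and may vary by distribution.
    rm -f /root/.ssh/authorized_keys /home/*/.ssh/authorized_keys
    rm -f /root/.bash_history /home/*/.bash_history
    # Many EC2 images regenerate ssh host keys on first boot; verify yours does
    rm -f /etc/ssh/ssh_host_*_key /etc/ssh/ssh_host_*_key.pub
    # Hunt for any remaining private keys and review the hits by hand
    find / -name "*.pem" -o -name "*id_rsa" -o -name "*id_dsa" 2>/dev/null

Even with a checklist like this, it is hard to be confident you have found every piece of private information, which is exactly why building public AMIs from a clean, scripted installation is the safer path.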

My Experience With the EC2 Judgment Day Outage

Amazon designs availability zones so that it is extremely unlikely that a single failure will take out multiple zones at once. Unfortunately, whatever happened last night seemed to cause problems in all us-east-1 zones for one of my accounts.

Of the 20+ full-time EC2 instances that I am responsible for (including my startup and my personal servers), only one instance was affected. As it turns out, it was the one that hosts Alestic.com along with some other services that are important to me personally.

Here are some steps I took in response to the outage:

Opinion: EC2 Outage Was Not an Outage

The Twitter wires are aflame with cute quotes about how lightning from a “cloud” took down Amazon’s EC2 “cloud” service. Snarky snippets sell well on Twitter, even without any research into or understanding of the facts behind the issues involved.

Since “the press” is now asking for my opinion, I figured I’d jot down a quick overview of my thoughts on this non-event, which has been blown out of proportion. Sorry, the press: we’re all the press now (for better or for worse), but you’re welcome to extract quotes with proper attribution :)

I don’t consider lightning taking out some racks of EC2 servers to be an “outage”, even though this took down some customers' running instances. EC2 and the rest of AWS were completely functional. If one or more EC2 instances fail for internal or external reasons, any customer who has built a reasonable elastic architecture on EC2 should be able, automatically or even manually, to fire up new servers and fail over with very little downtime, if any.

This was a “failure” or an “error” or a “fault”, not an outage. Architectures built on top of AWS should expect and plan for failures; that’s simply the way the service was designed. AWS provides extensive resources for detecting and dealing with failures big and small, and for building highly redundant, fault tolerant, distributed systems at a global level rather than at the level of an individual API call or EC2 instance.

At a normal ISP, if your server goes down, it is a serious problem. You have to wait for the ISP to bring it back up, or drive over to the data center and work on it yourself. With EC2, servers are fairly disposable. When an EC2 server goes down (which is still rare) you have at your fingertips thousands of other servers in a half dozen data centers in multiple countries.

A well designed architecture built on top of EC2 keeps important information (databases, log files, etc.) in easy-to-manage, persistent, redundant data stores which can be snapshotted, duplicated, detached, and attached to new servers. EC2 provides advanced data center capabilities that few companies could build on their own.
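
As a concrete sketch using the EC2 API tools (all IDs below are hypothetical), moving an EBS volume's data from a failed server to a replacement is only a few commands:

    # Snapshot the volume that was attached to the failed server
    ec2-create-snapshot vol-11111111
    # Create a new volume from that snapshot in the replacement instance's zone
    ec2-create-volume --snapshot snap-22222222 -z us-east-1a
    # Attach the new volume to the replacement instance
    ec2-attach-volume vol-33333333 -i i-44444444 -d /dev/sdf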

Yes, it can take some time and effort to learn this new way of working with on-demand, self-service, pay-as-you-go hardware infrastructure, and sometimes the lessons are learned the hard way, but you’ll be better off in the end.

Amazon Launches CloudWatch Monitoring Service for EC2

A few hours ago, Amazon launched a monitoring service for EC2 instances which they are calling CloudWatch. The service costs 1.5 cents per hour per EC2 instance (of any size), which comes out to $10.95 per month for an instance running 24x7.

The concurrently announced Load Balancing and Auto Scaling services are powerful, but I’m not so sure that CloudWatch is going to be useful by itself.

My initial impression on using CloudWatch is that it is hard enough to set up and use that most folks are going to get lost figuring out how to get regular, useful information out of it. Some of this could be alleviated by improved documentation, but I still think the direct, raw usage has a small target audience.

Most users on EC2 should be able to get by with free monitoring packages like Munin. Since Munin runs on the instance itself, it has access to many more metrics than CloudWatch. Plus, it provides pretty graphs which are much easier on the eye than the raw CloudWatch output.

Munin is also trivial to set up on Ubuntu. It takes one command:

sudo apt-get install munin munin-node apache2

Wait 10 minutes for it to start collecting data, then point your browser at http://HOSTNAME/munin

There is a bit more work to do if you want to collect Munin data from multiple servers in a central location or to create summary charts combining metrics, but you can get a lot of value from just the above.
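
For the central-collection case, the general shape looks like the sketch below, assuming a Munin master at 10.0.0.1 polling a node at 10.0.0.2 (the addresses and hostname are placeholders):

    # On each monitored server: install the node and allow the master to poll it
    sudo apt-get install munin-node
    # In /etc/munin/munin-node.conf, permit the master's IP (regex syntax):
    #   allow ^10\.0\.0\.1$
    sudo /etc/init.d/munin-node restart

    # On the central server, list each node in /etc/munin/munin.conf:
    #   [web01.example.com]
    #       address 10.0.0.2
    #       use_node_name yes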

Reasons you might end up using CloudWatch include:

  1. You are using Amazon’s new EC2 Auto Scaling feature which requires CloudWatch. In this case, you shouldn’t have to worry about the gory details since Auto Scaling will take care of the monitoring for you.

  2. You need access to accurate network and disk I/O values, measured the same way Amazon measures them for billing. E.g., you might be running sets of instances for clients and want to pass the EC2 charges on to them. (See the sketch after this list.)

  3. You are using a lot of EC2 instances in a large organization and have the time and expertise to implement data collection with CloudWatch for presenting in your own internal reports.

  4. You are creating some tools to help other people use the CloudWatch service more easily and with pretty graphs.
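
On the raw-usage points above, here is the general shape of pulling one metric with Amazon's CloudWatch command line tools; the instance ID and timestamps are placeholders, and you should check the tools' documentation for the exact options:

    # Fetch average CPU utilization for one instance over a six hour window
    # (instance id and timestamps are placeholders)
    mon-get-stats CPUUtilization \
        --namespace "AWS/EC2" \
        --statistics "Average" \
        --dimensions "InstanceId=i-12345678" \
        --period 300 \
        --start-time 2009-05-18T00:00:00 \
        --end-time 2009-05-18T06:00:00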

Back on point 4, I think there is an interesting opportunity for somebody to write Munin plugins for CloudWatch. It looks like the monitoring data is available on a near-real-time basis, and with a bit of state-keeping it should be possible to get graphs which closely represent Amazon’s monitoring records.

I’ve posted some of my feedback from testing CloudWatch on the EC2 forum.

If you’ve had a chance to check out CloudWatch, what is your opinion?