Solving: "I can't connect to my server on Amazon EC2"


Help! I can’t connect to my EC2 instance!
Woah! My box just stopped talking to me!
Hey! I can’t access the server!

These and other variations on the connectivity theme are some of the most common problems raised on the Amazon EC2 forum.

The EC2 community and Amazon employees do a valiant job helping users track down and solve these issues despite the fact that (1) there are hundreds of reasons why a server or service might not be accessible, (2) connectivity is one of the harder problems to diagnose, especially without being hands-on, and (3) users complaining about a problem generally don’t provide the clues necessary to solve the issue (because the ones who knew what those clues were probably solved it themselves and didn’t post).

This article is an attempt to provide some general assistance to folks who are experiencing connectivity issues with Amazon EC2. Please post additional help in the comments; this document will be updated over time.

Questions

First off, you should understand that it’s ok to ask for help. When you do, though, you should provide as many details as possible about what you are trying to do and what results you are seeing. It also helps if you drop some clues about your level of expertise. A person using Linux for the first time is likely to make different mistakes on EC2 than a person who is having problems connecting to a custom AMI they built from scratch.

The more specific you can be about your problem and the more information you can provide, the more likely somebody will be able to help. Here are some common questions which are important to have answered for connectivity problems on EC2:

  1. When you say you “can’t connect” what application are you trying to use and on what port? For example: “ssh to port 22” or “accessing port 80 with Firefox”. If you don’t know what a port is, then provide as many details as possible about the application you’re using and what command or steps you are taking to initiate the connection.

  2. What, specifically, happens when you try to connect? Does it hang for a long time and eventually time out? Do you get an error message? What is the exact text you see? (Copy and paste, don’t summarize.)

  3. What is the AMI id which the instance is running? If it is not a public AMI, then what is the AMI id of the public AMI it is based on?

  4. What Linux distro and release is the instance running? E.g., Ubuntu 9.04 Jaunty, Debian etch, Fedora 8, CentOS 5.

  5. What is the instance id of the instance you are trying to contact? Providing this can let Amazon employees take a look at the internals of what might be going on.

  6. What are the internal and external IP addresses and/or host names for the instance you are trying to reach? Providing this information is, in effect, giving permission to the community to try to contact your server over the network so that they can gather information about connectivity and help solve your problem.

  7. Have you ever been able to contact this instance in the past? How recently?

  8. How long has the instance been running?

  9. Have you ever been able to contact another instance of the same AMI?

  10. Is there a difference in connectivity when you try from another EC2 instance instead of from the Internet?

  11. What were you doing when the connectivity stopped?

  12. What is the console output of the instance? You can get this through an API client or a command like:

    ec2-get-console-output INSTANCE_ID
    

There are so many reasons that connectivity might be down to a remote server or service that it would be impossible to get a significant percentage of them listed in one article. I’ll start by listing some of the more common problems here; please add to the comments as you run into or remember others.

You

By far the most common cause of the problem is you (the person experiencing the problem) and that’s ok. We all make mistakes. It’s important, though, that you start with this attitude: open to the possibilities that you typed something wrong, forgot a step, or didn’t quite understand the complex instructions. Ninety percent of the people reading this paragraph think I’m talking to somebody else; oddly, they also think this sentence is not about them.

Here are some of the most common reasons folks (including me) can’t connect to their Amazon EC2 instance. Really.

  • You’re not connecting to the right instance or to the instance you think you’re trying to connect to. Servers on EC2 are identified by opaque instance ids like i-ae1df2c6 and opaque host names like ec2-75-101-182-20.compute-1.amazonaws.com. It’s easy for anybody to get these confused or mistype them.

  • The instance you’re trying to connect to has not completed the boot process yet. Though some AMIs are ready to connect in under a minute, others can take 10+ minutes.

  • The instance you’re trying to connect to has been terminated. (Did you just shut down what you thought was a different instance?)

  • The service you are trying to reach on the instance is not running on that instance.

  • The service you are trying to reach on the instance is not listening on that port or that network interface.

  • You did not open the port in the security group.

  • You did not start the instance with the correct security group.

  • You did not start the instance with the same ssh keypair as you are using to access it.

  • Your local firewall is preventing you from getting out to that port on any server outside your network. Talk to your local network administrators.

  • Your firewall on the instance is preventing access to the service. Try shutting down iptables temporarily to see if that helps.

You “experts” laugh when you read these, but if you’re having trouble reaching a server, I recommend you go through each one carefully and double check that your assumptions are correct and the world is really as you remember it. Remember: We all make mistakes. A lot of these come from personal experience.

If you’re not quite sure what terms like “security group” and “keypair” mean in the EC2 context, I recommend going back and reading some introductory material. These are important concepts for beginners.
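For the item above about a service not listening on the expected port or network interface, here is a minimal sketch of one way to check from a shell on the instance. It assumes the net-tools netstat command is installed and that its -tln output shows the local address in the fourth column; the function name listening_on is made up for this example.

```shell
# Hypothetical helper: reads `netstat -tln` output on stdin and prints the
# local address of every TCP listener bound to the given port.
listening_on() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, parts, ":")      # the port is the last ":" field
            if (parts[n] == port) print $4
        }'
}

# Usage on the instance (not run here):
#   netstat -tln | listening_on 3306
```

A result such as 127.0.0.1:3306 means the service only accepts connections from the instance itself, while 0.0.0.0:3306 means it is listening on every IPv4 interface. No output at all means nothing is listening on that port.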

ssh

ssh connectivity problems generally fall into two major buckets:

  1. ssh is not accessible, or

  2. ssh is rejecting the connection due to a failure to authenticate or authorize

You can find out which type of problem you have by using a command like:

    telnet HOSTNAME 22

If this connects, then ssh is running and accepting connections on port 22. (Hit [Enter] a couple times to disconnect from the telnet session). If you don’t connect, then it’s important to note if the attempt basically hung forever or if you got a “Connection refused” type of message immediately. (Hit [Ctrl]-[C] to stop the telnet command.)

If the connection attempt hangs, then there might be a problem with the security group, iptables, or your instance might not be running at that IP address.

If the telnet connection attempt gets rejected, then there might be a problem with iptables, ssh configuration, ssh not running on the instance, or perhaps it’s listening on a different port if the admin likes to configure things a bit more securely. The console output can be helpful in determining if sshd was started at boot.
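The three telnet outcomes above (connects, hangs, gets rejected) can also be told apart in a script. This is only a sketch: it assumes the machine you are connecting from has a bash built with /dev/tcp support and the coreutils timeout command, and probe_port is a name invented for the example.

```shell
# Hypothetical helper: classify a TCP connection attempt the same way the
# manual telnet test does. Prints one of: open, timeout, refused.
probe_port() {
    host=$1
    port=$2
    wait_secs=${3:-10}
    # bash opens a TCP connection when /dev/tcp/HOST/PORT is redirected;
    # timeout(1) kills the attempt and exits with 124 if it hangs too long.
    timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
    case $? in
        0)   echo open ;;       # something accepted the connection
        124) echo timeout ;;    # hung: security group, iptables, wrong address
        *)   echo refused ;;    # failed at once (also covers DNS lookup errors)
    esac
}

# Usage (not run here):
#   probe_port HOSTNAME 22
```

The default of 10 seconds is arbitrary. A slow network can need longer before you conclude that the attempt is really hanging.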

If you can get connected to the ssh port with telnet, then you need to start debugging why ssh is not letting you in. The most important information can be gathered by running the ssh connection attempt in verbose (-v) mode:

    ssh -v -i KEYPAIR.pem USERNAME@HOSTNAME

The complete output of this command can be very helpful to post when asking for help.

The most common problems with ssh relate to:

  • Forgetting to specify -i KEYPAIR.pem in the ssh command

  • Not starting the instance specifying a keypair

  • Using a different keypair than the one which was used to start the instance

  • Not ssh’ing with the correct username. Most EC2 images require a first connection with root@.... but images published by Canonical require a first connection with ubuntu@....

  • Not having the correct ownership or mode on the .ssh directory or authorized_keys file.

  • Not having the correct Allow* or *Authentication settings in /etc/ssh/sshd_config
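For the ownership and mode item, here is a minimal sketch of resetting the permissions once you have some way onto the instance. It assumes the conventional OpenSSH layout (a .ssh directory in the login user's home containing authorized_keys) and the commonly used modes 700 and 600; fix_ssh_perms is a hypothetical name.

```shell
# Hypothetical helper: tighten the modes on an account's .ssh directory so
# that only the owner can read or write the authorized_keys file inside it.
fix_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    chmod 700 "$ssh_dir" || return 1                    # owner-only directory
    chmod 600 "$ssh_dir/authorized_keys" || return 1    # owner-only key file
    ls -ld "$ssh_dir" "$ssh_dir/authorized_keys"        # show the result
}

# Usage on the instance (not run here):
#   fix_ssh_perms /home/ubuntu/.ssh
```

The ls output at the end also shows the owner of each path, which should be the user you are trying to log in as.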

Apache

Web servers are much easier to connect to than other applications because there is generally no authentication and authorization involved to get a basic web page. If you can’t reach your web server on EC2, then it’s generally one of the simple problems described above like using the wrong IP address, trying to reach a terminated instance, or not having the web port opened in the security group.

MySQL

The most common problem specific to MySQL connectivity on EC2 is the fact that MySQL is configured securely by default to not allow access by remote hosts. If you need to allow a connection from your other instances running in EC2, then edit /etc/mysql/my.cnf and replace this line:

    bind-address            = 127.0.0.1

with

    bind-address            = 0.0.0.0

and restart the mysqld server.
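The same edit can be scripted. The following is a sketch rather than a procedure tested on every distro: it assumes the bind-address line is present and uncommented, as in the example, and set_bind_address is a name made up here.

```shell
# Hypothetical helper: rewrite the value of the bind-address line in a
# MySQL config file, leaving every other line untouched.
set_bind_address() {
    conf=$1
    addr=$2
    tmp=$(mktemp) || return 1
    # Keep the "bind-address =" prefix and its spacing; replace only the value.
    if sed "s/^\([[:space:]]*bind-address[[:space:]]*=[[:space:]]*\).*/\1$addr/" "$conf" > "$tmp"
    then
        cat "$tmp" > "$conf"    # overwrite in place, keeping owner and mode
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage as root on the instance (not run here); keep a backup first:
#   cp /etc/mysql/my.cnf /etc/mysql/my.cnf.bak
#   set_bind_address /etc/mysql/my.cnf 0.0.0.0
```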

IMPORTANT! You should not open the MySQL port in the EC2 security group. You only want your own EC2 instances to connect to the database and the default security group allows your EC2 instances to connect to any port on your other EC2 instances. If you open up the port to the public, then your database will be attacked by the Internet at large.

If you need to talk to your MySQL database running on EC2 from a server running outside EC2, then do it over a secure channel like an ssh tunnel or openvpn. You don’t need the MySQL port open in the security group to do this. The MySQL protocol is not by itself encrypted and your usernames and passwords would be sent in the clear for anybody else to intercept if you didn’t talk over a secure channel.
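Here is a minimal sketch of the ssh tunnel approach. It assumes MySQL listens on its default port 3306 on the instance and that local port 3307 is free on your side. The function name and the SSH override variable are inventions for this example; -i, -N, and -L are standard OpenSSH client options.

```shell
# Hypothetical helper: forward a local port to MySQL on the instance over ssh.
# Set SSH=echo to print the arguments instead of connecting (dry run).
open_mysql_tunnel() {
    keyfile=$1
    dest=$2                  # e.g. USERNAME@HOSTNAME
    local_port=${3:-3307}
    # -N: run no remote command, just forward.
    # -L: listen on local_port and relay to 127.0.0.1:3306 as seen from the instance.
    "${SSH:-ssh}" -i "$keyfile" -N -L "$local_port:127.0.0.1:3306" "$dest"
}

# Usage (not run here); leave it running, then in another terminal:
#   open_mysql_tunnel KEYPAIR.pem USERNAME@HOSTNAME
#   mysql -h 127.0.0.1 -P 3307 -u DBUSER -p
```

Because the forwarded connection arrives at MySQL from 127.0.0.1 on the instance, this should work even with the default bind-address, and only the ssh port needs to be open in the security group.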

Custom AMIs

If you are building your own custom AMIs from scratch, then there are a number of complicated barriers to getting network and ssh connectivity working. Unfortunately it is nearly impossible to debug these problems since you don’t have access to the machine to see what went wrong. Console output is your only friend in these cases.

Here are some examples of odd things which others in the EC2 community have run into and solved:

  • Make sure you start networking on instance boot. It should come up with DHCP on eth0.

  • Make sure your Linux distro does not save the MAC address somewhere, preventing the network from functioning in the next instance. Ubuntu stores this in the /etc/udev/rules.d/70-persistent-net.rules file and Debian stores this in the /etc/udev/rules.d/z25_persistent-net.rules file.

  • Make sure your image downloads the ssh keypair and installs it in authorized_keys.

  • Make sure you have the right devices created and file systems mounted.

  • Make sure you’re using a udev lower than v144 as higher versions are incompatible with Amazon’s 2.6.21 kernel.

  • Make sure you’re using the right libc6 and related configurations, including /lib/tls.
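For the saved MAC address item, one approach is to delete the persistent-net rules files from the image before bundling it. This sketch assumes the image's root filesystem is mounted at a directory you pass in (use / when cleaning a running instance before rebundling); the function name is hypothetical and the two file paths are the ones listed above.

```shell
# Hypothetical helper: remove udev's saved network-interface rules from an
# image root so the next instance does not inherit the old MAC address.
clean_persistent_net_rules() {
    root=${1%/}    # strip one trailing slash, so "/" becomes ""
    rm -f "$root/etc/udev/rules.d/70-persistent-net.rules" \
          "$root/etc/udev/rules.d/z25_persistent-net.rules"
}

# Usage as root before bundling (not run here):
#   clean_persistent_net_rules /mnt/image
```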

Amazon

I realize this was your first thought, but it’s such a rare cause, I’ve put it here at the end. Sometimes there are problems with Amazon EC2. The hardware running your instance may fail or the networks might have temporary glitches. There are a couple different classes of problems here:

  1. Small scale problems local to the hardware running your instance. Though these are rare for any single instance, they are happening all the time for some customer somewhere given that AWS has hundreds of thousands of customers. Amazon often sends you an email when they notice that an instance is starting to have problems, and you should move to a new instance as soon as possible. If the failure happens without the warning, the only solution is to move to a new instance anyway, so you should always be prepared to do this.

  2. Large scale problems which affect a large number of customers simultaneously. These are very rare, and generally don’t affect more than a single availability zone given the way that Amazon has spread out the risk in their architecture.

You can check the AWS service health dashboard to see if Amazon is aware of any widespread problems with the EC2 service. If there are problems with a specific availability zone, you may want to move your servers to a different availability zone until the issues get resolved.

First Responses

For general cases where you can’t immediately figure out what went wrong with the connectivity, here are two things which are almost always recommended on EC2: reboot the instance and replace the instance.

Reboot your EC2 instance using the EC2 Console, another API client, or a command like:

    ec2-reboot-instances INSTANCE_ID

After giving it sufficient time to come up, see if that fixed the connectivity problem. Do not reboot your instance if you currently have a working ssh connection to it, but other ssh connections are failing!

If you have a production service running on Amazon EC2 and you lose connectivity to an instance, then I recommend your first reaction be to kick off a replacement instance so that it boots and configures itself while you investigate the original issue. If you don’t solve the problem by the time the replacement is ready, simply switch over to the new server. You may want to continue investigating what happened with the old server, though I generally don’t care what the problem was unless it happens more than once or twice in a short time period.

If your installation environment does not allow you to easily start replacement instances, then you should reconsider how you are using EC2 and work to improve this.

Seeking Help

If the above did not help you solve your problem reaching your EC2 instance, you may want to reach out to the community, including some AWS employees, on the EC2 forum.

Amazon also has premium AWS support available.

Requests for connectivity help posted as comments on this article will not be published or answered. Please only post a comment if you have corrections or additional information to share for users experiencing problems. I do occasionally receive and respond to questions posted on other articles, but for this topic, please use the EC2 forum.


7 Comments

Wow - this article is sorely needed, and I can think of nobody better than you to have done it! Thanks!

There's also the class of "I can't connect to my Windows instance" problems, which mostly have to do with RDP connection problems or forgetting the password for a derived AMI.

For RDP connection problems, try the same procedure as mentioned in the article for ssh, but telnet to port 3389 instead. If telnet's connection is refused, then it may be a Windows Firewall issue.

For the forgotten password - well - I don't have enough experience with that, so perhaps another commenter can enlighten.

Shlomo: Maybe I should have added the keyword "Linux" somewhere in the title. Some of my best friends use Windows, but I can't say I know anything about them ;-)
Perhaps you could write up an article about EC2 Windows connectivity over on http://clouddevelopertips.blogspot.com/

:-) I have the same problem, Eric: I don't use Windows instances myself. What limited experience I have with them is from consulting projects.
However, I am already working on a blog post about diagnosing common problems with ELB, inspired by this article.

Thanks again for this article. As I mentioned above, you've inspired me to write a related article, "Solving Common ELB Problems with a Sanity Test", which reviews common ELB problems, how to detect them, how to fix them, and also provides an elb-sanity-test script to automate the detection of some common problems.

http://clouddevelopertips.blogspot.com/2009/09/solving-common-elb-problems-with-sanity.html

Make sure you’re using a udev lower than v144 as it is incompatible with Amazon’s 2.6.21 kernel.

I think you might want to rephrase this. Either it is contradictory, or the problem is me :).

I think you meant: make sure you’re using a udev lower than v144 as higher versions are incompatible with Amazon’s 2.6.21 kernel.

Hope to help.

iwein: Thanks for catching this. Fixed.
