Solving: "I can't connect to my server on Amazon EC2"

Help! I can’t connect to my EC2 instance!
Woah! My box just stopped talking to me!
Hey! I can’t access the server!

These and other variations on the connectivity theme are some of the most common problems raised on the Amazon EC2 forum.

The EC2 community and Amazon employees do a valiant job helping users track down and solve these issues despite the facts that (1) there are hundreds of reasons why a server or service might not be accessible, (2) connectivity is one of the harder problems to diagnose, especially without being hands-on, and (3) users complaining about a problem generally don’t provide the clues necessary to solve the issue (because the ones who knew what those clues were probably solved it themselves and didn’t post).

This article is an attempt to provide some general assistance to folks who are experiencing connectivity issues with Amazon EC2. Please post additional help in the comments; this document will be updated over time.

Questions

First off, you should understand that it’s ok to ask for help. When you do, though, you should provide as many details as possible about what you are trying to do and what results you are seeing. It also helps if you drop some clues about your level of expertise. A person using Linux for the first time is likely to make different mistakes on EC2 than a person who is having problems connecting to a custom AMI they built from scratch.

The more specific you can be about your problem and the more information you can provide, the more likely somebody will be able to help. Here are some common questions which are important to have answered for connectivity problems on EC2:

  1. When you say you “can’t connect” what application are you trying to use and on what port? For example: “ssh to port 22” or “accessing port 80 with Firefox”. If you don’t know what a port is, then provide as many details as possible about the application you’re using and what command or steps you are taking to initiate the connection.

  2. What, specifically, happens when you try to connect? Does it hang for a long time and eventually time out? Do you get an error message? What is the exact text you see? (Copy and paste, don’t summarize.)

  3. What is the AMI id which the instance is running? If it is not a public AMI, then what is the AMI id of the public AMI it is based on?

  4. What Linux distro and release is the instance running? E.g., Ubuntu 9.04 Jaunty, Debian etch, Fedora 8, CentOS 5.

  5. What is the instance id of the instance you are trying to contact? Providing this can let Amazon employees take a look at the internals of what might be going on.

  6. What are the internal and external IP addresses and/or host names for the instance you are trying to reach? Providing this information is, in effect, giving permission to the community to try to contact your server over the network so that they can gather information about connectivity and help solve your problem.

  7. Have you ever been able to contact this instance in the past? How recently?

  8. How long has the instance been running?

  9. Have you ever been able to contact another instance of the same AMI?

  10. Is there a difference in connectivity when you try from another EC2 instance instead of from the Internet?

  11. What were you doing when the connectivity stopped?

  12. What is the console output of the instance? You can get this through an API client or a command like:

    ec2-get-console-output INSTANCE_ID
    

There are so many reasons that connectivity might be down to a remote server or service that it would be impossible to get a significant percentage of them listed in one article. I’ll start by listing some of the more common problems here; please add to the comments as you run into or remember others.

You

By far the most common cause of the problem is you (the person experiencing the problem) and that’s ok. We all make mistakes. It’s important, though, that you start with this attitude: open to the possibilities that you typed something wrong, forgot a step, or didn’t quite understand the complex instructions. Ninety percent of the people reading this paragraph think I’m talking to somebody else; oddly, they also think this sentence is not about them.

Here are some of the most common reasons folks (including me) can’t connect to their Amazon EC2 instance. Really.

  • You’re not connecting to the right instance or to the instance you think you’re trying to connect to. Servers on EC2 are identified by opaque instance ids like i-ae1df2c6 and opaque host names like ec2-75-101-182-20.compute-1.amazonaws.com. It’s easy for anybody to get these confused or mistype them.

  • The instance you’re trying to connect to has not completed the boot process yet. Though some AMIs are ready to connect in under a minute, others can take 10+ minutes.

  • The instance you’re trying to connect to has been terminated. (Did you just shut down what you thought was a different instance?)

  • The service you are trying to reach on the instance is not running on that instance.

  • The service you are trying to reach on the instance is not listening on that port or that network interface.

  • You did not open the port in the security group.

  • You did not start the instance with the correct security group.

  • You did not start the instance with the same ssh keypair as you are using to access it.

  • Your local firewall is preventing you from getting out to that port on any server outside your network. Talk to your local network administrators.

  • Your firewall on the instance is preventing access to the service. Try shutting down iptables temporarily to see if that helps.

You “experts” laugh when you read these, but if you’re having trouble reaching a server, I recommend you go through each one carefully and double check that your assumptions are correct and the world is really as you remember it. Remember: We all make mistakes. A lot of these come from personal experience.

If you’re not quite sure what terms like “security group” and “keypair” mean in the EC2 context, I recommend going back and reading some introductory material. These are important concepts for beginners.

ssh

The ssh connectivity problems generally fall into a couple major buckets

  1. ssh is not accessible, or

  2. ssh is rejecting the connection due to a failure to authenticate or authorize

You can find out which type of problem you have by using a command like

telnet HOSTNAME 22

If this connects, then ssh is running and accepting connections on port 22. (Hit [Enter] a couple times to disconnect from the telnet session). If you don’t connect, then it’s important to note if the attempt basically hung forever or if you got a “Connection refused” type of message immediately. (Hit [Ctrl]-[C] to stop the telnet command.)

If the connection attempt hangs, then there might be a problem with the security group, iptables, or your instance might not be running at that IP address.

If the telnet connection attempt gets rejected, then there might be a problem with iptables, ssh configuration, ssh not running on the instance, or perhaps it’s listening on different port if the admin likes to configure things a bit more securely. The console output can be helpful in determining if sshd was started at boot.

If you can get connected to the ssh port with telnet, then you need to start debugging why ssh is not letting you in. The most important information can be gathered by running the ssh connection attempt in verbose (-v) mode:

ssh -v -i KEYPAIR.pem USERNAME@HOSTNAME

The complete output of this command can be very helpful to post when asking for help.

The most common problems with ssh relate to:

  • Forgetting to specify -i KEYPAIR.pem in the ssh command

  • Not starting the instance specifying a keypair

  • Using a different keypair than the one which was used to start the instance

  • Not ssh’ing with the correct username. Amazon Linux instances require a first ssh connection with ec2-user@... while mages published by Canonical require a first connection with ubuntu@... and others might need you to ssh using root@...

  • Not having the correct ownership or mode on the .ssh directory or authorized_keys file.

  • Not having the correct Allow* or *Authentication settings in /etc/ssh/sshd_config

Apache

Web servers are much easier to connect to than other applications because there is generally no authentication and authorization involved to get a basic web page. If you can’t reach your web server on EC2, then it’s generally one of the simple problems described above like using the wrong IP address, trying to reach a terminated instance, or not having the web port opened in the security group.

MySQL

The most common problem specific to MySQL connectivity on EC2 is the fact that MySQL is configured securely by default to not allow access by remote hosts. If you need to allow a connection from your other instances running in EC2, then edit /etc/mysql/my.cnf and replace this line:

bind-address            = 127.0.0.1

with

bind-address            = 0.0.0.0

and restart the mysqld server.

IMPORTANT! You should not open the MySQL port in the EC2 security group. You only want your own EC2 instances to connect to the database and the default security group allows your EC2 instances to connect to any port on your other EC2 instances. If you open up the port to the public, then your database will be attacked by the Internet at large.

If you need to talk to your MySQL database running on EC2 from a server running outside EC2, then do it over a secure channel like an ssh tunnel or openvpn. You don’t need the MySQL port open in the security group to do this. The MySQL protocol is not by itself encrypted and your usernames and passwords would be sent in the clear for anybody else to intercept if you didn’t talk over a secure channel.

Custom AMIs

If you are building your own custom AMIs from scratch, then there are a number of complicated barriers to getting network and ssh connectivity working. Unfortunately it is nearly impossible to debug these problems since you don’t have access to the machine to see what went wrong. Console output is your only friend in these cases.

Here are some examples of odd things which others in the EC2 community have run into and solved:

  • Make sure you start networking on instance boot. It should come up with DHCP on eth0.

  • Make sure your Linux distro does not save the MAC address somewhere, preventing the network from functioning in the next instance. Ubuntu stores this in the /etc/udev/rules.d/70-persistent-net.rules file and Debian stores this in the /etc/udev/rules.d/z25_persistent-net.rules file.

  • Make sure your image downloads the ssh keypair and installs it in authorized_keys.

  • Make sure you have the right devices created and file systems mounted.

  • Make sure you’re using a udev lower than v144 as higher versions are incompatible with Amazon’s 2.6.21 kernel.

  • Make sure you’re using the right libc6 and related configurations including /lib/tls

Amazon

I realize this was your first thought, but it’s such a rare cause, I’ve put it here at the end. Sometimes there are problems with Amazon EC2. The hardware running your instance may fail or the networks might have temporary glitches. There are a couple different classes of problems here:

  1. Small scale problems local to the hardware running your instance. Though these are rare for any single instance, they are happening all the time for some customer somewhere given that AWS has hundreds of thousands of customers. Amazon often sends you an email when they notice that an instance is starting to have problems, and you should move to a new instance as soon as possible. If the failure happens without the warning, the only solution is to move to a new instance anyway, so you should always be prepared to do this.

  2. Large scale problems which affect a large number of customers simultaneously. These are very rare, and generally don’t affect more than a single availability zone given the way that Amazon has spread out the risk in their architecture.

You can check the AWS service health dashboard to see if Amazon is aware of any widespread problems with the EC2 service. If there are problems with a specific availability zone, you may want to move your servers to a different availability zone until the issues get resolved.

First Responses

For general cases where you can’t immediately figure out what went wrong with the connectivity, here are two things which are almost always recommended on EC2: reboot the instance and replace the instance.

Reboot your EC2 instance using the EC2 Console, another API client, or a command like:

    ec2-reboot-instances INSTANCE_ID

After giving it sufficient time to come up, see if that fixed the connectivity problem. Do not reboot your instance if you currently have a working ssh connection to it, but other ssh connections are failing!

If you have a production service running on Amazon EC2 and you lose connectivity to an instance, then I recommend your first reaction be to kick off a replacement instance so that it boots and configures itself while you investigate the original issue. If you don’t solve the problem by the time the replacement is ready, simply switch over to the new server. You may want to continue investigating what happened with the old server, though I generally don’t care what the problem was unless it happens more than once or twice in a short time period.

If your installation environment does not allow you to easily start replacement instances, then you should reconsider how you are using EC2 and work to improve this.

Seeking Help

If the above did not help you solve your problem reaching your EC2 instance, you may want to reach out to the community including some AWS employees on the EC2 forum.

Amazon also has premium AWS support available.

Requests for connectivity help by posting a comment on this particular thread will not be published or answered. Please only post a comment if you have corrections or additional information to share for users experiencing problems. I do occasionally receive and respond to questions posted on other articles, but for this topic, please use the EC2 forum.

[Update 2011-10-01: Amazon Linux requires ssh using ubuntu@...]