HELP, I CAN'T REACH MY SERVER!
A Quick & Dirty Guide to Connection Troubleshooting for Beginners
This is not intended to be a technically intricate document to
allow a systems administrator to understand why an application or
server is not working correctly. Rather, it is meant as a step-by-step
guide for a relative technical novice to narrow down possible causes
for an unreachable server, before having to call a helpdesk.
Some of this stuff is pretty basic, so just skip around until you
find something you can use.
N.b. all output is from entering these commands on a laptop running
FreeBSD (Unix); syntax may vary depending on your operating system.
I. Protocols & Ports
Out of academic interest, it might be useful for you to take a look at the
OSI (Open Systems Interconnection) model. This is a theoretical, schematic
model for looking at network connections--rather than being a written-in-stone
classification for different levels of network traffic, it's meant to allow
you to think about parts of network connections a bit more abstractly, and
to help you figure out what does what.
Here are the main concepts you should consider:
A. Internet Protocol (IP)
Most of the Internet, and most local area networks (LANs) function based
on the IP protocol, also referred to as IP version 4 (IPv4). IP addresses
are unique on the internet, and usually look like this:
205.171.14.93
Each of these four 'octets' is an 8-bit number, so it cannot be higher than 255.
IP addresses are parts of 'subnets', or 'ranges', which are assigned by
organizations such as IANA and RIPE. In order to be reachable, every IP address
on the public internet must be unique. Likewise, it is a bad idea to simply grab
an IP on your local network
and assume it will work--you may run into an 'IP conflict' which will make
your network administrator, and possibly another user, very unhappy.
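To make the four-octet idea concrete, here is a small Python sketch using the standard ipaddress module. It splits an address into its octets (rejecting any octet above 255) and tests whether two addresses sit in the same subnet. The function names are made up for illustration, and the /24 prefix in the example is an assumption about a typical small LAN, not something your network necessarily uses.

```python
import ipaddress


def octets(address):
    """Split an IPv4 address into its four 8-bit octets (each 0-255)."""
    # IPv4Address raises ValueError for anything that is not a valid
    # dotted quad, e.g. an octet above 255.
    return list(ipaddress.IPv4Address(address).packed)


def same_subnet(a, b, prefix_len):
    """True if both addresses fall into the same /prefix_len network."""
    network = ipaddress.IPv4Network(f"{a}/{prefix_len}", strict=False)
    return ipaddress.IPv4Address(b) in network


if __name__ == "__main__":
    print(octets("205.171.14.93"))                            # [205, 171, 14, 93]
    print(same_subnet("192.168.1.254", "192.168.1.10", 24))   # True
    try:
        octets("205.171.14.256")
    except ValueError as err:
        print("invalid:", err)
```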
B. Transmission Control Protocol (TCP)
TCP is the backbone of internet traffic. Simply put, it is a means to transport
'packets' between applications in a way that assures delivery. Every TCP packet
carries a "sequence number", which counts the bytes sent so far in a given
connection. The receiver acknowledges what it has received, and the sender resends
anything that is not acknowledged. A TCP connection looks as follows:
Client --> Server "SYN" (Synchronize)
Client <-- Server "SYN/ACK" (Synchronize, Acknowledge)
Client --> Server "ACK" (Acknowledge; at this point the connection is "established")
Client <-> Server data, with each side sending "ACK"s for what it has received
Client --> Server "FIN" (Finish)
Client <-- Server "FIN/ACK" (Finish, and acknowledge the client's FIN)
Client --> Server "ACK" (the connection is now closed)
Not every connection ends this politely. For example, either side may just send a
"RST" (Reset) without warning, killing the connection immediately instead of closing it.
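You never build these packets by hand; the operating system does it for you. The Python sketch below runs a tiny echo server and client on the local machine (127.0.0.1): create_connection() returns once the SYN, SYN/ACK, ACK exchange has completed, and shutdown(SHUT_WR) and closing a socket are what send the FINs. The function names are illustrative, and running both ends in one script is only a convenience for the demonstration.

```python
import socket
import threading


def _echo_until_eof(listener):
    """Server side: accept one connection and echo back everything received."""
    conn, _addr = listener.accept()      # returns after the three-way handshake
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(1024)
            if not chunk:                # empty read: the client sent its FIN
                break
            chunks.append(chunk)
        conn.sendall(b"".join(chunks))
    # leaving the with-block closes conn, which sends the server's FIN


def tcp_round_trip(message):
    """Send message over a real TCP connection on 127.0.0.1 and return the echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=_echo_until_eof, args=(listener,))
        server.start()
        # create_connection() performs SYN -> SYN/ACK -> ACK before returning
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # send our FIN: "I'm done sending"
            reply = b""
            while True:
                chunk = client.recv(1024)
                if not chunk:            # the server's FIN arrived
                    break
                reply += chunk
        server.join()
    return reply


if __name__ == "__main__":
    print(tcp_round_trip(b"hello, server"))   # b'hello, server'
```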
C. User Datagram Protocol (UDP)
UDP is faster and simpler than TCP: it sets up no connection, and it does not
acknowledge, resend, or reorder packets ("datagrams"). Consequently, it is used for
applications where you don't care so much if the occasional chunk
does not arrive. Internet video streaming is a typical application of UDP.
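For comparison, here is a UDP sketch in Python: there is no connect/accept step and no FIN; the sender simply fires a datagram at an address and port. On the local machine (127.0.0.1) it will almost always arrive; across a real network, the receiver's timeout is the only thing that tells you a datagram was lost. The function name is illustrative.

```python
import socket


def udp_round_trip(message, timeout=2.0):
    """Send one datagram to a local UDP socket and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))          # let the OS pick a free port
        receiver.settimeout(timeout)             # nothing guarantees delivery,
                                                 # so never wait forever
        sender.sendto(message, receiver.getsockname())  # no handshake at all
        data, _addr = receiver.recvfrom(65535)   # raises socket.timeout if lost
    return data


if __name__ == "__main__":
    print(udp_round_trip(b"video frame 1"))      # b'video frame 1'
```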
D. Internet Control Message Protocol (ICMP)
ICMP carries control and error messages between hosts and routers: whether a host
is reachable, whether a packet's "time to live" ran out on the way, and so on. For
example, it advises hosts that a next hop may not be
available for traffic.
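Ping, covered below, is built on two ICMP message types: type 8 ("echo request") and type 0 ("echo reply"). The Python sketch below only builds the bytes of an echo request and computes the standard Internet checksum over them; actually sending it needs a raw socket and, on most systems, administrator rights, so that part is deliberately left out. The function names are illustrative.

```python
import struct


def internet_checksum(data):
    """16-bit ones' complement checksum used by ICMP (and IP, TCP, UDP)."""
    if len(data) % 2:
        data += b"\x00"                         # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                          # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(identifier, sequence, payload=b""):
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)  # computed with checksum = 0
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


if __name__ == "__main__":
    packet = echo_request(identifier=0x1234, sequence=0, payload=b"ping!!")
    print(packet.hex())
    # a receiver re-running the checksum over the whole packet gets 0
    print(internet_checksum(packet))            # 0
```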
E. Stacks
An IP stack (or TCP/IP stack) is the part of a machine's operating system that
implements the network protocols described above. All incoming and outgoing network
connections are dealt with in this stack.
F. Ports
A "port" is a number (1 to 65535) on which a server listens for connections. It
is analogous to a mailbox in a highrise apartment building: the IP address gets you
to the building, the port to the right mailbox. Each listening application has its
own port, although many individual connections of the same type can be handled by
a single port. Ports 1-1023 are by convention "privileged", meaning they are supposed
to only be handled by applications approved by a server administrator, not by
applications run by a mere mortal user. Unix enforces this by letting only root
listen on these ports; other systems, such as Windows, may not, so whether the
convention is adhered to is ultimately up to a server's administrator.
For a list of commonplace ports, look at
http://www.iana.org/assignments/port-numbers
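On a Unix machine, the file /etc/services maps service names to port numbers. The Python sketch below looks names up through the standard socket.getservbyname() call, which reads the system's own services database, and checks whether a port is in the privileged range. Binding to port 0 shows the other side of the coin: the operating system hands out an unprivileged "ephemeral" port, which is what client programs normally use for their end of a connection. The function names are illustrative.

```python
import socket

PRIVILEGED_LIMIT = 1024  # on Unix, only root may listen on ports below this


def is_privileged(port):
    """True for ports 1-1023, which normally need administrator rights."""
    return 0 < port < PRIVILEGED_LIMIT


def service_port(name, proto="tcp"):
    """Look a service name up in the system's services database, if present."""
    try:
        return socket.getservbyname(name, proto)
    except OSError:          # name unknown, or no services database installed
        return None


def ephemeral_port():
    """Bind to port 0 and report which port the operating system chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(service_port("http"), service_port("smtp"))   # usually 80 25
    print(is_privileged(80), is_privileged(8080))       # True False
    print(ephemeral_port())                             # some unprivileged port
```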
II. What's happening?
You may be seeing that a server "just isn't reachable". This can be due
to a number of reasons--there's no way to predict all of them, but
this might give you a few hints as to what the cause could be.
Let's do some preliminary testing.
A. Machine Settings (Very Basic)
First, check to see that you are on the net. Can you reach anything? Do
any of your networked applications work? Are Windows network drives accessible?
Can you access any web pages at all? If not, check the following:
-Is the cable plugged in? If you have a link light on your network card,
is it lit?
-Do you have a default route? Is your interface configured?
Under Windows, run the command
ipconfig
in a DOS window. If you see an entry for 'Default Gateway', you're set.
Under Unix, run
ifconfig -a
and
netstat -rn|grep default
Ifconfig (InterFace Config) shows you the configuration of your network interfaces;
the -a flag lists all of them. Netstat is a general network utility command; the -rn
flags make it print the routing table with numeric addresses. If netstat -rn shows
you a 'default' route, you should be okay.
-Are your general settings correct? Compare your station with others on your local
network. If you have a network administrator handy, check your settings against
what he gave you. Can you ping them? Can they ping you? Can you ping your default
gateway? See below for details on Ping.
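If you want to check for a default route from a script, the Python sketch below scans the text printed by Unix netstat -rn for a destination of 'default' (BSD) or '0.0.0.0' (Linux) and returns the gateway in the next column. Windows prints its routing table in a different column order, so this sketch does not handle it. The sample table, the gateway address and the interface name fxp0 are made up for illustration.

```python
SAMPLE_NETSTAT_RN = """\
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.1.254      UGSc        0        0   fxp0
127.0.0.1          127.0.0.1          UH          0        0    lo0
"""  # hypothetical FreeBSD-style output


def default_gateway(netstat_rn_output):
    """Return the gateway of the default route, or None if there is none."""
    for line in netstat_rn_output.splitlines():
        fields = line.split()
        # BSD calls the default route "default"; Linux prints 0.0.0.0
        if len(fields) >= 2 and fields[0] in ("default", "0.0.0.0"):
            return fields[1]
    return None


if __name__ == "__main__":
    print(default_gateway(SAMPLE_NETSTAT_RN))   # 192.168.1.254
```

On a live Unix system you could feed it the real table, for example via subprocess.run(["netstat", "-rn"], capture_output=True, text=True).stdout.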
B. ICMP/Ping
First, let's find out whether a host responds to basic pings. "Ping" (Packet InterNet Groper)
is a common tool built on the ICMP protocol: it sends ICMP "echo request" packets to a host
and waits for "echo reply" packets, which is how it probes hosts. On Unix systems, it is
usually found in /sbin, /usr/sbin or /usr/bin, and on Windows machines it sits in
c:\windows\system32\ or c:\winnt\system32.
The syntax of this command is
ping hostname
Successful pings generally mean that a server's address is reachable,
and look like this:
bolo:[15:23]~> ping www.berkeley.edu
PING arachne.berkeley.edu (169.229.131.109): 56 data bytes
64 bytes from 169.229.131.109: icmp_seq=0 ttl=237 time=180.286 ms
64 bytes from 169.229.131.109: icmp_seq=1 ttl=237 time=173.291 ms
^C
--- arachne.berkeley.edu ping statistics ---
3 packets transmitted, 2 packets received, 33% packet loss
round-trip min/avg/max/stddev = 173.291/176.788/180.286/3.497 ms
Under Solaris, 'ping -s hostname' will return this same result; plain 'ping hostname'
just returns 'hostname is alive'.
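The statistics at the bottom of the ping output are simple arithmetic on the individual replies. The Python sketch below recomputes them from the two reply times in the www.berkeley.edu example: 1 of 3 packets lost gives 33% (rounded down), and the average and standard deviation come out at about 176.7885 ms and 3.4975 ms, which ping printed as 176.788 and 3.497. Treating stddev as the population standard deviation is an assumption that happens to match these numbers; the function name is illustrative.

```python
import statistics


def ping_summary(transmitted, rtts_ms):
    """Recompute ping's statistics lines from the individual reply times."""
    received = len(rtts_ms)
    # whole-number percentage, rounded down: 1 lost out of 3 shows as 33%
    loss_pct = (transmitted - received) * 100 // transmitted
    stats = {"received": received, "loss_pct": loss_pct}
    if rtts_ms:
        stats.update(
            min=min(rtts_ms),
            avg=statistics.mean(rtts_ms),
            max=max(rtts_ms),
            stddev=statistics.pstdev(rtts_ms),  # population standard deviation
        )
    return stats


if __name__ == "__main__":
    # the two replies from the www.berkeley.edu example; 3 requests were sent
    print(ping_summary(3, [180.286, 173.291]))
```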
If your client station does not know how to get to a net (it does not have a 'route'),
or does not have a default gateway, you will see something like
bolo:[15:25]~> ping www.cnn.com
PING cnn.com (207.25.71.5): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
^C
--- cnn.com ping statistics ---
13 packets transmitted, 0 packets received, 100% packet loss
When you try to reach a remote address, you may pass many network devices ("routers")
on the way. If you are trying to reach an address that they don't know how to get to,
you could see
bolo:[15:28]~> ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1): 56 data bytes
36 bytes from Serial10-1-0.GW3.ZUR4.ALTER.NET (146.188.39.89): Destination Host Unreachable
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 ae7e 0 0000 3c 01 0384 192.168.1.254 10.0.0.1
Lastly, if your output looks as follows, it is possible that there is a firewall or
similar security device in the way, which is not permitting ICMP traffic. This does not
mean that the host is down, just that it cannot be pinged.
bolo:[15:32]~> ping 165.222.187.100
PING 165.222.187.100 (165.222.187.100): 56 data bytes
^C
--- 165.222.187.100 ping statistics ---
32 packets transmitted, 0 packets received, 100% packet loss
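The cases above can be told apart by the messages ping prints. The Python sketch below checks for those exact strings as they appear in the FreeBSD output in this guide; other systems word their messages differently, so treat it as an illustration of the decision order rather than a portable tool. The function name is made up.

```python
import re


def classify_ping(output):
    """Map BSD-style ping output to the likely cause discussed in this section."""
    if "No route to host" in output:
        return "no route: check your interface and default gateway"
    if "Destination Host Unreachable" in output:
        return "a router on the way does not know how to reach the target"
    match = re.search(r"(\d+) packets received", output)
    if match and int(match.group(1)) == 0:
        return "no replies: host may be down, or ICMP may be filtered"
    if match:
        return "replies received: the address is reachable"
    return "unknown: no statistics line found (did ping finish?)"


if __name__ == "__main__":
    print(classify_ping("32 packets transmitted, 0 packets received, 100% packet loss"))
```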
C. Traceroute
Traceroute is a program which allows you to find out which hops (routers) your traffic
passes on the way to a target server. It is usually found in the same location
as ping, both under Unix and Windows; on Windows the command is called tracert.
Traceroute's main function is to give you a look at whether the path a transmission
takes from client to server looks sane. A successful traceroute means that the host
is up and running, and that there is a proper route to it. Traceroute output looks like this:
soda:[6:38]~> traceroute www.berkeley.edu
traceroute to arachne.berkeley.edu (169.229.131.109), 64 hops max, 40 byte packets
1 gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1) 0.744 ms 0.655 ms 0.610 ms
2 vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161) 0.858 ms 0.761 ms 0.818 ms
3 vlan209.inr-203-eva.Berkeley.EDU (128.32.255.2) 0.861 ms 0.986 ms 0.774 ms
4 arachne.Berkeley.EDU (169.229.131.109) 0.781 ms 0.717 ms 0.732 ms
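Traceroute works by sending probes with an increasing "time to live" (TTL): every router on the way lowers the TTL by one, and the router where it reaches zero discards the probe and sends back an ICMP "time exceeded" message, which reveals that router's address. The Python sketch below is a toy model of that idea only. It sends one simulated probe per hop (real traceroute sends three and times them), and the router names and the use of None for a device that never answers are assumptions of the model.

```python
def send_probe(path, ttl):
    """Toy model of one probe. path lists the routers on the way, ending with
    the destination; None marks a device that never answers.

    Returns (who_answered, reached_destination).
    """
    if not path:
        raise ValueError("path must contain at least the destination")
    for hop_number, node in enumerate(path, start=1):
        if hop_number == len(path):
            return node, True          # the probe got all the way to the target
        ttl -= 1                       # each router lowers the TTL by one...
        if ttl == 0:
            return node, False         # ...and answers "time exceeded" at zero


def traceroute(path, max_hops=30):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    lines = []
    for ttl in range(1, max_hops + 1):
        answerer, reached = send_probe(path, ttl)
        lines.append(f"{ttl}  {answerer if answerer is not None else '* * *'}")
        if reached and answerer is not None:
            break                      # destination replied: we're done
    return lines


if __name__ == "__main__":
    print("\n".join(traceroute(["snr1", "inr-201", "inr-203", "arachne"])))
    # a silent destination keeps producing "* * *" until max_hops
    print("\n".join(traceroute(["snr1", "exodus", "google-edge", None], max_hops=6)))
```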
It can also look as follows. This does not mean that the server is necessarily
down or unreachable, but could imply that there is a firewall/security device
in the middle:
soda:[6:39]~> traceroute www.google.com
traceroute to www.google.com (216.239.33.101), 64 hops max, 40 byte packets
1 gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1) 0.777 ms 0.663 ms 0.612 ms
2 vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161) 0.870 ms 0.736 ms 0.785 ms
3 gigE2-0.inr-000-eva.Berkeley.EDU (128.32.0.193) 0.599 ms 0.537 ms 0.515 ms
4 pos3-0.c2-berk-gsr.Berkeley.EDU (128.32.0.90) 0.692 ms 0.579 ms 0.559 ms
5 SUNV--BERK.POS.calren2.net (198.32.249.14) 1.810 ms 1.852 ms 1.992 ms
6 STAN--SUNV.POS.calren2.net (198.32.249.74) 2.230 ms 2.167 ms 2.236 ms
7 PAIX-7206--STAN-3.ATM.calren2.net (198.32.249.186) 3.346 ms 2.836 ms 2.906 ms
8 paix.exodus.net (198.32.176.15) 3.071 ms 3.164 ms 3.503 ms
9 ibr02-g1-0.paix01.exodus.net (206.79.9.242) 3.875 ms 2.931 ms 2.848 ms
10 bbr01-p6-0.sntc03.exodus.net (209.185.9.241) 3.404 ms 3.320 ms 3.366 ms
11 dcr04-g4-0.sntc03.exodus.net (216.33.153.68) 5.334 ms 3.382 ms 9.461 ms
12 csr01-ve242.sntc03.exodus.net (216.33.153.181) 4.482 ms 3.876 ms 4.251 ms
13 google-exodus.exodus.net (64.68.64.210) 4.574 ms 3.610 ms 4.610 ms
14 exbi1-gige-1-3.net.google.com (216.239.47.2) 5.256 ms 5.382 ms 5.315 ms
15 * * *
16 * * *
17 * *^C
Lastly, you may see the next block of output. This implies that there is a firewall
in the middle which lets traces through to the final destination, but does not answer
them itself (thus nobody can identify it as a hop in the middle):
soda:[6:43]~> traceroute altavista.digital.com
traceroute to altavista.digital.com (209.73.180.1), 64 hops max, 40 byte packets
1 gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1) 0.837 ms 0.668 ms 0.685 ms
2 vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161) 1.465 ms 0.785 ms 0.754 ms
3 fast4-1-0.inr-new-666-doecev.Berkeley.EDU (128.32.0.73) 1.779 ms 2.628 ms 2.364 ms
4 qsv-juniper--ucb-gw.calren2.net (128.32.0.70) 3.717 ms 3.283 ms 3.259 ms
5 svl-edge-09.inet.qwest.net (65.113.32.209) 3.247 ms 2.917 ms 3.361 ms
6 svl-core-01.inet.qwest.net (205.171.14.93) 3.315 ms 3.632 ms 2.802 ms
7 sjo-core-01.inet.qwest.net (205.171.5.99) 3.831 ms 2.907 ms 3.094 ms
8 sfo-core-02.inet.qwest.net (205.171.5.123) 4.591 ms 4.839 ms 5.857 ms
9 jfk-core-01.inet.qwest.net (205.171.5.113) 66.522 ms 67.079 ms 67.408 ms
10 jfk-core-03.inet.qwest.net (205.171.230.6) 66.856 ms 66.065 ms 67.102 ms
11 jfk-edge-04.inet.qwest.net (205.171.30.114) 66.596 ms 66.322 ms 66.156 ms
12 63.148.0.22 (63.148.0.22) 74.506 ms 74.240 ms 73.975 ms
13 * * *
14 altavista.com (209.73.180.1) 74.845 ms 73.763 ms 74.196 ms
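The two firewall patterns above differ in one detail: whether the destination eventually answers. The Python sketch below reads traceroute output line by line and reports both facts. Fed the www.google.com trace, it finds silent hops 15-17 and no answer from 216.239.33.101; the altavista.digital.com trace gives a single silent hop 13 followed by the destination. It assumes the Unix output format shown in this guide, and the function name is illustrative.

```python
import re

HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def analyse_trace(lines, destination_ip):
    """Return (destination_reached, hops_that_never_answered)."""
    reached, silent = False, []
    for line in lines:
        match = HOP.match(line)
        if not match:
            continue                       # e.g. the "traceroute to ..." header
        hop, rest = int(match.group(1)), match.group(2)
        if not rest.replace("*", "").replace("^C", "").strip():
            silent.append(hop)             # only asterisks: no reply for this TTL
        elif f"({destination_ip})" in rest:
            reached = True
    return reached, silent


if __name__ == "__main__":
    altavista = [
        "12 63.148.0.22 (63.148.0.22) 74.506 ms 74.240 ms 73.975 ms",
        "13 * * *",
        "14 altavista.com (209.73.180.1) 74.845 ms 73.763 ms 74.196 ms",
    ]
    print(analyse_trace(altavista, "209.73.180.1"))   # (True, [13])
```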
Traceroute and ping are two different applications. Furthermore, traceroute on Windows
(tracert) and traceroute on Unix do not send the same type of traffic. Windows tracert uses
ICMP "echo request" packets to see whether routes to a host are alive, while Unix and Cisco
traceroute are generally based on UDP. Some firewalls allow one, but not the other; if you
suspect a firewall between your client and the target server, try traceroute from both
types of platforms.
D. Telnet
Telnet is a very basic login application designed to emulate a mainframe terminal.
However, telnet has the ability to connect to any TCP port to show you whether a service
is even listening or not. On Unix, telnet is normally located in /usr/bin/; in Windows
it should be in C:\{Windows,Win98,WINNT}\System32.
On Unix, you can telnet to a given port for an application using the syntax
telnet hostname portnumber
Based on the documentation in section I.F., to see if my webserver on
www.switch.ch is listening on the default http port, I would try the following:
bolo:[17:07]~> telnet www.switch.ch 80
Trying 130.59.10.30...
Connected to etna.switch.ch.
Escape character is '^]'.
This means that I have a connection established to the server. In effect, what
I have just done is "faked" the same sort of connection that a web browser (Netscape,
Opera, Internet Explorer) makes to a webserver. Under Windows NT/9x telnet, the equivalent
of this is when the server address is displayed in the telnet window's title bar. Don't
forget to change the port number from the default 'telnet' (port 23) before connecting.
Windows 2000 telnet works like Unix telnet.
If my service uses TCP and I get this prompt, and I am still having connection difficulties,
I can rule out that there is a firewall or routing problem; the issue is most likely
with the server's configuration or my client application.
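Once telnet is connected to port 80, you can type an HTTP request by hand and watch the server's reply, which is what a browser does behind the scenes. The Python sketch below does the same over a plain socket. So that it runs anywhere, it starts a throwaway local webserver from Python's standard http.server module; against a real server you would pass its hostname and port 80 instead. The request line HEAD / HTTP/1.0 asks only for the headers of the top-level page; the handler class and function names are made up for illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class TinyHandler(BaseHTTPRequestHandler):
    """Stand-in webserver for the demo: answers every HEAD request with 200."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()

    def log_message(self, format, *args):
        pass                                   # keep the demo's output quiet


def fake_browser_request(host, port, timeout=5.0):
    """Type an HTTP request 'by hand' and return the server's status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        reply = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:                      # HTTP/1.0: server closes when done
                break
            reply += chunk
    lines = reply.decode("iso-8859-1").splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), TinyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(fake_browser_request("127.0.0.1", server.server_address[1]))
    # HTTP/1.0 200 OK
    server.shutdown()
    server.server_close()
```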
Another possible result of a telnet attempt, this time to the SMTP (Simple Mail Transfer
Protocol, the most-used Internet email standard) port on an address not configured to
receive SMTP mail, looks like this:
bolo:[17:15]~> telnet cfpa11.berkeley.edu 25
Trying 128.32.124.189...
telnet: connect to address 128.32.124.189: Connection refused
telnet: Unable to connect to remote host
This means that the target has sent me a "RST" ("Reset") TCP packet in response to my
attempt to "SYN". It's effectively telling me: no, I'm not paying attention to this
port. Either the server is not configured to offer this service, or the program on the
server which is supposed to be listening on the port isn't even running.
Telnet can also yield this:
bolo:[17:15]~> telnet www.google.com 12345
Trying 216.239.37.101...
telnet: connect to address 216.239.37.101: Operation timed out
telnet: Unable to connect to remote host
This usually indicates the presence of a firewall or other similar security filter
in front of or on the server which silently drops connections to port 12345, rather
than answering them with a "RST". (A host that is down can produce the same result.)
Under Windows NT/9x telnet, you will see 'Telnet (none)' in the title bar,
and the cursor will be an hourglass.
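The three telnet outcomes (connected, "Connection refused", "Operation timed out") map directly onto what a program sees when it opens a TCP connection. The Python sketch below reports which of the three happened for a given host and port, which is handy if you need to check many ports. The function name and the three-second timeout are arbitrary choices for illustration.

```python
import socket


def probe_tcp_port(host, port, timeout=3.0):
    """Return 'open', 'refused', 'timed out' or an error text, like telnet does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"            # handshake completed: "Connected to ..."
    except ConnectionRefusedError:
        return "refused"             # the host answered our SYN with a RST
    except socket.timeout:
        return "timed out"           # no answer at all: often a filtering firewall
    except OSError as exc:           # e.g. unknown hostname, no route to host
        return f"error: {exc}"
```

For example, probe_tcp_port("www.switch.ch", 80) would report "open" for the webserver example above, assuming it is still listening.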
III. What's Happening on the Server?
Now, let's see if you can log into your server. If you are able to telnet/ssh/rlogin
to your Unix server, or have console/remote desktop access to a Windows NT server
on which your network service runs, you can check what is happening on the server itself.
A. What's listening to what?
On your Unix server, you can have a look at some locally defined ports in the file
/etc/services. Furthermore, the file /etc/inetd.conf specifies which of these services
actually have a program (server) associated with them in inetd, the program started by
Unix at boot to listen for network connections on behalf of other servers. This means
that when inetd "sees" a connection arriving on a given port, it starts the associated
program to deal with it.
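Each active line of /etc/inetd.conf names a service (as listed in /etc/services), followed by the socket type, protocol, wait/nowait, the user to run as, and the server program with its arguments; lines starting with # are switched off. The Python sketch below lists which services have a program attached. The sample lines are made up, and the exact fields vary a little between Unix flavours, so treat the column positions as an assumption.

```python
SAMPLE_INETD_CONF = """\
# service  socket  proto  wait    user  server-program        arguments
ftp        stream  tcp    nowait  root  /usr/libexec/ftpd     ftpd -l
#telnet    stream  tcp    nowait  root  /usr/libexec/telnetd  telnetd
"""  # hypothetical example, not copied from a real server


def inetd_services(conf_text):
    """Map each enabled service name to the program inetd would start."""
    services = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # blank or commented out: not active
        fields = line.split()
        if len(fields) >= 6:
            services[fields[0]] = fields[5]
    return services


if __name__ == "__main__":
    print(inetd_services(SAMPLE_INETD_CONF))   # {'ftp': '/usr/libexec/ftpd'}
```

On the server itself you would pass it the real file, e.g. inetd_services(open("/etc/inetd.conf").read()).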
However, this is not the only way to run servers. Different types of Unix (Linux,
Solaris, AIX, etc.) have various methods for starting servers via scripts at boot.
The easiest way to look at this is (on the server!) to run the commands
netstat -an
or
lsof -i (if installed)
to see which services are (supposedly) listening on which ports. If you don't understand
these commands, see whether there is a manual page by running the command
man netstat (or lsof)
This should explain the syntax to you in the usual clear, concise Unix manual page
language. In short, for netstat -an, you may see these keywords:
CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED, CLOSE_WAIT, FIN_WAIT_1,
CLOSING, LAST_ACK, FIN_WAIT_2, TIME_WAIT
Consult the netstat man page for the meaning of these. However, if you see your port
listed with one of these states, it means something's happening with it: LISTEN means a
program is waiting for connections on that port, and ESTABLISHED means a connection is
currently open. Windows NT also has a netstat command, which should be located in your
System32 directory.
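To pick the listening ports out of a long netstat -an listing, the Python sketch below keeps TCP lines whose state is LISTEN and takes the port from the end of the local address, which BSD writes as *.80 and Linux as 0.0.0.0:80. The sample lines are illustrative, and Windows netstat (which prints LISTENING in a different layout) is not handled.

```python
import re

SAMPLE_NETSTAT_AN = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.168.1.10.22        192.168.1.5.1042       ESTABLISHED
tcp4       0      0  *.22                   *.*                    LISTEN
tcp4       0      0  *.80                   *.*                    LISTEN
udp4       0      0  *.514                  *.*
"""  # hypothetical BSD-style output


def listening_tcp_ports(netstat_an_output):
    """Return the sorted TCP ports that are in the LISTEN state."""
    ports = set()
    for line in netstat_an_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[-1] == "LISTEN":
            port = re.split(r"[.:]", fields[3])[-1]   # "*.80" or "0.0.0.0:80"
            if port.isdigit():
                ports.add(int(port))
    return sorted(ports)


if __name__ == "__main__":
    print(listening_tcp_ports(SAMPLE_NETSTAT_AN))   # [22, 80]
```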
Lsof (LiSt Open Files) with the -i argument should tell you which actual application
is using which port. Have a look at its output, and see if it looks reasonable.
Under Windows, there isn't much I can tell you except to look at your Task Manager
(taskmgr.exe), and your services (under 'Administrative Tools'), and to not run
servers on Windows :-)
B. Sniffing
Lastly, if you have administrator rights on your server, you can run a sniffer for
traffic reaching the server. Under Solaris, 'snoop' is an example, while BSD-based
systems may have 'tcpdump' installed. Windows systems generally require an additional
piece of software to be installed. RTFM is left to the user. Sniffers generally let you
filter traffic by destination port or source IP address, so the point behind this is to
see whether traffic from your client address actually reaches your server system at all.
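As an example of such a filter, the Python sketch below builds a tcpdump command line that shows only packets from one client address to one port, using tcpdump's -n (don't resolve names) and -i (choose the interface) options and a filter expression of the form "src host ... and dst port ...". The interface name fxp0 and the client address are placeholders, the function name is illustrative, and the resulting command has to be run as root on the server.

```python
import ipaddress
import shlex


def tcpdump_command(interface, client_ip, port):
    """Build argv for tcpdump, showing only client_ip -> port traffic."""
    ipaddress.ip_address(client_ip)   # reject anything that is not an IP address
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    capture_filter = f"src host {client_ip} and dst port {port}"
    return ["tcpdump", "-n", "-i", interface, capture_filter]


if __name__ == "__main__":
    argv = tcpdump_command("fxp0", "192.168.1.10", 80)   # placeholder values
    print(shlex.join(argv))
    # subprocess.run(argv) would start the capture (as root; press ^C to stop)
```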
Please please be aware of the legal/administrative consequences of sniffing network
traffic on a system--you may see things people don't want you to see, such as passwords
or confidential data. This is why most sniffer software is restricted to use
by people with administrator rights.
IV. Common Services
A server can be down for a number of reasons. Let's take a few common applications
and look at what could cause them not to respond.
A. Hyper Text Transfer Protocol (HTTP), Web Pages
Some common errors found on webservers are listed at
http://offline.home.cern.ch/offline/web/http_error_codes.html
CERN came up with the World Wide Web, so they should know. By the way, the above
is an example of a "Uniform Resource Locator", or "URL", which is nothing more than a
means of specifying a protocol/application (in this case, HTTP, but it can be FTP, Gopher,
telnet, etc.), an address, and a path (the stuff after the first single "/").
If you get one of these error codes, the webserver you are trying to access is listening,
but there is some problem either with your client (browser) or the web server.
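The pieces of a URL can be pulled apart with Python's standard urllib.parse module, as in the sketch below. The same sketch also encodes the usual rule of thumb for the error codes themselves: codes in the 400s point at the request your client made (404 "Not Found" is the classic), codes in the 500s at the server. The function names are illustrative.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return (protocol, host, port, path) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path


def who_to_blame(status_code):
    """Rough first guess at which side an HTTP status code points to."""
    if 400 <= status_code < 500:
        return "client"            # e.g. 404 Not Found, 403 Forbidden
    if 500 <= status_code < 600:
        return "server"            # e.g. 500 Internal Server Error
    return "no error"              # 1xx, 2xx and 3xx are not failures


if __name__ == "__main__":
    print(url_parts("http://offline.home.cern.ch/offline/web/http_error_codes.html"))
    # ('http', 'offline.home.cern.ch', None, '/offline/web/http_error_codes.html')
    print(who_to_blame(404), who_to_blame(503))   # client server
```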
B. Web Applications
If you are behind a firewall, or a proxy server, you may have trouble with certain
applications (Java applets, for example) which your browser attempts to download.
Specifically, this can be caused by an application attempting to access a server
which your firewall does not allow, even though the application was actually installed
via regular HTTP.
Certain multimedia applications, such as RealPlayer and other streaming media clients,
require various other ports to be opened. Your browser understands that a certain kind
of file a webserver is giving it should be opened with a specific application, and that
application then handles its own traffic, on its own ports.
C. File Transfer Protocol (FTP)
FTP can either be via anonymous FTP (what you usually see when you have a URL in
your browser window starting with 'ftp://') or interactively, either via command-line
ftp from Unix or Windows, or a graphical client such as 'CuteFTP' or 'WS_FTP'.
There are two types of FTP, active and passive.
Active FTP opens a 'control' connection from your client to the server (port 21), which
is used for passing 'administrative' information; this includes your login and password.
However, the FTP server then opens a connection back to you for the actual file transfer,
regardless of what direction the file actually flows in (whether you 'PUT' or 'GET' the
file from the server). Certain firewalls can have trouble with this, since the incoming
connection looks like an outsider trying to reach your client.
Passive FTP is an answer to this: the server tells the client which port to use, and the
client opens the data connection to the server as well, so both connections run from
client to server. However, both the client and server must understand passive mode.
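In passive mode the client sends the command PASV, and the server answers with code 227 and six numbers: four for its IP address and two for the port, which is the first number times 256 plus the second. The client then opens its data connection to that address and port. The Python sketch below decodes such a reply; the address and port in the example reply are made up, and the function name is illustrative.

```python
import re

_PASV_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply):
    """Decode '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' into (host, port)."""
    match = _PASV_NUMBERS.search(reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(n) for n in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2   # port = p1 * 256 + p2


if __name__ == "__main__":
    print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,195,80)."))
    # ('192.168.1.10', 50000)
```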
V. My Server is Slow!
As so often happens, servers slow down. There are a few things you can check, some of
them even without access to the server.
A. Ping the server
If the 'Round-trip' times you see in your ping are above a few hundred milliseconds, there
is possibly a problem with the network between your client and server. This can be
due to any number of factors; use traceroute to see where the bottleneck is.
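Finding "where the bottleneck is" in a traceroute usually means looking for the hop where the times jump. The Python sketch below takes each hop's lowest time (the least noisy of the three probes) and reports the largest increase over the previous answering hop. In the altavista.digital.com trace in section II.C, that is hop 9, about 62 ms more than hop 8; judging by the router names (sfo, jfk), that jump may simply be the distance from San Francisco to New York rather than a problem, so treat a big jump as a place to look, not a verdict. The function name is illustrative.

```python
def biggest_latency_jump(hops):
    """hops: list of (hop_number, [rtt_ms, ...]); silent hops have no times.

    Returns (hop_number, increase_ms) for the hop whose best (lowest) time
    rose the most compared with the previous hop that answered, or None.
    """
    answered = [(number, min(times)) for number, times in hops if times]
    best = None
    for (_, prev_ms), (number, ms) in zip(answered, answered[1:]):
        increase = ms - prev_ms
        if best is None or increase > best[1]:
            best = (number, increase)
    return best


if __name__ == "__main__":
    # hops 7-10 of the altavista.digital.com trace
    trace = [
        (7, [3.831, 2.907, 3.094]),
        (8, [4.591, 4.839, 5.857]),
        (9, [66.522, 67.079, 67.408]),
        (10, [66.856, 66.065, 67.102]),
    ]
    print(biggest_latency_jump(trace))   # hop 9, about 61.9 ms more
```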
B. Check client load
Self-explanatory: if your own machine is overloaded, everything else on it should be slow as well.
C. Check server load
On the server, if it's an NT server you can run taskmgr to see what the server load is.
On Unix, there are a number of commands you can look at; these include (insofar as they
are installed)
top
uptime
vmstat
ps
dmesg
You can also look at syslog and messages (in /var/log/ or /var/adm/) for some further information.
I can't tell you what proper output of these will look like, but they will give you information
on what could be wrong with a given application.
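The three numbers uptime prints are the load averages over the last 1, 5 and 15 minutes, roughly the number of processes running or waiting to run. Python exposes the same numbers through os.getloadavg() on Unix (not on Windows). The sketch below compares them with the number of CPUs, using the common rule of thumb that a load persistently above the CPU count means work is queuing up; that threshold is a heuristic, not a hard limit, and the function names are illustrative.

```python
import os


def looks_overloaded(load, cpus):
    """Rule of thumb: more runnable processes than CPUs means work is queuing."""
    return load > cpus


def load_report():
    """Return the 1/5/15-minute load averages and the CPU count (Unix only)."""
    one, five, fifteen = os.getloadavg()     # raises OSError if unavailable
    cpus = os.cpu_count() or 1               # cpu_count() may return None
    return {"1min": one, "5min": five, "15min": fifteen, "cpus": cpus,
            "overloaded": looks_overloaded(five, cpus)}


if __name__ == "__main__":
    if hasattr(os, "getloadavg"):
        print(load_report())
```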
D. Check network load
Are you connected to a switch or a hub? Ethernet, the underlying protocol for many IP
connections, is essentially a 'shared medium'. This means that if two machines on a local
physical segment simultaneously try to transmit data, you have what is called a 'collision'.
The more collisions you get, the slower things are. A hub creates a large physical segment,
leading to the possibility of collisions, while a switch is designed to separate so-called
'collision domains'.
Make sure that your network card speed is correct. Some network cards try to 'autonegotiate'
their speed with whatever switch/hub port they are connected to; sometimes this breaks. Make
sure with the network administrator that both your station and the network device are
trying to communicate at the same speed.
While you're at it, have them check network traffic load on that port, as well. Most network
devices allow 'polling' of their load, via a protocol called SNMP (Simple Network Management
Protocol). It could simply be a case of putting too much data on a slow line, such as
downloading enormous files over a modem link.
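Interface load from SNMP is usually worked out from two readings of an interface's byte counter (ifInOctets or ifOutOctets in the standard MIB-II) taken a known number of seconds apart. The Python sketch below does that arithmetic, including the case where the 32-bit counter has wrapped back to zero between readings; fetching the counters themselves needs an SNMP tool and is not shown. The example numbers are made up: 1,250,000 bytes in 10 seconds on a 10 Mbit/s port is 10% utilization. The function name is illustrative.

```python
COUNTER32_MODULUS = 2 ** 32   # SNMP Counter32 values wrap around to 0 here


def link_utilization(octets_before, octets_after, seconds, if_speed_bps):
    """Fraction (0.0-1.0) of a link's capacity used between two counter readings."""
    delta_octets = (octets_after - octets_before) % COUNTER32_MODULUS  # handles one wrap
    bits_per_second = delta_octets * 8 / seconds
    return bits_per_second / if_speed_bps


if __name__ == "__main__":
    # made-up readings: 1,250,000 bytes in 10 s on a 10 Mbit/s port
    print(link_utilization(0, 1_250_000, 10, 10_000_000))   # 0.1
```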
E. Check the cable, network hardware
Ethernet/Cat5/UTP cabling can be sensitive to various factors, such as errors introduced
by radiation from nearby fluorescent lighting, untwisting of the wires (UTP = "unshielded
twisted pair"), etc.
Your network staff can also usually check logfiles on a router/switch/hub for errors, or
run a debug command on the device. Network hardware does occasionally break, as do network
interface cards (NIC) on servers and clients.
VI. Compendium
If you can't figure out what's wrong at this point, you'll probably need help (helpdesk, Usenet
newsgroup, etc.). Remember that the more information you supply from system debug commands, the
easier it'll be to help you find out where your problem lies.
(c) 2001 John Morgan Salomon