A Quick & Dirty Guide to Connection Troubleshooting for Beginners

This is not intended to be a technically intricate document to allow a systems administrator to understand why an application or server is not working correctly. Rather, it is meant as a step-by-step guide for a relative technical novice to narrow down possible causes for an unreachable server, before having to call a helpdesk.

Some of this stuff is pretty basic, so just skip around until you find something you can use.  A lot of it is highly out of date, as I wrote it in 2001.

N.b. all output is from entering these commands on a laptop running FreeBSD (Unix); syntax may vary depending on your operating system.

I. Protocols & Ports

Out of academic interest, it might be useful for you to take a look at the OSI (Open Systems Interconnection) model. This is a theoretical, schematic model for looking at network connections. Rather than being a written-in-stone classification for different levels of network traffic, it’s meant to allow you to think about parts of network connections a bit more abstractly, and to help you figure out what does what.

Here are the main concepts you should consider:

A. Internet Protocol (IP)

Most of the Internet, and most local area networks (LANs), function based on IP, also referred to as IP version 4 (IPv4). IP addresses are unique on the internet, and usually look like this:

205.171.14.93

Each of these four ‘octets’ is an 8-bit number, and cannot be higher than 255. IP addresses are parts of ‘subnets’, or ‘ranges’, which are assigned by organizations such as IANA and RIPE. In order to be properly visible, all IP addresses you want a single machine to reach on the internet must be unique. Likewise, it is a bad idea to simply grab an IP on your local network and assume it will work; you may run into an ‘IP conflict’ which will make your network administrator, and possibly another user, very unhappy.
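As a rough illustration of the 0-255 limit and of what belonging to a range means, here is a minimal Python sketch using the standard `ipaddress` module. The function names are made up for this example, and the /24 range in the usage lines is an assumption chosen for illustration, not a real assignment.

```python
import ipaddress

def describe_ipv4(text):
    """Return the four numbers of an IPv4 address, or raise ValueError."""
    addr = ipaddress.IPv4Address(text)  # rejects any part above 255
    return [int(part) for part in str(addr).split(".")]

def same_subnet(addr, subnet):
    """True if addr falls inside subnet, given in CIDR notation."""
    return ipaddress.IPv4Address(addr) in ipaddress.IPv4Network(subnet)

if __name__ == "__main__":
    print(describe_ipv4("205.171.14.93"))
    # 205.171.14.0/24 is a hypothetical range used only for this example.
    print(same_subnet("205.171.14.93", "205.171.14.0/24"))
```

An address such as 205.171.14.256 makes `describe_ipv4` raise an error, because 256 does not fit into 8 bits.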

B. Transmission Control Protocol (TCP)

TCP is the backbone of internet traffic. Simply put, it is a means to transport ’packets’ between applications in a way that assures delivery. Every TCP packet carries a “sequence number” which increases over the life of a given connection, so that the receiver can notice missing data and put packets back in order. TCP connections look as follows:

Client --> Server “SYN” (Synchronize)
Client <-- Server “SYN/ACK” (Synchronize Acknowledged)
Client --> Server “ACK” (Acknowledge)
Client <-> Server “ACK” (at this point the connection is “established”)
Client --> Server “FIN” (Finish)
Client <-- Server “FIN/ACK” (Finish Acknowledged)
Client --> Server “ACK” (Acknowledge; the connection is now closed.)

Depending on the type of connection, not all of these are necessary: for example, the server may just “RST” (Reset) the connection without prompting, effectively cutting it off.
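To see a connection being set up and torn down without involving a real server, the following Python sketch opens a TCP connection to a listener on the same machine (127.0.0.1). The operating system does the actual packet exchange; the comments only mark roughly where it happens. The function name is made up for this example.

```python
import socket

def loopback_handshake():
    """Open and close one TCP connection on 127.0.0.1; return the echoed bytes."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(("127.0.0.1", port))  # the OS performs the opening exchange here
    conn, _ = server.accept()            # the connection is now established
    conn.settimeout(5)

    client.sendall(b"hello")             # data flows over the established connection
    data = conn.recv(1024)
    conn.sendall(data)
    echoed = client.recv(1024)

    client.close()                       # the OS starts the closing exchange here
    conn.close()
    server.close()
    return echoed

if __name__ == "__main__":
    print(loopback_handshake())
```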

C. User Datagram Protocol (UDP)

UDP is a lot faster than TCP, as it does not set up a connection and makes no attempt to confirm delivery or resend lost data. Accordingly, it is used for applications where you don’t care so much if the occasional chunk does not arrive. Internet video streaming is a typical application of UDP.
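For comparison with TCP, here is a minimal Python sketch that sends a single UDP datagram to a socket on the same machine. Note that there is no connection setup and no acknowledgement: the sender simply fires the data off. The function name is made up for this example, and on a real network the datagram could be lost without the sender ever finding out.

```python
import socket

def send_datagram(payload):
    """Send one UDP datagram over 127.0.0.1 and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    receiver.settimeout(5)
    port = receiver.getsockname()[1]

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(payload, ("127.0.0.1", port))  # no handshake, no acknowledgement
    data, _ = receiver.recvfrom(2048)

    sender.close()
    receiver.close()
    return data

if __name__ == "__main__":
    print(send_datagram(b"frame-1"))
```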

D. Internet Control Message Protocol (ICMP)

ICMP is the way hosts and routers on the internet report problems to one another: whether a host is reachable, whether a next hop is available for traffic, and so on.

E. Stacks

An IP stack is the part of a machine’s operating system that deals with network connections. All incoming and outgoing network connections are dealt with in this stack.

F. Ports

A “port” is a virtual address configured on a server to listen to connections. It is analogous to a mailbox in a highrise apartment building. Each application listens on its own port, although many individual connections of the same type can be handled by a single port.

By convention, ports below 1024 are “privileged”, meaning they are supposed to be handled only by applications approved by a server administrator, and not by applications run by a mere mortal user. However, whether this convention is adhered to or not is up to a server’s administrator.
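A small Python sketch of the port convention, assuming the usual Unix cutoff of 1024. The function names are made up for this example; `socket.getservbyname` is a real standard-library call that consults the local services database, and the sketch returns None if the name is not listed there.

```python
import socket

PRIVILEGED_LIMIT = 1024  # ports below this are privileged by Unix convention

def is_privileged(port):
    """True if the port number falls in the privileged range."""
    if not 0 <= port <= 65535:
        raise ValueError("port out of range: %r" % port)
    return port < PRIVILEGED_LIMIT

def well_known_port(service, protocol="tcp"):
    """Look a service name up in the local services database, or return None."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        return None

if __name__ == "__main__":
    print(is_privileged(80), is_privileged(8080))
    print(well_known_port("http"))
```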

For a list of commonplace ports, look at

http://www.iana.org/assignments/port-numbers

II. What’s happening?

You may be seeing that a server “just isn’t reachable”. This may be due to a number of reasons; there’s no way to predict all of them, but this might give you a few hints as to what the cause could be.

Let’s do some preliminary testing.

A. Machine Settings (Very Basic)

First, check to see that you are on the net. Can you reach anything? Do any of your networked applications work? Are Windows network drives accessible? Can you access any web pages at all? If not, check the following:

  • Is the cable plugged in? If you have a link light on your network card, is it lit?
  • Do you have a default route? Is your interface configured?

Under Windows, run the command

ipconfig

in a DOS Window. If you see an entry for ‘Default Gateway’, you’re set.

Under Unix, run

ifconfig -a

and

netstat -rn|grep default

Ifconfig (InterFace Config) allows you to see the configuration of all your active network interfaces. Netstat is a general network utility command; if netstat -rn shows you a default gateway, you should be okay.

  • Are your general settings correct? Compare your station with others on your local network. If you have a network administrator handy, check your settings against what he gave you. Can you ping them? Can they ping you? Can you ping your default gateway? See below for details on Ping.
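If you would rather have a script do the default-route check, the following Python sketch looks for the default line in routing table text such as `netstat -rn` prints. The sample table, including the interface names, is made up for illustration, and column layout differs between operating systems, so treat this as a starting point only.

```python
def default_gateway(netstat_output):
    """Return the gateway from the default line of a routing table, or None."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # BSD prints "default"; some other systems print 0.0.0.0 instead.
        if len(fields) >= 2 and fields[0] in ("default", "0.0.0.0"):
            return fields[1]
    return None

# Hypothetical routing table, for illustration only.
SAMPLE = """\
Destination        Gateway            Flags    Netif
default            192.168.1.254      UGS      fxp0
127.0.0.1          link#2             UH       lo0
"""

if __name__ == "__main__":
    print(default_gateway(SAMPLE))
```

If the function returns None for your real routing table, you have no default route and will not get far beyond your local network.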

B. ICMP/Ping

First, let’s find out whether a host responds to basic pings. “Ping” (Packet InterNet Groper) is a common tool built on the ICMP protocol, and is used to probe hosts. On Unix servers, it is usually found in /usr/sbin or /usr/bin, and on Windows machines it sits in c:\windows\system32\ or c:\winnt\system32.

The Syntax of this command is

ping <hostname> (or IP address)

Successful pings generally mean that a server’s address is reachable, and look like this:

bolo:[15:23]~> ping www.berkeley.edu
PING arachne.berkeley.edu (169.229.131.109): 56 data bytes
64 bytes from 169.229.131.109: icmp_seq=0 ttl=237 time=180.286 ms
64 bytes from 169.229.131.109: icmp_seq=1 ttl=237 time=173.291 ms
^C
--- arachne.berkeley.edu ping statistics ---
3 packets transmitted, 2 packets received, 33% packet loss
round-trip min/avg/max/stddev = 173.291/176.788/180.286/3.497 ms

Under Solaris, ‘ping -s <hostname>’ will return this same result. ‘ping <hostname>’ by itself will just return ‘<hostname> is alive.’

If your client station does not know how to get to a net (it does not have a ‘route’), or does not have a default gateway, you will see something like

bolo:[15:25]~> ping www.cnn.com
PING cnn.com (207.25.71.5): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host
^C
--- cnn.com ping statistics ---
13 packets transmitted, 0 packets received, 100% packet loss

When you try to reach a remote address, you may pass many network devices (“routers”) on the way. If you are trying to reach an address that they don’t know how to get to, you could see

bolo:[15:28]~> ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1): 56 data bytes
36 bytes from Serial10-1-0.GW3.ZUR4.ALTER.NET (146.188.39.89): Destination Host Unreachable
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 ae7e 0 0000 3c 01 0384 192.168.1.254 10.0.0.1

Lastly, if your output looks as follows, it is possible that there is a firewall or similar security device in the way, which is not permitting ICMP traffic. This does not mean that the host is down, just that it cannot be pinged.

bolo:[15:32]~> ping 165.222.187.100
PING 165.222.187.100 (165.222.187.100): 56 data bytes
^C
--- 165.222.187.100 ping statistics ---
32 packets transmitted, 0 packets received, 100% packet loss
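The ping outcomes described in this section can be told apart mechanically. The following Python sketch sorts ping output into rough categories; the function name and category labels are made up for this example, and the exact wording of ping output varies between operating systems, so the patterns may need adjusting.

```python
import re

def classify_ping(output):
    """Sort ping output into one of a few rough categories."""
    if "No route to host" in output:
        return "no route"        # the client itself has no route or gateway
    if "Destination Host Unreachable" in output:
        return "unreachable"     # a router on the way does not know the target
    match = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received",
                      output)
    if match is None:
        return "unknown"
    sent, received = int(match.group(1)), int(match.group(2))
    if received == 0:
        return "no reply"        # possibly a firewall dropping ICMP
    return "reachable" if received == sent else "reachable with loss"

if __name__ == "__main__":
    print(classify_ping("3 packets transmitted, 2 packets received, 33% packet loss"))
```

Remember that “no reply” does not prove the host is down, only that it did not answer pings.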

C. Traceroute

Traceroute is a program which allows you to find out which hops your traffic passes on the way to a target server. It is usually found in the same location as ping, both under Unix and Windows. Traceroute’s main function is to allow you a look at whether the path a transmission takes from client to server looks sane. A successful traceroute means that the host is up and running, and that there is a proper route to it. Traceroute output looks like this:

soda:[6:38]~> traceroute www.berkeley.edu
traceroute to arachne.berkeley.edu (169.229.131.109), 64 hops max, 40 byte packets
1 gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1) 0.744 ms 0.655 ms 0.610 ms
2 vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161) 0.858 ms 0.761 ms 0.818 ms
3 vlan209.inr-203-eva.Berkeley.EDU (128.32.255.2) 0.861 ms 0.986 ms 0.774 ms
4 arachne.Berkeley.EDU (169.229.131.109) 0.781 ms 0.717 ms 0.732 ms

It can also look as follows. This does not mean that the server is necessarily down or unreachable, but could imply that there is a firewall/security device in the middle:

soda:[6:39]~> traceroute www.google.com
traceroute to www.google.com (216.239.33.101), 64 hops max, 40 byte packets
1 gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1) 0.777 ms 0.663 ms 0.612 ms
2 vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161) 0.870 ms 0.736 ms 0.785 ms
3 gigE2-0.inr-000-eva.Berkeley.EDU (128.32.0.193) 0.599 ms 0.537 ms 0.515 ms
4 pos3-0.c2-berk-gsr.Berkeley.EDU (128.32.0.90) 0.692 ms 0.579 ms 0.559 ms
5 SUNV--BERK.POS.calren2.net (198.32.249.14) 1.810 ms 1.852 ms 1.992 ms
6 STAN--SUNV.POS.calren2.net (198.32.249.74) 2.230 ms 2.167 ms 2.236 ms
7 PAIX-7206--STAN-3.ATM.calren2.net (198.32.249.186) 3.346 ms 2.836 ms 2.906 ms
8 paix.exodus.net (198.32.176.15) 3.071 ms 3.164 ms 3.503 ms
9 ibr02-g1-0.paix01.exodus.net (206.79.9.242) 3.875 ms 2.931 ms 2.848 ms
10 bbr01-p6-0.sntc03.exodus.net (209.185.9.241) 3.404 ms 3.320 ms 3.366 ms
11 dcr04-g4-0.sntc03.exodus.net (216.33.153.68) 5.334 ms 3.382 ms 9.461 ms
12 csr01-ve242.sntc03.exodus.net (216.33.153.181) 4.482 ms 3.876 ms 4.251 ms
13 google-exodus.exodus.net (64.68.64.210) 4.574 ms 3.610 ms 4.610 ms
14 exbi1-gige-1-3.net.google.com (216.239.47.2) 5.256 ms 5.382 ms 5.315 ms
15 * * *
16 * * *
17 * *^C

Lastly, you may see the next block of output–this implies that there is a firewall in the middle which allows traces to the final destination, but not to itself (thus nobody can identify it as a hop in the middle):

soda:[6:43]~> traceroute altavista.digital.com
traceroute to altavista.digital.com (209.73.180.1), 64 hops max, 40 byte packets
1 gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1) 0.837 ms 0.668 ms 0.685 ms
2 vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161) 1.465 ms 0.785 ms 0.754 ms
3 fast4-1-0.inr-new-666-doecev.Berkeley.EDU (128.32.0.73) 1.779 ms 2.628 ms 2.364 ms
4 qsv-juniper--ucb-gw.calren2.net (128.32.0.70) 3.717 ms 3.283 ms 3.259 ms
5 svl-edge-09.inet.qwest.net (65.113.32.209) 3.247 ms 2.917 ms 3.361 ms
6 svl-core-01.inet.qwest.net (205.171.14.93) 3.315 ms 3.632 ms 2.802 ms
7 sjo-core-01.inet.qwest.net (205.171.5.99) 3.831 ms 2.907 ms 3.094 ms
8 sfo-core-02.inet.qwest.net (205.171.5.123) 4.591 ms 4.839 ms 5.857 ms
9 jfk-core-01.inet.qwest.net (205.171.5.113) 66.522 ms 67.079 ms 67.408 ms
10 jfk-core-03.inet.qwest.net (205.171.230.6) 66.856 ms 66.065 ms 67.102 ms
11 jfk-edge-04.inet.qwest.net (205.171.30.114) 66.596 ms 66.322 ms 66.156 ms
12 63.148.0.22 (63.148.0.22) 74.506 ms 74.240 ms 73.975 ms
13 * * *
14 altavista.com (209.73.180.1) 74.845 ms 73.763 ms 74.196 ms

Traceroute and ping are two different applications. Furthermore, traceroute on Windows (where the command is called tracert) and traceroute on Unix do not send the same type of traffic. Windows traceroute uses “ICMP ECHO REQUEST” to see whether routes to a host are alive. Unix and Cisco traceroute is generally based on UDP. Some firewalls allow one, but not the other; if you suspect a firewall between your client and the target server, try traceroute from both types of platforms.
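When a trace is long, it helps to pick out the hops that never answered. The following Python sketch lists the hop numbers whose probes all timed out, based on the Unix-style output format shown in this section. The function name is made up for this example.

```python
def silent_hops(traceroute_output):
    """Return the hop numbers for which every probe timed out ('* * *')."""
    hops = []
    for line in traceroute_output.splitlines():
        fields = line.split()
        if (len(fields) >= 2 and fields[0].isdigit()
                and all(field == "*" for field in fields[1:])):
            hops.append(int(fields[0]))
    return hops

if __name__ == "__main__":
    sample = (
        "12 63.148.0.22 (63.148.0.22) 74.506 ms 74.240 ms 73.975 ms\n"
        "13 * * *\n"
        "14 altavista.com (209.73.180.1) 74.845 ms 73.763 ms 74.196 ms\n"
    )
    print(silent_hops(sample))
```

Silent hops in the middle of an otherwise complete trace are usually harmless; silent hops all the way to the end are what suggest filtering or a dead path.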

D. Telnet

Telnet is a very basic login application designed to emulate a mainframe terminal. However, telnet has the ability to connect to any TCP port to show you whether a service is even listening or not. On Unix, telnet is normally located in /usr/bin/; in Windows it should be in C:\{Windows,Win98,WINNT}\System32.

On Unix, you can telnet to a given port for an application using the syntax

telnet hostname portnumber

Based on the documentation in section I.F., to see if my webserver on www.switch.ch is listening on the default http port, I would try the following:

bolo:[17:07]~> telnet www.switch.ch 80
Trying 130.59.10.30…
Connected to etna.switch.ch.
Escape character is ‘^]’.

This means that I have a connection established to the server. In effect, what I have just done is “faked” the same sort of connection that a web browser (Netscape, Opera, Internet Explorer) makes to a webserver. Under Windows NT/9x telnet, the equivalent of this is when the server address is displayed in the telnet window’s title bar. Don’t forget to change the port number from the default ‘telnet’ (port 23) before connecting.   Windows 2000 telnet works like Unix telnet.

If my service uses TCP and I get this prompt, and I am still having connection difficulties, I can rule out that there is a firewall or routing problem; the issue is most likely with the server’s configuration or my client application.

Another possible response to a telnet attempt, this time to the SMTP (Simple Mail Transfer Protocol, the most-used Internet email standard) port on an address not configured to receive SMTP mail, may yield this:

bolo:[17:15]~> telnet cfpa11.berkeley.edu 25
Trying 128.32.124.189…
telnet: connect to address 128.32.124.189: Connection refused
telnet: Unable to connect to remote host

This means that the target has sent me a “RST”, or “RESET” TCP packet in response to my attempt to “SYN”. It’s effectively telling me, no, I’m not paying attention to this port. The server is not configured correctly, or the program on the server which is supposed to be listening on the port isn’t even running.

Telnet can also yield this:

bolo:[17:15]~> telnet www.google.com 12345
Trying 216.239.37.101…
telnet: connect to address 216.239.37.101: Operation timed out
telnet: Unable to connect to remote host

This usually indicates the presence of a firewall or other similar security filter in front of or on the server which explicitly disallows connections to port 12345.  Under Windows NT/9x telnet, you will see ‘Telnet (none)’ in the title bar, and the cursor will be an hourglass.
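If telnet is not installed, the same three outcomes (connected, refused, timed out) can be checked with a few lines of Python. This is a minimal sketch; the function name and the result labels are made up for this example, and other failures (for example a hostname that does not resolve) are deliberately left to raise an error.

```python
import socket

def probe_port(host, port, timeout=5.0):
    """Return 'open', 'refused', or 'timed out' for a TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # something is listening and accepted us
    except ConnectionRefusedError:
        return "refused"         # the target answered with a reset
    except socket.timeout:
        return "timed out"       # no answer at all, often a firewall

if __name__ == "__main__":
    print(probe_port("www.switch.ch", 80))
```

As with telnet, “open” only tells you that something is listening on the port, not that the application behind it is healthy.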

III. What’s Happening on the Server?

Now, let’s see if you can log into your server. If you are able to telnet/ssh/rlogin to your Unix server, or have console/remote desktop access to a Windows NT server on which your network service runs, there are a few more things you can check.

A. What’s listening to what?

On your Unix server, you can have a look at some locally defined ports in the file /etc/services. Furthermore, the file /etc/inetd.conf specifies which of these services have an actual program (server) associated with them by inetd, the program run by Unix at startup to listen to network connections. This means that when inetd “sees” a connection arriving on a given port, it starts a service to deal with it.

However, this is not the only way to run servers. Different types of Unix (Linux, Solaris, AIX, etc.) have various methods for starting servers via scripts at boot. The easiest way to look at this is (on the server!) to run the commands

netstat -an

or

lsof -i (if installed)

to see which services are (supposedly) listening on which ports. If you don’t understand these commands, see whether there is a manual page by running the command

man netstat (or lsof)

This should explain the syntax to you in the usual clear, concise Unix manual page language. In short, for netstat -an, you may see these keywords:

CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED, CLOSE_WAIT, FIN_WAIT_1,
CLOSING, LAST_ACK, FIN_WAIT_2, TIME_WAIT

Consult the netstat man page for the meaning of these. However, if you see your port listed with one of these states, it means something’s happening with it. Windows NT also has a netstat command, which should be located in your System32 directory.

Lsof (LiSt Open Files) with the -i argument should tell you which actual application is using which port. Have a look at its output, and see if it looks reasonable.
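To pull just the listening ports out of a long netstat listing, something like the following Python sketch can help. It assumes the connection state is the last column and the local address is two columns before it, which matches the common `netstat -an` layouts I am aware of but is not guaranteed everywhere. The function name and the sample lines are made up for illustration.

```python
import re

def listening_ports(netstat_output):
    """Return the sorted port numbers shown in a listening state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[-1] not in ("LISTEN", "LISTENING"):
            continue
        local_address = fields[-3]   # e.g. *.22 (BSD) or 0.0.0.0:80 (others)
        match = re.search(r"[.:](\d+)$", local_address)
        if match:
            ports.add(int(match.group(1)))
    return sorted(ports)

# Hypothetical netstat lines, for illustration only.
SAMPLE = """\
tcp4  0  0  *.22              *.*                  LISTEN
tcp4  0  0  192.168.1.10.22   192.168.1.20.51234   ESTABLISHED
tcp   0  0  0.0.0.0:80        0.0.0.0:*            LISTEN
"""

if __name__ == "__main__":
    print(listening_ports(SAMPLE))
```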

Under Windows, there isn’t much I can tell you except to look at your Task Manager (taskmgr.exe), and your services (under ‘Administrative Tools’), and to not run servers on Windows :-)

B. Sniffing

Lastly, if you have administrator rights on your server, you can run a sniffer for traffic reaching the server. Under Solaris, ‘snoop’ is an example, while BSD-based systems may have ‘tcpdump’ installed. Windows systems generally require an additional piece of software to be installed. RTFM is left to the user. The point behind this is to see whether traffic from your client address actually reaches the server. Generally, sniffers let you sort traffic by destination port or source IP address, allowing you to see whether traffic from the client even reaches your server system.

Please please be aware of the legal/administrative consequences of sniffing network traffic on a system–you may see things people don’t want you to see, such as passwords or confidential data. This is why most sniffer software is restricted to use by people with administrator rights.

IV. Common Services

A server can be down for a number of reasons. Let’s take a few common applications and what could cause them to not respond.

A. Hyper Text Transfer Protocol (HTTP), Web Pages

Some common errors found on webservers are listed at

http://offline.home.cern.ch/offline/web/http_error_codes.html

CERN came up with the World Wide Web, so they should know. By the way, the above is an example of a “Uniform Resource Locator”, or “URL”, which is nothing more than a means of specifying a protocol/application (in this case, HTTP, but it can be FTP, Gopher, telnet, etc.), an address, and a path (the stuff after the first single “/”).

These error codes mean that, while the webserver you are trying to access is listening, there is some problem either with your client (browser) or the web server.

B. Web Applications

If you are behind a firewall, or a proxy server, you may have troubles with certain applications (java applets for example) which your browser attempts to download. Specifically, this can be caused by an application attempting to access a server which your firewall does not allow, even though the application was actually installed via regular HTTP.

Certain multimedia applications, such as RealPlayer and other streaming media clients, require various other ports to be opened. This means that your browser understands that a certain kind of file a webserver is giving it should be opened with a specific application, and the application then handles its own traffic.

C. File Transfer Protocol (FTP)

FTP can either be via anonymous FTP (what you usually see when you have a URL in your browser window starting with ‘ftp://‘) or interactively, either via command-line ftp from Unix or Windows, or a graphical client such as ‘CuteFTP’ or ‘WS_FTP’.

There are two types of FTP, active and passive.

Active FTP opens a connection from your client to the server, which is used for passing ’administrative’ information; this includes your login and password. However, the FTP server then opens a connection back to you for the actual file transfer, regardless of what direction the file actually flows in (whether you ‘PUT’ or ‘GET’ the file from the server). Certain firewalls can have trouble with this.

Passive FTP is an answer to this; the server tells the client which port to use for the transfer, and the client then opens this second connection to the server as well, so that nothing has to connect back to you. However, both the client and server must understand passive mode.
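As a rough illustration of passive mode: the client sends the command PASV over its existing connection, and the server answers with a “227” reply containing six numbers, four for an IP address and two for a port, which the client then connects to for the transfer. The following Python sketch decodes such a reply. The function name and the reply used in the usage lines are made up for this example.

```python
import re

def parse_pasv_reply(reply):
    """Extract (host, port) from a '227 ... (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError("not a passive mode reply: %r" % reply)
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]   # the port is sent as two bytes
    return host, port

if __name__ == "__main__":
    # Hypothetical server reply, for illustration only.
    print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,19,137)"))
```

If a firewall blocks the port named in the reply, logging in will work but transfers and directory listings will hang.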

V. My Server is Slow!

As so often happens, servers slow down. There are a few things you can do, without server access.

A. Ping the server

If the ‘Round-trip’ times you see in your ping are above a few hundred milliseconds, there is possibly a problem with the network between your client and server. This can be due to any number of factors; use traceroute to see where the bottleneck is.
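The round-trip check can be scripted as well. The following Python sketch reads the average from the summary line of ping output and compares it against a threshold. The 300 ms default is only my stand-in for “a few hundred milliseconds”, and the function names are made up for this example.

```python
import re

SLOW_MS = 300.0  # assumed threshold, standing in for "a few hundred milliseconds"

def average_rtt_ms(ping_output):
    """Return the average round-trip time from ping's summary line, or None."""
    match = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)", ping_output)
    return float(match.group(2)) if match else None

def looks_slow(ping_output, threshold_ms=SLOW_MS):
    """True if an average was found and it exceeds the threshold."""
    average = average_rtt_ms(ping_output)
    return average is not None and average > threshold_ms

if __name__ == "__main__":
    line = "round-trip min/avg/max/stddev = 173.291/176.788/180.286/3.497 ms"
    print(average_rtt_ms(line), looks_slow(line))
```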

B. Check client load

Self-explanatory. However, if your client is the problem, everything else on it should be slow as well.

C. Check server load

On the server, if it’s an NT server you can run taskmgr to see what the server load is.

On Unix, there are a number of commands you can look at; these include (insofar as they are installed)

top
uptime
vmstat
ps
dmesg

You can also look at syslog and messages (in /var/log/ or /var/adm/) for some further information.

I can’t tell you what proper output of these will look like, but they will give you information on what could be wrong with a given application.
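As a very rough first check on a Unix server, you can compare the load average (the numbers `uptime` prints) against the number of CPUs. The Python sketch below does this; the idea that a load per CPU above 1.0 means “busy” is only a common rule of thumb that I am assuming here, and the function names are made up for this example. `os.getloadavg` is only available on Unix-like systems.

```python
import os

def load_per_cpu(load_average, cpu_count):
    """Divide a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count

def is_overloaded(load_average, cpu_count, limit=1.0):
    """True if the load per CPU exceeds the (assumed) rule-of-thumb limit."""
    return load_per_cpu(load_average, cpu_count) > limit

if __name__ == "__main__":
    one_minute, five_minutes, fifteen_minutes = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    print(one_minute, cpus, is_overloaded(one_minute, cpus))
```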

D. Check network load

Are you connected to a switch or a hub? Ethernet, the underlying protocol for many IP connections, is essentially a ‘shared medium’. This means that if two machines on a local physical segment simultaneously try to transmit data, you have what is called a ‘collision’. The more collisions you get, the slower things are. A hub creates a large physical segment, leading to the possibility of collisions, while a switch is designed to separate so-called ’collision domains’.

Make sure that your network card speed is correct. Some network cards try to ‘autonegotiate’ their speed with whatever switch/hub port they are connected to; sometimes this breaks. Make sure with the network administrator that both your station and the network device are trying to communicate at the same speed.

While you’re at it, have them check network traffic load on that port, as well. Most network devices allow ‘polling’ of their load, via a protocol called SNMP (Simple Network Management Protocol). It could simply be a case of putting too much data on a slow line, such as downloading enormous files over a modem link.

E. Check the cable, network hardware

Ethernet/Cat5/UTP cabling can be sensitive to various factors, such as errors introduced by radiation from nearby fluorescent lighting, untwisting of the wires (UTP = “unshielded twisted pair”), etc.

Your network staff can also usually check logfiles on a router/switch/hub for errors, or run a debug command on the device. Network hardware does occasionally break, as do network interface cards (NIC) on servers and clients.

VI. Compendium

If you can’t figure out what’s wrong at this point, you’ll probably need help (helpdesk, Usenet newsgroup, etc.). Remember that the more information you supply from system debug commands, the easier it’ll be to help you find out where your problem lies.

(c) 2001 John Morgan Salomon
