HELP, I CAN'T REACH MY SERVER!


A Quick & Dirty Guide to Connection Troubleshooting for Beginners


This is not intended to be a technically intricate document to 
allow a systems administrator to understand why an application or
server is not working correctly.  Rather, it is meant as a step-by-step
guide for a relative technical novice to narrow down possible causes
for an unreachable server, before having to call a helpdesk.

Some of this stuff is pretty basic, so just skip around until you
find something you can use.

N.b. all output is from entering these commands on a laptop running
FreeBSD (Unix);  syntax may vary depending on your operating system.


I.	Protocols & Ports

Out of academic interest, it might be useful for you to take a look at the
OSI (Open Systems Interconnect) model.  This is a theoretical, schematic
model for looking at network connections--rather than being a written-in-stone
classification for different levels of network traffic, it's meant to allow
you to think about parts of network connections a bit more abstractly, and
to help you figure out what does what.

Here are the main concepts you should consider:

	A.	Internet Protocol (IP)
	
	Most of the Internet, and most local area networks (LANs), function based
	on the Internet Protocol, also referred to as IP version 4 (IPv4).  IP addresses
	are unique on the internet, and usually look like this:  
	
		205.171.14.93
		
	Each of these four 'octets' is an 8-bit number, so it ranges from 0 to 255.
	IP addresses are parts of 'subnets', or 'ranges', which are assigned by
	organizations such as IANA and RIPE.  To be reachable from the internet,
	a machine's IP address must be unique.  Likewise, it is a bad idea to simply
	grab an IP on your local network and assume it will work--you may run into an
	'IP conflict' which will make your network administrator, and possibly another
	user, very unhappy.
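
	To make the "four numbers, each 0 to 255" rule concrete, here is a small sketch
	in Python using the standard 'ipaddress' module.  The function name is made up
	for this guide; it is only an illustration, not a tool you need.

```python
import ipaddress

def ipv4_octets(text):
    """Return the four octets of a dotted-quad IPv4 address, or None if it is invalid."""
    try:
        addr = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return None
    # .packed is the address as 4 raw bytes, one byte per octet
    return list(addr.packed)

print(ipv4_octets("205.171.14.93"))    # [205, 171, 14, 93]
print(ipv4_octets("205.171.14.300"))   # None, because 300 does not fit in 8 bits
```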
	
	B.	Transmission Control Protocol (TCP)
	 
	TCP is the backbone of internet traffic.  Simply put, it is a means to transport
	'packets' between applications in a way that assures delivery.  Every TCP packet
	carries a "sequence number" which advances with the amount of data sent on a
	given connection, so the receiver can put packets in order and notice gaps.
	A typical TCP connection looks as follows:
	
		Client --> Server	"SYN" (Synchronize)
		Client <-- Server	"SYN/ACK" (Synchronize Acknowledged)
		Client --> Server	"ACK" (Acknowledge)
		Client <-> Server	"ACK" (at this point the connection is "established")
		Client --> Server	"FIN" (Finish)
		Client <-- Server	"FIN/ACK" (Finish Acknowledged)
		Client --> Server	"ACK" (Acknowledge--the connection is now closed.)
		
	Not every connection ends this politely--for example, either side may send
	a "RST" (Reset) at any time, which cuts the connection off immediately.
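
	If you want to watch a connection being set up without touching a real server,
	the following Python sketch opens a TCP connection to a listener on the same
	machine (127.0.0.1).  The operating system does the SYN, SYN/ACK and ACK
	for you inside connect() and accept().  The function name is invented for
	this example.

```python
import socket

def loopback_handshake():
    """Connect to a listener on this machine; return (server port, client port, client port as seen by server)."""
    # Server side: binding to port 0 lets the operating system pick a free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server_port = server.getsockname()[1]

    # Client side: create_connection() returns once the handshake has completed.
    client = socket.create_connection(("127.0.0.1", server_port), timeout=5)
    conn, peer = server.accept()
    client_port = client.getsockname()[1]

    # Closing a socket ends its side of the connection.
    client.close()
    conn.close()
    server.close()
    return server_port, client_port, peer[1]

server_port, client_port, seen_by_server = loopback_handshake()
print(client_port == seen_by_server)   # True
```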
		
	C.	User Datagram Protocol (UDP)
	
	UDP is lighter-weight than TCP:  it has no handshake, and it does not acknowledge,
	retransmit, or reorder anything.  Accordingly, it is used for applications where
	you don't care so much if the occasional chunk does not arrive.  Internet video
	streaming is a typical application of UDP.
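
	For comparison with TCP, here is a rough Python sketch that sends one UDP
	datagram to a socket on the same machine.  Note that there is no connect/accept
	step at all.  On the loopback interface the datagram will practically always
	arrive; across a real network nothing guarantees that, which is why the sketch
	gives up after a timeout.  The function name is made up.

```python
import socket

def udp_roundtrip(message):
    """Send one datagram to a local socket and return what arrived, or None on timeout."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
    receiver.settimeout(2)               # UDP never tells us a datagram was lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake: the datagram is simply sent to the address.
        sender.sendto(message, receiver.getsockname())
        data, _source = receiver.recvfrom(1024)
        return data
    except socket.timeout:
        return None
    finally:
        sender.close()
        receiver.close()

print(udp_roundtrip(b"hello"))
```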
	
	D.	Internet Control Message Protocol (ICMP)
	
	ICMP carries control and error messages between hosts and routers.  It is used
	to probe whether hosts are reachable, and routers use it to advise a sender that
	a destination or next hop is not available for its traffic.
	
	E.	Stacks
	
	An IP stack is the part of a machine's operating system that implements the
	protocols above.  All incoming and outgoing network connections are dealt
	with by this stack.
	
	F.	Ports
	
	A "port" is a numbered address (0 to 65535) on a server, on which an application
	listens for connections.  It is analogous to a mailbox in a highrise apartment
	building.  Each listening application normally has its own port, although many
	individual connections of the same type can be handled by a single port.  Ports
	below 1024 are by convention "privileged", meaning they are supposed to only be
	handled by applications approved by a server administrator, and are not to be
	used by applications run by a mere mortal user.  However, whether this convention
	is adhered to or not is up to a server's administrator.   
	
	For a list of commonplace ports, look at
	
		http://www.iana.org/assignments/port-numbers
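
	As a quick illustration, the IANA registry splits the port numbers into three
	ranges:  system ports (0-1023, the "privileged" ones), user or registered ports
	(1024-49151), and dynamic ports (49152-65535).  The Python function below is
	only a sketch with a made-up name that sorts a port number into these ranges.

```python
def port_class(port):
    """Classify a port number according to the IANA port ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0 to 65535")
    if port <= 1023:
        return "system (privileged)"
    if port <= 49151:
        return "user (registered)"
    return "dynamic"

print(port_class(80))      # system (privileged)
print(port_class(12345))   # user (registered)
```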
	
	
II.	What's happening?

You may be seeing that a server "just isn't reachable".  This may be due
to a number of reasons--there's no way to predict all of them, but
this might give you a few hints as to what the cause could be.

Let's do some preliminary testing.  


	A.	Machine Settings (Very Basic)
	
	First, check to see that you are on the net.  Can you reach anything?  Do
	any of your networked applications work?  Are Windows network drives accessible?
	Can you access any web pages at all?  If not, check the following:
	
	-Is the cable plugged in?  If you have a link light on your network card,
	 is it lit?
	 
	-Do you have a default route?  Is your interface configured?
		Under Windows, run the command
		
			ipconfig
			
		in a DOS Window.  If you see an entry for 'Default Gateway', you're set.
		
		Under Unix, run
			
			ifconfig -a
		
		and
		
			netstat -rn|grep default
			
		Ifconfig (InterFace Config) allows you to see the configuration of all your
		active network interfaces.  Netstat is a general network utility command;  
		if netstat -rn shows you a default gateway, you should be okay.
		
	-Are your general settings correct?  Compare your station with others on your local
	 network.  If you have a network administrator handy, check your settings against
	 what they gave you.  Can you ping other stations?  Can they ping you?  Can you ping
	 your default gateway?  See below for details on Ping.
	 	
	
	B.	ICMP/Ping
	
	First, let's find out whether a host responds to basic pings.  "Ping" (Packet InterNet Groper)
	is a common tool built on the ICMP protocol, and is used to probe hosts.  On
	Unix servers, it is usually found in /usr/sbin or /usr/bin, and on Windows
	machines it sits in c:\windows\system32\ or c:\winnt\system32.

	The syntax of this command is 
	
		ping hostname
		
	Successful pings generally mean that a server's address is reachable,
	and look like this:
		
		bolo:[15:23]~> ping www.berkeley.edu
		PING arachne.berkeley.edu (169.229.131.109): 56 data bytes
		64 bytes from 169.229.131.109: icmp_seq=0 ttl=237 time=180.286 ms
		64 bytes from 169.229.131.109: icmp_seq=1 ttl=237 time=173.291 ms
		^C
		--- arachne.berkeley.edu ping statistics ---
		3 packets transmitted, 2 packets received, 33% packet loss
		round-trip min/avg/max/stddev = 173.291/176.788/180.286/3.497 ms
		
	Under Solaris, 'ping -s hostname' will return this same result.  'ping hostname'
	by itself will just return 'hostname is alive.'
	
	If your client station does not know how to get to a net (it does not have a 'route'),
	or does not have a default gateway, you will see something like
	
		bolo:[15:25]~> ping www.cnn.com
		PING cnn.com (207.25.71.5): 56 data bytes
		ping: sendto: No route to host
		ping: sendto: No route to host
		ping: sendto: No route to host
		^C
		--- cnn.com ping statistics ---
		13 packets transmitted, 0 packets received, 100% packet loss
		
	When you try to reach a remote address, you may pass many network devices ("routers")
	on the way.  If you are trying to reach an address that they don't know how to get to,
	you could see
	
		bolo:[15:28]~> ping 10.0.0.1
		PING 10.0.0.1 (10.0.0.1): 56 data bytes
		36 bytes from Serial10-1-0.GW3.ZUR4.ALTER.NET (146.188.39.89): Destination Host Unreachable
		Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 		4  5  00 5400 ae7e   0 0000  3c  01 0384 192.168.1.254  10.0.0.1 
		
	Lastly, if your output looks as follows, it is possible that there is a firewall or
	similar security device in the way, which is not permitting ICMP traffic.  This does not
	mean that the host is down, just that it cannot be pinged.
	
		bolo:[15:32]~> ping 165.222.187.100
		PING 165.222.187.100 (165.222.187.100): 56 data bytes
		^C
		--- 165.222.187.100 ping statistics ---
		32 packets transmitted, 0 packets received, 100% packet loss
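
	If you end up collecting ping output from several machines, the summary line is
	the part worth looking at first.  Here is a small Python sketch that pulls the
	numbers out of it.  It assumes the summary looks like the ones shown above;
	other operating systems word this line slightly differently, so treat the
	pattern as a starting point.  The function name is invented.

```python
import re

def packet_loss(ping_output):
    """Return (transmitted, received, loss_percent) from ping's summary line, or None."""
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received, .*?(\d+(?:\.\d+)?)% packet loss",
        ping_output,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))

sample = "32 packets transmitted, 0 packets received, 100% packet loss"
print(packet_loss(sample))   # (32, 0, 100.0)
```

	Remember that 100% loss by itself does not tell you whether the host is down
	or whether something in the way is dropping ICMP.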
		
		
	C.	Traceroute	

	Traceroute is a program which allows you to find out which hops your traffic
	passes on the way to a target server.  It is usually found in the same location
	as ping, both under Unix and Windows.  Traceroute's main function is to allow you
	a look at whether the path a transmission takes from client to server looks 
	sane.  A successful traceroute means that the host is up and running, and that 
	there is a proper route to it.  Traceroute output looks like this:
	
		soda:[6:38]~> traceroute www.berkeley.edu
		traceroute to arachne.berkeley.edu (169.229.131.109), 64 hops max, 40 byte packets
		 1  gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1)  0.744 ms  0.655 ms  0.610 ms
		 2  vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161)  0.858 ms  0.761 ms  0.818 ms
		 3  vlan209.inr-203-eva.Berkeley.EDU (128.32.255.2)  0.861 ms  0.986 ms  0.774 ms
		 4  arachne.Berkeley.EDU (169.229.131.109)  0.781 ms  0.717 ms  0.732 ms
		 
	
	It can also look as follows.  This does not mean that the server is necessarily 
	down or unreachable, but could imply that there is a firewall/security device
	in the middle:
		
		soda:[6:39]~> traceroute www.google.com
		traceroute to www.google.com (216.239.33.101), 64 hops max, 40 byte packets
		 1  gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1)  0.777 ms  0.663 ms  0.612 ms
		 2  vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161)  0.870 ms  0.736 ms  0.785 ms
		 3  gigE2-0.inr-000-eva.Berkeley.EDU (128.32.0.193)  0.599 ms  0.537 ms  0.515 ms
		 4  pos3-0.c2-berk-gsr.Berkeley.EDU (128.32.0.90)  0.692 ms  0.579 ms  0.559 ms
		 5  SUNV--BERK.POS.calren2.net (198.32.249.14)  1.810 ms  1.852 ms  1.992 ms
		 6  STAN--SUNV.POS.calren2.net (198.32.249.74)  2.230 ms  2.167 ms  2.236 ms
		 7  PAIX-7206--STAN-3.ATM.calren2.net (198.32.249.186)  3.346 ms  2.836 ms  2.906 ms
		 8  paix.exodus.net (198.32.176.15)  3.071 ms  3.164 ms  3.503 ms
		 9  ibr02-g1-0.paix01.exodus.net (206.79.9.242)  3.875 ms  2.931 ms  2.848 ms
		10  bbr01-p6-0.sntc03.exodus.net (209.185.9.241)  3.404 ms  3.320 ms  3.366 ms
		11  dcr04-g4-0.sntc03.exodus.net (216.33.153.68)  5.334 ms  3.382 ms  9.461 ms
		12  csr01-ve242.sntc03.exodus.net (216.33.153.181)  4.482 ms  3.876 ms  4.251 ms
		13  google-exodus.exodus.net (64.68.64.210)  4.574 ms  3.610 ms  4.610 ms
		14  exbi1-gige-1-3.net.google.com (216.239.47.2)  5.256 ms  5.382 ms  5.315 ms
		15  * * *
		16  * * *
		17  * *^C
		
	Lastly, you may see the next block of output--this implies that there is a firewall
	in the middle which allows traces to the final destination, but not to itself (thus
	nobody can identify it as a hop in the middle):
	
		soda:[6:43]~> traceroute altavista.digital.com
		traceroute to altavista.digital.com (209.73.180.1), 64 hops max, 40 byte packets
		 1  gig6-1v247.snr1.CS.Berkeley.EDU (128.32.247.1)  0.837 ms  0.668 ms  0.685 ms
		 2  vlan241.inr-201-eva.Berkeley.EDU (128.32.255.161)  1.465 ms  0.785 ms  0.754 ms
		 3  fast4-1-0.inr-new-666-doecev.Berkeley.EDU (128.32.0.73)  1.779 ms  2.628 ms  2.364 ms
		 4  qsv-juniper--ucb-gw.calren2.net (128.32.0.70)  3.717 ms  3.283 ms  3.259 ms
		 5  svl-edge-09.inet.qwest.net (65.113.32.209)  3.247 ms  2.917 ms  3.361 ms
		 6  svl-core-01.inet.qwest.net (205.171.14.93)  3.315 ms  3.632 ms  2.802 ms
		 7  sjo-core-01.inet.qwest.net (205.171.5.99)  3.831 ms  2.907 ms  3.094 ms
		 8  sfo-core-02.inet.qwest.net (205.171.5.123)  4.591 ms  4.839 ms  5.857 ms
		 9  jfk-core-01.inet.qwest.net (205.171.5.113)  66.522 ms  67.079 ms  67.408 ms
		10  jfk-core-03.inet.qwest.net (205.171.230.6)  66.856 ms  66.065 ms  67.102 ms
		11  jfk-edge-04.inet.qwest.net (205.171.30.114)  66.596 ms  66.322 ms  66.156 ms
		12  63.148.0.22 (63.148.0.22)  74.506 ms  74.240 ms  73.975 ms
		13  * * *
		14  altavista.com (209.73.180.1)  74.845 ms  73.763 ms  74.196 ms

	Traceroute and ping are two different applications.  Furthermore, traceroute
	on Windows (where the command is called 'tracert') and on Unix do not send the same
	type of traffic.  Windows traceroute uses "ICMP ECHO REQUEST" to see whether routes
	to a host are alive.  Unix and Cisco traceroute is generally based on UDP.
	Some firewalls allow one, but not the other;  if you suspect a firewall between your client
	and the target server, try traceroute from both types of platforms.
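
	When a trace is long, it helps to pick out the hops that never answered.  The
	Python sketch below does that for Unix-style traceroute output like the samples
	above;  the function name is made up, and Windows 'tracert' output is laid out
	differently, so it would need a different pattern.

```python
import re

def silent_hops(traceroute_output):
    """Return the hop numbers where all three probes went unanswered ('* * *')."""
    hops = []
    for line in traceroute_output.splitlines():
        match = re.match(r"\s*(\d+)\s+\*\s+\*\s+\*\s*$", line)
        if match:
            hops.append(int(match.group(1)))
    return hops

sample = """\
12  63.148.0.22 (63.148.0.22)  74.506 ms  74.240 ms  73.975 ms
13  * * *
14  altavista.com (209.73.180.1)  74.845 ms  73.763 ms  74.196 ms
"""
print(silent_hops(sample))   # [13]
```

	A silent hop in the middle followed by answers further on is usually harmless;
	silence from some hop all the way to the end is the case worth investigating.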
	
	D.	Telnet
	
	Telnet is a very basic login application designed to emulate a mainframe terminal.
	However, telnet has the ability to connect to any TCP ports to show you whether a service
	is even listening or not.  On unix, telnet is normally located in /usr/bin/, in Windows
	it should be in C:\{Windows,Win98,WINNT}\System32.
	
	On Unix, you can telnet to a given port for an application using the syntax
	
		telnet hostname portnumber
		
	Based on the documentation in section I.F., to see if my webserver on
	www.switch.ch is listening on the default http port, I would try the following:
	
		bolo:[17:07]~> telnet www.switch.ch 80
		Trying 130.59.10.30...
		Connected to etna.switch.ch.
		Escape character is '^]'.

	This means that I have a connection established to the server.  In effect, what
	I have just done is "faked" the same sort of connection that a web browser (Netscape,
	Opera, Internet Explorer) makes to a webserver.  Under Windows NT/9x telnet, the equivalent
	of this is when the server address is displayed in the telnet window's title bar.  Don't
	forget to change the port number from the default 'telnet' (port 23) before connecting.
	Windows 2000 telnet works like Unix telnet.
		
	If my service uses TCP and I get this prompt, and I am still having connection difficulties,
	I can rule out that there is a firewall or routing problem;  the issue is most likely
	with the server's configuration or my client application. 

	Another possible response to a telnet attempt, this time to the SMTP (Simple Mail Transfer
	Protocol, the most-used Internet email standard) port on an address not configured to
	receive SMTP mail, may yield this:
	
		bolo:[17:15]~> telnet cfpa11.berkeley.edu 25
		Trying 128.32.124.189...
		telnet: connect to address 128.32.124.189: Connection refused
		telnet: Unable to connect to remote host
		
	This means that the target has sent me a "RST", or "RESET" TCP packet in response to my
	attempt to "SYN".  It's effectively telling me, no, I'm not paying attention to this
	port.  Either the server is not configured correctly, or the program on the server which
	is supposed to be listening on the port isn't even running.
	
	Telnet can also yield this:
	
		bolo:[17:15]~> telnet www.google.com 12345
		Trying 216.239.37.101...
		telnet: connect to address 216.239.37.101: Operation timed out
		telnet: Unable to connect to remote host

	This usually indicates the presence of a firewall or other similar security filter
	in front of or on the server which explicitly disallows connections to port 12345.
	Under Windows NT/9x telnet, you will see 'Telnet (none)' in the title bar,
	and the cursor will be an hourglass.  
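
	The three telnet outcomes above (connected, refused, timed out) can also be told
	apart in a few lines of Python, which is handy if you have no telnet client.
	This is only a sketch:  the function name and the return strings are my own,
	and the 5 second timeout is an arbitrary choice.

```python
import socket

def probe_tcp_port(host, port, timeout=5.0):
    """Try a TCP connection and report how the attempt ended."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # The target answered with a RST: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping the packets.
        return "timed out"
    except OSError as error:
        # Anything else, e.g. an unknown hostname or "No route to host".
        return "error: %s" % error
    conn.close()
    return "connected"
```

	For example, probe_tcp_port("www.switch.ch", 80) corresponds to the first
	telnet test in this section.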
		

III.	What's Happening on the Server?

Now, let's see if you can log into your server.  If you are able to telnet/ssh/rlogin
to your Unix server, or have console/remote desktop access to the Windows NT server
on which your network service runs, you can check the following.
	
	
	A. 	What's listening to what?
	
	On your unix server, you can have a look at some locally defined ports in the file
	/etc/services.  Furthermore, the file /etc/inetd.conf specifies which of these services
	have an actual program (server) associated with them by inetd, the program run by Unix
	at startup to listen for network connections.  This means that when inetd "sees" a 
	connection arriving on a given port, it starts the configured program to deal with it.

	However, this is not the only way to run servers.  Different types of Unix (Linux,
	Solaris, AIX, etc.) have various methods for starting servers via scripts at boot.
	The easiest way to look at this is (on the server!) to run the commands
	
		netstat -an
		
	or

		lsof -i (if installed)
		
	to see which services are (supposedly) listening on which ports.  If you don't understand
	these commands, see whether there is a manual page by running the command

		man netstat (or lsof)
		
	This should explain the syntax to you in the usual clear, concise Unix manual page
	language.  In short, for netstat -an, you may see these keywords:
	
		CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED, CLOSE_WAIT, FIN_WAIT_1, 
		CLOSING, LAST_ACK, FIN_WAIT_2, TIME_WAIT
	
	Consult the netstat man page for the meaning of these.  However, if you see your port
	listed with one of these states, it means something's happening with it.  Windows NT 
	also has a netstat command, which should be located in your System32 directory.
	
	Lsof (LiSt Open Files) with the -i argument should tell you which actual application 
	is using which port.  Have a look at its output, and see if it looks reasonable.
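
	On a busy server, netstat -an prints a lot of lines.  The Python sketch below
	picks out just the ports in the LISTEN state.  Be warned that the sample lines
	are hand-written in the general shape of BSD-style output, not captured from a
	real machine, and that column layout differs between operating systems (BSD
	writes '*.22', Linux writes '0.0.0.0:22').  The function name is made up.

```python
def listening_ports(netstat_output):
    """Return the sorted port numbers that appear in LISTEN state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[-1] != "LISTEN":
            continue
        local_address = fields[3]
        # The port is whatever follows the last '.' or ':' of the local address.
        port = local_address.replace(":", ".").rsplit(".", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return sorted(ports)

# Illustrative sample only; real output will have more columns or lines.
sample = """\
tcp4       0      0  *.22                   *.*                    LISTEN
tcp4       0      0  192.168.1.5.22         192.168.1.20.51514     ESTABLISHED
tcp4       0      0  *.80                   *.*                    LISTEN
"""
print(listening_ports(sample))   # [22, 80]
```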
	
	Under Windows, there isn't much I can tell you except to look at your Task Manager
	(taskmgr.exe), and your services (under 'Administrative Tools'), and to not run
	servers on Windows :-)

	B.	Sniffing
		
	Lastly, if you have administrator rights on your server, you can run a sniffer for
	traffic reaching the server.  Under Solaris, 'snoop' is an example, while BSD-based
	systems may have 'tcpdump' installed.  Windows systems generally require an additional
	piece of software to be installed.  RTFM is left to the user.  The point behind this
	is to see whether traffic from your client address actually reaches the server.  Generally,
	sniffers let you filter traffic by destination port or source IP address, which makes
	your client's packets easy to pick out.

	Please please be aware of the legal/administrative consequences of sniffing network
	traffic on a system--you may see things people don't want you to see, such as passwords
	or confidential data.  This is why most sniffer software is restricted to use
	by people with administrator rights.
	
	
IV.	Common Services

A server can be down for a number of reasons.  Let's take a few common applications
and what could cause them to not respond.


	A.	Hyper Text Transfer Protocol (HTTP), Web Pages
	
	Some common errors found on webservers are listed at
	
		http://offline.home.cern.ch/offline/web/http_error_codes.html
		
	CERN came up with the World Wide Web, so they should know.  By the way, the above
	is an example of a "Uniform Resource Locator", or "URL", which is nothing more than a
	means of specifying a protocol/application (in this case, HTTP, but can be FTP, Gopher,
	telnet, etc etc) an address, and a path (the stuff after the first single "/").
	
	These errors mean that, while the webserver you are trying to access is listening, there
	is some problem either with your client (browser) or the web server.  
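
	As a rule of thumb, HTTP error codes in the 400s point at the request (your
	client), and codes in the 500s point at the server.  Here is a short Python
	sketch of that rule using the standard library's table of status codes;  the
	function name and the wording of the result are my own.

```python
from http import HTTPStatus

def blame(code):
    """Describe an HTTP status code and which side it points at."""
    status = HTTPStatus(code)          # raises ValueError for unknown codes
    if 400 <= code <= 499:
        side = "client (browser or request)"
    elif 500 <= code <= 599:
        side = "server"
    else:
        side = "not an error"
    return "%d %s: %s" % (code, status.phrase, side)

print(blame(404))   # 404 Not Found: client (browser or request)
print(blame(500))   # 500 Internal Server Error: server
```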
	
	B.	Web Applications
	
	If you are behind a firewall, or a proxy server, you may have troubles with certain
	applications (java applets for example) which your browser attempts to download.
	Specifically, this can be caused by an application attempting to access a server 
	which your firewall does not allow, even though the application was actually installed
	via regular HTTP.
	
	Certain multimedia applications, such as RealPlayer and other streaming media clients
	require various other ports to be opened--this means that your browser understands
	that a certain kind of file a webserver is giving it should be opened with a specific
	application, and the application then handles its own traffic.
	
	C.	File Transfer Protocol (FTP)
	
	FTP can either be via anonymous FTP (what you usually see when you have a URL in
	your browser window starting with 'ftp://') or interactively, either via command-line
	ftp from Unix or Windows, or a graphical client such as 'CuteFTP' or 'WS_FTP'.
	
	There are two types of FTP, active and passive.  
	
	Active FTP opens a connection from your client to the server, which is used for passing
	'administrative' information--this includes your login and password.  However, the FTP server
	then opens a connection back to you for the actual file transfer, regardless of what
	direction the file actually flows in (whether you 'PUT' or 'GET' the file from the server.)
	Firewalls in front of the client often block this incoming connection.
	
	Passive FTP is an answer to this;  the server names a port for the transfer, and the client
	opens the second (data) connection to it, so that both connections run from client to server.
	However, both the client and server must understand passive mode.  
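
	In passive mode, the FTP server answers the client's PASV command with a reply
	of the form '227 ... (h1,h2,h3,h4,p1,p2)', where the first four numbers are an
	IP address and the port is p1 * 256 + p2;  the client then opens its data
	connection to that address and port.  The Python sketch below decodes such a
	reply.  The sample reply is made up for illustration, as is the function name.

```python
import re

def parse_pasv_reply(reply):
    """Return (host, port) from an FTP '227' passive-mode reply, or None."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        return None
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]   # high byte, then low byte
    return host, port

# Hypothetical reply: 195 * 256 + 80 = 50000
print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,195,80)"))
# ('192.168.1.10', 50000)
```

	If a firewall sits in front of the server, that port has to be reachable too,
	which you can test the same way as any other TCP port.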
	
	
V.	My Server is Slow!

As so often happens, servers slow down.  There are a few things you can do, without server
access.

	A.	Ping the server
	
	If the 'Round-trip' times you see in your ping are above a few hundred milliseconds, there
	is possibly a problem with the network between your client and server.  This can be
	due to any number of factors;  Use traceroute to see where the bottleneck is.
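
	Here is a small Python sketch that reads the average round-trip time out of
	ping's last line, in the format shown in section II.B.  The 300 ms threshold
	is just my stand-in for "a few hundred milliseconds", and the function name
	is made up.

```python
import re

def average_rtt_ms(ping_output):
    """Return the average round-trip time in ms from ping's min/avg/max line, or None."""
    match = re.search(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)", ping_output)
    return float(match.group(2)) if match else None

sample = "round-trip min/avg/max/stddev = 173.291/176.788/180.286/3.497 ms"
avg = average_rtt_ms(sample)
print(avg, "slow" if avg > 300 else "ok")   # 176.788 ok
```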
	
	B.	Check client load
	
	Self-explanatory.  However, if the client is the bottleneck, everything else on it should be slow as well.
	
	C.	Check server load
	
	On the server, if it's an NT server you can run taskmgr to see what the server load is.
	
	On Unix, there are a number of commands you can look at;  these include (insofar as they
	are installed)
	
		top
		uptime
		vmstat
		ps
		dmesg
		
	You can also look at syslog and messages (in /var/log/ or /var/adm/) for some further information.
	
	I can't tell you what proper output of these will look like, but they will give you information
	on what could be wrong with a given application.
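
	The "load averages" that uptime and top print can also be read from Python,
	which gives you a number to compare between machines.  This is a rough sketch
	with a made-up function name.  A common rule of thumb, not a hard limit, is
	that a 1-minute load persistently above 1.0 per CPU means the machine is busy.

```python
import os

def load_per_cpu():
    """Return the 1-minute load average divided by the CPU count, or None if unavailable."""
    try:
        one_minute, _five_minutes, _fifteen_minutes = os.getloadavg()
    except (AttributeError, OSError):
        # Not every platform provides load averages (Windows does not).
        return None
    cpus = os.cpu_count() or 1
    return one_minute / cpus

print(load_per_cpu())
```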
	
	D.	Check network load
	
	Are you connected to a switch or a hub?  Ethernet, the underlying protocol for many IP
	connections, is essentially a 'shared medium'.  This means that if two machines on a local
	physical segment simultaneously try to transmit data, you have what is called a 'collision'.
	The more collisions you get, the slower things are.  A hub creates a large physical segment,
	leading to the possibility of collisions, while a switch is designed to separate so-called
	'collision domains'.
	
	Make sure that your network card speed is correct.  Some network cards try to 'autonegotiate'
	their speed with whatever switch/hub port they are connected to;  sometimes this breaks.  Make
	sure with the network administrator that both your station and the network device are
	trying to communicate at the same speed.
	
	While you're at it, have them check network traffic load on that port, as well.  Most network
	devices allow 'polling' of their load, via a protocol called SNMP (Simple Network Management
	Protocol).  It could simply be a case of putting too much data on a slow line, such as
	downloading enormous files over a modem link.
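
	To get a feel for "too much data on a slow line", it helps to do the arithmetic.
	The Python sketch below computes the best-case transfer time;  it ignores
	protocol overhead and other traffic, so real transfers take longer.  The file
	size and the 56 kbit/s modem speed are example values, and the function name
	is made up.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Best-case time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second

# A 10 MB file over a 56 kbit/s modem: 80,000,000 bits / 56,000 bits per second
print(round(transfer_seconds(10_000_000, 56_000)))   # 1429 (seconds, about 24 minutes)
```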
		
	E.	Check the cable, network hardware
	
	Ethernet/Cat5/UTP cabling can be sensitive to various factors, such as errors introduced
	by radiation from nearby fluorescent lighting, untwisting of the wires (UTP = "unshielded
	twisted pair"), etc.   
	
	Your network staff can also usually check logfiles on a router/switch/hub for errors, or
	run a debug command on the device.  Network hardware does occasionally break, as do network 
	interface cards (NIC) on servers and clients.
	
	
VI.	Compendium
		
If you can't figure out what's wrong at this point, you'll probably need help (helpdesk, USENET
newsgroup, etc.)  Remember that the more information you supply from system debug commands, the
easier it'll be to help you find out where your problem lies.

(c) 2001 John Morgan Salomon