Saturday, June 27, 2009

How URLs Work

Uniform Resource Locators (URLs): you know them as www.google.com, www.amazon.com, etc. Believe it or not, these strings of text hold a lot of information about the structure and design of the site you are connecting to. URLs are really paths which go down a file structure (like /home/mike/ or c:\windows\, more on this later) to the resource (a file in most cases) that you are remotely accessing. When you type www.twitter.com in your browser you are telling the site the path of the resource you want to access.

The format of URLs is a bit odd (according to Wikipedia the creator of URLs, Tim Berners-Lee, regrets the format). You see, the host name (everything between the :// and the next /) is written in reverse order. Let's break down a URL to illustrate how a URL works. Take this URL as an example:

http://comp-phil.blogspot.com/2009/06/using-just-addition-and-negation-to.html

The first part, http://, says which Internet protocol you are using to access the site (other popular examples are ftp:// and https://). Every browser I have ever used assumes that if you just type www.vanguard.com you want to use HTTP, so it will prefix http:// to your URL.
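To make the protocol part concrete, here is a minimal Python sketch. The helper name with_default_scheme is my own invention for illustration, and real browsers use more involved rules than this; urlsplit comes from Python's standard library.

```python
from urllib.parse import urlsplit


def with_default_scheme(typed: str, default: str = "http") -> str:
    """Hypothetical helper: prefix a default protocol when none was typed."""
    return typed if "://" in typed else f"{default}://{typed}"


full = with_default_scheme("www.vanguard.com")
print(full)                   # http://www.vanguard.com
print(urlsplit(full).scheme)  # http

# A URL that already names its protocol is left alone.
print(with_default_scheme("https://comp-phil.blogspot.com/"))
```

The point is only that the protocol is a separate piece sitting in front of everything else, so it is easy to detect and easy to fill in when it is missing.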

The next part, comp-phil.blogspot.com, is a bit odd. You see, it is backwards: it is parsed from the last . (dot) to the first . (dot), so comp-phil.blogspot.com looks like /com/blogspot/comp-phil to the DNS servers that look the name up.
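The reversal is easy to sketch in a few lines of Python. The function name host_to_tree_path is hypothetical, and the slash-separated output is just the directory analogy used in this post, not something DNS itself produces.

```python
def host_to_tree_path(host: str) -> str:
    """Rewrite a dotted host name as a path, reading labels right to left."""
    # Drop empty labels so a trailing dot does not add an empty segment.
    labels = [label for label in host.split(".") if label]
    return "/" + "/".join(reversed(labels))


print(host_to_tree_path("comp-phil.blogspot.com"))  # /com/blogspot/comp-phil
print(host_to_tree_path("www.google.com"))          # /com/google/www
```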



The image above shows how (on a very small scale) the Internet is organized. The com tells the DNS server which top-level branch of the Internet the site is on (others are org, gov, net, uk, etc.). The next part, blogspot, tells you which domain within that branch the site is on. In this example, comp-phil says which sub-domain the site is on. Last, if you put www in front, the www says (basically) that the sub-domain is www (World Wide Web). So, the example with a www prefix can be thought of as /com/blogspot/comp-phil/www/.

There is an implied .:80 at the end of this section, so this section of the URL really looks like www.comp-phil.blogspot.com.:80. If you do not believe me, try www.google.com.:80 in your browser; I'll wait. Welcome back. The . (dot) after the com says that you are at the root of the name tree (think of the root directory in a UNIX directory structure), and the :80 at the end says that the server is listening on port 80. Port 80 is the IANA-assigned port number for World Wide Web HTTP (web sites).
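Here is a rough Python sketch of filling in the implied dot and port. The helper host_and_port and the DEFAULT_PORTS table are my own names for illustration; the post only mentions port 80, and the 443 and 21 entries are the well-known IANA ports for HTTPS and FTP that I added.

```python
from urllib.parse import urlsplit

# Well-known ports; only the HTTP entry comes from the text above.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def host_and_port(url: str):
    """Return the host with its root dot and the port, implied or explicit."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith("."):
        host += "."  # make the implied root dot visible
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return host, port


print(host_and_port("http://www.google.com/"))      # ('www.google.com.', 80)
print(host_and_port("http://www.google.com.:80/"))  # ('www.google.com.', 80)
```

Both spellings come out the same, which is the whole point: the short form you type and the fully spelled-out form name the same host and port.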

The last section of the example, /2009/06/using-just-addition-and-negation-to.html, is just the directory structure from the server's sub-domain to the resource you are requesting. If you looked on the server, you would see that the /2009/06/ directory of the sub-domain contains an HTML file called using-just-addition-and-negation-to.html.
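As a small illustration, Python's standard library can pull the path out of the example URL and split it into directories and a file name. This is only a sketch of the directory analogy; it says nothing about how the server actually stores the page.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://comp-phil.blogspot.com/2009/06/using-just-addition-and-negation-to.html"

path = urlsplit(url).path
directory, filename = posixpath.split(path)
folders = [name for name in directory.split("/") if name]

print(path)       # /2009/06/using-just-addition-and-negation-to.html
print(directory)  # /2009/06
print(folders)    # ['2009', '06']
print(filename)   # using-just-addition-and-negation-to.html
```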

That's it, not very complicated at all (well, the www.comp-phil.blogspot.com part is complex, but understandable). Next time you type www.twitter.com/MikeMKH in your browser you will understand what you are really doing (and while you are there, follow me and say hi).