What makes a cool URI?
A cool URI is one which does not change.
What sorts of URI change?
URIs don’t change: people change them.
There are no reasons at all in theory for people to change URIs (or stop
maintaining documents), but millions of reasons in practice.
In theory, the domain name space owner owns the domain name space and
therefore all URIs in it. Apart from insolvency, nothing prevents the domain name
owner from keeping the name. And in theory the URI space under your domain
name is totally under your control, so you can make it as stable as you like.
Pretty much the only good reason for a document to disappear from the Web is
that the company which owned the domain name went out of business or can no
longer afford to keep the server running. Then why are there so many dangling
links in the world? Part of it is just lack of forethought. Here are some
reasons you hear out there:
We just reorganized our website to make it better.
Do you really feel that the old URIs cannot be kept running? If so, you
chose them very badly. Think of your new ones so that you will be able to
keep them running after the next redesign.
We have so much material that we can’t keep track of what is out of date
and what is confidential and what is valid and so we thought we’d better just
turn the whole lot off.
That I can sympathize with – the W3C went through a period like that, when
we had to carefully sift archival material for confidentiality before making
the archives public. The solution is forethought – make sure you capture with
every document its acceptable distribution, its creation date and ideally its
expiry date. Keep this metadata.
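
As an illustration of the metadata described above, here is a minimal sketch of a record that keeps a document's acceptable distribution, creation date and optional expiry date together. The class name, the field names, the distribution labels and the may_publish helper are all assumptions made for this example; they are not taken from any W3C system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocumentMetadata:
    """Hypothetical record stored alongside every document."""
    uri_path: str
    distribution: str                # assumed labels: "public", "members", "confidential"
    created: date
    expires: Optional[date] = None   # None means no planned expiry


def may_publish(meta: DocumentMetadata, today: date) -> bool:
    """Return True if the document is marked public and has not expired."""
    if meta.distribution != "public":
        return False
    return meta.expires is None or today <= meta.expires
```

With records like this kept from the start, sifting an archive becomes a filter over the metadata rather than a manual review of every file.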
Well, we found we had to move the files…
This is one of the lamest excuses. A lot of people don’t know that servers
such as Apache give you a lot of control over a flexible relationship between
the URI of an object and where a file which represents it actually is in a
file system. Think of the URI space as an abstract space, perfectly
organized. Then, make a mapping onto whatever reality you actually use to
implement it. Then, tell your server. You can even write bits of your server
to make it just right.
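
The idea of an abstract URI space mapped onto whatever storage you actually use can be sketched as follows. This is not how Apache does it; it is only a minimal Python illustration, and the table URI_TO_FILE, its paths and the resolve function are hypothetical names invented for the example.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping: the left side is the stable, public URI path,
# the right side is wherever the file happens to live today.
URI_TO_FILE = {
    "/reports/1998/annual": "/srv/archive/vol3/annual-1998.html",
    "/reports/1999/annual": "/srv/content/new-layout/ar99.html",
}


def resolve(uri_path: str) -> Optional[PurePosixPath]:
    """Map a public URI path to its current file, or None if unknown."""
    # Normalize so that a trailing slash does not create a second identity.
    normalized = "/" + uri_path.strip("/")
    target = URI_TO_FILE.get(normalized)
    return PurePosixPath(target) if target is not None else None
```

When the files move, only the right-hand side of the table changes; the URIs that people have bookmarked and linked to stay exactly as they were.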
John doesn’t maintain that file any more, Jane does.
Whatever was that URI doing with John’s name in it? It was in his
directory? I see.
We used to use a cgi script for this and now we use a binary program.
There is a crazy notion that pages produced by scripts have to be located
in a “cgibin” or “cgi” area. This is exposing the mechanism of how you run
your server. You change the mechanism (even keeping the content the same)
and whoops – all your URIs change.
For example, take the National Science Foundation:
NSF Online Documents
http://www.nsf.gov/cgi-bin/pubsys/browser/odbrowse.pl
the main page for starting to look for documents, is clearly not going to
be something to trust to being there in a few years. “cgi-bin” and
“odbrowse” and “.pl” all point to bits of how-we-do-it-now. By contrast, if
you use the page to find a document, you get first an equally bad
Report of Working Group on Cryptology and Coding Theory
http://www.nsf.gov/cgi-bin/getpub?nsf9814
for the document’s index page, but the HTML document itself is
very much better:
http://www.nsf.gov/pubs/1998/nsf9814/nsf9814.htm
Looking at this one, the “pubs/1998” header is going to give any future
archive service a good clue that the old 1998 document classification scheme
is in progress. Though in 2098 the document numbers might look different, I
can imagine this URI still being valid, and the NSF or whatever carries on
the archive not being at all embarrassed about it.
I didn’t think URLs have to be persistent – that was URNs.
This is probably one of the worst side-effects of the URN discussions.
Some seem to think that, because there is research into namespaces which will
be more persistent, they can be as lax about dangling links as they like,
as “URNs will fix all that”. If you are one of these folks, then allow me to
disillusion you.
Most URN schemes I have seen look something like an authority ID followed
by either a date and a string you choose, or just a string you choose. This
looks very like an HTTP URI. In other words, if you think your organization
will be capable of creating URNs which will last, then prove it by doing it
now and using them for your HTTP URIs. There is nothing about HTTP which
makes your URIs unstable. It is your organization. Make a database which maps
document URN to current filename, and let the web server use that to actually
retrieve files.
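
The database suggested above might look something like the following minimal sketch, which uses Python's standard sqlite3 module. The table layout, the function names and the sample identifier are assumptions made for illustration; a real server would consult a persistent database rather than an in-memory one.

```python
import sqlite3
from typing import Optional


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog mapping persistent IDs to filenames."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " doc_id TEXT PRIMARY KEY,"   # the persistent name, never reused
        " filename TEXT NOT NULL)"    # the current location, free to change
    )
    return conn


def register(conn: sqlite3.Connection, doc_id: str, filename: str) -> None:
    """Record a document, or update its location if it has moved."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (doc_id, filename) VALUES (?, ?)",
        (doc_id, filename),
    )


def lookup(conn: sqlite3.Connection, doc_id: str) -> Optional[str]:
    """Return the current filename for a persistent ID, or None."""
    row = conn.execute(
        "SELECT filename FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None
```

Moving a file then means calling register again with the same identifier and the new filename; the identifier that appears in the URI is untouched.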
If you have gotten to this point, then unless you have the time and money
and contacts to g