Wednesday 3 February 2010: More famous than Simon Cowell
Bear with me on the title...
I don't usually write about work-related stuff here but am going to make a brief exception just this once. I am supposed to be up in London today at a meeting to discuss 'persistent identifiers' - such is the exciting life I lead. However, I now also have a hospital appointment, to get my ears fixed, and hospital appointments are not that easy to come by, so I decided to miss the London meeting and go to the hospital instead.
Persistent identifiers are a regular topic of conversation for me, particularly as they pertain to the academic and cultural heritage sectors. An identifier is a name or label for something (e.g. a research paper, a museum object, or the digitised versions of those things) that you can use to refer to that thing even when you don't have it immediately to hand. For identifiers to work, there has to be some level of agreement about what is being identified, at least by all the parties that need to make use of it. This is usually achieved by having some widely agreed way of moving from the identifier to the thing being identified (sometimes known as 'dereferencing'). The persistence aspect comes about because we want to be able to refer to things for very long periods of time - 100s of years in some cases, particularly in the cultural heritage sector. In the context of relatively new technology (like the Internet), thinking about how to make things work persistently for 100s of years is a non-trivial task.
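For the programmers among you, the idea of dereferencing can be sketched in a few lines of Python. This is only a toy illustration, not how any real system works: the Registry class and the 'example:' identifiers are made up for the purpose.

```python
class Registry:
    """Toy resolver: maps identifier strings to descriptions of things."""

    def __init__(self):
        self._entries = {}

    def register(self, identifier, thing):
        # Persistence is mostly a promise: once an identifier has been
        # handed out, it must never be quietly re-pointed at something else.
        if identifier in self._entries and self._entries[identifier] != thing:
            raise ValueError(f"{identifier!r} already identifies something else")
        self._entries[identifier] = thing

    def dereference(self, identifier):
        # Move from the identifier to the thing identified.
        # Returns None when nobody has agreed what the identifier means.
        return self._entries.get(identifier)


registry = Registry()
registry.register("example:paper-1", "a research paper")    # hypothetical
registry.register("example:object-7", "a museum object")    # hypothetical

print(registry.dereference("example:paper-1"))
print(registry.dereference("example:unknown"))
```

The interesting part is the check in register: the code is trivial, but the rule it enforces is the social agreement that makes an identifier worth trusting.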
Are you still reading?
This blip entry has an identifier. So did yesterday's. So does the whole Blipfoto website. Many of you will know these identifiers as URLs (or Uniform Resource Locators to use the full name). But it is worth noting that in a detailed technical sense, the term URL is somewhat disputed. In technical discussions I now tend to prefer to use the term 'http URI' - the 'http' bit referring to the way the URL starts and the 'URI' bit referring to Uniform Resource Identifier. This is usually, but not always, less contentious.
Let's look at the 'identifier' for my blip yesterday:
What makes this URL persistent or not? How long will it remain useful as an identifier? A year? Yes definitely. Two years? Yup. Five years? Very likely. Ten years? Probably. 20 years? Maybe. 50 years? Who knows! (I'll be dead anyway). 100 years? Haven't got a clue! In a sense, paying the Blipfoto membership fee is a vote of confidence in the persistence of Blipfoto identifiers for a reasonable time into the future - but I'm not sure what 'reasonable' actually means here.
So long-term persistence isn't guaranteed by any means. But there are practical steps we can take to help things run more smoothly. The URL above is what the W3C (the body that looks after the Web) would call 'uncool' (as opposed to it being a cool URI). That means that it is potentially not as persistent as it could be. Why? Because it contains the string '.php'. PHP is the programming language used to build the Blipfoto website. If, in 10 or 20 years' time, the Blipfoto team decide to stop using PHP to deliver the site (if use of the PHP programming language dies out for example) then they will either have to make the '.php' URLs work using some other language, or they will 'break' all the existing identifiers. It won't be an impossible task to work around this... but it will be a factor in making Blipfoto identifiers persistent into the future.
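One way to picture the workaround is a small mapping from old, technology-specific URLs to neutral ones, with the old ones permanently redirected. Here is a minimal sketch in Python. The example.org address, the list of extensions and the function names are all my own assumptions for illustration; this is not how Blipfoto does (or would do) it.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of extensions that give away the server-side technology.
TECH_EXTENSIONS = {".php", ".asp", ".jsp", ".cgi"}


def neutral_uri(uri):
    """Return the URI with any technology-specific extension removed from its path."""
    parts = urlsplit(uri)
    root, ext = posixpath.splitext(parts.path)
    if ext.lower() in TECH_EXTENSIONS:
        parts = parts._replace(path=root)
    return urlunsplit(parts)


def redirect_for(legacy_uri):
    """Return (status, location) for a permanent redirect, or None if no change is needed."""
    target = neutral_uri(legacy_uri)
    if target == legacy_uri:
        return None
    return (301, target)  # 301 = "Moved Permanently"


# Hypothetical legacy URL, not a real Blipfoto address.
print(redirect_for("http://www.example.org/entry.php?id=123"))
print(redirect_for("http://www.example.org/about"))
```

The query string and the rest of the URL are left alone; only the part that names the implementation language is dropped, so the old links keep working after the technology changes.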
There are also other aspects to the persistence of this identifier - the domain name for example, i.e. the 'www.blipfoto.com' part of the URL. Domain names are essentially rented, and, like any rented commodity, agreements about the right to use a particular domain name can lapse for various reasons. In extreme cases (remembering that we are talking about very long periods of time here) the whole infrastructure of the Internet might change.
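To see the separate parts of an http URI, each of which has its own persistence story, here is a short Python sketch using the standard library. The describe_uri function name is just something I have made up for the example.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a URI into the parts discussed above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,   # the 'http' bit at the start
        "host": parts.hostname,   # the (rented) domain name
        "path": parts.path,       # where things like '.php' tend to appear
    }


print(describe_uri("http://www.blipfoto.com"))
```

Each key in the result depends on something different continuing to exist: the scheme on the Web's protocols, the host on a domain name registration being renewed, and the path on decisions made by whoever runs the site.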
Now, of course, the persistence of Blipfoto URLs isn't a major worry for most of the world - even for most Blipfoto members. But in an academic context, the identifiers used to refer to academic research papers, or the research data that underpins those papers, are important - and they're important over the long term. For example, one can imagine that researchers in 20 years' time will want to be able to refer back to the papers being used now to predict trends in global warming.
And so to today's blip - which is of an identifier. It's the identifier for a railway bridge in the UK. I assume (though I don't know for sure) that all UK railway bridges have such an identifier. I don't know how long such identifiers have been in use and it is somewhat hard to predict how long they will work into the future but it's probably fair to say that it will be for quite some time.
The community of use for this identifier is fairly small - extending not much beyond the people who work on the railways and the emergency services I would guess. If I said to you, "go to bridge 179-40", you wouldn't have a clue what I was talking about and you probably wouldn't have any way of finding out.
Likewise, if I said to you, "go and read doi:10.1037/0003-066X.59.1.29" I'm guessing that most of you wouldn't know what to do. (This is an example of the kind of identifier, called a DOI, typically used to refer to academic papers - it's actually the identifier for a paper called "How the Mind Hurts and Heals the Body" by Oakley Ray, published in American Psychologist, Vol 59(1), Jan 2004, 29-40.)
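For what it's worth, a DOI can be turned into an http URI by putting the address of the public DOI resolver in front of it. A minimal Python sketch, assuming the doi.org resolver and a function name (doi_to_http_uri) that I have invented for the example:

```python
from urllib.parse import quote

# Public DOI resolver; the resolver address is prepended to the DOI itself.
DOI_RESOLVER = "https://doi.org/"


def doi_to_http_uri(doi):
    """Turn 'doi:10.xxxx/yyyy' (or a bare DOI) into a URI a browser can follow."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode anything unusual, but keep the '/' between prefix and suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_http_uri("doi:10.1037/0003-066X.59.1.29"))
```

The sketch only builds the URI; it doesn't fetch anything, so whether the resolver still answers in 50 years' time is exactly the persistence question again.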
What makes the bridge identifier persistent? It's essentially a social construct. It's not a technical thing (primarily). It's not the paint the number is written in, or the bricks of the bridge itself, or the computer system at head office that maps the number to a map reference. These things help... but it's mainly people that make it persistent.
What's interesting, and so powerful, about http URIs is how widely they are understood. I don't know what proportion of the world's population would understand 'http://www.blipfoto.com' if you showed it to them, and know what to do with it. I'm guessing it would be more than 50% - well over 50% probably. More than would recognise a picture of Simon Cowell, David Beckham or Michael Jackson? I guess so.
That's fame that is!