All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –

Designing URIs

7 February 2005

So this is a quick hack of a post I’ve been spending far too long failing to complete. Time to just get it out the door. If parts are nonsensical, please accept my apologies. You get what you pay for.

A fair bit has been written over the years about designing good URIs. Whilst traditional teaching on the subject must also apply to web applications to some extent, how far does it go? Does the nature of the documents being served (in this case ‘active’ documents as part of a larger application) hold sway over the URI of the page?

First Principles

I tend to be pretty fussy about what appears in the location bar of any sites or apps that I architect. Partly this is down to aesthetics and some idealist goal of elegance, but primarily it rests with the core values of sustainability, perception of stability and also ease of use. Let’s unpack that.

The subject of sustainability in URI design should be familiar to us all. At a base level, /contact is good, but /contact.asp is bad because when you transition your site to PHP next summer the name of that document is going to change. A good URI doesn’t refer to a web page with a document name. Unless the visitor is supposed to grab the file and take it away from the site, leave the file extension off.
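To make the /contact example concrete, here is a minimal sketch of keeping the public address separate from the file that serves it. The handler table and both function names are made up for illustration; on a real server this job usually belongs to the web server's rewrite or routing configuration rather than to application code.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical handler files on disk. Only this table changes when the
# site moves from ASP to PHP; the public URIs stay exactly as they were.
HANDLERS: Dict[str, str] = {"contact": "contact.asp", "about": "about.asp"}


def resolve(uri_path: str, handlers: Dict[str, str]) -> Optional[str]:
    """Map a public, extensionless path such as /contact to an internal file."""
    name = uri_path.strip("/")
    return handlers.get(name)


def public_uri(filename: str) -> str:
    """Derive the public URI for a handler file by dropping its extension."""
    return "/" + PurePosixPath(filename).stem
```

With this arrangement, swapping contact.asp for contact.php is a one-line change to the table, and /contact keeps working for everyone who linked to it.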

Perceived Stability

Slightly more abstract than this is the concept of perceived stability, which I think is best illustrated with an example from last weekend. Dissatisfied with the tools available for discovering what podcasts are available, I was taking a look into writing my own scripts to parse the ipodder.org podcast directory and find stuff I might be interested in. The first job was to find the URI to the directory so that I could take a look at it. After some hunting around, I found this address:

http://www.ipodder.org/discuss/reader$4.opml

Well, OK, it looks fairly compact, but I have a few issues with it. The first is that dollar sign. Are those even legal? Well, with the dollar being so weak it’s certainly not a good thing to be throwing into your URIs, that’s for sure. My second issue is the file name as a whole – whilst I’m not sweating the OPML extension as I know that to be XML, what’s with the reader business? And finally, discuss? That suggests that this was posted by a user and is not a permanent resource I should be building an application on. So with this bad taste in my mouth, I posted to a list and just asked if it was the right address. I was relieved to find that I had the wrong page. Phew! There I go getting hot under the collar for no reason. But wait until you see the real URI:

http://homepage.mac.com/dailysourcecode/DSC/ipodderDirectory.opml

Deeeep breath. So I have issues here too. The first is the dot mac account, which is obviously at the mercy of Apple and where they take their dot mac service in the future. The second issue is that the document I want (a directory of podcasts) is filed under the name of a specific podcast. It’s just all messed up. (And don’t even get me started on why the damn thing is in OPML format). See how the chosen URI can have a detrimental effect on the user’s perception of stability of that URI?

Ease of Use

So what would a better address for the ipodder.org directory be? Well, in the first instance, it should be on the ipodder.org domain. That’s where a user would expect to find the feed – it comes down to ease of use. Secondarily, the feed isn’t part of the main content of ipodder.org, so I’d expect it to be tucked away in a directory distinct from the rest of the site’s content. How about this:

http://ipodder.org/xml/directory.opml

Short and to the point. Memorable, and most of all, easy.

Where was I?

Oh yes, so that’s how URI design works at a basic level. The challenge that I’m currently faced with is deciding if the principles of the design can or should be fundamentally different for a web application vs a regular site. I’ll tell you what’s prompted this thought – working with Rails. Rails uses a URI model that goes pretty much like this:

/controller/method/options

Well, I guess that’s pretty neat. A controller in use is often mapped to something like an object within your app – say, a user. So we have a controller for users. The address to edit user #1234 would be something like:

/users/edit/1234

That makes a lot of sense. What it’s doing is taking an object-oriented look at the address structure rather than a traditional hierarchical view. The URIs reflect the logical structure of the application, not the hierarchical flow of the user interface. A subtle shift, and one that may have zero effect, depending on how your interface is designed.
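As a rough illustration of the /controller/method/options pattern, here is a small sketch that splits a path into those three parts. It is written in Python and is not how Rails itself does routing; the Route type, the parse_route name and the "index" fallback for a missing method are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Route(NamedTuple):
    controller: str     # e.g. "users", the object being acted on
    method: str         # e.g. "edit", the action to perform
    options: List[str]  # anything left over, e.g. ["1234"]


def parse_route(path: str, default_method: str = "index") -> Route:
    """Split /controller/method/options into its parts.

    The "index" default for a missing method is an assumption of this
    sketch, not something the pattern itself dictates.
    """
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("path has no controller segment")
    controller = segments[0]
    method = segments[1] if len(segments) > 1 else default_method
    return Route(controller, method, segments[2:])
```

Calling parse_route("/users/edit/1234") yields the controller users, the method edit and a single option, 1234, which is the object-first reading of the address described above.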

On that note, I just checked some of mine. Here’s how I edit user #1234 in one of my recent apps:

/admin/users/edit/?id=1234

So that would be pretty much the same then. I’m going to have to think further about whether that means that my interface is well laid out, or whether it means that there’s little fundamental difference between app-logic designed URIs and UI-hierarchy designed URIs. I dunno. Discuss.

- Drew McLellan

Comments

  1. § Turnip: I think a good URI should give the end user information about what they are likely to find. This is why URI slugs are great; I can see the title of a post I’m not even looking at.

    The “/users/edit/” part is fine, but the “1234” part isn’t so good. Presumably that “1234” would map to the user’s name or username in the database. Well, why not have “/users/edit/joe-bloggs” then? That way the over-worked, stressed out site admins who should have gone to lunch 10 minutes ago can easily see exactly where they’re going just by hovering the link. It makes things so much easier.

    A great example of crappy URI structure is SpreadFirefox: The URI for the latest post is “http://www.spreadfirefox.com/?q=node/view/11288”. Ok, it’s crappy because it uses a querystring, but that’s easy to change to “http://www.spreadfirefox.com/node/view/11288” with a drop of mod_rewrite. More importantly though, it tells me absolutely nothing about where I’m about to navigate to, making it harder for me to decide whether I really want to click that link or not.
  2. § Anne: In reply to the last sentence in comment 1. That brings us to the next point, good linking text ;-).

    By the way, URIs recently turned into IRIs, making it even more confusing.
  3. § Drew McLellan: Your 1234 vs joe-bloggs point is an interesting one. Of course, whatever parameter you pass to your edit process has to be able to uniquely identify a specific row in your database. That’s why numeric IDs that already exist in your table are often the easiest way to go. I guess the suggestion here is to supplement the database’s internal IDs with friendly human-readable ones?
  4. § Turnip: Yes, keep the numerical IDs, and have human friendly ones too.

    I recently implemented URI slugs for Digital Proof. Whilst I was hoping to be lazy and steal the code from WordPress, as far as I could see, it didn’t support properly unique slugs. That’s fine, assuming you’re using the /archives/year/month/day/slug format, and aren’t going to make two posts with the same name on the same day, but as we weren’t using dates in our URIs I wanted something that was truly unique. So when a post is made, the system runs through and checks if that slug is already in the database. If so, it simply appends -X, where X is the duplicate number. That works fine, except it seems to be slightly bugged and applying dupe numbers where they’re not needed ; ). Must sort that some time.

    Anyway, the real point is making the user’s life easier. If creating unique URI slugs is the way to do so then that’s most certainly what should be done.
  5. § kumar mcmillan: yes URIs must be unique. But since descriptiveness is important for us “look-ahead” linkers—and not to mention search engines :)—then the solution to the above problem is to add your keywords as “dummy” parameters.

    /users/edit/1234_joe-bloggs

    you probably have noticed that most search-engine-friendly news websites already do this.
  6. § Drew McLellan: Kumar – not just the news sites. Try playing around with the URIs right here. My format is:

    /section/article_id/whatever-you-like-it-makes-no-odds
  7. § Lach: Of course, you do have to be careful if you allow wildcards in URIs that you’re not going to need to change the format of them later on and have the wildcard piece become important.

    One thing that interests me more with URIs is how much information should we put into them? Obviously the more that’s there, the better the idea you can get about where you’re heading to. But when does the length become so great that the new pieces of information make it harder to see what you’re going to?

    The other big problem with URIs, is what if you want two different classification schemes. Do you have the same article up at completely different URIs so you can use both? What then about visited links, and which do you choose as the main link? The best example of this I can think of is a news site. Should urls be organised by date, or by category posted in? If I’m looking back through visited URIs in my address bar dropdown, I might want to look for all articles on a certain date, which placing them under category first will ruin. But equally, what if I can only remember what category it was in? If it’s organised date first, then I can’t check through by category easily at all.

    And of course a url like

    newssite.com/2005/02/04/news/abbreviated_story_title

    is really getting a bit into the category of too much cruft there making it harder to look at what’s going on. And that’s without any CMS generated junk.
  8. § Scott: Here is an interesting read on the same topic at the W3C :

    http://www.w3.org/Provider/Style/URI

    The part about removing file extensions in the URI is interesting, which I’ve done for some file types (scripts) but not all as it points out.

    I had not heard of IRIs, thanks Anne. I found this at the W3C concerning IRIs :

    http://www.w3.org/International/O-URL-and-ident.html

    Basically, IRIs are an international version of URIs. IRIs allow the use of the Universal Character Set while URIs only allow the use of the US-ASCII character set.
  9. § Stu Schaff: I really like your idea, Drew, and I will most likely be implementing it on my new project—if you don’t mind, of course.
  10. § Dustin Diaz: I would totally follow your advice if we could afford the $400 Windows ReWrite module for our IIS server.

    Otherwise, I always do this on my personal websites.

    .htaccess is my best friend. They’re starting to look almost as big as my style sheets… (almost)
  11. § Tim: Drew,

    Have you got in touch with Adam Curry to suggest a change to the location of the opml file? He’s a pretty amiable guy – I reckon he’d move it if you explained it calmly and clearly ;)
  12. § Drew McLellan: Adam’s a great guy. I’m certainly not knocking him or his work specifically. Sometimes you have to do whatever you need to do to get the show on the road nice and quickly. I think that’s the circumstance here. Also with OPML. Stay tuned for that rant ;)
  13. § Small Paul: I like the idea of hiding filename extensions (except where the file type is part of what you’re offering – i.e. download this as PDF), as it stays true to the object-oriented principle of keeping the interface the same, and hiding the implementation.

    The URI is the interface to your website, and is important. Granted, you’ve got search engines and links to help people get where they want, but thinking about the URL interface of your site is probably a very good start to thinking about its information architecture, and all aspects of the user experience – aside from letting people play with the URI to find what they want.

    And as this post emphasised, if you want other people to make use of your site programmatically, a good URI interface can speed the process up no end.
  14. § Matt Patterson: It’s nice to see you tackling the URI issue, mate. I’ve been thinking about this kind of thing a lot at work, because I’ve been involved with building a ReSTful web app.

    I’ve got issues with the controller/method/options format (I’ve got even bigger issues with the messy query-string-laden urls that we’re so often dumped with).

    My big issue is that the web is not an application in the traditional sense. The pseudo-OO address style is a lot like the RPC style, where URIs point to functions not to resources.

    HTTP tells us that the web is a collection of resources, which we can manipulate (by requesting representations of those resources, or modifying or even deleting them…) using HTTP verbs like GET, POST, PUT and DELETE. If I’ve got a web app then I’ve got some kind of information space, so it makes sense to view the URI map as a map of resources and various kinds of representations of them.

    So, in the user admin example you could have this:

    /users/1234/edit

    to get an editable HTML representation of user 1234.

    This kind of URI pattern works on a restriction basis, from class of thing, to thing, to special case of thing, so:

    /users/1234 would give us a plain old representation of the user,

    and /users might give us a list of all the users. Equally, when there are several things I can do with a user then a resource-focussed URI structure keeps things that bit more intelligible:

    /users/1234/favourites might return a list of that user’s favourite things. We could POST to it to add new things, and GET it to see HTML. We could even use content negotiation to GET different representations of the same resource by asking for HTML or XML in the request headers.

    This is all straight-out-of-Fielding stuff. I’d advise digging out his thesis, and even reading the HTTP spec (it’s actually very readable).
  15. § Jon Berg: For the first principles I think you would go a long way with a regular expression that would transform the URL when just changing the file extension. As for the ’?’, I don’t like it in URLs. I believe Google indexes the ?-URLs as long as it is a prominent page, but search engines may not be so happy with it.
  16. § JHill: On our CMS and ecommerce sites, we’ve implemented pretty extensive URL aliasing using ISAPI Rewrite which is cheaper than the previously mentioned $400, the full version is $69. Might be helpful to those running IIS.

    On the WebLinc site, we’ve dynamically built out aliases for every ‘page’ on the site.
  17. § matt mikulla: Hello. I am very new to this and I’m trying to follow along. Are there any excellent resources on how to create URIs minus file extensions when developing static sites?

    Also how do you all feel about the uri having or not having a trailing slash?

    If anyone can help give me some direction I would really appreciate it.
  18. § Dustin: pardon my brief ignorance, but after I posted the comment about having an IIS server, is there another way of designing short URI’s on IIS w/o the use of the $400 reWrite module they offer for windows servers…

    and no, I’m not looking for a solution where I have to manually create directories (nor do I want a script that just creates them on the fly…I know how to do that).

    the solution need not be free…but I’m just hoping for something that could possibly be cheaper than what the main stream module costs.

    Thanks to anyone who helps out.
  19. § Joe: Excellent entry and discussion.

    I’ve lost track of how many times I’ve forwarded that Cool URIs Don’t Change article in the interests of enlightening colleagues, some who would otherwise devise some of the most convoluted linkage I have ever laid eyes on.

    (... and not necessarily on purpose either – that’s the kicker. Most didn’t realize what kind of rabbit hole they were diving into at the time!)

    Reading all this inspired a (somewhat related) change to my site early this morning: enforcing a fully qualified domain name for all HTTP requests.
  20. § Alex: URL Rewrite for just 23,-
    http://www.smalig.com/url_rewrite/
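Comments 4 to 6 describe two ways of getting readable URIs without giving up uniqueness: a slug with a duplicate number appended, and a numeric ID followed by free text that the server ignores. A minimal sketch of both follows, in Python. The function names are made up for illustration, starting the duplicate counter at 2 is an assumption, and none of this is the code behind any of the sites mentioned.

```python
import re
from typing import Set, Tuple


def slugify(title: str) -> str:
    """Lower-case a title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def unique_slug(title: str, taken: Set[str]) -> str:
    """Return a slug not already in `taken`, appending -X only when needed."""
    base = slugify(title)
    if base not in taken:
        return base  # no clash, so no duplicate number is applied
    n = 2  # assumption: the first duplicate gets -2
    while f"{base}-{n}" in taken:
        n += 1
    return f"{base}-{n}"


def id_slug_path(section: str, record_id: int, title: str) -> str:
    """Build a /section/article_id/readable-text style path."""
    return f"/{section}/{record_id}/{slugify(title)}"


def parse_id_slug_path(path: str) -> Tuple[str, int]:
    """Recover section and ID; whatever follows the ID is ignored."""
    parts = [p for p in path.split("/") if p]
    return parts[0], int(parts[1])
```

The second approach keeps the database lookup on the numeric ID, so the trailing text can change, or be mistyped, without breaking the link. The first approach gives shorter addresses but needs a uniqueness check every time a record is created.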

About Drew McLellan

Drew McLellan (@drewm) has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This led to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practices, and stood as Group Lead for The Web Standards Project 2006-08. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at all in the head and, with a little help from his friends, at 24 ways.