All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –

The Dangers of Automatically Generating hCards

11 May 2006

Last month, Colin D. Devroe published a technique for using hCard in WordPress comments. Whilst it’s a nice idea to be using hCards for the names and contact details of commenters, potentially leading to useful things like being able to identify the same person across multiple sites and so on, I do worry that this technique isn’t workable.

Colin rightly suggests using the fn class name to identify the commenter’s name, and the url class name for the link to their site. The problem lies with the fn class name, which stands for ‘formatted name’. To understand why this is a problem, we need to know a little bit about what the hCard spec calls Implied n Optimisation.

Implied n Optimisation

An hCard has to have a name associated with it – the name of the person or organisation that the hCard is describing. Therefore the n class name, which in turn contains class names like given-name and family-name, is a requirement of a properly formed hCard. The exception to this rule (and this is the optimisation) is when the n is the same as the (also required) fn, and the fn follows one of a short list of very specific formats. If the fn matches a known format and no n exists, anything consuming the hCard can reverse engineer the family-name and given-name from the fn.

As you can see from the wiki, the list of name formats is really quite small. It has to be. If we’re to imply the n from the fn, the fn needs to be tightly controlled.
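
To make that concrete, here is a minimal sketch in Python of the kind of check a consuming parser might make. It assumes the two-word formats listed on the hCard wiki: given-name then family-name, plus the family-name-first variants signalled by a trailing comma or a single initial. The function name implied_n is hypothetical, and a real parser will have more to worry about than this.

```python
def implied_n(fn):
    """Return (given_name, family_name) if n can be implied from fn, else None.

    Assumption: only the two-word formats from the hCard wiki are handled.
    """
    words = fn.split()
    if len(words) != 2:
        # One word, or three or more: the name must be marked up explicitly.
        return None

    first, second = words

    # A single character, optionally followed by a full stop, is an initial.
    second_is_initial = len(second.rstrip(".")) == 1 and len(second) <= 2

    if first.endswith(",") or second_is_initial:
        # "Family, Given" or "Family G.": the family name comes first.
        return (second, first.rstrip(","))

    # Default: "Given Family".
    return (first, second)


print(implied_n("Drew McLellan"))             # ('Drew', 'McLellan')
print(implied_n("McLellan, Drew"))            # ('Drew', 'McLellan')
print(implied_n("David Heinemeier Hansson"))  # None
```

Anything that is not exactly two words falls straight through, which is the whole point: the optimisation only works because the accepted formats are so narrow.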

Garbage in, Garbage out

Names are complex things, and online we’re all exposed to all sorts of names based on unfamiliar conventions from around the globe. It’s difficult enough to be able to manually break down a name into its component parts, let alone to do it automatically. I was going to use David Heinemeier Hansson’s name as an example, until I realised I don’t know if the family name is Hansson or Heinemeier Hansson. Either way, a basic fn isn’t going to cover it, and that’s a fairly straightforward example.

The only way you can accurately generate an hCard automatically is to capture the specific data up-front. You need to capture given, additional and family names at the very least, so that the resulting formatted name gets somewhere close to how people want their names presented. And of course, splitting a name into all those fields is a pain in the behind. Fine for an address book application, but too much effort for a comment form.
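
As a rough illustration of what capturing the parts up-front buys you, the sketch below builds an hCard with an explicit n from separate fields. The function name and its parameters are hypothetical; the class names (vcard, fn, n, given-name, additional-name, family-name, url) are the hCard ones. It assumes the person filling in the form has decided which part of their name goes where.

```python
from html import escape


def hcard_with_explicit_n(given, family, additional="", url=""):
    """Build hCard markup from name parts captured in separate form fields."""
    parts = [
        ("given-name", given),
        ("additional-name", additional),
        ("family-name", family),
    ]
    # Only emit a span for the parts that were actually supplied.
    spans = " ".join(
        f'<span class="{cls}">{escape(value)}</span>'
        for cls, value in parts
        if value
    )
    # fn and n share one element, so no consumer has to guess at the structure.
    name = f'<span class="fn n">{spans}</span>'
    if url:
        name = f'<a class="url" href="{escape(url)}">{name}</a>'
    return f'<span class="vcard">{name}</span>'


print(hcard_with_explicit_n("Thomas", "Vander Wal"))
```

Because the family name arrives as its own field, a two-word surname survives intact instead of being split on the space.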

However, if you don’t collect the component parts of the name in separate fields, you lose all the semantic information at the capture stage. Once that’s gone, all you can do is examine the name to see if its format meets the optimisation rules. If the format’s good, then your hCard will be good. Otherwise, you’re just churning out garbage.
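
For a comment form with a single name field, that examination might look something like the sketch below. It is an assumption about how a template could behave, not Colin’s actual code: the function name is hypothetical, and the test is deliberately the crudest one available (exactly two words). Names that fail it are output as a plain link with no hCard class names at all.

```python
from html import escape


def render_comment_author(name, url):
    """Emit hCard markup only when the name's shape lets n be implied from fn."""
    text = escape(name)
    href = escape(url)
    if len(name.split()) == 2:
        # Two words: a consumer can imply n, so fn on its own is enough.
        return (
            f'<span class="vcard">'
            f'<a class="url fn" href="{href}">{text}</a>'
            f'</span>'
        )
    # Anything else: a plain link is better than a malformed hCard.
    return f'<a href="{href}">{text}</a>'


print(render_comment_author("Drew McLellan", "http://example.com/"))
print(render_comment_author("David Heinemeier Hansson", "http://example.com/"))
```

Note that this checks only the shape of the string. It has no way of knowing what the two words actually mean.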

- Drew McLellan

Comments

  1. § Richard Rutter:

    Interesting points Drew. And would a nickname, such as many people use in comments forms, be a valid fn?
    For a name example you could use Thomas Vander Wal whose surname is Vander Wal. At SxSW, Thomas regaled us with the hassle he had trying to get a driving license (in the US) with a two-word surname. Including the priceless question “Are you actually American?” Nice.

  2. § Paul Morriss:

    We had to sort out this name problem for one of our internal systems. The form to capture names fills a screen. As well as getting those component parts, family bit, personal bit, likes to be known as, other bits of the name, you have to give all possible options for the order.

    Then there are all the cases where wives have different family names from their husbands, or where the legal name is different, sometimes very different, from what the person is known as.

    Then we had to sort out what you sort on, what you display on reports and so on. I did some googling when we first started work on this problem, and I couldn’t find any resources online. There must be international organisations that have cracked this for their internal systems, but maybe not. Maybe they just let you have personal name, family name and a choice of two orders (family last or family first) and tough luck if your name won’t fit into that scheme.

  3. § ryan king:

    Drew, this is a very good summary of the problems. Names are very complex, which means data formats dealing with names need to cope with this complexity as best they can. I’d like to think that we’re doing a good job on this front with microformats, but I’m sure there’s room for improvement.

  4. § Colin D. Devroe:

    Nice catch, and this is something I thought about at length and have yet to come up with any type of good way to get away from the simple solution I’ve tried to provide.

    Rightfully, we’d want the most accurate and semantic result – but perhaps we’re asking for too much in this specific case. Sometimes people do not leave their real names when commenting at all, so obviously those times should be excluded – but how do we determine when someone’s name is real or fake? Perhaps asking someone to provide a nickname (as has been suggested) would alleviate guess work.

    I think that in most instances the solution that I have put together will work for comments with first and last name scenarios – but I definitely see the drawbacks. My publishing this solution was merely an attempt to move things forward, open a discussion, and hopefully educate one or two people about what hCard (or any microformat) is. I think I’ve done this, since this is the type of discussion that needs to be had in order to figure out the best practices and practical uses for microformats.

    The main reason I put this together was that, as Tantek says, “Anywhere someone’s name appears, it could be an hCard.” And so I tried to use that mantra. Though perhaps web site comments are one of those places where hCards should not be used at all.

  5. § Tony:

    How about a check box to toggle between name and nickname? Would that help?

Work With Me

At edgeofmyseat.com we build custom content management systems, ecommerce solutions and develop web apps.

Affiliation

  • Web Standards Project
  • Britpack
  • 24 ways

I made

Perch - a really little cms

About Drew McLellan

Drew McLellan (@drewm) has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This led to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practices, and stood as Group Lead for The Web Standards Project 2006-08. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at all in the head and, with a little help from his friends, at 24 ways.