All in the <head> – Ponderings and code by Drew McLellan

The Dangers of Automatically Generating hCards

Last month, Colin D. Devroe published a technique for using hCard in WordPress comments. Whilst it’s a nice idea to be using hCards for the names and contact details of commenters, potentially leading to useful things like being able to identify the same person across multiple sites and so on, I do worry that this technique isn’t workable.

Colin rightly suggests using the fn class name to identify the commenter’s name, and the url class name for the link to their site. The problem lies with the fn class name, which stands for ‘formatted name’. To understand why this is a problem, we need to know a little about what the hCard spec calls Implied n Optimisation.
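To make the problem concrete, here’s a hypothetical Python helper showing the kind of markup this approach produces. The function name comment_author_hcard is my own invention, and this is not Colin’s WordPress code. The commenter’s name becomes the fn and their link becomes the url, but nothing in the output says which part of the name is which.

```python
import html


def comment_author_hcard(name: str, url: str = "") -> str:
    """Hypothetical helper: wrap a commenter's name (and optional site URL)
    in minimal hCard markup using only the fn and url properties, no n."""
    safe_name = html.escape(name)
    if url:
        safe_url = html.escape(url, quote=True)
        # One element carries both properties: the link is the url,
        # and its text content is the fn.
        return (f'<span class="vcard"><a class="url fn" href="{safe_url}">'
                f'{safe_name}</a></span>')
    # No site given: the name alone becomes the fn.
    return f'<span class="vcard"><span class="fn">{safe_name}</span></span>'


print(comment_author_hcard("Drew McLellan", "https://example.com/"))
```

Whatever the commenter types into the name field goes straight into the fn, so a consumer has to fall back on implied n to work out the name’s parts.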

Implied n Optimisation

An hCard has to have a name associated with it – the name of the person or organisation that the hCard is describing. Therefore, the n class name, which in turn contains class names like given-name and family-name, is a requirement of a properly formed hCard. The exception to this rule (and this is the optimisation) is when the n is the same as the (also required) fn, and the fn follows one of a short list of very specific formats. If the fn matches a known format and no n exists, anything consuming the hCard can reverse engineer the given-name and family-name from the fn.
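As a rough illustration, the two-word rules can be sketched in Python. This is a simplified sketch based on my reading of the spec, not a reference parser, so check the wiki for the exact rules. The function name implied_n is hypothetical, and the sketch ignores the one-word nickname case and organisation hCards.

```python
from typing import Optional, Tuple


def implied_n(fn: str) -> Optional[Tuple[str, str]]:
    """Return (given_name, family_name) implied by a formatted name, or None
    if the fn doesn't match one of the two-word formats handled here.

    Simplified sketch: skips the one-word nickname rule and org hCards.
    """
    words = fn.split()
    if len(words) != 2:
        return None  # only two-word names get an implied n
    first, second = words
    if first.endswith(","):
        # "Family, Given": the trailing comma marks the family name.
        return second, first[:-1]
    if len(second) == 1 or (len(second) == 2 and second.endswith(".")):
        # "Family G." or "Family G": an initial in second place means the
        # first word is the family name (this sketch drops the period).
        return second.rstrip("."), first
    # Default "Given Family" order.
    return first, second


for name in ["Drew McLellan", "McLellan, Drew", "McLellan D.",
             "David Heinemeier Hansson"]:
    print(name, "->", implied_n(name))
```

Anything that isn’t exactly two words, like a three-part name, simply gets no implied n at all.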

As you can see from the wiki, the list of name formats is really quite small. It has to be. If we’re to imply the n from the fn, the fn needs to be tightly controlled.

Garbage in, Garbage out

Names are complex things, and online we’re exposed to all sorts of names based on unfamiliar conventions from around the globe. It’s hard enough to break a name down into its component parts by hand, let alone automatically. I was going to use David Heinemeier Hansson’s name as an example, until I realised I don’t know whether the family name is Hansson or Heinemeier Hansson. Either way, a basic fn isn’t going to cover it, and that’s a fairly straightforward example.

The only way to generate an accurate hCard automatically is to capture the specific data up front. At the very least you need to capture given, additional and family names, so that the resulting formatted name comes somewhere close to how people want their name formatted. And of course, splitting your name across all those fields is a pain in the behind. Fine for an address book application, but too much effort for a comment form.
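Capturing the parts separately means the generator never has to guess. Here’s a minimal Python sketch, assuming a hypothetical comment form that collects given, additional and family names in separate fields. The function name hcard_from_parts is illustrative only.

```python
import html


def hcard_from_parts(given: str, family: str, additional: str = "",
                     url: str = "") -> str:
    """Build hCard markup from name parts captured in separate form fields,
    so the n is stated explicitly rather than implied from the fn."""
    parts = [f'<span class="given-name">{html.escape(given)}</span>']
    if additional:
        parts.append(
            f'<span class="additional-name">{html.escape(additional)}</span>')
    parts.append(f'<span class="family-name">{html.escape(family)}</span>')
    # The same element is both fn (its text is the formatted name)
    # and n (its children hold the structured parts).
    name = f'<span class="fn n">{" ".join(parts)}</span>'
    if url:
        safe_url = html.escape(url, quote=True)
        name = f'<a class="url" href="{safe_url}">{name}</a>'
    return f'<span class="vcard">{name}</span>'


print(hcard_from_parts("Drew", "McLellan", url="https://example.com/"))
```

Because the n is explicit, the fn is free to take any shape, including names that fall outside the implied n formats.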

However, if you don’t collect the component parts of the name in separate fields, you lose all the semantic information at the capture stage. Once that’s gone, all you can do is examine the name to see whether its format meets the optimisation rules. If the format’s good, your hCard will be good. Otherwise, you’re just churning out garbage.