The Dangers of Automatically Generating hCards
Last month, Colin D. Devroe published a technique for using hCard in WordPress comments. Whilst it’s a nice idea to be using hCards for the names and contact details of commenters, potentially leading to useful things like being able to identify the same person across multiple sites and so on, I do worry that this technique isn’t workable.
Colin rightly suggests using the fn class name to identify the commenter’s name, and the url class name for the link to their site. The problem lies with the fn class name, which stands for ‘formatted name’. To understand why this is a problem, we need to know a little bit about what the hCard spec calls Implied n Optimisation.
Implied n Optimisation
An hCard has to have a name associated with it – the name of the person or organisation that the hCard is describing. Therefore, the n class name, which in turn contains class names like given-name and family-name, is a requirement of a properly formed hCard. The exception to this rule (and this is the optimisation) is when the n is the same as the (also required) fn, and the fn follows one of a short list of very specific formats. If the fn matches a known format and no n exists, anything consuming the hCard can reverse engineer the family-name and given-name from the fn.
As you can see from the wiki, the list of name formats is really quite small. It has to be. If we’re to imply the n from the fn, the fn needs to be tightly controlled.
Garbage in, Garbage out
Names are complex things, and online we’re all exposed to all sorts of names based on unfamiliar conventions from around the globe. It’s difficult enough to be able to manually break down a name into its component parts, let alone to do it automatically. I was going to use David Heinemeier Hansson’s name as an example, until I realised I don’t know if the family name is Hansson or Heinemeier Hansson. Either way, a basic fn isn’t going to cover it, and that’s a fairly straightforward example.
The only way you can automatically generate an accurate hCard is if you capture the specific data up-front. You need to capture given, additional and family names at the very least, so that the resulting formatted name gets somewhere close to how people want their name formatted. And of course, splitting a name across all those fields is a pain in the behind. Fine for an address book application, but too much effort for a comment form.
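As a sketch of what capturing the data up-front buys you, the following Python builds an hCard with an explicit n from separate fields. The CommenterName class and render_hcard function are hypothetical, and joining the parts in given, additional, family order is an assumption on my part; a real form would also need to let people choose how their formatted name reads.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class CommenterName:
    """Hypothetical form data: each part of the name is captured separately."""
    given: str
    family: str
    additional: str = ""


def render_hcard(name: CommenterName, url: str) -> str:
    """Build an hCard whose n is marked up explicitly, so nothing is implied."""
    parts = [
        ("given-name", name.given),
        ("additional-name", name.additional),
        ("family-name", name.family),
    ]
    # Skip empty parts (for example, no additional name was supplied).
    spans = " ".join(
        f'<span class="{cls}">{escape(value)}</span>'
        for cls, value in parts
        if value
    )
    return (
        '<span class="vcard">'
        f'<a class="url fn n" href="{escape(url)}">{spans}</a>'
        "</span>"
    )
```

Because every part carries its own class name, a consumer never has to guess where the family name starts.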
However, if you don’t collect the component parts of the name in separate fields, you lose all the semantic information at the capture stage. Once that’s gone, all you can do is examine the name to see if its format meets the optimisation rules. If the format’s good, then your hCard will be good. Otherwise, you’re just churning out garbage.
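One way a comment template could act on this is to emit the hCard class names only when the single name field passes the format check, and fall back to a plain link otherwise. The Python below is a minimal sketch of that idea, not Colin’s technique; fits_implied_n and render_commenter are hypothetical names, and the check assumes that every format on the wiki’s list is exactly two words.

```python
from html import escape


def fits_implied_n(fn: str) -> bool:
    """True when fn is exactly two whitespace-separated words (assumed rule)."""
    return len(fn.split()) == 2


def render_commenter(name: str, url: str) -> str:
    """Mark the commenter up as an hCard only if an n can be implied."""
    safe_name = escape(name)
    safe_url = escape(url)
    if fits_implied_n(name):
        return (
            '<span class="vcard">'
            f'<a class="url fn" href="{safe_url}">{safe_name}</a>'
            "</span>"
        )
    # The format check failed: emit an ordinary link with no hCard claims.
    return f'<a href="{safe_url}">{safe_name}</a>'
```

Note that the check only confirms the shape of the name. It cannot recover the semantic information that was lost when the name was typed into one field.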