All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –

Can Microformats be Validated?

26 October 2006

If there’s one tool that has really helped with the adoption of web standards over the years, it’s the W3C’s HTML validation service. We all make mistakes in our code on a pretty frequent basis, and having a tool to help us catch those mistakes improves the quality of what we’re publishing. What’s more, it gives us the confidence that what we’re publishing is of good quality, which is rewarding in itself. As web developers, running our code through a validator has become a pretty standard part of our daily workflow.

So when it comes to microformats, a frequently asked question is: where’s the validator? Truth be told, writing a validator is a fiddly task. Writing a really good validator is hard. That aside, it raises the question of whether writing a validator for microformats is even possible at all.

Consider the situation with validating a tag-based language like HTML. The rules state which elements can be used, and for each element which attributes are acceptable. In principle at least, that should be relatively straightforward to express in software. In such a situation, if I come across a tag with the name H7, I can see that H7 isn’t in my list of allowable elements and therefore it’s an error.
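For contrast, the whitelist check described above takes only a few lines. This is a minimal Python sketch using the standard library's `html.parser`, not any real validator's code; `ALLOWED_ELEMENTS` is a hypothetical, deliberately tiny stand-in for the full element list a real validator would take from the DTD it validates against.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny whitelist for illustration; a real
# validator would load the full element list from the DTD or spec.
ALLOWED_ELEMENTS = {
    "html", "head", "title", "body", "p", "a", "img", "div", "span",
    "ul", "li", "h1", "h2", "h3", "h4", "h5", "h6",
}


class ElementValidator(HTMLParser):
    """Records every start tag whose name is not on the whitelist."""

    def __init__(self) -> None:
        super().__init__()
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <H7> arrives here as "h7".
        if tag not in ALLOWED_ELEMENTS:
            line, _ = self.getpos()
            self.errors.append(f"line {line}: unknown element <{tag}>")


def validate(markup: str) -> list[str]:
    validator = ElementValidator()
    validator.feed(markup)
    validator.close()
    return validator.errors


print(validate("<body><h1>Hi</h1><H7>Oops</H7></body>"))
# ['line 1: unknown element <h7>']
```

Anything not on the list is, by definition, an error. That closed-world assumption is exactly what stops working once a second vocabulary is embedded inside the first.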

With microformats, however, we’re embedding a dialect inside HTML. Whilst it’s easy to spot items that are part of that dialect, it doesn’t follow that anything not recognisable as part of that dialect is an error. To take an hCard example, I might have an image with a class name of photograph as part of an hCard block. The official class name from hCard is photo, but that doesn’t mean that a value of photograph is an error – it’s just not something we’re looking for.
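To make the distinction concrete, here is a minimal Python sketch of how a microformat parser might treat class names inside an hCard block. `HCARD_PROPERTIES` is a hypothetical subset of the hCard property names, chosen for illustration; the point is that anything not on the list is skipped rather than reported.

```python
# Hypothetical subset of hCard property class names, for illustration only;
# the hCard specification defines many more.
HCARD_PROPERTIES = {
    "fn", "n", "given-name", "family-name", "nickname", "photo", "logo",
    "url", "email", "tel", "org", "title", "note",
}


def hcard_properties(class_attr: str) -> set[str]:
    """Return the hCard properties named in an element's class attribute.

    Class names that aren't hCard properties (such as "photograph") are
    silently ignored: they may be perfectly good class names used for
    styling or some other purpose, so they are not errors.
    """
    return {name for name in class_attr.split() if name in HCARD_PROPERTIES}


print(hcard_properties("photo thumbnail"))  # {'photo'}
print(hcard_properties("photograph"))       # set()
```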

To flag the above as an error would be like telling a Katheryn that she’s wrong and her name is really Catherine, simply because that’s the form of the name you’re expecting. It doesn’t add up, and it’s never good to piss off a Kate.

Validators (and Kates) aside, the other type of tool that exists in the programming world for checking code is what’s known as a lint tool. The subtle difference is that a lint tool looks through your source code and highlights things that might be bugs or might cause problems, a bit like some of the popular accessibility checking tools. The principle is that it’s not easy to tell for sure whether there is (or will be) a problem, but you can look for patterns that indicate a problem might arise.

I thought it’d be useful to take this idea and apply it to microformats. The result is rel-lint – a bookmarklet tool for checking values assigned to the rel attribute of links. This is where XFN values live, as well as tags, rel-license and so on. The tool checks any rel values against a known list and flags any not recognised. This doesn’t mean they’re wrong, just that they need checking. I’ve found it useful to have living in my bookmark bar for the last few weeks, and whilst it’s still only beta quality (there are bugs) I’d urge you to give it a try.
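The following is not rel-lint’s source (rel-lint runs as a bookmarklet in the browser); it is a minimal Python sketch of the same idea. `KNOWN_REL_VALUES` is a hypothetical list built from the XFN 1.1 values plus a few other rel-based microformats. Unknown values are reported as warnings for a human to check rather than as errors, and, as an addition of this sketch that rel-lint may not share, the standard library’s `difflib.get_close_matches` suggests the nearest known spelling.

```python
from difflib import get_close_matches
from html.parser import HTMLParser

# Hypothetical "known" list for this sketch: the XFN 1.1 values plus a few
# rel-based microformats. It is not rel-lint's actual list, and it omits
# ordinary HTML link types such as "stylesheet" or "alternate".
KNOWN_REL_VALUES = sorted({
    # XFN 1.1
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
    # rel-tag, rel-license, rel-nofollow, rel-home
    "tag", "license", "nofollow", "home",
})


class RelLinter(HTMLParser):
    """Collects rel values on <a> and <link> elements that aren't on the known list."""

    def __init__(self) -> None:
        super().__init__()
        # Each warning is (line number, unrecognised value, suggested spellings).
        self.warnings: list[tuple[int, str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        for name, value in attrs:
            if name != "rel" or not value:
                continue
            line, _ = self.getpos()
            # rel holds a space-separated list of values; compare case-insensitively.
            for token in value.lower().split():
                if token not in KNOWN_REL_VALUES:
                    # Unrecognised is not the same as wrong: record it for a
                    # human to check, with the closest known value as a hint.
                    hints = get_close_matches(token, KNOWN_REL_VALUES, n=1)
                    self.warnings.append((line, token, hints))


def rel_lint(markup: str) -> list[tuple[int, str, list[str]]]:
    linter = RelLinter()
    linter.feed(markup)
    linter.close()
    return linter.warnings


print(rel_lint('<a href="http://example.com/" rel="collegue met">Pat</a>'))
# [(1, 'collegue', ['colleague'])]
```

Because the output is a list of warnings rather than a pass/fail verdict, a value such as `rel="collegue"` is flagged and paired with a likely correction, while the final call stays with the author.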

Turns out no one can spell ‘colleague’. Who knew?

- Drew McLellan

Comments

  1. § Frances Berriman:

    As well as colleague, I regularly leave the c out of acquaintance. Makes me think there should be a nice easy way to add rel values to links in our most popular blogging software.

    I’ve been using your lint when I remember, and I do like it. I like spell checkery type things on the whole, and getting lots of green results is always heartwarming.

  2. § Scott Reynen:

    I think a microformat validator is possible, and I would have started playing with one myself a long time ago if so many other people hadn’t mentioned that they were working on this problem. The existence of non-error issues doesn’t make a validator any less feasible. The HTML validator and the feed validator both give warnings for things that might be errors, but aren’t necessarily, and your photograph example seems to fall under this category.

    But there are also clear rules in microformats, which could result in actual errors when broken. It’s clearly wrong to have hCards with no name. That’s an error a validator could catch. There’s a certain format to dates in hCalendar, and anything that doesn’t follow that format is just wrong. And so on. Sure, there’s plenty of ambiguous “that’s probably not what you meant” areas in microformats, but there are also plenty of actual errors.

  3. § Daniel Morrison:

    When I get confused, I tend to pull up a tool that will show me the parsed output to ensure everything I want shows up correctly. I’ve used Suda’s converters, the Bookmarklet, or Tails to see if my microformats can be parsed as I think they can.

    Unfortunately, these tools aren’t perfect. The biggest problem is that they don’t show all data items (in hCard, that could get huge), so you’re left to guess.

    I think if we had a simple, more complete way to view all microformats on a page, combined with the HTML validator, we would at least know that our microformats are doing what we expect.

  4. § pauldwaite:

    Good call. I love blog posts that posit a problem and then, smooth as you like, pull out a slick solution.

  5. § Frances Berriman:

    Could rel="home" become a non-suspicious value in the lint, btw? It’s documented as a microformat and even supported in some browsers.

  6. § Drew McLellan:

    Have done, Frances.

  7. § Richard Conyard:

    How would you validate optimisations though?

  8. § Cameron Adams:

    Surely it’s possible to create a validator, otherwise it wouldn’t be possible to parse a page for all microformats, and therefore impossible to use microformats as an API.

    It’s one of the problems I’ve come across when trying to access microformats like an API—how easy is it to get the data out of one big string?

  9. § jewellery:

    I agree that there must be one out there. At the University of Georgia, where I teach, we’ve been using a new software design that we created, called EMMA. Basically it allows students in first year composition (or any other English course) to format their papers into XML format and then upload them so they are accessible on a central website. Students are taught, early in the semester, how to work with Tags in XML. In addition to standard Tags such as paragraph and font Tags, we also frequently have them Tag items like Thesis statements or independent clauses. The most important part of EMMA, though, is the ability the students have to run a validation check before they submit their files to the EMMA website. Without this, we would have a lot more problems with uploads than we have.

Work With Me

At edgeofmyseat.com we build custom content management systems, ecommerce solutions and develop web apps.

Affiliation

  • Web Standards Project
  • Britpack
  • 24 ways

I made

Perch - a really little cms

About Drew McLellan

Drew McLellan (@drewm) has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This led to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practices, and stood as Group Lead for The Web Standards Project 2006-08. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at All in the <head> and, with a little help from his friends, at 24 ways.