If there’s one tool that has really helped with the adoption of web standards over the years, it’s the W3C’s HTML validation service. We all make mistakes in our code on a pretty frequent basis, and having a tool to help us catch those mistakes improves the quality of what we’re publishing. What’s more, it gives us the confidence that what we’re publishing is of good quality, which is rewarding in itself. As web developers, running our code through a validator has become a pretty standard part of our daily workflow.
So when it comes to microformats, a frequent question is: where’s the validator? Truth be told, writing a validator is a fiddly task. Writing a really good validator is hard. That aside, it raises the question of whether writing a validator for microformats is even possible at all.
Consider the situation with validating a tag-based language like HTML. The rules state which elements can be used, and for each element which attributes are acceptable. It’s clear that, in principle at least, this should be relatively straightforward to express in software. In such a situation, if I come across a tag with the name H7, I can see that H7 isn’t in my list of allowable elements and therefore it’s an error.
With microformats, however, we’re embedding a dialect inside HTML. Whilst it’s easy to spot items that are part of that dialect, it doesn’t hold true that anything not recognisable as being of that dialect is an error. To take an example for hCard, I might have an image with a class name of photograph as part of an hCard block. The official class name from hCard is photo, but that doesn’t mean that a value of photograph is an error – it’s just not something we’re looking for.
To flag the above as an error would be like telling a Katheryn that she’s wrong and her name is really Catherine, simply because that’s the form of the name you’re expecting. It doesn’t add up, and it’s never good to piss off a Kate.
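To make the distinction concrete, here’s a minimal sketch in Python. The two sets are small illustrative subsets I’ve picked for the example, not the full HTML or hCard vocabularies, and the function names are made up. The point is only that a closed vocabulary can report an error, whereas an open one can do no more than ignore what it doesn’t recognise.

```python
# Illustrative subsets only; the real HTML and hCard vocabularies are much larger.
HTML_ELEMENTS = {"div", "span", "img", "a", "h1", "h2", "h3", "h4", "h5", "h6"}
HCARD_CLASSES = {"vcard", "fn", "n", "photo", "url", "org", "adr", "tel", "email"}


def check_element(tag):
    """Closed vocabulary: a tag name outside the list is an error."""
    return "ok" if tag.lower() in HTML_ELEMENTS else "error"


def check_class(name):
    """Open vocabulary: an unknown class name is simply not ours to judge."""
    return "hcard property" if name in HCARD_CLASSES else "ignored"


print(check_element("h7"))        # error
print(check_class("photo"))       # hcard property
print(check_class("photograph"))  # ignored
```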
Validators (and Kates) aside, the other type of tool that exists in the programming world for checking code is what’s known as a lint tool. The subtle difference here is that a lint tool looks through your source code and highlights things that might be bugs or might cause problems. A bit like some of the popular accessibility checking tools, really. The principle is that it’s not easy to tell for sure if there’s a problem (or going to be a problem), but you can look for patterns that indicate a problem might arise.
I thought it’d be useful to take this idea and apply it to microformats. The result is rel-lint – a bookmarklet tool for checking values assigned to the rel attribute of links. This is where XFN values live, as well as tags, rel-license and so on. The tool checks any rel values against a known list and flags any not recognised. This doesn’t mean they’re wrong, just that they need checking. I’ve found it useful to have living in my bookmark bar for the last few weeks, and whilst it’s still only beta quality (there are bugs) I’d urge you to give it a try.
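The checking idea itself is simple enough to sketch. What follows is not the bookmarklet’s code, just a rough Python illustration of the same approach using the standard library’s HTML parser. The KNOWN_REL list is an assumed, partial set of values (a handful of XFN values plus tag and license), and the names RelLint and lint_rel are invented for the example.

```python
from html.parser import HTMLParser

# Assumed, partial list of known rel values; the list the real tool
# uses is not reproduced here.
KNOWN_REL = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "me", "tag", "license", "nofollow",
}


class RelLint(HTMLParser):
    """Collects rel values on links that are not in KNOWN_REL."""

    def __init__(self):
        super().__init__()
        self.suspicious = []  # list of (href, rel value) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of values; it may be missing or empty
        rel = attributes.get("rel") or ""
        for value in rel.split():
            if value.lower() not in KNOWN_REL:
                self.suspicious.append((attributes.get("href") or "", value))


def lint_rel(html):
    """Return the (href, rel value) pairs that need a second look."""
    parser = RelLint()
    parser.feed(html)
    parser.close()
    return parser.suspicious


page = '<a href="/jo" rel="friend met">Jo</a> <a href="/sam" rel="collegue">Sam</a>'
for href, value in lint_rel(page):
    print(f"check rel value {value!r} on link {href}")
```

Note that the result is a list of things to check rather than a list of errors: a value missing from the known list may be a typo, or it may be a perfectly good value the list hasn’t caught up with yet.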
Turns out no one can spell ‘colleague’. Who knew?




Comments
As well as colleague, I regularly leave the c out of acquaintance. Makes me think there should be a nice easy way to add rel values to links in our most popular blogging software.
I’ve been using your lint when I remember, and I do like it. I like spell checkery type things on the whole, and getting lots of green results is always heart warming.
I think a microformat validator is possible, and I would have started playing with one myself a long time ago if so many other people hadn’t mentioned that they were working on this problem. The existence of non-error issues doesn’t make a validator any less feasible. The HTML validator and the feed validator both give warnings for things that might be errors, but aren’t necessarily, and your photograph example seems to fall under this category.
But there are also clear rules in microformats, which could result in actual errors when broken. It’s clearly wrong to have hcards with no name. That’s an error a validator could catch. There’s a certain format to dates in hcalendar, and anything that doesn’t follow that format is just wrong. And so on. Sure, there’s plenty of ambiguous “that’s probably not what you meant” areas in microformats, but there are also plenty of actual errors.
When I get confused, I tend to pull up a tool that will show me the parsed output to ensure everything I want shows up correctly. I’ve used Suda’s converters, the Bookmarklet, or Tails to see if my microformats can be parsed as I think they can.
Unfortunately, these tools aren’t perfect. The biggest problem is that they don’t show all data items (in hCard, that could get huge), so you’re left to guess.
I think if we had a simple, more complete way to view all microformats on a page, combined with the HTML validator, we would at least know that our microformats are doing what we expect.
Good call. I love blog posts that posit a problem and then, smooth as you like, pull out a slick solution.
Could rel="home" become a non-suspicious value in the lint, btw? It’s documented as a microformat and even supported in some browsers.
Have done, Frances.
How would you validate optimisations though?
Surely it’s possible to create a validator, otherwise it wouldn’t be possible to parse a page for all microformats, and therefore impossible to use microformats as an API.
It’s one of the problems I’ve come across when trying to access microformats like an API—how easy is it to get the data out of one big string?
I agree that there must be one out there. At the University of Georgia, where I teach, we’ve been using a new software design that we created, called EMMA. Basically it allows students in first year composition (or any other English course) to format their papers into XML format and then upload them so they are accessible on a central website. Students are taught, early in the semester, how to work with Tags in XML. In addition to standard Tags such as paragraph and font Tags, we also frequently have them Tag items like Thesis statements or independent clauses. The most important part of EMMA, though, is the ability the students have to run a validation check before they submit their files to the EMMA website. Without this, we would have a lot more problems with uploads than we have.