On returning from a very successful @media conference at the weekend, I had the urge to get hacking on some code. In an environment such as that created by a tech conference, where you’re surrounded by many like-minded individuals who are passionate about the same things you’re passionate about, it’s hard not to get the bug and be compelled into action.
After a particularly interesting exchange of ideas with Ben following Tantek’s microformats presentation, I got the itch to start hacking on some quick ideas for a microformat-related mini-app.
It quickly became apparent that what should’ve been a couple of hours’ coding was going to take me quite a while, because I had no toolkits to back me up. I wanted to parse hCards out of a remote site, and whilst there are excellent tools available for doing things like converting hCards to vCards, if you want to just grab some microformatted data and do something cool with it, the toolkit options are extremely limited. So I put my idea on hold, and thought I’d better get hacking on a parsing toolkit instead.
I poked around looking at stuff that’s already out there, including Microformats Base, but I couldn’t find anything that fitted the model I was after – namely chuck in a string or URL, and get out an array structure of, say, hCards. So I began working on my own.
My goals were that its interface should be very easy to use (as easy as handing over a URL and getting back some data) in order to make building tools on top of it very quick and brainless. I also wanted to make it generic enough to support a number of microformats, with as little work as necessary required to add support for new ones – and that this should ideally be provided by a plugin layer. The idea being that if you need support for an unsupported microformat, it shouldn’t be too hard to add support yourself.
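To make the "chuck in a string, get out an array of hCards" model concrete, here is a minimal sketch in Python. It is not hKit's code or API: the names `HCARD_PROFILE`, `MicroformatParser` and `parse_microformat` are made up for illustration, the profile is just a root class plus a list of property classes, and the markup is assumed to be well formed. It also ignores most of the real hCard parsing rules (taking url from href attributes, for example).

```python
from html.parser import HTMLParser

# Hypothetical profile format: a root class name plus the property
# class names to collect. Not hKit's actual profile format.
HCARD_PROFILE = {"root": "vcard", "properties": ["fn", "org", "tel"]}

# Elements that never have a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MicroformatParser(HTMLParser):
    """Collects the text of property elements found inside each root element."""

    def __init__(self, profile):
        super().__init__()
        self.profile = profile
        self.results = []        # one dict per root element found
        self.stack = []          # for each open element, the properties it carries
        self.current = None      # the card being built, if we are inside a root
        self.root_depth = None   # stack depth at which the current root opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        props = []
        if self.current is None:
            if self.profile["root"] in classes:
                self.current = {}
                self.root_depth = len(self.stack)
        else:
            props = [c for c in classes if c in self.profile["properties"]]
            for prop in props:
                self.current.setdefault(prop, []).append("")
        self.stack.append(props)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        # Back at the depth where the root opened: the card is complete.
        if self.current is not None and len(self.stack) == self.root_depth:
            self.results.append(self.current)
            self.current = None
            self.root_depth = None

    def handle_data(self, data):
        if self.current is None:
            return
        # Text belongs to every property element that is currently open.
        for props in self.stack:
            for prop in props:
                self.current[prop][-1] += data


def parse_microformat(html, profile=HCARD_PROFILE):
    """Return a list of dicts, one per root element, mapping property -> values."""
    parser = MicroformatParser(profile)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace in each collected value.
    return [{name: [" ".join(value.split()) for value in values]
             for name, values in card.items()}
            for card in parser.results]


if __name__ == "__main__":
    sample = ('<div class="vcard"><span class="fn">Jane Doe</span> '
              '<span class="org">Example Co</span></div>')
    print(parse_microformat(sample))
```

In a sketch like this, supporting another microformat would mostly mean supplying a different profile dictionary, which is the kind of plugin layer described above.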
So, in the spirit of releasing early and often, here’s what I’m calling hKit for PHP5 version 0.2.
Update: 2006-06-21 hKit for PHP5 version 0.3 – see below for changes.
Update: 2006-06-23 All further updates can be found on the hKit page which also has its own feeds.
The toolkit depends on SimpleXML in PHP5, which is new to me and which I’ve already grown to dislike. Ideally, you’ll have support for Tidy either via the PHP Tidy functions or tidy on your local system (a configurable setting), otherwise you’re dependent on going via a proxy to ensure pages are well formed (another standard config). hKit uses a pluggable system of ‘profiles’ for each supported µF – the only one of which is hCard at the moment, and the format of which is still in flux.
This really is way too early to be releasing, but if I don’t now I probably never will. This really is a first pass, and bits of it are a bit hacky. Known limitations (ha!) are:
- Doesn’t fully enforce all the parsing rules in hcard-parsing
- Doesn’t support include-pattern (v0.3 now supports both include-pattern and td@headers patterns)
- Doesn’t pass all of the hCard tests yet, but is well on the way (v0.3 now passes all but one test case)
- v0.3 adds hCard implied-n optimization support and dozens of bug fixes.
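For anyone unfamiliar with implied-n: as I understand the hCard spec, when a card has no explicit n property and its fn is exactly two words (and isn't just the organisation name), a parser may infer the given and family names from it. A rough Python sketch of that rule follows. It is not hKit's actual code, and the function name `implied_n` is made up for illustration.

```python
def implied_n(fn, org=None):
    """Infer given-name and family-name from a two-word fn, or return None."""
    if org is not None and fn.strip() == org.strip():
        return None  # fn is the organisation name, so there is no person to split
    words = fn.split()
    if len(words) != 2:
        return None  # the rule only applies to exactly two words
    first, second = words
    # A single character, optionally followed by a period, counts as an initial.
    second_is_initial = len(second) <= 2 and len(second.rstrip(".")) == 1
    if first.endswith(",") or second_is_initial:
        # "Doe, Jane" or "Doe J." style: family name comes first.
        return {"family-name": first.rstrip(","), "given-name": second}
    return {"given-name": first, "family-name": second}


if __name__ == "__main__":
    print(implied_n("Jane Doe"))
    print(implied_n("Doe, Jane"))
```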
In practice, point it at any random hCard-enabled page and it returns a pretty good set of results. It may even be usable for basic applications at this point. Knock up a quick profile and it’ll probably handle vevents too. (But be aware that the profile format may change yet.)
I’m licensing hKit under an LGPL license with the hope that others might like to contribute at some point along the way. I’ve already received contributions from Scott Reynen of Microformats Base, which is very much appreciated.
My thinking is that if we have a reasonable set of open tools, it can dramatically lower the point of entry for others to hack together quick applications based on microformats – as X2V has already proved. That only stands to benefit us all, so I do believe it’s worth the investment in time. The code’s not perfect – some of it needs rewriting already – but the most important thing is having something functional, so that’s the goal I’m pursuing.
Any testing, feedback or patches would be very much appreciated. Oh, and happy first birthday, microformats.org.



Comments
Oh man! This post just gave me a chubby. I’m so hot for microformats. It’s the work of people like yourself that will move microformats to a more mainstream place on the web. Very good work!
You’re entirely on the right track. About a year ago, I was looking at including microformats in my company’s new web site, as we have a bunch of contact details on there.
After spending a fair bit of time trying to read the vCard spec (amongst other things), I gave up – mainly because I felt the editors for the site wouldn’t be able to deal with producing working hCards regularly (I still think that’s a microformat problem: data entry by non-technical folk in CMSs).
However, coming back to it last week, I found the hCard creator (http://microformats.org/code/hcard/creator). Genius. Now, instead of trying to figure out how I should integrate microformats into my pages, and dithering over it, I got some working example code to tinker with as much as I fancied. Got me using them on another site within 5 minutes.
When the best option also involves the least work, rainbows and happiness will follow you everywhere. Er, or something.
Excellent idea here…. I’ve been looking for such a tool for the hCalendar microformat. We just added the ability for WebCalendar users to add remote iCalendar calendars (from sites like icalshare.com). I’d like to also offer remote hCal data, but didn’t want to reinvent the screen scraper for microformats.
Anyone else interested in extending this to include hCalendar?