On returning from a very successful @media conference at the weekend, I had the urge to get hacking on some code. In an environment such as that created by a tech conference, where you’re surrounded by many like-minded individuals who are passionate about the same things you’re passionate about, it’s hard not to get the bug and be compelled into action.
It quickly became apparent that what should’ve been a couple of hours coding was going to take me quite a while, because I had no toolkits to back me up. I was wanting to parse hCards out of a remote site, and whilst there are excellent tools available for doing things like converting hCards to vCards, if you want to just grab some microformatted data and do something cool with it, the toolkit options are extremely limited. So I put my idea on hold, and thought I’d better get hacking on a parsing toolkit instead.
I poked around looking at stuff that’s already out there, including Microformats Base, but I couldn’t find anything that fitted the model I was after – namely chuck in a string or URL, and get out an array structure of, say, hCards. So I began working on my own.
My goals were that its interface should be very easy to use (as easy as handing over a URL and getting back some data) in order to make building tools on top of it very quick and brainless. I also wanted to make it generic enough to support a number of microformats, with as little work as necessary required to add support for new ones – and that this should ideally be provided by a plugin layer. The idea being that if you need support for an unsupported microformat, it shouldn’t be too hard to add support yourself.
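To make the "hand over some markup, get back an array" idea concrete, here is a minimal sketch of how a profile-driven parser could be shaped. It is written in Python purely for illustration and is not hKit's PHP code. The names `PROFILES` and `get_by_string`, the profile layout, and the handful of hCard properties are all assumptions made up for this example, and it ignores most of the real hCard parsing rules. Like SimpleXML, it only works on well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile registry: one small entry per microformat.
# Supporting a new format should only mean adding another entry here.
PROFILES = {
    "hcard": {"root": "vcard", "properties": ["fn", "org", "url", "email", "tel"]},
}


def has_class(element, name):
    """True if `name` is one of the space-separated class names on the element."""
    return name in element.get("class", "").split()


def get_by_string(profile_name, markup):
    """Parse well-formed markup and return one dict per microformat found."""
    profile = PROFILES[profile_name]
    tree = ET.fromstring(markup)
    results = []
    for node in tree.iter():
        if not has_class(node, profile["root"]):
            continue
        item = {}
        for child in node.iter():
            for prop in profile["properties"]:
                if has_class(child, prop) and prop not in item:
                    # Simplification: take url from href, everything else from text.
                    if prop == "url" and child.get("href"):
                        item[prop] = child.get("href")
                    else:
                        item[prop] = "".join(child.itertext()).strip()
        results.append(item)
    return results


SAMPLE = """<div>
  <div class="vcard">
    <a class="fn url" href="http://example.com/">Jane Doe</a>
    <span class="org">Example Ltd</span>
  </div>
</div>"""

cards = get_by_string("hcard", SAMPLE)
print(cards)
```

In this sketch, adding support for another format is a matter of registering one more entry, for example `PROFILES["hcalendar"] = {"root": "vevent", "properties": ["summary", "location"]}`, which is the kind of low-effort extension the plugin layer is meant to allow.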
So on the principle of releasing early and often, here’s what I’m calling hKit for PHP5 version 0.2.
The toolkit depends on SimpleXML in PHP5, which is new to me and which I’ve already grown to dislike. Ideally, you’ll have support for Tidy, either via the PHP Tidy functions or via tidy on your local system (a configurable setting); otherwise you’re dependent on going via a proxy to ensure pages are well formed (another standard config option). hKit uses a pluggable system of ‘profiles’ for each supported µF. The only profile at the moment is hCard, and the profile format is still in flux.
This really is way too early to be releasing, but if I don’t now I probably never will. This really is a first pass, and bits of it are a bit hacky. Known limitations (ha!) are:
- Doesn’t fully enforce all the parsing rules in hcard-parsing
- Doesn’t support include-pattern (v0.3 now supports both include-pattern and td@headers patterns)
- Doesn’t pass all of the hCard tests yet, but is well on the way (v0.3 now passes all but one test case)
- v0.3 adds hCard implied-n optimization support and dozens of bug fixes
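For anyone unfamiliar with the implied-n optimization mentioned above, the rough idea is that when an hCard has an `fn` but no explicit `n`, and the `fn` is exactly two words and differs from `org`, the given and family names can be inferred from the `fn`. The Python sketch below follows my reading of that rule in the hCard spec; it is not hKit's implementation, and the function name `implied_n` and the dict layout are assumptions for this example.

```python
def implied_n(card):
    """Return a copy of `card` with an inferred 'n' where the rule applies."""
    if "n" in card:
        return card  # an explicit n always wins
    fn = card.get("fn", "")
    if not fn or fn == card.get("org"):
        return card  # fn equal to org means an organisation, not a person
    words = fn.split()
    if len(words) != 2:
        return card  # the rule only covers exactly two words
    first, second = words
    # "Doe, Jane" or "Doe J." style: the family name comes first.
    second_is_initial = len(second) == 1 or (len(second) == 2 and second.endswith("."))
    if first.endswith(",") or second_is_initial:
        family, given = first.rstrip(","), second
    else:
        given, family = first, second
    return {**card, "n": {"given-name": given, "family-name": family}}


print(implied_n({"fn": "Jane Doe"}))
print(implied_n({"fn": "Doe, Jane"}))
```

Cards whose `fn` has three or more words, or whose `fn` matches `org`, come back unchanged, so an explicit `n` is still needed for those.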
In practice, point it at any random hCard-enabled page and it returns a pretty good set of results. It may even be usable for basic applications at this point. Knock up a quick profile and it’ll probably handle vevents too. (But be aware that the profile format may change yet.)
I’m licensing hKit under an LGPL license with the hope that others might like to contribute at some point along the way. I’ve already received contributions from Scott Reynen of Microformats Base, which are very much appreciated.
My thinking is that if we have a reasonable set of open tools, it can dramatically lower the point of entry for others to hack together quick applications based on microformats – as X2V has already proved. That only stands to benefit us all, so I do believe it’s worth the investment in time. The code’s not perfect – some of it needs rewriting already – but the most important thing is having something functional, so that’s the goal I’m pursuing.
Any testing, feedback or patches would be very much appreciated. Oh, and happy first birthday, microformats.org.