All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –


hKit Microformats Toolkit for PHP

21 June 2006

On returning from a very successful @media conference at the weekend, I had the urge to get hacking on some code. In an environment such as that created by a tech conference, where you’re surrounded by many like-minded individuals who are passionate about the same things you’re passionate about, it’s hard not to get the bug and be compelled into action.

After a particularly interesting exchange of ideas with Ben following Tantek’s microformats presentation, I got the itch to start hacking on some quick ideas for a microformat-related mini-app.

It quickly became apparent that what should’ve been a couple of hours coding was going to take me quite a while, because I had no toolkits to back me up. I was wanting to parse hCards out of a remote site, and whilst there are excellent tools available for doing things like converting hCards to vCards, if you want to just grab some microformatted data and do something cool with it, the toolkit options are extremely limited. So I put my idea on hold, and thought I’d better get hacking on a parsing toolkit instead.

I poked around looking at stuff that’s already out there, including Microformats Base, but I couldn’t find anything that fitted the model I was after – namely chuck in a string or URL, and get out an array structure of, say, hCards. So I began working on my own.
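To illustrate the "string in, array of hCards out" model, here is a minimal Python sketch. It is not hKit's PHP code. It assumes the input is already well-formed XHTML, collects only a handful of hCard properties, and doesn't separate nested hCards. The function names `get_hcards_from_string` and `get_hcards_from_url` are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# The few properties this sketch collects; a real hCard parser handles many
# more (n, adr, geo, photo, ...) and applies the full hcard-parsing rules.
HCARD_PROPERTIES = ("fn", "org", "url", "email", "tel")


def _classes(el):
    """Return the set of class names on an element (class is space-separated)."""
    return set(el.get("class", "").split())


def _local_name(el):
    """Strip any XML namespace, e.g. '{http://www.w3.org/1999/xhtml}a' -> 'a'."""
    return el.tag.rsplit("}", 1)[-1]


def _property_value(el, prop):
    """url/email on a link take the href; everything else takes the text."""
    if prop in ("url", "email") and _local_name(el) == "a" and el.get("href"):
        href = el.get("href")
        return href[len("mailto:"):] if href.startswith("mailto:") else href
    # Collapse runs of whitespace in the element's text content.
    return " ".join("".join(el.itertext()).split())


def get_hcards_from_string(xhtml):
    """Parse well-formed XHTML and return a list of dicts, one per hCard."""
    root = ET.fromstring(xhtml)
    cards = []
    for vcard in root.iter():
        if "vcard" not in _classes(vcard):
            continue
        card = {}
        for el in vcard.iter():  # includes the vcard element itself
            for prop in HCARD_PROPERTIES:
                if prop in _classes(el):
                    # A property may appear more than once, so keep a list.
                    card.setdefault(prop, []).append(_property_value(el, prop))
        cards.append(card)
    return cards


def get_hcards_from_url(url):
    """Fetch a page and parse it; assumes the page is already well-formed."""
    with urllib.request.urlopen(url) as response:
        return get_hcards_from_string(response.read())


if __name__ == "__main__":
    page = ('<div><div class="vcard"><a class="fn url" href="http://example.com/">'
            'Jane Doe</a> <span class="org">Example Ltd</span></div></div>')
    print(get_hcards_from_string(page))
    # → [{'fn': ['Jane Doe'], 'url': ['http://example.com/'], 'org': ['Example Ltd']}]
```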

My goals were that its interface should be very easy to use (as easy as handing over a URL and getting back some data) in order to make building tools on top of it very quick and brainless. I also wanted to make it generic enough to support a number of microformats, with as little work as possible required to add support for new ones – and that this should ideally be provided by a plugin layer. The idea is that if you need support for an unsupported microformat, it shouldn’t be too hard to add it yourself.
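A plugin layer of this kind can be sketched as a registry mapping a format name to a description of that format, so the parsing loop itself never changes when a format is added. The Python below is a minimal sketch under that assumption. `Profile`, `register_profile` and `parse` are hypothetical names, not hKit's API, and the geo profile is included only to show that adding a format comes down to one registration call.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Describes one microformat: its root class and the properties to collect."""
    root_class: str
    properties: tuple


# Hypothetical registry: format name -> Profile.
PROFILES = {}


def register_profile(name, profile):
    """Plug in support for a microformat without touching the parser."""
    PROFILES[name] = profile


def _classes(el):
    return el.get("class", "").split()


def parse(format_name, xhtml):
    """Return a list of dicts, one per instance of the requested format."""
    profile = PROFILES[format_name]  # raises KeyError for unsupported formats
    root = ET.fromstring(xhtml)
    results = []
    for item in root.iter():
        if profile.root_class not in _classes(item):
            continue
        found = {}
        for el in item.iter():
            for prop in profile.properties:
                if prop in _classes(el):
                    text = " ".join("".join(el.itertext()).split())
                    found.setdefault(prop, []).append(text)
        results.append(found)
    return results


# Supporting a format is a matter of registering a profile for it.
register_profile("hcard", Profile("vcard", ("fn", "org", "tel")))
register_profile("geo", Profile("geo", ("latitude", "longitude")))
```

The design choice here is that profiles are plain data: the one generic loop in `parse` does all the work, so a new profile can't introduce new parsing bugs.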

So in the principle of releasing early and often, here’s what I’m calling hKit for PHP5 version 0.2.

Update: 2006-06-21 hKit for PHP5 version 0.3 – see below for changes.
Update: 2006-06-23 All further updates can be found on the hKit page which also has its own feeds.

The toolkit depends on SimpleXML in PHP5, which is new to me and which I’ve already grown to dislike. Ideally, you’ll have support for Tidy, either via the PHP Tidy functions or tidy on your local system (a configurable setting); otherwise you’re dependent on going via a proxy to ensure pages are well formed (another config option). hKit uses a pluggable system of ‘profiles’ for each supported µF – the only one of which is hCard at the moment, and the format of which is still in flux.
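An XML parser such as SimpleXML rejects markup that isn’t well formed, which is why a cleanup step matters. Below is a minimal Python sketch of a try-then-clean approach, using `xml.etree.ElementTree` as a stand-in for SimpleXML. `load_document` and `tidy_with_local_binary` are hypothetical names; the sketch assumes HTML Tidy is installed on the PATH, and a proxy-based cleaner could be passed in as the `cleaner` argument instead.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET


def tidy_with_local_binary(markup):
    """Run the local `tidy` command to turn tag soup into XHTML.

    Assumes HTML Tidy is on the PATH. -asxhtml requests XHTML output and
    -numeric writes numeric entities, since named ones such as &nbsp; are
    not defined for a plain XML parser.
    """
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not installed")
    result = subprocess.run(
        ["tidy", "-asxhtml", "-numeric", "-quiet",
         "--show-warnings", "no", "--force-output", "yes"],
        input=markup, capture_output=True, text=True,
    )
    return result.stdout


def load_document(markup, cleaner=tidy_with_local_binary):
    """Parse markup as XML, running it through `cleaner` only if that fails."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return ET.fromstring(cleaner(markup))
```

Because the cleaner is only called on a parse failure, well-formed pages never pay the cost of a Tidy run or a round trip through a proxy.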

This really is way too early to be releasing, but if I don’t do it now I probably never will. It’s very much a first pass, and bits of it are a bit hacky. Known limitations (ha!) are:

  • Doesn’t fully enforce all the parsing rules in hcard-parsing
  • Doesn’t support include-pattern. Update: v0.3 now supports both the include-pattern and td@headers patterns.
  • Doesn’t pass all of the hCard tests yet, but is well on the way. Update: v0.3 now passes all but one test case.
  • v0.3 adds hCard implied-n optimization support and dozens of bug fixes.
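For reference, the implied-n optimization infers the structured name (`n`) from the formatted name (`fn`) when no explicit `n` is given. Here is a Python sketch of the rule as described on the microformats.org hCard page. The function name and dict keys are illustrative only; this is not hKit’s profile format.

```python
def implied_n(fn, org=None):
    """Infer hCard 'n' from 'fn', returning a dict or None if no n is implied.

    Applies only when fn differs from org and fn is exactly two words.
    """
    if org is not None and fn.strip() == org.strip():
        return None  # fn == org marks an organisation, which has no n
    words = fn.split()
    if len(words) != 2:
        return None
    first, second = words
    # Exception: "Family, Given" or "Family G." puts the family name first.
    if first.endswith(",") or len(second.rstrip(".")) == 1:
        return {"given-name": second.rstrip("."), "family-name": first.rstrip(",")}
    # Default: "Given Family".
    return {"given-name": first, "family-name": second}
```

So `implied_n("Drew McLellan")` and `implied_n("McLellan, Drew")` both yield given-name `Drew` and family-name `McLellan`, while a three-word `fn` implies nothing.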

In practice, point it at any random hCard-enabled page and it returns a pretty good set of results. It may even be usable for basic applications at this point. Knock up a quick profile and it’ll probably handle vevents too. (But be aware that the profile format may change yet.)
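As a rough idea of what a vevent profile might contain, here is a Python sketch. The profile is just a root class name plus the properties to look for; the dict layout and `parse_with_profile` are hypothetical, not hKit’s (still changing) profile format. It also applies the abbr design pattern used by hCalendar, where an `abbr` element’s `title` holds the machine-readable value, such as an ISO date.

```python
import xml.etree.ElementTree as ET

# Hypothetical vevent profile: the root class plus the properties to collect.
VEVENT_PROFILE = {
    "root": "vevent",
    "properties": ("summary", "dtstart", "dtend", "location", "url"),
}


def _value(el, prop):
    """abbr@title holds the machine value; url takes an href; else the text."""
    tag = el.tag.rsplit("}", 1)[-1]  # drop any XML namespace
    if tag == "abbr" and el.get("title"):
        return el.get("title")
    if prop == "url" and el.get("href"):
        return el.get("href")
    return " ".join("".join(el.itertext()).split())


def parse_with_profile(profile, xhtml):
    """Return one dict per element carrying the profile's root class."""
    root = ET.fromstring(xhtml)
    items = []
    for node in root.iter():
        if profile["root"] not in node.get("class", "").split():
            continue
        item = {}
        for el in node.iter():
            classes = el.get("class", "").split()
            for prop in profile["properties"]:
                # In this sketch the first occurrence of a property wins.
                if prop in classes and prop not in item:
                    item[prop] = _value(el, prop)
        items.append(item)
    return items
```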

I’m licensing hKit under the LGPL with the hope that others might like to contribute at some point along the way. I’ve already received contributions from Scott Reynen of Microformats Base, which are very much appreciated.

My thinking is that if we have a reasonable set of open tools, it can dramatically lower the barrier to entry for others to hack together quick applications based on microformats – as X2V has already proved. That only stands to benefit us all, so I do believe it’s worth the investment in time. The code’s not perfect – some of it needs rewriting already – but the most important thing is having something functional, so that’s the goal I’m pursuing.

Any testing, feedback or patches would be very much appreciated. Oh, and happy first birthday, microformats.org.

- Drew McLellan

Comments

  1. § Darren Wood:

    Oh man! This post just gave me a chubby. I’m so hot for microformats. It’s the work of people like yourself that will move microformats to a more mainstream place on the web. Very good work!

  2. § Small Paul:

    You’re entirely on the right track. About a year ago, I was looking at including microformats in my company’s new web site, as we have a bunch of contact details on there.

    After spending a fair bit of time trying to read the vCard spec (amongst other things), I gave up – mainly because I felt the editors for the site wouldn’t be able to deal with producing working hCards regularly (I still think that’s a microformat problem: data entry by non-technical folk in CMSs).

    However, coming back to it last week, I found the hCard creator (http://microformats.org/code/hcard/creator). Genius. Now, instead of trying to figure out how I should integrate microformats into my pages, and dithering over it, I got some working example code to tinker with as much as I fancied. Got me using them on another site within 5 minutes.

    When the best option also involves the least work, rainbows and happiness will follow you everywhere. Er, or something.

  3. § Craig Knudsen:

    Excellent idea here… I’ve been looking for such a tool for the hCalendar microformat. We just added the ability for WebCalendar users to add remote iCalendar calendars (from sites like icalshare.com). I’d like to also offer remote hCal data, but didn’t want to reinvent the screen scraper for microformats.

    Anyone else interested in extending this to include hCalendar?


About Drew McLellan


Drew McLellan (@drewm) has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This led to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practices, and stood as Group Lead for The Web Standards Project 2006-08. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at all in the head and, with a little help from his friends, at 24 ways.