Late last week I received an email from a user of the hAtom to Atom service I maintain at tools.microformatic.com, asking if I could update to the latest version of the hAtom2Atom XSLT that the service implements. Ever happy to oblige, this weekend I set about doing just that, and after the upgrade began to tail -f the httpd log so that I could check a few requests to see if the results looked correct.
I’d never really promoted the service in any particular way, and knew that a few people used it for testing their hAtom implementations, as well as using it to subscribe to the odd hAtom enabled page – mostly, I presumed, to keep tabs on their own implementations. You can imagine my surprise, then, to see the log files ticking by at a fair old rate, with URLs from Yahoo! Pipes, but mostly from social music service Last.fm.
A bit of investigation led me to a blog post describing how to subscribe to a Last.fm shoutbox using my hAtom to Atom service. This is a superb example of the utility of hAtom. Last.fm don’t have a dedicated feed for their shoutboxes, but because they’re nicely marked up with hAtom, they can be converted to Atom on the fly. Awesome.
Now, about my smoking server. At the moment I don’t use any caching on the hAtom to Atom service. Of course, every request to the service causes me to make a request out to the destination server which then does its thing and returns me the result. I take that result and process it and pass it back to my user. Any caching I can implement to cut down that process for common requests seems like the right thing to do – even though I’m not really having any problem serving the volume of requests at the moment.
However, I don’t want to get in the way of those who are using the service as a method of testing their hAtom markup, and an unexpected caching layer could cause havoc there. I’ve considered implementing a no-cache flag, but it’s all too easy for people to forget to remove that or to use it without being fully aware of the implications.
I think what I’ll do is selectively apply caching to known URL patterns (like Last.fm shoutboxes) where I know that retaining the result for 10 or 15 minutes really won’t be a problem, and perhaps drop a comment into the result indicating the time at which it was cached.
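To make the idea concrete, here is a rough sketch of that selective caching in Python. It is an illustration only, not how the service is built: the shoutbox URL pattern is a guess, `convert` is a hypothetical stand-in for whatever fetches the page and runs the hAtom2Atom transform, and the 15 minute lifetime is simply the upper figure mentioned above.

```python
import re
import time

# Assumption: a guessed pattern for Last.fm shoutbox pages. The real URL
# structure may differ; only URLs matching a known pattern get cached.
CACHEABLE_PATTERNS = [
    re.compile(r"^https?://(www\.)?last\.fm/user/[^/]+/shoutbox/?$"),
]
CACHE_TTL_SECONDS = 15 * 60  # keep results for 15 minutes

_cache = {}  # url -> (cached_at, atom_xml)


def is_cacheable(url):
    """True if the URL matches one of the known safe-to-cache patterns."""
    return any(p.match(url) for p in CACHEABLE_PATTERNS)


def get_atom(url, convert, now=time.time):
    """Return the Atom document for url, caching only known URL patterns.

    `convert` is a hypothetical callable that fetches the page and applies
    the hAtom2Atom transform, returning the Atom document as a string.
    """
    if not is_cacheable(url):
        # Anyone testing their own markup always gets a fresh result.
        return convert(url)

    current = now()
    entry = _cache.get(url)
    if entry is not None and current - entry[0] < CACHE_TTL_SECONDS:
        cached_at, atom_xml = entry
    else:
        cached_at, atom_xml = current, convert(url)
        _cache[url] = (cached_at, atom_xml)

    # Drop a trailing comment into the result noting when it was cached.
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(cached_at))
    return atom_xml + "\n<!-- cached at " + stamp + " -->"
```

Because anything that doesn’t match a pattern bypasses the cache entirely, people testing their own hAtom markup would never see a stale result, and nobody has to remember to set (or unset) a flag.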



Comments
It seems that the obvious choice would be to respect the Pragma: no-cache and Cache-Control headers. If people are doing development work they should explicitly disable caching at the HTTP level.
That theory is sound, Ian, but in practice it seems most people are making development requests manually via a browser. Under those circumstances there’s not often much control over headers, so I fear it wouldn’t work too well for this.
Squid to the rescue! If you place an instance of the Squid caching proxy (in HTTP acceleration mode) in front of your application you should see the results you’re looking for – assuming, that is, that you are making correct use of ETag or Last-Modified headers.
Normal requests (including If-Modified-Since) to a URI would result in a cache hit if the resource was already served by Squid.
However, and this is the crunch, if the user presses refresh in the browser it triggers a client refresh and Squid will re-fetch the original resource and update its cache.
Sounds like just the ticket.