All in the <head> – Ponderings and code by Drew McLellan –

Scalability vs Performance

When designing the architecture for a web application, it is normally desirable to design every aspect of the system to be as scalable as possible. All too often, badly designed apps need to be completely refactored before any further development work can be done, entirely due to an unscalable architecture. If the app hasn’t been designed with an attitude of what if we wanted to …? then adding a dot-dot-dot six months down the line can be a massive undertaking, with needless slaughtering of otherwise good code.

But we know this; we know that applications – any code, in fact – need to be designed to cope effectively with change. It’s one of the principles behind OO and it’s generally accepted as A Good Thing. This is typically done through the use of OO design principles closely coupled with various levels of abstraction. The question it raises, however, is where do we stop? At what point do multiple levels of abstraction stop saving development time and start taking a toll on performance?

Take the example of database normalisation. If we adhered completely to third normal form, theory would have us storing tables full of things like postal codes. It’s possible that two users could have the same postal code, and our design should not allow that data to be stored twice. Of course, the reality in a global information system like a web application is that it’s very unlikely that any reasonable number of users will live only a few doors apart, and even if they do, the overhead of needing to join to a massive table of postal codes means it just isn’t important. Nothing bad will happen if AB12 3CD is stored twice, or hell, even three times. There is little or no consequence in not fully normalising in a case like this – so little that it’s not often done. There’s no advantage in that degree of abstraction.
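To make the trade-off concrete, here’s a minimal sketch of the two schemas. It’s written in Python with SQLite purely for illustration (the table and column names are invented, not from any real app): the fully normalised design pays for a lookup table and a join on every read, while the denormalised design just stores the postcode on the user row, duplicates and all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Fully normalised: postcodes live in their own table; users reference them by id.
cur.execute("CREATE TABLE postcodes (id INTEGER PRIMARY KEY, code TEXT UNIQUE)")
cur.execute(
    "CREATE TABLE users_norm (id INTEGER PRIMARY KEY, name TEXT, "
    "postcode_id INTEGER REFERENCES postcodes(id))"
)

# Denormalised: the postcode is stored (possibly duplicated) on each user row.
cur.execute("CREATE TABLE users_flat (id INTEGER PRIMARY KEY, name TEXT, postcode TEXT)")

# The normalised insert needs an extra round trip to the lookup table...
cur.execute("INSERT INTO postcodes (code) VALUES (?)", ("AB12 3CD",))
pid = cur.lastrowid
cur.execute("INSERT INTO users_norm (name, postcode_id) VALUES (?, ?)", ("Alice", pid))

# ...and every read needs a join.
row = cur.execute(
    "SELECT u.name, p.code FROM users_norm u JOIN postcodes p ON u.postcode_id = p.id"
).fetchone()

# The denormalised insert and read are single-table operations.
cur.execute("INSERT INTO users_flat (name, postcode) VALUES (?, ?)", ("Alice", "AB12 3CD"))
flat = cur.execute("SELECT name, postcode FROM users_flat").fetchone()

print(row)   # ('Alice', 'AB12 3CD')
print(flat)  # ('Alice', 'AB12 3CD')
```

Both designs hand back the same data; the question is only whether the join and the extra table earn their keep.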

When working on our latest web CMS project, we decided we would not abstract our database layer to the point of not caring what the database engine was or what flavour of SQL it used. We standardised on MySQL. Of course, we did abstract a lot of the functionality, but not to the extent where we were no longer using MySQL-flavoured SQL in our classes. We decided that we’d picked PHP and MySQL because the combination is highly performant and quick to develop for, so why bog it, and us, down with SQL translation for every single query?

The alternative would be to have dropped in something like PEAR DB, which we did consider for a while, as it would have enabled our app to be portable across an entire range of database platforms. However, I couldn’t stomach the thought of all that code (PEAR DB is big) even being parsed, let alone run for every database query (of which there are typically about a dozen per page load).

Instead, I opted for rolling my own, far simpler abstraction layer. Although our classes use MySQL-flavoured SQL, the only place that makes specific reference to any of PHP’s MySQL functions is the database class. We figured that if the worst came to the worst, it wouldn’t be too much effort to rewrite the database class to use a different engine.
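The shape of that approach – plain SQL everywhere, but engine-specific calls confined to a single class – might look something like the sketch below. This is not the actual class: it’s an illustrative Python version with SQLite standing in for PHP’s MySQL functions, and the class and table names are invented.

```python
import sqlite3

class Database:
    """The only place in the app that touches engine-specific calls.
    Swapping database engines means rewriting this class, nothing else."""

    def __init__(self, dsn):
        # In the real thing this would be the engine's connect call;
        # SQLite stands in here so the sketch is runnable.
        self._conn = sqlite3.connect(dsn)

    def query(self, sql, params=()):
        """Run a SELECT and return all rows."""
        return self._conn.execute(sql, params).fetchall()

    def execute(self, sql, params=()):
        """Run a statement that changes data; return affected row count."""
        cur = self._conn.execute(sql, params)
        self._conn.commit()
        return cur.rowcount

# Application classes write SQL but never call the driver directly.
db = Database(":memory:")
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO pages (title) VALUES (?)", ("Home",))
rows = db.query("SELECT title FROM pages")
print(rows)  # [('Home',)]
```

The point of the pattern is that the SQL itself can stay engine-flavoured; only the connection and execution plumbing is isolated behind one small interface, which is a fraction of the code a full translation layer would need.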

The scalability / performance balance can be a difficult one to strike – especially as it’s not always apparent that you’re doing the wrong thing until you’ve done it. I’m pretty happy with the solution we settled on this time – but ask me again in six months’ time.

Update: Jeremy Zawodny expresses a view which pretty much confirms, in my mind, the decision we took. Zawodny’s a guy worth listening to.