As the lead developer of the small CMS Perch, I’m often asked why we use a database at all, when Perch is aimed at smaller websites. There seems to be an assumption that using a relational database like MySQL is both complex, and somehow ‘heavier’ than needed for a ‘lightweight’ solution.
I wanted to talk a bit about both these issues, as the misunderstanding is pervasive and can lead to poor choices if you’re not careful.
Databases are hard to use
When learning HTML, every budding web designer soon wants to start working with images. The problem is, images on the web are complex. You have to learn about the different formats (JPEG, GIF, PNG, SVG…) and when to use each. You need to learn about scaling images to the appropriate size, and exporting from your graphics tool in a way that optimises for file size. You even need to understand some basic conventions about file naming and, when it comes to adding the
<img /> tag to your page, file paths. It’s really hard, and you’re best to just stick with text.
I’m being disingenuous, of course, but it’s true that you do need to understand all those things in order to use images on your website. The reality is that, although there’s a few things to understand, they’re quickly learned and become second nature. Once understood, you rarely need to consciously think about them again.
And so it is with using a database. You initially need to learn how to create a database and a user account for the database. You need to understand about exporting and importing the database to a file, in order to make backups and transfer from one system to the next. You also need to learn that a web application or CMS needs the name of the database, username, password and server name in order to connect, and know how to provide that information when guided.
There’s a few new concepts to understand there, but by no means is it difficult. Just like understanding images, there are things to learn, but they’re easy to learn and anyone who can build a site with HTML and CSS should not have any trouble in that regard.
Soon after, you learn the convenience of being able to dump your entire website content to a single file for easy backup, or to get a copy of your site up and running in a different location. The initial extra time spent in configuring the database is soon recouped and forgotten. This isn’t hard, it’s a small extra step of configuration that brings benefits in spades.
This is heavy, man
The other thing I hear is that a relation database like MySQL is ‘heavy’. Software has mass, but no weight, so what do we mean by heavy? I think most people mean computationally expensive – i.e. a lot of work has to be done in order to process a request for a web page. To measure how heavy something is, is to measure the amount and computational cost of work needed to be done between receiving a request for a page and delivering the closing
</html> tag to the browser.
So how does MySQL fair in terms of weight? The first thing we need to consider is that, on the majority of PHP hosting accounts, MySQL is already up and running whether we’re using it or not. If you have full control over the hosting, you can control that too, but in the majority of cases where someone is hosting a PHP-based CMS like Perch, MySQL is already running and in use for other sites on the same system.
The next thing to consider is that MySQL is a system built with the express purpose of giving fast access to stored data with as little overhead as possible, so that as many reads and writes of data can be executed as possible with the available resources. That’s the whole point of it – making data storage and retrieval fast, reliable and convenient. It’s written in C and compiled, so runs close to the metal.
So we have a data store that is both already running anyway, and is also super optimised for exactly the task in hand. What’s the alternative?
When people talk about not using a database, they either have a completely static site (i.e. no content management, or some rickety direct-editing system working directly on the files) or they’re building their own data store using files.
Using files as a data store isn’t a new idea. In fact, it’s the original idea – the traditional way of solving the problem. It sounds like an attractive idea – back to basics, no need for a database, just writing and reading from a simple file structure. It can work well, of course, but it has major limitations. Reading and writing is easy, but operations like searching, sorting and filtering data (e.g. show me all articles in category x ordered by date) quickly become complex. You can do it, but as it has to be done in your high-level scripting language (PHP) where operations like sorting and filtering are computationally expensive and therefore slow. On shared hosting, CPU is limited and disk access is often slow, so reading and searching your data files be painful. The solution is to then add an additional layer of cached results for common lookups. To help with searching, you construct an index file like a table of contents, showing your code where everything lives.
Things get more complex when you have multiple users editing content. Your code needs to know how to manage data across files when multiple people could be writing to those files at once. You have to introduce locking, and ways of dealing with that in your user interface.
The alternative to writing all this extra code is to have a data store that is slow and unreliable, and frequently loses your data. As no one wants that, what you end up building is your own, computationally expensive, disk intensive, crappy version of MySQL. In PHP. With only some of the features, and lots of new bugs.
You see, these problems have already been solved. Established computer science has gone down the path of storing relational data in flat files, encountered the problems, solved them and given us relational databases like MySQL. Fast, efficient, mainly bug-free, and already running on your hosting. And for the task in hand, exceptionally fast, reliable and lightweight.
Now that is heavy
So if MySQL is actually lightweight compared to the alternative solutions, what things are expensive? Genuinely heavyweight traits to look out for when evaluating a system include ‘live’ processing of Markdown or Textile, on-demand image resizing, and any activity, like a flat file hobo-database that is disk-intensive. I plan to write some more about these in a separate post.
Databases are not hard to use, once you spend a moment understanding the key concepts. A little bit more configuration work up-front provides tonnes of benefits down the road.
If you need some sort of data store, MySQL is vastly more ‘lightweight’ than the alternative of trying to emulate what it does in an inefficient language like PHP. Plus MySQL is already running on your server, so just use it. It solves problems you won’t even have to bother knowing about.
One of our design goals with Perch is to make a small, but high quality CMS. It should be just as robust as a large system, and not some sort of toy. For us, that meant providing strong foundations by using a proper relational database (not flat files) building a proper control panel (not a flimsy edit-on-page charade) and building a simple system, not an over-simplified one.