All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –

About

Why Use a Database With a Small CMS?

30 August 2012

As the lead developer of the small CMS Perch, I’m often asked why we use a database at all, when Perch is aimed at smaller websites. There seems to be an assumption that using a relational database like MySQL is both complex, and somehow ‘heavier’ than needed for a ‘lightweight’ solution.

I wanted to talk a bit about both these issues, as the misunderstanding is pervasive and can lead to poor choices if you’re not careful.

Databases are hard to use

When learning HTML, every budding web designer soon wants to start working with images. The problem is, images on the web are complex. You have to learn about the different formats (JPEG, GIF, PNG, SVG…) and when to use each. You need to learn about scaling images to the appropriate size, and exporting from your graphics tool in a way that optimises for file size. You even need to understand some basic conventions about file naming and, when it comes to adding the <img /> tag to your page, file paths. It’s really hard, and you’re best to just stick with text.

I’m being disingenuous, of course, but it’s true that you do need to understand all those things in order to use images on your website. The reality is that, although there’s a few things to understand, they’re quickly learned and become second nature. Once understood, you rarely need to consciously think about them again.

And so it is with using a database. You initially need to learn how to create a database and a user account for the database. You need to understand about exporting and importing the database to a file, in order to make backups and transfer from one system to the next. You also need to learn that a web application or CMS needs the name of the database, username, password and server name in order to connect, and know how to provide that information when guided.

There’s a few new concepts to understand there, but by no means is it difficult. Just like understanding images, there are things to learn, but they’re easy to learn and anyone who can build a site with HTML and CSS should not have any trouble in that regard.

Soon after, you learn the convenience of being able to dump your entire website content to a single file for easy backup, or to get a copy of your site up and running in a different location. The initial extra time spent in configuring the database is soon recouped and forgotten. This isn’t hard, it’s a small extra step of configuration that brings benefits in spades.

This is heavy, man

The other thing I hear is that a relation database like MySQL is ‘heavy’. Software has mass, but no weight, so what do we mean by heavy? I think most people mean computationally expensive – i.e. a lot of work has to be done in order to process a request for a web page. To measure how heavy something is, is to measure the amount and computational cost of work needed to be done between receiving a request for a page and delivering the closing </html> tag to the browser.

So how does MySQL fair in terms of weight? The first thing we need to consider is that, on the majority of PHP hosting accounts, MySQL is already up and running whether we’re using it or not. If you have full control over the hosting, you can control that too, but in the majority of cases where someone is hosting a PHP-based CMS like Perch, MySQL is already running and in use for other sites on the same system.

The next thing to consider is that MySQL is a system built with the express purpose of giving fast access to stored data with as little overhead as possible, so that as many reads and writes of data can be executed as possible with the available resources. That’s the whole point of it – making data storage and retrieval fast, reliable and convenient. It’s written in C and compiled, so runs close to the metal.

So we have a data store that is both already running anyway, and is also super optimised for exactly the task in hand. What’s the alternative?

When people talk about not using a database, they either have a completely static site (i.e. no content management, or some rickety direct-editing system working directly on the files) or they’re building their own data store using files.

Using files as a data store isn’t a new idea. In fact, it’s the original idea – the traditional way of solving the problem. It sounds like an attractive idea – back to basics, no need for a database, just writing and reading from a simple file structure. It can work well, of course, but it has major limitations. Reading and writing is easy, but operations like searching, sorting and filtering data (e.g. show me all articles in category x ordered by date) quickly become complex. You can do it, but as it has to be done in your high-level scripting language (PHP) where operations like sorting and filtering are computationally expensive and therefore slow. On shared hosting, CPU is limited and disk access is often slow, so reading and searching your data files be painful. The solution is to then add an additional layer of cached results for common lookups. To help with searching, you construct an index file like a table of contents, showing your code where everything lives.

Things get more complex when you have multiple users editing content. Your code needs to know how to manage data across files when multiple people could be writing to those files at once. You have to introduce locking, and ways of dealing with that in your user interface.

The alternative to writing all this extra code is to have a data store that is slow and unreliable, and frequently loses your data. As no one wants that, what you end up building is your own, computationally expensive, disk intensive, crappy version of MySQL. In PHP. With only some of the features, and lots of new bugs.

You see, these problems have already been solved. Established computer science has gone down the path of storing relational data in flat files, encountered the problems, solved them and given us relational databases like MySQL. Fast, efficient, mainly bug-free, and already running on your hosting. And for the task in hand, exceptionally fast, reliable and lightweight.

Now that is heavy

So if MySQL is actually lightweight compared to the alternative solutions, what things are expensive? Genuinely heavyweight traits to look out for when evaluating a system include ‘live’ processing of Markdown or Textile, on-demand image resizing, and any activity, like a flat file hobo-database that is disk-intensive. I plan to write some more about these in a separate post.

tl;dr

Databases are not hard to use, once you spend a moment understanding the key concepts. A little bit more configuration work up-front provides tonnes of benefits down the road.

If you need some sort of data store, MySQL is vastly more ‘lightweight’ than the alternative of trying to emulate what it does in an inefficient language like PHP. Plus MySQL is already running on your server, so just use it. It solves problems you won’t even have to bother knowing about.

One of our design goals with Perch is to make a small, but high quality CMS. It should be just as robust as a large system, and not some sort of toy. For us, that meant providing strong foundations by using a proper relational database (not flat files) building a proper control panel (not a flimsy edit-on-page charade) and building a simple system, not an over-simplified one.

- Drew McLellan

Comments

  1. § Rick Hurst:

    Interesting article – i’ve recently come to pretty much the same conclusion, when working with my own home-brewed static file driven blog – although i’m caching json files for quick lookup of categories etc. so performance isn’t an issue, it’s a massive PITA to do things that are already solved by mysql. However, my original intention in building it was so that the content (blog posts and blocks of text or HTML) can be put under version control, which makes it easy to sync content between local and live versions of the site, without having to dump and restore databases each time. I also wanted to be able to write blog posts locally in a text editor, as it started off as a travel blog where I was offline most of the time. I guess it wouldn’t be hard to have some import/ export functionallity to make this work with database content though..

  2. § Lonnie Olson:

    Have you considered the middle ground between custom flat files and full-on MySQL? SQLite.

    Perfect for smaller websites, portable, built-in to PHP, easy to backup (copy a file), rich SQL featureset, etc. Best of both worlds in my opinion.

  3. § Drew McLellan:

    Lonnie – we have looked at SQLite. The problem is that it’s impossible to give a clear set of instructions on how to secure it that apply to all the different hosting configurations we support.

    Lots of shared hosting doesn’t enable the user to place files outside of the site root, or if it does, it’s in specific folders. The user then needs to formulate a valid path to that location. If they can’t put the database file outside of the site root, they need to prevent direct download from their web server, which differs depending if you’re running Apache or IIS or something else entirely. The risk is that the instructions become so complex that a user could end up with an unsecured database, or even never making a successful connection.

    Turns out it’s much, much simpler to configure MySQL. All the hosting companies already have instructions on how to do it for their configuration.

Photographs

Work With Me

edgeofmyseat.com logo

At edgeofmyseat.com we build custom content management systems, ecommerce solutions and develop web apps.

Follow me

Affiliation

  • Web Standards Project
  • Britpack
  • 24 ways

I made

Perch - a really little cms

About Drew McLellan

Photo of Drew McLellan

Drew McLellan (@drewm) has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This lead to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practises, and stood as Group Lead for The Web Standards Project 2006-08. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at all in the head and, with a little help from his friends, at 24 ways.