All in the <head> – Ponderings and code by Drew McLellan –

User Defined Functions Considered Harmful

I admit I should have listened to Rachel. She told me that SQL Server user defined functions (UDFs) were evil, but in the face of zero evidence to back up her claim, I went ahead and designed the linchpin query in our latest project using beautifully designed logic blocks all extracted out to UDFs.

The query is exceptionally complex, having to take account of countless business rules and turn them into result set filters. In fact the query is the crux of the entire web app, with all roads leading to its door. In order to design the query to be as elegant and maintainable as possible, I made heavy use of UDFs to compartmentalise each rule. The design of the query was pretty good – with carefully chosen function names the whole thing read more like a screenplay. And in development it ran beautifully.

Now we’re in alpha testing, and the client has begun loading data. The primary table has about 40,000 rows, and there are perhaps a dozen joined tables containing anything up to 20,000 rows each – but it’s early days, and the volume is likely to increase exponentially. That’s partly why we’re using SQL Server, after all.

That’s when the query started timing out.

ASP pages (I know – not our choice) have a default timeout of 90 seconds, so with our 40,000 rows of data it looked like the query was taking longer than that. I was half betting that it was returning in a consistent 91 seconds, but a quick trip out with Query Analyser revealed the ghastly truth. The damn thing was taking 4 minutes to return.

To cut a long debugging session short, it was quite clearly the UDFs. By taking the contents of the UDFs out and pasting them into the main body of the query (and I’m talking more or less a straight copy and paste job, plus a quick parameter ratification) I got the query time down to less than 4 seconds. At 4 minutes down to 4 seconds, you’ll appreciate how much restraint I exercised in naming this article.

From what I can gather by searching around, the reason for the performance hit is twofold:

Firstly, UDFs when used inline in a query work a bit like a cursor. They encourage row-by-row processing of the query rather than the usual block processing SQL Server performs. This slows down the whole show. For each row in a 40,000 strong table I was performing like 15 row-by-row subqueries cursor-style. This was the main performance hit.

Secondarily, it seems that SQL Server’s query optimisation fails to deal with UDFs properly. Basically it optimises and builds the execution plan without ever looking inside the UDFs to find out what they’re doing, so in all the optimisation isn’t worth the cycles it consumes. Waste of time.

So that’s why user defined functions in Microsoft’s SQL Server 2000 are considered harmful. They provide a lot of advantages in terms of code design and modularisation. But in my experience it’s simply not worth the extreme performance cost.