I admit I should have listened to Rachel. She told me that SQL Server user defined functions (UDFs) were evil, but in the face of zero evidence to back up her claim, I went ahead and designed the linchpin query in our latest project using beautifully designed logic blocks all extracted out to UDFs.
The query is exceptionally complex, having to take account of countless business rules and turn them into result set filters. In fact the query is the crux of the entire web app, with all roads leading to its door. In order to design the query to be as elegant and maintainable as possible, I made heavy use of UDFs to compartmentalise each rule. The design of the query was pretty good – with carefully chosen function names the whole thing read more like a screenplay. And in development it ran beautifully.
Now we’re in alpha testing, and the client has begun loading data. The primary table has about 40,000 rows, and there are perhaps a dozen joined tables containing anything up to 20,000 rows each – but it’s early days, and the volume is likely to increase exponentially. That’s partly why we’re using SQL Server, after all.
That’s when the query started timing out.
ASP pages (I know – not our choice) have a default timeout of 90 seconds, so with our 40,000 rows of data it looked like the query was taking longer than that. I was half betting that it was returning in a consistent 91 seconds, but a quick trip out with Query Analyser revealed the ghastly truth. The damn thing was taking 4 minutes to return.
To cut a long debugging session short, it was quite clearly the UDFs. By taking the contents of the UDFs out and pasting them into the main body of the query (and I’m talking more or less a straight copy and paste job, plus a quick parameter ratification) I got the query time down to less than 4 seconds. At 4 minutes down to 4 seconds, you’ll appreciate how much restraint I exercised in naming this article.
From what I can gather by searching around, the reason for the performance hit is twofold:
Firstly, UDFs when used inline in a query work a bit like a cursor. They encourage row-by-row processing of the query rather than the usual block processing SQL Server performs. This slows down the whole show. For each row in a 40,000 strong table I was performing like 15 row-by-row subqueries cursor-style. This was the main performance hit.
Secondarily, it seems that SQL Server’s query optimisation fails to deal with UDFs properly. Basically it optimises and builds the execution plan without ever looking inside the UDFs to find out what they’re doing, so in all the optimisation isn’t worth the cycles it consumes. Waste of time.
So that’s why user defined functions in Microsoft’s SQL Server 2000 are considered harmful. They provide a lot of advantages in terms of code design and modularisation. But in my experience it’s simply not worth the extreme performance cost.



Comments
Clearly in this particular case the cons far outweighed the pros, but that’s not to say that they are all bad, so branding them as evil helps nobody and might put people off using them altogether. It’s just another tool in the set, to be used where it actually helps, not all the time just because it’s there.
You may find Adam Machanic’s excellent SQL-oriented blog to be of interest:
http://sqljunkies.com/WebLog/amachanic/
http://thedailywtf.com/ShowPost.aspx?PostID=27097
I actually really like UDFs still, but I’m a bit annoyed now that I’m going to have to curb my use of them.
I guess it’s not as obvious as I thought, but the whole “considered harmful” thing is somewhat tongue in cheek :-)
I’ve worked on systems on both ends of the spectrum: lots of UDFs and everything done in code. There were good and bad aspects of each.
My conclusion was one of moderation: use databases for things they do well. Use the programming language for the things it does well.
At this point, databases are remarkably good at managing sets of data: queries with intersections and unions of large sets don’t bother most modern RDBs. Procedural logic, however, can’t be optimized in the same way, and should be used in the database with great care.
Also, I know everyone says this, but did you check your indexes? Maybe you can enhance the performance by sacrificing storage space for indexes. They’re worth it!
“SELECT * FROM AnotherFreakingUDF”
In many cases, they were used as complex views that (1) accepted input and (2) manipulated the data set before returning to the user (neither of which you can do with a view). Combine this with 90% of the queries being done in the ASP code with no stored procedures to return the data and you’ve got a grand system that has almost no SQL optimization. Yeah, it’s fun to inherit another developers work who no longer works for the company.
I had a query that basically displayed several statistical info (counts, min, max, avg, deviations etc.) from data in another bunch of tables. The main query was joining ten odd tables each having over 200,000 rows. For each statistical computation, I had created UDF’s and was passing the PK/FK’s to them from the main query as parameter. I would say I was really shocked to see my query take over 2 minutes to complete. Somehow instinctively I started searching for usefulness of UDF’s and landed at your article. Following your path, I just copy-pasted each UDF’s code as a plain subquery and the results were amazing!!! The query completed in less than 143 milliseconds.
Thanks to your article, you saved my day!!!
Actually, CM, you can have views accept input and manipulate the data. You can even create triggers for added complexity.