All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –

About

User Defined Functions Considered Harmful

3 March 2005

I admit I should have listened to Rachel. She told me that SQL Server user defined functions (UDFs) were evil, but in the face of zero evidence to back up her claim, I went ahead and designed the linchpin query in our latest project using beautifully designed logic blocks all extracted out to UDFs.

The query is exceptionally complex, having to take account of countless business rules and turn them into result set filters. In fact the query is the crux of the entire web app, with all roads leading to its door. In order to design the query to be as elegant and maintainable as possible, I made heavy use of UDFs to compartmentalise each rule. The design of the query was pretty good – with carefully chosen function names the whole thing read more like a screenplay. And in development it ran beautifully.

Now we’re in alpha testing, and the client has begun loading data. The primary table has about 40,000 rows, and there are perhaps a dozen joined tables containing anything up to 20,000 rows each – but it’s early days, and the volume is likely to increase exponentially. That’s partly why we’re using SQL Server, after all.

That’s when the query started timing out.

ASP pages (I know – not our choice) have a default timeout of 90 seconds, so with our 40,000 rows of data it looked like the query was taking longer than that. I was half betting that it was returning in a consistent 91 seconds, but a quick trip out with Query Analyser revealed the ghastly truth. The damn thing was taking 4 minutes to return.

To cut a long debugging session short, it was quite clearly the UDFs. By taking the contents of the UDFs out and pasting them into the main body of the query (and I’m talking more or less a straight copy and paste job, plus a quick parameter ratification) I got the query time down to less than 4 seconds. At 4 minutes down to 4 seconds, you’ll appreciate how much restraint I exercised in naming this article.

From what I can gather by searching around, the reason for the performance hit is twofold:

Firstly, UDFs when used inline in a query work a bit like a cursor. They encourage row-by-row processing of the query rather than the usual block processing SQL Server performs. This slows down the whole show. For each row in a 40,000 strong table I was performing like 15 row-by-row subqueries cursor-style. This was the main performance hit.

Secondarily, it seems that SQL Server’s query optimisation fails to deal with UDFs properly. Basically it optimises and builds the execution plan without ever looking inside the UDFs to find out what they’re doing, so in all the optimisation isn’t worth the cycles it consumes. Waste of time.

So that’s why user defined functions in Microsoft’s SQL Server 2000 are considered harmful. They provide a lot of advantages in terms of code design and modularisation. But in my experience it’s simply not worth the extreme performance cost.

- Drew McLellan

Tags

Comments

  1. § Marcus Tucker: Drew, I’m surprised to see you posting something so subjective and misleading (it’s not like you at all) – UDFs are just another feature of SQL which can be enormously useful when used appropriately, but a performance killer when they are not.

    Clearly in this particular case the cons far outweighed the pros, but that’s not to say that they are all bad, so branding them as evil helps nobody and might put people off using them altogether. It’s just another tool in the set, to be used where it actually helps, not all the time just because it’s there.

    You may find Adam Machanic’s excellent SQL-oriented blog to be of interest:
    http://sqljunkies.com/WebLog/amachanic/
  2. § Marcus Tucker: I should perhaps add that views can be just as “evil” – just look at this example posted by someone who just so happens to have the same first name as me… ;)

    http://thedailywtf.com/ShowPost.aspx?PostID=27097
  3. § Drew McLellan: Marcus – that’s pretty much what I said. UDFs provide a lot of advantages in terms of code design and modularisation. When performance is critical, however, they can cause big problems.
    I actually really like UDFs still, but I’m a bit annoyed now that I’m going to have to curb my use of them.

    I guess it’s not as obvious as I thought, but the whole “considered harmful” thing is somewhat tongue in cheek :-)
  4. § Paul Carpenter: I didn’t even know that such a thing existed, so I thankyou for the heads up. Is this just with Windows SQL server or does it apply across the board?
  5. § Eric Hancock: This is something about which I feel somewhat strongly.

    I’ve worked on systems on both ends of the spectrum: lots of UDFs and everything done in code. There were good and bad aspects of each.

    My conclusion was one of moderation: use databases for things they do well. Use the programming language for the things it does well.

    At this point, databases are remarkably good at managing sets of data: queries with intersections and unions of large sets don’t bother most modern RDBs. Procedural logic, however, can’t be optimized in the same way, and should be used in the database with great care.
  6. § Gabriel Mihalache: I imagine this is a SQL Server thing. On IBM’s DB2 I used stored procedures and UDF to my liking without any such problems.

    Also, I know everyone says this, but did you check your indexes? Maybe you can enhance the performance by sacrificing storage space for indexes. They’re worth it!
  7. § Charles Martin: I had never used a UDF until I came to my current position. Instead of using them for simple manipulation of values, they were used to return entire recordsets. Thus, there were several

    “SELECT * FROM AnotherFreakingUDF”

    In many cases, they were used as complex views that (1) accepted input and (2) manipulated the data set before returning to the user (neither of which you can do with a view). Combine this with 90% of the queries being done in the ASP code with no stored procedures to return the data and you’ve got a grand system that has almost no SQL optimization. Yeah, it’s fun to inherit another developers work who no longer works for the company.
  8. § DK:

    I had a query that basically displayed several statistical info (counts, min, max, avg, deviations etc.) from data in another bunch of tables. The main query was joining ten odd tables each having over 200,000 rows. For each statistical computation, I had created UDF’s and was passing the PK/FK’s to them from the main query as parameter. I would say I was really shocked to see my query take over 2 minutes to complete. Somehow instinctively I started searching for usefulness of UDF’s and landed at your article. Following your path, I just copy-pasted each UDF’s code as a plain subquery and the results were amazing!!! The query completed in less than 143 milliseconds.

    Thanks to your article, you saved my day!!!

  9. § Brian C Brown:

    Actually, CM, you can have views accept input and manipulate the data. You can even create triggers for added complexity.

Photographs

Work With Me

edgeofmyseat.com logo

At edgeofmyseat.com we build custom content management systems, ecommerce solutions and develop web apps.

Recent Links

Affiliation

  • Web Standards Project
  • Britpack
  • 24 ways

About Drew McLellan

Photo of Drew McLellan

Drew McLellan has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This lead to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practises, and stood as Group Lead for The Web Standards Project 2006-08. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at all in the head and, with a little help from his friends, at 24 ways.