Table-Valued Functions vs. Scalar-Valued Functions

  • Have any of you had a chance to try out some of the stuff Adam Machanic did with inline TVFs? Just curious, because the perf differences are fairly remarkable.

    I still don't think it covers every circumstance, and I also suspect it's a pretty narrow window where this would be as good or better, but still, an interesting read nonetheless....

    Here's the blog post I had in mind.

    http://sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx

    I adapted this to a 10M-row test, and the inline TVF BEAT the "straight" GROUP BY scenario.
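    For anyone who hasn't read it, the core pattern is roughly this (a minimal sketch, not Adam's actual code; dbo.SomeNumbers and its column n are invented stand-ins):

    -- Scalar UDF: evaluated once per row, opaque to the optimizer
    CREATE FUNCTION dbo.fn_Cube (@i INT)
    RETURNS BIGINT
    AS
    BEGIN
        RETURN CAST(@i AS BIGINT) * @i * @i;
    END;
    GO

    -- Inline TVF doing the same work: the optimizer expands it like a view
    CREATE FUNCTION dbo.itvf_Cube (@i INT)
    RETURNS TABLE
    AS
    RETURN SELECT CAST(@i AS BIGINT) * @i * @i AS Cubed;
    GO

    -- Scalar call: one invocation per row
    SELECT dbo.fn_Cube(n) AS Cubed FROM dbo.SomeNumbers;

    -- Inline call via CROSS APPLY: same result, set-based plan
    SELECT c.Cubed
    FROM dbo.SomeNumbers
    CROSS APPLY dbo.itvf_Cube(n) AS c;

    The win comes from the inline TVF being folded into the surrounding query plan instead of being called row by row.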

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?

  • Jeff Moden (6/26/2008)


    The other option was taking each line of the calculation, adding a column to a join table between the two sets, and calculating one column at a time on the whole set of the join table. This was ugly code, and nowhere near so simple to maintain. But it was amazingly fast.

    How many rows did you end up with in that table, Gus? Was it a full Cartesian product or just a triangular join (since ABS(2-1) = ABS(1-2))?

    Limited Cartesian join, depending on business rules. For example, the 5-million-to-1,500 join had about 300-million rows. (To be more precise, it was an inner theta join on a range of data, not a full cross join, but definitely many-to-many within pre-specified limits.)

    Then I broke up each part of a spherical distance calculation into a column. For example, the first thing you calculate is SIN(Latitude1/57.29577951), so I had a SinLat1 column and had it run that as one calculation; next is SIN(Latitude2/57.29577951), same treatment. Instead of trying to run the whole calculation at once (which I did test), I had it run each piece on each row and store the result in the column. Doing it as calculated columns ended up being slower in this case, as did trying to run the whole thing at once.

    The main problem was that 300-million floating-point operations take a LOT of RAM, and there are 12 such operations per distance calculation. That's 3.6-billion calculations, and each takes a minimum of 16 bytes (two float data types), plus whatever overhead is used for the actual calculations. That meant it had to push the larger sets onto the hard drive, since I simply didn't have that much RAM available. (That's a minimum of just about 57 gig for this alone, not including what the OS, SQL Server, etc., need, if I'm calculating it correctly. As opposed to 4.8 gig per calculation when broken up.)

    Breaking it up like this meant each calculation, one column at a time, darn near pegged the RAM, but didn't end up in the swap file/tempdb.
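    A minimal sketch of the pattern, with invented names (the real breakdown had more pieces, per the 12 operations mentioned above):

    -- Work table: one row per candidate pair, one column per piece of the calculation
    CREATE TABLE dbo.PairCalc
    (
        Addr1ID     INT   NOT NULL,
        Addr2ID     INT   NOT NULL,
        Lat1        FLOAT NOT NULL,
        Lon1        FLOAT NOT NULL,
        Lat2        FLOAT NOT NULL,
        Lon2        FLOAT NOT NULL,
        SinLat1     FLOAT NULL,
        SinLat2     FLOAT NULL,
        CosLat1     FLOAT NULL,
        CosLat2     FLOAT NULL,
        CosDeltaLon FLOAT NULL,
        Distance    FLOAT NULL
    );

    -- Each UPDATE is one pass over the whole set: one calculation per column
    UPDATE dbo.PairCalc SET SinLat1 = SIN(Lat1 / 57.29577951);
    UPDATE dbo.PairCalc SET SinLat2 = SIN(Lat2 / 57.29577951);
    UPDATE dbo.PairCalc SET CosLat1 = COS(Lat1 / 57.29577951);
    UPDATE dbo.PairCalc SET CosLat2 = COS(Lat2 / 57.29577951);
    UPDATE dbo.PairCalc SET CosDeltaLon = COS((Lon2 - Lon1) / 57.29577951);

    -- Final pass: spherical law of cosines; 3958.75 is an approximate Earth radius in miles
    UPDATE dbo.PairCalc
    SET Distance = 3958.75 * ACOS(SinLat1 * SinLat2 + CosLat1 * CosLat2 * CosDeltaLon);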

    Another option would have been doing whole calculations atomically, but on a smaller subset of the rows at a time. That means a cursor/loop, and I just plain don't like messing around with those if I can avoid it.
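    For completeness, that batched alternative would look something like this: a sketch reusing the PairCalc table from above, plus an assumed PairID key (a WHILE loop rather than a true cursor, but the same idea):

    -- Whole calculation at once, but only on @BatchSize rows per pass
    DECLARE @BatchSize INT, @MinID INT, @MaxID INT;
    SET @BatchSize = 1000000;
    SET @MinID = 1;
    SELECT @MaxID = MAX(PairID) FROM dbo.PairCalc;

    WHILE @MinID <= @MaxID
    BEGIN
        UPDATE dbo.PairCalc
        SET Distance = 3958.75 * ACOS(  SIN(Lat1 / 57.29577951) * SIN(Lat2 / 57.29577951)
                                      + COS(Lat1 / 57.29577951) * COS(Lat2 / 57.29577951)
                                      * COS((Lon2 - Lon1) / 57.29577951))
        WHERE PairID BETWEEN @MinID AND @MinID + @BatchSize - 1;

        SET @MinID = @MinID + @BatchSize;
    END;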

    As mentioned, the column-at-a-time approach is a heavy-lifting option that probably doesn't apply in very many cases, but it sure as heck was better than an inline function in this case, which is the point.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • about 300-million rows

    That's kinda what I thought you were going to say.

    What I was suggesting is that the distance from Point A to Point B is exactly the same as the distance from Point B to Point A... wouldn't that cut your row count about in half if you did it like half a multiplication table for each "group"?
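    In other words, the difference is just the join predicate. A minimal sketch against an assumed single table of points:

    -- Full Cartesian self-join: every (A,B) and (B,A) pair, N*(N-1) rows
    SELECT a.ID AS ID1, b.ID AS ID2
    FROM dbo.Points a
    JOIN dbo.Points b ON b.ID <> a.ID;

    -- Triangular join: each unordered pair once, N*(N-1)/2 rows,
    -- since distance(A,B) = distance(B,A)
    SELECT a.ID AS ID1, b.ID AS ID2
    FROM dbo.Points a
    JOIN dbo.Points b ON b.ID > a.ID;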

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (6/28/2008)


    about 300-million rows

    That's kinda what I thought you were going to say.

    What I was suggesting is that the distance from Point A to Point B is exactly the same as the distance from Point B to Point A... wouldn't that cut your row count about in half if you did it like half a multiplication table for each "group"?

    It's already one-way. The two tables have completely different data in them; no self-join at all. A to B or B to A, it's the same number of calculations in what I'm doing. And I'm already filtering out lat/long values outside the possible distance range, etc. (otherwise it would be a full Cartesian). It's a question of taking a large number of addresses in one list and, for each one, finding the four closest addresses in a different list. There's no overlap between the two lists.
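    For the archives, the shape of that query is CROSS APPLY with TOP (4). A minimal sketch with invented names, where the one-degree box stands in for the real pre-specified limits:

    SELECT a.AddressID, near.AddressID AS NearestID, near.Distance
    FROM dbo.List1 a
    CROSS APPLY
    (
        SELECT TOP (4) b.AddressID,
               3958.75 * ACOS(  SIN(a.Lat / 57.29577951) * SIN(b.Lat / 57.29577951)
                              + COS(a.Lat / 57.29577951) * COS(b.Lat / 57.29577951)
                              * COS((b.Lon - a.Lon) / 57.29577951)) AS Distance
        FROM dbo.List2 b
        WHERE b.Lat BETWEEN a.Lat - 1 AND a.Lat + 1   -- rough bounding-box pre-filter
          AND b.Lon BETWEEN a.Lon - 1 AND a.Lon + 1
        ORDER BY Distance
    ) AS near;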

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
