Dynamic SQL and IN Operator

Question

Post reply

Dynamic SQL and IN Operator

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

VERY impressive splitter improvements there Jeff! Well done!!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

TomThomson SSC Guru Points: 104773 More actions · Answer 2

m.haren (4/20/2011)
What if you use the same convert for the value you are checking. Like ... convert(nvarchar(20), PRH.ClosedDate, 112) ?

That should work, but is gloriously inefficient as it's not sargable.

Tom

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

Jeff Moden (4/16/2011)
Thanks for posting the splitter code. There are several problems with that splitter code that you may not be aware of.
1. That splitter doesn’t work as advertised according to the code itself. The input variables suggest that it will allow a 5-character delimiter but if we feed it a string with even just a 2-character delimiter…
SELECT * FROM dbo.Split('12|-|34|-|56|-|78','|-|');
… here’s what you get.
IdData
112
2|34
3|56
4|78
2. The splitter code uses RBAR to do the split. In this case, it’s what I call a “Nibbler” splitter which finds the first element, stores it, and the “bites” the first element and first delimiter off of the string. The process continues until no delimiters are left and it does a final insert with everything that’s left as the final element. Because of the multiple inputs and the fact that the function is necessarily an mTVF (Multi-line Table Valued Function which has all of the overhead of a Scalar Function), this function is relatively slow.
3. The splitter uses artificial limits on the size of the string that it can be fed. With MAX data-types being the exception, one would always expect an NVARCHAR splitter to be able to split up to and including the full width of an NVARCHAR(4000).
4. The splitter has a built in LTRIM/RTRIM which could be a real problem if leading or trailing spaces are ever required to be preserved. My recommendation would be to do such trimming outside of the function if such a need arises. For those items not needing to be trimmed, having trimming done inside the function doesn’t do anything except slow things down.
Here’s the code that I recommend you use. If you really need for the trimming to be done inside the splitter, do that at the final assignment of “Item = “. Of course, you can also rename the splitter and the column names it returns to match your current splitter.
CREATE FUNCTION [dbo].[DelimitedSplitN4K]
/***************************************************************************************************
Purpose:
Split a given string at a given delimiter and return a list of the split elements (items).
Notes:
1. Leading a trailing delimiters are treated as if an empty string element were present.
2. Consecutive delimiters are treated as if an empty string element were present between them.
3. Except when spaces are used as a delimiter, all spaces present in each element are preserved.
Returns:
iTVF containing the following:
ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
Item = Element value as a NVARCHAR(4000)
CROSS APPLY Usage Example:
---------------------------------------------------------------------------------------------------
--===== Conditionally drop the test tables to make reruns easier for testing.
-- (this is NOT a part of the solution)
IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
DROP TABLE #JBMTest
;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
-- In the following comments, "b" is a blank and "E" is an element in the left to right order.
-- Pipes are used to encapsulate the output of "Item" so that you can see that all blanks
-- are preserved no matter where they may appear.
SELECT *
INTO #JBMTest
FROM ( --# & type of Return Row(s)
SELECT 0, NULL UNION ALL --1 NULL
SELECT 1, SPACE(0) UNION ALL --1 b (Empty String)
SELECT 2, SPACE(1) UNION ALL --1 b (1 space)
SELECT 3, SPACE(5) UNION ALL --1 b (5 spaces)
SELECT 4, N',' UNION ALL --2 b b (both are empty strings)
SELECT 5, N'55555' UNION ALL --1 E
SELECT 6, N',55555' UNION ALL --2 b E
SELECT 7, N',55555,' UNION ALL --3 b E b
SELECT 8, N'55555,' UNION ALL --2 b B
SELECT 9, N'55555,1' UNION ALL --2 E E
SELECT 10, N'1,55555' UNION ALL --2 E E
SELECT 11, N'55555,4444,333,22,1' UNION ALL --5 E E E E E
SELECT 12, N'55555,4444,,333,22,1' UNION ALL --6 E E b E E E
SELECT 13, N',55555,4444,,333,22,1,' UNION ALL --8 b E E b E E E b
SELECT 14, N',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
SELECT 15, N' 4444,55555 ' UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
SELECT 16, N'This,is,a,test.' --E E E E
) d (SomeID, SomeValue)
;
GO
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = '|'+split.Item+'|'
FROM #JBMTest test
CROSS APPLY dbo.DelimitedSplitN4K(test.SomeValue,',') split
;
---------------------------------------------------------------------------------------------------
Other Notes:
1. Optimized for NVARCHAR(4000) or less. No testing or error reporting for truncation at 4000
characters is done.
2. Optimized for single character delimiter. Multi-character delimiters should be resolved
externally from this function.
3. Optimized for use with CROSS APPLY.
4. Does not "trim" elements just in case leading or trailing blanks are intended.
5. If you don't know how a Tally table can be used to replace loops, please see the following...
http://qa.sqlservercentral.com/articles/T-SQL/62867/
6. Changing this function to use NVARCHAR(MAX) will cause it to run twice as slow. It's just the
nature of NVARCHAR(MAX) whether it fits in-row or not.
7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows
that the UNPIVOT method is quite machine dependent and can slow things down quite a bit.
Credits:
This code is the product of many people's efforts including but not limited to the following:
cteTally concept originally by Iztek Ben Gan and "decimalized" by Lynn Pettis (and others) for a
bit of extra speed and finally redacted by Jeff Moden for a different slant on readability and
compactness. Hat's off to Paul White for his simple explanations of CROSS APPLY. Finally,
special thanks to Erland Sommarskog for his tireless efforts to help people understand
what you can actually do with T-SQL. I also thank whoever wrote the first article I ever saw
on "numbers tables" which is located at the following URL ...
http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html
Revision History:
Rev 00 - 20 Jan 2010 - Concept for inline cteTally: Lynn Pettis and others.
Redaction/Implementation: Jeff Moden
- Base 10 redaction and reduction for CTE. (Total rewrite)
Rev 01 - 13 Mar 2010 - Jeff Moden
- Removed one additional concatenation and one subtraction from the SUBSTRING in the
SELECT List for that tiny bit of extra speed.
Rev 02 - 14 Apr 2010 - Jeff Moden
- No code changes. Added CROSS APPLY usage example to the header, some additional credits,
and extra documentation.
Rev 03 - 18 Apr 2010 - Jeff Moden
- No code changes. Added notes 7, 8, and 9 about certain "optimizations" that dont'
actually work for this type of function.
Rev 04 - 29 Jun 2010 - Jeff Moden
- Added WITH SCHEMABINDING thanks to a note by Paul White. This prevents an unnecessary
"Table Spool" when the function is used in an UPDATE statement even though the function
makes no external references.
Rev 05 - 02 Apr 2011 - Jeff Moden
- Rewritten for extreme performance improvement especially for larger strings approaching
the 4K boundary and for strings that have wider elements. The redaction of this code
involved removing ALL concatenation of delimiters, optimization of the maximum "N" value
by using TOP instead of including it in the WHERE clause, and the reduction of all
previous calculations (thanks to the switch to a "zero based" cteTally) to just one
instance of one add and one instance of a subtract. The length calculation for the final
element (not followed by a delimiter) in the string to be split has been greatly
simplified by using the ISNULL/NULLIF combination to determine when the CHARINDEX returned
a 0 which indicates there are no more delimiters to be had or to start with. Depending on
the width of the elements, this code is between 4 and 8 times faster on a single CPU box
than the original code especially near the 8K boundary.
- Modified comments to include more sanity checks on the usage example, etc.
Rev 06 - 12 Apr 2011 - Jeff Moden
- Based on a suggestion by Ron "Bitbucket" McCullough, additional test rows were added to
the above sample and the code was changed to encapsulate the output in pipes so that
spaces and empty strings could be perceived in the output. The "Notes" section was added
and the "Other Notes" section was abbreviated.
***************************************************************************************************/
--===== Define I/O parameters
(
@pString NVARCHAR(4000),
@pDelimiter NCHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
--===== "Inline" CTE Driven "Tally Table” produces values from 0 up to
-- 10,000... enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (
SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
OR t.N = 0
)
)--===== The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,4000))
COLLATE Latin1_General_BIN
FROM cteStart s
;
GO
Just in case you’ve heard that “Tally Table” based splitters get slow as the number of elements increased or the width of the string increased, that used to be true. This is a new type of splitter that I just completed a new article for, which will be coming out soon. If you still have doubts, here’s a performance chart from the article for a very similar VARCHAR(8000) splitter. The heavy Blue line is your “Nibbler” splitter. The Red line is how old “Tally Table” based splitters used to act. The heavy Black line is the new splitter above.

Can you post the version optimized for varchar(8000) or varchar(max)?

P.S. I love it when the short, clear and consise explainations takes 3 times more space than the code they explain :hehe:.

bubs SSCommitted Points: 1620 More actions · Answer 4

Thanks for the help everyone, especially Tom. Got it working now.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 5

TheSQLGuru (4/20/2011)
VERY impressive splitter improvements there Jeff! Well done!!

Aye... thanks, Kevin.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 6

Ninja's_RGR'us (4/20/2011)
Can you post the version optimized for varchar(8000) or varchar(max)?

You bet... see attached. I've included both the NVARCHAR(4000) and VARCHAR(8000) versions. The only differences are (of course) no divide by 2 for DATALENGTH, 4000 becomes 8000, and NVARCHAR(4000) becomes VARCHAR(8000).

As a side bar, I've also sussed the same problem for the 2 MAX datatypes which will be in a separate article. I'd attach the code except that I'm not done testing it for accuracy and performance yet. I don't want to put something out there as an "improvement" unless I'm absolutely sure.

And, no... it's not as easy as just changing the parameters to a MAX datatype. You can certainly do that but performance will be disappointing.

P.S. I love it when the short, clear and consise explainations takes 3 times more space than the code they explain :hehe:.

Yeah, I know, huh? Lot's of people on this forum have contributed to the code over a period of time and I kept track of them as best I can. I also didn't want anyone to have a misunderstanding about how the code works so that's why all the comments. It's the first function that's an "article" in itself. 😉

[font="Arial Black"]{edit} The code I had attached to this post works fine for most delimiters but has a case problem if letters are used as delimiters. I took the code down and will replace it with corrected code as soon as I can.[/font]

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

[font="Arial Black"] The code I had attached to this post works fine for most delimiters but has a case problem (thanks to Paul White for finding the problem) if letters are used as delimiters. I took the code down and will replace it with corrected code as soon as I can which will be tonight.[/font]

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

Considering that I missed something before, I sent the repaired code out to several folks and asked them to try to break it. As soon as I hear back from them, I'll post back here.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions