March 28, 2011 at 8:32 am
Sure,
I'd like the output to be like this.
Name | Oct 2010 | Nov 2010 | Dec 2010 | 2010 Total | Jan 2011 | Feb 2011 | Mar 2011 | ... | 2011 Total |
March 28, 2011 at 2:36 pm
Tim-24860 (3/28/2011)
Sure, I'd like the output to be like this.
Name | Oct 2010 | Nov 2010 | Dec 2010 | 2010 Total | Jan 2011 | Feb 2011 | Mar 2011 | ... | 2011 Total |
Perfect. I've got just the article for that...
http://qa.sqlservercentral.com/articles/Crosstab/65048/
For further understanding, especially when it comes to performance on the subject of Cross Tabs by using "pre-aggregation" as Peter Larsson calls it, see part one of that series...
http://qa.sqlservercentral.com/articles/T-SQL/63681/
--Jeff Moden
May 2, 2011 at 4:32 pm
Great article Jeff!
One situation I'd like to see solved: I wrote a script that searches my database's char columns for 'funny' characters. Of course, it does this RBAR and takes a good 20 minutes to run (but it does save me some headaches, because our source system does NOT have anything in place to prevent these characters).
I wanted to try the tally table method to see if I could improve performance, but I have run into a problem. My loop iteratively builds the character field, so, for instance,
'test data' in the test varchar(100) column would be searched one character at a time and the output compared to the original (I have a lot of other code in between that makes this useful, but the loop is the crux of the performance problem).
Anyway, when I use the tally method it spits the result out as rows instead of a single column like my loop produces, so it appears as
test
1 t
2 e
3 s
4 t
5
6 d
7 a
8 t
9 a
Is there a way using the tally method to roll this back up, or return it as just 'test data' from this point?
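For what it's worth, one common way to roll such rows back up is the FOR XML PATH('') concatenation trick, which reassembles the split characters in position order. A minimal sketch (the #Split table and its column names are hypothetical stand-ins for the tally-split output):

```sql
-- Hypothetical: #Split holds the tally-split output, one row per character,
-- with N = character position and Ch = the character itself.
IF OBJECT_ID('tempdb..#Split') IS NOT NULL DROP TABLE #Split;
CREATE TABLE #Split (N INT NOT NULL, Ch CHAR(1));

INSERT INTO #Split (N, Ch)
VALUES (1,'t'),(2,'e'),(3,'s'),(4,'t'),(5,' '),(6,'d'),(7,'a'),(8,'t'),(9,'a');

--===== Concatenate the characters back into one string, in position order
SELECT Reassembled =
      (SELECT '' + Ch
         FROM #Split
        ORDER BY N
          FOR XML PATH(''))
;
```

Note that FOR XML PATH entitizes characters such as & and <, so for arbitrary data a TYPE directive with .value() is the safer variant of the same trick.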
Link to my blog http://notyelf.com/
May 3, 2011 at 5:48 pm
Thanks for the feedback, Shannon.
When you say...
One situation I would like to see if it can be solved, is I wrote a script that will search my database char columns for 'funny' characters.
I'm thinking that you may not need a loop or a Tally Table. Could you give an example of something with "funny" characters in it, along with the original, and explain what you want done once you find a "funny" character? Perhaps even post your loop code for comparison?
Thanks.
--Jeff Moden
May 4, 2011 at 12:05 pm
Not a problem. I know I didn't do too good a job explaining myself. Basically, we transfer some of our data to a 3rd party for reporting, and that data needs to be in ASCII. Because our source is in Unicode and not constrained to certain character types, we get a lot of diacritical marks and sometimes just some really strange characters (some I have no idea what they are!). My code at the moment doesn't remove them yet; it simply spits out a list of which rows and columns in which tables are offenders, and I clean them up afterwards.
Here is the function I use to do this
CREATE FUNCTION dbo.RemoveSpecialChars (@s varchar(256)) RETURNS varchar(256)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @s2 varchar(256) = '', @p int = 1, @l int = LEN(@s), @c int
    IF @s IS NULL
        RETURN NULL
    IF LEN(@s) = 0
        RETURN ''
    WHILE @p <= @l
    BEGIN
        SET @c = ASCII(SUBSTRING(@s, @p, 1))
        IF @c BETWEEN 32 AND 126
            OR @c IN (220,146,252,167,214,246,196,228,145,147,148,150,180,188,189,190,201,233,166,195,227,193,225,199,231,209,241)
            SET @s2 = @s2 + CHAR(@c)
        SET @p = @p + 1
    END
    RETURN @s2
END
GO
From there I have a stored proc that simply loops through every column in every table and builds a SQL statement that is executed via sp_executesql @SQL. So technically it's 3 loops, which I have been trying to stray away from for a long time :-D.
Anyway, any help on this is appreciated. I have been looking for an alternate solution in my spare time but have yet to find anything. I have successfully worked the tally table into some loops in my production code in the past, though, so thank you again!
Link to my blog http://notyelf.com/
May 4, 2011 at 2:00 pm
Ideas:
1) Are you dumping your results into a text file of some sort? If so, it might be easier to do this in a post-processing script on the text file.
1b) Dump it to a text file, process it to strip the bad characters, then import it back in.
2) Use SSIS and use a script to do the processing.
3) What I fundamentally want to do is something like:
REPLACE(...(Replace(Column,CHAR(1),' '), Char(2), ' ')..., Char(255), ' ')
Which is much easier (for me anyway) in a script, which I would build like this:
// PseudoCode
Dim BadChars as String, temp as string
BadChars = Char(1) & Char(2) & ... // (All of the Characters that you do not want...)
temp = inputstring
For i = 1 to len(BadChars)
temp = replace(temp,mid(BadChars,i,1),' ')
next
return temp
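The loop above translates almost line-for-line to T-SQL; a sketch only, where @BadChars holds a made-up handful of control characters:

```sql
-- Sketch: strip a list of unwanted characters by looping over the short
-- list of characters (not the data) and letting REPLACE do the rest.
DECLARE @BadChars VARCHAR(256) = CHAR(1) + CHAR(2) + CHAR(7),  -- hypothetical list
        @Temp     VARCHAR(100) = 'te' + CHAR(1) + 'st' + CHAR(7) + ' data',
        @i        INT = 1;

WHILE @i <= LEN(@BadChars)
BEGIN
    SET @Temp = REPLACE(@Temp, SUBSTRING(@BadChars, @i, 1), ' ');
    SET @i = @i + 1;
END;

SELECT @Temp;
```

The advantage of this shape is that the loop count is the number of bad characters, not the number of rows or the length of the data.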
Hopefully someone else has a better answer for you.
--
JimFive
Edited to fix pseudocode error
May 4, 2011 at 3:21 pm
shannonjk (5/4/2011)
Not a problem. I know I didn't do too good a job explaining myself. Basically, we transfer some of our data to a 3rd party for reporting, and that data needs to be in ASCII. Because our source is in Unicode and not constrained to certain character types, we get a lot of diacritical marks and sometimes just some really strange characters (some I have no idea what they are!). My code at the moment doesn't remove them yet; it simply spits out a list of which rows and columns in which tables are offenders, and I clean them up afterwards. Here is the function I use to do this
(.... function removed so won't kick up a storm at work...)
From there I have a stored proc that simply loops through every column in every table and builds a SQL statement that is executed via sp_executesql @SQL. So technically it's 3 loops, which I have been trying to stray away from for a long time :-D.
Anyway, any help on this is appreciated. I have been looking for an alternate solution in my spare time but have yet to find anything. I have successfully worked the tally table into some loops in my production code in the past, though, so thank you again!
That function does actually remove all but the "sanctioned" list of characters from the return variable. You may not be using it that way, but that's what it actually does. Anyway, we should be able to get rid of some of the loops no matter which way you need to have it. I'll see if I can demo for you after work.
--Jeff Moden
May 4, 2011 at 3:31 pm
Yes, you are correct; sorry, it has been a while since I designed or even looked at this code. I use it as a comparison, like
select dbo.RemoveSpecialChars(column) from table where column != dbo.RemoveSpecialChars(column)
The output shows only the 'bad' data, and I take it from there to manipulate it.
I have a master table that shows me how to get to unique records in all my tables; it's referenced in the stored proc loop, so when the proc spits out the bad data, it comes along with the unique row identifiers as well.
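The column loop itself can often be collapsed into a single string build against INFORMATION_SCHEMA.COLUMNS. A sketch only, assuming the dbo.RemoveSpecialChars function above exists; everything else is illustrative:

```sql
-- Sketch: build one SELECT per character column, all in one string,
-- then execute the whole batch once. No explicit column loop needed.
DECLARE @SQL NVARCHAR(MAX) = N'';

SELECT @SQL = @SQL + N'
SELECT ''' + c.TABLE_SCHEMA + N'.' + c.TABLE_NAME + N'.' + c.COLUMN_NAME + N''' AS ColumnName,
       ' + QUOTENAME(c.COLUMN_NAME) + N' AS BadValue
  FROM ' + QUOTENAME(c.TABLE_SCHEMA) + N'.' + QUOTENAME(c.TABLE_NAME) + N'
 WHERE ' + QUOTENAME(c.COLUMN_NAME) + N' <> dbo.RemoveSpecialChars(' + QUOTENAME(c.COLUMN_NAME) + N');'
  FROM INFORMATION_SCHEMA.COLUMNS c
 WHERE c.DATA_TYPE IN ('char','varchar','nchar','nvarchar');

EXEC sys.sp_executesql @SQL;
```

In practice you'd also want to join in the unique row identifiers from the master table mentioned above, and skip columns longer than the function's varchar(256) parameter.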
Link to my blog http://notyelf.com/
May 5, 2011 at 6:45 am
Sorry... I didn't forget you... I just got busy. I'll be back.
--Jeff Moden
May 5, 2011 at 12:27 pm
No worries, take your time. I have been looking for a solution as well; I will post something if I figure it out 🙂
Link to my blog http://notyelf.com/
May 5, 2011 at 2:51 pm
shannonjk (5/5/2011)
No worries, take your time. I have been looking for a solution as well; I will post something if I figure it out 🙂
Shannon, to confirm, in the end, you wanted something like this:
IF OBJECT_ID( 'tempdb..#tmp') is not null
drop table #tmp
CREATE table #tmp ( RowID INT IDENTITY( 1,1) NOT NULL, testfield nVARCHAR(256))
INSERT INTO #tmp (testfield) VALUES
( N'abcdefghijklmnopqrstuvwxyz'),
(N'abcdef' + nCHAR(300) + N'ghi'),
(N'abcd' + nchar(301) + N'efg' + nchar(302) + N'hij' + nCHAR(303))
SELECT * FROM #tmp
--===== "Inline" CTE-driven "Tally Table" produces values up to
-- 10,000... enough to cover VARCHAR(8000)
;WITH
E1(N) AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)
SELECT
t.RowID,
t.testfield AS BadField
FROM
#tmp AS t
JOIN
(SELECT DISTINCT
t.RowID
FROM
cteTally AS ct,
#tmp AS t
WHERE
ct.N <= LEN( t.testfield)
AND UNICODE(substring( t.testfield, n, 1)) > 256
) AS drv
ON t.RowID = drv.RowID
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
May 7, 2011 at 11:02 am
Craig Farrell (5/5/2011)
Shannon, to confirm, in the end, you wanted something like this:
What I was thinking is that no Tally Table is necessary for this. I believe a LIKE NOT (using [^]) would do it faster than a Tally Table implementation. I just haven't had the time to tweak it so it escapes the embedded wildcard and reserved characters _, %, and [, although I do believe I just had an epiphany in that area. 🙂
--Jeff Moden
May 7, 2011 at 11:43 am
Jeff Moden (5/7/2011)
Craig Farrell (5/5/2011)
Shannon, to confirm, in the end, you wanted something like this:
What I was thinking is that no Tally Table is necessary for this. I believe a LIKE NOT (using [^]) would do it faster than a Tally Table implementation. I just haven't had the time to tweak it so it escapes the embedded wildcard and reserved characters _, %, and [, although I do believe I just had an epiphany in that area. 🙂
Most likely, and I'm sure you're right. This would allow for stripping the characters or whatnot as a baseline for a number of different tasks, but for simple location it's probably a bit overkill and slower. If I get a chance, I'll see if I can determine whether an accented A exists in [a-Z]. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
May 7, 2011 at 11:53 am
At this point, I thought Shannon didn't really want to strip characters... only find those "cells" that had unwanted characters. At the very least, LIKE NOT would likely make a decent filter for those "cells" that do need cleaning.
--Jeff Moden
May 7, 2011 at 8:52 pm
Ok, Shannon... here's part 1. No need for a Tally Table just to figure out if a "cell" has "weird" characters in it. The following code makes 100,000 rows with a column that may have occasional "weird" characters in it (about 7,800 rows... I say "about" because they're randomly generated).
--===== Create a test table where the last character
-- may or may not be an illegal character
SELECT TOP 100000
RowNum = IDENTITY(INT,1,1),
TestString =
CAST(NEWID() AS VARCHAR(36))
+ CHAR(ABS(CHECKSUM(NEWID()))%90+25)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
Finding them is easy. We'll build your string of "valid" characters like you did before, but we'll insert a special "escape" character in front of the characters normally used as wildcards and character-range blocks, using a handful of nested REPLACEs, so they'll be treated as normal characters. Then it's just a matter of doing a SELECT with the right kind of LIKE in the WHERE clause, along with an ESCAPE clause naming the "escape" character we inserted.
--===== Build the "valid" character string.
DECLARE @LegalCharacters VARCHAR(256)
SELECT @LegalCharacters = ISNULL(@LegalCharacters,'')+CHAR(t.N)
FROM dbo.Tally t
WHERE t.N between 32 and 126
OR t.N IN (220,146,252,167,214,246,196,228,145,147,148,150,180,188,189,190,201,233,166,195,227,193,225,199,231,209,241)
;
--===== Add the "escape" characters to the characters normally used as WildCards and character range blocks.
-- In this case, I've used an underscore as the "escape" character.
SELECT @LegalCharacters = REPLACE(REPLACE(REPLACE(REPLACE(@LegalCharacters,'_','__'),'%','_%'),'[','_['),']','_]')
;
--===== And now we find ALL of the "cells" in the column that have a "weird" character.
-- The "AsciiValue" column shows the decimal ASCII value of the "weird" character found.
-- I didn't use every possible "weird" character but it will work for all "weird" characters.
SELECT RowNum, TestString, AsciiValue = ASCII(RIGHT(TestString,1))
FROM #TestTable
WHERE TestString LIKE '%[^'+@LegalCharacters+']%' ESCAPE '_'
;
The SELECT to find the "weird" characters finds all ~7,800 rows out of 100,000 rows in about 9 seconds on my 9-year-old single 1.8GHz CPU. It should do a whole lot better on a server.
--Jeff Moden
Viewing 15 posts - 316 through 330 (of 497 total)