Normalize data using Pivot WITHOUT using Aggregate function

Question

Post reply

Normalize data using Pivot WITHOUT using Aggregate function

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Lowell SSC Guru Points: 323482 More actions · Answer 1

ouch that changes things a bit;

can you please do a SELECT DISTINCT ColumnName From yourtable, so we can have the list of columnnames to normalize this data into? so far we have 9 columns, i'm betting there might be more:

'DestinationTableName'

'ID'

'SubjectArea'

'LocalCourseCode'

'SchoolID'

'Source'

'CourseTitle'

'TeacherID'

'Credits Possible'

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

jjbear11 Default port Points: 1497 More actions · Answer 2

I just did a select distinct, and those are all of the columns from the table.

Ray K SSC-Dedicated Points: 31135 More actions · Answer 3

Man, this would make for a good T-SQL challenge!

Now I'm really determined to figure this out!

+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/

jjbear11 Default port Points: 1497 More actions · Answer 4

This is what I've come up with so far:

declare @count int

declare @groupcount int

declare @NumberofRows int

select @NumberofRows = COUNT(*) from @school

set @count = 1

set @groupcount = 0

while @count <= @NumberofRows

begin

if (select case when text = '[DW].[DimClass]' then 1 else 0 end from @school where UniqueID = @count) = 1

BEGIN

set @groupcount = @groupcount + 1

update @school

set GroupID = @groupcount

where UniqueID = @count

END

ELSE

update @school

set GroupID = @groupcount

where UniqueID = @count

set @count = @count + 1

END

However, it is RBAR, and INCREDIBLY SLOOOOOW! I altered the @school table to include a GroupID (int) column.

Since performance is such a bear, I'd ideally like to do something like the above on the fly -- when Pivoting the data, as it is looking at each row, check if the Text is '[DW].[DimClass]'; if it is, increment the GroupID by 1.

Ray K SSC-Dedicated Points: 31135 More actions · Answer 5

Justin James (12/29/2009)
However, it is RBAR, and INCREDIBLY SLOOOOOW!

Which is why I'm making every effort to avoid RBAR. (If I was coding this in VB.NET or something similar, I'd already have a solution.)

I've been messing around with a recursive CTE, but I haven't been able to get it to work correctly. Here's what I have so far:

declare @RowNum table (ID int, RowNo int)

insert into @RowNum select ID, row_number() over (order by ID) from @school where ColumnName = 'DestinationTableName'

select * from @RowNum

; with NumberThis (ID, RowNo, ColumnName, [Text]) as

(

select ID, 1, ColumnName, [Text] from @school where ID < (select ID from @RowNum where RowNo = 2)

union all

select s.ID, n.RowNo + 1, s.ColumnName, s.[Text] from @school s join NumberThis n on s.ID > n.ID and s.ID < (select ID from @RowNum where RowNo = n.RowNo + 1)

)

select * from NumberThis

I haven't been able to get the recursive part of the CTE (after the UNION ALL) to work properly. Maybe someone else reading this can point out what I'm doing wrong.

+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/

Ray K SSC-Dedicated Points: 31135 More actions · Answer 6

I'm not having much luck with this, but here's where I'm going with it.

Using the @school table that we defined earlier, I started with this query:

select ID, row_number() over (order by ID) from @school

where ColumnName = 'DestinationTableName'

What this does is it assigns an index to every ID that contains "DestinationTableName." It looks something like this:

ID(No column name)

01

72

163

In other words, your "DestinationTableName" (your delimiter, if you will) occurs at IDs 0, 7, and 16 in our sample data.

What I'm hoping to do is to use this to define row numbers for the values in the table. I've been messing around with a recursive CTE for this, but haven't gotten it to work properly. I started with this code:

; with NumberThis (ID, RowNo, ColumnName, [Text]) as

(

select ID, 1, ColumnName, [Text] from @school where ID < (select ID from @RowNum where RowNo = 2)

)

select * from NumberThis

(Note: @RowNum is a table I created to hold the contents of the "DestinationFileName" IDs I described above.)

This CTE (without the recursion) gives me this:

IDRowNoColumnNameText

01DestinationTableName[DW].[DimClass]

11ID123146

21SubjectAreaMath

31LocalCourseCode000A1

41SchoolID123

51SourceHub

61CourseTitleAlgebra I

Unfortunately, at this point, I'm at a bit of a loss as to what to do with it next. I'm still messing around with it, but I'm hoping one of you who has more SQL experience than I do can pick up the ball and continue helping poor Justin out!

+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

Justin James (12/29/2009)
Please see my original post for code to copy/paste.

Tell ya what, Justin... please see the first link in my signature line below. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

Justin James (12/29/2009)
There is not. I was thinking of using ROW_NUMBER(), but was unsure how to partition by each block of 6 rows.

I see you've already got that... simple integer divide will do the trick.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Rich Hansell SSCrazy Points: 2316 More actions · Answer 9

Alright.. I think this might what you're looking for with a pivot-ish solution. I found this on another site and retro fitted to your code ( I just used a normal table for school):

-- May be a solution

;With CTE(RID,ID,ColumnName,Text)

AS

(

Select ROW_NUMBER() over (Order by ID) as RID,* from

(Select * from school UNION Select Max(ID),'ID','' From school)a

Where ColumnName = 'ID'

)

Select * into #group FROM School a

JOIN (Select a.RID,A.id as Srt,B.ID as ends from cte a JOIN cte b on

B.RID - A.rid =1)b

ON a.id >= b.srt and a.id < b.ends

Select PKeys.Text as ID,B.Text as 'SubjectArea',

C.Text as 'LocalCourseCode',D.Text as 'SchoolID',

E.Text as 'Source', F.Text as 'CourseTitle',

G.Text as 'TeacherID', H.Text as 'Credits Possible'--,

--I.columnName as 'Destination

FROM

(Select RID,Text from #group where ColumnName = 'ID')PKEYS

LEFT OUTER JOIN #group B ON PKEYS.RID = B.RID AND B.ColumnName= 'SubjectArea'

LEFT OUTER JOIN #group C ON PKEYS.RID = C.RID AND C.ColumnName= 'LocalCourseCode'

LEFT OUTER JOIN #group D ON PKEYS.RID = D.RID AND D.ColumnName= 'SchoolID'

LEFT OUTER JOIN #group E ON PKEYS.RID = E.RID AND E.ColumnName= 'Source'

LEFT OUTER JOIN #group F ON PKEYS.RID = F.RID AND F.ColumnName= 'CourseTitle'

LEFT OUTER JOIN #group G ON PKEYS.RID = G.RID AND G.ColumnName= 'TeacherID'

LEFT OUTER JOIN #group H ON PKEYS.RID = H.RID AND H.ColumnName= 'Credits Possible'

The original source was here: http://www.calsql.com/2009/10/pivot-with-out-aggregate.html

Thanks,

Rich

LutzM SSC Guru Points: 107049 More actions · Answer 10

How about this to assign a group number to each set regardless of the number of rows?

;with cte1 AS

(

SELECT id,

row_number() OVER (order BY id) AS row

FROM @school

WHERE text = '[DW].[DimClass]'

),

cte2 AS

(

SELECT

row_number() OVER (order BY a.id) AS grp,

a.id start,

isnull(b.id,99) finish

FROM cte1 a

LEFT OUTER JOIN cte1 b ON a.row=b.row-1

)

,cte3 AS (

SELECT s.*, c.grp

FROM @school s

INNER JOIN cte2 c ON s.id >= c.start AND id < c.finish

)

SELECT [ID],[SubjectArea], [LocalCourseCode], [CourseTitle], [SchoolID], [Source],[TeacherID],[Credits Possible]

FROM (select ColumnName,[text],grp FROM cte3) As TheSource

PIVOT

(

max([Text])

FOR

ColumnName --each value

IN ( [ID],[SubjectArea], [LocalCourseCode], [CourseTitle], [SchoolID], [Source],[TeacherID],[Credits Possible])

) AS PIVOTALIAS

/* result set:

IDSubjectAreaLocalCourseCodeCourseTitleSchoolIDSourceTeacherIDCredits Possible

123146Math000A1Algebra I123HubNULLNULL

94503Science9037Biology II46Annex583981.0

84023NULLNULLPhysics INULLAnnexNULLNULL

*/

Edit: complete code and result set added (PIVOT clause was missing before...)

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 11

Another overcomplicated effort from the peanut gallery.

SELECT d.OutputRowNo,

MAX(CASE ColumnName WHEN 'ID' THEN [Text] END) AS [ID],

MAX(CASE ColumnName WHEN 'SubjectArea' THEN [Text] END) AS [SubjectArea],

MAX(CASE ColumnName WHEN 'LocalCourseCode' THEN [Text] END) AS [LocalCourseCode],

MAX(CASE ColumnName WHEN 'SchoolID' THEN [Text] END) AS [SchoolID],

MAX(CASE ColumnName WHEN 'Source' THEN [Text] END) AS [Source],

MAX(CASE ColumnName WHEN 'CourseTitle' THEN [Text] END) AS [CourseTitle],

MAX(CASE ColumnName WHEN 'TeacherID' THEN [Text] END) AS [TeacherID],

MAX(CASE ColumnName WHEN 'Credits Possible' THEN [Text] END) AS [Credits Possible]

FROM @school s

INNER JOIN (

SELECT

OutputRowNo = ROW_NUMBER() OVER(ORDER BY s.[ID]),

Lowerbound = s.[ID],

Upperbound = ISNULL((SELECT MIN([ID]) FROM @school WHERE [ID] > s.[ID] AND [Text] = '[DW].[DimClass]')-1, (SELECT MAX([ID]) FROM @school))

FROM @school s

WHERE s.[Text] = '[DW].[DimClass]'

) d ON s.[ID] BETWEEN d.Lowerbound AND d.Upperbound

GROUP BY d.OutputRowNo

Results:

OutputRowNo ID SubjectArea LocalCourseCode SchoolID Source CourseTitle TeacherID Credits Possible

----------- ------- ----------- --------------- -------- ------ ----------- --------- ----------------

1 123146 Math 000A1 123 Hub Algebra I NULL NULL

2 94503 Science 9037 46 Annex Biology II 58398 1.0

3 84023 NULL NULL NULL Annex Physics I NULL NULL

Cheers

ChrisM@home

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

LutzM SSC Guru Points: 107049 More actions · Answer 12

Nice one, Chris!!

Another overcomplicated effort from the peanut gallery.

I wouldn't take it that serious... if the requirement would have been to always have a group of 6, then there would have been easier solutions than what you recommended in your earlier post...

But this one beats my PIVOT solution by about 50% (as per execution plan, since there are not enough sample data to get any comparable duration results).

I'm sure there will be one or the other around to confirm that the CASE-based solution "outperformed" the PIVOT. Again.

But I guess it's not really about using PIVOT or CASE but more how to get the data grouped together. And your solution requires one less sorting operation (I have one more call of the ROW_NUMBER function...).

Again: Good job! (as far as I can see... 🙂 )

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 13

Cheers Lutz!

Having SQL Server handy to play with helps a bit ...

Looks like the OP has at least two solutions to choose from & tweak to perfection. Job's a good'un.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

mister.magoo SSC-Forever Points: 47068 More actions · Answer 14

Unless I missed it earlier in the thread, has anyone wondered why the text is "DW.DimClass"?

Is this data coming out of a Data Warehouse?

If so, would you not be better querying it with MDX (or failing that, finding the source data instead?)

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47068 More actions · Answer 15

For the sake of it, a solution which copes with new "ColumnName" values by using dynamic sql (go ahead - pork chop me or whatever you do :-D)

If you don't like dynamic SQL , then you could just use this to generate the query as and when the data requires a change....just change the "exec sp_executesql @sql" line to "print @sql"....

--= Dump the data into a temp table as this uses dynamic sql and needs a "proper" table to read.

IF OBJECT_ID('tempdb..#school') IS NULL

SELECT ID, ColumnName, [Text]

INTO #school

FROM @school

DECLARE @sql NVARCHAR(4000)

SET @sql = ''

;WITH base(ID,ColName,Text,ColNum)

AS

(

--= Get the data into some kind of order with the ID from the '[DW].[DimClass]' row for each set of data as the key

--= And a column number to ascertain the correct sequence for the columns later

SELECT s1.ID, Details.ColName, Details.Text , ROW_NUMBER() OVER(PARTITION BY s1.ID ORDER BY Details.ID) AS ColNum

FROM #school s1

OUTER APPLY (

--= Get the associated rows for the current group/ID

SELECT ID,ColumnName,[Text]

FROM #school s2

WHERE s2.ID>s1.ID

AND s2.ID < ISNULL(

(

--= Get the ID of the next group of data

SELECT TOP 1 ID

FROM #school s3

WHERE s3.ID>s1.ID AND s3.ColumnName = 'DestinationTableName'

ORDER BY s3.ID

),

(

--= Get the last row of data when there are no more groups

SELECT MAX(ID) FROM #school

)

) AS Details(ID,ColName,TEXT)

WHERE s1.ColumnName = 'DestinationTableName'

), cols(Name,POSITION)

AS

(

--= Get the distinct ColumnName values and positions (assuming the data will always place them in order in the first place)

SELECT ColName ,MAX(ColNum)

FROM base

GROUP BY ColName

), colList(list)

AS

(

--= Use FOR XML PATH('') trick to get a list of columns

SELECT STUFF((

SELECT ', '+QUOTENAME(Name,'[')

FROM cols

ORDER BY POSITION

FOR XML PATH('')),1,1,'')

)

SELECT @sql=list

FROM colList

--= Now build a dynamic SQL to PIVOT the data

SET @sql = N'

;WITH base(BASEID,ColName,TEXT)

AS

(

SELECT s1.ID, Details.ColName, Details.Text

FROM #school s1

OUTER APPLY (

SELECT ID,ColumnName,[Text]

FROM #school s2

WHERE s2.ID>s1.ID

AND s2.ID < ISNULL(

(

SELECT TOP 1 ID

FROM #school s3

WHERE s3.ID>s1.ID AND s3.ColumnName = ''DestinationTableName''

ORDER BY s3.ID

),

(

SELECT MAX(ID) FROM #school

)

) AS Details(ID,ColName,TEXT)

WHERE s1.ColumnName = ''DestinationTableName''

)

SELECT BASEID,' + @sql + '

FROM base

PIVOT (MAX([Text]) FOR ColName IN ('+@sql +')) AS PVT

ORDER BY BASEID'

EXEC sp_executesql @sql

IF NOT (OBJECT_ID('tempdb..#school') IS NULL)

DROP TABLE #school

I make no claims for performance as I have not tried to optimise it, but it demonstrates a technique that could be used...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);