Crosstab or Pivot variant - stuck!

Question

Post reply

Crosstab or Pivot variant - stuck!

ProKelly

SSCommitted

Points: 1867
More actions
May 31, 2012 at 8:36 am

#392395

Ok, here goes.
I have a table, which sits within my CRM and stores "questionnaire" type information, in a very cumbersome way.
The table has 4 pertinent columns. CaseID, FormName, Name, Value.
CaseID is the PK which links back to the case. Formname is the name of the questionnaire, name provides the 'question' and value provides the answer.
Lets say a form has 10 questions, that means there are going to be 10 rows of information, all with the same caseid, and with the name and value combinations being different for each row. For example...
CaseID, Formname, Name, Value
1, Highways, roadname, darlington road
1, Highways, faulttype, Drain
1, Highways, addinfo, Cover stuck up cannot be moved
2, Police Report, roadname, Middleham Road
2, Polive Report, location, Nr Barton
I want to change the format of the information so that the names, or questions, are columns and the answers are the values. For example to look like this
CaseId, Formname, Roadname, faulttype, addinfo, location
1, Highways, Darlington Road, Drain, Coverstuck up cannot be moved, null, null
2, Police Report, Middleham Road, null, null, Nr Barton
I cannot seem to do this with pivot, as it want's to aggregate and the responses are text fields. Any clues?

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

SQL Padawan Say Hey Kid Points: 712 More actions · Answer 1

I was bored but this should work:

IF OBJECT_ID('tempdb..#basetable') IS NOT NULL

DROP TABLE #basetable

CREATE TABLE #basetable

(

caseid INT,

formname VARCHAR(255),

name VARCHAR(255),

value VARCHAR(255)

)

DECLARE @sql VARCHAR(MAX)

DECLARE @cols VARCHAR(MAX)

DECLARE @joins VARCHAR(MAX)

INSERT #basetable(caseid,formname,name,value)

VALUES

(1, 'Highways', 'roadname', 'darlington road'),

(1, 'Highways', 'faulttype', 'Drain'),

(1, 'Highways', 'addinfo', 'Cover stuck up cannot be moved'),

(2, 'Police Report', 'roadname', 'Middleham Road'),

(2, 'Police Report', 'location', 'Nr Barton')

SET @cols =

(SELECT ','+ '[' + name + '] = t' + CAST(ROW_NUMBER() OVER (ORDER BY name) AS VARCHAR(3)) +'.value'

FROM

(SELECT DISTINCT name FROM #basetable) x

FOR XML PATH(''))

SET @joins =

(SELECT 'LEFT JOIN #basetable t' + CAST(ROW_NUMBER() OVER (ORDER BY name) AS VARCHAR(3)) + ' ON t.caseid = '

+ 't' + CAST(ROW_NUMBER() OVER (ORDER BY name) AS VARCHAR(3)) + '.caseid AND t.formname = ' +

't' + CAST(ROW_NUMBER() OVER (ORDER BY name) AS VARCHAR(3)) + '.formname AND t' + CAST(ROW_NUMBER() OVER (ORDER BY name) AS VARCHAR(3))

+ '.name = ''' + name + ''''+ CHAR(10)

FROM

(SELECT DISTINCT name FROM #basetable) x

FOR XML PATH(''))

SET @sql = 'SELECT DISTINCT t.caseid,t.formname'+@cols+' '+ CHAR(10) + 'FROM #basetable t' + CHAR(10) + @joins

EXEC(@sql)

Greg Snidow SSCoach Points: 16106 More actions · Answer 2

That looks like it is just what was ordered. If it were me I would much rather have it the way it is to start with. Can you imagine what that would look like with hundreds of reports, each having different column names?

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.

Dwain Camps SSC Guru Points: 86913 More actions · Answer 3

I do deplore dynamic SQL unless you really need it. Try this:

DECLARE @basetable TABLE

(caseid INT, formname VARCHAR(255), name VARCHAR(255), value VARCHAR(255))

INSERT @basetable(caseid,formname,name,value)

VALUES

(1, 'Highways', 'roadname', 'darlington road'),

(1, 'Highways', 'faulttype', 'Drain'),

(1, 'Highways', 'addinfo', 'Cover stuck up cannot be moved'),

(2, 'Police Report', 'roadname', 'Middleham Road'),

(2, 'Police Report', 'location', 'Nr Barton')

SELECT caseid, formname=MAX(formname)

,RoadName=MAX(CASE WHEN name = 'roadname' THEN value ELSE NULL END)

,faulttype=MAX(CASE WHEN name = 'faulttype' THEN value ELSE NULL END)

,addinfo=MAX(CASE WHEN name = 'addinfo' THEN value ELSE NULL END)

,location=MAX(CASE WHEN name = 'location' THEN value ELSE NULL END)

FROM @basetable

GROUP BY CaseID

You just need to add a CASE for each name attribute that you're interested in reporting on in a column (others will be ignored).

Check for an article by Jeff Moden on dynamic cross tabs.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

R.P.Rozema SSChampion Points: 12305 More actions · Answer 4

Almost. But the written request does not match the requested output (this is exactly why we always ask for example output) : the group by should be on caseid AND formname.

Using the DDL and example code that ProKelly forgot but luckily SQL Padawan stepped in:

print '---- cross tab solution -----'

set statistics io on;

select forms.CaseId,

forms.Formname,

max(case forms.name when 'roadname' then forms.value end) as Roadname,

max(case forms.name when 'faulttype' then forms.value end) as faulttype,

max(case forms.name when 'addinfo' then forms.value end) as addinfo,

max(case forms.name when 'location' then forms.value end) as location

from #basetable forms

group by forms.caseid,

forms.formname

set statistics io off;

print '---- left join solution -----'

set statistics io on;

SELECT DISTINCT t.caseid,t.formname,[addinfo] = t1.value,[faulttype] = t2.value,[location] = t3.value,[roadname] = t4.value FROM #basetable t LEFT JOIN #basetable t1 ON t.caseid = t1.caseid AND t.formname = t1.formname AND t1.name = 'addinfo' LEFT JOIN #basetable t2 ON t.caseid = t2.caseid AND t.formname = t2.formname AND t2.name = 'faulttype' LEFT JOIN #basetable t3 ON t.caseid = t3.caseid AND t.formname = t3.formname AND t3.name = 'location' LEFT JOIN #basetable t4 ON t.caseid = t4.caseid AND t.formname = t4.formname AND t4.name = 'roadname'

set statistics io off;

As you can see I've compared the both methods presented so far on their IO behavior: the left join method needs 5 scans at the cost of 21 logical reads, whereas the cross tab method only needs 1 scan and 1 logical read. Imagine what happens if the table contains not 5 rows, but several hundreds of thousands...

edit: order of the columns was not the same as requested.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 5

ProKelly (5/31/2012)
I cannot seem to do this with pivot, as it want's to aggregate and the responses are text fields. Any clues?

What's wrong with using MAX as the aggregate to pivot the Text?

The real key here is do you want this code to dynamically handle any "Name" value for any given unique values of CaseID, Formname, and Name? To wit, is there some sort of guarantee that the combination of CaseID, Formname, and Name will be unique in this table?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

ProKelly SSCommitted Points: 1867 More actions · Answer 6

Jeff Moden (6/3/2012)
ProKelly (5/31/2012)
I cannot seem to do this with pivot, as it want's to aggregate and the responses are text fields. Any clues?
What's wrong with using MAX as the aggregate to pivot the Text?
The real key here is do you want this code to dynamically handle any "Name" value for any given unique values of CaseID, Formname, and Name? To wit, is there some sort of guarantee that the combination of CaseID, Formname, and Name will be unique in this table?

Aha! I haven't tried aggregating text, with the presumption you can't - but I could try that?

CaseID+Formname themselves are unique. The reason for this transformation is that so I can dimensionalise this ugly table within a reporting cube in analysis services. To do this, I need the full list of names (questions) by CaseID and Formname, which will link back to the fact table.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

ProKelly (6/3/2012)
Jeff Moden (6/3/2012)
ProKelly (5/31/2012)
I cannot seem to do this with pivot, as it want's to aggregate and the responses are text fields. Any clues?
What's wrong with using MAX as the aggregate to pivot the Text?
The real key here is do you want this code to dynamically handle any "Name" value for any given unique values of CaseID, Formname, and Name? To wit, is there some sort of guarantee that the combination of CaseID, Formname, and Name will be unique in this table?
Aha! I haven't tried aggregating text, with the presumption you can't - but I could try that?
CaseID+Formname themselves are unique. The reason for this transformation is that so I can dimensionalise this ugly table within a reporting cube in analysis services. To do this, I need the full list of names (questions) by CaseID and Formname, which will link back to the fact table.

If he has all of the column names you want, R.P.Rozema's corrected version of what Dwaine wrote should do nicely for you, then. Notice that it does use "MAX" for the aggregate. That's why I needed to know if the CaseID, FormName, and Name columns formed something unique. If the Name within a give instance of CaseID and FormName isn't unique, you miss some data. Please confirm that Name is unique within any give combination of CaseID and FormName.

R.P.Rozema is also correct about using multiple joins to do this. That would be slower and more resource intensive. The idea of using Dynamic SQL, however, is not a bad idea if you want this code to be "self healing" if someone adds a new "Name" to the mix. I was intending to write some dynamic SQL to do that in an expeditious fashion but I need to know what I asked about uniqueness, first.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

SQL Padawan Say Hey Kid Points: 712 More actions · Answer 8

Great point guys. Here is a better solution with dynamic sql.

IF OBJECT_ID('tempdb..#basetable') IS NOT NULL

DROP TABLE #basetable

CREATE TABLE #basetable

(

caseid INT,

formname VARCHAR(255),

name VARCHAR(255),

value VARCHAR(255)

)

DECLARE @sql VARCHAR(MAX)

DECLARE @cols VARCHAR(MAX)

DECLARE @joins VARCHAR(MAX)

INSERT #basetable(caseid,formname,name,value)

VALUES

(1, 'Highways', 'roadname', 'darlington road'),

(1, 'Highways', 'faulttype', 'Drain'),

(1, 'Highways', 'addinfo', 'Cover stuck up cannot be moved'),

(2, 'Police Report', 'roadname', 'Middleham Road'),

(2, 'Police Report', 'location', 'Nr Barton')

SET @cols =

STUFF(

(

SELECT ',' + QUOTENAME(name) + '= MAX(CASE WHEN name = '''+name+''' THEN name ELSE NULL END)' + CHAR(10)

FROM (SELECT DISTINCT id = RANK() OVER(ORDER BY name),name FROM #basetable) x

ORDER BY id FOR XML PATH('')

)

,1,1,'')

SET @sql =

'

SELECT

caseid,

formname,' + CHAR(10) + @cols + 'FROM #basetable GROUP BY caseid,formname'

EXEC (@sql)

ProKelly SSCommitted Points: 1867 More actions · Answer 9

CaseID + Formname + Name is indeed always unique. Thanks so much for the help here people, even though (after reading Jeff's ettiquette article) my question was poorly presented. Will try harder next time.

Thanks again, and thanks SQL Padawan for the corrected SQL