Split comma delimited field?

Question

Split comma delimited field?

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

sgmunson - Thursday, October 25, 2018 12:45 PM
Lynn Pettis - Wednesday, October 24, 2018 3:23 PM
Still needs to be dynamic unless you plan to edit it every time you want to run it since the number of values in the comma delimited column is unknown.
As is, that query just delivers NULL values for columns where the number of values in the existing data is less than 13, so no need for dynamic, as there's no variation in the number of columns actually delivered, and chances are, there shouldn't be. What tool could effectively consume a dataset that varies in the actual number of columns for a given row? SQL Server couldn't even deliver such a row-set. Rows are fixed things. That's not avoidable until you start delivering 1 row at a time, and slow yourself down to a crawl with overhead. Who needs more RBAR?

Dynamic SQL doesn't make this RBAR. And, unless I miss my guess, Lynn isn't suggesting that each row be varied by number of columns. Instead, he's suggested that the number of columns for the run be automatically varied so that you're 1) Not returning unnecessary columns that will never be filled if there are less than 13 columns to pivot and 2) make it so that you don't have to make modifications to the code when there are more than 13 columns to be pivoted.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

ScottPletcher - Thursday, October 25, 2018 9:49 AM
Jeff Moden - Wednesday, October 24, 2018 6:07 PM
Jonathan AC Roberts - Wednesday, October 24, 2018 3:51 PM
Lynn Pettis - Wednesday, October 24, 2018 3:23 PM
Still needs to be dynamic unless you plan to edit it every time you want to run it since the number of values in the comma delimited column is unknown.
Wouldn't Scott's solution just put nulls in columns higher than the number of values in the csv? The OP said the maximum number of columns is 13.
That would be true... until the OP runs into something with more than 13 columns. If Scott were to convert his great CROSS TAB solution to dynamic SQL, it would become "self healing".
And far more complex. If this is for data periods in a year, for example, it's unlikely to ever change. A year has 12 months or 13 4-week periods per year, and that's it, ever. If it's something truly fixed like that, there's no need to add complexity.
That is, I don't think it's necessarily wrong to keep the code straight-forward if that's all that is reasonably likely to be needed.

While I agree that hard-coded functionality is frequently faster and easier to modify, the dynamic version of the code is not "far more" complex and certainly not difficult to modify (which you might never need to do if done correctly the first time). It's simple code that is written once and used for virtually any number of columns and will easily handle changing requirements. For example, if you look at your 12 months or 13 4-week periods per year requirements, you have two different requirements. Ok, sure... it's not difficult for anyone to simply ignore the 13th column for 12 week applications (although the CEO and other mangers simply might not want such a thing or some application may not handle that) but what happens when the requirement (especially for one offs), "OK, we need to see 36 months until further notice". And, no... it's not going to hurt performance either if done correctly.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jason A. Long SSC-Insane Points: 23731 More actions · Answer 3

Jeff Moden - Thursday, October 25, 2018 3:06 PM
ScottPletcher - Thursday, October 25, 2018 9:49 AM
Jeff Moden - Wednesday, October 24, 2018 6:07 PM
Jonathan AC Roberts - Wednesday, October 24, 2018 3:51 PM
Lynn Pettis - Wednesday, October 24, 2018 3:23 PM
Still needs to be dynamic unless you plan to edit it every time you want to run it since the number of values in the comma delimited column is unknown.
Wouldn't Scott's solution just put nulls in columns higher than the number of values in the csv? The OP said the maximum number of columns is 13.
That would be true... until the OP runs into something with more than 13 columns. If Scott were to convert his great CROSS TAB solution to dynamic SQL, it would become "self healing".
And far more complex. If this is for data periods in a year, for example, it's unlikely to ever change. A year has 12 months or 13 4-week periods per year, and that's it, ever. If it's something truly fixed like that, there's no need to add complexity.
That is, I don't think it's necessarily wrong to keep the code straight-forward if that's all that is reasonably likely to be needed.
While I agree that hard-coded functionality is frequently faster and easier to modify, the dynamic version of the code is not "far more" complex and certainly not difficult to modify (which you might never need to do if done correctly the first time). It's simple code that is written once and used for virtually any number of columns and will easily handle changing requirements. For example, if you look at your 12 months or 13 4-week periods per year requirements, you have two different requirements. Ok, sure... it's not difficult for anyone to simply ignore the 13th column for 12 week applications (although the CEO and other mangers simply might not want such a thing or some application may not handle that) but what happens when the requirement (especially for one offs), "OK, we need to see 36 months until further notice". And, no... it's not going to hurt performance either if done correctly.

One potential downside to using dynamic sql could be the front end display environment. SSRS, for example, needs to be defined with a specific set of predefined columns and will choke & puke on a dynamic cross tab.

ScottPletcher SSC Guru Points: 100262 More actions · Answer 4

Jeff Moden - Thursday, October 25, 2018 3:06 PM
ScottPletcher - Thursday, October 25, 2018 9:49 AM
Jeff Moden - Wednesday, October 24, 2018 6:07 PM
Jonathan AC Roberts - Wednesday, October 24, 2018 3:51 PM
Lynn Pettis - Wednesday, October 24, 2018 3:23 PM
Still needs to be dynamic unless you plan to edit it every time you want to run it since the number of values in the comma delimited column is unknown.
Wouldn't Scott's solution just put nulls in columns higher than the number of values in the csv? The OP said the maximum number of columns is 13.
That would be true... until the OP runs into something with more than 13 columns. If Scott were to convert his great CROSS TAB solution to dynamic SQL, it would become "self healing".
And far more complex. If this is for data periods in a year, for example, it's unlikely to ever change. A year has 12 months or 13 4-week periods per year, and that's it, ever. If it's something truly fixed like that, there's no need to add complexity.
That is, I don't think it's necessarily wrong to keep the code straight-forward if that's all that is reasonably likely to be needed.
While I agree that hard-coded functionality is frequently faster and easier to modify, the dynamic version of the code is not "far more" complex and certainly not difficult to modify (which you might never need to do if done correctly the first time). It's simple code that is written once and used for virtually any number of columns and will easily handle changing requirements. For example, if you look at your 12 months or 13 4-week periods per year requirements, you have two different requirements. Ok, sure... it's not difficult for anyone to simply ignore the 13th column for 12 week applications (although the CEO and other mangers simply might not want such a thing or some application may not handle that) but what happens when the requirement (especially for one offs), "OK, we need to see 36 months until further notice". And, no... it's not going to hurt performance either if done correctly.

Have to disagree. Any dynamically constructed code will be "far more" complex than a very straight-forward cross tab. If dynamic isn't needed, why use it?

If it came to it, if you're that hyper about extra columns, you could use an intermediate table and drop (or rename) any columns that didn't populate at all.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 5

ScottPletcher - Thursday, October 25, 2018 4:42 PM
Jeff Moden - Thursday, October 25, 2018 3:06 PM
ScottPletcher - Thursday, October 25, 2018 9:49 AM
Jeff Moden - Wednesday, October 24, 2018 6:07 PM
Jonathan AC Roberts - Wednesday, October 24, 2018 3:51 PM
Lynn Pettis - Wednesday, October 24, 2018 3:23 PM
Still needs to be dynamic unless you plan to edit it every time you want to run it since the number of values in the comma delimited column is unknown.
Wouldn't Scott's solution just put nulls in columns higher than the number of values in the csv? The OP said the maximum number of columns is 13.
That would be true... until the OP runs into something with more than 13 columns. If Scott were to convert his great CROSS TAB solution to dynamic SQL, it would become "self healing".
And far more complex. If this is for data periods in a year, for example, it's unlikely to ever change. A year has 12 months or 13 4-week periods per year, and that's it, ever. If it's something truly fixed like that, there's no need to add complexity.
That is, I don't think it's necessarily wrong to keep the code straight-forward if that's all that is reasonably likely to be needed.
While I agree that hard-coded functionality is frequently faster and easier to modify, the dynamic version of the code is not "far more" complex and certainly not difficult to modify (which you might never need to do if done correctly the first time). It's simple code that is written once and used for virtually any number of columns and will easily handle changing requirements. For example, if you look at your 12 months or 13 4-week periods per year requirements, you have two different requirements. Ok, sure... it's not difficult for anyone to simply ignore the 13th column for 12 week applications (although the CEO and other mangers simply might not want such a thing or some application may not handle that) but what happens when the requirement (especially for one offs), "OK, we need to see 36 months until further notice". And, no... it's not going to hurt performance either if done correctly.
Have to disagree. Any dynamically constructed code will be "far more" complex than a very straight-forward cross tab. If dynamic isn't needed, why use it?
If it came to it, if you're that hyper about extra columns, you could use an intermediate table and drop (or rename) any columns that didn't populate at all.

You are allowed to disagree with us just as we are allowed to disagree with you. You aren't the end all and with all things SQL it all comes down to "It depends."

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 6

ScottPletcher - Thursday, October 25, 2018 4:42 PM
Jeff Moden - Thursday, October 25, 2018 3:06 PM
ScottPletcher - Thursday, October 25, 2018 9:49 AM
Jeff Moden - Wednesday, October 24, 2018 6:07 PM
Jonathan AC Roberts - Wednesday, October 24, 2018 3:51 PM
Lynn Pettis - Wednesday, October 24, 2018 3:23 PM
Still needs to be dynamic unless you plan to edit it every time you want to run it since the number of values in the comma delimited column is unknown.
Wouldn't Scott's solution just put nulls in columns higher than the number of values in the csv? The OP said the maximum number of columns is 13.
That would be true... until the OP runs into something with more than 13 columns. If Scott were to convert his great CROSS TAB solution to dynamic SQL, it would become "self healing".
And far more complex. If this is for data periods in a year, for example, it's unlikely to ever change. A year has 12 months or 13 4-week periods per year, and that's it, ever. If it's something truly fixed like that, there's no need to add complexity.
That is, I don't think it's necessarily wrong to keep the code straight-forward if that's all that is reasonably likely to be needed.
While I agree that hard-coded functionality is frequently faster and easier to modify, the dynamic version of the code is not "far more" complex and certainly not difficult to modify (which you might never need to do if done correctly the first time). It's simple code that is written once and used for virtually any number of columns and will easily handle changing requirements. For example, if you look at your 12 months or 13 4-week periods per year requirements, you have two different requirements. Ok, sure... it's not difficult for anyone to simply ignore the 13th column for 12 week applications (although the CEO and other mangers simply might not want such a thing or some application may not handle that) but what happens when the requirement (especially for one offs), "OK, we need to see 36 months until further notice". And, no... it's not going to hurt performance either if done correctly.
Have to disagree. Any dynamically constructed code will be "far more" complex than a very straight-forward cross tab. If dynamic isn't needed, why use it?
If it came to it, if you're that hyper about extra columns, you could use an intermediate table and drop (or rename) any columns that didn't populate at all.

Adding an "intermediate table" and then "drop (or rename) and columns that didn't populate at all" sounds a heck of a lot more complex than just having a little dynamic code build it right the first time. It's kind of like avoiding the use of SELECT * when you don't need all the columns..

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions