Multi Column Sort

  • I think that sample data is adjusted to OP needs.

    OP fourth post (first on second page) still doesn't answer the question of "3, NULL, 9" sample data. OP just rephrase himself without giving a definitive answer.

    This can also be interpreted as not all records should be returned. Just return all records that satisfies the rules and omit the ones that doesn't.

    Until OP leaves a definitive answer to "3, NULL, 9" sample data enigma, we will not know.


    N 56°04'39.16"
    E 12°55'05.25"

  • Since this seems to be a classic example of (whose signature line has this?) "You need an answer? NO. You need a question.", I'd like to offer a hypothetical case as a guess at the underlying requirements. I, like some people, can attack a problem more easily with a concrete model in mind than if it's presented in the abstract. I know you-all will let me know if I've mis-interpreted this.

    Let's say that the columns each represent a real-time reading of cumulative rainfall from various measuring stations and that those individual metrics on any one row were gathered at the same time. Because those stations are in remote areas, sometimes they cannot send a reading. When that happens, there's a NULL for that station's column, but any subsequent reading that's recorded for that station will have a higher value than any previous value in that column.

    Now, let's say that some time into the gathering of data, it's realized that the table was not provided with a timestamp column or identity column. The ID column is, for whatever reason, not to be trusted for ordering. Perhaps the designer assumed that since there'd be a row every day, it was known when the first row was recorded, and since the data for each column is always ascending, there'd be no need. Or maybe s/he assumed that data may be retrieved in the order inserted, a typical beginner's error with SQL. (Not necessarily what really happened with the OP's situation, but it's my hypothetical to make the nature of the data understandable).

    So, now we need to report the data in sequence, and since there are NULLs in some rows for col1, we cannot simply order by Col1. Or any other column as the data stands.

    ****EDIT****

    RBarry points out (in very next post after this) that in the original problem, the values in any one row are specfied as always being in ascending sequence. I would think that this is a peripheral characteristic of the data as given and may or may not affect the solution, depending on whether you want to allow for indeterminate order caused by impossible rows.

  • First, I think that it is important to understand that this is actually a SQL puzzle or theoretical problem. It is not a real-world problem. As such, the rules of math & comp-sci puzzles apply. In particular is "assume the problem is complete/answerable as stated."

    Here is what the OP said:

    The values always increment across the row such that, when the rows are in order, the values in all columns also increment top to bottom. There are random cells that do not have a value (zero in this case) and are not considered in the sort.

    Poorly stated though it is, there is still only one logically consistent way to interpret this statement, which the following rules:

    1) When the rows are "in order", values will monotonically increase left to right and top to bottom.

    2) This places implicit restrictions on the values of columns within a row: they must always be increasing left to right.

    3) This also place restrictions on the make-up of the datasets: It must be possible to order them this way. If not, then the meaning of "in order" is indeterminate

    Note that there are no provisions in any of this that would allow for dropping or excluding any rows. Under the rules of puzzles & theoretical problems we must then conclude that this is not allowed.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I agree with most points, but I strongly disagree with creating sample data that fulfills OP's wanted output. I don't believe in some divine intervention that sample data is perfectly built the way OP want it sorted.


    N 56°04'39.16"
    E 12°55'05.25"

  • RBarryYoung (4/8/2009)


    . . . .

    1) When the rows are "in order", values will monotonically increase left to right and top to bottom.

    2) This places implicit restrictions on the values of columns within a row: they must always be increasing left to right.

    3) This also place restrictions on the make-up of the datasets: It must be possible to order them this way. If not, then the meaning of "in order" is indeterminate

    Note that there are no provisions in any of this that would allow for dropping or excluding any rows. Under the rules of puzzles & theoretical problems we must then conclude that this is not allowed.

    I would think that this is a peripheral characteristic of the data as given and may or may not affect the solution, depending on whether you want to allow for indeterminate order caused by impossible rows. It is a valid point to consider, and I will edit my "hypothetical" post to note this aspect of the data.

  • john.arnott (4/8/2009)


    RBarryYoung (4/8/2009)


    . . . .

    1) When the rows are "in order", values will monotonically increase left to right and top to bottom.

    2) This places implicit restrictions on the values of columns within a row: they must always be increasing left to right.

    3) This also place restrictions on the make-up of the datasets: It must be possible to order them this way. If not, then the meaning of "in order" is indeterminate

    Note that there are no provisions in any of this that would allow for dropping or excluding any rows. Under the rules of puzzles & theoretical problems we must then conclude that this is not allowed.

    I would think that this is a peripheral characteristic of the data as given and may or may not affect the solution, depending on whether you want to allow for indeterminate order caused by impossible rows. It is a valid point to consider, and I will edit my "hypothetical" post to note this aspect of the data.

    Whether you allow for it or not, the correct order in any dataset that does not conform to these rules is "undefined" and thus meaningless. Tests are only valid on the datasets that conform to these rules, so there is no real point in testing any other datasets.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Peso (4/8/2009)


    I agree with most points, but I strongly disagree with creating sample data that fulfills OP's wanted output. I don't believe in some divine intervention that sample data is perfectly built the way OP want it sorted.

    My generator generates the data according to the specified rules and is always unambigously, deterministically sortable. No divine intervention necessary. If you find any flaw in it please let me know.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/8/2009)


    Peso (4/8/2009)


    I agree with most points, but I strongly disagree with creating sample data that fulfills OP's wanted output. I don't believe in some divine intervention that sample data is perfectly built the way OP want it sorted.

    My generator generates the data according to the specified rules and is always unambigously, deterministically sortable. No divine intervention necessary. If you find any flaw in it please let me know.

    The relative position of nulls and data isn't entirely random 😛

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • OK. If the puzzle's rules specify that the given data always has increasing values in the respective columns of each row, so be it. Does that affect one's solution? If you put the rows in such an order that every column is in order, disregarding nulls, does the left-to-right order (fixed in each row and not altered by the sort) matter to how you achieve the ordering?

    With data that fits the rules, an ordered set might be:

    col1 col2 col3 col4

    1 2 _ 5

    3 6 7 _

    _ 7 8 9

    4 _ 9 _

    Would the solution algorithm be different with "illegal" data that could still be in order for every column?

    col1 col2 col3 col4

    _ 3 _ 2

    3 5 _ 3

    _ 6 2 _

    4 _ 3 6

  • john.arnott (4/8/2009)


    OK. If the puzzle's rules specify that the given data always has increasing values in the respective columns of each row, so be it. Does that affect one's solution? If you put the rows in such an order that every column is in order, disregarding nulls, does the left-to-right order (fixed in each row and not altered by the sort) matter to how you achieve the ordering?

    With data that fits the rules, an ordered set might be:

    col1 col2 col3 col4

    1 2 _ 5

    3 6 7 _

    _ 7 8 9

    4 _ 9 _

    Would the solution algorithm be different with "illegal" data that could still be in order for every column?

    col1 col2 col3 col4

    _ 3 _ 2

    3 5 _ 3

    _ 6 2 _

    4 _ 3 6

    As far as I can tell, the left-to-right thing is just a distraction.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Am I correct in saying the values need to stay in the same rows?

    or could I sort this sample as follows:

    Sample

    Col1 Col2 Col3

    9 1 6

    null 7 2

    8 null 4

    result 1

    Col1 Col2 Col3

    1 2 4

    6 7 8

    9 null null

    Or does it have to be like the this?

    Result 2

    Col1 Col2 Col3

    1 6 9

    2 7 null

    4 8 null

    This sounds like an awesome puzzle

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • john.arnott (4/8/2009)


    OK. If the puzzle's rules specify that the given data always has increasing values in the respective columns of each row, so be it. Does that affect one's solution? If you put the rows in such an order that every column is in order, disregarding nulls, does the left-to-right order (fixed in each row and not altered by the sort) matter to how you achieve the ordering?

    With data that fits the rules, an ordered set might be:

    col1 col2 col3 col4

    1 2 _ 5

    3 6 7 _

    _ 7 8 9

    4 _ 9 _

    Would the solution algorithm be different with "illegal" data that could still be in order for every column?

    col1 col2 col3 col4

    _ 3 _ 2

    3 5 _ 3

    _ 6 2 _

    4 _ 3 6

    Hi John

    With no solution yet for Barry's "killer data" I reckon we need every clue we can get 🙂

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Christopher Stobbs (4/9/2009)


    Am I correct in saying the values need to stay in the same rows?

    or could I sort this sample as follows:

    Sample

    Col1 Col2 Col3

    9 1 6

    null 7 2

    8 null 4

    result 1

    Col1 Col2 Col3

    1 2 4

    6 7 8

    9 null null

    Or does it have to be like the this?

    Result 2

    Col1 Col2 Col3

    1 6 9

    2 7 null

    4 8 null

    This sounds like an awesome puzzle

    By my understanding of the conditions of the puzzle, your sample data is illegal. Each row is to remain intact and the result has to find every column in ascending sequence after ignoring nulls. With your sample data, it's impossible to have both col2 and col3 in order at the same time.

  • Well, I have thought out a proper use-case.

    Think of the eight columns as a freeway with 8 lanes.

    All 8 lanes are always open.

    At the "end" of the freeway (starting from top to bottom), there comes a toll-booth. This is marked by the columns having a value.

    You want to use the tool-booth where the cost is the lowest for passing by.

    So you need to find the next record to continue the travel.

    If you start with CorrectSequence 1, you see that toll-booh 1 and 2 are open. So you find next record which has at least one of toll-both 1 and 2 open. There can be other open too, but to pass toll-booth 1 or toll-booth 2 has to be used, and för any matching record you choose the one with the smallest difference in fees. Difference is between records.

    So you find CorrectSequence 2 (by passing toll-booth 2). All traffic passes that toll-booth and spread over the 8 lanes again. When this section/record has ended (CorrectSequence 2) you find that now is toll-booths 2 and 3 are open. So you need to find another record which has at least toll-booth 2 or 3 open.

    So you find CorrectSequence 3 because it has the smallest difference of fees. Traffic passes and spreads across all 8 lanes.

    Then you find CorrectSequence 4 because that record has toll-booth 1 and 6 open. Traffic passes and then you find CorrectSequence 5 which has toll-booth 1 open and this toll-booth has the smalles difference of fees.

    Does this explanation make anything clearer?


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso,

    I'm not sure that I'm following your toll-booth analogy properly. I don't see how it fits with the data you posted as coming from the OP:

    SELECT 1, 2.35, 3.01, 3.49, 4.25, 4.79, 5.36, 5.82, 6.31

    UNION SELECT 2, null, 3.59, 4.32, null, 5.21, null, null, 7.12

    UNION SELECT 3, NULL, null, null, 5.23, 5.41, 6.33, 6.89, 7.99

    UNION SELECT 4, null, 5.46, 5.31, 5.92, null, 6.87, 7.34, null

    UNION SELECT 5, 5.12, 5.66, 6.31, 6.92, null, 7.87, 8.34, null

    By saying "toll booths 1 and 2 are open", do you mean you have non-null data in them? So then what? We find the row with the next higher value for col1 or col2? So far, so good -- we have found row 2. But then what happens... we see that in row 2 we have values in col2 and col3. If we just look for the next higher value in those two columns, we'll get row 4 and miss row 3.

Viewing 15 posts - 136 through 150 (of 180 total)

You must be logged in to reply to this topic. Login to reply