Efficient solution using TSQL

  • I have a situation where a table carries data like detailed below

    Region name sequence description

    1 apple 1 It is round

    1 apple 2 It is sweet

    1 apple 3 it is colored

    1 orange 1 It is yellow

    1 orange 2 It is sour

    1 lemon 1 It is green

    1 nut 1 hard to crack

    2 apple 1 It is round

    2 apple 2 It is sweet

    2 apple 3 it is colored

    2 orange 1 It is yellow

    2 orange 2 It is sweet and sour

    2 lemon 1 It is green

    2 pear 1 it is a fruit

    3 apple 1 It is round

    3 apple 2 It is sweet

    3 apple 3 it is colored

    3 orange 1 It is yellow

    3 orange 2 It is sweet and sour

    3 lemon 1 It is yelow

    3 berry 1 It is small

    3 nut 1 hard to crack

    3 pear 1 it is vegetable

    So there are 1 to many rows of definition for fruits in multiple regions. There can be more than 3 regions - above is just a sample.

    I need to identify those fruits which are defined the same in all regions those are defined in (if not definied in a region we do not care) or are defined in just one region. If a fruit is defined in more than one regions than I need to select only if it is defined the same in all.

    In the example above I need to select

    Apple rows because it is same in all regions

    1 apple 1 It is round

    1 apple 2 It is sweet

    1 apple 3 it is colored

    NUT rows because it is same in all or is blank in region 2

    1 nut 1 hard to crack

    BERRY row becaue it is just there in one and blank in other

    3 berry 1 It is small

    Do not need orange or Lemon because these have different definitions

    Do not need pear because it has diff definitions on the two regions it is present.

    Do not want to do it using cursors - just a simple set operation solution is what I am looking for.

    Thanks in advance.

  • Main:

    I am a bit shaky on this, but this is what I came up with for a first pass:

    declare @fruit table

    ( region integer,

    name varchar(9),

    sequence integer,

    description varchar(25)

    )

    insert into @fruit

    select 1, 'apple', 1, 'It is round' union all

    select 1, 'apple', 2, 'It is sweet' union all

    select 1, 'apple', 3, 'it is colored' union all

    select 1, 'orange', 1, 'It is yellow' union all

    select 1, 'orange', 2, 'It is sour' union all

    select 1, 'lemon', 1, 'It is green' union all

    select 1, 'nut', 1, 'hard to crack' union all

    select 1, 'lime', 1, 'It is green' union all

    select 1, 'lime', 2, 'It is sour' union all

    select 2, 'apple', 1, 'It is round' union all

    select 2, 'apple', 2, 'It is sweet' union all

    select 2, 'apple', 3, 'it is colored' union all

    select 2, 'orange', 1, 'It is yellow' union all

    select 2, 'orange', 2, 'It is sweet and sour' union all

    select 2, 'lemon', 1, 'It is green' union all

    select 2, 'pear', 1, 'it is a fruit' union all

    select 2, 'lime', 1, 'It is green' union all

    select 3, 'apple', 1, 'It is round' union all

    select 3, 'apple', 2, 'It is sweet' union all

    select 3, 'apple', 3, 'it is colored' union all

    select 3, 'orange', 1, 'It is yellow' union all

    select 3, 'orange', 2, 'It is sweet and sour' union all

    select 3, 'lemon', 1, 'It is yelow' union all

    select 3, 'berry', 1, 'It is small' union all

    select 3, 'nut', 1, 'hard to crack' union all

    select 3, 'pear', 1, 'it is vegetable'

    ;with descriptionException as

    ( select

    name,

    sequence,

    min(description) as min_description,

    max(description) as max_description

    from @fruit b

    group by name, sequence

    having min(description) <> max(description)

    )

    select distinct

    a.name

    from @fruit a

    where not exists

    ( select 0 from descriptionException b

    where a.name = b.name

    )

    /* ------- Sample Output: --------

    name

    ---------

    apple

    berry

    lime

    nut

    */

    Note that I added in the lime case because I was not sure what to do with a case like this. The LIME case has the same SEQ=1 entry in both regions 1 and 2; however, there is a SEQ=2 entry in region 1 that is missing in region 2. This code needs some adjustment if the LIME case should not be included.

  • Thanks for looking into this odd request.

    Lime is also not acceptable.

    Actually the definitions are later to be concatenated - so each region must have same number of lines for the definition.

    BTW - I had come up w a solution also - which again is not handling the LIME case.

    drop table #fruit

    select distinct name, sequence, description into #fruit from fruit

    --These are to be excluded the remaining I can keep

    select name, sequence, count(*)

    from #fruit

    group by name, sequence

    having count(*) > 1

    This also does not exclude LIME.

  • I'm not sure I fully understand what you are after here, but here's a go using Kent's sample data:

    SELECT f1.*

    FROM @Fruit f1

    LEFT JOIN (

    SELECT f1.Region, f1.Name, f1.Sequence

    FROM @Fruit f1

    INNER JOIN @Fruit f2

    ON f1.Name = f2.Name AND f1.Sequence = f2.Sequence

    WHERE f1.Description <> f2.Description

    ) f2

    ON f1.Region = f2.Region AND f1.Name = f2.Name AND f1.Sequence = f2.Sequence

    WHERE f2.Name IS NULL

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Try thisSELECTf1.Name

    FROM@Fruit AS f1

    LEFT JOIN@Fruit AS f2 ON f2.Name = f1.Name

    AND f2.Sequence = f1.Sequence

    AND f2.Description = f1.Description

    AND f2.Region <> f1.Region

    GROUP BYf1.Name

    HAVINGSUM(CASE WHEN f2.Name IS NULL THEN 1 ELSE 0 END) IN (0, COUNT(*))

    ORDER BYf1.Name


    N 56°04'39.16"
    E 12°55'05.25"

  • This version avoids the self join but I am not sure if it is as good:

    select

    name

    from

    ( select

    name,

    sequence,

    count(*) as sequence_count,

    case when max(description) = min(description)

    then 0 else 1 end as same_description

    from @fruit

    group by

    name,

    sequence

    ) x

    group by name

    having max(same_description) = 0

    and max(sequence_count) = min(sequence_count)

  • This one from Kent is cool - thanks a lot.

  • If I change this line

    select 3, 'nut', 1, 'hard to crack' union all

    to

    select 3, 'nut', 2, 'hard to crack' union all

    it fails too...


    N 56°04'39.16"
    E 12°55'05.25"

  • Try this

    SELECTd.Name

    FROM(

    SELECTf1.Name,

    MIN(f1.Region) AS minRegion,

    MAX(f1.Region) AS maxRegion,

    SUM(CASE WHEN f2.Name IS NULL THEN 1 ELSE 0 END) AS Nulls,

    COUNT(*) AS Items

    FROM@Fruit AS f1

    LEFT JOIN@Fruit AS f2 ON f2.Name = f1.Name

    AND f2.Sequence = f1.Sequence

    AND f2.Description = f1.Description

    AND f2.Region <> f1.Region

    GROUP BYf1.Name

    ) AS d

    WHEREd.Nulls = 0

    OR (d.Nulls = d.Items AND d.minRegion = d.maxRegion)

    ORDER BYd.Name


    N 56°04'39.16"
    E 12°55'05.25"

  • This might eliminate the "different nut" probably; however, as before I am still not sure that Peso's pont isn't valid and you might be forced to perform the self-join. However, here is yet another try at avoiding the self-join:

    select

    name

    from

    ( select

    name,

    sequence,

    min(region) as min_region,

    max(region) as max_region,

    count(*) as sequence_count,

    case when max(description) = min(description)

    then 0 else 1 end as same_description

    from @fruit

    group by

    name,

    sequence

    ) x

    group by name

    having max(same_description) = 0

    and max(sequence_count) = min(sequence_count)

    and max(min_region) = min(min_region)

    and max(max_region) = max(max_region)

    You might be best off just taking the self join; this code is looking a bit mangy.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply