Grouping/Partitioning data

  • I have dataset that consists of 3 fields. Package number, IC_CODE and a description

    I am trying to write a query that spilts/partitions data into groups based on the package number and the outputs data where for each package the data is the same.

    I have provided some test data

    CREATE TABLE #TEST

    (

    PACKAGE INT

    ,CODE VARCHAR(20)

    ,[DESCRIPTION] VARCHAR(20)

    )

    INSERT INTO #TEST

    SELECT 10000,'CSAC','CASING'UNION ALL

    SELECT 10000,'BOXC','BOXING'UNION ALL

    SELECT 10000,'CSAC','CASING' UNION ALL

    SELECT 12000,'BOXC','BOXING'UNION ALL

    SELECT 12000,'BOXC','BOXING'UNION ALL

    SELECT 12000,'BOXC','BOXING' UNION ALL

    SELECT 13000,'BASK','BASKET'UNION ALL

    SELECT 13000,'CSAC','CASING'UNION ALL

    SELECT 13000,'BOXC','BOXING'

    Based on th above data the output should be

    Package CODE DESCRIPTION

    ______________________________

    12000 BOXC BOXING

    12000 BOXC BOXING

    12000 BOXC BOXING

  • eseosaoregie (1/19/2012)


    Based on th above data the output should be

    Package CODE DESCRIPTION

    ______________________________

    12000 BOXC BOXING

    12000 BOXC BOXING

    12000 BOXC BOXING

    Can you doublecheck your expected results? This doesn't seem right.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This output is correct because package 120000 has only code associated with it. The other packages have 2 or more different codes. The output is required to be the packages which within them have the same codes

  • Based on the information you gave and the question you asked...

    SELECT PACKAGE,CODE,[DESCRIPTION]

    FROM #TEST

    WHERE PACKAGE = 12000 AND CODE = 'BOXC' AND [DESCRIPTION] = 'BOXING'

    So, if there is more to this, please give ALL of the expected results.

    Jared
    CE - Microsoft

  • eseosaoregie (1/19/2012)


    This output is correct because package 120000 has only code associated with it. The other packages have 2 or more different codes. The output is required to be the packages which within them have the same codes

    Ah, now I understand. Sorry, first time I looked at it didn't make sense.

    Jared, seriously? It's sample data, and that's obviously not the correct algorithm for a full data set.

    Gimme a bit on the final solution, thanks for the sample dataset. General principal: Group By on all involved columns, pull having HAVING COUNT(*) > 1, and then using that as a sub-query to re-connect to the primary data to pull back all rows having that count > 1.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • In the example I gave there are only 3 different packages. However what if there were 2000 packages. I like to have a generic query that will select all the packages with only one code associated with it.

  • Evil Kraig F (1/19/2012)


    eseosaoregie (1/19/2012)


    This output is correct because package 120000 has only code associated with it. The other packages have 2 or more different codes. The output is required to be the packages which within them have the same codes

    Ah, now I understand. Sorry, first time I looked at it didn't make sense.

    Jared, seriously? It's sample data, and that's obviously not the correct algorithm for a full data set.

    Gimme a bit on the final solution, thanks for the sample dataset. General principal: Group By on all involved columns, pull having HAVING COUNT(*) > 1, and then using that as a sub-query to re-connect to the primary data to pull back all rows having that count > 1.

    I just didn't know what was being asked... I could have been nicer about it though. 🙂

    So, are we expecting a parameter to filter the grouped results? Are there any aggregates or counts?

    Jared
    CE - Microsoft

  • eseosaoregie (1/19/2012)


    In the example I gave there are only 3 different packages. However what if there were 2000 packages. I like to have a generic query that will select all the packages with only one code associated with it.

    I'm sorry, I'm still confused. With the data you gave us and the statement above, you just want to filter on a given code?

    Jared
    CE - Microsoft

  • 1 more guess from me... Are you trying to identify duplicates?

    Jared
    CE - Microsoft

  • Had to make a little adjustment to the sample script, it didn't copy/paste well for me.

    However, remembering now why I questioned the result set, there's two 'multiples' of items. CSAC Casing in 10000 and BOXC Boxing in 12000.

    The following code will show you the algorithm and the insert into script I rebuilt to get CRLFs to behave properly. (EDIT: which still won't behave for me when I copy/paste my own code back into my own window. Blasted wierd...)

    IF OBJECT_ID('tempdb..#TEST') IS NOT NULL

    DROP TABLE #TEST

    CREATE TABLE #TEST(

    PACKAGE INT,

    CODE VARCHAR(20),

    [DESCRIPTION] VARCHAR(20))

    INSERT INTO #TEST

    SELECT 10000,'CSAC','CASING' UNION ALL

    SELECT 10000,'BOXC','BOXING' UNION ALL

    SELECT 10000,'CSAC','CASING' UNION ALL

    SELECT 12000,'BOXC','BOXING' UNION ALL

    SELECT 12000,'BOXC','BOXING' UNION ALL

    SELECT 12000,'BOXC','BOXING' UNION ALL

    SELECT 13000,'BASK','BASKET' UNION ALL

    SELECT 13000,'CSAC','CASING' UNION ALL

    SELECT 13000,'BOXC','BOXING'

    ;WITH cte AS

    (SELECT

    package, code, [description], count(*) AS cnt

    FROM

    #test

    GROUP BY

    package, code, [description]

    HAVING

    COUNT(*) > 1

    )

    --SELECT * FROM cte

    SELECT

    t.*

    FROM

    #TEST AS t

    JOIN

    cte

    ONt.package = cte.package

    and t.code = cte.code

    and t.[description] = cte.[description]


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • CREATE TABLE #TEST

    (

    PACKAGE INT

    ,CODE VARCHAR(20)

    ,[DESCRIPTION] VARCHAR(20)

    )

    INSERT INTO #TEST

    SELECT 10000,'CSAC','CASING'UNION ALL

    SELECT 10000,'BOXC','BOXING'UNION ALL

    SELECT 10000,'CSAC','CASING' UNION ALL

    SELECT 12000,'BOXC','BOXING'UNION ALL

    SELECT 12000,'BOXC','BOXING'UNION ALL

    SELECT 12000,'BOXC','BOXING' UNION ALL

    SELECT 13000,'BASK','BASKET'UNION ALL

    SELECT 13000,'CSAC','CASING'UNION ALL

    SELECT 13000,'BOXC','BOXING'

    ;

    WITH RowNumbers AS

    (

    SELECT PACKAGE, CODE, [DESCRIPTION],

    ROW_NUMBER() OVER (PARTITION BY PACKAGE, CODE, [DESCRIPTION] ORDER BY PACKAGE, CODE, [DESCRIPTION]) AS rn

    FROM #test

    )

    SELECT PACKAGE, CODE, [DESCRIPTION]

    FROM RowNumbers WHERE rn > 1

    Something like this?

    EDIT: This is wrong! I'm sorry... it has been a long day. This will give you duplicates leaving 1 if you need to remove duplicates.

    Jared
    CE - Microsoft

  • thanks, this is what i wanted. Apologies for my poor explanations at the beginning

  • eseosaoregie (1/19/2012)


    thanks, this is what i wanted. Apologies for my poor explanations at the beginning

    No worries, it happens. I wasn't explaining myself well either. Happy to help, good luck. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply