Distinct Query Problem

  • Hi,

    I am running a distinct query that was working fine. as soon as i added a new field(ilr.unique_ilr_id,)to be displayed in the reults the distinct part stops working and i get all the rows returned (8000) instead of the 4000 i need.

    Could someone cast there eye over my query or tell me why sql handles it differently with the new field, the new field is an auto incremtal numeric 18,0 but i tried changing that to a varchar(30) and still got the same result.

    Thanks for any help.

    SELECT

    DISTINCT ilr.person_code,

    ilr.unique_ilr_id,

    ilr.student_name,

    ilr.forename,

    ilr.surname,

    ilr.middle_names,

    ilr.person_code,

    ilr.unique_learn_no,

    ilr.sex,

    ilr.date_of_birth,

    ilr.ni_number,

    ilr.student_address_1,

    ilr.student_address_2,

    ilr.student_address_3,

    ilr.student_address_4,

    ilr.student_Pcode_1,

    ilr.student_Pcode_2,

    ilr.telephone,

    ilr.telephone_line_2,

    ilr.email_address,

    ilr.mobile_phone_number,

    ilr.disability,

    ilr.ethnicity

    FROM wce_ilr ilr

  • Distinct works on all the columns in the select clause. Most chances are that the column that you added has a unique constraint (or a primary key constraint) and each value in this column is unique. Since each record has a unique value in this column, the distinct will return all the records. You could work with min or max function and get the min/max value of ilr.unique_ilr_id and group it by the rest of the columns.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • , the new field is an auto incremtal numeric 18,0 but i tried changing that to a varchar(30) and still got the same result

    You are admiting yourself that the newly added column has a unique value for each row. So the DISTINCT will return all the records, as thay all are distinct.

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • Can you not just distinct on one column only then display the rest of the columns based on the results for the distinct column person_code?

    Seems crazy i can't get the results i need.

  • The distinct Applies to all the Columns in the Select List.

    Can you not just distinct on one column

    In case of "One column", on which column are you trying to apply Distinct Clause?

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • I am applying it to DISTINCT ilr.person_code but i would need to have access to all the other columns related to the results that the distinct defines. Maybe there is another function i can use?

  • sward (4/20/2009)


    I am applying it to DISTINCT ilr.person_code but i would need to have access to all the other columns related to the results that the distinct defines. Maybe there is another function i can use?

    You will need to consider the logic of this first. As Adi points out, you can return the results you want by applying MAX or MIN to unique_ilr_id and using GROUP BY instead of distinct - but which of these might make sense? The results of your original query might return one row per person, when you include the new column you may get two or more rows per person, each with a different unique_ilr_id. Which of those rows do you want to return, and why?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Try This...

    with wcte as (

    Select ilr.person_code,

    ilr.unique_ilr_id,

    ilr.student_name,

    ilr.forename,

    ilr.surname,

    ilr.middle_names,

    ilr.person_code,

    ilr.unique_learn_no,

    ilr.sex,

    ilr.date_of_birth,

    ilr.ni_number,

    ilr.student_address_1,

    ilr.student_address_2,

    ilr.student_address_3,

    ilr.student_address_4,

    ilr.student_Pcode_1,

    ilr.student_Pcode_2,

    ilr.telephone,

    ilr.telephone_line_2,

    ilr.email_address,

    ilr.mobile_phone_number,

    ilr.disability,

    ilr.ethnicity,

    RNO as Row_Number() Over (Partition By ilr.person_code Order By ilr.person_code)

    FROM wce_ilr ilr

    ) Select * from wcte where RNO = 1

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • sward (4/19/2009)


    Hi,

    I am running a distinct query that was working fine. as soon as i added a new field(ilr.unique_ilr_id,)to be displayed in the reults the distinct part stops working and i get all the rows returned (8000) instead of the 4000 i need.

    Could someone cast there eye over my query or tell me why sql handles it differently with the new field, the new field is an auto incremtal numeric 18,0 but i tried changing that to a varchar(30) and still got the same result.

    Thanks for any help.

    SELECT

    DISTINCT ilr.person_code,

    ilr.unique_ilr_id,

    ilr.student_name,

    ilr.forename,

    ilr.surname,

    ilr.middle_names,

    ilr.person_code,

    ilr.unique_learn_no,

    ilr.sex,

    ilr.date_of_birth,

    ilr.ni_number,

    ilr.student_address_1,

    ilr.student_address_2,

    ilr.student_address_3,

    ilr.student_address_4,

    ilr.student_Pcode_1,

    ilr.student_Pcode_2,

    ilr.telephone,

    ilr.telephone_line_2,

    ilr.email_address,

    ilr.mobile_phone_number,

    ilr.disability,

    ilr.ethnicity

    FROM wce_ilr ilr

    I assume that your data contains distinct 4000 records and then you inserted again the original data causing duplicates.. and now you are having 8000 records.

    Now, you want to query the original record of 4000..

    Running the query with distinct will show you the record of 4000..

    That is because "Distinct" removes duplicate rows in a result set.

    But if you added additional field and as you have stated, you created an incremental int data type, making all the rows unique.

    So it will return all the records.

    I think you need to rephrase your question...

    Let me ask you one thing.. Why do you want to see the distinct data? Or is it mabe you want to delete the duplicated rows?

    BR

    Hayzer

  • I think repharasing the question would be a better thing to do. I have given you a query, but still i am not sure about the requirement.

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • sward (4/20/2009)


    Can you not just distinct on one column only then display the rest of the columns based on the results for the distinct column person_code?

    Seems crazy i can't get the results i need.

    Hey sward,

    If you think about it, it would be 'crazy' if SQL Server did what you are asking. Perhaps an example will help:

    CREATE TABLE dbo.Test (A INT NOT NULL, B INT NOT NULL)

    -- Note the two duplicate rows

    INSERTdbo.Test VALUES (1, 4);

    INSERTdbo.Test VALUES (2, 12);

    INSERTdbo.Test VALUES (1, 4);

    INSERTdbo.Test VALUES (3, -2);

    INSERTdbo.Test VALUES (4, 0);

    SELECT*

    FROMdbo.Test

    A B

    ----------- -----------

    1 4

    2 12

    1 4

    3 -2

    4 0

    (5 row(s) affected)

    -- This will eliminate the duplicates

    SELECTDISTINCT A, B

    FROMdbo.Test

    A B

    ----------- -----------

    1 4

    2 12

    3 -2

    4 0

    (4 row(s) affected)

    -- Now we add a third column, with a unique value in each row

    ALTER TABLE dbo.Test ADD C INT IDENTITY(1,1) NOT NULL

    SELECT*

    FROMdbo.Test

    A B C

    ----------- ----------- -----------

    1 4 1

    2 12 2

    1 4 3

    3 -2 4

    4 0 5

    (5 row(s) affected)

    -- Returns five rows because column C has made each row (as a whole) unique

    SELECTDISTINCT A, B, C

    FROMdbo.Test

    A B C

    ----------- ----------- -----------

    1 4 1

    1 4 3

    2 12 2

    3 -2 4

    4 0 5

    (5 row(s) affected)

    If you wanted to return the DISTINCT values for columns A and B only, that would give you four rows, as before.

    You seem to be asking SQL Server to grab the four rows that are distinct on A and B, and then 'tack on' the value for C.

    Seems sort of reasonable, but what value for C would you return for the row where A = 1 and B = 4?

    Look above. There are two values of C (1 and 3) that go with the 'distinct' values of A = 1 and B = 4.

    This is why you are not getting the results you expect. SQL Server cannot choose between the values for C, so it has to return both. It is perfectly logical, and simply down to the data. If you want SQL Server to return a single row where A = 1 and B = 4, you have to tell it which value of C to use. For this you need an aggregate like MIN, MAX, SUM, AVG or whatever on C. This ensures that only one value of C is returned for the A = 1 and B = 4 combination.

    Sadly, that is now a different query, and can't be done with DISTINCT. You would need a GROUP BY:

    SELECTA, B, AVG(C) AS C

    FROMdbo.Test

    GROUPBY

    A, B

    A B C

    ----------- ----------- -----------

    3 -2 4

    4 0 5

    1 4 2

    2 12 2

    (4 row(s) affected)

    I hope that helps you understand the issue a bit more clearly.

    Cheers,

    Paul

  • Hi, and thanks Atif & Paul for your detailed input, Paul i posted this reply before i saw what you had wrote, great explanation. I do see where your coming from and i think becausei have never used other aggregates like MIN, MAX, SUM, AVG it's hard to understand. I will now read up on those and see if i can apply them to my problem. I will leave my post here as i realise now my initial post was probably not as detailed as it should hav been. Great feed back and thanks again.

    I have a table (wce_ilr) with 8000 rows, each row contains student and employer data. I need to insert this information into a table called wce_contact, seperatley, so i needed a 1 row inserted for each student and one row for each employer, the student data i have got inserted into wce_contact because each learner had a (person_code) so the distinct query was fine.

    The company data is more difficult, one employer will be assigned to many students so they appear multiple times, i need to select one row where an employer exists and insert that into the wce_contact table but only once.

    However, i need to insert one other field called (edrs) this field holds a unqiue number for a employer but will exist multiple times which then creates duplicate employer records when added to the query below.

    Here is the finished query but i can't use it becuse of the duplicates, if i remove edrs it is ok. Below this i have given some example data.

    INSERT INTO wce_contact (

    uniqueid,

    address1,

    address2,

    address3,

    address4,

    postcode1,

    postcode2,

    edrs

    )

    SELECT DISTINCT ilr.postcode,

    ilr.Emp_Address_1,

    ilr.Emp_Address_2,

    ilr.Emp_Address_3,

    ilr.Emp_Address_4,

    ilr.Emp_Address_Pcode1,

    ilr.Emp_Address_Pcode2,

    ilr.EDRS

    FROM wce_ilr AS ilr

    column headings:

    postcode|student_name|Emp_Address_1|Emp_Address_2|Emp_Address_3|Emp_Address_4|Emp_Address_pcode1|Emp_Address_pcode2|edrs

    results:

    AL2 3PQ-Abdi Kutiye-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

    AL2 3PQ-Abdi Osman-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

    AL2 3PQ-Abdillahi Aden-The Bluebells-Station Street-Bricket-WoodNULL-AL2-3PQ-901376094

    AL2 3PQ-Abikarim Mohamed-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

    AL2 3PQ-Abiudun Lawal-The Bluebells-Station Street-Bricket WoodNULL-AL2-3PQ-901376094

    CM19 5TX-Adam Jacobs-Flex Meadows-Pinnacles-NULL-NULL-CM19-5TX-168450720

    NG11 8LU-Adam Kemp-FARNBOROUGH RD-CLIFTON-NULL-NULL-NG11-8LU-100232043

    OX11 7HR-Adam Lipowski-Hawksworth Road-Southmead Industrial Estate-NULL-NULL-OX11-7HR101208731

    The first 5 results i would only want to insert one row where as the 3 under it i would want to insert all of them.

    Sorry for the delayed response and i hope this is clear, please let meknow if you have any questions.

  • Hi Again,

    I have looked at the Min, max, avg etc but they appear to all be related to columns that are numbers, where as the fields i am using are mainly varchar(). Does that mean i might be hitting abrick wall?

    Thanks

    S

  • I'm figuring that you already inserted the individual student data since you mentioned:

    the student data i have got inserted into wce_contact

    Looking at these results and the columns you are going to insert into, it appears all you would need to do is remove 'student_name' from the select statement. Perhaps I do not understand what you are asking for, though.

    column headings:

    postcode|student_name|Emp_Address_1|Emp_Address_2|Emp_Address_3|Emp_Address_4|Emp_Address_pcode1|Emp_Address_pcode2|edrs

    results:

    AL2 3PQ-Abdi Kutiye-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

    AL2 3PQ-Abdi Osman-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

    AL2 3PQ-Abdillahi Aden-The Bluebells-Station Street-Bricket-WoodNULL-AL2-3PQ-901376094

    AL2 3PQ-Abikarim Mohamed-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

    AL2 3PQ-Abiudun Lawal-The Bluebells-Station Street-Bricket WoodNULL-AL2-3PQ-901376094

    CM19 5TX-Adam Jacobs-Flex Meadows-Pinnacles-NULL-NULL-CM19-5TX-168450720

    NG11 8LU-Adam Kemp-FARNBOROUGH RD-CLIFTON-NULL-NULL-NG11-8LU-100232043

    OX11 7HR-Adam Lipowski-Hawksworth Road-Southmead Industrial Estate-NULL-NULL-OX11-7HR101208731

    Just noticed something else... Each one of these rows has a 'NULL' in it. Because null is "not defined" it will always be a unique value. It seems that all your rows have this. You may want to consider either throwing an 'ISNULL()' in there or not selecting that column at all. If the former is the case it may look something like:

    SELECT DISTINCT ilr.postcode,

    ilr.Emp_Address_1,

    ilr.Emp_Address_2,

    ilr.Emp_Address_3,

    ISNULL(ilr.Emp_Address_4,''),

    ilr.Emp_Address_Pcode1,

    ilr.Emp_Address_Pcode2,

    ilr.EDRS

    FROM wce_ilr AS ilr

  • Distinct does not apply to a single column -- it applies to all of the columns being selected.

    The data model may be suspect if you need to use distinct at all, but the solution Adi proposed will get you back the resultset you were originally getting. The question is why do you need the identity field returned in your result? What are you goping to do with it?

    SELECT

    DISTINCT ilr.person_code, --distinct is not just against this column

    ilr.unique_ilr_id,

    ilr.student_name,

    ilr.forename,

    ilr.surname,

    ilr.middle_names,

    ilr.person_code,

    ilr.unique_learn_no,

    ilr.sex,

    ilr.date_of_birth,

    ilr.ni_number,

    ilr.student_address_1,

    ilr.student_address_2,

    ilr.student_address_3,

    ilr.student_address_4,

    ilr.student_Pcode_1,

    ilr.student_Pcode_2,

    ilr.telephone,

    ilr.telephone_line_2,

    ilr.email_address,

    ilr.mobile_phone_number,

    ilr.disability,

    ilr.ethnicity

    FROM wce_ilr ilr

    --you can remove the identity column from the result or ..

    SELECT

    ilr.person_code,

    MAX(ilr.unique_ilr_id) as last_unique_ilr_id,

    ilr.student_name,

    ilr.forename,

    ilr.surname,

    ilr.middle_names,

    ilr.person_code,

    ilr.unique_learn_no,

    ilr.sex,

    ilr.date_of_birth,

    ilr.ni_number,

    ilr.student_address_1,

    ilr.student_address_2,

    ilr.student_address_3,

    ilr.student_address_4,

    ilr.student_Pcode_1,

    ilr.student_Pcode_2,

    ilr.telephone,

    ilr.telephone_line_2,

    ilr.email_address,

    ilr.mobile_phone_number,

    ilr.disability,

    ilr.ethnicity

    FROM wce_ilr ilr

    GROUP BY

    ilr.person_code,

    ilr.student_name,

    ilr.forename,

    ilr.surname,

    ilr.middle_names,

    ilr.person_code,

    ilr.unique_learn_no,

    ilr.sex,

    ilr.date_of_birth,

    ilr.ni_number,

    ilr.student_address_1,

    ilr.student_address_2,

    ilr.student_address_3,

    ilr.student_address_4,

    ilr.student_Pcode_1,

    ilr.student_Pcode_2,

    ilr.telephone,

    ilr.telephone_line_2,

    ilr.email_address,

    ilr.mobile_phone_number,

    ilr.disability,

    ilr.ethnicity

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply