Distinct Query Problem

Question

Distinct Query Problem

sc-w

Hall of Fame

Points: 3664
More actions
April 19, 2009 at 1:02 am

#130711

Hi,
I am running a distinct query that was working fine. as soon as i added a new field(ilr.unique_ilr_id,)to be displayed in the reults the distinct part stops working and i get all the rows returned (8000) instead of the 4000 i need.
Could someone cast there eye over my query or tell me why sql handles it differently with the new field, the new field is an auto incremtal numeric 18,0 but i tried changing that to a varchar(30) and still got the same result.
Thanks for any help.
SELECT
DISTINCT ilr.person_code,
ilr.unique_ilr_id,
ilr.student_name,
ilr.forename,
ilr.surname,
ilr.middle_names,
ilr.person_code,
ilr.unique_learn_no,
ilr.sex,
ilr.date_of_birth,
ilr.ni_number,
ilr.student_address_1,
ilr.student_address_2,
ilr.student_address_3,
ilr.student_address_4,
ilr.student_Pcode_1,
ilr.student_Pcode_2,
ilr.telephone,
ilr.telephone_line_2,
ilr.email_address,
ilr.mobile_phone_number,
ilr.disability,
ilr.ethnicity
FROM wce_ilr ilr

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 1

Distinct works on all the columns in the select clause. Most chances are that the column that you added has a unique constraint (or a primary key constraint) and each value in this column is unique. Since each record has a unique value in this column, the distinct will return all the records. You could work with min or max function and get the min/max value of ilr.unique_ilr_id and group it by the rest of the columns.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Atif-ullah Sheikh SSChampion Points: 12495 More actions · Answer 2

, the new field is an auto incremtal numeric 18,0 but i tried changing that to a varchar(30) and still got the same result

You are admiting yourself that the newly added column has a unique value for each row. So the DISTINCT will return all the records, as thay all are distinct.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sometimes, winning is not an issue but trying.
You can check my BLOG [font="Arial Black"]here[/font][/url][/right]

sc-w Hall of Fame Points: 3664 More actions · Answer 3

Can you not just distinct on one column only then display the rest of the columns based on the results for the distinct column person_code?

Seems crazy i can't get the results i need.

Atif-ullah Sheikh SSChampion Points: 12495 More actions · Answer 4

The distinct Applies to all the Columns in the Select List.

Can you not just distinct on one column

In case of "One column", on which column are you trying to apply Distinct Clause?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sometimes, winning is not an issue but trying.
You can check my BLOG [font="Arial Black"]here[/font][/url][/right]

sc-w Hall of Fame Points: 3664 More actions · Answer 5

I am applying it to DISTINCT ilr.person_code but i would need to have access to all the other columns related to the results that the distinct defines. Maybe there is another function i can use?

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

sward (4/20/2009)
I am applying it to DISTINCT ilr.person_code but i would need to have access to all the other columns related to the results that the distinct defines. Maybe there is another function i can use?

You will need to consider the logic of this first. As Adi points out, you can return the results you want by applying MAX or MIN to unique_ilr_id and using GROUP BY instead of distinct - but which of these might make sense? The results of your original query might return one row per person, when you include the new column you may get two or more rows per person, each with a different unique_ilr_id. Which of those rows do you want to return, and why?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Atif-ullah Sheikh SSChampion Points: 12495 More actions · Answer 7

Try This...

with wcte as (

Select ilr.person_code,

ilr.unique_ilr_id,

ilr.student_name,

ilr.forename,

ilr.surname,

ilr.middle_names,

ilr.person_code,

ilr.unique_learn_no,

ilr.sex,

ilr.date_of_birth,

ilr.ni_number,

ilr.student_address_1,

ilr.student_address_2,

ilr.student_address_3,

ilr.student_address_4,

ilr.student_Pcode_1,

ilr.student_Pcode_2,

ilr.telephone,

ilr.telephone_line_2,

ilr.email_address,

ilr.mobile_phone_number,

ilr.disability,

ilr.ethnicity,

RNO as Row_Number() Over (Partition By ilr.person_code Order By ilr.person_code)

FROM wce_ilr ilr

) Select * from wcte where RNO = 1

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sometimes, winning is not an issue but trying.
You can check my BLOG [font="Arial Black"]here[/font][/url][/right]

hayzer Say Hey Kid Points: 713 More actions · Answer 8

sward (4/19/2009)
Hi,
I am running a distinct query that was working fine. as soon as i added a new field(ilr.unique_ilr_id,)to be displayed in the reults the distinct part stops working and i get all the rows returned (8000) instead of the 4000 i need.
Could someone cast there eye over my query or tell me why sql handles it differently with the new field, the new field is an auto incremtal numeric 18,0 but i tried changing that to a varchar(30) and still got the same result.
Thanks for any help.
SELECT
DISTINCT ilr.person_code,
ilr.unique_ilr_id,
ilr.student_name,
ilr.forename,
ilr.surname,
ilr.middle_names,
ilr.person_code,
ilr.unique_learn_no,
ilr.sex,
ilr.date_of_birth,
ilr.ni_number,
ilr.student_address_1,
ilr.student_address_2,
ilr.student_address_3,
ilr.student_address_4,
ilr.student_Pcode_1,
ilr.student_Pcode_2,
ilr.telephone,
ilr.telephone_line_2,
ilr.email_address,
ilr.mobile_phone_number,
ilr.disability,
ilr.ethnicity
FROM wce_ilr ilr

I assume that your data contains distinct 4000 records and then you inserted again the original data causing duplicates.. and now you are having 8000 records.

Now, you want to query the original record of 4000..

Running the query with distinct will show you the record of 4000..

That is because "Distinct" removes duplicate rows in a result set.

But if you added additional field and as you have stated, you created an incremental int data type, making all the rows unique.

So it will return all the records.

I think you need to rephrase your question...

Let me ask you one thing.. Why do you want to see the distinct data? Or is it mabe you want to delete the duplicated rows?

BR

Hayzer

Atif-ullah Sheikh SSChampion Points: 12495 More actions · Answer 9

I think repharasing the question would be a better thing to do. I have given you a query, but still i am not sure about the requirement.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sometimes, winning is not an issue but trying.
You can check my BLOG [font="Arial Black"]here[/font][/url][/right]

Paul White SSC Guru Points: 150457 More actions · Answer 10

sward (4/20/2009)
Can you not just distinct on one column only then display the rest of the columns based on the results for the distinct column person_code?
Seems crazy i can't get the results i need.

Hey sward,

If you think about it, it would be 'crazy' if SQL Server did what you are asking. Perhaps an example will help:

CREATE TABLE dbo.Test (A INT NOT NULL, B INT NOT NULL)

-- Note the two duplicate rows

INSERTdbo.Test VALUES (1, 4);

INSERTdbo.Test VALUES (2, 12);

INSERTdbo.Test VALUES (1, 4);

INSERTdbo.Test VALUES (3, -2);

INSERTdbo.Test VALUES (4, 0);

SELECT*

FROMdbo.Test

A B

----------- -----------

1 4

2 12

1 4

3 -2

4 0

(5 row(s) affected)

-- This will eliminate the duplicates

SELECTDISTINCT A, B

FROMdbo.Test

A B

----------- -----------

1 4

2 12

3 -2

4 0

(4 row(s) affected)

-- Now we add a third column, with a unique value in each row

ALTER TABLE dbo.Test ADD C INT IDENTITY(1,1) NOT NULL

SELECT*

FROMdbo.Test

A B C

----------- ----------- -----------

1 4 1

2 12 2

1 4 3

3 -2 4

4 0 5

(5 row(s) affected)

-- Returns five rows because column C has made each row (as a whole) unique

SELECTDISTINCT A, B, C

FROMdbo.Test

A B C

----------- ----------- -----------

1 4 1

1 4 3

2 12 2

3 -2 4

4 0 5

(5 row(s) affected)

If you wanted to return the DISTINCT values for columns A and B only, that would give you four rows, as before.

You seem to be asking SQL Server to grab the four rows that are distinct on A and B, and then 'tack on' the value for C.

Seems sort of reasonable, but what value for C would you return for the row where A = 1 and B = 4?

Look above. There are two values of C (1 and 3) that go with the 'distinct' values of A = 1 and B = 4.

This is why you are not getting the results you expect. SQL Server cannot choose between the values for C, so it has to return both. It is perfectly logical, and simply down to the data. If you want SQL Server to return a single row where A = 1 and B = 4, you have to tell it which value of C to use. For this you need an aggregate like MIN, MAX, SUM, AVG or whatever on C. This ensures that only one value of C is returned for the A = 1 and B = 4 combination.

Sadly, that is now a different query, and can't be done with DISTINCT. You would need a GROUP BY:

SELECTA, B, AVG(C) AS C

FROMdbo.Test

GROUPBY

A, B

A B C

----------- ----------- -----------

3 -2 4

4 0 5

1 4 2

2 12 2

(4 row(s) affected)

I hope that helps you understand the issue a bit more clearly.

Cheers,

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

sc-w Hall of Fame Points: 3664 More actions · Answer 11

Hi, and thanks Atif & Paul for your detailed input, Paul i posted this reply before i saw what you had wrote, great explanation. I do see where your coming from and i think becausei have never used other aggregates like MIN, MAX, SUM, AVG it's hard to understand. I will now read up on those and see if i can apply them to my problem. I will leave my post here as i realise now my initial post was probably not as detailed as it should hav been. Great feed back and thanks again.

I have a table (wce_ilr) with 8000 rows, each row contains student and employer data. I need to insert this information into a table called wce_contact, seperatley, so i needed a 1 row inserted for each student and one row for each employer, the student data i have got inserted into wce_contact because each learner had a (person_code) so the distinct query was fine.

The company data is more difficult, one employer will be assigned to many students so they appear multiple times, i need to select one row where an employer exists and insert that into the wce_contact table but only once.

However, i need to insert one other field called (edrs) this field holds a unqiue number for a employer but will exist multiple times which then creates duplicate employer records when added to the query below.

Here is the finished query but i can't use it becuse of the duplicates, if i remove edrs it is ok. Below this i have given some example data.

INSERT INTO wce_contact (

uniqueid,

address1,

address2,

address3,

address4,

postcode1,

postcode2,

edrs

)

SELECT DISTINCT ilr.postcode,

ilr.Emp_Address_1,

ilr.Emp_Address_2,

ilr.Emp_Address_3,

ilr.Emp_Address_4,

ilr.Emp_Address_Pcode1,

ilr.Emp_Address_Pcode2,

ilr.EDRS

FROM wce_ilr AS ilr

column headings:

results:

AL2 3PQ-Abdi Kutiye-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

AL2 3PQ-Abdi Osman-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

AL2 3PQ-Abdillahi Aden-The Bluebells-Station Street-Bricket-WoodNULL-AL2-3PQ-901376094

AL2 3PQ-Abikarim Mohamed-The Bluebells-Station Street-Bricket Wood-NULL-AL2-3PQ-901376094

AL2 3PQ-Abiudun Lawal-The Bluebells-Station Street-Bricket WoodNULL-AL2-3PQ-901376094

CM19 5TX-Adam Jacobs-Flex Meadows-Pinnacles-NULL-NULL-CM19-5TX-168450720

NG11 8LU-Adam Kemp-FARNBOROUGH RD-CLIFTON-NULL-NULL-NG11-8LU-100232043

OX11 7HR-Adam Lipowski-Hawksworth Road-Southmead Industrial Estate-NULL-NULL-OX11-7HR101208731

The first 5 results i would only want to insert one row where as the 3 under it i would want to insert all of them.

Sorry for the delayed response and i hope this is clear, please let meknow if you have any questions.

sc-w Hall of Fame Points: 3664 More actions · Answer 12

Hi Again,

I have looked at the Min, max, avg etc but they appear to all be related to columns that are numbers, where as the fields i am using are mainly varchar(). Does that mean i might be hitting abrick wall?

Thanks

S

Wesley Norton SSCrazy Points: 2797 More actions · Answer 13

I'm figuring that you already inserted the individual student data since you mentioned:

the student data i have got inserted into wce_contact

Looking at these results and the columns you are going to insert into, it appears all you would need to do is remove 'student_name' from the select statement. Perhaps I do not understand what you are asking for, though.