eliminating duplicates

Question

eliminating duplicates

Viewing 15 posts - 46 through 60 (of 137 total)

You must be logged in to reply to this topic. Login to reply

RBarryYoung SSC Guru Points: 143329 More actions · Answer 1

Jeff Moden (6/28/2008)
If you're going to spend that much time on it... write it as an article and submit it, instead. 😉

Heh. Now you tell me!?!

Seriously, though, I thought about it but writing in that way is very slow and difficult for me (it's a mild language processing disorder, apparently). Writing here is hard enough, but adapting to the form or an article, especially anything long, presents an almost insurmountable barrier to me...

Well, maybe. The explanation is already a lot longer than I expected. Maybe when I finish it I will send it to you and you can suggest what changes I should make for it to become an article?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

rbarryyoung (6/28/2008)
Writing here is hard enough, but adapting to the form or an article, especially anything long, presents an almost insurmountable barrier to me...

Heh... nah... you have the knack and just don't know it... pretty you're writing one of the long winded forum explanations that you and I frequently do with and without each other. Then, go back and throw in a couple of "headers" that seem to isolate subjects. Then, make sure you don't have any duplicate headers by combining areas that appear to be alike. Check it for readability and "revelation" and you have a nice article.

Well, maybe. The explanation is already a lot longer than I expected. Maybe when I finish it I will send it to you and you can suggest what changes I should make for it to become an article?

See what I mean? :w00t: You're already starting to realize that an article is really just a longer explanation. 😉 Sure... Email your final draft (I prefer something I can load in Word 97... I think it'll handle all the way up to 2003 if you don't go nuts with features).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

RBarryYoung SSC Guru Points: 143329 More actions · Answer 3

OK, I sent you the draft. Heh, I didn't see your reply before I sent, hopefully the mess o' text I sent will be alright.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

rbarryyoung (6/28/2008)
OK, I sent you the draft. Heh, I didn't see your reply before I sent, hopefully the mess o' text I sent will be alright.

I gotta wonder how some of these spam filters work.... it went right straight to "bulk"... good thing you told me you sent it or I might have never seen it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

RBarryYoung SSC Guru Points: 143329 More actions · Answer 5

Hey, did you get a chance to look at that draft yet?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 6

rbarryyoung (6/29/2008)
Hey, did you get a chance to look at that draft yet?

Barely... I did a quick scan and now I need to do a full read...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

miquelfrances Valued Member Points: 63 More actions · Answer 7

Hello,

I had read all responses, and I have to recognise that Barry's approach, is very interesting, and clever solution...

I have test the sql-steps but i found is a little bit confusing, it's no more clear do the following thing:

1. Insert into TAB

Select Name

, Age

, CHAR( (2*Count(*)) + (CASE Sex When 'M' Then 0 Else 1 End) )

From TAB

Group By Name, Age, Sex

/* That's barry's great approch... */

Delete from TAB

Where Sex IN('F','M')

/* Now remove non compressed values */

UPDATE TAB

SET Sex = CASE WHEN Ascii(Sex) % 2 = 0 Then 'M' Else 'F' END

/* Get sex orginal value */

I think this is a more clear, I'm missing something....?

Thank's in advance.

karthik M One Orange Chip Points: 29139 More actions · Answer 8

rbarryyoung,

I am eagerly expecting your article.

karthik

RBarryYoung SSC Guru Points: 143329 More actions · Answer 9

Thanks, folks. I'll try to get it done and in the queue by tonight. I'll be sure to post a link to it from here then.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

Nicking Barry's excellent idea (and Janines test data), here's another way which is quite fast:

DROP TABLE #Testing

CREATE TABLE #Testing (

FirstCol varChar(5),

SecondCol INT,

ThirdCol Char(1)

)

INSERT INTO #Testing (FirstCol, SecondCol, ThirdCol)

SELECT 'ABC',24,'M' UNION ALL --

SELECT 'ABC',24,'M' UNION ALL

SELECT 'DEF',24,'M' UNION ALL --

SELECT 'DEF',24,'F' UNION ALL

SELECT 'GHI',26,'F' UNION ALL --

SELECT 'GHI',26,'F' UNION ALL

SELECT 'LMN',27,'M' UNION ALL --

SELECT 'LMN',27,'M' UNION ALL

SELECT 'PQRS',25,'F' UNION ALL --

SELECT 'XYZ',24,'M' UNION ALL --

SELECT 'XYZ',25,'M' --

DECLARE @ItemID INT

DECLARE @FirstCol varChar(5), @SecondCol INT, @ThirdCol Char(1)

SET @ItemID = 1

UPDATE #Testing SET

ThirdCol = CHAR((@ItemID*2) + CASE ThirdCol WHEN 'M' THEN 1 ELSE 0 END),

@ItemID = CASE WHEN @FirstCol = FirstCol AND @SecondCol = SecondCol AND @ThirdCol = ThirdCol THEN @ItemID+1 ELSE 1 END,

@FirstCol = FirstCol, @SecondCol = SecondCol, @ThirdCol = ThirdCol

DELETE FROM #Testing WHERE ASCII(ThirdCol) = 2 OR ASCII(ThirdCol) = 3

UPDATE #Testing SET ThirdCol = CASE (ASCII(ThirdCol) & 1) When 1 Then 'M' Else 'F' End

SELECT * FROM #Testing

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

mlando.mngomezulu SSC Rookie Points: 26 More actions · Answer 11

Step1: You need to create a table containing only the duplicate records by first selecting all the duplicate records into a new table i.e.

Insert into temptable (Select * from tablename group by (Name,Age,Sex) having count (Name,Age,Sex) > 1

Step2: Delete all the duplicate records contained in your table name.

Step3: Insert the records back from the temptable to your tablename.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

mlando.mngomezulu (7/1/2008)
Step1: You need to create a table containing only the duplicate records by first selecting all the duplicate records into a new table i.e.
Insert into temptable (Select * from tablename group by (Name,Age,Sex) having count (Name,Age,Sex) > 1
Step2: Delete all the duplicate records contained in your table name.
Step3: Insert the records back from the temptable to your tablename.

This won't work.

Firstly, the OP states no intermediate tables are to be created.

Secondly, it's not a dedupe exercise - the aim is to eliminate all unique rows and one row from each group of dupes, leaving the rest. Interestingly, if you put the deleted rows into a new table, it would be a dedupe of the original table.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

mlando.mngomezulu SSC Rookie Points: 26 More actions · Answer 13

Actually the first step makes sure that the temptable only have one record per corresponding duplicate records. This is achieved by the "group by" clause. This is grouped by the combination of the troublesome fields i.e. ('Name','Age','Sex'). Hence when you insert the records back, your table will no longer have duplicate records.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 14

mlando.mngomezulu (7/1/2008)
Actually the first step makes sure that the temptable only have one record per corresponding duplicate records. This is achieved by the "group by" clause. This is grouped by the combination of the troublesome fields i.e. ('Name','Age','Sex'). Hence when you insert the records back, your table will no longer have duplicate records.

Yes of course - but actually it's not a deduping exercise. The aim is to remove all rows except those which would be removed by a dedupe procedure.

The finished table, after processing correctly according to the OP's criteria, will have dupes because only one row of each dupe group is to be removed. If there are three dupes of one row, then two will remain. It's the exact opposite of your suggestion.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 15

You can say it easier than even that...

Remove one of every type of row that can be uniquely identified even if there is only one.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions