Remove duplicates where 3 columns contain the same value

  • I want to remove duplicates where asciiname, countrycode and provinceid contain the same values.

    How can I do this?

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[cities_geonames](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [geonameid] [float] NULL,

    [asciiname] [nvarchar](255) NULL,

    [country code] [nvarchar](255) NULL,

    [provinceid] [int] NOT NULL,

    CONSTRAINT [PK_cities_geonames] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1067813,2743447,'Abelhal','PT',463);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (866673,2657842,'Aberchalder','GB',201);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1009434,2522470,'Acebuchal','ES',353);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1009427,2522446,'Aceuchal','ES',356);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (866616,2657756,'Achallader','GB',201);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1001366,2959625,'Achthal','DE',314);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1001367,2959626,'Achthal','DE',314);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1001368,2959627,'Achthal','DE',314);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1001237,2959363,'Affalterthal','DE',314);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (866546,2657642,'Aghalee','GB',202);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (654337,4179245,'Ahaluna','US',60);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1001008,2958936,'Aich halden','DE',315);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1001009,2958937,'Aich halden','DE',315);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1194072,6714269,'Air Halim Rambung','ID',551);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1000846,2958612,'Albrechtsthal','DE',312);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1000794,2958542,'Alexandrinenthal','DE',314);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (866459,2657476,'Allhallows','GB',203);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (999842,2956763,'Alten-thal','DE',310);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1000183,2957440,'Alten-thal','DE',310);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1000093,2957169,'Althaldensleben','DE',302);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (999918,2956888,'Altrosenthal','DE',312);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1099492,1651600,'Aluhaluh','ID',565);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1079889,736891,'Amigdhala','GR',513);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1079887,736889,'Amigdhalea','GR',513);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1079888,736890,'Amigdhalea','GR',513);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076992,265176,'Amigdhalea','GR',511);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076994,265178,'Amigdhalea','GR',502);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076995,265179,'Amigdhalea','GR',502);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076996,265180,'Amigdhalea','GR',512);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076997,265181,'Amigdhalea','GR',509);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076998,265182,'Amigdhalea','GR',509);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076999,265183,'Amigdhalea','GR',509);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1077000,265184,'Amigdhalea','GR',504);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076991,265175,'Amigdhaleai','GR',512);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1079886,736887,'Amigdhaleai','GR',513);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1079885,736886,'Amigdhaleon','GR',505);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076988,265172,'Amigdhali','GR',511);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076990,265174,'Amigdhali','GR',512);

    INSERT INTO cities_geonames (id,geonameid,asciiname,[country code],provinceid) VALUES (1076987,265168,'Amigdhalitsa','GR',509);

  • You can use ROW_NUMBER() to add a sequence number to detect duplicates.

    A basic example of it is here - http://sqlrambling.net/2013/11/04/row_number-basic-example/

  • Excellent job posting ddl and sample data. I think you deserve more than a link to basic article that discusses ROW_NUMBER but does nothing to explain how you might use it in your case.

    I took the liberty of changing this to a temp table. I also renamed country code to remove the space so it is less painful to code with. The last change I had to make was to remove the identity from id because your inserts had values for that column. FWIW, I would consider changing the name of your primary key to something more meaningful than id. Maybe something like CitiesGeonamesID or something to give you an idea of what id it is.

    CREATE TABLE [dbo].[#cities_geonames](

    [id] [int] NOT NULL,

    [geonameid] [float] NULL,

    [asciiname] [nvarchar](255) NULL,

    countrycode [nvarchar](255) NULL,

    [provinceid] [int] NOT NULL,

    CONSTRAINT [PK_#cities_geonames] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1067813,2743447,'Abelhal','PT',463);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (866673,2657842,'Aberchalder','GB',201);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1009434,2522470,'Acebuchal','ES',353);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1009427,2522446,'Aceuchal','ES',356);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (866616,2657756,'Achallader','GB',201);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1001366,2959625,'Achthal','DE',314);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1001367,2959626,'Achthal','DE',314);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1001368,2959627,'Achthal','DE',314);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1001237,2959363,'Affalterthal','DE',314);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (866546,2657642,'Aghalee','GB',202);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (654337,4179245,'Ahaluna','US',60);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1001008,2958936,'Aich halden','DE',315);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1001009,2958937,'Aich halden','DE',315);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1194072,6714269,'Air Halim Rambung','ID',551);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1000846,2958612,'Albrechtsthal','DE',312);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1000794,2958542,'Alexandrinenthal','DE',314);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (866459,2657476,'Allhallows','GB',203);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (999842,2956763,'Alten-thal','DE',310);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1000183,2957440,'Alten-thal','DE',310);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1000093,2957169,'Althaldensleben','DE',302);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (999918,2956888,'Altrosenthal','DE',312);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1099492,1651600,'Aluhaluh','ID',565);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1079889,736891,'Amigdhala','GR',513);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1079887,736889,'Amigdhalea','GR',513);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1079888,736890,'Amigdhalea','GR',513);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076992,265176,'Amigdhalea','GR',511);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076994,265178,'Amigdhalea','GR',502);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076995,265179,'Amigdhalea','GR',502);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076996,265180,'Amigdhalea','GR',512);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076997,265181,'Amigdhalea','GR',509);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076998,265182,'Amigdhalea','GR',509);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076999,265183,'Amigdhalea','GR',509);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1077000,265184,'Amigdhalea','GR',504);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076991,265175,'Amigdhaleai','GR',512);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1079886,736887,'Amigdhaleai','GR',513);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1079885,736886,'Amigdhaleon','GR',505);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076988,265172,'Amigdhali','GR',511);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076990,265174,'Amigdhali','GR',512);

    INSERT INTO #cities_geonames (id,geonameid,asciiname,countrycode,provinceid) VALUES (1076987,265168,'Amigdhalitsa','GR',509);

    with cte as

    (

    select asciiname, countrycode, provinceid, ROW_NUMBER() over (partition by asciiname, countrycode, provinceid order by id) as RowNum

    from #cities_geonames

    )

    --you can delete directly from the cte

    delete

    from cte

    where RowNum > 1;

    --now we want to test our code to make sure it actually worked

    with MyTestCTE as

    (

    select asciiname, countrycode, provinceid, ROW_NUMBER() over (partition by asciiname, countrycode, provinceid order by id) as RowNum

    from #cities_geonames

    )

    select *

    from MyTestCTE

    where RowNum > 1

    drop table #cities_geonames

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks! Also for the coding feedback, will add that later 🙂

    But it turns out that I need one more thing: of all duplicates: keep the row where the column [population] has the highest value, how would I incorporate that into your statement? (I've updated table create statement and insert statements):

    GO

    CREATE TABLE [dbo].[cities_geonames](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [geonameid] [float] NULL,

    [asciiname] [nvarchar](255) NULL,

    [countrycode] [nvarchar](255) NULL,

    [provinceid] [int] NOT NULL,

    [population] [int] NOT NULL,

    CONSTRAINT [PK_cities_geonames] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2743447,'Abelhal','PT',463,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2657842,'Aberchalder','GB',201,30);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2522470,'Acebuchal','ES',353,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2522446,'Aceuchal','ES',356,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2657756,'Achallader','GB',201,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2959625,'Achthal','DE',314,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2959626,'Achthal','DE',314,10);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2959627,'Achthal','DE',314,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2959363,'Affalterthal','DE',314,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2657642,'Aghalee','GB',202,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (4179245,'Ahaluna','US',60,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2958936,'Aich halden','DE',315,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2958937,'Aich halden','DE',315,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (6714269,'Air Halim Rambung','ID',551,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2958612,'Albrechtsthal','DE',312,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2958542,'Alexandrinenthal','DE',314,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2657476,'Allhallows','GB',203,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2956763,'Alten-thal','DE',310,4000);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2957440,'Alten-thal','DE',310,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2957169,'Althaldensleben','DE',302,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (2956888,'Altrosenthal','DE',312,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (1651600,'Aluhaluh','ID',565,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (736891,'Amigdhala','GR',513,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (736889,'Amigdhalea','GR',513,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (736890,'Amigdhalea','GR',513,30);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265176,'Amigdhalea','GR',511,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265178,'Amigdhalea','GR',502,650);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265179,'Amigdhalea','GR',502,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265180,'Amigdhalea','GR',512,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265181,'Amigdhalea','GR',509,560);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265182,'Amigdhalea','GR',509,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265183,'Amigdhalea','GR',509,0);

    INSERT INTO cities_geonames (geonameid,asciiname,[countrycode],provinceid,[population]) VALUES (265184,'Amigdhalea','GR',504,0);

  • petervdkerk (1/10/2014)


    Thanks! Also for the coding feedback, will add that later 🙂

    But it turns out that I need one more thing: of all duplicates: keep the row where the column [population] has the highest value, how would I incorporate that into your statement? (I've updated table create statement and insert statements):

    All you need to do here is modify the order by in the ROW_NUMBER so that the largest population will have RowNum = 1.

    ROW_NUMBER() over (partition by asciiname, countrycode, provinceid order by population desc) as RowNum

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks! All working now 🙂

  • Cool glad that is working you. Thanks for letting me know.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply