Combining rows in table with 3m rows

  • I have a table with 3m rows and have 27k dups due to removal of a key field (no longer needed or used).  I need to roll up the dups, not just delete them.  I was going to select into temp and group to get unique records without losing summed data and then del from main table and insert unique records.  This is taking forever.  Any other ideas?  I thought it would be just as bad to find them and then loop through updates and deletes, but am not sure now.

    Dups today looks like this:

    Susan, 10

    Susan, 12

    Needs to look like this when I'm done:

    Susan, 22

  • 1. Move duplicates to temp tables.

    2. Delete values from base tables.

    3. Insert from temp tables into base tables and use SUM and GROUP BY to perform your roll up.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Are you inserting all 3 million records into a temp table?

    If so, how about inserting ONLY records into a temp table having count(*) > 1 (after the grouping on the main table of course)?

    E.g. Susan, 22 from your example

    select someName, sum(age) into myTempTable

    from mainTable

    group by someName having count(*) > 1

  • I just started trying that (always think of an answer right after I ask for help) so thanks for confirming I'm on the right track.  Much appreciated!

  • Unless I'm missing something, everything about the dupe rows is duped except for the column that you want summed. If that's the case, why not just SELECT INTO a new table, summing the one column, and grouping by the rest of the columns. Unless you have a ton of things tied to this table, such as low level permissions, triggers, indexes, etc.

    To use your example:

    SELECT

      NameField

     ,Sum(NumberField)

    INTO

     NewTableName

    FROM

     OldTableName

    GROUP BY

     NameField

    You'll obviously need to replace your old table with your new one, apply any indexes, triggers, permissions that were affected, but it would do the trick quickly.

    If you go down this path, save the old table until you have fully verified the new one.

  • -- It may be a good Idea to create an index on the Grouping/joining Column

    -- create index idx_someName on SomeTame(someName)

    -- go

    select someName, sum(age) X

    into #tmp

    from mainTable

    group by someName

    having count(*) > 1

     

    delete d

    from SomeTable d

         join

         #Tmp t on d.someName = t.someName

    insert into someTable( SomeName, age )

    select somename, X

    from #Tmp

    drop table #tmp

    -- Drop the index if you created it

    -- drop index SomeTable.idx_someName

    -- go


    * Noel

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply