Extracting data from multiple tables, avoiding duplicates.

  • Hi

    I currently have two tables called Book and JournalPaper, both of which have a column called Publisher. Currently the data in the Publisher column is the Publisher name that is entered straight into either table and has been duplicated in many cases. To tidy this up I have created a new table called Publisher where each entry will have a unique ID.

    I now want to remove the Publisher columns from Book and JournalPaper, replace it with an ID foreign key column and move the Publisher name data into the Publisher table. How do I construct the SQL statement to move the data without creating duplicates on the Publisher table as some publishers appear several times on both the Book and JournalPaper tables?

    Any help with this will be greatly appreciated as my limited SQL is not up to this particular challenge!!!

    Thanks!

  • Hi

     

    Can you not do something like this. If its a once off this should be fine

    INSERT INTO Publisher

    (PubName)

    SELECT Distinct(Publisher)

    FROM Books

    ---Then run this

    INSERT INTO Publisher

    (PubName)

    SELECT Distinct(Publisher)

    FROM JournalPaper

    WHERE Publisher not in (SELECT PubName from Publisher)

     

     

  • Try this, should be a lot more effective...

    Create table TblPublisher (Pub_ID int identity(1, 1) constraint PK_TblPublisher primary key(Pub_ID), Publisher nvarchar(255))

    go

    create unique index IX_TblPublisher on TblPublisher(Publisher) with ignore_dup_key on primary

    go

    insert into TblPublisher (Publisher) Select Distinct Publisher from Book order by Publisher

    insert into TblPublisher (Publisher) Select Distinct Publisher from JournalPaper order by Publisher

    go

    update a set a.Publisher = b.PubID from Book a inner join TblPublisher b on a.Publisher = b.Publisher

    update a set a.Publisher = b.PubID from Book a inner join TblPublisher b on a.Publisher = b.Publisher

    go

    alter table Book add foreign key (Book_Pub_ID) references TblPublisher(Pub_ID)

    go

    alter table JournalPaper add foreign key (JournalPaper_Pub_ID) references TblPublisher(Pub_ID)

    go

    <hr noshade size='1' width='250' color='#BBC8E5'>Kindest Regards,

    Roelof
    <a href='http://' class='authorlink' target='_blank'></a>

    --There are only 10 types of people in the world. Those who understand binary, and those who don't.

  • An alternative would be to UNION the publisher list queries, which also removes duplicates.  I often use IGNORE_DUP_KEY myself, but it is not standard SQL and you should at least be aware of a generic solution.

    INSERT INTO TblPublishers (Publisher)

    SELECT DISTINCT Publisher FROM Book

    UNION SELECT DISTINCT Publisher FROM JournalPaper

  • Thanks for your help,it's working now!

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply