How do I use a checksum

  •  

    Morning Campers,

    I have two tables src_monthly_terrrier and src_weekly_terrier. Both of these tables consists of 10+ columns. As the table names probably suggest, I import weekly data into one and monthly data into another.

    All the source data comes from an Excel spreadsheet via straight Import Data procedure. The only guaranteed change on a weekly and monthly basis is that one of the columns in each table named src_date will obviously have the data value for whichever month or week's data it relates to.

    I understand that through 'SQL Server Business Intelligence Development Studio' I can create an 'Intergrated Services' package that will import the spreadsheet details for me. I might be going the long way around this, but it was my intention to bring in all the data and then run a couple of 'INSERT INTO' Stored Procedures.

    My biggest issue / vunerability I have is that there is no error checking of the data on the way in to ensure that it has not already been imported. What I was thinking I could do to resolve this was to create a Checksum field comprising of a number of different columns (incl src_date) and then somehow write something that will look at the values of each intended imported row and then work out whether a duplicate checksum was found in the target table and then rejected the import routine as Duplicate Data Found (or something similar) and move onto the next stored procedure.

    My problem is two fold, one I have no idea how to create said checksum and two no idea where to begin on coding a procedure etc that looks to see if the value already exists etc etc.

    I have looked up checksum creation on the net and there appears to be plenty of resource to explain how to create one, so I guess my main question is, Where do I start when it comes to writing some code that will do the check of the checksum before the importation routine begins (or at least the Insert Into procedures.

    I would truly appreciate anyone's help on this. In the meanwhile I am off to learn how to create them.

    I would like to add, if anyone sees this as a bad idea, then please speak up.

    Thanks in Advance

    Tonic

  • The problem with using CHECKSUM is the possibility of having the same hash value for different data and therefore you would have to add secondary condition(s) to avoid this, therefore you may just as well use the matching conditions only. The only benefit in this case of using CHECKSUM (plus the secondary conditions) is to index it to improve performance when using long strings.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • OK, I hear what you are saying, but if I were to use the checksum across say three or four columns then would that not reduce the likely hood? Bearing in mind that it is guaranteed that the src_date will change week on week/ month on month which would then create a different checksum value wouldn't it? Or does it not work like that?

    If this is a no go as you have suggested, could you point me in the direction of learning how to do Matchning criteria queries etc?

    Regards

     

  • There is still a chance that you can get the same checksum value even if the dates were different, although a small chance. If it occurs then you have duplicates or if the destination table is constrained then a failure.

    I do this type of data processing all the time in many of my systems.

    What you need to left outer join the monthly table to the weekly table using columns to uniquely identify duplicates and check for nulls, eg

    DECLARE @day TABLE (Col1 int, Col2 int, Col3 int)

    DECLARE @month TABLE (Col1 int, Col2 int, Col3 int)

    INSERT INTO @month VALUES(4, 5, 6)

    INSERT INTO @day VALUES(1, 2, 3)

    INSERT INTO @day VALUES(4, 5, 6)

    INSERT INTO @day VALUES(7, 8, 9)

    INSERT INTO @month (Col1, Col2, Col3)

    SELECT d.Col1, d.Col2, d.Col3

    FROM @day d

    LEFT OUTER JOIN @month m

    ON m.Col1 = d.Col1 AND m.Col2 = d.Col2 AND m.Col3 = d.Col3

    WHERE m.Col1 IS NULL

    Alternatively you could use NOT EXISTS in a WHERE clause which will achieve the same (both utilize a left join albeit different types)

    Far away is close at hand in the images of elsewhere.
    Anon.

  • i shall inwardly digest your post and see what I can come up with. Thank you for taking the time for me

    Regards

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply