columns unequal? (column might be null)

  • I am try to find unequal rows of two tables t1 and t2 on two different servers.

    To find where two columns are unequal I am currently using

    t1.Name<>t2.Name

    or t1.Name is null and t2.Name is not null 

    or t1.Name is not null and t2.Name is null

    If the columns may be null, does ti really need five expression in standard SQL to do this? Am I to stupid? Is there a special SQL-server function to do this? If a table has some 30 columns this will produce a lot of code...

    I know about SQL-Servers SET ANSI_NULLS OFF so we could write

    t1.Name<>t2.Name

    but this reduces the cases where this statement can be used. However, would it run faster if I do it this way?

    Thanks for any hints

    Andreas

  • I would use a where exists (or not exists) statement.

    Select [column] from

    a

    where exists (select * from

    b where a.id=b.id)

  • Andreas the code that you posted is exactly what is needed for differences on nullable columns you may skip the is null part for non nullable.

    Usually the fastest way to perform the comparison is to use

    CHECKSUM_AGG(BINARY_CHECKSUM(*))

    like:

    SELECT

    (SELECT   CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM t1)

     -

    (SELECT   CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM t2) /* 0 means OK*/

     


    * Noel

  • Neol, thanks for the brilliant answer!

    The BINARY_CHECKSUM(*) function was exactly what I needed! Let's hope it uses a good hash function so I will never miss any changed rows...

    BINARY_CHECKSUM(t1.ColX)<>BINARY_CHECKSUM(t2.ColX)

    also seems to do the same as my much longer unequal expression above 🙂

    Too bad this function does not directly support a syntax like

    WHERE BINARY_CHECKSUM(t1.*)<> BINARY_CHECKSUM(t2.*)

    (no tablename allowed, only * or list of column names)

    to generically compare all columns from two tables and list all different rows.

    So now I am using

    WHERE

    (select binary_checksum(*) from TableVersion1 where TableVersion1.ID=t1.ID)

    <>

    (select binary_checksum(*) from TableVersion2 where TableVersion2.ID=t2.ID)

    Any shortcut for this?

    Thanks

    Andreas

  • Re: Any shortcut

    This might be faster?

    SELECT ta.ID as ta_id, ta_check, tb.ID AS tb_id, tb_check

    FROM

      (SELECT ID, binary_checksum(*) AS ta_check FROM TableVersion1) AS ta

    INNER JOIN

      (SELECT ID, binary_checksum(*) AS tb_check FROM TableVersion2) AS tb

    ON ta.ID = tb.ID

    WHERE ta_check <> tb_check

    Alternative without the INNER JOIN notation:

    SELECT ta.ID as ta_id, ta_check, tb.ID AS tb_id, tb_check

    FROM

      (SELECT ID, binary_checksum(*) AS ta_check FROM TableVersion1) AS ta

    , (SELECT ID, binary_checksum(*) AS tb_check FROM TableVersion2) AS tb

    WHERE ta.ID = tb.ID

      AND ta_check <> tb_check

     


    Regards,

    Bob Monahon

  • Just a note on the use of a column with BINARY_CHECKSUM instead of * as has been mentioned above. Consider this:

    CREATE TABLE #t

    (

     i1 INT IDENTITY

     , c1 VARCHAR(50)

    )

    INSERT INTO #t VALUES ('a                b               c')

    INSERT INTO #t VALUES ('ab               c')

    SELECT

    (

     SELECT BINARY_CHECKSUM(c1) FROM #t WHERE i1 =1)

     -

     (SELECT BINARY_CHECKSUM(c1) FROM #t WHERE i1 =2)

    DROP TABLE  #t

               

    -----------

    0

    (1 row(s) affected)

    Seeing the result of 0, one might think both columns are equal

    Thanks to Adam Machanic, who originally posted the idea on the newsgroups.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • I have a similar problem.

    We base a calculation performed by a set of stored procs, on a set of input record tables.

    Recently, we've been asked to implement the possibility to recalculate only the changed records from a child-set as compared to a parent set (eg say you correct the input records created yesterday, and want to rerun them as today's new set. You then identify today's set as being the child of yesterday's set. The procedure picks up this parenthood, and starts comparing the records from parent and child, to identify those records that were changed).

    Using BINARY_CHECKSUM on every meaningful field, returned the strange impression of "yesterday's amount *1000 = yesterday's amount" ??? Apparently, the BINARY_CHECKSUM goodie has problems with NUMERICs.

    Anyone willing to do a suggestion on how to deal with this?

  • --DROP TABLE t_1

    --DROP TABLE t_2

     create table t_1

    (int1 int, int2 int constraint pk_1 primary key clustered (int1,int2), int3 int)

     create table t_2

    (int1 int, int2 int constraint pk_2 primary key clustered (int1,int2), int3 NUMERIC(10,3) )

    insert into t_1 values (1,1,1000.000)

    insert into t_1 values (1,2,100.000)

    insert into t_1 values (2,1,500.000)

    insert into t_1 values (2,2,500.000)

     

    insert into t_2 values (1,1,100.000)

    insert into t_2 values (1,2,1)

    insert into t_2 values (2,1,400)

    insert into t_2 values (2,2,500)

    SELECT   BINARY_CHECKSUM( l.int3,l.int2),

             binary_checksum(r.int3,r.int2),

             CASE WHEN BINARY_CHECKSUM( l.int3,l.int2) <> binary_checksum(r.int3,r.int2)

                  THEN 'CHANGED'

                  ELSE 'UNCHANGED'

              END, *

    from t_2 l full outer join t_1 r on l.int1 = r.int1 AND l.int2 = r.int2

     

  • Notice the nice constatation that 100.000 = 1000, and 1.000 = 100...

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply