Binary Compare

  • So did I - and got it wrong! That was the start of my downfall....

  • This is an interesting discussion and clearly it is very easy to see why it can be confusing.

    From a value perspective the 2 HEX numbers are the same. They both have a 1 in the "ones" column therefore they are both equal.

    However this is not about value, something I see as faulty, but it is about VARBINARY and how the data is stored in SQL.

    Because there is a conversion from Hex to variable lenght binary, the process allocates 4 binary bits for each Hex character therfore we "see" left side zero padding. So 0x01 becomes 00000001 and and 0x0001 becomes 0000000000000001. Again purely as value they are the same. Each contains a 1 in the "ones" column.

    The odd behaviour comes when these 2 are compared for equality. It seems to be treated like a string and compared left to right and they don't match, rather than as a value from right to left where they would match.

    Personally I see this as explainable, but faulty, dare I say wrong behaviour.

    Either way this as something to be be very aware of when coding.

  • bknight 46549 (5/17/2016)


    This is an interesting discussion and clearly it is very easy to see why it can be confusing.

    From a value perspective the 2 HEX numbers are the same. They both have a 1 in the "ones" column therefore they are both equal.

    However this is not about value, something I see as faulty, but it is about VARBINARY and how the data is stored in SQL.

    Because there is a conversion from Hex to variable lenght binary, the process allocates 4 binary bits for each Hex character therfore we "see" left side zero padding. So 0x01 becomes 00000001 and and 0x0001 becomes 0000000000000001. Again purely as value they are the same. Each contains a 1 in the "ones" column.

    The odd behaviour comes when these 2 are compared for equality. It seems to be treated like a string and compared left to right and they don't match, rather than as a value from right to left where they would match.

    Personally I see this as explainable, but faulty, dare I say wrong behaviour.

    Either way this as something to be be very aware of when coding.

    Thank you. This was helpful.

  • So, if I understand this correctly (thanks to posts from crmitchell and bknight), VARBINARY works more like VARCHAR when it comes to comparing values.

    I agree with bknight that this seems wrong. To follow the VARCHAR example, ' x' and ' x' would still be considered logically equal. I can't see why in VARBINARY's case '0x0001' and '0x01' should not also be logically equal.

  • This was discussed on Twitter about a month ago and I found it interesting. Why are 0x00 and 0x0000 equal?

    There is a note in the binary and varbinary BOL page that when converting, values are padded on the right. I expect this is the same for comparisons, meaning that 0x00 = 0x0000 gets converted to 0x0000 = 0x0000.

    For the 1s, this would mean 0x01 = 0x0001 gets converted to 0x0100 = 0x0001, clear unequal.

    Whether this is appropriate or not, I did see a note from someone that this is ANSI standard. Padding goes on the right with zeros.

  • So this is the VARBINARY equivalent of:

    DECLARE @a VARCHAR(20) = ' '

    , @b-2 VARCHAR(20) = ' '

    , @C VARCHAR(20) = ' 1'

    , @d VARCHAR(20) = ' 1';

    IF ( @a = @b-2 )

    IF ( @C = @d )

    PRINT 'Whole';

    ELSE

    PRINT 'Half';

    ELSE

    PRINT 'None';

  • Perry Whittle (5/17/2016)


    Bob JH Cullen (5/17/2016)


    which in any integer expression are irrelevant, surely?

    This is not an integer expression.

    The code first checks the binary values for @a and @b-2, do they match, well yes they do.

    You now enter the nested IF, do @C and @d match?

    No they don't because binary values 0x01 and 0x0001 are clearly different

    Sure, but so are 0x00 and 0x0000. This is more a case that you just have to know the standard on this one.

  • patrickmcginnis59 10839 (5/18/2016)


    Perry Whittle (5/17/2016)


    Bob JH Cullen (5/17/2016)


    which in any integer expression are irrelevant, surely?

    This is not an integer expression.

    The code first checks the binary values for @a and @b-2, do they match, well yes they do.

    You now enter the nested IF, do @C and @d match?

    No they don't because binary values 0x01 and 0x0001 are clearly different

    Sure, but so are 0x00 and 0x0000. This is more a case that you just have to know the standard on this one.

    For 16, 32 bit and 64 bit architectures.

    0x00 will be stored in memory and in the cpu registers as 8 0 bits with the remaining (1, 3 or 7) bytes in the word padded with another 8 0 bits in each.

    0x0000 will be stored as 16 0 bits with the remaining (0, 2 or 6) bytes padded with 8 0 bits each

    This means that in this case both 0x00 and 0x0000 will have exactly the same binary representation.

    For other values the endianness (i.e. the order of bytes stored for the data element where we are dealing with multibyte variables) and the memory alignment (i.e. how the system pads values which do not fill an entire word) of the system will affect how the value is stored. Both of these are dependant on the machine architecture NOT upon SQL Server

  • crmitchell (5/18/2016)


    patrickmcginnis59 10839 (5/18/2016)


    Perry Whittle (5/17/2016)


    Bob JH Cullen (5/17/2016)


    which in any integer expression are irrelevant, surely?

    This is not an integer expression.

    The code first checks the binary values for @a and @b-2, do they match, well yes they do.

    You now enter the nested IF, do @C and @d match?

    No they don't because binary values 0x01 and 0x0001 are clearly different

    Sure, but so are 0x00 and 0x0000. This is more a case that you just have to know the standard on this one.

    For 16, 32 bit and 64 bit architectures.

    0x00 will be stored in memory and in the cpu registers as 8 0 bits with the remaining (1, 3 or 7) bytes in the word padded with another 8 0 bits in each.

    0x0000 will be stored as 16 0 bits with the remaining (0, 2 or 6) bytes padded with 8 0 bits each

    This means that in this case both 0x00 and 0x0000 will have exactly the same binary representation.

    For other values the endianness (i.e. the order of bytes stored for the data element where we are dealing with multibyte variables) and the memory alignment (i.e. how the system pads values which do not fill an entire word) of the system will affect how the value is stored. Both of these are dependant on the machine architecture NOT upon SQL Server

    Those are implementation issues, this shouldn't get mixed up into what we're talking about in my opinion. If computers in general couldn't differentiate between the binary values 0x00 and 0x0000 regardless of processor word width or edianness we'd be in pretty sad shape indeed 🙂 I'm ok that SQL says they're equal, but I personally maintain a program that says these values are very different and I wouldn't be able to do that if I was hamstrung by processor issues.

    This is why you can play audio and video files on any computer that recognizes them, the file specification says how the values are laid out, not the processor word length or edianness. For a closer database analogy, this illustrates my point http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc31644.1570/html/sag2/sag2417.htm

  • This is certainly an implementation issue (and a knowing of the standard that is being followed). However, that's part of why I thought it was an interesting question.

    I think there should be a default that says 0x00 and 0x0000 are equal. However perhaps there should be a choice by the programmer to decide if these should be equal or not. I have no idea how you'd do this, as the register would need to be padded in either case if it is longer than the value being stored. In that case, should we take the overhead to somehow mark a length of the value?

    This is a thorny situation, but certainly one that you need to be aware of if you work with varbinary fields.

  • Steve Jones - SSC Editor (5/18/2016)


    This is certainly an implementation issue (and a knowing of the standard that is being followed). However, that's part of why I thought it was an interesting question.

    I think there should be a default that says 0x00 and 0x0000 are equal. However perhaps there should be a choice by the programmer to decide if these should be equal or not. I have no idea how you'd do this, as the register would need to be padded in either case if it is longer than the value being stored.

    It looks like its more about specifying the size of the operand, but honestly I don't spend any time at that level. I've been spoiled by high level languages 😛

    http://www.c-jump.com/CIS77/CPU/x86/X77_0060_mod_reg_r_m_byte.htm

    In that case, should we take the overhead to somehow mark a length of the value?

    SQL has that covered, it apparently already marks the length. Otherwise how would it even know how to display the varbinary values?

    declare @a varbinary(8), @b-2 varbinary(8)

    set @a = 0x00

    set @b-2 = 0x0000

    select @a, @b-2

    This is a thorny situation, but certainly one that you need to be aware of if you work with varbinary fields.

    I agree with that, but I don't think we need to get mixed up in opcodes or registers. For that matter, you should unban Celko for long enough for him to talk about how we don't worry about implementation details when we're using our high level abstractions 😛

  • I suspect that all of this comes out of a desire to make varchar and char equality testing easy. In the case of char equality testing, it seems obvious that we don't want to consider the trailing spaces with which they are padded, so ANSI says to pad the shorter string out with spaces until it is the same length as the longer string, then check equality. In order to make it easy to compare a varchar and a char, the same logic is also applied. It's when it gets applied to comparing a varchar with a varchar that it becomes non-intuitive, but in the interest of consistency, even for varchars SQL defines '' = ' ' as TRUE.

    With that in place, consider that the null byte is the binary equivalent of a space in a char field. The same logic then applies - we want comparison of binary values to be straightforward, and thus in the interest of consistency SQL pads varbinary values with null bytes. Thus any length of null bytes is equal to an empty string, and this has nothing to do with processor endianness, CPU data register widths, etc.

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply