Storing IPv4 Addresses for Performance

  • Elliott W (8/25/2009)


    Guys, I just got to thinking about this: this kind of conversion seems right up SQLCLR's alley. Once it is in memory I would think it is very fast..

    This is effectively the binary(4) method in SQL..

    Take the string IP address and get 4 bytes.. they would probably be SqlBytes in the CLR..

    ' Parse the dotted-quad string; IPAddress.Parse is a Shared method,
    ' so there is no need for a throwaway instance to call it on.
    Dim _oIP1 As System.Net.IPAddress = System.Net.IPAddress.Parse("192.168.1.100")

    ' Pull out the four octets in network order: 192, 168, 1, 100.
    Dim _oAddr As Byte() = _oIP1.GetAddressBytes()

    Take 4 bytes and get the string back..

    ' The four octets, e.g. as read back from a varbinary(4) column.
    Dim _oInBytes As Byte() = New Byte() {192, 168, 1, 100}

    ' The IPAddress constructor accepts the byte array directly,
    ' and ToString() gives back the dotted-quad form.
    Dim _oIP As New System.Net.IPAddress(_oInBytes)
    Debug.WriteLine("String IP Addr: " & _oIP.ToString())

    These are just fragments, but you can go from the string to varbinary(4) in a couple of lines of code and from the varbinary(4) back to the string in a couple more. Once it is loaded I would think it would be really fast..
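    On the SQL side it would just be a CREATE ASSEMBLY plus a CREATE FUNCTION ... EXTERNAL NAME binding, something like this (the assembly, class, and function names below are only placeholders, not necessarily what I'll end up shipping):

    -- Register the compiled SQLCLR assembly (path and names are placeholders).
    CREATE ASSEMBLY IPAddressCLR
    FROM 'C:\Assemblies\IPAddressCLR.dll'
    WITH PERMISSION_SET = SAFE;
    GO

    -- Bind a T-SQL scalar function to the CLR method.
    CREATE FUNCTION dbo.fn_IPStringToVarbinary (@IPString nvarchar(15))
    RETURNS varbinary(4)
    AS EXTERNAL NAME IPAddressCLR.[UserDefinedFunctions].fn_IPStringToVarbinary;
    GO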

    Thoughts? Flames?

    CEWII

    Good point... No flames... just thoughts.... If all you intend to do is split the octets, then the six line T-SQL solution that uses PARSENAME is almost as fast, requires no externally compiled code, and requires no call to an external assembly.
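    Just to make that concrete, the split itself is nothing fancier than this (the variable is only for illustration):

    -- PARSENAME treats the string as a four-part name; part 4 is the leftmost octet.
    DECLARE @IP varchar(15) = '192.168.1.100';

    SELECT CONVERT(tinyint, PARSENAME(@IP, 4)) AS Octet1,
           CONVERT(tinyint, PARSENAME(@IP, 3)) AS Octet2,
           CONVERT(tinyint, PARSENAME(@IP, 2)) AS Octet3,
           CONVERT(tinyint, PARSENAME(@IP, 1)) AS Octet4;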

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Good point... No flames... just thoughts.... If all you intend to do is split the octets, then the six line T-SQL solution that uses PARSENAME is almost as fast, requires no externally compiled code, and requires no call to an external assembly.

    --Jeff Moden

    I know there is a lot of FUD in the DBA community about SQLCLR, but being able to code in VB and read C#, I have an understanding of what the developer is trying to accomplish and a certain level of trust when I can examine the code. Some people see SQLCLR and immediately say no without consideration; I'm not saying that you are one of them. But people who do that are acting in what *I* (the world according to Elliott) feel is an ignorant way. There are lots of things you can accomplish in T-SQL without resorting to SQLCLR, but quite a few things make more sense in SQLCLR. So sayeth my personal experience.. So climbing off my soapbox..

    I have it coded and am getting ready to test it. When I'm done I will pack everything up and include it with my response.. For everyone to play with..

    CEWII

  • Well it looks like conversion from a string to a varbinary(4) is rockin fast with SQLCLR..

    (10000 row(s) affected)

    -- Output of statistics time on for 10 runs in a row

    SQL Server Execution Times:
      CPU time =  62 ms, elapsed time = 200 ms.
      CPU time =  62 ms, elapsed time = 247 ms.
      CPU time =  63 ms, elapsed time = 210 ms.
      CPU time =  94 ms, elapsed time = 199 ms.
      CPU time = 125 ms, elapsed time = 205 ms.
      CPU time =  46 ms, elapsed time = 161 ms.
      CPU time =  47 ms, elapsed time = 145 ms.
      CPU time =  16 ms, elapsed time = 253 ms.
      CPU time =  63 ms, elapsed time = 170 ms.
      CPU time = 109 ms, elapsed time = 183 ms.

    That's on a 2 GHz laptop with 2 GB of RAM, running SQL Server 2008 Express. I wonder what it would look like on server-class hardware..
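    For anyone following along, the timing harness is nothing fancier than this (function and table names are placeholders until I attach the real code):

    -- Time the CLR conversion over the whole test table; repeat for 10 runs.
    SET STATISTICS TIME ON;

    SELECT dbo.fn_IPStringToVarbinary(IPString)   -- placeholder CLR scalar function
    FROM   dbo.IPAddresses;                       -- 10,000-row test table

    SET STATISTICS TIME OFF;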

    Now I'm going to test the retrieval of the varbinary(4) back to a string.. When I'm all done I will attach the code for all.

    -- Edited to protect the guilty..--

    CEWII

  • [ See after next post ... ]

  • And the conversion from varbinary(4) back to string is also rockin fast..

    (10000 row(s) affected)

    -- Output of statistics time on for 10 runs in a row

    SQL Server Execution Times:
      CPU time = 46 ms, elapsed time = 122 ms.
      CPU time = 16 ms, elapsed time = 109 ms.
      CPU time =  0 ms, elapsed time = 110 ms.
      CPU time = 47 ms, elapsed time = 104 ms.
      CPU time = 16 ms, elapsed time =  82 ms.
      CPU time = 63 ms, elapsed time = 115 ms.
      CPU time = 47 ms, elapsed time = 113 ms.
      CPU time = 16 ms, elapsed time =  83 ms.
      CPU time =  0 ms, elapsed time = 108 ms.
      CPU time = 15 ms, elapsed time =  83 ms.

    I have attached the code: the project for the SQLCLR object, the compiled SQLCLR DLL, scripts to install the DLL and build the necessary tables and functions, and the queries to test the times.

    --Edited to protect the guilty.. --

    I also want to give credit to loganmerazzi for the CLR idea; I hadn't realized it at the time, but he mentioned it first.

    CEWII

  • I respectfully disagree. As has already been discussed, the performance hit is due to the use of UDFs. Creating a computed column with the PARSENAME method demonstrates how fast PARSENAME really is.

    [IntegerComputed] AS (
          (CONVERT([int], PARSENAME([StringFixed], (4)), 0) - (128)) * (16777216)
        +  CONVERT([int], PARSENAME([StringFixed], (3)), 0) * (65536)
        +  CONVERT([int], PARSENAME([StringFixed], (2)), 0) * (256)
        +  CONVERT([int], PARSENAME([StringFixed], (1)), 0)
    )

    SELECT [IntegerComputed] FROM [bdq].[dbo].[IPAddresses]

    (100000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 172 ms, elapsed time = 979 ms.

    Which would make the SQLCLR method 1/6th slower than native T-SQL.
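    For anyone who wants to reproduce the test, a table along these lines is all it takes; the computed column is the expression above, and only the base column definition (varchar(15) for the dotted string) is assumed here:

    CREATE TABLE dbo.IPAddresses
    (
        StringFixed     varchar(15) NOT NULL,   -- dotted-quad text, e.g. '192.168.1.100'
        -- The -128 offset on the first octet keeps the result inside the signed int range.
        IntegerComputed AS (
              (CONVERT(int, PARSENAME(StringFixed, 4)) - 128) * 16777216
            +  CONVERT(int, PARSENAME(StringFixed, 3)) * 65536
            +  CONVERT(int, PARSENAME(StringFixed, 2)) * 256
            +  CONVERT(int, PARSENAME(StringFixed, 1))
        )
    );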

  • You may be right. I used 10,000 rows; I went back to the article and it mentions 10,000 loops but doesn't say how many rows. I see that you had 100,000 in your tests above.. I shall retest..

    CEWII

  • Ok, I think Jean-Sebastien might have me..

    String to varbinary

    (100000 row(s) affected)

    SQL Server Execution Times (10 runs in a row):
      CPU time = 547 ms, elapsed time = 1481 ms.
      CPU time = 688 ms, elapsed time = 1579 ms.
      CPU time = 719 ms, elapsed time = 1586 ms.
      CPU time = 609 ms, elapsed time = 1668 ms.
      CPU time = 656 ms, elapsed time = 1643 ms.
      CPU time = 656 ms, elapsed time = 1585 ms.
      CPU time = 625 ms, elapsed time = 1697 ms.
      CPU time = 687 ms, elapsed time = 1601 ms.
      CPU time = 500 ms, elapsed time = 1618 ms.
      CPU time = 672 ms, elapsed time = 1647 ms.

    varbinary back to string

    (100000 row(s) affected)

    SQL Server Execution Times (10 runs in a row):
      CPU time = 203 ms, elapsed time = 500 ms.
      CPU time = 172 ms, elapsed time = 461 ms.
      CPU time = 203 ms, elapsed time = 464 ms.
      CPU time = 172 ms, elapsed time = 460 ms.
      CPU time = 156 ms, elapsed time = 425 ms.
      CPU time = 219 ms, elapsed time = 465 ms.
      CPU time = 203 ms, elapsed time = 526 ms.
      CPU time = 140 ms, elapsed time = 436 ms.
      CPU time = 188 ms, elapsed time = 501 ms.
      CPU time = 156 ms, elapsed time = 502 ms.

    It doesn't look like performance is linear though; an order of magnitude difference in source data did not translate into run times 10 times slower..

    10,000 Rows (averages)

    string to varbinary 197ms

    varbinary to string 103ms

    100,000 Rows (averages)

    string to varbinary 1611ms

    varbinary to string 474ms

    I can only counter that this was on a laptop, so mileage may vary..

    We'll see what everybody has to say..

    CEWII

  • Elliott W (8/25/2009)


    Good point... No flames... just thoughts.... If all you intend to do is split the octets, then the six line T-SQL solution that uses PARSENAME is almost as fast, requires no externally compiled code, and requires no call to an external assembly.

    --Jeff Moden

    I know there is a lot of FUD in the DBA community about SQLCLR, but being able to code in VB and read C#, I have an understanding of what the developer is trying to accomplish and a certain level of trust when I can examine the code. Some people see SQLCLR and immediately say no without consideration; I'm not saying that you are one of them. But people who do that are acting in what *I* (the world according to Elliott) feel is an ignorant way. There are lots of things you can accomplish in T-SQL without resorting to SQLCLR, but quite a few things make more sense in SQLCLR. So sayeth my personal experience.. So climbing off my soapbox..

    I have it coded and am getting ready to test it. When I'm done I will pack everything up and include it with my response.. For everyone to play with..

    CEWII

    I have no FUD about SQLCLR nor do I believe that most of the rest of the DBA community does, either. I'll just state that except for certain RegEx functionality (which most good developers and DBA's can usually come real close to beating) and certain mathematical functionality, SQLCLR has met the same fate as Cursors and While Loops... people use it to overcome their lack of knowledge of T-SQL and set based programming and it usually costs a bit more in performance, resource usage, and maintenance hassles. The reason why DBA's don't care for most CLR's is because of the problems I mentioned... it usually costs a bit more in performance, resource usage, and maintenance hassles just like you and Jean-Sebastien just proved (laptop mileage notwithstanding, of course :-D). It has nothing to do with FUD. In fact, I'd like to turn this around and say there's more FUD about T-SQL and set based programming on the part of many developers than there will ever be by DBA's on the subject of SQLCLR's. It's why developers turn to Cursors, While Loops, and CLR's to begin with. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • And I gotta say that computed column performs pretty well; it's too bad that SQL doesn't consider it deterministic, because it could then be persisted..

    I got to thinking that there are a couple more CLR paths I could go down..

    Here are the breakdowns since I don't want to post any more code tonight..

    100,000 Rows (averages)

    varbinary to string 474ms (CLR - IP address Method)

    string to varbinary 1611ms (CLR - IP Address Method)

    string to varbinary 648ms (CLR - string breaker)

    string to int 539ms (CLR - string breaker)

    string to int 406ms (computed column)

    Conclusions:

    If you want to store the data as text in the table, you can use the computed column and get pretty good performance on compares, but it costs you at least 11 bytes of storage per record.

    The varbinary method is a little slower than the int method but previous posts may provide insight into the limitations imposed.

    If you want to store the data as an int and use a computed column to make it human readable, I think you would save some disk space but the performance would be the same as if you stored the text..

    If you plan to convert back and forth then the CLR will likely perform better.
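    As a rough sketch of the "store it small, compute the readable form" idea (using binary(4) here because it side-steps the signed-int offset; table and column names are just placeholders):

    -- Store the address in 4 bytes and compute the dotted-quad text on the fly.
    CREATE TABLE dbo.IPAddressesBin
    (
        IPBinary  binary(4) NOT NULL,
        IPDisplay AS (
              CONVERT(varchar(3), CONVERT(tinyint, SUBSTRING(IPBinary, 1, 1))) + '.'
            + CONVERT(varchar(3), CONVERT(tinyint, SUBSTRING(IPBinary, 2, 1))) + '.'
            + CONVERT(varchar(3), CONVERT(tinyint, SUBSTRING(IPBinary, 3, 1))) + '.'
            + CONVERT(varchar(3), CONVERT(tinyint, SUBSTRING(IPBinary, 4, 1)))
        )
    );

    -- e.g. SELECT IPDisplay FROM dbo.IPAddressesBin WHERE IPBinary = 0xC0A80164;  -- '192.168.1.100'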

    CEWII

  • Jeff,

    I had a reply just about done then I hit the back button.. So here is the really short version..

    I want the developers reading this to know that all tools should be on the workbench, and to keep an eye out for the right tool for the job. When you can do set-based operations, it is best to do them. SQLCLR is a tool; it doesn't fit all or even most scenarios. In this scenario the computed column is a good solution provided you store the string and compute the int, but it costs you ~11 bytes/record. I think a better approach is to store the int and compute the string: you save about ~11 bytes/record and don't lose functionality. Also, in this scenario we can show that if you are going to be converting back and forth, the CLR will likely perform better. Tradeoffs, speed for convenience..

    I unfortunately have seen way too many DBAs flat out decline SQLCLR even when it could be shown to be the best solution. What really turned me to the dark side was a talk I attended at Dev Connections where (I think) Gert Drapers was showing that for some situations, like text manipulation, a CLR module will hands-down outperform T-SQL. And I think DBAs don't like CLR modules because they are opaque and they can't look inside. This is why the source code needs to be available to DBAs, and the DBAs need to be knowledgeable enough to read it..

    Short answer: there are scenarios in the IP address discussion where the computed column is a clear winner; outside of that, it gets a lot murkier what the best solution is.

    CEWII

  • I wonder if there isn't a way to 'force' or 'convince' SQL Server that the computation IS deterministic which would allow it to be persisted...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I'm thinking the problem is that PARSENAME is probably tagged as non-deterministic.. Not sure why; if you feed the same value in, you should always get the same results..

    I think I could make the CLR objects deterministic, I just have to add an attribute..
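    For the computed column itself, SQL Server will tell you directly what it thinks (table and column names as in the earlier posts):

    -- Ask whether the computed column is considered deterministic; only deterministic
    -- (and precise) computed columns can be marked PERSISTED or indexed.
    SELECT COLUMNPROPERTY(OBJECT_ID('dbo.IPAddresses'), 'IntegerComputed', 'IsDeterministic') AS IsDeterministic;

    -- If that ever returns 1, persisting it is just:
    --   ALTER TABLE dbo.IPAddresses ADD IntegerPersisted AS ( ...same expression... ) PERSISTED;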

    I did a little further checking; the last three lines below are the new data, the rest is from before.

    100,000 Rows (averages)

    varbinary to string 474ms (CLR - IP address Method)

    string to varbinary 1611ms (CLR - IP Address Method)

    string to varbinary 648ms (CLR - string breaker)

    string to int 539ms (CLR - string breaker)

    string to int 406ms (computed column)

    int to string 525ms (CLR - reverse of computed column)

    select string column 441ms (no computations, or CLR)

    select int column 395ms (no computations, or CLR)

    CEWII

  • Nice supplement to the article I wrote there 😉

    The code extracts the class of the IP Address 😉
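    The class check itself is just a range test on the first octet, something along these lines (not necessarily the exact code from the article):

    -- Classful ranges are determined entirely by the first octet.
    DECLARE @FirstOctet tinyint = CONVERT(tinyint, PARSENAME('192.168.1.100', 4));

    SELECT CASE
               WHEN @FirstOctet < 128 THEN 'A'   -- 0-127
               WHEN @FirstOctet < 192 THEN 'B'   -- 128-191
               WHEN @FirstOctet < 224 THEN 'C'   -- 192-223
               WHEN @FirstOctet < 240 THEN 'D'   -- 224-239 (multicast)
               ELSE 'E'                          -- 240-255 (reserved)
           END AS IPClass;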

    @++ 😉

  • Elliott W (8/26/2009)


    I want the developers reading this to know that all tools should be on the workbench, and to keep an eye out for the right tool for the job. When you can do set-based operations, it is best to do them. SQLCLR is a tool; it doesn't fit all or even most scenarios. In this scenario the computed column is a good solution provided you store the string and compute the int, but it costs you ~11 bytes/record. I think a better approach is to store the int and compute the string: you save about ~11 bytes/record and don't lose functionality. Also, in this scenario we can show that if you are going to be converting back and forth, the CLR will likely perform better. Tradeoffs, speed for convenience..

    I unfortunately have seen way too many DBAs flat out decline SQLCLR even when it could be shown to be the best solution. What really turned me to the dark side was a talk I attended at Dev Connections where (I think) Gert Drapers was showing that for some situations, like text manipulation, a CLR module will hands-down outperform T-SQL. And I think DBAs don't like CLR modules because they are opaque and they can't look inside. This is why the source code needs to be available to DBAs, and the DBAs need to be knowledgeable enough to read it..

    Short answer: there are scenarios in the IP address discussion where the computed column is a clear winner; outside of that, it gets a lot murkier what the best solution is.

    CEWII

    heh... understood... I just don't want developers to use a tool just because it's there... I want them to use it for all the right reasons as I suspect you do.

    And, my solution didn't require 11 bytes as you suggest... it only required 4 and, because of the computed columns, it allowed greatly expanded functionality without further obfuscation of the IP address.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions
