SQL DB write degradation

  • Hi,

    I am a newbie to SQL Server administration and as fortunate as I am, I got a production performance issue to troubleshoot right away. Here's how the issue emerged -

    Old World:

    Application and SQL Server resided on the same Windows server. The application used the Microsoft ODBC driver to write to the database. It read from and wrote back to the same database on the same server and achieved a write speed of ~11 MBps.

    Current World facing issue:

    The application moved over to a UNIX Solaris server, with SQL Server still on the same Windows box. The application now uses the DataDirect ODBC driver to read from and write back to the same database, and achieves a write speed of only ~1.2 MBps.

    My understanding is that this must be related to what changed in the environment, which seems to be

    - the introduction of the network latency and

    - the ODBC vendor

    A simple file transfer from the App server to the DB server demonstrates a speed of 24 MBps, so the network can definitely support the expected throughput. As for the ODBC vendor, I was told the driver is tweaked as well as it can be. Everyone seems to be pointing at the DB not keeping up with the writes and slowing down the process. My dilemma: nothing changed on the DB side to cause this, so why is it slowing down?
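    As an illustrative cross-check (my own arithmetic; the file size and duration below are made-up stand-ins, not figures from the actual FTP test), a timed transfer's throughput can be computed directly, and 24 MBps leaves roughly 20x headroom over the 1.2 MBps the application manages:

```python
# Back-of-envelope throughput check. The file size and duration are
# hypothetical stand-ins for whatever the FTP test actually moved.
def throughput_mbps(bytes_transferred: float, seconds: float) -> float:
    """Transfer rate in MB/s (1 MB = 1024 * 1024 bytes)."""
    return bytes_transferred / seconds / (1024 * 1024)

network_mbps = throughput_mbps(1440 * 1024 * 1024, 60)  # e.g. 1440 MB in 60 s
app_write_mbps = 1.2  # the application's observed write rate

print(round(network_mbps, 1))                # 24.0
print(round(network_mbps / app_write_mbps))  # 20 -- network headroom
```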

    SQL Server Specifics

    Server Version - SQL 2000 Standard

    Memory - 4 GB

    Database on - F: (Spanned Volume) 487 GB with 73% free space

    Log file on - E: (Primary Partition with C: ) with 82% free space

    Database size - 65GB

    Any help you could provide would be greatly appreciated!!!!

    Thanks!

    Leon

  • Run SQLIO to test the writes on the db server. That should give you a sense of the disk bandwidth there, or surface issues on that side.

    Not sure what you can run on the other server to test its ability to send the file. Maybe send a file to a file server?

  • Thanks .... as for the App server to the DB server bandwidth, I ran a simple FTP to see what the network throughput was, and I am getting around 24 MBps consistently.

  • Steve, Thanks again

    I am continuing to check with SQLIO even as I post this. However, I wanted to get this out sooner rather than later -

    C:\Program Files\SQLIO>sqlio -kW -s10 -frandom -o8 -b8 -LS -Fparam.txt timeout /T 60

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 1778.72

    MBs/sec: 13.89

    latency metrics:

    Min_Latency(ms): 0

    Avg_Latency(ms): 34

    Max_Latency(ms): 4739

    C:\Program Files\SQLIO>sqlio -kW -s360 -frandom -o8 -b64 -LS -Fparam.txt timeout /T 60

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 589.99

    MBs/sec: 36.87

    latency metrics:

    Min_Latency(ms): 3

    Avg_Latency(ms): 107

    Max_Latency(ms): 1171

    C:\Program Files\SQLIO>sqlio -kW -s360 -frandom -o8 -b128 -LS -Fparam.txt timeout /T 60

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 307.92

    MBs/sec: 38.49

    latency metrics:

    Min_Latency(ms): 6

    Avg_Latency(ms): 207

    Max_Latency(ms): 1304

    Param.txt for all 3 tests above (the fields being file path, thread count, CPU affinity mask, and target file size in MB, if I read the SQLIO docs right) -

    F:\SQL_2000\MSSQL\DB_Data_Files\T_IBNR_INFO_RTC.dat 8 0x0 622

    Doesn't this tell me that the DB disks can obviously take as much as 29 MBps for writes?
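    (A quick consistency check on those runs, my own arithmetic rather than anything from the thread: SQLIO's MBs/sec figure is just IOs/sec multiplied by the -b block size.)

```python
# Cross-check SQLIO's reported throughput: MB/s = IOs/sec * block size (KB) / 1024.
def mbs_per_sec(ios_per_sec: float, block_kb: int) -> float:
    return ios_per_sec * block_kb / 1024

# The three runs above used -b8, -b64 and -b128.
for ios, kb in [(1778.72, 8), (589.99, 64), (307.92, 128)]:
    print(f"-b{kb}: {mbs_per_sec(ios, kb):.2f} MBs/sec")
# Prints 13.90, 36.87 and 38.49 -- within rounding of the reported figures.
```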

    Leon

  • I'm not sure about the parameters off the top of my head, but it looks like you've got the capability there, though the 13 MB/sec figure is a little disconcerting.

    How about writing from the source to this disk? Any testing there on raw file throughput?

  • Steve,

    When you say writing from the source to the disk, do you mean the actual application on the UNIX box? I am doing it right now, and am seeing around 1.43 MBps being written to the Log and 2.4 MBps written to the Database. Let me know if I am misconstruing something.

    Two questions -

    1. Given that the log resides on E:, which is also where the paging file lives, do you think there could be contention on E: between the log writes and any paging that happens as data comes in from the App server over the network?

    2. I am seeing that the writes to the database disk keep going on and off, I mean, subsequent samples show a series of writes as follows -

    0 - 2.4 MBps - 0 - 2.4 MBps - ....
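    (A back-of-envelope observation of my own, not a diagnosis: if the samples really alternate between 0 and 2.4 MBps, the sustained average is the ~1.2 MBps the application reports, which would fit SQL Server flushing dirty pages in checkpoint bursts rather than a disk that cannot keep up.)

```python
# Average sustained rate of the alternating perfmon samples described above.
samples_mbps = [0.0, 2.4, 0.0, 2.4, 0.0, 2.4]

avg = sum(samples_mbps) / len(samples_mbps)
print(round(avg, 6))  # 1.2 -- matches the application's observed write rate
```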

    Thanks!

    Leon

  • Maybe I'm missing something, but if I understand, the Unix box writes to the SQL instance on a remote Windows box.

    Therefore you have 3 places where there could be bottlenecks.

    1. SQL not handling the volume of writes to the db and/or log. One could bottleneck the other.

    2. The network between them not supporting volume of bandwidth needed.

    3. The Unix box not sending quickly enough or having disk issues itself.

    So, I was wondering if the Unix box can be tested to ensure that it can pump out 10-20 MBps or more by sending a flat file to a file server or the SQL box. If you can verify that works, then you've pretty much eliminated #2.

    I'd check with Data Direct as well and be sure they don't have some strange issue you're hitting. They're good guys and should help you ensure things are optimized.

  • Thanks a lot Steve! Yes, I checked by changing the outgoing connection from the app (UNIX to Windows) to FTP, writing to a flat file on the same SAN as the database, and I am getting 24 MBps throughput.

    Also, given that SQLIO shows the disk can take much more for writes from SQL, the ODBC driver is the only item that needs investigation; let me touch base with the vendor and see if I get anywhere with this. Thanks again!

    Leon

  • I guess I jumped the gun when I said the ODBC implementation is the only item pending investigation -

    The thing you mentioned about writes to the log bottlenecking those to the DB - is there a way I could prove it (I am collecting perfmon counter logs should those be needed)? Also, if that is indeed the case, why was that not an issue in the old world where we used the Microsoft ODBC driver?

    Also, I am being told now that a change was made to the recovery model of the database, from Bulk-Logged to Simple, in the belief it would help performance. Would this have anything to do with it? I understand databases in the Simple recovery model have their logs truncated on checkpoints.

    Leon

  • Since the log writes must complete before the database writes, if the log is slow, it can slow everything.

    If nothing changed on the disks, I'm not sure of the issue. Simple recovery mode means that once a transaction is committed in its entirety, it is cleared from the log at the next checkpoint. If anything, I'd expect that to run faster, albeit with less recovery capability.

  • I know I am kinda comparing apples to oranges here, but in order to ensure the log writes were not the bottleneck, I performed the following test -

    1. Using SQL query analyzer, I ran a transaction (begin transaction, insert 10000 rows commit transaction) 40 times in succession, and saw what the throughput was. It seems to be around 7 MBps which is around 1/7 times that seen with the ODBC. So I know the logs could take in data faster

    2. I am hoping that, given there were transaction boundaries defined, the sequence of events from a database perspective (like writing to the log first and then to the database) would have remained the same in spite of this using the native SQL client.

    3. Of course, the actual application has to traverse a network, but as I have maintained, the network throughput is around 24 MBps, so that seems to be out of the circle of suspicion.

    Would you know what happens when the ODBC driver connects to SQL for running transactions? Does it write out to the address space of a network utility that then passes it over to sqlservr? I can't seem to explain it. Or do you think this is a strong enough case to have DataDirect acknowledge the issue is internal to the ODBC driver and troubleshoot it?
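    One way to reason about the question (a sketch of my own, not DataDirect's actual internals): if the driver effectively sends one statement and waits for a network round trip before sending the next, throughput becomes bound by latency rather than bandwidth, and a quite modest per-statement delay reproduces rates in the 1.2 MBps range no matter how fast the network or disks are. The row size and round-trip time below are assumptions for illustration only:

```python
# Hypothetical model of a latency-bound driver: one batch per round trip
# caps throughput at bytes_per_roundtrip / rtt, regardless of link bandwidth.
def latency_bound_mbps(bytes_per_roundtrip: float, rtt_seconds: float) -> float:
    return bytes_per_roundtrip / rtt_seconds / (1024 * 1024)

# Assumed: ~1.2 KB sent per statement, ~1 ms round trip on the LAN.
print(round(latency_bound_mbps(1.2 * 1024, 0.001), 2))  # 1.17
```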

    Thanks for all the guidance!

    Leon

  • Correction to -

    It seems to be around 7 MBps which is around 1/7 times that seen with the ODBC. So I know the logs could take in data faster

    I really meant

    It seems to be around 7 MBps which is around 7 times that seen with the ODBC. So I know the logs could take in data faster

  • It seems pretty obvious to me that this issue absolutely, positively cannot be related to the SQL Server itself. When the app was running locally you got significantly more throughput (however you decide to measure it) than you get with the app running remotely. Since you have validated network throughput many times, it seems obvious that it is either the new application code (now running on Unix instead of Windows), the new SQL connection driver (or some issue caused by it), or both.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • While I was of the same opinion initially, I got the feeling we tend, being SQL Server DBAs, to overlook a few other areas that should be investigated before narrowing it down to a specific component. Here are a few we wanted to make sure could meet our performance expectations -

    1. The Network latency

    2. The Network card performance of the DB server ( I am not sure if this ever played any role in the old world)

    3. The App server network and/or card performance

    However, with the tests executed now, we have sufficiently isolated this to the ODBC driver.

    Thanks!

    Leon
