Connection Issues with Server 2003

  • Lots of info here, so please bear with me. To begin with, our production SQL environment is SQL Server 2008 (NOT R2).

    Our server administrator rolled out the March Windows Updates to our infrastructure on Sunday afternoon. From that point on, we started having trouble connecting to all of our SQL Server instances from our Windows Server 2003 boxes.

    Our issue first presented itself when we tried to connect to the database on our production server through our custom program. When trying to connect, our application returns the following error:

    "[DBNETLIB][ConnectionOpen (SECDoClientHandshake()).]SSL Security error."

    This seems odd, since "Force Protocol Encryption" is set to No in our configuration. I then tried to create an ODBC connection (via a System DSN) using SQL Server Native Client 10.0 and received the following error when trying to get the default settings from the server:

    "Connection failed:
    SQLState: '08001'
    SQL Server Error: 10054
    [Microsoft][SQL Server Native Client 10.0]TCP Provider: An existing connection was forcibly closed by the remote host.

    Connection failed:
    SQLState: '08001'
    SQL Server Error: 10054
    [Microsoft][SQL Server Native Client 10.0]Client unable to establish connection"

    I log failed logins on my SQL server; however, there were no errors in my logs associated with the above. My next step was to try to telnet to my SQL instance on port 1433. I know telnet will not complete a login, but it should at least register an error in my SQL logs. However, when I try it, the connection fails and no SQL error is logged. I verified that an error log entry does get created by telnetting from another server (2008), which gives me this error (as expected):

    "Length specified in network packet payload did not match number of bytes read; the connection has been closed"

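    When checking many servers at once, the telnet test can be scripted. The following is a minimal Python sketch, not something used in this thread. Like telnet, it only shows whether the TCP handshake to port 1433 completes; it sends no data after connecting.

```python
import socket


def probe_tcp_port(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Like the telnet test, this only proves the TCP handshake completes.
    No TDS data is sent, so it says nothing about SQL Server logins.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, or DNS failure.
        return False
```

    Usage: probe_tcp_port("sqlhost.example.local"), where the host name is hypothetical.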
    OK, so it seems like a connection issue. I ran Wireshark on my SQL server to verify that I could see incoming connections from the 2003 server, and it showed nothing. That immediately told me it was a firewall or antivirus issue. I verified that Windows Firewall is off on both the client and the SQL server (it was never on to begin with), then disabled antivirus on the client. Now when I run Wireshark, I can see the incoming connection attempts. What's odd this time is that I see the following block over and over again until the connection times out and fails on the client:

    Source      Destination  Protocol  Length  Info
    [ClientIP]  [Host IP]    TCP       60      1947 → 1433 [ACK] Seq=1 Ack=1 Win=65535 Len=0
    [ClientIP]  [Host IP]    TDS       106     TDS7 pre-login message
    [Host IP]   [ClientIP]   TDS       91      Response
    [ClientIP]  [Host IP]    TDS       132     TDS7 pre-login message

    This same block repeats roughly 20 times in the 30 seconds before the connection times out. There are no errors in the SQL log for any of the above.
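    For anyone decoding a capture like this: in MS-TDS every packet starts with an 8-byte header whose first byte is the packet type, and type 0x12 is PRELOGIN, which Wireshark labels "TDS7 pre-login message". The TLS handshake that follows pre-login is itself carried inside PRELOGIN packets, so the second (132-byte) pre-login frame is plausibly the client's TLS hello rather than a retry. The Python sketch below builds and parses a minimal PRELOGIN packet with only the VERSION and ENCRYPTION options; real clients send more options, and the version bytes in the test are placeholders.

```python
import struct

TDS_PRELOGIN = 0x12   # packet type Wireshark shows as "TDS7 pre-login message"
STATUS_EOM = 0x01     # last packet of the message
OPT_VERSION = 0x00
OPT_ENCRYPTION = 0x01
OPT_TERMINATOR = 0xFF
HEADER = struct.Struct(">BBHHBB")  # type, status, length, SPID, packet ID, window
OPTION = struct.Struct(">BHH")     # token, data offset, data length


def build_prelogin(version: bytes, encryption: int) -> bytes:
    """Build a minimal PRELOGIN packet with VERSION and ENCRYPTION options."""
    options = [(OPT_VERSION, version), (OPT_ENCRYPTION, bytes([encryption]))]
    # Option data starts after the table (5 bytes per option + 1 terminator).
    data_start = OPTION.size * len(options) + 1
    table, data = b"", b""
    for token, value in options:
        table += OPTION.pack(token, data_start + len(data), len(value))
        data += value
    payload = table + bytes([OPT_TERMINATOR]) + data
    # The length field counts the 8-byte header as well as the payload.
    header = HEADER.pack(TDS_PRELOGIN, STATUS_EOM, HEADER.size + len(payload), 0, 1, 0)
    return header + payload


def parse_header(packet: bytes) -> dict:
    """Decode the 8-byte TDS packet header."""
    ptype, status, length, spid, packet_id, window = HEADER.unpack_from(packet)
    return {"type": ptype, "status": status, "length": length,
            "spid": spid, "packet_id": packet_id, "window": window}


def parse_prelogin_options(packet: bytes) -> dict:
    """Return {option token: raw option bytes} from a PRELOGIN packet."""
    payload = packet[HEADER.size:]
    options, pos = {}, 0
    while payload[pos] != OPT_TERMINATOR:
        token, offset, length = OPTION.unpack_from(payload, pos)
        options[token] = payload[offset:offset + length]
        pos += OPTION.size
    return options
```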

    At this point I'm getting very confused. My original error mentioned SSL, but we are not using SSL certificates: there is no CA on the domain, and I verified on both the host and the client that no SQL or PC certificates are installed. Unfortunately, we did not snapshot the Server 2003 clients before installing the updates, so I had my admin roll back the updates on all of them manually. While he was doing that, I grabbed a Server 2003 ISO from MSDN (dated 2005) and set up another server configured exactly like the client that is having the issue. The end result on both servers is the same as described above.
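    Background that may explain the SSL wording (general TDS behavior, not something confirmed in this thread): when neither side forces encryption, the pre-login ENCRYPTION option is typically ENCRYPT_OFF on both ends, and ENCRYPT_OFF still means the login packet, which carries the credentials, is sent over TLS. If no certificate is configured, SQL Server uses a self-generated one for this, so SChannel is involved even without a CA or installed certificates. The Python sketch below is a simplified reading of the MS-TDS negotiation table; check the specification before relying on the edge cases.

```python
# PRELOGIN ENCRYPTION option values as defined in MS-TDS.
ENCRYPT_OFF = 0x00
ENCRYPT_ON = 0x01
ENCRYPT_NOT_SUP = 0x02
ENCRYPT_REQ = 0x03


def negotiated_encryption(client: int, server: int) -> str:
    """Simplified reading of the MS-TDS pre-login encryption table.

    Returns "none", "login_only", "full" or "terminate".
    """
    if ENCRYPT_NOT_SUP in (client, server):
        other = server if client == ENCRYPT_NOT_SUP else client
        # One side cannot do TLS; that is fatal only if the other side insists.
        if other in (ENCRYPT_ON, ENCRYPT_REQ):
            return "terminate"
        return "none"
    if client == ENCRYPT_OFF and server == ENCRYPT_OFF:
        # "Off" but supported: only the LOGIN7 packet goes through TLS.
        return "login_only"
    return "full"
```

    Under this reading, a broken TLS stack on either end can still break logins even when "Force Protocol Encryption" is No, because the login_only case still needs a working TLS handshake.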

    Finally, I put SSMS (and HeidiSQL, just for a different perspective) onto the test 2003 server to see if I could get any other errors or insight. When I try to connect via sqlcmd or SSMS from the client, I receive this error:

    "Cannot connect to [instance name].

    A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) (Microsoft SQL Server, Error: 10054)"

    HeidiSQL returns a similar error. Again, my sql logs on the sql server do not show these connection attempts or failures.

    I should also note that I've performed most of the steps above against other instances (2008, 2008 R2, 2012) in this domain and am experiencing the exact same issues. This is *only* happening on Server 2003 clients.

    Has anyone else had issues similar to the above, or seen anything weird with the March Windows Updates? Or does anyone know of any additional troubleshooting steps I might have missed?

    We've opened a ticket with Microsoft, but I'm still poking around at it to find more info.

  • Update to the situation:

    We worked all day yesterday with the Microsoft SQL team trying to figure out what's going on. After lots of troubleshooting, we narrowed the issue down as follows:

    - Connecting from a 2003 client server to a 2003 server running SQL works without issue.

    - Connecting from a 2008 or 2012 client server to a 2003 server running SQL fails with the errors described originally.

    - Connecting from a 2003 client server to a 2008 or 2012 server running SQL fails with the errors described originally.
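    One hypothesis consistent with this matrix, and with the later finding that the cipher keys on the 2003 machines had been damaged, is that the damaged 2003 boxes no longer share an acceptable cipher suite with the healthy servers, while two damaged 2003 boxes still agree with each other. The toy Python model below only illustrates that idea; the suite lists are hypothetical, and real SChannel negotiation also depends on protocol versions and certificates.

```python
from typing import Iterable, Optional, Set


def negotiate_suite(offered: Iterable[str], enabled: Set[str]) -> Optional[str]:
    """Pick the first offered suite the other side also has enabled.

    Toy model only: None means there is no overlap, which in a real TLS
    handshake typically makes the server abort the connection.
    """
    for suite in offered:
        if suite in enabled:
            return suite
    return None


# Hypothetical data: a damaged 2003 box offering a single export-grade suite,
# and newer servers that do not enable it.
damaged_2003 = ["TLS_RSA_EXPORT_WITH_RC4_40_MD5"]
healthy_2008 = {"TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_RSA_WITH_3DES_EDE_CBC_SHA"}

print(negotiate_suite(damaged_2003, healthy_2008))       # None: mixed pairs fail
print(negotiate_suite(damaged_2003, set(damaged_2003)))  # 2003 <-> 2003 still agree
```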

    The MS guys spent a large portion of their time looking at RSA and SCHANNEL registry entries and drivers, so it seems like it has something to do with encryption. We compared the 2008 boxes against the 2003 boxes to see whether any of the Cryptography registry keys were different (basically all were the same, except that on the 2008 box the .dll location is preceded by %System32%, which makes no difference).

    We ran a few more Wireshark captures, and from them they determined that the SQL server is seeing the connection attempt and that the handshake is successfully made. However, after the handshake, SQL Server expects the next packet to be in a particular format and it's not, so it immediately kills the connection attempt without logging anything. They said this bad packet format is likely due to a bad NIC driver or DLL. We tested this on both a Hyper-V machine (with a virtual NIC) and a physical box and got the same results, so it doesn't seem to be a NIC driver issue.
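    Comparing these keys by hand across several boxes is tedious. Below is a minimal Python sketch for snapshotting the subkeys of the standard SCHANNEL Ciphers key and diffing two snapshots; which keys the MS engineers actually compared isn't stated in the thread. Python 3.4 was the last release that ran on Server 2003, so on those boxes it may be easier to `reg export` the key and compare the files on a newer machine.

```python
try:
    import winreg  # standard library, Windows only
except ImportError:
    winreg = None

CIPHERS = r"SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Ciphers"


def _enum(fn, key):
    """Yield results of winreg.EnumKey/EnumValue until the index runs out."""
    i = 0
    while True:
        try:
            item = fn(key, i)
        except OSError:
            return
        yield item
        i += 1


def snapshot(path=CIPHERS):
    """Return {subkey name: {value name: data}} for HKLM\\<path>, or {}."""
    if winreg is None:
        return {}
    try:
        root = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path)
    except OSError:
        return {}  # key missing or unreadable on this machine
    result = {}
    with root:
        for name in _enum(winreg.EnumKey, root):
            with winreg.OpenKey(root, name) as sub:
                result[name] = {v[0]: v[1] for v in _enum(winreg.EnumValue, sub)}
    return result


def diff(a, b):
    """Describe subkeys whose values differ between two snapshots."""
    return [f"{k}: {a.get(k)!r} vs {b.get(k)!r}"
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)]
```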

    They ran some traces on the sql server last night before we called it quits for the day, so hopefully we'll have some more info today when we talk.

    Oh, and they asked about the registry values for the following (not sure what it handles either, but it gives me some research material):

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\FipsAlgorithmPolicy
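    For context, FipsAlgorithmPolicy backs the "System cryptography: Use FIPS compliant algorithms for encryption, hashing, and signing" security setting; when it is enabled, SChannel is limited to FIPS-approved algorithms, which is presumably why it came up here. A small Python sketch for reading it follows. It assumes the layout differs by version: a DWORD value named FipsAlgorithmPolicy directly under Lsa on Server 2003, versus a FipsAlgorithmPolicy subkey holding an Enabled value on Vista/Server 2008 and later.

```python
try:
    import winreg  # standard library, Windows only
except ImportError:
    winreg = None

LSA = r"SYSTEM\CurrentControlSet\Control\Lsa"


def read_fips_policy():
    """Return the FIPS policy flag (0 or 1), or None if it can't be read."""
    if winreg is None:
        return None
    # Assumed Vista/2008+ layout: Lsa\FipsAlgorithmPolicy key, "Enabled" value.
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, LSA + r"\FipsAlgorithmPolicy") as key:
            return winreg.QueryValueEx(key, "Enabled")[0]
    except OSError:
        pass
    # Assumed Server 2003 layout: a DWORD value directly under Lsa.
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, LSA) as key:
            return winreg.QueryValueEx(key, "FipsAlgorithmPolicy")[0]
    except OSError:
        return None
```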

  • Another update:

    We never heard back from MS yesterday as they were researching the issue. I wasn't able to get on the phone call this afternoon, but from what I understand they did some more troubleshooting and ran more traces to try and isolate the problem.

    Long story short, somehow the cipher keys on our 2003 machines got screwed up. They initially suspected we had changed them through GPO, but that was not the case. They suspect that rolling back the KB3002657 patch may be the culprit (so beware of that patch!!).
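    To see which of the 2003 boxes still have KB3002657 installed, one option is to save the output of `wmic qfe get HotFixID` from each machine and check it. The Python helper below parses that text; the sample output and the placeholder KB1234567 are made up for illustration, and real wmic output can vary in encoding and spacing.

```python
from typing import Set


def hotfix_ids(wmic_output: str) -> Set[str]:
    """Collect KB identifiers from `wmic qfe get HotFixID` output text."""
    ids = set()
    for line in wmic_output.splitlines():
        token = line.strip().upper()
        if token.startswith("KB"):
            ids.add(token)
    return ids


# Made-up sample; wmic pads columns and may emit extra "\r" when piped.
sample = "HotFixID   \r\r\nKB1234567  \r\r\nKB3002657  \r\r\n"
print("KB3002657" in hotfix_ids(sample))  # True
```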

    They are passing on the info from the SQL team to other engineers in order to get us a resolution.

  • Hey, we've been having the exact same problem as you. Do you know if this was resolved? What was the solution?

  • I'm racking my brain, trying to remember what the resolution was, but unfortunately it was too long ago. The issue was with Microsoft and the certificates.

    I believe what ended up happening, in our case, was we replaced the 2003 server with a 2008 or 2012 server. Since 2003 is EOL, our company required all 2003 servers to be decommissioned or replaced.

    I'll see if I can dig up any more info, but the server admin that I was working with at the time (who was the POC with Microsoft) is gone.
