Win 2003 + MS SQL 2000 sp3 cluster on HP (Compaq) MSA 1000

  • Hi!

    I just created a two-node cluster with this configuration:

    MSA 1000 (2 built in 3-port hubs + 2 HBA on each node, Secure Path 4.0c, 28x36GB 15k disks, 9xRAID10 & RAID1 arrays of various sizes)

    Windows 2003 Server + MS SQL 2000 Server sp3

    With the array accelerator enabled (controller cache, 512MB, 100% write), I have trouble with MS SQL I/O (e.g. restoring multiple databases). The error listings are below. When I turn the array accelerator off, the restore works fine on the first node, but on the second I get an error:

    -------------------------------------------------------------------------------------

    Processed 3706848 pages for database 'mydb', file 'mydb_Data' on file 1.

    Processed 648240 pages for database 'mydb', file 'mydb_Index' on file 1.

    [Microsoft][ODBC SQL Server Driver][Named Pipes]ConnectionRead (WrapperRead()).

    Server: Msg 11, Level 16, State 1, Line 0

    General network error. Check your network documentation.

    Processed 5124 pages for database 'mydb', file 'mydb_Log' on file 1.

    Connection Broken

    -------------------------------------------------------------------------------------
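For a sense of scale, the page counts in the listing above can be converted into sizes. This is only a rough sketch: it assumes the standard 8 KB SQL Server page size, and the function name is made up for illustration.

```python
# Convert the "Processed N pages" counts from the RESTORE output into sizes.
# Assumption: each page is 8 KB, the standard SQL Server page size.
PAGE_SIZE_BYTES = 8 * 1024


def pages_to_gib(pages: int) -> float:
    """Return the size in GiB of the given number of 8 KB pages."""
    return pages * PAGE_SIZE_BYTES / 1024**3


# Page counts copied from the error listing in the post.
restored = {"mydb_Data": 3706848, "mydb_Index": 648240, "mydb_Log": 5124}

if __name__ == "__main__":
    for name, pages in restored.items():
        print(f"{name}: {pages_to_gib(pages):.2f} GiB")
```

Under that assumption the data file is roughly 28 GiB and the index file roughly 5 GiB, so the connection broke only after most of the restore had already been processed.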

    Has anyone faced anything like this?

     

    The errors on the first node with the hardware accelerator enabled:

    -------------------------------------------------------------------------------------

    Event Type: Error

    Event Source: raidisk

    Event Category: None

    Event ID: 1026

    Date:  06.11.2004

    Time:  15:21:46

    User:  N/A

    Computer: CLUST-SQL3

    Description:

    The Driver has detected a path failure to Subsystem ID . Dump Data 0 contains the Phys Path Info < Port | Bus | Target | Lun >. Dump Data 1 contains the Driver Status. Dump Data 2 contains the HBA Slot Number (ffffffff if unavailable). Dump Data 3 contains extended Driver Status.

    Data:

    0000: 0f 00 10 00 02 00 60 00   ......`.

    0008: 00 00 00 00 02 04 00 c4   .......Ä

    0010: 00 00 00 00 00 00 00 00   ........

    0018: 00 00 00 00 00 00 00 00   ........

    0020: 00 00 00 00 00 00 00 00   ........

    0028: 02 00 00 02 46 4f 52 00   ....FOR.

    0030: 01 00 00 00 00 00 00 00   ........

    Event Type: Error

    Event Source: ClusSvc

    Event Category: Physical Disk Resource

    Event ID: 1038

    Date:  06.11.2004

    Time:  15:21:47

    User:  N/A

    Computer: CLUST-SQL4

    Description:

    Reservation of cluster disk 'Disk X:' has been lost. Please check your system and disk configuration.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Event Type: Information

    Event Source: CPQFCAC

    Event Category: None

    Event ID: 24636

    Date:  06.11.2004

    Time:  22:22:59

    User:  N/A

    Computer: CLUST-SQL4

    Description:

     Array Controller \Device\FibreArray1, HBA Slot 2, Chassis: MSA1000RAID is now active.

    Data:

    0000: 00 00 00 00 03 00 50 00   ......P.

    0008: 00 00 00 00 3c 60 35 44   ....<`5D

    0010: 00 00 00 00 00 00 00 00   ........

    0018: 00 00 00 00 00 00 00 00   ........

    0020: 00 00 00 00 00 00 00 00   ........

    Event Type: Warning

    Event Source: Ntfs

    Event Category: None

    Event ID: 50

    Date:  06.11.2004

    Time:  22:23:44

    User:  N/A

    Computer: CLUST-SQL3

    Description:

    {Delayed Write Failed} Windows was unable to save all the data for the file . The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:

    0000: 04 00 04 00 02 00 52 00   ......R.

    0008: 00 00 00 00 32 00 04 80   ....2..?

    0010: 00 00 00 00 9d 00 00 c0   ......À

    0018: 00 00 00 00 00 00 00 00   ........

    0020: 00 00 00 00 00 00 00 00   ........

    0028: 9d 00 00 c0               ..À   

    Event Type: Warning

    Event Source: ql2300

    Event Category: None

    Event ID: 118

    Date:  06.11.2004

    Time:  22:24:13

    User:  N/A

    Computer: CLUST-SQL3

    Description:

    The driver for device \Device\Scsi\ql23001 performed a bus reset upon request.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:

    0000: 0f 00 08 00 01 00 5e 00   ......^.

    0008: 00 00 00 00 76 00 04 80   ....v..?

    0010: 00 01 00 52 00 00 00 00   ...R....

    0018: 00 00 00 00 00 00 00 00   ........

    0020: 00 00 00 00 00 00 00 00   ........

    0028: 00 00 00 00 14 50 2d 00   .....P-.
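The "Data:" bytes in these events are little-endian 32-bit words, and some of them are status codes that can be looked up. The sketch below decodes the Ntfs event 50 dump above. The lookup table holds only the two codes that appear in that dump, with the names I believe they have in the Windows headers, and no particular structure layout is assumed beyond 4-byte words.

```python
import struct

# Names for the two codes seen in the Ntfs event 50 dump (assumed from the
# Windows DDK headers; this is not a complete table).
KNOWN_CODES = {
    0xC000009D: "STATUS_DEVICE_NOT_CONNECTED",
    0x80040032: "IO_LOST_DELAYED_WRITE",
}

# Bytes copied from the "Data:" section of the Ntfs event 50 in the post.
NTFS_EVENT_50 = bytes.fromhex(
    "04 00 04 00 02 00 52 00"
    "00 00 00 00 32 00 04 80"
    "00 00 00 00 9d 00 00 c0"
    "00 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00"
    "9d 00 00 c0"
)


def dwords(data: bytes) -> list:
    """Split an event data dump into little-endian 32-bit words."""
    count = len(data) // 4
    return list(struct.unpack(f"<{count}I", data[: count * 4]))


if __name__ == "__main__":
    for index, value in enumerate(dwords(NTFS_EVENT_50)):
        name = KNOWN_CODES.get(value, "")
        print(f"{index * 4:04x}: 0x{value:08X} {name}")
```

If those names are right, the delayed write failed because the device was reported as not connected, which fits the path failure and bus reset events logged around the same time.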

     

  • I have had the same problem on my W2k3 / SQL2k cluster. It has to do with named pipes communication. You need to disable named pipes after installing the SQL2k instances. Use only TCP/IP and it works OK.

    Why? I do not have any clue.

    The problem showed up only with bigger (>1500 MB) databases; the smaller ones went fine. After disabling named pipes everything worked (and still works) OK.
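Besides disabling named pipes on the server, a client can ask for TCP/IP explicitly. The sketch below only builds a connection string for the legacy "SQL Server" ODBC driver using the `Network=DBMSSOCN` keyword (the TCP/IP network library). The server name, port, and function name are placeholders, not values from this thread.

```python
# Build an ODBC connection string that requests the TCP/IP network library
# (DBMSSOCN) instead of named pipes. Nothing here opens a connection.
# Assumptions: legacy "SQL Server" ODBC driver, Windows authentication,
# and a hypothetical virtual server name and port.


def build_tcp_conn_str(server: str, database: str, port: int = 1433) -> str:
    """Return a connection string that forces TCP/IP for the given server."""
    parts = [
        "Driver={SQL Server}",
        f"Server={server},{port}",
        f"Database={database}",
        "Network=DBMSSOCN",  # TCP/IP sockets network library
        "Trusted_Connection=Yes",
    ]
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # "SQLVIRTUAL1" is a made-up name used only for illustration.
    print(build_tcp_conn_str("SQLVIRTUAL1", "mydb"))
```

This affects only clients that use the string; aliases defined in the Client Network Utility still need to be checked separately.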

  • Which of the two problems do you mean? The database restore problem with the array accelerator disabled? I solved that by removing the named-pipe alias, which was needed to install MS SQL 2000 Server as another instance on the second node without sp3 (I forgot to remove it after I installed the second instance). Or do you mean the problem with MS SQL intensive I/O when the array accelerator is enabled? (Intensive Windows I/O, with the same Disk Bytes/sec counter value, worked fine.)

  • Processed 3706848 pages for database 'mydb', file 'mydb_Data' on file 1.

    Processed 648240 pages for database 'mydb', file 'mydb_Index' on file 1.

    [Microsoft][ODBC SQL Server Driver][Named Pipes]ConnectionRead (WrapperRead()).

    Server: Msg 11, Level 16, State 1, Line 0

    General network error. Check your network documentation.

    Processed 5124 pages for database 'mydb', file 'mydb_Log' on file 1.

    Connection Broken

    I was referring to this problem. I did not gather from your post that you had a second problem.

  • We are plagued with the SQL 0 and 11 errors running SAP on SQL 2000 SP3a, MDAC Hotfix, Windows 2003.  HP and SAP have been no help to date and basically told us that there is no way to solve the problem.  Bullsh*t.

    My diagnosis has shown that SAP will open a random number of named pipe connections and then start creating TCP connections even though the alias is set to named pipes; switching to TCP only does not make the problem go away. One setting that helped reduce the problem was adding MaxTokenSize to the registry with a value of 48000.

    Other notes also indicate that the SQL 11 error can be caused by contention on tempdb or by log files filling up.

    It is my firm belief that this is an MDAC 2.8 issue and Microsoft needs to release MDAC 2.8 SP1 for Windows 2003.
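For anyone wanting to try the MaxTokenSize value mentioned above, here is a small sketch that prints a .reg fragment instead of touching the registry. It assumes the usual Kerberos parameters key and a REG_DWORD value; the post does not say which key was used, so check that before importing anything.

```python
# Generate a .reg fragment for a DWORD value. Nothing is written to the
# registry; the text is only printed for review.
# Assumption: MaxTokenSize lives under the Kerberos Parameters key below.
KERBEROS_PARAMS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"
)


def reg_dword_fragment(key: str, name: str, value: int) -> str:
    """Return .reg file text that sets a REG_DWORD value under the given key."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"{name}"=dword:{value:08x}',  # .reg files store DWORDs as 8 hex digits
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(reg_dword_fragment(KERBEROS_PARAMS_KEY, "MaxTokenSize", 48000))
```

The decimal value 48000 appears as `dword:0000bb80` in the output, which is easy to misread when comparing against regedit's decimal view.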

     

  • I would agree. We are running Win2k3 and haven't had any major issues, but I haven't had to restore anything over 40 GB yet on those systems. We did have issues with firmware and drivers for the HBAs we were using, only to find out that Secure Path didn't like the revision of the driver, and the driver we needed didn't work with the firmware of the HBA. I would make sure that you don't have any driver issues at all before starting down any other path.

    Wes

  • In an attempt to resolve driver issues we recently updated the firmware on the EVA and the NIC drivers. The problem still occurs. I also created additional data files on tempdb just to make sure it wasn't the so-called contention issue. That didn't help either.

    As always, each vendor is pointing fingers at the other.

  • Hewlett Packard support advised replacing the 3-port internal FC hubs with switches, as well as upgrading Secure Path 4.0c with sp1 and upgrading the HBA firmware. The main focus of the advice was replacing the hubs with switches. Has anyone solved the hardware write cache problem by doing this?

     

  • I found an interesting note on the Microsoft site; it may be worth looking at.

    http://support.microsoft.com/default.aspx?kbid=328476

     

     

  • Implementing the WinsockListenBacklog setting produced very strange results:

    1) It appears to have fixed the SQL 11 error.

    2) No named pipe connections are created, only TCP, even from SQL Agent and Enterprise Manager.

    I have a note off to Microsoft asking for an explanation of why this is.

     

     
