dbWarden - A Free SQL Server Monitoring Package

  • Excellent work - many thanks for sharing.

  • SQLR45 (4/29/2013)


    Hello, first of all thanks for this amazing solution.

    I'm getting a particular error on one server while trying to run the Health Report:

    Warning: Null value is eliminated by an aggregate or other SET operation. [SQLSTATE 01003] (Message 8153) Conversion failed when converting the nvarchar value '9 lockTimeout=' to data type int. [SQLSTATE 22018] (Error 245). The step failed.

    Microsoft SQL Server 2005 - 9.00.5000.00 (X64) (SP4)

    Any assistance you can provide would be greatly appreciated.

    Matt

    Thanks, I'll keep my eye out for this bug. I wasn't seeing it before when I tested on 2005 SP4. If you troubleshoot the issue further and have any more info, let me know.

  • Ok thanks, just to let you know that it's working fine on other 2005 sp4.

  • I'm not sure that LongRunningQueries is finding long running queries.

    It seems like it's finding long sessions.

    Every night I run a job that runs a sp that runs a dozen sps that each run a dozen sps.

    In reality what ends up happening is a handful of reports (contained in a stored procedure each) are run 100+ times over with different parameters to test them out.

    For the most part, the SQL is subpar. Might be a series of 10+ queries each into temp tables and filtering/aggregating that data.

    What I would expect to see as a query is each query.

    What I see in the results is a set of cumulative statistics:

    - Same session_id

    - Same Login_Time

    - Same Login_Name

    - Same Program_Name

    - Cumulative Reads

    - Cumulative Writes

    - Cumulative RunTime

    - Cumulative Logical_Reads

    - Cumulative CPU_Time

    - Apparently the SQL text running at the moment the snapshot was taken

    So effectively I'm left with great "Long running Session" or "Long running job" information. I see the job and every few minutes it's statistics updated with whats happened in the last few minutes. The SQL bears no relation to the numbers being shown... I suppose the longer running queries are statistically more likely to be running when a snapshot is taken.

    It doesn't tell me which queries are running longest, which is what I'd expect.

    It *does* help during the day because there aren't any singular long sessions. Maybe a whole report (10+ queries) might ding as longrunning, which is good, but no hints as to what part of the procedure was long running.

    If this is intended... it is what it is. It's just not what I expected.

  • It's a bug since the script transitioned to a new query for gathering query info. I should have a fix in the next version.

    Also, the queries that gather the data need to look at a different(new) column (the new column "Formatted_SQL_Text", versus the old "raw" SQL text column, "SQL_Text")

    There is another bug in the new usp_checkfileswork proc that is fixed in the next version... Hopefully I can get time to upload it today.

  • So the bug is that it's not showing the right SQL text?

    But using sys.dm_exec_requests and sys.dm_exec_sessions rather than sys.dm_exec_query_stats to get session level data... is that the intention of LongRunningQueries to get the higher level session data?

  • Adding a CERTIFICATE causes the Schema Tracking trigger to fail - red ink everywhere... SQL2005, but syntax is unchanged for SQL2012. SQLCmd cannot be NULL.

  • I am getting negative numbers on file stats last 24 hours for one of my databases.

    File Stats - Last 24 Hours

    Filename# ReadsKBytes Read# WritesKBytes WrittenIO Read Wait (MS)IO Write Wait (MS)Cumulative IO (GB)IO %

    pricing2.mdf130-6960.00 (-6.80 MB)-322-22984.00 (-22.45 MB)6630-2359-0.030.01

    pricing2_log.ldf-49-40760.00 (-39.80 MB)-12926-40929.00 (-39.97 MB)-1652-9526-0.080.00

    One the jobs listing, can you add in next run date/time?

    I use gmail and the listing is too long so it comes as an attachment. A lines like the job listing description goes to two lines, it can be somewhat hard to read.

  • In my environment the Job "LongRunningQueriesAlert" fails often.

    I think it's something with the original SQL Text that is inserted in the Table QueryHistory at the beginning of "usp_LongRunningQueries".

    Maybe we should "escape" some characters before they are inserted in the Table like '--' or '' ?

    replace(qt.[text],'--','-') or so?

    BTW: I renamed the Jobs also to 'dbWarden_*' to be easily identifyed and prefixed the Email Subjects with '[dbWarden]*' for easy outlook rules. Just my thoughts....

    Regards,

    Volker

  • I agree that jobs name should be renamed to 'dbWarden_* in future release.

  • dbWarden 2.4 is ready for download.

    A lot of fixes and a few enhancements are in this release.

    Read through the change log, and be sure to note that all existing jobs named "dba_*" will be dropped in this script!

    Thanks to everyone for their help in finding bugs/fixes and for your suggestions!!

    Changelog

    - usp_JobStats - Creating temp tables instead of inserting INTO

    - usp_JobStats - Removed COALESCE's from previous change on 4/24. Causing dates to read 1/1/1900 when NULL. Would rather have NULL.

    - rpt_HealthReport - Fixed HTML formatting in Job Stats section

    - rpt_HealthReport - Changed Job Stats section - CREATE #TEMPJOB instead of INSERT INTO

    - rpt_HealthReport - Changed LongRunningQueries section to use Formatted_SQL_Text instead of SQL_Text

    - usp_LongRunningQueries - Change TEMP table to use Formatted_SQL_Text instead of SQL_Text

    - rpt_Queries- Changed to use Formatted_SQL_Text instead of SQL_Text

    - SchemaChangeLog - SQLCmd is now NULLABLE (added ALTER TABLE for existing installations)

    - rpt_HealthReport - New variables added to AlertSettings to turn sections on/off or show reduced data. Use the [Enabled] BIT in AlertSettings.

    - Changed all SQL Job names from "dba_" to "dbWarden_" **ALL OLD JOBS WILL BE DROPPED

    - Added "[dbWarden]" to the start of all email subject lines

    - AlertSettings table has changed. AlertContacts table added, separating out email addresses from the AlertSettings table

    The following procs were modified to work with the updated schema

    - usp_CheckFiles

    - usp_CheckFilesWork

    - ti_blockinghistory

    - usp_CPUProcessAlert

    - usp_LongRunningQueries

    - usp_LongRunningJobs

    - rpt_Queries

    - rpt_HealthReport

  • Couple of suggestions

    1) Put the html css into a table (AlertContacts, or AlertSettings maybe). Then it can be more easily customized to a users corporate color scheme.

    2) Don't mix use of css style and separate styling on tags eg: <td bgcolor="#F0F0F0" width="150" should be replaced with the appropriate style attribute

    3) Use the @profile_name parameter for sp_send_dbmail. This way you're not assuming the default profile is the "alerting" profile. We have one server that has many mail profiles that are used for internal and external user communications.

    (ref: http://msdn.microsoft.com/en-US/library/ms190307(v=sql.90).aspx)

    4) The sys.dm_exec_sql_text dmv returns the entire sql text for the batch/request. The statement_start_offset and statement_end_offset values in sys.dm_exec_requests are used to show what statement in that batch is currently executing. So your column "Formatted_SQL_Text" is a little mis-leading and more correctly named something like "Current_SQL_Text".

    (ref: http://msdn.microsoft.com/en-us/library/ms177648.aspx)

    5) Particularly for usp_LongRunningQueries, but may be applicable for other alerting procedures. Populate the Temp table first. Then send the alerts based on the results of the temp table. Once any alerts have been sent, insert the temp table results into the QueryHistory table. This will help performance in high use systems as you'll consistently be analyzing alerting criteria on a smaller result, rather than re-querying the QueryHistory table which is likely to will grow quickly on high use servers.

    6) For one server I was getting alerts even though all databases in the DatabaseSettings table are set to false. Tracked this down to a normal scheduled process that runs in the context of the master database. Maybe the "install" script should have a option to include systems databases.

    --------------------
    Colt 45 - the original point and click interface

  • Hi Michael,

    first of all a big thanks for the update.

    It's exactly what I thought of - especial the health report!

    Since the beginning I have an error that's coming up only on two servers of eight where I installed the latest version V 2.4.

    It's still with the usp_LongRunningQueries as I mentioned earlier - thought it had something to do with the SQL Text that's inserted - but it's not 🙁

    It's in the part

    SELECT QueryHistoryID, DateStamp, Login_Time, Start_Time, session_id, CPU_Time, reads, writes, Logical_Reads, [host_name], [DBName], login_name, Formatted_SQL_Text AS SQL_Text, [program_name]

    FROM [dbWarden].dbo.QueryHistory

    WHERE (DATEDIFF(ss,Start_Time,DateStamp)) >= @QueryValue

    AND (DATEDIFF(mi,DateStamp,GETDATE())) < (DATEDIFF(mi,@LastCollectionTime, DateStamp))

    AND [DBName] NOT IN (SELECT [DBName] FROM [dbWarden].dbo.DatabaseSettings WHERE LongQueryAlerts = 0)

    AND Formatted_SQL_Text NOT LIKE '%BACKUP DATABASE%'

    AND Formatted_SQL_Text NOT LIKE '%RESTORE VERIFYONLY%'

    AND Formatted_SQL_Text NOT LIKE '%ALTER INDEX%'

    AND Formatted_SQL_Text NOT LIKE '%DECLARE @BlobEater%'

    AND Formatted_SQL_Text NOT LIKE '%DBCC%'

    AND Formatted_SQL_Text NOT LIKE '%WAITFOR(RECEIVE%'

    and it's one of the datediffs - there I end up...

    Error message (translated from german)

    The datediff function resulted in an overflow. The number of datepart values ??that separate two date/time instances is too big. Use a function datediff with a less precise datepart. [SQLSTATE 22003] (Error 535)

    Hope you have an idea what's going on...

    Thanks and best Regards,

    Volker

  • Hello,

    I'm still working on the

    Msg 245, Level 16, State 1, Line 8

    Conversion failed when converting the nvarchar value '9 lockTimeout=' to data type int.

    Error message appear when running health report on sql 2005.

    I found that the piece of code creating the error is :

    BEGIN

    INSERT INTO #TEMPDATES (LogDate)

    SELECT DISTINCT CONVERT(VARCHAR(30),LogDate,120) as LogDate

    FROM #ERRORLOG

    WHERE ProcessInfo LIKE 'spid%'

    and [text] LIKE ' process id=%'

    INSERT INTO #DEADLOCKINFO (DeadLockDate, DBName, ProcessInfo, VictimHostname, VictimLogin, VictimSPID, LockingHostname, LockingLogin, LockingSPID)

    SELECT

    DISTINCT CONVERT(VARCHAR(30),b.LogDate,120) AS DeadlockDate,

    DB_NAME(SUBSTRING(RTRIM(SUBSTRING(b.[text],PATINDEX('%currentdb=%',b.[text]),SUM((PATINDEX('%lockTimeout%',b.[text])) - (PATINDEX('%currentdb=%',b.[text])) ) )),11,50)) as DBName,

    b.processinfo,

    SUBSTRING(RTRIM(SUBSTRING(a.[text],PATINDEX('%hostname=%',a.[text]),SUM((PATINDEX('%hostpid%',a.[text])) - (PATINDEX('%hostname=%',a.[text])) ) )),10,50)

    AS VictimHostname,

    CASE WHEN SUBSTRING(RTRIM(SUBSTRING(a.[text],PATINDEX('%loginname=%',a.[text]),SUM((PATINDEX('%isolationlevel%',a.[text])) - (PATINDEX('%loginname=%',a.[text])) ) )),11,50) NOT LIKE '%id%'

    THEN SUBSTRING(RTRIM(SUBSTRING(a.[text],PATINDEX('%loginname=%',a.[text]),SUM((PATINDEX('%isolationlevel%',a.[text])) - (PATINDEX('%loginname=%',a.[text])) ) )),11,50)

    ELSE NULL END AS VictimLogin,

    CASE WHEN SUBSTRING(RTRIM(SUBSTRING(a.[text],PATINDEX('%spid=%',a.[text]),SUM((PATINDEX('%sbid%',a.[text])) - (PATINDEX('%spid=%',a.[text])) ) )),6,10) NOT LIKE '%id%'

    THEN SUBSTRING(RTRIM(SUBSTRING(a.[text],PATINDEX('%spid=%',a.[text]),SUM((PATINDEX('%sbid%',a.[text])) - (PATINDEX('%spid=%',a.[text])) ) )),6,10)

    ELSE NULL END AS VictimSPID,

    SUBSTRING(RTRIM(SUBSTRING(b.[text],PATINDEX('%hostname=%',b.[text]),SUM((PATINDEX('%hostpid%',b.[text])) - (PATINDEX('%hostname=%',b.[text])) ) )),10,50)

    AS LockingHostname,

    CASE WHEN SUBSTRING(RTRIM(SUBSTRING(b.[text],PATINDEX('%loginname=%',b.[text]),SUM((PATINDEX('%isolationlevel%',b.[text])) - (PATINDEX('%loginname=%',b.[text])) ) )),11,50) NOT LIKE '%id%'

    THEN SUBSTRING(RTRIM(SUBSTRING(b.[text],PATINDEX('%loginname=%',b.[text]),SUM((PATINDEX('%isolationlevel%',b.[text])) - (PATINDEX('%loginname=%',b.[text])) ) )),11,50)

    ELSE NULL END AS LockingLogin,

    CASE WHEN SUBSTRING(RTRIM(SUBSTRING(b.[text],PATINDEX('%spid=%',b.[text]),SUM((PATINDEX('%sbid=%',b.[text])) - (PATINDEX('%spid=%',b.[text])) ) )),6,10) NOT LIKE '%id%'

    THEN SUBSTRING(RTRIM(SUBSTRING(b.[text],PATINDEX('%spid=%',b.[text]),SUM((PATINDEX('%sbid=%',b.[text])) - (PATINDEX('%spid=%',b.[text])) ) )),6,10)

    ELSE NULL END AS LockingSPID

    FROM #TEMPDATES t

    JOIN #ERRORLOG a

    ON CONVERT(VARCHAR(30),t.LogDate,120) = CONVERT(VARCHAR(30),a.LogDate,120)

    JOIN #ERRORLOG b

    ON CONVERT(VARCHAR(30),t.LogDate,120) = CONVERT(VARCHAR(30),b.LogDate,120) AND a.[text] LIKE ' process id=%' AND b.[text] LIKE ' process id=%' AND a.ID < b.ID

    GROUP BY b.LogDate,b.processinfo, a.[Text], b.[Text]

    DELETE FROM #ERRORLOG

    WHERE CONVERT(VARCHAR(30),LogDate,120) IN (SELECT DeadlockDate FROM #DEADLOCKINFO)

    DELETE FROM #DEADLOCKINFO

    WHERE (DeadlockDate < CONVERT(DATETIME, CONVERT (VARCHAR(10), GETDATE(), 101)) -1)

    OR (DeadlockDate >= CONVERT(DATETIME, CONVERT (VARCHAR(10), GETDATE(), 101)))

    END

    Any help would be appreciated.

  • Msg 245, Level 16, State 1, Line 8

    Conversion failed when converting the nvarchar value '9 lockTimeout=' to data type int.

    @SQLR45, Are you sure that bit of code is where the error is coming from? That looks to be from the rpt_HealthReport procedure, but is nowhere near line 8.

    Have you tried running the usp_TodaysDeadlocks procedure? Does it produce the same error?

    --------------------
    Colt 45 - the original point and click interface

Viewing 15 posts - 91 through 105 (of 186 total)

You must be logged in to reply to this topic. Login to reply