inconsistently wrong query results

  • Hi All - I have a count/grouping problem and for the life of me can't make sense of it. Involved in the query is one table (plus a time-dimension table as a variation) which holds info on enquiries made, each enquiry has its int key and an FK referencing the table which contains info on cases, it's a straight forward one-to many relationship. What I want the query to do is count distinct cases and count enquiries per month. It shouldn't be hard, but I get inconsistent and partly wrong results. Below is the sql with result sets. The first two snipets are variations on achiving the count/group operations in one query, and the third bit looking up each month seperately and puts it in one result set, where the months are in the same order as in the two queries above. The last result set is also the correct one.

    I have checked the data and there are no abnormalities, so I must be to do with just not understanding the way sql count and groups. Any help would be most welcome, thanx.

    SELECT TOP (100) PERCENT COUNT(DISTINCT CaseID) AS Cases, COUNT( distinct CallID) AS Enquiries, RIGHT(CONVERT(varchar(10), CallDatAdd, 5), 5) as [Month-Year]

    FROM tbldCalls

    WHERE (CallDatAdd BETWEEN CONVERT(DATETIME, '2013-01-01 00:00:00', 102) AND CONVERT(DATETIME, '2013-06-30 00:00:00', 102))

    GROUP BY RIGHT(CONVERT(varchar(10), CallDatAdd, 5), 5)

    order by [Month-Year] asc

    CasesEnquiriesMonth-Year

    72394501-13

    64686602-13

    77393703-13

    67582904-13

    75892805-13

    63278206-13

    SELECT COUNT(DISTINCT tbldCalls.CaseID) AS Cases, COUNT(tbldCalls.CallID) AS Enquiries, dimension_time.Month_Text

    FROM dimension_time INNER JOIN

    tbldCalls ON CONVERT(date, dimension_time.Day_Timestamp) = CONVERT(date, tbldCalls.CallDatAdd)

    WHERE (tbldCalls.CallDatAdd BETWEEN CONVERT(DATETIME, '2013-01-01 00:00:00', 102) AND CONVERT(DATETIME, '2013-06-30 00:00:00', 102))

    GROUP BY dimension_time.Month_Text, dimension_time.Month_Key

    ORDER BY dimension_time.Month_Key asc

    CasesEnquiriesMonth_Text

    723945Jan

    646866Feb

    773937Mar

    675829Apr

    758928May

    632782Jun

    SELECT count(distinct CaseID) as Cases, count (callid) as Enquiries

    FROM tbldCalls

    WHERE (CallDatAdd between CONVERT(DATETIME, '2013-01-01 00:00:00', 102) and CONVERT(DATETIME, '2013-01-31 00:00:00', 102))

    Union all

    SELECT count(distinct CaseID) as FebCases, count (callid) as Enquiries

    FROM tbldCalls

    WHERE (CallDatAdd between CONVERT(DATETIME, '2013-02-01 00:00:00', 102) and CONVERT(DATETIME, '2013-02-28 00:00:00', 102))

    Union all

    SELECT count(distinct CaseID) as MarchCases, count (callid) as Enquiries

    FROM tbldCalls

    WHERE (CallDatAdd between CONVERT(DATETIME, '2013-03-01 00:00:00', 102) and CONVERT(DATETIME, '2013-03-31 00:00:00', 102))

    Union all

    SELECT count(distinct CaseID) as AprilCases, count (callid) as Enquiries

    FROM tbldCalls

    WHERE (CallDatAdd between CONVERT(DATETIME, '2013-04-01 00:00:00', 102) and CONVERT(DATETIME, '2013-04-30 00:00:00', 102))

    Union all

    SELECT count(distinct CaseID) as MayCases, count (callid) as Enquiries

    FROM tbldCalls

    WHERE (CallDatAdd between CONVERT(DATETIME, '2013-05-01 00:00:00', 102) and CONVERT(DATETIME, '2013-05-31 00:00:00', 102))

    Union all

    SELECT count(distinct CaseID) as JuneCases, count (callid) as Enquiries

    FROM tbldCalls

    WHERE (CallDatAdd between CONVERT(DATETIME, '2013-06-01 00:00:00', 102) and CONVERT(DATETIME, '2013-06-30 00:00:00', 102))

    CasesEnquiries

    680888

    618825

    773937

    630772

    678831

    632782

  • DOH!!!!

    the union all queries should look up the date as for example

    BETWEEN CONVERT(DATETIME, '2012-12-01 00:00:00', 102) AND CONVERT(DATETIME, '2012-12-31 23:59:59', 102)

    as the day ends 23:59:59 ... :blush:

  • Things to keep in mind:

    - the BETWEEN operator is the logical equivalent of ">= and <=".

    - the DATETIME datatype has precision down to fractions of a second.

    - when you define a DATETIME without specifying the time a time of 00:00:00.000 is assumed.

    When you are dealing with DATETIME ranges you should use the ">= AND <" construct to avoid excluding or double-counting edge values. For example:

    TestDate >= '01/01/2013' and TestDate < '02/01/2013'

    The reason is that if you try to specify a <= value you would have to be much more precise to avoid excluding any data. For example:

    TestDate >= '01/01/2013' and TestDate <= '02/01/2013 23:59:59.999'

    If you use the pattern:

    TestDate >= '01/01/2013' and TestDate <= 01/31/2013'

    TestDate >= 02/01/2013' and TestDate <= 02/28/2013'

    You will exclude any records with values between '01/31/2013 00:00:00.001' and '01/31/2013 32:59:59.999'.

    If you use the pattern:

    TestDate >= '01/01/2013' and TestDate <= 02/01/2013'

    TestDate >= 02/01/2013' and TestDate <= 03/01/2013'

    You will double-count any records with a value of '02/01/2013 00:00:00.000'.

    By using the ">= and <" construct you are clearly splitting values into one group or another.

  • Thanx paul.s.lach, makes sense.

    Some of my parameters are passed by an access front end and I hadn't realised that passing a date only parameter would result in an 00:00:000 assumption (we don't really need anything more accurate than date when searching). Have rewritten my procs now and gotten rid of the 'between ... and' logic alltogether. Thanx for the tip!

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply