Date Range overlaps

  • You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:

    ;WITH DateSequence AS (
        SELECT m.ID, m.Code, d.*,
            OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        FROM #testtable m
        CROSS APPLY (VALUES ('S', StartDate, 1), ('E', ISNULL(EndDate,'21000101'), -1)) d (Dir, [Date], Flag)
    ),
    MarkedBoundaries AS (
        SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]), 0)
        FROM DateSequence
    ),
    PairedDates AS (
        SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date]) - 1) / 2
        FROM MarkedBoundaries
        WHERE OnOff = 0 OR LastOnOff = 0
    )
    SELECT Code,
        StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
        EndDate = NULLIF(MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END), '21000101')
    FROM PairedDates
    GROUP BY Code, Grp
    ORDER BY Code, Grp
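For anyone who wants to trace the logic outside SQL Server, here is a minimal Python sketch of the same event-based packing algorithm: each interval contributes a +1 start event and a -1 end event, a running sum plays the role of the OnOff column, and ranges open where the sum rises from zero and close where it returns to zero. The function name is hypothetical; it is an illustration of the technique, not part of the original post.

```python
from collections import defaultdict

def pack_intervals(rows):
    """rows: iterable of (code, start, end) with start <= end.
    Returns merged (code, start, end) ranges, mirroring the
    running-sum (OnOff) logic of the SQL query above."""
    by_code = defaultdict(list)
    for code, start, end in rows:
        by_code[code].append((start, 1))   # 'S' event, flag +1
        by_code[code].append((end, -1))    # 'E' event, flag -1

    result = []
    for code in sorted(by_code):
        # On equal dates, starts sort before ends so touching
        # intervals merge - matching ORDER BY [Date], type DESC.
        events = sorted(by_code[code], key=lambda e: (e[0], -e[1]))
        on_off = 0
        current_start = None
        for d, flag in events:
            if on_off == 0 and flag == 1:   # LastOnOff = 0: a range opens
                current_start = d
            on_off += flag
            if on_off == 0:                 # OnOff = 0: the range closes
                result.append((code, current_start, d))
    return result
```

With overlapping and touching intervals, `pack_intervals([("A", 1, 5), ("A", 3, 8), ("A", 10, 12)])` collapses the first two rows into a single `("A", 1, 8)` range.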

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (7/7/2016)


    You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:


    Haven't tried this yet but looks awesome. It's one of the things that Itzik referred to at the very end of his article because, as he pointed out, MS hadn't yet come up with proper windowing functions that would do the ROWS PRECEDING thing. He was pretty instrumental in pushing MS to make it a priority and, man, I'm thankful for that.

    Now that we've finally upgraded from 2005 to 2012 at work, I can see myself working with it more and more. Thanks for the great example, Chris.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • ChrisM@Work (7/7/2016)


    You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:

    Edit: the original OP post related to SQL 2008, so what follows with windowing functions will not work for the OP.

    Hi Chris

    I had a look at your solution and decided to have a "play" <grin>. Here is some code that builds a test harness:

    --test harness adapted from here http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle
    SET NOCOUNT ON;
    USE tempdb;

    -- helper function GetNums
    -- jls: alternative - use your own tally table/function
    --IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL
    --    DROP FUNCTION dbo.GetNums;
    --GO
    --CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
    --AS
    --RETURN
    --    WITH
    --    L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
    --    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    --    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    --    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    --    L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    --    L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    --    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
    --    SELECT TOP (@n) n FROM Nums ORDER BY n;
    --GO

    IF OBJECT_ID('dbo.testtable') IS NOT NULL DROP TABLE dbo.testtable;
    CREATE TABLE dbo.testtable
    (
        id INT NOT NULL IDENTITY(1, 1),
        Code VARCHAR(14) NOT NULL,
        startdate DATETIME NOT NULL,
        enddate DATETIME NOT NULL,
        CONSTRAINT PK_Sessions PRIMARY KEY(id),
        CONSTRAINT CHK_endtime_gteq_starttime CHECK (enddate >= startdate)
    );

    -- create and populate the table testtable with 5,000,000 rows
    DECLARE
        @num_code AS INT = 1000,
        @intervals_per_code AS INT = 5000,
        @start_period AS DATETIME = '20110101',
        @end_period AS DATETIME = '20110114',
        @max_duration_in_ms AS INT = 3600000; -- 60 min

    --TRUNCATE TABLE dbo.testtable;
    WITH C AS
    (
        SELECT 'Code' + RIGHT('00' + CAST(U.n AS VARCHAR(10)), 10) AS code,
            DATEADD(ms, ABS(CHECKSUM(NEWID())) % 86400000,
                DATEADD(day, ABS(CHECKSUM(NEWID())) % DATEDIFF(day, @start_period, @end_period), @start_period)) AS startdate
        FROM dbo.GetNums(@num_code) AS U
        CROSS JOIN dbo.GetNums(@intervals_per_code) AS I
    )
    INSERT INTO dbo.testtable WITH (TABLOCK) (code, startdate, enddate)
    SELECT code,
        startdate,
        DATEADD(ms, ABS(CHECKSUM(NEWID())) % (@max_duration_in_ms + 1), startdate) AS enddate
    FROM C;

    -- indexes
    CREATE UNIQUE INDEX idx_user_start_id ON dbo.testtable(code, startdate, id);
    CREATE UNIQUE INDEX idx_user_end_id ON dbo.testtable(code, enddate, id);
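The harness logic (1,000 codes x 5,000 intervals, a random start inside the two-week window, a random duration up to 60 minutes) can be sketched in Python for a quick sanity check outside SQL Server. The function name is hypothetical; it simply mirrors the T-SQL expressions above.

```python
import random
from datetime import datetime, timedelta

def build_test_rows(num_codes=1000, intervals_per_code=5000,
                    start_period=datetime(2011, 1, 1),
                    end_period=datetime(2011, 1, 14),
                    max_duration_ms=3_600_000):
    """Mirror the T-SQL harness: each row gets a random start inside
    [start_period, end_period) and a duration of 0..max_duration_ms."""
    days = (end_period - start_period).days
    rows = []
    for n in range(1, num_codes + 1):
        # same expression as 'Code' + RIGHT('00' + CAST(n AS VARCHAR(10)), 10)
        code = "Code" + ("00" + str(n))[-10:]
        for _ in range(intervals_per_code):
            start = (start_period
                     + timedelta(days=random.randrange(days),
                                 milliseconds=random.randrange(86_400_000)))
            end = start + timedelta(milliseconds=random.randrange(max_duration_ms + 1))
            rows.append((code, start, end))
    return rows
```

With small arguments (say 2 codes x 3 intervals) it produces a handful of rows you can feed straight into a Python port of the packing queries for correctness checks before running the 5-million-row version in tempdb.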

    If I compare your code to an alternative that IBG suggested initially for Oracle, prior to the necessary windowing functions being available in SQL Server, then in my tests the IBG code wins hands down.

    Maybe you will see differences... I will be interested to see what you think.

    (I have removed the requirement for the timediff calc and the OP's issue with NULL, for sanity.)

    -- IBG http://blogs.solidq.com/en/sqlserver/packing-intervals/ Solution 3, Using a Window Aggregate
    WITH C1 AS
    (
        SELECT code, startdate AS ts, +1 AS type, 1 AS sub
        FROM dbo.testtable
        UNION ALL
        SELECT code, enddate AS ts, -1 AS type, 0 AS sub
        FROM dbo.testtable
    ),
    C2 AS
    (
        SELECT C1.*,
            SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING
                           AND CURRENT ROW) - sub AS cnt
        FROM C1
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code

    -- ChrisM solution
    WITH DateSequence AS (
        SELECT m.ID, m.Code, d.*,
            OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        FROM testtable m
        CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], Flag)
    ),
    MarkedBoundaries AS (
        SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]), 0)
        FROM DateSequence
    ),
    PairedDates AS (
        SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date]) - 1) / 2
        FROM MarkedBoundaries
        WHERE OnOff = 0 OR LastOnOff = 0
    )
    SELECT Code,
        StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
        EndDate = MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END)
    FROM PairedDates
    GROUP BY Code, Grp
    ORDER BY Code, Grp

    --- IBG for SQL 2008
    WITH C1 AS (
        SELECT
            code,
            startdate AS dt,
            1 AS type,
            NULL AS e,
            ROW_NUMBER() OVER(PARTITION BY code ORDER BY startdate) AS s
        FROM testtable
        UNION ALL
        SELECT
            code,
            enddate AS dt,
            -1 AS type,
            ROW_NUMBER() OVER(PARTITION BY code ORDER BY enddate) AS e,
            NULL AS s
        FROM testtable
    ),
    C2 AS (
        SELECT *,
            ROW_NUMBER() OVER(PARTITION BY code ORDER BY dt, type DESC) AS se
        FROM C1
    ),
    C3 AS (
        SELECT *,
            FLOOR((ROW_NUMBER() OVER(PARTITION BY code ORDER BY dt) - 1) / 2) + 1 AS grpnum
        FROM C2
        WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
    )
    SELECT code,
        MIN(dt) AS startdate,
        MAX(dt) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code, grpnum
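The COALESCE predicate in the SQL 2008 version is the pre-windowing-function trick: for a start event, s counts starts so far and se - s counts ends so far, so s - (se - s) - 1 = 0 means this start raised the active count from 0 to 1; for an end event, (se - e) - e = 0 means it dropped the count back to 0. A Python rendering of that filter for a single code (function name hypothetical, purely illustrative):

```python
def boundary_events(events):
    """events: list of (date, kind) with kind 'S' or 'E', already sorted
    by (date, starts before ends on ties) for one code. Keeps only the
    events where the COALESCE predicate evaluates to 0, i.e. the
    pack-opening starts and the pack-closing ends."""
    out = []
    s = e = 0                                        # running counts of starts/ends
    for se, (d, kind) in enumerate(events, start=1):  # se = rank over all events
        if kind == 'S':
            s += 1
            if s - (se - s) - 1 == 0:                # starts - ends = 1: count rose 0 -> 1
                out.append((d, kind))
        else:
            e += 1
            if (se - e) - e == 0:                    # starts = ends: count fell to 0
                out.append((d, kind))
    return out
```

The surviving events alternate S, E, S, E, ..., so pairing consecutive rows (the (ROW_NUMBER() - 1) / 2 step) yields the packed ranges.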

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • That's awesome G - I've been in and out of meetings all day so haven't had a chance to play yet, but I bet there's some tweaking potential!


  • Following on from this, I had a few hours to spare on Saturday morning to spend on JLS' test harness.

    IBG's first query looks like this:

    SELECT code, startdate AS ts, +1 AS type, 1 AS sub
    FROM dbo.testtable
    UNION ALL
    SELECT code, enddate AS ts, -1 AS type, 0 AS sub
    FROM dbo.testtable

    Two table-length reads, one from each index. The next operator requires that the data is in a specific order to avoid a sort:

    SELECT C1.*,
        SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                       ROWS BETWEEN UNBOUNDED PRECEDING
                       AND CURRENT ROW) - sub AS cnt
    FROM C1

    and it achieves this by using a merge join between the two streams from the first query. The plan is serial in SQL2012, even with a 5-million row table. I'd guess that a parallel plan can't guarantee that streams arrive at the merge join operator in the correct order.

    The alternative I posted earlier looks like this:

    SELECT m.ID, m.Code, d.*,
        OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM testtable m
    CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], Flag)

    It uses a compute scalar to generate the "extra" rows and a nested loops join to union them. The data is out-of-order for the aggregate so a sort is invoked: join hints cannot change the join type. However, the plan is parallel. On my two-core playpen the IBG query was about four times faster than my mod, mostly down to the sort.

    That's on two cores, however. I wonder how that parallel plan will behave with 20 cores? Lunchtime = playtime 🙂

    Thanks G for posting this up.


  • Jeff Moden (7/7/2016)


    ChrisM@Work (7/7/2016)


    You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:


    Haven't tried this yet but looks awesome. It's one of the things that Itzik referred to at the very end of his article because, as he pointed out, MS hadn't yet come up with proper windowing functions that would do the ROWS PRECEDING thing. He was pretty instrumental in pushing MS to make it a priority and, man, I'm thankful for that.

    Now that we've finally upgraded from 2005 to 2012 at work, I can see myself working with it more and more. Thanks for the great example, Chris.

    You're welcome ol' friend - but note the operational differences between the two versions. I was testing them on a beefy instance where parallelism paid off.

    That sort is expensive because it spills. I wonder if there's a sneaky way around that?


  • ChrisM@Work (7/11/2016)



    Hey Chris

    Try this:

    WITH C1 AS
    (
        SELECT code, startdate AS ts, +1 AS type, 1 AS sub
        FROM dbo.testtable
        UNION ALL
        SELECT code, enddate AS ts, -1 AS type, 0 AS sub
        FROM dbo.testtable
    ),
    C2 AS
    (
        SELECT C1.*,
            SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING
                           AND CURRENT ROW) - sub AS cnt
        FROM C1
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    --INTO #ibgdump2
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code
    OPTION (RECOMPILE, QUERYTRACEON 8649) -- http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

    As IBG says in the post:

    http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle

    What makes this solution so fast is the efficient use of parallelism. Note though that when testing it in different environments and with different arguments for number of users and intervals, I didn’t always get a parallel plan. If you’re not getting a parallel plan, it could be because the machine you’re using has fewer logical CPUs than 8. Just for test purposes, you can use the SQL Server service startup option -P8, which will cause SQL Server to use 8 schedulers like in an environment with 8 logical CPUs. The -P startup parameter is not an officially documented one, so use it just for test purposes to mimic a machine with a desired number of CPUs, not for production purposes. Also, I noticed that in some machines where I tested this code and didn’t get parallel plans, when changing the sample data to 2,000 users each with 2,500 intervals, instead of 1,000 by 5,000, I got parallel plans in more cases. Either way, this solution is still very fast even when using a serial plan.


  • J Livingston SQL (7/11/2016)



    It's quick G but not quite as quick as this, which uses no Black Magic at all:

    PRINT '== CJM2 =========================================================================='
    SET STATISTICS IO, TIME ON
    ;WITH C2 AS
    (
        SELECT d.*,
            cnt = CAST(SUM(type) OVER(PARTITION BY d.code ORDER BY ts, type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - sub AS TINYINT) -- 853760
        FROM testtable m
        CROSS APPLY (
            VALUES
                (CAST(m.Code AS VARCHAR(10)), CAST(StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
                (CAST(m.Code AS VARCHAR(10)), CAST(EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
        ) d (Code, ts, type, sub)
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code;
    SET STATISTICS IO, TIME OFF


  • It's quick G but not quite as quick as this, which uses no Black Magic at all:

    I don't get the same results, Chris... any thoughts??

    PRINT '== IBG =========================================================================='
    SET STATISTICS IO, TIME ON;
    WITH C1 AS
    (
        SELECT code, startdate AS ts, +1 AS type, 1 AS sub
        FROM dbo.testtable
        UNION ALL
        SELECT code, enddate AS ts, -1 AS type, 0 AS sub
        FROM dbo.testtable
    ),
    C2 AS
    (
        SELECT C1.*,
            SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING
                           AND CURRENT ROW) - sub AS cnt
        FROM C1
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    INTO #ibgdump
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code
    OPTION (RECOMPILE, QUERYTRACEON 8649)
    SET STATISTICS IO, TIME OFF
    DROP TABLE #ibgdump

    PRINT '== CJM2 =========================================================================='
    SET STATISTICS IO, TIME ON
    ;WITH C2 AS
    (
        SELECT d.*,
            cnt = CAST(SUM(type) OVER(PARTITION BY d.code ORDER BY ts, type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - sub AS TINYINT) -- 853760
        FROM testtable m
        CROSS APPLY (
            VALUES
                (CAST(m.Code AS VARCHAR(10)), CAST(StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
                (CAST(m.Code AS VARCHAR(10)), CAST(EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
        ) d (Code, ts, type, sub)
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    INTO #cmdump
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code;
    SET STATISTICS IO, TIME OFF
    DROP TABLE #cmdump

    == IBG ==========================================================================
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 3 ms.
    Table 'testtable'. Scan count 18, logical reads 38914, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 26627 ms, elapsed time = 7075 ms.
    (2693 row(s) affected)

    == CJM2 ==========================================================================
    Table 'testtable'. Scan count 9, logical reads 26134, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 34484 ms, elapsed time = 17962 ms.
    (2693 row(s) affected)


  • J Livingston SQL (7/11/2016)



    Guessing - you're on 8 cores with less than the 200GB of RAM I'm playing with?

    == CJM2 ==========================================================================
    Table 'testtable'. Scan count 17, logical reads 27332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 34617, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 82182 ms, elapsed time = 7076 ms.

    == JLS1 ==========================================================================
    SQL Server parse and compile time:
       CPU time = 9 ms, elapsed time = 9 ms.
    Table 'testtable'. Scan count 34, logical reads 40869, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 48161 ms, elapsed time = 7217 ms.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • guess I am !!

    only 12 GB here

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (7/11/2016)


    guess I am !!

    only 12 GB here

    Try with this index, G; I'm getting about a 10% improvement.

    -- CREATE UNIQUE NONCLUSTERED INDEX [idx_Code_Dates] ON [dbo].[testtable] ([Code] ASC,[enddate] ASC,[startdate] ASC)

    PRINT '== CJM2 =========================================================================='
    SET STATISTICS IO, TIME ON
    ;WITH C2 AS (
        SELECT
            d.code,
            d.ts,
            cnt = SUM(type) OVER(PARTITION BY d.code ORDER BY d.ts, d.type DESC
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - d.sub
        FROM testtable m
        CROSS APPLY (
            VALUES
                (CAST(m.Code AS CHAR(10)), CAST(m.StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
                (CAST(m.Code AS CHAR(10)), CAST(m.EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
        ) d (code, ts, type, sub)
    ),
    C3 AS (
        SELECT code, ts,
            grpnum = (ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code;
    SET STATISTICS IO, TIME OFF -- 2782
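    For anyone wanting to sanity-check the results outside SQL Server: the CJM2 query is a sweep over start/end boundaries with a +1/-1 running sum, where a merged range opens when the sum rises from zero and closes when it returns to zero. A minimal Python sketch of that same logic (my illustration, not part of the original post; dates here are any comparable values, e.g. ints or ISO date strings):

    ```python
    # Sweep-line merge of overlapping (code, start, end) ranges,
    # mirroring the +1/-1 running-sum trick in the CJM2 query above.
    from collections import defaultdict

    def merge_ranges(rows):
        """rows: iterable of (code, start, end); returns merged (code, start, end)."""
        events = defaultdict(list)
        for code, start, end in rows:
            events[code].append((start, 1))   # range opens
            events[code].append((end, -1))    # range closes
        merged = []
        for code in sorted(events):
            depth = 0
            # At equal timestamps, sort opens (+1) before closes (-1), so
            # touching ranges merge - the same effect as ORDER BY ts, type DESC.
            for ts, delta in sorted(events[code], key=lambda e: (e[0], -e[1])):
                if depth == 0 and delta == 1:
                    span_start = ts            # first open after a gap
                depth += delta
                if depth == 0:
                    merged.append((code, span_start, ts))  # all opens closed
        return merged
    ```

    Filtering `cnt = 0` in the query keeps exactly these zero-crossing boundaries, and the `grpnum` pairing then does the same job as collecting `(span_start, ts)` pairs here.
    
    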


  • == CJM2 ==========================================================================

    Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31218 ms, elapsed time = 16031 ms.

    (2693 row(s) affected)


  • J Livingston SQL (7/11/2016)


    == CJM2 ==========================================================================

    Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31218 ms, elapsed time = 16031 ms.

    (2693 row(s) affected)

    On your phone? 😛 You've got nothing in cache!


  • ChrisM@Work (7/11/2016)


    J Livingston SQL (7/11/2016)


    == CJM2 ==========================================================================

    Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31218 ms, elapsed time = 16031 ms.

    (2693 row(s) affected)

    On your phone? 😛 You've got nothing in cache!

    nope...not on phone

    ran DBCC FREEPROCCACHE then

    ran your code 10 times.....that was the best I got.

    Microsoft SQL Server 2014 - 12.0.4213.0 (X64) Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 10586: )

    Windows 10

    Intel i7 6700

    16 GB RAM (12 GB to SQL)

