Date Range overlaps

  • You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:

    ;WITH DateSequence AS (
        SELECT m.ID, m.Code, d.*,
            OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        FROM #testtable m
        CROSS APPLY (VALUES ('S', StartDate, 1), ('E', ISNULL(EndDate,'21000101'), -1)) d (Dir, [Date], Flag)
    ),
    MarkedBoundaries AS (
        SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]), 0)
        FROM DateSequence
    ),
    PairedDates AS (
        SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date]) - 1) / 2
        FROM MarkedBoundaries
        WHERE OnOff = 0 OR LastOnOff = 0
    )
    SELECT Code,
        StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
        EndDate = NULLIF(MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END), '21000101')
    FROM PairedDates
    GROUP BY Code, Grp
    ORDER BY Code, Grp
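For anyone who wants to trace the logic outside SQL Server, here is a minimal Python sketch of the same event-based packing algorithm: each interval contributes a +1 start event and a -1 end event, a running sum plays the role of the OnOff column, and ranges open where the sum rises from zero and close where it returns to zero. The function name is hypothetical; it is an illustration of the technique, not part of the original post.

```python
from collections import defaultdict

def pack_intervals(rows):
    """rows: iterable of (code, start, end) with start <= end.
    Returns merged (code, start, end) ranges, mirroring the
    running-sum (OnOff) logic of the SQL query above."""
    by_code = defaultdict(list)
    for code, start, end in rows:
        by_code[code].append((start, 1))   # 'S' event, flag +1
        by_code[code].append((end, -1))    # 'E' event, flag -1

    result = []
    for code in sorted(by_code):
        # On equal dates, starts sort before ends so touching
        # intervals merge - matching ORDER BY [Date], type DESC.
        events = sorted(by_code[code], key=lambda e: (e[0], -e[1]))
        on_off = 0
        current_start = None
        for d, flag in events:
            if on_off == 0 and flag == 1:   # LastOnOff = 0: a range opens
                current_start = d
            on_off += flag
            if on_off == 0:                 # OnOff = 0: the range closes
                result.append((code, current_start, d))
    return result
```

With overlapping and touching intervals, `pack_intervals([("A", 1, 5), ("A", 3, 8), ("A", 10, 12)])` collapses the first two rows into a single `("A", 1, 8)` range.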

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (7/7/2016)


    You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:


    Haven't tried this yet but looks awesome. It's one of the things that Itzik referred to at the very end of his article because, as he pointed out, MS hadn't yet come up with proper windowing functions that would do the ROWS PRECEDING thing. He was pretty instrumental in pushing MS to make it a priority and, man, I'm thankful for that.

    Now that we've finally upgraded from 2005 to 2012 at work, I can see myself working with it more and more. Thanks for the great example, Chris.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • ChrisM@Work (7/7/2016)


    You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:

    Edit: the original OP post related to SQL 2008, so what follows with windowing functions will not work for the OP.

    Hi Chris

    I had a look at your solution and decided to have a "play" <grin>. Here is some code that builds a test harness:

    --test harness adapted from here http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle
    SET NOCOUNT ON;
    USE tempdb;

    -- helper function GetNums
    -- jls: alternative - use your own tally table/function
    --IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL
    --    DROP FUNCTION dbo.GetNums;
    --GO
    --CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
    --AS
    --RETURN
    --    WITH
    --    L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
    --    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    --    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    --    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    --    L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    --    L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    --    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
    --    SELECT TOP (@n) n FROM Nums ORDER BY n;
    --GO

    IF OBJECT_ID('dbo.testtable') IS NOT NULL DROP TABLE dbo.testtable;
    CREATE TABLE dbo.testtable
    (
        id INT NOT NULL IDENTITY(1, 1),
        Code VARCHAR(14) NOT NULL,
        startdate DATETIME NOT NULL,
        enddate DATETIME NOT NULL,
        CONSTRAINT PK_Sessions PRIMARY KEY(id),
        CONSTRAINT CHK_endtime_gteq_starttime CHECK (enddate >= startdate)
    );

    -- create and populate the table testtable with 5,000,000 rows
    DECLARE
        @num_code AS INT = 1000,
        @intervals_per_code AS INT = 5000,
        @start_period AS DATETIME = '20110101',
        @end_period AS DATETIME = '20110114',
        @max_duration_in_ms AS INT = 3600000; -- 60 min

    --TRUNCATE TABLE dbo.testtable;
    WITH C AS
    (
        SELECT 'Code' + RIGHT('00' + CAST(U.n AS VARCHAR(10)), 10) AS code,
            DATEADD(ms, ABS(CHECKSUM(NEWID())) % 86400000,
                DATEADD(day, ABS(CHECKSUM(NEWID())) % DATEDIFF(day, @start_period, @end_period), @start_period)) AS startdate
        FROM dbo.GetNums(@num_code) AS U
        CROSS JOIN dbo.GetNums(@intervals_per_code) AS I
    )
    INSERT INTO dbo.testtable WITH (TABLOCK) (code, startdate, enddate)
    SELECT code,
        startdate,
        DATEADD(ms, ABS(CHECKSUM(NEWID())) % (@max_duration_in_ms + 1), startdate) AS enddate
    FROM C;

    -- indexes
    CREATE UNIQUE INDEX idx_user_start_id ON dbo.testtable(code, startdate, id);
    CREATE UNIQUE INDEX idx_user_end_id ON dbo.testtable(code, enddate, id);
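The harness logic (1,000 codes x 5,000 intervals, a random start inside the two-week window, a random duration up to 60 minutes) can be sketched in Python for a quick sanity check outside SQL Server. The function name is hypothetical; it simply mirrors the T-SQL expressions above.

```python
import random
from datetime import datetime, timedelta

def build_test_rows(num_codes=1000, intervals_per_code=5000,
                    start_period=datetime(2011, 1, 1),
                    end_period=datetime(2011, 1, 14),
                    max_duration_ms=3_600_000):
    """Mirror the T-SQL harness: each row gets a random start inside
    [start_period, end_period) and a duration of 0..max_duration_ms."""
    days = (end_period - start_period).days
    rows = []
    for n in range(1, num_codes + 1):
        # same expression as 'Code' + RIGHT('00' + CAST(n AS VARCHAR(10)), 10)
        code = "Code" + ("00" + str(n))[-10:]
        for _ in range(intervals_per_code):
            start = (start_period
                     + timedelta(days=random.randrange(days),
                                 milliseconds=random.randrange(86_400_000)))
            end = start + timedelta(milliseconds=random.randrange(max_duration_ms + 1))
            rows.append((code, start, end))
    return rows
```

With small arguments (say 2 codes x 3 intervals) it produces a handful of rows you can feed straight into a Python port of the packing queries for correctness checks before running the 5-million-row version in tempdb.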

    If I compare your code to an alternative that IBG suggested initially for Oracle, prior to the necessary windowing functions being available in SQL Server, then in my tests the IBG code wins hands down.

    Maybe you will see differences... I will be interested to see what you think.

    (I have removed the requirement for the timediff calc and the OP's issue with NULL, for sanity.)

    -- IBG http://blogs.solidq.com/en/sqlserver/packing-intervals/ Solution 3, Using a Window Aggregate
    WITH C1 AS
    (
        SELECT code, startdate AS ts, +1 AS type, 1 AS sub
        FROM dbo.testtable
        UNION ALL
        SELECT code, enddate AS ts, -1 AS type, 0 AS sub
        FROM dbo.testtable
    ),
    C2 AS
    (
        SELECT C1.*,
            SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING
                           AND CURRENT ROW) - sub AS cnt
        FROM C1
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code

    -- ChrisM solution
    WITH DateSequence AS (
        SELECT m.ID, m.Code, d.*,
            OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        FROM testtable m
        CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], Flag)
    ),
    MarkedBoundaries AS (
        SELECT *, LastOnOff = ISNULL(LAG(OnOff,1) OVER(PARTITION BY Code ORDER BY [Date]), 0)
        FROM DateSequence
    ),
    PairedDates AS (
        SELECT *, Grp = (ROW_NUMBER() OVER(PARTITION BY Code ORDER BY [Date]) - 1) / 2
        FROM MarkedBoundaries
        WHERE OnOff = 0 OR LastOnOff = 0
    )
    SELECT Code,
        StartDate = MAX(CASE WHEN Dir = 'S' THEN [Date] ELSE NULL END),
        EndDate = MAX(CASE WHEN Dir = 'E' THEN [Date] ELSE NULL END)
    FROM PairedDates
    GROUP BY Code, Grp
    ORDER BY Code, Grp

    --- IBG for SQL 2008
    WITH C1 AS (
        SELECT
            code,
            startdate AS dt,
            1 AS type,
            NULL AS e,
            ROW_NUMBER() OVER(PARTITION BY code ORDER BY startdate) AS s
        FROM testtable
        UNION ALL
        SELECT
            code,
            enddate AS dt,
            -1 AS type,
            ROW_NUMBER() OVER(PARTITION BY code ORDER BY enddate) AS e,
            NULL AS s
        FROM testtable
    ),
    C2 AS (
        SELECT *,
            ROW_NUMBER() OVER(PARTITION BY code ORDER BY dt, type DESC) AS se
        FROM C1
    ),
    C3 AS (
        SELECT *,
            FLOOR((ROW_NUMBER() OVER(PARTITION BY code ORDER BY dt) - 1) / 2) + 1 AS grpnum
        FROM C2
        WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
    )
    SELECT code,
        MIN(dt) AS startdate,
        MAX(dt) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code, grpnum
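The COALESCE predicate in the SQL 2008 version is the pre-windowing-function trick: for a start event, s counts starts so far and se - s counts ends so far, so s - (se - s) - 1 = 0 means this start raised the active count from 0 to 1; for an end event, (se - e) - e = 0 means it dropped the count back to 0. A Python rendering of that filter for a single code (function name hypothetical, purely illustrative):

```python
def boundary_events(events):
    """events: list of (date, kind) with kind 'S' or 'E', already sorted
    by (date, starts before ends on ties) for one code. Keeps only the
    events where the COALESCE predicate evaluates to 0, i.e. the
    pack-opening starts and the pack-closing ends."""
    out = []
    s = e = 0                                        # running counts of starts/ends
    for se, (d, kind) in enumerate(events, start=1):  # se = rank over all events
        if kind == 'S':
            s += 1
            if s - (se - s) - 1 == 0:                # starts - ends = 1: count rose 0 -> 1
                out.append((d, kind))
        else:
            e += 1
            if (se - e) - e == 0:                    # starts = ends: count fell to 0
                out.append((d, kind))
    return out
```

The surviving events alternate S, E, S, E, ..., so pairing consecutive rows (the (ROW_NUMBER() - 1) / 2 step) yields the packed ranges.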

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • That's awesome G - I've been in and out of meetings all day so haven't had a chance to play yet, but I bet there's some tweaking potential!


  • Following on from this, I had a few hours to spare on Saturday morning to spend on JLS' test harness.

    IBG's first query looks like this:

    SELECT code, startdate AS ts, +1 AS type, 1 AS sub
    FROM dbo.testtable
    UNION ALL
    SELECT code, enddate AS ts, -1 AS type, 0 AS sub
    FROM dbo.testtable

    Two table-length reads, one from each index. The next operator requires that the data is in a specific order to avoid a sort:

    SELECT C1.*,
        SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                       ROWS BETWEEN UNBOUNDED PRECEDING
                       AND CURRENT ROW) - sub AS cnt
    FROM C1

    and it achieves this by using a merge join between the two streams from the first query. The plan is serial in SQL2012, even with a 5-million row table. I'd guess that a parallel plan can't guarantee that streams arrive at the merge join operator in the correct order.

    The alternative I posted earlier looks like this:

    SELECT m.ID, m.Code, d.*,
        OnOff = SUM(Flag) OVER(PARTITION BY Code ORDER BY [Date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM testtable m
    CROSS APPLY (VALUES ('S', StartDate, 1), ('E', EndDate, -1)) d (Dir, [Date], Flag)

    It uses a compute scalar to generate the "extra" rows and a nested loops join to union them. The data is out-of-order for the aggregate so a sort is invoked: join hints cannot change the join type. However, the plan is parallel. On my two-core playpen the IBG query was about four times faster than my mod, mostly down to the sort.

    That's on two cores, however. I wonder how that parallel plan will behave with 20 cores? Lunchtime = playtime 🙂

    Thanks G for posting this up.


  • Jeff Moden (7/7/2016)


    ChrisM@Work (7/7/2016)


    You might find this version a little easier to follow than the IBG original and it only reads the data once, so it's quite a bit quicker:


    Haven't tried this yet but looks awesome. It's one of the things that Itzik referred to at the very end of his article because, as he pointed out, MS hadn't yet come up with proper windowing functions that would do the ROWS PRECEDING thing. He was pretty instrumental in pushing MS to make it a priority and, man, I'm thankful for that.

    Now that we've finally upgraded from 2005 to 2012 at work, I can see myself working with it more and more. Thanks for the great example, Chris.

    You're welcome ol' friend - but note the operational differences between the two versions. I was testing them on a beefy instance where parallelism paid off.

    That sort is expensive because it spills. I wonder if there's a sneaky way around that?


  • ChrisM@Work (7/11/2016)



    Hey Chris

    Try this:

    WITH C1 AS
    (
        SELECT code, startdate AS ts, +1 AS type, 1 AS sub
        FROM dbo.testtable
        UNION ALL
        SELECT code, enddate AS ts, -1 AS type, 0 AS sub
        FROM dbo.testtable
    ),
    C2 AS
    (
        SELECT C1.*,
            SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING
                           AND CURRENT ROW) - sub AS cnt
        FROM C1
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    --INTO #ibgdump2
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code
    OPTION (RECOMPILE, QUERYTRACEON 8649) -- http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

    As IBG says in the post:

    http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle

    What makes this solution so fast is the efficient use of parallelism. Note though that when testing it in different environments and with different arguments for number of users and intervals, I didn’t always get a parallel plan. If you’re not getting a parallel plan, it could be because the machine you’re using has fewer logical CPUs than 8. Just for test purposes, you can use the SQL Server service startup option -P8, which will cause SQL Server to use 8 schedulers like in an environment with 8 logical CPUs. The -P startup parameter is not an officially documented one, so use it just for test purposes to mimic a machine with a desired number of CPUs, not for production purposes. Also, I noticed that in some machines where I tested this code and didn’t get parallel plans, when changing the sample data to 2,000 users each with 2,500 intervals, instead of 1,000 by 5,000, I got parallel plans in more cases. Either way, this solution is still very fast even when using a serial plan.


  • J Livingston SQL (7/11/2016)



    It's quick G but not quite as quick as this, which uses no Black Magic at all:

    PRINT '== CJM2 =========================================================================='
    SET STATISTICS IO, TIME ON
    ;WITH C2 AS
    (
        SELECT d.*,
            cnt = CAST(SUM(type) OVER(PARTITION BY d.code ORDER BY ts, type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - sub AS TINYINT) -- 853760
        FROM testtable m
        CROSS APPLY (
            VALUES
                (CAST(m.Code AS VARCHAR(10)), CAST(StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
                (CAST(m.Code AS VARCHAR(10)), CAST(EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
        ) d (Code, ts, type, sub)
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code;
    SET STATISTICS IO, TIME OFF


  • It's quick G but not quite as quick as this, which uses no Black Magic at all:

    I don't get the same results, Chris... any thoughts??

    PRINT '== IBG =========================================================================='
    SET STATISTICS IO, TIME ON;
    WITH C1 AS
    (
        SELECT code, startdate AS ts, +1 AS type, 1 AS sub
        FROM dbo.testtable
        UNION ALL
        SELECT code, enddate AS ts, -1 AS type, 0 AS sub
        FROM dbo.testtable
    ),
    C2 AS
    (
        SELECT C1.*,
            SUM(type) OVER(PARTITION BY code ORDER BY ts, type DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING
                           AND CURRENT ROW) - sub AS cnt
        FROM C1
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    INTO #ibgdump
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code
    OPTION (RECOMPILE, QUERYTRACEON 8649)
    SET STATISTICS IO, TIME OFF
    DROP TABLE #ibgdump

    PRINT '== CJM2 =========================================================================='
    SET STATISTICS IO, TIME ON
    ;WITH C2 AS
    (
        SELECT d.*,
            cnt = CAST(SUM(type) OVER(PARTITION BY d.code ORDER BY ts, type DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - sub AS TINYINT) -- 853760
        FROM testtable m
        CROSS APPLY (
            VALUES
                (CAST(m.Code AS VARCHAR(10)), CAST(StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
                (CAST(m.Code AS VARCHAR(10)), CAST(EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
        ) d (Code, ts, type, sub)
    ),
    C3 AS
    (
        SELECT code, ts,
            ((ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2 + 1) AS grpnum
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    INTO #cmdump
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code;
    SET STATISTICS IO, TIME OFF
    DROP TABLE #cmdump

    == IBG ==========================================================================
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 3 ms.
    Table 'testtable'. Scan count 18, logical reads 38914, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 26627 ms, elapsed time = 7075 ms.
    (2693 row(s) affected)

    == CJM2 ==========================================================================
    Table 'testtable'. Scan count 9, logical reads 26134, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 34484 ms, elapsed time = 17962 ms.
    (2693 row(s) affected)


  • J Livingston SQL (7/11/2016)



    Guessing - you're on 8 cores with less than the 200GB of RAM I'm playing with?

    == CJM2 ==========================================================================
    Table 'testtable'. Scan count 17, logical reads 27332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 34617, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 82182 ms, elapsed time = 7076 ms.

    == JLS1 ==========================================================================
    SQL Server parse and compile time:
       CPU time = 9 ms, elapsed time = 9 ms.
    Table 'testtable'. Scan count 34, logical reads 40869, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 48161 ms, elapsed time = 7217 ms.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • guess I am !!

    only 12 GB here

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (7/11/2016)


    guess I am !!

    only 12 GB here

    Try with this index, G; I'm getting about a 10% improvement.

    -- CREATE UNIQUE NONCLUSTERED INDEX [idx_Code_Dates] ON [dbo].[testtable] ([Code] ASC,[enddate] ASC,[startdate] ASC)

    PRINT '== CJM2 =========================================================================='
    SET STATISTICS IO, TIME ON
    ;WITH C2 AS (
        SELECT
            d.code,
            d.ts,
            cnt = SUM(type) OVER(PARTITION BY d.code ORDER BY d.ts, d.type DESC
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - d.sub
        FROM testtable m
        CROSS APPLY (
            VALUES
                (CAST(m.Code AS CHAR(10)), CAST(m.StartDate AS DATETIME), CAST(1 AS SMALLINT), CAST(1 AS TINYINT)),
                (CAST(m.Code AS CHAR(10)), CAST(m.EndDate AS DATETIME), CAST(-1 AS SMALLINT), CAST(0 AS TINYINT))
        ) d (code, ts, type, sub)
    ),
    C3 AS (
        SELECT code, ts,
            grpnum = (ROW_NUMBER() OVER(PARTITION BY code ORDER BY ts) - 1) / 2
        FROM C2
        WHERE cnt = 0
    )
    SELECT code, MIN(ts) AS startdate, MAX(ts) AS enddate
    FROM C3
    GROUP BY code, grpnum
    ORDER BY code;
    SET STATISTICS IO, TIME OFF -- 2782
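    For anyone wanting to sanity-check the results outside SQL Server: the CJM2 query is a sweep over start/end boundaries with a +1/-1 running sum, where a merged range opens when the sum rises from zero and closes when it returns to zero. A minimal Python sketch of that same logic (my illustration, not part of the original post; dates here are any comparable values, e.g. ints or ISO date strings):

    ```python
    # Sweep-line merge of overlapping (code, start, end) ranges,
    # mirroring the +1/-1 running-sum trick in the CJM2 query above.
    from collections import defaultdict

    def merge_ranges(rows):
        """rows: iterable of (code, start, end); returns merged (code, start, end)."""
        events = defaultdict(list)
        for code, start, end in rows:
            events[code].append((start, 1))   # range opens
            events[code].append((end, -1))    # range closes
        merged = []
        for code in sorted(events):
            depth = 0
            # At equal timestamps, sort opens (+1) before closes (-1), so
            # touching ranges merge - the same effect as ORDER BY ts, type DESC.
            for ts, delta in sorted(events[code], key=lambda e: (e[0], -e[1])):
                if depth == 0 and delta == 1:
                    span_start = ts            # first open after a gap
                depth += delta
                if depth == 0:
                    merged.append((code, span_start, ts))  # all opens closed
        return merged
    ```

    Filtering `cnt = 0` in the query keeps exactly these zero-crossing boundaries, and the `grpnum` pairing then does the same job as collecting `(span_start, ts)` pairs here.
    
    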


  • == CJM2 ==========================================================================

    Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31218 ms, elapsed time = 16031 ms.

    (2693 row(s) affected)


  • J Livingston SQL (7/11/2016)


    == CJM2 ==========================================================================

    Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31218 ms, elapsed time = 16031 ms.

    (2693 row(s) affected)

    On your phone? 😛 You've got nothing in cache!


  • ChrisM@Work (7/11/2016)


    J Livingston SQL (7/11/2016)


    == CJM2 ==========================================================================

    Table 'testtable'. Scan count 9, logical reads 24440, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31218 ms, elapsed time = 16031 ms.

    (2693 row(s) affected)

    On your phone? 😛 You've got nothing in cache!

    nope...not on phone

    ran DBCC FREEPROCCACHE then

    ran your code 10 times.....that was the best I got.

    Microsoft SQL Server 2014 - 12.0.4213.0 (X64) Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 10586: )

    Windows 10

    Intel i7 6700

    16 GB RAM (12 GB to SQL)

