Retrieving specific records using OVER clause?

  • Would anyone be able to assist with the following?

    I'm trying to retrieve the highlighted line... basically the start of GT0 1 till the last record. Though it would be possible with the OVER clause... but not getting it right.

    The table has 4million records per month - and it needs a comparison of 12 months - so I'm trying to read the data once.

    ACCOUNT_NBRDATE GT0

    0001200910120

    0001200911120

    0001200912121

    0001201001121

    0001201002121

    0001201003120

    0001201004120

    0001201005120

    0001201006120

    0001201007121

    0001201008121

    0001201009111

    I basically need to work out that the first 1 (after a preceding zero) starts at 20100712 and ends on 20100911 - so three months.

  • Brett

    From the description you have provided, the answer would be

    WHERE DATE >= '20100712'

    however if it were this easy, then you wouldn't have posted. Can you give a little more detail please?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yeah - not quite that easy.

    The Column GT0 represents payments missed, so when the value is 1, that months payment was missed.

    I'm trying to retrieve the count of the last set of consecutive months where payments were missed. (in this case it would be 3 months for the for the relevant ACCOUNT_NBR.

    2010-09-11 is the latest month, and is passed in as a variable, which retrieves the last 12 months payment information.

  • For Better Assistance

    CREATE TABLE #mytable

    (

    ACCOUNT_NBR CHAR(19),

    DATE INT,

    GTO INT

    )

    INSERT INTO #mytable

    SELECT '0001','20091012','0' UNION ALL

    SELECT '0001','20091112','0' UNION ALL

    SELECT '0001','20091212','1' UNION ALL

    SELECT '0001','20100112','1' UNION ALL

    SELECT '0001','20100212','1' UNION ALL

    SELECT '0001','20100312','0' UNION ALL

    SELECT '0001','20100412','0' UNION ALL

    SELECT '0001','20100512','0' UNION ALL

    SELECT '0001','20100612','0' UNION ALL

    SELECT '0001','20100712','1' UNION ALL

    SELECT '0001','20100812','1' UNION ALL

    SELECT '0001','20100911','1'

    SELECT * FROM #mytable

  • Thanks for the data, Brett. Try this:

    DROP TABLE #mytable

    CREATE TABLE #mytable

    (

    ACCOUNT_NBR CHAR(19),

    DATE INT,

    GTO INT

    )

    INSERT INTO #mytable

    SELECT '0010010000001000001','20091012','0' UNION ALL

    SELECT '0010010000001080001','20091112','0' UNION ALL

    SELECT '0010010000001080001','20091212','1' UNION ALL

    SELECT '0010010000001080001','20100112','1' UNION ALL

    SELECT '0010010000001080001','20100212','1' UNION ALL

    SELECT '0010010000001080001','20100312','0' UNION ALL

    SELECT '0010010000001080001','20100412','0' UNION ALL

    SELECT '0010010000001080001','20100512','0' UNION ALL

    SELECT '0010010000001080001','20100612','0' UNION ALL

    SELECT '0010010000001080001','20100712','1' UNION ALL

    SELECT '0010010000001080001','20100812','1' UNION ALL

    SELECT '0010010000001080001','20100911','1'

    ;WITH Partitioner AS (

    SELECT rn = ROW_NUMBER() OVER(ORDER BY ACCOUNT_NBR, DATE),

    grn = ROW_NUMBER() OVER(PARTITION BY ACCOUNT_NBR, GTO ORDER BY ACCOUNT_NBR, DATE),

    ACCOUNT_NBR,

    DATE,

    GTO

    FROM #mytable

    ) SELECT ACCOUNT_NBR, FirstMissedDate = MIN(DATE), PaymentsMissed = COUNT(*)

    FROM Partitioner

    GROUP BY ACCOUNT_NBR, GTO, rn-grn

    HAVING GTO = 1

    There are at least two other ways of tackling this, the quirky update and a recursive CTE. The method shown is the easiest but slowest.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Try this:

    select account_nbr,

    (select min(date) from tab1 as b

    where b.date >= a.date

    and not exists

    (select * from tab1 as c

    where b.date = c.date - 1)) as date,

    gto

    from tab1 as a

    where gto=1

    You can add a filter on date. I was confused if you needed all 1s after any preceeding 0s or just that specific range.

  • @chris-2 Morris-439714 - thanks - that's putting me on the right track... I'll see if I can work with this, its not quite as simpple as my calculation, as I also have other calculations.

    @bjhogan - thanks for the response, unfortunately there are other calculations other, 78 in total, so I need to try and do this as quickly and neatly as possible.

  • Brett - you're welcome. Thanks for posting up the sample data.

    If you can describe some of the other work you need to do with this data set, I'm sure folks will help. It's beginning to sound like one of the other two methods might be more appropriate.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • All around the same principal, below is another account_nbr.

    I have the following calculations:

    1.) [Months Since GT0 greater than 0] (up until latest month)

    2.) [Months Since GT0 greater than 1] (up until latest month)

    3.) [Months Since GT0 greater than 2] (up until latest month)

    4.) [Max Consecutive Months GT0 greater than 0]

    5.) [Max Consecutive Months GT0 greater than 1]

    6.) [Count Months Consecutively where GT0 increased]

    7.) [Count Months Consecutively where GT0 decreased]

    8). [Count Months Consecutively where GT0 has not changed]

    9.) [Maximum decrease in GT0 month to month]

    Those are the ones i'm currently stuck with.

    DROP TABLE #mytable

    CREATE TABLE #mytable

    (

    ACCOUNT_NBR CHAR(19),

    DATE INT,

    GTO INT

    )

    INSERT INTO #mytable

    SELECT '0002','20091012','3' UNION ALL

    SELECT '0002','20091112','2' UNION ALL

    SELECT '0002','20091212','1' UNION ALL

    SELECT '0002','20100112','0' UNION ALL

    SELECT '0002','20100212','0' UNION ALL

    SELECT '0002','20100312','0' UNION ALL

    SELECT '0002','20100412','1' UNION ALL

    SELECT '0002','20100512','2' UNION ALL

    SELECT '0002','20100612','3' UNION ALL

    SELECT '0002','20100712','4' UNION ALL

    SELECT '0002','20100812','5' UNION ALL

    SELECT '0002','20100911','6' UNION ALL

  • Time to consider the other two options. The quirky update is exactly that - an update with a twist, it uses variables to hold data between rows. There's a comprehensive article here[/url] and I'd recommend you read the discussion too.

    The other alternative is the recursive CTE, which is a read operation - if you want to persist the results in the source table then you have to code for it. Each method has its merits. The following example is a rCTE, and I've chosen it not on merit at all but because it's quicker to code up. What you will find, looking at the output, is that you will still have some work to do in terms of harvesting the figures you want - the method just provides them. Three of your requirements are included of the nine in the list.

    Sample data:

    DROP TABLE #mytable

    CREATE TABLE #mytable

    (

    ACCOUNT_NBR CHAR(19),

    DATE INT,

    GTO INT

    )

    INSERT INTO #mytable

    SELECT '0010010000001080001','20091012','0' UNION ALL

    SELECT '0010010000001080001','20091112','0' UNION ALL

    SELECT '0010010000001080001','20091212','1' UNION ALL

    SELECT '0010010000001080001','20100112','1' UNION ALL

    SELECT '0010010000001080001','20100212','1' UNION ALL

    SELECT '0010010000001080001','20100312','0' UNION ALL

    SELECT '0010010000001080001','20100412','0' UNION ALL

    SELECT '0010010000001080001','20100512','0' UNION ALL

    SELECT '0010010000001080001','20100612','0' UNION ALL

    SELECT '0010010000001080001','20100712','1' UNION ALL

    SELECT '0010010000001080001','20100812','1' UNION ALL

    SELECT '0010010000001080001','20100911','1'

    INSERT INTO #mytable

    SELECT '0002','20091012','3' UNION ALL

    SELECT '0002','20091112','2' UNION ALL

    SELECT '0002','20091212','1' UNION ALL

    SELECT '0002','20100112','0' UNION ALL

    SELECT '0002','20100212','0' UNION ALL

    SELECT '0002','20100312','0' UNION ALL

    SELECT '0002','20100412','1' UNION ALL

    SELECT '0002','20100512','2' UNION ALL

    SELECT '0002','20100612','3' UNION ALL

    SELECT '0002','20100712','4' UNION ALL

    SELECT '0002','20100812','5' UNION ALL

    SELECT '0002','20100911','6'

    Code:

    ;WITH OrderedData AS (

    SELECT seq = ROW_NUMBER() OVER (ORDER BY ACCOUNT_NBR, DATE),

    ACCOUNT_NBR, DATE, GTO

    FROM #mytable

    ),

    Calculator AS (

    SELECT seq, ACCOUNT_NBR, DATE, GTO,

    -- 1.) [Months Since GT0 greater than 0] (up until latest month)

    MonthsGTOgt0 = CAST(CASE WHEN GTO > 0 THEN 1 ELSE 0 END AS INT),

    -- 2.) [Months Since GT0 greater than 1] (up until latest month)

    MonthsGTOgt1 = CAST(CASE WHEN GTO > 1 THEN 1 ELSE 0 END AS INT),

    -- 9.) [Maximum decrease in GT0 month to month]

    GTO_decrease = CAST(0 AS INT)

    FROM OrderedData

    WHERE seq = 1

    UNION ALL

    SELECT nr.seq, nr.ACCOUNT_NBR, nr.DATE, nr.GTO,

    MonthsGTOgt0 = CASE

    WHEN nr.ACCOUNT_NBR = lr.ACCOUNT_NBR AND nr.GTO > 0 THEN lr.MonthsGTOgt0 + 1

    WHEN nr.ACCOUNT_NBR = lr.ACCOUNT_NBR AND nr.GTO = 0 THEN 0

    WHEN nr.ACCOUNT_NBR <> lr.ACCOUNT_NBR AND nr.GTO > 0 THEN 1

    ELSE 0 END,

    MonthsGTOgt1 = CASE

    WHEN nr.ACCOUNT_NBR = lr.ACCOUNT_NBR AND nr.GTO > 1 THEN lr.MonthsGTOgt1 + 1

    WHEN nr.ACCOUNT_NBR = lr.ACCOUNT_NBR AND nr.GTO < 2 THEN 0

    WHEN nr.ACCOUNT_NBR <> lr.ACCOUNT_NBR AND nr.GTO > 1 THEN 1

    ELSE 0 END,

    GTO_decrease = CASE

    WHEN nr.ACCOUNT_NBR <> lr.ACCOUNT_NBR THEN 0 -- new row has different account no.

    ELSE CAST(lr.GTO - nr.GTO AS INT) END

    FROM OrderedData nr -- new row

    INNER JOIN Calculator lr -- last row

    ON lr.seq + 1 = nr.seq

    )

    SELECT *

    FROM Calculator

    ORDER BY seq

    OPTION (MAXRECURSION 0)

    Key point: for optimum performance, spool your source columns and the sequence column into a temporary table and create a unique clustered index on the sequence column.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply