Relate records from different rows (create a relation in customer events)

  • Hi all,

    I require some assistance and i'm not sure on the best way to proceed.

    I have a table which records customers and their events (how its missing their "sesson"). Example below, customer 000001 had an event named ER1/16 and a start/end datetime and another event AC1/16 (start/end datetime). These events need to be placed in the same session within +/- 2 hours (there is no identifier in the table for session). However their next event ER2/16 is a few days after and should be excluded from the session.

    How do i go about creating this session key and relating different events within +/- 2 hours for the same customer?

    Sample data below

    CREATE TABLE HELP(

    [rownumber] [INT],

    [customer] [INT],

    [session] [INT],

    [event] [VARCHAR](10),

    [startdatetime] [datetime],

    [enddatetime] [datetime]

    )

    INSERT INTO HELP

    (rownumber,customer,session,event,startdatetime,enddatetime)

    VALUES

    (1,000001,NULL,'E1/16','2016-09-21 11:59:00.000','2016-09-21 18:00:00.000'),

    (2,000001,NULL,'A1/16','2016-09-21 17:22:00.000','2016-09-26 16:40:00.000'),

    (3,000001,NULL,'E2/16','2016-09-27 17:40:00.000','2016-09-27 20:58:00.000')

    select * from help

    Thank you for any and all help in advance!

  • Take a look at New Solution to the Packing Intervals Problem.

    Hint: You may want to add an index on (customer, startdatetime, enddatetime, rownumber) (prefereably clustered)

  • Hello!

    Thank you for the reply I will certainly take a look.

    My dataset is not as clean but I think I can make it work. Example: a user can have accounts in different regions. So aside from me just knowing the user account there is no other method for me to tell if it's the same user. All the other values are unique.

    Current I do something similar to what is in the example but not as clean and I determine the "session" but finding users and their activity in through ranking and I have it broken down into 30 minute intervals with a confidence flag and relate from that.

    I'll give this a shot though, thank you again!

  • I just realised that you are using SQL 2008, so you will need to look at the old solution that only uses ROW_NUMBER()

    This blog post[/url] explains the solution.

    Something like this

    --CREATE UNIQUE INDEX idx_start ON HELP(customer, startdatetime, rownumber);

    --CREATE UNIQUE INDEX idx_end ON HELP(customer, enddatetime, rownumber);

    WITH C1 AS (

    SELECT rownumber, customer, [event], startdatetime, enddatetime,

    startdatetime AS ts, +1 AS type,

    ROW_NUMBER() OVER(PARTITION BY customer ORDER BY startdatetime, rownumber) AS s,

    NULL AS e

    FROM HELP

    UNION ALL

    SELECT rownumber, customer, [event], startdatetime, enddatetime,

    enddatetime AS ts, -1 AS type,

    NULL AS s,

    ROW_NUMBER() OVER(PARTITION BY customer ORDER BY enddatetime, rownumber) AS e

    FROM HELP

    )

    , C2 AS (

    SELECT *,

    ROW_NUMBER() OVER(PARTITION BY customer ORDER BY ts, type DESC, rownumber) AS se

    FROM C1

    )

    , C3 AS (

    SELECT rownumber, customer, [event], startdatetime, enddatetime,

    FLOOR((ROW_NUMBER() OVER(PARTITION BY customer ORDER BY ts) + 1) / 2) AS [session]

    FROM C2

    CROSS APPLY ( VALUES(s - (se - s) - 1, (se - e) - e) ) AS A(cs, ce)

    WHERE cs = 0 OR ce = 0

    )

    SELECT rownumber, customer, [session]=MIN([session]), [event], startdatetime, enddatetime

    FROM C3

    GROUP BY rownumber, customer, [event], startdatetime, enddatetime;

  • Thank you very much for the redirect, i'm going to try to apply this tomorrow but i will post the result, this certainly looks like where i can to go though!

    Thank you again!

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply