First time semi-complex query - its slow!!!

  • Ok so I am giving myself a huge pat on the shoulder because i've had no formal training on SQL nor do I even work in a IT-ish role but work as a data analyst. We had some data that I wanted to transform and the only thing I could think of was to write the script below. It works (!!) but it's very slow. It took 58 minutes to process 23,000 rows using the system below.

    The idea is that it takes data from a table, looks at a start and end datetime, checks to see if there is an interval which fits between that range, and if so updates another table.

    Can someone take my donkey script and help me make it more efficient? from now on, the script will be processing a much smaller number of rows, say 1500 per day, but that still quite a long time while my data warehouse loads.

    Anyway, your thoughts would be appreciated.

    -- Declare Procedural Variable

    declare @exception varchar(50)

    declare @Interval datetime

    declare @start datetime

    declare @stop datetime

    declare @DateRaw varchar(50)

    declare @AgentName varchar(50)

    declare @RowCnt int

    declare @MaxRows int

    declare @Flag int

    declare @subloop int

    -- reset global variables

    select @flag = 1

    select @RowCnt = 1

    --define total rowcount

    select @MaxRows=count(*) from dbo.wfm_actualactivity

    -- Load table into a temporary table for sorting, adding a row identifier for the rowcount

    declare @SourceTable table (rownum int IDENTITY (1,1),DateRaw datetime null, AgentName varchar(50)null, exception varchar(50)null, e_start datetime null, e_end datetime null)

    insert into @SourceTable (DateRaw, AgentName, exception, e_start, e_end) SELECT DateRaw, agentName, exception, cast(start as datetime), cast([stop] as datetime) FROM dbo.wfm_actualactivity

    -- Start outer procedure :: Loop while there are still rows to process

    While @RowCnt <= @MaxRows

    begin

    -- Prep for the inner loop to check each row against each interval

    select @subloop = 0

    select @start = (SELECT e_start FROM @SourceTable WHERE rownum = @RowCnt)

    select @stop = (SELECT e_end FROM @SourceTable WHERE rownum = @RowCnt)

    select @Interval = '01/01/1900 07:45:00'

    --Start inner procedure :: Loop for each minute of the day

    While @interval < '01/01/1900 20:15:00' -- Enter Statement to make sure row data is within interval

    begin

    select @subloop = 0 -- reset subloop counter. Subloop = 2 means it is within that time range

    IF @Interval >= @Start select @subloop = @subloop + 1 -- Checks to see if the interval is greater than start time

    IF @Interval <= @Stop select @subloop = @subloop + 1 -- checks to see if the interval is less than the end time.

    IF @subloop =2 select @DateRaw = (Select DateRaw FROM @SourceTable WHERE rownum = @RowCnt) -- loads variable if appropriate

    IF @subloop =2 select @AgentName = (Select AgentName FROM @SourceTable WHERE rownum = @RowCnt) -- loads variable if appropriate

    IF @subloop =2 select @Exception = (Select Exception FROM @SourceTable WHERE rownum = @RowCnt) -- loads variable if appropriate

    IF @subloop =2 INSERT INTO dbo.ActualInterval (DateRaw, Interval, AgentName, Exception, Flag) Values (@DateRaw, @Interval, @AgentName, @Exception, @flag) -- loads variable if appropriate

    select @interval = @interval + '01/01/1900 00:05:00' -- run process to see if exception fits within the next 1 minute interval

    END

    select @RowCnt = @RowCnt + 1

    END

  • To restate your problem :

    You need a list of 5 minute intervals betweeen '07:45:00' and '20:15:00' that are between the start and stop time on wfm_actualactivity.

    Is that correct ?



    Clear Sky SQL
    My Blog[/url]

  • Yes. Well, any interval really, I would much prefer 1 minute intervals, the problem is query time which is why it's 5 minutes.

    The source table looks something like this.

    date, agent, exception, start, stop.

    1/6/2010, John, Lunch, 12:30, 13:30

    1/6/2010, John, OpenTime, 13:30, 15:00

    I would want to send that to another table, which has an interval column, at an interval (1 min, 5 min whatever) and what exception they were in at that time.

  • Then i suspect something like this....

    Create Table #Source

    (

    dateRaw datetime,

    Agent varchar(20),

    Exception varchar(20),

    e_start datetime,

    e_end datetime

    )

    go

    Insert into #Source(dateRaw,Agent,Exception,e_start,e_end)

    select '2010-01-06', 'John', 'Lunch', cast('12:30' as datetime), cast('13:30' as datetime)

    union all

    select '2010-01-06', 'John', 'OpenTime', '13:30', '15:00'

    go

    with cteTime

    as

    (

    Select *,DATEDIFF(mi,e_start,e_end) as mins

    from #source

    )

    select *,DATEADD(mi,vals.number,e_start) as ExceptionTime

    from cteTime

    join master..spt_values vals

    on vals.number <= mins

    where vals.type = 'P'

    Hope you are able to break this down to understand the code, needless to say dont put anything live that you dont 100% understand.

    Any questions , feel free to post back



    Clear Sky SQL
    My Blog[/url]

  • what is the rowcount of the table?

  • 23,000 rows ish.

    with 1500 rows per day.

  • Dave Ballantyne (12/23/2010)


    Then i suspect something like this....

    Create Table #Source

    (

    dateRaw datetime,

    Agent varchar(20),

    Exception varchar(20),

    e_start datetime,

    e_end datetime

    )

    go

    Insert into #Source(dateRaw,Agent,Exception,e_start,e_end)

    select '2010-01-06', 'John', 'Lunch', cast('12:30' as datetime), cast('13:30' as datetime)

    union all

    select '2010-01-06', 'John', 'OpenTime', '13:30', '15:00'

    go

    with cteTime

    as

    (

    Select *,DATEDIFF(mi,e_start,e_end) as mins

    from #source

    )

    select *,DATEADD(mi,vals.number,e_start) as ExceptionTime

    from cteTime

    join master..spt_values vals

    on vals.number <= mins

    where vals.type = 'P'

    Hope you are able to break this down to understand the code, needless to say dont put anything live that you dont 100% understand.

    Any questions , feel free to post back

    I will try this, this afternoon - but thanks in advance

  • To add to this, I had a similar issue not too long ago which was solved on this forum.

    Try going over this.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • skcadavre (12/23/2010)


    To add to this, I had a similar issue not too long ago which was solved on this forum.

    Try going over this.

    Hmmmm... your last statement in the link you provided was

    I then intend to do some testing on my real data to see the performance differences between the others.

    I wonder if you could post the results of your testing and the code you finally came up with in the thread you just referred to. ;-):-)

    I'm sure it'll help to make that other thread "well-formed"...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • How about this ?

    ;with cte

    as

    (

    select e_start,e_end from #Source

    union all

    select DATEADD(minute,1,e_start),e_end from cte

    where DATEADD(minute,1,e_start)<=e_end

    )

    select e_start from cte order by e_start

    I think with Denali and introduction of SEQUENCES the above method will be a thing of the past.

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Sachin Nandanwar (12/23/2010)


    How about this ?

    ;with cte

    as

    (

    select e_start,e_end from #Source

    union all

    select DATEADD(minute,1,e_start),e_end from cte

    where DATEADD(minute,1,e_start)<=e_end

    )

    select e_start from cte order by e_start

    I think with Denali and introduction of SEQUENCES the above method will be a thing of the past.

    This method should be a thing of the past already , have you tested performance of this over a large number of rows and compared that to a tally table ?

    Plus sequences will be a dead end here. You would have to reinitialize the sequence to zero for each call of the routine. This would also mean that concurrent executions would have to be stopped as sequences are global



    Clear Sky SQL
    My Blog[/url]

  • No I havent.

    Can you post some link where it has been done?

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Here you go , Very easy to prove.

    drop table #Source

    go

    Create Table #Source

    (

    SourceID integer identity ,

    e_start smalldatetime,

    e_end smalldatetime

    )

    go

    insert into #Source(e_start,e_end)

    select top(1000) '01jan2010','01jan2010 01:00'

    from sys.columns a cross join sys.columns b

    go

    --- CTE RBAR Method

    ;with cte

    as

    (

    select SourceID,e_start,e_end from #Source

    union all

    select SourceID,DATEADD(minute,1,e_start),e_end from cte

    where DATEADD(minute,1,e_start)<=e_end

    )

    select SourceID,e_start from cte order by e_start

    go

    --- Tally Table

    with cteTime

    as

    (

    Select *,DATEDIFF(mi,e_start,e_end) as mins

    from #source

    )

    select *,DATEADD(mi,vals.number,e_start) as ExceptionTime

    from cteTime

    join master..spt_values vals

    on vals.number <= mins

    where vals.type = 'P'

    Bear in mind this is over a relatively small dataset of 1000 rows...



    Clear Sky SQL
    My Blog[/url]

  • Thanks.

    I am to getting somewhat the same results after testing both the methods.

    I am now gonna STOP using the CTE method in future 🙂

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Dave Ballantyne (12/23/2010)


    To restate your problem :

    You need a list of 5 minute intervals betweeen '07:45:00' and '20:15:00' that are between the start and stop time on wfm_actualactivity.

    Is that correct ?

    To confirm, your solution works perfectly, and wanted to thank you. Coming to forums such as this is a perfect way for people who learn like I do, through application in real life issues. I understand now with some research the solution - and has gone to widen my ideas when searching for other issues.

    Thanks again.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply