how many times cursor has looped....

  • Good morning Everybody.

    I know that it might be Not a best solution to use cursor....however here is my question:

    using below USP to 'scramble' some test data...

    it's a SP with cursor inside that replaces one symbol with the other...

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    Create PROCEDURE [dbo].[mask_data]

    AS

    SET NOCOUNT ON

    DECLARE @newVARCHAR(11)

    DECLARE @oldVARCHAR(11)

    DECLARE @col varchar(10)

    DECLARE @get cursor

    Set @get = CURSOR FOR

    SELECT col1, col2

    FROM table

    order by col1

    OPEN @get

    FETCH NEXT FROM @get INTO @icol,@old

    WHILE (@@FETCH_STATUS <> -1)

    BEGIN

    select something

    from something

    UPDATE table

    SET col2 = @new

    where col1 = @col

    FETCH NEXT FROM @get INTO @col, @old

    -- END

    END

    CLOSE @get

    DEALLOCATE @get

    I would like to may be optimize it, if possible

    To see what is going on there i am running this-

    SELECT execution_count, st.text

    FROM sys.dm_exec_query_stats as qs

    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st

    WHERE st.text like '%FETCH NEXT FROM @get INTO%'

    It does shows me that it looped 90,000 times ..which is total number of records for that clmn

    BUT WHY, WHY in the output 'text' column it shows me that statement that was executed 90,000 times was 'CREATE PROCEDURE [dbo].[mask_data] AS SET NOCOUNT ON DECLARE @new VARCHAR(11) ......

    Why 'CREATE' ?? why not 'execute'? DOes that means that, for example, execution plan has not been reused after it was created and executed first time?

    Am i missing something? 🙂

  • I'm skipping the question completely, and going to what i think was the core issue...scrambling data.

    I stopped as soonas I saw the cursor, because I know that unless you are passing specific values to a procedure ,or doing something with the metadata, like DDL operations, anything that is using a curosr can be replaced with a set based command instead.

    It looks like you have a varchar(11) column you need masked, and it can probably be done with a CTE and an update in a single super quick set based operation instead.

    what are the rules for the new value? any old combination of stirngs and numbers? an arbitrary value from a dictionary?

    I'm rubbing my hands with glee, because if you can describe the expected results, I can throw an example out there to update your table in a signel shot with random data.

    This is one of my favorite style of forum posts to help with.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I knew, i will step on 'cursor killers' 🙂 🙂

    Sure, thank you for the offer.

    Need some time to modify script ( for security purposes ) before i can 'publish ' it.

  • SD1999 (12/6/2011)


    I knew, i will step on 'cursor killers' 🙂 🙂

    Sure, thank you for the offer.

    Need some time to modify script ( for security purposes ) before i can 'publish ' it.

    yeah guilty as charged, we are almost as bad as a spelling nazi, I know 😀

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • wait..it gets even worse 😀 , i have udf that being used in that cursor

    so heres 2 queries to scramble users phone #

    first is a function

    second is that cursor's loop that utilizes that function

    Oooookk, hit me ..show me please how BAD it is 🙂

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    create function [dbo].[fn_phn_mask]

    (

    @Origtring as varchar(10),

    @charstobereplaced as varchar(10),

    @charsreplacingwith as varchar(10)

    )

    returns varchar(10)

    as

    begin

    declare @output as varchar(10)

    -- declare the return variable here

    declare @pos as int

    declare @index as int

    declare @numtoreplace as varchar(1)

    declare @replacementnum as varchar(1)

    declare @rnums as varchar(10)

    set @pos = 1

    set @output = @Origtring

    set @rnums = ''

    while @pos<= len(@output)

    begin

    set @numtoreplace = substring(@output, @pos, 1)

    -- look up the number to se if it shoul be replaced

    set @index = Charindex(@numtoreplace, @charstobereplaced)

    if @index=0

    begin

    set @replacementnum= @numtoreplace -- leave the number as it is

    end

    else

    begin

    -- get the number that will replace the original

    set @replacementnum = substring(@charsreplacingwith, @index, 1)

    end

    select @output = stuff(@output, @pos, 1, @replacementnum)

    set @pos = @pos+1

    end

    return @output

    end

    ----------------------------------------

    --------------------------------------

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    Create PROCEDURE [dbo].[P_mask_phn]

    AS

    SET NOCOUNT ON

    DECLARE @new_phnVARCHAR(11)

    DECLARE @old_phnVARCHAR(11)

    DECLARE @icustomer varchar(10)

    DECLARE @get_phn cursor

    Set @get_phn = CURSOR FOR

    SELECT icustomer,phn

    FROM dbo.customer

    order by icustomer

    OPEN @get_phn

    FETCH NEXT FROM @get_phn INTO @icustomer,@old_phn

    WHILE (@@FETCH_STATUS <> -1)

    BEGIN

    SELECT @new_phn = SUBSTRING(@old_phn, 1, 1)+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 2, 1), '5496321708', '9876543210')+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 3, 1), '6985743021', '9876543210')+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 4, 1), '6745832912', '9876543210')+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 5, 1), '7854930216', '9876543210')+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 6, 1), '1896423057', '9876543210')+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 7, 1), '8756932014', '9876543210')+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 8, 1), '2814603975', '9876543210')+

    DBO.fn_phn_mask(SUBSTRING(@old_phn, 9, 1), '7854930216', '9876543210')

    UPDATE dbo.customer

    SET phn = @new_phn

    where icustomer = @icustomer

    FETCH NEXT FROM @get_phn INTO @icustomer, @old_phn

    END

    CLOSE @get_phn

    DEALLOCATE @get_phn

  • well, here's how I do it;

    My assumption is the phone number string in question is all numeric with no dashes, but it's not hard to add logic to fiddle with that as well.

    instead of the proc calling a function, it's just a straight update: Performance wise, this updated a bit more than a million rows in 16 seconds on my decent dev machine.

    (1100011 row(s) affected)

    CREATE TABLE #MyContactTable (ID int,name varchar(30),phone varchar(11))

    INSERT INTO #MyContactTable

    SELECT 8,'Dee','3333333333' UNION ALL

    SELECT 9,'Monkey','3333333333' UNION ALL

    SELECT 10,'Robot','9999999999' UNION ALL

    SELECT 11,'Lizard','3333333333' UNION ALL

    SELECT 12,'Lauray','9542179597' UNION ALL

    SELECT 14,'Roberts','9548878712' UNION ALL

    SELECT 15,'Maluroney','3333333333' UNION ALL

    SELECT 18,'Jordan','9544548899' UNION ALL

    SELECT 21,'Seinfield','3333333333' UNION ALL

    SELECT 22,'Jackson','3333333333' UNION ALL

    SELECT 23,'Mourning ','1111111111'

    --now the randmization update.

    UPDATE #MyContactTable

    SET #MyContactTable.phone = RANDDATA.Random_String

    FROM

    (select #MyContactTable.ID,

    --dashes are commented out if needed later.

    Random_String =

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    -- '-' +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    -- '-' +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    substring(x,(abs(checksum(newid()))%10)+1,1) +

    substring(x,(abs(checksum(newid()))%10)+1,1)

    from (select x='0123456789') a

    CROSS APPLY #MyContactTable --this gets us a random phone number for each primary key in the target table

    ) RANDDATA

    WHERE #MyContactTable.ID = RANDDATA.ID

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • One more condition here i forgot to mention...

    the logic(algorithm) of number replacements should be followed as per below

    we cant use some random number

    the reason is appl talking to oracle as well and even though numbers are scrambled, they should

    match in sql and oracle

  • What's the point of having obfuscated yet, 100% accurate data??? :blink:

  • SD1999 (12/6/2011)


    One more condition here i forgot to mention...

    the logic(algorithm) of number replacements should be followed as per below

    we cant use some random number

    the reason is appl talking to oracle as well and even though numbers are scrambled, they should

    match in sql and oracle

    still easy to fix; since an area code or exchange should not start with zero or one, you just tweak those two rand elements to return only 2-9 instead of 0-9; other than that, what is left to need a specific biz rule?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thank you, i'll play with it.

    those are not really phone numbers, it was just for example

    no more rules, this is it

    what and where do you think would be a performance or level of complexity gain be??

  • SD1999 (12/6/2011)


    Thank you, i'll play with it.

    those are not really phone numbers, it was just for example

    no more rules, this is it

    what and where do you think would be a performance or level of complexity gain be??

    Good question;

    I've got a script that has to obfuscate a specific version of a client database; while even the biggest tables only have 40K rows, i have to obfuscate hundreds of columns.

    In that case, I pretty much have a script per table that needs to be obfuscated; all those scripts might take 3 minutes to run, tops. now, change that to be a hundred cursors, and it might take an hour or more to run instead.

    So I have the ability to restore and obfuscate in a couple of minutes, vs the much slower cursor examples.

    My snippets do stuff like replace all last names with HTML colors, so people become Jeff Brown, Jack Green, etc; replaicng urls, birthdays, other sensitive data.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply