Loop won't terminate

  • Hello, wondering if anyone has encountered this problem:

    I am looping through about 50,000 records, importing them from one table to another. For each record, I select the data I want into local variables, then pass those variables to a stored procedure that does two things: 1) checks whether the data already exists, and 2) inserts the data if not.

    The script runs for several minutes until it has imported about 32,700 records, then stops importing but keeps running (processor at 100% the whole time, memory at 600 MB of 1,000 MB).

    I have let it run for 11 hours and it never gets past about 32K of the 50K records. If I comment out the stored procedure call and just print the number of times the loop executes, it finishes in 2 seconds without fail.

    I will paste both my script (the loop) and my stored procedures below, in case they help.

    Thanks for any help.

    Script:

    --## Import Addresses to Households from Member table to Address table

    Declare @HouseHoldID as int
    Declare @mailaddress1 as varchar(100), @mailaddress2 as varchar(100), @mailcity as varchar(100), @mailstate as varchar(100), @mailzip as varchar(100), @mailcountry as varchar(100)

    DBCC CHECKIDENT (Address, RESEED, 0) -- reset the identity

    Declare @MaxID as int, @CurrentID as int
    Set @CurrentID = 1
    SELECT @MaxID = Max(MVINumber) From mvi.mvi.Member

    While @CurrentID <= @MaxID
    BEGIN
     /* do work here */

     -- clear variables
     Set @HouseHoldID = null
     Set @mailaddress1 = null
     Set @mailaddress2 = null
     Set @mailcity = null
     Set @mailstate = null
     Set @mailzip = null
     Set @mailcountry = null

     -- get the source info from member table.
     Select @mailaddress1=mailaddress1, @mailaddress2=mailaddress2, @mailcity=mailcity, @mailstate=mailstate, @mailzip=mailzip, @mailcountry=mailcountry
     from mvi.mvi.member
     where mvinumber = @CurrentID

     -- insert the info into the new schema.
     -- address is to be linked to householdid.  Get householdID
     Select @HouseHoldID = ContentGroupID From Person_View Where EntityID = @CurrentID

     -- insert the addresses
     IF @mailaddress1 is not null
     begin
      Exec Address_Insert @HouseHoldID=@HouseHoldID, @address1=@mailaddress1, @address2=@mailaddress2, @city=@mailcity, @state=@mailstate, @zip=@mailzip, @country=@mailcountry, @addresstypeid=0
     end

     /* end work here */
     Set @CurrentID = @CurrentID + 1
    END

    Stored procedure Address_Insert:

    ALTER PROCEDURE Address_Insert
    (
     @HouseHoldID as int,
     @AddressTypeID as int,
     @Address1 as varchar(100),
     @Address2 as varchar(100),
     @City as varchar(100),
     @State as varchar(100),
     @Zip as varchar(100),
     @Country as varchar(100),
     @ContentGroupID as int = -1
    )
    AS

    Declare @Count as int
    Select @Count = Count(AddressID) From Address_View Where HouseHoldID=@HouseHoldID And ContentTypeID=@AddressTypeID

    If @Count < 1
    begin
     -- Insert into ContentSequence system...retrieve ContentSequenceID
     Declare @ContentSequenceID as int
     Exec @ContentSequenceID = ContentSequence_Insert @ContentGroupID=@ContentGroupID, @ContentTypeID=@AddressTypeID
     -- Insert the address record
     Insert Into Address( HouseHoldID, Address1, Address2, City, State, Zip, Country, ContentSequenceID )
     Values( @HouseHoldID, @Address1, @Address2, @City, @State, @Zip, @Country, @ContentSequenceID )
    end

    Stored Procedure: ContentSequence_Insert:

    ALTER PROCEDURE ContentSequence_Insert
    (
     @ContentGroupID as int = -1,
     @ContentTypeID as int = null
    )
    AS

    IF @ContentGroupID < 1  -- if no group exists...get a new groupid
    begin
     Exec @ContentGroupID = ContentGroup_Insert
    end

    -- get the max(ContentSequence) for inserting the record.
    Declare @ContentSequence as int
    Select @ContentSequence = Max(ContentSequence) + 1
    From ContentSequence
    Where ContentGroupID = @ContentGroupID

    IF (@ContentSequence IS NULL)  -- check for null value...set to 0 if null.
    begin
     Set @ContentSequence = 0
    end

    -- insert the record.
    Insert into ContentSequence(ContentGroupID, ContentTypeID, ContentSequence)
    Values( @ContentGroupID, @ContentTypeID, @ContentSequence )

    RETURN @@Identity -- return ContentSequenceID

    Stored Procedure: ContentGroup_Insert:

    ALTER PROCEDURE ContentGroup_Insert
    AS

    Declare @ContentGroupID as int

    Insert Into ContentGroup(ContentGroupDate)
    Values(GetDate())

    Set @ContentGroupID = @@Identity

    -- Return the ContentGroup identity so that a record may participate in the ContentSequence system;
    -- all participating records must be part of a ContentGroup.
    Return @ContentGroupID

    .

  • As to why it stops inserting after 32,700 rows: it's possible that the data being inserted into the Address table (or already in the Address_View view) contains duplicate records on HouseHoldID and ContentTypeID, so the existence check in Address_Insert skips them.

    To give us a complete view of why the loop won't terminate, please post the ContentSequence_Insert stored procedure.
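
    A quick way to check for such duplicates (using the column names from the posted Address_Insert) would be something along these lines:

    Select HouseHoldID, ContentTypeID, Count(*) As DupeCount
    From Address_View
    Group By HouseHoldID, ContentTypeID
    Having Count(*) > 1
    Order By DupeCount Desc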

  • I have included 2 more stored procedures to complete the code.

    During my testing, I had commented out the execution of the stored procedure:

    Exec @ContentSequenceID = ContentSequence_Insert @ContentGroupID=@ContentGroupID, @ContentTypeID=@AddressTypeID

    in Address_Insert.

    It showed no change in its behavior.

    .

  • Why do you not use a cursor for this job?
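
    A rough sketch of what that might look like, using the table from the script above (the per-row work stays the same):

    Declare @MVINumber int

    Declare member_cur Cursor Local Fast_Forward For
     Select MVINumber From mvi.mvi.Member Order By MVINumber

    Open member_cur
    Fetch Next From member_cur Into @MVINumber

    While @@Fetch_Status = 0
    Begin
     -- same per-row lookup and Address_Insert call as in the WHILE loop goes here
     Fetch Next From member_cur Into @MVINumber
    End

    Close member_cur
    Deallocate member_cur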


    Regards,

    Anders Dæmroen
    epsilon.no

  • Did you verify the log and data file sizes? Maybe the log or data file is full and cannot grow (autogrow disabled, restricted file growth, or the disk is full)?

    To see what is happening, you can print @CurrentID at the beginning of the loop, so you can see exactly where it goes wrong.

    I would suggest not using @@IDENTITY. See BOL and IDENT_CURRENT ...

    Also, check @HouseHoldID and @AddressTypeID for NULL in Address_Insert ...
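
    For example (only a sketch against the posted code):

    -- in the loop, right after BEGIN, so you can see which record it is working on
    Print 'Processing MVINumber ' + Cast(@CurrentID as varchar(10))

    -- at the top of Address_Insert, before any work is done
    If @HouseHoldID Is Null Or @AddressTypeID Is Null
    begin
     Raiserror('Address_Insert called with NULL @HouseHoldID or @AddressTypeID', 16, 1)
     Return -1
    end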

    Bert

  • Hi,

    You declare your variables as int, which can only hold a maximum of 32,768.

    Declare @MaxID and @CurrentID as bigint (or numeric) and you're fine!

    JP

     

  • Int can hold up to 2,147,483,647. Smallint is limited to 32,767.

    From BOL:

    bigint

    Integer (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807). Storage size is 8 bytes.

    int

    Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is integer.

    smallint

    Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is 2 bytes.

    tinyint

    Integer data from 0 through 255. Storage size is 1 byte.

  • You're right! I got confused with VB and VBScript.

    JP

  • You might try breaking the job into smaller batches to see what happens.

    Also, I don't see any kind of error checking, so you have no way of knowing whether any of these statements failed, or why.

    Here's a good article on error handling:

    http://www.sommarskog.se/error-handling-II.html#presumptions
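
    As a minimal illustration (the article covers this far more thoroughly), checking @@ERROR and the return value after the procedure call in your loop would look something like this:

    Declare @err int, @ret int

    Exec @ret = Address_Insert @HouseHoldID=@HouseHoldID, @address1=@mailaddress1, @address2=@mailaddress2, @city=@mailcity, @state=@mailstate, @zip=@mailzip, @country=@mailcountry, @addresstypeid=0
    Select @err = @@Error
    If @err <> 0 Or @ret <> 0
    begin
     -- stop (or log the failing @CurrentID) instead of silently continuing
     Print 'Address_Insert failed for MVINumber ' + Cast(@CurrentID as varchar(10))
     Return
    end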

     

  • Thanks for all the suggestions; I was able to complete the process without any problems. Here are my conclusions and questions:

    The process completed once I stopped using @@Rowcount in Address_Insert and replaced it with the @Count variable. Is there a problem with using system variables like @@Rowcount and @@Identity? Does this compromise my database?

    Bert, you recommended against using @@Identity, but I am using it because I only want the identity of the table that was inserted in the current scope. I noticed that using IDENT_CURRENT could give me an identity value generated in another session or scope. Here's what BOL says:

  • IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope.
  • @@IDENTITY returns the last identity value generated for any table in the current session, across all scopes.
  • SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope.
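
    If I switched ContentGroup_Insert over to SCOPE_IDENTITY(), it would look like this (as I understand it, the difference only shows up if a trigger on ContentGroup happens to insert into another table with an identity column):

    Insert Into ContentGroup(ContentGroupDate)
    Values(GetDate())

    -- @@Identity would also pick up an identity value generated inside a trigger;
    -- SCOPE_IDENTITY() only returns values generated in this procedure's own scope.
    Set @ContentGroupID = Scope_Identity()
    Return @ContentGroupID
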
    Bert: I had no problem with database or log file growth; both files are set to unrestricted growth.

    Grasshopper: I avoided cursors because I seem to get much better performance with a counter than with a cursor.

    Enthusiast: is a procedural approach to SQL always wrong? My approach seemed best for me, since I wanted to take advantage of data logic already written in other stored procedures. How are you supposed to reuse your stored procedures if you write everything as plain T-SQL every time you need a complicated routine?

    Example: I don't want to write an INSERT statement as in your example above; I want to let my stored procedure handle it, so I don't have to rethink the process. I'd appreciate any more information on where my approach has gone wrong.

    .

  • So how do you recommend writing reusable code?

    Do you recommend the use of Stored Procedures? If so, how am I supposed to run tests on a value before executing a stored procedure? I can run those tests from another language, but that may not be as practical.

    Scenario: 

    Let's say you group addresses together with a GroupID and link them to customers with a CustomerID.

    A client application attempts to add an address to a Customer.  You need to perform the following logic:

    1.  check to see if customer exists.

    2.  check to see if customer already has any addresses

         a.)  if addresses exist, get the groupID that they belong to.

         b.)  if addresses don't exist, obtain a new groupID.

    3.  Insert the address with the CustomerID and GroupID

    Where do you recommend placing this logic? And how would you write "Good Code" that would accomplish this task?
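
    In rough T-SQL, the logic I have in mind would look something like this (Customer, Address, GroupID and Group_Insert are just placeholder names, and the variables would be parameters of the procedure):

    Declare @GroupID int

    -- 1. check that the customer exists
    If Not Exists (Select * From Customer Where CustomerID = @CustomerID)
     Return -1

    -- 2a. if the customer already has addresses, reuse their GroupID
    Select @GroupID = GroupID From Address Where CustomerID = @CustomerID

    -- 2b. otherwise obtain a new GroupID
    If @GroupID Is Null
     Exec @GroupID = Group_Insert   -- placeholder procedure that creates a new group

    -- 3. insert the address with the CustomerID and GroupID
    Insert Into Address (CustomerID, GroupID, Address1, City, State, Zip)
    Values (@CustomerID, @GroupID, @Address1, @City, @State, @Zip)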

    .

  • Hi John, just a couple of observations: your database allows alternate address fields (two), but only one field each for city, state, and country. What if the alternate address is in another state? Secondly, if you check the first post by Joe Celko, he gave you the code for moving the information from one table to another.

    From Joe Celko's first post:

    “INSERT INTO Foo( ...)
    SELECT ..
      FROM Bar
     WHERE NOT EXISTS
          (SELECT *
             FROM Foo
            WHERE Foo.key_col = Bar.key_col);”
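
    Applied to the tables from your script it would look roughly like this (column list abbreviated, HouseHoldID looked up via a join, and the ContentSequence part left aside, since that is what currently forces the row-by-row code):

    INSERT INTO Address (HouseHoldID, Address1, Address2, City, State, Zip, Country)
    SELECT pv.ContentGroupID, m.mailaddress1, m.mailaddress2, m.mailcity, m.mailstate, m.mailzip, m.mailcountry
      FROM mvi.mvi.Member m
      JOIN Person_View pv ON pv.EntityID = m.MVINumber
     WHERE m.mailaddress1 IS NOT NULL
       AND NOT EXISTS
           (SELECT *
              FROM Address a
             WHERE a.HouseHoldID = pv.ContentGroupID);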

    Updating the address for an existing customer (your last post) is a different question. What are your business rules for validating your data? Are you relating Customer ID to first and last names, or Customer ID to address? You can write an UPDATE statement to change the value in an existing record.

    HTH

    Mike

  • Michael,

    Address1 and Address2 are for line 1 and line 2 of an address, respectively. AddressTypeID differentiates between mailing, billing, shipping addresses, etc.

    So Customer to Address is a one-to-many relationship.

    You are correct that Joe Celko's code moves data from one table to another. That isn't my problem. My question is about using procedural code: is it a bad idea? Joe calls it "Bad Code", and I'm more interested in why. It is obvious that performance is lost, but I've found it to be very practical to check conditions of data before updating or inserting. I've also found it valuable to call other stored procedures, which makes my code reusable, saves large amounts of development time, and simplifies the amount of code to manage in a project.

    Thanks for the input.

    .

  • John,

    I agree that it is a bad habit to use procedural code unless it is necessary for some reason, and I don't see any such reason in this example. However, I don't fully understand what you are doing with the ContentGroup and ContentSequence tables. Could you explain this in more detail, please? Maybe we are missing something important here...

    I couldn't help thinking that simple SQL with a branch (in short pseudocode: IF EXISTS -> UPDATE, ELSE INSERT) would accomplish the same thing while being simpler, more reliable, easier to maintain, and without risk of infinite looping. Quite a lot of reasons to call it better. Even the Content tables should be manageable the same way, though that is just a guess until I know more about what they are, what the rules are, and how they are used.
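
    Just to illustrate the shape of it (assuming, for the sake of the example, an AddressTypeID column directly on the Address table, which your real schema doesn't have):

    IF EXISTS (SELECT * FROM Address
                WHERE HouseHoldID = @HouseHoldID AND AddressTypeID = @AddressTypeID)
     UPDATE Address
        SET Address1 = @Address1, Address2 = @Address2, City = @City,
            State = @State, Zip = @Zip, Country = @Country
      WHERE HouseHoldID = @HouseHoldID AND AddressTypeID = @AddressTypeID
    ELSE
     INSERT INTO Address (HouseHoldID, AddressTypeID, Address1, Address2, City, State, Zip, Country)
     VALUES (@HouseHoldID, @AddressTypeID, @Address1, @Address2, @City, @State, @Zip, @Country)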

    HTH, Vladan

  • "It is obvious that performance is lost, but i've found it to be very practical to check conditions of data before Updating or Inserting." 

    John you have answered your own question . Performance is improved.  The code is also much simpler and easer to maintain.

    As to checking conditions of data before Updating or Inserting it is not only practical but imperative that you filter data before updating or adding records. 

    Valadan's suggestion of a branching statement checks to see if the customer exists and then either Inserts a new record or updates the existing record. A carefully constructed WHERE statement will validate your data. If any of the conditions are not met then the data will not be added or updated. 
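
    For example (placeholder columns again): an UPDATE whose WHERE clause both locates the record and enforces the conditions, so nothing changes when a condition is not met:

    UPDATE Address
       SET Address1 = @Address1, City = @City, State = @State, Zip = @Zip
     WHERE CustomerID = @CustomerID        -- record must belong to this customer
       AND GroupID = @GroupID              -- and to this customer's address group
       AND Len(IsNull(@Address1, '')) > 0  -- refuse to blank out the address line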

    But it could be that I am missing something here. It might be helpful if you could provide the business rules and what you are trying to accomplish.

    HTH

    Mike  
