Loop won't terminate

  • Hello, wondering if anyone has encountered this problem:

    I am looping through about 50,000 records, importing them from one table to another. For each record, I select the data I want into local variables, then pass those variables to a stored procedure that does two things: 1) checks whether the data already exists, and 2) inserts the data if not.

    The script runs for several minutes until it has imported about 32,700 records, then stops importing but keeps running (processor at 100% the whole time, memory at 600 MB of 1,000 MB).

    I have let it run for 11 hours and it never gets past about 32K of the 50K records. If I comment out the stored procedure call and just print the number of times the loop executes, it finishes in 2 seconds without fail.

    I will paste both my script (the loop) and my stored procedures below, in case they help.

    Thanks for any help.

    Script:

    --## Import Addresses to Households from Member table to Address table

    Declare @HouseHoldID as int
    Declare @mailaddress1 as varchar(100), @mailaddress2 as varchar(100), @mailcity as varchar(100), @mailstate as varchar(100), @mailzip as varchar(100), @mailcountry as varchar(100)

    DBCC CHECKIDENT (Address, RESEED, 0) -- reset the identity

    Declare @MaxID as int, @CurrentID as int
    Set @CurrentID = 1
    SELECT @MaxID = Max(MVINumber) From mvi.mvi.Member

    While @CurrentID <= @MaxID
    BEGIN
     /* do work here */

     -- clear variables
     Set @HouseHoldID = null
     Set @mailaddress1 = null
     Set @mailaddress2 = null
     Set @mailcity = null
     Set @mailstate = null
     Set @mailzip = null
     Set @mailcountry = null

     -- get the source info from member table.
     Select @mailaddress1=mailaddress1, @mailaddress2=mailaddress2, @mailcity=mailcity, @mailstate=mailstate, @mailzip=mailzip, @mailcountry=mailcountry
     from mvi.mvi.member
     where mvinumber = @CurrentID

     -- insert the info into the new schema.
     -- address is to be linked to householdid.  Get householdID
     Select @HouseHoldID = ContentGroupID From Person_View Where EntityID = @CurrentID

     -- insert the addresses
     IF @mailaddress1 is not null
     begin
      Exec Address_Insert @HouseHoldID=@HouseHoldID, @address1=@mailaddress1, @address2=@mailaddress2, @city=@mailcity, @state=@mailstate, @zip=@mailzip, @country=@mailcountry, @addresstypeid=0
     end

     /* end work here */
     Set @CurrentID = @CurrentID + 1
    END

    Stored procedure Address_Insert:

    ALTER PROCEDURE Address_Insert
    (
     @HouseHoldID as int,
     @AddressTypeID as int,
     @Address1 as varchar(100),
     @Address2 as varchar(100),
     @City as varchar(100),
     @State as varchar(100),
     @Zip as varchar(100),
     @Country as varchar(100),
     @ContentGroupID as int = -1
    )
    AS

    Declare @Count as int
    Select @Count = Count(AddressID) From Address_View Where HouseHoldID=@HouseHoldID And ContentTypeID=@AddressTypeID

    If @Count < 1
    begin
     -- Insert into ContentSequence system...retrieve ContentSequenceID
     Declare @ContentSequenceID as int
     Exec @ContentSequenceID = ContentSequence_Insert @ContentGroupID=@ContentGroupID, @ContentTypeID=@AddressTypeID
     -- Insert the address record
     Insert Into Address( HouseHoldID, Address1, Address2, City, State, Zip, Country, ContentSequenceID )
     Values( @HouseHoldID, @Address1, @Address2, @City, @State, @Zip, @Country, @ContentSequenceID )
    end

    Stored Procedure: ContentSequence_Insert:

    ALTER PROCEDURE ContentSequence_Insert
    (
     @ContentGroupID as int = -1,
     @ContentTypeID as int = null
    )
    AS

    IF @ContentGroupID < 1  -- if no group exists...get a new groupid
    begin
     Exec @ContentGroupID = ContentGroup_Insert
    end

    -- get the max(ContentSequence) for inserting the record.
    Declare @ContentSequence as int
    Select @ContentSequence = Max(ContentSequence) + 1
    From ContentSequence
    Where ContentGroupID = @ContentGroupID

    IF (@ContentSequence IS NULL)  -- check for null value...set to 0 if null.
    begin
     Set @ContentSequence = 0
    end

    -- insert the record.
    Insert into ContentSequence(ContentGroupID, ContentTypeID, ContentSequence)
    Values( @ContentGroupID, @ContentTypeID, @ContentSequence )

    RETURN @@Identity -- return ContentSequenceID

    Stored Procedure: ContentGroup_Insert:

    ALTER PROCEDURE ContentGroup_Insert
    AS

    Declare @ContentGroupID as int

    Insert Into ContentGroup(ContentGroupDate)
    Values(GetDate())

    Set @ContentGroupID = @@Identity

    -- Return the ContentGroup identity so that a record may participate in the ContentSequence system;
    -- all participating records must be part of a ContentGroup.
    Return @ContentGroupID

    .

  • As to why it stops inserting after 32,700 rows: it's possible that the data being inserted into the Address table (or already in the Address_View view) contains duplicate records on HouseHoldID and ContentTypeID, so the existence check in Address_Insert skips them.

    To give us a complete view of why the loop won't terminate, please post the ContentSequence_Insert stored procedure.
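
    A quick way to check for such duplicates (using the column names from the posted Address_Insert) would be something along these lines:

    Select HouseHoldID, ContentTypeID, Count(*) As DupeCount
    From Address_View
    Group By HouseHoldID, ContentTypeID
    Having Count(*) > 1
    Order By DupeCount Desc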

  • I have included 2 more stored procedures to complete the code.

    During my testing, I had commented out the execution of the stored procedure:

    Exec @ContentSequenceID = ContentSequence_Insert @ContentGroupID=@ContentGroupID, @ContentTypeID=@AddressTypeID

    in Address_Insert.

    It showed no change in its behavior.

    .

  • Why do you not use a cursor for this job?
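
    A rough sketch of what that might look like, using the table from the script above (the per-row work stays the same):

    Declare @MVINumber int

    Declare member_cur Cursor Local Fast_Forward For
     Select MVINumber From mvi.mvi.Member Order By MVINumber

    Open member_cur
    Fetch Next From member_cur Into @MVINumber

    While @@Fetch_Status = 0
    Begin
     -- same per-row lookup and Address_Insert call as in the WHILE loop goes here
     Fetch Next From member_cur Into @MVINumber
    End

    Close member_cur
    Deallocate member_cur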


    Regards,

    Anders Dæmroen
    epsilon.no

  • Did you verify the log and data file sizes? Maybe the log or data file is full and cannot grow (autogrow disabled, restricted file growth, or the disk is full)?

    To see what is happening, you can print @CurrentID at the beginning of the loop, so you can see exactly where it goes wrong.

    I would suggest not using @@IDENTITY. See BOL and IDENT_CURRENT ...

    Also, check @HouseHoldID and @AddressTypeID for NULL in Address_Insert ...
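
    For example (only a sketch against the posted code):

    -- in the loop, right after BEGIN, so you can see which record it is working on
    Print 'Processing MVINumber ' + Cast(@CurrentID as varchar(10))

    -- at the top of Address_Insert, before any work is done
    If @HouseHoldID Is Null Or @AddressTypeID Is Null
    begin
     Raiserror('Address_Insert called with NULL @HouseHoldID or @AddressTypeID', 16, 1)
     Return -1
    end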

    Bert

  • Hi,

    You declare your variables as int, which can only hold a maximum of 32,768.

    Declare @MaxID and @CurrentID as bigint (or numeric) and you're fine!

    JP

     

  • Int can hold up to 2,147,483,647. Smallint is limited to 32,767.

    From BOL:

    bigint

    Integer (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807). Storage size is 8 bytes.

    int

    Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is integer.

    smallint

    Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is 2 bytes.

    tinyint

    Integer data from 0 through 255. Storage size is 1 byte.

  • You're right! I got confused with VB and VBScript.

    JP

  • You might try breaking the job into smaller batches to see what happens.

    Also, I don't see any kind of error checking, so you have no way of knowing whether any of these statements failed, or why.

    Here's a good article on error handling:

    http://www.sommarskog.se/error-handling-II.html#presumptions
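
    As a minimal illustration (the article covers this far more thoroughly), checking @@ERROR and the return value after the procedure call in your loop would look something like this:

    Declare @err int, @ret int

    Exec @ret = Address_Insert @HouseHoldID=@HouseHoldID, @address1=@mailaddress1, @address2=@mailaddress2, @city=@mailcity, @state=@mailstate, @zip=@mailzip, @country=@mailcountry, @addresstypeid=0
    Select @err = @@Error
    If @err <> 0 Or @ret <> 0
    begin
     -- stop (or log the failing @CurrentID) instead of silently continuing
     Print 'Address_Insert failed for MVINumber ' + Cast(@CurrentID as varchar(10))
     Return
    end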

     

  • Thanks for all the suggestions; I was able to complete the process without any problems. Here are my conclusions and questions:

    The process completed once I stopped using @@Rowcount in Address_Insert and replaced it with the @Count variable. Is there a problem with using system variables like @@Rowcount and @@Identity? Does this compromise my database?

    Bert, you recommended against using @@Identity, but I am using it because I only want the identity of the table that was inserted in the current scope. I noticed that using IDENT_CURRENT could give me an identity value generated in another session or scope. Here's what BOL says:

  • IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope.
  • @@IDENTITY returns the last identity value generated for any table in the current session, across all scopes.
  • SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope.
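
    If I switched ContentGroup_Insert over to SCOPE_IDENTITY(), it would look like this (as I understand it, the difference only shows up if a trigger on ContentGroup happens to insert into another table with an identity column):

    Insert Into ContentGroup(ContentGroupDate)
    Values(GetDate())

    -- @@Identity would also pick up an identity value generated inside a trigger;
    -- SCOPE_IDENTITY() only returns values generated in this procedure's own scope.
    Set @ContentGroupID = Scope_Identity()
    Return @ContentGroupID
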
    Bert: I had no problem with database or log file growth; both files are set to unrestricted growth.

    Grasshopper: I avoided cursors because I seem to get much better performance with a counter than with a cursor.

    Enthusiast: is a procedural approach to SQL always wrong? My approach seemed best for me, since I wanted to take advantage of data logic already written in other stored procedures. How are you supposed to reuse your stored procedures if you write everything as plain T-SQL every time you need a complicated routine?

    Example: I don't want to write an INSERT statement as in your example above; I want to let my stored procedure handle it, so I don't have to rethink the process. I'd appreciate any more information on where my approach has gone wrong.

    .

  • So how do you recommend writing reusable code?

    Do you recommend the use of Stored Procedures? If so, how am I supposed to run tests on a value before executing a stored procedure? I can run those tests from another language, but that may not be as practical.

    Scenario: 

    Let's say you group addresses together with a GroupID and link them to customers with a CustomerID.

    A client application attempts to add an address to a Customer.  You need to perform the following logic:

    1.  check to see if customer exists.

    2.  check to see if customer already has any addresses

         a.)  if addresses exist, get the groupID that they belong to.

         b.)  if addresses don't exist, obtain a new groupID.

    3.  Insert the address with the CustomerID and GroupID

    Where do you recommend placing this logic? And how would you write "Good Code" that would accomplish this task?
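
    In rough T-SQL, the logic I have in mind would look something like this (Customer, Address, GroupID and Group_Insert are just placeholder names, and the variables would be parameters of the procedure):

    Declare @GroupID int

    -- 1. check that the customer exists
    If Not Exists (Select * From Customer Where CustomerID = @CustomerID)
     Return -1

    -- 2a. if the customer already has addresses, reuse their GroupID
    Select @GroupID = GroupID From Address Where CustomerID = @CustomerID

    -- 2b. otherwise obtain a new GroupID
    If @GroupID Is Null
     Exec @GroupID = Group_Insert   -- placeholder procedure that creates a new group

    -- 3. insert the address with the CustomerID and GroupID
    Insert Into Address (CustomerID, GroupID, Address1, City, State, Zip)
    Values (@CustomerID, @GroupID, @Address1, @City, @State, @Zip)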

    .

  • Hi John, just a couple of observations: your database allows alternate address fields (two), but only one field each for city, state, and country. What if the alternate address is in another state? Secondly, if you check the first post by Joe Celko, he gave you the code for moving the information from one table to another.

    From Joe Celko's first post:

    “INSERT INTO Foo( ...)
    SELECT ..
      FROM Bar
     WHERE NOT EXISTS
          (SELECT *
             FROM Foo
            WHERE Foo.key_col = Bar.key_col);”
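
    Applied to the tables from your script it would look roughly like this (column list abbreviated, HouseHoldID looked up via a join, and the ContentSequence part left aside, since that is what currently forces the row-by-row code):

    INSERT INTO Address (HouseHoldID, Address1, Address2, City, State, Zip, Country)
    SELECT pv.ContentGroupID, m.mailaddress1, m.mailaddress2, m.mailcity, m.mailstate, m.mailzip, m.mailcountry
      FROM mvi.mvi.Member m
      JOIN Person_View pv ON pv.EntityID = m.MVINumber
     WHERE m.mailaddress1 IS NOT NULL
       AND NOT EXISTS
           (SELECT *
              FROM Address a
             WHERE a.HouseHoldID = pv.ContentGroupID);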

    Updating the address for an existing customer (your last post) is a different question. What are your business rules for validating your data? Are you relating Customer ID to first and last names, or Customer ID to address? You can write an UPDATE statement to change the value in an existing record.

    HTH

    Mike

  • Michael,

    Address1 and Address2 are for line 1 and line 2 of an address, respectively. AddressTypeID differentiates between mailing, billing, shipping addresses, etc.

    So Customer to Address is a one-to-many relationship.

    You are correct that Joe Celko's code moves data from one table to another. That isn't my problem. My question is about using procedural code: is it a bad idea? Joe calls it "Bad Code", and I'm more interested in why. It is obvious that performance is lost, but I've found it to be very practical to check conditions of data before updating or inserting. I've also found it valuable to call other stored procedures, which makes my code reusable, saves large amounts of development time, and simplifies the amount of code to manage in a project.

    Thanks for the input.

    .

  • John,

    I agree that it is a bad habit to use procedural code unless it is necessary for some reason, and I don't see any such reason in this example. However, I don't fully understand what you are doing with the ContentGroup and ContentSequence tables. Could you explain this in more detail, please? Maybe we are missing something important here...

    I couldn't help thinking that simple SQL with a branch (in short pseudocode: IF EXISTS -> UPDATE, ELSE INSERT) would accomplish the same thing while being simpler, more reliable, easier to maintain, and without risk of infinite looping. Quite a lot of reasons to call it better. Even the Content tables should be manageable the same way, though that is just a guess until I know more about what they are, what the rules are, and how they are used.
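
    Just to illustrate the shape of it (assuming, for the sake of the example, an AddressTypeID column directly on the Address table, which your real schema doesn't have):

    IF EXISTS (SELECT * FROM Address
                WHERE HouseHoldID = @HouseHoldID AND AddressTypeID = @AddressTypeID)
     UPDATE Address
        SET Address1 = @Address1, Address2 = @Address2, City = @City,
            State = @State, Zip = @Zip, Country = @Country
      WHERE HouseHoldID = @HouseHoldID AND AddressTypeID = @AddressTypeID
    ELSE
     INSERT INTO Address (HouseHoldID, AddressTypeID, Address1, Address2, City, State, Zip, Country)
     VALUES (@HouseHoldID, @AddressTypeID, @Address1, @Address2, @City, @State, @Zip, @Country)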

    HTH, Vladan

  • "It is obvious that performance is lost, but i've found it to be very practical to check conditions of data before Updating or Inserting." 

    John you have answered your own question . Performance is improved.  The code is also much simpler and easer to maintain.

    As to checking conditions of data before Updating or Inserting it is not only practical but imperative that you filter data before updating or adding records. 

    Valadan's suggestion of a branching statement checks to see if the customer exists and then either Inserts a new record or updates the existing record. A carefully constructed WHERE statement will validate your data. If any of the conditions are not met then the data will not be added or updated. 
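
    For example (placeholder columns again): an UPDATE whose WHERE clause both locates the record and enforces the conditions, so nothing changes when a condition is not met:

    UPDATE Address
       SET Address1 = @Address1, City = @City, State = @State, Zip = @Zip
     WHERE CustomerID = @CustomerID        -- record must belong to this customer
       AND GroupID = @GroupID              -- and to this customer's address group
       AND Len(IsNull(@Address1, '')) > 0  -- refuse to blank out the address line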

    But it could be that I am missing something here. It might be helpful if you could provide the business rules and what you are trying to accomplish.

    HTH

    Mike  
