How to remove infinite loops in hierarchical data without using cursor?

Question

Post reply

How to remove infinite loops in hierarchical data without using cursor?

lxz20

Mr or Mrs. 500

Points: 592
More actions
June 3, 2009 at 2:40 pm

#155779

Does anyone know a way to clean up data in a hierarchical table without using cursor?
Here is the hierarchical table definition: NodeItem (NodeID int, ParentNodeID int) where ParentNodeID is also an entry in the same table.
By walking up the hierarchy, I can show all the upper node items of a given node up to the root node level.
The infinite loop problem comes up when there is data corruption, like ParentNodeID loops back to the starting NodeID after some steps. This data issue can be discovered easily by using the new CTE feature of SQL Server 2005 to display the hierarchical data all at once and setting the MAXRECURSION Option to a high value (like 50). If there is infinite loop, CTE will fail (or throw an exception).
However, I'm struggling to identify the data involved in the infinite loop and clean them up. The only way I can think of is to put the entire NodeItem table into a cursor loop, and examine data row by row following the NodeID / ParentNodeID relationship in a temp table. The performance is quite bad because the table has over 1 million rows.
My question: is there a non-cursor way to identify all the rows involved in infinite loop?
Thanks.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Jason- SSCrazy Points: 2764 More actions · Answer 1

There's always a non-cursor way, but without some sample data and sample code, you may never know that way...

Copy in some sample data depicting your scenario it will help other readers determine how to help you.

-

pawan.falor Old Hand Points: 307 More actions · Answer 2

If I'm not wrong sample data may be like this...

create table NodeItem

(NodeID int, ParentNodeID int)

go

insert into NodeItem

select 1 NodeID, null ParentNodeID union all

select 11 , 1111 union all

select 12 , 1 union all

select 2 , null union all

select 21 , 2 union all

select 22 , 2 union all

select 111 , 11 union all

select 112 , 11 union all

select 1111 , 111

go

select * from NodeItem

go

Close loop is created between these 11 >> 1111 >> 111 >> 11

Expected output:

11, 1111

1111, 111

111, 11

sitarama.rajumudunuru SSC Journeyman Points: 80 More actions · Answer 3

sitarama.rajumudunuru

SSC Journeyman

Points: 80

June 4, 2009 at 1:03 am

#1004444

Use a while loop

lxz20 Mr or Mrs. 500 Points: 592 More actions · Answer 4

I should have clarified that:

(1) There may be many root nodes (root = dead end or no more parent entry found);

(2) Not all paths will have the same depth (steps);

(3) Infinite loop can be created in 0 or more steps

0: A -> A;

1: A -> B -> A;

n: A -> B -> C -> ... -> A.

Here are some sample data:

create table NodeItem

(NodeID int, ParentNodeID int)

go

insert into NodeItem

select 1 NodeID, -1 ParentNodeID union all -- 1st root

select 2 , NULL union all -- the 2nd root

select 3 , -2 union all -- a 3rd root

select 11 , 11 union all -- infinite loop: 0 step (11 -> 11)

select 222 , 22 union all

select 22 , 222 union all -- infinte loop: 1 step (22-> 222 -> 22)

select 33 , 3 union all

select 333 , 33

go

-- I use this CTE to discover whether there are infinite loop in data

WITH HierarchyTree (NodeID, ParentNodeID, Step) AS

(

SELECT NodeID, ParentNodeID, 1

FROM NodeItem

UNION ALL

SELECT r.NodeID, r.ParentNodeID, t.Step + 1

FROM NodeItem r inner join HierarchyTree t on r.ParentNodeID = t.NodeID

)

SELECT Count(NodeID)

From HierarchyTree

Where Step = 50 -- a limit

OPTION (HASH JOIN, MAXRECURSION 50);

If the above CTE throws error, I know there are bad data. Since the table has millions of rows, looping row by row to find out whether a row is involved in infinite loop is a rather slow process. However, I haven't figured out a non-cursor way to list all those offending rows.

Mark Cowne One Orange Chip Points: 26865 More actions · Answer 5

This should give you all of the invalid nodes

WITH HierarchyTree (NodeID, ParentNodeID, Step) AS

(

SELECT NodeID, ParentNodeID, 1

FROM NodeItem

WHERE ParentNodeID = 0

UNION ALL

SELECT r.NodeID, r.ParentNodeID, t.Step + 1

FROM NodeItem r

inner join HierarchyTree t on r.ParentNodeID = t.NodeID

)

SELECT NodeID

FROM NodeItem

WHERE NodeID NOT IN (SELECT NodeID FROM HierarchyTree)

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

lxz20 Mr or Mrs. 500 Points: 592 More actions · Answer 6

Old Hand,

Your CTE will not work in my instance because Root Nodes may not have ParentNodeID = 0. As I said, there might be many root nodes.

Thanks anyway.

Mark Cowne One Orange Chip Points: 26865 More actions · Answer 7

lxz20 (6/4/2009)
Old Hand,
Your CTE will not work in my instance because Root Nodes may not have ParentNodeID = 0. As I said, there might be many root nodes.
Thanks anyway.

So how do you identify a root node?

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Mark Cowne One Orange Chip Points: 26865 More actions · Answer 8

Mark (6/4/2009)
lxz20 (6/4/2009)
Old Hand,
Your CTE will not work in my instance because Root Nodes may not have ParentNodeID = 0. As I said, there might be many root nodes.
Thanks anyway.
So how do you identify a root node?

I'll take a guess that root nodes don't have a valid parent, in which case this should work

WITH HierarchyTree (NodeID, ParentNodeID, Step) AS

(

SELECT NodeID, ParentNodeID, 1

FROM NodeItem

WHERE ParentNodeID NOT IN (SELECT NodeID FROM NodeItem) OR ParentNodeID IS NULL

UNION ALL

SELECT r.NodeID, r.ParentNodeID, t.Step + 1

FROM NodeItem r

inner join HierarchyTree t on r.ParentNodeID = t.NodeID

)

SELECT NodeID

FROM NodeItem

WHERE NodeID NOT IN (SELECT NodeID FROM HierarchyTree)

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

GSquared SSC Guru Points: 260824 More actions · Answer 9

The first thing to do is to select all rows where the parent ID is equal to the node ID. Eliminate/fix those, then add a constraint that blocks that from coming up again.

Next is to do the same for ones where the child is the parent of the parent.

After that, it's pretty much going to need to be a cursor.

There are ways to prevent this kind of thing in the first place, but cleaning it up once it's already in there is a bit of a pain.

One thing I always recommend on this kind of hierarchy, is have a "TopNodeID" type column. All levels in the same hierarchy have the same entry in that column. Speeds them up immensely. Makes it easy to prevent/fix these type of corruption issues too, since you can join on the TopNodeID column and find the loops right there. But since you don't have that, it doesn't help in this situation.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

lxz20 Mr or Mrs. 500 Points: 592 More actions · Answer 10

Old Hand,

Your 2nd CTE still blew up with the following sample data:

insert into NodeItem

select 1 NodeID, -1 ParentNodeID -- 1st root

union all select 2 , NULL -- the 2nd root

union all select 3 , -2 -- a 3rd root

union all select 11 , 11 -- infinite loop: 0 step (11 -> 11)

union all select 22 , 2

union all select 222 , 22

union all select 22 , 222 -- infinte loop: 1 step (22-> 222 -> 22)

union all select 2222 , 222

union all select 33 , 3

union all select 333 , 33

union all select 3, 333 -- infinte loop: 2 steps (3 -> 333 -> 33 -> 3)

go

Since how data is put into this table is beyond my control, I'm wondering whether using cursor loop to clean up is the only way to go, like GSquared suggested.

GSquared SSC Guru Points: 260824 More actions · Answer 11

You could potentially avoid a cursor in this. But it'll take more work and be less efficient.

Cursors do have their uses. Limited, yes, but uses nonetheless.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Mark Cowne One Orange Chip Points: 26865 More actions · Answer 12

lxz20 (6/4/2009)
Old Hand,
Your 2nd CTE still blew up with the following sample data:
insert into NodeItem
select 1 NodeID, -1 ParentNodeID -- 1st root
union all select 2 , NULL -- the 2nd root
union all select 3 , -2 -- a 3rd root
union all select 11 , 11 -- infinite loop: 0 step (11 -> 11)
union all select 22 , 2
union all select 222 , 22
union all select 22 , 222 -- infinte loop: 1 step (22-> 222 -> 22)
union all select 2222 , 222
union all select 33 , 3
union all select 333 , 33
union all select 3, 333 -- infinte loop: 2 steps (3 -> 333 -> 33 -> 3)
go
Since how data is put into this table is beyond my control, I'm wondering whether using cursor loop to clean up is the only way to go, like GSquared suggested.

NodeID 22 and 3 are repeated in your data, is that what you intended? These would be easy to identify and remove.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

lxz20 Mr or Mrs. 500 Points: 592 More actions · Answer 13

Yep, NodeIDs are repeated here as examples to show infinite loops in 0, 1, or 2 steps. These loops are actually easier to catch and removed. There are other instances where a node is NOT looped back to itself until like 7 or 10 steps later. I would like a non-cursor way to list / remove all those n step loops, if possible, for performance reason. Currently I have to use a cursor, going through row by row following the following pseudo-code. The performance is NOT good because I have to use 2 loops:

Put all data into a cursor

Begin cursor loop

Fetch a row into a temp table

Inner While Loop (while there is a new parent)

Follow the chain to get a new parent row

If parentNodeID = nodeID in temp table

Infinite Loop -- do something here

Break Inner While Loop

Else

Insert parent row into the same temp table

End Inner While Loop

End corsor loop

lxz20 Mr or Mrs. 500 Points: 592 More actions · Answer 14

FYI, nesting loops is needed to list the offending data to send back to the other team so that they can clean it up at the source.