Creating a recursive hierarchy based on a flat file

  • Hello:

    I hope someone can help me with this--I'm at my wits' end!

    I inherited a biological taxonomy database containing this table:

    TaxonLookup (all fields are varchar(50))

    Phylum

    Class

    Subclass

    Superorder

    Order

    Suborder

    Superfamily

    Family

    Subfamily

    Genus

    Species

    Subspecies

    It seems incorrect to me to have all of this data for each item, so my plan was to create these tables:

    TaxonUnitType TaxonUnit

    RankID (smallint, PK) TaxonUnitID (smallint, PK)

    RankName (varchar(50)) UnitName (varchar(50))

    DirectParentRankID (smallint) ParentTaxonUnitID (smallint)

    RequiredParentRankID (smallint) RankID (smallint, FK)

    So the TaxonUnitType table serves as a recursive hierarchy; for example, if Superorder's RankID is 4 and Class's RankID is 2, Order's record would look like this:

    RankID.....................5

    RankName.................Order

    DirectParentRankID.....4

    RequiredParentRankID..2

    Now I need to create the same kind of hierarchy with the ParentTaxonUnitID in the TaxonUnit table, but it's considerably more difficult since there are tens of thousands of records to deal with. I need to come up with an UPDATE query that will, for example, find the genus associated with each species in the flat file, find the TaxonUnitID whose UnitName matches the genus name, then insert the genus's TaxonUnitID into the species's ParentTaxonUnitID field.

    Does this make any sense? Can anyone help a poor stupid newbie? :crying:

  • Break it up into stages. Assuming your flat file has species and genus in it as columns, you import that into a table with the same columns as the flat file. Then add the ID numbers for the parents to that table. Once you have that, insert into the final table.

    Does that help?

    If you provide a few sample rows of what the flat file looks like, I can help write import code, but without that, I really can't go beyond a description.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I have a question, also, on your proposed schema. Why would the required parent ID of Order be the ID of Class? I think you'd be better off with that rule in a separate join table. Have a table of the allowed values for parent rank, and use that in your keys.

    For example:

    create table dbo.Ranks (

    ID int identity primary key,

    Name varchar(100) not null);

    go

    create table dbo.Ranks_Ranks (

    RankID int not null,

    ParentRankID int not null,

    constraint PK_Ranks_Ranks primary key (RankID, ParentRankID));

    go

    create table dbo.Units (

    ID int identity primary key,

    ParentID int null references dbo.Units(ID),

    RankID int not null references dbo.Ranks(ID),

    Name varchar(100));

    go

    create trigger Units_RankEnforcement on dbo.Units

    after insert,update

    as

    if exists

    (select *

    from inserted

    inner join dbo.Ranks_Ranks

    on inserted.RankID = Ranks_Ranks.RankID

    left outer join dbo.Units

    on inserted.ParentID = Units.ID

    and Units.RankID != Ranks_Ranks.ParentRankID)

    rollback;

    Something like that would allow you to hold the rule in the database, and you could allow a genus to have a parent that was either a family or subfamily, but not that was an order or phylum (for example).

    Not that the taxonomic categories are likely to change all that frequently, but they do change over time. Was originally just Phylum, Class, Order, Family, Genus, Species, without the sub/super categories. And what happens if someone wants to add something at the bottom, like Breed (for dogs or horses)? Easy to do with something like this.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks, GSquared! You've definitely gotten me on the right track. I think I have some sort of mental block when it comes to recursive relationships. 🙂 As for your suggestion for the trigger, that's a good idea. I based my tables on an ITIS (Integrated Taxonomic Information System) model I found online--I should have known better than to try and get database design guidance from a government organization!

    Thanks again!

  • You're welcome. Let us know how it works out, okay?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply