Table Theory - In Table Row Versioning - Questions - HELP!?!?

Question

Table Theory - In Table Row Versioning - Questions - HELP!?!?

brendt hess

Hall of Fame

Points: 3256
More actions
June 8, 2005 at 1:25 pm

#87908

At my current workplace, there is a group of programmers working on our next-generation app for in-house work. Their work includes the design of the enterprise DB, which is where I have a question for people.
In the current design, they are implementing in-table row versioning. In this implementation, when a row is modified, instead of actually modifying the existing row, the original row is flagged as 'deleted', and a new row is added. A pair of Date fields indicate the date/times that the row was valid from - to (labeled as Date added and date modified), with no IsDeleted flag indicating the current row.
Each table will have an identity field, and a second surrogate 'Identity' field for identifying the rows related to one entity. The table's PK (and a clustered index) will be based on the second 'Identity' field.
Personally, I believe that tables like this will become increasingly problematic as time goes on, growing ridiculously large and slow to query, and difficult to run applications against. However, I need some actual information on this:
1) Has anyone ever actually considered an implementation like this in SQL Server?
2) If so, what were your thoughts on this?
3) If you went to Implementation, what were the results?
4) Are there any existing theoretial works that support this methodology? If so, what are they?
This is one of the most intelligent groups I know on SQL Server questions, and I would really like an assist on answering this.
THANKS!

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

Sounds like a good plan... except for the part where the history of the table is kept in the master table. I can't imagine having to go through millions of useless rows just to fetch some simple data.

I would think that it's best to put the history of the data into it's own history table... Then when you need to check the history, you go in the history and when you need the data, you go get the data .

As for theory... I would think that I wouldn't want to keep the apples that have passed their due date with the good ones in the same basket... but that's just me .

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 2

I know of no theory that would support this implementation, but I can name one that says this is a bad implementation. How about the relational model of database management? You should not store different things in one table. A flag that makes a row change 'mode' indicates that there are actually more than one kind of entity stored in a table.

And what do you mean with two surrogate keys using identity? Since every new version of a row will receive a new pk it means rows in other tables that reference the old row will need to be updated with the new key.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

Hehe.. I missed that one (referencial integrity). But I think it's the best reason to stop (or not start) this non-sens. Not to re-mention speed, ease of use, correct design and a few others I'm surely missing again.

SteveB-74825 Hall of Fame Points: 3290 More actions · Answer 4

this is another classic example of programmers getting their hands on the data, brainstorming, and then coming up with a solution that probably fits their needs and they can rationalise until the cows come home.

i'd say your concerns are more than justified. put your foot down and stand your ground, you have the relational model to fall back on. fight the power.

cheers

dbgeezer

brendt hess Hall of Fame Points: 3256 More actions · Answer 5

>>And what do you mean with two surrogate keys using identity?

Here is what a sample table layout would look like, with generic identifiers:

VersionID int identity

RecordID int <<--- Core of PK

RecordData varchar(12)

IsDeleted bit

Created datetime

CreatedBy Varchar(15)

LastModified datetime

LastModifiedBy varchar(15)

So, let's enter data item 1, and edit it twice:

1 1 ThisisTests 1 06/01/2005 12:30:00 Brendt 06/02/2005 10:15:00 Mike
2 1 ThisisTest 1 06/02/2005 10:15:00 Mike 06/04/2005 08:44:10 Brendt
3 1 ThisIsATest 0 06/04/2005 08:44:10 Brendt

As near as I can tell, this means RI through triggers, and no actual PK/FK relationships. Piles of data, long queries or inclusion of LastModified Is Null in all queries. Lots of additional overhead, bad indexing, and bad performance is what I see coming.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

Ninja's_RGR'us

SSC Guru

Points: 294069

June 8, 2005 at 4:43 pm

#564023

Amen.

May the force be with you .

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

Can you live with this type of query???

Select VersionID, RecordID, RecordData, isDeleted, Created, CreatedBy, LastModified, LasmodifiedBy from dbo.YourTable Main inner join

(Select RecordID, max(LastModified) as LastModified

from dbo.YourTable

group by RecordID) dtCurrentDate on Main.RecordID = dtCurrentDate.RecordID and Main.LastModiFied = dtCurrentDate.LastModified

EDIT

Sorry I forgot about the IsDeleted Column. You'd still have to scan the whole table, flush 90% of the data, to present the whole table in a report.... is this something you can live with???

brendt hess Hall of Fame Points: 3256 More actions · Answer 8

Nope - that's why I am looking for outside reinforcing opinions on this table structure. Thanks!

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 9

If you need real punch, you should ask Joe Celko to comment on this design. They'll start running for their mommies once he's done with 'em .

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 10

You don't even need to look at how complex the queries for this table will be. It is absolutely not an acceptable design! There is no PK (RecordID cannot be PK since there will be many rows with equal values for it). Like I said before the table contains multiple types of entities. There is unnecessary data stored with the Created/LastModified design. There is no real key (at least not in this example). Etc.

Say no, get a copy of Fabian Pascal's Practical Issues in Database Management and have the designers read that if they are going to have anything to say about the database design.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

SteveB-74825 Hall of Fame Points: 3290 More actions · Answer 11

Indeed.

To the dark side poor design leads.

cheers

dbgeezer

Frank Kalis SSC Guru Points: 111183 More actions · Answer 12

Chris already mentioned all there is to say. You should suggest, more or less gently, that those developers take some course in database design. Maybe you can get your hands on http://www.oreilly.com/catalog/databaseid/

I've started yesterday to review it and it looks very promising. Relational theory mere mortals outside the ivory tower can understand.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Janne-232659 SSC Rookie Points: 34 More actions · Answer 13

As it has been said: History should be managed with different table or you could consider using different dbms product that has a native support for multiversion data.

Carl Federl One Orange Chip Points: 25384 More actions · Answer 14

I have successfully implimented this type of historical schema but with a slightly different approach such as:

-- The CORE table

create table Partys

( PartyId integer identity(1,1)

, EffectiveDt datetime not null

, ExpirationDt datetime not null

, constraint Partys_P primary key (PartyId)

)

-- The Version table

-- Note that PartyVersions should not have its own identity column.

Create table PartyVersions

(PartyId integer identity(1,1) not null

,AsOfDt datetime not null

,CreatedDt datetime not null

,CreatedBy Varchar(15) not null

,LastModified datetime not null

,LastModifiedBy varchar(15) not null

-- Data Columns go here

, constraint PartyVersions_P primary key (PartyId , AsOfDt )

, constraint Partys_F_PartyVersions foreign key

(PartyId ) references Partys

)

-- A version table with a uniqueness constraint

Create table PartyTaxIdVersions

(PartyId integer identity(1,1) not null

,AsOfDt datetime not null

,CreatedDt datetime not null

,CreatedBy Varchar(15) not null

,LastModified datetime not null

,LastModifiedBy varchar(15) not null

,TaxIdentifier integer not null

, constraint PartyTaxIdVersions_P primary key (PartyId , AsOfDt )

, constraint PartyTaxIdVersions_P unique (TaxIdentifier, AsOfDt )

, constraint Partys_F_PartyTaxIdVersions foreign key

(PartyId ) references Partys

)

All Queries for this class of structure will be similiar to

select *

from Partys

join PartyVersions

on PartyVersions.PartyId = Partys.PartyId

join (select PartyVersions.PartyId

, MIN(AsOfDt) as AsOfDt

from PartyVersions

where AsOfDt >= @AsOfDt

) as PartyVersionLatest (PartyId, AsOfDt )

on PartyVersionLatest.PartyId = PartyVersions.PartyId

and PartyVersionLatest.AsOfDt= PartyVersions.AsOfDt

join PartyTaxIdVersions

on PartyTaxIdVersions.PartyId = Partys.PartyId

join (select PartyVersions.PartyId

, MIN(AsOfDt) as AsOfDt

from PartyTaxIdVersions

where AsOfDt >= @AsOfDt

) as PartyTaxIdVersionLatest (PartyId, AsOfDt )

on PartyTaxIdVersionLatest.PartyId = PartyVersions.PartyId

and PartyTaxIdVersionLatest.AsOfDt= PartyVersions.AsOfDt

Please be aware of these situations:

The typical SQL will contain a variable ( As Of Date ) within a subquery, views cannot contain variables and therefore, views cannot be used to aid re-usability.

Since stored procedure use parameter sniffing to determine the optimal query plan, the AsOfDt value must always be passed and must not be defaulted to the current date within the stored procedure. i.e. do not have this logic in any stored procedures:

if @AsOfDt is null set @AsOfDt = current_timestamp

Columns that have uniqueness constraints should be in a seperate table or updates to the other columns will cause many rows having the same value which increases the index size resulting in performance problems.

The Core table and the version table should be on different disks to increase thruput.

SQL = Scarcely Qualifies as a Language