Don’t Change Anything

  • Two points:

    First, the statement

    I'm sure none of you would store dates as characters, but it does happen.

    is all very well, but if you are recording for example dates for first recorded occurrence of a word or idiom in even a modern language you'll find that datetimecan't do the job so you don't have the option of using it. I've been involved in a couple of projects (as a volunteer, not paid employment) that involved rather old dates, and I couldn't use datetime. As it happened I used SMALLINT not CHAR(10), but there's nothing wrong with the character version.

    Second: people have made the points about keeping an audit trail and about the data being the user's responsability so only the user should change it, and I agree completely with that. I've often been in the position where I'm holding three roles: DBA, Developer, and Data Owner (= "user") and even there I find it useful to have a nice clean document indicating quite clearly three separate things: first the data problem (from the DBA point of view), second the fix (from the user point of view), and third the options for measures to ensure that the application warns the user instead of presenting bad data to the data import system (from the developer point of view). That enables the people who own the money to decide whether to live with manual fixing after the problem manifests itself in the DB (probably in a staging table) or to pick some other option.

    Tom

  • Tom.Thomson (4/9/2010)


    First, the statement "I'm sure none of you would store dates as characters, but it does happen." is all very well, but if you are recording for example dates for first recorded occurrence of a word or idiom in even a modern language you'll find that datetimecan't do the job so you don't have the option of using it. I've been involved in a couple of projects (as a volunteer, not paid employment) that involved rather old dates, and I couldn't use datetime. As it happened I used SMALLINT not CHAR(10), but there's nothing wrong with the character version.

    Heh. The forums are full of problems caused by storing dates and times as strings - which is exactly why no competent DBA would consider it, in all but the most extreme edge-cases.

    For the odd occasion when even the new 2008 types (0001-01-01 to 9999-12-31) are insufficient, you can choose between applying a fixed offset (retaining a proper type), or using an encoding scheme, probably based on an INTEGER or BIGINT).

    The use of SMALLINT interests me - on the face of it, it is too small to encode a date (only) in the usual fashion (20050814), and as a day-only offset from a fixed date (say 1900-01-01) it can only represent dates from 1810 to 1989. Saving two bytes over an INTEGER here might seem like a case of premature optimization!

    My position? Storing dates as strings is dumb.

  • Paul White NZ (4/10/2010)


    Tom.Thomson (4/9/2010)


    First, the statement "I'm sure none of you would store dates as characters, but it does happen." is all very well, but if you are recording for example dates for first recorded occurrence of a word or idiom in even a modern language you'll find that datetimecan't do the job so you don't have the option of using it. I've been involved in a couple of projects (as a volunteer, not paid employment) that involved rather old dates, and I couldn't use datetime. As it happened I used SMALLINT not CHAR(10), but there's nothing wrong with the character version.

    Heh. The forums are full of problems caused by storing dates and times as strings - which is exactly why no competent DBA would consider it, in all but the most extreme edge-cases.

    For the odd occasion when even the new 2008 types (0001-01-01 to 9999-12-31) are insufficient, you can choose between applying a fixed offset (retaining a proper type), or using an encoding scheme, probably based on an INTEGER or BIGINT).

    The use of SMALLINT interests me - on the face of it, it is too small to encode a date (only) in the usual fashion (20050814), and as a day-only offset from a fixed date (say 1900-01-01) it can only represent dates from 1810 to 1989. Saving two bytes over an INTEGER here might seem like a case of premature optimization!

    My position? Storing dates as strings is dumb.

    I'd say "nearly always dumb", but I reckon the exceptions are so rare that "always" is OK.

    SMALLINT for dates: well, the problem was representing the interval during which a word form first appeared. As we don't generally know the day or month something was wriiten if it was written between 600 and 1500 years ago there's no need for we don't need day or month, just year. The (non-programmer, non-dba) who had put the db together and now needed a volunteer to get it working had represented the interval as two columns: date (the start year or the interval) and accuracy. He was using char(4) for the start year, with either a three digit year number and a trailing space or a four digit year number, making comparison a little verbose. He had represented the accuracy as char(10) using strings like 'year ', 'decade ', 'quartercen', 'halfcentur', 'century ' which made any query searching on approximate contemporariness or beforeness or afterness very verbose indeed (and probably impossible to get right). This was running on an oldish PC with a what was even then a very small hard disc, and as he had stacks of corpus on there he was having trouble with the DB size as well as with functionality. I changed the representation to smallint date (year) and tinyint accuracy (which was now the interval length in years - of course only a few values were actually used, not all 256). but that was back in 2000. Today the natural thing would be to use the new (SQL 2008) DATE type for the date (size not a worry, because in 2010 discs are very much bigger and cheaper). If accuracy to a day was needed (won't happen for dates in that application) the year 2000 solution would have been to use INT instead of SMALLINT and today's solution would be to use the SQL 2008 DATE type.

    Tom

  • Paul White NZ (4/10/2010)


    Tom.Thomson (4/9/2010)


    First, the statement "I'm sure none of you would store dates as characters, but it does happen." is all very well, but if you are recording for example dates for first recorded occurrence of a word or idiom in even a modern language you'll find that datetimecan't do the job so you don't have the option of using it. I've been involved in a couple of projects (as a volunteer, not paid employment) that involved rather old dates, and I couldn't use datetime. As it happened I used SMALLINT not CHAR(10), but there's nothing wrong with the character version.

    Heh. The forums are full of problems caused by storing dates and times as strings - which is exactly why no competent DBA would consider it, in all but the most extreme edge-cases.

    For the odd occasion when even the new 2008 types (0001-01-01 to 9999-12-31) are insufficient, you can choose between applying a fixed offset (retaining a proper type), or using an encoding scheme, probably based on an INTEGER or BIGINT).

    The use of SMALLINT interests me - on the face of it, it is too small to encode a date (only) in the usual fashion (20050814), and as a day-only offset from a fixed date (say 1900-01-01) it can only represent dates from 1810 to 1989. Saving two bytes over an INTEGER here might seem like a case of premature optimization!

    My position? Storing dates as strings is dumb.

    Think along the lines of CYYDDD, which JDEdwards uses.

    109001 is Jan 1, 2009.

    There are other such formats that make use of an assumed starting point.

    Although they tend to be propietary, I seem to find them much easier to deal with.

    I had to build a planning /scheduling tool that used dates like this, and it matched up with a work center calendar that had a row for every month. Each row had 31 day columns, and hours in each bucket.

    The solution was rather creative. 😛

    Although looking at what you tend to post, you might cry foul on my abuse of Excel in pulling it together.

    Greg E

  • Greg Edwards-268690 (4/10/2010)


    Think along the lines of CYYDDD, which JDEdwards uses.

    109001 is Jan 1, 2009.

    There are other such formats that make use of an assumed starting point.

    Although they tend to be propietary, I seem to find them much easier to deal with.

    I had to build a planning /scheduling tool that used dates like this, and it matched up with a work center calendar that had a row for every month. Each row had 31 day columns, and hours in each bucket.

    The solution was rather creative. 😛

    Although looking at what you tend to post, you might cry foul on my abuse of Excel in pulling it together.

    Greg E

    Heh. Whatever works, works! I can see some potential advantages there - thanks for the explanation - I feel more knowledgeable already. 🙂

  • Greg Edwards-268690 (4/9/2010)


    GabyYYZ (4/9/2010)


    Greg Edwards-268690 (4/7/2010)


    When you change the data type, you can lose quite a bit. Dates have special functions and edits, and this is probably even more important when integrating data from multiple data sources.

    ...

    Greg E

    I have some colleagues who work with Oracle/SAP. They did a small, incremental, upgrade that SAP assured them would not change any table definitions or data. What happened, a column was modified with a default NULL, it was not caught in QA, and it resulted in the warehouse not shipping out inventory to our stores.

    Ouch!

    NULL kind of means nothing, so it shouldn't have caused a problem. :w00t:

    Greg E

    Greg,

    NULL most definitely does NOT "kind of mean nothing". What NULL means is NULL. It is normally not equal to any value (including another NULL ;-). It also doesn't play well with numerically oriented aggregate functions (e.g. SUM, AVG) nor does it play well with mathmatical calculations (NULL + 1 = NULL and not 1).

    If you know that a numeric field may contain a NULL, then you can use tricks like COALESCE(fieldname, 0) AS . . . when you are doing calculations with the field. However, if you don't do that and then do a calculation, you can get all kinds of wierd problems.

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • RalphWilson (4/12/2010)


    Greg Edwards-268690 (4/9/2010)


    GabyYYZ (4/9/2010)


    Greg Edwards-268690 (4/7/2010)


    When you change the data type, you can lose quite a bit. Dates have special functions and edits, and this is probably even more important when integrating data from multiple data sources.

    ...

    Greg E

    I have some colleagues who work with Oracle/SAP. They did a small, incremental, upgrade that SAP assured them would not change any table definitions or data. What happened, a column was modified with a default NULL, it was not caught in QA, and it resulted in the warehouse not shipping out inventory to our stores.

    Ouch!

    NULL kind of means nothing, so it shouldn't have caused a problem. :w00t:

    Greg E

    Greg,

    NULL most definitely does NOT "kind of mean nothing". What NULL means is NULL. It is normally not equal to any value (including another NULL ;-). It also doesn't play well with numerically oriented aggregate functions (e.g. SUM, AVG) nor does it play well with mathmatical calculations (NULL + 1 = NULL and not 1).

    If you know that a numeric field may contain a NULL, then you can use tricks like COALESCE(fieldname, 0) AS . . . when you are doing calculations with the field. However, if you don't do that and then do a calculation, you can get all kinds of wierd problems.

    Sorry - I was being sarcastic.

    Greg E

  • Greg Edwards-268690 (4/13/2010)


    RalphWilson (4/12/2010)


    Greg Edwards-268690 (4/9/2010)


    GabyYYZ (4/9/2010)


    Greg Edwards-268690 (4/7/2010)


    When you change the data type, you can lose quite a bit. Dates have special functions and edits, and this is probably even more important when integrating data from multiple data sources.

    ...

    Greg E

    I have some colleagues who work with Oracle/SAP. They did a small, incremental, upgrade that SAP assured them would not change any table definitions or data. What happened, a column was modified with a default NULL, it was not caught in QA, and it resulted in the warehouse not shipping out inventory to our stores.

    Ouch!

    NULL kind of means nothing, so it shouldn't have caused a problem. :w00t:

    Greg E

    Greg,

    NULL most definitely does NOT "kind of mean nothing". What NULL means is NULL. It is normally not equal to any value (including another NULL ;-). It also doesn't play well with numerically oriented aggregate functions (e.g. SUM, AVG) nor does it play well with mathmatical calculations (NULL + 1 = NULL and not 1).

    If you know that a numeric field may contain a NULL, then you can use tricks like COALESCE(fieldname, 0) AS . . . when you are doing calculations with the field. However, if you don't do that and then do a calculation, you can get all kinds of wierd problems.

    Sorry - I was being sarcastic.

    Greg E

    I have toyed with the idea of getting into SAP, but then the rational part of my brain kicks me in the groin and says, "Snap out of it!"

    Gaby
    ________________________________________________________________
    "In theory, theory and practice are the same. In practice, they are not."
    - Albert Einstein

  • GabyYYZ (4/13/2010)


    ...but then the rational part of my brain kicks me in the groin...

    Odd anatomical image :laugh:

  • Paul White NZ (4/13/2010)


    GabyYYZ (4/13/2010)


    ...but then the rational part of my brain kicks me in the groin...

    Odd anatomical image :laugh:

    Not all that odd really, if the brain is what a person thinks with. I keep on hearing people claim that most men think with something they don't keep in their heads and is very well positioned to place that kick. :laugh:

    Tom

Viewing 10 posts - 61 through 69 (of 69 total)

You must be logged in to reply to this topic. Login to reply