Question about malformed ZIP codes

  • ISNULL('"' + LEFT(D.Zip, 5) + '"', '"99999"') AS ["Customer Zip"],

    I have this query in a stored procedure and use it for some EDI transactions.  In the past few months we received an alert from the vendor that our file was not processing.  In doing some research, there was a test account put into the main system that used alpha characters in the ZIP field.  I believe that alpha is allowed due to Canadian postal codes.

    But for this application of the data, these will be US Zip codes only.  How can I wrap this statement to check for NUMERIC, 5-digit only data, and return "99999" if not validated?  Is there something similar to a RegEx statement, or something easier?

    Thanks for the information

  • You could use ISNUMERIC.

    Personally, if I COULD I would be changing the application that puts the data into the system to validate it and refuse to store invalid data.

    Failing that, you are going to need to use multiple functions to ensure it is good.  But you could do something like:

    CASE WHEN LEN(D.Zip) = 5 AND ISNUMERIC(D.Zip) = 1 then D.ZIP else '99999' END AS ["Customer Zip"]

    NOTE - I didn't test the above and there may be other cases that you want to verify.  For example the string "123.4" has a length of 5 (5 characters) and it is numeric, but it is not a valid ZIP.

    The reason I prefer the above is that it is EASY to tell what validation rules are required - length must be 5 and it must be numeric.

    Alternately, you could build up a function to validate that the ZIP code is a valid ZIP code, or have a calculated column on the table that validates that the ZIP is in a valid format and if it is, it gets a 1 otherwise it is a 0.

    Something you will want to be careful of though is that as your company grows and expands, you MAY end up doing business inside countries such as Canada where a ZIP code (Postal code in Canada) contains letters.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • If you want to rely on the existing zip always being fully padded on the left with zeros, you can do this:

    CASE WHEN LEN(D.Zip) = 5 AND D.Zip NOT LIKE '%[^0-9]%' THEN D.Zip ELSE '99999' END AS Zip

    Since there are valid 3-digit zip codes (with leading zeros removed), such as 055, I generally do this when I check zip codes:

    CASE WHEN LEN(D.Zip) <= 5 AND RIGHT('00' + D.Zip, 5) NOT LIKE '%[^0-9]%' 
    THEN RIGHT('00' + D.Zip, 5) ELSE '99999' END AS Zip

    Since 3 to 5 leading zeros are invalid, you could check for that also if you wanted to.  Since there are only ~42K+ zip codes, you could just create a table of them to verify against, and you would know for sure that the zip is truly valid.

    Btw, the formally correct Canadian postal format is LNL NLN, with the embedded space.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Thanks for the replies.  I wish we could change the application, however it is a vendor we prefer not to deal with too much.  I will try all of these options and it looks as if CASE WHEN LEN(D.Zip) = 5 AND D.Zip NOT LIKE '%[^0-9]%' THEN D.Zip ELSE '99999' END AS Zip will work as we need.

    Thanks again and have a great weekend

  • Another possibility is a bit shorter, uses only LIKE instead of LIKE and the LEN() function, and does the same thing is...

    IIF(D.Zip LIKE '[0-9][0-9][0-9][0-9][0-9]', d.Zip, '99999') AS Zip

    As a bit of a sidebar, IIF is converted to a CASE WHEN ELSE END behind the scenes so no performance advantage or disadvantage there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Of course, an even better way would be to buy an up-to-date zip code database and do a lookup because not every number from 00000 to 99998 is a valid zip code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden wrote:

    Another possibility is a bit shorter, uses only LIKE instead of LIKE and the LEN() function, and does the same thing is...

    IIF(D.Zip LIKE '[0-9][0-9][0-9][0-9][0-9]', d.Zip, '99999') AS Zip

    As a bit of a sidebar, IIF is converted to a CASE WHEN ELSE END behind the scenes so no performance advantage or disadvantage there.

    Looking back at the original code - this needs a minor fix:

    QUOTENAME(IIF(D.Zip LIKE '[0-9][0-9][0-9][0-9][0-9]', d.Zip, '99999'), '"') AS ["Customer Zip"]

    Note: some will argue against using IIF because it isn't standard and you can just use CASE.  I personally limit my use of IIF to situations where there will only be 2 possible results (true/false, yes/no, on/off, 0/1, etc.).  If the logic gets more complicated - then use the standard CASE expression.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I say don't use IIF.  It's a completely foreign coding approach to other SQL.  And there's no genuine need for it, since CASE can do the same thing, and IIF makes the code much less readable, esp. for non-IT people.

    Btw, for zips I tend to use:

    D.Zip NOT LIKE '%[^0-9]%'

    rather than '[0-9]...' because typically zips can be either 5 or 9 digits intermixed (and ours can be).  Which means two patterns then rather than one.  But not a big deal either way.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • While I agree that IIF is "foreign" to SQL, it is VERY short for 2 part CASE statements.  As you know, it does resolve to a CASE statement behind the scenes anyway and the IIF does not cause a performance issue.  Unlike someone using the FORMAT function, IIF is a decent form of shorthand I frequently use when 2 part decisions are needed.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeffrey Williams wrote:

    Jeff Moden wrote:

    Another possibility is a bit shorter, uses only LIKE instead of LIKE and the LEN() function, and does the same thing is...

    IIF(D.Zip LIKE '[0-9][0-9][0-9][0-9][0-9]', d.Zip, '99999') AS Zip

    As a bit of a sidebar, IIF is converted to a CASE WHEN ELSE END behind the scenes so no performance advantage or disadvantage there.

    Looking back at the original code - this needs a minor fix:

    QUOTENAME(IIF(D.Zip LIKE '[0-9][0-9][0-9][0-9][0-9]', d.Zip, '99999'), '"') AS ["Customer Zip"]

    Note: some will argue against using IIF because it isn't standard and you can just use CASE.  I personally limit my use of IIF to situations where there will only be 2 possible results (true/false, yes/no, on/off, 0/1, etc.).  If the logic gets more complicated - then use the standard CASE expression.

    But that means if an existing comparison needs expanded later, someone must convert your IIF to a CASE in order to extend it.  And that increases the chances for errors.

    I prefer code that is much clearer, even if it is a bit longer to type.  To me, the typing time is trivial compared to the total lost time as people have to spend more time interpreting the code.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply