Processing duplicate address

  • Hi,

    I have a table with lots of addresses and some are obvious duplicates, but with differences, i.e.

    1486 SOUTH PAGOSA STREET

    1486 S. Pagosa St.

    1486 So. Pagosa Street

    Is there any method that can be used to know that these are duplicates?

  • One way is to process them in SSIS using the FUZZY Lookup component. I don't think it works all that well, but it does get some. Depending on how many addresses you have you may have to do a lot manually.

    What I have done is used the Fuzzy lookup to load a table I then use a cross reference and then manually go through the table, fixing the ones that aren't matches.

    I don't know of a real good way.

  • Use the Google Maps API and see if they return the same location?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks, I will check that out. I am trying to do as much as I can programmatically.

  • Please (please!) report back on this. I think that this would be the coolest thing ever if you get it working. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I've never tried the Fuzzy logic or Google API approach, but I did make a fairly decent tool to standardize addresses using the US Post office abbreviations (there should be simmilar standards for other countries):

    http://pe.usps.gov/text/pub28/28apc_001.html#NL508_2

    And then strip out periods, commas, etc, and compare the standardized addresses. Attached is a script that creates the word-abbreviation table I used.

  • Thanks so much for all of the responses. We are looking into how we can use this information to accomplish this.

  • Your best option would be to use a tool that is built specifically for this. Melissa data (http://www.melissadata.com) is one of the better recognized providers.

    They have the ability to integrate with multiple programming languages as well as with SQL Server.

    This kind of software pays for itself in no time at all.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I have to do this kind of thing all day every day (actually, I have automated procedures that do exactly this). Since I work for a direct mail company, that makes sense.

    What I use is Satori Software's Mailroom Toolkit. It standardizes the addresses, verifies they are mailable, and then it's easy to dedupe the list.

    That, of course, costs money. I don't remember how much, since the company I work for paid for it.

    It's a great product for handling mailing lists. Has all the stuff for getting reduced postage rates (what used to be called bulk rates), etc. They even have a product for checking if people have moved, based on the Post Office National Change of Address registry.

    It's fully automatable with SQL and .NET.

    http://www.satorisoftware.com is their web address.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • We maintain seperate fields for our Property Addresses and then assemble them when we wish to use them. It probably doesn't help for your situation but it helps us to keep reasonably clean addresses. Some of the fields use a lookup table for validation.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    house_no Int 10 0 YES House Number.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    house_fra Varchar 3 YES House number fraction, ie '1/2'.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    st_dir_pre Char 2 YES Prefix Direction.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    st_name Varchar 40 YES Street Name.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    st_type Char 4 YES Street Type or Suffix (in the

    str_type_lu table).

    -------------------- ---------- ------ ----- ---- --------------------------------------

    st_dir_suf Char 2 YES Suffix Direction.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    unit_type Varchar 4 YES Type of unit.

    APT = Apartment.

    STE = Suite.

    UNIT = Unit.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    unit_num Varchar 9 YES Unit (Apartment, Suite) Number.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    bldg Varchar 6 YES Building Number.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    city Varchar 30 YES City Name (in the cityzip_lu view).

    -------------------- ---------- ------ ----- ---- --------------------------------------

    state Char 2 YES State code (IA).

    -------------------- ---------- ------ ----- ---- --------------------------------------

    zip Char 5 YES Five digit zip code (in the cityzip_lu

    view).

    -------------------- ---------- ------ ----- ---- --------------------------------------

    zip4 Char 4 YES Four digit Zip Code Extension.

    -------------------- ---------- ------ ----- ---- --------------------------------------

    Steve

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply