Using SoundEx to find similar Client Names

  • I have a client list of about 175,000-plus records. It is not my own. We are sure there are repeated clients (e.g. A & A Plumbing, A and A Plumbing, A and A Plumbing Inc., A & A Plumbing, Incorporated, etc.).

    I have written a routine that parses out each word in each record and runs SoundEx on it. I then take each character of the SoundEx result and use its ASCII code to get a numeric value, and add those values up until all of the words in the record are done.

    Note: I have a REPLACE list for things like Inc, Incorp, Incorporated, etc., since those would only complicate this further.

    I am getting a result set of about 50,000 records; some look similar, some are not close at all.

    Has anyone written a SQL routine that tries to capture these kinds of occurrences, and if so, what approach did you take? If not, does anyone have a suggestion as to how I could further refine these 50,000 records?

    Thank you. 

    I wasn't born stupid - I had to study.

  • I have written a SQL routine that tries to catch pairs like Malyutin Slava and Malyutin Salava. In it I can set the number of errors allowed. The script is not small. If you want, I can e-mail it.

  • Can you send me a copy of your SQL script that uses SoundEx to find similar client names? My email address is naimsyed@hotmail.com

    Thanks a bundle
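
For reference, the summing step described in the first post may be where many of the false matches come from: adding up ASCII values throws away character order, so different SoundEx codes can produce the same total. Below is a minimal Python sketch (not SQL, and not either poster's actual script) that compares the per-word codes as a string instead. The `NOISE_WORDS` list, the helper names, and the treatment of `&` as `and` are assumptions for illustration, and this `soundex` follows the standard American Soundex rules, which may differ slightly from a given database's SOUNDEX function.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list; the first post mentions a REPLACE list
# but does not give its contents. Dropping "AND" is also an assumption.
NOISE_WORDS = {"INC", "INCORP", "INCORPORATED", "AND"}

# Letter-to-digit table for standard American Soundex.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if no letters)."""
    word = re.sub(r"[^A-Z]", "", word.upper())
    if not word:
        return ""
    result = word[0]
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def ascii_sum(code):
    """The order-insensitive total described in the first post."""
    return sum(ord(c) for c in code)


def name_key(name):
    """Join the Soundex codes of the significant words, keeping their order."""
    name = name.upper().replace("&", " AND ")
    words = re.findall(r"[A-Z]+", name)  # note: digits in names are ignored
    return " ".join(soundex(w) for w in words if w not in NOISE_WORDS)


def group_by_key(names):
    """Return only the keys shared by more than one name."""
    groups = defaultdict(list)
    for n in names:
        groups[name_key(n)].append(n)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Different codes, same ASCII total: summing would call these a match.
print(soundex("Baker"), soundex("Brooks"))
print(ascii_sum(soundex("Baker")) == ascii_sum(soundex("Brooks")))

clients = ["A & A Plumbing", "A and A Plumbing", "A and A Plumbing Inc.",
           "A & A Plumbing, Incorporated", "Baker Supply", "Brooks Supply"]
print(group_by_key(clients))
```

Grouping on `name_key` puts all four A & A Plumbing variants from the first post in one bucket, while Baker and Brooks, whose codes sum to the same number, stay apart. Misspellings that change a code would still need something like the error-count approach mentioned in the second post.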

     
