Sounds Like

  • Hey Guru's,

    I'm wanting to do a "Sounds Like" comparison to compare a list of addresses of one of my databases. Is SOUNDEX the best way to go? I'm just not getting the tyoe of results that I had hoped for.

    Thanks for all the help

    Shane

  • You can also check out difference.

  • I'm still having problems.

    Do you know a way using either SOUNDEX or DIFFERNCE to get the following results?

    I'm wanting to take to addresses (a parent record and a child record), if the addresses are similar I don't want to display them. Similar being, one person may have typed in Road a while the other person typed in Rd or Rd.

    I only want to show records where the addresses are significantly different.

    Hopefully this doesn't sound confusing.

  • How is the adress stored in the db?

  • As a VARCHAR(100)

  • You'll have to split it into its smaller parts for the task... and I suggest you do that for the db too. It's easier if you have the street number, stree name and zip code in different columns. YOu can use soundex where like and can't do the job.

  • OK

    Currently I have it stored as

    street_line1 VARCHAR(100)

    street_line2 VARCHAR(100)

    street_line3 VARCHAR(100)

    city VARCHAR(30)

    state VARCHAR(2)

    zipcode VARCHAR(10)

    So if I wanted to use the SOUNDEX in place of a LIKE or '', can you give me an example?

  • You'd have to use difference. Check it out under bols it's pretty good, but I don't think it was made for such a task.

  • You could try using the Levenshtein edit distance - there are plenty of examples on the net. You would have to write your own function for this but it should be OK if you just are comparing a couple of values. Performance would be a problem if you were doing it in a query.

  • from BOL :

    The DIFFERENCE function compares the SOUNDEX values of two strings and evaluates the similarity between them, returning a value from 0 through 4, where 4 is the best match.

    Here is a litle test I've done a while ago :

    select 'bijnens', soundex('bijnens')

    union all

    select 'bainens', soundex('baainens')

    union all

    select 'beinens',soundex('beinens')

    union all

    select 'beijnens', soundex('beijnens')

    union all

    select 'bynens', soundex('bynens')

    union all

    select 'bijnans', soundex('bijnans')

    union all

    select 'bijkans', soundex('bijkans')

    union all

    select 'janssen', soundex('janssen')

    union all

    select 'janssens', soundex('janssens')

    union all

    select 'jansen', soundex('jansen')

    union all

    select 'jans', soundex('jans')

    union all

    select 'sjans', soundex('sjans')

    union all

    select 'chans', soundex('chans')

    resulted at our servers (default installation English)

    bijnens  B255

    bainens  B552

    beinens  B552

    beijnens B255

    bynens   B552

    bijnans  B255

    bijkans  B252

    janssen  J525

    janssens J525

    jansen   J525

    jans     J520

    sjans    S252

    chans    C520

    I geuss it's obvious the results are language dependant ! (phonetics)

    You should do propre testing and evaluation.

    you could use a case statement to conditionaly select your data :

    declare @tmptest table (idnr int not null identity(1,1) primary key, LName varchar(50) not null )

    insert into @tmptest (LName) 

    select lname

    from (

    select 'bijnens' as LName, soundex('bijnens') as SoundexValue

    union all

    select 'bainens', soundex('baainens')

    union all

    select 'beinens',soundex('beinens')

    union all

    select 'beijnens', soundex('beijnens')

    union all

    select 'bynens', soundex('bynens')

    union all

    select 'bijnans', soundex('bijnans')

    union all

    select 'bijkans', soundex('bijkans')

    union all

    select 'janssen', soundex('janssen')

    union all

    select 'janssens', soundex('janssens')

    union all

    select 'jansen', soundex('jansen')

    union all

    select 'jans', soundex('jans')

    union all

    select 'sjans', soundex('sjans')

    union all

    select 'chans', soundex('chans')

    ) A

    Select T1.IdNr, T1.LName , T2.LName

    from @tmptest T1

    left join @tmptest T2

    on  T1.IdNr <> T2.IdNr

    and soundex(T1.Lname) = soundex(T2.Lname)

    order by T1.IdNr

    --- 2

    Select T1.IdNr, T1.LName , T2.LName

    from @tmptest T1

    left join @tmptest T2

    on  T1.IdNr <> T2.IdNr

    and soundex(T1.Lname) = soundex(T2.Lname)

    where DIFFERENCE(T1.LName, T2.LName) = 4

    order by T1.IdNr

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • i though soundex didn't work if the value started with a number? so an address like 123 main street vs 321 main St was ineffective?

    select soundex('123 main street') = 0000

    select soundex('321 main street') = 0000

    select soundex('main street') = M500

    I've got a stack of REPLACE functions that i used to add a column that hadd a "cleaned address" based on the original address if htat might help;

    it just copied the addr1 field to a new column, and then updated that new column to replace all variations of POBOX for example (ie ''PO','p.o.box','p.o. box') with the default .

    it was a stack of like 100 replaces, but seemed to give me pretty clean results when compared to the original. let me know if it sounds useful and i'll post it here.

     

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • You are better off using an external app for address matching.

    SOUNDEX is OK but it always retains the 1st character and it deals with letters on an individual basic.

    Philistine an Filistine are clearly intended to be the same word but will return different SOUNDEXs.

    I am a great fan of the Lawrence Philips Metaphone for address matching but you would never want to build that functionality into SQL Server.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply