Fuzzy Lookups

Diez B. Roggisch deets at nospam.web.de
Mon Jan 30 11:30:06 EST 2006

Fredrik Lundh wrote:

> Diez B. Roggisch wrote:
>> The advantage becomes apparent when you try to e.g. compare
>> "Angelina Jolie"
>> with
>> "AngelinaJolei"
>> and
>> "Bob"
>> Both have a l-dist of 3
> >>> distance("Angelina Jolie", "AngelinaJolei")
> 3
> >>> distance("Angelina Jolie", "Bob")
> 13
> what did I miss ?

Hmm. I missed something - the "1" before the "3" in 13 when I looked at my
terminal after running the example. And according to


it has the property 

"""It is always at least the difference of the sizes of the two strings."""

And I got my implementation from there (or rather from Magnus Lie Hetland,
whose Python version is referenced there).

So you are right, my example is crap.
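
For reference, the two numbers Fredrik quotes and the size-difference
property can be checked with a minimal sketch of the textbook
dynamic-programming Levenshtein distance. This is not necessarily the
Hetland version mentioned above; the name `distance` simply mirrors the
quoted session.

```python
def distance(a, b):
    """Levenshtein distance between strings a and b (textbook DP, one row at a time)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


for other in ("AngelinaJolei", "Bob"):
    d = distance("Angelina Jolie", other)
    # The distance can never be smaller than the difference in length.
    assert d >= abs(len("Angelina Jolie") - len(other))
    print(other, d)
```

With "Bob" the length difference alone is 11, so a distance of 3 was never
possible.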

But I ran into cases where my normalizing made sense - otherwise I wouldn't
have done it :)

I guess it is more along the lines of (coughed up example)


compared to 



I can only say that I used it to fuzzy-compare people's and hotel names, and
applying the normalization made my results far better.
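
The normalization itself is not spelled out in this message, so the
following is only one plausible reading: divide the Levenshtein distance by
the length of the longer string, giving a value between 0 (identical) and 1
(nothing in common). The name `normalized_distance` and the "Bob"/"Tim" pair
are made up for illustration; they are not the missing example from above.

```python
def distance(a, b):
    """Plain Levenshtein distance (textbook DP), included so the snippet runs alone."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Assumed normalization: distance divided by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return distance(a, b) / longest


# Both pairs have a raw distance of 3, yet they are very differently "close".
print(distance("Angelina Jolie", "AngelinaJolei"), distance("Bob", "Tim"))
print(normalized_distance("Angelina Jolie", "AngelinaJolei"))  # 3 edits over 14 characters
print(normalized_distance("Bob", "Tim"))                       # 3 edits over 3 characters
```

A raw threshold of 3 would accept both pairs, while a relative threshold
keeps the near-identical long names and rejects the unrelated short ones.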

Sorry to cause confusion.

