Fuzzy Lookups

gene tani gene.tani at gmail.com
Mon Jan 30 19:28:19 EST 2006


BBands wrote:
> Diez B. Roggisch wrote:
> > I did a levenshtein-fuzzy-search myself, however I enhanced my version by
> > normalizing the distance the following way:
>
> Thanks for the snippet. I agree that normalizing is important. A
> distance of three is one thing when your strings are long, but quite
> another when they are short. I'd been thinking about something along
> these lines myself, but hadn't gotten there yet. It'll be interesting
> to have a look at the distribution of the normalized numbers; I'd guess
> that there may be a rough threshold that effectively separates the
> wheat from the chaff.
>
>     jab
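
For anyone following along without the snippet in front of them, here is
a minimal sketch of one common way to normalize: divide the edit distance
by the length of the longer string, so the result falls between 0.0
(identical) and 1.0 (nothing in common). This is an assumption on my part
and not necessarily the normalization Diez used. The names levenshtein,
normalized_distance and fuzzy_lookup are made up for illustration, and the
0.3 threshold is an arbitrary starting point, not a measured cutoff.

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(a, b) / longest


def fuzzy_lookup(query, candidates, threshold=0.3):
    """Return (distance, candidate) pairs at or under the threshold, best first."""
    scored = [(normalized_distance(query, c), c) for c in candidates]
    return sorted((d, c) for d, c in scored if d <= threshold)
```

With this normalization a distance of three on "kitten"/"sitting" comes out
at 3/7, while one edit on a three-letter word is already 1/3, which is the
short-versus-long effect described above. Looking at the sorted output of
fuzzy_lookup over real data would be one way to eyeball where a threshold
might sit.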

I noticed this guy, who's quite a good Ruby developer and has spent some
time on edit distances:

http://ruby.brian-schroeder.de/editierdistanz/

Also look at Soundex and other phonetic algorithms (Double Metaphone,
NYSIIS, Phonex). I have notes to investigate them, but I haven't looked at
them myself.
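
To give a flavor of the phonetic approach, here is a rough sketch of
classic American Soundex as I understand the usual rules: keep the first
letter, map the remaining consonants to digits, collapse adjacent equal
codes, and pad to four characters. It assumes plain ASCII names and
silently ignores anything else. The function name soundex and the _CODES
table are my own naming, not from any particular library.

```python
# Letter groups that share a Soundex digit.
_CODES = {}
for digit, group in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                     ("4", "l"), ("5", "mn"), ("6", "r")):
    for letter in group:
        _CODES[letter] = digit


def soundex(name):
    """Four-character Soundex code; empty string if no ASCII letters."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    last_code = _CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _CODES.get(c, "")  # vowels and y get no code
        if code and code != last_code:
            result += code
        last_code = code  # a vowel resets this, so repeats after it count
    return (result + "000")[:4]
```

Names that sound alike, such as "Robert" and "Rupert", end up with the same
code, so it can serve as a cheap first-pass key before running an edit
distance on the survivors.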



