[Python-Dev] Why is soundex marked obsolete?
Sun, 14 Jan 2001 14:00:21 -0500
Very quick (swamped):
> I think you've just made an argument for replacing your
> SequenceMatcher with simil.ratcliff.
Actually, I'm certain they're the same algorithm now, except the C is
showing through in ratcliff to the floating-point eye <wink>. For
demonstration, I *always* printed the top three scorers (that's logic in the
little driver I posted, not in SequenceMatcher), without any notion of
cutoff (ndiff does use a cutoff). Add this line before the return (in the
posted driver) to see the actual scores:
Module name? browser
Hmm. My best guesses are webbrowser, robotparser, user
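For anyone who wants to reproduce those scores, here is a minimal sketch of that kind of driver. The posted driver itself isn't reproduced in this message, so the candidate list MODULE_NAMES and the helper best_guesses() are hypothetical stand-ins. The sketch scores every candidate with difflib.SequenceMatcher.ratio(), applies no cutoff, and prints the top three along with their actual scores.

```python
import difflib

# Hypothetical candidate list; the real driver's module-name list isn't shown here.
MODULE_NAMES = ["webbrowser", "robotparser", "user", "urllib", "os", "string"]

def best_guesses(name, candidates, n=3):
    """Return the n highest-scoring (ratio, candidate) pairs, with no cutoff."""
    scored = []
    for cand in candidates:
        # ratio() is 2*M/T: M matched chars, T total chars in both strings.
        ratio = difflib.SequenceMatcher(None, name, cand).ratio()
        scored.append((ratio, cand))
    # Highest ratio first; ties broken alphabetically so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:n]

if __name__ == "__main__":
    name = "browser"
    guesses = best_guesses(name, MODULE_NAMES)
    print("Hmm. My best guesses are", ", ".join(c for _, c in guesses))
    for ratio, cand in guesses:
        print(f"  {cand}: {ratio!r}")
```

The standard library also offers difflib.get_close_matches(name, candidates, n=3, cutoff=0.0) as a one-call version. Note that its default cutoff of 0.6 would drop robotparser and user from this example, which is why a demo that wants to show the top three regardless passes cutoff=0.0.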
On this example you reported:
>>> simil.ratcliff("browser", "webbrowser")
>>> simil.ratcliff("browser", "robotparser")
>>> simil.ratcliff("browser", "user")
which strongly suggests you're using C floats instead of Python floats (C doubles) to
compute the final score. I didn't try every example in your email, but it's
the same story on the three I did try (scores identical modulo
simil.ratcliff dropping about 30 of the low-order result bits -- which is
about the difference between a C double and a C float on most boxes).
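To see what that truncation looks like, here is a small illustration. It is not simil's code, and the helper name as_c_float() is made up for the example. It takes the double-precision SequenceMatcher ratio for each of the three pairs above, squeezes it through a C float using struct's 4-byte "f" format, and reports the relative difference. A C double carries 53 significand bits and a C float 24, so 29 low-order bits are lost, which is the "about 30" above.

```python
import difflib
import struct
import sys

# Significand bits in IEEE-754 binary32, which is what a C float is on most boxes.
FLOAT32_MANT_DIG = 24

def as_c_float(x: float) -> float:
    """Round a Python float (a C double) to the nearest C float, then widen it back."""
    # struct's "f" code packs into 4-byte binary32; "<" just pins the byte order.
    return struct.unpack("<f", struct.pack("<f", x))[0]

if __name__ == "__main__":
    # sys.float_info.mant_dig is the double's significand width (53 on IEEE boxes).
    print("bits dropped:", sys.float_info.mant_dig - FLOAT32_MANT_DIG)
    for cand in ["webbrowser", "robotparser", "user"]:
        d = difflib.SequenceMatcher(None, "browser", cand).ratio()  # double precision
        f = as_c_float(d)  # the same score squeezed through a C float
        print(f"{cand:12} double={d!r} float={f!r} rel.diff={abs(d - f) / d:.1e}")
```

The relative difference stays at or below 2**-24 (about 6e-8), so the two sets of scores agree to roughly seven significant digits and differ only in the tail.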
> Mine's even documented. :-).
Which I appreciate! I dreamt up the SequenceMatcher algorithm going on 20
years ago for a friendly diff generator, and never even considered using it
for other purposes. But then I may have mentioned that these other purposes
never come up in my apps <wink>.
strong-enough-ly y'rs - tim