[Python-Dev] Why is soundex marked obsolete?
Eric S. Raymond
esr@thyrsus.com
Sat, 13 Jan 2001 15:23:50 -0500
I have a new goodie for the 2.1 standard library, a module called
"simil" that supports computation of similarity indices between
strings such as one might use for recovery-matching of misspellings
against a dictionary.
The three methods supported are stemming, normalized Hamming
similarity, and (the star of the show) Ratcliff-Obershelp gestalt
subpattern matching. The last is spookily effective for detecting
not just substitution typos but insertions and deletions. The module is
a C extension (my first!) for speed and because the Ratcliff-Obershelp
implementation uses pointer arithmetic heavily.
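
For readers unfamiliar with two of these measures, here is a rough pure-Python sketch. It is not the simil C code, and every function name in it is hypothetical. The Hamming normalization (dividing by the longer string's length) is an assumption, since the post does not say how simil normalizes. The gestalt score follows the usual Ratcliff-Obershelp description: find the longest common substring, recurse on the pieces to its left and right, and return twice the matched character count divided by the combined length. Stemming is left out.

```python
def hamming_similarity(a, b):
    """Fraction of positions where a and b hold the same character.

    Assumption: normalize by the longer length, so any length
    difference counts as mismatches.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def _longest_common_substring(a, b):
    """Return (start_a, start_b, length) of a longest common substring."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def _matched_chars(a, b):
    """Count matched characters by recursing around the longest match."""
    start_a, start_b, length = _longest_common_substring(a, b)
    if length == 0:
        return 0
    left = _matched_chars(a[:start_a], b[:start_b])
    right = _matched_chars(a[start_a + length:], b[start_b + length:])
    return length + left + right


def gestalt_similarity(a, b):
    """Ratcliff-Obershelp style score: 2 * matches / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * _matched_chars(a, b) / total
```

An insertion at the front of a word shows why the gestalt measure copes with more than substitutions: comparing "pattern" with "xpattern" shifts every character by one, so the positional Hamming score collapses while the gestalt score stays high.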
It's documented, tested, and ready to go. But having written it, I
now have a question: why is soundex marked obsolete? Is there
something wrong with the algorithm or implementation? If not, then
it would be natural for simil to absorb the existing soundex
implementation as a fourth entry point.
--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>
Whether the authorities be invaders or merely local tyrants, the
effect of such [gun control] laws is to place the individual at the
mercy of the state, unable to resist.
-- Robert Anson Heinlein, 1949