[Python-Dev] Why is soundex marked obsolete?

Eric S. Raymond esr@thyrsus.com
Sat, 13 Jan 2001 15:23:50 -0500


I have a new goodie for the 2.1 standard library, a module called
"simil" that supports computation of similarity indices between
strings such as one might use for recovery-matching of misspellings
against a dictionary.

The three methods supported are stemming, normalized Hamming
similarity, and (the star of the show) Ratcliff-Obershelp gestalt
subpattern matching.  The latter is spookily effective at detecting
not just substitution typos but also insertions and deletions.  The module is
a C extension (my first!) for speed and because the Ratcliff-Obershelp
implementation uses pointer arithmetic heavily.

It's documented, tested, and ready to go.  But having written it, I
now have a question: why is soundex marked obsolete?  Is there
something wrong with the algorithm or implementation?  If not, then
it would be natural for simil to absorb the existing soundex 
implementation as a fourth entry point.
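For comparison with the similarity measures, here is a short sketch of the commonly used American Soundex rules. It is an assumption that these match what the existing soundex module computes, since I have not checked that module's output. The function name is hypothetical. The rules are as follows. Keep the first letter. Map the remaining consonants to digits. Drop vowels, y, h, and w. Collapse adjacent equal codes, including codes separated only by h or w. Pad or truncate the result to four characters.

```python
import string

# Digit classes of American Soundex; letters not listed (vowels, y, h, w) get no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Four-character American Soundex code, or "" if name has no ASCII letters."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's code still suppresses a repeat
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":
            # A vowel resets prev to "", so equal codes on either side of it are both kept;
            # h and w leave prev unchanged, so equal codes across them collapse.
            prev = code
    return (first.upper() + "".join(digits) + "000")[:4]
```

Under these rules, "Robert" and "Rupert" both code to R163, and "Ashcraft" codes to A261.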
-- 
		Eric S. Raymond <http://www.tuxedo.org/~esr/>

Whether the authorities be invaders or merely local tyrants, the
effect of such [gun control] laws is to place the individual at the 
mercy of the state, unable to resist.
        -- Robert Anson Heinlein, 1949
