[Python-Dev] Why is soundex marked obsolete?
Sun, 14 Jan 2001 14:46:44 -0500
> BTW, are there less English centric "sounds alike" matchers
> around ?
Yes, but if anything there are far too many of them: like Soundex, they're
just heuristics, and *everybody* who cares adds their own unique twists,
while proper studies are almost non-existent. Few variants appear to be in
use much beyond their inventor's friends; one notable exception in the
Jewish community is the Daitch-Mokotoff variation, originally tailored to
their unique needs but later generalized; a brief description here:
The similarly involved NYSIIS algorithm (New York State Identification
Intelligence System -- look for NYSIIS on Parnassus) was the winner from a
field of about two dozen competing algorithms, after measuring their
effectiveness on assorted databases maintained by the state of New York.
Since New York has a large immigrant population, NYSIIS isn't as
Anglocentric as Soundex either.
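For comparison, the baseline all these variants tweak is easy to sketch. Below is a minimal Python version of the commonly cited American Soundex rules: keep the first letter, code the remaining consonants, let vowels (and y) separate repeated codes while h and w do not, then truncate or pad to four characters. It is a sketch under those assumptions, not the code of Python's obsolete soundex module. The table name `CODES` is hypothetical. It silently ignores anything outside a-z, which is one concrete way the Anglocentrism shows up.

```python
import string

# Hypothetical table: consonant -> Soundex digit (common American Soundex rules).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex key, or "" if name has no ASCII letters."""
    # Assumption: only ASCII a-z count; accented letters are simply dropped.
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = CODES.get(letters[0], "")  # the first letter's code still blocks a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do NOT separate two letters with the same code
        code = CODES.get(ch, "")
        if not code:
            prev = ""  # a vowel (or y) separates, so the next code is kept
        elif code != prev:
            result.append(code)
            prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Richard"), soundex("Dick"))   # R263 D200
```

The second line shows the limit of any purely computational key: "Richard" and "Dick" get unrelated codes, as do "Mohammed" (M530) and "Mhd" (M300). No amount of letter-level tweaking bridges a nickname or an abbreviation.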
But the state of the art has given up on purely computational algorithms for
these purposes: proper names are simply too much of a mess. For example, if I
search for "Richard", it *ought* to match on "Dick"; if my Arab buddy
searches on "Mohammed", it *ought* to match on "Mhd"; "the rules" people
actually use just aren't reducible to pure computation -- it takes a large
knowledge base to capture what people "just know". You may enjoy visiting
this commercial site (AFAIK, nobody is giving away state-of-the-art for
> works fine for English texts,
If that were true, the English-speaking researchers would have declared
victory 120 years ago <wink>. But English pronunciation is *notoriously*
difficult to predict from spelling, partly because English is the Perl of
or-maybe-the-borg-assuming-there's-a-difference<wink>-ly y'rs - tim