ANNOUNCE: soundex.py
Steve Williams
sandj.williams at gte.net
Wed Dec 20 18:57:41 EST 2000
Skip Montanaro wrote:
> Python 2.0 no longer ships with a soundex module. Sometime ago, Tim Peters
> and Fred Drake each cooked up replacements written in Python. I merged them
> together into a single module which is available from
>
> http://musi-cal.mojam.com/~skip/python/
>
> If you have any questions or comments on the module, please send them my
> way.
>
Soundex routines traditionally return a fixed number of characters--the NDIGITS
in your routine. That's 'cause the system was developed before computers.
I've found you can get good results with a variable-length soundex string--the more
information you give the routine (first names, middle names, prefixes and
suffixes), the better/smaller the result set.
Store the full soundex key as a varchar in your database and use the SQL LIKE
operator with a trailing % wildcard to do the retrieval.
This is particularly useful with one-syllable surnames--you really need to add
more to the name to get anything useful. (Mao Tse-Tung == M000 from the surname
alone vs. M32352 from the full name, you be the judge).
For example,
print get_soundex('van')
V5
print get_soundex('van der tamp')
V536351
print get_soundex('van der tamp, albert c')
V53635141632
print get_soundex('van der tamp, albert c, lieutenant colonel, Phd')
V5363514163243553245413
(this is the full key, stored as a varchar in your database)
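For anyone who wants to experiment, here is a minimal sketch of a variable-length key generator. It is not the code from the module announced above; the names variable_soundex and fixed_soundex are made up for illustration, and the handling of h/w and of punctuation is an assumption (h, w, spaces and punctuation are skipped without separating repeated codes, vowels and y do separate them). It is written for Python 3, so print is a function here.

```python
# Standard Soundex letter groups; vowels, h, w and y carry no code.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def variable_soundex(name):
    """Return a Soundex-style key with no length limit (illustrative sketch)."""
    key = ""
    prev = ""  # code of the previous significant letter
    for ch in name.lower():
        if not ch.isalpha():
            continue  # assumption: spaces and punctuation are simply skipped
        code = CODES.get(ch, "")
        if not key:
            key = ch.upper()  # first letter is kept as a letter
            prev = code
            continue
        if ch in "hw":
            continue  # assumption: h and w do not separate repeated codes
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeat after it is coded again
    return key


def fixed_soundex(name, length=4):
    """Traditional fixed-width key: truncate or pad with zeros (hypothetical helper)."""
    return variable_soundex(name).ljust(length, "0")[:length]


print(variable_soundex("van der tamp"))
print(fixed_soundex("Mao"), variable_soundex("Mao Tse-Tung"))
```

With these assumptions the sketch gives the same keys as the 'van', 'van der tamp' and full-title examples, and the M000 / M32352 pair for Mao Tse-Tung.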
So a retrieval like V53635141632435532454% will return all the lieutenant
colonel albert c van der tamps in your database, whether they have PhDs or not.
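The prefix retrieval can be tried out with the sqlite3 module from the standard library. This is only a sketch: the table name people, its columns, and the helper find_by_prefix are invented for the example, and the keys are copied from the examples in this message rather than computed. Python 3 again.

```python
import sqlite3

# In-memory database with a hypothetical table holding the full key as a varchar.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, soundex_key VARCHAR(64))")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("van der tamp, albert c, lieutenant colonel, Phd", "V5363514163243553245413"),
        ("van der tamp, albert c, lieutenant colonel", "V53635141632435532454"),
        ("van der tamp", "V536351"),
    ],
)


def find_by_prefix(conn, prefix):
    """Return names whose stored key starts with the given key prefix."""
    cur = conn.execute(
        "SELECT name FROM people WHERE soundex_key LIKE ? ORDER BY name",
        (prefix + "%",),  # keys hold only letters and digits, so no escaping is needed
    )
    return [row[0] for row in cur]


matches = find_by_prefix(conn, "V53635141632435532454")
for name in matches:
    print(name)
```

The shorter the prefix you search on, the wider the net: searching on V536351 would bring back every van der tamp in the table.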