ANNOUNCE: soundex.py

Steve Williams sandj.williams at gte.net
Wed Dec 20 18:57:41 EST 2000


Skip Montanaro wrote:

> Python 2.0 no longer ships with a soundex module.  Sometime ago, Tim Peters
> and Fred Drake each cooked up replacements written in Python.  I merged them
> together into a single module which is available from
>
>     http://musi-cal.mojam.com/~skip/python/
>
> If you have any questions or comments on the module, please send them my
> way.
>

Soundex routines traditionally return a fixed number of characters--the NDIGITS
in your routine.  That's 'cause the system was developed before computers.

I've found you can get good results with a variable-length soundex string: the more
information you give the routine (first names, middle names, prefixes and
suffixes), the smaller and more precise the result set.
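Here is a minimal sketch of such a variable-length routine. It is not the poster's get_soundex or the module on Skip's page. The name get_soundex and the optional fixed_length parameter are assumptions made for illustration. It uses the standard American Soundex digit groups, where vowels separate repeated codes and H and W do not. It ignores spaces and punctuation, and it pads or truncates only when a fixed length is requested.

```python
import string

# Standard Soundex digit groups; letters not listed (vowels, Y, H, W) get no digit.
_GROUPS = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def get_soundex(name, fixed_length=None):
    """Return a Soundex key for name (hypothetical helper, not the post's routine).

    With fixed_length=None every digit is kept (variable length); otherwise
    the key is zero-padded or truncated to fixed_length characters, e.g. 4
    for the traditional letter-plus-three-digits key.
    """
    # Keep only ASCII letters, so spaces, commas and hyphens are ignored.
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    last = CODES.get(first)  # a following letter coded like the first is dropped
    for ch in letters[1:]:
        code = CODES.get(ch)
        if code is None:
            # Vowels (and Y) separate equal codes; H and W do not.
            if ch not in "HW":
                last = None
            continue
        if code != last:
            digits.append(code)
        last = code
    key = first + "".join(digits)
    if fixed_length is not None:
        key = (key + "0" * fixed_length)[:fixed_length]
    return key


print(get_soundex("van der tamp"))  # V536351
print(get_soundex("Mao", 4))        # M000
print(get_soundex("Mao Tse-Tung"))  # M32352
```

Because the key is built strictly left to right, appending more name parts only appends digits. The key for a shorter name is therefore always a prefix of the key for a longer name that starts the same way, and this is what makes a prefix LIKE search work.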

Store the full soundex key as a varchar in your database and use the SQL LIKE
operator with a trailing % wildcard to do the retrieval.

This is particularly useful with one-syllable surnames, where you really need to add
more to the name to get anything useful.  (Mao alone is M000; Mao Tse-Tung is M32352.
You be the judge.)

For example,
    print get_soundex('van')
V5
    print get_soundex('van der tamp')
V536351
    print get_soundex('van der tamp, albert')
V5363514163
    print get_soundex('van der tamp, albert c, lieutenant colonel, Phd')
V5363514163243553245413   <== the full key, stored as a varchar in your database

So a retrieval like V53635141632435532454% will return all the lieutenant
colonel albert c van der tamps in your database, whether they have PhDs or not.
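Here is a sketch of that storage and retrieval scheme using Python's built-in sqlite3 module. The table name people, its columns, the helper find_by_soundex_prefix, and the sample rows are all hypothetical. The stored keys follow the variable-length scheme described in the post.

```python
import sqlite3

# Hypothetical table: each row stores the full variable-length key as a VARCHAR.
rows = [
    ("van der tamp, albert c, lieutenant colonel, Phd", "V5363514163243553245413"),
    ("van der tamp, albert c, lieutenant colonel", "V53635141632435532454"),
    ("van der tamp", "V536351"),
    ("van", "V5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, soundex VARCHAR(64))")
conn.executemany("INSERT INTO people (name, soundex) VALUES (?, ?)", rows)


def find_by_soundex_prefix(conn, key):
    """Return names whose stored key starts with key, i.e. LIKE 'key%'."""
    # Soundex keys contain only letters and digits, so no LIKE escaping is needed.
    cur = conn.execute(
        "SELECT name FROM people WHERE soundex LIKE ? ORDER BY name",
        (key + "%",),
    )
    return [name for (name,) in cur]


print(find_by_soundex_prefix(conn, "V53635141632435532454"))
# ['van der tamp, albert c, lieutenant colonel',
#  'van der tamp, albert c, lieutenant colonel, Phd']
```

A short key such as V5 matches every row here. This is the one-syllable-surname problem from earlier in the post: the less of the name you supply, the broader the result set.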

More information about the Python-list mailing list