approximate string matching library - any interest?

Tim Churches tchur at
Thu Aug 28 15:27:39 CEST 2003

Istvan Albert writes:
> I'm working on a project that needs approximate string
> matching such as the String::Approx module in Perl.
> Unlike exact matches, approximate (fuzzy) matches can match
> words having small differences in them: typos, errors or
> similar spellings.
> I was unable to find a similar implementation in Python right
> away, so I tried wrapping the Perl module's underlying C
> library into Python calls. It turned out to be fairly easy;
> man, is SWIG an awesome product or what... in just a few
> hours I managed to create a quite functional version (see below).
> In the meantime I have also discovered that there is a
> similar project available, but I have no idea how
> well it works. I'm trying to gauge the interest in
> this library; right now it serves my needs, yet I wouldn't
> mind polishing it up and making it public if it appears to be
> useful for others too.
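For readers unfamiliar with what such a library does: approximate substring search finds the places where a pattern occurs in a text with at most k insertions, deletions or substitutions. The following is a minimal plain-Python sketch of the textbook dynamic-programming approach (Sellers' algorithm). It is not String::Approx's code or API; the function name approx_find is hypothetical, and it reports the end offsets of matches rather than their start offsets.

```python
def approx_find(pattern: str, text: str, max_edits: int) -> list[int]:
    """Hypothetical helper: return the end offsets (exclusive) in `text`
    where `pattern` matches with at most `max_edits` edits."""
    m = len(pattern)
    # col[i] = fewest edits turning pattern[:i] into some substring of
    # text that ends at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]   # col[i - 1] from the previous text position
        col[0] = 0           # a match may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == tc else 1
            col[i] = min(old + 1,             # text char not in pattern
                         col[i - 1] + 1,      # pattern char missing from text
                         prev_diag + cost)    # match or substitution
            prev_diag = old
        if col[m] <= max_edits:
            ends.append(j)
    return ends


# "abd" (one substitution) and "ab" (one deletion) both end inside "xxabdxx".
print(approx_find("abc", "xxabdxx", 1))
```

The search costs one table cell per (pattern character, text character) pair, which is why C implementations of this kind of loop are attractive.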

There are a number of approximate string matching functions, implemented
in pure Python, included in the Febrl project (Febrl = Freely-extensible
biomedical record linkage). See under "Prototype software" at [link omitted],
or for details of the approximate string comparators implemented, see
[link omitted]. Note that approximate string comparator functions are different from
(although related to) phonetic encoders, such as Soundex (see
[link omitted] for some examples of the latter).
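To make the distinction concrete: a phonetic encoder maps each name to a short code, and two names "match" only if their codes are identical, whereas a comparator scores a pair of strings on a graded scale. The following is a minimal sketch of the common American Soundex rules in Python. The function name soundex is hypothetical, and this is not Febrl's implementation.

```python
# Consonant groups of American Soundex; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Hypothetical helper: four-character American Soundex code."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    prev = _CODES.get(first, "")   # first letter's code still blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue               # h/w do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code                # vowels reset prev, so codes repeat across them
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # equal codes: the encoder says "match"
```

An encoder like this gives only a yes/no answer for a pair of names, which is one reason record-linkage work usually combines it with graded comparators.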

Wrapped C implementations of any or all of these comparators would be
welcome, although in practice we haven't found the pure-Python versions
to be a major bottleneck (the exception is the Levenshtein distance on long
strings, which can be rather expensive because its cost grows with the
product of the two string lengths). Oh, there are also a number of
interesting vector-space comparison techniques which can be applied to
strings, but we haven't implemented any of these yet in Febrl. Then
there are various language- or culture-specific comparators. And then
there is the whole issue of name comparison in pictographic and
ideographic languages...
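The remark about Levenshtein cost follows from the standard dynamic program, which fills one cell per pair of characters, so two 1,000-character strings need about a million cell updates. The following sketch shows that algorithm (keeping only two rows of the table), together with a bigram cosine similarity as one simple example of the vector-space techniques mentioned above. Both function names are hypothetical, and neither function is taken from Febrl.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance; O(len(a) * len(b)) time, O(min length) memory."""
    if len(a) < len(b):
        a, b = b, a                  # make each row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def qgram_cosine(a: str, b: str, q: int = 2) -> float:
    """Cosine similarity between the q-gram count vectors of two strings."""
    va = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    vb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values()) *
                     sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))   # three edits
print(qgram_cosine("night", "nacht"))     # the two strings share only the bigram "ht"
```

The q-gram version runs in time roughly linear in the string lengths, which is one reason vector-space comparisons can be attractive for long strings.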

And I didn't mention guns once...

Tim C
