approximate string matching library - any interest?

Istvan Albert ialbert at mailblocks.com
Fri Aug 29 00:57:59 CEST 2003


Hi all,

I'm working on a project that needs approximate string matching
such as the String::Approx module in Perl:

http://search.cpan.org/author/JHI/String-Approx-3.20/Approx.pm

Unlike exact matches, approximate (fuzzy) matches can match
words that differ slightly from each other: typos, errors or
similar spellings.
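To make "small differences" concrete, here is a minimal sketch of the classic Levenshtein edit distance in plain Python. It is only an illustration of the idea, not the algorithm String::Approx uses internally, and the function name levenshtein is made up for this example.

```python
def levenshtein(a, b):
    """Count the single-character inserts, deletes and substitutions
    needed to turn string a into string b."""
    # prev[j] is the distance between the part of a seen so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletes
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


title = "Star Wars - Episode Two"
for candidate in ["Stra Wars - Episode Two", "Stra Wras - Episode Two"]:
    print(levenshtein(title, candidate), candidate)
```

Under this measure a swapped pair of letters ("Star" vs "Stra") counts as two edits, so a fuzzy matcher that allows "at most N edits" accepts any string whose distance is N or less.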

I was unable to find a similar implementation in Python right
away, so I tried wrapping the Perl module's underlying C library
into Python calls. It turned out to be fairly easy, man is
SWIG an awesome product or what ... in just a few hours I
managed to create a quite functional version (see below).

In the meantime I have also discovered that there is a similar
project, Agrepy.py, available, but I have no idea how well
it works. I'm trying to gauge the interest in this
library; right now it serves my needs, yet I wouldn't mind
polishing it up and making it public if it appears to be
useful for others too.

cheers,

Istvan.

-------------------------------------------------------

import APSE

title = "Star Wars - Episode Two"
test_word = "Stra Wras - Episode Two"
test_list = ["Stra Wars - Episode Two",
             "Stra Wras - Episode Two",
             "Stra Wras - Episode Tow"]

# allow at most 4 edits
ap = APSE.Approx(title, edit=4)

# info on the matching parameters
print "Info", ap.info()
print

# only prints matching entries
print "Word:", ap.match(test_word)
print "List:", ap.match(test_list)
print

# verbose matching, the first element of the tuple is
# the edit distance
print "Verbose word:", ap.verbose_match(test_word)
print "Verbose list:", ap.verbose_match(test_list)

-------------------- output --------------

Info ({'edit': 4}, 'Star Wars - Episode Two')

Word: Stra Wras - Episode Two
List: ['Stra Wars - Episode Two', 'Stra Wras - Episode Two']

Verbose word: (4, 'Stra Wras - Episode Two')
Verbose list: [(2, 'Stra Wars - Episode Two'),
(4, 'Stra Wras - Episode Two')]
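For anyone who wants to experiment without the wrapper, the standard library's difflib offers a rough stand-in. This is only a sketch: difflib ranks candidates by a similarity ratio between 0 and 1 rather than by a maximum number of edits, so the cutoff of 0.9 below is an arbitrary assumption and the results need not agree with the APSE output above.

```python
import difflib

title = "Star Wars - Episode Two"
candidates = ["Stra Wars - Episode Two",
              "Stra Wras - Episode Two",
              "Stra Wras - Episode Tow"]

# keep every candidate whose similarity ratio to the title is >= cutoff
close = difflib.get_close_matches(title, candidates,
                                  n=len(candidates), cutoff=0.9)

# pair each candidate with its ratio, similar in spirit to a verbose match
scored = [(difflib.SequenceMatcher(None, title, c).ratio(), c)
          for c in candidates]

print("Close:", close)
for ratio, candidate in scored:
    print("%.3f %s" % (ratio, candidate))
```

Lowering the cutoff lets more distant strings through, which plays roughly the same role as raising the allowed number of edits.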
