approximate string matching library - any interest?
Istvan Albert
ialbert at mailblocks.com
Thu Aug 28 18:57:59 EDT 2003
Hi all,
I'm working on a project that needs approximate string matching,
along the lines of the String::Approx module for Perl:
http://search.cpan.org/author/JHI/String-Approx-3.20/Approx.pm
Unlike exact matching, approximate (fuzzy) matching also accepts
words that differ slightly from the pattern: typos, errors or
similar spellings.
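For reference, a common way to put a number on such "small differences" is the Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions that turn one string into the other. The sketch below (modern Python 3, an illustration only, not the code inside String::Approx) computes it with the standard dynamic-programming recurrence, keeping just one row of the table at a time:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b."""
    # previous[j] = distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("Star Wars", "Stra Wars"))  # 2
print(edit_distance("kitten", "sitting"))       # 3
```

Under this definition, swapping two adjacent letters ("Star" to "Stra") costs two edits.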
I couldn't find a similar implementation in Python right away,
so I tried wrapping the Perl module's underlying C library
into Python calls. It turned out to be fairly easy (man, is
SWIG an awesome product or what...): in just a few hours I
managed to create a quite functional version (see below).
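As far as I understand it, String::Approx (like agrep) looks for the pattern approximately anywhere inside the target string rather than comparing whole strings. A minimal pure-Python sketch of that kind of search, assuming the classic Sellers dynamic-programming formulation; this is not necessarily how the wrapped C library is implemented, it only illustrates the semantics. The function name `best_substring_distance` is hypothetical:

```python
def best_substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # column[i] = best distance between pattern[:i] and some substring
    # of text ending at the current position (initially: the empty substring)
    column = list(range(len(pattern) + 1))
    best = column[-1]
    for ch in text:
        new = [0]  # the empty pattern prefix matches for free, anywhere
        for i, pc in enumerate(pattern, start=1):
            new.append(min(
                column[i] + 1,               # extra character in text
                new[i - 1] + 1,              # pattern character missing
                column[i - 1] + (pc != ch),  # match or substitution
            ))
        column = new
        best = min(best, column[-1])  # a match may end at any position
    return best


print(best_substring_distance("wars", "star wars trilogy"))                # 0
print(best_substring_distance("Episode Two", "Stra Wras - Episode Tow"))  # 1
```

The second call returns 1 because the substring "Episode To" is one deletion away from the pattern, even though the whole titles are much further apart.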
In the meantime I have also discovered that there is a similar
project, Agrepy.py, but I have no idea how well it works.
I'm trying to gauge interest in this library: right now it
serves my needs, but I wouldn't mind polishing it up and
making it public if it appears to be useful to others too.
cheers,
Istvan.
-------------------------------------------------------
import APSE
title = "Star Wars - Episode Two"
test_word = "Stra Wras - Episode Two"
test_list = ["Stra Wars - Episode Two",
             "Stra Wras - Episode Two", "Stra Wras - Episode Tow"]
# allow at most 4 edits
ap = APSE.Approx(title, edit=4)
# info on the matching parameters
print "Info", ap.info()
print
# only prints matching entries
print "Word:", ap.match(test_word)
print "List:", ap.match(test_list)
print
# verbose matching, the first element of the tuple is
# the edit distance
print "Verbose word:", ap.verbose_match(test_word)
print "Verbose list:", ap.verbose_match(test_list)
-------------------- output --------------
Info ({'edit': 4}, 'Star Wars - Episode Two')
Word: Stra Wras - Episode Two
List: ['Stra Wars - Episode Two', 'Stra Wras - Episode Two']
Verbose word: (4, 'Stra Wras - Episode Two')
Verbose list: [(2, 'Stra Wars - Episode Two'),
(4, 'Stra Wras - Episode Two')]
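To experiment with the interface shown above without building the SWIG extension, here is a minimal pure-Python 3 stand-in. `FuzzyMatcher` is a hypothetical name, not part of APSE; it simply compares whole strings by Levenshtein distance, which reproduces the output above for this example. Returning `None` for a single non-matching string is an assumption made here, since that case is not shown.

```python
from typing import List, Tuple, Union


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb))) # substitution
        previous = current
    return previous[-1]


class FuzzyMatcher:
    """Hypothetical stand-in mimicking the APSE.Approx calls above."""

    def __init__(self, pattern: str, edit: int = 0) -> None:
        self.pattern = pattern
        self.edit = edit  # maximum number of edits still counted as a match

    def info(self) -> Tuple[dict, str]:
        return ({"edit": self.edit}, self.pattern)

    def verbose_match(self, target: Union[str, List[str]]):
        # single string: (distance, string) if within the limit, else None (assumed)
        if isinstance(target, str):
            distance = edit_distance(self.pattern, target)
            return (distance, target) if distance <= self.edit else None
        # list: keep only matching entries, in their original order
        results = (self.verbose_match(item) for item in target)
        return [r for r in results if r is not None]

    def match(self, target: Union[str, List[str]]):
        # same filtering as verbose_match, without the distances
        result = self.verbose_match(target)
        if isinstance(target, str):
            return None if result is None else result[1]
        return [text for _, text in result]


ap = FuzzyMatcher("Star Wars - Episode Two", edit=4)
tests = ["Stra Wars - Episode Two",
         "Stra Wras - Episode Two", "Stra Wras - Episode Tow"]
print(ap.info())              # ({'edit': 4}, 'Star Wars - Episode Two')
print(ap.match(tests))        # ['Stra Wars - Episode Two', 'Stra Wras - Episode Two']
print(ap.verbose_match(tests))
```

The third title is rejected because fixing all three swapped letter pairs takes six edits, more than the limit of four. Because the stand-in compares whole strings, it would also reject a short pattern embedded in a longer title, which an agrep-style substring search would accept.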