Fuzzy matching of postal addresses [1/1]
Andrew McLean
spam-trap-095 at at-andros.demon.co.uk
Sun Jan 23 15:00:29 EST 2005
In case anyone is interested, here is the latest.
I have implemented a token-based edit distance technique, which
incorporates a number of the ideas discussed in this thread.
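The post does not say how individual tokens are compared. Here is a minimal sketch, assuming addresses are upper-cased and split on whitespace, that inserting or deleting a whole token costs 1, and that substituting one token for another costs their character-level Levenshtein distance divided by the longer token's length. These are illustrative assumptions, not necessarily what MinEditDistance5.py does, and the names char_distance, token_cost and token_edit_distance are hypothetical.

```python
def char_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or free match
        prev = cur
    return prev[-1]


def token_cost(a: str, b: str) -> float:
    """Hypothetical substitution cost in [0, 1]: character distance scaled by length."""
    return char_distance(a, b) / max(len(a), len(b))


def token_edit_distance(s: str, t: str, indel: float = 1.0) -> float:
    """Edit distance whose units are whole tokens (upper-cased, whitespace-split)."""
    xs, ys = s.upper().split(), t.upper().split()
    n, m = len(xs), len(ys)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel                      # delete all of xs[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel                      # insert all of ys[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + indel,   # drop a token from s
                          d[i][j - 1] + indel,   # add a token from t
                          d[i - 1][j - 1] + token_cost(xs[i - 1], ys[j - 1]))
    return d[n][m]


print(token_edit_distance("12 High Street", "12 Hgh Street"))  # 0.25
```

Scaling the substitution cost by token length means a one-letter typo in a long street name costs much less than replacing a short house number, which is one reasonable choice for addresses.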
It works pretty well on my data. I'm now getting about 95% matching,
compared with 90% for the simple technique I originally tried. The
unmatched fraction has fallen from about 10% to about 5%, so I have
matched half of the outstanding cases.
I have spotted very few false positives, and very few remaining cases
that I could match manually, although I suspect the code could still be
improved.
It took a bit of head scratching to work out how to incorporate
concatenation of tokens into the dynamic programming method, but I think
I got there! At least my test cases seem to work!
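One way to fold token concatenation into the dynamic programming recurrence, offered as an illustrative guess rather than a description of MinEditDistance5.py, is to let each cell also consider matching a run of up to max_join adjacent tokens on one side, joined together, against a run on the other, so that "NEWCASTLE" can line up with "NEW CASTLE". The parameters max_join, indel and join_penalty, their default values, and the helper names are all hypothetical.

```python
def char_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (character level)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def token_cost(a: str, b: str) -> float:
    """Normalised character distance in [0, 1] between two (possibly joined) tokens."""
    return char_distance(a, b) / max(len(a), len(b), 1)


def merge_edit_distance(s: str, t: str, max_join: int = 2,
                        indel: float = 1.0, join_penalty: float = 0.1) -> float:
    """Token edit distance that can also compare a run of up to max_join adjacent
    tokens from s, concatenated, with a run of up to max_join tokens from t."""
    xs, ys = s.upper().split(), t.upper().split()
    n, m = len(xs), len(ys)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                d[i][j] = min(d[i][j], d[i - 1][j] + indel)   # delete xs[i-1]
            if j > 0:
                d[i][j] = min(d[i][j], d[i][j - 1] + indel)   # insert ys[j-1]
            # Join the a tokens ending at i and the b tokens ending at j, then compare.
            for a in range(1, min(max_join, i) + 1):
                for b in range(1, min(max_join, j) + 1):
                    cost = token_cost("".join(xs[i - a:i]), "".join(ys[j - b:j]))
                    cost += join_penalty * ((a - 1) + (b - 1))  # small charge per join
                    d[i][j] = min(d[i][j], d[i - a][j - b] + cost)
    return d[n][m]


print(merge_edit_distance("1 Newcastle Road", "1 New Castle Road"))  # 0.1
```

With max_join=1 this reduces to an ordinary token-level edit distance. The small join_penalty makes a concatenated match slightly more expensive than an exact one-to-one token match, so joins are only used when they actually improve the alignment.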
Attachment: MinEditDistance5.py
URL: <http://mail.python.org/pipermail/python-list/attachments/20050123/db65fd12/attachment.ksh>
--
Andrew McLean
More information about the Python-list mailing list