Fuzzy matching of postal addresses [1/1]

Andrew McLean spam-trap-095 at at-andros.demon.co.uk
Sun Jan 23 21:00:29 CET 2005


In case anyone is interested, here is the latest.

I implemented an edit distance technique based on tokens. This 
incorporated a number of the ideas discussed in the thread.
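
For readers without the attachment, here is a minimal sketch of what a token-based edit distance can look like. It is not the attached MinEditDistance5.py. The names tokenize and token_edit_distance, the tokenizing rule, and the unit costs are all assumptions made for illustration.

```python
import re

def tokenize(address):
    """Lower-case an address and split it into alphanumeric tokens.

    Assumed normalisation: punctuation and whitespace only separate tokens.
    """
    return re.findall(r"[a-z0-9]+", address.lower())

def token_edit_distance(a, b):
    """Levenshtein distance where the unit of editing is a whole token.

    Assumed costs: 1 per inserted, deleted or substituted token.
    """
    prev = list(range(len(b) + 1))          # distance from empty prefix of a
    for i, ta in enumerate(a, 1):
        curr = [i]                          # delete the first i tokens of a
        for j, tb in enumerate(b, 1):
            cost = 0 if ta == tb else 1
            curr.append(min(prev[j] + 1,          # delete ta
                            curr[j - 1] + 1,      # insert tb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

With these assumed costs, "12 High St., Oxford" and "Flat 2, 12 High Street, Oxford" are three token edits apart: two insertions and one substitution.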

It works pretty well on my data. I'm getting about 95% matching now, 
compared with 90% for the simple technique I originally tried. In other 
words, the unmatched share has dropped from about 10% to about 5%, so I 
have matched half the outstanding cases.

I have spotted very few false positives, and very few unmatched cases 
where I could make a match manually, although I suspect the code could 
still be improved.

It took a bit of head scratching to work out how to incorporate 
concatenation of tokens into the dynamic programming method, but I think 
I got there! At least my test cases seem to work!
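
One way concatenation can be folded into the dynamic programming table is sketched below. This is an illustration, not the attached code: the function name token_distance_with_joins and the join_cost parameter (with its 0.5 default) are hypothetical. The idea is to add two extra transitions to the usual recurrence, so that two adjacent tokens on one side can be matched against a single token on the other side when their concatenation is equal to it.

```python
def token_distance_with_joins(a, b, join_cost=0.5):
    """Token-level edit distance that also allows joining adjacent tokens.

    a, b      -- lists of tokens
    join_cost -- assumed cost of matching two adjacent tokens on one side
                 against their concatenation on the other side
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn the first i tokens of a
    #           into the first j tokens of b
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)                  # delete everything
    for j in range(1, m + 1):
        d[0][j] = float(j)                  # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0
            best = min(d[i - 1][j] + 1.0,       # delete a[i-1]
                       d[i][j - 1] + 1.0,       # insert b[j-1]
                       d[i - 1][j - 1] + sub)   # match or substitute
            # Two tokens of a concatenate to one token of b.
            if i >= 2 and a[i - 2] + a[i - 1] == b[j - 1]:
                best = min(best, d[i - 2][j - 1] + join_cost)
            # Two tokens of b concatenate to one token of a.
            if j >= 2 and b[j - 2] + b[j - 1] == a[i - 1]:
                best = min(best, d[i - 1][j - 2] + join_cost)
            d[i][j] = best
    return d[n][m]
```

For example, ["rose", "bank", "cottage"] against ["rosebank", "cottage"] costs one join rather than a substitution plus a deletion. This sketch only joins pairs of tokens; joining longer runs would need further transitions.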

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: MinEditDistance5.py
URL: <http://mail.python.org/pipermail/python-list/attachments/20050123/db65fd12/attachment.ksh>
-------------- next part --------------



-- 
Andrew McLean

