Fuzzy matching of postal addresses

Aaron Bingham bingham at cenix-bioscience.com
Tue Jan 18 09:09:09 CET 2005

Andrew McLean wrote:
> I have a problem that I suspect isn't unusual and I'm looking to see if 
> there is any code available to help. I've Googled without success.
> Basically, I have two databases containing lists of postal addresses and 
> need to look for matching addresses in the two databases. More 
> precisely, for each address in database A I want to find a single 
> matching address in database B.

I had a similar problem to solve a while ago.  I can't give you my code, 
but I used this paper as the basis for my solution (BibTeX entry from 

@misc{ monge-adaptive,
   author = "Alvaro E. Monge",
   title = "An Adaptive and Efficient Algorithm for Detecting Approximately Duplicate Database Records",
   url = "citeseer.ist.psu.edu/monge00adaptive.html" }

There is a lot of literature (try a Google search for "approximate 
string match") but very little publicly available code in this area, 
from what I could gather.  Removing punctuation, etc., as others have 
suggested in this thread, is _not_sufficient_.  Presumably you want to 
be able to match typos or phonetic errors as well.  This paper's 
algorithm deals with those problems quite nicely.
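
To illustrate why stripping punctuation alone falls short, here is a 
minimal sketch of approximate matching using difflib from the Python 
standard library.  This is not the algorithm from the paper, just a 
simple stand-in; the names normalize and best_match, the 0.8 cutoff 
and the sample addresses are made up for the example.

```python
import difflib
import re


def normalize(address):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    address = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(address.split())


def best_match(address, candidates, cutoff=0.8):
    """Return (candidate, score) for the most similar candidate,
    or None if no candidate scores at least `cutoff`.

    The cutoff of 0.8 is an arbitrary illustrative value, not a
    recommendation.
    """
    target = normalize(address)
    best = None
    best_score = cutoff
    for candidate in candidates:
        matcher = difflib.SequenceMatcher(None, target, normalize(candidate))
        score = matcher.ratio()
        if score >= best_score:
            best, best_score = candidate, score
    if best is None:
        return None
    return best, best_score


# Hypothetical sample data standing in for "database B".
database_b = ["12 High Street, Cambridge", "7 Station Road, Oxford"]

# Two typos: normalization alone does not make the strings equal ...
typo = "12 Hihg Stret Cambridge"
print(normalize(typo) == normalize(database_b[0]))

# ... but a similarity score still picks out the intended address.
print(best_match(typo, database_b))
```

This compares every address against every candidate, so it will be 
slow on large databases, and it does nothing about phonetic errors.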

Aaron Bingham
Application Developer
Cenix BioScience GmbH
