Fuzzy matching of postal addresses

Aaron Bingham bingham at cenix-bioscience.com
Wed Jan 19 03:06:37 EST 2005


Andrew McLean wrote:
> Thanks for all the suggestions. There were some really useful pointers.
> 
> A few random points:
[snip]
> 4. You need to be careful doing an endswith search. It was actually my 
> first approach to the house name issue. The problem is you end up 
> matching "12 Acacia Avenue, ..." with "2 Acacia Avenue, ...".

Is that really a problem?  That looks like a likely typo to me.  I guess 
it depends on your data set.  In my case, the addresses were scattered 
all over the place, with relatively few in a given city, so the 
likelihood of two addresses on the same street in the same town was very 
low.  We /wanted/ to check for this kind of 'duplication'.
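
To make the trade-off concrete, here is a rough sketch of an endswith 
comparison plus a separate house number check.  The function names and 
the assumption that the house number is the first token are mine, for 
illustration only; real data will need more care.

```python
def endswith_match(a, b):
    """Return True if either address string ends with the other."""
    a, b = a.strip().lower(), b.strip().lower()
    return a.endswith(b) or b.endswith(a)


def same_house_number(a, b):
    """Compare the first tokens, assumed here to be the house numbers."""
    return a.split()[:1] == b.split()[:1]


first = "12 Acacia Avenue, Anytown"
second = "2 Acacia Avenue, Anytown"

# The suffix test pairs these two up; the first-token test tells them apart.
suffix_hit = endswith_match(first, second)
numbers_agree = same_house_number(first, second)
```

A pair where suffix_hit is true but numbers_agree is false can be 
treated either as a likely typo or as two distinct houses, depending on 
how dense the addresses in your data set are.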

Note that endswith will not deal with 'Avenue' vs. 'Ave.', but I suppose 
a normalization phase could take care of this for you.  The Monge 
algorithm I pointed you to takes care of this pretty nicely.
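
A normalization phase could look something like the sketch below.  This 
is just a lookup table, not the Monge algorithm, and both the 
ABBREVIATIONS table and the normalize function are hypothetical; the 
table would need extending for real data (and 'St' is ambiguous between 
'Street' and 'Saint').

```python
import re

# Hypothetical abbreviation table; extend to suit your data.
ABBREVIATIONS = {"ave": "avenue", "st": "street", "rd": "road"}


def normalize(address):
    """Lowercase, drop punctuation and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


short_form = normalize("12 Acacia Ave., Anytown")
long_form = normalize("12 Acacia Avenue, Anytown")
```

After this step both spellings reduce to the same string, so a plain 
equality or endswith test can be applied to the normalized forms.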

-- 
--------------------------------------------------------------------
Aaron Bingham
Application Developer
Cenix BioScience GmbH
--------------------------------------------------------------------



