Fuzzy matching of postal addresses
Aaron Bingham
bingham at cenix-bioscience.com
Wed Jan 19 03:06:37 EST 2005
Andrew McLean wrote:
> Thanks for all the suggestions. There were some really useful pointers.
>
> A few random points:
[snip]
> 4. You need to be careful doing an endswith search. It was actually my
> first approach to the house name issue. The problem is you end up
> matching "12 Acacia Avenue, ..." with "2 Acacia Avenue, ...".
Is that really a problem? That looks like a likely typo to me. I guess
it depends on your data set. In my case, the addresses were scattered
all over the place, with relatively few in a given city, so the
likelihood of two addresses on the same street in the same town was very
low. We /wanted/ to check for this kind of 'duplication'.
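
To make the behaviour concrete, here is a minimal sketch of the kind of
endswith comparison under discussion. The function name endswith_match and
the town "Anytown" are made up for illustration; this is not the code
either of us actually ran.

```python
def endswith_match(a, b):
    """Return True if either address ends with the other (case-insensitive)."""
    a, b = a.strip().lower(), b.strip().lower()
    return a.endswith(b) or b.endswith(a)


# A house name in front of the address is caught, as intended.
print(endswith_match("Rose Cottage, 2 Acacia Avenue, Anytown",
                     "2 Acacia Avenue, Anytown"))

# So is the "12" vs. "2" pair, because "12 acacia..." ends with "2 acacia...".
print(endswith_match("12 Acacia Avenue, Anytown",
                     "2 Acacia Avenue, Anytown"))

# A different house number with no shared suffix is not matched.
print(endswith_match("3 Acacia Avenue, Anytown",
                     "2 Acacia Avenue, Anytown"))
```

Whether the second result is a bug or a feature depends on whether nearby
house numbers in your data are more likely to be typos or distinct
addresses.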
Note that endswith will not deal with 'Avenue' vs. 'Ave.', but I suppose
a normalization phase could take care of this for you. The Monge
algorithm I pointed you to takes care of this pretty nicely.
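
As a rough sketch of what such a normalization phase might look like
(this is plain token normalization, not the Monge algorithm, and the
abbreviation table, function names, and "Anytown" are all hypothetical):

```python
import re

# Hypothetical abbreviation table; a real one would be much longer and
# would need care with ambiguous cases such as "St" (Street vs. Saint).
ABBREVIATIONS = {"ave": "avenue", "rd": "road", "ln": "lane"}


def normalize(address):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def normalized_endswith_match(a, b):
    """endswith comparison applied to the normalized forms."""
    a, b = normalize(a), normalize(b)
    return a.endswith(b) or b.endswith(a)


print(normalize("2 Acacia Ave., Anytown"))
print(normalized_endswith_match("Rose Cottage, 2 Acacia Ave., Anytown",
                                "2 Acacia Avenue, Anytown"))
```

Normalizing before comparing keeps the matching step itself simple, at
the cost of maintaining the abbreviation table.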
--
--------------------------------------------------------------------
Aaron Bingham
Application Developer
Cenix BioScience GmbH
--------------------------------------------------------------------