Usable street address parser in Python?

John Nagle nagle at animats.com
Tue Apr 20 13:16:04 EDT 2010


Iain King wrote:
> Not sure on the volume of addresses you're working with, but as an
> alternative you could try grabbing the zip code, looking up all
> addresses in that zip code, and then finding whatever one of those
> address strings most closely resembles your address string (smallest
> Levenshtein distance?).
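
For reference, the closest-string idea quoted above could be sketched roughly
as follows. This is only an illustration: `levenshtein` and `closest_address`
are hypothetical helper names, the candidate list is assumed to have already
been fetched for the zip code, and upper-casing plus whitespace trimming is an
assumed normalization, not something specified in the thread.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_address(query, candidates):
    """Return the candidate nearest to query by edit distance, or None if empty."""
    query = query.upper().strip()  # assumed normalization
    return min(candidates,
               key=lambda c: levenshtein(query, c.upper().strip()),
               default=None)
```

Each lookup costs one distance computation per candidate, so this only stays
cheap if the per-zip-code candidate list is reasonably small.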

    The parser doesn't have to be perfect, but it should
reliably report when it fails.  Then I can run the hard cases through
one of the commercial online address standardizers.  I'd like to
be able to knock off the easy cases cheaply.

    What I want to do is first extract the street number and
undecorated street name only, match those against a large database of
US businesses stored in MySQL, and then find the best match among the
database hits.  So I need reliable extraction of the undecorated street
name and number.  The other fields are less important.
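
A minimal sketch of that kind of extraction, which returns None rather than
guessing when a line is not an easy case, might look like the following.
Everything here is an assumption made for illustration: the function name
`extract_number_and_name`, the short word lists, and the reading of
"undecorated" as "without directionals, street-type suffix, or unit
designator".  Real data would need far fuller tables.

```python
import re

# Hypothetical, deliberately incomplete word lists.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
                "NORTH", "SOUTH", "EAST", "WEST"}
SUFFIXES = {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD", "BLVD",
            "BOULEVARD", "DR", "DRIVE", "LN", "LANE", "CT", "COURT",
            "WAY", "PL", "PLACE"}
UNIT_WORDS = {"APT", "SUITE", "STE", "UNIT"}


def extract_number_and_name(line):
    """Return (number, undecorated_name), or None to signal a hard case."""
    # Treat periods and commas as separators, then split on whitespace.
    tokens = re.sub(r"[.,]", " ", line.upper()).split()
    if len(tokens) < 2 or not tokens[0].isdigit():
        return None  # no leading house number: report failure
    number, rest = tokens[0], tokens[1:]
    # Drop everything from a unit designator ("SUITE 200", "#4") onward.
    for i, tok in enumerate(rest):
        if tok in UNIT_WORDS or tok.startswith("#"):
            rest = rest[:i]
            break
    # Strip decorations from the end first, so "12 North St" keeps "NORTH".
    if len(rest) > 1 and rest[-1] in DIRECTIONALS:
        rest = rest[:-1]
    if len(rest) > 1 and rest[-1] in SUFFIXES:
        rest = rest[:-1]
    if len(rest) > 1 and rest[0] in DIRECTIONALS:
        rest = rest[1:]
    if not rest:
        return None  # nothing left that could be a street name
    return number, " ".join(rest)
```

With these tables, "123 N. Main St." yields ("123", "MAIN") and "PO Box 42"
yields None, so the latter would be passed on to a commercial standardizer.
Ambiguous names such as "100 Avenue E" are not handled correctly by this
sketch.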

				John Nagle


