Usable street address parser in Python?
John Nagle
nagle at animats.com
Tue Apr 20 13:16:04 EDT 2010
Iain King wrote:
> Not sure on the volume of addresses you're working with, but as an
> alternative you could try grabbing the zip code, looking up all
> addresses in that zip code, and then finding whatever one of those
> address strings most closely resembles your address string (smallest
> Levenshtein distance?).
The parser doesn't have to be perfect, but it should
reliably report when it fails. Then I can run the hard cases through
one of the commercial online address standardizers. I'd like to
be able to knock off the easy cases cheaply.
What I want to do is to first extract the street number and
undecorated street name only, match that to a large database of US businesses
stored in MySQL, and then find the best match from the database
hits. So I need reliable extraction of undecorated street name and number. The
other fields are less important.
John Nagle