Usable street address parser in Python?

John Nagle nagle at animats.com
Tue Apr 20 07:12:09 CEST 2010


John Nagle wrote:
>   Is there a usable street address parser available?  There are some
> bad ones out there, but nothing good that I've found other than commercial
> products with large databases.  I don't need 100% accuracy, but I'd like
> to be able to extract street name and street number for at least 98% of
> US mailing addresses.
> 
>   There's pyparsing, of course. There's a street address parser as an
> example at 
> "http://pyparsing.wikispaces.com/file/view/streetAddressParser.py".

   The author of that module has since revised the code and added
some new features.  The new version is much better.

   Unfortunately, it now won't run with the released
version of "pyparsing" (1.5.2, from April 2009), because it uses
"originalTextFor", a feature added after that release.  I worked around
that, but then discovered that the new version is case-sensitive, so I
changed "Keyword" to "CaselessKeyword" where appropriate.
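   The point of the case-sensitivity fix is to match street-type keywords
regardless of case while still requiring a whole-word match, so "ST" doesn't
fire inside "STATE".  Here is a minimal sketch of the same idea using only the
standard library ("re" with re.IGNORECASE and word boundaries).  It swaps in a
different technique and is not pyparsing's CaselessKeyword.  STREET_TYPES is a
hypothetical abbreviated list, not the full USPS list:

```python
import re

# Hypothetical subset of USPS street suffixes; the real list is much longer.
STREET_TYPES = ["ST", "STREET", "RD", "ROAD", "AVE", "AVENUE", "LN", "LANE", "CREEK"]

# \b on both sides makes this behave like a keyword match: "ST" will not
# match inside "STATE", and IGNORECASE accepts "St", "st" and "ST" alike.
STREET_TYPE_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STREET_TYPES)) + r")\b",
    re.IGNORECASE,
)

def find_street_types(address):
    """Return every street-type keyword found in the address, upper-cased."""
    return [m.group(0).upper() for m in STREET_TYPE_RE.finditer(address)]

print(find_street_types("1500 Deer Creek Lane"))  # ['CREEK', 'LANE']
print(find_street_types("12 State St."))          # ['ST']
```

Note that both "CREEK" and "LANE" match in the first example.  A keyword
match alone doesn't say which one is the real suffix.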

   I put in the full list of USPS street types, and discovered
that "1500 DEER CREEK LANE" still parses with a street name
of "DEER" and a street type of "CREEK", because "CREEK" is a
USPS street type.  I need some way to pick up the last street
type, not the first.  I'm not sure how to do that with pyparsing.
Maybe if I buy the book...
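   One way to pick up the last street type instead of the first is to
tokenize the address and scan from the right.  Only the rightmost street-type
token is treated as the suffix, and everything between the house number and
that token becomes the name.  Below is a minimal sketch in plain Python, not
pyparsing.  STREET_TYPES is again a hypothetical subset, and parse_street is
an illustrative name:

```python
# Hypothetical subset of USPS street suffixes; "CREEK" is on the real list too.
STREET_TYPES = {"ST", "STREET", "RD", "ROAD", "AVE", "LN", "LANE", "CREEK"}

def parse_street(address):
    """Split 'NUMBER NAME... TYPE', using the LAST street-type token as suffix.

    Returns (number, name, street_type); street_type is None if nothing
    matches.  Tokens after the suffix (e.g. a trailing directional) are
    ignored in this sketch.
    """
    tokens = address.upper().replace(".", "").split()
    number, rest = tokens[0], tokens[1:]
    # Scan right to left, but stop before index 0 so the suffix never
    # swallows the whole name: "100 CREEK" keeps "CREEK" as the name.
    for i in range(len(rest) - 1, 0, -1):
        if rest[i] in STREET_TYPES:
            return number, " ".join(rest[:i]), rest[i]
    return number, " ".join(rest), None

print(parse_street("1500 DEER CREEK LANE"))  # ('1500', 'DEER CREEK', 'LANE')
```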

   There's still a problem with "2081 N Webb Rd", where the street name
comes out as "N WEBB".  Addresses like "1234 5th St. S." yield a street
name of "5 TH", but when the directional comes before the name, as in the
Webb Rd example, it ends up as part of the name.
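   The directional cases could be handled by peeling off a leading and a
trailing directional before the name and type are assigned.  Below is a
minimal sketch of that heuristic in plain Python.  It assumes the tokens after
the house number have already been split on whitespace, which also keeps an
ordinal like "5TH" in one piece.  DIRECTIONALS and split_directionals are
hypothetical names:

```python
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
                "NORTH", "SOUTH", "EAST", "WEST"}

def split_directionals(tokens):
    """Return (pre_dir, core_tokens, post_dir) for the tokens after the number.

    Heuristic: a directional is only peeled off if at least two tokens
    (name and type) would remain, so "100 NORTH ST" keeps "NORTH" as the name.
    """
    tokens = [t.upper().rstrip(".") for t in tokens]
    pre = post = None
    if len(tokens) >= 3 and tokens[0] in DIRECTIONALS:
        pre, tokens = tokens[0], tokens[1:]
    if len(tokens) >= 3 and tokens[-1] in DIRECTIONALS:
        post, tokens = tokens[-1], tokens[:-1]
    return pre, tokens, post

print(split_directionals("N Webb Rd".split()))   # ('N', ['WEBB', 'RD'], None)
print(split_directionals("5th St. S.".split()))  # (None, ['5TH', 'ST'], 'S')
```

The core tokens that are left can then go through the usual name/type
split, without the directional getting in the way.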

   Getting closer, though.  If I can get to 95% of common cases, I'll
be happy.


				John Nagle


