Usable street address parser in Python?

Tim Roberts timr at probo.com
Tue Apr 20 08:53:09 CEST 2010


John Nagle <nagle at animats.com> wrote:
>
>   Unfortunately, now it won't run with the released
>version of "pyparsing" (1.5.2, from April 2009), because it uses
>"originalTextFor", a feature introduced since then.  I worked around that,
>but discovered that the new version is case-sensitive.  Changed
>"Keyword" to "CaselessKeyword" where appropriate.
>
>   I put in the full list of USPS street types, and discovered
>that "1500 DEER CREEK LANE" still parses with a street name
>of "DEER", and a street type of "CREEK", because "CREEK" is a
>USPS street type.  Need to do something to pick up the last street
>type, not the first.  I'm not sure how to do that with pyparsing.
>Maybe if I buy the book...
>
>   There's still a problem with: "2081 N Webb Rd", where the street name
>comes out as "N WEBB".
>Addresses like "1234 5th St. S." yield a street name of "5 TH",
>but if the directional is before the name, it ends up with the name.
>
>   Getting closer, though.  If I can get to 95% of common cases, I'll
>be happy.

This is a very tricky problem.  Consider Salem, Oregon, which puts the
direction after the street:

    3340 Astoria Way NE
    Salem, OR 97303

Consider northern Los Angeles County, which uses directions both before and
after.  I used to live at:

    44720 N 2nd St E
    Lancaster, CA  93534

Consider much of Utah, which is both easy (because of its very neat grid)
and a pain, because of addresses like:

    389 W 1700 S
    Salt Lake City, UT  84115
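
One way to cope with these cases, offered only as a rough sketch and not
as anything tested against real data, is to skip the left-to-right grammar
and peel tokens off the ends of the street line: house number first, then a
trailing directional, then the last token as a street type, then a leading
directional.  Whatever is left is the street name.  The function name
parse_street and the two lookup tables below are made up for illustration,
and the tables are deliberately tiny; a real parser would want the full
USPS lists.  This does not use pyparsing.

```python
# Hypothetical, deliberately small lookup tables (not the full USPS lists).
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"ST", "RD", "LN", "LANE", "WAY", "AVE", "CREEK", "BLVD", "DR"}


def parse_street(line):
    """Split a street line into parts by peeling tokens off both ends."""
    # Normalize: upper-case, drop periods and commas, split on whitespace.
    tokens = line.upper().replace(".", "").replace(",", "").split()
    result = {"number": None, "pre_dir": None, "name": None,
              "type": None, "post_dir": None}

    # Leading house number.
    if tokens and tokens[0].isdigit():
        result["number"] = tokens.pop(0)

    # Trailing directional, only if something remains for the name.
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        result["post_dir"] = tokens.pop()

    # Street type: test only the LAST remaining token, so that
    # "DEER CREEK LANE" yields type LANE rather than CREEK.
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        result["type"] = tokens.pop()

    # Leading directional, again only if a name token would remain.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        result["pre_dir"] = tokens.pop(0)

    # Everything left over is treated as the street name.
    result["name"] = " ".join(tokens) or None
    return result


if __name__ == "__main__":
    for addr in ("1500 DEER CREEK LANE", "2081 N Webb Rd", "1234 5th St. S.",
                 "3340 Astoria Way NE", "44720 N 2nd St E", "389 W 1700 S"):
        print(addr, "->", parse_street(addr))
```

With these tables, "389 W 1700 S" comes out with a street name of "1700",
no street type, and both directionals filled in.  The "len(tokens) > 1"
guards are a design choice so that a street literally named "N" or "LANE"
keeps its last token as the name.  There are plenty of addresses this will
still get wrong.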
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.


