Usable street address parser in Python?
Tim Roberts
timr at probo.com
Tue Apr 20 02:53:09 EDT 2010
John Nagle <nagle at animats.com> wrote:
>
> Unfortunately, now it won't run with the released
>version of "pyparsing" (1.5.2, from April 2009), because it uses
>"originalTextFor", a feature introduced since then. I worked around that,
>but discovered that the new version is case-sensitive. Changed
>"Keyword" to "CaselessKeyword" where appropriate.
>
> I put in the full list of USPS street types, and discovered
>that "1500 DEER CREEK LANE" still parses with a street name
>of "DEER", and a street type of "CREEK", because "CREEK" is a
>USPS street type. Need to do something to pick up the last street
>type, not the first. I'm not sure how to do that with pyparsing.
>Maybe if I buy the book...
>
> There's still a problem with: "2081 N Webb Rd", where the street name
>comes out as "N WEBB".
>Addresses like "1234 5th St. S." yield a street name of "5 TH",
>but if the directional is before the name, it ends up with the name.
>
> Getting closer, though. If I can get to 95% of common cases, I'll
>be happy.
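On picking up the last street type rather than the first: one option that sidesteps pyparsing's left-to-right matching is to split the street portion into tokens and scan from the right. The following is only a minimal sketch under stated assumptions. `STREET_TYPES` is a tiny hypothetical stand-in for the full USPS list, `split_street` is a made-up helper name, and the input is assumed to be just the name and type, with the house number and any directionals already removed.

```python
# Hypothetical, heavily abbreviated stand-in for the full USPS street type list.
STREET_TYPES = {"ST", "RD", "AVE", "LN", "LANE", "WAY", "CREEK", "BLVD"}


def split_street(tokens):
    """Split tokens into (street_name, street_type).

    Scans from the right so the LAST token that is a known street type
    wins. Index 0 is never treated as a type, so the name is never empty.
    """
    for i in range(len(tokens) - 1, 0, -1):
        if tokens[i].upper().rstrip(".") in STREET_TYPES:
            return " ".join(tokens[:i]), tokens[i]
    # No street type found: everything is the name.
    return " ".join(tokens), None


print(split_street("DEER CREEK LANE".split()))
```

With this approach "CREEK" stays part of the name because "LANE" is found first when scanning from the end.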
This is a very tricky problem. Consider Salem, Oregon, which puts the
direction after the street:
    3340 Astoria Way NE
    Salem, OR 97303
Consider northern Los Angeles County, which uses directions both before and
after. I used to live at:
    44720 N 2nd St E
    Lancaster, CA 93534
Consider much of Utah, which is both easy (because of its very neat grid)
and a pain, because of addresses like:
    389 W 1700 S
    Salt Lake City, UT 84115
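
To make the three cases concrete, here is a rough sketch of one way to handle them on the street line alone: strip a trailing directional, then a leading one, then take the last remaining token as the street type if it is one. This is plain Python rather than pyparsing, and it is an illustration, not a tested parser. The `DIRECTIONS` and `STREET_TYPES` sets are abbreviated and hypothetical, and `parse_street_line` is a made-up name.

```python
# Abbreviated, hypothetical lookup sets; a real parser would use the full USPS lists.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"ST", "RD", "AVE", "LN", "LANE", "WAY", "CREEK", "BLVD"}


def parse_street_line(line):
    """Parse a street line such as '44720 N 2nd St E' into its parts.

    Assumes the first token is the house number. Each strip step only
    fires if at least one token would remain for the street name.
    """
    tokens = line.replace(".", "").upper().split()
    number, rest = tokens[0], tokens[1:]
    pre = post = street_type = None

    # Trailing directional (Salem style: "Astoria Way NE").
    if len(rest) > 1 and rest[-1] in DIRECTIONS:
        post = rest.pop()
    # Leading directional ("N Webb Rd").
    if len(rest) > 1 and rest[0] in DIRECTIONS:
        pre = rest.pop(0)
    # Street type: only the LAST remaining token is considered.
    if len(rest) > 1 and rest[-1] in STREET_TYPES:
        street_type = rest.pop()

    return {
        "number": number,
        "pre": pre,
        "name": " ".join(rest),
        "type": street_type,
        "post": post,
    }


for addr in ("3340 Astoria Way NE", "44720 N 2nd St E", "389 W 1700 S"):
    print(parse_street_line(addr))
```

The Utah address comes out with a name of "1700" and no street type, which seems like the right answer for a grid address. The sketch still gets some inputs wrong: a street that is actually named "N" (as in "100 N St") loses its name to the leading-directional step, so a real parser would need more context than token position.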
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.