Usable street address parser in Python?
nagle at animats.com
Sat Apr 17 15:23:54 EDT 2010
Is there a usable street address parser available? There are some
bad ones out there, but nothing good that I've found other than commercial
products with large databases. I don't need 100% accuracy, but I'd like
to be able to extract street name and street number for at least 98% of
US mailing addresses.
There's pyparsing, of course. There's a street address parser as an
example at "http://pyparsing.wikispaces.com/file/view/streetAddressParser.py".
It's not very good. It gets all of the following wrong:
1500 Deer Creek Lane (Parses "Creek" as a street type)
186 Avenue A (NYC street)
2081 N Webb Rd (Parses N Webb as a street name)
2081 N. Webb Rd (Parses N as street name)
1515 West 22nd Street (Parses "West" as name)
2029 Stierlin Court (Street names starting with "St" misparse.)
Some special cases also fail, unsurprisingly:
P.O. Box 33170
The Landmark @ One Market, Suite 200
One Market, Suite 200
Much of the problem is that this parser starts at the beginning of the string.
According to the USPS, US street addresses are best parsed from the end.
Working left to right is why things like "Deer Creek Lane" are mis-parsed:
"Creek" gets taken as the street type before the parser ever reaches "Lane".
It's not clear that regular expressions are the right tool for this job.
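
To illustrate the parse-from-the-end idea, here is a minimal token-based sketch. It is not a tested parser. The function name parse_street_address, the result keys, and the two tiny lookup tables are hypothetical, made up for illustration, and a real version would need the full USPS suffix and directional lists. It assumes the address starts with a numeric house number, so P.O. boxes and "One Market" style addresses are simply rejected, and unit designators such as "Suite 200" are not handled.

```python
# Hypothetical, deliberately tiny lookup tables (assumption: a real parser
# would load the complete USPS street suffix and directional lists).
STREET_TYPES = {"street", "st", "avenue", "ave", "road", "rd", "lane", "ln",
                "court", "ct", "drive", "dr", "boulevard", "blvd", "creek"}
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west"}


def _norm(token):
    """Lowercase a token and drop a trailing period ("N." -> "n")."""
    return token.rstrip(".").lower()


def parse_street_address(text):
    """Split a simple US street address into parts, working right to left.

    Returns a dict, or None if the text does not start with a house number.
    """
    tokens = text.replace(",", " ").split()
    if len(tokens) < 2 or not tokens[0][0].isdigit():
        return None  # no leading house number; out of scope for this sketch

    result = {"number": tokens[0], "pre_dir": None, "name": None,
              "type": None, "post_dir": None}
    rest = tokens[1:]

    # 1. Trailing directional ("100 Main St NW"), only if at least two
    #    tokens would remain in front of it.
    if len(rest) > 2 and _norm(rest[-1]) in DIRECTIONS:
        result["post_dir"] = rest.pop()

    # 2. Only the LAST remaining token may be the street type, so "Creek"
    #    in "Deer Creek Lane" stays part of the name.
    if len(rest) > 1 and _norm(rest[-1]) in STREET_TYPES:
        result["type"] = rest.pop()

    # 3. Leading directional, only if a name token remains after it.
    if len(rest) > 1 and _norm(rest[0]) in DIRECTIONS:
        result["pre_dir"] = rest.pop(0)

    # 4. Whatever is left in the middle is the street name.
    result["name"] = " ".join(rest)
    return result
```

With these tables, parse_street_address("1500 Deer Creek Lane") yields name "Deer Creek" and type "Lane", "2081 N. Webb Rd" yields pre_dir "N." and name "Webb", and "186 Avenue A" keeps "Avenue A" as the name with no street type. Peeling tokens off the right end first is what keeps an ambiguous word in the middle of the address from being claimed as a suffix.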
There must be something out there a little better than this.