Street address parsing in Python, again.
John Nagle
nagle at animats.com
Fri Jun 4 15:59:21 EDT 2010
John Nagle wrote:
> The parser at PyParsing:
>
> http://pyparsing.wikispaces.com/file/view/streetAddressParser.py
>
> ..Bad cases...
> 487 E. Middlefield Rd. -> streetnumber = 487, streetname = E. MIDDLEFIELD
> 487 East Middlefield Road -> streetnumber = 487, streetname = EAST MIDDLEFIELD
> 226 West Wayne Street -> streetnumber = 226, streetname = WEST WAYNE
> New Orchard Road -> streetnumber = , streetname = NEW
> 1 New Orchard Road -> streetnumber = 1 , streetname = NEW
> 390 Park Avenue -> streetnumber =, streetname = 390
Here's a system that gets all the above cases right: the USC Deterministic
Address Parser.
https://webgis.usc.edu/Services/AddressNormalization/Interactive/DeterministicNormalization.aspx
This will parse a street address line by itself, without a city, state, or ZIP code,
so it isn't just looking the address up in a big database. There's a technical paper
http://gislab.usc.edu/i/publications/gislabtr11.pdf
but it doesn't go into much detail. Still, now we know a solution
exists. I've asked USC whether they'll make the code available.
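The USC code isn't available and the paper doesn't spell out the algorithm, so what follows is NOT their method. It is a minimal token-based sketch of a deterministic street-line parser, assuming plain US-style lines like the ones quoted above. The idea is to anchor the parse at both ends: an optional house number comes off the front, the street type ("Rd.", "Avenue") comes off the back, and a leading directional ("E.", "West") is accepted only if a name token still follows it. The names parse_street_line and StreetLine are hypothetical, and the abbreviation tables are a small assumed subset in the style of USPS abbreviations, not a complete list.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny lookup tables (USPS-style abbreviations).
# A real parser would need the full suffix and directional lists.
DIRECTIONALS = {
    "N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
    "E": "E", "EAST": "E", "W": "W", "WEST": "W",
    "NE": "NE", "NW": "NW", "SE": "SE", "SW": "SW",
}
SUFFIXES = {
    "RD": "RD", "ROAD": "RD", "ST": "ST", "STREET": "ST",
    "AVE": "AVE", "AV": "AVE", "AVENUE": "AVE",
    "BLVD": "BLVD", "BOULEVARD": "BLVD", "DR": "DR", "DRIVE": "DR",
    "LN": "LN", "LANE": "LN",
}
# House number: digits, optionally one trailing letter ("12A").
HOUSE_NUMBER = re.compile(r"\d+[A-Z]?$")


@dataclass
class StreetLine:
    number: str = ""
    predir: str = ""
    name: str = ""
    suffix: str = ""
    postdir: str = ""


def parse_street_line(line: str) -> StreetLine:
    """Split one street line (no city/state/ZIP) into components."""
    # Uppercase and drop trailing periods/commas so "Rd." matches "RD".
    tokens = [t.upper().rstrip(".,") for t in line.split()]
    tokens = [t for t in tokens if t]
    result = StreetLine()

    # Leading house number is optional ("New Orchard Road" has none).
    if tokens and HOUSE_NUMBER.match(tokens[0]):
        result.number = tokens.pop(0)

    # Trailing directional ("Main St NW"), only if something precedes it.
    if len(tokens) > 1 and tokens[-1] in DIRECTIONALS:
        result.postdir = DIRECTIONALS[tokens.pop()]

    # Street type is taken from the end of the line, never the front.
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        result.suffix = SUFFIXES[tokens.pop()]

    # A leading directional counts only if a name token still follows it,
    # so "North Street" keeps NORTH as the street name.
    if len(tokens) > 1 and tokens[0] in DIRECTIONALS:
        result.predir = DIRECTIONALS[tokens.pop(0)]

    # Whatever is left in the middle is the street name.
    result.name = " ".join(tokens)
    return result
```

With these tables, both Middlefield lines come out as number 487, predir E, name MIDDLEFIELD, suffix RD; "New Orchard Road" gets an empty number and the name NEW ORCHARD; and "390 Park Avenue" yields number 390, name PARK, suffix AVE. Anything outside the small tables (unusual suffixes, unit numbers, rural routes) would need more rules.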
John Nagle