Usable street address parser in Python?

Paul McGuire ptmcg at austin.rr.com
Mon Apr 19 08:11:47 CEST 2010


On Apr 17, 2:23 pm, John Nagle <na... at animats.com> wrote:
>    Is there a usable street address parser available?  There are some
> bad ones out there, but nothing good that I've found other than commercial
> products with large databases.  I don't need 100% accuracy, but I'd like
> to be able to extract street name and street number for at least 98% of
> US mailing addresses.
>
>    There's pyparsing, of course. There's a street address parser as an
> example at "http://pyparsing.wikispaces.com/file/view/streetAddressParser.py".
> It's not very good.  It gets all of the following wrong:
>
>         1500 Deer Creek Lane    (Parses "Creek" as a street type)
>         186 Avenue A            (NYC street)
>         2081 N Webb Rd          (Parses N Webb as a street name)
>         2081 N. Webb Rd         (Parses N as street name)
>         1515 West 22nd Street   (Parses "West" as name)
>         2029 Stierlin Court     (Street names starting with "St" misparse.)
>
> Some special cases that don't work, unsurprisingly.
>         P.O. Box 33170
>         The Landmark @ One Market, Suite 200
>         One Market, Suite 200
>         One Market
>

Please take a look at the updated form of this parser.  It turns out
there actually *were* some bugs in the old form, plus there was no
provision for PO boxes, avenues named "Avenue X" rather than "X Avenue",
or house numbers spelled out as words.  The only one I consider a
"special case" is the "Avenue X" form; support for the rest was added
in a fairly general way.  With these bug fixes, I hope this improves
your hit rate.  (There are also some simple attempts at handling
apt/suite numbers, and APO and AFP in addition to PO boxes.  If these
are not exactly what you need, extending the parser to support other
options should be pretty straightforward.)
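
To make the three additions concrete (PO boxes, "Avenue X", and house
numbers spelled out as words), here is a minimal sketch.  It is not the
pyparsing example itself: it uses the standard-library re module
instead, and the names parse_address, STREET_TYPES, and NUMBER_WORDS are
hypothetical, with deliberately short word lists.  Treat it as an
illustration of the idea, not as a complete address parser.

```python
import re

# Hypothetical, deliberately short word lists; extend them for real data.
STREET_TYPES = r"(?:Street|St|Avenue|Ave|Road|Rd|Lane|Ln|Court|Ct|Drive|Dr)"
NUMBER_WORDS = r"(?:One|Two|Three|Four|Five|Six|Seven|Eight|Nine|Ten)"

# PO boxes such as "P.O. Box 33170" or "PO Box 12".
PO_BOX = re.compile(r"^P\.?\s*O\.?\s*Box\s+(?P<box>\w+)$", re.IGNORECASE)

# House number (digits or a number word), then the street name, then an
# optional street type.  The type is only accepted as the last word, so a
# name that merely starts with "St" (e.g. "Stierlin") stays intact.
# "Avenue X" is tried first as the one special case.
STREET = re.compile(
    r"^(?P<number>\d+|" + NUMBER_WORDS + r")\s+"
    r"(?P<name>Avenue\s+[A-Za-z]+|.+?)"
    r"(?:\s+(?P<type>" + STREET_TYPES + r")\.?)?$",
    re.IGNORECASE,
)


def parse_address(text):
    """Return a dict of recognized parts, or None if nothing matches."""
    text = text.strip()
    m = PO_BOX.match(text)
    if m:
        return {"kind": "po_box", "box": m.group("box")}
    m = STREET.match(text)
    if m:
        return {
            "kind": "street",
            "number": m.group("number"),
            "name": m.group("name"),
            "type": m.group("type"),  # None when no street type is present
        }
    return None


if __name__ == "__main__":
    for line in ["P.O. Box 33170", "186 Avenue A", "One Market",
                 "1500 Deer Creek Lane", "2029 Stierlin Court"]:
        print(line, "->", parse_address(line))
```

With this sketch, "1500 Deer Creek Lane" yields the name "Deer Creek"
and the type "Lane", and "186 Avenue A" keeps "Avenue A" as the name
with no street type.  Directional prefixes such as "N" or "West" are not
split out here; they stay part of the name.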

-- Paul


