Stuck on a three word street name regex

Brian D briandenzer at gmail.com
Thu Jan 28 18:50:37 CET 2010


On Jan 28, 8:27 am, Lie Ryan <lie.1... at gmail.com> wrote:
> On 01/28/10 11:28, Brian D wrote:
>
>
>
> > I've tackled this kind of problem before by looping through a patterns
> > dictionary, but there must be a smarter approach.
>
> > Two addresses. Note that the first has incorrectly transposed the
> > direction and street name. The second has an extra space in it before
> > the street type. Clearly done by someone who didn't know how to
> > concatenate properly -- or didn't care.
>
> > 1000 RAMPART S ST
>
> > 100 JOHN CHURCHILL CHASE  ST
>
> > I want to parse the elements into an array of values that can be
> > inserted into new database fields.
>
> > Anyone who loves solving these kinds of puzzles care to relieve my
> > frazzled brain?
>
> > The pattern I'm using doesn't keep the "CHASE" with the "JOHN
> > CHURCHILL":
>
> How does the following perform?
>
> pat = re.compile(r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+(?P<streetdir>N|S|W|E|)\s+(?P<streettype>ST|RD|AVE?|)$')
>
> or more legibly:
>
> pat = re.compile(
>     r'''
>       (?P<streetnum>  \d+              )  #M series of digits
>       \s+
>       (?P<streetname> [A-Z\s]+         )  #M one-or-more word
>       \s+
>       (?P<streetdir>  S?E|SW?|N?W|NE?| )  #O direction or nothing
>       \s+
>       (?P<streettype> ST|RD|AVE?       )  #M street type
>       $                                   #M END
>     ''', re.VERBOSE)
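
For anyone following along, here is a minimal sketch of running the verbose pattern above against the two sample addresses and collecting the named groups as fields. The parse() helper and the addresses list are my own names for illustration, not part of Lie Ryan's message, and the comments inside the pattern are paraphrased.

```python
import re

# Lie Ryan's verbose pattern, copied from the quoted message.
pat = re.compile(
    r'''
      (?P<streetnum>  \d+              )  # series of digits
      \s+
      (?P<streetname> [A-Z\s]+         )  # one or more words
      \s+
      (?P<streetdir>  S?E|SW?|N?W|NE?| )  # direction or nothing
      \s+
      (?P<streettype> ST|RD|AVE?       )  # street type
      $                                   # end of string
    ''', re.VERBOSE)

# The two sample addresses from the original question.
addresses = ['1000 RAMPART S ST', '100 JOHN CHURCHILL CHASE  ST']

def parse(address):
    """Return the named groups as a dict, or None if there is no match."""
    m = pat.match(address)
    return m.groupdict() if m else None

for address in addresses:
    print(parse(address))
```

The second address comes back with an empty string for streetdir, which is the empty alternative at work. The resulting dict keys could map directly onto the new database fields.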

Is that all? Just that empty alternative after the last "|" OR
metacharacter, which lets the direction group match nothing? Wow.

As a test, to force a failure, I removed that last "|" metacharacter
from the "N|S|W|E|" string (i.e., "N|S|W|E"). The match then fails on
addresses that do not have the transposed direction after the street
name (e.g., '45 JOHN CHURCHILL CHASE  ST').
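
The test can be reproduced with a short sketch like the one below. It uses the one-line pattern from the quoted message; TEMPLATE, with_empty, without_empty and direction() are hypothetical names I made up so the two variants can be compared side by side.

```python
import re

# The one-line pattern from the quoted message, with the direction
# alternatives left as a %s slot so both variants can be built from it.
TEMPLATE = (r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+'
            r'(?P<streetdir>%s)\s+(?P<streettype>ST|RD|AVE?|)$')

with_empty = re.compile(TEMPLATE % 'N|S|W|E|')    # trailing "|" kept
without_empty = re.compile(TEMPLATE % 'N|S|W|E')  # trailing "|" removed

def direction(pattern, address):
    """Return the streetdir group, or None when the pattern does not match."""
    m = pattern.match(address)
    return m.group('streetdir') if m else None

samples = [
    '1000 RAMPART S ST',            # transposed direction
    '45 JOHN CHURCHILL CHASE  ST',  # no direction, two spaces before ST
    '45 JOHN CHURCHILL CHASE ST',   # no direction, single space before ST
]
for address in samples:
    print(repr(address),
          repr(direction(with_empty, address)),
          repr(direction(without_empty, address)))
```

One caveat the sketch makes visible: the pattern requires a run of whitespace on both sides of the direction slot, so an address with no direction and only a single space before the street type matches neither variant. The empty alternative helps here only because the sample address has that extra space.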

Very clever. I don't think I've ever seen documentation showing that
little trick.
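
As far as I can tell, an empty alternative such as "(N|S|W|E|)" behaves much like putting the "?" quantifier on the group, except that an unmatched optional group reports None instead of an empty string. Below is a rough sketch of that variant, not something from Lie Ryan's message: it assumes the street name is made lazy ("+?") and that the direction and its trailing whitespace are wrapped together in an optional non-capturing group. The name pat_optional is my own.

```python
import re

# Variant sketch (an assumption, not the pattern from the thread): the
# direction plus its trailing whitespace is optional as a unit, and the
# street name is lazy so it does not swallow the direction.
pat_optional = re.compile(
    r'''
      (?P<streetnum>  \d+        )             # series of digits
      \s+
      (?P<streetname> [A-Z\s]+?  )             # one or more words, lazy
      \s+
      (?: (?P<streetdir> S?E|SW?|N?W|NE? ) \s+ )?  # optional direction
      (?P<streettype> ST|RD|AVE? )             # street type
      $                                        # end of string
    ''', re.VERBOSE)

for address in ['1000 RAMPART S ST',
                '100 JOHN CHURCHILL CHASE  ST',
                '45 JOHN CHURCHILL CHASE ST']:
    m = pat_optional.match(address)
    print(m.groupdict() if m else None)
```

With this arrangement the single-space address also matches, and streetdir is None whenever no direction is present.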

Thanks for enlightening me!


