Stuck on a three word street name regex
Brian D
briandenzer at gmail.com
Thu Jan 28 12:50:37 EST 2010
On Jan 28, 8:27 am, Lie Ryan <lie.1... at gmail.com> wrote:
> On 01/28/10 11:28, Brian D wrote:
>
>
>
> > I've tackled this kind of problem before by looping through a patterns
> > dictionary, but there must be a smarter approach.
>
> > Two addresses. Note that the first has incorrectly transposed the
> > direction and street name. The second has an extra space in it before
> > the street type. Clearly done by someone who didn't know how to
> > concatenate properly -- or didn't care.
>
> > 1000 RAMPART S ST
>
> > 100 JOHN CHURCHILL CHASE  ST
>
> > I want to parse the elements into an array of values that can be
> > inserted into new database fields.
>
> > Anyone who loves solving these kinds of puzzles care to relieve my
> > frazzled brain?
>
> > The pattern I'm using doesn't keep the "CHASE" with the "JOHN
> > CHURCHILL":
>
> How does the following perform?
>
> pat = re.compile(r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+(?P<streetdir>N|S|W|E|)\s+(?P<streettype>ST|RD|AVE?|)$')
>
> or more legibly:
>
> pat = re.compile(
> r'''
> (?P<streetnum> \d+ ) #M series of digits
> \s+
> (?P<streetname> [A-Z\s]+ ) #M one-or-more word
> \s+
> (?P<streetdir> S?E|SW?|N?W|NE?| ) #O direction or nothing
> \s+
> (?P<streettype> ST|RD|AVE? ) #M street type
> $ #M END
> ''', re.VERBOSE)

Is that all? Just that empty alternative after the last "|" OR
metacharacter? Wow.

As a test, to create a failure, if I remove that last "|"
metacharacter from the "N|S|W|E|" string (i.e., "N|S|W|E"), the match
fails on addresses that do not have that malformed direction after the
street name (e.g., '45 JOHN CHURCHILL CHASE ST').

Very clever. I don't think I've ever seen documentation showing that
little trick.

Thanks for enlightening me!
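
For anyone who wants to repeat that test, here is a minimal sketch. It uses the quoted one-line pattern as-is, plus a copy with the trailing "|" dropped from the direction group. The names WITH_EMPTY, WITHOUT_EMPTY and parse are made up for this example, and the second address is assumed to carry two spaces before "ST", as the original description says.

```python
import re

# The quoted one-liner, unchanged: note the trailing "|" in streetdir,
# which adds an empty alternative and so makes the direction optional.
WITH_EMPTY = re.compile(
    r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+'
    r'(?P<streetdir>N|S|W|E|)\s+(?P<streettype>ST|RD|AVE?|)$'
)

# Same pattern with the trailing "|" removed from streetdir only,
# so a direction letter becomes mandatory.
WITHOUT_EMPTY = re.compile(
    r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+'
    r'(?P<streetdir>N|S|W|E)\s+(?P<streettype>ST|RD|AVE?|)$'
)


def parse(pattern, address):
    """Return the named groups as a dict, or None when there is no match."""
    m = pattern.match(address)
    return m.groupdict() if m else None


transposed = '1000 RAMPART S ST'
# Assumption: two spaces before "ST", per the description in the post.
extra_space = '100 JOHN CHURCHILL CHASE  ST'

print(parse(WITH_EMPTY, transposed))      # streetdir is 'S'
print(parse(WITH_EMPTY, extra_space))     # streetdir is '' (empty alternative)
print(parse(WITHOUT_EMPTY, extra_space))  # None: a direction is now required
```

One caveat: an empty direction still sits between two `\s+`, so the pattern needs at least two whitespace characters before the street type. With a single space ('45 JOHN CHURCHILL CHASE ST') even the version with the trailing "|" returns None. If the directions stay single letters, `(?P<streetdir>[NSWE]?)` should behave the same as the empty alternative.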