Stuck on a three word street name regex
briandenzer at gmail.com
Thu Jan 28 18:50:37 CET 2010
On Jan 28, 8:27 am, Lie Ryan <lie.1... at gmail.com> wrote:
> On 01/28/10 11:28, Brian D wrote:
> > I've tackled this kind of problem before by looping through a patterns
> > dictionary, but there must be a smarter approach.
> > Two addresses. Note that the first has incorrectly transposed the
> > direction and street name. The second has an extra space in it before
> > the street type. Clearly done by someone who didn't know how to
> > concatenate properly -- or didn't care.
> > 1000 RAMPART S ST
> > 100 JOHN CHURCHILL CHASE  ST
> > I want to parse the elements into an array of values that can be
> > inserted into new database fields.
> > Would anyone who loves solving these kinds of puzzles care to relieve my
> > frazzled brain?
> > The pattern I'm using doesn't keep the "CHASE" with the "JOHN
> > CHURCHILL":
> How does the following perform?
> pat =
> or more legibly:
> import re
> pat = re.compile(r'''
>     ^                                   #M START
>     (?P<streetnum>  \d+ )               #M series of digits
>     \s+                                 #M whitespace
>     (?P<streetname> [A-Z\s]+ )          #M one-or-more word
>     \s+                                 #M whitespace
>     (?P<streetdir>  S?E|SW?|N?W|NE?| )  #O direction or nothing
>     \s+                                 #M whitespace
>     (?P<streettype> ST|RD|AVE? )        #M street type
>     $                                   #M END
>     ''', re.VERBOSE)
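A minimal sketch of running a pattern of this shape over the two sample addresses. It assumes `\s+` separators between the named groups and that the second address really has two spaces before `ST`; `ADDRESS_RE` and `parse_address` are hypothetical names. Each match becomes a list of values that could be passed as parameters to a database insert.

```python
import re

# Assumption: one or more whitespace characters (\s+) separate the fields.
ADDRESS_RE = re.compile(r'''
    ^
    (?P<streetnum>  \d+ )               # mandatory: house number
    \s+
    (?P<streetname> [A-Z\s]+ )          # mandatory: one or more words
    \s+
    (?P<streetdir>  S?E|SW?|N?W|NE?| )  # optional: direction or "" (empty alternative)
    \s+
    (?P<streettype> ST|RD|AVE? )        # mandatory: street type
    $
''', re.VERBOSE)


def parse_address(address):
    """Hypothetical helper: return [num, name, dir, type], or None if no match."""
    m = ADDRESS_RE.match(address)
    if m is None:
        return None
    return [m.group('streetnum'), m.group('streetname'),
            m.group('streetdir'), m.group('streettype')]


for addr in ['1000 RAMPART S ST', '100 JOHN CHURCHILL CHASE  ST']:
    print(parse_address(addr))
```

With a single space before `ST` and no direction, this version returns `None`: the empty direction still needs whitespace on both sides of it, so the separators are the part to adjust if such addresses are common.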
Is that all it takes? That little empty alternative after the last "|" OR metacharacter?
As a test, to force a failure, I removed that last "|" metacharacter
from the "N|S|W|E|" string (i.e., "N|S|W|E"), and the match then fails
on addresses that lack the malformed direction after the street name
(e.g., '45 JOHN CHURCHILL CHASE ST').
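The effect of that trailing "|" can be checked directly. The sketch below uses assumed one-line patterns built around the "N|S|W|E|" group, with single `\s` separators and a double space before `ST` in the test address; `direction` is a hypothetical helper. It also shows `[NSWE]?`, a character class made optional with `?`, which is a more conventional way to say "one of these letters or nothing" and likewise yields an empty string when no direction is present.

```python
import re

# Assumed one-line patterns; they differ only in how the direction
# group is written. Fields are separated by a single whitespace (\s).
WITH_EMPTY_ALT = (r'^(?P<num>\d+)\s(?P<name>[A-Z\s]+)\s'
                  r'(?P<dir>N|S|W|E|)\s(?P<type>ST|RD|AVE?)$')
# Same pattern with the trailing "|" (the empty alternative) removed.
WITHOUT_EMPTY_ALT = WITH_EMPTY_ALT.replace('N|S|W|E|', 'N|S|W|E')
# An optional character class also matches "" when the direction is absent.
OPTIONAL_CLASS = WITH_EMPTY_ALT.replace('N|S|W|E|', '[NSWE]?')


def direction(pattern, address):
    """Hypothetical helper: the matched direction, or None when nothing matches."""
    m = re.match(pattern, address)
    return m.group('dir') if m else None


for addr in ['1000 RAMPART S ST', '45 JOHN CHURCHILL CHASE  ST']:
    for label, pat in [('with |', WITH_EMPTY_ALT),
                       ('without |', WITHOUT_EMPTY_ALT),
                       ('[NSWE]?', OPTIONAL_CLASS)]:
        print(f'{addr!r:32} {label:10} {direction(pat, addr)!r}')
```

The `[NSWE]?` form only covers single letters; for compound directions such as `SW`, a group like `(?:S?E|SW?|N?W|NE?)?` plays the same role.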
Very clever. I don't think I've ever seen documentation showing that.
Thanks for enlightening me!