help with simple regular expression grouping with re

Dan Schmidt dfan at harmonixmusic.com
Mon May 10 09:58:24 EDT 1999


"Tim Peters" <tim_one at email.msn.com> writes:

| import re
| pattern = re.compile(r"""
|     "           # match an open quote
|     (           # start a group so re.findall returns only this part
|         [^"]*?  # match shortest run of non-quote characters
|     )           # close the group
|     "           # and match the close quote
| """, re.VERBOSE)
| 
| answer = re.findall(pattern, your_example)
| for field in answer:
|     print field

This works for a tricky reason, which people should be aware of.

I had just written the following response to your code:

  Not that it's important, but technically, what you did was overkill.
  Because *? is non-greedy, it won't consume any quote characters:
  it will happily hand the quote off to the next element of the
  regexp, which does match it.

  So "(.*?)" and "([^"]*)" both solve the problem; you don't need to
  disallow quotes _and_ match non-greedily.

And then I decided to test it, just to make sure (replacing '[^"]'
with '.'), and... it failed.  Because '.' doesn't match newlines by
default.  When I added re.DOTALL to the options at the end, it worked
fine.
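
A minimal sketch of that failure, for anyone who wants to reproduce it.
The sample string is made up for illustration (it is not the original
poster's data); the only thing that matters is that one quoted field
spans a line break.

```python
import re

# Hypothetical input: the first quoted field contains a newline.
text = 'say "hello\nworld" and "bye"'

# Without re.DOTALL, '.' refuses to cross the newline, so the match
# cannot start at the first quote and the pairing of quotes slips.
without_dotall = re.findall(r'"(.*?)"', text)
# without_dotall == [' and ']

# With re.DOTALL, '.' matches newlines too and the fields come out right.
with_dotall = re.findall(r'"(.*?)"', text, re.DOTALL)
# with_dotall == ['hello\nworld', 'bye']
```

Note that the version without re.DOTALL does not simply return fewer
fields; on this input it returns the text *between* two fields.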

Your example works because the character class [^"] (everything
but a double quote) happens to include newlines too.  (Actually, I
think you took the newlines out of the input string before you tested
it, so maybe you were just lucky.)
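
The point about the character class can be checked directly.  This is
only a sketch, again using a made-up input string rather than the
original example:

```python
import re

# A negated character class matches a newline; a bare '.' does not
# (unless re.DOTALL is given).
class_matches_newline = re.match(r'[^"]', '\n') is not None   # True
dot_matches_newline = re.match(r'.', '\n') is not None        # False

# Hypothetical input with a newline inside the first quoted field.
text = 'say "hello\nworld" and "bye"'

# Greedy or non-greedy, the class-based pattern finds the same fields,
# because [^"] can never run past the closing quote.
greedy_class = re.findall(r'"([^"]*)"', text)
lazy_class = re.findall(r'"([^"]*?)"', text)
# both == ['hello\nworld', 'bye']
```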

So my new claim is that the following is the 'best' regexp, for my
personal definition of best (internal comments deleted):

pattern = re.compile(r'"(.*?)"', re.VERBOSE | re.DOTALL)
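
A short usage sketch for that pattern.  The sample text and the name
`plain` are assumptions added for illustration.  Since the pattern now
contains no whitespace or comments, re.VERBOSE has nothing to strip, so
the second compile below should behave identically.

```python
import re

# The pattern from the message above, unchanged.
pattern = re.compile(r'"(.*?)"', re.VERBOSE | re.DOTALL)

# Same pattern without re.VERBOSE, for comparison (hypothetical name).
plain = re.compile(r'"(.*?)"', re.DOTALL)

# Made-up sample: the second quoted field spans two lines.
sample = 'name="Dan",\nnote="two\nlines"'

fields = pattern.findall(sample)
# fields == ['Dan', 'two\nlines']
same_without_verbose = plain.findall(sample) == fields   # True
```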

-- 
                 Dan Schmidt -> dfan at harmonixmusic.com, dfan at alum.mit.edu
Honest Bob & the                http://www2.thecia.net/users/dfan/
Factory-to-Dealer Incentives -> http://www2.thecia.net/users/dfan/hbob/
          Gamelan Galak Tika -> http://web.mit.edu/galak-tika/www/



