help with simple regular expression grouping with re
Dan Schmidt
dfan at harmonixmusic.com
Mon May 10 09:58:24 EDT 1999
"Tim Peters" <tim_one at email.msn.com> writes:
| import re
| pattern = re.compile(r"""
|     "           # match an open quote
|     (           # start a group so re.findall returns only this part
|         [^"]*?  # match shortest run of non-quote characters
|     )           # close the group
|     "           # and match the close quote
| """, re.VERBOSE)
|
| answer = re.findall(pattern, your_example)
| for field in answer:
|     print field
This works for a tricky reason, which people should be aware of.
I had just written the following response to your code:

  Not that it's important, but technically, what you did was overkill.
  Because *? is non-greedy, it won't match any quote characters: it
  will be happy to hand off the quote to the next element of the
  regexp, which does match it.

  So "(.*?)" and "([^"]*)" both solve the problem; you don't need to
  disallow quotes _and_ match non-greedily.

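The claim in that draft can be checked on single-line input with a minimal sketch like the one below. It is written for present-day Python 3 rather than the Python of the original post, and `sample` is a made-up stand-in for `your_example`, which is not shown in this message. The greedy variant is included only for contrast.

```python
import re

# Hypothetical single-line input; not the original your_example.
sample = 'say "hello" and "goodbye"'

# Non-greedy dot: stops at the first closing quote it can reach.
lazy_dot = re.findall(r'"(.*?)"', sample)

# Greedy negated class: cannot run past a quote in the first place.
negated_class = re.findall(r'"([^"]*)"', sample)

# Greedy dot, for contrast: runs to the last quote on the line.
greedy_dot = re.findall(r'"(.*)"', sample)

print(lazy_dot)       # ['hello', 'goodbye']
print(negated_class)  # ['hello', 'goodbye']
print(greedy_dot)     # ['hello" and "goodbye']
```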
And then I decided to test it, just to make sure (replacing '[^"]'
with '.'), and... it failed. Because '.' doesn't match newlines by
default. When I added re.DOTALL to the options at the end, it worked
fine.
Your example works because the character class [^"] (everything
but a double quote) happens to include newlines too. (Actually, I
think you took the newlines out of the input string before you tested
it, so maybe you were just lucky.)
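To make the newline issue concrete, here is a small sketch, assuming Python 3 and a made-up `sample` string whose first quoted field spans two lines. Without re.DOTALL the '.' version does not just miss the multi-line field; it also pairs up the wrong quotes afterwards.

```python
import re

# Hypothetical input: the first quoted field contains a newline.
sample = 'a "first\nfield" b "second" c'

# '.' does not match the newline, so the first field is never closed;
# the scan then resumes and pairs the wrong quotes.
lazy_dot = re.findall(r'"(.*?)"', sample)

# [^"] matches any character except a double quote, newline included.
negated_class = re.findall(r'"([^"]*?)"', sample)

print(lazy_dot)       # [' b ']
print(negated_class)  # ['first\nfield', 'second']
```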
So my new claim is that the following is the 'best' regexp, for my
personal definition of best (internal comments deleted):
pattern = re.compile(r'"(.*?)"', re.VERBOSE | re.DOTALL)
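A quick way to try that pattern, again as a Python 3 sketch with a made-up `sample` string. The second call is only a side check: with the comments and whitespace gone from the pattern, re.VERBOSE has nothing left to ignore, so dropping it should give the same result.

```python
import re

# Hypothetical input: the first quoted field contains a newline.
sample = 'a "first\nfield" b "second" c'

# The pattern from the message, flags unchanged.
pattern = re.compile(r'"(.*?)"', re.VERBOSE | re.DOTALL)
fields = pattern.findall(sample)
print(fields)  # ['first\nfield', 'second']

# Same pattern with re.DOTALL only.
same = re.findall(r'"(.*?)"', sample, re.DOTALL)
print(fields == same)  # True
```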
--
Dan Schmidt -> dfan at harmonixmusic.com, dfan at alum.mit.edu
Honest Bob & the http://www2.thecia.net/users/dfan/
Factory-to-Dealer Incentives -> http://www2.thecia.net/users/dfan/hbob/
Gamelan Galak Tika -> http://web.mit.edu/galak-tika/www/