[Python-bugs-list] [ python-Bugs-476912 ] regex annoyance
noreply@sourceforge.net
Wed, 31 Oct 2001 12:17:40 -0800
Bugs item #476912, was opened at 2001-10-31 12:17
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=476912&group_id=5470
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Bill Bumgarner (bbum)
Assigned to: Nobody/Anonymous (nobody)
Summary: regex annoyance
Initial Comment:
(This may be a feature request, but it is annoying
enough that I filed it as a bug.)
Python's named subexpressions within regular
expressions are an incredibly valuable feature;
combined with the ability to write multiline
regexes with comments (verbose mode), they make
for very readable regexes.
However, there is an annoyance in named
subexpressions that has bitten me several times.
Namely, when a particular token must be parsed out
of the input by one of two (or more) alternative
expressions, so that the same named subexpression
has to appear in more than one alternative, the
named subexpression will be non-None only
intermittently (depending on expression order and
which alternative matched).
That is, given:
(?:(?P<Tok1>[a-z]+)\s(?P<Tok2>[a-z]+))|(?:(?P<Tok1>[a-z]+)\t(?P<Tok2>[a-z]+))
In this case, Tok1 and Tok2 will be None if the first
expression matches...
(Yes, this is a contrived example that could be
refactored to avoid multiple Tok1/Tok2
references; however, more complex expressions
do not always allow such easy refactoring.)
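One possible workaround for the situation described above, sketched
under some assumptions: give each alternative its own group names and
merge them after matching. The suffixed names (Tok1_a, Tok1_b, ...) and
the merged_groups helper are hypothetical, not part of the re module,
and the first separator is narrowed to a literal space so that the two
alternatives are actually distinct. Note that current versions of re
refuse to compile a pattern that defines the same group name twice, so
distinct names are needed in any case.

```python
import re

# Hypothetical naming scheme: each alternative gets its own suffix (_a, _b),
# because group names must be unique within a single pattern.
PATTERN = re.compile(
    r"""
    (?: (?P<Tok1_a>[a-z]+) [ ] (?P<Tok2_a>[a-z]+) )   # space-separated form
  | (?: (?P<Tok1_b>[a-z]+) \t  (?P<Tok2_b>[a-z]+) )   # tab-separated form
    """,
    re.VERBOSE,
)


def merged_groups(match):
    """Collapse suffixed group names into one dict, preferring non-None values."""
    merged = {}
    for name, value in match.groupdict().items():
        base = name.rsplit("_", 1)[0]  # "Tok1_a" -> "Tok1"
        if value is not None or base not in merged:
            merged[base] = value
    return merged


tokens_space = merged_groups(PATTERN.fullmatch("foo bar"))
tokens_tab = merged_groups(PATTERN.fullmatch("foo\tbar"))
```

With this helper, callers look up Tok1 and Tok2 without needing to
know which alternative matched.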
----------------------------------------------------------------------