[issue2636] Regexp 2.6 (modifications to current re 2.2.2)
Jeffrey C. Jacobs
report at bugs.python.org
Thu Jun 19 14:01:32 CEST 2008
Jeffrey C. Jacobs <timehorse at users.sourceforge.net> added the comment:
Thanks for weighing in, Mark! Your point is valid and quite fair, though
I would not assume that Item 3 will be included just because Item 2
isn't. I will do my best to develop both, but I do not make the final
decision as to what Python includes. That said, Item 3 seems very likely
at this point, so we may be okay. Still, let me give this one more try,
as I think I have a better solution to make Item 2 more palatable. Let's
say we have 5 choices here:
> a) Simply disallow the exposure of match group name attributes if the
> names collide with an existing member of the basic Match Object
> interface.
>
> b) Expose the reserved names through a special prefix notation, and
> for forward compatibility, expose all names via this prefix notation.
> In other words, if the prefix was 'k', match.kpos could be used to
> access pos; if it was '_', match._pos would be used. If Item 3 is
> implemented, it may be sufficient to allow access via match['pos'] as
> the canonical way of handling match group names using reserved words.
>
> c) Don't expose the names directly; only expose them through a
> prefixed name, e.g. match._pos or match.kpos.
d) (As Mark suggested) we drop Item 2 completely. I have not invested
much work in this so that would not bother me, but IMHO I actually
prefer Item 2 to 3 so I would really like to see it preserved in some
form.
e) Add an option, re.MATCH_ATTRIBUTES, that is used as a Match Creation
flag. When the re.MATCH_ATTRIBUTES or re.A flag is passed to compile, or
(?a) is included in the pattern, it will do 2 things. First, it will
raise an exception if either (1) there is an unnamed capture group or
(2) a capture group name collides with a reserved name, i.e. an existing
member of the Match Object interface. In addition, I would put in a hook
to support a from __future__ import, so that any post-2.6 changes to the
match object type can be made available one version early, giving
programmers time to adapt before those changes become the default.
Second, I would *conditionally* allow access to arbitrary capture group
names via the __getattr__ handler if and only if that flag is present;
otherwise you could not access Capture Groups by name via match.foo.
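
To make option e) concrete, here is a rough pure-Python sketch of the
behaviour described above. It is only an illustration, not the proposed
implementation: the names compile_with_attributes, match_with_attributes
and AttributeMatch are hypothetical helpers that stand in for the
proposed flag, and the check is done in a wrapper rather than inside the
re module itself.

```python
import re

# Names already taken by the match object interface (pos, start, group, ...),
# collected from a real match object rather than hard-coded.
_RESERVED = frozenset(
    name for name in dir(re.match("", "")) if not name.startswith("__")
)


def compile_with_attributes(pattern, flags=0):
    """Hypothetical stand-in for compiling with the proposed flag."""
    compiled = re.compile(pattern, flags)
    named = compiled.groupindex
    # Each name maps to exactly one group, so a mismatch in counts means
    # at least one capture group is unnamed.
    if compiled.groups != len(named):
        raise ValueError("unnamed capture group in pattern")
    clashes = sorted(set(named) & _RESERVED)
    if clashes:
        raise ValueError("reserved group name(s): " + ", ".join(clashes))
    return compiled


class AttributeMatch:
    """Hypothetical wrapper exposing named groups as attributes."""

    def __init__(self, match):
        self._match = match

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name == "_match":
            raise AttributeError(name)
        if name in self._match.re.groupindex:
            return self._match.group(name)
        # Everything else falls through to the ordinary match interface.
        return getattr(self._match, name)


def match_with_attributes(pattern, string, flags=0):
    match = compile_with_attributes(pattern, flags).match(string)
    return AttributeMatch(match) if match is not None else None


m = match_with_attributes(r"(?P<year>\d{4})-(?P<month>\d{2})", "2008-06")
print(m.year, m.month)  # named groups read as attributes
print(m.pos)            # regular match attribute, delegated unchanged
```

Because colliding names are rejected up front, attribute lookup never
has to choose between a group called pos and the real match.pos.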
I really like the idea of e), so I'm taking Item 2 out of the "ready for
merge" category and putting it in the queue for the modifications
spelled out above. I'm not too worried about our flags differing from
Perl's, as we did base our first 4 on Perl (x, s, m, i), but
subsequently added Unicode and Locale, which Perl does not have, and
never implemented o (since our caching semantics already pretty much
give every expression that), e (which is specific to Perl syntax
AFAICT) and g (which can be simulated via re.split). So I propose we
take the A flag and implement it as I've specified; that is the current
goal of Item 2. Once this is done and working, we can decide whether it
should be included in the Python trunk.
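
For reference, the four Perl-derived flags mentioned above can be given
either as compile flags or inline in the pattern. The short sketch below
uses only the existing re API; the sample pattern and text are made up
for illustration, and it does not involve the proposed A flag.

```python
import re

# The four flags borrowed from Perl, by letter and by long name.
PERL_DERIVED = {
    "x": re.VERBOSE,
    "s": re.DOTALL,
    "m": re.MULTILINE,
    "i": re.IGNORECASE,
}

text = "Alpha\nBeta\nbravo"

# Same pattern twice: once with compile flags, once with inline flags.
by_flag = re.compile(r"^ b \w+ $", re.X | re.M | re.I)
inline = re.compile(r"(?xmi)^ b \w+ $")

print(by_flag.findall(text))
print(inline.findall(text))
```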
How does that sound to you, Mark, and anyone else who wishes to weigh in?
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________