[issue2636] Regexp 2.6 (modifications to current re 2.2.2)

Jeffrey C. Jacobs report at bugs.python.org
Thu Jun 19 14:01:32 CEST 2008


Jeffrey C. Jacobs <timehorse at users.sourceforge.net> added the comment:

Thanks for weighing in, Mark!  Your point is valid and quite fair,
though I would not assume that Item 3 will be included just because
Item 2 isn't.  I will do my best to develop both, but I do not make the
final decision as to what Python includes.  That said, Item 3 seems
very likely at this point, so we may be okay.  Still, let me give this
one more try, as I think I have a better solution to make Item 2 more
palatable.  Let's say we have 5 choices here:

> a) Simply disallow the exposure of match group name attributes if the 
> names collide with an existing member of the basic Match Object 
> interface.
>
> b) Expose the reserved names through a special prefix notation, and
> for forward compatibility, expose all names via this prefix notation. 
> In other words, if the prefix was 'k', match.kpos could be used to
> access pos; if it was '_', match._pos would be used.  If Item 3 is
> implemented, it may be sufficient to allow access via match['pos'] as
> the canonical way of handling match group names using reserved words.
>
> c) Don't expose the names directly; only expose them through a
> prefixed name, e.g. match._pos or match.kpos.
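
To make the collision concrete, here is a minimal sketch of the prefixed
access described in c), written as a wrapper around today's match
objects.  The PrefixedMatch class and its wrapped/prefix attributes are
hypothetical names invented for this illustration; nothing like it
exists in the re module.

```python
import re


class PrefixedMatch:
    """Hypothetical wrapper (not part of re): named groups are exposed
    only through a prefix, e.g. match.kpos or match._pos."""

    def __init__(self, match, prefix="k"):
        self.wrapped = match
        self.prefix = prefix

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        if name in ("wrapped", "prefix"):
            # Guard against recursion on a partially initialised object.
            raise AttributeError(name)
        groups = self.wrapped.groupdict()
        if name.startswith(self.prefix):
            group = name[len(self.prefix):]
            if group in groups:
                return groups[group]
        # Anything else falls through to the real Match Object interface.
        return getattr(self.wrapped, name)


m = PrefixedMatch(re.match(r"(?P<pos>\d+)-(?P<word>\w+)", "42-spam"))
print(m.pos)    # the Match Object's own pos attribute
print(m.kpos)   # the capture group named "pos"
print(m.kword)  # the capture group named "word"
```

Because the group is reachable only as kpos, match.pos keeps its
existing meaning even when a group happens to be called "pos".
match.group('pos') remains available either way.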

d) (As Mark suggested) we drop Item 2 completely.  I have not invested 
much work in this so that would not bother me, but IMHO I actually 
prefer Item 2 to 3 so I would really like to see it preserved in some 
form.

e) Add an option, re.MATCH_ATTRIBUTES, that is used as a Match Creation
flag.  When the re.MATCH_ATTRIBUTES or re.A flag is passed to compile,
or (?a) is included in the pattern, it will do 2 things.  First, it
will raise an exception if either a) the pattern contains an unnamed
capture group or b) a capture group name collides with a reserved name
(an existing member of the Match Object interface).  Second, it will
*conditionally* allow access to arbitrary capture group names via the
__getattr__ handler IFF that flag is present; without it, you could not
access Capture Groups by name via match.foo.  In addition, I would put
in a hook to support a from __future__ import, so that any post-2.6
changes to the match object type can be made available one version
early, giving programmers time to adapt before those changes become
the default.
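
The two behaviours of e) can be sketched in pure Python on top of the
existing module.  This is only an illustration under stated
assumptions: compile_with_attributes, match_with_attributes and
AttributeMatch are hypothetical helpers standing in for the proposed
flag, and ValueError is an arbitrary choice, since the proposal does
not say which exception would be raised.

```python
import re

# Names already taken on a match object, plus the wrapper's own slot.
_RESERVED = set(dir(re.match("", ""))) | {"_match"}


def compile_with_attributes(pattern, flags=0):
    """Hypothetical stand-in for compiling with re.MATCH_ATTRIBUTES."""
    compiled = re.compile(pattern, flags)
    # Check 1: every capture group must be named.
    if compiled.groups != len(compiled.groupindex):
        raise ValueError("all capture groups must be named")
    # Check 2: no group name may collide with a reserved name.
    clashes = _RESERVED.intersection(compiled.groupindex)
    if clashes:
        raise ValueError("reserved group name(s): " + ", ".join(sorted(clashes)))
    return compiled


class AttributeMatch:
    """Hypothetical wrapper giving match.foo access to named groups."""

    def __init__(self, match):
        self._match = match

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        if name == "_match":
            raise AttributeError(name)
        groups = self._match.groupdict()
        if name in groups:
            return groups[name]
        # Not a group name: defer to the ordinary Match Object interface.
        return getattr(self._match, name)


def match_with_attributes(pattern, string, flags=0):
    found = compile_with_attributes(pattern, flags).match(string)
    return AttributeMatch(found) if found is not None else None


m = match_with_attributes(r"(?P<year>\d{4})-(?P<month>\d\d)", "2008-06")
print(m.year, m.month)  # capture groups, read as attributes
print(m.pos)            # still the Match Object's own attribute
```

With the validation done at compile time, attribute lookup never has to
choose between a group and a member of the Match Object interface.
Patterns such as r"(\d+)" or r"(?P<pos>\d+)" are rejected up front.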

I really like the idea of e), so I'm taking Item 2 out of the "ready for
merge" category and putting it in the queue for the modifications
spelled out above.  I'm not too worried about our flags differing from
Perl's, as we based our first 4 on Perl (x, s, m, i) but subsequently
added Unicode and Locale, which Perl does not have, and never
implemented o (since our caching semantics already give pretty much
every expression that), e (which is specific to Perl syntax AFAICT) and
g (which can be simulated via re.split).  So I propose we take A and
implement it as I've specified; that is the current goal of Item 2.
Once this is done and working, we can decide whether it should be
included in the Python trunk.

How does that sound to you, Mark, and to anyone else who wishes to
weigh in?

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________

