[Python-ideas] Improving Clarity of re Module

Stephen J. Turnbull stephen at xemacs.org
Wed Nov 27 15:43:00 CET 2013


Ned Batchelder writes:

 > On 11/26/13 5:31 PM, Alex Seewald wrote:

 >> For a match object, m, m.group(0) is the semantics for accessing the
 >> entire span of the match. For newcomers to regular expressions who
 >> are not familiar with the concept of a 'group', the name group(0) is
 >> counter-intuitive. A more natural-language-esque alias to group(0),
 >> perhaps 'matchSpan', could reduce the time novices spend from idea
 >> to working code. Of course, this convenience would introduce a bit of
 >> complexity to the codebase, so it may or may not be worth it to add
 >> an alias to group(0). What do people think?

-1 on "matchSpan", which isn't intuitive to me.  (My first guess would
be that it returns (match.start, match.end) -- which it *doesn't*,
because that is what match.span is for.  This is the first I've heard
of match.span, although my eyes may have just slid over it in reading
the docs.)
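
For reference, a minimal sketch of how the existing accessors relate.
The pattern and the sample string are made up for illustration; only
group(), span(), start() and end() come from the re module itself.

```python
import re

# Hypothetical sample data, chosen only to make the indices easy to read.
m = re.search(r"(\d+)-(\d+)", "ext. 555-1234 today")

whole = m.group(0)               # the entire matched text: "555-1234"
first, second = m.group(1, 2)    # the two parenthesized groups
span = m.span()                  # (start, end) indices of the whole match
start, end = m.start(), m.end()  # the same indices, one at a time
```

So the (start, end) pair that the name "matchSpan" suggests is already
available as m.span(), while the matched text itself is m.group(0).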

-0.5 on the whole idea.  The not very clueful students I occasionally
have to lead by the nose through this stuff have no trouble with
.group(0).  Their big problem is getting peeved about the whole idea
that regexps aren't globs: forgetting the period leads to failed matches
that they often fail to diagnose for themselves. :-P
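
A small sketch of that failure mode, assuming "forgetting the period"
means writing a glob-style "*" where a regexp needs ".*".  The file
name and patterns are invented for illustration.

```python
import fnmatch
import re

name = "report.txt"  # hypothetical file name

# As a glob, "*" alone means "any run of characters".
glob_hit = fnmatch.fnmatchcase(name, "rep*txt")

# As a regexp, "p*" means "zero or more 'p'", so the same pattern
# silently fails to match instead of raising an error.
missing_period = re.search(r"rep*txt", name)

# With the period restored, ".*" plays the role of the glob "*".
with_period = re.search(r"rep.*txt", name)
```

Here glob_hit is True, missing_period is None, and with_period matches
the whole name, which is why the mistake is hard to spot: nothing
complains, the match just doesn't happen.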

 > I like the idea of a better attribute for accessing the matched
 > text.  I would go for either "m.matched" or "m.text".

Please, not "text"; I would expect that to be the target string, not a
substring.
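
To illustrate the distinction, a minimal sketch with a made-up target
string: match objects already expose the target string as m.string,
which is different from the matched substring m.group(0).

```python
import re

target = "spam and eggs"  # hypothetical target string
m = re.search(r"eggs", target)

searched = m.string    # the full string passed to search()
matched = m.group(0)   # only the part that matched: "eggs"
```

A name like "text" could plausibly be read as either of these, which is
the ambiguity I'd like to avoid.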

 > While we're at it, how can it be that we haven't improved the
 > __repr__ after all these years?

Because there are multiple implementations of re?


