For a match object, m, m.group(0) is the semantics for accessing the entire span of the match. For newcomers to regular expressions who are not familiar with the concept of a 'group', the name group(0) is counter-intuitive. A more natural-language-esque alias to group(0), perhaps 'matchSpan', could reduce the time novices spend from idea to working code. Of course, this convenience would introduce a bit of complexity to the codebase, so it may or may not be worth it to add an alias to group(0). What do people think? -- Alex Seewald
On 26/11/2013 22:31, Alex Seewald wrote:
For a match object, m, m.group(0) is the semantics for accessing the entire span of the match. For newcomers to regular expressions who are not familiar with the concept of a 'group', the name group(0) is counter-intuitive. A more natural-language-esque alias to group(0), perhaps 'matchSpan', could reduce the time novices spend from idea to working code. Of course, this convenience would introduce a bit of complexity to the codebase, so it may or may not be worth it to add an alias to group(0). What do people think?
Well, including 'span' in the name would be confusing because it already has a .span method which returns the start and end indexes. I think that for newcomers to regexes, the concept of capture groups is one of the easiest things to understand!
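For readers following along, here is a minimal sketch of the calls under discussion, using only the standard `re` module. The pattern and the sample string are made up for illustration and do not come from the thread.

```python
import re

# Illustrative pattern and input (assumptions): a word, '=', then digits.
m = re.search(r"(\w+)=(\d+)", "set width=42 now")

whole = m.group(0)    # the entire matched text
same = m.group()      # group() with no argument defaults to group 0
key = m.group(1)      # first capture group
value = m.group(2)    # second capture group
indexes = m.span()    # (start, end) indexes of the whole match, not text

print(whole, same, key, value, indexes)
```

Note that `.span()` returns positions rather than a string, which is the naming clash raised above.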
On 11/26/2013 11:31 PM, Alex Seewald wrote:
For a match object, m, m.group(0) is the semantics for accessing the entire span of the match. For newcomers to regular expressions who are not familiar with the concept of a 'group', the name group(0) is counter-intuitive. A more natural-language-esque alias to group(0), perhaps 'matchSpan', could reduce the time novices spend from idea to working code.
I do agree and support such a change. Actually, I remember it took me some time to find that expression, precisely. (However, isn't it group() alone, without 0? Haven't used re for a while...) But "m.matchspan" is at the very least redundant (since m is a match result). "m.span" or "m.snippet" would nicely do the job, wouldn't it?
Of course, this convenience would introduce a bit of complexity to the codebase, so it may or may not be worth it to add an alias to group(0). What do people think?
At first sight, it does not seem that complicated (also, the code already exists for group()). How clear is the existing implementation?
Denis
On Tue, Nov 26, 2013 at 05:31:09PM -0500, Alex Seewald wrote:
For a match object, m, m.group(0) is the semantics for accessing the entire span of the match. For newcomers to regular expressions who are not familiar with the concept of a 'group', the name group(0) is counter-intuitive. A more natural-language-esque alias to group(0), perhaps 'matchSpan', could reduce the time novices spend from idea to working code. Of course, this convenience would introduce a bit of complexity to the codebase, so it may or may not be worth it to add an alias to group(0). What do people think?
It is not very long ago that I was a novice to regexes, and I can tell you that in my experience, the difference between group(0) and matchSpan is entirely inconsequential. I was familiar with neither the concept of "group" nor "span"; in fact, I had never come across the concept of "span" in regard to regexes until I read your email just now. Either way, the name would be jargon I need to learn.
-- Steven
On 11/26/13 5:31 PM, Alex Seewald wrote:
For a match object, m, m.group(0) is the semantics for accessing the entire span of the match. For newcomers to regular expressions who are not familiar with the concept of a 'group', the name group(0) is counter-intuitive. A more natural-language-esque alias to group(0), perhaps 'matchSpan', could reduce the time novices spend from idea to working code. Of course, this convenience would introduce a bit of complexity to the codebase, so it may or may not be worth it to add an alias to group(0). What do people think?
I like the idea of a better attribute for accessing the matched text. I would go for either "m.matched" or "m.text". There are convenience methods on match objects that I've almost never used: why do we need both .span() and .start()+.end(), for example? And yet, I use .group() all the time, and have to just accept that my pattern had no groups in it, and I say "group" when I mean "matched text". Yes, I understand about groups, and group 0, etc, but for such a common need, why not have a common name?

While we're at it, how can it be that we haven't improved the __repr__ after all these years?

    >>> m = re.search("[ab]", "xay")
    >>> m
    <_sre.SRE_Match object at 0x10a2ce9f0>

_sre? SRE_Match? huh? :)

--Ned.
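As a small sketch of the overlap between `.span()` and `.start()`/`.end()` mentioned in this message, reusing the same search call. The variable name `matched` is only for illustration.

```python
import re

m = re.search("[ab]", "xay")   # the same search as in the message above

# span() is shorthand for the pair (start(), end()).
assert m.span() == (m.start(), m.end())

# Slicing the target string with those indexes recovers the matched text,
# which is what group() returns directly.
matched = m.string[m.start():m.end()]
print(matched == m.group())
```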
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas
Ned Batchelder writes:
On 11/26/13 5:31 PM, Alex Seewald wrote:
For a match object, m, m.group(0) is the semantics for accessing the entire span of the match. For newcomers to regular expressions who are not familiar with the concept of a 'group', the name group(0) is counter-intuitive. A more natural-language-esque alias to group(0), perhaps 'matchSpan', could reduce the time novices spend from idea to working code. Of course, this convenience would introduce a bit of complexity to the codebase, so it may or may not be worth it to add an alias to group(0). What do people think?
-1 on "matchSpan", which isn't intuitive to me (my first guess would be (match.start, match.end), which it *isn't* because of match.span; this is the first I've heard of it, although my eyes may have just slid over it in reading the docs). -0.5 on the whole idea: the not very clueful students I occasionally have to lead by the nose through this stuff have no trouble with .group(0). Their big problem is getting peeved about the whole idea that regexps aren't globs; forgetting the period leads to failed matches that they often fail to diagnose for themselves. :-P
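To illustrate the glob-versus-regexp confusion described here, a minimal sketch using the standard `fnmatch` and `re` modules. The file name and patterns are hypothetical examples, not taken from the thread.

```python
import fnmatch
import re

name = "report_2013"   # hypothetical file name, for illustration only

# As a glob, '*' on its own means "any run of characters".
glob_hit = fnmatch.fnmatchcase(name, "report*")

# As a regex, '*' repeats the preceding item, so "report*" means
# "repor" followed by zero or more 't' characters.
regex_without_period = re.fullmatch(r"report*", name)

# With the period, ".*" means "any run of characters"
# (excluding newlines unless re.DOTALL is given).
regex_with_period = re.fullmatch(r"report.*", name)

print(glob_hit, regex_without_period is None, regex_with_period is not None)
```

The forgotten period does not raise an error here; the match simply fails, which may be why it is hard for students to diagnose.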
I like the idea of a better attribute for accessing the matched text. I would go for either "m.matched" or "m.text".
Please, not "text"; I would expect that to be the target string, not a substring.
While we're at it, how can it be that we haven't improved the __repr__ after all these years?
Because there are multiple implementations of re?
On 27/11/2013 21:41, Greg Ewing wrote:
Stephen J. Turnbull wrote:
Please, not "text"; I would expect that to be the target string, not a substring.
I would have expected 'string' to be the matched text, but it's not. So the most obvious name is already taken. :-(
How about 'all'?
All what? -1 Reading the docs, it refers to the entire "matching string", so how about "matching_string"? Or "matched_part"?
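A minimal sketch of the distinction being discussed: `m.string` is the whole target string, while `group(0)` is the matched part. The helper name `matched_string` is hypothetical and is not part of the `re` module; it only shows what such an alias might look like.

```python
import re


def matched_string(m):
    """Hypothetical helper (not in `re`): return the whole matched text."""
    return m.group(0)


# Illustrative input, not from the thread.
m = re.search(r"\d+", "order 66 shipped")

target = m.string             # the full string that was searched
matched = matched_string(m)   # only the part the pattern matched

print(repr(target), repr(matched))
```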
Sounds nice! Here are my names:
matchedText
matched
allGroups
On Wed, Nov 27, 2013 at 4:56 PM, MRAB <python@mrabarnett.plus.com> wrote:
[snip]
-- Ryan When your hammer is C++, everything begins to look like a thumb.
Ryan Gonzalez <rymg19@gmail.com> writes:
Sounds nice!
Here are my names:
matchedText matched allGroups
If those suggestions are for a library you hope to be widely used in Python, or even part of the standard library, please make them PEP 8 conformant (avoid camelCaseNames).
-- Ben Finney
“Our task must be to free ourselves from our prison by widening our circle of compassion to embrace all humanity and the whole of nature in its beauty.” (Albert Einstein)
Whoops...
matched_text
matched
all_groups
Bad habit from C++...
On Wed, Nov 27, 2013 at 6:19 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
[snip]
-- Ryan When your hammer is C++, everything begins to look like a thumb.
On 28/11/2013 01:24, Ryan Gonzalez wrote:
Whoops...
matched_text
The tendency is to refer to it as a "string" rather than "text".
matched all_groups
The complaint was that .group isn't clear enough, so "all_groups" is right out! :-) And there's already .groups...
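For reference, a short sketch of how the existing `.groups()` differs from `.group(0)`, using only the standard `re` module. The patterns and input are illustrative.

```python
import re

m = re.match(r"(\d+)-(\d+)", "10-20")

entire = m.group(0)      # the whole match
subgroups = m.groups()   # capture groups 1..n as a tuple, without group 0

# With no capture groups in the pattern, groups() is an empty tuple,
# while group(0) still returns the matched text.
plain = re.match(r"\d+", "10-20")

print(entire, subgroups, plain.group(0), plain.groups())
```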
Bad habit from C++...
[snip]
From: Ryan Gonzalez <rymg19@gmail.com>
Whoops...
matched_text matched all_groups
Bad habit from C++...
When did C++ change from for_each, find_if, and replace_copy to forEach, findIf, and replaceCopy? The entire C++ standard library, the C standard library that it incorporates by reference, Boost and many related third-party libraries, the POSIX API that's available on almost every platform (even Windows), etc. are all lowercase_with_underscores.
Did you mean "Windows C API" when you said C++?
On 28.11.2013 06:41, Andrew Barnert wrote:
When did C++ change from for_each, find_if, and replace_copy to forEach, findIf, and replaceCopy? The entire C++ standard library, the C standard library that it incorporates by reference, Boost and many related third-party libraries, the POSIX API that's available on almost every platform (even Windows), etc. are all lowercase_with_underscores.
Did you mean "Windows C API" when you said C++?
I don't think C++ stdlib and Boost are the only ways to program in C++ -- take Qt for example, it has the camelCase convention (and don't I wish it hadn't). Georg
On Nov 27, 2013, at 23:43, Georg Brandl <g.brandl@gmx.net> wrote:
I don't think C++ stdlib and Boost are the only ways to program in C++ -- take Qt for example, it has the camelCase convention (and don't I wish it hadn't).
Well, sure, and PyQt and PySide have the same camelCase conventions in Python.
It was the book I used to learn the language that got me into that habit. In fact, I've only used the Windows C API once, and I gave up because I couldn't handle the lack of exception handling. *shudders*
Andrew Barnert <abarnert@yahoo.com> wrote:
[snip]
participants (12)
- Alex Seewald
- Andrew Barnert
- Ben Finney
- Georg Brandl
- Greg Ewing
- MRAB
- Ned Batchelder
- Ryan
- Ryan Gonzalez
- spir
- Stephen J. Turnbull
- Steven D'Aprano