
Dear Idealists,

I notice that when you use grouping in the re module, it only tells you what it matched last.

Dumb Real Python Code:

    >>> import re
    >>> match = re.search('^(?P<letter>[a-z])*$', 'abcz')
    >>> match.groupdict()
    {'letter': 'z'}

What happened to all the other matches? Now here is a cool idea.

Cool Improved Python Code:

    >>> import re
    >>> match = re.search('^(?P<letter>[a-z])*$', 'abcz')
    >>> match.groupdict()
    {'letter.0':'a', 'letter.1':'b', 'letter.2':'c', 'letter.3':'z'}

Now we see everything it matched. The problem with this, as with all ideas, is backward compatibility, so it would have to be an addition instead:

    >>> import re
    >>> match = re.search('^(?PP<number>[a-z])* and also (?PP=letter.0)(?PP=letter.-1)$', 'abcz and also az')
    >>> match.groupdict()
    {'letter.0':'a', 'letter.1':'b', 'letter.2':'c', 'letter.3':'z'}

Notice how I added an extra P. I also made it so that matching it in the text is more adaptable. Please consider this idea.

Sincerely,
Me

Could you elaborate on the change? I don't understand your modification. The regex is a different one than the original, as well.

I do agree that remembering all the groups would be nice, at least if it could be done reasonably.

Devin

On Sun, Jul 31, 2011 at 8:41 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Could you elaborate on the change? I don't understand your modification. The regex is a different one than the original, as well.

What do you mean by elaborate on the change? You mean explain? I guess I could do it in more detail. What would happen is, if you do something like

    >>> match = re.search('^(?PP<tag>[a-z])*$', 'abc')

then match.groupdict() would return

    {'tag.0':'a', 'tag.1':'b', 'tag.2':'c', 'tag.-1':'c', 'tag.-2':'b', 'tag.-3':'a'}

Notice the PP. This means that it will save every time the group matches. It does this by adding a dot and an index after the tag. It also supports negative indexing in case you want the last time it matched. All of these can be used with the old backreference syntax, as in (?P=tag.-2). Also, are there any forbidden characters in a tag? If so, one of them would be good to use here so it won't clash with current tags.
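
For illustration only, the dictionary described above can be emulated in plain Python today. This is a minimal sketch and not part of the re module: the helper name emulate_pp_groupdict is hypothetical, the (?PP<tag>...) syntax does not exist, and the captures are collected with re.findall() on the assumption that the input consists solely of single-letter items.

```python
import re


def emulate_pp_groupdict(name, captures):
    """Build the 'name.index' mapping sketched in the proposal (hypothetical helper)."""
    result = {}
    count = len(captures)
    for index, value in enumerate(captures):
        # Forward index: tag.0, tag.1, ...
        result['%s.%d' % (name, index)] = value
        # Negative index counted from the end: tag.-1 is the last capture.
        result['%s.%d' % (name, index - count)] = value
    return result


# re.findall() stands in for the repeated group: one entry per letter.
captures = re.findall('[a-z]', 'abc')
proposed = emulate_pp_groupdict('tag', captures)
print(proposed['tag.0'], proposed['tag.-1'])
```

Note that every capture is stored twice, once under its forward index and once under its negative index.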

On Jul 31, 2011 8:57 PM, "Christopher King" <g.nius.ck@gmail.com> wrote:
What would happen is if you do something like.

    >>> match = re.search('^(?PP<tag>[a-z])*$', 'abc')

Then the match.groupdict() would return

    {'tag.0':'a', 'tag.1':'b', 'tag.2':'c', 'tag.-1':'c', 'tag.-2':'b', 'tag.-3':'a'}

notice the PP. This means that it will save all the times it matches.

If you want to return something that supports negative indexing, why not return a list instead of an ad-hoc string representation?
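
A rough sketch of what the list-based alternative could look like with the existing re module. The helper name captures_as_list is hypothetical, and it assumes that rescanning with re.finditer() splits the text the same way the repeated group would, which holds for simple items such as [a-z] but not for every pattern.

```python
import re


def captures_as_list(item_pattern, text):
    """Return each item as a list entry if text is made up only of such items.

    Hypothetical helper: returns None when the text as a whole does not match.
    """
    # First check that the whole string is a repetition of the item.
    if re.fullmatch('(?:%s)*' % item_pattern, text) is None:
        return None
    # Then collect every individual item in order.
    return [m.group(0) for m in re.finditer(item_pattern, text)]


tags = captures_as_list('[a-z]', 'abc')
print(tags)      # ['a', 'b', 'c']
print(tags[-1])  # c
print(tags[-3])  # a
```

A list gives negative indexing, slicing and len() for free, with no need to encode indexes into dictionary keys.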

Tim Lesher wrote:
On Jul 31, 2011 8:57 PM, "Christopher King" <g.nius.ck@gmail.com> wrote:
{'tag.0':'a', 'tag.1':'b', 'tag.2':'c', 'tag.-1':'c', 'tag.-2':'b', 'tag.-3':'a'}
why not return a list instead of an ad-hoc string representation?

That's my thought, too. The proposed scheme looks very unpythonic.

--
Greg

On Mon, Aug 1, 2011 at 10:56 AM, Christopher King <g.nius.ck@gmail.com> wrote:
On Sun, Jul 31, 2011 at 8:41 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Could you elaborate on the change? I don't understand your modification. The regex is a different one than the original, as well.
What do you mean by elaborate on the change. You mean explain. I guess I could do it in more detail.
By elaborate on the change, I expect Devin meant a more accurate description of the problem you're trying to solve without the confusing and irrelevant noise about named groups. Specifically:
    >>> match = re.search('^([a-z])*$', 'abcz')
    >>> match.groups()
    ('z',)

You're asking for '*' and '+' to change the group numbers based on the number of matches that actually occur. This is untenable, which should become clear as soon as another group is placed after the looping constructs:

    >>> match = re.search('^([a-y])*(.*)$', 'abcz')
    >>> match.groups()
    ('c', 'z')

Group names/numbers are assigned when the regex is compiled. They cannot be affected by runtime information based on the string being processed. The way to handle this (while still using the re module to do the parsing) is multi-level parsing:

    >>> match = re.search('^([a-z]*)$', 'abcz')
    >>> relevant = match.group(0)
    >>> pattern = re.compile('([a-z])')
    >>> for match in pattern.finditer(relevant):
    ...     print(match.groups())
    ...
    ('a',)
    ('b',)
    ('c',)
    ('z',)

There's no reason to try to embed the functionality of finditer() into the regex itself (and it's utterly impractical to do so anyway).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
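
The point about group numbers being fixed at compile time can be checked directly. A small sketch using only the standard re module; the group names letter and rest are added here purely for illustration and are not in the messages above.

```python
import re

# Two groups exist as soon as the pattern is compiled, before any input is seen.
pattern = re.compile(r'^(?P<letter>[a-y])*(?P<rest>.*)$')
group_count = pattern.groups
group_numbers = dict(pattern.groupindex)
print(group_count)    # 2
print(group_numbers)  # {'letter': 1, 'rest': 2}

# However many times the first group repeats, the result always has two slots.
results = {text: pattern.search(text).groups() for text in ('abcz', 'z', '')}
print(results['abcz'])  # ('c', 'z')
print(results['z'])     # (None, 'z')
print(results[''])      # (None, '')
```

The second group always lands in slot 2, which is why a repeated group cannot claim a variable number of slots.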

On 01.08.2011 02:36, Christopher King wrote:
Dear Idealists, I notice that when you do use the re module grouping, that it only tells you what it matched last:
Dumb Real Python Code:

    >>> import re
    >>> match = re.search('^(?P<letter>[a-z])*$', 'abcz')
    >>> match.groupdict()
    {'letter': 'z'}

What happened to all the other matches. Now here is a cool idea.
Cool Improved Python Code

    >>> import re
    >>> match = re.search('^(?P<letter>[a-z])*$', 'abcz')
    >>> match.groupdict()
    {'letter.0':'a', 'letter.1':'b', 'letter.2':'c', 'letter.3':'z'}

The "regex" module by Matthew Barnett already supports this: https://code.google.com/p/mrab-regex-hg/

Georg

On Mon, Aug 1, 2011 at 3:12 PM, Georg Brandl <g.brandl@gmx.net> wrote:
The "regex" module by Matthew Barnett already supports this:
The PyPI page is more helpful, since it has the docs: http://pypi.python.org/pypi/regex (the relevant section is the captures() API under "Repeated captures")

So clearly it sets up the additional storage under the hood when the pattern is compiled.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
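
A brief sketch of the captures() API mentioned above. It assumes the third-party regex package is installed (it is not part of the standard library), so the import is guarded; the wrapper name letter_captures is hypothetical.

```python
try:
    import regex  # third-party package from PyPI, not the stdlib re module
except ImportError:
    regex = None


def letter_captures(text):
    """Return every capture of the repeated group, or None if unavailable.

    Hypothetical wrapper: None means either the regex package is missing
    or the text did not match.
    """
    if regex is None:
        return None
    match = regex.search(r'^(?P<letter>[a-z])*$', text)
    if match is None:
        return None
    # captures() returns all iterations of the group; group() only the last.
    return match.captures('letter')


print(letter_captures('abcz'))  # ['a', 'b', 'c', 'z'] when regex is installed
```

Here the result is already a list, so negative indexes such as [-1] work without any special naming scheme.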
participants (6)
- Christopher King
- Devin Jeanpierre
- Georg Brandl
- Greg Ewing
- Nick Coghlan
- Tim Lesher