[Python-Dev] \G (match last position) regex operator non-existent in python?

Serhiy Storchaka storchaka at gmail.com
Sun Oct 29 08:27:22 EDT 2017

27.10.17 18:35, Guido van Rossum wrote:
> The "why" question is not very interesting -- it probably wasn't in PCRE 
> and nobody was familiar with it when we moved off PCRE (maybe it wasn't 
> even in Perl at the time -- it was ~15 years ago).
> I didn't understand your description of \G so I googled it and found a 
> helpful StackOverflow article: 
> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex. 
>  From this I understand that when using e.g. findall() it forces 
> successive matches to be adjacent.

This looks too Perlish to me. In Perl, regular expressions are part of 
the language syntax, and they can even contain Perl expressions. Arguments 
are passed to them implicitly (as they are to Perl's analogs of 
str.strip() and str.split()), and results are saved in special global 
variables. Loops can also be implicit.

It seems to me that \G makes sense only for re.findall() and 
re.finditer(), not for re.match(), re.search() or re.split().

In Python all this is explicit. Compiled regular expressions are 
objects, and you can pass start and end positions to Pattern.match(). 
The Python equivalent of \G looks to me like:

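import re

p = re.compile(...)    # the pattern; s below is the string being scanned
i = 0
while True:
    m = p.match(s, i)  # anchored at position i, like \G
    if not m:
        break
    i = m.end()        # the next match must start where this one ended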
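
As a rough sketch, the loop above can be wrapped in a generator. The name iter_adjacent and the sample pattern and string are made up for illustration, and the guard against empty matches is an addition: without it, a pattern that matches the empty string would never advance.

```python
import re

def iter_adjacent(pattern, s, pos=0):
    """Yield successive matches, each starting where the previous one ended."""
    while True:
        m = pattern.match(s, pos)  # anchored at pos, like \G
        if m is None:
            break
        yield m
        if m.end() == pos:
            # Empty match: stop rather than loop forever at the same position.
            break
        pos = m.end()

# Hypothetical example data: comma-separated lowercase words.
word = re.compile(r"[a-z]+,?")
text = "ab,cd,ef gh"

found = [m.group() for m in iter_adjacent(word, text)]
```

Here the adjacent matches stop at the space, whereas word.findall(text) would skip over it and also return the trailing "gh".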

One can also use the undocumented Pattern.scanner() method. Actually, 
Pattern.finditer() is implemented as iter(Pattern.scanner().search, None). 
iter(Pattern.scanner().match, None) would return an iterator of adjacent 
matches.
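
A small sketch of the scanner approach, assuming CPython, where Pattern.scanner() exists but is undocumented and so may change. The pattern and input string are invented for the example.

```python
import re

p = re.compile(r"\d+;")
s = "1;22;x;333;"

# The two-argument form of iter() calls the method repeatedly until it
# returns None.
# scanner.match: each match must begin exactly where the previous one ended.
adjacent = [m.group() for m in iter(p.scanner(s).match, None)]

# scanner.search: may skip over non-matching text, like finditer().
everywhere = [m.group() for m in iter(p.scanner(s).search, None)]
```

With this input, the match-based iterator stops at the "x", while the search-based one continues past it and agrees with p.findall(s).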

I think it would be more Pythonic (and much easier) to add a boolean 
parameter to finditer() and findall() than to introduce a \G operator.
