[Python-Dev] \G (match last position) regex operator non-existent in python?
MRAB
python at mrabarnett.plus.com
Sun Oct 29 12:54:09 EDT 2017
On 2017-10-29 12:27, Serhiy Storchaka wrote:
> 27.10.17 18:35, Guido van Rossum wrote:
>> The "why" question is not very interesting -- it probably wasn't in PCRE
>> and nobody was familiar with it when we moved off PCRE (maybe it wasn't
>> even in Perl at the time -- it was ~15 years ago).
>>
>> I didn't understand your description of \G so I googled it and found a
>> helpful StackOverflow article:
>> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
>> From this I understand that when using e.g. findall() it forces
>> successive matches to be adjacent.
>
> This looks too Perlish to me. In Perl, regular expressions are part of
> the language syntax; they can even contain Perl expressions. Arguments to
> them are passed implicitly (as well as to Perl's analogs of str.strip()
> and str.split()) and results are saved in global special variables.
> Loops also can be implicit.
>
> It seems to me that \G makes sense only for re.findall() and
> re.finditer(), not to re.match(), re.search() or re.split().
>
> In Python all this is explicit. Compiled regular expressions are
> objects, and you can pass start and end positions to Pattern.match().
> The Python equivalent of \G looks to me like:
>
> p = re.compile(...)
> i = 0
> while True:
>     m = p.match(s, i)
>     if not m: break
>     ...
>     i = m.end()
>
>
You're correct. \G matches at the start position, so .search(r'\G\w+')
behaves the same as .match(r'\w+').
findall and finditer perform a series of searches, but with \G at the
start they'll perform a series of matches, each anchored at where the
previous one ended.
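
To make the difference concrete, here is a minimal sketch that emulates
the anchored behaviour with the explicit Pattern.match(s, pos) loop from
the quoted message, since the re module has no \G. The helper name
adjacent_matches, the pattern and the sample string are made up for
illustration, and stopping after a zero-width match is a simplifying
assumption.

```python
import re

def adjacent_matches(pattern, s, pos=0):
    """Yield successive matches, each anchored where the previous one ended."""
    while True:
        m = pattern.match(s, pos)  # match() only succeeds starting at pos
        if m is None:
            return
        yield m
        if m.end() == pos:
            # Zero-width match: stop rather than loop forever (assumption).
            return
        pos = m.end()

p = re.compile(r"\d+,?")
s = "1,22,333 x 4,5"

# finditer() searches, so it skips over the " x " in the middle.
searched = [m.group() for m in p.finditer(s)]

# The anchored loop stops at the first position where the pattern fails.
anchored = [m.group() for m in adjacent_matches(p, s)]
```

Here searched picks up the numbers on both sides of " x ", while anchored
ends at "333" because nothing matches at the space that follows it.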
> One can also use the undocumented Pattern.scanner() method. Actually
> Pattern.finditer() is implemented as iter(Pattern.scanner().search).
> iter(Pattern.scanner().match) would return an iterator of adjacent matches.
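
A rough sketch of the scanner approach, assuming CPython's undocumented
Pattern.scanner() keeps its current behaviour (it is not a public API and
could change). The two-argument form of iter() needs a sentinel, so None
is passed to stop at the first failed match. The pattern and sample string
are made up for illustration.

```python
import re

p = re.compile(r"\d+,?")
s = "1,22,333 x 4,5"

# Each scanner remembers its position in s. Calling .search() repeatedly
# scans forward for the next match anywhere after that position.
searched = [m.group() for m in iter(p.scanner(s).search, None)]

# Calling .match() repeatedly only accepts a match starting exactly at the
# current position, so iteration ends at the first gap.
adjacent = [m.group() for m in iter(p.scanner(s).match, None)]
```

With this input, searched agrees with p.finditer(s), and adjacent stops
after "333".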
>
> I think it would be more Pythonic (and much easier) to add a boolean
> parameter to finditer() and findall() than to introduce a \G operator.
>