[Python-Dev] \G (match last position) regex operator non-existent in python?

Guido van Rossum guido at python.org
Fri Oct 27 11:35:58 EDT 2017


The "why" question is not very interesting -- it probably wasn't in PCRE
and nobody was familiar with it when we moved off PCRE (maybe it wasn't
even in Perl at the time -- it was ~15 years ago).

I didn't understand your description of \G so I googled it and found a
helpful StackOverflow article:
https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
From this I understand that when using e.g. findall() it forces successive
matches to be adjacent.
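
To make the adjacency concrete, here is a minimal sketch of how it can be
approximated today without \G, by calling the compiled pattern's match()
with an explicit start position and advancing it by hand. The helper name
iter_adjacent and the sample data are assumptions made up for illustration;
neither is part of the re module.

```python
import re

def iter_adjacent(pattern, text, pos=0):
    """Yield successive matches, each starting where the previous one ended.

    Hypothetical helper for illustration; not part of the re module.
    """
    while pos <= len(text):
        # match() with a pos argument only succeeds at exactly that position.
        m = pattern.match(text, pos)
        if m is None or m.end() == pos:
            # No match here, or an empty match that would loop forever: stop.
            break
        yield m
        pos = m.end()

pair = re.compile(r'(\w+)=(\d+);')
text = 'a=1;b=22;oops c=3;'

# Stops at "oops", because nothing matches at that exact position.
adjacent = [m.groups() for m in iter_adjacent(pair, text)]

# findall() searches ahead, so it skips "oops " and also picks up c=3.
everything = pair.findall(text)
```

Here adjacent holds only the a and b pairs, while everything also contains
the c pair that follows the unparseable text.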

In general this seems to be a unique property of \G: it preserves *state*
from one match to the next. This will make it somewhat difficult to
implement -- e.g. that state should probably be thread-local in case
multiple threads use the same compiled regex. It's also unclear when that
state should be reset. (Only when you compile the regex? Each time you pass
it a different source string?)
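
As a sketch of one possible answer to those questions (not a proposal for
the actual semantics), the wrapper below keeps the last position per thread
and resets it whenever it is handed a different string object. The class
name StatefulMatcher, its next_match() method, and the reset rule are all
assumptions chosen for illustration.

```python
import re
import threading

class StatefulMatcher:
    """Hypothetical wrapper that remembers where the last match ended."""

    def __init__(self, pattern):
        self._pattern = re.compile(pattern)
        # threading.local gives every thread its own text/pos attributes.
        self._state = threading.local()

    def next_match(self, text):
        # Assumed reset rule: a different string object starts again at 0.
        if getattr(self._state, 'text', None) is not text:
            self._state.text = text
            self._state.pos = 0
        m = self._pattern.match(text, self._state.pos)
        if m is not None:
            self._state.pos = m.end()
        return m

word = StatefulMatcher(r'\w+\s*')
text = 'foo bar'

first = word.next_match(text).group()    # starts at position 0
second = word.next_match(text).group()   # continues where first ended

# Another thread sharing the same matcher starts from its own position 0.
seen_in_thread = []
worker = threading.Thread(
    target=lambda: seen_in_thread.append(word.next_match(text).group()))
worker.start()
worker.join()
```

Even this small version has to pick a reset rule (object identity here),
which is the kind of decision a real specification would need to pin down.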

So I'm not sure it's reasonable to add. But I also don't see a reason why
it shouldn't be added -- presuming we can decide on good answers to the
questions above about the "scope" of the anchor.

I think it's okay to start a discussion on bugs.python.org about the
precise specification of \G for Python. OTOH I expect that most core devs
won't find this a very interesting problem (Python relies on regexes for
parsing a lot less than Perl does).

Good luck!

On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22 at gmail.com> wrote:

> All,
>
> perl has a regex assertion (\G) that allows multiple-match regular
> expressions to be able to use the position of the last match. Perl's
> documentation puts it this way:
>
>     \G Match only at pos() (e.g. at the end-of-match position of
>     prior m//g)
>
> Anyways, this is exceedingly powerful for matching regularly
> structured free-form records, and I was really surprised when I found
> out that python did not have it. For example, if findall supported
> this, it would be possible to write things like this (a quick and
> dirty ifconfig parser):
>
> pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)
>
> val = """
> eth2      Link encap:Ethernet  HWaddr xx
>              inet addr: xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx
> ...
> lo        Link encap:Local Loopback
>            inet addr:127.0.0.1  Mask:255.0.0.0
> """
> matches = re.findall(pat, val)
>
> So - why doesn't Python have this? Is it something that simply was
> overlooked, or is there another method of doing the same thing with
> arbitrarily complex freeform records?
>
> thanks much..



-- 
--Guido van Rossum (python.org/~guido)