[Python-Dev] \G (match last position) regex operator non-existent in Python?

Ed Peschko horos22 at gmail.com
Fri Oct 27 13:05:28 EDT 2017


>  From this I understand that when using e.g. findall() it forces successive matches to be adjacent.

Yes, I admit that this is a clearer description of what \G does. My
only defense is that I wrote my description when it was late. :)

I can only stress how useful it is, especially for debugging regexes.
Basically if you are cutting up any string into discrete chunks, you
want to make sure that you aren't missing any chunks in the middle
when you do the cut.

Without \G, you can miss large sections of the string, and that is easy
to overlook. With \G, you are guaranteed to see exactly where your regex
falls down. In addition, there are specific regexes that you can only
write with \G (e.g. C parsers).
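
To illustrate the point about missed chunks, here is a minimal sketch of
how the adjacency that \G provides can be approximated with the stdlib re
module, assuming that a compiled pattern's match(string, pos) is an
acceptable stand-in (it only matches starting exactly at pos). The helper
name adjacent_matches and the "key=value;" record format are hypothetical,
made up for this example; this is not a proposal for how \G itself should
be implemented.

```python
import re

def adjacent_matches(pattern, text):
    """Yield successive matches, each starting where the previous one ended.

    Hypothetical helper: Pattern.match(text, pos) only succeeds at pos,
    which approximates what \\G enforces in Perl.
    """
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        # Fail loudly on a gap (or an empty match that would loop forever).
        if m is None or m.end() == pos:
            raise ValueError(f"no match at offset {pos}: {text[pos:pos + 20]!r}")
        yield m
        pos = m.end()

# Hypothetical record format, for illustration only: "key=value;" chunks.
record = re.compile(r'(\w+)=(\w+);')

good = "a=1;b=2;c=3;"
print([m.groups() for m in adjacent_matches(record, good)])

bad = "a=1;oops b=2;"
print(record.findall(bad))      # findall silently skips over "oops "
try:
    list(adjacent_matches(record, bad))
except ValueError as exc:
    print(exc)                  # reports the offset where matching broke down
```

With plain findall() the stray "oops " is skipped without any complaint,
whereas the loop stops at the first position the pattern cannot consume,
which is the debugging behaviour described above.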

Anyway, I'll look at the regex module.


On Fri, Oct 27, 2017 at 8:35 AM, Guido van Rossum <guido at python.org> wrote:
> The "why" question is not very interesting -- it probably wasn't in PCRE and
> nobody was familiar with it when we moved off PCRE (maybe it wasn't even in
> Perl at the time -- it was ~15 years ago).
>
> I didn't understand your description of \G so I googled it and found a
> helpful StackOverflow article:
> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
> From this I understand that when using e.g. findall() it forces successive
> matches to be adjacent.
>
> In general this seems to be a unique property of \G: it preserves *state*
> from one match to the next. This will make it somewhat difficult to
> implement -- e.g. that state should probably be thread-local in case
> multiple threads use the same compiled regex. It's also unclear when that
> state should be reset. (Only when you compile the regex? Each time you pass
> it a different source string?)
>
> So I'm not sure it's reasonable to add. But I also don't see a reason why it
> shouldn't be added -- presuming we can decide on good answers for the
> questions above about the "scope" of the anchor.
>
> I think it's okay to start a discussion on bugs.python.org about the precise
> specification of \G for Python. OTOH I expect that most core devs won't find
> this a very interesting problem (Python relies on regexes for parsing a lot
> less than Perl does).
>
> Good luck!
>
> On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22 at gmail.com> wrote:
>>
>> All,
>>
>> Perl has a regex assertion (\G) that allows multiple-match regular
>> expressions to be able to use the position of the last match. Perl's
>> documentation puts it this way:
>>
>>     \G Match only at pos() (e.g. at the end-of-match position of prior
>> m//g)
>>
>> Anyways, this is exceedingly powerful for matching regularly
>> structured free-form records, and I was really surprised when I found
>> out that python did not have it. For example, if findall supported
>> this, it would be possible to write things like this (a quick and
>> dirty ifconfig parser):
>>
>> pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)
>>
>> val = """
>> eth2      Link encap:Ethernet  HWaddr xx
>>              inet addr: xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx
>> ...
>> lo        Link encap:Local Loopback
>>            inet addr:127.0.0.1  Mask:255.0.0.0
>> """
>> matches = re.findall(pat, val)
>>
>> So - why doesn't Python have this? Is it something that simply was
>> overlooked, or is there another method of doing the same thing with
>> arbitrarily complex freeform records?
>>
>> thanks much..
>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)

