[Python-Dev] \G (match last position) regex operator non-existant in python?
Guido van Rossum
guido at python.org
Fri Oct 27 11:57:57 EDT 2017
Oh. Yes, that is being discussed about once a year too. It seems Matthew
isn't very interested in helping out with the port, and there are some
concerns about backwards compatibility with the `re` module. I think it
needs a champion!
On Fri, Oct 27, 2017 at 8:50 AM, Tim Peters <tim.peters at gmail.com> wrote:
> Note that Matthew Barnett's `regex` module already supports \G, and a
> great many other features that weren't around 15 years ago ;-) either:
> I haven't followed this in detail. I'm just surprised once per year
> that it hasn't been folded into the core ;-)
> [nothing new below]
> On Fri, Oct 27, 2017 at 10:35 AM, Guido van Rossum <guido at python.org> wrote:
> > The "why" question is not very interesting -- it probably wasn't in PCRE and
> > nobody was familiar with it when we moved off PCRE (maybe it wasn't even in
> > Perl at the time -- it was ~15 years ago).
> > I didn't understand your description of \G so I googled it and found a
> > helpful StackOverflow article:
> > https://stackoverflow.com/questions/21971701/when-is-g-
> > From this I understand that when using e.g. findall() it forces
> > matches to be adjacent.
> > In general this seems to be a unique property of \G: it preserves *state*
> > from one match to the next. This will make it somewhat difficult to
> > implement -- e.g. that state should probably be thread-local in case
> > multiple threads use the same compiled regex. It's also unclear when that
> > state should be reset. (Only when you compile the regex? Each time you pass
> > it a different source string?)
> > So I'm not sure it's reasonable to add. But I also don't see a reason why it
> > shouldn't be added -- presuming we can decide on good answers for the
> > questions above about the "scope" of the anchor.
> > I think it's okay to start a discussion on bugs.python.org about the
> > specification of \G for Python. OTOH I expect that most core devs won't find
> > this a very interesting problem (Python relies on regexes for parsing a lot
> > less than Perl does).
> > Good luck!
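
For readers following the "adjacent matches" and "state" points above, here is a minimal sketch of how that behaviour can be approximated with the stdlib `re` module today. It relies on the `pos` argument of a compiled pattern's `match()` method, which anchors the attempt at that offset. The helper name `adjacent_matches` and the sample pattern and text are made up for illustration; this is not a proposed design for \G.

```python
import re

def adjacent_matches(pattern, text, pos=0):
    """Yield matches that each start exactly where the previous one ended.

    Hypothetical helper: the "last position" lives in a local variable,
    so nothing is stored on the compiled pattern itself.
    """
    while pos <= len(text):
        m = pattern.match(text, pos)  # match() only succeeds starting at pos
        if m is None or m.end() == pos:
            break  # stop at the first gap (or at a zero-width match)
        yield m
        pos = m.end()

pair = re.compile(r'(\w+)=(\d+);')
text = 'a=1;b=22;oops c=3;'

# Only the leading run of adjacent key=value pairs is returned.
adjacent = [m.groups() for m in adjacent_matches(pair, text)]

# findall() skips over the junk and also picks up the later pair.
everything = pair.findall(text)
```

Because the position is an ordinary local variable, the thread-safety and reset questions do not come up in this form; they would only arise if the position were remembered by the pattern object.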
> > On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22 at gmail.com> wrote:
> >> All,
> >> perl has a regex assertion (\G) that allows multiple-match regular
> >> expressions to be able to use the position of the last match. Perl's
> >> documentation puts it this way:
> >> \G Match only at pos() (e.g. at the end-of-match position of prior
> >> m//g)
> >> Anyways, this is exceedingly powerful for matching regularly
> >> structured free-form records, and I was really surprised when I found
> >> out that python did not have it. For example, if findall supported
> >> this, it would be possible to write things like this (a quick and
> >> dirty ifconfig parser):
> >> pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)
> >> val = """
> >> eth2      Link encap:Ethernet HWaddr xx
> >>           inet addr: xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:xx.xx.xx.xx
> >> ...
> >> lo        Link encap:Local Loopback
> >>           inet addr:127.0.0.1 Mask:255.0.0.0
> >> """
> >> matches = re.findall(pat, val)
> >> So - why doesn't python have this? is it something that simply was
> >> overlooked, or is there another method of doing the same thing with
> >> arbitrarily complex freeform records?
> >> thanks much..
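
On the "another method" question, one possible approach with plain `re` is to drop the \G and drive the scan by hand with `match(text, pos)`. The sketch below assumes that continuation lines in the ifconfig output are indented and that the text starts directly with an interface name; the names `record`, `parse_records` and `sample` are hypothetical, and the sample data is shortened from the example above.

```python
import re

# Assumed layout: interface name in column 0, continuation lines indented.
sample = (
    "eth2      Link encap:Ethernet  HWaddr xx\n"
    "          inet addr:xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx\n"
    "lo        Link encap:Local Loopback\n"
    "          inet addr:127.0.0.1  Mask:255.0.0.0\n"
)

# Same shape as the pattern in the post, minus the \G anchor.
record = re.compile(r'(\S+)(.*?\n)(?=\S|\Z)', re.S)

def parse_records(text):
    """Return (name, body) pairs, requiring each record to follow the last."""
    records = []
    pos = 0
    while pos < len(text):
        m = record.match(text, pos)  # anchored at pos, so no input is skipped
        if m is None:
            raise ValueError("unparseable input at offset %d" % pos)
        records.append((m.group(1), m.group(2)))
        pos = m.end()
    return records

records = parse_records(sample)
names = [name for name, _ in records]
```

Raising on a failed match is a design choice here: it makes unexpected input visible instead of silently skipping it, which is roughly the guarantee \G is being asked for.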
> >> _______________________________________________
> >> Python-Dev mailing list
> >> Python-Dev at python.org
> >> https://mail.python.org/mailman/listinfo/python-dev
> >> Unsubscribe:
> >> https://mail.python.org/mailman/options/python-dev/guido%40python.org
> > --
> > --Guido van Rossum (python.org/~guido)
--Guido van Rossum (python.org/~guido)