[Python-ideas] Exposing regular expression bytecode

Stephen J. Turnbull stephen at xemacs.org
Tue Feb 16 21:02:34 EST 2016


Executive summary:  Very process-meta.  Suggestion: good GSoC project?

Jonathan Goble writes:

 > That's exactly the type of tools I envision being made available by
 > third parties.

Experience shows that such visions mostly remain dreams.  Dreaming is
good, but Python demands somewhat more for inclusion in the core.  Not
that much -- Victor Stinner's FAT Python is a good example.  (I think
that's being discussed on the python-dev list, easy to find in the
archives.)  What he's *actually* doing is (conceptually, I haven't
looked at the actual patch) a somewhat invasive modification of the
core compilation process.  But along with that he's demonstrated
several practical optimizations that are enabled by his change.[1]
(Note that he brought the patch with him when proposing his change,
too.  I guess he wrote the code when he woke up in the morning. :-)

That created a certain amount of buzz, and some people see the needed
changes as simplifying the whole process, which brought them on
board.

 > Depending on how much I get invested into this project,
 > I may even write such a tool myself (though that's not guaranteed).

This is exactly backwards from the point of view of getting it into
the stdlib.  What the deafening silence was saying (and what the
actual posts say!) is that nobody else is going to do it.  Features
that aren't going to be exploited fairly soon are complications, and
that is against a fundamental design principle (the "Zen of Python",
try "python -m this | grep -i comp" if you haven't seen it before).

Which gives me an idea: Victor proposed writing optimizations for FAT
Python as a Google Summer of Code project for students.  If you can
find an experienced developer to mentor with you, the basic change
sounds like an easy project for a student, and the tool like something
quite advanced but still feasible.  If you want to know more, write me
off-list.  I can't mentor, but I can help with the admin details.

FMI: https://wiki.python.org/moin/SummerOfCode/2016

 > On Tue, Feb 16, 2016 at 4:55 AM, Paul Moore <p.f.moore at gmail.com> wrote:
 > > Sorry. I don't personally have any issue with the proposal, and it
 > > sounds like a reasonable idea. I don't think it's likely to be
 > > *hugely* controversial

Agreed.  The only real risks in exposing an existing internal
attribute are (1) complication and (2) future maintenance cost for a
little-used feature (e.g., if Python decides to anoint regex).  It might
languish in the tracker for quite a while if you can't demonstrate
real use cases, though.  (The educational aspect of being able to
merely list the compiled bytecodes readably might be enough, but I
would bet against it.)
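
For a rough idea of what "listing readably" could look like, here is a
minimal sketch that uses only the documented re.DEBUG flag, which prints
debug information while a pattern is compiled.  It is not the attribute
under discussion, and the helper name dump_regex is made up for
illustration.  The exact text printed varies between Python versions.

```python
import contextlib
import io
import re


def dump_regex(pattern):
    """Return whatever re.DEBUG prints while compiling `pattern`.

    dump_regex is a hypothetical helper, not part of the re module.
    """
    buf = io.StringIO()
    # re.DEBUG writes its listing to stdout, so capture it in a buffer.
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


listing = dump_regex(r"a+b")
print(listing)
```

A third-party tool along the lines Jonathan describes would presumably
start from something like this and add nicer formatting on top.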

 > I don't think this would be major enough to require a PEP,

Definitely bet against needing a PEP.  This doesn't change the language
or violate backward compatibility at all, and the design looks quite
straightforward, including the potential tools (ISTR you saying you've
seen them for other regexp engines? at least the UI, and maybe the
algorithms, can be borrowed).

Footnotes: 
[1]  They all give 1-10% on microbenchmarks, so individually they're
insignificant.  But as the apocryphal congressman said about billions
of dollars, "1% here, 5% there, and pretty soon you're talking about
perceptible speedups", and that gets certain people excited.



