[Python-ideas] Incorporating something like byteplay into the stdlib
Victor Stinner
victor.stinner at gmail.com
Fri Feb 12 07:05:21 EST 2016
Hi,
2016-02-12 4:58 GMT+01:00 Andrew Barnert via Python-ideas
<python-ideas at python.org>:
> tl;dr: We should turn dis.Bytecode into a builtin mutable structure similar to byteplay.Code, to make PEP 511 bytecode transformers implementable.
Hum, it looks like your email is highly coupled to PEP 511.
First of all, I really want to support bytecode transformers because I
would like to be able to disable the peephole optimizer. Having the
peephole optimizer registered as a code transformer, just like AST
transformers, makes the whole PEP more consistent: there is no longer a
special case for the peephole optimizer.
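To make that concrete, here is a minimal sketch of a transformer following
the API proposed in PEP 511. sys.set_code_transformers() and the method
signatures below are the PEP's proposal, not something that exists in the
stdlib today, and the class itself is a made-up no-op:

import sys

class NoopTransformer:
    """Sketch of a PEP 511-style transformer exposing both hooks."""

    # PEP 511 uses the name to build the optimizer tag of .pyc files.
    name = "noop"

    def ast_transformer(self, tree, context):
        # AST-level rewrites would go here; returning the tree unchanged
        # is a valid no-op.
        return tree

    def code_transformer(self, code, context):
        # Bytecode-level rewrites (what the C peephole pass does) would
        # plug in at this level.
        return code

# Proposed by PEP 511; not available in current CPython.
sys.set_code_transformers([NoopTransformer()])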
> Why?
>
> ----
>
> The first problem is that the API for bytecode transformers is incomplete. You don't get, e.g., varnames, so you can't even write a transformer that looks up or modifies locals. But that part is easy to fix, so I won't dwell on it.
I agree that we can enhance the Python stdlib to ease bytecode
manipulation, but I disagree that it's a requirement. It's OK to use an
external library (like byteplay) for that.
> The second problem is that bytecode is just painful to work with. The peephole optimizer deals with this by just punting and returning the original code whenever it sees anything remotely complicated (which I don't think we want to encourage for all bytecode transformers), and it's _still_ pretty hairy code. And it's the kind of code that can easily harbor hard-to-spot and harder-to-debug bugs (line numbers occasionally off on tracebacks, segfaults on about 1/255 programs that do something uncommon, that kind of fun stuff).
Sorry, I don't understand. Are you saying that the CPython peephole
optimizer produces invalid code? Or are you talking about bugs in your
own code? I'm not aware of any bugs in the peephole optimizer.
> The compiler is already doing this work. There's no reason every bytecode processor should have to repeat all of it. I'm not so worried about performance here--technically, fixup is worst-case quadratic, but practically I doubt we spend much time on it--but simplicity. Why should everyone have to repeat dozens of lines of complicated code to write a simple 10-line transformer?
Hum, are you talking about the API proposed in PEP 511?
I understand that you are saying the API only takes a whole code
object as input and produces a code object as output, while an optimizer
usually needs a different structure to be able to modify the code. If
you have multiple bytecode transformers, you have to repeat these
"disassemble" and "assemble" steps, right?
I don't think that we will have plenty of bytecode optimizers in the
wild. Even if two or three major bytecode optimizers become popular,
are you sure that we will want to combine them? I expect that a single
optimizer implements *all* optimizations. I don't see the point of
running multiple optimizers to implement multiple optimization steps.
For example, my fatoptimizer AST optimizer implements multiple steps,
but *internally*. It is only called once on the AST.
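As an illustration of "multiple steps, but internally", one optimizer
object can simply chain its passes in a single call. The pass classes
below are placeholders invented for this sketch, not fatoptimizer's real
ones:

import ast

class ConstantFoldingPass(ast.NodeTransformer):
    """Placeholder for one internal step (e.g. constant folding)."""

class DeadCodeEliminationPass(ast.NodeTransformer):
    """Placeholder for another internal step (e.g. dead code elimination)."""

class SingleOptimizer:
    """One optimizer object, called once; it chains its passes internally."""

    name = "demo"

    def ast_transformer(self, tree, context=None):
        for pass_class in (ConstantFoldingPass, DeadCodeEliminationPass):
            tree = pass_class().visit(tree)
            ast.fix_missing_locations(tree)
        return tree

tree = SingleOptimizer().ast_transformer(ast.parse("print(1 + 1)"))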
I don't think that the performance of importing modules really matters. My
PEP 511 is mostly written for ahead-of-time compilation, to support
complex and expensive optimizers.
The real problem is running a script: "python script.py" always has to
execute all code transformers. For scripts, I hesitate to simply
disable expensive optimizers, or maybe even to disable all optimizers.
For example, if a script runs in less than 50 ms, is it worth spending
10 ms to optimize it to get a speedup of 1 ms? (No.)
The problem with using a specific format for bytecode rather than a code
object is that we will have to maintain it. I'm not sure that all
bytecode optimizers want the same internal structures. For some kinds of
optimizations, a sequential list of instructions is enough. For other
optimizations, you need to split the code into blocks to get a
representation of the exact control flow. I'm not sure that one
structure is enough to cover all cases, so I prefer to let optimizers
"disassemble" and "assemble" the bytecode themselves (see the sketch
below).
One last point: PEP 511 has to take into account the existing peephole
optimizer implemented in C. If you really want to use a different
structure, you will have to reimplement the peephole optimizer with
your new API. Since my target is the AST, I'm not really interested in
that :-)
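Coming back to the flat-list-versus-control-flow point above, here is a
read-only sketch using the stdlib dis module. The basic-block split is
deliberately simplistic; a real control-flow graph builder would also
split after jumps and returns:

import dis

def demo(x):
    if x:
        return 1
    return 2

# A flat, sequential view: enough for simple peephole-style pattern matching.
flat = list(dis.get_instructions(demo))
for instruction in flat:
    print(instruction.offset, instruction.opname, instruction.argval)

# A very rough control-flow view: start a new block at each jump target.
blocks, current = [], []
for instruction in flat:
    if instruction.is_jump_target and current:
        blocks.append(current)
        current = []
    current.append(instruction)
if current:
    blocks.append(current)
print(len(blocks), "basic blocks")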
What do you think?
> I played around with a few possibilities from fixed-width bytecode and uncompressed lnotab to a public version of the internal assembler structs, but I think the best one is a flat sequence of instructions, with pseudo-instructions for labels and line numbers, and jump targets just references to those label instructions.
Could you try to reimplement the whole peephole optimizer to see whether
it benefits from your design?
I played with bytecode in the past. In the end, I started to
implement optimizations which can be implemented more simply at the AST
level. Why do you prefer bytecode over AST? Your example of converting
globals to constants becomes trivial to implement using my new
ast.Constant node.
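A sketch of that transform, assuming the global name (DEBUG) and its value
are known ahead of time (both are invented for the example, and the pass
does not handle local shadowing); it needs a Python where ast.Constant
exists:

import ast

# Hypothetical: pretend we somehow know that the global DEBUG is always False.
KNOWN_GLOBALS = {"DEBUG": False}

class GlobalToConstant(ast.NodeTransformer):
    """Replace loads of known globals with ast.Constant nodes."""

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in KNOWN_GLOBALS:
            return ast.copy_location(
                ast.Constant(value=KNOWN_GLOBALS[node.id]), node)
        return node

tree = ast.parse("if DEBUG:\n    print('debug')\n")
tree = ast.fix_missing_locations(GlobalToConstant().visit(tree))
code = compile(tree, "<demo>", "exec")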
> * It needs a C API, and probably a C implementation.
I don't like extending the Python C API: it is already very large, and we
have too many functions :-p
A C API is more expensive to maintain than a Python API.
I would prefer to continue to play with an external module (hosted on
PyPI) so that we don't pay the maintenance price!
By the way, what is the final goal? Do you plan to implement a new,
ultra-optimized bytecode optimizer? If so, do you plan to integrate
it into CPython? If not, I don't think that we have to pay the
maintenance price for such a "toy" project.
The design of my PEP 511 is to support pluggable and *external*
optimizers. I don't think that any code optimizer in the wild is mature
enough to enter CPython directly.
> Anyway, I realize this API is still a little vague, (...)
It doesn't meet the requirements for putting something into the Python
stdlib. Usually, we experiment with stuff on PyPI, wait until it becomes
mature, and then propose to integrate it.
It looks like you are talking about creating a new API and putting it
directly into the stdlib, right?
Are you sure that it will not change in the next 2 years? Not even a single minor change?
> We could just pass code objects (or all the separate pieces, instead of some of them), and then the docs could suggest using byteplay for non-trivial bytecode transformers, and then everyone will just end up using byteplay.
Again, you need to elaborate on your rationale. What are your use cases?
Which kinds of optimizations do you want to implement?
> So, what's wrong with that? The biggest problem is that, after each new Python release, anyone using a bytecode transformer will have to wait until byteplay is updated before they can update Python.
Why not contribute to byteplay to support the next CPython release?
I don't understand your problem here.
> Or we could just remove bytecode transformers from PEP 511. PEP 511 still seems worth doing to me, even if it only has AST transformers, especially since all or nearly all of the examples anyone's come up with for it are implementable (and easier to implement) at the AST level.
I want to plug the existing peephole optimizer into my PEP 511 since
it is an obvious *code* transformer. Even if its changes are minor, some
users want to disable it because it really changes the code; that would
help code coverage, for example.
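Under the API proposed in PEP 511, a coverage tool could drop that single
pass roughly like this (a sketch: sys.get_code_transformers(),
sys.set_code_transformers() and the "peephole" name come from the PEP's
proposal and do not exist in the stdlib today):

import sys

# Keep every registered transformer except the built-in peephole pass.
transformers = [transformer for transformer in sys.get_code_transformers()
                if transformer.name != "peephole"]
sys.set_code_transformers(transformers)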
Victor