[Python-ideas] PEP 511: API for code transformers
Victor Stinner
victor.stinner at gmail.com
Fri Jan 15 16:14:17 EST 2016
Wow, giant emails (like mine, ok).
2016-01-15 20:41 GMT+01:00 Andrew Barnert <abarnert at yahoo.com>:
> * You can register transformers in any order, and they're run in the order specified, first all the AST transformers, then all the code transformers. That's very weird; it seems like it would be conceptually simpler to have a list of AST transformers, then a separate list of code transformers.
The goal is to have a short optimizer tag. I'm not sure yet that it
makes sense, but I would like to be able to transform AST and bytecode
in a single code transformer. I prefer to add a single get/set pair of
functions to sys, instead of two pairs (4 new functions).
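To give an idea, registration could look like this (a sketch using the
single get/set pair proposed in the PEP draft; it obviously only runs
on a Python with PEP 511 implemented, and the exact names may still
change):

    import sys

    class NoopTransformer:
        # short name: it contributes to the optimizer tag
        name = "noop"

        def ast_transformer(self, tree, context):
            # a real transformer would rewrite AST nodes here
            return tree

    transformers = sys.get_code_transformers()
    transformers.append(NoopTransformer())
    sys.set_code_transformers(transformers)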
> * Why are transformers objects with ast_transformer and code_transformer methods, but those methods don't take self?
They do take self. It's just a formatting issue (a mistake in the PEP) :-)
They take a self parameter, see the examples:
https://www.python.org/dev/peps/pep-0511/#bytecode-transformer
It's just hard to format a PEP correctly when you're used to Sphinx :-)
I started to use ".. method:: ..." but it doesn't work; PEPs use the
simpler reST format ;-)
> It seems like the only advantage to require attaching them to a class is to associate each one with a name
I started with a function, but it's a little bit weird to set a name
attribute on a function (func.name = "fat"). Moreover, it's convenient
to store some data in the object. In fatoptimizer, I store the
configuration. Even in the simplest AST transformer example of the
PEP, the constructor creates an object:
https://www.python.org/dev/peps/pep-0511/#id1
It may be possible to use functions, but classes are just more
"natural" in Python.
> And is there ever a good use case for putting both in the same class, given that the code transformer isn't going to run on the output of the AST transformer but rather on the output of all subsequent AST transformers and all preceding code transformers?
The two methods are disconnected, but they are linked by the optimizer
tag. IMHO it makes sense to implement all optimizations (crazy stuff on
the AST, a simple peephole-like optimizer on the bytecode) in a single
code transformer. It avoids using a long optimizer tag like
"fat_ast-fat_bytecode". I also like short filenames.
> * Why does the code transformer only take consts and names? Surely you need varnames, and many of the other properties of code objects. And what's the use of lnotab if you can't set the base file and line? In fact, why not just pass a code object?
To be honest, I don't feel comfortable with a function taking 5
parameters which has to return a tuple of 4 items :-/ Especially since
it's only the first version, we may have to add more items later.
The code_transformer() API comes from the PyCode_Optimize() API: the
CPython peephole optimizer.
PyAPI_FUNC(PyObject*) PyCode_Optimize(PyObject *code, PyObject* consts,
PyObject *names, PyObject *lnotab);
The function modifies lnotab in-place and returns the modified code.
Passing a whole code object makes the API much simpler, and code
objects contain all the information. I'll take your suggestion, thanks.
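Something like this (a sketch of the "whole code object" variant, not
what the PEP currently says; code.replace() is assumed here as a
convenient way to rebuild the code object, otherwise you have to call
types.CodeType() with its long list of arguments):

    class BytecodeTransformer:
        name = "ni"

        def code_transformer(self, code, context):
            # replace every string constant, in the spirit of the PEP's
            # "Ni! Ni! Ni!" example, but working on a whole code object
            consts = tuple('Ni! Ni! Ni!' if isinstance(const, str) else const
                           for const in code.co_consts)
            # code.replace() is an assumed helper, see the note above
            return code.replace(co_consts=consts)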
> * It seems like 99% of all ast_transformer methods are just going to construct and apply an ast.NodeTransformer subclass. Why not just register the NodeTransformer subclass?
fatoptimizer doesn't use ast.NodeTransformer ;-)
ast.NodeTransformer has a naive and inefficient design. For example,
fatoptimizer uses a metaclass to create the mapping of visitors
(visit_xxx methods) only once. My transformer copies modified nodes to
leave the input tree unchanged. I need this to be able to duplicate a
tree later (to specialize functions).
(Maybe I could propose to enhance ast.NodeTransformer, but that's a
different topic.)
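To give an idea of the kind of trick I mean (a toy sketch, not
fatoptimizer's real code):

    # Build the visit_xxx dispatch table once per class, instead of the
    # getattr() lookup that ast.NodeVisitor.visit() does for every node.
    class CachedVisitorsMeta(type):
        def __new__(mcls, name, bases, namespace):
            cls = super().__new__(mcls, name, bases, namespace)
            cls._visitors = {attr[len('visit_'):]: getattr(cls, attr)
                             for attr in dir(cls)
                             if attr.startswith('visit_')}
            return cls

    class Transformer(metaclass=CachedVisitorsMeta):
        def visit(self, node):
            visitor = self._visitors.get(type(node).__name__)
            if visitor is None:
                return node
            # visitors return *new* nodes to leave the input tree unchanged
            return visitor(self, node)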
> * There are other reasons to write AST and bytecode transformations besides optimization. MacroPy, which you mentioned, is an obvious example. But also, playing with new ideas for Python is a lot easier if you can do most of it with a simple hook that only makes you deal with the level you care about, rather than hacking up everything from the grammar to the interpreter. So, that's an additional benefit you might want to mention in your proposal.
I wrote "A preprocessor has various and different usages." Maybe I can
elaborate :-)
It looks like it is possible to "implement" f-string (PEP 498) using
macros. I think that it's a good example of experimenting evolutions
of the language (without having to modify the C code which is much
more complex, Yury Selivanov may want to share his experience here for
this async/await PEP ;-)).
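For example, something like this toy sketch (hypothetical and much
less complete than PEP 498) could rewrite calls like f('x={x}') into
'x={x}'.format(x=x) at the AST level:

    import ast
    import string

    class FStringCalls(ast.NodeTransformer):
        def visit_Call(self, node):
            self.generic_visit(node)
            if (isinstance(node.func, ast.Name) and node.func.id == 'f'
                    and len(node.args) == 1
                    and isinstance(node.args[0], ast.Str)):
                # collect the field names used in the template string
                fields = [field for _, field, _, _
                          in string.Formatter().parse(node.args[0].s)
                          if field]
                new = ast.Call(
                    func=ast.Attribute(value=node.args[0], attr='format',
                                       ctx=ast.Load()),
                    args=[],
                    keywords=[ast.keyword(arg=field,
                                          value=ast.Name(id=field,
                                                         ctx=ast.Load()))
                              for field in fields])
                # run ast.fix_missing_locations() on the tree before
                # compiling it
                return ast.copy_location(new, node)
            return node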
> * In fact, I think this PEP could be useful even if the other two were rejected, if rewritten a bit.
Yeah, I tried to split the changes to make them independent.
Only PEP 509 (dict version) is linked to PEP 510 (func specialize).
Even alone, PEP 509 can be used to implement the "copy globals to
locals/constants" optimization mentioned in the PEP (at least two
developers proposed changes to implement it! It was also in the
Unladen Swallow plans).
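To make the idea concrete: this is the optimization that people do by
hand today; with the dict version of PEP 509, an optimizer could do it
automatically and add a cheap guard checking that globals and builtins
were not modified:

    import dis

    def slow(items):
        total = 0
        for item in items:
            total += len(item)    # LOAD_GLOBAL 'len' at each iteration
        return total

    def fast(items, _len=len):    # 'len' looked up once, at definition time
        total = 0
        for item in items:
            total += _len(item)   # LOAD_FAST: no dict lookup at run time
        return total

    dis.dis(slow)
    dis.dis(fast)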
> * It might be useful to have an API that handled bytes and text (and tokens, but that requires refactoring the token stream API, which is a separate project) as well as AST and bytecode.
> (...)
> Is there a reason you can't add text_transformer as well?
I don't know this part of the compiler.
Does Python already have an API to manipulate tokens, etc.? What about
other Python implementations?
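For what it's worth, the stdlib tokenize module can at least
round-trip a token stream today; whether that is suitable as a
compile-time hook is another question. A toy token-level transform,
not tied to the PEP API:

    import io
    import tokenize

    def upper_names(source):
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        # upper-case every NAME token, keep everything else unchanged
        new_tokens = [tok._replace(string=tok.string.upper())
                      if tok.type == tokenize.NAME else tok
                      for tok in tokens]
        return tokenize.untokenize(new_tokens)

    print(upper_names("x = spam + 1\n"))   # X = SPAM + 1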
I proposed AST transformers because they are already commonly used in
the wild. I also proposed bytecode transformers to replace the peephole
optimizer: make it optional and maybe implement a new one (in Python,
to be more easily maintainable?).
The Hy language uses its own parser and emits Python AST. Why not
use this design?
> (...) e.g., if I'm using a text transformer, (...)
IMHO you are going too far; this is getting out of the scope of the PEP.
You should also read the previous discussion:
https://mail.python.org/pipermail/python-dev/2012-August/121309.html
> * It seems like doing any non-trivial bytecode transforms will still require a third-party library like byteplay (which has trailed 2.6, 2.7, 3.x in general, and each new 3.x version by anywhere from 3 months to 4 years). Have you considered integrating some of that functionality into Python itself?
To be honest, right now I'm focused on fatoptimizer. I don't want to
integrate it in the stdlib because:
* it's incomplete: see the giant
https://fatoptimizer.readthedocs.org/en/latest/todo.html list if you
are bored
* the stdlib is moving... well, not really moving... the development
process is way too slow for such a very young project
* fatoptimizer still changes the Python semantics in subtle ways which
should be tested in large applications and discussed point by point
* etc.
It's way too early to discuss that (at least for fatoptimizer).
Since pip has become standard, I don't think that it's a real issue in
practice.
> Even if that's out of scope, a paragraph explaining how to use byteplay with a code_transformer, and why it isn't integrated into the proposal, might be helpful.
byteplay doesn't seem to be maintained anymore; last commit in 2010...
IMHO you can do the same as byteplay on the AST with much simpler
code. I only mentioned some projects modifying bytecode to give ideas
of what can be done with a code transformer.
I don't think that it's worth adding more examples than the two "Ni!
Ni! Ni!" examples.
> * One thing I've always wanted is a way to write decorators that transform at the AST level. But code objects only have bytecode and source;
You should take a look at MacroPy; it looks like it has some crazy
stuff to modify the AST and compile at runtime. I'm not sure, I've
never used MacroPy, I only read its documentation to generalize my
PEP ;-)
Modifying and recompiling the code at runtime (using the AST, something
higher level than bytecode) sounds like a Lisp feature and like a JIT
compiler, two cool things ;)
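Something in that direction, as a rough sketch (hypothetical code, not
MacroPy's actual API): a decorator which re-parses the function's
source, rewrites the AST and recompiles it at runtime. It only works
when the source is available, of course:

    import ast
    import functools
    import inspect
    import textwrap

    def ast_rewrite(transformer):
        def decorator(func):
            source = textwrap.dedent(inspect.getsource(func))
            tree = ast.parse(source)
            # drop the decorators so they are not applied a second time
            tree.body[0].decorator_list = []
            tree = ast.fix_missing_locations(transformer.visit(tree))
            namespace = dict(func.__globals__)
            exec(compile(tree, inspect.getfile(func), 'exec'), namespace)
            return functools.wraps(func)(namespace[func.__name__])
        return decorator

    # usage: @ast_rewrite(MyNodeTransformer())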
Victor