[Python-ideas] PEP 511: API for code transformers

Victor Stinner victor.stinner at gmail.com
Fri Jan 15 11:10:25 EST 2016


Hi,

This PEP 511 is part of a series of 3 PEPs (509, 510, 511) adding an API
to implement a static Python optimizer specializing functions with
guards.

If the PEP is accepted, it will solve a long list of issues. Some of
them are old, like #1346238 which is 11 years old ;-) I found 12
issues:

* http://bugs.python.org/issue1346238
* http://bugs.python.org/issue2181
* http://bugs.python.org/issue2499
* http://bugs.python.org/issue2506
* http://bugs.python.org/issue4264
* http://bugs.python.org/issue7682
* http://bugs.python.org/issue10399
* http://bugs.python.org/issue11549
* http://bugs.python.org/issue17068
* http://bugs.python.org/issue17430
* http://bugs.python.org/issue17515
* http://bugs.python.org/issue26107

I worked to make the PEP more generic than "this hook is written for
FAT Python". Please read the full PEP to see a long list of existing
uses of code transformers in Python.

You may read again the discussion which occurred 4 years ago about the
same topic:
https://mail.python.org/pipermail/python-dev/2012-August/121286.html
(the thread starts with the idea of an AST optimizer, but it moves
quickly to a generic API to transform the code)

Thanks to Red Hat for giving me time to experiment on this.
Victor


HTML version:
https://www.python.org/dev/peps/pep-0511/


PEP: 511
Title: API for code transformers
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6

Abstract
========

Propose an API to register bytecode and AST transformers. Also add a
``-o OPTIM_TAG`` command line option to change ``.pyc`` filenames;
``-o noopt`` disables the peephole optimizer. Raise an ``ImportError``
exception on import if the ``.pyc`` file is missing and the code
transformers required to transform the code are missing. Code
transformers are not needed for code transformed ahead of time (loaded
from ``.pyc`` files).


Rationale
=========

Python does not provide a standard way to transform the code. Projects
transforming the code use various hooks. The MacroPy project uses an
import hook: it adds its own module finder in ``sys.meta_path`` to
hook its AST transformer. Another option is to monkey-patch the
builtin ``compile()`` function. There are even more options to
hook a code transformer.

Python 3.4 added a ``source_to_code()`` method to
``importlib.abc.InspectLoader`` (inherited by
``importlib.abc.SourceLoader``). But code transformation is wider than
just importing modules, see the use cases described below.

Writing an optimizer or a preprocessor is out of the scope of this PEP.

Usage 1: AST optimizer
----------------------

Transforming an Abstract Syntax Tree (AST) is a convenient
way to implement an optimizer. It's easier to work on the AST than
on the bytecode: the AST contains more information and is higher
level.

Since the optimization can be done ahead of time, complex but slow
optimizations can be implemented.

Examples of optimizations which can be implemented with an AST optimizer:

* `Copy propagation
  <https://en.wikipedia.org/wiki/Copy_propagation>`_:
  replace ``x=1; y=x`` with ``x=1; y=1``
* `Constant folding
  <https://en.wikipedia.org/wiki/Constant_folding>`_:
  replace ``1+1`` with ``2``
* `Dead code elimination
  <https://en.wikipedia.org/wiki/Dead_code_elimination>`_
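
As an illustration of the constant folding item above, here is a minimal
sketch built on the standard ``ast`` module. It is not the optimizer
proposed by this PEP: the names ``ConstantFolder`` and
``fold_constants`` are made up for the example, only ``+``, ``-`` and
``*`` on numbers are folded, and ``ast.unparse()`` requires a recent
Python (3.9 or newer).

```python
import ast
import operator

# Only a few arithmetic operators are folded in this sketch.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace ``<number> op <number>`` with the computed constant."""

    def visit_BinOp(self, node):
        # Fold the operands first so nested expressions collapse bottom-up.
        self.generic_visit(node)
        func = _FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (func is not None
                and isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)
                and isinstance(left.value, (int, float))
                and isinstance(right.value, (int, float))):
            folded = ast.Constant(value=func(left.value, right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source):
    """Return *source* with simple constant expressions folded."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

For example, ``fold_constants("x = 1 + 2 * 3")`` gives ``"x = 7"``, while
an expression involving a name such as ``a + 1`` is left unchanged.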

Using guards (see the `PEP 510
<https://www.python.org/dev/peps/pep-0510/>`_), it is possible to
implement a much wider choice of optimizations. Examples:

* Simplify iterable: replace ``range(3)`` with ``(0, 1, 2)`` when used
  as iterable
* `Loop unrolling <https://en.wikipedia.org/wiki/Loop_unrolling>`_
* Call pure builtins: replace ``len("abc")`` with ``3``
* Copy used builtin symbols to constants
* See also `optimizations implemented in fatoptimizer
  <https://fatoptimizer.readthedocs.org/en/latest/optimizations.html>`_,
  a static optimizer for Python 3.6.
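
The role of a guard can be sketched in pure Python. The snippet below is
only an illustration of the idea behind "call pure builtins", not the
PEP 510 API: ``specialize_len_abc`` and its *namespace* argument (a
stand-in for the globals of the optimized function) are hypothetical
names. The specialized version returns ``3`` directly, and the guard
falls back to the generic code as soon as ``len`` is shadowed.

```python
import builtins


def specialize_len_abc(namespace):
    """Return a callable computing ``len("abc")`` as seen from *namespace*.

    *namespace* is a hypothetical stand-in for the globals of the
    function being optimized.
    """
    expected = builtins.len

    def lookup_len():
        # Globals shadow builtins, as in normal name resolution.
        return namespace.get("len", builtins.len)

    def generic():
        # Unoptimized code: look up ``len`` and call it.
        return lookup_len()("abc")

    def specialized():
        # Optimized code: the call was replaced with its result.
        return 3

    def call():
        # Guard: the specialization is only valid while ``len`` still
        # refers to the original builtin.
        if lookup_len() is expected:
            return specialized()
        return generic()

    return call
```

With an empty namespace the call returns ``3`` through the specialized
path. After ``namespace["len"] = lambda obj: 42`` the guard fails and
the generic path returns ``42``.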

The following issues can be implemented with an AST optimizer:

* `Issue #1346238
  <https://bugs.python.org/issue1346238>`_: A constant folding
  optimization pass for the AST
* `Issue #2181 <http://bugs.python.org/issue2181>`_:
  optimize out local variables at end of function
* `Issue #2499 <http://bugs.python.org/issue2499>`_:
  Fold unary + and not on constants
* `Issue #4264 <http://bugs.python.org/issue4264>`_:
  Patch: optimize code to use LIST_APPEND instead of calling list.append
* `Issue #7682 <http://bugs.python.org/issue7682>`_:
  Optimisation of if with constant expression
* `Issue #10399 <https://bugs.python.org/issue10399>`_: AST
  Optimization: inlining of function calls
* `Issue #11549 <http://bugs.python.org/issue11549>`_:
  Build-out an AST optimizer, moving some functionality out of the
  peephole optimizer
* `Issue #17068 <http://bugs.python.org/issue17068>`_:
  peephole optimization for constant strings
* `Issue #17430 <http://bugs.python.org/issue17430>`_:
  missed peephole optimization


Usage 2: Preprocessor
---------------------

A preprocessor can be easily implemented with an AST transformer. A
preprocessor has many different uses.

Some examples:

* Remove debug code like assertions and logs to make the code run
  faster in production.
* `Tail-call Optimization <https://en.wikipedia.org/wiki/Tail_call>`_
* Add profiling code
* `Lazy evaluation <https://en.wikipedia.org/wiki/Lazy_evaluation>`_:
  see `lazy_python <https://github.com/llllllllll/lazy_python>`_
  (bytecode transformer) and `lazy macro of MacroPy
  <https://github.com/lihaoyi/macropy#lazy>`_ (AST transformer)
* Change dictionary literals into collections.OrderedDict instances
* Declare constants: see `@asconstants of codetransformer
  <https://pypi.python.org/pypi/codetransformer>`_
* Domain Specific Language (DSL) like SQL queries. The
  Python language itself doesn't need to be modified. Previous attempts
  to implement DSL for SQL like `PEP 335 - Overloadable Boolean
  Operators <https://www.python.org/dev/peps/pep-0335/>`_ were rejected.
* Pattern Matching of functional languages
* String Interpolation, but `PEP 498 -- Literal String Interpolation
  <https://www.python.org/dev/peps/pep-0498/>`_ was merged into Python
  3.6.
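
The first item of the list (removing assertions) can be sketched with
``ast.NodeTransformer``. This is an illustrative sketch only, with
made-up names (``AssertRemover``, ``strip_asserts``); it rewrites the
source text with ``ast.unparse()`` (Python 3.9 or newer) instead of
using the hook proposed by this PEP.

```python
import ast


class AssertRemover(ast.NodeTransformer):
    """Replace every ``assert`` statement with ``pass``."""

    def visit_Assert(self, node):
        # Using ``pass`` instead of deleting the node keeps the
        # enclosing block non-empty, so the tree stays compilable.
        return ast.copy_location(ast.Pass(), node)


def strip_asserts(source):
    """Return *source* with all assertions removed."""
    tree = AssertRemover().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

A function whose body starts with ``assert x > 0`` no longer raises
``AssertionError`` for negative input once its source has been passed
through ``strip_asserts()``.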

`MacroPy <https://github.com/lihaoyi/macropy>`_ has a long list of
examples and use cases.

This PEP does not add any new code transformer. Using a code transformer
will require installing an external module and registering it manually.

See also `PyXfuscator <https://bitbucket.org/namn/pyxfuscator>`_: Python
obfuscator, deobfuscator, and user-assisted decompiler.


Usage 3: Disable all optimization
---------------------------------

Ned Batchelder asked to add an option to disable the peephole optimizer
because it makes code coverage more difficult to implement. See the
discussion on the python-ideas mailing list: `Disable all peephole
optimizations
<https://mail.python.org/pipermail/python-ideas/2014-May/027893.html>`_.

This PEP adds a new ``-o noopt`` command line option to disable the
peephole optimizer. In Python, it's as easy as::

    sys.set_code_transformers([])

This fixes `Issue #2506 <https://bugs.python.org/issue2506>`_: Add
mechanism to disable optimizations.


Usage 4: Write new bytecode optimizers in Python
------------------------------------------------

Python 3.6 optimizes the code using a peephole optimizer. By
definition, a peephole optimizer has a narrow view of the code and so
can only implement basic optimizations. The optimizer rewrites the
bytecode. It is difficult to enhance it, because it is written in C.

With this PEP, it becomes possible to implement a new bytecode optimizer
in pure Python and experiment with new optimizations.

Some optimizations, like constant folding, are easier to implement on
the AST, but optimizations on the bytecode are still useful. For
example, when the AST is compiled to bytecode, useless jumps can be
emitted because the compiler is naive and does not try to optimize
anything.


Use Cases
=========

This section gives examples of use cases explaining when and how code
transformers will be used.

Interactive interpreter
-----------------------

It will be possible to use code transformers with the interactive
interpreter which is popular in Python and commonly used to demonstrate
Python.

The code is transformed at runtime and so the interpreter can be slower
when expensive code transformers are used.

Build a transformed package
---------------------------

It will be possible to build a package of the transformed code.

A transformer can have a configuration. The configuration is not stored
in the package.

All ``.pyc`` files of the package must be transformed with the same code
transformers and the same transformers configuration.

It is possible to build different ``.pyc`` files using different
optimizer tags. Example: ``fat`` for the default configuration and
``fat_inline`` for a different configuration with function inlining
enabled.

A package can contain ``.pyc`` files with different optimizer tags.


Install a package containing transformed .pyc files
---------------------------------------------------

It will be possible to install a package which contains transformed
``.pyc`` files.

All ``.pyc`` files with any optimizer tag contained in the package are
installed, not only for the current optimizer tag.


Build .pyc files when installing a package
------------------------------------------

If a package does not contain any ``.pyc`` files of the current
optimizer tag (or some ``.pyc`` files are missing), the ``.pyc`` files
are created during the installation.

Code transformers of the optimizer tag are required. Otherwise, the
installation fails with an error.


Execute transformed code
------------------------

It will be possible to execute transformed code.

Raise an ``ImportError`` exception on import if the ``.pyc`` file of the
current optimizer tag is missing and the code transformers required to
transform the code are missing.

The interesting point here is that code transformers are not needed to
execute the transformed code if all required ``.pyc`` files are already
available.


Code transformer API
====================

A code transformer is a class with ``ast_transformer()`` and/or
``code_transformer()`` methods (API described below) and a ``name``
attribute.

For efficiency, do not define a ``code_transformer()`` or
``ast_transformer()`` method if it does nothing.

The ``name`` attribute (``str``) must be a short string used to identify
an optimizer. It is used to build a ``.pyc`` filename. The name must not
contain dots (``'.'``), dashes (``'-'``) or directory separators: dots
are used to separate fields in a ``.pyc`` filename and dashes are used
to join code transformer names to build the optimizer tag.
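
The naming rules above can be checked with a few lines of Python. This
is a sketch, not part of the proposed API: the function name
``is_valid_transformer_name`` is hypothetical, and rejecting the empty
string is an assumption of the example (the PEP only asks for a "short
string").

```python
import os


def is_valid_transformer_name(name):
    """Check *name* against the rules for the ``name`` attribute."""
    # Assumption of this sketch: an empty name is rejected too.
    if not isinstance(name, str) or not name:
        return False
    # Dots separate fields in the .pyc filename, dashes join transformer
    # names in the optimizer tag, and directory separators would change
    # the location of the file.
    forbidden = {".", "-", os.sep}
    if os.altsep:
        forbidden.add(os.altsep)
    return not any(char in name for char in forbidden)
```

With this check, ``"fat"`` and ``"knights_who_say_ni"`` are accepted,
while ``"fat-inline"`` and ``"fat.inline"`` are rejected.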

.. note::
   It would be nice to pass the fully qualified name of a module in the
   *context* when an AST transformer is used to transform a module on
   import, but it looks like the information is not available in
   ``PyParser_ASTFromStringObject()``.


code_transformer()
------------------

Prototype::

    def code_transformer(self, code, consts, names, lnotab, context):
        ...
        return (code, consts, names, lnotab)

Parameters:

* *code*: the bytecode (``bytes``)
* *consts*: a sequence of constants
* *names*: tuple of variable names
* *lnotab*: table mapping instruction offsets to line numbers
  (``bytes``)

The code transformer is run after the compilation to bytecode.


ast_transformer()
------------------

Prototype::

    def ast_transformer(self, tree, context):
        ...
        return tree

Parameters:

* *tree*: an AST tree
* *context*: an object with a ``filename`` attribute (``str``)

It must return an AST tree. It can modify the AST tree in place, or
create a new AST tree.

The AST transformer is called after the creation of the AST by the
parser and before the compilation to bytecode. New attributes may be
added to *context* in the future.


Changes
=======

In short, add:

* ``-o OPTIM_TAG`` command line option
* ``ast.Constant``
* ``ast.PyCF_TRANSFORMED_AST``
* ``sys.get_code_transformers()``
* ``sys.implementation.optim_tag``
* ``sys.set_code_transformers(transformers)``


API to get/set code transformers
--------------------------------

Add new functions to register code transformers:

* ``sys.set_code_transformers(transformers)``: set the list of code
  transformers and update ``sys.implementation.optim_tag``
* ``sys.get_code_transformers()``: get the list of code
  transformers.

The order of code transformers matters. Running transformer A and then
transformer B can give a different output than running transformer B
and then transformer A.

Example to prepend a new code transformer::

    transformers = sys.get_code_transformers()
    transformers.insert(0, new_cool_transformer)
    sys.set_code_transformers(transformers)

All AST transformers are run sequentially (ex: the second transformer
gets the output of the first transformer), and then all bytecode
transformers are run sequentially.


Optimizer tag
-------------

Changes:

* Add ``sys.implementation.optim_tag`` (``str``): optimization tag.
  The default optimization tag is ``'opt'``.
* Add a new ``-o OPTIM_TAG`` command line option to set
  ``sys.implementation.optim_tag``.

Changes on ``importlib``:

* ``importlib`` uses ``sys.implementation.optim_tag`` to build the
  ``.pyc`` filename when importing modules, instead of always using
  ``opt``. Also remove the special case for the optimizer level ``0``
  with the default optimizer tag ``'opt'`` to simplify the code.
* When loading a module, if the ``.pyc`` file is missing but the ``.py``
  is available, the ``.py`` is only used if the registered code
  transformers have the same optimizer tag as the current tag; otherwise
  an ``ImportError`` exception is raised.

Pseudo-code of a ``use_py()`` function to decide if a ``.py`` file can
be compiled to import a module::

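    import sys

    def transformers_tag():
        # tag implied by the currently registered code transformers
        transformers = sys.get_code_transformers()
        if not transformers:
            # no transformer registered: same tag as -o noopt
            return 'noopt'
        # names are joined with dashes, in registration order
        return '-'.join(transformer.name
                        for transformer in transformers)

    def use_py():
        # the .py file may only be compiled if the registered
        # transformers match the tag requested for this process
        return (transformers_tag() == sys.implementation.optim_tag)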

The order of ``sys.get_code_transformers()`` matters. For example, the
``fat`` transformer followed by the ``pythran`` transformer gives the
optimizer tag ``fat-pythran``.
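
The same rule can be tried on a current Python with a version that takes
the list of transformers as an argument rather than reading it from the
proposed ``sys.get_code_transformers()``. The function names and the
``SimpleNamespace`` stand-ins for transformer objects are assumptions of
this sketch.

```python
from types import SimpleNamespace


def optimizer_tag(transformers):
    """Build the optimizer tag implied by a list of transformers."""
    if not transformers:
        return "noopt"
    # Names are joined with dashes, in registration order.
    return "-".join(transformer.name for transformer in transformers)


def can_compile_source(transformers, requested_tag):
    """Tell whether a .py file may be compiled for *requested_tag*."""
    return optimizer_tag(transformers) == requested_tag


# Stand-ins for real transformer objects: only ``name`` is needed here.
fat = SimpleNamespace(name="fat")
pythran = SimpleNamespace(name="pythran")
```

Here ``optimizer_tag([fat, pythran])`` is ``'fat-pythran'`` and
``optimizer_tag([pythran, fat])`` is ``'pythran-fat'``, so the two
orders produce different ``.pyc`` filenames.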

The behaviour of the ``importlib`` module is unchanged with the default
optimizer tag (``'opt'``).


Peephole optimizer
------------------

By default, ``sys.implementation.optim_tag`` is ``opt`` and
``sys.get_code_transformers()`` returns a list of one code transformer:
the peephole optimizer (optimize the bytecode).

Use ``-o noopt`` to disable the peephole optimizer. In this case, the
optimizer tag is ``noopt`` and no code transformer is registered.

Using the ``-o opt`` option has no effect.


AST enhancements
----------------

Enhancements to simplify the implementation of AST transformers:

* Add a new compiler flag ``PyCF_TRANSFORMED_AST`` to get the
  transformed AST. ``PyCF_ONLY_AST`` returns the AST before the
  transformers.
* Add ``ast.Constant``: this type is not emitted by the compiler, but
  can be used in an AST transformer to simplify the code. It does not
  contain line number and column offset information on tuple or
  frozenset items.
* ``PyCodeObject.co_lnotab``: line number delta becomes signed to
  support moving instructions (note: need to modify MAGIC_NUMBER in
  importlib). Implemented in the `issue #26107
  <https://bugs.python.org/issue26107>`_
* Enhance the bytecode compiler to support ``tuple`` and ``frozenset``
  constants. Currently, ``tuple`` and ``frozenset`` constants are
  created by the peephole transformer, after the bytecode compilation.
* ``marshal`` module: fix serialization of the empty frozenset singleton
* update ``Tools/parser/unparse.py`` to support the new ``ast.Constant``
  node type
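
On interpreters that provide ``ast.Constant`` (it is available in
current Python releases), a transformer can build a whole ``tuple``
constant as a single node. The helper name ``eval_constant`` is made up
for this sketch.

```python
import ast


def eval_constant(value):
    """Compile and evaluate an expression made of one ast.Constant."""
    node = ast.Constant(value=value)
    tree = ast.Expression(body=node)
    # The node was created by hand, so it has no position yet.
    ast.fix_missing_locations(tree)
    return eval(compile(tree, "<constant>", "eval"))
```

For instance ``eval_constant((0, 1, 2))`` returns the tuple
``(0, 1, 2)`` without building it element by element at runtime.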


Examples
========

.pyc filenames
--------------

Example of ``.pyc`` filenames of the ``os`` module.

With the default optimizer tag ``'opt'``:

===========================   ==================
.pyc filename                 Optimization level
===========================   ==================
``os.cpython-36.opt-0.pyc``                    0
``os.cpython-36.opt-1.pyc``                    1
``os.cpython-36.opt-2.pyc``                    2
===========================   ==================

With the ``'fat'`` optimizer tag:

===========================   ==================
.pyc filename                 Optimization level
===========================   ==================
``os.cpython-36.fat-0.pyc``                    0
``os.cpython-36.fat-1.pyc``                    1
``os.cpython-36.fat-2.pyc``                    2
===========================   ==================
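
The naming scheme of the two tables can be written as a small helper.
The function ``pyc_filename`` and its parameter names are hypothetical;
the format string is inferred from the filenames listed above.

```python
def pyc_filename(module, cache_tag, optim_tag="opt", level=0):
    """Build a .pyc filename following the pattern of the tables above.

    Fields are separated by dots; the optimizer tag and the
    optimization level are joined with a dash.
    """
    return f"{module}.{cache_tag}.{optim_tag}-{level}.pyc"
```

``pyc_filename("os", "cpython-36", "fat", 1)`` gives
``'os.cpython-36.fat-1.pyc'``, matching the second table.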


Bytecode transformer
--------------------

Scary bytecode transformer replacing all strings with
``"Ni! Ni! Ni!"``::

    import sys

    class BytecodeTransformer:
        name = "knights_who_say_ni"

        def code_transformer(self, code, consts, names, lnotab, context):
            consts = ['Ni! Ni! Ni!' if isinstance(const, str) else const
                      for const in consts]
            return (code, consts, names, lnotab)

    # replace existing code transformers with the new bytecode transformer
    sys.set_code_transformers([BytecodeTransformer()])

    # execute code which will be transformed by code_transformer()
    exec("print('Hello World!')")

Output::

    Ni! Ni! Ni!
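
Since ``sys.set_code_transformers()`` is only proposed here, the same
effect can be approximated on a current Python (3.8 or newer) by
rewriting the constants of a code object with ``code.replace()``. The
helper names below are hypothetical, and the sketch only handles the
top-level code object, not nested functions.

```python
def replace_string_constants(code, replacement="Ni! Ni! Ni!"):
    """Return a copy of *code* with every string constant replaced."""
    consts = tuple(replacement if isinstance(const, str) else const
                   for const in code.co_consts)
    return code.replace(co_consts=consts)


def run_transformed(source):
    """Compile *source*, rewrite its constants, run it, return globals."""
    code = compile(source, "<demo>", "exec")
    namespace = {}
    exec(replace_string_constants(code), namespace)
    return namespace
```

After ``ns = run_transformed("greeting = 'Hello World!'")``, the value
of ``ns["greeting"]`` is ``'Ni! Ni! Ni!'``; non-string constants are
left untouched.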


AST transformer
---------------

Similarly to the bytecode transformer example, the AST transformer also
replaces all strings with ``"Ni! Ni! Ni!"``::

    import ast
    import sys

    class KnightsWhoSayNi(ast.NodeTransformer):
        def visit_Str(self, node):
            node.s = 'Ni! Ni! Ni!'
            return node

    class ASTTransformer:
        name = "knights_who_say_ni"

        def __init__(self):
            self.transformer = KnightsWhoSayNi()

        def ast_transformer(self, tree, context):
            self.transformer.visit(tree)
            return tree

    # replace existing code transformers with the new AST transformer
    sys.set_code_transformers([ASTTransformer()])

    # execute code which will be transformed by ast_transformer()
    exec("print('Hello World!')")

Output::

    Ni! Ni! Ni!
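
On current Python versions, string literals are represented by
``ast.Constant`` nodes (``ast.Str`` has been deprecated since Python
3.8), so a variant that can be tried today overrides
``visit_Constant()`` and calls ``compile()`` directly instead of the
proposed ``sys.set_code_transformers()``. The function name
``run_with_ast_transformer`` is made up for this sketch.

```python
import ast


class KnightsWhoSayNi(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Only string constants are rewritten; numbers, None, etc.
        # are returned unchanged.
        if isinstance(node.value, str):
            node.value = "Ni! Ni! Ni!"
        return node


def run_with_ast_transformer(source):
    """Parse, transform, compile and run *source*; return its globals."""
    tree = KnightsWhoSayNi().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<demo>", "exec"), namespace)
    return namespace
```

``run_with_ast_transformer("greeting = 'Hello World!'")["greeting"]``
evaluates to ``'Ni! Ni! Ni!'``.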


Other Python implementations
============================

PEP 511 should be implemented by all Python implementations, but the
bytecode and the AST are not standardized.

Even between minor versions of CPython, the AST API changes, but the
differences are small. It is quite easy to write an AST transformer
which works on Python 2.7 and Python 3.5 for example.


Discussion
==========

* `[Python-Dev] AST optimizer implemented in Python
  <https://mail.python.org/pipermail/python-dev/2012-August/121286.html>`_
  (August 2012)


Prior Art
=========

AST optimizers
--------------

In 2011, Eugene Toder proposed to rewrite some peephole optimizations in
a new AST optimizer: issue #11549, `Build-out an AST optimizer, moving
some functionality out of the peephole optimizer
<https://bugs.python.org/issue11549>`_.  The patch adds ``ast.Lit`` (it
was proposed to rename it to ``ast.Literal``).

In 2012, Victor Stinner wrote the `astoptimizer
<https://bitbucket.org/haypo/astoptimizer/>`_ project, an AST optimizer
implementing various optimizations. Most interesting optimizations break
the Python semantics since no guard is used to disable optimization if
something changes.

In 2015, Victor Stinner wrote the `fatoptimizer
<http://fatoptimizer.readthedocs.org/>`_ project, an AST optimizer
specializing functions using guards.

`Issue #17515 "Add sys.setasthook() to allow to use a custom AST
optimizer" <https://bugs.python.org/issue17515>`_ was a first attempt at
an API for code transformers, but specific to AST.


Python Preprocessors
--------------------

* `MacroPy <https://github.com/lihaoyi/macropy>`_: MacroPy is an
  implementation of Syntactic Macros in the Python Programming Language.
  MacroPy provides a mechanism for user-defined functions (macros) to
  perform transformations on the abstract syntax tree (AST) of a Python
  program at import time.
* `pypreprocessor <https://code.google.com/p/pypreprocessor/>`_: C-style
  preprocessor directives in Python, like ``#define`` and ``#ifdef``


Bytecode transformers
---------------------

* `codetransformer <https://pypi.python.org/pypi/codetransformer>`_:
  Bytecode transformers for CPython inspired by the ``ast`` module’s
  ``NodeTransformer``.
* `byteplay <http://code.google.com/p/byteplay/>`_: Byteplay lets you
  convert Python code objects into equivalent objects which are easy to
  play with, and lets you convert those objects back into living Python
  code objects. It's useful for applying crazy transformations on Python
  functions, and is also useful in learning Python byte code
  intricacies. See `byteplay documentation
  <http://wiki.python.org/moin/ByteplayDoc>`_.

See also:

* `BytecodeAssembler <http://pypi.python.org/pypi/BytecodeAssembler>`_


Copyright
=========

This document has been placed in the public domain.

