PEP 511 (code transformers) rejected

Hi,

I rejected my own PEP 511 "API for code transformers" that I wrote in January 2016:
https://github.com/python/peps/commit/9d8fd950014a80324791d7dae3c130b1b64fda...

Rejection Notice:

"""
This PEP was rejected by its author.

This PEP was seen as blessing new Python-like programming languages which are close to, but incompatible with, the regular Python language. It was decided not to promote syntaxes incompatible with Python.

This PEP was also seen as a nice tool to experiment with new Python features, but it is already possible to experiment with them without the PEP, using only importlib hooks. If a feature becomes useful, it should be directly part of Python, instead of depending on a third-party Python module.

Finally, this PEP was driven by the FAT Python optimization project, which was abandoned in 2016, since it was not possible to show any significant speedup, but also because of the lack of time to implement the most advanced and complex optimizations.
"""

Victor

I find this sad. In the JavaScript community the existence of Babel is very important for the long-term evolution of the language independently from the runtime. With Babel, JavaScript programmers can use new language syntax while still being able to deploy on dated browsers. While there's always some experimentation, I doubt our community would abuse the new syntactic freedom that the PEP provided.

Then again, maybe we should do what Babel did, i.e. release a tool like it totally separately from the runtime.

- Ł

On 2 November 2017 at 09:16, Lukasz Langa <lukasz@langa.pl> wrote:
Right, I think python-modernize and python-future provide a better model for Babel equivalents in Python than anything built directly into CPython would.

In many ways, python-future's pasteurize already *is* that kind of polyfill, where you get to write nice modern Python yourself, and then ask pasteurize to mess it up so it also works on Python 2.7: http://python-future.org/pasteurize.html

The piece that we're currently missing to make such workflows easier to manage is an equivalent of JavaScript's source maps (http://blog.teamtreehouse.com/introduction-source-maps), together with debugging tools that are able to use source map information to generate nice tracebacks, even when the original sources are unavailable.

Source maps could also potentially help with getting meaningful tracebacks in other contexts, like bytecode-only deployments and Cython extension modules (for example, the traceback problem is the main reason Red Hat's Python container images still have the source code in them - when that was last measured, you could get an image size reduction of around 15% by including only the pyc files and omitting the original sources, but it wasn't worth it when it came at the cost of making tracebacks unreadable).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

(Email resent, I first sent it to Nick privately by mistake.) 2017-11-02 2:53 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:
The piece that we're currently missing to make such workflows easier to manage is an equivalent of JavaScript's source maps (...)
Code objects already have an instruction pointer => line number mapping table: the code.co_lnotab field. It's documented at:
https://github.com/python/cpython/blob/master/Objects/lnotab_notes.txt

This table is built from the line number information of the AST. The mapping table is optimized to be small. Up to Python 3.5, line numbers had to be monotonic. Since Python 3.6, you can "move" instructions at the AST level, and so have "non-monotonic" line numbers (e.g. line 1, line 3, line 2, line 4).

Victor
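As a rough illustration of the mapping table described above, the sketch below prints the bytecode offset => line number pairs for a small snippet. It uses the standard library's dis.findlinestarts() rather than decoding co_lnotab by hand, since newer Python versions deprecate reading co_lnotab directly; SOURCE and line_table are names made up for this example.

```python
import dis

# Made-up three-line module used only for this illustration.
SOURCE = "x = 1\ny = 2\nz = x + y\n"


def line_table(code):
    """Return (bytecode offset, line number) pairs for a code object."""
    return [(offset, line)
            for offset, line in dis.findlinestarts(code)
            if line is not None]  # some versions report None for "no line"


code = compile(SOURCE, "<example>", "exec")
table = line_table(code)
for offset, line in table:
    print(f"offset {offset:3d} -> line {line}")
```

The exact offsets depend on the interpreter version, so only the line numbers should be relied on here.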

On 2 November 2017 at 23:42, Victor Stinner <victor.stinner@gmail.com> wrote:
(Email resent, I first sent it to Nick privately by mistake.)
Oops, I didn't even notice that. Reposting my reply below.
Right, and linecache knows how to read that. However, it can only do so if the source files are on the running system with the bytecode files, *and* the source code we're interested in is the source code that was actually compiled by the interpreter.

Source code transformers fail that second precondition (since the interpreter only sees the post-transformation code), and this is one of the big reasons folks ended up writing actual single source 2/3 compatible code bases rather than running 2to3 as a source code transformer when building packages: with transformed source, conventional tracebacks quote the line from the transformed source code, *not* the line in the original pre-transformation source code.

However, if the code transformer were to emit a JavaScript style source map in addition to emitting the transformed code, then automated tooling could take a traceback that referenced lines in the transformed code, and work out the equivalent traceback for the pre-transformation code.

(I believe Cython has something like that in order to provide its HTML annotation mode, and PyPy's JIT can trace from machine code back to the related Python source lines, but we don't have anything that's independent of a particular toolchain the way source maps are.)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
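To make the source map idea above concrete, here is a minimal sketch of the kind of tooling it would enable. Everything in it is an assumption for illustration: no such source map format exists in Python, SOURCE_MAP is a hand-written dict standing in for what a transformer might emit, and original_lines is a hypothetical helper.

```python
import traceback

# Hypothetical source map: line in the transformed file -> line in the
# original file. A real transformer would have to emit this itself.
SOURCE_MAP = {1: 1, 2: 1, 3: 2}

# Stand-in for transformer output; line 3 raises ZeroDivisionError.
TRANSFORMED = "a = 1\nb = 0\nresult = a / b\n"


def original_lines(exc, filename, source_map):
    """Translate traceback line numbers for `filename` via `source_map`."""
    frames = traceback.extract_tb(exc.__traceback__)
    return [source_map.get(frame.lineno, frame.lineno)
            for frame in frames
            if frame.filename == filename]


try:
    exec(compile(TRANSFORMED, "transformed.py", "exec"), {})
except ZeroDivisionError as exc:
    remapped = original_lines(exc, "transformed.py", SOURCE_MAP)

print(remapped)
```

The traceback itself reports line 3 of the transformed code; the lookup turns that into line 2 of the (imagined) original file, which is what a source-map-aware traceback formatter would quote.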

On Wed, 1 Nov 2017 at 16:17 Lukasz Langa <lukasz@langa.pl> wrote:
I think the trick here would be getting people more comfortable with ahead-of-time compilation and then adding the appropriate support to bytecode files to load other "optimization" levels/tags. Then you load the .pyc files and, as Victor pointed out, rely on co_lnotab for your source mapping, compiling your source code explicitly instead of as a side effect of import. And since this approach would then just be about generalizing how to specify different tags to match against in .pyc file names, it's easier to get accepted.

-Brett
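For reference, the existing "optimization" tag in .pyc file names can be inspected with importlib.util.cache_from_source(). The sketch below assumes CPython (an implementation with a cache tag); "pkg/mod.py" is a made-up path and "fat" is a purely hypothetical tag that the import system would not load today.

```python
import importlib.util

# No optimization tag: the name used for ordinary bytecode files.
plain = importlib.util.cache_from_source("pkg/mod.py", optimization="")

# The tag used when compiling at optimization level 2 (python -OO).
opt2 = importlib.util.cache_from_source("pkg/mod.py", optimization=2)

# Any alphanumeric string is accepted when computing a file name, so a
# custom tag can be named, even though nothing imports it by default.
custom = importlib.util.cache_from_source("pkg/mod.py", optimization="fat")

for path in (plain, opt2, custom):
    print(path)
```

The naming side of the generalization is therefore already in place; what is missing is a way to tell the import system which tags to match.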

On 3 November 2017 at 03:19, Brett Cannon <brett@python.org> wrote:
I'm not sure it's quite that simple, as you still need to define:

- how does the import system know that a given input file is a "cache-only" import?
- how do linecache and similar tools know what source file the pyc maps back to?

Right now, the source-file/cache-file relationship is hardcoded in two functions:

* https://docs.python.org/3/library/importlib.html#importlib.util.cache_from_s...; and
* https://docs.python.org/3/library/importlib.html#importlib.util.source_from_...

If we look at the code from hylang's custom importer for ".hy" files [1] we can see that the "cache_from_source" implementation has a convenient property: it ignores the source extension entirely, which means it works for input paths with arbitrary file extensions, not just Python source files.

This means that hy's import system integration can use that helper, but if you have a "foo.hy" source file and a "__pycache__/foo-<cache-tags>.pyc" output file, the regular import machinery will *ignore* the latter file, and you have to register Hy's custom importer in order for Python to acknowledge that the cached file exists.

The reverse lookup, by contrast, always assumes that the source suffix is ".py" (which is already broken for "pyw" source files on Windows). Correcting for that at the standard library level would require changing the cache filename format to include an optional additional element: the source file extension (source_from_cache doesn't assume it has access to the pyc file itself - only the filename).
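The asymmetry between the two helpers can be seen directly. This is a small sketch assuming CPython with default settings; "pkg/foo.hy" is just an example path and no files need to exist, since both functions only manipulate names.

```python
import importlib.util
import os

hy_source = os.path.join("pkg", "foo.hy")
py_source = os.path.join("pkg", "foo.py")

# Forward direction: the source extension is dropped, so both sources
# map to the same cache file name.
hy_cache = importlib.util.cache_from_source(hy_source)
py_cache = importlib.util.cache_from_source(py_source)

# Reverse direction: the suffix is always assumed to be ".py", so the
# round trip from "foo.hy" does not return to "foo.hy".
round_trip = importlib.util.source_from_cache(hy_cache)

print(hy_cache)
print(round_trip)
```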
So if we went down that path, then the import system level additions we'd want would probably be along the lines of:

- an enhancement to the cache file naming scheme to allow source file extensions to be saved in PYC filenames
- an update to the SourceFileLoader to use that new naming scheme when implicitly compiling source files with the pyw extension
- a new "CacheOnlyLoader" together with a new CACHE_ONLY_SUFFIXES list
- a new ".pyb" suffix (for "Python backport") as the sole default entry in CACHE_ONLY_SUFFIXES (awful pun alert: you could also argue that this suffix makes sense because "pyb files come before pyc files")

To make this syntactic polyfill approach usable with older Python versions (including 2.7), importlib2 could be resynced to the first importlib version that supported this (importlib2 is currently up to date with Python 3.5's multi-phase initialisation support, since that was the last major functional change in importlib).

Cheers,
Nick.

[1] https://github.com/hylang/hy/blob/master/hy/importer.py

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

participants (4): Brett Cannon, Lukasz Langa, Nick Coghlan, Victor Stinner