PEP 511: Add a check function to decide if a "language extension" code transformer should be used or not

Hi,

Thank you for all the feedback on my PEP 511. It looks like the current blocker point is the unclear status of "language extensions": code transformers which deliberately change the Python semantics. I would like to discuss how we should register them.

I think that the PEP 511 must discuss "language extensions" even if it doesn't have to propose a solution to make their usage easier. It's an obvious usage of code transformers. If possible, I would like to find a compromise to support them, but make it explicit that they change the Python semantics.

By the way, I discussed with Joseph Jevnik who wrote codetransformer (bytecode transformer) and lazy_python (AST transformer). He wrote me: "One concern that I have though is that transformers are registered globally. I think that the decorators in codetransformer do a good job of signalling to the reader the scope of some new code generation."

Currently, the PEP 511 doesn't provide a way to register a code transformer and only use it under some conditions. For example, if fatoptimizer is registered, all .pyc files will be called file.cpython-36.fat-0.pyc even if fatoptimizer was disabled.

I propose to change the design of sys.set_code_transformers() to use it more like a registry similar to the codecs registry (codecs.register), but different (details below). A difference is that the codecs registry uses a mapping (codec name => codec functions), whereas sys.set_code_transformers() uses an ordered sequence (list) of code transformers. A sequence is used because multiple code transformers can be applied sequentially on a single .py file.

Petr Viktorin wrote that language extensions "target specific modules, with which they're closely coupled: The modules won't run without the transformer. And with other modules, the transformer either does nothing (as with MacroPy, hopefully), or would fail altogether (as with Hy). So, they would benefit from specific packages opting in.
The effects of enabling them globally range from inefficiency (MacroPy) to failures or needing workarounds (Hy)."

Problem (A): the solutions proposed below don't make code transformers mandatory. If code *requires* a code transformer and the code transformer is not registered, Python doesn't complain. Do you think that it is a real issue in practice? For MacroPy, it's not a problem in practice, since functions must be decorated using a decorator from the macropy package. If importing macropy fails, the module cannot be imported.

Problem (B): the solutions proposed below add markers to ask to enable a specific code transformer, but a code transformer can decide to always modify the Python semantics without using such a marker. According to Nick Coghlan, code transformers changing the Python semantics *must* require a marker in the code using them. IMHO it's the responsibility of the author of the code transformer to use markers, not the responsibility of Python.

Code transformers should maybe return a flag telling whether they changed the code or not. I prefer a flag rather than comparing the output to the input, since the comparison can be expensive, especially for a deep AST tree. Example:

    class Optimizer:
        def ast_optimizer(self, tree, context):
            # ...
            return modified, tree

*modified* must be True if the tree was modified.

There are several options to decide if a code transformer must be used on a specific source file.

(1) Add check_code() and check_ast() functions to code transformers. The code transformer is responsible for deciding if it wants to transform the code or not. Python doesn't use the code transformer if the check method returns False.
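The "modified flag" idea above can be sketched as a runnable example. The transformer below is a made-up toy (folding integer additions); the point is only the `(modified, tree)` return shape, which avoids a potentially expensive deep comparison of the input and output trees:

```python
import ast

class ConstantFoldAdd(ast.NodeTransformer):
    """Toy AST optimizer: fold `int + int` into a constant, and record
    whether anything changed in a flag instead of comparing trees."""
    def __init__(self):
        self.modified = False

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            self.modified = True
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

def ast_optimizer(tree):
    # Returns (modified, tree), following the flag idea from the thread.
    transformer = ConstantFoldAdd()
    tree = transformer.visit(tree)
    ast.fix_missing_locations(tree)
    return transformer.modified, tree

modified, tree = ast_optimizer(ast.parse("x = 1 + 2"))
```

Here `modified` is True and the compiled tree binds `x` to 3, while a tree with nothing to fold comes back with the flag set to False.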
Examples:

* MacroPy can search for the "import macropy" statement (or "from macropy import ...") in the AST tree.
* fatoptimizer can search for "__fatoptimizer__ = {'enabled': False}" in the code: if this variable is found, the optimizer is completely skipped.

(2) Petr proposed to extend importlib to pass a code transformer when importing a module:

    importlib.util.import_with_transformer(
        'mypackage.specialmodule', MyTransformer())

IMHO this option is too specific: it's restricted to importlib (py_compile, compileall and the interactive interpreter don't have the feature). I also dislike the API.

(3) Petr also proposed "a special flag in packages":

    __transformers_for_submodules__ = [MyTransformer()]

I don't like having to get access to MyTransformer. The PEP 511 mentions a use case where the transformed code is run *without* registering the transformer. But this issue can easily be fixed by using the string identifying the transformer in the registry (ex: "fat") rather than its class. I'm not sure that putting a flag on the package (package/__init__.py?) is a good idea. I would prefer to enable language extensions on individual files to restrict their scope.

(4) Sjoerd Job Postmus proposed something similar, but using a comment, and not for packages but for any source file:

    #:Transformers modname.TransformerClassName, modname.OtherTransformerClassName

The problem is that comments are not stored in the AST tree. I would prefer to use the AST to decide if an AST transformer should be used or not. Note: I'm not really motivated to extend the AST to start to include comments, or even code formatting (spaces, newlines, etc.). https://pypi.python.org/pypi/redbaron/ can be used if you want to transform a .py file without touching the format. But I don't think that the AST must go in this direction. I prefer to keep the AST simple.

(5) Nick proposed (indirectly) to use a different filename (don't use ".py") for language extensions.
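Going back to option (1), the MacroPy-style check is easy to sketch with the stdlib ast module. This is an illustration of the idea, not MacroPy's actual code; the function name `check_ast` follows the naming suggested in the thread:

```python
import ast

def check_ast(tree, package='macropy'):
    """Return True if the module imports `package`, i.e. has opted in
    to the transformer (sketch of option (1), not MacroPy's real code)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Matches "import macropy" and "import macropy.core" alike.
            if any(alias.name.split('.')[0] == package
                   for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like "from . import x".
            if node.module and node.module.split('.')[0] == package:
                return True
    return False
```

With this in place, Python would simply skip the transformer for any module where `check_ast()` returns False.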
This option works with my option (2): the context contains the filename, which can be used to decide whether or not to enable the code transformer. I understand that the code transformer must also install an importlib hook to search for filenames other than only .py files. Am I right?

(6) Nick proposed (indirectly) to use an encoding cookie, "which are visible as a comment in the module header". Again, I dislike this option because comments are not stored in the AST.

Victor

On 01/27/2016 04:39 PM, Victor Stinner wrote:
I believe Nick meant that if a transformer modifies semantics of un-marked code, it would be considered a badly written transformer that doesn't play well with the rest of the language. The responsibility of Python is just to make it easy to do the right thing.
What would this flag be useful for?
There are several options to decide if a code transformer must be used on a specific source file.
[...]
[...]
Yes, you are. But once a custom import hook is in place, you can just use a regular import, the hack in (2) isn't necessary. Also, note that this would solve problem (A) -- without the hook enabled, the source won't be found.

On Jan 27, 2016, at 07:39, Victor Stinner <victor.stinner@gmail.com> wrote:
Is this really necessary? If someone is testing a language change locally, and just wants to use your (original) API for his tests instead of the more complicated alternative of building an import hook, it works fine. If he can't deploy that way, that's fine. If someone builds a transformer that adds a feature in a way that makes it a pure superset of Python, he should be fine with running it on all files, so your API works fine. And if some files that didn't use any of the new features get .pyc files that imply they did, so what? If someone builds a transformer that only runs on files with a different extension, he already needs an import hook, so he might as well just call his transformer from the import hook, same as he does today. So... what case is served by this new, more complicated API that wasn't already served by your original, simple one (remembering that import hooks are already there as a fallback)?
That doesn't really answer his question, unless you're trying to add some syntax that's like a decorator but for an entire module, to be used in addition to the existing more local class and function decorators?
It seems like you're trying to find a declarative alternative to every possible use for an imperative import hook. If you can pull that off, it would be cool--but is it really necessary for your proposal? Does your solution have to make it possible for MacroPy and Hy to drop their complicated import hooks and just register transformers, for it to be a useful solution? If the problem you're trying to solve is just making it easier for MacroPy and Hy to coexist with the new transformers, maybe just solve that. For example, if it's too hard for them to decorate .pyc names in a way that fits in with your system, maybe adding a function to get the pre-hook pyc name and to set the post-hook one (e.g., to insert "-pymacro-" in the middle of it) would be sufficient. If there's something that can't be solved in a similar way--e.g., if you think your proposal has to make macropy.console (or whatever he calls the "macros in the REPL" feature) either automatic or at least a lot easier--then maybe that's a different story, but it would be nice to see the rationale for why we need to solve that today. (Couldn't it be added in 3.7, after people have gotten experience with using 3.6 transformers?)

On Wed, 27 Jan 2016 at 08:49 Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
And the import hook is not that difficult. You can reuse everything from importlib without modification except for needing to override a single method in some loader to do your transformation ( https://docs.python.org/3/library/importlib.html#importlib.abc.InspectLoader...). Otherwise the only complication is instantiating the right classes and setting the path hook in `sys.path_hooks`.
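The "override a single method" approach described above can be sketched concretely. The loader subclass and the AST transform below are hypothetical names made up for illustration; the only real machinery is `SourceFileLoader` and its `source_to_code()` method:

```python
import ast
import importlib.machinery

class _SwapAnswer(ast.NodeTransformer):
    """Toy transform: replace reads of the name ANSWER with the constant 42."""
    def visit_Name(self, node):
        if node.id == 'ANSWER' and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(42), node)
        return node

class HookLoader(importlib.machinery.SourceFileLoader):
    """Loader that applies an AST transform at compile time by
    overriding a single method, as described above."""
    def source_to_code(self, data, path='<string>'):
        tree = ast.parse(data)
        tree = _SwapAnswer().visit(tree)
        ast.fix_missing_locations(tree)
        return compile(tree, path, 'exec')

# To wire it into imports, register a path hook such as:
#   sys.path_hooks.insert(0,
#       importlib.machinery.FileFinder.path_hook((HookLoader, ['.py'])))
# Here we just call source_to_code() directly to show the effect:
loader = HookLoader('demo', '/nonexistent/demo.py')
code = loader.source_to_code(b"x = ANSWER", '<demo>')
```

Executing the resulting code object binds `x` to 42 even though `ANSWER` was never defined, showing that the transform ran during compilation.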
As Victor pointed out, the discussion could end in "nothing changed, but we at least discussed it". I think both you and I currently agree that's the answer to his question. :) -Brett

My thought about decorators is that they allow obvious scoping of changes for the reader. Anything that becomes module scope, or is implied based on system state that is set in another module, will make debugging and reading much harder. Both lazy_python and codetransformer use bytecode manipulation; however, it is a purely opt-in system where the transformed function is decorated. This keeps the transformations in view while you are reading the code that is affected by them. I would find debugging a project much more difficult if I needed to remember that the order my modules were imported matters a lot because they set up a bunch of state.

I am not sure why people want the module to be the smallest unit that is transformed when really it is the code object that should be the smallest unit. This means class bodies and functions. If we treat the module as the most atomic unit then you wouldn't be able to use something like `asconstants`. This is a really great local optimization when calling a function in a loop, especially builtins that you know will most likely never change and you don't want to change if they do. For example:

    In [1]: from codetransformer.transformers.constants import asconstants

    In [2]: @asconstants(a=1)
       ...: def f():
       ...:     return a
       ...:

    In [3]: a = 5

    In [4]: f()
    Out[4]: 1

    In [5]: @asconstants('pow')  # string means use the built-in for this name
       ...: def g(ns):
       ...:     for n in ns:
       ...:         yield pow(n, 2)
       ...:

    In [6]: list(g([1, 2, 3]))
    Out[6]: [1, 4, 9]

    In [7]: dis(g)
      3           0 SETUP_LOOP              28 (to 31)
                  3 LOAD_FAST                0 (ns)
                  6 GET_ITER
            >>    7 FOR_ITER                20 (to 30)
                 10 STORE_FAST               1 (n)
                 13 LOAD_CONST               0 (<built-in function pow>)
                 16 LOAD_FAST                1 (n)
                 19 LOAD_CONST               1 (2)
                 22 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
                 25 YIELD_VALUE
                 26 POP_TOP
                 27 JUMP_ABSOLUTE            7
            >>   30 POP_BLOCK
            >>   31 LOAD_CONST               2 (None)
                 34 RETURN_VALUE

This is a simple optimization that people emulate all the time with things like `sum_ = sum` before the loop or `def g(ns, *, _sum=sum)`.
This cannot be used at module scope because often you only think it is safe, or worth it, to lock in the value for a small segment of code. Hopefully this use case is being considered, as I think it is a very simple, non-semantics-preserving case that is also practical.

On Wed, Jan 27, 2016 at 3:20 PM, Brett Cannon <brett@python.org> wrote:

On Jan 27, 2016, at 12:20, Brett Cannon <brett@python.org> wrote:
Unless it has to work in 2.7 and 3.3 (or, worse, 2.6 and 3.2). :)
You can reuse everything from importlib without modification except for needing to override a single method in some loader to do your transformation
Yes, as of 3.4, the design is amazing. In fact, hooking any level--lookup, source, AST, bytecode, or pyc--is about as easy as it could be. My only complaint is that it's not easy enough to find out how easy import hooks are. When I tell people "you could write a simple import hook to play with that idea", they get a look of fear and panic that's completely unwarranted and just drop their cool idea. (I wonder if having complete examples of a simple global-transformer hook and a simple special-extension hook at the start of the docs would be enough to solve that problem?) And I'm a bit worried that if Victor tries to make things like MacroPy and Hy easier, it still won't be enough for real-life cases, so all it'll do is discourage people from going right to writing import hooks and seeing how easy that already is.

On Wed, 27 Jan 2016 at 13:34 Andrew Barnert <abarnert@yahoo.com> wrote:
Sure, but you're already asking for a lot of pain if you're trying to be that compatible at the AST/bytecode level so I view this as the least of your worries. :)
So two things. One is that there is an Examples section in the importlib docs for 3.6: https://docs.python.org/3.6/library/importlib.html#examples . As of right now it only covers use-cases that the `imp` module provided since that's the most common thing I get asked about. Second, while it's much easier than it has ever been to do fancy stuff with import, it's a balancing act of promoting it and discouraging it. :) Mess up your import and it can be rather hard to debug. And this is especially true if you hook in early enough such that you start to screw up stuff in the stdlib and not just your own code. It can also lead to people going a bit overboard with things (hence why I kept my life simple with the LazyLoader and actively discourage its use unless you're sure you need it). So it's a balance of "look at this shiny thing!" and "be careful because you might come out screaming".
We don't need to empower every use-case as much as possible. While we're consenting adults, we also try to prevent people from making their own lives harder. All of this stuff is a tough balancing act to get right.

Hi, On 27.01.2016 16:39, Victor Stinner wrote:
I share this concern but don't have a good solution right now. Admittedly, I already have a use-case where I would like to apply a transformation which is NOT an optimization but a global extension. So, the discussion about allowing global extensions really made me think about whether that is really a good idea. *BUT* it would allow me to experiment and find out if the risk is worth it. (use-case: adding some hooks before entering and leaving all try blocks)
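The try-block use-case mentioned above can be sketched as an AST transformer. The hook names `enter_try`/`leave_try` and the implementation are assumptions made for illustration, not the author's actual code:

```python
import ast

EVENTS = []

def enter_try():  # hypothetical hook: called before entering a try block
    EVENTS.append('enter')

def leave_try():  # hypothetical hook: called when leaving a try block
    EVENTS.append('leave')

def _call(name):
    # Build an expression statement `name()`.
    return ast.Expr(ast.Call(ast.Name(name, ast.Load()), [], []))

class TryHooks(ast.NodeTransformer):
    """Wrap every try block with enter/leave hook calls: the enter hook
    runs as the first statement of the try body, the leave hook runs in
    a finally clause so it fires on every exit path."""
    def visit_Try(self, node):
        self.generic_visit(node)
        node.body.insert(0, _call('enter_try'))
        node.finalbody.append(_call('leave_try'))
        return node

source = """
try:
    x = 1
except ValueError:
    pass
"""
tree = TryHooks().visit(ast.parse(source))
ast.fix_missing_locations(tree)
ns = {'enter_try': enter_try, 'leave_try': leave_try}
exec(compile(tree, '<hooks>', 'exec'), ns)
```

After running the transformed module, `EVENTS` is `['enter', 'leave']`, and the same would hold if the body had raised, since the leave hook lives in a finally clause.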
How does it change the interface for the users? (I mean besides the renaming.) I still like your idea of having the following three options:

1) global optimizers
2) local extensions --> via codec or import hook
3) global extension --> use with care

So, I assume we talk about specifying 2).
Sounds good.
I agree with Nick. Be explicit.
Not sure if that is needed. If we don't have an immediate use-case, simpler is better.
There are several options to decide if a code transformer must be used on a specific source file.
The user should decide, otherwise there is too much magic involved: a marker (source file) or an option (cmdline). I am indifferent whether the marker should be a codec-decl or an import hook. But it should be file-local (at least I would prefer that). All of the options below seem to involve too much magic for my taste (or I didn't understand them correctly).
Best, Sven

participants (6)
- Andrew Barnert
- Brett Cannon
- Joseph Jevnik
- Petr Viktorin
- Sven R. Kunze
- Victor Stinner