Removing dead bytecode vs reporting syntax errors
Hi, Recently, we moved the optimization for the removal of dead code of the form if 0: .... to the ast so we use JUMP bytecodes instead (being completed in PR14116). The reason is that currently, any syntax error in the block will never be reported. For example: if 0: return if 1: pass else: return while 0: return at module level do not raise any syntax error (just some examples), In https://bugs.python.org/issue37500 it was reported that after that, code coverage will decrease as coverage.py sees these new bytecodes (even if they are not executed). In general, the code object is a bit bigger and the optimization now it requires an JUMP instruction to be executed, but syntax errors are reported. The discussion on issue 37500 is about if we should prioritize the optimization or the correctness of reporting syntax errors. In my opinion, SyntaxErrors should be reported with independence of the value of variables (__debug__) or constant as is a property of the code being written not of the code being executed. Also, as CPython is the reference implementation of Python, the danger here is that it could be interpreted that this optimization is part of the language and its behavior should be mirrored in every other Python implementation. Elsewhere we have always prioritize correctness over speed or optimizations. I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block? Regards, Pablo
On 06.07.2019 0:28, Pablo Galindo Salgado wrote:
Hi,
Recently, we moved the optimization for the removal of dead code of the form
if 0: ....
to the ast so we use JUMP bytecodes instead (being completed in PR14116). The reason is that currently, any syntax error in the block will never be reported. For example:
if 0: return
if 1: pass else: return
while 0: return
at module level do not raise any syntax error (just some examples), In https://bugs.python.org/issue37500 it was reported that after that, code coverage will decrease as coverage.py sees these new bytecodes (even if they are not executed). In general, the code object is a bit bigger and the optimization now it requires an JUMP instruction to be executed, but syntax errors are reported.
The discussion on issue 37500 is about if we should prioritize the optimization or the correctness of reporting syntax errors. In my opinion, SyntaxErrors should be reported with independence of the value of variables (__debug__) or constant as is a property of the code being written not of the code being executed. Also, as CPython is the reference implementation of Python, the danger here is that it could be interpreted that this optimization is part of the language and its behavior should be mirrored in every other Python implementation. Elsewhere we have always prioritize correctness over speed or optimizations.
I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block?
"Correctness over speed" is Python's core value, so any syntax errors must be reported. Is this optimization so important anyway? `if 0:' seems a niche use case (yes, it's in site.py which is in every installation but the gain there is pretty small) If it is, why not kill two birds with one stone and discard the subtree, but _after_ it has passed scrunity? AFAIK the standard approach to optimization is to construct a raw tree first and only then apply any equivalence transformations to it. -- Regards, Ivan
It seems that the issue that originally caused compatibility issues was that `__debug__` statements were being optimized away, which was apparently desirable from coverage's point of view. It's not clear to me, but it seems that this may also impact what bytecode is generated when Python is run in optimized mode, because statements of the form `if __debug__:` will no longer be completely optimized out under `-O` (note that from what I can tell, `assert` statements are still optimized out under `-O`). Does anyone have performance sensitive code that relies on `if __debug__` so that we can look at a benchmark? The issues with code coverage aside, if it's a significant issue, maybe it is worth considering a special case for `if __debug__` (I don't know enough about the implementation details to know how difficult or annoying this would be to maintain). Best, Paul On 7/5/19 5:51 PM, Ivan Pozdeev via Python-Dev wrote:
"Correctness over speed" is Python's core value, so any syntax errors must be reported.
Is this optimization so important anyway? `if 0:' seems a niche use case (yes, it's in site.py which is in every installation but the gain there is pretty small)
On Jul 5, 2019, at 17:28, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
[...] I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block?
Just to be clear, the question on the table is whether to revert this change in behavior for 3.8.0 and beyond. The original change had also been backported and appears in 3.7.4rc1 and rc2. Now that its compatibility implications are clearer, the change will be reverted for 3.7.4 final and will not be considered for future 3.7.x releases. -- Ned Deily nad@python.org -- []
Hi, Until a solution which makes everyone happy can be found, I suggest to move back to the status quo: revert the change. More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0". -- Would it be possible to detect if the "if 0" block would raise a syntax error, and only remove it if it doesn't raise a syntax error? That's the approach I chose in my fatoptimizer project which is implemented as an AST optimizer: https://github.com/vstinner/fatoptimizer/blob/master/fatoptimizer/dead_code.... See the tests to see which cases are *not* optimized: https://github.com/vstinner/fatoptimizer/blob/master/test_fatoptimizer.py#L2... Some examples (the "dead code elimitaiton" is not only about "if 0", but also "while 0", dead code after return, etc.): self.check_dont_optimize(""" def func(): if 0: yield """) self.check_dont_optimize("while 1: x = 1") self.check_dont_optimize(""" def func(obj): return if 0: yield from obj """) self.check_dont_optimize(""" try: pass except Exception: yield 3 """) See also the doc: https://fatoptimizer.readthedocs.io/en/latest/optimizations.html#dead-code -- About code coverage, it seems like -X noopt would help: https://github.com/python/cpython/pull/13600 But I'm not sure anymore after Ned Batchelder wrote: "The real-word implications from my world are this: if your code has "if 0:" clauses in it, and you measure its coverage, then because the lines have not been optimized away, coverage.py will think the lines are a possible execution path, and will be considered a miss because they are not executed. This will reduce your coverage percentage." https://bugs.python.org/issue37500#msg347362 Does it mean that coverage.py will report even more "false positive" using -X noopt? Victor
Until a solution which makes everyone happy can be found, I suggest to move back to the status quo: revert the change.
More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0".
I don't think this is that clear. As Paul wrote on the issue this is the result of fixing a bug that has been open since 2008 (11 years), which itself was noticed independently, also in 2008 (#1875 and #1920, respectively). He also independently discovered the same issue last year when writing some tests for IPython. https://bugs.python.org/msg347394 On Fri, 5 Jul 2019 at 23:10, Victor Stinner <vstinner@redhat.com> wrote:
Hi,
Until a solution which makes everyone happy can be found, I suggest to move back to the status quo: revert the change.
More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0".
--
Would it be possible to detect if the "if 0" block would raise a syntax error, and only remove it if it doesn't raise a syntax error?
That's the approach I chose in my fatoptimizer project which is implemented as an AST optimizer:
https://github.com/vstinner/fatoptimizer/blob/master/fatoptimizer/dead_code....
See the tests to see which cases are *not* optimized:
https://github.com/vstinner/fatoptimizer/blob/master/test_fatoptimizer.py#L2...
Some examples (the "dead code elimitaiton" is not only about "if 0", but also "while 0", dead code after return, etc.):
self.check_dont_optimize(""" def func(): if 0: yield """)
self.check_dont_optimize("while 1: x = 1")
self.check_dont_optimize(""" def func(obj): return if 0: yield from obj """)
self.check_dont_optimize(""" try: pass except Exception: yield 3 """)
See also the doc: https://fatoptimizer.readthedocs.io/en/latest/optimizations.html#dead-code
--
About code coverage, it seems like -X noopt would help: https://github.com/python/cpython/pull/13600
But I'm not sure anymore after Ned Batchelder wrote:
"The real-word implications from my world are this: if your code has "if 0:" clauses in it, and you measure its coverage, then because the lines have not been optimized away, coverage.py will think the lines are a possible execution path, and will be considered a miss because they are not executed. This will reduce your coverage percentage." https://bugs.python.org/issue37500#msg347362
Does it mean that coverage.py will report even more "false positive" using -X noopt?
Victor
I can understand the desire for correctness. I do have to wonder though: has anyone *other* than language implementors noticed this issue with SyntaxErrors not being reported? Perhaps we should remember "Practicality beats purity"? --Ned. On 7/5/19 6:14 PM, Pablo Galindo Salgado wrote:
Until a solution which makes everyone happy can be found, I suggest to move back to the status quo: revert the change.
More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0".
I don't think this is that clear. As Paul wrote on the issue this is the result of fixing a bug that has been open since 2008 (11 years), which itself was noticed independently, also in 2008 (#1875 and #1920, respectively). He also independently discovered the same issue last year when writing some tests for IPython.
https://bugs.python.org/msg347394
On Fri, 5 Jul 2019 at 23:10, Victor Stinner <vstinner@redhat.com <mailto:vstinner@redhat.com>> wrote:
Hi,
Until a solution which makes everyone happy can be found, I suggest to move back to the status quo: revert the change.
More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0".
--
Would it be possible to detect if the "if 0" block would raise a syntax error, and only remove it if it doesn't raise a syntax error?
That's the approach I chose in my fatoptimizer project which is implemented as an AST optimizer: https://github.com/vstinner/fatoptimizer/blob/master/fatoptimizer/dead_code....
See the tests to see which cases are *not* optimized: https://github.com/vstinner/fatoptimizer/blob/master/test_fatoptimizer.py#L2...
Some examples (the "dead code elimitaiton" is not only about "if 0", but also "while 0", dead code after return, etc.):
self.check_dont_optimize(""" def func(): if 0: yield """)
self.check_dont_optimize("while 1: x = 1")
self.check_dont_optimize(""" def func(obj): return if 0: yield from obj """)
self.check_dont_optimize(""" try: pass except Exception: yield 3 """)
See also the doc: https://fatoptimizer.readthedocs.io/en/latest/optimizations.html#dead-code
--
About code coverage, it seems like -X noopt would help: https://github.com/python/cpython/pull/13600
But I'm not sure anymore after Ned Batchelder wrote:
"The real-word implications from my world are this: if your code has "if 0:" clauses in it, and you measure its coverage, then because the lines have not been optimized away, coverage.py will think the lines are a possible execution path, and will be considered a miss because they are not executed. This will reduce your coverage percentage." https://bugs.python.org/issue37500#msg347362
Does it mean that coverage.py will report even more "false positive" using -X noopt?
Victor
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LWZPZQ2I...
That's a bit of a fine line to walk. I noticed it when writing tests for IPython, which is not a implementation of Python, but is dealing with the nitty gritty details and manipulating the syntax tree it's true, but it's roughly the same class of project as implementing coverage.py, so if we disqualify all the people who notice these bugs because they are working on abstruse meta-code manipulation, I think both sides will come up empty. On July 5, 2019 10:39:27 PM UTC, Ned Batchelder <ned@nedbatchelder.com> wrote:
I can understand the desire for correctness. I do have to wonder though: has anyone *other* than language implementors noticed this issue with SyntaxErrors not being reported?
Perhaps we should remember "Practicality beats purity"?
--Ned.
On 7/5/19 6:14 PM, Pablo Galindo Salgado wrote:
Until a solution which makes everyone happy can be found, I suggest to move back to the status quo: revert the change.
More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0".
I don't think this is that clear. As Paul wrote on the issue this is the result of fixing a bug that has been open since 2008 (11 years), which itself was noticed independently, also in 2008
(#1875 and #1920, respectively). He also independently discovered the same issue last year when writing some tests for IPython.
https://bugs.python.org/msg347394
On Fri, 5 Jul 2019 at 23:10, Victor Stinner <vstinner@redhat.com <mailto:vstinner@redhat.com>> wrote:
Hi,
Until a solution which makes everyone happy can be found, I suggest to move back to the status quo: revert the change.
More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0".
--
Would it be possible to detect if the "if 0" block would raise a syntax error, and only remove it if it doesn't raise a syntax error?
That's the approach I chose in my fatoptimizer project which is implemented as an AST optimizer:
https://github.com/vstinner/fatoptimizer/blob/master/fatoptimizer/dead_code....
See the tests to see which cases are *not* optimized:
https://github.com/vstinner/fatoptimizer/blob/master/test_fatoptimizer.py#L2...
Some examples (the "dead code elimitaiton" is not only about "if
0",
but also "while 0", dead code after return, etc.):
self.check_dont_optimize(""" def func(): if 0: yield """)
self.check_dont_optimize("while 1: x = 1")
self.check_dont_optimize(""" def func(obj): return if 0: yield from obj """)
self.check_dont_optimize(""" try: pass except Exception: yield 3 """)
See also the doc:
https://fatoptimizer.readthedocs.io/en/latest/optimizations.html#dead-code
--
About code coverage, it seems like -X noopt would help: https://github.com/python/cpython/pull/13600
But I'm not sure anymore after Ned Batchelder wrote:
"The real-word implications from my world are this: if your code
has
"if 0:" clauses in it, and you measure its coverage, then because
the
lines have not been optimized away, coverage.py will think the
lines
are a possible execution path, and will be considered a miss
because
they are not executed. This will reduce your coverage
percentage."
https://bugs.python.org/issue37500#msg347362
Does it mean that coverage.py will report even more "false
positive"
using -X noopt?
Victor
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/LWZPZQ2I...
On 2019-07-06, Victor Stinner wrote:
More people seems to expect "if 0: ..." to be removed, than people who care of syntax errors on "if 0".
One small data point: I have shipped code that depended on 'if 0' removing code from the .pyc file. The code inside was not meant to be released publicly in the case someone inspects the .pyc file. I could have solved the problem in a different way, e.g. have a tool that removes all the code inside the 'if'. Having a tool that toggles 'if DEV_MODE' to 'if 0' was simpler. I will freely admit that is a bit of a dirty solution. I knew that Python removes those blocks but I'm not sure that is guaranteed anywhere. I think it is maybe more important that we give the syntax errors. So, I don't care strongly one way or another. However, there is other code out there that likely depends on the behavior (not just for code coverage). Regards, Neil
[Pablo Galindo Salgado <pablogsal@gmail.com>]
Recently, we moved the optimization for the removal of dead code of the form
if 0: ....
to the ast so we use JUMP bytecodes instead (being completed in PR14116). The reason is that currently, any syntax error in the block will never be reported. For example:
"Any syntax error" is way overstated. "Almost all" syntax errors will be reported. The ones that won't be aren't "really" about syntax (as most people view it). but more restrictions on the context in which certain statements can appear:
if 0: return
if 1: pass else: return
while 0: return
at module level do not raise any syntax error (just some examples),
Whereas in function scope, those are all fine, with "0" or "1". It's the "module level" context that matters to those. Regardless of context, a syntax error that's actually about syntax ;-) is reported: if 0: x +
In https://bugs.python.org/issue37500 it was reported that after that, code coverage will decrease as coverage.py sees these new bytecodes (even if they are not executed). In general, the code object is a bit bigger and the optimization now it requires an JUMP instruction to be executed, but syntax errors are reported.
And I added other gripes to the report. I have one file (temp.py) using "if 0:" over 400 times. When I need to write a small program or example (for, e.g. a StackOverflow answer), I add a new "if 1:" block at the end, fiddle with it until it's ready, then change the "1:" to "0:". So the snippets stay around forever, but are effectively commented out so have no effect (unless/until they're needed again). There are, of course, other ways to comment them out, but none so convenient. For example, under the "if 1:" block, the code is already indented 4 spaces, so can be pasted exactly as-is into a StackOverflow answer. But it doesn't really matter what I happen to do: I'm just one of millions of users, who have come to rely on this stuff for way over a decade.
The discussion on issue 37500 is about if we should prioritize the optimization or the correctness of reporting syntax errors. In my opinion,
Which doesn't really make sense unless it's logically _necessary_ to "pick just one - you can't have both".
SyntaxErrors should be reported with independence of the value of variables (__debug__) or constant as is a property of the code being written not of the code being executed. Also, as CPython is the reference implementation of Python, the danger here is that it could be interpreted that this optimization is part of the language and its behavior should be mirrored in every other Python implementation.
It's been that way for at least 15 years (since Python 2.4, so it's hard to be worried about that now. Indeed, Armin Rigo posted the original example, and he wasn't fooled a bit about intent ;-)
Elsewhere we have always prioritize correctness over speed or optimizations.
When they're irreconcilably in conflict, and the cost of correctness isn't "just too much", sure.
I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block?
We should revert the change until someone (not me ;-) ) thinks harder about whether it's possible to get both. There doesn't seem to be, e.g., a conceptual problem with simply throwing away "if 0:" subtrees from the AST before generating code, or from snipping the "if 1:" guard off an "if 1:" block. You started your msg by saying "we moved the optimization", but that's not so: the optimization was eliminated. So just finish "moving" it ;-)
You started your msg by saying "we moved the optimization", but that's not so: the optimization was eliminated. So just finish "moving" it ;-)
PR14116 was "finishing" the moving by adding JUMPS (a weaker optimization but still an optimization).
We should revert the change until someone (not me ;-) ) thinks harder about whether it's possible to get both. There doesn't seem to be, e.g., a conceptual problem with simply throwing away "if 0:" subtrees from the AST before generating code, or from snipping the "if 1:" guard off an "if 1:" block.
Ok, these were very good points. I am convinced. I agree that the better thing to do at this pointis to revert this until we can think this a bit more :) I have made a PR to revert the change. On Fri, 5 Jul 2019 at 23:27, Tim Peters <tim.peters@gmail.com> wrote:
Recently, we moved the optimization for the removal of dead code of the
[Pablo Galindo Salgado <pablogsal@gmail.com>] form
if 0: ....
to the ast so we use JUMP bytecodes instead (being completed in
PR14116). The
reason is that currently, any syntax error in the block will never be reported. For example:
"Any syntax error" is way overstated. "Almost all" syntax errors will be reported. The ones that won't be aren't "really" about syntax (as most people view it). but more restrictions on the context in which certain statements can appear:
if 0: return
if 1: pass else: return
while 0: return
at module level do not raise any syntax error (just some examples),
Whereas in function scope, those are all fine, with "0" or "1". It's the "module level" context that matters to those. Regardless of context, a syntax error that's actually about syntax ;-) is reported:
if 0: x +
In https://bugs.python.org/issue37500 it was reported that after that, code coverage will decrease as coverage.py sees these new bytecodes (even if they are not executed). In general, the code object is a bit bigger and the optimization now it requires an JUMP instruction to be executed, but syntax errors are reported.
And I added other gripes to the report.
I have one file (temp.py) using "if 0:" over 400 times. When I need to write a small program or example (for, e.g. a StackOverflow answer), I add a new "if 1:" block at the end, fiddle with it until it's ready, then change the "1:" to "0:". So the snippets stay around forever, but are effectively commented out so have no effect (unless/until they're needed again).
There are, of course, other ways to comment them out, but none so convenient. For example, under the "if 1:" block, the code is already indented 4 spaces, so can be pasted exactly as-is into a StackOverflow answer.
But it doesn't really matter what I happen to do: I'm just one of millions of users, who have come to rely on this stuff for way over a decade.
The discussion on issue 37500 is about if we should prioritize the optimization or the correctness of reporting syntax errors. In my opinion,
Which doesn't really make sense unless it's logically _necessary_ to "pick just one - you can't have both".
SyntaxErrors should be reported with independence of the value of variables (__debug__) or constant as is a property of the code being written not of the code being executed. Also, as CPython is the reference implementation of Python, the danger here is that it could be interpreted that this optimization is part of the language and its behavior should be mirrored in every other Python implementation.
It's been that way for at least 15 years (since Python 2.4, so it's hard to be worried about that now. Indeed, Armin Rigo posted the original example, and he wasn't fooled a bit about intent ;-)
Elsewhere we have always prioritize correctness over speed or optimizations.
When they're irreconcilably in conflict, and the cost of correctness isn't "just too much", sure.
I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block?
We should revert the change until someone (not me ;-) ) thinks harder about whether it's possible to get both. There doesn't seem to be, e.g., a conceptual problem with simply throwing away "if 0:" subtrees from the AST before generating code, or from snipping the "if 1:" guard off an "if 1:" block.
You started your msg by saying "we moved the optimization", but that's not so: the optimization was eliminated. So just finish "moving" it ;-)
I think this summarizes the situation very well :) https://xkcd.com/1172/ On Fri, 5 Jul 2019 at 22:28, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
Hi,
Recently, we moved the optimization for the removal of dead code of the form
if 0: ....
to the ast so we use JUMP bytecodes instead (being completed in PR14116). The reason is that currently, any syntax error in the block will never be reported. For example:
if 0: return
if 1: pass else: return
while 0: return
at module level do not raise any syntax error (just some examples), In https://bugs.python.org/issue37500 it was reported that after that, code coverage will decrease as coverage.py sees these new bytecodes (even if they are not executed). In general, the code object is a bit bigger and the optimization now it requires an JUMP instruction to be executed, but syntax errors are reported.
The discussion on issue 37500 is about if we should prioritize the optimization or the correctness of reporting syntax errors. In my opinion, SyntaxErrors should be reported with independence of the value of variables (__debug__) or constant as is a property of the code being written not of the code being executed. Also, as CPython is the reference implementation of Python, the danger here is that it could be interpreted that this optimization is part of the language and its behavior should be mirrored in every other Python implementation. Elsewhere we have always prioritize correctness over speed or optimizations.
I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block?
Regards, Pablo
Ok, I think I found a way to make everyone happy. I have updated the issue and the PR with the solution. Basically the idea is to add an attribute to the compiler that is checked when emitting the bytecode and if is false, we stop don't emit anything but we do check for errors. Unless I am fundamentally wrong (which is possible as is late here) is very simple and covers all the cases. It would be great if someone could review the PR to check if I missed anything with this argument. On Fri, 5 Jul 2019 at 22:28, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
Hi,
Recently, we moved the optimization for the removal of dead code of the form
if 0: ....
to the ast so we use JUMP bytecodes instead (being completed in PR14116). The reason is that currently, any syntax error in the block will never be reported. For example:
if 0: return
if 1: pass else: return
while 0: return
at module level do not raise any syntax error (just some examples), In https://bugs.python.org/issue37500 it was reported that after that, code coverage will decrease as coverage.py sees these new bytecodes (even if they are not executed). In general, the code object is a bit bigger and the optimization now it requires an JUMP instruction to be executed, but syntax errors are reported.
The discussion on issue 37500 is about if we should prioritize the optimization or the correctness of reporting syntax errors. In my opinion, SyntaxErrors should be reported with independence of the value of variables (__debug__) or constant as is a property of the code being written not of the code being executed. Also, as CPython is the reference implementation of Python, the danger here is that it could be interpreted that this optimization is part of the language and its behavior should be mirrored in every other Python implementation. Elsewhere we have always prioritize correctness over speed or optimizations.
I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block?
Regards, Pablo
Hopefully Pablo's proposed solution works. If it doesn't, could this one optimization be left in the peephole optimizer at bytecode level? Otherwise is another solution to follow through with https://discuss.python.org/t/switch-pythons-parsing-tech-to-something-more-p... and switch the parser so it can handle all syntax errors on its own without support from the AST analyzer?
On Mon, Jul 8, 2019 at 1:51 PM Brett Cannon <brett@python.org> wrote:
Hopefully Pablo's proposed solution works.
I'm sure it will.
If it doesn't, could this one optimization be left in the peephole optimizer at bytecode level? Otherwise is another solution to follow through with https://discuss.python.org/t/switch-pythons-parsing-tech-to-something-more-p... and switch the parser so it can handle all syntax errors on its own without support from the AST analyzer?
Thinking about this, that's possible, but it would require bloating the grammar with variants that allow continue/break or not, allow return/yield or not, allow await or not. So I think this particular thing is still best handled by a separate check. (The PEG-based parser I am contemplating would be able to tell an expression statement from an assignment statement without a separate pass to make sure you don't try to assign to a call, and it would allow a much more elegant approach to keyword arguments and the walrus operator.) --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
. If it doesn't, could this one optimization be left in the peephole optimizer at bytecode level?
Sadly no, because at that time there is not enough information left to do things correctly. The problem manifest with constructs like if something or __debug__: ... You cannot just look at the bytecode before the POP_JUMP_IF_FALSE (or similar) because you don't know reliably which bytecodes correspond to the entire condition (at least that I know of). My proposal (it seem that works, but is being reviewed still) is basically doing a compiler pass that just check for errors over the blocks that we don't want to generate bytecode instead of not visiting them at all. I think is the less intrusive and faster solution that I can think off.
This thread is based on https://bugs.python.org/issue37500 to get the thoughts of a broader range of people. On 7/5/2019 5:28 PM, Pablo Galindo Salgado wrote:
Recently, we moved the optimization for the removal of dead code of the form
if 0: ....
to the ast
My impression is that you stopped at least the (complete) removal of such code. An example use case for "if 0:" if 0: <code for algorithm version A> else: <code for algorithm version B> It is easy to switch between branches. Both branches should pass and be covered by tests, but this is impossible when using a constant, while using a constant results in leaner, cleaner .pyc. (I combined here paraphrases of Tim Peters' msg347383 concerns and Net Batchhelder's msg347296 coverage concerns.)
so we use JUMP bytecodes instead (being completed in
Are you leaving the code and just jumping over it?
PR14116). The reason is that currently, any syntax error in the block will never be reported. For example:
if 0: return
if 1: pass else: return
while 0: return
at module level do not raise any syntax error (just some examples), In https://bugs.python.org/issue37500 it was reported that after that, code coverage will decrease as coverage.py sees these new bytecodes (even if they are not executed). In general, the code object is a bit bigger and the optimization now it requires an JUMP instruction to be executed, but syntax errors are reported.
The discussion on issue 37500 is about if we should prioritize the optimization or the correctness of reporting syntax errors.
This is only an issue if we cannot do both.
In my opinion, SyntaxErrors should be reported with independence of the value of variables (__debug__)
__debug__ is defined as a boolean constant, set on startup
or constant as is a property of the code being written not of the code being executed. Also, as CPython is the reference implementation of Python, the danger here is that it could be interpreted that this optimization is part of the language and its behavior should be
mirrored in every other Python implementation. Elsewhere we have always prioritize correctness over speed or optimizations.
I am not sure I see your point here. The python language documentation for __debug__ and assert make no mention of CPython. https://docs.python.org/3/library/constants.html#__debug__ https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement As I said on the issue, it is not unreasonable to me to infer from the above that skipping 'if 0' clauses is at least close to being a language feature, and certainly not a minor CPython implementation detail to be casually changed. https://docs.python.org/3/using/cmdline.html does say "Other implementations’ command line schemes may differ." but you are patching CPython, not some other implementation. https://docs.python.org/3/using/cmdline.html#cmdoption-o says -O removes "assert statements and any code conditional on the value of __debug__". Your initial patch appears to violate this. As for other implementations, I would think that any implementation with -O or the claimed equivalent should do the same.
I am writing this email to know what other people think. Should we revert the change and not report Syntax Errors on optimized blocks? Someone sees a viable way of reporting the errors and not emitting the bytecode for these block?
I presume that most of us agree that the last option, doing both, would be best, or at least agreeable. So I prefer that this be explored more before we fight too hard over a binary choice than might not be necessary. Right now, I would say that the 15-year status quo, bytecode deletion, should remain for 3.8 and that the Steering Council should ultimately decide for 3.9 should a decision be necessary. -- Terry Jan Reedy
participants (12)
-
Brett Cannon
-
Guido van Rossum
-
Ivan Pozdeev
-
Ned Batchelder
-
Ned Deily
-
Neil Schemenauer
-
Pablo Galindo Salgado
-
Paul G
-
Paul Ganssle
-
Terry Reedy
-
Tim Peters
-
Victor Stinner