PEP 626: Precise line numbers for debugging and other tools.
Hi all, I'd like to announce a new PEP. It is mainly codifying that Python should do what you probably already thought it did :) Should be uncontroversial, but all comments are welcome. Cheers, Mark.
https://www.python.org/dev/peps/pep-0626/ :) --Ned. On 7/17/20 10:48 AM, Mark Shannon wrote:
Hi all,
I'd like to announce a new PEP.
It is mainly codifying that Python should do what you probably already thought it did :)
Should be uncontroversial, but all comments are welcome.
Cheers, Mark. _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BMX32UAR... Code of Conduct: http://python.org/psf/codeofconduct/
PEP 626 wrote:
Abstract Python should guarantee that when tracing is turned on, "line" tracing events are generated for all lines of code executed and only for lines of code that are executed.
The sample code shows `return` events being generated, even when there is no `return` line -- doesn't this contradict the "only for lines of code that are executed" part? Maybe the Tracing section should have an entry for multiple line events from the same line.
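Ned's observation is easy to reproduce with `sys.settrace`. A minimal sketch (the function and tracer names here are purely illustrative): a function with no `return` statement still produces a `'return'` trace event, reported on the last line executed.

```python
import sys

def func():
    x = 1  # last (and only) body line; no `return` statement

events = []

def tracer(frame, event, arg):
    # Record events only for `func` itself.
    if frame.f_code is func.__code__:
        events.append((event, frame.f_lineno))
    return tracer

sys.settrace(tracer)
func()
sys.settrace(None)
print(events)  # a 'call', a 'line', then a 'return' event
```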
A side effect of ensuring correct line numbers is that some bytecodes will need to be marked as artificial and not have a meaningful line number.
Do you have an example of this? -- ~Ethan~
I like the proposal in general, but I am against removing lnotab. The reason is that many tools rely on reading this attribute to figure out Python call stack information. For instance, many sampling profilers read this memory using ptrace or process_vm_readv, and they cannot execute any code in the process under tracing, as that would be a security issue. If we remove a 'static' view of that information, it will negatively impact the current set of remote process analysis tools. The proposed new way of retrieving the line number will rely (if we deprecate and remove lnotab) on executing code, making it much more difficult for the ecosystem of profilers and remote process analysis tools to do their job. -- Pablo On Fri, 17 Jul 2020, 15:55 Mark Shannon, <mark@hotpy.org> wrote:
Hi all,
I'd like to announce a new PEP.
It is mainly codifying that Python should do what you probably already thought it did :)
Should be uncontroversial, but all comments are welcome.
Cheers, Mark.
It seems like a great improvement, but I am worried about performance. Adding more attributes to the code object will increase memory usage and import time. Is there an estimate of the overhead? And I am worried that precise tracing blocks future advanced bytecode optimizations. Can we omit precise tracing and line number information when optimization (`-O`) is enabled? Regards, On Fri, Jul 17, 2020 at 11:49 PM Mark Shannon <mark@hotpy.org> wrote:
Hi all,
I'd like to announce a new PEP.
It is mainly codifying that Python should do what you probably already thought it did :)
Should be uncontroversial, but all comments are welcome.
Cheers, Mark.
-- Inada Naoki <songofacandy@gmail.com>
On 18/07/2020 9:20 am, Inada Naoki wrote:
It seems like a great improvement, but I am worried about performance.
Adding more attributes to the code object will increase memory usage and import time. Is there an estimate of the overhead?
Zero overhead (approximately). We are just replacing one compressed table with another at the C level. The other attributes are computed.
And I am worried that precise tracing blocks future advanced bytecode optimizations. Can we omit precise tracing and line number information when optimization (`-O`) is enabled?
I don't think that is a good idea. Performing any worthwhile performance optimization requires that we can reason about the behavior of programs. Consistent behavior makes that much easier. Inconsistent "micro optimizations" make real optimizations harder. Cheers, Mark.
Regards,
On Fri, Jul 17, 2020 at 11:49 PM Mark Shannon <mark@hotpy.org> wrote:
Hi all,
I'd like to announce a new PEP.
It is mainly codifying that Python should do what you probably already thought it did :)
Should be uncontroversial, but all comments are welcome.
Cheers, Mark.
On Tue, Jul 21, 2020 at 11:46 AM Mark Shannon <mark@hotpy.org> wrote:
On 18/07/2020 9:20 am, Inada Naoki wrote:
And I am worried that precise tracing blocks future advanced bytecode optimizations. Can we omit precise tracing and line number information when optimization (`-O`) is enabled?
I don't think that is a good idea. Performing any worthwhile performance optimization requires that we can reason about the behavior of programs. Consistent behavior makes that much easier. Inconsistent "micro optimizations" make real optimizations harder.
Echoing what Mark said, there should be no perceived tension between debugging and optimization. For over 20 years the JVM has been the existence proof: Java is always precisely debuggable when the compiler is generating code at the highest optimization levels. IMHO, a Python user shouldn't have to expect anything less.
On Fri, Jul 17, 2020 at 10:41 AM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
I like the proposal in general, but I am against removing lnotab. The reason is that many tools rely on reading this attribute to figure out Python call stack information. For instance, many sampling profilers read this memory using ptrace or process_vm_readv, and they cannot execute any code in the process under tracing, as that would be a security issue. If we remove a 'static' view of that information, it will negatively impact the current set of remote process analysis tools. The proposed new way of retrieving the line number will rely (if we deprecate and remove lnotab) on executing code, making it much more difficult for the ecosystem of profilers and remote process analysis tools to do their job.
+1 agreed. """Some care must be taken not to break existing tooling. To minimize breakage, the co_lnotab attribute will be retained, but lazily generated on demand.""" - https://www.python.org/dev/peps/pep-0626/#id4 This breaks existing tooling. -gps
--
Pablo
On Fri, 17 Jul 2020, 15:55 Mark Shannon, <mark@hotpy.org> wrote:
Hi all,
I'd like to announce a new PEP.
It is mainly codifying that Python should do what you probably already thought it did :)
Should be uncontroversial, but all comments are welcome.
Cheers, Mark.
On Fri, Jul 17, 2020 at 8:41 AM Ned Batchelder <ned@nedbatchelder.com> wrote:
https://www.python.org/dev/peps/pep-0626/ :)
--Ned.
"""When a frame object is created, the f_lineno will be set to the line at which the function or class is defined. For modules it will be set to zero.""" Within this PEP it'd be good for us to be very pedantic. f_lineno is a single number, so which number is it, given that many class and function definition statements can span multiple lines? Is it the line containing the `class` or `def` keyword, or the line containing the trailing `:`? Q: Why can't we have information about the entire span of lines, rather than considering a definition to be a "line"? I think that question applies to later sections as well. Anywhere we refer to a "line", it could actually mean a span of lines (especially when you consider `\` continuation in situations you might not otherwise think could span lines). -gps
On Tue, Jul 21, 2020 at 1:39 PM Gregory P. Smith <greg@krypto.org> wrote:
+1 agreed.
"""Some care must be taken not to break existing tooling. To minimize breakage, the co_lnotab attribute will be retained, but lazily generated on demand.""" - https://www.python.org/dev/peps/pep-0626/#id4
This breaks existing tooling.
"The co_linetable attribute will hold the line number information. The format is opaque, unspecified and may be changed without notice." ... "Tools that parse the co_lnotab table should move to using the new co_lines() method as soon as is practical." Given that it is impossible for tools doing passive inspection of Python VM instances to execute code, co_linetable's exact format will be depended on just as co_lnotab was. co_lnotab was only quasi-"officially" documented in the Python docs; its spec lives in https://github.com/python/cpython/blob/master/Objects/lnotab_notes.txt (pointed to by a couple of modules' docs). The lnotab format "changed" once, in 3.6, when an unsigned delta was changed to signed (but I don't believe anything beyond some experiments ever actually used negatives?). How about creating something defined and always present for once, given that the need has been demonstrated? Even if we don't, it will be used, and we will be unable to change it within a release. -gps
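For in-process tools, the migration the PEP asks for can be hidden behind a small shim. A sketch, using only documented APIs (`co_lines()` per PEP 626 on 3.10+, `dis.findlinestarts()` elsewhere); note this does nothing for the out-of-process readers discussed above, which cannot call any API at all:

```python
import dis

def line_starts(code):
    """Return (offset, lineno) pairs for a code object.

    Prefers the co_lines() API introduced by PEP 626 (Python 3.10+),
    falling back to dis.findlinestarts() on older interpreters.
    """
    if hasattr(code, "co_lines"):
        # co_lines() yields (start_offset, end_offset, lineno);
        # lineno is None for artificial instructions, which we skip.
        return [(start, line) for start, _end, line in code.co_lines()
                if line is not None]
    return list(dis.findlinestarts(code))

def example(a):
    b = a + 1
    return b

print(line_starts(example.__code__))
```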
--
Pablo
On Fri, 17 Jul 2020, 15:55 Mark Shannon, <mark@hotpy.org> wrote:
Hi all,
I'd like to announce a new PEP.
It is mainly codifying that Python should do what you probably already thought it did :)
Should be uncontroversial, but all comments are welcome.
Cheers, Mark.
On Wed, Jul 22, 2020 at 3:43 AM Mark Shannon <mark@hotpy.org> wrote:
On 18/07/2020 9:20 am, Inada Naoki wrote:
It seems like a great improvement, but I am worried about performance.
Adding more attributes to the code object will increase memory usage and import time. Is there an estimate of the overhead?
Zero overhead (approximately). We are just replacing one compressed table with another at the C level. The other attributes are computed.
And I am worried that precise tracing blocks future advanced bytecode optimizations. Can we omit precise tracing and line number information when optimization (`-O`) is enabled?
I don't think that is a good idea. Performing any worthwhile performance optimization requires that we can reason about the behavior of programs. Consistent behavior makes that much easier. Inconsistent "micro optimizations" make real optimizations harder.
Cheers, Mark.
Is tracing output included in the program behavior? For example, if two code blocks are completely equal:

```
if a == 1:
    very very long code block
elif a == 2:
    very very long code block
```

This could be translated into something like this (pseudo-code):

```
if a == 1: goto block1
if a == 2: goto block1
block1:
    very very long code block
```

But if we merge the two equal code blocks, we cannot produce precise line numbers, can we? Is this the sort of inconsistent micro-optimization that makes real optimization harder? Must this optimization be prohibited in future Python? Regards, -- Inada Naoki <songofacandy@gmail.com>
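As a hedged illustration of Inada's point: today's CPython compiles each (identical) branch body separately, so each keeps its own line numbers, which is exactly what a block-merging optimization would have to give up. The `body` name below is just a placeholder:

```python
import dis

src = """\
if a == 1:
    x = body()
elif a == 2:
    x = body()
"""
code = compile(src, "<demo>", "exec")
# Each branch body contributes its own line-start entries, even though
# the two bodies are textually identical.
lines = sorted({line for _, line in dis.findlinestarts(code)})
print(lines)  # includes both line 2 and line 4
```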
On Wed, 22 Jul 2020 12:46:40 +0900 Inada Naoki <songofacandy@gmail.com> wrote:
Is tracing output included in the program behavior?
For example, if two code blocks are completely equal:

```
if a == 1:
    very very long code block
elif a == 2:
    very very long code block
```

This could be translated into something like this (pseudo-code):

```
if a == 1: goto block1
if a == 2: goto block1
block1:
    very very long code block
```

But if we merge the two equal code blocks, we cannot produce precise line numbers, can we? Is this the sort of inconsistent micro-optimization that makes real optimization harder? Must this optimization be prohibited in future Python?
All attempts to improve Python performance through compile-time bytecode optimizations have more or less failed (the latest was Victor's, AFAIR). Is there still interest in pursuing that avenue? Regards Antoine.
On Wed, Jul 22, 2020 at 6:12 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
But if we merge the two equal code blocks, we cannot produce precise line numbers, can we? Is this the sort of inconsistent micro-optimization that makes real optimization harder? Must this optimization be prohibited in future Python?
All attempts to improve Python performance through compile-time bytecode optimizations have more or less failed (the latest was Victor's, AFAIR). Is there still interest in pursuing that avenue?
Regards
Antoine.
I don't think all attempts have failed. Note that current CPython already includes some optimizations. If they had all failed, we would have to remove them to keep the compiler simple. And I think there are some potential optimizations if we can limit some debugging/introspection features, like C variables being "optimized away" in gdb when we use the -O option. Regards,
On Wed, 22 Jul 2020 19:42:30 +0900 Inada Naoki <songofacandy@gmail.com> wrote:
I don't think all attempts have failed. Note that current CPython already includes some optimizations.
The set of compile-time optimizations has barely changed in at least 15 years.
And I think there are some potential optimizations if we can limit some debugging/introspection features, like C variables being "optimized away" in gdb when we use the -O option.
You can think that, but where's the proof? Or at least a design document for these optimizations? How do you explain that Victor's attempt at static optimization failed? Regards Antoine.
On 7/21/20 5:04 PM, Gregory P. Smith wrote:
Given that it is impossible for tools doing passive inspection of Python VM instances to execute code, co_linetable's exact format will be depended on just as co_lnotab was. co_lnotab was only quasi-"officially" documented in the Python docs; its spec lives in https://github.com/python/cpython/blob/master/Objects/lnotab_notes.txt (pointed to by a couple of modules' docs). The lnotab format "changed" once, in 3.6, when an unsigned delta was changed to signed (but I don't believe anything beyond some experiments ever actually used negatives?).
Negatives definitely happen. When I comment out the line in coverage.py that deals with negative deltas, 34 of my tests fail. For example:

```
a = (
    1
)
```

with 3.8 compiles to:

```
  2           0 LOAD_CONST               0 (1)

  1           2 STORE_NAME               0 (a)
              4 LOAD_CONST               1 (None)
              6 RETURN_VALUE
```

with an lnotab of "02 ff". When executed, this produces these trace events:

```
call on line 1
line on line 2
line on line 1
return on line 1
```

--Ned.
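Ned's "02 ff" table can be checked by hand. Below is a simplified decoder sketch for the pre-3.10 co_lnotab format; it ignores the chained 0xff/0x80 increments used for large deltas, and assumes the line delta is a signed byte, as it has been since 3.6:

```python
def decode_lnotab(lnotab, firstlineno):
    """Decode co_lnotab bytes into (offset, lineno) pairs.

    Simplified sketch: each byte pair is (addr_increment, line_increment),
    with line_increment treated as a signed byte (Python 3.6+).
    """
    addr, line = 0, firstlineno
    pairs = [(addr, line)]
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        addr += addr_incr
        if line_incr >= 0x80:      # reinterpret as a signed byte
            line_incr -= 0x100
        line += line_incr
        pairs.append((addr, line))
    return pairs

# Ned's example: the first instruction is on line 2, then a delta of
# 0xff (-1) takes offset 2 back to line 1.
print(decode_lnotab(b"\x02\xff", firstlineno=2))  # [(0, 2), (2, 1)]
```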
On 7/22/20 6:42 AM, Inada Naoki wrote:
I don't think all attempts have failed. Note that current CPython already includes some optimizations. If they had all failed, we would have to remove them to keep the compiler simple.
And I think there are some potential optimizations if we can limit some debugging/introspection features, like C variables being "optimized away" in gdb when we use the -O option.
We seem to like following the C model when it comes to implementing optimizations, but then skip the part where C developers can disable all optimizations when reasoning about code is more important than speed. I am fine with any optimizations at all, as long as there is a simple and supported way to ask that they be disabled. --Ned.
On 7/17/20 10:48 AM, Mark Shannon wrote:
Hi all,
I'd like to announce a new PEP.
It is mainly codifying that Python should do what you probably already thought it did :)
Should be uncontroversial, but all comments are welcome.
Thanks for thinking about these aspects of the interpreter, and for using the PEP process to work them out before implementation. In the PEP, you mention, "some bytecodes will need to be marked as artificial, and not have a meaningful line number" (twice), but there's no example of what this means. Can you elaborate? --Ned.
On 22/07/2020 10:07 am, Antoine Pitrou wrote:
All attempts to improve Python performance by compile-time bytecode optimizations have more or less failed (the latter was Victor's, AFAIR). Is there still interest in pursuing that avenue?
We are continually improving the bytecode, but there is probably only one or two percent of speedup left to gain from such improvements. None of those improvements would prevent accurate line numbers. Cheers, Mark.
Regards
Antoine.
On 22/07/2020 11:42 am, Inada Naoki wrote:
I don't think all attempts have failed. Note that current CPython already includes some optimizations. If they had all failed, we would have to remove them to keep the compiler simple.
And I think there are some potential optimizations if we can limit some debugging/introspection features, like C variables being "optimized away" in gdb when we use the -O option.
C is a pain to debug. Thankfully Python is not C. Damaging people's ability to debug their code, to squeeze out 1% performance, is not worthwhile IMO. Especially if that 1 or 2% costs us 10% later because it makes more sophisticated optimizations impractical. Cheers, Mark.
Regards,
On 21/07/2020 9:46 pm, Gregory P. Smith wrote:
On Fri, Jul 17, 2020 at 8:41 AM Ned Batchelder <ned@nedbatchelder.com <mailto:ned@nedbatchelder.com>> wrote:
https://www.python.org/dev/peps/pep-0626/ :)
--Ned.
On 7/17/20 10:48 AM, Mark Shannon wrote:
> Hi all,
>
> I'd like to announce a new PEP.
>
> It is mainly codifying that Python should do what you probably already thought it did :)
>
> Should be uncontroversial, but all comments are welcome.
>
> Cheers,
> Mark.
"""When a frame object is created, the f_lineno will be set to the line at which the function or class is defined. For modules it will be set to zero."""
Within this PEP it'd be good for us to be very pedantic. f_lineno is a single number. So which number is it given many class and function definition statements can span multiple lines.
Is it the line containing the class or def keyword? Or is it the line containing the trailing :?
The line of the `def`/`class`. It wouldn't change from the current behavior. I'll add that to the PEP.
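A quick check of the current behavior (a sketch; the source string and names are illustrative): for a `def` spanning several lines, `co_firstlineno` is the line of the `def` keyword itself.

```python
src = (
    "def f(\n"      # line 1: the `def` keyword
    "    a,\n"      # line 2
    "    b,\n"      # line 3
    "):\n"          # line 4: the trailing `:`
    "    return a + b\n"
)
ns = {}
exec(compile(src, "<demo>", "exec"), ns)
print(ns["f"].__code__.co_firstlineno)  # → 1, the line of `def`
```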
Q: Why can't we have the information about the entire span of lines rather than consider a definition to be a "line"?
Pretty much every profiler, coverage tool, and debugger ever expects lines to be natural numbers, not ranges of numbers. A lot of tooling would need to be changed.
I think that question applies to later sections as well. Anywhere we refer to a "line", it could actually mean a span of lines. (especially when you consider \ continuation in situations you might not otherwise think could span lines)
Let's take an example:

```
x = (
    a,
    b,
)
```

You would want the BUILD_TUPLE instruction to have a span of lines 1 to 4 (inclusive), rather than just line 1? If you wanted to break on the BUILD_TUPLE, where would you tell pdb to break? I don't see that it would add much value, but it would add a lot of complexity. Cheers, Mark.
On 22/07/2020 12:23 pm, Ned Batchelder wrote:
Thanks for thinking about these aspects of the interpreter, and for using the PEP process to work them out before implementation.
In the PEP, you mention, "some bytecodes will need to be marked as artificial, and not have a meaningful line number" (twice), but there's no example of what this means. Can you elaborate?
Take this simple Python function:

```
def f(cond):
    if cond:
        g()
    else:
        h()
```

which compiles to the following bytecode:

```
 0 LOAD_FAST                0 (cond)
 2 POP_JUMP_IF_FALSE       12
 4 LOAD_GLOBAL              0 (g)
 6 CALL_FUNCTION            0
 8 POP_TOP
10 JUMP_FORWARD             6 (to 18)
12 LOAD_GLOBAL              1 (h)
14 CALL_FUNCTION            0
16 POP_TOP
18 LOAD_CONST               0 (None)
20 RETURN_VALUE
```

Some of those instructions don't correspond to any line of code. Line numbers:

```
 0 LOAD_FAST          1
 2 POP_JUMP_IF_FALSE  1
 4 LOAD_GLOBAL        2
 6 CALL_FUNCTION      2
 8 POP_TOP            2
10 JUMP_FORWARD       2, or artificial; it's debatable*
12 LOAD_GLOBAL        3
14 CALL_FUNCTION      3
16 POP_TOP            3
18 LOAD_CONST         Artificial; there is no `None` in the source.
20 RETURN_VALUE       Artificial; there is no `return` statement.
```

*For practical reasons we would label this as line 2. It's faster and makes the line table more compact.

Cheers, Mark.
--Ned.
On Wed, Jul 22, 2020 at 8:51 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
I don't think all attempts are failed. Note that current CPython includes some optimization already.
The set of compile-time optimizations has almost not changed since at least 15 years ago.
Constant folding was rewritten, and unused constants are now removed from co_consts. That's one of the things Victor did in his project.
And I think there are some potential optimizations if we can limit some debugging/introspection features, like C variables being "optimized away" in gdb when we use the -O option.
You can think it, but where's the proof? Or at least the design document for these optimizations? How do you explain that Victor's attempt at static optimization failed?
I have some opinions about it (especially since PHP 7.x achieved significant performance improvements without a JIT; I envy it). But I don't have time to prove them, and it is too off-topic because it is not related to precise line numbers. Please forget what I said about blocking future optimization. My idea was just merging code blocks, but it is not worthwhile enough, and it is not related to execution speed. On the other hand, if we cannot remove lnotab, it is still worth considering avoiding having two line tables in -O mode. The memory overhead of lnotab is not negligible. Regards, -- Inada Naoki <songofacandy@gmail.com>
On 22/07/2020 15:48, Inada Naoki wrote:
Constant folding was rewritten, and unused constants are now removed from co_consts. That's one of the things Victor did in his project.
Constant folding is not a new optimization, so this does not contradict what I said. Also, constant folding is not precluded by Mark's proposal, AFAIK. Regards Antoine.
On Wed, Jul 22, 2020 at 10:53 PM Antoine Pitrou <antoine@python.org> wrote:
Constant folding is not a new optimization, so this does not contradict what I said. Also, constant folding is not precluded by Mark's proposal, AFAIK.
Yes, this is tooooo off topic. Please stop it. -- Inada Naoki <songofacandy@gmail.com>
On Wed, Jul 22, 2020 at 5:19 AM Mark Shannon <mark@hotpy.org> wrote:
Let's take an example:

```
x = (
    a,
    b,
)
```
You would want the BUILD_TUPLE instruction to have a span of lines 1 to 4 (inclusive), rather than just line 1? And if you wanted to break on the BUILD_TUPLE, where would you tell pdb to break?
I don't see that it would add much value, but it would add a lot of complexity.
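For what it's worth, the current attribution is easy to inspect with `dis.findlinestarts` (a sketch; the exact line attribution varies across CPython versions):

```python
import dis

src = "x = (\n    a,\n    b,\n)\n"
code = compile(src, "<example>", "exec")

# Each pair is (bytecode offset, line number). Historically the whole
# tuple display has been charged to a single line; newer CPythons
# track finer-grained positions, so the output is version-dependent.
for offset, lineno in dis.findlinestarts(code):
    print(offset, lineno)
```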
We should have the data about the range at bytecode compilation time, correct? So why not keep it? Sure, most existing tooling would just use the start of the range as the line number, as it always has. But some tooling could find the range useful (ex: semantic code indexing for use in display, search, editors, IDEs; rendering lint errors more accurately instead of just claiming a single line or resorting to parsing hacks to come up with a range; etc.). The downside is that we'd be storing a second number in bytecode, making it slightly larger. Though it could be stored efficiently as a prefixed delta, so it'd likely average out as less than 2 bytes per line number stored. (I don't have a feeling for our current format to know if that is significant or not; if it is, maybe this idea just gets nixed.)

The reason the range concept was on my mind is due to something not quite related, but involving a changed idea of a line number in our current system, that we recently ran into with pytype during a Python upgrade:

"""in 3.7, if a function body is a plain docstring, the line number of the RETURN_VALUE opcode corresponds to the docstring, whereas in 3.6 it corresponds to the function definition.""" (Thanks, Martin & Rebecca!)

```python
def no_op():
    """docstring instead of pass."""
```

So the location of what *was* originally an end-of-line `# pytype: disable=bad-return-type` comment (to work around an issue not relevant here) turned awkward and version dependent. pytype is bytecode based, thus that is where its line numbers come from. Metadata comments in source can only be tied to bytecode via line numbers, making end-of-line directives occasionally hard to match up.

When there is no return statement, this opcode still exists. What line number does it belong to? 3.6's answer made sense to me. 3.7's seems wrong: a docstring isn't responsible for a return opcode. I didn't check what 3.8 and 3.9 do.
An alternate answer after this PEP is that it wouldn't have a line number when there is no return statement (pedantically correct, I approve! #win). -gps
Cheers, Mark.
-gps
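The version-dependent attribution described above can be checked directly; a sketch (the output differs by interpreter version, so none is claimed here):

```python
import dis

def no_op():
    """docstring instead of pass."""

# The implicit return exists even with no return statement; which line
# it is charged to has changed across CPython versions (3.6 vs 3.7
# differ, as discussed above). dis shows the current attribution.
dis.dis(no_op)
```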
But on which line is the RETURN opcode if there is more than a docstring? Doesn’t it make sense to have it attached to the last line of the body? (Too bad about pytype, that kind of change happens — we had this kind of thing for mypy too, when line numbers in the AST were fixed.) On Wed, Jul 22, 2020 at 17:29 Gregory P. Smith <greg@krypto.org> wrote:
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/H3YBK275... Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido (mobile)
On 22Jul2020 1319, Mark Shannon wrote:
On 21/07/2020 9:46 pm, Gregory P. Smith wrote:
Q: Why can't we have the information about the entire span of lines rather than consider a definition to be a "line"?
Pretty much every profiler, coverage tool, and debugger ever expects lines to be natural numbers, not ranges of numbers. A lot of tooling would need to be changed.
As someone who worked on apparently the only debugger that expects _character_ ranges rather than a simple line number, I would love to keep full mapping information somewhere.

We experimented with some stack analysis to see if we could tell the difference between being inside the list comprehension vs. outside the comprehension, or which of the nested comprehensions is currently running, but it turned out to be too much trouble.

An alternative to lnotab that includes the full line/column range for the expression, presumably taken from a particular type of node in the AST, would be great. But I think omitting even line ranges at this stage would be a missed opportunity, since we're breaking non-Python debuggers anyway.

Cheers,
Steve
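Later CPython releases (3.11+, PEP 657) do expose per-instruction character ranges via `code.co_positions()`; a sketch, guarded because the API is version-dependent:

```python
def f():
    return [x * x for x in range(3)]

code = f.__code__
# co_positions() (CPython 3.11+) yields one tuple per instruction:
# (lineno, end_lineno, col_offset, end_col_offset).
if hasattr(code, "co_positions"):
    for pos in code.co_positions():
        print(pos)
else:
    print("co_positions() not available on this interpreter")
```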
In theory, this table could be stored somewhere other than the code object, so that it doesn't actually get paged in or occupy cache unless tracing is on. Whether that saves enough to be worth the extra indirections when tracing is on, I have no intention of volunteering to measure. I will note that in the past, taking out docstrings (not even just moving them to a dict of [code:docstring] -- just taking them out completely) has been considered worthwhile.
In theory, this table could be stored somewhere other than the code object, so that it doesn't actually get paged in or occupy cache unless tracing is on.
As some of us mentioned before, that will hurt the ecosystem of profilers and debugger tools considerably. On Thu, 23 Jul 2020 at 18:08, Jim J. Jewett <jimjjewett@gmail.com> wrote:
I certainly understand saying "this change isn't important enough to justify a change." But it sounds as though you are saying the benefit is irrelevant; it is just inherently too expensive to ask programs that are already dealing with internals and trying to optimize performance to make a mechanical change from `code.magic_attrname` to `magicdict[code]`.

What have I missed?
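The suggested indirection might look roughly like this; `code_to_lnotab` is a hypothetical name from the message above, not an existing CPython attribute:

```python
# Hypothetical side table keyed by code object, replacing an attribute
# lookup with a dict lookup as described above.
code_to_lnotab = {}

def register(fn):
    code = fn.__code__
    # Stash whatever line-number bytes the interpreter exposes today;
    # the empty-bytes default hedges against the attribute going away.
    code_to_lnotab[code] = getattr(code, "co_lnotab", b"")
    return fn

@register
def example():
    return 42

# The lookup side: a dict access instead of an attribute access.
table = code_to_lnotab[example.__code__]
```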
On Sat, Jul 25, 2020 at 12:17 PM Jim J. Jewett <jimjjewett@gmail.com> wrote:
I certainly understand saying "this change isn't important enough to justify a change."
But it sounds as though you are saying the benefit is irrelevant;
Jim, if you include what you're replying to in your own message (like I'm doing here), it will be easier for people to tell who / what you're replying to. I wasn't able to tell what your last few messages were in reply to. —Chris
On 25Jul2020 2014, Jim J. Jewett wrote:
But it sounds as though you are saying the benefit is irrelevant; it is just inherently too expensive to ask programs that are already dealing with internals and trying to optimize performance to make a mechanical change from `code.magic_attrname` to `magicdict[code]`.
What have I missed?
You've missed that debugging and profiling tools that operate purely on native memory can't execute Python code, so the "magic" has to be easily representable in C such that it can be copied into whichever language is being used (whether it's C, C++, C#, Rust, or something else). Cheers, Steve
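The table format itself must therefore be decodable without running Python. The classic lnotab encoding is simple arithmetic over byte pairs; here is a sketch of the pre-3.10 format (unsigned address increments, signed line increments), written in Python but operating on raw bytes the way a native tool would:

```python
def decode_lnotab(lnotab, firstlineno):
    """Decode classic co_lnotab bytes into (offset, lineno) pairs.

    Sketch of the pre-3.10 format: each pair is an unsigned bytecode
    address increment followed by a line increment, which has been a
    signed byte since CPython 3.6.
    """
    offset, lineno = 0, firstlineno
    pairs = []
    for i in range(0, len(lnotab), 2):
        offset += lnotab[i]
        line_incr = lnotab[i + 1]
        if line_incr >= 0x80:      # reinterpret as a signed byte
            line_incr -= 0x100
        lineno += line_incr
        pairs.append((offset, lineno))
    return pairs

# b"\x06\x01" means: advance 6 bytecode bytes, advance 1 source line.
print(decode_lnotab(b"\x06\x01", 10))   # [(6, 11)]
```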
ah... we may have been talking past each other. Steve Dower wrote:
On 25Jul2020 2014, Jim J. Jewett wrote:
But it sounds as though you are saying the benefit
[of storing the line numbers in an external table, I thought, but perhaps Pablo Galindo Salgado and yourself were talking only of the switch from an lnotab string to an opaque co_linetable?]
is irrelevant; it is just inherently too expensive to ask programs that are already dealing with internals and trying to optimize performance to make a mechanical change from: code.magic_attrname to: magicdict[code] What have I missed?
You've missed that debugging and profiling tools that operate purely on native memory can't execute Python code, so the "magic" has to be easily representable in C such that it can be copied into whichever language is being used (whether it's C, C++, C#, Rust, or something else).
Unless you really were talking only of the switch to co_linetable, I'm still missing the problem. To me, it still looks like a call to:

```c
PyAPI_FUNC(PyObject *) PyObject_GetAttrString(PyObject *, const char *);
```

with the code object being stepped through and "co_lnotab" would be replaced by:

```c
PyAPI_FUNC(PyObject *) PyDict_GetItem(PyObject *mp, PyObject *key);
```

using that same code object as the key, but getting the dict from some well-known (yet-to-be-defined) location, such as sys.code_to_lnotab.

Mark Shannon and Carl Shapiro had seemed to object to the PEP because the new structure would make the code object longer, and making it smaller by a string does seem likely to be good. But if your real objections are just to replacing the lnotab format with something that needs to be executed, then I apologize for misunderstanding.

-jJ
On Tue, Jul 28, 2020 at 2:12 PM Jim J. Jewett <jimjjewett@gmail.com> wrote:
Introspection of the running CPython process happens from outside of the CPython interpreter itself: either from a signal handler or a C/C++ managed thread within the process, or (as Pablo suggested) from outside the process entirely. Calling CPython APIs is a non-option in all of those situations.

That is why I suggested that the "undocumented" new co_linetable will be used instead of the disappeared co_lnotab, regardless of documentation or claimed stability guarantees. It sounds like an equivalent read-only data source for this purpose. It doesn't matter to anyone with such a profiler whether it is claimed to be unspecified: the data is needed, and the format shouldn't change within a stable Python major.minor release (we'd be unlikely to change it anyway, even without that guarantee).

Given this, I suggest at least specifying valuable properties of it, such as "read only, never mutated", even if the exact format is intentionally left implementation defined and subject to change between minor releases.

-gps
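The "read whichever table exists" approach reduces to something like this on the Python side (native tools do the equivalent by reading the code object's memory directly):

```python
def raw_line_table(code):
    """Best-effort fetch of a code object's raw line-number table.

    Sketch: co_linetable exists on CPython 3.10+, co_lnotab on earlier
    versions. Neither format is formally specified, which is exactly
    the stability concern raised above.
    """
    for attr in ("co_linetable", "co_lnotab"):
        table = getattr(code, attr, None)
        if table is not None:
            return attr, table
    return None, b""

attr, table = raw_line_table(raw_line_table.__code__)
print(attr, len(table))
```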
participants (13)

- Antoine Pitrou
- Antoine Pitrou
- Carl Shapiro
- Chris Jerdonek
- Ethan Furman
- Gregory P. Smith
- Guido van Rossum
- Inada Naoki
- Jim J. Jewett
- Mark Shannon
- Ned Batchelder
- Pablo Galindo Salgado
- Steve Dower