Compiling of ast.Module in Python 3.10 and co_firstlineno behavior

Hi all,

I've run into an issue where the co_firstlineno behavior changed from Python 3.9 to Python 3.10, and I was wondering whether this was intentional. Whenever code is compiled in Python 3.10, `code.co_firstlineno` is now always 1, whereas previously it was equal to the line of the first statement.

Also, does anyone know if there is any way to restore the old behavior in Python 3.10? I tried setting `module.lineno`, but it didn't make any difference...

As an example, given the code below:

    import dis
    import sys
    from ast import Module, PyCF_ONLY_AST

    # print(1) is on line 2 and print(2) is on line 4 of the source.
    source = '''
    print(1)

    print(2)
    '''

    initial_module = compile(source, '<nofilename>', 'exec', PyCF_ONLY_AST, 1)

    print(sys.version)
    for i in range(2):
        # Recompile each statement as its own single-statement module.
        module = Module([initial_module.body[i]], [])
        module_code = compile(module, '<no filename>', 'exec')
        print(' --> First lineno:', module_code.co_firstlineno)
        print(' --> Line starts :', list(lineno for offset, lineno in dis.findlinestarts(module_code)))
        print('---- dis ---')
        dis.dis(module_code)

I get the following outputs for Python 3.9 and Python 3.10:

    3.9.6 (default, Jul 30 2021, 11:42:22) [MSC v.1916 64 bit (AMD64)]
     --> First lineno: 2
     --> Line starts : [2]
    ---- dis ---
      2           0 LOAD_NAME                0 (print)
                  2 LOAD_CONST               0 (1)
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE
     --> First lineno: 4
     --> Line starts : [4]
    ---- dis ---
      4           0 LOAD_NAME                0 (print)
                  2 LOAD_CONST               0 (2)
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE

    3.10.0 (tags/v3.10.0:b494f59, Oct 4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)]
     --> First lineno: 1
     --> Line starts : [2]
    ---- dis ---
      2           0 LOAD_NAME                0 (print)
                  2 LOAD_CONST               0 (1)
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE
     --> First lineno: 1
     --> Line starts : [4]
    ---- dis ---
      4           0 LOAD_NAME                0 (print)
                  2 LOAD_CONST               0 (2)
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE

Thanks,
Fabio

On Thu, 17 Feb 2022 at 16:05, Mark Shannon <mark@hotpy.org> wrote:
Hi Mark,

The issue I'm facing is that IPython obtains the AST for the code to be executed (a cell) and then executes it node by node.

When running in the debugger, the debugger caches some information keyed on (co_firstlineno, co_name, co_filename) so that it is preserved across multiple calls to the same function. This works in general because each function in a given Python file has its own co_firstlineno. In this specific case, however, IPython takes a single cell and recompiles it statement by statement, so every resulting code object has the same co_filename (<cell>) and the same co_name (<module>). Previously the co_firstlineno would still differ (because each statement resides on a different line), but with Python 3.10 this assumption fails, as even the co_firstlineno is the same...

You can see the actual issues at:

https://github.com/microsoft/vscode-jupyter/issues/8803
https://github.com/ipython/ipykernel/issues/841/
https://github.com/microsoft/debugpy/issues/844

After tinkering a bit, it seems it's possible to create a new code object based on an existing one with `code.replace` (re-assembling the co_lnotab/co_firstlineno), so I'm going to propose that as a fix to IPython. Still, I found it really strange that this changed in Python 3.10 in the first place, as the old behavior seemed reasonable to me: with the new behavior, a user can compile something with a single statement on line 99 and yet the resulting code object will have co_firstlineno == 1.

Note: I also couldn't find any mention of this in the changelog, so I thought it could have happened by mistake.

Best regards,
Fabio
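To make the collision concrete, here is a minimal sketch. It is not debugpy's or IPython's actual code: `cache_key` is a hypothetical helper modeled on the (co_firstlineno, co_name, co_filename) tuple described above, and the source string simply mirrors the example from the first message.

```python
import ast

# Two statements, on lines 2 and 4, as in the original example.
source = "\nprint(1)\n\nprint(2)\n"
tree = ast.parse(source, "<cell>")


def cache_key(code):
    # Hypothetical cache key, modeled on the tuple described in the message above.
    return (code.co_firstlineno, code.co_name, code.co_filename)


keys = []
for node in tree.body:
    # Recompile each statement as its own single-statement module.
    module = ast.Module(body=[node], type_ignores=[])
    code = compile(module, "<cell>", "exec")
    keys.append(cache_key(code))

# The name and filename components are identical for both statements, so the
# key is only unique if co_firstlineno differs between them. According to the
# output posted in this thread, it does on 3.9 but not on 3.10.
print(keys)
```

Only the first component of each key can tell the two code objects apart, which is why the change in co_firstlineno matters for this kind of cache.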

Hi Fabio,

Does the actual function object get re-created as well during the recompilation process that you have described? It might help to note that the __code__ attribute of a function object f can be mutated and that f is hashable.

Cheers,
Gabriele

On Thu, 17 Feb 2022 at 19:33, Fabio Zadrozny <fabiofz@gmail.com> wrote:
-- "It is written in the language of mathematics, and its characters are triangles, circles, and other geometric figures, without which it is humanly impossible to understand a word of it; without these, one wanders in vain through a dark labyrinth." -- G. Galilei, Il saggiatore.
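Gabriele's two observations can be illustrated with a short sketch. The functions `f` and `g` and the `cache` dictionary are made up for illustration; nothing here reflects how IPython or the debugger is actually structured.

```python
def f():
    return 1


def g():
    return 2


# Hypothetical cache keyed by the function object itself (functions are hashable).
cache = {f: "per-function debugger state"}
hash_before = hash(f)

# Swap in a different body by assigning to __code__; f is still the same object.
f.__code__ = g.__code__
result = f()

# The function's identity and hash are unchanged, so the cache entry is still found.
still_cached = cache.get(f)
hash_unchanged = hash(f) == hash_before
```

Because the key is the function object rather than anything derived from its code, replacing `__code__` does not invalidate the lookup.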

On Thu, 17 Feb 2022 at 17:55, Gabriele <phoenix1987@gmail.com> wrote:
Thank you for the reminder...

Right now, the way it works in IPython, the code object really is recreated and then directly executed (which makes sense, since cells are expected to change for re-evaluation).

I had previously considered caching in the debugger keyed on the code object, but since code objects can be created during regular execution, the debugger could end up creating a huge leak.

Best regards,
Fabio
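The leak concern can be pictured with a small sketch. Using `weakref.WeakKeyDictionary` here is an assumption added for illustration and is not something proposed in the thread; `compile_cell` is a hypothetical stand-in for a cell being recompiled. The comments describe CPython's behavior.

```python
import gc
import weakref


def compile_cell(source):
    # Hypothetical stand-in for recompiling a cell; "<cell>" mirrors the
    # filename mentioned earlier in the thread.
    return compile(source, "<cell>", "exec")


strong_cache = {}                         # holds a strong reference to every key
weak_cache = weakref.WeakKeyDictionary()  # drops an entry once its key is collected

code_a = compile_cell("a = 1")
code_b = compile_cell("b = 2")
strong_cache[code_a] = "debugger info"
weak_cache[code_b] = "debugger info"

# Simulate the code objects going out of use after execution.
del code_a, code_b
gc.collect()

strong_left = len(strong_cache)  # the plain dict still keeps code_a alive
weak_left = len(weak_cache)      # the weak dict let code_b go
```

A plain dictionary keyed on code objects grows with every recompiled statement, which is the leak described above; a weak-keyed mapping is one possible way to avoid that, at the cost of losing the cached data whenever the code object is recreated.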

Hi Fabio, On 17/02/2022 7:30 pm, Fabio Zadrozny wrote:
A bit off topic, but why not use a different name for each cell?
> You can see the actual issues at:
>
> https://github.com/microsoft/vscode-jupyter/issues/8803
> https://github.com/ipython/ipykernel/issues/841/
> https://github.com/microsoft/debugpy/issues/844
>
> After tinkering a bit, it seems it's possible to create a new code object based on an existing one with `code.replace` (re-assembling the co_lnotab/co_firstlineno), so I'm going to propose that as a fix to IPython. Still, I found it really strange that this changed in Python 3.10 in the first place, as the old behavior seemed reasonable to me: with the new behavior, a user can compile something with a single statement on line 99 and yet the resulting code object will have co_firstlineno == 1.
That's the behavior for functions. If I define a function on line 10, but the first line of code in that function is on line 100, then `func.__code__.co_firstlineno == 10`, not 100.

Modules start on line 1, by definition. You can find the first line of actual code using the `co_lines()` iterator:

    firstline = next(mod.__code__.co_lines())[2]

Cheers,
Mark.

Participants (3):
- Fabio Zadrozny
- Gabriele
- Mark Shannon