Python 3.11 bytecode and exception table
Hi all,

I am the current maintainer of bytecode (https://github.com/MatthieuDartiailh/bytecode), which is a library to perform assembly and disassembly of Python bytecode. The library was created by V. Stinner.

I started looking into Python 3.11 support in bytecode. I read Objects/exception_handling_notes.txt and I have a couple of questions regarding the exception table.

Currently bytecode exposes three levels of abstraction:

- the concrete level, in which one deals with instruction offsets for jumps and explicit indexing into the known constants and names
- the bytecode level, which uses labels for jumps and allows non-integer arguments to instructions
- the cfg level, which provides basic block delineation over the bytecode level

So my first idea was to directly expose the unpacked exception table (start, stop, target, stack_depth, last_i) at the concrete level and use pseudo-instructions and labels at the bytecode level. At this point in my reflections, I saw https://github.com/python/cpython/commit/c57aad777afc6c0b382981ee9e4bc94c03b... about adding pseudo-instructions to dis output in 3.12 and thought it would line up quite nicely. Reading through, I got curious about how SETUP_WITH handled popping one extra item from the stack, so I went to look at dis results on a couple of small examples. I tried on 3.10 and 3.11b3 (for some reason I cannot compile main at a391b74d on Windows).
I looked at simple things and got a bit surprised. Disassembling:

    def f():
        try:
            a = 1
        except:
            raise

I get on 3.11:

      1           0 RESUME                   0

      2           2 NOP

      3           4 LOAD_CONST               1 (1)
                  6 STORE_FAST               0 (a)
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE
            >>   12 PUSH_EXC_INFO

      4          14 POP_TOP

      5          16 RAISE_VARARGS            0
            >>   18 COPY                     3
                 20 POP_EXCEPT
                 22 RERAISE                  1
    ExceptionTable:
      4 to 6 -> 12 [0]
      12 to 16 -> 18 [1] lasti

On 3.10:

      2           0 SETUP_FINALLY            5 (to 12)

      3           2 LOAD_CONST               1 (1)
                  4 STORE_FAST               0 (a)
                  6 POP_BLOCK
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE

      4     >>   12 POP_TOP
                 14 POP_TOP
                 16 POP_TOP

      5          18 RAISE_VARARGS            0

This surprised me on two levels:

- First, I had never seen the RESUME opcode, and it is currently not documented.
- My second surprise comes from the second entry in the exception table. At first I failed to see why it was needed, but writing this I realize it corresponds to the explicit handling of exception propagation to the caller.

Since I cannot compile 3.12 at the moment, I am wondering how this plays with pseudo-instructions: in particular, are pseudo-instructions generated for all entries in the exception table?

My initial idea was to have a SETUP_FINALLY/SETUP_CLEANUP - POP_BLOCK pair for each line in the exception table and a label for the jump target. But I realize this means we will have many more such pairs than in 3.10. That is fine by me, but I wondered what choice was made in 3.12 dis and whether this approach makes sense.

Best regards
Matthieu
On 05/07/2022 09:22, Matthieu Dartiailh wrote:
> This surprised me on two levels:
> - first I have never seen the RESUME opcode and it is currently not documented

RESUME occurs at the start of every function (and some other places), and is only used for some internal interpreter bookkeeping. It is documented at https://docs.python.org/3.11/library/dis.html#opcode-RESUME
Hi Matthieu,

The dis output for this function in 3.12 is the same as it is in 3.11.

The pseudo-instructions are emitted by the compiler's codegen stage, but never make it to compiled bytecode. They are removed or replaced by real opcodes before the code object is created.

The recent change to the dis module that you mentioned did not change how the disassembly of bytecode gets displayed. Rather, it added the pseudo-instructions to the opcodes list so that we have access to their mnemonics from Python. This is a step towards exposing intermediate compilation steps to Python (for unit tests, etc.). BTW, part of this will require writing some test utilities for CPython that let us specify and compare opcode sequences, similar to what you have in bytecode.

As for deconstructing the exception table and planting the pseudo-instructions back into the code: it would be nice if dis could do that, but we may need to settle for an approximation, because I'm not sure the exact block structure can be reliably reconstructed from the exception table at the moment. I may be wrong.

Having a SETUP_*/POP_BLOCK for each line in the exception table is not going to be correct. There can be nested try-except blocks, for instance, and even without them the compiler can emit the code of an except block in non-contiguous order (in https://github.com/python/cpython/pull/93622 I fixed one of those cases to reduce the size of the exception table, but it wasn't a correctness bug).

Irit
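One way to picture the non-contiguity point above: if the code protected by a single handler is emitted in several pieces, several table rows will share the same target. The sketch below groups rows by handler and merges ranges that touch. It is only an illustration of that bookkeeping, not how dis or the compiler works, and it does not recover the original block structure. The row values in the usage lines are invented, and `group_by_handler` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int, int, int, bool]  # (start, end, target, depth, lasti)
Handler = Tuple[int, int, bool]        # (target, depth, lasti)


def group_by_handler(rows: List[Row]) -> Dict[Handler, List[Tuple[int, int]]]:
    """Collect the half-open [start, end) ranges covered by each handler."""
    groups: Dict[Handler, List[Tuple[int, int]]] = defaultdict(list)
    for start, end, target, depth, lasti in sorted(rows):
        ranges = groups[(target, depth, lasti)]
        if ranges and ranges[-1][1] == start:
            # This range starts exactly where the previous one ended: merge.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return dict(groups)


# Invented rows: the handler at offset 30 protects two separate regions.
rows = [
    (4, 8, 30, 0, False),
    (20, 24, 30, 0, False),
    (8, 12, 30, 0, False),
    (30, 40, 44, 1, True),
]
grouped = group_by_handler(rows)
print(grouped)
```

A handler that ends up with more than one range after merging is a case where a single SETUP_*/POP_BLOCK pair could not describe its protected region.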
Hi Irit, hi Patrick,

Thanks for your quick answers.

First, thanks Patrick. It seems I went back to the stable docs at one point without noticing it, and hence I missed the new opcodes.

Thanks Irit for the clarification regarding the use of pseudo-instructions in dis.

Regarding the existence of nested try/except, I believe we could have 2 SETUP_* followed by 2 POP_BLOCK, so I am not sure what issue you see there. However, if we can have exception tables with two rows such as (1, 3, ...) and (2, 4, ...), then yes, I will have an issue. I guess I will have to try implementing something and try to roundtrip on as many examples as possible. Would you be interested in being kept posted on my progress?

Best
Matthieu
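The problematic case mentioned above, rows such as (1, 3, ...) and (2, 4, ...), is a partial overlap: the two ranges intersect but neither contains the other, so they cannot be written as properly nested SETUP_*/POP_BLOCK pairs. Below is a small sketch of a check for that situation. It assumes ranges are half-open (start, end) pairs, and `find_partial_overlaps` is a hypothetical helper, not something bytecode or dis provides.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open: start inclusive, end exclusive


def find_partial_overlaps(ranges: List[Range]) -> List[Tuple[Range, Range]]:
    """Return pairs of ranges that overlap without one containing the other."""
    bad = []
    ordered = sorted(ranges)
    for i, (s1, e1) in enumerate(ordered):
        for s2, e2 in ordered[i + 1:]:
            if s2 >= e1:
                break  # sorted by start: nothing later can overlap (s1, e1)
            # Here s1 <= s2 < e1. The pair nests unless the second range
            # starts strictly inside the first and ends after it.
            if s2 > s1 and e2 > e1:
                bad.append(((s1, e1), (s2, e2)))
    return bad


print(find_partial_overlaps([(1, 3), (2, 4)]))  # partially overlapping rows
print(find_partial_overlaps([(1, 4), (2, 3)]))  # properly nested rows
```

Running such a check over the tables of many compiled functions would be one way to learn empirically whether the roundtrip approach ever hits this case.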
Hi Matthieu,

Yes, I am interested. Please ping me for PR reviews or any progress updates.

Thanks
Irit
participants (3)
- Irit Katriel
- Matthieu Dartiailh
- Patrick Reader