Re: [Python-Dev] Opcode cache in ceval loop

Hi Yury,

That's great news about the speed improvements with the dict offset cache!

> The cache struct is defined in code.h [2], and is 32 bytes long. When a code object becomes hot, it gets a cache offset table allocated for it (+1 byte for each opcode) + an array of cache structs.

Ok, so each opcode has a 1-byte cache that sits separately from the actual bytecode. But a lot of opcodes don't use it, so that leads to some wasted memory, correct?

But then how do you index the cache? Do you keep a count of the current opcode number? If I remember correctly, CPython has some opcodes taking 1 byte and some taking 3 bytes, so the offset into the bytecode cannot be easily mapped to a bytecode number.

Cheers,
Damien.
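To make the byte-offset point concrete, here is a minimal standalone C sketch of how bytecode of this era is laid out: an instruction takes 1 byte if it has no argument and 3 bytes if it does, so byte offsets do not map 1:1 to instruction numbers. The walk_bytecode helper and the specific opcode values are illustrative only; HAVE_ARGUMENT is the one constant borrowed from CPython's opcode.h.

/* Sketch (not CPython source): walking 1-byte / 3-byte bytecode by
 * byte offset.  Instruction numbers and byte offsets diverge as soon
 * as the two sizes are mixed. */
#include <stdio.h>

#define HAVE_ARGUMENT 90   /* opcodes >= 90 carry a 2-byte argument */

static void walk_bytecode(const unsigned char *code, int len)
{
    int offset = 0, instr_no = 0;
    while (offset < len) {
        int size = (code[offset] >= HAVE_ARGUMENT) ? 3 : 1;
        printf("instruction %d starts at byte offset %d (%d byte%s)\n",
               instr_no++, offset, size, size == 1 ? "" : "s");
        offset += size;
    }
}

int main(void)
{
    /* e.g. LOAD_FAST 0; DUP_TOP; LOAD_ATTR 1 -- values are illustrative */
    unsigned char code[] = {124, 0, 0, 4, 106, 1, 0};
    walk_bytecode(code, (int)sizeof code);
    return 0;
}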

Hi Damien,

On 2016-02-01 3:59 PM, Damien George wrote:

> Hi Yury,
>
> That's great news about the speed improvements with the dict offset cache!
>
>> The cache struct is defined in code.h [2], and is 32 bytes long. When a code object becomes hot, it gets a cache offset table allocated for it (+1 byte for each opcode) + an array of cache structs.
>
> Ok, so each opcode has a 1-byte cache that sits separately from the actual bytecode. But a lot of opcodes don't use it, so that leads to some wasted memory, correct?
Each code object has a list of opcodes and their arguments (bytes object == unsigned char array). "Hot" code objects have an offset table (unsigned chars), and a cache entries array (hope your email client will display the following correctly):

opcodes          offset     cache entries
                 table

OPCODE           0          cache for 1st LOAD_ATTR
ARG1             0          cache for 1st LOAD_GLOBAL
ARG2             0          cache for 2nd LOAD_ATTR
OPCODE           0          cache for 1st LOAD_METHOD
LOAD_ATTR        1          ...
ARG1             0
ARG2             0
OPCODE           0
LOAD_GLOBAL      2
ARG1             0
ARG2             0
LOAD_ATTR        3
ARG1             0
ARG2             0
...              ...
LOAD_METHOD      4
...              ...

When, say, a LOAD_ATTR opcode executes, it first checks if the code object has a non-NULL cache-entries table. If it has, that LOAD_ATTR then uses the offset table (indexing with its `INSTR_OFFSET()`) to find its position in cache-entries.
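As a rough, self-contained C sketch of that lookup (the ToyCode, CacheEntry and get_cache names are invented for illustration and are not the identifiers used in the patch):

/* Toy model of the described lookup: one offset-table byte per
 * bytecode byte; 0 means "no cache entry for this position", any
 * other value is a 1-based index into the cache-entries array. */
#include <stdio.h>

typedef struct {
    int hits;                  /* stand-in for the real cached data */
} CacheEntry;

typedef struct {
    unsigned char *code;       /* bytecode (as in co_code) */
    int code_len;
    unsigned char *cache_map;  /* offset table; NULL until the code is hot */
    CacheEntry *cache;         /* densely packed entries; NULL until hot */
} ToyCode;

/* What a caching opcode does at byte offset instr_offset -- the
 * analogue of INSTR_OFFSET() in ceval.c. */
static CacheEntry *get_cache(ToyCode *co, int instr_offset)
{
    if (co->cache == NULL)                 /* code object is not hot yet */
        return NULL;
    unsigned char slot = co->cache_map[instr_offset];
    if (slot == 0)                         /* this opcode has no cache */
        return NULL;
    return &co->cache[slot - 1];           /* table values start at 1 */
}

int main(void)
{
    unsigned char code[9] = {0};           /* contents irrelevant here */
    unsigned char map[9]  = {0};
    map[3] = 1; map[6] = 2;                /* two caching opcodes */
    CacheEntry entries[2] = {{0}, {0}};
    ToyCode co = {code, 9, map, entries};

    CacheEntry *c = get_cache(&co, 3);
    if (c != NULL)
        c->hits++;
    printf("offset 3 -> slot %d, hits=%d\n", map[3], entries[0].hits);
    return 0;
}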
> But then how do you index the cache? Do you keep a count of the current opcode number? If I remember correctly, CPython has some opcodes taking 1 byte and some taking 3 bytes, so the offset into the bytecode cannot be easily mapped to a bytecode number.
First, when a code object is created, it doesn't have an offset table or cache entries (those are set to NULL). Each code object has a new field to count how many times it was called. Each time a code object is called with PyEval_EvalFrameEx, that field is incremented. Once a code object has been called more than 1024 times, we:

1. allocate memory for its offset table;

2. iterate through its opcodes and count how many LOAD_ATTR, LOAD_METHOD and LOAD_GLOBAL opcodes it has;

3. as part of (2), initialize the offset table with the correct mapping: some opcodes will have a non-zero entry in the offset table, some won't, and opcode args will always have zeros in the offset table;

4. then allocate the cache-entries table.

Yury
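A self-contained C sketch of that warm-up path, reusing the toy structure from the previous sketch extended with a call counter; again, the identifiers, the 1024 threshold name and the specific opcode values are illustrative rather than taken from the patch:

#include <stdio.h>
#include <stdlib.h>

#define HAVE_ARGUMENT 90      /* opcodes >= 90 take a 2-byte argument */
#define LOAD_ATTR     106     /* opcode values here are illustrative  */
#define LOAD_GLOBAL   116
#define LOAD_METHOD   160

#define HOTNESS_THRESHOLD 1024

typedef struct {
    unsigned char *code;       /* bytecode, as in co_code */
    int code_len;
    long calls;                /* bumped on every PyEval_EvalFrameEx call */
    unsigned char *cache_map;  /* offset table; NULL until "hot" */
    void *cache;               /* cache-entries array; NULL until "hot" */
} ToyCode;

static int caches_enabled(unsigned char op)
{
    return op == LOAD_ATTR || op == LOAD_GLOBAL || op == LOAD_METHOD;
}

/* Called once per "call" of the code object. */
static void maybe_warm_up(ToyCode *co, size_t entry_size)
{
    if (co->cache_map != NULL || ++co->calls <= HOTNESS_THRESHOLD)
        return;

    /* 1. allocate the offset table (one byte per bytecode byte) */
    co->cache_map = calloc((size_t)co->code_len, 1);

    /* 2 + 3. scan the opcodes, counting the caching ones and writing
     * their 1-based slot numbers into the offset table; argument bytes
     * and non-caching opcodes keep the value 0 */
    int n_entries = 0;
    for (int off = 0; off < co->code_len;
         off += (co->code[off] >= HAVE_ARGUMENT) ? 3 : 1) {
        if (caches_enabled(co->code[off]))
            co->cache_map[off] = (unsigned char)++n_entries;
    }

    /* 4. allocate the cache-entries table itself */
    co->cache = calloc(n_entries ? (size_t)n_entries : 1, entry_size);
    printf("warmed up: %d cache entries\n", n_entries);
}

int main(void)
{
    /* toy bytecode: LOAD_FAST 0; LOAD_ATTR 1; LOAD_GLOBAL 2 */
    unsigned char code[] = {124, 0, 0, 106, 1, 0, 116, 2, 0};
    ToyCode co = {code, (int)sizeof code, 0, NULL, NULL};

    for (int i = 0; i <= HOTNESS_THRESHOLD; i++)
        maybe_warm_up(&co, 32);   /* 32-byte entries, as mentioned above */

    free(co.cache_map);
    free(co.cache);
    return 0;
}

The 32-byte entry size passed in main mirrors the cache-struct size mentioned earlier in the thread; everything else in the sketch is a toy stand-in.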

On 2016-02-01 4:21 PM, Yury Selivanov wrote:

> Hi Damien,
>
> On 2016-02-01 3:59 PM, Damien George wrote:
> [..]
>> But then how do you index the cache? Do you keep a count of the current opcode number? If I remember correctly, CPython has some opcodes taking 1 byte and some taking 3 bytes, so the offset into the bytecode cannot be easily mapped to a bytecode number.
Here are a few links that might explain the idea better:

https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L1229
https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L2610
https://github.com/1st1/cpython/blob/opcache5/Objects/codeobject.c#L167

Yury