[Python-Dev] Opcode cache in ceval loop
Yury Selivanov
yselivanov.ml at gmail.com
Mon Feb 1 16:21:37 EST 2016
Hi Damien,
On 2016-02-01 3:59 PM, Damien George wrote:
> Hi Yury,
>
> That's great news about the speed improvements with the dict offset cache!
>
>> The cache struct is defined in code.h [2], and is 32 bytes long. When a
>> code object becomes hot, it gets a cache offset table allocated for it
>> (+1 byte for each opcode) + an array of cache structs.
> Ok, so each opcode has a 1-byte cache that sits separately to the
> actual bytecode. But a lot of opcodes don't use it so that leads to
> some wasted memory, correct?
Each code object has a list of opcodes and their arguments
(bytes object == unsigned char array).
"Hot" code objects have an offset table (unsigned chars), and
a cache entries array (hope your email client will display
the following correctly):
    opcodes        offset table   cache entries
    OPCODE         0              cache for 1st LOAD_ATTR
    ARG1           0              cache for 1st LOAD_GLOBAL
    ARG2           0              cache for 2nd LOAD_ATTR
    OPCODE         0              cache for 1st LOAD_METHOD
    LOAD_ATTR      1              ...
    ARG1           0
    ARG2           0
    OPCODE         0
    LOAD_GLOBAL    2
    ARG1           0
    ARG2           0
    LOAD_ATTR      3
    ARG1           0
    ARG2           0
    ...            ...
    LOAD_METHOD    4
    ...            ...
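To put numbers on the memory question: the layout above costs one byte of offset table per bytecode byte (argument bytes included) plus one 32-byte cache struct per LOAD_ATTR/LOAD_GLOBAL/LOAD_METHOD. A small C sketch of that arithmetic follows; `opcache_overhead` and `OPCACHE_ENTRY_SIZE` are hypothetical names, and it assumes the pre-3.6 format in which each of those opcodes occupies 3 bytes.

```c
#include <assert.h>
#include <stddef.h>

#define OPCACHE_ENTRY_SIZE 32  /* size of one cache struct, per the email */

/* Hypothetical helper: extra bytes a hot code object needs.
 *   code_len - bytecode length in bytes; the offset table holds one
 *              unsigned char per byte, argument bytes included
 *   n_cached - number of LOAD_ATTR / LOAD_GLOBAL / LOAD_METHOD instructions */
static size_t opcache_overhead(size_t code_len, size_t n_cached)
{
    /* each cacheable instruction takes 3 bytes (opcode + 2 argument bytes),
     * so there can be at most code_len / 3 of them */
    assert(n_cached <= code_len / 3);
    return code_len + n_cached * OPCACHE_ENTRY_SIZE;
}
```

For the first 14 bytecode bytes shown in full in the diagram (three cacheable instructions), `opcache_overhead(14, 3)` gives 14 + 3 * 32 = 110 bytes; the 11 zero bytes in that stretch of the offset table are the unused part Damien asks about.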
When, say, a LOAD_ATTR opcode executes, it first checks whether the
code object has a non-NULL cache-entries table.
If it does, that LOAD_ATTR then uses the offset table (indexing
it with its `INSTR_OFFSET()`) to find its position in
cache-entries.
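A minimal C sketch of that lookup, assuming a simplified per-code-object state rather than CPython's real structs: `CodeCache`, `OpcacheEntry` and `opcache_lookup` are hypothetical, and the 1-based indexing (0 meaning "no cache entry", n meaning the n-th cache entry) is read off the diagram. Because the offset table has a byte for every bytecode byte, argument bytes included, the raw byte offset from `INSTR_OFFSET()` can index it directly, without counting instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the 32-byte cache struct in code.h;
 * its real fields are not described in this email. */
typedef struct {
    unsigned char payload[32];
} OpcacheEntry;

/* Hypothetical view of one code object's cache state. */
typedef struct {
    size_t code_len;             /* bytecode length in bytes */
    unsigned char *offset_table; /* one byte per bytecode byte, or NULL */
    OpcacheEntry *entries;       /* cache-entries array, or NULL */
    size_t n_entries;
} CodeCache;

/* Return the cache entry for the instruction at byte offset instr_offset
 * (what INSTR_OFFSET() yields in ceval), or NULL if the code object is not
 * hot yet or this instruction has no cache slot. */
static OpcacheEntry *opcache_lookup(const CodeCache *cc, size_t instr_offset)
{
    if (cc->entries == NULL)        /* not hot: nothing allocated yet */
        return NULL;
    assert(instr_offset < cc->code_len);
    unsigned char idx = cc->offset_table[instr_offset];
    if (idx == 0)                   /* non-cacheable opcode or argument byte */
        return NULL;
    assert((size_t)idx <= cc->n_entries);
    return &cc->entries[idx - 1];   /* 1-based value -> 0-based array index */
}
```

With the offset table from the diagram, offset 4 (the first LOAD_ATTR) maps to `entries[0]` and offset 8 (LOAD_GLOBAL) to `entries[1]`, while an argument byte such as offset 5 yields NULL.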
>
> But then how do you index the cache, do you keep a count of the
> current opcode number? If I remember correctly, CPython has some
> opcodes taking 1 byte, and some taking 3 bytes, so the offset into the
> bytecode cannot be easily mapped to a bytecode number.
First, when a code object is created, it doesn't have
an offset table and cache entries (those are set to NULL).
Each code object has a new field to count how many times
it was called. Each time a code object is called with
PyEval_EvalFrameEx, that field is incremented.
Once a code object has been called more than 1024 times, we:
1. allocate memory for its offset table;
2. iterate through its opcodes and count how many
LOAD_ATTR, LOAD_METHOD and LOAD_GLOBAL opcodes it has;
3. as part of (2), initialize the offset table with the
correct mapping: each LOAD_ATTR, LOAD_METHOD and
LOAD_GLOBAL gets a non-zero entry (its position in
cache-entries, counting from 1), and every other opcode
gets zero. Opcode args always have zeros in the offset table;
4. finally, allocate the cache-entries table, with one
entry per opcode counted in (2).
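The warm-up steps above can be sketched in C as follows. This is a hedged illustration, not the patch itself: `HotCode`, `opcache_on_call` and `opcache_init` are hypothetical, the `OP_*` opcode numbers are values chosen for illustration, and only the rule that opcodes >= `HAVE_ARGUMENT` (90 in CPython 3.5) carry two argument bytes mirrors real pre-3.6 bytecode. Since the offset table holds unsigned chars, the sketch gives up on code objects with more than 255 cacheable instructions; the email doesn't say how the real patch handles that case.

```c
#include <assert.h>
#include <stdlib.h>

#define OPCACHE_MIN_RUNS 1024  /* threshold from the email */

/* Opcode numbers for illustration only; the 1-or-3-byte rule matches
 * CPython 3.5: opcodes >= HAVE_ARGUMENT take 2 argument bytes. */
#define HAVE_ARGUMENT  90
#define OP_LOAD_ATTR   106
#define OP_LOAD_GLOBAL 116
#define OP_LOAD_METHOD 160

typedef struct { unsigned char payload[32]; } OpcacheEntry;

/* Hypothetical per-code-object state. */
typedef struct {
    const unsigned char *code;   /* bytecode */
    size_t code_len;
    unsigned long run_count;     /* call counter */
    unsigned char *offset_table; /* NULL until hot */
    OpcacheEntry *entries;       /* NULL until hot (or if nothing to cache) */
    size_t n_entries;
} HotCode;

static int is_cacheable(unsigned char op)
{
    return op == OP_LOAD_ATTR || op == OP_LOAD_GLOBAL || op == OP_LOAD_METHOD;
}

/* Steps 1-4. Returns 0 on success, -1 on failure (out of memory, or more
 * cacheable instructions than an unsigned char index can number). */
static int opcache_init(HotCode *co)
{
    /* 1. one zeroed byte per bytecode byte; argument bytes stay 0 */
    unsigned char *table = calloc(co->code_len, 1);
    if (table == NULL)
        return -1;
    /* 2 + 3. walk instructions (1 or 3 bytes each), numbering cacheable ones */
    size_t count = 0;
    for (size_t i = 0; i < co->code_len; ) {
        unsigned char op = co->code[i];
        if (is_cacheable(op)) {
            if (count == 255) { free(table); return -1; }
            table[i] = (unsigned char)++count;  /* 1-based; 0 = no cache */
        }
        i += (op >= HAVE_ARGUMENT) ? 3 : 1;
    }
    /* 4. allocate the cache-entries array, one entry per counted opcode */
    OpcacheEntry *entries = NULL;
    if (count > 0) {
        entries = calloc(count, sizeof(OpcacheEntry));
        if (entries == NULL) { free(table); return -1; }
    }
    co->offset_table = table;
    co->entries = entries;
    co->n_entries = count;
    return 0;
}

/* Called on every frame evaluation: count calls and warm up exactly once. */
static void opcache_on_call(HotCode *co)
{
    if (co->run_count > OPCACHE_MIN_RUNS)
        return;                  /* already warmed up (or tried to) */
    if (++co->run_count > OPCACHE_MIN_RUNS)
        (void)opcache_init(co);  /* on failure, simply stay uncached */
}
```

On the bytecode from the diagram (with the LOAD_METHOD placed at offset 14), the 1025th call fills the offset table with 1, 2, 3 and 4 at offsets 4, 8, 11 and 14 and allocates four cache entries; the counter saturates afterwards, so the scan never runs twice.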
Yury