[New-bugs-announce] [issue44921] dict subclassing is slow
Marco Sulla
report at bugs.python.org
Sun Aug 15 16:10:26 EDT 2021
New submission from Marco Sulla <launchpad.net at marco.sulla.e4ward.com>:
I asked on Stack Overflow why a dict subclass is much slower than dict itself for some operations. This is the answer by Monica (https://stackoverflow.com/a/59914459/1763602):
Indexing and in are slower in dict subclasses because of a bad interaction between a dict optimization and the logic subclasses use to inherit C slots. This should be fixable, though not from your end.
The CPython implementation has two sets of hooks for operator overloads. There are Python-level methods like __contains__ and __getitem__, but there's also a separate set of slots for C function pointers in the memory layout of a type object. Usually, either the Python method will be a wrapper around the C implementation, or the C slot will contain a function that searches for and calls the Python method. It's more efficient for the C slot to implement the operation directly, as the C slot is what Python actually accesses.
Mappings written in C implement the C slots sq_contains and mp_subscript to provide in and indexing. Ordinarily, the Python-level __contains__ and __getitem__ methods would be automatically generated as wrappers around the C functions, but the dict class has explicit implementations of __contains__ and __getitem__, because the explicit implementations (https://github.com/python/cpython/blob/v3.8.1/Objects/dictobject.c) are a bit faster than the generated wrappers:
static PyMethodDef mapp_methods[] = {
    DICT___CONTAINS___METHODDEF
    {"__getitem__", (PyCFunction)(void(*)(void))dict_subscript, METH_O | METH_COEXIST,
     getitem__doc__},
    ...
(Actually, the explicit __getitem__ implementation is the same function as the mp_subscript implementation, just with a different kind of wrapper.)
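The difference between an explicit method and a generated wrapper is visible from Python, at least on CPython. The following is a minimal sketch, assuming CPython 3.7 or later (where the types module exposes these descriptor types); the helper name describe is made up for illustration and is not part of the original answer:

```python
import types

def describe(method):
    """Classify a builtin type's method by how CPython exposes it."""
    if isinstance(method, types.WrapperDescriptorType):
        return "generated slot wrapper"
    if isinstance(method, types.MethodDescriptorType):
        return "explicit method definition"
    return "other"

# __getitem__ and __contains__ come from dict's explicit method table.
print(describe(dict.__getitem__))
print(describe(dict.__contains__))
# __setitem__ and __len__ have no explicit entry, so CPython generates
# wrappers around the corresponding C slots.
print(describe(dict.__setitem__))
print(describe(dict.__len__))
```

Other Python implementations may classify these methods differently, so this is only a way to observe the CPython behavior described here.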
Ordinarily, a subclass would inherit its parent's implementations of C-level hooks like sq_contains and mp_subscript, and the subclass would be just as fast as the superclass. However, the logic in update_one_slot (https://github.com/python/cpython/blob/v3.8.1/Objects/typeobject.c#L7202) looks for the parent implementation by trying to find the generated wrapper methods through an MRO search.
dict doesn't have generated wrappers for sq_contains and mp_subscript, because it provides explicit __contains__ and __getitem__ implementations.
Instead of inheriting sq_contains and mp_subscript, update_one_slot ends up giving the subclass sq_contains and mp_subscript implementations that perform an MRO search for __contains__ and __getitem__ and call those. This is much less efficient than inheriting the C slots directly.
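The cost of this fallback can be measured with a rough timeit sketch. The class name A follows the original question; the helper time_lookup and the sizes are arbitrary choices for illustration, and the ratio will depend on the Python version and build:

```python
import timeit

class A(dict):
    """Trivial dict subclass that overrides nothing."""

def time_lookup(mapping, stmt, number=200_000):
    """Return the best-of-5 time in seconds for running stmt against mapping."""
    timer = timeit.Timer(stmt, globals={"m": mapping})
    return min(timer.repeat(repeat=5, number=number))

data = {i: i for i in range(1000)}
plain, sub = dict(data), A(data)

for stmt in ("m[500]", "500 in m"):
    t_plain = time_lookup(plain, stmt)
    t_sub = time_lookup(sub, stmt)
    print(f"{stmt:10} dict={t_plain:.4f}s  A={t_sub:.4f}s  ratio={t_sub / t_plain:.2f}")
```

Taking the minimum of several repeats reduces noise from other processes; it does not remove it, so small differences should not be over-interpreted.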
Fixing this will require changes to the update_one_slot implementation.
Aside from what I described above, dict_subscript also looks up __missing__ for dict subclasses, so fixing the slot inheritance issue won't make subclasses completely on par with dict itself for lookup speed, but it should get them a lot closer.
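The __missing__ hook mentioned above is what dict_subscript has to look for on subclasses. A small sketch of its observable behavior; the class name Defaulting and the fallback string are invented for illustration:

```python
class Defaulting(dict):
    """Hypothetical subclass: absent keys fall back to a computed value."""

    def __missing__(self, key):
        # Called by dict's subscript implementation when key is absent.
        return f"<no {key!r}>"

d = Defaulting(a=1)
print(d["a"])        # present key: ordinary lookup
print(d["b"])        # absent key: routed through __missing__
print("b" in d)      # __missing__ does not insert the key
print(d.get("b"))    # dict.get() does not consult __missing__
```

Only indexing goes through the hook, which is why the extra lookup affects d[key] on subclasses but not the in operator.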
As for pickling, on the dumps side, the pickle implementation has a dedicated fast path (https://github.com/python/cpython/blob/v3.8.1/Modules/_pickle.c#L4291) for dicts, while the dict subclass takes a more roundabout path through object.__reduce_ex__ and save_reduce.
On the loads side, the time difference is mostly just from the extra opcodes and lookups to retrieve and instantiate the __main__.A class, while dicts have a dedicated pickle opcode for making a new dict. If we compare the disassembly for the pickles:
In [26]: pickletools.dis(pickle.dumps({0: 0, 1: 1, 2: 2, 3: 3, 4: 4}))
    0: \x80 PROTO      4
    2: \x95 FRAME      25
   11: }    EMPTY_DICT
   12: \x94 MEMOIZE    (as 0)
   13: (    MARK
   14: K        BININT1    0
   16: K        BININT1    0
   18: K        BININT1    1
   20: K        BININT1    1
   22: K        BININT1    2
   24: K        BININT1    2
   26: K        BININT1    3
   28: K        BININT1    3
   30: K        BININT1    4
   32: K        BININT1    4
   34: u        SETITEMS   (MARK at 13)
   35: .    STOP
highest protocol among opcodes = 4
In [27]: pickletools.dis(pickle.dumps(A({0: 0, 1: 1, 2: 2, 3: 3, 4: 4})))
    0: \x80 PROTO      4
    2: \x95 FRAME      43
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'A'
   25: \x94 MEMOIZE    (as 1)
   26: \x93 STACK_GLOBAL
   27: \x94 MEMOIZE    (as 2)
   28: )    EMPTY_TUPLE
   29: \x81 NEWOBJ
   30: \x94 MEMOIZE    (as 3)
   31: (    MARK
   32: K        BININT1    0
   34: K        BININT1    0
   36: K        BININT1    1
   38: K        BININT1    1
   40: K        BININT1    2
   42: K        BININT1    2
   44: K        BININT1    3
   46: K        BININT1    3
   48: K        BININT1    4
   50: K        BININT1    4
   52: u        SETITEMS   (MARK at 31)
   53: .    STOP
highest protocol among opcodes = 4
we see that the difference between the two is that the second pickle needs a whole bunch of opcodes to look up __main__.A and instantiate it, while the first pickle just does EMPTY_DICT to get an empty dict. After that, both pickles push the same keys and values onto the pickle operand stack and run SETITEMS.
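The same comparison can be made programmatically instead of reading the disassembly by eye. A sketch using pickletools.genops, assuming a trivial subclass A defined at module level as in the original question; the helper opcode_names is invented for illustration, and protocol 4 is fixed to match the listings above:

```python
import pickle
import pickletools

class A(dict):
    """Trivial dict subclass, as in the report."""

def opcode_names(payload):
    """Return the names of the opcodes in a pickle, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(payload)]

data = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
plain_pickle = pickle.dumps(data, protocol=4)
sub_pickle = pickle.dumps(A(data), protocol=4)

plain_ops = opcode_names(plain_pickle)
sub_ops = opcode_names(sub_pickle)

# Opcodes that appear only in the subclass pickle.
print([name for name in sub_ops if name not in plain_ops])
# Size of each pickle in bytes.
print(len(plain_pickle), len(sub_pickle))
```

The extra opcodes are the ones that name the class and construct an instance; the key and value opcodes are shared by both pickles.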
----------
components: C API
messages: 399625
nosy: Marco Sulla
priority: normal
severity: normal
status: open
title: dict subclassing is slow
type: performance
versions: Python 3.9
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44921>
_______________________________________