[capi-sig] Tagged pointers as 'opaque' handles
Hi, has anyone considered mixing the "opaque handle" idea with tagged pointers?
I could imagine (on 64 bits) reserving, say, a signed 24 bits for a refcount and the remaining 40 bits for an index into an array of object/vtable pointer structs. All negative refcounts would have special meanings, such as:
- this is the immortal None
- this is actually a tagged integer and not an object index
- this is a tagged float with an exact 32bit (or integer) value
- refcount has overflowed and object has become immortal :)
Things like these, can't say which are reasonable and/or fast enough. I could also imagine reserving a few more bits for a builtin type ID to speed up type checks, especially for int/float/list/tuple/dict/fast-callable, maybe also things like flat tuples, where the items are directly stored consecutively in the object array following the tuple index.
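To make the bit layout above concrete, here is a minimal sketch of the packing, under stated assumptions: the names (`PyHandle`, `handle_pack`, the `TAG_*` constants) and the specific negative tag values are hypothetical, the extra type-ID bits are left out, and how a refcount stored in a handle value would be kept in sync between copies of that handle is not addressed. It only shows that a signed 24-bit field and a 40-bit payload fit in one 64-bit word.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t PyHandle;              /* hypothetical handle type */

#define INDEX_BITS 40                   /* 64 bits - 24 refcount bits */
#define INDEX_MASK ((UINT64_C(1) << INDEX_BITS) - 1)
#define REFCNT_MAX ((INT32_C(1) << 23) - 1)
#define REFCNT_MIN (-(INT32_C(1) << 23))

/* Negative "refcounts" act as tags. The values are illustrative only. */
enum {
    TAG_NONE     = -1,   /* the immortal None */
    TAG_INT      = -2,   /* payload is an integer, not an object index */
    TAG_FLOAT32  = -3,   /* payload holds the bits of a 32-bit float */
    TAG_IMMORTAL = -4    /* refcount overflowed, object is now immortal */
};

/* Pack a signed 24-bit refcount/tag and a 40-bit payload into one word. */
static inline PyHandle handle_pack(int32_t refcnt, uint64_t payload)
{
    assert(refcnt >= REFCNT_MIN && refcnt <= REFCNT_MAX);
    assert(payload <= INDEX_MASK);
    /* The shift drops everything above bit 63, keeping the low 24 bits. */
    return ((uint64_t)(uint32_t)refcnt << INDEX_BITS) | payload;
}

/* Extract the top 24 bits and sign-extend them by hand. */
static inline int32_t handle_refcnt(PyHandle h)
{
    int32_t raw = (int32_t)(h >> INDEX_BITS);      /* 0 .. 2^24 - 1 */
    return (raw & 0x800000) ? raw - 0x1000000 : raw;
}

static inline uint64_t handle_payload(PyHandle h)
{
    return h & INDEX_MASK;
}

/* Non-negative refcount: the payload is an index into the object array. */
static inline int handle_is_object(PyHandle h)
{
    return handle_refcnt(h) >= 0;
}

/* Incref that turns an overflowing refcount into the "immortal" tag. */
static inline PyHandle handle_incref(PyHandle h)
{
    int32_t rc = handle_refcnt(h);
    if (rc < 0)
        return h;                       /* tagged or immortal: nothing to count */
    if (rc == REFCNT_MAX)
        return handle_pack(TAG_IMMORTAL, handle_payload(h));
    return handle_pack(rc + 1, handle_payload(h));
}
```

A type check for the tagged cases is then a comparison on `handle_refcnt(h)` with no memory access, while a real object still costs the array lookup plus the pointer it contains.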
Other runtimes could then implement the handles differently, as really opaque values and with or without tagged pointers, but CPython could use its own macros to give meaning to the tags. 32bit architectures would probably require different macros also for CPython, but since the whole design would need to work with and without tags, that should be doable.
Does this seem like something worth discussing?
Stefan
Hi,
I like tagged pointers because I expect that we can start experimenting with them as soon as all C extensions only use an opaque API. Moreover, I expect better performance (yeah, I'm optimistic, it helps ;-))
My notes: https://pythoncapi.readthedocs.io/optimization_ideas.html#tagged-pointers-do...
On Tue, Mar 5, 2019 at 08:06, Stefan Behnel <python_capi@behnel.de> wrote:
> I could imagine (on 64 bits) to reserve, say, a 'signed' 24 bits for a refcount and the rest for an index into an array of object/vtable pointer structs. All negative refcounts would have special meanings, such as
I'm not convinced that it's efficient. Compared to the current CPython implementation (PyObject* pointing to PyObject), it adds yet another indirection.
Instead of using 2 indirections for all data, I would prefer to put the content directly into the PyHandle/opaque "PyObject*" to get *zero* indirections. I expect that it's more efficient for CPU caches.
We could store small ints, latin1 strings, maybe some singletons like None, and maybe also some floats, directly inside a 64-bit PyHandle integer. Python memory allocators return addresses aligned to 8 bytes, so the low 3 bits of a pointer are free to store data. One bit is enough to distinguish tagged pointers from regular PyObject*.
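As a rough illustration of the alignment trick, here is a sketch that uses the low 3 bits of a pointer-sized word as a tag. All names (`PyHandle`, `handle_from_smallint`, and so on) and the tag values are assumptions made for the example, not an existing CPython API, and only the small-int case is shown.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t PyHandle;             /* hypothetical handle type */

#define TAG_BITS     3                  /* free low bits with 8-byte alignment */
#define TAG_MASK     ((uintptr_t)7)
#define TAG_PTR      ((uintptr_t)0)     /* regular object pointer (assumed value) */
#define TAG_SMALLINT ((uintptr_t)1)     /* integer stored inline (assumed value) */

static inline int handle_is_smallint(PyHandle h)
{
    return (h & TAG_MASK) == TAG_SMALLINT;
}

/* An 8-byte aligned pointer already has tag 0, so it is stored unchanged. */
static inline PyHandle handle_from_ptr(void *p)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);
    return (uintptr_t)p;
}

static inline void *handle_to_ptr(PyHandle h)
{
    assert((h & TAG_MASK) == TAG_PTR);
    return (void *)h;
}

/* The value must fit in the remaining bits (61 on a 64-bit platform).
 * The shift is done on the unsigned type to stay defined for negatives. */
static inline PyHandle handle_from_smallint(intptr_t v)
{
    return ((uintptr_t)v << TAG_BITS) | TAG_SMALLINT;
}

/* Relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but what mainstream compilers do. */
static inline intptr_t handle_to_smallint(PyHandle h)
{
    assert(handle_is_smallint(h));
    return (intptr_t)h >> TAG_BITS;
}
```

With this encoding, reading a small int out of a handle touches no memory beyond the handle itself, which is the "zero indirections" case.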
> Things like these, can't say which are reasonable and/or fast enough.
Neil already implemented the idea and ran some benchmarks :-) https://mail.python.org/archives/list/capi-sig@python.org/thread/EGAY55ZWMF2...
"""
The result looks promising:

./python -m perf timeit --name='x+y' -s 'x=10000; y=2' 'x+y' --dup 1000 -v -o int.json
./python -m perf timeit --name='x+y' -s 'x=fixedint(10000); y=fixedint(2)' 'x+y' --dup 1000 -v -o fixedint.json
./python -m perf compare_to int.json fixedint.json

Mean +- std dev: [int] 32.3 ns +- 1.0 ns -> [fixedint] 10.8 ns +- 0.3 ns: 3.00x faster (-67%)
"""
> maybe also things like flat tuples, where the items are directly stored consecutively in the object array following the tuple index.
If you put values directly inside the PyHandle/opaque PyObject*, the "PyObject** ob_item" array of a PyTuple suddenly becomes very efficient in terms of memory footprint ;-) (same for list, dict, etc.)
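To sketch what that means for an item array: if small ints live inside the handles, a function can walk the array and use the values without following a single pointer. The helper names and the one-bit encoding below (low bit set means inline integer) are assumptions for illustration, not CPython's actual tuple layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t PyHandle;             /* hypothetical handle type */
#define SMALLINT_TAG ((uintptr_t)1)     /* low bit set: inline integer (assumed) */

static inline PyHandle smallint_handle(intptr_t v)
{
    return ((uintptr_t)v << 1) | SMALLINT_TAG;
}

static inline int is_smallint(PyHandle h)
{
    return (h & SMALLINT_TAG) != 0;
}

/* Arithmetic right shift assumed, as on mainstream compilers. */
static inline intptr_t smallint_value(PyHandle h)
{
    return (intptr_t)h >> 1;
}

/* Sum the inline integers of an item array. Returns how many items are
 * real object pointers, i.e. would still need a dereference. */
static size_t sum_inline_ints(const PyHandle *items, size_t n, intptr_t *sum)
{
    size_t boxed = 0;
    assert(items != NULL || n == 0);
    *sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_smallint(items[i]))
            *sum += smallint_value(items[i]);   /* value read from the array itself */
        else
            boxed++;
    }
    return boxed;
}
```

An array of three small ints then occupies three machine words in total, with no separately allocated integer objects behind it.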
Victor
Night gathers, and now my watch begins. It shall not end until my death.