GH-96068: Document object layout (GH-96069)

https://github.com/python/cpython/commit/575f8880bf8498ee05a8e197fc2ed85db68... commit: 575f8880bf8498ee05a8e197fc2ed85db6880361 branch: main author: Mark Shannon <mark@hotpy.org> committer: markshannon <mark@hotpy.org> date: 2022-08-23T13:55:43+01:00 summary: GH-96068: Document object layout (GH-96069) files: A Objects/object_layout.md A Objects/object_layout_312.gv A Objects/object_layout_312.png A Objects/object_layout_full_312.gv A Objects/object_layout_full_312.png diff --git a/Objects/object_layout.md b/Objects/object_layout.md new file mode 100644 index 00000000000..9380b57938c --- /dev/null +++ b/Objects/object_layout.md @@ -0,0 +1,82 @@ +# Object layout + +## Common header + +Each Python object starts with two fields: + +* ob_refcnt +* ob_type + +which the form the header common to all Python objects, for all versions, +and hold the reference count and class of the object, respectively. + +## Pre-header + +Since the introduction of the cycle GC, there has also been a pre-header. +Before 3.11, this pre-header was two words in size. +It should be considered opaque to all code except the cycle GC. + +## 3.11 pre-header + +In 3.11 the pre-header was extended to include pointers to the VM managed ``__dict__``. +The reason for moving the ``__dict__`` to the pre-header is that it allows +faster access, as it is at a fixed offset, and it also allows object's +dictionaries to be lazily created when the ``__dict__`` attribute is +specifically asked for. + +In the 3.11 the non-GC part of the pre-header consists of two pointers: + +* dict +* values + +The values pointer refers to the ``PyDictValues`` array which holds the +values of the objects's attributes. +Should the dictionary be needed, then ``values`` is set to ``NULL`` +and the ``dict`` field points to the dictionary. + +## 3.12 pre-header + +In 3.12 the the pointer to the list of weak references is added to the +pre-header. In order to make space for it, the ``dict`` and ``values`` +pointers are combined into a single tagged pointer: + +* weakreflist +* dict_or_values + +If the object has no physical dictionary, then the ``dict_or_values`` +has its low bit set to one, and points to the values array. +If the object has a physical dictioanry, then the ``dict_or_values`` +has its low bit set to zero, and points to the dictionary. + +The untagged form is chosen for the dictionary pointer, rather than +the values pointer, to enable the (legacy) C-API function +`_PyObject_GetDictPtr(PyObject *obj)` to work. + + +## Layout of a "normal" Python object in 3.12: + +* weakreflist +* dict_or_values +* GC 1 +* GC 2 +* ob_refcnt +* ob_type + +For a "normal" Python object, that is one that doesn't inherit from a builtin +class or have slots, the header and pre-header form the entire object. + + + +There are several advantages to this layout: + +* It allows lazy `__dict__`s, as described above. +* The regular layout allows us to create tailored traversal and deallocation + functions based on layout, rather than inheritance. +* Multiple inheritance works properly, + as the weakrefs and dict are always at the same offset. + +The full layout object, with an opaque part defined by a C extension, +and `__slots__` looks like this: + + + diff --git a/Objects/object_layout_312.gv b/Objects/object_layout_312.gv new file mode 100644 index 00000000000..c0068d78568 --- /dev/null +++ b/Objects/object_layout_312.gv @@ -0,0 +1,50 @@ +digraph ideal { + + rankdir = "LR" + + + object [ + shape = none + label = <<table border="0" cellspacing="0"> + <tr><td><b>object</b></td></tr> + <tr><td port="w" border="1">weakrefs</td></tr> + <tr><td port="dv" border="1">dict or values</td></tr> + <tr><td border="1" >GC info 0</td></tr> + <tr><td border="1" >GC info 1</td></tr> + <tr><td port="r" border="1" >refcount</td></tr> + <tr><td port="h" border="1" >__class__</td></tr> + </table>> + ] + + values [ + shape = none + label = <<table border="0" cellspacing="0"> + <tr><td><b>values</b></td></tr> + <tr><td port="0" border="1">values[0]</td></tr> + <tr><td border="1">values[1]</td></tr> + <tr><td border="1">...</td></tr> + </table>> + + ] + + class [ + shape = none + label = <<table border="0" cellspacing="0"> + <tr><td><b>class</b></td></tr> + <tr><td port="head" bgcolor="lightgreen" border="1">...</td></tr> + <tr><td border="1" bgcolor="lightgreen">dict_offset</td></tr> + <tr><td border="1" bgcolor="lightgreen">...</td></tr> + <tr><td port="k" border="1" bgcolor="lightgreen">cached_keys</td></tr> + </table>> + ] + + keys [label = "dictionary keys"; fillcolor="lightgreen"; style="filled"] + NULL [ label = " NULL"; shape="plain"] + object:w -> NULL + object:h -> class:head + object:dv -> values:0 + class:k -> keys + + oop [ label = "pointer"; shape="plain"] + oop -> object:r +} diff --git a/Objects/object_layout_312.png b/Objects/object_layout_312.png new file mode 100644 index 00000000000..396dab183b3 Binary files /dev/null and b/Objects/object_layout_312.png differ diff --git a/Objects/object_layout_full_312.gv b/Objects/object_layout_full_312.gv new file mode 100644 index 00000000000..522fa32b066 --- /dev/null +++ b/Objects/object_layout_full_312.gv @@ -0,0 +1,25 @@ +digraph ideal { + + rankdir = "LR" + + + object [ + shape = none + label = <<table border="0" cellspacing="0"> + <tr><td><b>object</b></td></tr> + <tr><td port="w" border="1">weakrefs</td></tr> + <tr><td port="dv" border="1">dict or values</td></tr> + <tr><td border="1" >GC info 0</td></tr> + <tr><td border="1" >GC info 1</td></tr> + <tr><td port="r" border="1" >refcount</td></tr> + <tr><td port="h" border="1" >__class__</td></tr> + <tr><td border="1">opaque (extension) data </td></tr> + <tr><td border="1">...</td></tr> + <tr><td border="1">__slot__ 0</td></tr> + <tr><td border="1">...</td></tr> + </table>> + ] + + oop [ label = "pointer"; shape="plain"] + oop -> object:r +} diff --git a/Objects/object_layout_full_312.png b/Objects/object_layout_full_312.png new file mode 100644 index 00000000000..4f46ca86091 Binary files /dev/null and b/Objects/object_layout_full_312.png differ
participants (1)
-
markshannon