[Python-checkins] GH-96068: Document object layout (GH-96069)

markshannon webhook-mailer at python.org
Tue Aug 23 08:55:52 EDT 2022


https://github.com/python/cpython/commit/575f8880bf8498ee05a8e197fc2ed85db6880361
commit: 575f8880bf8498ee05a8e197fc2ed85db6880361
branch: main
author: Mark Shannon <mark at hotpy.org>
committer: markshannon <mark at hotpy.org>
date: 2022-08-23T13:55:43+01:00
summary:

GH-96068: Document object layout (GH-96069)

files:
A Objects/object_layout.md
A Objects/object_layout_312.gv
A Objects/object_layout_312.png
A Objects/object_layout_full_312.gv
A Objects/object_layout_full_312.png

diff --git a/Objects/object_layout.md b/Objects/object_layout.md
new file mode 100644
index 00000000000..9380b57938c
--- /dev/null
+++ b/Objects/object_layout.md
@@ -0,0 +1,82 @@
+# Object layout
+
+## Common header
+
+Each Python object starts with two fields:
+
+* ob_refcnt
+* ob_type
+
+which form the header common to all Python objects, for all versions,
+and hold the reference count and class of the object, respectively.
+
+## Pre-header
+
+Since the introduction of the cycle GC, there has also been a pre-header.
+Before 3.11, this pre-header was two words in size.
+It should be considered opaque to all code except the cycle GC.
+
+## 3.11 pre-header
+
+In 3.11 the pre-header was extended to include pointers to the VM-managed ``__dict__``.
+The reason for moving the ``__dict__`` to the pre-header is that it allows
+faster access, as it is at a fixed offset, and it also allows an object's
+dictionary to be lazily created when the ``__dict__`` attribute is
+specifically asked for.
+
+In 3.11 the non-GC part of the pre-header consists of two pointers:
+
+* dict
+* values
+
+The ``values`` pointer refers to the ``PyDictValues`` array which holds the
+values of the object's attributes.
+Should the dictionary be needed, then ``values`` is set to ``NULL``
+and the ``dict`` field points to the dictionary.
+
+## 3.12 pre-header
+
+In 3.12 the pointer to the list of weak references is added to the
+pre-header. In order to make space for it, the ``dict`` and ``values``
+pointers are combined into a single tagged pointer:
+
+* weakreflist
+* dict_or_values
+
+If the object has no physical dictionary, then ``dict_or_values``
+has its low bit set to one, and points to the values array.
+If the object has a physical dictionary, then ``dict_or_values``
+has its low bit set to zero, and points to the dictionary.
+
+The untagged form is chosen for the dictionary pointer, rather than
+the values pointer, to enable the (legacy) C-API function
+`_PyObject_GetDictPtr(PyObject *obj)` to work.
+
+
+## Layout of a "normal" Python object in 3.12:
+
+* weakreflist
+* dict_or_values
+* GC 1
+* GC 2
+* ob_refcnt
+* ob_type
+
+For a "normal" Python object, that is one that doesn't inherit from a builtin
+class or have slots, the header and pre-header form the entire object.
+
+![Layout of "normal" object in 3.12](./object_layout_312.png)
+
+There are several advantages to this layout:
+
+* It allows lazy `__dict__`s, as described above.
+* The regular layout allows us to create tailored traversal and deallocation
+  functions based on layout, rather than inheritance.
+* Multiple inheritance works properly,
+  as the weakrefs and dict are always at the same offset.
+
+The full object layout, with an opaque part defined by a C extension
+and `__slots__`, looks like this:
+
+![Layout of "full" object in 3.12](./object_layout_full_312.png)
+
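Editorial note (not part of the commit): the tagging scheme that object_layout.md describes for ``dict_or_values`` can be modelled in a few lines of Python. This is only a sketch under stated assumptions: addresses are plain integers, pointers are assumed to be at least 2-byte aligned so that the low bit is free, and the names `VALUES_TAG`, `tag_values`, `tag_dict` and `decode` are hypothetical and do not appear in CPython.

```python
# Hypothetical model of the 3.12 ``dict_or_values`` tagged pointer.
# Addresses are plain ints; alignment is assumed to leave the low bit free.

VALUES_TAG = 1  # low bit set to one: the word points to the values array


def tag_values(values_addr: int) -> int:
    """Encode a pointer to the values array (low bit set to one)."""
    assert values_addr & VALUES_TAG == 0, "address must be aligned"
    return values_addr | VALUES_TAG


def tag_dict(dict_addr: int) -> int:
    """Encode a pointer to a physical dictionary (stored untagged)."""
    assert dict_addr & VALUES_TAG == 0, "address must be aligned"
    return dict_addr


def decode(word: int) -> tuple[str, int]:
    """Return which kind of pointer the word holds, plus the raw address."""
    if word & VALUES_TAG:
        # Clear the tag bit to recover the real address of the values array.
        return "values", word & ~VALUES_TAG
    # An untagged word is already a usable dictionary pointer.
    return "dict", word


print(decode(tag_values(0x1000)))
print(decode(tag_dict(0x2000)))
```

Because the dictionary form carries no tag, code that only ever expects a dictionary pointer can use the stored word directly, which is the property the document gives as the reason for leaving the dictionary pointer untagged.
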
diff --git a/Objects/object_layout_312.gv b/Objects/object_layout_312.gv
new file mode 100644
index 00000000000..c0068d78568
--- /dev/null
+++ b/Objects/object_layout_312.gv
@@ -0,0 +1,50 @@
+digraph ideal {
+
+    rankdir = "LR"
+
+
+    object [
+        shape = none
+        label = <<table border="0" cellspacing="0">
+                    <tr><td><b>object</b></td></tr>
+                    <tr><td port="w" border="1">weakrefs</td></tr>
+                    <tr><td port="dv" border="1">dict or values</td></tr>
+                    <tr><td border="1" >GC info 0</td></tr>
+                    <tr><td border="1" >GC info 1</td></tr>
+                    <tr><td port="r" border="1" >refcount</td></tr>
+                    <tr><td port="h" border="1" >__class__</td></tr>
+                </table>>
+    ]
+
+    values [
+        shape = none
+        label = <<table border="0" cellspacing="0">
+                    <tr><td><b>values</b></td></tr>
+                    <tr><td port="0" border="1">values[0]</td></tr>
+                    <tr><td border="1">values[1]</td></tr>
+                    <tr><td border="1">...</td></tr>
+                </table>>
+
+    ]
+
+    class [ 
+        shape = none
+        label = <<table border="0" cellspacing="0">
+                    <tr><td><b>class</b></td></tr>
+                    <tr><td port="head" bgcolor="lightgreen" border="1">...</td></tr>
+                    <tr><td border="1" bgcolor="lightgreen">dict_offset</td></tr>
+                    <tr><td border="1" bgcolor="lightgreen">...</td></tr>
+                    <tr><td port="k" border="1" bgcolor="lightgreen">cached_keys</td></tr>
+                </table>>
+    ]
+
+    keys [label = "dictionary keys"; fillcolor="lightgreen"; style="filled"]
+    NULL [ label = " NULL"; shape="plain"]
+    object:w ->  NULL
+    object:h -> class:head
+    object:dv -> values:0
+    class:k -> keys
+
+    oop [ label = "pointer"; shape="plain"]
+    oop -> object:r
+}
diff --git a/Objects/object_layout_312.png b/Objects/object_layout_312.png
new file mode 100644
index 00000000000..396dab183b3
Binary files /dev/null and b/Objects/object_layout_312.png differ
diff --git a/Objects/object_layout_full_312.gv b/Objects/object_layout_full_312.gv
new file mode 100644
index 00000000000..522fa32b066
--- /dev/null
+++ b/Objects/object_layout_full_312.gv
@@ -0,0 +1,25 @@
+digraph ideal {
+
+    rankdir = "LR"
+
+
+    object [
+        shape = none
+        label = <<table border="0" cellspacing="0">
+                    <tr><td><b>object</b></td></tr>
+                    <tr><td port="w" border="1">weakrefs</td></tr>
+                    <tr><td port="dv" border="1">dict or values</td></tr>
+                    <tr><td border="1" >GC info 0</td></tr>
+                    <tr><td border="1" >GC info 1</td></tr>
+                    <tr><td port="r" border="1" >refcount</td></tr>
+                    <tr><td port="h" border="1" >__class__</td></tr>
+                    <tr><td border="1">opaque (extension) data </td></tr>
+                    <tr><td border="1">...</td></tr>
+                    <tr><td border="1">__slot__ 0</td></tr>
+                    <tr><td border="1">...</td></tr>
+                </table>>
+    ]
+
+    oop [ label = "pointer"; shape="plain"]
+    oop -> object:r
+}
diff --git a/Objects/object_layout_full_312.png b/Objects/object_layout_full_312.png
new file mode 100644
index 00000000000..4f46ca86091
Binary files /dev/null and b/Objects/object_layout_full_312.png differ
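
Editorial note (not part of the commit): both diagrams draw the object pointer as pointing at the refcount field, with the pre-header sitting before it. A rough way to observe the common header from Python is to read memory at `id(obj)` with `ctypes`. This is a sketch, not a supported interface. It assumes a default CPython build (not free-threaded, not a debug build with extra header fields), where `id(obj)` is the object's address and `ob_type` follows one `Py_ssize_t`-sized `ob_refcnt` field. The names `Plain` and `read_header` are hypothetical.

```python
import ctypes


class Plain:
    """A "normal" class: no builtin base besides object, no __slots__."""


def read_header(obj):
    """Read (ob_refcnt, ob_type address) from the start of obj's memory.

    Assumption: default CPython build, where id(obj) is the address of
    the object's ob_refcnt field and ob_type is the next word.
    """
    addr = id(obj)
    refcnt = ctypes.c_ssize_t.from_address(addr).value
    type_offset = ctypes.sizeof(ctypes.c_ssize_t)
    type_addr = ctypes.c_void_p.from_address(addr + type_offset).value
    return refcnt, type_addr


obj = Plain()
refcnt, type_addr = read_header(obj)
# On the assumed build, the second header word is the address of the class.
print("ob_type matches class:", type_addr == id(Plain))
print("refcount is positive:", refcnt >= 1)
```

The pre-header fields are deliberately not read here: the document says the GC part should be considered opaque, and the offsets of the other pre-header fields differ between 3.11 and 3.12.
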


