[Python-checkins] [3.8] gh-93065: Fix HAMT to iterate correctly over 7-level deep trees (GH-93066) (#93148)

ambv webhook-mailer at python.org
Tue May 24 05:27:00 EDT 2022


https://github.com/python/cpython/commit/6d4927ad1383e8bb8bda6659d24b6d05094a7b99
commit: 6d4927ad1383e8bb8bda6659d24b6d05094a7b99
branch: 3.8
author: Łukasz Langa <lukasz at langa.pl>
committer: ambv <lukasz at langa.pl>
date: 2022-05-24T11:26:25+02:00
summary:

[3.8] gh-93065: Fix HAMT to iterate correctly over 7-level deep trees (GH-93066) (#93148)

Also while there, clarify a few things about why we reduce the hash to 32 bits.

Co-authored-by: Eli Libman <eli at hyro.ai>
Co-authored-by: Yury Selivanov <yury at edgedb.com>
Co-authored-by: Łukasz Langa <lukasz at langa.pl>

(cherry picked from commit c1f5c903a7e4ed27190488f4e33b00d3c3d952e5)

files:
A Misc/NEWS.d/next/Core and Builtins/2022-05-21-23-21-37.gh-issue-93065.5I18WC.rst
M Include/internal/pycore_hamt.h
M Lib/test/test_context.py
M Misc/ACKS
M Python/hamt.c

diff --git a/Include/internal/pycore_hamt.h b/Include/internal/pycore_hamt.h
index e65aef5e21a95..ff9cc24fd7166 100644
--- a/Include/internal/pycore_hamt.h
+++ b/Include/internal/pycore_hamt.h
@@ -5,7 +5,19 @@
 #  error "this header requires Py_BUILD_CORE define"
 #endif
 
-#define _Py_HAMT_MAX_TREE_DEPTH 7
+
+/*
+HAMT tree is shaped by hashes of keys. Every group of 5 bits of a hash denotes
+the exact position of the key in one level of the tree. Since we're using
+32 bit hashes, we can have at most 7 such levels. Although if there are
+two distinct keys with equal hashes, they will have to occupy the same
+cell in the 7th level of the tree -- so we'd put them in a "collision" node.
+Which brings the total possible tree depth to 8. Read more about the actual
+layout of the HAMT tree in `hamt.c`.
+
+This constant is used to define a datastucture for storing iteration state.
+*/
+#define _Py_HAMT_MAX_TREE_DEPTH 8
 
 
 #define PyHamt_Check(o) (Py_TYPE(o) == &_PyHamt_Type)
diff --git a/Lib/test/test_context.py b/Lib/test/test_context.py
index b9e991a400092..b954b135dd6a5 100644
--- a/Lib/test/test_context.py
+++ b/Lib/test/test_context.py
@@ -537,6 +537,41 @@ def test_hamt_collision_1(self):
         self.assertEqual(len(h4), 2)
         self.assertEqual(len(h5), 3)
 
+    def test_hamt_collision_3(self):
+        # Test that iteration works with the deepest tree possible.
+        # https://github.com/python/cpython/issues/93065
+
+        C = HashKey(0b10000000_00000000_00000000_00000000, 'C')
+        D = HashKey(0b10000000_00000000_00000000_00000000, 'D')
+
+        E = HashKey(0b00000000_00000000_00000000_00000000, 'E')
+
+        h = hamt()
+        h = h.set(C, 'C')
+        h = h.set(D, 'D')
+        h = h.set(E, 'E')
+
+        # BitmapNode(size=2 count=1 bitmap=0b1):
+        #   NULL:
+        #     BitmapNode(size=2 count=1 bitmap=0b1):
+        #       NULL:
+        #         BitmapNode(size=2 count=1 bitmap=0b1):
+        #           NULL:
+        #             BitmapNode(size=2 count=1 bitmap=0b1):
+        #               NULL:
+        #                 BitmapNode(size=2 count=1 bitmap=0b1):
+        #                   NULL:
+        #                     BitmapNode(size=2 count=1 bitmap=0b1):
+        #                       NULL:
+        #                         BitmapNode(size=4 count=2 bitmap=0b101):
+        #                           <Key name:E hash:0>: 'E'
+        #                           NULL:
+        #                             CollisionNode(size=4 id=0x107a24520):
+        #                               <Key name:C hash:2147483648>: 'C'
+        #                               <Key name:D hash:2147483648>: 'D'
+
+        self.assertEqual({k.name for k in h.keys()}, {'C', 'D', 'E'})
+
     def test_hamt_stress(self):
         COLLECTION_SIZE = 7000
         TEST_ITERS_EVERY = 647
diff --git a/Misc/ACKS b/Misc/ACKS
index a265e7e9a7273..ed9b639b1d924 100644
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -1002,6 +1002,8 @@ Robert Li
 Xuanji Li
 Zekun Li
 Zheao Li
+Eli Libman
+Dan Lidral-Porter
 Robert van Liere
 Ross Light
 Shawn Ligocki
diff --git a/Misc/NEWS.d/next/Core and Builtins/2022-05-21-23-21-37.gh-issue-93065.5I18WC.rst b/Misc/NEWS.d/next/Core and Builtins/2022-05-21-23-21-37.gh-issue-93065.5I18WC.rst
new file mode 100644
index 0000000000000..ea801653f7502
--- /dev/null
+++ b/Misc/NEWS.d/next/Core and Builtins/2022-05-21-23-21-37.gh-issue-93065.5I18WC.rst	
@@ -0,0 +1,5 @@
+Fix contextvars HAMT implementation to handle iteration over deep trees.
+
+The bug was discovered and fixed by Eli Libman. See
+`MagicStack/immutables#84 <https://github.com/MagicStack/immutables/issues/84>`_
+for more details.
diff --git a/Python/hamt.c b/Python/hamt.c
index 5efc8d7fabe8e..0e463d74ef64e 100644
--- a/Python/hamt.c
+++ b/Python/hamt.c
@@ -408,14 +408,22 @@ hamt_hash(PyObject *o)
         return -1;
     }
 
-    /* While it's suboptimal to reduce Python's 64 bit hash to
+    /* While it's somewhat suboptimal to reduce Python's 64 bit hash to
        32 bits via XOR, it seems that the resulting hash function
        is good enough (this is also how Long type is hashed in Java.)
        Storing 10, 100, 1000 Python strings results in a relatively
        shallow and uniform tree structure.
 
-       Please don't change this hashing algorithm, as there are many
-       tests that test some exact tree shape to cover all code paths.
+       Also it's worth noting that it would be possible to adapt the tree
+       structure to 64 bit hashes, but that would increase memory pressure
+       and provide little to no performance benefits for collections with
+       fewer than billions of key/value pairs.
+
+       Important: do not change this hash reducing function. There are many
+       tests that need an exact tree shape to cover all code paths and
+       we do that by specifying concrete values for test data's `__hash__`.
+       If this function is changed most of the regression tests would
+       become useless.
     */
     int32_t xored = (int32_t)(hash & 0xffffffffl) ^ (int32_t)(hash >> 32);
     return xored == -1 ? -2 : xored;



More information about the Python-checkins mailing list