[issue13703] Hash collision security issue

Dave Malcolm report at bugs.python.org
Sat Jan 21 18:02:55 CET 2012


Dave Malcolm <dmalcolm at redhat.com> added the comment:

On Sat, 2012-01-21 at 14:27 +0000, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou at free.fr> added the comment:
> 
> > Thoughts? (apart from "ugh! it's ugly!" yes I know - it's late here)
> 
> Is it guaranteed that no usage pattern can render this protection
> inefficient? What if a dict is constructed by intermingling lookups and
> inserts?
> Similarly, what happens with e.g. the common use case of
> defaultdict(list), where you append() after the lookup/insert? Does some
> key distribution allow the attack while circumventing the protection?

Yes, I agree that I was making an unrealistic assumption about usage
patterns.  There was also some global state (the "is_inserting"
variable).

I've tweaked the approach somewhat, moved the global to be per-dict, and
am attaching a revised version of the patch:
   amortized-probe-counting-dmalcolm-2012-01-21-003.patch

In this patch, rather than resetting the count each time, I keep track
of the total number of calls to insertdict() that have happened for
each "large dict" (i.e. one for which ma_table != ma_smalltable), and
the total number of probe iterations that have been needed to service
those insertions/overwrites.  It raises the exception when the *number
of probe iterations per insertion* exceeds a threshold factor (rather
than merely comparing the number of iterations against the current
ma_used of the dict).  I believe this means that it's tracking and
checking every time the dict is modified, and (I hope) protects us
against any data that drives the dict implementation away from linear
behavior, because that's essentially what it's testing for.  [The
per-dict stats are reset each time the dict shrinks back to using
ma_smalltable, but I think at-risk usage patterns in which that occurs
are uncommon relative to those in which it doesn't.]
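
As a rough illustration, here's a minimal pure-Python sketch of the
bookkeeping (the real implementation is in C in Objects/dictobject.c,
storing the counters in the space freed up by ma_smalltable; the names
and the exception message below are simplified for illustration, not
the patch's actual identifiers):

    MAX_AVERAGE_PROBES_PER_INSERT = 32  # power of two, as in the patch

    class AlgorithmicComplexityError(Exception):
        pass

    class LargeDictStats:
        """Per-dict counters; conceptually reset whenever the dict
        shrinks back to its smalltable representation."""
        def __init__(self):
            self.num_insertions = 0        # insert/overwrite calls so far
            self.insertion_iter_count = 0  # probe iterations serving them

        def start_insert(self):
            self.num_insertions += 1

        def count_probe(self):
            # Called once per iteration of the probing loop during an
            # insert; raises once the amortized probes-per-insert cost
            # exceeds the threshold.
            self.insertion_iter_count += 1
            if (self.insertion_iter_count >
                    self.num_insertions * MAX_AVERAGE_PROBES_PER_INSERT):
                raise AlgorithmicComplexityError(
                    'dict construction used %d probes whilst performing'
                    ' %d insertions' % (self.insertion_iter_count,
                                        self.num_insertions))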

When attacked, this leads to exceptions like this:

AlgorithmicComplexityError: dict construction used 1697 probes whilst performing 53 insertions (len() == 58) at key 58 with hash 42

i.e. we have a dictionary containing 58 keys, which has seen 53
insert/overwrite operations since transitioning to the non-ma_smalltable
representation (at size 6); presumably it has 128 PyDictEntries.
Servicing those 53 operations has required a total of 1697 iterations
through the probing loop, or a little over 32 probes per insert: the
threshold is 53 * 32 = 1696 probes, so the 1697th iteration is what
trips the check.

I just did a full run of the test suite (using run_tests.py); it
mostly passed, including the new tests I've added (among them the test
for scenario 2 from Frank's email).

There were two failures:
======================================================================
FAIL: test_inheritance (test.test_pep352.ExceptionClassTests)
----------------------------------------------------------------------
AssertionError: 1 != 0 : {'AlgorithmicComplexityError'} not accounted
for
----------------------------------------------------------------------
which is obviously fixable (given a decision on where the exception
lives in the hierarchy)

and this one:
test test_mutants crashed -- Traceback (most recent call last):
  File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/regrtest.py", line 1214, in runtest_inner
    the_package = __import__(abstest, globals(), locals(), [])
  File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 159, in <module>
    test(100)
  File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 156, in test
    test_one(random.randrange(1, 100))
  File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 132, in test_one
    dict2keys = fill_dict(dict2, range(n), n)
  File "/home/david/coding/python-hg/cpython-count-collisions/Lib/test/test_mutants.py", line 118, in fill_dict
    Horrid(random.choice(candidates))
AlgorithmicComplexityError: dict construction used 2753 probes whilst performing 86 insertions (len() == 64) at key Horrid(86) with hash 42
though that seems to be deliberately degenerate code.

Caveats:
* no overflow handling (what happens after 2**32 modifications to a
long-lived dict on a 32-bit build?) - though that's fixable; see the
sketch after this list
* no idea what the scaling factor for the threshold should be (there may
also be a deep mathematical objection here, based on how big-O notation
is defined in terms of an arbitrary scaling factor and limit)
* not optimized; I haven't looked at performance yet
* doesn't cover set(), though that also has spare space (I hope) via its
own smalltable array.
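
On that first caveat, one possibility - not something the attached
patch does - would be to periodically halve both counters, which keeps
the values bounded while preserving the probes-per-insert ratio that
the check actually tests.  Continuing the Python sketch from above:

    def decay_counters(stats, ceiling=2**31):
        # Hypothetical overflow guard: halving num_insertions and
        # insertion_iter_count together (roughly) preserves their
        # ratio while keeping both well away from wrap-around.
        if (stats.insertion_iter_count >= ceiling
                or stats.num_insertions >= ceiling):
            stats.insertion_iter_count //= 2
            stats.num_insertions //= 2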

BTW, note that although I've been working on this variant of the
collision counting approach, I'm not opposed to the hash randomization
approach, or to adding extra checks in strategic places within the
stdlib: I'm keen to get some kind of appropriate fix approved by the
upstream Python development community so I can backport it to the
various recent-to-ancient versions of CPython I support in RHEL (and
Fedora), before we start seeing real-world attacks.

Hope this is helpful
Dave

----------
Added file: http://bugs.python.org/file24289/amortized-probe-counting-dmalcolm-2012-01-21-003.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________
-------------- next part --------------
diff -r 3be60a4c8c63 Include/pyerrors.h
--- a/Include/pyerrors.h	Fri Jan 20 11:01:06 2012 -0500
+++ b/Include/pyerrors.h	Sat Jan 21 11:22:34 2012 -0500
@@ -207,6 +207,8 @@
 PyAPI_DATA(PyObject *) PyExc_BytesWarning;
 PyAPI_DATA(PyObject *) PyExc_ResourceWarning;
 
+PyAPI_DATA(PyObject *) PyExc_AlgorithmicComplexityError;
+
 
 /* Convenience functions */
 
diff -r 3be60a4c8c63 Lib/test/test_dict.py
--- a/Lib/test/test_dict.py	Fri Jan 20 11:01:06 2012 -0500
+++ b/Lib/test/test_dict.py	Sat Jan 21 11:22:34 2012 -0500
@@ -3,6 +3,8 @@
 
 import collections, random, string
 import gc, weakref
+import sys
+import time
 
 
 class DictTest(unittest.TestCase):
@@ -757,6 +759,198 @@
         self._tracked(MyDict())
 
 
+# Support classes for HashCollisionTests:
+class ChosenHash:
+    """
+    Use this to create arbitrary collections of keys that are non-equal
+    but have equal hashes, without needing to include hostile data
+    within the test suite.
+    """
+    def __init__(self, variability, hash):
+        self.variability = variability
+        self.hash = hash
+
+    def __eq__(self, other):
+        # The variability field is used to handle non-equalness:
+        return self.variability == other.variability
+
+    def __hash__(self):
+        return self.hash
+
+    def __repr__(self):
+        return 'ChosenHash(%r, %r)' % (self.variability,
+                                       self.hash)
+
+class Timer:
+    """
+    Simple way to measure time elapsed during a test case
+    """
+    def __init__(self):
+        self.starttime = time.time()
+
+    def get_elapsed_time(self):
+        """Get elapsed time in seconds as a float"""
+        curtime = time.time()
+        return curtime - self.starttime
+
+    def elapsed_time_as_str(self):
+        """Get elapsed time as a string (with units)"""
+        return '%0.3f seconds' % self.get_elapsed_time()
+
+class TookTooLong(RuntimeError):
+    def __init__(self, timelimit, elapsed, itercount=None):
+        self.timelimit = timelimit
+        self.elapsed = elapsed
+        self.itercount = itercount
+
+    def __str__(self):
+        result = 'took >= %s seconds' % self.timelimit
+        if self.itercount is not None:
+            result += (' (%0.3f seconds elapsed after %i iterations)'
+                       % (self.elapsed, self.itercount))
+        else:
+            result += ' (%0.3f seconds elapsed)' % self.elapsed
+        return result
+
+# Some of the tests involve constructing large dictionaries.  How big
+# should they be?
+ITEM_COUNT = 1000000
+
+# Arbitrary threshold (in seconds) for a "reasonable amount of time"
+# that it should take to work with ITEM_COUNT items:
+TIME_LIMIT = 5
+
+class _FasterThanContext(object):
+    """
+    A context manager for implementing assertFasterThan
+    """
+    def __init__(self, test_case, **kwargs):
+        self.test_case = test_case
+        if 'seconds' in kwargs:
+            self.timelimit = kwargs['seconds']
+        else:
+            raise ValueError()
+
+    def __enter__(self):
+        self.timer = Timer()
+        return self
+
+    def __exit__(self, exc_type, exc_value, tb):
+        if exc_type is not None:
+            # let unexpected exceptions pass through
+            return
+
+        if 1:
+            print('timer within %s took %s'
+                  % (self.test_case, self.timer.elapsed_time_as_str()))
+
+    def handle(self, callable_obj, args, kwargs):
+        """
+        If callable_obj is None, assertRaises/Warns is being used as a
+        context manager, so check for a 'msg' kwarg and return self.
+        If callable_obj is not None, call it passing args and kwargs.
+        """
+        if callable_obj is None:
+            self.msg = kwargs.pop('msg', None)
+            return self
+        with self:
+            callable_obj(*args, **kwargs)
+
+    def check_for_timeout(self, itercount):
+        """
+        Allow directly checking for timeouts from within a loop, supplying 
+        an iteration count.  If the timer has elapsed, this will raise a
+        TookTooLong exception, indicating how many iterations were completed
+        when the time limit was reached.  Otherwise, it does nothing.
+        """
+        elapsed_time = self.timer.get_elapsed_time()
+        if elapsed_time > self.timelimit:
+            raise TookTooLong(self.timelimit,
+                              elapsed_time,
+                              itercount)
+        
+@support.cpython_only
+class HashCollisionTests(unittest.TestCase):
+    """
+    Issue 13703: tests about the behavior of dicts in the face of hostile data
+    """
+
+    def assertFasterThan(self, callableObj=None, *args, **kwargs):
+        context = _FasterThanContext(self, *args, **kwargs)
+        return context.handle(callableObj, args, kwargs)
+
+    def test_timings_with_benign_data(self):
+        # Verify that inserting many keys into a dict only takes a few seconds
+        d = dict()
+        with self.assertFasterThan(seconds=TIME_LIMIT) as cm:
+            for i in range(ITEM_COUNT):
+                d[i] = 0
+
+        # Verify that we can also retrieve the values quickly:
+        with self.assertFasterThan(seconds=TIME_LIMIT) as cm:
+            d[i]
+
+        # Verify that we can quickly insert the same item many times
+        # (overwriting each time):
+        d = dict()
+        with self.assertFasterThan(seconds=TIME_LIMIT) as cm:
+            for i in range(ITEM_COUNT):
+                d[0] = 0
+
+    def test_not_reaching_limit(self):
+        # Ensure that we can insert equal-hashed keys up to (but not reaching)
+        # the collision climit:
+        with self.assertFasterThan(seconds=TIME_LIMIT) as cm:
+            d = dict()
+            for i in range(50):
+                key = ChosenHash(i, 42)
+                d[key] = 0
+            
+    def test_reaching_collision_limit(self):
+        """
+        Ensure that too many non-equal keys with the same hash lead to a
+        TooManyCollisions exception
+        """
+        with self.assertFasterThan(seconds=TIME_LIMIT) as cm:
+            with self.assertRaisesRegex(AlgorithmicComplexityError,
+                                        ('dict construction used 1697 probes'
+                                         ' whilst performing 53 insertions'
+                                         ' \(len\(\) == 58\)'
+                                         ' at key ChosenHash\(58, 42\)'
+                                         ' with hash 42')):
+                d = dict()
+                for i in range(1000):
+                    key = ChosenHash(i, 42)
+                    d[key] = 0
+
+    # Frank Sievertsen found scenarios in which the collision-counting
+    # scheme could be attacked:
+    #   http://mail.python.org/pipermail/python-dev/2012-January/115726.html
+
+    def test_scenario_b_from_Frank_Sievertsen(self):
+        d = dict()
+
+        with self.assertFasterThan(seconds=TIME_LIMIT) as cm:
+            with self.assertRaisesRegex(AlgorithmicComplexityError,
+                                        ('dict construction used 1697 probes'
+                                         ' whilst performing 53 insertions'
+                                         ' \(len\(\) == 58\)'
+                                         ' at key ChosenHash\(58, 42\)'
+                                         ' with hash 42')):
+                # Insert hash collisions up to (but not reaching) the proposed
+                # limit:
+                for i in range(1000):
+                    key = ChosenHash(i, 42)
+                    d[key] = 0
+                    cm.check_for_timeout(i)
+
+                # Now try to add many equal values that collide
+                # with the hash, and see how long it takes
+                for i in range(ITEM_COUNT):
+                    key = ChosenHash(0, 42)
+                    d[key] = 0
+                    cm.check_for_timeout(i)
+
 from test import mapping_tests
 
 class GeneralMappingTests(mapping_tests.BasicTestMappingProtocol):
@@ -771,6 +965,7 @@
 def test_main():
     support.run_unittest(
         DictTest,
+        HashCollisionTests,
         GeneralMappingTests,
         SubclassMappingTests,
     )
diff -r 3be60a4c8c63 Objects/dictobject.c
--- a/Objects/dictobject.c	Fri Jan 20 11:01:06 2012 -0500
+++ b/Objects/dictobject.c	Sat Jan 21 11:22:34 2012 -0500
@@ -10,6 +10,19 @@
 #include "Python.h"
 #include "stringlib/eq.h"
 
+#define Py_MAX_AVERAGE_PROBES_PER_INSERT 32
+/* power-of-two to make multiplication fast */
+
+/* For large dictionaries, reuse the space allocated for ma_smalltable */
+typedef struct _Py_LargeDictFields {
+    int is_inserting;
+    size_t num_insertions;
+    size_t insertion_iter_count;
+} _Py_LargeDictFields;
+
+#define PyDict_LARGEDICTFIELDS(mp) \
+  ((_Py_LargeDictFields*)&(mp)->ma_smalltable)
+/* FIXME: add assert(mp->ma_table != mp->ma_smalltable) */
 
 /* Set a key error with the specified argument, wrapping it in a
  * tuple automatically so that tuple keys are not unpacked as the
@@ -25,6 +38,45 @@
     Py_DECREF(tup);
 }
 
+/* Set an AlgorithmicComplexityError */
+static void
+set_algorithmic_complexity_error(PyDictObject *mp, PyObject *key, Py_hash_t hash)
+{
+    _Py_LargeDictFields *large_fields = PyDict_LARGEDICTFIELDS(mp);
+    assert(mp->ma_table != mp->ma_smalltable);
+
+    PyErr_Format(PyExc_AlgorithmicComplexityError,
+                 ("dict construction used %i probes"
+                  " whilst performing %i insertions"
+                  " (len() == %i)"
+                  " at key %R with hash %zd"),
+                 large_fields->insertion_iter_count,
+                 large_fields->num_insertions,
+                 mp->ma_used,
+                 key, hash);
+    /* Backporting notes: (FIXME)
+       %R is a Python 3-ism
+       %zd is for Py_ssize_t, which in Python 3 is the same as Py_hash_t
+    */
+}
+
+static int
+check_iter_count(PyDictObject *mp, PyObject *key, Py_hash_t hash)
+{
+    if (mp->ma_mask >= PyDict_MINSIZE) {
+        _Py_LargeDictFields *large_fields = PyDict_LARGEDICTFIELDS(mp);
+        assert(mp->ma_table != mp->ma_smalltable);
+        if (large_fields->is_inserting) {
+            if (++large_fields->insertion_iter_count
+                > (large_fields->num_insertions * Py_MAX_AVERAGE_PROBES_PER_INSERT) ) {
+                set_algorithmic_complexity_error(mp, key, hash);
+                return 1; /* error */
+            }
+        }
+    }
+    return 0; /* no error */
+}
+
 /* Define this out if you don't want conversion statistics on exit. */
 #undef SHOW_CONVERSION_COUNTS
 
@@ -389,6 +441,10 @@
         }
         else if (ep->me_key == dummy && freeslot == NULL)
             freeslot = ep;
+
+        if (check_iter_count(mp, key, hash)) {
+            return NULL;
+        }
     }
     assert(0);          /* NOT REACHED */
     return 0;
@@ -451,6 +507,9 @@
             return ep;
         if (ep->me_key == dummy && freeslot == NULL)
             freeslot = ep;
+        if (check_iter_count(mp, key, hash)) {
+            return NULL;
+        }
     }
     assert(0);          /* NOT REACHED */
     return 0;
@@ -532,7 +591,14 @@
     typedef PyDictEntry *(*lookupfunc)(PyDictObject *, PyObject *, Py_hash_t);
 
     assert(mp->ma_lookup != NULL);
+    if (mp->ma_mask >= PyDict_MINSIZE) {
+        PyDict_LARGEDICTFIELDS(mp)->is_inserting = 1;
+        PyDict_LARGEDICTFIELDS(mp)->num_insertions += 1;
+    }
     ep = mp->ma_lookup(mp, key, hash);
+    if (mp->ma_mask >= PyDict_MINSIZE) {
+        PyDict_LARGEDICTFIELDS(mp)->is_inserting = 0;
+    }
     if (ep == NULL) {
         Py_DECREF(key);
         Py_DECREF(value);
@@ -675,8 +741,15 @@
         /* else key == value == NULL:  nothing to do */
     }
 
-    if (is_oldtable_malloced)
+    if (is_oldtable_malloced) {
         PyMem_DEL(oldtable);
+    } else {
+        if (mp->ma_table != mp->ma_smalltable) {
+            /* clean ma_smalltable for use as _Py_LargeDictFields: */
+            memset(mp->ma_smalltable, 0, sizeof(mp->ma_smalltable));
+        }
+    }
+
     return 0;
 }
 
diff -r 3be60a4c8c63 Objects/exceptions.c
--- a/Objects/exceptions.c	Fri Jan 20 11:01:06 2012 -0500
+++ b/Objects/exceptions.c	Sat Jan 21 11:22:34 2012 -0500
@@ -2205,6 +2205,12 @@
 SimpleExtendsException(PyExc_Warning, ResourceWarning,
     "Base class for warnings about resource usage.");
 
+/*
+ *    AlgorithmicComplexityError extends BaseException
+ */
+SimpleExtendsException(PyExc_BaseException, AlgorithmicComplexityError,
+    "Base class for warnings about computationally-infeasible data.");
+
 
 
 /* Pre-computed RuntimeError instance for when recursion depth is reached.
@@ -2318,6 +2324,7 @@
     PRE_INIT(UnicodeWarning)
     PRE_INIT(BytesWarning)
     PRE_INIT(ResourceWarning)
+    PRE_INIT(AlgorithmicComplexityError)
 
     /* OSError subclasses */
     PRE_INIT(ConnectionError);
@@ -2399,6 +2406,7 @@
     POST_INIT(UnicodeWarning)
     POST_INIT(BytesWarning)
     POST_INIT(ResourceWarning)
+    POST_INIT(AlgorithmicComplexityError)
 
     if (!errnomap) {
         errnomap = PyDict_New();

