Re: [Python-Dev] another dict crasher

"Tim Peters" <tim.one@home.com> writes:
I can't easily see other examples of the problem; there certainly might be things you could do with comparisons that could trigger crashes, but that code's so hairy that it's almost impossible for me to be sure.
It's easy to be sure: any code that tries to remember anything about a dict (ditto any mutable object) across a "dangerous" call, other than the mere address of the object, is a place you *can* provoke a core dump. It may not be easy to provoke, and a given provoking test case may not fail across all platforms, or even every time you run it on a single platform, but it's "an obvious" hole all the same.
Ah, like this one: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli2: def __eq__(self, other): dict.clear() return 1 def __hash__(self): return 0 dict[Machiavelli2()] = Machiavelli2() print dict[Machiavelli2()] I'll attach a patch, but it's another branch inside lookdict (though not lookdict_string which is I guess the really performance sensitive one). Cheers, M. Index: dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.100 diff -c -1 -r2.100 dictobject.c *** dictobject.c 2001/06/02 08:27:39 2.100 --- dictobject.c 2001/06/02 11:36:47 *************** *** 273,274 **** --- 273,281 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) { *************** *** 310,311 **** --- 317,325 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) { Here's another test case to work out the second of those new if statements: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli3: def __init__(self, id): self.id = id def __eq__(self, other): if self.id == other.id: dict.clear() return 1 else: return 0 def __repr__(self): return "%s(%s)"%(self.__class__.__name__, self.id) def __hash__(self): return 0 dict[Machiavelli3(1)] = Machiavelli3(0) dict[Machiavelli3(2)] = Machiavelli3(0) print dict[Machiavelli3(2)] -- M-x psych[TAB][RETURN] -- try it

[Michael Hudson]
Ah, like this one:
dict = {}
# let's force dict to malloc its table for i in range(1,10): dict[i] = i
class Machiavelli2: def __eq__(self, other): dict.clear() return 1 def __hash__(self): return 0
dict[Machiavelli2()] = Machiavelli2()
print dict[Machiavelli2()]
Told you it was easy <wink>.
I'll attach a patch, but it's another branch inside lookdict (though not lookdict_string which is I guess the really performance sensitive one).
lookdict_string is crucial to Python's own performance. Dicts indexed by ints or class instances or ... are vital to other apps.
Index: dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.100 diff -c -1 -r2.100 dictobject.c *** dictobject.c 2001/06/02 08:27:39 2.100 --- dictobject.c 2001/06/02 11:36:47 *************** *** 273,274 **** --- 273,281 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) { *************** *** 310,311 **** --- 317,325 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) {
Then we have other problems. Note the comment before lookdict: Exceptions are never reported by this function, and outstanding exceptions are maintained. The patched code doesn't preserve that. Looking for "the first" unused or dummy slot isn't good enough either, as surely the user has the right to expect that after, e.g., d[m] = 1, d[m] retrieves 1. That is, picking a reusable slot "at random" doesn't respect the *semantics* of dict operations ("just because" the dict resized doesn't mean the key they're looking for went away!). It would be better in this case to go back to the top and start over. However, then an adversarial user can construct a case that never terminates. Unclear what to do.
participants (2)
-
Michael Hudson
-
Tim Peters