[Python-Dev] another dict crasher

Tim Peters tim.one@home.com
Sat, 2 Jun 2001 02:13:49 -0400


[Michael Hudson]
> Actually this crash was dict_print (I always forget about tp_print...).

We all should <wink>.

> It's pretty easy to mend:
>
> *** dictobject.c        Fri Jun  1 13:08:13 2001
> --- dictobject.c-fixed  Fri Jun  1 12:59:07 2001
> ***************
> *** 793,795 ****
>         any = 0;
> !       for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
>                 if (ep->me_value != NULL) {
> --- 793,796 ----
>         any = 0;
> !       for (i = 0; i < mp->ma_size; i++) {
> !               ep = &mp->ma_table[i];
>                 if (ep->me_value != NULL) {
> ***************
> *** 833,835 ****
>         any = 0;
> !       for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
>                 if (ep->me_value != NULL) {
> --- 834,837 ----
>         any = 0;
> !       for (i = 0; i < mp->ma_size && v; i++) {
> !               ep = &mp->ma_table[i];
>                 if (ep->me_value != NULL) {
>
> I'm not sure this stops still more Machiavellian behaviour from
> crashing the interpreter,

Alas, it doesn't.  You can't trust *anything* about a container you're
iterating over across any call that may call back into Python.  In these
cases, the call to PyObject_Repr() can execute any code at all, including
code that mutates the dict you're crawling over.  In particular, calling
PyObject_Repr() to format the key means the

     ep = &mp->ma_table[i]

pointer may be trash by the time PyObject_Repr() is called again to format
the value.  See characterize() for the pain it takes to guard against
everything, including encouraging <wink> comments like:

			if (cmp > 0 ||
			    i >= a->ma_size ||
			    a->ma_table[i].me_value == NULL)
			{
				/* Not the *smallest* a key; or maybe it is
				 * but the compare shrunk the dict so we can't
				 * find its associated value anymore; or
				 * maybe it is but the compare deleted the
				 * a[thiskey] entry.
				 */
				Py_DECREF(thiskey);
				continue;
			}

It should really add "or maybe it just shuffled the dict around and the
value at ma_table[i] is no longer associated with the key that *used* to be
at ma_table[i], but since there's still *some* non-NULL pointer there we'll
just pretend that didn't happen and press onward".

> and you can certainly get items being printed more than once or not
> at all.  I'm not sure this last is a problem;

Those don't matter:  in a long tradition, we buy "safety" not only at the
cost of bloating the code, but also by making the true behavior in case of
mutation unpredictable & inexplicable.  That's why I *really* liked the
"immutable list" trick in list.sort():  even if we could have made the code
bulletproof without it, we couldn't usefully explain what the heck it
actually did.  It's not Pythonic to blow up, but neither is it Pythonic to
be incomprehensible.  You simply can't win here.

> if the user's being this contrary there's only so much we can
> do to help him or her.

I'd prefer a similar internal immutable-dict trick that raised an exception
if the user was pushing Python into a corner where "blow up or do something
baffling" were its only choices.  That would render the original example
illegal, of course.  But would that be a bad thing?  What *should* it mean
when the user invokes an operation on a container and mutates the container
during that operation?  There's almost no chance that Jython does the same
thing as CPython in all these cases, so it's effectively undefined behavior
no matter how you plug the holes (short of raising an exception).