Re: [Python-Dev] another dict crasher
Michael Hudson <mwh@python.net> writes:
Adapted from a report on comp.lang.python from Wolfgang Lipp: [snip] segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't tried Tim's latest patch, but I don't believe that will make any difference.
It's obvious <wink> what's happening; the dict's resizing inside the for loop in dict_repr and the ep pointer is dangling.
Actually this crash was dict_print (I always forget about tp_print...). It's pretty easy to mend: *** dictobject.c Fri Jun 1 13:08:13 2001 --- dictobject.c-fixed Fri Jun 1 12:59:07 2001 *************** *** 793,795 **** any = 0; ! for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) { if (ep->me_value != NULL) { --- 793,796 ---- any = 0; ! for (i = 0; i < mp->ma_size; i++) { ! ep = &mp->ma_table[i]; if (ep->me_value != NULL) { *************** *** 833,835 **** any = 0; ! for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) { if (ep->me_value != NULL) { --- 834,837 ---- any = 0; ! for (i = 0; i < mp->ma_size && v; i++) { ! ep = &mp->ma_table[i]; if (ep->me_value != NULL) { I'm not sure this stops still more Machiavellian behaviour from crashing the interpreter, and you can certainly get items being printed more than once or not at all. I'm not sure this last is a problem; if the user's being this contrary there's only so much we can do to help him or her. Cheers, M. -- I also feel it essential to note, [...], that Description Logics, non-Monotonic Logics, Default Logics and Circumscription Logics can all collectively go suck a cow. Thank you. -- http://advogato.org/person/Johnath/diary.html?start=4
[Michael Hudson]
Actually this crash was dict_print (I always forget about tp_print...).
We all should <wink>.
It's pretty easy to mend:
*** dictobject.c Fri Jun 1 13:08:13 2001 --- dictobject.c-fixed Fri Jun 1 12:59:07 2001 *************** *** 793,795 **** any = 0; ! for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) { if (ep->me_value != NULL) { --- 793,796 ---- any = 0; ! for (i = 0; i < mp->ma_size; i++) { ! ep = &mp->ma_table[i]; if (ep->me_value != NULL) { *************** *** 833,835 **** any = 0; ! for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) { if (ep->me_value != NULL) { --- 834,837 ---- any = 0; ! for (i = 0; i < mp->ma_size && v; i++) { ! ep = &mp->ma_table[i]; if (ep->me_value != NULL) {
I'm not sure this stops still more Machiavellian behaviour from crashing the interpreter,
Alas, it doesn't. You can't trust *anything* about a container you're iterating over across any call that may call back into Python. In these cases, the call to PyObject_Repr() can execute any code at all, including code that mutates the dict you're crawling over. In particular, calling PyObject_Repr() to format the key means the ep = &mp->ma_table[i] pointer may be trash by the time PyObject_Repr() is called again to format the value. See characterize() for the pain it takes to guard against everything, including encouraging <wink> comments like: if (cmp > 0 || i >= a->ma_size || a->ma_table[i].me_value == NULL) { /* Not the *smallest* a key; or maybe it is * but the compare shrunk the dict so we can't * find its associated value anymore; or * maybe it is but the compare deleted the * a[thiskey] entry. */ Py_DECREF(thiskey); continue; } It should really add "or maybe it just shuffled the dict around and the value at ma_table[i] is no longer associated with the key that *used* to be at ma_table[i], but since there's still *some* non-NULL pointer there we'll just pretend that didn't happen and press onward".
and you can certainly get items being printed more than once or not at all. I'm not sure this last is a problem;
Those don't matter: in a long tradition, we buy "safety" not only at the cost of bloating the code, but also by making the true behavior in case of mutation unpredictable & inexplicable. That's why I *really* liked the "immutable list" trick in list.sort(): even if we could have made the code bulletproof without it, we couldn't usefully explain what the heck it actually did. It's not Pythonic to blow up, but neither is it Pythonic to be incomprehensible. You simply can't win here.
if the user's being this contrary there's only so much we can do to help him or her.
I'd prefer a similar internal immutable-dict trick that raised an exception if the user was pushing Python into a corner where "blow up or do something baffling" were its only choices. That would render the original example illegal, of course. But would that be a bad thing? What *should* it mean when the user invokes an operation on a container and mutates the container during that operation? There's almost no chance that Jython does the same thing as CPython in all these cases, so it's effectively undefined behavior no matter how you plug the holes (short of raising an exception).
participants (2)
-
Michael Hudson
-
Tim Peters