
After the last round of discussion, I was left with the idea that the best thing we could do to support destructive iteration would be to introduce a {}.popitem() that returns an arbitrary (key, value) pair and deletes it. I wrote about this:
One more concern: if you repeatedly remove the *first* item, the hash table will start looking lopsided. Since we don't resize the hash table on deletes, maybe picking an item at random (but not using an expensive random generator!) would be better.
and Tim replied:
Which is the reason SETL doesn't specify *which* set item is removed: if you always start looking at "the front" of a dict that's being consumed, the dict fills with turds without shrinking, you skip over them again and again, and consuming the entire dict is still quadratic time.
Unfortunately, while using a random start point is almost always quicker than that, the expected time for consuming the whole dict remains quadratic.
The clearest way around that is to save a per-dict search finger, recording where the last search left off. Start from its current value. Failure if it wraps around. This is linear time in non-pathological cases (a pathological case is one in which it isn't linear time <wink>).
I've implemented this, except I use a static variable for the finger instead of a per-dict finger. I'm concerned about adding 4-8 extra bytes to each dict object for a feature that most dictionaries never need.

So, instead, I use a single shared finger. This works just as well as long as only a single dictionary is being consumed. For multiple dictionaries (either used by the same thread or in different threads), it'll work almost as well, although it's possible to construct a pathological case that degrades to quadratic time. An easy example of such a pathological case: call popitem() on two identical dictionaries in lock step.

Comments please! We could:

- Live with the pathological cases.

- Forget the whole thing; and then also forget about firstkey()
  etc., which has the same problem, only worse.

- Fix the algorithm. Maybe jumping criss-cross through the hash
  table like lookdict does would improve things; but I don't
  understand the math used for that ("Cycle through GF(2^n)-{0}"
  ???).

I've placed a patch on SourceForge:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102733&group_id=5470

The algorithm is:

    static PyObject *
    dict_popitem(dictobject *mp, PyObject *args)
    {
        static int finger = 0;
        int i;
        dictentry *ep;
        PyObject *res;

        if (!PyArg_NoArgs(args))
            return NULL;
        if (mp->ma_used == 0) {
            PyErr_SetString(PyExc_KeyError,
                            "popitem(): dictionary is empty");
            return NULL;
        }
        i = finger;
        if (i >= mp->ma_size)
            i = 0;
        /* Scan for the first slot holding a live value,
           wrapping around at the end of the table. */
        while ((ep = &mp->ma_table[i])->me_value == NULL) {
            i++;
            if (i >= mp->ma_size)
                i = 0;
        }
        finger = i + 1;
        res = PyTuple_New(2);
        if (res != NULL) {
            /* PyTuple_SET_ITEM steals the references; the slot
               is then overwritten with the dummy key. */
            PyTuple_SET_ITEM(res, 0, ep->me_key);
            PyTuple_SET_ITEM(res, 1, ep->me_value);
            Py_INCREF(dummy);
            ep->me_key = dummy;
            ep->me_value = NULL;
            mp->ma_used--;
        }
        return res;
    }

--Guido van Rossum (home page: http://www.python.org/~guido/)