[Python-Dev] {}.popitem() (was Re: {}.first[key,value,item] ...)
Guido van Rossum
guido@python.org
Fri, 08 Dec 2000 13:43:39 -0500
After the last round of discussion, I was left with the idea that the
best thing we could do to help destructive iteration is to introduce a
{}.popitem() that returns an arbitrary (key, value) pair and deletes
it. I wrote about this:
> > One more concern: if you repeatedly remove the *first* item, the hash
> > table will start looking lobsided. Since we don't resize the hash
> > table on deletes, maybe picking an item at random (but not using an
> > expensive random generator!) would be better.
and Tim replied:
> Which is the reason SETL doesn't specify *which* set item is removed: if
> you always start looking at "the front" of a dict that's being consumed, the
> dict fills with turds without shrinking, you skip over them again and again,
> and consuming the entire dict is still quadratic time.
>
> Unfortunately, while using a random start point is almost always quicker
> than that, the expected time for consuming the whole dict remains quadratic.
>
> The clearest way around that is to save a per-dict search finger, recording
> where the last search left off. Start from its current value. Failure if
> it wraps around. This is linear time in non-pathological cases (a
> pathological case is one in which it isn't linear time <wink>).
I've implemented this, except I use a static variable for the finger
intead of a per-dict finger. I'm concerned about adding 4-8 extra
bytes to each dict object for a feature that most dictionaries never
need. So, instead, I use a single shared finger. This works just as
well as long as this is used for a single dictionary. For multiple
dictionaries (either used by the same thread or in different threads),
it'll work almost as well, although it's possible to make up a
pathological example that would work qadratically.
An easy example of such a pathological example is to call popitem()
for two identical dictionaries in lock step.
Comments please! We could:
- Live with the pathological cases.
- Forget the whole thing; and then also forget about firstkey()
etc. which has the same problem only worse.
- Fix the algorithm. Maybe jumping criss-cross through the hash table
like lookdict does would improve that; but I don't understand the
math used for that ("Cycle through GF(2^n)-{0}" ???).
I've placed a patch on SourceForge:
http://sourceforge.net/patch/?func=detailpatch&patch_id=102733&group_id=5470
The algorithm is:
static PyObject *
dict_popitem(dictobject *mp, PyObject *args)
{
static int finger = 0;
int i;
dictentry *ep;
PyObject *res;
if (!PyArg_NoArgs(args))
return NULL;
if (mp->ma_used == 0) {
PyErr_SetString(PyExc_KeyError,
"popitem(): dictionary is empty");
return NULL;
}
i = finger;
if (i >= mp->ma_size)
ir = 0;
while ((ep = &mp->ma_table[i])->me_value == NULL) {
i++;
if (i >= mp->ma_size)
i = 0;
}
finger = i+1;
res = PyTuple_New(2);
if (res != NULL) {
PyTuple_SET_ITEM(res, 0, ep->me_key);
PyTuple_SET_ITEM(res, 1, ep->me_value);
Py_INCREF(dummy);
ep->me_key = dummy;
ep->me_value = NULL;
mp->ma_used--;
}
return res;
}
--Guido van Rossum (home page: http://www.python.org/~guido/)