
After the last round of discussion, I was left with the idea that the best thing we could do to support destructive iteration would be to introduce a {}.popitem() that returns an arbitrary (key, value) pair and deletes it. I wrote about this:
One more concern: if you repeatedly remove the *first* item, the hash table will start looking lopsided. Since we don't resize the hash table on deletes, maybe picking an item at random (but not using an expensive random generator!) would be better.
and Tim replied:
Which is the reason SETL doesn't specify *which* set item is removed: if you always start looking at "the front" of a dict that's being consumed, the dict fills with turds without shrinking, you skip over them again and again, and consuming the entire dict is still quadratic time.
Unfortunately, while using a random start point is almost always quicker than that, the expected time for consuming the whole dict remains quadratic.
The clearest way around that is to save a per-dict search finger, recording where the last search left off. Start from its current value. Failure if it wraps around. This is linear time in non-pathological cases (a pathological case is one in which it isn't linear time <wink>).
I've implemented this, except I use a static variable for the finger instead of a per-dict finger. I'm concerned about adding 4-8 extra bytes to each dict object for a feature that most dictionaries never need.

So, instead, I use a single shared finger. This works just as well as long as only a single dictionary is being consumed. For multiple dictionaries (either used by the same thread or in different threads), it'll work almost as well, although it's possible to construct a pathological case that degrades to quadratic time. An easy example of such a pathological case: call popitem() on two identical dictionaries in lock step.

Comments please! We could:

- Live with the pathological cases.

- Forget the whole thing; and then also forget about firstkey()
  etc., which has the same problem, only worse.

- Fix the algorithm. Maybe jumping criss-cross through the hash
  table like lookdict does would improve things; but I don't
  understand the math used for that ("Cycle through GF(2^n)-{0}"
  ???).

I've placed a patch on SourceForge:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102733&group_id=5470

The algorithm is:

    static PyObject *
    dict_popitem(dictobject *mp, PyObject *args)
    {
        static int finger = 0;
        int i;
        dictentry *ep;
        PyObject *res;

        if (!PyArg_NoArgs(args))
            return NULL;
        if (mp->ma_used == 0) {
            PyErr_SetString(PyExc_KeyError,
                            "popitem(): dictionary is empty");
            return NULL;
        }
        i = finger;
        if (i >= mp->ma_size)
            i = 0;
        /* Scan for the first slot holding a live value,
           wrapping around at the end of the table. */
        while ((ep = &mp->ma_table[i])->me_value == NULL) {
            i++;
            if (i >= mp->ma_size)
                i = 0;
        }
        finger = i + 1;
        res = PyTuple_New(2);
        if (res != NULL) {
            /* PyTuple_SET_ITEM steals the references; the slot
               is then overwritten with the dummy key. */
            PyTuple_SET_ITEM(res, 0, ep->me_key);
            PyTuple_SET_ITEM(res, 1, ep->me_value);
            Py_INCREF(dummy);
            ep->me_key = dummy;
            ep->me_value = NULL;
            mp->ma_used--;
        }
        return res;
    }

--Guido van Rossum (home page: http://www.python.org/~guido/)