[Python-Dev] RE: {}.popitem() (was Re: {}.first[key,value,item] ...)

Tim Peters tim.one@home.com
Sat, 9 Dec 2000 15:49:04 -0500

> ...
> I've implemented this, except I use a static variable for the finger
> instead of a per-dict finger.  I'm concerned about adding 4-8 extra
> bytes to each dict object for a feature that most dictionaries never
> need.

It's a bit ironic that dicts are guaranteed to be at least 1/3 wasted space
<wink>.  Let's pick on Christian's idea to reclaim a few bytes of that.

> So, instead, I use a single shared finger.  This works just as
> well as long as this is used for a single dictionary.  For multiple
> dictionaries (either used by the same thread or in different threads),
> it'll work almost as well, although it's possible to make up a
> pathological example that would behave quadratically.
> An easy way to construct such a pathological case is to call popitem()
> for two identical dictionaries in lock step.

Please see my later comments attached to the patch:


In short, for me (truly) identical dicts perform well with or without my
suggestion, while dicts cloned via dict.copy() perform horribly with or
without my suggestion (their internal structures differ); still curious as
to whether that's also true for you (am I looking at a Windows bug?  I don't
see how, but it's possible ...).  In any case, my suggestion turned out to
be worthless on my box.

Playing around via simulations suggests that a shared finger is going to be
disastrous when consuming more than one dict unless they have identical
internal structure (not just compare equal).  As soon as they get a little
out of synch, it just gets worse with each succeeding probe.
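The kind of simulation I mean can be sketched in Python.  This is a toy
model of my own, not CPython's dict: each table is just a list of
occupied/free flags, and "popitem" scans forward from a finger for an
occupied slot:

```python
# Toy model (not CPython internals): a "dict" is a list of slots,
# True = occupied.  Popping scans forward from a finger for an
# occupied slot, clears it, and leaves the finger just past it.
def drain_pair(a, b, shared):
    """Pop alternately from tables a and b until both are empty;
    return the total number of empty-slot probes performed."""
    n = len(a)
    finger = fa = fb = 0
    probes = 0
    while any(a) or any(b):
        for t in (a, b):
            if not any(t):
                continue
            i = (finger if shared else (fa if t is a else fb)) % n
            while not t[i]:        # skip empty slots, counting probes
                i = (i + 1) % n
                probes += 1
            t[i] = False           # "pop" the item
            nxt = (i + 1) % n
            if shared:
                finger = nxt
            elif t is a:
                fa = nxt
            else:
                fb = nxt
    return probes
```

Feed it two tables that are the same size but have no occupied slots in
common (say, first half vs. second half) and the shared finger rescans
about half the table on every pop, while per-table fingers drain each
table in a single pass.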

> Comments please!  We could:
> - Live with the pathological cases.

How boring <wink>.

> - Forget the whole thing; and then also forget about firstkey()
>   etc. which has the same problem only worse.

I don't know that this is an important idea for dicts in general (it is
important for sets) -- it's akin to an xrange for dicts.  But then I've had
more than one real-life program that built giant dicts then ran out of
memory trying to iterate over them!  I'd like to fix that.
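For what it's worth, the usage pattern I have in mind looks something
like this -- d.items() materializes the full list up front, while
destructive consumption via popitem() never does:

```python
def consume(d, process):
    # Process and discard items one at a time.  Unlike d.items(),
    # this never builds a list of all the items, so peak memory stays
    # bounded even for giant dicts (at the cost of emptying the dict).
    while d:
        key, value = d.popitem()
        process(key, value)
```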

> - Fix the algorithm. Maybe jumping criss-cross through the hash table
>   like lookdict does would improve that; but I don't understand the
>   math used for that ("Cycle through GF(2^n)-{0}" ???).

Christian explained that well (thanks!).

However, I still don't see any point to doing that business in .popitem():
when inserting keys, the jitterbug probe sequence has the crucial benefit of
preventing primary clustering when keys collide.  But when we consume a
dict, we just want to visit every slot as quickly as possible.
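For concreteness, here's a toy model of the "cycle through GF(2^n)-{0}"
business, with n=4 and the primitive polynomial x^4 + x + 1 (my pick for
illustration): a Galois shift register that visits every index in
1 .. 2^n-1 exactly once before repeating.  Great for scattering collision
probes; no better than i+1 when you just want to sweep the whole table:

```python
def gf_cycle(nbits=4, poly=0b10011, start=1):
    # Galois LFSR: repeatedly double i, folding any overflow bit back
    # in with the polynomial mask.  With a primitive polynomial this
    # visits every value in 1 .. 2**nbits - 1 exactly once.
    seq = []
    i = start
    for _ in range(2 ** nbits - 1):
        seq.append(i)
        i <<= 1
        if i & (1 << nbits):
            i ^= poly
    return seq
```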

> Appendix, on the use of finger:
> -------------------------------
> Instead of using a global finger variable, you can do the
> following (involving a cast from object to int) :
> - if the 0'th slot of the dict is non-empty:
>   return this element and insert the dummy element
>   as key. Set the value field to the Dictionary Algorithm
>   would give for the removed object's hash. This is the
>   next finger.
> - else:
>   treat the value field of the 0'th slot as the last finger.
>   If it is zero, initialize it with 2^n-1.
>   Repetitively use the DA until you find an entry. Save
>   the finger in slot 0 again.
> This doesn't cost an extra slot, and even when the dictionary
> is written between removals, the chance to lose the finger
> is just 1:(2^n-1) on every insertion.

I like that, except:

1) As above, I don't believe the GF business buys anything over
   a straightforward search when consuming a dict.

2) Overloading the value field bristles with problems, in part
   because it breaks the invariant that a slot is unused if
   and only if the value field is NULL, in part because C
   doesn't guarantee that you can get away with casting an
   arbitrary int to a pointer and back again.

None of the problems in #2 arise if we abuse the me_hash field instead, so
the attached does that.  Here's a typical run of Guido's test case using
this (on an 866MHz machine w/ 256Mb RAM -- the early values jump all over
the place from run to run):

run = 0
log2size = 10 size = 1024
    7.4 usec per item to build (total 0.008 sec)
    3.4 usec per item to destroy twins (total 0.003 sec)
log2size = 11 size = 2048
    6.7 usec per item to build (total 0.014 sec)
    3.4 usec per item to destroy twins (total 0.007 sec)
log2size = 12 size = 4096
    7.0 usec per item to build (total 0.029 sec)
    3.7 usec per item to destroy twins (total 0.015 sec)
log2size = 13 size = 8192
    7.1 usec per item to build (total 0.058 sec)
    5.9 usec per item to destroy twins (total 0.048 sec)
log2size = 14 size = 16384
    14.7 usec per item to build (total 0.241 sec)
    6.4 usec per item to destroy twins (total 0.105 sec)
log2size = 15 size = 32768
    12.2 usec per item to build (total 0.401 sec)
    3.9 usec per item to destroy twins (total 0.128 sec)
log2size = 16 size = 65536
    7.8 usec per item to build (total 0.509 sec)
    4.0 usec per item to destroy twins (total 0.265 sec)
log2size = 17 size = 131072
    7.9 usec per item to build (total 1.031 sec)
    4.1 usec per item to destroy twins (total 0.543 sec)

The last one is over 100 usec per item using the original patch (with or
without my first suggestion).

if-i-were-a-betting-man-i'd-say-"bingo"-ly y'rs  - tim

Drop-in replacement for the popitem in the patch:

static PyObject *
dict_popitem(dictobject *mp, PyObject *args)
{
	int i = 0;
	dictentry *ep;
	PyObject *res;

	if (!PyArg_NoArgs(args))
		return NULL;
	if (mp->ma_used == 0) {
		PyErr_SetString(PyExc_KeyError,
				"popitem(): dictionary is empty");
		return NULL;
	}
	/* Set ep to "the first" dict entry with a value.  We abuse the hash
	 * field of slot 0 to hold a search finger:
	 * If slot 0 has a value, use slot 0.
	 * Else slot 0 is being used to hold a search finger,
	 * and we use its hash value as the first index to look.
	 */
	ep = &mp->ma_table[0];
	if (ep->me_value == NULL) {
		i = (int)ep->me_hash;
		/* The hash field may be uninitialized trash, or it
		 * may be a real hash value, or it may be a legit
		 * search finger, or it may be a once-legit search
		 * finger that's out of bounds now because it
		 * wrapped around or the table shrunk -- simply
		 * make sure it's in bounds now.
		 */
		if (i >= mp->ma_size || i < 1)
			i = 1;	/* skip slot 0 */
		while ((ep = &mp->ma_table[i])->me_value == NULL) {
			i++;
			if (i >= mp->ma_size)
				i = 1;
		}
	}
	res = PyTuple_New(2);
	if (res != NULL) {
		PyTuple_SET_ITEM(res, 0, ep->me_key);
		PyTuple_SET_ITEM(res, 1, ep->me_value);
		ep->me_key = dummy;
		ep->me_value = NULL;
		mp->ma_used--;
	}
	assert(mp->ma_table[0].me_value == NULL);
	mp->ma_table[0].me_hash = i + 1;  /* next place to start */
	return res;
}