[Python-Dev] Dictionary tuning

Tim Peters tim.one@comcast.net
Mon, 28 Apr 2003 21:49:52 -0400


[Tim Delaney]
>> What is the effect on peak memory usage over "average" programs?

[Raymond Hettinger]
> Since the amortized growth is the same, the effect is nil on average.
> Special cases can be contrived with a specific number of records
> where the memory use is doubled, but in general it is nearly unchanged
> for average programs.

That doesn't make sense.  Dicts can be larger after the patch, but never
smaller, so there's nothing opposing the "can be larger" part:  on average,
allocated address space must be strictly larger than before.  Whether that
*matters* on average to the average user is something we can answer
rigorously just as soon as we find an average user with an average program
<wink>.  I'm not inclined to worry much about it.

>> This might be a worthwhile speedup on small dicts (up to a TBD
>> number of entries) but not worthwhile for large dicts.

> Actually, it helps large dictionaries even more than small dictionaries.
> Collisions in large dicts are resolved through other memory probes
> which are almost certain not to be in the current cache line.

That part makes sense.  Resizing a large dict is an expensive operation too.
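
For concreteness, here's a rough sketch of why that's so, modeled on the
probe sequence described in dictobject.c's comments.  The recurrence is
the real one; first_free_slot() and the used[] array are simplified
stand-ins for the actual slot layout:

	/* Open-addressing probe sequence, per the comments in
	 * dictobject.c:  each collision recomputes the index from the
	 * hash, so successive probes in a big table land in unrelated
	 * cache lines. */
	#define PERTURB_SHIFT 5

	static size_t
	first_free_slot(size_t hash, size_t mask, const char *used)
	{
		size_t perturb = hash;
		size_t i = hash & mask;
		while (used[i]) {		/* collision:  keep probing */
			i = (i*5 + perturb + 1) & mask;
			perturb >>= PERTURB_SHIFT;  /* fold in high hash bits */
		}
		return i;
	}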

>> However, to add this capability in would of course add more code
>> to a very common code path (additional test for current size to
>> decide what factor to increase by).

> Your intuition is exactly correct.  All experiments to special-case
> various sizes resulted in decreased performance because they added
> a tiny amount of work to some of the most heavily exercised code in
> Python.

This part isn't clear:  the changed code is in the body of an if() block
that normally *isn't* entered (over an ever-growing dict's life, it's
entered O(log(len(dict))) times, independent of the number of dict
lookups).  The change cuts the number of times it's entered by approximately
a factor of 2, but it isn't entered often even now.
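
A quick way to see the O(log(len(dict))) claim is to count resize events
directly.  This toy model mirrors the 2/3-full trigger and the
round-up-to-a-power-of-2 growth from dictobject.c, but it's a
simulation, not the real code (in a grow-only dict, fill == used, since
there are never any dummy entries to squeeze out):

	/* Count how often the resize block runs while a dict grows one
	 * fresh key at a time. */
	#include <stdio.h>

	int main(void)
	{
		long used = 0, mask = 7;	/* 8 slots, like a new dict */
		long resizes = 0, i;

		for (i = 0; i < 1000000; i++) {
			used++;				/* a new key goes in */
			if (used*3 >= (mask + 1)*2) {	/* over 2/3 full */
				long newsize = 8;
				while (newsize <= used*2)  /* smallest power */
					newsize <<= 1;	   /* of 2 > used*2 */
				mask = newsize - 1;
				resizes++;
			}
		}
		printf("%ld resizes for 1000000 inserts\n", resizes);
		return 0;
	}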

> Further, it results in an unpredictable branch, which is
> also not a good thing.

Since the body of the if-block isn't entered often, unpredictable one-shot
branches within the body shouldn't have a measurable effect.  The
unpredictable branches when physically resizing the dict will swamp them
regardless.  The surrounding if-test continues to be predictable in the
"branch taken" direction.

What could be much worse is if stuffing code into the if-block bloats the
code so much as to frustrate lookahead I-stream caching of the normal
"branch taken and return 0" path:

	if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
		/* a fresh key went in and the table is over 2/3 full */
		if (dictresize(mp, mp->ma_used*2) != 0)
			return -1;
	}
	return 0;

Rewriting as

	if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2)
		return 0;	/* the expected path:  no resize needed */

	return dictresize(mp, mp->ma_used*2) ? -1 : 0;

would help some compilers generate better code for the expected path,
especially if the blob after "return 0;" got hairier.
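
FWIW, on compilers that support it, you can also state the expectation
directly; e.g. with GCC's __builtin_expect.  The RESIZE_NEEDED macro
here is made up for illustration -- I'm not proposing this for the core:

	/* GCC-specific hint that the resize branch is the cold one; the
	 * portable rewrite above buys much the same effect. */
	#define RESIZE_NEEDED(mp, n_used) \
		__builtin_expect((mp)->ma_used > (n_used) && \
				 (mp)->ma_fill*3 >= ((mp)->ma_mask+1)*2, 0)

	if (RESIZE_NEEDED(mp, n_used))
		return dictresize(mp, mp->ma_used*2) ? -1 : 0;
	return 0;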

IOW, if fiddling with different growth factors at different sizes slowed
things down, we have to look for something that affected the *normal* paths;
it's hard to imagine that the guts of the if-block execute often enough
to matter (discounting its call to dictresize(), which is an expensive
routine).

> I found out that timing dict performance was hard.
> Capturing memory usage was harder.  Of entry space,
> space plus unused space, calls to PyMalloc, and
> calls to the OS malloc, only the last is important,
> but it depends on all kinds of things that are not
> easily controlled.

In my early Cray days, the Cray boxes were batch one-job-at-a-time, and all
memory was real.  If you had a CPU-bound program, it took the same number of
nanoseconds each time you ran it.  Benchmarking was hard then too <0.5
wink>.