[Python-Dev] Tunable parameters in dictobject.c (was dictnotes.txt out of date?)

Mon Jun 18 16:28:24 CEST 2012

Martin v. Löwis wrote:
>> The default should be what we've had though.
>> The new settings cause a lot more collisions
>> and resizes.
> 
> Raymond, can you kindly point to an application that demonstrates this
> claim (in particular the "a lot more" part, which I'd translate to
> "more than 20% more").

It is quite easy to find a program that results in a lots more resizes.
The issue is not whether there are more resizes, but whether it is
faster or slower. The evidence suggests that the new default settings
are no slower and reduce memory use.

> 
> I'm fine with reverting changes, but I agree that any benchmarking
> performed should be repeatable, and public. I agree it's sad to see
> a month worth of benchmarking reverted - but had that benchmarking
> been documented publicly, rather than just reporting the outcome,
> such reversal might have been avoided.
> 
>> Dicts get their efficiency from sparseness.

But do they? The results of benchmarking would seem to suggest (at least
on my test machine) that overly-sparse dicts are slower.
Possibly due to increased cache misses.

>> Reducing the mindict size from 8 to 4 causes
>> substantially more collisions in small dicts
>> and gets closer to a linear search of a small tuple.
> 
> Why do you think the dictsize has been reduced from 8 to 4? It has not.

For combined tables it remains 8 as before.

For split tables it *has* been reduced to 4.
This will increase collisions, and it is possible that a linear search would
be faster for these very small dictionaries.
However, a 4 entry table fits into a single cache line (for a 64 byte cache
line on a 32 bit machine) which may save a lot of cache misses.
But this all conjecture. Whatever the reason, the current parameters give
the best performance empirically.

Cheers,
Mark.