[Python-Dev] PyBench DictCreation (was Re: Performance compares)

Tim Peters tim@digicool.com
Fri, 18 May 2001 03:17:07 -0400


[Jeremy]
> I also did a profile run on CreateInstances, which has a difference of
> +55.54% on my machine.  It's basically the same story.  The instance
> dictionary is getting resized more often with Python 2.1+ than it did
> with Python 1.5.2.  I wouldn't be surprised if several more tests are
> showing a slowdown with the same cause.
>
> So boosting the minimum size sounds like a good thing.

I don't know.  PyBench is great for showing that *something* changed, but
it's got even less claim to "typical use" than pystone.

I don't know that the test suite is better in that respect, but it's got much
more variety and everyone has it <wink>.  I stuffed code in dict_dealloc() to
record the ma_fill of each dict on its way to the grave (ma_fill == number of
non-virgin slots).  Across the test suite, here's the ranking, from most to
least popular fill:

  count    fill %total  cumulative %
 ------    ---- ------  ------------
 146321       1  53.30  53.30
  38200       0  13.91  67.21
  32616       2  11.88  79.09
  29648       3  10.80  89.89
   9884       5   3.60  93.49
   5423       4   1.98  95.47
   2428       6   0.88  96.35
   2016       8   0.73  97.08
   1179       7   0.43  97.51
    904       9   0.33  97.84
    709     103   0.26  98.10
    554      10   0.20  98.30
    513      13   0.19  98.49
    459      12   0.17  98.66
    447      11   0.16  98.82
    364      14   0.13  98.95
    233      15   0.08  99.04
    231      16   0.08  99.12
    193      18   0.07  99.19
    180      17   0.07  99.26
    122      19   0.04  99.30
    107      30   0.04  99.34
    105      21   0.04  99.38
     93      22   0.03  99.41
     93      20   0.03  99.45
     86     256   0.03  99.48
     82      23   0.03  99.51
     80      26   0.03  99.54
     74      24   0.03  99.56
     69      27   0.03  99.59
     64      25   0.02  99.61
     60      29   0.02  99.63
     49      28   0.02  99.65
     44      34   0.02  99.67
     33      32   0.01  99.68
     28      31   0.01  99.69
     27      37   0.01  99.70
     27      33   0.01  99.71
     26      35   0.01  99.72
     24      36   0.01  99.73
     23      39   0.01  99.74
     23      38   0.01  99.75
     21     128   0.01  99.75
     19      44   0.01  99.76
     19      40   0.01  99.77
     17      46   0.01  99.77
     16      48   0.01  99.78
     15      47   0.01  99.78
     14      50   0.01  99.79
     14      42   0.01  99.79

There are many more sizes, but I cut off the display here when they got too
rare to round to 1% of 1% of the total count.

Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
away with at most one resize (a dict of size 8 is enough for a fill of 5, but
not 6).  OTOH, the current first non-empty size of 4 is enough for 79% of all
dicts (enough for a fill of 2, but not 3).  If oodles of those tiny dicts are
alive *at the same time*, it would be quite a waste of space to force the
non-empty ones to carry 8 slots.  OTOH, if those small dicts are due to
things like building one- or two-element keyword argument dicts, their
lifetimes rarely overlap.

A more aggressive idea is to allow denser dicts, by allowing them to become
no more than 75% full.  That is, change the resize test from

    mp->ma_fill*3 >= mp->ma_size*2

to

    mp->ma_fill*4 > mp->ma_size*3

That would allow the 10.8% of real(er) life dicts with fill 3 to continue
living in dicts with 4 slots, and allow about 90% of all dicts to get away
with no more than one resize.  The downside is that boosting the max load
factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
a small boost in the expected # of compares.  But the "theory" is for random
hash functions with "uniform probing" (tech term that does *not* mean linear
probing), and Python's hash functions often aren't random at all, while AFAIK
no rigorous analysis of its probing strategy exists.

So, plenty of arbitrary data there upon which to flip a coin <wink>.