[Python-Dev] PyBench DictCreation (was Re: Performance compares)
M.-A. Lemburg
mal@lemburg.com
Fri, 18 May 2001 13:38:39 +0200
Tim Peters wrote:
>
> [Jeremy]
> > I also did a profile run on CreateInstances, which has a difference of
> > +55.54% on my machine. It's basically the same story. The instance
> > dictionary is getting resized more often with Python 2.1+ than it did
> > with Python 1.5.2. I wouldn't be surprised if several more tests are
> > showing a slowdown with the same cause.
> >
> > So boosting the minimum size sounds like a good thing.
>
> I don't know. PyBench is great for showing that *something* changed, but
> it's got even less claim to "typical use" than pystone.
It doesn't claim "typical use". pybench is aimed at pinpointing
performance issues in hot-spots -- there's no such thing as
a "typical program", so pybench gives you low-level performance
comparisons for very specific tasks, e.g. dictionary creation or
for-loop performance.
I have found it to be rather successful at that. At least it gives
some good hints about where to look...
> I don't know that the test suite is better in that respect, but it's got much
> more variety and everyone has it <wink>. I stuffed code in dict_dealloc() to
> record the ma_fill of each dict on its way to the grave (ma_fill == number of
> non-virgin slots). Across the test suite, here's the ranking, from most to
> least popular fill:
>
> count fill %total cumulative %
> ------ ---- ------ ------------
> 146321 1 53.30 53.30
> 38200 0 13.91 67.21
> 32616 2 11.88 79.09
> 29648 3 10.80 89.89
> 9884 5 3.60 93.49
> 5423 4 1.98 95.47
> 2428 6 0.88 96.35
> 2016 8 0.73 97.08
> 1179 7 0.43 97.51
> 904 9 0.33 97.84
> 709 103 0.26 98.10
> 554 10 0.20 98.30
> 513 13 0.19 98.49
> 459 12 0.17 98.66
> 447 11 0.16 98.82
> 364 14 0.13 98.95
> 233 15 0.08 99.04
> 231 16 0.08 99.12
> 193 18 0.07 99.19
> 180 17 0.07 99.26
> 122 19 0.04 99.30
> 107 30 0.04 99.34
> 105 21 0.04 99.38
> 93 22 0.03 99.41
> 93 20 0.03 99.45
> 86 256 0.03 99.48
> 82 23 0.03 99.51
> 80 26 0.03 99.54
> 74 24 0.03 99.56
> 69 27 0.03 99.59
> 64 25 0.02 99.61
> 60 29 0.02 99.63
> 49 28 0.02 99.65
> 44 34 0.02 99.67
> 33 32 0.01 99.68
> 28 31 0.01 99.69
> 27 37 0.01 99.70
> 27 33 0.01 99.71
> 26 35 0.01 99.72
> 24 36 0.01 99.73
> 23 39 0.01 99.74
> 23 38 0.01 99.75
> 21 128 0.01 99.75
> 19 44 0.01 99.76
> 19 40 0.01 99.77
> 17 46 0.01 99.77
> 16 48 0.01 99.78
> 15 47 0.01 99.78
> 14 50 0.01 99.79
> 14 42 0.01 99.79
>
> There are many more sizes, but I cut off the display here when they got too
> rare to round to 1% of 1% of the total count.
>
> Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
> away with at most one resize (a dict of size 8 is enough for a fill of 5, but
> not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all
> dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are
> alive *at the same time*, it would be quite a waste of space to force the
> non-empty ones to carry 8 slots. OTOH, if those small dicts are due to
> things like building one- or two-element keyword argument dicts, their
> lifetimes rarely overlap.
I found that instance dictionaries usually stay within the 8-slot
range. You normally have a few heavyweight instances and
many lightweight ones which only have two or three attributes
in their instance dict.
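The size arithmetic Tim quotes above can be checked with a small
simulation. This is a simplified model, not the actual dictobject.c
code: it assumes slot counts double on each resize and that a resize
triggers as soon as fill*3 >= size*2 (the 2/3 rule from the quoted
text):

```python
def resizes_needed(fill, first_size):
    """Count the resizes a dict needs to reach `fill` non-virgin
    slots under the 2/3 rule (resize once fill*3 >= size*2).
    Simplified model: the table doubles on each resize."""
    size = first_size
    resizes = 0
    for f in range(1, fill + 1):
        while f * 3 >= size * 2:
            size *= 2
            resizes += 1
    return resizes

# Current first non-empty size of 4: fill 2 fits, fill 3 forces a resize.
print(resizes_needed(2, 4))  # 0
print(resizes_needed(3, 4))  # 1
# Boosting the first size to 8 covers fill 5 with no resize at all,
# i.e. the 93+% of dicts in the table above.
print(resizes_needed(5, 8))  # 0
print(resizes_needed(6, 8))  # 1
```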
> A more aggressive idea is to allow denser dicts, by allowing them to become
> no more than 75% full. That is, change the resize test from
>
> mp->ma_fill*3 >= mp->ma_size*2
>
> to
>
> mp->ma_fill*4 > mp->ma_size*3
>
> That would allow the 10.8% of real(er) life dicts with fill 3 to continue
> living in dicts with 4 slots, and allow about 90% of all dicts to get away
> with no more than one resize. The downside is that boosting the max load
> factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
> a small boost in the expected # of compares. But the "theory" is for random
> hash functions with "uniform probing" (tech term that does *not* mean linear
> probing), and Python's hash functions often aren't random at all, while AFAIK
> no rigorous analysis of its probing strategy exists.
>
> So, plenty of arbitrary data there upon which to flip a coin <wink>.
Why not make those parameters macros at the top of dictobject.c,
so they can be tuned to whatever the programmer needs/wants?!
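To see what such tunable parameters would buy, here is a sketch of
the two resize rules under discussion, expressed as the largest fill
each one tolerates for a given table size. The function names are
hypothetical, purely for illustration -- only the two inequalities
come from the quoted text:

```python
def max_fill_two_thirds(size):
    """Largest fill that does NOT trigger the current rule
    (resize when fill*3 >= size*2), i.e. largest fill with
    fill*3 < size*2."""
    return (2 * size - 1) // 3

def max_fill_three_quarters(size):
    """Largest fill that does NOT trigger the proposed rule
    (resize when fill*4 > size*3), i.e. largest fill with
    fill*4 <= size*3."""
    return (3 * size) // 4

for size in (4, 8, 16, 32):
    print(size, max_fill_two_thirds(size), max_fill_three_quarters(size))
# size 4:  2 vs 3  -- the fill-3 dicts (10.8%) stay in 4 slots
# size 8:  5 vs 6
```

The denser 3/4 rule keeps fill-3 dicts in 4-slot tables, at the cost
of the (theoretical) extra compares Tim mentions for near-full tables.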
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/