[Python-Dev] shrinking dicts

Vladimir Marangozov Vladimir.Marangozov@inrialpes.fr
Fri, 13 Aug 1999 00:10:26 +0100 (NFT)


Tim Peters wrote:
> 
> [Tim]
> >> ...slowing the expected case to prevent theoretical bad cases is
> >> usually a net loss -- I think the onus is on you to demonstrate
> >> that this change is an exception to that rule.
> 
> [Vladimir Marangozov]
> > I won't, because this case is rare in practice, classifying it already
> > as an exception. A real exception. I'll have to think a bit more about
> > all this. Adding 1/3 new entries to trigger the next resize sounds
> > suboptimal (if it happens at all).
> 
> "Suboptimal" with respect to which specific cost model?  Exhibiting a
> specific bad case isn't compelling, and especially not when it's considered
> to be "a real exception".  Adding new expense to every delete is an obvious
> new burden -- where's the payback, and is the expected net effect amortized
> across all dict usage a win or loss?  Offhand it sounds like a small loss to
> me, although I haven't worked up a formal cost model either <wink>.

C'mon Tim, don't try to impress me with cost models. I'm already impressed :-)
Anyways, I've looked at some traces. As expected, the conclusion is that
this case is extremely rare wrt the average dict usage. There are 3 reasons:
(1) dicts are usually deleted entirely and (2) del d[key] is rare in practice
(3) often d[key] = None is used instead of (2).

There is, however, a small percentage of dicts which are used below 1/3 of
their size. I must say, below 1/3 of their peek size, because dowsizing
is also rare. To trigger a downsize, 1/3 new entries of the peek size must
be inserted.

Besides these observations, after looking at the code one more time, I can't
really understand why the resize logic is based on the "fill" watermark
and not on "used". fill = used + dummy, but since lookdict returns the
first free slot (null or dummy), I don't really see what's the point of
using a fill watermark... Perhaps you can enlighten me on this. Using only
the "used" metrics seems fine to me. I even deactivated "fill" and replaced
it with "used" to see what happens -- no visible changes, except a tiny
speedup I'm willing to neglect.

-- 
       Vladimir MARANGOZOV          | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252