[Python-Dev] Dictionary tuning

Tue, 29 Apr 2003 12:40:41 +1000

> From: Tim Peters [mailto:tim.one@comcast.net]
>=20
> That doesn't make sense.  Dicts can be larger after the=20
> patch, but never
> smaller, so there's nothing opposing the "can be larger"=20
> part:  on average,
> allocated address space must be strictly larger than before. =20
> Whether that
> *matters* on average to the average user is something we can answer
> rigorously just as soon as we find an average user with an=20
> average program
> <wink>.  I'm not inclined to worry much about it.

That's what I was getting at. I know that (for example) most
classes I create have less that 16 entries in their __dict__.
With this change, each class instance would take (approx) twice
as much memory for its __dict__. I suspect that class instance
__dict__ is the most common dictionary I use.

> >> This might be a worthwhile speedup on small dicts (up to a TBD
> >> number of entries) but not worthwhile for large dicts.
>=20
> > Actually, it helps large dictionaries even more that small=20
> dictionaries.
> > Collisions in large dicts are resolved through other memory probes
> > which are almost certain not to be in the current cache line.
>=20
> That part makes sense.  Resizing a large dict is an expensive=20
> operation too.

That's not what I meant. Most dictionaries are fairly small.
Large dictionaries are common, but I doubt they are common enough
to offset the potential memory loss from this patch. Currently if
you go one over a threshold you have a capacity of 2*len(d)-1.
With the patch this would change to 4*len(d)-1 - very significant
for large dictionaries. Thus my consideration that it might be
worthwhile for smaller dictionaries (depending on memory
memory characteristics) but not for large dictionaries.

Perhaps we need to add some internal profiling, so that
"quickly-growing" dictionaries get larger reallocations ;)

> Since the body of the loop isn't entered often, unpredictable one-shot
> branches within the body shouldn't have a measurable effect.  The
> unpredictable branches when physically resizing the dict will=20
> swamp them
> regardless.  The surrounding if-test continues to be=20
> predictable in the
> "branch taken" direction.

I didn't look at the surrounding code (bad Tim D - thwack!) but
in this case I would not expect an appreciable performance loss
from this. However, the fact that we're getting an appreciable
performance *gain* from changes on this branch suggests that it
might be slightly more vulnerable than expected (but should still be
swamped by the resize).

> What could be much worse is that stuffing code into the=20
> if-block bloats the
> code so much as to frustrate lookahead I-stream caching of the normal
> "branch taken and return 0" path:
>=20
> 	if (mp->ma_used > n_used && mp->ma_fill*3 >=3D=20
> (mp->ma_mask+1)*2) {
> 		if (dictresize(mp, mp->ma_used*2) !=3D 0)
> 			return -1;
> 	}
> 	return 0;
>=20
> Rewriting as
>=20
> 	if (mp->ma_used <=3D n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2)
> 		return 0;
>=20
> 	return dictresize(mp, mp->ma_used*2) ? -1 : 0;
>=20
> would help some compilers generate better code for the=20
> expected path, and
> especially if the blob after "return 0;" got hairier.

I find that considerably easier to read in any case ;)

Cheers.

Tim Delaney