RE: [Python-Dev] Dictionary tuning

That's what I was getting at. I know that (for example) most classes I create have fewer than 16 entries in their __dict__. With this change, each class instance would take (approx) twice as much memory for its __dict__. I suspect that class instance __dict__ is the most common dictionary I use.
That's not what I meant. Most dictionaries are fairly small. Large dictionaries are common, but I doubt they are common enough to offset the potential memory loss from this patch. Currently, if you go one over a threshold you have a capacity of 2*len(d)-1. With the patch this would change to 4*len(d)-1 - very significant for large dictionaries. Thus my suggestion that it might be worthwhile for smaller dictionaries (depending on memory characteristics) but not for large ones. Perhaps we need to add some internal profiling, so that "quickly-growing" dictionaries get larger reallocations ;)
I didn't look at the surrounding code (bad Tim D - thwack!) but in this case I would not expect an appreciable performance loss from this. However, the fact that we're getting an appreciable performance *gain* from changes on this branch suggests that it might be slightly more vulnerable than expected (but should still be swamped by the resize).
I find that considerably easier to read in any case ;)

Cheers.

Tim Delaney

[Raymond Hettinger]
I think of the resize intervals as steps on a staircase. My patch eliminates the even-numbered stairs. The average logarithmic slope of the staircase doesn't change; there are just fewer discrete steps. Also, the height of the staircase doesn't change unless the top stair was even, in which case another half step is added.

Me either. I suspect it is rare to find a stable application that is functioning just fine while consuming nearly all memory. Sooner or later, some change in data, hardware, OS, or script would push it over the edge.

Those dicts would also be the ones benefiting from the patch. Their density would be halved, resulting in fewer collisions, improved search times, and better cache performance.

[Delaney, Timothy C]
Perhaps we need to add some internal profiling, so that "quickly-growing" dictionaries get larger reallocations ;)

[Raymond Hettinger]
I came up with this patch a couple of months ago and have since tried every tweak I could think of (apply it to dicts of this size but not that, etc.) but found nothing that survived a battery of application benchmarks. Have you guys tried out the patch? I'm very interested in getting results from different benchmarks, processors, cache sizes, and operating systems.

sparse-is-better-than-dense-ly yours,

Raymond (currently the only one, unlike two Tims, two Bretts, two Jacks, and a Fredrik distinct from Fred)
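The staircase image can be sketched numerically. The function below simulates the resize rule discussed in this thread (grow when the table is two-thirds full, to the smallest power of two exceeding used * growth); the exact constants are assumptions for illustration, not the shipped dictobject.c code.

```python
def resize_staircase(max_used, growth):
    """Table sizes a dict passes through while growing to max_used entries.

    Sketch of the resize rule discussed in this thread: grow when
    fill * 3 >= slots * 2, to the smallest power of two greater than
    used * growth.  The constants are assumptions for illustration.
    """
    slots = 8          # assumed minimal table size
    caps = []
    for used in range(1, max_used + 1):
        if used * 3 >= slots * 2:      # two-thirds full: grow the table
            new = slots
            while new <= used * growth:
                new <<= 1
            slots = new
            caps.append(slots)
    return caps

# Doubling visits every power of two; quadrupling skips every other stair.
print(resize_staircase(100, 2))   # [16, 32, 64, 128, 256]
print(resize_staircase(100, 4))   # [32, 128, 512]
```

Under this model the quadrupling patch does exactly what the staircase analogy says: the same logarithmic slope, with the alternating stairs removed.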

[Tim Peters]
Do they have fewer than 6 entries? Dicts with 5 or fewer entries don't change size at all (an "empty dict" comes with room for 5 entries). Surprise <wink>: in many apps, the most frequent use is dicts created to hold keyword arguments at call sites. This happens under the covers, so you're not normally aware of it. Those almost always hold fewer than 6 entries - except in apps where they don't. But they're usually short-lived too (not surviving the function call they're created for).
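The "room for 5 entries" figure falls out of the two-thirds fill threshold. A quick check, assuming the 8-slot minimum table and the fill * 3 >= slots * 2 resize test mentioned in this thread:

```python
# Assumed resize test: grow when fill * 3 >= slots * 2.  With the
# minimal 8-slot table, the largest fill that does NOT trigger a
# resize is 5 - hence an "empty dict" has room for 5 entries.
slots = 8
max_entries = max(n for n in range(slots) if n * 3 < slots * 2)
print(max_entries)  # 5
```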
Two-thirds of which is empty space right after resizing, BTW.
[Delaney, Timothy C]
With the patch this would change to 4*len(d)-1 - very significant for large dictionaries.

[Tim Peters]
I don't know that it is. One dict slot consumes 12 bytes on 32-bit boxes, and slots are allocated contiguously so there's no hidden malloc overhead per slot. I hope a dict with a million slots counts as large, but that's "only" 12MB for slot space. When it gets too large to fit in RAM, that's deadly to performance; I've reached that point many times in experimental code, but those were lazy algorithms to an extreme. So I'm more worried about apps with several large dicts than about apps with one huge dict.
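The 12MB figure is simple arithmetic: each slot holds three 4-byte fields (cached hash, key pointer, value pointer) on a 32-bit box.

```python
SLOT_BYTES = 4 + 4 + 4        # hash, key pointer, value pointer on 32-bit
slots = 2 ** 20               # a dict with a million slots
print(slots * SLOT_BYTES // 2 ** 20, "MB")  # 12 MB
```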
There's always more than one effect from a change. Raymond explained that large dict performance is boosted due to fewer collisions, and that makes perfect sense (every probe in a large dict is likely to be a cache miss). It doesn't make sense that fiddling the code inside the if-block slows anything, unless perhaps it's an unfortunate I-stream cache effect slowing the normal (if-block not entered) case. When you're looking at out-of-cache code, second- and third-order causes are often the whole story.
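The fewer-collisions claim is easy to check with a toy open-addressed table. This sketch uses a CPython-style perturb probe sequence (an assumption for illustration, not the exact dictobject.c loop) and compares average probes per insert at two-thirds versus one-third load:

```python
import random

def avg_probes(nslots, nkeys, seed=12345):
    """Average probes needed to insert nkeys random keys into an
    open-addressed table of nslots slots (nslots a power of two),
    using a CPython-style perturbed probe sequence."""
    rng = random.Random(seed)
    table = [False] * nslots
    mask = nslots - 1
    total = 0
    for _ in range(nkeys):
        h = rng.getrandbits(32)
        i = h & mask
        perturb = h
        probes = 1
        while table[i]:                      # collision: keep probing
            i = (5 * i + perturb + 1) & mask
            perturb >>= 5
            probes += 1
        table[i] = True
        total += probes
    return total / nkeys

# Same 682 keys at two-thirds full vs one-third full:
dense = avg_probes(1024, 682)
sparse = avg_probes(2048, 682)
print(dense > sparse)  # True: halving the density cuts collisions
```

Halving the load factor roughly halves the chance that any probe lands on an occupied slot, which is the mechanism behind the improved search times claimed above.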

participants (3)
- Delaney, Timothy C (Timothy)
- Raymond Hettinger
- Tim Peters