[Python-Dev] A new dict for Xmas?

Mark Shannon mark at hotpy.org
Fri Dec 16 22:32:44 CET 2011


Jim Jewett wrote:
>> Greg Ewing wrote:
>>> Mark Shannon wrote:
> 
>>>> I have a new dict implementation which allows sharing of keys between
>>>> objects of the same class.
> 
>>> We already have the __slots__ mechanism for memory savings.
>>> Have you done any comparisons with that?
> 
>> You can't make Python programmers use slots, neither can you
>> automatically change existing programs.
> 
> The automatic change is exactly what a dictionary upgrade provides.
> 
> I haven't read your patch in detail yet, but it sounds like you're
> replacing the array of keys + array of values with just an array of
> values, and getting the numerical index from a single per-class array
> of keys.

Each dictionary has key/hash/values as before, but instead of on array,
they are broken into two: a key/hash array and a value array.
The key/hash arrays can be shared amongst dicts,
this happens for well behaved classes and completely empty dicts,
other wise each dict gets two arrays.

> 
> That would normally be sensible (so thanks!), but it isn't a drop-in
> replacement.  If you have a "Data" class intended to take arbitrary

It is a drop in replacement. It conforms to the current API.

> per-instance attributes, it just forces them all to keep resizing up,
> even though individual instances would be small with the current dict.
There is a cut-off point, at the moment it's quite unsophisticated about 
how it does this, but it could easily be improved.
Suggestions are welcome.

> 
> How is this more extreme than replacing a pure dict with some
> auto-calculated slots and an "other_attrs" dict that would normally
> remain empty?

Its less extreme, but equally effective.

> 
> [It may be harder to implement, because of the difficulty of
> calculating the slots in advance ... but I don't see it as any worse,
> once implemented.]
Its a trade of between ease of implementation as effectiveness.
I think the shared key/hash array approach gets most the advantages of
a full map implementation (like PyPy or V8) with much less hassle.

> 
> Of course, maybe your shared dict just points to sequential array
> positions (rather than matching the key position) ... in which case,
> it may well beat slots, though the the "Data" class would still be a
> problem.

It won't beat slots, mainly due to the extra space required to minimise 
collisions, but it is a lot more compact than the present approach.

For a well behaved class with lots of instances, each with 3 or 4 
attributes (ie the minimum size dict) its cuts the space used by the 
per-instance dict from 136 bytes (32bit machine) to 64 bytes plus the 
shared key/hash array. Slots would only require 12 or 16 bytes.

(When verifying these numbers I found a bug in the resizing,
which I have just fixed)

The next enhancement would be to store the naked value array directly 
into an instance, trimming the space cost down to just 32 bytes, but 
this would cause compatibility issues as the (internal) API would need 
to change.

Cheers,
Mark.


More information about the Python-Dev mailing list