[Python-Dev] PEP for new dictionary implementation

"Martin v. Löwis" martin at v.loewis.de
Thu Feb 16 22:34:00 CET 2012


On 16.02.2012 19:24, Jim J. Jewett wrote:
>
> PEP author Mark Shannon wrote
> (in http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):
> 
>> ... allows ... (the ``__dict__`` attribute of an object) to share
>> keys with other attribute dictionaries of instances of the same class.
> 
> Is "the same class" a deliberate restriction, or just a convenience
> of implementation? 

It's about the implementation: the class keeps a pointer to the key set.
A subclass has a separate pointer for that.
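
Roughly, in C (sketch only; these names are illustrative stand-ins,
not necessarily what the PEP uses):

    #include <stddef.h>

    /* Hypothetical stand-ins for the PEP's actual structures. */
    typedef struct {
        size_t refcnt;           /* dicts/types referencing this keyset */
        size_t size;             /* slot count (a power of two) */
        /* ... hash/key entries follow ... */
    } DictKeySet;

    typedef struct {
        /* ... the usual type-object fields, elided ... */
        DictKeySet *cached_keys; /* shared by every instance __dict__ of
                                    this class; a subclass carries its
                                    own pointer, hence its own keyset */
    } HeapType;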

> I have often created subclasses (or even families
> of subclasses) where instances (as opposed to the type) aren't likely
> to have additional attributes.  These would benefit from key-sharing
> across classes, but I grant that it is a minority use case that isn't
> worth optimizing if it complicates the implementation.

In particular, the potential savings are small: the instances of each
subclass already share a key set per class. So if a base class has S
subclasses (S+1 classes in total), cross-class sharing could save at
most a further S key sets, whereas per-class sharing is already saving
N-S-1 key sets (assuming a total of N objects across all classes). For
example, with N = 10,000 instances over a base class and S = 19
subclasses, per-class sharing saves 9,980 key sets and cross-class
sharing could add only 19 more.

> Have you timed not storing the hash (in the dict) at all, at least for
> (unicode) str-only dicts?  Going to the string for its own cached hash
> breaks locality a bit more, but saves 1/3 of the memory for combined
> tables, and may make a big difference for classes that have relatively
> few instances.

I'd be in favor of that, but it is actually an unrelated change:
whether or not you share key sets is independent of whether str-only
dicts drop the cached hash. Also, given a dict, it may be tricky to
determine whether it is str-only, i.e. which entry layout is in use.
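
For illustration, the two entry layouts under discussion might look
like this (hypothetical names, sketch only):

    #include <stddef.h>

    typedef struct _object PyObject;   /* opaque here; sketch only */

    /* Current layout: the hash is cached in every entry. */
    typedef struct {
        long      hash;                /* one third of each entry */
        PyObject *key;
        PyObject *value;
    } Entry;

    /* A str-only variant could drop the cached hash and instead
       re-read the hash that the unicode object itself caches: a
       third less table memory, at the cost of an extra pointer
       chase (worse locality) on every probe. */
    typedef struct {
        PyObject *key;                 /* a str with a valid cached hash */
        PyObject *value;
    } StrOnlyEntry;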

>> Reduction in memory use is directly related to the number of dictionaries
>> with shared keys in existence at any time. These dictionaries are typically
>> half the size of the current dictionary implementation.
> 
> How do you measure that?  The limit for huge N across huge numbers
> of dicts should be 1/3 (because both hashes and keys are shared); I
> assume that gets swamped by object overhead in typical small dicts.

It's more difficult than that. He also drops the smalltable (which I
think is a good idea), so accounting for how all of this plays together
is tricky.
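
For readers without dictobject.h at hand, the relevant part of the
current layout is roughly this (abridged sketch):

    #include <stddef.h>

    typedef struct _object PyObject;   /* opaque here; sketch only */
    typedef struct { long hash; PyObject *key, *value; } Entry;

    #define MINSIZE 8
    typedef struct {
        /* ... object header, fill/used/mask counters, lookup hook ... */
        Entry *table;                  /* == smalltable while the dict
                                          has at most MINSIZE slots */
        Entry  smalltable[MINSIZE];    /* embedded: small dicts need no
                                          separate allocation */
    } Dict;

Dropping the embedded smalltable makes every dict's key storage a
separate allocation, which changes the baseline against which "half
the size" is measured.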

>> If a table is split the values in the keys table are ignored,
>> instead the values are held in a separate array.
> 
> If they're just dead weight, then why not use them to hold indices
> into the array, so that values arrays only have to be as long as
> the number of keys, rather than rounding them up to a large-enough
> power-of-two?  (On average, this should save half the slots.)

Good idea. However, how do you then track, per dict, how large its
values array is?
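
One possible shape of that suggestion, as a rough sketch with
hypothetical names, including one answer to the tracking question,
namely storing the allocated length in each dict:

    #include <stddef.h>

    typedef struct _object PyObject;   /* opaque here; sketch only */

    typedef struct {
        long      hash;
        PyObject *key;
        size_t    index;     /* slot in each dict's values array,
                                assigned in insertion order */
    } SharedEntry;

    typedef struct {
        size_t refcnt;
        size_t size;         /* power-of-two slot count */
        size_t nentries;     /* keys inserted so far; an upper bound
                                on any index handed out */
        /* SharedEntry entries[]; -- size slots follow */
    } SharedKeys;

    typedef struct {
        SharedKeys *keys;
        PyObject  **values;  /* only nvalues slots allocated */
        size_t      nvalues; /* the per-dict size in question: grow
                                toward keys->nentries whenever a
                                lookup meets index >= nvalues */
    } SplitDict;

Each values array then needs at most as many slots as there are keys,
rather than the table's power-of-two size, which is where the
suggested saving of about half the slots would come from.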

Regards,
Martin

