[Python-ideas] Improving memory usage in shared-key attribute dicts

Thu Oct 30 23:42:21 CET 2014

On Thu, Oct 30, 2014 at 07:02:27PM +0000, Hill, Bruce wrote:

> Thanks to PEP 412: "Key-Sharing Dictionary", CPython attribute 
> dictionaries can share keys between multiple instances, so the memory 
> cost of new attribute dicts comes primarily from the values array. In 
> the current implementation, the keys array and the values array are 
> always kept the same size, so that once a key's location in the keys 
> array has been found, the same offset can be used in the values array 
> to find the value.
> 
> Rather than storing values in a sparse array of the same size as the 
> keys array, it would make more sense to store values in a compact 
> array. When a dict uses key sharing, there is an unused field in the 
> PyDictKeyEntry struct ("me_value"), which could be repurposed to hold 
> an index into the value array (perhaps by converting "me_value" into a 
> payload union in PyDictKeyEntry). Since the sparse arrays in 
> dictobject.c never use more than (2n+1)/3 of their entries, this 
> change would reduce the memory footprint of each shared-key dict by 
> roughly 1/3 (or more) and also improve data locality.
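
For concreteness, here is a rough Python model of the layout proposed above.
It is only a sketch under stated assumptions, not CPython's code: the names
SharedKeys, CompactInstanceDict, index_for and usable_fraction are
hypothetical, the probing is plain linear probing, and resizing is left out.
The only figure taken from the quoted text is the (2n+1)/3 usable fraction.

```python
# Illustrative model only; every name here is hypothetical, not CPython's.

_MISSING = object()  # marks a value slot this instance has not set yet


def usable_fraction(n):
    """Most entries a table of n slots may hold: (2n+1)/3, as quoted."""
    return (2 * n + 1) // 3


class SharedKeys:
    """Sparse key table shared by all instances of one class."""

    def __init__(self, size=8):
        self.size = size            # number of slots, assumed a power of two
        self.slots = [None] * size  # each slot: (key, value_index) or None
        self.used = 0               # number of keys stored so far

    def _probe(self, key):
        # Linear probing for simplicity; the real probe sequence differs.
        i = hash(key) & (self.size - 1)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) & (self.size - 1)
        return i

    def index_for(self, key, insert=False):
        """Return the compact value index stored alongside the key."""
        i = self._probe(key)
        if self.slots[i] is not None:
            return self.slots[i][1]
        if not insert:
            raise KeyError(key)
        if self.used >= usable_fraction(self.size):
            raise MemoryError("this sketch does not implement resizing")
        # The slot that would hold me_value holds an index instead.
        self.slots[i] = (key, self.used)
        self.used += 1
        return self.slots[i][1]


class CompactInstanceDict:
    """Per-instance dict: a dense values array indexed via the shared keys."""

    def __init__(self, keys):
        self.keys = keys
        self.values = []  # grows only as far as the highest index in use

    def __setitem__(self, key, value):
        idx = self.keys.index_for(key, insert=True)
        while len(self.values) <= idx:
            self.values.append(_MISSING)
        self.values[idx] = value

    def __getitem__(self, key):
        idx = self.keys.index_for(key)
        if idx >= len(self.values) or self.values[idx] is _MISSING:
            raise KeyError(key)
        return self.values[idx]


keys = SharedKeys(8)
a = CompactInstanceDict(keys)
b = CompactInstanceDict(keys)
a["x"] = 1
a["y"] = 2
b["y"] = 20
```

In this model an 8-slot key table accepts at most usable_fraction(8) keys, so
each instance's values array never needs more than that many entries, whereas
a values array matching the key table would always have 8.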

How does this compare to the "Alternate Implementation" described in PEP 
412?

http://python.org/dev/peps/pep-0412/#id17

-- 
Steven