[issue13903] New shared-keys dictionary implementation
report at bugs.python.org
Fri Mar 9 18:13:20 CET 2012
Jim Jewett <jimjjewett at gmail.com> added the comment:
>> As of Feb 28, 2012, the PEP mentions an additional
>> optimization of storing the values in an array indexed
>> by (effectively) key insertion order, rather than key
>> position. ("Alternative Implementation")
>> It states that this would reduce memory usage for the
>> values array by 1/3. 1/3 is a worst-case measurement;
>> average is 1/2. (At savings of less than 1/3, the keys
>> would resize, to initial savings of 2/3. And yes, that
>> means in practice, the average savings would be greater
>> than half, because the frequency of dicts of size N
>> decreases with N.)
> I presume you mean to allocate a values array of
> size == actual keys rather than size == usable keys.
Actually, I meant size==maximum(keys in use for this dict),
so that for objects with a potentially complex lifecycle,
those instances that had not yet needed the new attributes
would not need to allocate space for them.
But I see now that just allocating space for each of the
potential keys is indeed simpler. And I suppose that a
class which won't eventually need all the attributes for
every instance is an unusual case.
So to get beneath 2/3 without lots of reallocation
probably requires knowing when the key set is likely
to be complete, and that is indeed more complex than
the current changes. (That said, you have left code
in for immutable keys, so there may be cases where
this isn't so hard after all.)
> Making the values array smaller than the the number
> of usable keys causes a number of issues:
> 1. The values_size will need to be kept in the dict,
> not in the keys,
That was indeed true for my proposal. If you just allocate
2/3, then you don't need to store the value, unless you
want to be lazy about reallocating when the keys grow.
Even then, there are few enough potential sizes to fit
the information in a byte, or we could get the info for
free *if* the dict were already timestamped or versioned.
>> It states that the keys table would need an additional
>> "values_size" field, but in the absence of dummies, this
>> is just ma_used.
I was mixing two structures in my mind. The current key
count (which is sufficient for a new values array) is
actually USABLE_FRACTION(dk_size) - dk_free.
Python tracker <report at bugs.python.org>
More information about the Python-bugs-list