[Python-Dev] PEP for new dictionary implementation
jimjjewett at gmail.com
Fri Feb 17 04:32:51 CET 2012
On Thu, Feb 16, 2012 at 4:34 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> On 16.02.2012 19:24, Jim J. Jewett wrote:
>> PEP author Mark Shannon wrote
>> (in http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):
>>> ... allows ... (the ``__dict__`` attribute of an object) to share
>>> keys with other attribute dictionaries of instances of the same class.
>> Is "the same class" a deliberate restriction, or just a convenience
>> of implementation?
> It's about the implementation: the class keeps a pointer to the key set.
> A subclass has a separate pointer for that.
I would prefer to see that reason in the PEP; after a few years, I
have trouble finding email, even when I remember reading the discussion.
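The sharing scheme Martin describes can be modeled in pure Python. This is an illustrative sketch only, not CPython's actual C implementation; the names `SharedKeys` and `InstanceDict` are invented for the example:

```python
# Illustrative model of key-sharing: the class owns one shared keys
# object; each instance __dict__ holds only a per-instance values array.

class SharedKeys:
    """Shared insertion-ordered table of attribute names (hypothetical)."""
    def __init__(self):
        self.names = []                 # attribute names, in insertion order

    def index(self, name, add=False):
        try:
            return self.names.index(name)
        except ValueError:
            if not add:
                raise KeyError(name)
            self.names.append(name)     # first instance to use a name adds it
            return len(self.names) - 1

class InstanceDict:
    """Per-instance values; all instances of a class share one SharedKeys."""
    def __init__(self, shared):
        self.shared = shared            # same object for every instance
        self.values = []                # grows only as this instance needs

    def set(self, name, value):
        i = self.shared.index(name, add=True)
        while len(self.values) <= i:    # grow up to the slot for this name
            self.values.append(None)
        self.values[i] = value

    def get(self, name):
        return self.values[self.shared.index(name)]

keys = SharedKeys()
d1, d2 = InstanceDict(keys), InstanceDict(keys)
d1.set("x", 1)
d2.set("x", 2)                          # same key slot, separate values
assert d1.get("x") == 1 and d2.get("x") == 2
assert d1.shared is d2.shared           # one keys table, two values arrays
```

A subclass would simply construct its own `SharedKeys`, which is why sharing stops at the class boundary.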
>> Have you timed not storing the hash (in the dict) at all, at least for
>> (unicode) str-only dicts? Going to the string for its own cached hash
>> breaks locality a bit more, but saves 1/3 of the memory for combined
>> tables, and may make a big difference for classes that have
>> relatively few instances.
> I'd be in favor of that, but it is actually an unrelated change: whether
> or not you share key sets is unrelated to whether or not str-only dicts
> drop the cached hash.
Except that the biggest arguments against it are that it breaks cache
locality, and it changes the dictentry struct -- which this patch
already does anyway.
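The "saves 1/3" figure above is simple arithmetic, assuming the usual 64-bit layout of three machine words per combined-table entry (cached hash, key pointer, value pointer):

```python
# Back-of-envelope for dropping the cached hash in str-only dicts.
WORD = 8                          # bytes per pointer/word on a 64-bit build
with_hash = 3 * WORD              # (hash, key, value) entry
without_hash = 2 * WORD           # (key, value) entry; hash lives on the str
saving = 1 - without_hash / with_hash
assert abs(saving - 1/3) < 1e-9   # one third of per-entry storage
```

The cost is the extra pointer chase to the string object's cached hash, which is the cache-locality concern above.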
> Given a dict, it may be tricky to determine
> whether or not it is str-only, i.e. what layout to use.
Isn't that exactly the same determination needed when deciding whether
or not to use lookdict_unicode? (It would make the switch to the more
general lookdict more expensive, as that would involve a new allocation.)
>>> Reduction in memory use is directly related to the number of dictionaries
>>> with shared keys in existence at any time. These dictionaries are typically
>>> half the size of the current dictionary implementation.
>> How do you measure that? The limit for huge N across huge numbers
>> of dicts should be 1/3 (because both hashes and keys are shared); I
>> assume that gets swamped by object overhead in typical small dicts.
> It's more difficult than that. He also drops the smalltable (which I
> think is a good idea), so accounting how this all plays together is tricky.
All the more reason to explain in the PEP how he measured or approximated it.
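The 1/3 limit claimed above can be checked with a quick model. This sketch assumes three words per keys-table entry (hash, key, plus the ignored value slot) and one word per entry in each instance's values array, and it deliberately ignores object headers and power-of-two rounding, which is exactly the overhead that swamps the limit for small dicts:

```python
# As the number of dicts sharing one keys table grows, the shared
# (hash, key, dead-value) storage is amortized away and each dict
# pays only for its values array: 1 word of the original 3 per entry.
WORD = 8

def split_per_dict(n_entries, n_dicts):
    shared = 3 * WORD * n_entries    # one keys table, paid once
    values = 1 * WORD * n_entries    # per-dict values array
    return shared / n_dicts + values

combined = 3 * WORD * 100                    # a 100-entry combined-table dict
ratio = split_per_dict(100, 10**6) / combined
assert abs(ratio - 1/3) < 1e-3               # approaches 1/3 as dicts grow
```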
>>> If a table is split the values in the keys table are ignored,
>>> instead the values are held in a separate array.
>> If they're just dead weight, then why not use them to hold indices
>> into the array, so that values arrays only have to be as long as
>> the number of keys, rather than rounding them up to a large-enough
>> power-of-two? (On average, this should save half the slots.)
> Good idea. However, how do you track per-dict how large the table is?
Why would you want to?
The per-instance array needs to be at least as large as the highest
index used by any key for which it has a value; if the keys table gets
far larger (or even shrinks), that doesn't really matter to the
instance. What does matter to the instance is getting a value of its
own for a new (to it) key -- and then the keys table can tell it which
index to use, which in turn tells it whether or not it needs to grow
its own values array.
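The index-indirection idea can be sketched as follows. This is an invented illustration (toy hash table, made-up names `IndexedKeys` / `CompactValues`), not a patch: the point is that the otherwise-dead value slots in the shared keys table store a dense index, so each values array holds one slot per key actually in use rather than one per hash-table slot:

```python
TABLE_SIZE = 8                           # toy power-of-two hash table

class IndexedKeys:
    """Shared keys table whose dead value slots hold dense indices."""
    def __init__(self):
        self.slots = [None] * TABLE_SIZE # each slot: (key, dense_index) or None
        self.used = 0                    # number of distinct keys so far

    def _probe(self, name):
        i = hash(name) % TABLE_SIZE      # toy open addressing, linear probing
        while self.slots[i] is not None and self.slots[i][0] != name:
            i = (i + 1) % TABLE_SIZE
        return i

    def dense_index(self, name, add=False):
        i = self._probe(name)
        if self.slots[i] is None:
            if not add:
                raise KeyError(name)
            self.slots[i] = (name, self.used)   # dead value slot -> index
            self.used += 1
        return self.slots[i][1]

class CompactValues:
    """Per-instance values array, only as long as the keys it has seen."""
    def __init__(self, shared):
        self.shared = shared
        self.values = []

    def set(self, name, value):
        j = self.shared.dense_index(name, add=True)
        while len(self.values) <= j:     # grow only when told a new index
            self.values.append(None)
        self.values[j] = value

keys = IndexedKeys()
d = CompactValues(keys)
d.set("x", 1)
d.set("y", 2)
assert len(d.values) == 2                # 2 slots, not TABLE_SIZE (8)
```

Growing or shrinking the hash table changes only `slots`; the dense indices, and hence every instance's values array, are untouched.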
Or are you thinking of len(o.__dict__), which will indeed be a bit
slower? That will happen with split dicts and potentially missing
values, regardless of how much memory is set aside (or not) for the
values array.