[Python-Dev] A new dict for Xmas?

"Martin v. Löwis" martin at v.loewis.de
Fri Dec 23 11:33:59 CET 2011


>> - it would be useful to have a specialized representation for
>>   all-keys-are-strings. In that case, me_hash could be dropped
>>   from the representation. You would get savings compared to
>>   the status quo even in the non-shared case.
> It might be tricky switching key tables, and I don't think it would
> save much memory, as keys that are widely shared take up very little
> memory anyway, and not many other dicts are long-lived.

Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
objects (see script below). Of these, 563 (so nearly all of them) had
only strings as keys. Among those, I found 286 different key sets,
where 231 key sets occurred only once (i.e. wouldn't be shared).

Together, the string dictionaries had 13282 keys, and you could save
as many pointers (actually more, because there will be more key slots
than keys).
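
To put a number on that saving, here is a back-of-the-envelope sketch
(my own illustration, assuming a 64-bit build; the field names follow
the current PyDictEntry struct, and the all-string entry layout is
hypothetical):

import ctypes

# Today's entry layout (cf. PyDictEntry in dictobject.h):
# hash, key, value - three pointer-sized fields per slot.
class DictEntry(ctypes.Structure):
    _fields_ = [("me_hash", ctypes.c_ssize_t),
                ("me_key", ctypes.c_void_p),
                ("me_value", ctypes.c_void_p)]

# Hypothetical all-strings layout: str objects cache their hash,
# so me_hash could be re-read from the key and dropped here.
class StrDictEntry(ctypes.Structure):
    _fields_ = [("me_key", ctypes.c_void_p),
                ("me_value", ctypes.c_void_p)]

per_entry = ctypes.sizeof(DictEntry) - ctypes.sizeof(StrDictEntry)
print(per_entry)          # 8 bytes per key slot on a 64-bit build
print(13282 * per_entry)  # >= 106256 bytes for the keys counted below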

I'm not sure why you think the string dicts with unshared keys would be
short-lived. But even if they were, what matters is the steady-state
number of dictionaries - if for every short-lived dictionary that
gets released another one is created, any memory savings from reducing
the dict size would still materialize.

>> - I wonder whether the shared keys could be computed at compile
>>   time, considering all attribute names that get assigned for
>>   self. The compiler could list those in the code object, and
>>   class creation could iterate over all methods (taking base
>>   classes into account).
>>
> 
> It probably wouldn't be that hard to make a guess at compile time as to
> what the shared keys would be, but it doesn't really matter.
> The generation of intermediate shared keys will only happen once per
> class, so the overhead would be negligible.

I'm not so much concerned about the overhead as about the correctness
and effectiveness of the heuristics. For a class with dynamic
attributes, you may well end up with a very large key set. With source
analysis, you wouldn't attempt to grow the key set beyond what is
likely being shared.
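
To illustrate the kind of source analysis I mean, here is a rough
sketch (purely hypothetical, done with the ast module at runtime
rather than in the compiler; guess_shared_keys is a made-up helper)
that collects the attribute names assigned to self across a class's
methods and bases:

import ast
import inspect
import textwrap

def guess_shared_keys(cls):
    # Collect attribute names assigned to 'self' in the methods of
    # cls and its bases - roughly what the compiler could record in
    # the code object.
    names = []
    for klass in reversed(cls.__mro__):
        for member in vars(klass).values():
            if not inspect.isfunction(member):
                continue
            try:
                source = textwrap.dedent(inspect.getsource(member))
            except (OSError, TypeError):
                continue  # no source available (e.g. built-ins)
            for node in ast.walk(ast.parse(source)):
                if (isinstance(node, ast.Attribute)
                        and isinstance(node.ctx, ast.Store)
                        and isinstance(node.value, ast.Name)
                        and node.value.id == "self"
                        and node.attr not in names):
                    names.append(node.attr)
    return tuple(names)

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

print(guess_shared_keys(Point))   # ('x', 'y')

An analysis like this only sees static "self.x = ..." assignments
(setattr() and direct __dict__ manipulation stay invisible), which is
exactly why it wouldn't inflate the shared key set.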

Regards,
Martin

import sys

# sys.getobjects() exists only in debug builds of CPython
# (compiled with Py_TRACE_REFS); 0 means "no limit".
d = sys.getobjects(0, dict)
print(len(d), "dicts")

# Keep the key sets of all non-empty dicts whose keys are all strings.
d2 = []
for o in d:
    keys = o.keys()
    if not keys:
        continue
    types = tuple(set(type(k) for k in keys))
    if types != (str,):
        continue
    d2.append(tuple(sorted(keys)))
print(len(d2), "str dicts")

# Count how often each key set occurs; a key set seen only once
# couldn't be shared with another dict.
freq = {}
for keys in d2:
    freq[keys] = freq.get(keys, 0) + 1
print(len(freq), "different key sets")
freq = sorted(freq.items(), key=lambda t: t[1])
print(len([o for o in freq if o[1] == 1]), "unsharable")
print(sum(len(o[0]) for o in freq), "keys")
print(freq[-10:])  # the ten most common key sets
