Re: [Python-Dev] A new dict for Xmas?

Dec. 23, 2011

      ...
...
- it would be useful to have a specialized representation for
  all-keys-are-strings. In that case, me_hash could be dropped
  from the representation. You would get savings compared to
  the status quo even in the non-shared case.
It might tricky switching key tables and I dont think it would save much
memory as keys that are widely shared take up very little memory anyway,
and not many other dicts are long-lived.
Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
objects (see script below). Of these, 563 (so nearly of them) had
only strings as keys. Among those, I found 286 different key sets,
where 231 key sets occurred only once (i.e. wouldn't be shared).

Together, the string dictionaries had 13282 keys, and you could save
as many pointers (actually more, because there will be more key slots
than keys).

I'm not sure why you think the string dicts with unshared keys would be
short-lived. But even if they were, what matters is the steady-state
number of dictionaries - if for every short-lived dictionary that
gets released another one is created, any memory savings from reducing
the dict size would still materialize.
...
...
- I wonder whether the shared keys could be computed at compile
  time, considering all attribute names that get assigned for
  self. The compiler could list those in the code object, and
  class creation could iterate over all methods (taking base
  classes into account).
It probably wouldn't be that hard to make a guess at compile time as to
what the shared keys would be, but it doesn't really matter.
The generation of intermediate shared keys will only happen once per
class, so the overhead would be negligible.
I'm not so much concerned about overhead, but about correctness/
effectiveness of the heuristics. For a class with dynamic attributes,
you may well come up with a very large key set. With source analysis,
you wouldn't attempt to grow the keyset beyond what likely is being
shared.

Regards,
Martin

import sys
d = sys.getobjects(0,dict)
print(len(d), "dicts")
d2 = []
for o in d:
    keys = o.keys()
    if not keys:continue
    types = tuple(set(type(k) for k in keys))
    if types != (str,):
        continue
    d2.append(tuple(sorted(keys)))
print(len(d2), "str dicts")
freq = {}
for keys in d2:
    freq[keys] = freq.get(keys,0)+1
print(len(freq), "different key sets")
freq = sorted(freq.items(), key=lambda t:t[1])
print(len([o for o in freq if o[1]==1]), "unsharable")
print(sum(len(o[0]) for o in freq), "keys")
print(freq[-10:])

Re: [Python-Dev] A new dict for Xmas?

"Martin v. Löwis"