![](https://secure.gravatar.com/avatar/3acb8bae5a2b5a28f6fe522a4ea9b873.jpg?s=120&d=mm&r=g)
- it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case. It might tricky switching key tables and I dont think it would save much memory as keys that are widely shared take up very little memory anyway, and not many other dicts are long-lived.
Why do you say that? In a plain 3.3 interpreter, I counted 595 dict objects (see script below). Of these, 563 (so nearly of them) had only strings as keys. Among those, I found 286 different key sets, where 231 key sets occurred only once (i.e. wouldn't be shared). Together, the string dictionaries had 13282 keys, and you could save as many pointers (actually more, because there will be more key slots than keys). I'm not sure why you think the string dicts with unshared keys would be short-lived. But even if they were, what matters is the steady-state number of dictionaries - if for every short-lived dictionary that gets released another one is created, any memory savings from reducing the dict size would still materialize.
- I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self. The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account).
It probably wouldn't be that hard to make a guess at compile time as to what the shared keys would be, but it doesn't really matter. The generation of intermediate shared keys will only happen once per class, so the overhead would be negligible.
I'm not so much concerned about overhead, but about correctness/ effectiveness of the heuristics. For a class with dynamic attributes, you may well come up with a very large key set. With source analysis, you wouldn't attempt to grow the keyset beyond what likely is being shared. Regards, Martin import sys d = sys.getobjects(0,dict) print(len(d), "dicts") d2 = [] for o in d: keys = o.keys() if not keys:continue types = tuple(set(type(k) for k in keys)) if types != (str,): continue d2.append(tuple(sorted(keys))) print(len(d2), "str dicts") freq = {} for keys in d2: freq[keys] = freq.get(keys,0)+1 print(len(freq), "different key sets") freq = sorted(freq.items(), key=lambda t:t[1]) print(len([o for o in freq if o[1]==1]), "unsharable") print(sum(len(o[0]) for o in freq), "keys") print(freq[-10:])