[Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict

INADA Naoki songofacandy at gmail.com
Fri Jun 24 00:14:55 EDT 2016

> IIUC, key-sharing dicts are a best-effort optimization where if I have
> a class like:
> class Foo:
>     def __init__(self, a, b):
>         self.a = a
>         self.b = b
> f1 = Foo(1, 2)
> f2 = Foo(3, 4)
> then f1.__dict__ and f2.__dict__ can share their key arrays... but if
> I do f1.c = "c", then f1.__dict__ gets automatically switched to a
> regular dict. The idea being that in, say, 99% of cases, different
> objects of the same type all share the same set of keys, and in the
> other 1%, oh well, we fall back on the regular behavior.

Small correction:  Giving up sharing dict can happen when resizing keys.

f1 = Foo(1, 2)  # f1 has [a, b] keys. Let's say it k1.  Foo caches k1.
f2 = Foo(3, 4)  # new instance uses cached k1 keys.
f1.c = "c"   # Since k1 can contain three keys, nothing happen.
f1.d = "d"   # gave up.  Foo doesn't use shared key anymore.
f3 = Foo(5, 6)   # f3 has normal dict.

You can see it by `sys.getsizeof(f1.__dict__), sys.getsizeof(f2.__dict__)`.

> It seems to me that all this works fine for ordered dicts too, if we
> add the restriction that key arrays can be shared if and only if the
> two dicts have the same set of keys *and* initially assign those keys
> in the same order. In, say, 98.9% of cases, different objects of the
> same type all share the same set of keys and initially assign those
> keys in the same order, and in the other 1.1%, oh well, we can
> silently fall back on unshared keys, same as before. (And crucially,
> the OrderedDict semantics are that only adding *new* keys changes the
> order; assignments to existing keys preserve the existing order. So if
> a given type always creates the same instance attributes in the same
> order at startup and never adds or deletes any, then its key values
> *and* key order will stay the same even if it later mutates some of
> those existing attributes in-place.)
> It's possible that there will be some weird types that mess this up, like:
> class WeirdFoo:
>     def __init__(self, a, b):
>         if a % 2 == 0:
>             self.a = a
>             self.b = b
>         else:
>             self.b = b
>             self.a = a
> assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2,
> 3).__dict__.keys())
> but, who cares? It'd be good due-diligence to collect data on this to
> confirm that it isn't a big issue, but intuitively, code like
> WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort
> optimization anyway. Catching 98.9% of cases is good enough.

While I think it's less than 98.9% (see below examples), I agree with you.

1) not shared even current implementation

class A:
    n = 0

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def add(self):
        self.n += 1

a = A()
b = A()

2) not shared if strict ordering rule

class A:
    file = None
    def __init__(self, a, **, filename=None):
        if filename is not None:
            self.file = open(filename, 'w')
        self.a = a

a = A(42, filename="logfile.txt")
b = B(43)

3) Web application's model objects

class User(Model):
    id = IntColumn()
    name = StringColumn()
    age = IntColumn()

# When creating new instance, (name, age) is initialized, and id is
filled after insert.
user = User(name="methane", age=32)

# When instances fetched from DB, ORM populate attributes in (id,
name, age) order.
# 100 instances doesn't share keys under "strict ordering rule".

users = User.query.limit(100).all()

> Is there something I'm missing here? Is this your option #3?


It may works well, but "one special instance disables key-sharing for
all instances
created after" may cause long time increasing memory usage.
People seeing monitoring graph will think their application have memory leak.

My new idea may have more stable memory usage, without decreasing memory
efficiency so much.

See https://mail.python.org/pipermail/python-dev/2016-June/145391.html

Compact ordered dict is more efficient than key-sharing dict in case of Sphinx.
It means, instance __dict__ is not dominance.

I'll implement POC of my new idea and compare it with Sphinx.
If you know another good *real application*, which is easy to benchmark,
please tell me it.

INADA Naoki  <songofacandy at gmail.com>

More information about the Python-Dev mailing list