[Python-Dev] Idea: more compact, interned string key only dict for namespace.

INADA Naoki songofacandy at gmail.com
Wed Jun 22 13:23:18 EDT 2016


As my last email, compact ordered dict can't preserve
insertion order of key sharing dict (PEP 412).

I'm thinking about deprecating key shared dict for now.

Instead, my new idea is introducing more compact dict
specialized for namespace.

If BDFL (or BDFL delegate) likes this idea, I'll take another
one week to implement this.


Background
----------------

* Most keys of namespace dict are string.
* Calculating hash of string is cheap (one memory access, thanks for cache).
* And most keys are interned already.


Design
----------

Instead of normal PyDictKeyEntry, use PyInternedKeyEntry like this.

typedef struct {
    // no me_hash
    PyObject *me_key, *me_value;
} PyInternedKeyEntry;


insertdict() interns key if it's unicode, otherwise it converts dict to
normal compact ordered dict.

lookdict_interned() compares only pointer (doesn't call unicode_eq())
when searching key is interned.

And add new internal API to create interned key only dict.

PyDictObject* _PyDict_NewForNamespace();


Memory usage
--------------------

on amd64 arch.

key-sharing dict:

* 96 bytes for ~3 items
* 128 bytes for 4~5 items.

compact dict:

* 224 bytes for ~5 items.

(232 bytes when keep supporting key-shared dict)

interned key only dict:

* 184 bytes for ~5 items


Note
------

Interned key only dict is still larger than key-shared dict.

But it can be used for more purpose.  It can be used for interning string
for example.  It can be used to kwargs dict when all keys are interned already.

If we provide _PyDict_NewForNamespace to extension modules,
json decoder can have option to use this, too.


-- 
INADA Naoki  <songofacandy at gmail.com>


More information about the Python-Dev mailing list