[Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered

Mon Sep 12 13:24:39 EDT 2016

> From what I understood, Python 3.6 dict got two *different* changes:
>
> * modify the dict structure to use two tables instead of only one: an
> "index" table (the hash table) and a second key/value table
> * tune the dict implementation to only append to the key/value table
>
> The second change depends on the first change.
>
> When a key is deleted, the entry is marked as DUMMY. When we add a new
> item, DUMMY entries are skipped and we only append at the end of the
> key/value table. Sometimes, the key/value table is compacted to free
> memory: all DUMMY entries are removed.

Minor correction:

Put dummy key in *hash* table.  The purpose of the dummy key is same to
previous dict implementation.
The entry where deleted is filled with NULL.

> It would be possible to add a flag to allow to reuse DUMMY entries,
> which means loosing the order. The order would only be lost when we
> add the first item after we removed at least one entry (when the first
> DUMMY entry is reused).

Reusing NULL entry is possible, like original compact dict idea by Raymond.

But we should rebuild hash table before it is filled by dummy keys.
Otherwise, hash lookup may be very slow, or never stop.
Sparseness of hash table is very important.

Compaction in current implementation is not only for packing key-value entries,
but also removing dummy keys from hash table.

>
> The OrderedDict would set the flag to preserve the order.
>
> So technically, it is possible. The question is more what should be
> the "default" dict :-) Ordered or not? :-)

Even if dict don't preserve insertion order when deletion, people may depend
"preserving insertion order unless deletion".

So fundamental question is: Is it to so bad thing that some people
write code depending
on CPython and PyPy implementation?

I think cross-interpreter libraries can use OrederedDict correctly
when they should use it.
(They may run test on micropython, Jython and IronPython).

And I think there are many use cases that "keeping insertion order is
not required, but it's
very nice if it is nearly zero cost.".

For example, when logging with JSON lines,

log.write(json.dumps( { "msg": "hello", "foo": foo, "bar" bar } ))

Stable key order may be not required, but it makes the log more readable.