[Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered

Guido van Rossum guido at python.org
Mon Sep 12 12:35:30 EDT 2016


Couldn't we use the order in the actual hash table (which IIUC now
contains just indexes into the ordered vector of key/value/hash
structs)? That would probably simulate the pre-3.6 order quite
effectively.

But we'd have to add a new API to reveal the order (in effect just
what Nick wanted). How much of the OrderedDict can be implemented just
by adding new methods (IOW without changing the data structure)?

On Mon, Sep 12, 2016 at 9:27 AM, Gregory P. Smith <greg at krypto.org> wrote:
> For the regular dict (non kwargs or namespace __dict__) use case I would
> actually like to see disorder preserved during iteration.
>
> If we don't, we will eventually to find ourselves in a similar state we were
> in pre hash-randomization:
>  (1) Over time, code will come to depend on the order for no good reason.
> Especially true of tests. This greatly increases the engineering barrier
> when trying to move a codebase between Python versions or Python VMs.
>
> The underlying implementation is free to preserve order (as it now does,
> great work!) but I think the behavior of iteration when an ordered type was
> not explicitly requested or ordered iteration was not explicitly requested
> should be perturbed in order to maintain long term code health.
>
> Disorder for this purpose need not be a random shuffle (overkill). It just
> needs to be regularly inconsistent. A simple thing to do on top of 3.6's new
> dict implementation would be to pick a random starting point within the
> order array rather than offset 0 to start iteration from. That small change
> would be sufficient to guarantee that code depending on order must ask for
> order. It could even allow us to get people ready for iteration within the
> same process to become unstable.
>
> Maybe I worry too much. Having slogged through fixing problems to enable
> hash randomization on a code base of tens of millions of lines in 2012...
> there is a lot of value in enforcing disorder where none is intended to be
> guaranteed. Explicit is better than implicit.
>
> -gps
>
> On Mon, Sep 12, 2016 at 5:37 AM Victor Stinner <victor.stinner at gmail.com>
> wrote:
>>
>> 2016-09-12 13:50 GMT+02:00 Antoine Pitrou <solipsis at pitrou.net>:
>> > Besides, I don't think it has been proven that the compact-and-ordered
>> > dict implementation is actually *faster* than the legacy one.
>>
>> Python 3.6 dict is slower than Python 3.5 dict, at least for a simple
>> lookup:
>> http://bugs.python.org/issue27350#msg275581
>>
>> But its memory usage is 25% smaller.
>>
>> I'm curious about the performance of the "compaction" needed after
>> adding too many dummy entries (and to preserve insertion order), but I
>> don't know how to benchmark this :-) Maybe add/remove many new keys? I
>> expect bad performance on the compaction, but maybe not as bad as the
>> "hash DoS".
>>
>> For regular Python code, I don't expect compaction to be a common
>> operation, since it's rare to remove attributes. It's more common to
>> modify attributes value, than to remove them and later add new
>> attributes.
>>
>> Victor
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)


More information about the Python-Dev mailing list