[Python-Dev] Changing the order of iteration over a dictionary
mark at hotpy.org
Fri Jan 20 11:49:05 CET 2012
One of the main sticking points over possible fixes for the
hash-collision security issue seems to be a fear that changing the
iteration order of a dictionary will break backwards compatibility.
The order of iteration has never been specified. In fact not only is it
arbitrary, it cannot be determined from the contents of a dict alone; it
may depend on the insertion order.
Changing a hash function is not the only change that will change the
iteration order; any of the following will also do so:
* Changing the minimum size of a dict.
* Changing the load factor of a dict.
* Changing the resizing policy of a dict.
* Sharing of keys between dicts.
By treating iteration order as part of the API we are effectively ruling
out ever making any improvements to the dict.
For example, my new dictionary implementation
reduces memory use by 47% for gcbench, and by about 20% for the 2to3
benchmark, on my 32bit machine.
(Nice graphs: http://tinyurl.com/7qd2nnm http://tinyurl.com/6uqvl2x )
The new dict implementation (necessarily) changes the iteration order
and will break code that relies on it.
If dict iteration order is to be treated as part of the API (and I think
that is a very bad idea) then it should be documented, which will be
difficult since it is barely deterministic.
This will also be a major problem for PyPy, Jython and IronPython, as
they will have to reimplement their dicts.
So, don't be afraid to change that hash function :)
More information about the Python-Dev