Hi I would like to point out that we implemented rhettingers idea in PyPy that makes all the dicts ordered by default and we don't have any adverse performance effects (in fact, there is quite significant memory saving coming from it). The measurments on CPython could be different, but in principle OrderedDict can be implemented as efficiently as normal dict. Writeup: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.h... Previous discussion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html Cheers, fijal
On Sat, Jan 24, 2015 at 12:50 PM, Maciej Fijalkowski
Hi
I would like to point out that we implemented rhettingers idea in PyPy that makes all the dicts ordered by default and we don't have any adverse performance effects (in fact, there is quite significant memory saving coming from it). The measurments on CPython could be different, but in principle OrderedDict can be implemented as efficiently as normal dict.
Writeup: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.h...
Previous discussion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html
Cheers, fijal
also as a sidenote: PEP should maybe mention that PyPy is already supporting it, a bit by chance
Wow, very cool. When I implemented the very first Python dict (cribbing
from an algorithm in Knuth) I had no idea that 25 years later there would
still be ways to improve upon it! I've got a feeling Knuth probably didn't
expect this either...
On Sat, Jan 24, 2015 at 2:51 AM, Maciej Fijalkowski
On Sat, Jan 24, 2015 at 12:50 PM, Maciej Fijalkowski
wrote: Hi
I would like to point out that we implemented rhettingers idea in PyPy that makes all the dicts ordered by default and we don't have any adverse performance effects (in fact, there is quite significant memory saving coming from it). The measurments on CPython could be different, but in principle OrderedDict can be implemented as efficiently as normal dict.
Writeup: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.h...
Previous discussion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html
Cheers, fijal
also as a sidenote: PEP should maybe mention that PyPy is already supporting it, a bit by chance _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
Hi Guido.
I *think* part of the reason why our implementation works is that
machines are significantly different than at the times of Knuth.
Avoiding cache misses is a very effective way to improve performance
these days.
Cheers,
fijal
On Sat, Jan 24, 2015 at 7:39 PM, Guido van Rossum
Wow, very cool. When I implemented the very first Python dict (cribbing from an algorithm in Knuth) I had no idea that 25 years later there would still be ways to improve upon it! I've got a feeling Knuth probably didn't expect this either...
On Sat, Jan 24, 2015 at 2:51 AM, Maciej Fijalkowski
wrote: On Sat, Jan 24, 2015 at 12:50 PM, Maciej Fijalkowski
wrote: Hi
I would like to point out that we implemented rhettingers idea in PyPy that makes all the dicts ordered by default and we don't have any adverse performance effects (in fact, there is quite significant memory saving coming from it). The measurments on CPython could be different, but in principle OrderedDict can be implemented as efficiently as normal dict.
Writeup: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.h...
Previous discussion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html
Cheers, fijal
also as a sidenote: PEP should maybe mention that PyPy is already supporting it, a bit by chance _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
On Sat, Jan 24, 2015 at 11:11 AM, Maciej Fijalkowski
Hi Guido.
I *think* part of the reason why our implementation works is that machines are significantly different than at the times of Knuth. Avoiding cache misses is a very effective way to improve performance these days.
Right. We might look what Knuth has to say about data structures that are stored on disk and loaded into memory piecemeal. :-) -- --Guido van Rossum (python.org/~guido)
On Sat, Jan 24, 2015 at 3:50 AM, Maciej Fijalkowski
I would like to point out that we implemented rhettingers idea in PyPy that makes all the dicts ordered by default and we don't have any adverse performance effects (in fact, there is quite significant memory saving coming from it). The measurments on CPython could be different, but in principle OrderedDict can be implemented as efficiently as normal dict.
Writeup: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.h...
Previous discussion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html
Cool. I'll add a note to the PEP. -eric
Hi all,
On 24 January 2015 at 11:50, Maciej Fijalkowski
I would like to point out that we implemented rhettingers idea in PyPy that makes all the dicts ordered by default and we don't have any adverse performance effects (in fact, there is quite significant memory saving coming from it). The measurments on CPython could be different, but in principle OrderedDict can be implemented as efficiently as normal dict.
I would like to add that http://bugs.python.org/issue16991 is the same as today's dicts with an additional doubly-linked list for the order. I'm unsure why you would do that after the 2012 thread started by Raymond Hettinger, but anyway, don't conclude from only this that in the CPython case ordered dictionaries would be slower and bigger. My guess is that, with a simple port of what is now in PyPy, they would not be (but of course no-one can be sure at this point). Let's say, if you could imagine that CPython's dictionaries, tomorrow, are always magically fully ordered, then would it still be a bad idea? If such a discussion would resurface (soon or "one day"), and if other related issues are resolved (like what to do in Jython and IronPython), and if the conclusion would tentatively turn out positive... then, provided there would at that point still be no "Raymond-style" implementation of dicts, I would volunteer to port PyPy's one to CPython[1]. As you may have guessed I don't consider this particularly likely to occur, but it is a standing offer nevertheless :-) A bientôt, Armin. [1] Someone could also do such a port for the goal of getting an alternate `odictobject.c`. He would be welcome to #pypy to get some help from the PyPy guys, including me --- but my offer above doesn't apply in this case. I want to remove a thorn in the foot of python-dev discussing about the language; I'm not really interested in contributing to the `collections.OrderedDict` type.
On Tue Jan 27 2015 at 2:13:08 PM Armin Rigo
Hi all,
On 24 January 2015 at 11:50, Maciej Fijalkowski
wrote: I would like to point out that we implemented rhettingers idea in PyPy that makes all the dicts ordered by default and we don't have any adverse performance effects (in fact, there is quite significant memory saving coming from it). The measurments on CPython could be different, but in principle OrderedDict can be implemented as efficiently as normal dict.
I would like to add that http://bugs.python.org/issue16991 is the same as today's dicts with an additional doubly-linked list for the order. I'm unsure why you would do that after the 2012 thread started by Raymond Hettinger, but anyway, don't conclude from only this that in the CPython case ordered dictionaries would be slower and bigger. My guess is that, with a simple port of what is now in PyPy, they would not be (but of course no-one can be sure at this point). Let's say, if you could imagine that CPython's dictionaries, tomorrow, are always magically fully ordered, then would it still be a bad idea?
It is a potentially bad idea if order is the default behavior of iteration, items(), keys() and values(). Ideally order should only be exposed when explicitly asked for to help prevent bugs and mitigate potential information leaks. But I'm not sure how big of a deal this actually is. The insertion order nicely doesn't give away anything related to the hash seed used for hash randomization which is a nice bonus over today's implementation (and 2.7 & 3.3's very poor hash randomization implementation). Experience cleaning up our huge code base at work to turn on hash randomization by default a couple years ago has shown that people depend on iteration order in code often without intending to. This often leads to latent bugs. Keep iteration order unstable by default and you prevent people from doing that. Make people request an ordered or stable iteration when their code explicitly needs it. If such a discussion would resurface (soon or "one day"), and if other
related issues are resolved (like what to do in Jython and IronPython), and if the conclusion would tentatively turn out positive... then, provided there would at that point still be no "Raymond-style" implementation of dicts, I would volunteer to port PyPy's one to CPython[1]. As you may have guessed I don't consider this particularly likely to occur, but it is a standing offer nevertheless :-)
CPython should benefit from it regardless for the memory savings alone. -gps
A bientôt,
Armin.
[1] Someone could also do such a port for the goal of getting an alternate `odictobject.c`. He would be welcome to #pypy to get some help from the PyPy guys, including me --- but my offer above doesn't apply in this case. I want to remove a thorn in the foot of python-dev discussing about the language; I'm not really interested in contributing to the `collections.OrderedDict` type. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ greg%40krypto.org
On 28/01/2015 07:14, Gregory P. Smith wrote:
It is a potentially bad idea if order is the default behavior of iteration, items(), keys() and values(). Ideally order should only be exposed when explicitly asked for to help prevent bugs and mitigate potential information leaks.
I have to be honest, I think that's the opposite of most new users assumption...
Experience cleaning up our huge code base at work to turn on hash randomization by default a couple years ago has shown that people depend on iteration order in code often without intending to. This often leads to latent bugs. Keep iteration order unstable by default and you prevent people from doing that.
Hmm, well, or you could say that always having ordering would mean the behaviour would match new users experimental understanding and so eliminate all bugs that occur when people accidentally rely on ordering. Personally, I'd prefer to see us be explicit about data structures used when "security matters", an explicit RandomOrderedDict would make that clear. cheers, Chris
participants (6)
-
Armin Rigo
-
Chris Withers
-
Eric Snow
-
Gregory P. Smith
-
Guido van Rossum
-
Maciej Fijalkowski