Guarantee ordered dict literals in v3.7?
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
Hello, would it be possible to guarantee that dict literals are ordered in v3.7? The issue is well-known and the workarounds are tedious, example: https://mail.python.org/pipermail/python-ideas/2015-December/037423.html If the feature is guaranteed now, people can rely on it around v3.9. Stefan Krah
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee. On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah <stefan@bytereef.org> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/84ecc/84eccc2a9789a30ce89b3034a04e9c359b0b3fe1" alt=""
Isn't ordered dict also useful for **kwargs? If it turns out that there's a dict implementation that's faster by not preserving order, collections.UnorderedDict could be added. There could also be specialized implementations that pre-size the dict (cf: C++ unordered_map::reserve), etc., etc. But these are all future things, which might not be necessary. On 5 November 2017 at 12:44, Sven R. Kunze <srkunze@mail.de> wrote:
data:image/s3,"s3://crabby-images/cd7e0/cd7e070dde02a16b1e5e428904c488e52e05ce40" alt=""
I think that the problem with this is that for the most part, people will use `dict`, and for most uses of `dict`, order doesn't matter (and it never has before). Given that "arbitrary order" includes *any* fixed ordering (insertion ordered, reverse insertion ordered, etc), the common case should keep the existing "no order guarantee" specification. This gives interpreter authors maximum freedom with a fundamental, widely used data type.
Isn't ordered dict also useful for **kwargs?
If this is useful (and it seems like it would be), I think again a syntax modification that allows users to indicate that they want a particular implementation of **kwargs would be better than modifying dict semantics. It could possibly be handled with a type-hinting like syntax: def f(*args, **kwargs : OrderedKwargs): Or a riff on the existing syntax: def f(*args, ***kwargs): def f(*args, ^^kwargs): def f(*args, .**kwargs): In this case, the only guarantee you'd need (which relatively minor compared to a change in the dict semantics) would be that keyword argument order passed to a function would be preserved as the order that it is passed into the `kwargs` constructor. The old **kwargs syntax would give you a `dict` as normal, and the new ^^kwargs would give you an OrderedDict or some other dict subclass with guaranteed order. On 11/05/2017 03:50 PM, Peter Ludemann via Python-Dev wrote:
data:image/s3,"s3://crabby-images/fac1f/fac1f353137c52e340a2ea0ffa4fe887a42e016d" alt=""
On 6 November 2017 at 07:50, Peter Ludemann via Python-Dev < python-dev@python.org> wrote:
Isn't ordered dict also useful for **kwargs?
**kwargs is already specified as insertion ordered as of Python 3.6. https://www.python.org/dev/peps/pep-0468/ Tim Delaney
data:image/s3,"s3://crabby-images/832a7/832a7d28e16a261c5f64f5c6fc6585753582feae" alt=""
Since there is voting going on, -1 on changing the guarantees of `dict`. If ordering is important, OrderedDict is more explicit and more obvious to the over reading the code, even if the underlying implementation is identical. Good luck teaching people about why Python went from OrderedDict to UnorderedDict within a version. I remember learning about this same thing in C++ and just thinking “wut?”. And they just added a new type and said to stop using the old one – not changing something that people already understand. Better to underspecify the default dict and offer explicitly named extensions (DefaultDict, OrderedDict, etc.) for those who want more guarantees. Cheers, Steve Top-posted from my Windows phone From: Peter Ludemann via Python-Dev Sent: Sunday, November 5, 2017 12:53 To: Sven R. Kunze Cc: Python-Dev Subject: Re: [Python-Dev] Guarantee ordered dict literals in v3.7? Isn't ordered dict also useful for **kwargs? If it turns out that there's a dict implementation that's faster by not preserving order, collections.UnorderedDict could be added. There could also be specialized implementations that pre-size the dict (cf: C++ unordered_map::reserve), etc., etc. But these are all future things, which might not be necessary. On 5 November 2017 at 12:44, Sven R. Kunze <srkunze@mail.de> wrote: +1 from me too. On 04.11.2017 21:55, Jim Baker wrote: +1, as Guido correctly recalls, this language guarantee will work well with Jython when we get to the point of implementing 3.7+. On Sat, Nov 4, 2017 at 12:35 PM, Guido van Rossum <guido@python.org> wrote: This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee. On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah <stefan@bytereef.org> wrote: Hello, would it be possible to guarantee that dict literals are ordered in v3.7? The issue is well-known and the workarounds are tedious, example: https://mail.python.org/pipermail/python-ideas/2015-December/037423.html If the feature is guaranteed now, people can rely on it around v3.9. Stefan Krah _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/jbaker%40zyasoft.com _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/srkunze%40mail.de _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/pludemann%40google.com
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 06:50, Peter Ludemann via Python-Dev <python-dev@python.org> wrote:
Isn't ordered dict also useful for **kwargs?
3.6 already made this semantic change for **kwargs and class execution namespaces - it just left the door open to having implementations meet those requirements by way of substituting in collections.OrderedDict for those use cases, rather than using a regular builtin dictionary. So we're also already imposing the requirement on conformant Python implementations to have a reasonably performant insertion-ordered mapping implementation available. The open questions around insertion ordering thus relate to what user expectations should be for: - dictionary displays - explicit invocations of the builtin dict constructor - mutating methods on builtin dicts - serialisation and deserialisation operations that use regular dicts (e.g. the JSON module, csv.DictReader/Writer) It's that last category which is particularly problematic now, as in *C*Python 3.6, the dicts used for these operations actually *are* insertion ordered, so round trips from disk, through a CPython builtin dict, and then back to disk *will* be order preserving, whereas in previous versions there was no guarantee that that would be the case (and hash randomisation meant the key reordering wasn't even deterministic). While such operations were already order preserving in PyPy (since their switch to an insertion-ordered builtin dict implementation served as prior art for the subsequent switch in CPython), we know from experience that it's incredibly easy for even things that are nominally implementation details of CPython to become expected behaviour for Python users (e.g. there's still plenty of code out there that relies on the use of an eager refcounting memory management strategy to handle external resources properly). That means our choices for 3.7 boil down to: * make this a language level guarantee that Python devs can reasonably rely on * deliberately perturb dict iteration in CPython the same way the default Go runtime does [1] When we did the "insertion ordered hash map" availability review, the main implementations we were checking on behalf of were Jython & VOC (JVM implementations), Batavia (JavaScript implementation), and MicroPython (C implementation). Adding IronPython (C# implementation) to the mix gives: * for the JVM, the insertion ordered LinkedHashMap [2] has been available since J2SE 1.4 was released in 2002 * for JavaScript, ECMA 6 defines the Map type [3] as an insertion ordered key/value store * for the .NET CLR, System.Collections.Specialized.OrderedDictionary [4] seems to be the best builtin option, but the Java implementations also map reasonably well to C# semantics * for MicroPython, it's not yet clear whether or not the hash map design used in CPython could also be adapted to their design constraints (i.e. giving both insertion ordering and O(1) lookup without requiring excessive amounts of memory), but it should at least be feasible to make the semantics configurable based on a compile time option (since CPython has a working C implementation of the desired semantics already) Since the round-trip behaviour that comes from guaranteed order preservation is genuinely useful, and we're comfortable with folks switching to more specialised containers when they need different performance characteristics from what the builtins provide, elevating insertion order preservation to a language level requirements makes sense. Cheers, Nick. [1] https://blog.golang.org/go-maps-in-action (search for "Iteration Order") [2] https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html [3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj... [4] https://docs.microsoft.com/en-us/dotnet/api/system.collections.specialized.o... -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 01:07:51PM +1000, Nick Coghlan wrote:
I agree with this choice. My preference is for the first: having dicts be unordered has never been a positive virtue in itself, but always the cost we paid for fast O(1) access. Now what we have fast O(1) access *without* dicts being unordered, we should make it a language guarantee. Provided of course that we can be reasonable certain that other implementations can do the same. And it looks like we can. But if we wanted to still keep our options open, how about weakening the requirement that globals() and object __dicts__ be specifically the same type as builtin dict? That way if we discover a super-fast and compact dict implementation (maybe one that allows only string keys?) that is unordered, we can use it for object namespaces without affecting the builtin dict.
Shouldn't we check with Nuitka (C++) and Cython as well? I'd be surprised if this is a problem for either of them, but we should ask.
+1 OrderedDict could then become a thin wrapper around regular dicts. -- Steve
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 19:14, Steven D'Aprano <steve@pearwood.info> wrote:
If you're using dicts as dicts, Cython just uses the CPython ones via the C API, rather than defining its own. I didn't do any specific research for C++ (since it's essentially the king of fast low level data structure design, hence its popularity in graphics programming), but did see a few references to C++ ordered hash map implementations while looking into the available options for other runtimes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 5 November 2017 at 04:35, Guido van Rossum <guido@python.org> wrote:
When I asked Damien George about this for MicroPython, he indicated that they'd have to choose between guaranteed order and O(1) lookups given their current dict implementation. That surprised me a bit (since PyPy and CPython both *saved* memory by switching to their guaranteed order implementations, hence the name "compact dict representation"), but my (admittedly vague) understand is that the presence of a space/speed trade-off in their case has something to do with MicroPython deliberately running with a much higher chance of hash collisions in general (since the data sets it deals with are naturally smaller). So if we make the change, MicroPython will likely go along with it, but it may mean that dict lookups there become O(N), and folks will be relying on "N" being consistently small due to memory constraints (but some typically O(N) algorithms will still become O(N^2) when run on MicroPython). I don't think that situation should change the decision, but I do think it would be helpful if folks that understand CPython's dict implementation could take a look at MicroPython's dict implementation and see if it might be possible for them to avoid having to make that trade-off and instead be able to use a naturally insertion ordered hashmap implementation. Cheers, Nick. P.S. If anyone does want to explore MicroPython's dict implementation, and see if there might be an alternate implementation strategy that offers both O(1) lookup and guaranteed ordering without using additional memory, the relevant files seem to be: * https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a... (C level hashmap/ordered array structs) * https://github.com/micropython/micropython/blob/master/py/map.c (C level hashmap/ordered array implementation) * https://github.com/micropython/micropython/blob/master/py/objdict.c (Python dict wrapper around the mapping impl) The current behaviour is that the builtin dict uses the hashmap algorithms, while collections.OrderedDict uses the ordered array algorithms. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering. The likely reason for the confusion is that they are already have an option for an "ordered array" dict variant that does a brute-force linear search. However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered. See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a... Pretty much any implementation hashed lookup of keys and values is amenable to being compact and ordered. Whatever existing logic that looks up an entry becomes a lookup into a table of indices which in turn references a sequential array of keys and values. This logic is independent of hashing scheme or density, and it has no effect on the number of probes or collision rate. The cost is an extra level of indirection and an extra array of indices (typically very small). The benefit is faster iteration over the smaller dense key/value array, overall memory savings resulting in improved cache utilization, and the side-effect of remembering insertion order. Summary: I think MicroPython will be just fine and if needed I will help create the patch that implements compact-and-ordered behavior. Raymond
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 09:43, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Nice! That's what I thought based on reading some of the design discussions about CPython's dict implementation, but I didn't know the details of either dict implementation well enough to be confident that the CPython changes would map cleanly to MicroPython's variant. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Sun, 5 Nov 2017 12:04:41 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
MicroPython hashmap implementation is effectively O(n) (average and worst case) due to the algorithm parameters chosen (like the load factor of 1). Of course, parameters could be tweaked, but the ones chosen are so because the memory usage is far more important for MicroPython systems than performance characteristics (all due to small amounts of memory). Like, MicroPython was twice as fast than Python 3.3 on average, and 1000 times more efficient in the memory usage.
(since PyPy and CPython both *saved* memory by switching to their guaranteed order implementations, hence the name "compact dict
There's nothing to save in MicroPython's dict implementation, simply because it doesn't waste anything in the first place - uPy's dict entry is just (key, val) (2 machine words), and load factor of the dict can reach 1.0 as mentioned. []
I don't think that situation should change the decision,
Indeed, it shouldn't. What may change it is the simple and obvious fact that there's no need to change anything, as proven by the 25-year history of the language. What happens now borders on technologic surrealism - the CPython, after many years of persuasion, switched its dict algorithm, rather inefficient in terms of memory, to something else, less inefficient (still quite inefficient, taking "no overhead" as the baseline). That algorithm randomly had another property. Now there's a seemingly serious talk of letting that property leak into the *language spec*, despite the fact that there can be unlimited number of dictionary algorithms, most of them not having that property. What it will lead to is further fragmentation of the community. Python2 vs Python3 split is far from being over, and now there're splits between: * people who use "yield from" vs "await" * people who use f-strings vs who don't * people who rely on sorted nature of dict's vs who don't etc. []
That would be the first programmer in the history to have a cake and eat it too. Memory efficiency, runtime efficiency, sorted order: choose 2 of 3. -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote:
$ cat xxx.py def pi_float(): """native float""" lasts, t, s, n, na, d, da = 0, 3.0, 3, 1, 0, 0, 24 while s != lasts: lasts = s n, na = n+na, na+8 d, da = d+da, da+32 t = (t * n) / d s += t return s for i in range(100000): x = pi_float() $ time ./micropython xxx.py real 0m4.424s user 0m4.406s sys 0m0.016s $ $ time ../../cpython/python xxx.py real 0m1.066s user 0m1.056s sys 0m0.010s Congratulations ... Stefan Krah
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 6 Nov 2017 11:36:59 +0100 Stefan Krah <stefan@bytereef.org> wrote:
[]
$ time ./micropython xxx.py $ time ../../cpython/python xxx.py
Congratulations ...
That's why I wrote "Python 3.3", it's hard to argue CPython doesn't do anything about the "Python is slow" proverb. It's still shouldn't be hard to construct a testcase where MicroPython is faster (by not implementing features not needed by 90% of Python programs of course, not "for free"). Anyway, where're memory measurements? (This is offtopic, so I shouldn't have replied.)
Stefan Krah
-- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Mon, Nov 06, 2017 at 01:03:05PM +0200, Paul Sokolovsky wrote:
Sorry, that was a slightly mischievous benchmark indeed. -- Whether the proposal is surreal or not depends on the likelihood that a) a substantially faster dict algorithm will emerge and b) CPython, PyPy and Jython will switch to it. My proposal was based on the fact that for almost two release cycles the ordering implementation detail hasn't changed. Stefan Krah
data:image/s3,"s3://crabby-images/05644/056443d02103b56fe1c656455ffee12aa1a01f1f" alt=""
On Mon, Nov 6, 2017 at 10:18 AM, Paul Sokolovsky <pmiscml@gmail.com> wrote: the way it's been presented, but this is hardly an enhancement the language has been screaming for for years. Presumably there is little concern that algorithms that rely on this behaviour will be perfectly syntactically conformant with earlier versions but will fail subtly and without explanation? It's a small concern, but a real one - particularly for learners. regards Steve
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 21:18, Steve Holden <steve@holdenweb.com> wrote:
A similar concern existed when we elevated sort stability to being a language requirement - if you relied on that guarantee, your code was technically buggy on versions prior to 2.3, but eventually 2.2 and earlier aged out of general use, allowing such code to become correct in general. So the current discussion is mainly about deciding where we want the compatibility burden to fall in relation to dict insertion ordering: 1. Do we deliberately revert CPython back to being harder to use correctly for the sake of making Python easier to implement? 2. Do we make Python harder to implement for the sake of making it easier to use? 3. Do we choose not to choose, thus implicitly choosing "2" by default due to the fact that Python is defined by a language spec and a reference implementation, rather than *just* a language spec? Here's a more-complicated-than-a-doctest-for-a-dict-repo, but still fairly straightforward, example regarding the "insertion ordering dictionaries are easier to use correctly" argument: import json data = {"a":1, "b":2, "c":3} rendered = json.dumps(data) data2 = json.loads(rendered) rendered2 = json.dumps(data2) # JSON round trip assert data == data2, "JSON round trip failed" # Dict round trip assert rendered == rendered2, "dict round trip failed" Both of those assertions will always pass in CPython 3.6, as well as in PyPy, because their dict implementations are insertion ordered, which means the iteration order on the dictionaries is always "a", "b", "c". If you try it on 3.5 though, you should fairly consistently see that last assertion fail, since there's nothing in 3.5 that ensures that data and data2 will iterate over their keys in the same order. You can make that code implementation independent (and sufficiently version dependent to pass both assertions) by using OrderedDict: from collections import OrderedDict import json data = OrderedDict(a=1, b=2, c=3) rendered = json.dumps(data) data2 = json.loads(rendered, object_pairs_hook=OrderedDict) rendered2 = json.dumps(data2) # JSON round trip assert data == data2, "JSON round trip failed" # Dict round trip assert rendered == rendered2, "dict round trip failed" However, despite the way this code looks, the serialised key order *might not* be "a, b, c" on 3.5 and earlier (it will be on 3.6+, since that already requires that kwarg order be preserved). So the formally correct version independent code that reliably ensures that the key order in the JSON file is always "a, b, c" looks like this: from collections import OrderedDict import json data = OrderedDict((("a",1), ("b",2), ("c",3))) rendered = json.dumps(data) data2 = json.loads(rendered, object_pairs_hook=OrderedDict) rendered2 = json.dumps(data2) # JSON round trip assert data == data2, "JSON round trip failed" # Dict round trip assert rendered == rendered2, "dict round trip failed" # Key order assert "".join(data) == "".join(data2) == "abc", "key order failed" Getting from the "Works on CPython 3.6+ but is technically non-portable" state to a fully portable correct implementation that ensures a particular key order in the JSON file thus currently requires the following changes: - don't use a dict display, use collections.OrderedDict - make sure to set object_pairs_hook when using json.loads - don't use kwargs to OrderedDict, use a sequence of 2-tuples For 3.6, we've already said that we want the last constraint to age out, such that the middle version of the code also ensures a particular key order. The proposal is that in 3.7 we retroactively declare that the first, most obvious, version of this code should in fact reliably pass all three assertions. Failing that, the proposal is that we instead change the dict iteration implementation such that the dict round trip will start failing reasonably consistently again (the same as it did in 3.5), so that folks realise almost immediately that they still need collections.OrderedDict instead of the builtin dict. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 November 2017 at 03:27, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
sort_keys is only equivalent to order preservation if the key order you want *is* alphabetical order. While that's typically a good enough assumption for JSON, it's not the case for things like CSV column order, TOML or ini-file setting order, etc. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
I think that Paul has a point. Interestingly, at the same time we're talking about guaranteeing the order of dicts, we're talking about using another, unordered, data structure (hash array mapped tries) to improve the performance of something that resembles a namespace. It seems the "unordered" part will be visible through ExecutionContext.vars(). https://www.python.org/dev/peps/pep-0550/#enumerating-context-vars The ordered-ness of dicts could instead become one of those stable CPython implementation details, such as the fact that resources are cleaned up timely by reference counting, that people nevertheless should not rely on if they're writing portable code. Regards Antoine. On Mon, 6 Nov 2017 12:18:17 +0200 Paul Sokolovsky <pmiscml@gmail.com> wrote:
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 12:27:54PM +0100, Antoine Pitrou wrote:
Given that (according to others) none of IronPython, Jython, Batavia, Nuitka, or even MicroPython, should have trouble implementing an insertion-order preserving dict, and that PyPy already has, why should we say it is a CPython implementation detail? -- Steve
data:image/s3,"s3://crabby-images/1383b/1383b5c083aa3fd057a17c816aec322848ea5a95" alt=""
Hi, Could be missed by me but the guarantee that dict literals are ordered implies not that dict must be ordered in all cases. The dict literal: d = {"a": 1, "b": 2} will keep the order of "a" and "b" because it is specified as a dict literal. But d["c"] = 3 can change this order and it is allowed by the specification of guaranteed ordered dict literals. Please correct me if I am wrong. In Python 3.6 it does not because dict is implemented ordered and the insertion order is preserved. Also I think we should give the whole thing more time and wait with this guarantee. There are valid concerns against this. Personally I like the ordering but if I need an ordered dict it is ok for me to write this explicitly. The **kwargs are already guaranteed to be ordered and this was useful because the OrderedDict constructor benefits and for other places it is useful. But making all dict's per default ordered is another level. Regards, Wolfgang
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 6 Nov 2017 14:30:45 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
For MicroPython, it would lead to quite an overhead to make dictionary items be in insertion order. As I mentioned, MicroPython optimizes for very low bookkeeping memory overhead, so lookups are effectively O(n), but orderedness will increase constant factor significantly, perhaps 5x. Also, arguably any algorithm which would *maintain* insertion order over mutating operations would be more complex and/or require more memory that one which doesn't. (This is based on the idea that this problem is equivalent to maintaining a sorted data structure, where sorting key is "insertion order"). CPython 3.6 gives **kwargs in the key specification order? That's fine if that's all that it says (note that it doesn't say what happens if someone takes **kwargs and starts to "del []" or "[] =" it). That allows different ways to implement it, per particular implementation's choice. That's even implementable in MicroPython. Like, lookups will be less efficient, but nobody realistically passes hundreds of kwargs. It's pretty different to go from that to dictionaries in general. Saying that a general dict always maintains insertion order is saying that "in Python, you have to use complex, memory hungry algorithms for a simple mapping type". Saying something like "dict maintains insertion order until first modification" is going down the rabbit hole, making it all confusing, hard to remember, crazy-sounding to novices. Why all that, if, since the beginning of times, Python offered a structure with guaranteed ordering - list, and structure for efficient mapping one values into other - dict. Then even marriage between the two - OrderedDict. Why suddenly once in 25 years there's a need to do something to dict's, violating computer science background behind them (one of the reason enough people loved Python comparing to other "practical hack" languages)? Alternatives were already presented on the thread - if people want more and easier ordered dictionaries, it calls to add a special literal initializer for them ("o{}" was proposed). -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/e87f3/e87f3c7c6d92519a9dac18ec14406dd41e3da93d" alt=""
On Mon, 6 Nov 2017 at 08:36 Paul Sokolovsky <pmiscml@gmail.com> wrote:
But that's an implementation detail that most folks won't care about.
I don't understand what "computer science background" is being violated?
I don't see that happening. If OrderedDict didn't lead to syntax then I don't see why that's necessary now. I think worrying about future optimizations that order preservation might prevent is a premature optimization/red herring. None of us can predict the future so worrying about some algorithmic breakthrough that will suddenly materialize on how to implement dictionaries seems unnecessary. That line of thought suggests we should guarantee as little as possible and just make most stuff undefined behaviour. I also think Paul S. has made it clear that MicroPython won't be implementing order-preserving dicts due to their memory constraints. That's fine, but I think we also need to realize that MicroPython is already a slight hybrid by being based on Python 3.4 but with a backport of async/await and a subset of the stdlib.
From what I have read, this argument breaks down to this:
Standardizing order preserving and ... - MicroPython won't implement it, but all other major implementations will - Those that have order preservation will implement PEPs 468 and 520 for free - Some minor practical benefits in terms of avoiding accidental reliance on ordering If we don't promise order preservation ... - CPython should probably add some perturbing to the iteration order - All implementations can handle this without issue or major implementation costs So to me this sounds like a decision between the pragmatism of choosing semantics that all Python implementations can handle or the developer benefits of an order-preserving dict everywhere (all the other arguments I think are minor compared to these two).
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 06 Nov 2017 17:58:47 +0000 Brett Cannon <brett@python.org> wrote: []
I tried to explain that in the previous mail, can try a different angle. So, please open you favorite CS book (better few) and look up "abstract data types", then "mapping/associative array" and "list". We can use Wikipedia too: https://en.wikipedia.org/wiki/Associative_array. So, please look up: "Operations associated with this data type allow". And you'll see, that there're no "ordering" related operations are defined. Vice versa, looking at "sequence" operations, there will be "prev/next", maybe "get n'th" element operations, implying ordering. Python used to be a perfect application of these principles. Its dict was a perfect CS implementation of an abstract associative array, and list - of "sequence" abstract type (with additional guarantee of O(1) random element access). People knew and rejoiced that Python is built on solid science principles, or could *learn* them from it. That no longer will be true, with a sound concept being replaced with on-the-spot practical hack, choosing properties of a random associative array algorithm implementation over properties of a superset of such algorithms (many of which are again don't offer any orderness guarantees). I know though what will be replied (based on the replies below): "all these are implementation details" - no, orderness vs non-orderness of a mapping algorithm is an implementation detail; "users shouldn't know all that" - they should, that's the real knowledge, and up until now, they could learn that from *Python docs*, "we can't predict future" - we don't need, we just need to know the past (25 years in our case), and understand why it was done like that, I don't think Guido couldn't code it ordered in 1991, it's just not natural for a mapping type to be so, and in 2017, it's not more natural than it was in 1991. MicroPython in particular appeared because Python offered all the CS-sound properties and freedom and alternative choices for implementation (more so than any other scripting language). It's losing it, and not just for MicroPython's surprise. [] -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/e87f3/e87f3c7c6d92519a9dac18ec14406dd41e3da93d" alt=""
On Mon, 6 Nov 2017 at 11:08 Paul Sokolovsky <pmiscml@gmail.com> wrote:
I don't think you meant for this to come off as insulting, but telling me how to look up the definition of an associative array or map feels like you're putting me down. I also have a Ph.D. in computer science so I'm aware of the academic definitions of these data structures.
People knew and rejoiced that Python is built on solid science principles, or could *learn* them from it.
That no longer will be true,
I don't think it's fair to call the current dict implementation a hack. It's a sound design that has a certain property that we are discussing the masking of. As I said previously, I think this discussion comes down to whether we think there are pragmatic benefits to exposing the ordered aspects to the general developer versus not. -Brett
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that. On Nov 6, 2017 7:39 PM, "Brett Cannon" <brett@python.org> wrote:
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
On Nov 6, 2017, at 8:05 PM, David Mertz <mertz@gnosis.cx> wrote:
I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that.
I think this post is dismissive of the value that users would get from having reliable ordering by default. Having worked with Python 3.6 for a while, it is repeatedly delightful to encounter the effects of ordering. When debugging, it is a pleasure to be able to easily see what has changed in a dictionary. When creating XML, it is joy to see the attribs show in the same order you added them. When reading a configuration, modifying it, and writing it back out, it is a godsend to have it written out in about the same order you originally typed it in. The same applies to reading and writing JSON. When adding a VIA header in a HTTP proxy, it is nice to not permute the order of the other headers. When generating url query strings for REST APIs, it is nice have the parameter order match documented examples. We've lived without order for so long that it seems that some of us now think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly. Raymond P.S. Especially during debugging, it is often inconvenient, difficult, or impossible to bring in an OrderedDict after the fact or to inject one into third-party code that is returning regular dicts. Just because we have OrderedDict in collections doesn't mean that we always get to take advantage of it. Plain dicts get served to us whether we want them or not.
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
I agree with Raymond. dict ordered by default makes better developer experience. So, my concern is how "language spec" is important for minor (sorry about my bad vocabulary) implementation? What's difference between "MicroPython is 100% compatible with language spec" and "MicroPython is almost compatible with Python language spec, but has some restriction"? If it's very important, how about "strong recommendation for implementations" instead of "language spec"? Users who don't care implementations other than CPython and PyPy can rely on it's usability. Regards, INADA Naoki <songofacandy@gmail.com> On Tue, Nov 7, 2017 at 2:11 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Tue, 7 Nov 2017 14:40:07 +0900 INADA Naoki <songofacandy@gmail.com> wrote:
So, the problem is that there's no "Python language spec". And over time, that becomes a problem for alternative implementations, especially not mainstream ("we have infinite amount of memory to burn") ones. What we have is just CPython documentation, which mixes Python language spec and CPython implementation details. And is being changed (including "language spec" part) on the fiat of CPython developers, apparently without any guarantees of platform stability and backward compatibility. Over time, this really becomes a visible drawback, comparing to the close competitors. For example, year goes by year, but in JavaScript, [] + [] is still: '' That's stability! [] -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 09:39, Paul Sokolovsky <pmiscml@gmail.com> wrote:
So, the problem is that there's no "Python language spec”.
There is a language specification: https://docs.python.org/3/reference/index.html But there are still corners that are undocumented, or topics that are deliberately left as implementation details. Cheers, -Barry
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Nov 7, 2017 12:02 PM, "Barry Warsaw" <barry@python.org> wrote: On Nov 7, 2017, at 09:39, Paul Sokolovsky <pmiscml@gmail.com> wrote:
So, the problem is that there's no "Python language spec”.
There is a language specification: https://docs.python.org/3/refe rence/index.html But there are still corners that are undocumented, or topics that are deliberately left as implementation details. Also, specs don't mean that much unless there are multiple implementations in widespread use. In JS the spec matters because it describes the common subset of the language you can expect to see across browsers, and lets the browser vendors coordinate on future changes. Since users actually target and test against multiple implementations, this is useful. In python, CPython's dominance means that most libraries are written against CPython's behavior instead of the spec, and alternative implementations generally don't care about the spec, they care about whether they can run the code their users want to run. So PyPy has found that for their purposes, the python spec includes all kinds of obscure internal implementation details like CPython's static type/heap type distinction, the exact tricks CPython uses to optimize local variable access, the CPython C API, etc. The Pyston devs found that for their purposes, refcounting actually was a mandatory part of the python language. Jython, MicroPython, etc make a different set of compatibility tradeoffs again. I'm not saying the spec is useless, but it's not magic either. It only matters to the extent that it solves some problem for people. -n
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic. I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property". On 11/07/2017 12:58 PM, Barry Warsaw wrote:
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 7 November 2017 at 20:35, Paul G <paul@ganssle.io> wrote:
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".
This seems like overkill to me. By the same logic, we should add a "delay garbage collection" mode, that allows people to test that their code doesn't make unwarranted assumptions that a reference-counting garbage collector is in use. Most public projects (which are the only ones that really need to worry about this sort of detail) will probably be supporting Python 3.5 and likely even Python 2.7 for some time yet. So they test under non-order-preserving dictionary implementations anyway. And if code that's only targeted for Python 3.7 assumes order preserving dictionaries, it's likely not got a huge user base anyway, so what's the problem? Paul
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
@ Paul Moore
But you can get pretty fine-grained control of garbage collection with judicious use of gc.disable(). If there were a similar mechanism for changing the ordering properties of dictionaries in code, I wouldn't suggest it as an interpreter flag / option. And you're right - it's not pressing - the people likely to test with dictionaries scrambled are exactly the people likely to be supporting 2.7 and 3.5, but that won't be true forever, and it would be nice to have *some* mechanism to test that you're not relying on this property. @Barry Warsaw
As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order.
This was my suggestion of a middle way, since deliberate randomization seems like a step too far just to avoid people relying on implementation details. I've seen plenty of code that assumes that `assert` statements will always throw `AssertionError`, but that's not guaranteed, and some people run their test suites with -O just to check that they aren't making that assumption. On 11/07/2017 04:15 PM, Paul Moore wrote:
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
This seems like overkill to me. By the same logic, we should add a "delay garbage collection" mode, that allows people to test that their code doesn't make unwarranted assumptions that a reference-counting garbage collector is in use. Actually, there is a LOT of code out there that expects reference counting. I know a lot of my code does. So if cPython does abandon it some day, there will be issues. Another thought: Let’s say Python declares that dict literals are order preserving. Then, one day, someone invents a massively more efficient non-order preserving implementation, and we want to use it for Python 4. So the Python 4 language spec says that dicts are not order preserving. And this is flagged as an INCOMPATIBLE CHANGE. Now everyone knows to go and check their code, and the 3to4 tool adds an “o” to all dict literals. People will complain, but it won’t be unexpected breakage. Compare to leaving it as an implementation detail — now we will have a lot of code in the wild that expects order-preservation (because people don’t read the language spec) that will break with such a change without any real warning. I think we really do need to accept that cPython is a reference implementation. Because it is. By the way, I only just realized I can delete a key to demonstrate non-order-preservation on py 3.6. So at least I know what to tell students now. -CHB But you can get pretty fine-grained control of garbage collection with judicious use of gc.disable(). If there were a similar mechanism for changing the ordering properties of dictionaries in code, I wouldn't suggest it as an interpreter flag / option. And you're right - it's not pressing - the people likely to test with dictionaries scrambled are exactly the people likely to be supporting 2.7 and 3.5, but that won't be true forever, and it would be nice to have *some* mechanism to test that you're not relying on this property. @Barry Warsaw As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order. This was my suggestion of a middle way, since deliberate randomization seems like a step too far just to avoid people relying on implementation details. I've seen plenty of code that assumes that `assert` statements will always throw `AssertionError`, but that's not guaranteed, and some people run their test suites with -O just to check that they aren't making that assumption. On 11/07/2017 04:15 PM, Paul Moore wrote: On 7 November 2017 at 20:35, Paul G <paul@ganssle.io> wrote: If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic. I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property". Most public projects (which are the only ones that really need to worry about this sort of detail) will probably be supporting Python 3.5 and likely even Python 2.7 for some time yet. So they test under non-order-preserving dictionary implementations anyway. And if code that's only targeted for Python 3.7 assumes order preserving dictionaries, it's likely not got a huge user base anyway, so what's the problem? Paul _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 16:15, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Actually, there is a LOT of code out there that expects reference counting. I know a lot of my code does. So if cPython does abandon it some day, there will be issues.
I see this all the time in code reviews: content = open(some file).read() I never let that go uncommented. So while you’re right that CPython is the reference implementation, and few people read the language spec, it’s still encombunt on us to point out broken code, code with implicit assumptions, and code that is not portable between implementations. Having the reference manual to point to chapter and verse is critical to avoid Python just devolving into an ad-hoc language ruled by its most popular implementation. This is something I believe Guido realized early on when JPython was first invented, and it was an important distinction that Python held, e.g. versus Perl. I still believe it’s an important principle to maintain. Cheers, -Barry
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 8 November 2017 at 07:15, Paul Moore <p.f.moore@gmail.com> wrote:
Quite a few projects these days include PyPy in their CI test matrix, and one of the things that does is test whether or not you're relying on refcounting semantics. We also offer ResourceWarning natively in CPython, which means if you run under "python3 -Wd", you'll get a warning when external resources like files are cleaned up implicitly: $ python3 -Wd Python 3.6.2 (default, Oct 2 2017, 16:51:32) [GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> f = open(".bashrc") >>> del f __main__:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='.bashrc' mode='r' encoding='UTF-8'> >>>
The concern is that if we don't explicitly perturb dict iteration order (or offer a way to opt-in to that), then insertion ordering will end up becoming a *de facto* compatibility requirement for Python implementations as CPython 2.7 and other releases prior to 3.6 start dropping out of typical test matrices. With both 2.7 and 3.5 going end-of-life in 2020, that means 3.7 (mid 2018) and 3.8 (late 2019 or early 2020) are our available opportunities to make that decision - beyond that, it starts getting a lot harder to change course away from implicit standardisation, as there'll be a lot more 3.6+-only code in the world by then. The way golang handled this problem is in their dict iterator: they added an extra field to hold a randomly generated hash, and used that hash as the starting point in their iteration sequence. We wouldn't be able to implement per-iterator order randomisation in CPython due to the way the PyDict_Next API works: that uses a caller-provided Py_ssize_t entry to track the current position in the iteration sequence. This means the simplest change we can make is to adjust the code in _PyDict_Next that reads the "current iteration index" from the user supplied pointer to instead interpret that as having a constant offset (e.g. starting with the last item in the "natural" iteration order, and then looping back around to the first one). "Simplest" isn't the same as "simple" though, as: 1. We can't change this globally for all dicts, as we actually *do* need keyword argument dicts and class body execution namespaces to be insertion ordered. That makes it either a per-instance setting, or else a subtly different dict type. 2. So far, I haven't actually come up with a perturbed iteration implementation that doesn't segfault the interpreter. The dict internals are nicely laid out to be iteration friendly, but they really do assume that you're going to start at index zero, and then iterate through to the end of the array. The bounds checking and pointer validity testing becomes relatively fiddly if you try to push against that and instead start iteration from a point partway through the storage array. That second point also becomes a concern from a performance perspective because this is code that runs on *each* iteration of a loop, rather than purely as part of the loop setup. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 8 November 2017 at 11:44, Nick Coghlan <ncoghlan@gmail.com> wrote:
In case anyone else wants to experiment with a proof of concept: https://github.com/ncoghlan/cpython/commit/6a8a6fa32f0a9cd71d9078fbb2b5ea44d... I think we've probably exhausted the utility of discussing this as a purely hypothetical change, and so the only way to move the discussion forward will be for someone to draft a patch that: 1. Perturbs iteration for regular dicts (it's OK for our purposes if it's still deterministic - it just shouldn't match insertion order the way odict does) 2. Switches keyword args and class body execution namespaces over to odict so the test suite passes again 3. Measures the impact such a change would have on the benchmark suite My experiment is a starting point, but it will still be a fair bit of work to get it from there to a viable proof of concept that can be assessed against the status quo. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
For now, odict use twice memory and 2x slower on iteration. https://bugs.python.org/issue31265#msg301942 INADA Naoki <songofacandy@gmail.com> On Wed, Nov 8, 2017 at 11:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 12:35, Paul G <paul@ganssle.io> wrote:
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order. -Barry
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Wed, Nov 8, 2017 at 5:35 AM, Paul G <paul@ganssle.io> wrote:
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".
Namespace is ordered by language spec. What does SCRAMBLE_DICT_ORDER in this code? class A: def __init__(self): self.a, self.b, self.c = 1, 2, 3 a = A() print(a.__dict__) a.__dict__.pop('a') print(a.__dict__) Anyway, I'm -1 on adding such option to dict. dict in CPython is complicated already for performance and compatibility reason. I don't want to add more complexity to dict for such reason. Regards, INADA Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
It seems there must be at least two threads for each topic worth discussing at all. Therefore I feel compelled to point to https://mail.python.org/pipermail/python-dev/2017-November/150381.html, where I state my own conclusion about dict order. I know Paul Sokolovsky does not claim to speak for MicroPython, but I think he had better shut up or he's nevertheless going to damage its reputation beyond repair. I note that micropython.org claims "MicroPython aims to be as compatible with normal Python as possible to allow you to transfer code with ease from the desktop to a microcontroller or embedded system." To me this implies that it is entirely up to the MicroPython project to decide what they'll do about the order of dict elements -- they can keep doing what they are doing, or choose a new dict implementation that satisfies their space and performance needs while also preserving order, or give developer a compile-time choice, or give users a choice at startup time, or something else I haven't thought of yet. Any of those options is better than continuing the flamewar that Paul is waging. Finally: the dict type should not be endowed with other parts of the OrderedDict API, not should other API changes to dict be considered. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
On Nov 6, 2017 9:11 PM, "Raymond Hettinger" <raymond.hettinger@gmail.com> wrote:
I think this post is dismissive of the value that users would get from having reliable ordering by default. Dismissive seems like an overly strong word. I recognize I disagree with Raymond on best official semantics. Someone else points out that if someday an "even more efficient unordered dict" is discovered, user-facing "dict" doesn't strictly have to be the same data structure as "internal dict". The fact they are is admittedly an implementation detail also. I've had all those same uses about round-tripping serialization that Raymond mentions. I know the standard work arounds (which are not difficult, but DO require a little extra code if we don't have order). But like Raymond, I make most of my living TEACHING Python. I feel like the extra order guarantee would make teaching slightly harder. I'm sure he feels contrarily. It is true that with 3.6 I can no longer show an example where the dict display is oddly changed when printed. But then, unordered sets also wind up sorting small integers on printing, even though that's not a guarantee. Ordering by insertion order (possibly "only until first deletion") is simply not obvious to beginners. If we had, hypothetically, a dict that "always alphabetized keys" that would be more intuitive to them, for example. Insertion order feels obvious to us experts, but it really is an extra cognitive burden to learners beyond understanding "key/Val association". Having worked with Python 3.6 for a while, it is repeatedly delightful to encounter the effects of ordering. When debugging, it is a pleasure to be able to easily see what has changed in a dictionary. When creating XML, it is joy to see the attribs show in the same order you added them. When reading a configuration, modifying it, and writing it back out, it is a godsend to have it written out in about the same order you originally typed it in. The same applies to reading and writing JSON. When adding a VIA header in a HTTP proxy, it is nice to not permute the order of the other headers. When generating url query strings for REST APIs, it is nice have the parameter order match documented examples. We've lived without order for so long that it seems that some of us now think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly. Raymond P.S. Especially during debugging, it is often inconvenient, difficult, or impossible to bring in an OrderedDict after the fact or to inject one into third-party code that is returning regular dicts. Just because we have OrderedDict in collections doesn't mean that we always get to take advantage of it. Plain dicts get served to us whether we want them or not.
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Tue, Nov 7, 2017 at 7:21 AM, David Mertz <mertz@gnosis.cx> wrote:
But like Raymond, I make most of my living TEACHING Python.
and I make least of my living TEACHING Python ( :-) ),and:
I feel like the extra order guarantee would make teaching slightly harder.
I can't understand how this possibly makes python (or dicts) harder to teach -- you can simply say: "dicts insertion order is preserved" or not mention it at all -- I think most people kind of expect it to be preserved, which is why I (used to )always bring up the lack-of-ordering of dicts early on -- but I suspect I simply won't bother mentioning it if it's decided as a language feature. I'm sure he feels contrarily. It is true that with 3.6 I can no longer show
an example where the dict display is oddly changed when printed.
Exactly! I have a really hard time deciding how to handle this -- explaining that ordering is not guaranteed, but not being able to demonstrate it! And frankly, my students are all going to forget what I "explained" soon enough, and replace it with their experience -- which will be that dicts retain their order. But then, unordered sets also wind up sorting small integers on printing,
even though that's not a guarantee.
but it's not hard to make an easy example with order not preserved -- jsut start with a non order example: In [6]: s = {3,7,4} In [7]: s Out[7]: {3, 4, 7} or use other types... And "set" is a mathematical concept that has no oder, whereas the "dictionary" metaphor DOES have order...
Ordering by insertion order (possibly "only until first deletion") is simply not obvious to beginners.
the "only until first deletion" part is really hard -- I hope we don't go that route. But I don't think insertion-order is non-obvious -- particularly with literals.
again, I don't think so -- I kind of agree if dicts did not preserve order in practice -- demonstrating that right out of the gate does help make the "key/Val association" clear -- but if you can't demonstrate it, I think we're looking at more confusion... Maybe I'll ask my students this evening -- this is the first class I'm teaching with py3.6 .... We've lived without order for so long that it seems that some of us now
think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly.
exactly! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 7 November 2017 at 21:13, Chris Barker <chris.barker@noaa.gov> wrote:
But I thought we *had* gone that route. Actually, there's no "route" to go here. We're only talking about documenting the existing semantics that cPython has, and I thought that included no longer guaranteeing insertion order after a delete. Although I can't prove that by experiment at the moment. I don't know whether my confusion above is an argument for or against documenting the behaviour :-) Paul
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote:
I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great.
That's the worst of both worlds. The status quo is that unless we deliberately perturb the dictionary order, developers will come to rely on implementation order (because that's what the CPython reference implementation actually offers, regardless of what the docs say). Consequently: - people will be writing non-portable code, whether they know it or not; - CPython won't be able to change the implementation, because it will break too much code; - other implementations will be pressured to match CPython's implementation. The only difference is that on the one hand we are honest and up-front about requiring order-preserving dicts, and on the other we still require it, but pretend that we don't. And frankly, it seems rather perverse to intentionally perturb dictionary order just to keep our options open that someday there might be some algorithm which offers sufficiently better performance but doesn't preserve order. Preserving order is useful, desirable, often requested functionality, and now that we have it, it would have to be one hell of an optimization to justify dropping it again. (It is like Timsort and stability. How much faster sorting would it have taken to justify giving up sort stability? 50% faster? 100%? We wouldn't have done it for a 1% speedup.) It would be better to relax the requirement that builtin dict is used for those things that would benefit from improved performance. Is there any need for globals() to be the same mapping type as dict? Probably not. If somebody comes up with a much more efficient, non-order- preserving map ideal for globals, it would be better to change globals than dict. In my opinion.
I think you have a different definition of "poor" to me :-) Nick has already done a survey of PyPy (which already has insertion- order preserving dicts), Jython, VOC, and Batavia, and they don't have any problem with this. IronPython is built on C#, which has order- preserving mappings. Nuitka is built on C++, and if C++ can't implement an order-preserving mapping, there is something terribly wrong with the world. Cython (I believe) uses CPython's implementation, as does Stackless. The only well-known implementation that may have trouble with this is MicroPython, but it already changes the functionality of a lot of builtins and core language features, e.g. it uses a different method resolution order (so multiple inheritence won't work right), some builtins don't support slicing with three arguments, etc. I think the evidence is excellent that other implementations shouldn't have a problem with this, unless (like MicroPython) they are targetting machines with tiny memory resources. µPy runs on the PyBoard, which I believe has under 200K of memory. I think we can all forgive µPy if it only *approximately* matches Python semantics. -- Steve
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 November 2017 at 16:21, Steven D'Aprano <steve@pearwood.info> wrote:
While I think "poor" is understating the case, I think "excellent" (which you use later on) is overstating it. My own characterisation would be "at least arguably good enough".
For these, my research only showed that their respective platforms have an order-preserving hashmap implementation available. What's entirely unclear at this point is how switching wholesale to that may impact the *performance* of these implementations (both in terms of speed and memory usage), and how much code churn would be involved in actually making the change. Making builtin dict() order-preserving may also still impose an ongoing complexity cost on these implementations if they end up having to split "the mapping we use for code execution namespaces" away from "the mapping we provide as the builtin dict". (That said, I think at least Jython already makes that distinction - I believe their string-only namespace dict is a separate type, whereas CPython plays dynamic optimisation games inside the regular dict type). So Barry's suggestion of providing an explicit "collections.UnorderedDict" as a consistent spelling for "an associative mapping without any ordering guarantees" is a reasonable one, even if it's primary usage in CPython ends up being to ensure algorithms are compatible with collections that don't provide an inherently stable iteration order, and any associated performance benefits are mostly seen on other implementations. (As Paul S notes, such a data type would also serve a pedagogical purpose when teaching computer science principles)
IronPython is built on C#, which has order- preserving mappings.
I couldn't actually find a clearly suitable existing collection type in the .NET CLR - the one I linked was just the one commonly referenced as "good enough for most purposes". It had some constraints that meant it may not be suitable as a builtin dict type in a Python implementation (e.g. it looked to have a 32-bit length limit).
Right, the other C/C++ implementations that also target environments with at least 128 MiB+ RAM (and typically more) can reasonably be expected to tolerate similar space/speed trade-offs to those that CPython itself makes (and that's assuming they aren't just using CPython's data type implementations in the first place).
It runs on the ESP8266 as well, and that only has 96 kiB data memory. This means we're already talking 3-4 orders of magnitude difference in memory capacity and typical data set sizes between CPython and MicroPython use cases, and that's only accounting for the *low* end of CPython's use cases - once you expand out to multi-terabyte data sets (the very upper end of what a single x86-64 server can handle if you can afford the RAM for it), then we're talking 9-10 orders of magnitude between CPython's high end and MicroPython's low end. So for CPython's target use cases algorithmic efficiency dominates performance, and we afford to invest extra memory usage and startup overhead in service to more efficient data access algorithms. MicroPython's the opposite - you're going to run out of memory for data storage long before algorithmic inefficiency becomes your biggest problem, but wasted bookkeeping memory and startup overhead can cause problems even with small data sets. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/41d1b/41d1b12bb02cdccb4080ecb917b5d7ebb1daa0c6" alt=""
On 07/11/17 04:05, David Mertz wrote:
If there is an ordered guarantee for regular dicts but not for dict literals, which is the subject of this thread, then haven't we got a recipe for the kind of confusion that will lead to the number of questions from newbies going off of the Richter scale? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 06:35:48PM +0200, Paul Sokolovsky wrote:
Paul, it would be good if you could respond to Raymond's earlier comments where he wrote: I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering. The likely reason for the confusion is that they are already have an option for an "ordered array" dict variant that does a brute-force linear search. However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered. See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a... Raymond has also volunteered to assist with this.
I think it would be reasonable to say that builtin dicts only maintain insertion order for insertions, lookups, and changing the value. Any mutation which deletes keys may arbitrarily re-order the dict. If the user wants a stronger guarantee, then they should use OrderedDict. -- Steve
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 6, 2017, at 22:33, Steven D'Aprano <steve@pearwood.info> wrote:
In fact, that *is* leaking CPython’s implementation into the language specification. If by chance CPython’s implementation preserved order even after key deletion, either now or in the future, would that be defined as the ordering guarantees for built-in dict in the Python Language Reference? Cheers, -Barry
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Tue, 7 Nov 2017 17:33:03 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
I tried to do that, let me summarize previous point and give IMHO re: contributing an alternative: MicroPython's dict implementation is optimized for the least bookkeeping overhead, not performance on overlarge datasets. For the heap sizes we target (64KB on average), that's a good choice (put it differently, MicroPython's motto is "system's memory (all bunch kilobytes of it) belongs to user, not to MicroPython"). Adding insertion order would either: 1. Lead to significant (several times) slowdown, or 2. To noticeable memory overhead. Note that MicroPython uses the absolute minimum for a dictionary entry - 2 words (key and value). Adding even one extra word (e.g. some indirection pointer) means increasing overhead by 50%, cutting useful user memory size by third. With all that in mind, MicroPython is a very configurable project (215 config options as of this moment). We can have a config option for dict implementation too. But, the points above still hold - MicroPython targets low-memory systems and doesn't really target plenty-of-memory systems (there was never an aim to compete with CPython, the aim was to bring Python (the language) where CPython could never go). Put it another way, the alternative dict implementation is not expected to be used by default. If, with all the above in mind, someone, especially a CPython developer, wants to contribute an alternative dict implementation, it would be gladly accepted. (Note that if CPython followed the same policy, i.e. allowed compile-time selection of old vs new dict algorithm, we wouldn't have this thread.) (Disclaimer: all the above is just my IMHO as a long-time contributor, I'm not a MicroPython BDFL). And I really appreciate all the attention to MicroPython - that's the biggest case on my memory on python-dev. [] -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 6, 2017, at 02:18, Paul Sokolovsky <pmiscml@gmail.com> wrote:
This is the classic argument of, do you proceed conservatively and use the lowest-common denominator that makes your code work with the widest range of versions, or do you ride the wave and adopt the latest and greatest features as soon as they’re available? Neither answer is wrong or right… for everyone. It’s also a debate as old as the software industry. Every package, project, company, developer, community will have to decide for themselves. Once you realize you can’t win, you’ve won! :) -Barry
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
Is there a major objection to just adding in explicit syntax for order-preserving dictionaries? To some extent that seems like a reasonable compromise position in an "explicit is better than implicit" sense. A whole lot of code is out there that doesn't require or expect order-preserving dictionaries - it would be nice to be able to differentiate out the parts where order actually *does* matter. (Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit). On 11/06/2017 01:19 PM, Barry Warsaw wrote:
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 6, 2017, at 11:23, Paul G <paul@ganssle.io> wrote:
Is there a major objection to just adding in explicit syntax for order-preserving dictionaries?
I don’t think new syntax is necessary. We already have OrderedDict, which to me is a perfectly sensible way to spell “I need a mapping that preserves insertion order”, and the extra import doesn’t bother me. I’m not saying whether or not to make the language guarantee that built-in dict preserves order. I’m just saying that if we don’t make that language change, we already have everything we need to support both use cases. If we did make the change, it’s possible we would need a way to explicit say that order is not preserved. That seems a little weird to me, but I suppose it could be useful. I like the idea previously brought up that iteration order be deliberately randomized in that case, but we’d still need a good way to spell that. Cheers, -Barry
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 6 Nov 2017 11:33:10 -0800 Barry Warsaw <barry@python.org> wrote: []
If we did make the change, it’s possible we would need a way to explicit say that order is not preserved. That seems a little weird
I recently was working on a more or less complex dataflow propagation problem. It should converge to a fixed point, and it did, but on different runs, to different ones. So, I know that I'm a bad programmer, need to do more of my homework and grow. I know, that if I rewrite it in C++ or C, it'll work unstable the same way, because it's buggy. (Heck, over these years, I learned that I don't need to rewrite things in C/C++, because Python is the *real* language, which works the way computers do, without sugaring that up). I need to remember that, because with Python 3.7, I may become a good-programmer-in-a-ponyland-of-ordered-dicts. Btw, in all this discussion, I don't remember anyone mentioning sets. I don't recall the way they're implemented in CPython, but they have strong conceptual and semantic resemblance to dict's. So, what about them, do they become ordered too?
Cheers, -Barry
-- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 11:33:10AM -0800, Barry Warsaw wrote:
Useful for what? Given that we will hypothetically have order-preserving dicts that perform no worse than unordered dicts, I'm struggling to think of a reason (apart from performance) why somebody would intentionally use a non-ordered dict. If performance was an issue, sure, it makes sense to have a non-ordered dict for when you don't want to pay the cost of keeping insertion order. But performance seems to be a non-issue. I can see people wanting a SortedDict which automatically sorts the keys into some specified order. If I really work at it, I can imagine that there might even be a use-case for randomizing the key order (like calling random.shuffle on the keys). But if you are willing to use a dict with arbitrary order, that means that *you don't care* what order the keys are in. If you don't care, then insertion order should be no better or worse than any other implementation-defined arbitrary order.
That would only be in the scenario that we decide *not* to guarantee insertion-order preserving semantics for dicts, in order to prevent users from relying on an implementation feature that isn't a language guarantee. -- Steve
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Mon, Nov 6, 2017 at 11:23 AM, Paul G <paul@ganssle.io> wrote:
This is a really key point -- a heck of a lot more people use cPython than read the language spec. And a great deal of code is developed with a certain level of ignorance -- try something, if it works, and your test pass (if there are any), then you are done. So right now, there is more an more code out there that relies on a regular old dcit being ordered. I've been struggling with teaching this right now -- my written-a-couple-years ago materials talk about dicts being arbitrary order, and then have a little demo of that fact. Now I'm teaching with Python 3.6, and I had to add in something like: cPython does, in fact, preserve order with dicts, but it should be considered an implementation detail, and not counted on ... (and by the say, so does PyPy, and ....)" I don't know, but I'm going to guess about 0% of my class is going to remember that... And if we added o{,,,} syntax it would be even worse, 'cause folks would forget to use it, as their code wouldn't behave differently (kind of like the 'b' flag on unix text files, or the u"string" where everything is ascii in that string...) in short -- we don't have a choice (unless we add an explicit randomization as some suggested -- but that just seems perverse...) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 November 2017 at 09:23, Chris Barker <chris.barker@noaa.gov> wrote:
in short -- we don't have a choice (unless we add an explicit randomization as some suggested -- but that just seems perverse...)
And this is the key point for me: "choosing not to choose" is effectively the same as standardising the feature, as enough Python code will come to rely on CPython's behaviour that most alternative implementations will feel obliged to start behaving the same way CPython does (with MicroPython being the potential exception due to memory usage constraints always winning over algorithmic efficiency concerns in that context). We added ResourceWarning a while back to help discourage reliance on CPython promptly calling __del__ methods when dropping the last reference to an object. An equivalent for this case would be for dict objects to randomize iteration (ala Go), once again requiring folks to opt-in via collections.OrderedDict to get guaranteed ordering (perhaps with a "o{}" dict display as new syntactic sugar). But unless someone actually writes a PEP and implementation for that in the next 12 weeks (and Guido accepts it), then we'll have 2 releases and 3 years of CPython working a particular way increasing the inertia against making such a change in 3.8 (and beyond that, I'd say we'd be well and truly into de facto standardisation territory, and the chances of ever introducing deliberate perturbation of dict iteration order would drop to nil). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 7 November 2017 at 06:39, Nick Coghlan <ncoghlan@gmail.com> wrote:
Personally, I think that having an ordered implementation and then deliberately breaking that ordering is pretty silly. Not least because we chose not to do that for 3.6. So we're left with the simple question of whether we make the behaviour required in the documentation (which is basically where this thread started). I see 3 options. First, we maintain the status quo, treat it as a CPython/PyPy implementation detail, and accept that this means that people will expect it - resulting in pressure on alternative implementations to conform, but without a language spec to support them. Second, we document the requirement in the language spec, requiring alternative implementations to either implement it or document it as a way in which they don't confirm to the language spec (which is unattractive for them, as "implements the official Python language spec" is a selling point). Or third, we could document it as an optional behaviour, that language implementations are expected to implement if possible, and if they can't they should document the variance. That is to some extent the best of both worlds, in that it allows implementations to claim conformance with the spec while still not implementing the feature if it causes them problems to do so - they just have to document that they don't implement this optional but recommended behaviour. The downside of this option is that there's no precedent for it in the Python spec (it's basically C's "implementation defined behaviour", which is something Python hasn't needed this far). Paul
data:image/s3,"s3://crabby-images/2c29a/2c29a132620d9d96e8c9dfafc1a72fd807b973db" alt=""
On Tue, 2017-11-07 at 16:39 +1000, Nick Coghlan wrote:
Maybe an UnorderedDict could be added which Python implementations _can_ implement as an optimized (less memory use, faster, ...) version without ordering guarantees if they have a need for it. In other implementations it could just be a synonym for a regular dict. That way it would be explicit that the programmer doesn't care about the ordered behaviour. It would also avoid current mistakes some (especially those new to the language and occasional users) make, because of assumptions from default dict behaviour. (Maybe a commandline switch or other mechanisms to explicitly use that UnorderedDict as the default could also be useful. It would be a no-op in implementations which don't have differing implementations.) -- Jan Claeys
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
On 6 November 2017 at 17:23, Paul G <paul@ganssle.io> wrote:
Is there a major objection to just adding in explicit syntax for order-preserving dictionaries? To some extent that seems like a reasonable compromise position in an "explicit is better than implicit" sense. A whole lot of code is out there that doesn't require or expect order-preserving dictionaries - it would be nice to be able to differentiate out the parts where order actually *does* matter.
(Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit).
I think the additional syntax have the added benefit of preventing code that relies on the ordering of dict literals to be run on older versions, therefore triggering subtle bugs that had already being mentioned. And also, forgot along the discussion, is the big disadvantage that other Python implementations would have a quite significant overhead on mandatory ordered dicts. One that was mentioned along the way is transpilers, with Brython as an example - but there might be others. MircoPython is far from being the only implementation affected. js -><-
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 10:17:23PM -0200, Joao S. O. Bueno wrote:
I don't think that is correct. Nick already did a survey, and found that C# (IronPython), Java (Jython and VOC) and Javascript (Batavia) all have acceptable insertion-order preserving mappings. C++ (Nuitka) surely won't have any problem with this (if C++ cannot implement an efficient order-preserving map, there is something terribly wrong with the world). As for other languages that somebody might choose to build Python on (the Parrot VM, Haskell, D, Rust, etc) surely we shouldn't be limiting what Python does for the sake of hypothetical implementations in "underpowered" languages? I don't mean to imply that any of those examples are necessarily underpowered, but if language Foo is incapable of supporting an efficient ordered map, then language Foo is simply not good enough for a serious Python implementation. We shouldn't allow Python's evolution to be hamstrung by the requirement to support arbitrarily weak implementation languages.
One that was mentioned along the way is transpilers, with Brython as an example - but there might be others.
Since Brython transpiles to Javascript, couldn't it use the standard Map object, which preserves insertion order? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj... Quote: Description A Map object iterates its elements in insertion order The EMCAScript 6 standard specifies that Map.prototype.forEach operates over the key/value pairs in insertion order: https://tc39.github.io/ecma262/#sec-map-objects -- Steve
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote:
I disagree -- the history of Python shows that having dicts be unordered is a PITA for many Python programmers. Python eventually gained an ordered dict because it provides useful functionality that developers demand. Every new generation of Python programmers comes along and gets confused by why dicts mysteriously change their order from how they were entered, why doctests involving dicts break, why keyword arguments lose their order, why they have to import a module to get ordered dicts instead of having it be built-in, etc. Historically, we had things like ConfigParser reordering ini files when you write them. Having dicts be unordered is not a positive virtue, it is a limitation. Up until now, it was the price we've paid for having fast, O(1) dicts. Now we have a dict implementation which is fast, O(1) and ordered. Why pretend that we don't? This is a long-requested feature, and the cost appears to be small: by specifying this, all we do is rule out some, but not all, hypothetical future optimizations. Unordered dicts served CPython well for 20+ years, but I doubt many people will miss them.
Trading off space for time is a very common practice. You said that lookups on MicroPython's dicts are O(N). How efficient is µPy when doing a lookup of a dict with ten million keys? µPy has chosen to optimize for space, rather than time. That's great. But I don't think you should sneer at CPython's choice to optimize for time instead. And given that µPy's dicts already fail to meet the expected O(1) dict behviour, and the already large number of functional differences (not just performance differences) between µPy and Python: http://docs.micropython.org/en/latest/pyboard/genrst/index.html I don't think that this will make much practical difference. MicroPython users already cannot expect to run arbitrary Python code that works in other implementations: the Python community is fragmented between µPy code written for tiny machines, and Python code for machines with lots of memory.
It will no more be a "leak" than any other deliberate design choice.
despite the fact that there can be unlimited number of dictionary algorithms, most of them not having that property.
Sure. So what? There's an unlimited number of algorithms that don't provide the functionality that we want. There are an unlimited number of sort algorithms, but Python guarantees that we're only going to use those that are stable. Similar applies for method resolution (which µPy already violates), strings, etc.
What it will lead to is further fragmentation of the community.
Aren't you concerned about fragmenting the community because of the functional differences between MicroPython and the specs? Sometimes a small amount of fragmentation is unavoidable, and not necessarily a bad thing.
Given that you state that µPy dicts are O(N) and unordered, does that mean you picked only 1 out of 3? -- Steve
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 4, 2017, at 11:35, Guido van Rossum <guido@python.org> wrote:
This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee.
The other concern was backward compatibility issues. For example, if 3.7 makes this guarantee official, and folks write code with this assumption that has to work with older versions of Python, then that code could be buggy in previous versions and work in 3.7. This will probably be most evident in test suites, and such failures are often mysterious to debug (as we’ve seen in the past). That doesn’t mean we shouldn’t do it, but it does mean we have to be careful and explicit to educate users about how to write safe multi-Python-version code. Cheers, -Barry
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Yup. At least such code will not break in 3.5. In general if you write code using a newer version you should expect arbitrary breakage when trying to run it under older versions. There is no compatibility guarantee in that direction for anything anyways. I don't see this as a reason to not put this into the language spec at 3.7. On Sun, Nov 5, 2017 at 9:37 AM, Barry Warsaw <barry@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python. It's possible that no implementations of Python (either future CPython versions or current/future alternative interpreters) will find any reason to use anything but an insertion-order sorted dictionary, but given that we've done just fine with arbitrary-order semantics for the entire lifetime of the language /and/ there is a container (OrderedDict) which has guaranteed order semantics, it doesn't seem worth it to me. I think I would mostly be concerned with (in terms of likeliness to occur): 1. An edge case we haven't thought of where arbitrary order dictionaries would allow some crticial optimization for a given platform (perhaps in someone writing a transpiler to another language where the convenient equivalent container has arbitrary order, e.g. if Brython wants to implement dicts in terms of objects - https://stackoverflow.com/questions/5525795/does-javascript-guarantee-object... ) 2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation 3. Some sort of bug or vulnerability is discovered that makes insertion-ordered dictionaries an unwise choice (similar to the hash collision vulnerability that necessitated hash randomization - https://stackoverflow.com/questions/14956313#14959001 ). Perhaps these concerns are overblown, but if indeed guaranteed-order Mapping literals are critical in some or many cases, maybe it would be preferable to introduce new syntax for OrderedDict literals. Best, Paul On 11/05/2017 12:50 PM, Guido van Rossum wrote:
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python.
Scientific applications want something like {'a': 10, 'b': "foo", 'c': {'this': b'123'}} as an ordered initializer for unboxed or typed (or both) data. In general, if dicts are ordered, they can be used for example as initializers for (nested) C structs.
2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation
I would think this is very unlikely, given that the previous dict implementation has always been very fast. The new one is very fast, too. Stefan Krah
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
I can understand why you'd want an ordered container, I just don't see why it must be a dict. Why can't it be: OrderedDict(a=10, b='foo', c=OrderedDict(this=b'123')) Is it just that you don't want to type OrderedDict that many times? If it's so important to provide ordered dictionary literals, I would think it's a no-brainer to give them their own literal syntax (rather than re-defining dicts to have guaranteed order). e.g.: o{'a': 10, 'b': 'foo', 'c': o{'this': b'123'} Then there is no backwards incompatibility problem and users can express whether order does or does not matter to them when initializing a container. On 11/05/2017 01:39 PM, Stefan Krah wrote:
On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python.
data:image/s3,"s3://crabby-images/98c42/98c429f8854de54c6dfbbe14b9c99e430e0e4b7d" alt=""
05.11.17 21:30, Stefan Krah пише:
I didn't try to implement this. But the current implementation requires periodical reallocating if add and remove items. The following loop reallocates the dict every len(d) iterations, while the size of the dict is not changed, and the half of its storage is empty. while True: v = d.pop(k) ... d[k] = v
data:image/s3,"s3://crabby-images/cd7e0/cd7e070dde02a16b1e5e428904c488e52e05ce40" alt=""
I think the question of whether any specific implementation of dict could be made faster for a given architecture or even that the trade-offs made by CPython are generally the right ones is kinda beside the point. It's certainly feasible that an implementation that does not preserve ordering could be better for some implementation of Python, and the question is really how much is gained by changing the language semantics in such a way as to cut off that possibility. On 11/05/2017 02:54 PM, Serhiy Storchaka wrote:
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Nov 6, 2017 at 4:54 AM, Serhiy Storchaka <storchaka@gmail.com> wrote: ...
FYI, Raymond's original compact dict (moving last item to slot used for deleted item) will break OrderedDict. So it's not easy to implement than it looks. OrderedDict uses linked list to keep which slot is used for the key. Moving last item will break it. It means odict.__delitem__ can't use PyDict_DelItem anymore and OrderedDict should touch internal structure of dict. I think current OrderedDict implementation is fragile loose coupling. While two object has different file (dictobject.c and odictobject.c), OrderedDict depends on dict's internal behavior heavily. It prevents optimizing dict. See comment here. https://github.com/python/cpython/blob/a5293b4ff2c1b5446947b4986f98ecf5d5243... I don't have strong opinion about what should we do about dict and OrderedDict. But I feel PyPy's approach (using same implementation and just override __eq__ and add move_to_end() method) is most simple. Regards, INADA Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/98c42/98c429f8854de54c6dfbbe14b9c99e430e0e4b7d" alt=""
06.11.17 05:01, INADA Naoki пише:
All this is implementation details. We did have little time for coordinated changing dict and OrderedDict implementations before releasing 3.6, and left existing coupling. This also prevents from in-place compactization of dict items. We could add a private flag to dict that denotes whether this dict is ordered or no. The ordinal dict could be more compact, while OrderedDict left ordered. And I like your issue31265. The current dict implementation still is young. It takes several optimizations in 3.7 (thank to you Inada) and AFAIK there are still not merged patches. I would wait until it become more stable before making change in language specification.
I think that the coupling with dict and OrderedDict should be more robust, but left private. Dict implementation should be aware of OrderedDict and provide some private methods for using in OrderedDict, but not restrict the optimization of unordered dicts.
data:image/s3,"s3://crabby-images/fac1f/fac1f353137c52e340a2ea0ffa4fe887a42e016d" alt=""
On 6 November 2017 at 06:09, Serhiy Storchaka <storchaka@gmail.com> wrote:
I would personally be happy with this as the guarantee (it covers dict literals and handles PEP 468), but it might be more confusing. "dicts are in arbitrary order" and "dicts maintain insertion order" are fairly simple to explain, "dicts maintain insertion order up to the point that a key is deleted" is less so. Tim Delaney
data:image/s3,"s3://crabby-images/b3d87/b3d872f9a7bbdbbdbd3c3390589970e6df22385a" alt=""
2017-11-05 18:50 GMT+01:00 Guido van Rossum <guido@python.org>:
I don't see this as a reason to not put this into the language spec at 3.7.
It can prevent some kinds of optimizations. Dictionaries are used "everywhere" in Python, so they are very important for performance. I would prefer to only keep the ordering warranty for function keyword arguments and class members, and use explicitly an ordered dictionary when needed. Sorry, I don't have any example right now of a concrete optimization which would not be possible with ordered dictionary. But Serhiy mentioned the performance impact of ordering in Python 3.6 dictionary on deletion. Victor
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 18:46, Victor Stinner <victor.stinner@gmail.com> wrote:
Note that I *don't* think we should mandate that regular dictionaries be synonymous with collections.OrderedDict - I think it's fine to say that regular dicts are insertion ordered *until* a key is deleted, and after that their ordering is arbitrary. Insertion-ordered-until-the-first-delete is sufficient to guarantee serialisation round trips, dict display ordering, and keyword order preservation in the dict constructor (it's not sufficient to guarantee ordering for class namespaces, as those may contain "del" statements, but class bodies are also permitted to use a non-default mapping type). However, if we decide *not* to require that dictionaries be insertion ordered, then I think we should do the same thing Go did, and have dict iterators start at a random offset in the item sequence (that's not actually my idea - Greg Smith suggested it at the core dev sprints). Otherwise we'll end up standardising the 3.6 behaviour by default, as folks come to rely on it, and decline to support Python implementations that don't provide insertion ordering semantics. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
Hi, folks. TLDR, was the final decision made already? If "dict keeps insertion order" is not language spec and we continue to recommend people to use OrderedDict to keep order, I want to optimize OrderedDict for creation/iteration and memory usage. (See https://bugs.python.org/issue31265#msg301942 ) If dict ordering is language spec, I'll stop the effort and use remaining time to another optimizations. My thought is, +1 to make it language spec. * PHP (PHP 7.2 interpreter is faster than Python) keeps insertion order. So even we make it language spec, I think we have enough room to optimize. * It can make stop discussion like "Does X keeps insertion order? It's language spec?", "What about Y? Z?". Everything on top of dict keeps insertion order. It's simple to learn and explain. Regards, INADA Naoki <songofacandy@gmail.com> On Sun, Nov 5, 2017 at 3:35 AM, Guido van Rossum <guido@python.org> wrote:
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
I'm in favor of stating that dict keeps order as part of the language spec. However re-reading https://mail.python.org/pipermail/python-dev/2017-November/150381.html there's still a point of debate: should it be allowed if dict reorders after deletion (presumably as a result of a rehash)? I'm in favor of prescribing that the order should be preserved even in that case, but I realize there's additional implementation work to be done. Inada-san, what do you think of this? --Guido On Thu, Dec 14, 2017 at 6:03 PM, INADA Naoki <songofacandy@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Oh, I just found https://mail.python.org/pipermail/python-dev/2017-November/150323.html so I already know what you think: we should go with "dicts preserve insertion order" rather than "dicts preserve insertion order until the first deletion". I guess we should wait for Serhiy to confirm that he's okay with this. On Thu, Dec 14, 2017 at 6:20 PM, Guido van Rossum <guido@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
I support having regular dicts maintain insertion order but am opposed to Inada changing the implementation of collections.OrderedDict We can have the first without having the second. Over the holidays, I hope to have time to do further analysis and create convincing demonstrations of why we want to keep the doubly-linked list implementation for collections.OrderedDict(). The current regular dictionary is based on the design I proposed several years ago. The primary goals of that design were compactness and faster iteration over the dense arrays of keys and values. Maintaining order was an artifact rather than a design goal. The design can maintain order but that is not its specialty. In contrast, I gave collections.OrderedDict a different design (later coded in C by Eric Snow). The primary goal was to have efficient maintenance of order even for severe workloads such at that imposed by the lru_cache which frequently alters order without touching the underlying dict. Intentionally, the OrderedDict has a design that prioritizes ordering capabilities at the expense of additional memory overhead and a constant factor worse insertion time. It is still my goal to have collections.OrderedDict have a different design with different performance characteristics than regular dicts. It has some order specific methods that regular dicts don't have (such as a move_to_end() and a popitem() that pops efficiently from either end). The OrderedDict needs to be good at those operations because that is what differentiates it from regular dicts. The tracker issue https://bugs.python.org/issue31265 is assigned to me and I currently do not approve of it going forward. The sentiment is nice but it undoes very intentional design decisions. In the upcoming months, I will give it additional study and will be open minded but it is not cool to use a python-dev post as a way to do an end-run around my objections. Back to the original topic of ordering, it is my feeling that it was inevitable that sooner or later we would guarantee ordering for regular dicts. Once we had a performant implementation, the decision would be dominated by how convenient it is users. Also, a single guarantee is simpler for everyone and is better than having a hodgepodge of rules stating that X and Y are guaranteed while Z is not. I think an ordering guarantee for regular dicts would be a nice Christmas present for our users and developers. Cheers, Raymond
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Dec 14, 2017 21:30, "Raymond Hettinger" <raymond.hettinger@gmail.com> wrote:
I support having regular dicts maintain insertion order but am opposed to Inada changing the implementation of collections.OrderedDict We can have the first without having the second. It seems like the two quoted paragraphs are in vociferous agreement. -n
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
I support having regular dicts maintain insertion order but am opposed to Inada changing the implementation of collections.OrderedDict We can have the first without having the second.
It seems like the two quoted paragraphs are in vociferous agreement.
The referenced tracker entry proposes, "Issue31265: Remove doubly-linked list from C OrderedDict". I don't think that should go forward regardless of whether regular dict order is guaranteed. Inada presented a compound proposition: either guarantee regular dict order or let him rip out the core design of OrderedDicts against my wishes. Raymond
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 15 December 2017 at 05:28, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
In contrast, I gave collections.OrderedDict a different design (later coded in C by Eric Snow). The primary goal was to have efficient maintenance of order even for severe workloads such at that imposed by the lru_cache which frequently alters order without touching the underlying dict. Intentionally, the OrderedDict has a design that prioritizes ordering capabilities at the expense of additional memory overhead and a constant factor worse insertion time.
That's interesting information - I wasn't aware of the different performance goals. I'd suggest adding a discussion of these goals to the OrderedDict documentation. Now that dictionaries preserve order (whether or not we make that language guaranteed or an implementation detail) having clear information on the intended performance trade-offs of an OrderedDict would help people understand why they might choose one over the other. Paul
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
That's interesting information - I wasn't aware of the different performance goals.
FYI, performance characteristic of my POC implementation of OrderedDict based on dict order are: * 50% less memory usage * 15% faster creation * 100% (2x) faster iteration * 20% slower move_to_end * 40% slower comparison (copied from https://bugs.python.org/issue31265#msg301942 ) Comparison is very unoptimized at the moment and I believe it can be more faster. On the other hand, I'm not sure about I can optimize move_to_end() more. If OrderdDict is recommended to be used for just keeping insertion order, I feel 1/2 memory usage and 2x faster iteration are more important than 20% slower move_to_end(). But if either "dict keeps insertion order" or "dict keeps insertion order until deletion" is language spec, there is no reason to use energy and time for discussion of OrderedDict implementation. Regards, INADA Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
On Dec 15, 2017, at 7:53 AM, Guido van Rossum <guido@python.org> wrote:
Make it so. "Dict keeps insertion order" is the ruling. Thanks!
Thank you. That is wonderful news :-) Would it be reasonable to replace some of the OrderedDict() uses in the standard library with dict()? For example, have namedtuples's _asdict() go back to returning a plain dict as it did in its original incarnation. Also, it looks like argparse could save an import by using a regular dict. Raymond
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 15, 2017 at 8:32 AM, Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
If it's documented as OrderedDict that would be backwards incompatible, since that has additional methods. Even if not documented it's likely to break some code. So, I'm not sure about this (though I agree with the sentiment that OrderedDict is much less important now). -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/28d63/28d63dd36c89fc323fc6288a48395e44105c3cc8" alt=""
[Eric Snow <ericsnowcurrently@gmail.com>]
Does that include preserving order after deletion?
Given that we're blessing current behavior: - At any moment, iteration order is from oldest to newest. So, "yes" to your question. - While iteration starts with the oldest, .popitem() returns the youngest. This is analogous to how lists work, viewing a dict similarly ordered "left to right" (iteration starts at the left, .pop() at the right, for lists and dicts).
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Dec 15, 2017 10:50, "Tim Peters" <tim.peters@gmail.com> wrote: [Eric Snow <ericsnowcurrently@gmail.com>]
Does that include preserving order after deletion?
Given that we're blessing current behavior: - At any moment, iteration order is from oldest to newest. So, "yes" to your question. - While iteration starts with the oldest, .popitem() returns the youngest. This is analogous to how lists work, viewing a dict similarly ordered "left to right" (iteration starts at the left, .pop() at the right, for lists and dicts). Fortunately, this also matches OrderedDict.popitem(). It'd be nice if we could also support dict.popitem(last=False) to get the other behavior, again matching OrderedDict. -n
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
On Dec 15, 2017, at 7:53 AM, Guido van Rossum <guido@python.org> wrote:
Make it so. "Dict keeps insertion order" is the ruling.
On Twitter, someone raised an interesting question. Is the guarantee just for 3.7 and later? Or will the blessing also cover 3.6 where it is already true. The 3.6 guidance is to use OrderedDict() when ordering is required. As of now, that guidance seems superfluous and may no longer be a sensible practice. For example, it would be nice for Eric Smith when he does his 3.6 dataclasses backport to not have to put OrderedDict back in the code. Do you still have the keys to the time machine? Raymond
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 15, 2017 at 12:45 PM, Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
For 3.6 we can't change the language specs, we can just document how it works in CPython. I don't know what other Python implementations do in their version that's supposed to be compatible with 3.6 but I don't want to retroactively declare them non-conforming. (However for 3.7 they have to follow suit.) I also don't think that the "it stays ordered across deletions" part of the ruling is true in CPython 3.6. I don't know what guidance to give Eric, because I don't know what other implementations do nor whether Eric cares about being compatible with those. IIUC micropython does not guarantee this currently, but I don't know if they claim Python 3.6 compatibility -- in fact I can't find any document that specifies the Python version they're compatible with more precisely than "Python 3". -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
FWIW, the regular dict does stay ordered across deletions in CPython3.6: >>> d = dict(a=1, b=2, c=3, d=4) >>> del d['b'] >>> d['b'] = 5 >>> d {'a': 1, 'c': 3, 'd': 4, 'b': 5} Here's are more interesting demonstration: from random import randrange, shuffle from collections import OrderedDict population = 1000000 s = list(range(population // 4)) shuffle(s) d = dict.fromkeys(s) od = OrderedDict.fromkeys(s) for i in range(500000): k = randrange(population) d[k] = i od[k] = i k = randrange(population) if k in d: del d[k] del od[k] assert list(d.items()) == list(od.items()) The dict object insertion logic just appends to the arrays of keys, values, and hashvalues. When the number of usable elements decreases to zero (reaching the limit of the most recent array allocation), the dict is resized (compacted) left-to-right so that order is preserved. Here are some of the relevant sections from the 3.6 source tree: Objects/dictobject.c line 89: Preserving insertion order It's simple for combined table. Since dk_entries is mostly append only, we can get insertion order by just iterating dk_entries. One exception is .popitem(). It removes last item in dk_entries and decrement dk_nentries to achieve amortized O(1). Since there are DKIX_DUMMY remains in dk_indices, we can't increment dk_usable even though dk_nentries is decremented. In split table, inserting into pending entry is allowed only for dk_entries[ix] where ix == mp->ma_used. Inserting into other index and deleting item cause converting the dict to the combined table. Objects/dictobject.c::insertdict() line 1140: if (mp->ma_keys->dk_usable <= 0) { /* Need to resize. */ if (insertion_resize(mp) < 0) { Py_DECREF(value); return -1; } hashpos = find_empty_slot(mp->ma_keys, key, hash); } Objects/dictobject.c::dictresize() line 1282: PyDictKeyEntry *ep = oldentries; for (Py_ssize_t i = 0; i < numentries; i++) { while (ep->me_value == NULL) ep++; newentries[i] = *ep++; }
I don't know what guidance to give Eric, because I don't know what other implementations do nor whether Eric cares about being compatible with those. IIUC micropython does not guarantee this currently, but I don't know if they claim Python 3.6 compatibility -- in fact I can't find any document that specifies the Python version they're compatible with more precisely than "Python 3".
I did a little research and here' what I found: "MicroPython aims to implement the Python 3.4 standard (with selected features from later versions)" -- http://docs.micropython.org/en/latest/pyboard/reference/index.html "PyPy is a fast, compliant alternative implementation of the Python language (2.7.13 and 3.5.3)." -- http://pypy.org/ "Jython 2.7.0 Final Released (May 2015)" -- http://www.jython.org/ "IronPython 2.7.7 released on 2016-12-07" -- http://ironpython.net/ So, it looks like your could say 3.6 does whatever CPython 3.6 already does and not worry about leaving other implementations behind. (And PyPy is actually ahead of us here, having compact and order-preserving dicts for quite a while). Cheers, Raymond
data:image/s3,"s3://crabby-images/05644/056443d02103b56fe1c656455ffee12aa1a01f1f" alt=""
On Fri, Dec 15, 2017 at 9:47 PM, Guido van Rossum <guido@python.org> wrote:
[...]
They currently specify 3.4+. Specifically, https://github.com/micropython/micropython includes: """ MicroPython implements the entire Python 3.4 syntax (including exceptions, with, yield from, etc., and additionally async/await keywords from Python 3.5). The following core datatypes are provided: str (including basic Unicode support), bytes, bytearray, tuple, list, dict, set, frozenset, array.array, collections.namedtuple, classes and instances. Builtin modules include sys, time, and struct, etc. Select ports have support for _thread module (multithreading). *Note that only a subset of Python 3 functionality is implemented for the data types and modules*. """ Note the emphasis I added on the last sentence.
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
Now that dicts are order-preserving, maybe we should change prettyprint: In [7]: d = {'one':1, 'two':2, 'three':3} In [8]: print(d) {'one': 1, 'two': 2, 'three': 3} order preserved. In [9]: pprint.pprint(d) {'one': 1, 'three': 3, 'two': 2} order not preserved ( sorted, I presume? ) With arbitrary order, it made sense to sort, so as to always give the same "pretty" representation. But now that order is "part of" the dict itself, it seems prettyprint should present the preserved order of the dict. NOTE: I discovered this making examples for an intro to Python class -- I was updating the part where I teach that dicts do not preserve order. I was using iPython, which, unbeknownst to me, was using pprint under the hood, so got a different order depending on whether I simply displayed the dict (which used pprint) or called str() or repr() on it. Pretty confusing. Will changing pprint be considered a breaking change? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw <barry@python.org> wrote:
Wait, what? Why would changing pprint (so that it accurately reflects dict's new underlying semantics!) be a breaking change? Are you suggesting it shouldn't be changed in 3.7? -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 07:37:03PM -0800, Nathaniel Smith wrote:
I have a script which today prints data like so: {'Aaron': 62, 'Anne': 51, 'Bob': 23, 'George': 30, 'Karen': 45, 'Sue': 17, 'Sylvester': 34} Tomorrow, it will suddenly start printing: {'Bob': 23, 'Karen': 45, 'Sue': 17, 'George': 30, 'Aaron': 62, 'Anne': 51, 'Sylvester': 34} and my users will yell at me that my script is broken because the data is now in random order. Now, maybe that's my own damn fault for using pprint instead of writing my own pretty printer... but surely the point of pprint is so I don't have to write my own? Besides, the docs say very prominently: "Dictionaries are sorted by key before the display is computed." https://docs.python.org/3/library/pprint.html so I think I can be excused having relied on that feature. -- Steve
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
To make sure I understand, do you actually have a script like this, or is this hypothetical?
No need to get aggro -- I asked a question, it wasn't a personal attack. At a high-level, pprint's job is to "pretty-print arbitray Python data structures in a form which can be used as input to the interpreter" (quoting the first sentence of its documentation), i.e., like repr() it's fundamentally intended as a debugging tool that's supposed to match how Python works, not any particular externally imposed output format. Now, how Python works has changed. Previously dict order was arbitrary, so picking the arbitrary order that happened to be sorted was a nice convenience. Now, dict order isn't arbitrary, and sorting dicts both obscures the actual structure of the Python objects, and also breaks round-tripping through pprint. Given that pprint's overarching documented contract of "represent Python objects" now conflicts with the more-specific documented contract of "sort dict keys", something has to give. My feeling is that we should preserve the overarching contract, not the details of how dicts were handled. Here's another example of a teacher struggling with this: https://mastodon.social/@aparrish/13011522 But I would be in favor of adding a kwarg to let people opt-in to the old behavior like: from pprint import PrettyPrinter pprint = PrettyPrinter(sortdict=True).pprint -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Nathaniel Smith writes:
To make sure I understand, do you actually have a script like this, or is this hypothetical?
I have a couple of doctests that assume that pprint will sort by key, yes. It makes the tests look quite a bit nicer by pprinting the output, and I get sorting (which matters for some older Pythons) for free. (I admit I don't actually use those tests with older Pythons, but the principle stands.) I don't see why we don't do the obvious, namely add the option to use "native" order to the PrettyPrinter class, with the default being backward compatible. -- Associate Professor Division of Policy and Planning Science http://turnbull/sk.tsukuba.ac.jp/ Faculty of Systems and Information Email: turnbull@sk.tsukuba.ac.jp University of Tsukuba Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
data:image/s3,"s3://crabby-images/05644/056443d02103b56fe1c656455ffee12aa1a01f1f" alt=""
On Tue, Dec 19, 2017 at 4:49 PM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Perhaps now key ordering has been pronounced we could either add a "sorted" method to dicts equivalent to the following code. def sorted(self): return {self[k] for k in sorted(self.keys())} Alternatively the sorted built-in could be modified to handle dicts in this way. Though I still find the assumption of any ordering at all a bit weird I suppose I'll grow used to it. regards Steve
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 19 December 2017 at 17:57, Steve Holden <steve@holdenweb.com> wrote:
I don't think there's any need for this.
Though I still find the assumption of any ordering at all a bit weird I suppose I'll grow used to it.
As far as I'm concerned, dictionaries are still exactly as they were before - key-value mappings with no inherent order. None of my code makes any assumption about the ordering of dictionaries, so it'll be 100% unaffected by this change. I find this whole debate about the "consequences" of mandating insertion order to be completely out of proportion. As far as I'm concerned, the only practical impact is that when you iterate over things like dictionary displays, **kw arguments, etc, you get the "obvious" order, and it's not a lucky accident that you do so. Certainly, with the order guaranteed, people who currently use OrderedDict will be able to simply use a dict in future - although I'd expect very few people will take advantage of this in the immediate future, as by doing so they'll be restricting their code to Python 3.7+ only, for no significant benefit. And let's not forget that OrderedDict is used far less frequently than plain dictionaries, so we're talking about a small percentage of a tiny percentage of uses of mapping objects in the wild that will in *any* way be affected by this change. Paul
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 08:49:54PM -0800, Nathaniel Smith wrote:
On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
The details are much simplified, but basically, and my users probably won't literally yell at me, but yes I do. But does it matter? The thing about backwards-compatibility guarantees is that we have to proceed as if somebody does have such a script. We don't know who, we don't know why, but we have to assume that they are relying on whatever guarantees we've given and will be greatly inconvenienced by any change without sufficient notice.
I didn't interpret it as an attack. Sorry for any confusion, I was trying to be funny -- at least, it sounded funny in my own head.
The *high* level purpose of pprint is to *pretty-print* values, like the name says. If all we wanted was something that outputs an eval()'able representation, we already had that: repr(). But even that requirement that output can be used as input to the interpreter is a non-core promise. There are plenty of exceptions: recursive data structures, functions, any object with the default repr, etc. Even when it works, the guarantee is quite weak. For instance, even the object type is not preserved: py> class MyDict(dict): ... pass ... py> d = MyDict() py> x = eval(repr(d)) py> assert d == x py> assert type(d) == type(x) Traceback (most recent call last): File "<stdin>", line 1, in <module> AssertionError So the "promise" that eval(repr(obj)) will round-trip needs to be understood as being one of those Nice To Have non-core promises, not an actual guaranteed feature. (The bold print giveth, and the fine print taketh away.) So the fact that the output of pprint doesn't preserve the order of the dict won't be breaking any documented language guarantees. (It is probably worth documenting explicitly though, rather than just letting it be implied by the sorted keys guarantee.)
The point of pprint is not merely to duplicate what repr() already does, but to output an aesthetically pleasing view of the data structure. There is no reason to think that is only for the purposes of debugging. pprint is listed in the docs under Data Types, not Debugging: https://docs.python.org/3/library/datatypes.html https://docs.python.org/3/library/debug.html
Beware of promising a feature for convenience, because people will come to rely on it. In any case, lexicographic (the default sorting) order is in some ways the very opposite of "arbitrary order".
Now, dict order isn't arbitrary,
No, we can't say that. Dicts *preserve insertion order*, that is all. There is no requirement that the insertion order be meaningful or significant in any way: it may be completely arbitrary. If I build a mapping of (say) product to price: d = {'hammer': 5, 'screwdriver': 3, 'ladder': 116} the order the items are inserted is arbitrary, probably representing the historical accident of when they were added to the database/catalog or when I thought of them while typing in the dict. The most we can say is that for *some* cases, dict order *may* be meaningful. We're under no obligation to break backwards-compatibility guarantees in order for pretty printing to reflect a feature of dicts which may or may not be of any significance to the user.
and sorting dicts both obscures the actual structure of the Python objects,
You can't see the actual structure of Python objects via pprint. For example, you can't see whether the dict is a split table (shared keys) or combined table. You can only see the parts of the public interface which the repr, or pprint, chooses to show. That's always been the case so nothing changes here. If pprint were new to 3.7, I daresay there would be a good argument to have it display keys in insertion order, but given backwards compatibility, that's not tenable without either an opt-in switch, or a period of deprecation.
and also breaks round-tripping through pprint.
Round-tripping need not promise to preserve order, since dicts don't care about order for the purposes of equality. Round-tripping already is a lossy operation: - object identity is always lost (apart from a few cached objects like small ints and singletons like None); - in some cases, the type of objects can be lost; - any attribute of the object which is not both reflected in its repr and set by its constructor will be lost; (e.g. x = something(); x.extra_attribute = 'spam') - many objects don't round-trip at all, e.g. functions and recursive data structures. So the failure of pprint to preserve such insertion order by default is just one more example.
I believe the overarching contract is to pretty print. Anything else is a Nice To Have. [...]
It would have to be the other way: opt-out of the current behaviour. -- Steve
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Dec 18, 2017, at 22:37, Nathaniel Smith <njs@pobox.com> wrote:
As others have pointed out, exactly because the current behavior is documented. And we all know that if it’s documented (and often even if it’s not, but that’s besides the point here) it will be relied upon. So we can’t change the default behavior. But I have no problems conceptually with giving users options. The devil is in the details though, e.g. should we special case dictionary sorting only? Should we use a sort `key` to mirror sorted() and list.sort()? We can figure those things out and whether it’s even worth doing. I don’t think that’s PEP-worthy, so if someone is sufficiently motivated, please open an issue on bpo. Cheers, -Barry
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Tue, Dec 19, 2017 at 8:14 AM, Barry Warsaw <barry@python.org> wrote:
Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to return the same dict -- so documented behavior may already be broken. (though I assume order is still ignored when comparing dicts, so: eval(pprint(a_dict)) == a_dict will still hold. But practicality beats purity, and a number of folks have already posted use-cases where they rely on sorted order, so there you go.
Should we use a sort `key` to mirror sorted() and list.sort()?
That would be a nice feature! If anything is done, I think we should allow a key function. and maybe have key=None as "unsorted" -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/832a7/832a7d28e16a261c5f64f5c6fc6585753582feae" alt=""
On 19Dec2017 1004, Chris Barker wrote:
Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to return the same dict -- so documented behavior may already be broken.
Two relevant quotes from the pprint module docs:
Dictionaries are sorted by key before the display is computed.
It says nothing about the resulting dict being the same as the original one, just that it can be used as input. So these are both still true (until someone deliberately breaks the latter). In any case, there are so many ways to spoil the first point for yourself that it's hardly worth treating as an important constraint.
(though I assume order is still ignored when comparing dicts, so: eval(pprint(a_dict)) == a_dict will still hold.
Order had better be ignored when comparing dicts, or plenty of code will break. For example:
{'a': 1, 'b': 2} == {'b': 2, 'a': 1} True
Saying that "iter(dict)" will produce keys in the same order as they were inserted is not the same as saying that "dict" is an ordered mapping. As far as I understand, we've only said the first part. (And the "nerve" here is that I disagreed with even the first part, but didn't fight it too strongly because I never relied on the iteration order of dict. However, I *do* rely on nobody else relying on the iteration order of dict either, and so proposals to change existing semantics that were previously independent of insertion order to make them rely on insertion order will affect me. So now I'm pushing back.) Cheers, Steve
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Tue, Dec 19, 2017 at 4:56 PM, Steve Dower <steve.dower@python.org> wrote:
This is a pretty fine hair to be splitting... I'm sure you wouldn't argue that it would be valid to display the dict {"a": 1} as '["hello"]', just because '["hello"]' is a valid input to the interpreter (that happens to produce a different object than the original one) :-). I think we can assume that pprint's output is supposed to let you reconstruct the original data structures, at least in simple cases, even if that isn't explicitly stated.
I guess the underlying issue here is partly the question of what the pprint module is for. In my understanding, it's primarily a tool for debugging/introspecting Python programs, and the reason it talks about "valid input to the interpreter" isn't because we want anyone to actually feed the data back into the interpreter, but to emphasize that it provides an accurate what-you-see-is-what's-really-there view into how the interpreter understands a given object. It also emphasizes that this is not intended for display to end users; making the output format be "Python code" suggests that the main intended audience is people who know how to read, well, Python code, and therefore can be expected to care about Python's semantics.
Yes, this is never going to change -- I expect that in the long run, the only semantic difference between dict and OrderedDict will be in their __eq__ methods.
I mean, I don't want to be a jerk about this, and we still need to examine things on a case-by-case basis but... Guido has pronounced that Python dict preserves order. If your code "rel[ies] on nobody else relying on the iteration order", then starting in 3.7 your code is no longer Python. Obviously I like that change more than you, but to some extent it's just something we have to live with, and even if I disagreed with the new semantics I'd still rather the standard library handle them consistently rather than being half-one-thing-and-half-another. -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Dec 19, 2017, at 20:32, Nathaniel Smith <njs@pobox.com> wrote:
pprint.pprint() is indeed mostly for debugging, but not always. As an example of what will break if you change the sorting guarantee: in Mailman 3 the REST etag is calculated from the pprint.pformat() of the result dictionary before it’s JSON-ified. If the order is changed, then it’s possible a client will have an incorrect etag for a structure that is effectively the same. -Barry
data:image/s3,"s3://crabby-images/227ad/227ad844da34915e2d53d651f1d0f394b1fcc61b" alt=""
On 12/19/2017 5:32 PM, Nathaniel Smith wrote:
Any dict object read in from pprint is going to be a different object, not the original one. And, unless the original insertion order was sorted by the same key as pprint uses to sort, the iteration order will be different from the original. As pointed out below, it will compare equal to the original dict. pprint has always allowed you to reconstruct the original data structures, but not the iteration order of dicts. With the new insertion order guarantee, nothing has changed, here. A far more interesting question than what pprint does to dict order is what marshal and pickle do (and have done) with the dict order, although I can't figure that out from the documentation.
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
On Tue, 19 Dec 2017 17:32:52 -0800 Nathaniel Smith <njs@pobox.com> wrote:
Actually, when you want to include a large constant in a Python program, pprint() can be useful to get a nicer formatting for your source code. That said, I do think that pprint() should continue sorting dicts by default. Even though dicts may be ordered *now*, most uses of dict don't expect any particular order. (I also think the pprint() example shows the potential confusion issues with making dict ordered by default, as the user and implementor of an API may not agree whether dict order is significant...) Regards Antoine.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Tue, Dec 19, 2017 at 04:56:16PM -0800, Steve Dower wrote:
On 19Dec2017 1004, Chris Barker wrote:
Indeed. Regular dicts preserve insertion order, they don't take insertion order into account for the purposes of equality. See the example here: https://docs.python.org/3.7/library/stdtypes.html#mapping-types-dict and the description of mapping equality: https://docs.python.org/3.7/reference/expressions.html#value-comparisons "Mappings (instances of dict) compare equal if and only if they have equal (key, value) pairs. Equality comparison of the keys and values enforces reflexivity." Changing that would be a *huge* backwards-compatibility breaking change. Aside: I've just noticed that mapping equality is not transitive: a == b and b == c does not imply that a == c. py> from collections import OrderedDict as OD py> a, b, c = OD.fromkeys('xyz'), dict.fromkeys('xyz'), OD.fromkeys('zyx')) py> a == b == c True py> a == c False -- Steve
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Chris Barker writes:
Sure, but that's because we put shoes on a snake. Why anybody expects no impediment to slithering, I don't know! I understand the motivation to guarantee order, but it's a programmer convenience that has nothing to do with the idea of mapping, and the particular (insertion) order is very special and usually neither relevant nor reproducible. I have no problem whatsoever with just documenting any failure to preserve order while reproducing dicts, *except* that a process that inserts keys in the same order had better result in the same insertion order. Steve
data:image/s3,"s3://crabby-images/b8491/b8491be6c910fecbef774491deda81cc5f10ed6d" alt=""
On Thu, Dec 21, 2017 at 7:51 PM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
json, pickle == png, i.e., guaranteed lossless. repr, pprint == jpg, lossy for very specific motivating reasons. In particular, I use pprint output in regression baselines, and if the long documented sort-by-key behavior changed, I would not be happy.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 06:11:05PM -0800, Chris Barker wrote:
Indeed. pprint.PrettyPrinter has separate methods for OrderedDict and regular dicts, and the method for printing dicts calls sorted() while the other does not.
I disagree. Many uses of dicts are still conceptually unordered, even if the dict now preserves insertion order. For those use-cases, insertion order is of no interest whatsoever, and sorting is still "prettier". -- Steve
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Mon, Dec 18, 2017 at 7:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
and many uses of dicts have "sorted" order as completely irrelevant, and sorting them arbitrarily is not necessarily pretty (you can't provide a sort key can you? -- so yes, it's arbitrary) I'm not necessarily saying we should break things, but I won't agree that pprint sorting dicts is the "right" interface for what is actually an order-preserving mapping. I would think it was only the right choice in the first place in order (get it?) to get a consistent representation, not because sorting was a good thing per se. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 08:28:52PM -0800, Chris Barker wrote:
I completely agree. We might argue that it was a mistake to sort dicts in the first place, or at least a mistake to *always* sort them without allowing the caller to provide a sort key. But what's done is done: the fact that dicts are sorted by pprint is not merely an implementation detail, but a documented behaviour.
If sorting dicts was the "right" behaviour in Python 3.4, it remains the right behaviour -- at least for use-cases that don't care about insertion order. Anyone using pprint on dicts *now* doesn't care about insertion order. If they did, they would be using OrderedDict. That will change in the future, but even in the future there are lots of use-cases for dicts where insertion order might as well be random. The order that some dict happen to be constructed may not be "pretty" or significant or even consistent from one dict to the next. (If your key/values pairs are coming in from an external source, they might not always come in the same order.) I'm not denying that sometimes it would be nice to see dicts in insertion order. Right now, those use-cases are handled by OrderedDict but in the future many of them will be taken over by regular dicts. So we have a conflict: - for some use-cases, insertion order is the "right" way for pprint to display the dict; - but for others, sorting by keys is the "pretty" way for pprint to display the dict; - and there's no way for pprint to know which is which just by inspecting the dict. How to break this tie? Backwards compatibility trumps all. If we want to change the default behaviour of pprint, we need to go through a deprecation period. Or add a flag sorted=True, and let the caller decide.
*shrug* That's arguable. As you said yourself, dicts were sorted by key to give a "pretty" representation. I'm not so sure that consistency is the justification. What does that even mean? If you print the same dict twice, with no modifications, it will print the same whether you sort first or not. If you print two different dicts, who is to say that they were constructed in the same order? But the point is moot: whatever the justification, the fact that pprint sorts dicts by key is the defined behaviour, and even if it was a mistake to guarantee it, we can't just change it without a deprecation period. -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Tue, Dec 19, 2017 at 6:09 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Personally, I think it's a good default behaviour. Frequently you have a dictionary that, in normal code, you look up by key rather than iterating over; but for debugging, you print out everything. (Example: HTTP request/response headers. In your code, you'll query the headers dict for "content-type", but it's nice to be able to say "show me every header".) Insertion order is meaningless, and it's nice to be able to read them in some kind of sane order. Simply sorting the keys in the default way is good enough for a LOT of use-cases.
Agreed, except for the last bit. I'd be inclined to kill two birds with one stone: add a flag sort_key=DEFAULT or sort_key=IDENTITY, which will sort the keys by themselves; you can provide a key function to change the way they're sorted, or you can pass sort_key=None to get them in insertion order.
In old versions of Python, this code would always produce the same result: d = {} d["qwer"] = 1 d["asdf"] = 2 d["zxcv"] = 3 print(d) import pprint; pprint.pprint(d) Then along came hash randomization, and the output would change - but pprint would still be consistent and tidy. Now we have insertion order retained, and both the repr and the pprint are consistent. Both of them are sane. They're just different sameness.
This is really the clincher. But IMO the current behaviour isn't *just* for backcompat; it's good, useful behaviour as it is. I wouldn't want to see it changed even _with_ deprecation. ChrisA
data:image/s3,"s3://crabby-images/832a7/832a7d28e16a261c5f64f5c6fc6585753582feae" alt=""
On 18Dec2017 2309, Steven D'Aprano wrote:
I have never needed OrderedDict before, and dict now also being ordered doesn't mean that when I reach for it I'm doing it because I need an ordered dict - I probably just need a regular dict. *Nothing* about dict should change for me between versions. Adding an option to pprint to explicitly control sorting without changing the default is fine. Please stop assuming that everyone wants an OrderedDict when they say dict. It's an invalid assumption. Cheers, Steve
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Mon, Dec 18, 2017 at 11:38 PM, Steve Dower <steve.dower@python.org> wrote:
Can we all take a deep breath and lay off the hyperbole? The only point under discussion in this subthread is whether pprint -- our module for producing nicely-formatted-reprs -- should continue to sort keys, or should continue to provide an accurate repr. There are reasonable arguments for both positions, but no-one's suggesting anything in the same solar system as "all users of dict have to be updated". Am I missing some underlying nerve that this is hitting for some reason? -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 19 December 2017 at 08:27, Nathaniel Smith <njs@pobox.com> wrote:
IMO, the key thing is that people appear to be talking as if changing documented behaviour without deprecation is acceptable. Or even if they are OK with a deprecation period, they are still talking about changing documented behaviour for a trivial reason (as Steve Dower said, most people will still use dict for its dict behaviour, not for its orderedness). People (including me) are getting irritated because backward compatibility, which is normally a key principle, is being treated so lightly here. (It happened in another thread as well - changing the output of csv.DictReader from OrderedDict to dict - again suggesting that breaking a documented behaviour was OK, just because dicts are now ordered). Paul
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Sun, Nov 05, 2017 at 09:01:40PM +0200, Serhiy Storchaka wrote:
Do you suggest to make dictionary displays producing OrderedDict instead of dict?
No, this is essentially a language spec doc issue that would guarantee the ordering properties of the current dict implementation. Stefan Krah
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Sun, Nov 05, 2017 at 09:35:38PM +0200, Serhiy Storchaka wrote:
Yes, for my use case that would be sufficient and that's what I had in mind initially. A luxury syntax addition like {a = 10, b = {c = "foo"}} that is read as an OrderedDict (where the keys a, b, c are implicitly strings) would of course also be sufficient for my use case. But I suspect many users who have other use cases find it tantalizing not to be able to use the properties of the current regular dict. Stefan Krah
data:image/s3,"s3://crabby-images/1383b/1383b5c083aa3fd057a17c816aec322848ea5a95" alt=""
Hi, I have to admit sometimes I don't understand why small things produce so much mail traffic. :-) If I use a mapping like dict most of the time I don't care if it is ordered. If I need an ordering I use OrderedDict. In a library if I need a dict to be ordered most of the time there is a parameter allowing me to do this or to specify the used class as dict. If this is not the case it can be fixed. In rare cases a workaround is needed. As of now I have dict and OrderedDict, it is clear and explicit. No need to change. Yes it is useful for debugging to have things ordered. Yes in other places the implicit ordering is also fine. For pypy and CPython good and take it, be happy. Also it is fine to teach people that dict (Mapping) is not ordered but CPython has an implementation detail and it is ordered. But if you want the guarantee use OrderedDict. To be really practical even if this is guaranteed in 3.9 I cannot rely on it because of Python 2.7, 3.5, 3.6, ... compatibility. If this versions are out of order in 10 years, even then if I want to produce a small library running on another implementation I have to care, because of the list of differences to the CPython implementation or because the project is not yet up to date with the implementation (Jython). To be save then I will still use OrderedDict guaranteeing me this what I want. Finally even when dict literals will be guaranteed to be ordered it is good to teach and use OrderedDict because it is explicit. For implementations (algorithm) I cannot foresee the future so I cannot tell if it will be a burden or not. Finally someone have to decide it. As long as OrderedDict is available for me to specify it explicit it will be fine. ;-) Regards, Wolfgang On 04.11.2017 18:30, Stefan Krah wrote:
data:image/s3,"s3://crabby-images/9dd94/9dd94ad1163d17ff290b45ac3a45dd8a266b2174" alt=""
On 11/07/2017 09:00 AM, Wolfgang wrote:
I feel a bit out of place on this list since I'm a lurker and not a core developer, but I just wanted to add my agreement with this as well. So maybe take my opinion with a grain of salt... I agree with Wolfgang. I just don't understand why this change is needed. We have dict and we have OrderedDict. Why does dict need to provide the extra ordering constraints? I've read many posts in this discussion and find none of them convincing. Python already guarantees things like ordering of keyword arguments. I've seen some people point out confusion of newcomers (e.g. they are surprised when order is not surprised), but that just seems to me natural confusion that comes about when learning. I would argue that a better solution to that problem is exactly the go solution: i.e. purposely perturbing the ordering in a way that shows up immediately so that users realize the problems in their thinking earlier. The dict provides a mapping from key to value. I personally think that that is mentally much simpler object than a mapping from key to value with certain ordering guarantees. If I want to extra guarantees I import OrderedDict and read what the guarantees are. This seems totally fine to me. I don't really see any advantages to this change but a lack of implementation flexibility and a more complicated core object in Python. Cheers, Thomas
data:image/s3,"s3://crabby-images/ef9a3/ef9a3cb1fb9fd7a4920ec3c178eaddbb9c521a58" alt=""
On 11/07/2017 09:00 AM, Wolfgang wrote: [...]
I don't think that is fine. When I explained this in 3.5, dicts rearranging themselves seemed quite weird to the newcomers. This year, I'm not looking forward to saying that dicts behave "intuitively", but you shouldn't rely on that, because they're theoretically allowed to rearrange themselves. The concept of "implementation detail" and language spec vs. multiple interpreter implementations isn't easy to explain to someone in a "basic coding literacy" course. Today I can still show an example on Python 3.5. But most people I teach today won't run their code on 3.5, or on MicroPython or Brython, and quite soon they'll forget that there's no dict ordering guarantee. Also: I happen to read python-dev and the language docs. I suspect not all teachers do, and when they see that dict order randomization was "fixed", they might just remove the explanation from the lesson and teach something practical instead.
data:image/s3,"s3://crabby-images/81bb5/81bb5c99794aa6104ce457e1b76cfbdd07cddf90" alt=""
Hello, I agree with Wolfgang here. From what I gathered of the discussion, the argument started from « It would be nice if dict litterals returned ordered dicts instead of an unordered ones », which is mostly a good thing, as it allows e.g. `OrderedDict({'spam': 'ham', 'sausages': 'eggs'})` instead of having to rely on lists of couples to create an OrderedDict. It is not of utmost utility, but it would be nice to have and not dissimilar to what we already have with kwargs being ordered dicts, also a matter of slightly better usability. Possibly, in order to avoid implying that all dicts guarantee ordering at all time, a syntax such as `o{}` might be used, mostly to help newcomers. So far, so good. Where it started to go all haywire is when it became conflated that with the fact that CPython and Pypy dicts are actually ordered (up to a point at least) and it suddenly became « Let's guarantee the ordering of all dicts » which apparently is an issue for at least one implementation of Python, and still have to be implemented in several others (said implementation would be trivial, or so it is said, but it still has to be written, along with appropriate tests, regression checks…). So far, the arguments I have seen for that are 1. It is useful in context where one has to guarantee the ordering of some mapping (such as in json) 2. It is useful in contexts where ordering is facultative, but nice to have (debugging has been mentionned) 3. It is already this way in CPython, so people are going to use that anyway I take issue with all of those arguments. 1. If the ordered should be guaranteed, why would it be so hard to use OrderedDict ? - Just write `d = OrderedDict(((key, val) for key, value in …))` instead of `{key: value for key, value in …}`. It is not that hard and at least it is explicit that the order is important. And if it is really so hard, we could have dict comprehensions be ordered too in addition to litterals, it still doesn't mean that dicts should all be ordered - It is even easier if you fill your dict value-per-value, just initialise it as `d = OrderedDict` instead of `d = {}` and voilà ! 2. I would like to see some examples of cases where this is really much more convenient than any other soution, but even then I suspect that these cases are not sufficently relevant to wed all Python backends to ordered dicts forever. 3. This is just a pure fallacy. The language has a documented API that says that if order of insertion is important, you should explicitely use an OrderedDict. If people stray away from it and use implementation details such as the ordering of dict in CPython, they are on their own and shouldn't expect it to be portable to any other version. Again, it's not as if OrderedDict did not exist or was much more inconvenient to use than dict. Also, since the proposed implementation does not keep ordering on deletion, those relying implicitely on the ordering of dicts without reading the docs might get bitten by it later in much more subtle ways. Note that I don't sugest mandatory shuffling of dicts to advertise their non-guaranteed ordering, either. Just that reading the docs (or having your instructor tell you that dict does not guarantee the order) is the reponsibility of the end user. To sum it up - Ordered dict litterals are a nice idea, but probably not that important. If it happens, it would be nice if it could be extended to dict comprehensions, though. - Guaranteeing the ordering of all `dicts` does not make a lot of sense - Changing the API to guarantee the order of dicts **is** an API change, which still means work Am I missing something ? Cheers, E On 7 November 2017 at 10:51, Petr Viktorin <encukou@gmail.com> wrote:
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 01:51, Petr Viktorin <encukou@gmail.com> wrote:
Perhaps, but IME, it’s not hard to teach someone that in a code review. Today, if I see someone submit a change that includes an implicit assumption about ordering, I’ll call that out. I can say “you can’t rely on dicts being ordered, so if that’s what you want, use an OrderedDict or sort your test data”. That’s usually a localized observation, meaning, I can look at the diff and see that they are assuming dict iteration ordering. What happens when built-in dict’s implementation behavior becomes a language guarantee? Now the review is much more difficult because I probably won’t be able to tell just from a diff whether the ordering guarantees are preserved. Do they delete a key somewhere? Who knows? I’m not even sure that would be statically determinable since I’d have to trace the use of that dictionary at run time to see if some “del d[key]” is deleting the key in the dict under review or not. I can probably only tell that at run time. So how to I accurately review that code? Is the order presumption valid or invalid? Cheers, -Barry
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee. On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah <stefan@bytereef.org> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/84ecc/84eccc2a9789a30ce89b3034a04e9c359b0b3fe1" alt=""
Isn't ordered dict also useful for **kwargs? If it turns out that there's a dict implementation that's faster by not preserving order, collections.UnorderedDict could be added. There could also be specialized implementations that pre-size the dict (cf: C++ unordered_map::reserve), etc., etc. But these are all future things, which might not be necessary. On 5 November 2017 at 12:44, Sven R. Kunze <srkunze@mail.de> wrote:
data:image/s3,"s3://crabby-images/cd7e0/cd7e070dde02a16b1e5e428904c488e52e05ce40" alt=""
I think that the problem with this is that for the most part, people will use `dict`, and for most uses of `dict`, order doesn't matter (and it never has before). Given that "arbitrary order" includes *any* fixed ordering (insertion ordered, reverse insertion ordered, etc), the common case should keep the existing "no order guarantee" specification. This gives interpreter authors maximum freedom with a fundamental, widely used data type.
Isn't ordered dict also useful for **kwargs?
If this is useful (and it seems like it would be), I think again a syntax modification that allows users to indicate that they want a particular implementation of **kwargs would be better than modifying dict semantics. It could possibly be handled with a type-hinting like syntax: def f(*args, **kwargs : OrderedKwargs): Or a riff on the existing syntax: def f(*args, ***kwargs): def f(*args, ^^kwargs): def f(*args, .**kwargs): In this case, the only guarantee you'd need (which relatively minor compared to a change in the dict semantics) would be that keyword argument order passed to a function would be preserved as the order that it is passed into the `kwargs` constructor. The old **kwargs syntax would give you a `dict` as normal, and the new ^^kwargs would give you an OrderedDict or some other dict subclass with guaranteed order. On 11/05/2017 03:50 PM, Peter Ludemann via Python-Dev wrote:
data:image/s3,"s3://crabby-images/fac1f/fac1f353137c52e340a2ea0ffa4fe887a42e016d" alt=""
On 6 November 2017 at 07:50, Peter Ludemann via Python-Dev < python-dev@python.org> wrote:
Isn't ordered dict also useful for **kwargs?
**kwargs is already specified as insertion ordered as of Python 3.6. https://www.python.org/dev/peps/pep-0468/ Tim Delaney
data:image/s3,"s3://crabby-images/832a7/832a7d28e16a261c5f64f5c6fc6585753582feae" alt=""
Since there is voting going on, -1 on changing the guarantees of `dict`. If ordering is important, OrderedDict is more explicit and more obvious to the over reading the code, even if the underlying implementation is identical. Good luck teaching people about why Python went from OrderedDict to UnorderedDict within a version. I remember learning about this same thing in C++ and just thinking “wut?”. And they just added a new type and said to stop using the old one – not changing something that people already understand. Better to underspecify the default dict and offer explicitly named extensions (DefaultDict, OrderedDict, etc.) for those who want more guarantees. Cheers, Steve Top-posted from my Windows phone From: Peter Ludemann via Python-Dev Sent: Sunday, November 5, 2017 12:53 To: Sven R. Kunze Cc: Python-Dev Subject: Re: [Python-Dev] Guarantee ordered dict literals in v3.7? Isn't ordered dict also useful for **kwargs? If it turns out that there's a dict implementation that's faster by not preserving order, collections.UnorderedDict could be added. There could also be specialized implementations that pre-size the dict (cf: C++ unordered_map::reserve), etc., etc. But these are all future things, which might not be necessary. On 5 November 2017 at 12:44, Sven R. Kunze <srkunze@mail.de> wrote: +1 from me too. On 04.11.2017 21:55, Jim Baker wrote: +1, as Guido correctly recalls, this language guarantee will work well with Jython when we get to the point of implementing 3.7+. On Sat, Nov 4, 2017 at 12:35 PM, Guido van Rossum <guido@python.org> wrote: This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee. On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah <stefan@bytereef.org> wrote: Hello, would it be possible to guarantee that dict literals are ordered in v3.7? The issue is well-known and the workarounds are tedious, example: https://mail.python.org/pipermail/python-ideas/2015-December/037423.html If the feature is guaranteed now, people can rely on it around v3.9. Stefan Krah _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/jbaker%40zyasoft.com _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/srkunze%40mail.de _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/pludemann%40google.com
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 06:50, Peter Ludemann via Python-Dev <python-dev@python.org> wrote:
Isn't ordered dict also useful for **kwargs?
3.6 already made this semantic change for **kwargs and class execution namespaces - it just left the door open to having implementations meet those requirements by way of substituting in collections.OrderedDict for those use cases, rather than using a regular builtin dictionary. So we're also already imposing the requirement on conformant Python implementations to have a reasonably performant insertion-ordered mapping implementation available. The open questions around insertion ordering thus relate to what user expectations should be for: - dictionary displays - explicit invocations of the builtin dict constructor - mutating methods on builtin dicts - serialisation and deserialisation operations that use regular dicts (e.g. the JSON module, csv.DictReader/Writer) It's that last category which is particularly problematic now, as in *C*Python 3.6, the dicts used for these operations actually *are* insertion ordered, so round trips from disk, through a CPython builtin dict, and then back to disk *will* be order preserving, whereas in previous versions there was no guarantee that that would be the case (and hash randomisation meant the key reordering wasn't even deterministic). While such operations were already order preserving in PyPy (since their switch to an insertion-ordered builtin dict implementation served as prior art for the subsequent switch in CPython), we know from experience that it's incredibly easy for even things that are nominally implementation details of CPython to become expected behaviour for Python users (e.g. there's still plenty of code out there that relies on the use of an eager refcounting memory management strategy to handle external resources properly). That means our choices for 3.7 boil down to: * make this a language level guarantee that Python devs can reasonably rely on * deliberately perturb dict iteration in CPython the same way the default Go runtime does [1] When we did the "insertion ordered hash map" availability review, the main implementations we were checking on behalf of were Jython & VOC (JVM implementations), Batavia (JavaScript implementation), and MicroPython (C implementation). Adding IronPython (C# implementation) to the mix gives: * for the JVM, the insertion ordered LinkedHashMap [2] has been available since J2SE 1.4 was released in 2002 * for JavaScript, ECMA 6 defines the Map type [3] as an insertion ordered key/value store * for the .NET CLR, System.Collections.Specialized.OrderedDictionary [4] seems to be the best builtin option, but the Java implementations also map reasonably well to C# semantics * for MicroPython, it's not yet clear whether or not the hash map design used in CPython could also be adapted to their design constraints (i.e. giving both insertion ordering and O(1) lookup without requiring excessive amounts of memory), but it should at least be feasible to make the semantics configurable based on a compile time option (since CPython has a working C implementation of the desired semantics already) Since the round-trip behaviour that comes from guaranteed order preservation is genuinely useful, and we're comfortable with folks switching to more specialised containers when they need different performance characteristics from what the builtins provide, elevating insertion order preservation to a language level requirements makes sense. Cheers, Nick. [1] https://blog.golang.org/go-maps-in-action (search for "Iteration Order") [2] https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html [3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj... [4] https://docs.microsoft.com/en-us/dotnet/api/system.collections.specialized.o... -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 01:07:51PM +1000, Nick Coghlan wrote:
I agree with this choice. My preference is for the first: having dicts be unordered has never been a positive virtue in itself, but always the cost we paid for fast O(1) access. Now what we have fast O(1) access *without* dicts being unordered, we should make it a language guarantee. Provided of course that we can be reasonable certain that other implementations can do the same. And it looks like we can. But if we wanted to still keep our options open, how about weakening the requirement that globals() and object __dicts__ be specifically the same type as builtin dict? That way if we discover a super-fast and compact dict implementation (maybe one that allows only string keys?) that is unordered, we can use it for object namespaces without affecting the builtin dict.
Shouldn't we check with Nuitka (C++) and Cython as well? I'd be surprised if this is a problem for either of them, but we should ask.
+1 OrderedDict could then become a thin wrapper around regular dicts. -- Steve
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 19:14, Steven D'Aprano <steve@pearwood.info> wrote:
If you're using dicts as dicts, Cython just uses the CPython ones via the C API, rather than defining its own. I didn't do any specific research for C++ (since it's essentially the king of fast low level data structure design, hence its popularity in graphics programming), but did see a few references to C++ ordered hash map implementations while looking into the available options for other runtimes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 5 November 2017 at 04:35, Guido van Rossum <guido@python.org> wrote:
When I asked Damien George about this for MicroPython, he indicated that they'd have to choose between guaranteed order and O(1) lookups given their current dict implementation. That surprised me a bit (since PyPy and CPython both *saved* memory by switching to their guaranteed order implementations, hence the name "compact dict representation"), but my (admittedly vague) understand is that the presence of a space/speed trade-off in their case has something to do with MicroPython deliberately running with a much higher chance of hash collisions in general (since the data sets it deals with are naturally smaller). So if we make the change, MicroPython will likely go along with it, but it may mean that dict lookups there become O(N), and folks will be relying on "N" being consistently small due to memory constraints (but some typically O(N) algorithms will still become O(N^2) when run on MicroPython). I don't think that situation should change the decision, but I do think it would be helpful if folks that understand CPython's dict implementation could take a look at MicroPython's dict implementation and see if it might be possible for them to avoid having to make that trade-off and instead be able to use a naturally insertion ordered hashmap implementation. Cheers, Nick. P.S. If anyone does want to explore MicroPython's dict implementation, and see if there might be an alternate implementation strategy that offers both O(1) lookup and guaranteed ordering without using additional memory, the relevant files seem to be: * https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a... (C level hashmap/ordered array structs) * https://github.com/micropython/micropython/blob/master/py/map.c (C level hashmap/ordered array implementation) * https://github.com/micropython/micropython/blob/master/py/objdict.c (Python dict wrapper around the mapping impl) The current behaviour is that the builtin dict uses the hashmap algorithms, while collections.OrderedDict uses the ordered array algorithms. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering. The likely reason for the confusion is that they are already have an option for an "ordered array" dict variant that does a brute-force linear search. However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered. See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a... Pretty much any implementation hashed lookup of keys and values is amenable to being compact and ordered. Whatever existing logic that looks up an entry becomes a lookup into a table of indices which in turn references a sequential array of keys and values. This logic is independent of hashing scheme or density, and it has no effect on the number of probes or collision rate. The cost is an extra level of indirection and an extra array of indices (typically very small). The benefit is faster iteration over the smaller dense key/value array, overall memory savings resulting in improved cache utilization, and the side-effect of remembering insertion order. Summary: I think MicroPython will be just fine and if needed I will help create the patch that implements compact-and-ordered behavior. Raymond
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 09:43, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Nice! That's what I thought based on reading some of the design discussions about CPython's dict implementation, but I didn't know the details of either dict implementation well enough to be confident that the CPython changes would map cleanly to MicroPython's variant. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Sun, 5 Nov 2017 12:04:41 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
MicroPython hashmap implementation is effectively O(n) (average and worst case) due to the algorithm parameters chosen (like the load factor of 1). Of course, parameters could be tweaked, but the ones chosen are so because the memory usage is far more important for MicroPython systems than performance characteristics (all due to small amounts of memory). Like, MicroPython was twice as fast than Python 3.3 on average, and 1000 times more efficient in the memory usage.
(since PyPy and CPython both *saved* memory by switching to their guaranteed order implementations, hence the name "compact dict
There's nothing to save in MicroPython's dict implementation, simply because it doesn't waste anything in the first place - uPy's dict entry is just (key, val) (2 machine words), and load factor of the dict can reach 1.0 as mentioned. []
I don't think that situation should change the decision,
Indeed, it shouldn't. What may change it is the simple and obvious fact that there's no need to change anything, as proven by the 25-year history of the language. What happens now borders on technologic surrealism - the CPython, after many years of persuasion, switched its dict algorithm, rather inefficient in terms of memory, to something else, less inefficient (still quite inefficient, taking "no overhead" as the baseline). That algorithm randomly had another property. Now there's a seemingly serious talk of letting that property leak into the *language spec*, despite the fact that there can be unlimited number of dictionary algorithms, most of them not having that property. What it will lead to is further fragmentation of the community. Python2 vs Python3 split is far from being over, and now there're splits between: * people who use "yield from" vs "await" * people who use f-strings vs who don't * people who rely on sorted nature of dict's vs who don't etc. []
That would be the first programmer in the history to have a cake and eat it too. Memory efficiency, runtime efficiency, sorted order: choose 2 of 3. -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote:
$ cat xxx.py def pi_float(): """native float""" lasts, t, s, n, na, d, da = 0, 3.0, 3, 1, 0, 0, 24 while s != lasts: lasts = s n, na = n+na, na+8 d, da = d+da, da+32 t = (t * n) / d s += t return s for i in range(100000): x = pi_float() $ time ./micropython xxx.py real 0m4.424s user 0m4.406s sys 0m0.016s $ $ time ../../cpython/python xxx.py real 0m1.066s user 0m1.056s sys 0m0.010s Congratulations ... Stefan Krah
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 6 Nov 2017 11:36:59 +0100 Stefan Krah <stefan@bytereef.org> wrote:
[]
$ time ./micropython xxx.py $ time ../../cpython/python xxx.py
Congratulations ...
That's why I wrote "Python 3.3", it's hard to argue CPython doesn't do anything about the "Python is slow" proverb. It's still shouldn't be hard to construct a testcase where MicroPython is faster (by not implementing features not needed by 90% of Python programs of course, not "for free"). Anyway, where're memory measurements? (This is offtopic, so I shouldn't have replied.)
Stefan Krah
-- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Mon, Nov 06, 2017 at 01:03:05PM +0200, Paul Sokolovsky wrote:
Sorry, that was a slightly mischievous benchmark indeed. -- Whether the proposal is surreal or not depends on the likelihood that a) a substantially faster dict algorithm will emerge and b) CPython, PyPy and Jython will switch to it. My proposal was based on the fact that for almost two release cycles the ordering implementation detail hasn't changed. Stefan Krah
data:image/s3,"s3://crabby-images/05644/056443d02103b56fe1c656455ffee12aa1a01f1f" alt=""
On Mon, Nov 6, 2017 at 10:18 AM, Paul Sokolovsky <pmiscml@gmail.com> wrote: the way it's been presented, but this is hardly an enhancement the language has been screaming for for years. Presumably there is little concern that algorithms that rely on this behaviour will be perfectly syntactically conformant with earlier versions but will fail subtly and without explanation? It's a small concern, but a real one - particularly for learners. regards Steve
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 21:18, Steve Holden <steve@holdenweb.com> wrote:
A similar concern existed when we elevated sort stability to being a language requirement - if you relied on that guarantee, your code was technically buggy on versions prior to 2.3, but eventually 2.2 and earlier aged out of general use, allowing such code to become correct in general. So the current discussion is mainly about deciding where we want the compatibility burden to fall in relation to dict insertion ordering: 1. Do we deliberately revert CPython back to being harder to use correctly for the sake of making Python easier to implement? 2. Do we make Python harder to implement for the sake of making it easier to use? 3. Do we choose not to choose, thus implicitly choosing "2" by default due to the fact that Python is defined by a language spec and a reference implementation, rather than *just* a language spec? Here's a more-complicated-than-a-doctest-for-a-dict-repo, but still fairly straightforward, example regarding the "insertion ordering dictionaries are easier to use correctly" argument: import json data = {"a":1, "b":2, "c":3} rendered = json.dumps(data) data2 = json.loads(rendered) rendered2 = json.dumps(data2) # JSON round trip assert data == data2, "JSON round trip failed" # Dict round trip assert rendered == rendered2, "dict round trip failed" Both of those assertions will always pass in CPython 3.6, as well as in PyPy, because their dict implementations are insertion ordered, which means the iteration order on the dictionaries is always "a", "b", "c". If you try it on 3.5 though, you should fairly consistently see that last assertion fail, since there's nothing in 3.5 that ensures that data and data2 will iterate over their keys in the same order. You can make that code implementation independent (and sufficiently version dependent to pass both assertions) by using OrderedDict: from collections import OrderedDict import json data = OrderedDict(a=1, b=2, c=3) rendered = json.dumps(data) data2 = json.loads(rendered, object_pairs_hook=OrderedDict) rendered2 = json.dumps(data2) # JSON round trip assert data == data2, "JSON round trip failed" # Dict round trip assert rendered == rendered2, "dict round trip failed" However, despite the way this code looks, the serialised key order *might not* be "a, b, c" on 3.5 and earlier (it will be on 3.6+, since that already requires that kwarg order be preserved). So the formally correct version independent code that reliably ensures that the key order in the JSON file is always "a, b, c" looks like this: from collections import OrderedDict import json data = OrderedDict((("a",1), ("b",2), ("c",3))) rendered = json.dumps(data) data2 = json.loads(rendered, object_pairs_hook=OrderedDict) rendered2 = json.dumps(data2) # JSON round trip assert data == data2, "JSON round trip failed" # Dict round trip assert rendered == rendered2, "dict round trip failed" # Key order assert "".join(data) == "".join(data2) == "abc", "key order failed" Getting from the "Works on CPython 3.6+ but is technically non-portable" state to a fully portable correct implementation that ensures a particular key order in the JSON file thus currently requires the following changes: - don't use a dict display, use collections.OrderedDict - make sure to set object_pairs_hook when using json.loads - don't use kwargs to OrderedDict, use a sequence of 2-tuples For 3.6, we've already said that we want the last constraint to age out, such that the middle version of the code also ensures a particular key order. The proposal is that in 3.7 we retroactively declare that the first, most obvious, version of this code should in fact reliably pass all three assertions. Failing that, the proposal is that we instead change the dict iteration implementation such that the dict round trip will start failing reasonably consistently again (the same as it did in 3.5), so that folks realise almost immediately that they still need collections.OrderedDict instead of the builtin dict. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 November 2017 at 03:27, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
sort_keys is only equivalent to order preservation if the key order you want *is* alphabetical order. While that's typically a good enough assumption for JSON, it's not the case for things like CSV column order, TOML or ini-file setting order, etc. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
I think that Paul has a point. Interestingly, at the same time we're talking about guaranteeing the order of dicts, we're talking about using another, unordered, data structure (hash array mapped tries) to improve the performance of something that resembles a namespace. It seems the "unordered" part will be visible through ExecutionContext.vars(). https://www.python.org/dev/peps/pep-0550/#enumerating-context-vars The ordered-ness of dicts could instead become one of those stable CPython implementation details, such as the fact that resources are cleaned up timely by reference counting, that people nevertheless should not rely on if they're writing portable code. Regards Antoine. On Mon, 6 Nov 2017 12:18:17 +0200 Paul Sokolovsky <pmiscml@gmail.com> wrote:
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 12:27:54PM +0100, Antoine Pitrou wrote:
Given that (according to others) none of IronPython, Jython, Batavia, Nuitka, or even MicroPython, should have trouble implementing an insertion-order preserving dict, and that PyPy already has, why should we say it is a CPython implementation detail? -- Steve
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
On Tue, 7 Nov 2017 00:14:35 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
That's not what I'm taking away from Paul Sokolovsky's message I was responding to. If you think otherwise then please expand and/or contact Paul so that things are made clearer one way or the other.
data:image/s3,"s3://crabby-images/1383b/1383b5c083aa3fd057a17c816aec322848ea5a95" alt=""
Hi, Could be missed by me but the guarantee that dict literals are ordered implies not that dict must be ordered in all cases. The dict literal: d = {"a": 1, "b": 2} will keep the order of "a" and "b" because it is specified as a dict literal. But d["c"] = 3 can change this order and it is allowed by the specification of guaranteed ordered dict literals. Please correct me if I am wrong. In Python 3.6 it does not because dict is implemented ordered and the insertion order is preserved. Also I think we should give the whole thing more time and wait with this guarantee. There are valid concerns against this. Personally I like the ordering but if I need an ordered dict it is ok for me to write this explicitly. The **kwargs are already guaranteed to be ordered and this was useful because the OrderedDict constructor benefits and for other places it is useful. But making all dict's per default ordered is another level. Regards, Wolfgang
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 6 Nov 2017 14:30:45 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
For MicroPython, it would lead to quite an overhead to make dictionary items be in insertion order. As I mentioned, MicroPython optimizes for very low bookkeeping memory overhead, so lookups are effectively O(n), but orderedness will increase constant factor significantly, perhaps 5x. Also, arguably any algorithm which would *maintain* insertion order over mutating operations would be more complex and/or require more memory that one which doesn't. (This is based on the idea that this problem is equivalent to maintaining a sorted data structure, where sorting key is "insertion order"). CPython 3.6 gives **kwargs in the key specification order? That's fine if that's all that it says (note that it doesn't say what happens if someone takes **kwargs and starts to "del []" or "[] =" it). That allows different ways to implement it, per particular implementation's choice. That's even implementable in MicroPython. Like, lookups will be less efficient, but nobody realistically passes hundreds of kwargs. It's pretty different to go from that to dictionaries in general. Saying that a general dict always maintains insertion order is saying that "in Python, you have to use complex, memory hungry algorithms for a simple mapping type". Saying something like "dict maintains insertion order until first modification" is going down the rabbit hole, making it all confusing, hard to remember, crazy-sounding to novices. Why all that, if, since the beginning of times, Python offered a structure with guaranteed ordering - list, and structure for efficient mapping one values into other - dict. Then even marriage between the two - OrderedDict. Why suddenly once in 25 years there's a need to do something to dict's, violating computer science background behind them (one of the reason enough people loved Python comparing to other "practical hack" languages)? Alternatives were already presented on the thread - if people want more and easier ordered dictionaries, it calls to add a special literal initializer for them ("o{}" was proposed). -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/e87f3/e87f3c7c6d92519a9dac18ec14406dd41e3da93d" alt=""
On Mon, 6 Nov 2017 at 08:36 Paul Sokolovsky <pmiscml@gmail.com> wrote:
But that's an implementation detail that most folks won't care about.
I don't understand what "computer science background" is being violated?
I don't see that happening. If OrderedDict didn't lead to syntax then I don't see why that's necessary now. I think worrying about future optimizations that order preservation might prevent is a premature optimization/red herring. None of us can predict the future so worrying about some algorithmic breakthrough that will suddenly materialize on how to implement dictionaries seems unnecessary. That line of thought suggests we should guarantee as little as possible and just make most stuff undefined behaviour. I also think Paul S. has made it clear that MicroPython won't be implementing order-preserving dicts due to their memory constraints. That's fine, but I think we also need to realize that MicroPython is already a slight hybrid by being based on Python 3.4 but with a backport of async/await and a subset of the stdlib.
From what I have read, this argument breaks down to this:
Standardizing order preserving and ... - MicroPython won't implement it, but all other major implementations will - Those that have order preservation will implement PEPs 468 and 520 for free - Some minor practical benefits in terms of avoiding accidental reliance on ordering If we don't promise order preservation ... - CPython should probably add some perturbing to the iteration order - All implementations can handle this without issue or major implementation costs So to me this sounds like a decision between the pragmatism of choosing semantics that all Python implementations can handle or the developer benefits of an order-preserving dict everywhere (all the other arguments I think are minor compared to these two).
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 06 Nov 2017 17:58:47 +0000 Brett Cannon <brett@python.org> wrote: []
I tried to explain that in the previous mail, can try a different angle. So, please open you favorite CS book (better few) and look up "abstract data types", then "mapping/associative array" and "list". We can use Wikipedia too: https://en.wikipedia.org/wiki/Associative_array. So, please look up: "Operations associated with this data type allow". And you'll see, that there're no "ordering" related operations are defined. Vice versa, looking at "sequence" operations, there will be "prev/next", maybe "get n'th" element operations, implying ordering. Python used to be a perfect application of these principles. Its dict was a perfect CS implementation of an abstract associative array, and list - of "sequence" abstract type (with additional guarantee of O(1) random element access). People knew and rejoiced that Python is built on solid science principles, or could *learn* them from it. That no longer will be true, with a sound concept being replaced with on-the-spot practical hack, choosing properties of a random associative array algorithm implementation over properties of a superset of such algorithms (many of which are again don't offer any orderness guarantees). I know though what will be replied (based on the replies below): "all these are implementation details" - no, orderness vs non-orderness of a mapping algorithm is an implementation detail; "users shouldn't know all that" - they should, that's the real knowledge, and up until now, they could learn that from *Python docs*, "we can't predict future" - we don't need, we just need to know the past (25 years in our case), and understand why it was done like that, I don't think Guido couldn't code it ordered in 1991, it's just not natural for a mapping type to be so, and in 2017, it's not more natural than it was in 1991. MicroPython in particular appeared because Python offered all the CS-sound properties and freedom and alternative choices for implementation (more so than any other scripting language). It's losing it, and not just for MicroPython's surprise. [] -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/e87f3/e87f3c7c6d92519a9dac18ec14406dd41e3da93d" alt=""
On Mon, 6 Nov 2017 at 11:08 Paul Sokolovsky <pmiscml@gmail.com> wrote:
I don't think you meant for this to come off as insulting, but telling me how to look up the definition of an associative array or map feels like you're putting me down. I also have a Ph.D. in computer science so I'm aware of the academic definitions of these data structures.
People knew and rejoiced that Python is built on solid science principles, or could *learn* them from it.
That no longer will be true,
I don't think it's fair to call the current dict implementation a hack. It's a sound design that has a certain property that we are discussing the masking of. As I said previously, I think this discussion comes down to whether we think there are pragmatic benefits to exposing the ordered aspects to the general developer versus not. -Brett
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that. On Nov 6, 2017 7:39 PM, "Brett Cannon" <brett@python.org> wrote:
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
On Nov 6, 2017, at 8:05 PM, David Mertz <mertz@gnosis.cx> wrote:
I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that.
I think this post is dismissive of the value that users would get from having reliable ordering by default. Having worked with Python 3.6 for a while, it is repeatedly delightful to encounter the effects of ordering. When debugging, it is a pleasure to be able to easily see what has changed in a dictionary. When creating XML, it is joy to see the attribs show in the same order you added them. When reading a configuration, modifying it, and writing it back out, it is a godsend to have it written out in about the same order you originally typed it in. The same applies to reading and writing JSON. When adding a VIA header in a HTTP proxy, it is nice to not permute the order of the other headers. When generating url query strings for REST APIs, it is nice have the parameter order match documented examples. We've lived without order for so long that it seems that some of us now think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly. Raymond P.S. Especially during debugging, it is often inconvenient, difficult, or impossible to bring in an OrderedDict after the fact or to inject one into third-party code that is returning regular dicts. Just because we have OrderedDict in collections doesn't mean that we always get to take advantage of it. Plain dicts get served to us whether we want them or not.
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
I agree with Raymond. dict ordered by default makes better developer experience. So, my concern is how "language spec" is important for minor (sorry about my bad vocabulary) implementation? What's difference between "MicroPython is 100% compatible with language spec" and "MicroPython is almost compatible with Python language spec, but has some restriction"? If it's very important, how about "strong recommendation for implementations" instead of "language spec"? Users who don't care implementations other than CPython and PyPy can rely on it's usability. Regards, INADA Naoki <songofacandy@gmail.com> On Tue, Nov 7, 2017 at 2:11 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Tue, 7 Nov 2017 14:40:07 +0900 INADA Naoki <songofacandy@gmail.com> wrote:
So, the problem is that there's no "Python language spec". And over time, that becomes a problem for alternative implementations, especially not mainstream ("we have infinite amount of memory to burn") ones. What we have is just CPython documentation, which mixes Python language spec and CPython implementation details. And is being changed (including "language spec" part) on the fiat of CPython developers, apparently without any guarantees of platform stability and backward compatibility. Over time, this really becomes a visible drawback, comparing to the close competitors. For example, year goes by year, but in JavaScript, [] + [] is still: '' That's stability! [] -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 09:39, Paul Sokolovsky <pmiscml@gmail.com> wrote:
So, the problem is that there's no "Python language spec”.
There is a language specification: https://docs.python.org/3/reference/index.html But there are still corners that are undocumented, or topics that are deliberately left as implementation details. Cheers, -Barry
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Nov 7, 2017 12:02 PM, "Barry Warsaw" <barry@python.org> wrote: On Nov 7, 2017, at 09:39, Paul Sokolovsky <pmiscml@gmail.com> wrote:
So, the problem is that there's no "Python language spec”.
There is a language specification: https://docs.python.org/3/refe rence/index.html But there are still corners that are undocumented, or topics that are deliberately left as implementation details. Also, specs don't mean that much unless there are multiple implementations in widespread use. In JS the spec matters because it describes the common subset of the language you can expect to see across browsers, and lets the browser vendors coordinate on future changes. Since users actually target and test against multiple implementations, this is useful. In python, CPython's dominance means that most libraries are written against CPython's behavior instead of the spec, and alternative implementations generally don't care about the spec, they care about whether they can run the code their users want to run. So PyPy has found that for their purposes, the python spec includes all kinds of obscure internal implementation details like CPython's static type/heap type distinction, the exact tricks CPython uses to optimize local variable access, the CPython C API, etc. The Pyston devs found that for their purposes, refcounting actually was a mandatory part of the python language. Jython, MicroPython, etc make a different set of compatibility tradeoffs again. I'm not saying the spec is useless, but it's not magic either. It only matters to the extent that it solves some problem for people. -n
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic. I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property". On 11/07/2017 12:58 PM, Barry Warsaw wrote:
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 7 November 2017 at 20:35, Paul G <paul@ganssle.io> wrote:
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".
This seems like overkill to me. By the same logic, we should add a "delay garbage collection" mode, that allows people to test that their code doesn't make unwarranted assumptions that a reference-counting garbage collector is in use. Most public projects (which are the only ones that really need to worry about this sort of detail) will probably be supporting Python 3.5 and likely even Python 2.7 for some time yet. So they test under non-order-preserving dictionary implementations anyway. And if code that's only targeted for Python 3.7 assumes order preserving dictionaries, it's likely not got a huge user base anyway, so what's the problem? Paul
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
@ Paul Moore
But you can get pretty fine-grained control of garbage collection with judicious use of gc.disable(). If there were a similar mechanism for changing the ordering properties of dictionaries in code, I wouldn't suggest it as an interpreter flag / option. And you're right - it's not pressing - the people likely to test with dictionaries scrambled are exactly the people likely to be supporting 2.7 and 3.5, but that won't be true forever, and it would be nice to have *some* mechanism to test that you're not relying on this property. @Barry Warsaw
As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order.
This was my suggestion of a middle way, since deliberate randomization seems like a step too far just to avoid people relying on implementation details. I've seen plenty of code that assumes that `assert` statements will always throw `AssertionError`, but that's not guaranteed, and some people run their test suites with -O just to check that they aren't making that assumption. On 11/07/2017 04:15 PM, Paul Moore wrote:
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
This seems like overkill to me. By the same logic, we should add a "delay garbage collection" mode, that allows people to test that their code doesn't make unwarranted assumptions that a reference-counting garbage collector is in use. Actually, there is a LOT of code out there that expects reference counting. I know a lot of my code does. So if cPython does abandon it some day, there will be issues. Another thought: Let’s say Python declares that dict literals are order preserving. Then, one day, someone invents a massively more efficient non-order preserving implementation, and we want to use it for Python 4. So the Python 4 language spec says that dicts are not order preserving. And this is flagged as an INCOMPATIBLE CHANGE. Now everyone knows to go and check their code, and the 3to4 tool adds an “o” to all dict literals. People will complain, but it won’t be unexpected breakage. Compare to leaving it as an implementation detail — now we will have a lot of code in the wild that expects order-preservation (because people don’t read the language spec) that will break with such a change without any real warning. I think we really do need to accept that cPython is a reference implementation. Because it is. By the way, I only just realized I can delete a key to demonstrate non-order-preservation on py 3.6. So at least I know what to tell students now. -CHB But you can get pretty fine-grained control of garbage collection with judicious use of gc.disable(). If there were a similar mechanism for changing the ordering properties of dictionaries in code, I wouldn't suggest it as an interpreter flag / option. And you're right - it's not pressing - the people likely to test with dictionaries scrambled are exactly the people likely to be supporting 2.7 and 3.5, but that won't be true forever, and it would be nice to have *some* mechanism to test that you're not relying on this property. @Barry Warsaw As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order. This was my suggestion of a middle way, since deliberate randomization seems like a step too far just to avoid people relying on implementation details. I've seen plenty of code that assumes that `assert` statements will always throw `AssertionError`, but that's not guaranteed, and some people run their test suites with -O just to check that they aren't making that assumption. On 11/07/2017 04:15 PM, Paul Moore wrote: On 7 November 2017 at 20:35, Paul G <paul@ganssle.io> wrote: If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic. I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property". Most public projects (which are the only ones that really need to worry about this sort of detail) will probably be supporting Python 3.5 and likely even Python 2.7 for some time yet. So they test under non-order-preserving dictionary implementations anyway. And if code that's only targeted for Python 3.7 assumes order preserving dictionaries, it's likely not got a huge user base anyway, so what's the problem? Paul _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 16:15, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Actually, there is a LOT of code out there that expects reference counting. I know a lot of my code does. So if cPython does abandon it some day, there will be issues.
I see this all the time in code reviews: content = open(some file).read() I never let that go uncommented. So while you’re right that CPython is the reference implementation, and few people read the language spec, it’s still encombunt on us to point out broken code, code with implicit assumptions, and code that is not portable between implementations. Having the reference manual to point to chapter and verse is critical to avoid Python just devolving into an ad-hoc language ruled by its most popular implementation. This is something I believe Guido realized early on when JPython was first invented, and it was an important distinction that Python held, e.g. versus Perl. I still believe it’s an important principle to maintain. Cheers, -Barry
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 8 November 2017 at 07:15, Paul Moore <p.f.moore@gmail.com> wrote:
Quite a few projects these days include PyPy in their CI test matrix, and one of the things that does is test whether or not you're relying on refcounting semantics. We also offer ResourceWarning natively in CPython, which means if you run under "python3 -Wd", you'll get a warning when external resources like files are cleaned up implicitly: $ python3 -Wd Python 3.6.2 (default, Oct 2 2017, 16:51:32) [GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> f = open(".bashrc") >>> del f __main__:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='.bashrc' mode='r' encoding='UTF-8'> >>>
The concern is that if we don't explicitly perturb dict iteration order (or offer a way to opt-in to that), then insertion ordering will end up becoming a *de facto* compatibility requirement for Python implementations as CPython 2.7 and other releases prior to 3.6 start dropping out of typical test matrices. With both 2.7 and 3.5 going end-of-life in 2020, that means 3.7 (mid 2018) and 3.8 (late 2019 or early 2020) are our available opportunities to make that decision - beyond that, it starts getting a lot harder to change course away from implicit standardisation, as there'll be a lot more 3.6+-only code in the world by then. The way golang handled this problem is in their dict iterator: they added an extra field to hold a randomly generated hash, and used that hash as the starting point in their iteration sequence. We wouldn't be able to implement per-iterator order randomisation in CPython due to the way the PyDict_Next API works: that uses a caller-provided Py_ssize_t entry to track the current position in the iteration sequence. This means the simplest change we can make is to adjust the code in _PyDict_Next that reads the "current iteration index" from the user supplied pointer to instead interpret that as having a constant offset (e.g. starting with the last item in the "natural" iteration order, and then looping back around to the first one). "Simplest" isn't the same as "simple" though, as: 1. We can't change this globally for all dicts, as we actually *do* need keyword argument dicts and class body execution namespaces to be insertion ordered. That makes it either a per-instance setting, or else a subtly different dict type. 2. So far, I haven't actually come up with a perturbed iteration implementation that doesn't segfault the interpreter. The dict internals are nicely laid out to be iteration friendly, but they really do assume that you're going to start at index zero, and then iterate through to the end of the array. The bounds checking and pointer validity testing becomes relatively fiddly if you try to push against that and instead start iteration from a point partway through the storage array. That second point also becomes a concern from a performance perspective because this is code that runs on *each* iteration of a loop, rather than purely as part of the loop setup. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 8 November 2017 at 11:44, Nick Coghlan <ncoghlan@gmail.com> wrote:
In case anyone else wants to experiment with a proof of concept: https://github.com/ncoghlan/cpython/commit/6a8a6fa32f0a9cd71d9078fbb2b5ea44d... I think we've probably exhausted the utility of discussing this as a purely hypothetical change, and so the only way to move the discussion forward will be for someone to draft a patch that: 1. Perturbs iteration for regular dicts (it's OK for our purposes if it's still deterministic - it just shouldn't match insertion order the way odict does) 2. Switches keyword args and class body execution namespaces over to odict so the test suite passes again 3. Measures the impact such a change would have on the benchmark suite My experiment is a starting point, but it will still be a fair bit of work to get it from there to a viable proof of concept that can be assessed against the status quo. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
For now, odict use twice memory and 2x slower on iteration. https://bugs.python.org/issue31265#msg301942 INADA Naoki <songofacandy@gmail.com> On Wed, Nov 8, 2017 at 11:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 12:35, Paul G <paul@ganssle.io> wrote:
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order. -Barry
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Wed, Nov 8, 2017 at 5:35 AM, Paul G <paul@ganssle.io> wrote:
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".
Namespace is ordered by language spec. What does SCRAMBLE_DICT_ORDER in this code? class A: def __init__(self): self.a, self.b, self.c = 1, 2, 3 a = A() print(a.__dict__) a.__dict__.pop('a') print(a.__dict__) Anyway, I'm -1 on adding such option to dict. dict in CPython is complicated already for performance and compatibility reason. I don't want to add more complexity to dict for such reason. Regards, INADA Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
It seems there must be at least two threads for each topic worth discussing at all. Therefore I feel compelled to point to https://mail.python.org/pipermail/python-dev/2017-November/150381.html, where I state my own conclusion about dict order. I know Paul Sokolovsky does not claim to speak for MicroPython, but I think he had better shut up or he's nevertheless going to damage its reputation beyond repair. I note that micropython.org claims "MicroPython aims to be as compatible with normal Python as possible to allow you to transfer code with ease from the desktop to a microcontroller or embedded system." To me this implies that it is entirely up to the MicroPython project to decide what they'll do about the order of dict elements -- they can keep doing what they are doing, or choose a new dict implementation that satisfies their space and performance needs while also preserving order, or give developer a compile-time choice, or give users a choice at startup time, or something else I haven't thought of yet. Any of those options is better than continuing the flamewar that Paul is waging. Finally: the dict type should not be endowed with other parts of the OrderedDict API, not should other API changes to dict be considered. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
On Nov 6, 2017 9:11 PM, "Raymond Hettinger" <raymond.hettinger@gmail.com> wrote:
I think this post is dismissive of the value that users would get from having reliable ordering by default. Dismissive seems like an overly strong word. I recognize I disagree with Raymond on best official semantics. Someone else points out that if someday an "even more efficient unordered dict" is discovered, user-facing "dict" doesn't strictly have to be the same data structure as "internal dict". The fact they are is admittedly an implementation detail also. I've had all those same uses about round-tripping serialization that Raymond mentions. I know the standard work arounds (which are not difficult, but DO require a little extra code if we don't have order). But like Raymond, I make most of my living TEACHING Python. I feel like the extra order guarantee would make teaching slightly harder. I'm sure he feels contrarily. It is true that with 3.6 I can no longer show an example where the dict display is oddly changed when printed. But then, unordered sets also wind up sorting small integers on printing, even though that's not a guarantee. Ordering by insertion order (possibly "only until first deletion") is simply not obvious to beginners. If we had, hypothetically, a dict that "always alphabetized keys" that would be more intuitive to them, for example. Insertion order feels obvious to us experts, but it really is an extra cognitive burden to learners beyond understanding "key/Val association". Having worked with Python 3.6 for a while, it is repeatedly delightful to encounter the effects of ordering. When debugging, it is a pleasure to be able to easily see what has changed in a dictionary. When creating XML, it is joy to see the attribs show in the same order you added them. When reading a configuration, modifying it, and writing it back out, it is a godsend to have it written out in about the same order you originally typed it in. The same applies to reading and writing JSON. When adding a VIA header in a HTTP proxy, it is nice to not permute the order of the other headers. When generating url query strings for REST APIs, it is nice have the parameter order match documented examples. We've lived without order for so long that it seems that some of us now think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly. Raymond P.S. Especially during debugging, it is often inconvenient, difficult, or impossible to bring in an OrderedDict after the fact or to inject one into third-party code that is returning regular dicts. Just because we have OrderedDict in collections doesn't mean that we always get to take advantage of it. Plain dicts get served to us whether we want them or not.
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Tue, Nov 7, 2017 at 7:21 AM, David Mertz <mertz@gnosis.cx> wrote:
But like Raymond, I make most of my living TEACHING Python.
and I make least of my living TEACHING Python ( :-) ),and:
I feel like the extra order guarantee would make teaching slightly harder.
I can't understand how this possibly makes python (or dicts) harder to teach -- you can simply say: "dicts insertion order is preserved" or not mention it at all -- I think most people kind of expect it to be preserved, which is why I (used to )always bring up the lack-of-ordering of dicts early on -- but I suspect I simply won't bother mentioning it if it's decided as a language feature. I'm sure he feels contrarily. It is true that with 3.6 I can no longer show
an example where the dict display is oddly changed when printed.
Exactly! I have a really hard time deciding how to handle this -- explaining that ordering is not guaranteed, but not being able to demonstrate it! And frankly, my students are all going to forget what I "explained" soon enough, and replace it with their experience -- which will be that dicts retain their order. But then, unordered sets also wind up sorting small integers on printing,
even though that's not a guarantee.
but it's not hard to make an easy example with order not preserved -- jsut start with a non order example: In [6]: s = {3,7,4} In [7]: s Out[7]: {3, 4, 7} or use other types... And "set" is a mathematical concept that has no oder, whereas the "dictionary" metaphor DOES have order...
Ordering by insertion order (possibly "only until first deletion") is simply not obvious to beginners.
the "only until first deletion" part is really hard -- I hope we don't go that route. But I don't think insertion-order is non-obvious -- particularly with literals.
again, I don't think so -- I kind of agree if dicts did not preserve order in practice -- demonstrating that right out of the gate does help make the "key/Val association" clear -- but if you can't demonstrate it, I think we're looking at more confusion... Maybe I'll ask my students this evening -- this is the first class I'm teaching with py3.6 .... We've lived without order for so long that it seems that some of us now
think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly.
exactly! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 7 November 2017 at 21:13, Chris Barker <chris.barker@noaa.gov> wrote:
But I thought we *had* gone that route. Actually, there's no "route" to go here. We're only talking about documenting the existing semantics that cPython has, and I thought that included no longer guaranteeing insertion order after a delete. Although I can't prove that by experiment at the moment. I don't know whether my confusion above is an argument for or against documenting the behaviour :-) Paul
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote:
I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great.
That's the worst of both worlds. The status quo is that unless we deliberately perturb the dictionary order, developers will come to rely on implementation order (because that's what the CPython reference implementation actually offers, regardless of what the docs say). Consequently: - people will be writing non-portable code, whether they know it or not; - CPython won't be able to change the implementation, because it will break too much code; - other implementations will be pressured to match CPython's implementation. The only difference is that on the one hand we are honest and up-front about requiring order-preserving dicts, and on the other we still require it, but pretend that we don't. And frankly, it seems rather perverse to intentionally perturb dictionary order just to keep our options open that someday there might be some algorithm which offers sufficiently better performance but doesn't preserve order. Preserving order is useful, desirable, often requested functionality, and now that we have it, it would have to be one hell of an optimization to justify dropping it again. (It is like Timsort and stability. How much faster sorting would it have taken to justify giving up sort stability? 50% faster? 100%? We wouldn't have done it for a 1% speedup.) It would be better to relax the requirement that builtin dict is used for those things that would benefit from improved performance. Is there any need for globals() to be the same mapping type as dict? Probably not. If somebody comes up with a much more efficient, non-order- preserving map ideal for globals, it would be better to change globals than dict. In my opinion.
I think you have a different definition of "poor" to me :-) Nick has already done a survey of PyPy (which already has insertion- order preserving dicts), Jython, VOC, and Batavia, and they don't have any problem with this. IronPython is built on C#, which has order- preserving mappings. Nuitka is built on C++, and if C++ can't implement an order-preserving mapping, there is something terribly wrong with the world. Cython (I believe) uses CPython's implementation, as does Stackless. The only well-known implementation that may have trouble with this is MicroPython, but it already changes the functionality of a lot of builtins and core language features, e.g. it uses a different method resolution order (so multiple inheritence won't work right), some builtins don't support slicing with three arguments, etc. I think the evidence is excellent that other implementations shouldn't have a problem with this, unless (like MicroPython) they are targetting machines with tiny memory resources. µPy runs on the PyBoard, which I believe has under 200K of memory. I think we can all forgive µPy if it only *approximately* matches Python semantics. -- Steve
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 November 2017 at 16:21, Steven D'Aprano <steve@pearwood.info> wrote:
While I think "poor" is understating the case, I think "excellent" (which you use later on) is overstating it. My own characterisation would be "at least arguably good enough".
For these, my research only showed that their respective platforms have an order-preserving hashmap implementation available. What's entirely unclear at this point is how switching wholesale to that may impact the *performance* of these implementations (both in terms of speed and memory usage), and how much code churn would be involved in actually making the change. Making builtin dict() order-preserving may also still impose an ongoing complexity cost on these implementations if they end up having to split "the mapping we use for code execution namespaces" away from "the mapping we provide as the builtin dict". (That said, I think at least Jython already makes that distinction - I believe their string-only namespace dict is a separate type, whereas CPython plays dynamic optimisation games inside the regular dict type). So Barry's suggestion of providing an explicit "collections.UnorderedDict" as a consistent spelling for "an associative mapping without any ordering guarantees" is a reasonable one, even if it's primary usage in CPython ends up being to ensure algorithms are compatible with collections that don't provide an inherently stable iteration order, and any associated performance benefits are mostly seen on other implementations. (As Paul S notes, such a data type would also serve a pedagogical purpose when teaching computer science principles)
IronPython is built on C#, which has order- preserving mappings.
I couldn't actually find a clearly suitable existing collection type in the .NET CLR - the one I linked was just the one commonly referenced as "good enough for most purposes". It had some constraints that meant it may not be suitable as a builtin dict type in a Python implementation (e.g. it looked to have a 32-bit length limit).
Right, the other C/C++ implementations that also target environments with at least 128 MiB+ RAM (and typically more) can reasonably be expected to tolerate similar space/speed trade-offs to those that CPython itself makes (and that's assuming they aren't just using CPython's data type implementations in the first place).
It runs on the ESP8266 as well, and that only has 96 kiB data memory. This means we're already talking 3-4 orders of magnitude difference in memory capacity and typical data set sizes between CPython and MicroPython use cases, and that's only accounting for the *low* end of CPython's use cases - once you expand out to multi-terabyte data sets (the very upper end of what a single x86-64 server can handle if you can afford the RAM for it), then we're talking 9-10 orders of magnitude between CPython's high end and MicroPython's low end. So for CPython's target use cases algorithmic efficiency dominates performance, and we afford to invest extra memory usage and startup overhead in service to more efficient data access algorithms. MicroPython's the opposite - you're going to run out of memory for data storage long before algorithmic inefficiency becomes your biggest problem, but wasted bookkeeping memory and startup overhead can cause problems even with small data sets. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/41d1b/41d1b12bb02cdccb4080ecb917b5d7ebb1daa0c6" alt=""
On 07/11/17 04:05, David Mertz wrote:
If there is an ordered guarantee for regular dicts but not for dict literals, which is the subject of this thread, then haven't we got a recipe for the kind of confusion that will lead to the number of questions from newbies going off of the Richter scale? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 06:35:48PM +0200, Paul Sokolovsky wrote:
Paul, it would be good if you could respond to Raymond's earlier comments where he wrote: I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering. The likely reason for the confusion is that they are already have an option for an "ordered array" dict variant that does a brute-force linear search. However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered. See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a... Raymond has also volunteered to assist with this.
I think it would be reasonable to say that builtin dicts only maintain insertion order for insertions, lookups, and changing the value. Any mutation which deletes keys may arbitrarily re-order the dict. If the user wants a stronger guarantee, then they should use OrderedDict. -- Steve
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 6, 2017, at 22:33, Steven D'Aprano <steve@pearwood.info> wrote:
In fact, that *is* leaking CPython’s implementation into the language specification. If by chance CPython’s implementation preserved order even after key deletion, either now or in the future, would that be defined as the ordering guarantees for built-in dict in the Python Language Reference? Cheers, -Barry
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Tue, 7 Nov 2017 17:33:03 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
I tried to do that, let me summarize previous point and give IMHO re: contributing an alternative: MicroPython's dict implementation is optimized for the least bookkeeping overhead, not performance on overlarge datasets. For the heap sizes we target (64KB on average), that's a good choice (put it differently, MicroPython's motto is "system's memory (all bunch kilobytes of it) belongs to user, not to MicroPython"). Adding insertion order would either: 1. Lead to significant (several times) slowdown, or 2. To noticeable memory overhead. Note that MicroPython uses the absolute minimum for a dictionary entry - 2 words (key and value). Adding even one extra word (e.g. some indirection pointer) means increasing overhead by 50%, cutting useful user memory size by third. With all that in mind, MicroPython is a very configurable project (215 config options as of this moment). We can have a config option for dict implementation too. But, the points above still hold - MicroPython targets low-memory systems and doesn't really target plenty-of-memory systems (there was never an aim to compete with CPython, the aim was to bring Python (the language) where CPython could never go). Put it another way, the alternative dict implementation is not expected to be used by default. If, with all the above in mind, someone, especially a CPython developer, wants to contribute an alternative dict implementation, it would be gladly accepted. (Note that if CPython followed the same policy, i.e. allowed compile-time selection of old vs new dict algorithm, we wouldn't have this thread.) (Disclaimer: all the above is just my IMHO as a long-time contributor, I'm not a MicroPython BDFL). And I really appreciate all the attention to MicroPython - that's the biggest case on my memory on python-dev. [] -- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 6, 2017, at 02:18, Paul Sokolovsky <pmiscml@gmail.com> wrote:
This is the classic argument of, do you proceed conservatively and use the lowest-common denominator that makes your code work with the widest range of versions, or do you ride the wave and adopt the latest and greatest features as soon as they’re available? Neither answer is wrong or right… for everyone. It’s also a debate as old as the software industry. Every package, project, company, developer, community will have to decide for themselves. Once you realize you can’t win, you’ve won! :) -Barry
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
Is there a major objection to just adding in explicit syntax for order-preserving dictionaries? To some extent that seems like a reasonable compromise position in an "explicit is better than implicit" sense. A whole lot of code is out there that doesn't require or expect order-preserving dictionaries - it would be nice to be able to differentiate out the parts where order actually *does* matter. (Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit). On 11/06/2017 01:19 PM, Barry Warsaw wrote:
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 6, 2017, at 11:23, Paul G <paul@ganssle.io> wrote:
Is there a major objection to just adding in explicit syntax for order-preserving dictionaries?
I don’t think new syntax is necessary. We already have OrderedDict, which to me is a perfectly sensible way to spell “I need a mapping that preserves insertion order”, and the extra import doesn’t bother me. I’m not saying whether or not to make the language guarantee that built-in dict preserves order. I’m just saying that if we don’t make that language change, we already have everything we need to support both use cases. If we did make the change, it’s possible we would need a way to explicit say that order is not preserved. That seems a little weird to me, but I suppose it could be useful. I like the idea previously brought up that iteration order be deliberately randomized in that case, but we’d still need a good way to spell that. Cheers, -Barry
data:image/s3,"s3://crabby-images/5dd46/5dd46d9a69ae935bb5fafc0a5020e4a250324784" alt=""
Hello, On Mon, 6 Nov 2017 11:33:10 -0800 Barry Warsaw <barry@python.org> wrote: []
If we did make the change, it’s possible we would need a way to explicit say that order is not preserved. That seems a little weird
I recently was working on a more or less complex dataflow propagation problem. It should converge to a fixed point, and it did, but on different runs, to different ones. So, I know that I'm a bad programmer, need to do more of my homework and grow. I know, that if I rewrite it in C++ or C, it'll work unstable the same way, because it's buggy. (Heck, over these years, I learned that I don't need to rewrite things in C/C++, because Python is the *real* language, which works the way computers do, without sugaring that up). I need to remember that, because with Python 3.7, I may become a good-programmer-in-a-ponyland-of-ordered-dicts. Btw, in all this discussion, I don't remember anyone mentioning sets. I don't recall the way they're implemented in CPython, but they have strong conceptual and semantic resemblance to dict's. So, what about them, do they become ordered too?
Cheers, -Barry
-- Best regards, Paul mailto:pmiscml@gmail.com
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 11:33:10AM -0800, Barry Warsaw wrote:
Useful for what? Given that we will hypothetically have order-preserving dicts that perform no worse than unordered dicts, I'm struggling to think of a reason (apart from performance) why somebody would intentionally use a non-ordered dict. If performance was an issue, sure, it makes sense to have a non-ordered dict for when you don't want to pay the cost of keeping insertion order. But performance seems to be a non-issue. I can see people wanting a SortedDict which automatically sorts the keys into some specified order. If I really work at it, I can imagine that there might even be a use-case for randomizing the key order (like calling random.shuffle on the keys). But if you are willing to use a dict with arbitrary order, that means that *you don't care* what order the keys are in. If you don't care, then insertion order should be no better or worse than any other implementation-defined arbitrary order.
That would only be in the scenario that we decide *not* to guarantee insertion-order preserving semantics for dicts, in order to prevent users from relying on an implementation feature that isn't a language guarantee. -- Steve
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Mon, Nov 6, 2017 at 11:23 AM, Paul G <paul@ganssle.io> wrote:
This is a really key point -- a heck of a lot more people use cPython than read the language spec. And a great deal of code is developed with a certain level of ignorance -- try something, if it works, and your test pass (if there are any), then you are done. So right now, there is more an more code out there that relies on a regular old dcit being ordered. I've been struggling with teaching this right now -- my written-a-couple-years ago materials talk about dicts being arbitrary order, and then have a little demo of that fact. Now I'm teaching with Python 3.6, and I had to add in something like: cPython does, in fact, preserve order with dicts, but it should be considered an implementation detail, and not counted on ... (and by the say, so does PyPy, and ....)" I don't know, but I'm going to guess about 0% of my class is going to remember that... And if we added o{,,,} syntax it would be even worse, 'cause folks would forget to use it, as their code wouldn't behave differently (kind of like the 'b' flag on unix text files, or the u"string" where everything is ascii in that string...) in short -- we don't have a choice (unless we add an explicit randomization as some suggested -- but that just seems perverse...) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 7 November 2017 at 09:23, Chris Barker <chris.barker@noaa.gov> wrote:
in short -- we don't have a choice (unless we add an explicit randomization as some suggested -- but that just seems perverse...)
And this is the key point for me: "choosing not to choose" is effectively the same as standardising the feature, as enough Python code will come to rely on CPython's behaviour that most alternative implementations will feel obliged to start behaving the same way CPython does (with MicroPython being the potential exception due to memory usage constraints always winning over algorithmic efficiency concerns in that context). We added ResourceWarning a while back to help discourage reliance on CPython promptly calling __del__ methods when dropping the last reference to an object. An equivalent for this case would be for dict objects to randomize iteration (ala Go), once again requiring folks to opt-in via collections.OrderedDict to get guaranteed ordering (perhaps with a "o{}" dict display as new syntactic sugar). But unless someone actually writes a PEP and implementation for that in the next 12 weeks (and Guido accepts it), then we'll have 2 releases and 3 years of CPython working a particular way increasing the inertia against making such a change in 3.8 (and beyond that, I'd say we'd be well and truly into de facto standardisation territory, and the chances of ever introducing deliberate perturbation of dict iteration order would drop to nil). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 7 November 2017 at 06:39, Nick Coghlan <ncoghlan@gmail.com> wrote:
Personally, I think that having an ordered implementation and then deliberately breaking that ordering is pretty silly. Not least because we chose not to do that for 3.6. So we're left with the simple question of whether we make the behaviour required in the documentation (which is basically where this thread started). I see 3 options. First, we maintain the status quo, treat it as a CPython/PyPy implementation detail, and accept that this means that people will expect it - resulting in pressure on alternative implementations to conform, but without a language spec to support them. Second, we document the requirement in the language spec, requiring alternative implementations to either implement it or document it as a way in which they don't confirm to the language spec (which is unattractive for them, as "implements the official Python language spec" is a selling point). Or third, we could document it as an optional behaviour, that language implementations are expected to implement if possible, and if they can't they should document the variance. That is to some extent the best of both worlds, in that it allows implementations to claim conformance with the spec while still not implementing the feature if it causes them problems to do so - they just have to document that they don't implement this optional but recommended behaviour. The downside of this option is that there's no precedent for it in the Python spec (it's basically C's "implementation defined behaviour", which is something Python hasn't needed this far). Paul
data:image/s3,"s3://crabby-images/2c29a/2c29a132620d9d96e8c9dfafc1a72fd807b973db" alt=""
On Tue, 2017-11-07 at 16:39 +1000, Nick Coghlan wrote:
Maybe an UnorderedDict could be added which Python implementations _can_ implement as an optimized (less memory use, faster, ...) version without ordering guarantees if they have a need for it. In other implementations it could just be a synonym for a regular dict. That way it would be explicit that the programmer doesn't care about the ordered behaviour. It would also avoid current mistakes some (especially those new to the language and occasional users) make, because of assumptions from default dict behaviour. (Maybe a commandline switch or other mechanisms to explicitly use that UnorderedDict as the default could also be useful. It would be a no-op in implementations which don't have differing implementations.) -- Jan Claeys
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
On 6 November 2017 at 17:23, Paul G <paul@ganssle.io> wrote:
Is there a major objection to just adding in explicit syntax for order-preserving dictionaries? To some extent that seems like a reasonable compromise position in an "explicit is better than implicit" sense. A whole lot of code is out there that doesn't require or expect order-preserving dictionaries - it would be nice to be able to differentiate out the parts where order actually *does* matter.
(Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit).
I think the additional syntax have the added benefit of preventing code that relies on the ordering of dict literals to be run on older versions, therefore triggering subtle bugs that had already being mentioned. And also, forgot along the discussion, is the big disadvantage that other Python implementations would have a quite significant overhead on mandatory ordered dicts. One that was mentioned along the way is transpilers, with Brython as an example - but there might be others. MircoPython is far from being the only implementation affected. js -><-
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 10:17:23PM -0200, Joao S. O. Bueno wrote:
I don't think that is correct. Nick already did a survey, and found that C# (IronPython), Java (Jython and VOC) and Javascript (Batavia) all have acceptable insertion-order preserving mappings. C++ (Nuitka) surely won't have any problem with this (if C++ cannot implement an efficient order-preserving map, there is something terribly wrong with the world). As for other languages that somebody might choose to build Python on (the Parrot VM, Haskell, D, Rust, etc) surely we shouldn't be limiting what Python does for the sake of hypothetical implementations in "underpowered" languages? I don't mean to imply that any of those examples are necessarily underpowered, but if language Foo is incapable of supporting an efficient ordered map, then language Foo is simply not good enough for a serious Python implementation. We shouldn't allow Python's evolution to be hamstrung by the requirement to support arbitrarily weak implementation languages.
One that was mentioned along the way is transpilers, with Brython as an example - but there might be others.
Since Brython transpiles to Javascript, couldn't it use the standard Map object, which preserves insertion order? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj... Quote: Description A Map object iterates its elements in insertion order The EMCAScript 6 standard specifies that Map.prototype.forEach operates over the key/value pairs in insertion order: https://tc39.github.io/ecma262/#sec-map-objects -- Steve
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote:
I disagree -- the history of Python shows that having dicts be unordered is a PITA for many Python programmers. Python eventually gained an ordered dict because it provides useful functionality that developers demand. Every new generation of Python programmers comes along and gets confused by why dicts mysteriously change their order from how they were entered, why doctests involving dicts break, why keyword arguments lose their order, why they have to import a module to get ordered dicts instead of having it be built-in, etc. Historically, we had things like ConfigParser reordering ini files when you write them. Having dicts be unordered is not a positive virtue, it is a limitation. Up until now, it was the price we've paid for having fast, O(1) dicts. Now we have a dict implementation which is fast, O(1) and ordered. Why pretend that we don't? This is a long-requested feature, and the cost appears to be small: by specifying this, all we do is rule out some, but not all, hypothetical future optimizations. Unordered dicts served CPython well for 20+ years, but I doubt many people will miss them.
Trading off space for time is a very common practice. You said that lookups on MicroPython's dicts are O(N). How efficient is µPy when doing a lookup of a dict with ten million keys? µPy has chosen to optimize for space, rather than time. That's great. But I don't think you should sneer at CPython's choice to optimize for time instead. And given that µPy's dicts already fail to meet the expected O(1) dict behviour, and the already large number of functional differences (not just performance differences) between µPy and Python: http://docs.micropython.org/en/latest/pyboard/genrst/index.html I don't think that this will make much practical difference. MicroPython users already cannot expect to run arbitrary Python code that works in other implementations: the Python community is fragmented between µPy code written for tiny machines, and Python code for machines with lots of memory.
It will no more be a "leak" than any other deliberate design choice.
despite the fact that there can be unlimited number of dictionary algorithms, most of them not having that property.
Sure. So what? There's an unlimited number of algorithms that don't provide the functionality that we want. There are an unlimited number of sort algorithms, but Python guarantees that we're only going to use those that are stable. Similar applies for method resolution (which µPy already violates), strings, etc.
What it will lead to is further fragmentation of the community.
Aren't you concerned about fragmenting the community because of the functional differences between MicroPython and the specs? Sometimes a small amount of fragmentation is unavoidable, and not necessarily a bad thing.
Given that you state that µPy dicts are O(N) and unordered, does that mean you picked only 1 out of 3? -- Steve
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 4, 2017, at 11:35, Guido van Rossum <guido@python.org> wrote:
This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee.
The other concern was backward compatibility issues. For example, if 3.7 makes this guarantee official, and folks write code with this assumption that has to work with older versions of Python, then that code could be buggy in previous versions and work in 3.7. This will probably be most evident in test suites, and such failures are often mysterious to debug (as we’ve seen in the past). That doesn’t mean we shouldn’t do it, but it does mean we have to be careful and explicit to educate users about how to write safe multi-Python-version code. Cheers, -Barry
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Yup. At least such code will not break in 3.5. In general if you write code using a newer version you should expect arbitrary breakage when trying to run it under older versions. There is no compatibility guarantee in that direction for anything anyways. I don't see this as a reason to not put this into the language spec at 3.7. On Sun, Nov 5, 2017 at 9:37 AM, Barry Warsaw <barry@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python. It's possible that no implementations of Python (either future CPython versions or current/future alternative interpreters) will find any reason to use anything but an insertion-order sorted dictionary, but given that we've done just fine with arbitrary-order semantics for the entire lifetime of the language /and/ there is a container (OrderedDict) which has guaranteed order semantics, it doesn't seem worth it to me. I think I would mostly be concerned with (in terms of likeliness to occur): 1. An edge case we haven't thought of where arbitrary order dictionaries would allow some crticial optimization for a given platform (perhaps in someone writing a transpiler to another language where the convenient equivalent container has arbitrary order, e.g. if Brython wants to implement dicts in terms of objects - https://stackoverflow.com/questions/5525795/does-javascript-guarantee-object... ) 2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation 3. Some sort of bug or vulnerability is discovered that makes insertion-ordered dictionaries an unwise choice (similar to the hash collision vulnerability that necessitated hash randomization - https://stackoverflow.com/questions/14956313#14959001 ). Perhaps these concerns are overblown, but if indeed guaranteed-order Mapping literals are critical in some or many cases, maybe it would be preferable to introduce new syntax for OrderedDict literals. Best, Paul On 11/05/2017 12:50 PM, Guido van Rossum wrote:
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python.
Scientific applications want something like {'a': 10, 'b': "foo", 'c': {'this': b'123'}} as an ordered initializer for unboxed or typed (or both) data. In general, if dicts are ordered, they can be used for example as initializers for (nested) C structs.
2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation
I would think this is very unlikely, given that the previous dict implementation has always been very fast. The new one is very fast, too. Stefan Krah
data:image/s3,"s3://crabby-images/edc98/edc9804a1e6f2ca62f3236419f69561516e5074d" alt=""
I can understand why you'd want an ordered container, I just don't see why it must be a dict. Why can't it be: OrderedDict(a=10, b='foo', c=OrderedDict(this=b'123')) Is it just that you don't want to type OrderedDict that many times? If it's so important to provide ordered dictionary literals, I would think it's a no-brainer to give them their own literal syntax (rather than re-defining dicts to have guaranteed order). e.g.: o{'a': 10, 'b': 'foo', 'c': o{'this': b'123'} Then there is no backwards incompatibility problem and users can express whether order does or does not matter to them when initializing a container. On 11/05/2017 01:39 PM, Stefan Krah wrote:
On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python.
data:image/s3,"s3://crabby-images/98c42/98c429f8854de54c6dfbbe14b9c99e430e0e4b7d" alt=""
05.11.17 21:30, Stefan Krah пише:
I didn't try to implement this. But the current implementation requires periodical reallocating if add and remove items. The following loop reallocates the dict every len(d) iterations, while the size of the dict is not changed, and the half of its storage is empty. while True: v = d.pop(k) ... d[k] = v
data:image/s3,"s3://crabby-images/cd7e0/cd7e070dde02a16b1e5e428904c488e52e05ce40" alt=""
I think the question of whether any specific implementation of dict could be made faster for a given architecture or even that the trade-offs made by CPython are generally the right ones is kinda beside the point. It's certainly feasible that an implementation that does not preserve ordering could be better for some implementation of Python, and the question is really how much is gained by changing the language semantics in such a way as to cut off that possibility. On 11/05/2017 02:54 PM, Serhiy Storchaka wrote:
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Nov 6, 2017 at 4:54 AM, Serhiy Storchaka <storchaka@gmail.com> wrote: ...
FYI, Raymond's original compact dict (moving last item to slot used for deleted item) will break OrderedDict. So it's not easy to implement than it looks. OrderedDict uses linked list to keep which slot is used for the key. Moving last item will break it. It means odict.__delitem__ can't use PyDict_DelItem anymore and OrderedDict should touch internal structure of dict. I think current OrderedDict implementation is fragile loose coupling. While two object has different file (dictobject.c and odictobject.c), OrderedDict depends on dict's internal behavior heavily. It prevents optimizing dict. See comment here. https://github.com/python/cpython/blob/a5293b4ff2c1b5446947b4986f98ecf5d5243... I don't have strong opinion about what should we do about dict and OrderedDict. But I feel PyPy's approach (using same implementation and just override __eq__ and add move_to_end() method) is most simple. Regards, INADA Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/98c42/98c429f8854de54c6dfbbe14b9c99e430e0e4b7d" alt=""
06.11.17 05:01, INADA Naoki пише:
All this is implementation details. We did have little time for coordinated changing dict and OrderedDict implementations before releasing 3.6, and left existing coupling. This also prevents from in-place compactization of dict items. We could add a private flag to dict that denotes whether this dict is ordered or no. The ordinal dict could be more compact, while OrderedDict left ordered. And I like your issue31265. The current dict implementation still is young. It takes several optimizations in 3.7 (thank to you Inada) and AFAIK there are still not merged patches. I would wait until it become more stable before making change in language specification.
I think that the coupling with dict and OrderedDict should be more robust, but left private. Dict implementation should be aware of OrderedDict and provide some private methods for using in OrderedDict, but not restrict the optimization of unordered dicts.
data:image/s3,"s3://crabby-images/fac1f/fac1f353137c52e340a2ea0ffa4fe887a42e016d" alt=""
On 6 November 2017 at 06:09, Serhiy Storchaka <storchaka@gmail.com> wrote:
I would personally be happy with this as the guarantee (it covers dict literals and handles PEP 468), but it might be more confusing. "dicts are in arbitrary order" and "dicts maintain insertion order" are fairly simple to explain, "dicts maintain insertion order up to the point that a key is deleted" is less so. Tim Delaney
data:image/s3,"s3://crabby-images/b3d87/b3d872f9a7bbdbbdbd3c3390589970e6df22385a" alt=""
2017-11-05 18:50 GMT+01:00 Guido van Rossum <guido@python.org>:
I don't see this as a reason to not put this into the language spec at 3.7.
It can prevent some kinds of optimizations. Dictionaries are used "everywhere" in Python, so they are very important for performance. I would prefer to only keep the ordering warranty for function keyword arguments and class members, and use explicitly an ordered dictionary when needed. Sorry, I don't have any example right now of a concrete optimization which would not be possible with ordered dictionary. But Serhiy mentioned the performance impact of ordering in Python 3.6 dictionary on deletion. Victor
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 6 November 2017 at 18:46, Victor Stinner <victor.stinner@gmail.com> wrote:
Note that I *don't* think we should mandate that regular dictionaries be synonymous with collections.OrderedDict - I think it's fine to say that regular dicts are insertion ordered *until* a key is deleted, and after that their ordering is arbitrary. Insertion-ordered-until-the-first-delete is sufficient to guarantee serialisation round trips, dict display ordering, and keyword order preservation in the dict constructor (it's not sufficient to guarantee ordering for class namespaces, as those may contain "del" statements, but class bodies are also permitted to use a non-default mapping type). However, if we decide *not* to require that dictionaries be insertion ordered, then I think we should do the same thing Go did, and have dict iterators start at a random offset in the item sequence (that's not actually my idea - Greg Smith suggested it at the core dev sprints). Otherwise we'll end up standardising the 3.6 behaviour by default, as folks come to rely on it, and decline to support Python implementations that don't provide insertion ordering semantics. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
Hi, folks. TLDR, was the final decision made already? If "dict keeps insertion order" is not language spec and we continue to recommend people to use OrderedDict to keep order, I want to optimize OrderedDict for creation/iteration and memory usage. (See https://bugs.python.org/issue31265#msg301942 ) If dict ordering is language spec, I'll stop the effort and use remaining time to another optimizations. My thought is, +1 to make it language spec. * PHP (PHP 7.2 interpreter is faster than Python) keeps insertion order. So even we make it language spec, I think we have enough room to optimize. * It can make stop discussion like "Does X keeps insertion order? It's language spec?", "What about Y? Z?". Everything on top of dict keeps insertion order. It's simple to learn and explain. Regards, INADA Naoki <songofacandy@gmail.com> On Sun, Nov 5, 2017 at 3:35 AM, Guido van Rossum <guido@python.org> wrote:
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
I'm in favor of stating that dict keeps order as part of the language spec. However re-reading https://mail.python.org/pipermail/python-dev/2017-November/150381.html there's still a point of debate: should it be allowed if dict reorders after deletion (presumably as a result of a rehash)? I'm in favor of prescribing that the order should be preserved even in that case, but I realize there's additional implementation work to be done. Inada-san, what do you think of this? --Guido On Thu, Dec 14, 2017 at 6:03 PM, INADA Naoki <songofacandy@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Oh, I just found https://mail.python.org/pipermail/python-dev/2017-November/150323.html so I already know what you think: we should go with "dicts preserve insertion order" rather than "dicts preserve insertion order until the first deletion". I guess we should wait for Serhiy to confirm that he's okay with this. On Thu, Dec 14, 2017 at 6:20 PM, Guido van Rossum <guido@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
I support having regular dicts maintain insertion order but am opposed to Inada changing the implementation of collections.OrderedDict We can have the first without having the second. Over the holidays, I hope to have time to do further analysis and create convincing demonstrations of why we want to keep the doubly-linked list implementation for collections.OrderedDict(). The current regular dictionary is based on the design I proposed several years ago. The primary goals of that design were compactness and faster iteration over the dense arrays of keys and values. Maintaining order was an artifact rather than a design goal. The design can maintain order but that is not its specialty. In contrast, I gave collections.OrderedDict a different design (later coded in C by Eric Snow). The primary goal was to have efficient maintenance of order even for severe workloads such at that imposed by the lru_cache which frequently alters order without touching the underlying dict. Intentionally, the OrderedDict has a design that prioritizes ordering capabilities at the expense of additional memory overhead and a constant factor worse insertion time. It is still my goal to have collections.OrderedDict have a different design with different performance characteristics than regular dicts. It has some order specific methods that regular dicts don't have (such as a move_to_end() and a popitem() that pops efficiently from either end). The OrderedDict needs to be good at those operations because that is what differentiates it from regular dicts. The tracker issue https://bugs.python.org/issue31265 is assigned to me and I currently do not approve of it going forward. The sentiment is nice but it undoes very intentional design decisions. In the upcoming months, I will give it additional study and will be open minded but it is not cool to use a python-dev post as a way to do an end-run around my objections. Back to the original topic of ordering, it is my feeling that it was inevitable that sooner or later we would guarantee ordering for regular dicts. Once we had a performant implementation, the decision would be dominated by how convenient it is users. Also, a single guarantee is simpler for everyone and is better than having a hodgepodge of rules stating that X and Y are guaranteed while Z is not. I think an ordering guarantee for regular dicts would be a nice Christmas present for our users and developers. Cheers, Raymond
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Dec 14, 2017 21:30, "Raymond Hettinger" <raymond.hettinger@gmail.com> wrote:
I support having regular dicts maintain insertion order but am opposed to Inada changing the implementation of collections.OrderedDict We can have the first without having the second. It seems like the two quoted paragraphs are in vociferous agreement. -n
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
I support having regular dicts maintain insertion order but am opposed to Inada changing the implementation of collections.OrderedDict We can have the first without having the second.
It seems like the two quoted paragraphs are in vociferous agreement.
The referenced tracker entry proposes, "Issue31265: Remove doubly-linked list from C OrderedDict". I don't think that should go forward regardless of whether regular dict order is guaranteed. Inada presented a compound proposition: either guarantee regular dict order or let him rip out the core design of OrderedDicts against my wishes. Raymond
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 15 December 2017 at 05:28, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
In contrast, I gave collections.OrderedDict a different design (later coded in C by Eric Snow). The primary goal was to have efficient maintenance of order even for severe workloads such at that imposed by the lru_cache which frequently alters order without touching the underlying dict. Intentionally, the OrderedDict has a design that prioritizes ordering capabilities at the expense of additional memory overhead and a constant factor worse insertion time.
That's interesting information - I wasn't aware of the different performance goals. I'd suggest adding a discussion of these goals to the OrderedDict documentation. Now that dictionaries preserve order (whether or not we make that language guaranteed or an implementation detail) having clear information on the intended performance trade-offs of an OrderedDict would help people understand why they might choose one over the other. Paul
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
That's interesting information - I wasn't aware of the different performance goals.
FYI, performance characteristic of my POC implementation of OrderedDict based on dict order are: * 50% less memory usage * 15% faster creation * 100% (2x) faster iteration * 20% slower move_to_end * 40% slower comparison (copied from https://bugs.python.org/issue31265#msg301942 ) Comparison is very unoptimized at the moment and I believe it can be more faster. On the other hand, I'm not sure about I can optimize move_to_end() more. If OrderdDict is recommended to be used for just keeping insertion order, I feel 1/2 memory usage and 2x faster iteration are more important than 20% slower move_to_end(). But if either "dict keeps insertion order" or "dict keeps insertion order until deletion" is language spec, there is no reason to use energy and time for discussion of OrderedDict implementation. Regards, INADA Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
On Dec 15, 2017, at 7:53 AM, Guido van Rossum <guido@python.org> wrote:
Make it so. "Dict keeps insertion order" is the ruling. Thanks!
Thank you. That is wonderful news :-) Would it be reasonable to replace some of the OrderedDict() uses in the standard library with dict()? For example, have namedtuples's _asdict() go back to returning a plain dict as it did in its original incarnation. Also, it looks like argparse could save an import by using a regular dict. Raymond
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 15, 2017 at 8:32 AM, Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
If it's documented as OrderedDict that would be backwards incompatible, since that has additional methods. Even if not documented it's likely to break some code. So, I'm not sure about this (though I agree with the sentiment that OrderedDict is much less important now). -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/28d63/28d63dd36c89fc323fc6288a48395e44105c3cc8" alt=""
[Eric Snow <ericsnowcurrently@gmail.com>]
Does that include preserving order after deletion?
Given that we're blessing current behavior: - At any moment, iteration order is from oldest to newest. So, "yes" to your question. - While iteration starts with the oldest, .popitem() returns the youngest. This is analogous to how lists work, viewing a dict similarly ordered "left to right" (iteration starts at the left, .pop() at the right, for lists and dicts).
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Dec 15, 2017 10:50, "Tim Peters" <tim.peters@gmail.com> wrote: [Eric Snow <ericsnowcurrently@gmail.com>]
Does that include preserving order after deletion?
Given that we're blessing current behavior: - At any moment, iteration order is from oldest to newest. So, "yes" to your question. - While iteration starts with the oldest, .popitem() returns the youngest. This is analogous to how lists work, viewing a dict similarly ordered "left to right" (iteration starts at the left, .pop() at the right, for lists and dicts). Fortunately, this also matches OrderedDict.popitem(). It'd be nice if we could also support dict.popitem(last=False) to get the other behavior, again matching OrderedDict. -n
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
On Dec 15, 2017, at 7:53 AM, Guido van Rossum <guido@python.org> wrote:
Make it so. "Dict keeps insertion order" is the ruling.
On Twitter, someone raised an interesting question. Is the guarantee just for 3.7 and later? Or will the blessing also cover 3.6 where it is already true. The 3.6 guidance is to use OrderedDict() when ordering is required. As of now, that guidance seems superfluous and may no longer be a sensible practice. For example, it would be nice for Eric Smith when he does his 3.6 dataclasses backport to not have to put OrderedDict back in the code. Do you still have the keys to the time machine? Raymond
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 15, 2017 at 12:45 PM, Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
For 3.6 we can't change the language specs, we can just document how it works in CPython. I don't know what other Python implementations do in their version that's supposed to be compatible with 3.6 but I don't want to retroactively declare them non-conforming. (However for 3.7 they have to follow suit.) I also don't think that the "it stays ordered across deletions" part of the ruling is true in CPython 3.6. I don't know what guidance to give Eric, because I don't know what other implementations do nor whether Eric cares about being compatible with those. IIUC micropython does not guarantee this currently, but I don't know if they claim Python 3.6 compatibility -- in fact I can't find any document that specifies the Python version they're compatible with more precisely than "Python 3". -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/32b67/32b67145b0fe3069a1de27c1ec5fc1c9428e9b97" alt=""
FWIW, the regular dict does stay ordered across deletions in CPython3.6: >>> d = dict(a=1, b=2, c=3, d=4) >>> del d['b'] >>> d['b'] = 5 >>> d {'a': 1, 'c': 3, 'd': 4, 'b': 5} Here's are more interesting demonstration: from random import randrange, shuffle from collections import OrderedDict population = 1000000 s = list(range(population // 4)) shuffle(s) d = dict.fromkeys(s) od = OrderedDict.fromkeys(s) for i in range(500000): k = randrange(population) d[k] = i od[k] = i k = randrange(population) if k in d: del d[k] del od[k] assert list(d.items()) == list(od.items()) The dict object insertion logic just appends to the arrays of keys, values, and hashvalues. When the number of usable elements decreases to zero (reaching the limit of the most recent array allocation), the dict is resized (compacted) left-to-right so that order is preserved. Here are some of the relevant sections from the 3.6 source tree: Objects/dictobject.c line 89: Preserving insertion order It's simple for combined table. Since dk_entries is mostly append only, we can get insertion order by just iterating dk_entries. One exception is .popitem(). It removes last item in dk_entries and decrement dk_nentries to achieve amortized O(1). Since there are DKIX_DUMMY remains in dk_indices, we can't increment dk_usable even though dk_nentries is decremented. In split table, inserting into pending entry is allowed only for dk_entries[ix] where ix == mp->ma_used. Inserting into other index and deleting item cause converting the dict to the combined table. Objects/dictobject.c::insertdict() line 1140: if (mp->ma_keys->dk_usable <= 0) { /* Need to resize. */ if (insertion_resize(mp) < 0) { Py_DECREF(value); return -1; } hashpos = find_empty_slot(mp->ma_keys, key, hash); } Objects/dictobject.c::dictresize() line 1282: PyDictKeyEntry *ep = oldentries; for (Py_ssize_t i = 0; i < numentries; i++) { while (ep->me_value == NULL) ep++; newentries[i] = *ep++; }
I don't know what guidance to give Eric, because I don't know what other implementations do nor whether Eric cares about being compatible with those. IIUC micropython does not guarantee this currently, but I don't know if they claim Python 3.6 compatibility -- in fact I can't find any document that specifies the Python version they're compatible with more precisely than "Python 3".
I did a little research and here' what I found: "MicroPython aims to implement the Python 3.4 standard (with selected features from later versions)" -- http://docs.micropython.org/en/latest/pyboard/reference/index.html "PyPy is a fast, compliant alternative implementation of the Python language (2.7.13 and 3.5.3)." -- http://pypy.org/ "Jython 2.7.0 Final Released (May 2015)" -- http://www.jython.org/ "IronPython 2.7.7 released on 2016-12-07" -- http://ironpython.net/ So, it looks like your could say 3.6 does whatever CPython 3.6 already does and not worry about leaving other implementations behind. (And PyPy is actually ahead of us here, having compact and order-preserving dicts for quite a while). Cheers, Raymond
data:image/s3,"s3://crabby-images/05644/056443d02103b56fe1c656455ffee12aa1a01f1f" alt=""
On Fri, Dec 15, 2017 at 9:47 PM, Guido van Rossum <guido@python.org> wrote:
[...]
They currently specify 3.4+. Specifically, https://github.com/micropython/micropython includes: """ MicroPython implements the entire Python 3.4 syntax (including exceptions, with, yield from, etc., and additionally async/await keywords from Python 3.5). The following core datatypes are provided: str (including basic Unicode support), bytes, bytearray, tuple, list, dict, set, frozenset, array.array, collections.namedtuple, classes and instances. Builtin modules include sys, time, and struct, etc. Select ports have support for _thread module (multithreading). *Note that only a subset of Python 3 functionality is implemented for the data types and modules*. """ Note the emphasis I added on the last sentence.
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
Now that dicts are order-preserving, maybe we should change prettyprint: In [7]: d = {'one':1, 'two':2, 'three':3} In [8]: print(d) {'one': 1, 'two': 2, 'three': 3} order preserved. In [9]: pprint.pprint(d) {'one': 1, 'three': 3, 'two': 2} order not preserved ( sorted, I presume? ) With arbitrary order, it made sense to sort, so as to always give the same "pretty" representation. But now that order is "part of" the dict itself, it seems prettyprint should present the preserved order of the dict. NOTE: I discovered this making examples for an intro to Python class -- I was updating the part where I teach that dicts do not preserve order. I was using iPython, which, unbeknownst to me, was using pprint under the hood, so got a different order depending on whether I simply displayed the dict (which used pprint) or called str() or repr() on it. Pretty confusing. Will changing pprint be considered a breaking change? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw <barry@python.org> wrote:
Wait, what? Why would changing pprint (so that it accurately reflects dict's new underlying semantics!) be a breaking change? Are you suggesting it shouldn't be changed in 3.7? -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 07:37:03PM -0800, Nathaniel Smith wrote:
I have a script which today prints data like so: {'Aaron': 62, 'Anne': 51, 'Bob': 23, 'George': 30, 'Karen': 45, 'Sue': 17, 'Sylvester': 34} Tomorrow, it will suddenly start printing: {'Bob': 23, 'Karen': 45, 'Sue': 17, 'George': 30, 'Aaron': 62, 'Anne': 51, 'Sylvester': 34} and my users will yell at me that my script is broken because the data is now in random order. Now, maybe that's my own damn fault for using pprint instead of writing my own pretty printer... but surely the point of pprint is so I don't have to write my own? Besides, the docs say very prominently: "Dictionaries are sorted by key before the display is computed." https://docs.python.org/3/library/pprint.html so I think I can be excused having relied on that feature. -- Steve
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
To make sure I understand, do you actually have a script like this, or is this hypothetical?
No need to get aggro -- I asked a question, it wasn't a personal attack. At a high-level, pprint's job is to "pretty-print arbitray Python data structures in a form which can be used as input to the interpreter" (quoting the first sentence of its documentation), i.e., like repr() it's fundamentally intended as a debugging tool that's supposed to match how Python works, not any particular externally imposed output format. Now, how Python works has changed. Previously dict order was arbitrary, so picking the arbitrary order that happened to be sorted was a nice convenience. Now, dict order isn't arbitrary, and sorting dicts both obscures the actual structure of the Python objects, and also breaks round-tripping through pprint. Given that pprint's overarching documented contract of "represent Python objects" now conflicts with the more-specific documented contract of "sort dict keys", something has to give. My feeling is that we should preserve the overarching contract, not the details of how dicts were handled. Here's another example of a teacher struggling with this: https://mastodon.social/@aparrish/13011522 But I would be in favor of adding a kwarg to let people opt-in to the old behavior like: from pprint import PrettyPrinter pprint = PrettyPrinter(sortdict=True).pprint -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Nathaniel Smith writes:
To make sure I understand, do you actually have a script like this, or is this hypothetical?
I have a couple of doctests that assume that pprint will sort by key, yes. It makes the tests look quite a bit nicer by pprinting the output, and I get sorting (which matters for some older Pythons) for free. (I admit I don't actually use those tests with older Pythons, but the principle stands.) I don't see why we don't do the obvious, namely add the option to use "native" order to the PrettyPrinter class, with the default being backward compatible. -- Associate Professor Division of Policy and Planning Science http://turnbull/sk.tsukuba.ac.jp/ Faculty of Systems and Information Email: turnbull@sk.tsukuba.ac.jp University of Tsukuba Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
data:image/s3,"s3://crabby-images/05644/056443d02103b56fe1c656455ffee12aa1a01f1f" alt=""
On Tue, Dec 19, 2017 at 4:49 PM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Perhaps now key ordering has been pronounced we could either add a "sorted" method to dicts equivalent to the following code. def sorted(self): return {self[k] for k in sorted(self.keys())} Alternatively the sorted built-in could be modified to handle dicts in this way. Though I still find the assumption of any ordering at all a bit weird I suppose I'll grow used to it. regards Steve
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 19 December 2017 at 17:57, Steve Holden <steve@holdenweb.com> wrote:
I don't think there's any need for this.
Though I still find the assumption of any ordering at all a bit weird I suppose I'll grow used to it.
As far as I'm concerned, dictionaries are still exactly as they were before - key-value mappings with no inherent order. None of my code makes any assumption about the ordering of dictionaries, so it'll be 100% unaffected by this change. I find this whole debate about the "consequences" of mandating insertion order to be completely out of proportion. As far as I'm concerned, the only practical impact is that when you iterate over things like dictionary displays, **kw arguments, etc, you get the "obvious" order, and it's not a lucky accident that you do so. Certainly, with the order guaranteed, people who currently use OrderedDict will be able to simply use a dict in future - although I'd expect very few people will take advantage of this in the immediate future, as by doing so they'll be restricting their code to Python 3.7+ only, for no significant benefit. And let's not forget that OrderedDict is used far less frequently than plain dictionaries, so we're talking about a small percentage of a tiny percentage of uses of mapping objects in the wild that will in *any* way be affected by this change. Paul
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 08:49:54PM -0800, Nathaniel Smith wrote:
On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
The details are much simplified, but basically, and my users probably won't literally yell at me, but yes I do. But does it matter? The thing about backwards-compatibility guarantees is that we have to proceed as if somebody does have such a script. We don't know who, we don't know why, but we have to assume that they are relying on whatever guarantees we've given and will be greatly inconvenienced by any change without sufficient notice.
I didn't interpret it as an attack. Sorry for any confusion, I was trying to be funny -- at least, it sounded funny in my own head.
The *high* level purpose of pprint is to *pretty-print* values, like the name says. If all we wanted was something that outputs an eval()'able representation, we already had that: repr(). But even that requirement that output can be used as input to the interpreter is a non-core promise. There are plenty of exceptions: recursive data structures, functions, any object with the default repr, etc. Even when it works, the guarantee is quite weak. For instance, even the object type is not preserved: py> class MyDict(dict): ... pass ... py> d = MyDict() py> x = eval(repr(d)) py> assert d == x py> assert type(d) == type(x) Traceback (most recent call last): File "<stdin>", line 1, in <module> AssertionError So the "promise" that eval(repr(obj)) will round-trip needs to be understood as being one of those Nice To Have non-core promises, not an actual guaranteed feature. (The bold print giveth, and the fine print taketh away.) So the fact that the output of pprint doesn't preserve the order of the dict won't be breaking any documented language guarantees. (It is probably worth documenting explicitly though, rather than just letting it be implied by the sorted keys guarantee.)
The point of pprint is not merely to duplicate what repr() already does, but to output an aesthetically pleasing view of the data structure. There is no reason to think that is only for the purposes of debugging. pprint is listed in the docs under Data Types, not Debugging: https://docs.python.org/3/library/datatypes.html https://docs.python.org/3/library/debug.html
Beware of promising a feature for convenience, because people will come to rely on it. In any case, lexicographic (the default sorting) order is in some ways the very opposite of "arbitrary order".
Now, dict order isn't arbitrary,
No, we can't say that. Dicts *preserve insertion order*, that is all. There is no requirement that the insertion order be meaningful or significant in any way: it may be completely arbitrary. If I build a mapping of (say) product to price: d = {'hammer': 5, 'screwdriver': 3, 'ladder': 116} the order the items are inserted is arbitrary, probably representing the historical accident of when they were added to the database/catalog or when I thought of them while typing in the dict. The most we can say is that for *some* cases, dict order *may* be meaningful. We're under no obligation to break backwards-compatibility guarantees in order for pretty printing to reflect a feature of dicts which may or may not be of any significance to the user.
and sorting dicts both obscures the actual structure of the Python objects,
You can't see the actual structure of Python objects via pprint. For example, you can't see whether the dict is a split table (shared keys) or combined table. You can only see the parts of the public interface which the repr, or pprint, chooses to show. That's always been the case so nothing changes here. If pprint were new to 3.7, I daresay there would be a good argument to have it display keys in insertion order, but given backwards compatibility, that's not tenable without either an opt-in switch, or a period of deprecation.
and also breaks round-tripping through pprint.
Round-tripping need not promise to preserve order, since dicts don't care about order for the purposes of equality. Round-tripping already is a lossy operation: - object identity is always lost (apart from a few cached objects like small ints and singletons like None); - in some cases, the type of objects can be lost; - any attribute of the object which is not both reflected in its repr and set by its constructor will be lost; (e.g. x = something(); x.extra_attribute = 'spam') - many objects don't round-trip at all, e.g. functions and recursive data structures. So the failure of pprint to preserve such insertion order by default is just one more example.
I believe the overarching contract is to pretty print. Anything else is a Nice To Have. [...]
It would have to be the other way: opt-out of the current behaviour. -- Steve
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Dec 18, 2017, at 22:37, Nathaniel Smith <njs@pobox.com> wrote:
As others have pointed out, exactly because the current behavior is documented. And we all know that if it’s documented (and often even if it’s not, but that’s besides the point here) it will be relied upon. So we can’t change the default behavior. But I have no problems conceptually with giving users options. The devil is in the details though, e.g. should we special case dictionary sorting only? Should we use a sort `key` to mirror sorted() and list.sort()? We can figure those things out and whether it’s even worth doing. I don’t think that’s PEP-worthy, so if someone is sufficiently motivated, please open an issue on bpo. Cheers, -Barry
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Tue, Dec 19, 2017 at 8:14 AM, Barry Warsaw <barry@python.org> wrote:
Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to return the same dict -- so documented behavior may already be broken. (though I assume order is still ignored when comparing dicts, so: eval(pprint(a_dict)) == a_dict will still hold. But practicality beats purity, and a number of folks have already posted use-cases where they rely on sorted order, so there you go.
Should we use a sort `key` to mirror sorted() and list.sort()?
That would be a nice feature! If anything is done, I think we should allow a key function. and maybe have key=None as "unsorted" -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/832a7/832a7d28e16a261c5f64f5c6fc6585753582feae" alt=""
On 19Dec2017 1004, Chris Barker wrote:
Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to return the same dict -- so documented behavior may already be broken.
Two relevant quotes from the pprint module docs:
Dictionaries are sorted by key before the display is computed.
It says nothing about the resulting dict being the same as the original one, just that it can be used as input. So these are both still true (until someone deliberately breaks the latter). In any case, there are so many ways to spoil the first point for yourself that it's hardly worth treating as an important constraint.
(though I assume order is still ignored when comparing dicts, so: eval(pprint(a_dict)) == a_dict will still hold.
Order had better be ignored when comparing dicts, or plenty of code will break. For example:
{'a': 1, 'b': 2} == {'b': 2, 'a': 1} True
Saying that "iter(dict)" will produce keys in the same order as they were inserted is not the same as saying that "dict" is an ordered mapping. As far as I understand, we've only said the first part. (And the "nerve" here is that I disagreed with even the first part, but didn't fight it too strongly because I never relied on the iteration order of dict. However, I *do* rely on nobody else relying on the iteration order of dict either, and so proposals to change existing semantics that were previously independent of insertion order to make them rely on insertion order will affect me. So now I'm pushing back.) Cheers, Steve
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Tue, Dec 19, 2017 at 4:56 PM, Steve Dower <steve.dower@python.org> wrote:
This is a pretty fine hair to be splitting... I'm sure you wouldn't argue that it would be valid to display the dict {"a": 1} as '["hello"]', just because '["hello"]' is a valid input to the interpreter (that happens to produce a different object than the original one) :-). I think we can assume that pprint's output is supposed to let you reconstruct the original data structures, at least in simple cases, even if that isn't explicitly stated.
I guess the underlying issue here is partly the question of what the pprint module is for. In my understanding, it's primarily a tool for debugging/introspecting Python programs, and the reason it talks about "valid input to the interpreter" isn't because we want anyone to actually feed the data back into the interpreter, but to emphasize that it provides an accurate what-you-see-is-what's-really-there view into how the interpreter understands a given object. It also emphasizes that this is not intended for display to end users; making the output format be "Python code" suggests that the main intended audience is people who know how to read, well, Python code, and therefore can be expected to care about Python's semantics.
Yes, this is never going to change -- I expect that in the long run, the only semantic difference between dict and OrderedDict will be in their __eq__ methods.
I mean, I don't want to be a jerk about this, and we still need to examine things on a case-by-case basis but... Guido has pronounced that Python dict preserves order. If your code "rel[ies] on nobody else relying on the iteration order", then starting in 3.7 your code is no longer Python. Obviously I like that change more than you, but to some extent it's just something we have to live with, and even if I disagreed with the new semantics I'd still rather the standard library handle them consistently rather than being half-one-thing-and-half-another. -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Dec 19, 2017, at 20:32, Nathaniel Smith <njs@pobox.com> wrote:
pprint.pprint() is indeed mostly for debugging, but not always. As an example of what will break if you change the sorting guarantee: in Mailman 3 the REST etag is calculated from the pprint.pformat() of the result dictionary before it’s JSON-ified. If the order is changed, then it’s possible a client will have an incorrect etag for a structure that is effectively the same. -Barry
data:image/s3,"s3://crabby-images/227ad/227ad844da34915e2d53d651f1d0f394b1fcc61b" alt=""
On 12/19/2017 5:32 PM, Nathaniel Smith wrote:
Any dict object read in from pprint is going to be a different object, not the original one. And, unless the original insertion order was sorted by the same key as pprint uses to sort, the iteration order will be different from the original. As pointed out below, it will compare equal to the original dict. pprint has always allowed you to reconstruct the original data structures, but not the iteration order of dicts. With the new insertion order guarantee, nothing has changed, here. A far more interesting question than what pprint does to dict order is what marshal and pickle do (and have done) with the dict order, although I can't figure that out from the documentation.
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
On Tue, 19 Dec 2017 17:32:52 -0800 Nathaniel Smith <njs@pobox.com> wrote:
Actually, when you want to include a large constant in a Python program, pprint() can be useful to get a nicer formatting for your source code. That said, I do think that pprint() should continue sorting dicts by default. Even though dicts may be ordered *now*, most uses of dict don't expect any particular order. (I also think the pprint() example shows the potential confusion issues with making dict ordered by default, as the user and implementor of an API may not agree whether dict order is significant...) Regards Antoine.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Tue, Dec 19, 2017 at 04:56:16PM -0800, Steve Dower wrote:
On 19Dec2017 1004, Chris Barker wrote:
Indeed. Regular dicts preserve insertion order, they don't take insertion order into account for the purposes of equality. See the example here: https://docs.python.org/3.7/library/stdtypes.html#mapping-types-dict and the description of mapping equality: https://docs.python.org/3.7/reference/expressions.html#value-comparisons "Mappings (instances of dict) compare equal if and only if they have equal (key, value) pairs. Equality comparison of the keys and values enforces reflexivity." Changing that would be a *huge* backwards-compatibility breaking change. Aside: I've just noticed that mapping equality is not transitive: a == b and b == c does not imply that a == c. py> from collections import OrderedDict as OD py> a, b, c = OD.fromkeys('xyz'), dict.fromkeys('xyz'), OD.fromkeys('zyx')) py> a == b == c True py> a == c False -- Steve
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Chris Barker writes:
Sure, but that's because we put shoes on a snake. Why anybody expects no impediment to slithering, I don't know! I understand the motivation to guarantee order, but it's a programmer convenience that has nothing to do with the idea of mapping, and the particular (insertion) order is very special and usually neither relevant nor reproducible. I have no problem whatsoever with just documenting any failure to preserve order while reproducing dicts, *except* that a process that inserts keys in the same order had better result in the same insertion order. Steve
data:image/s3,"s3://crabby-images/b8491/b8491be6c910fecbef774491deda81cc5f10ed6d" alt=""
On Thu, Dec 21, 2017 at 7:51 PM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
json, pickle == png, i.e., guaranteed lossless. repr, pprint == jpg, lossy for very specific motivating reasons. In particular, I use pprint output in regression baselines, and if the long documented sort-by-key behavior changed, I would not be happy.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 06:11:05PM -0800, Chris Barker wrote:
Indeed. pprint.PrettyPrinter has separate methods for OrderedDict and regular dicts, and the method for printing dicts calls sorted() while the other does not.
I disagree. Many uses of dicts are still conceptually unordered, even if the dict now preserves insertion order. For those use-cases, insertion order is of no interest whatsoever, and sorting is still "prettier". -- Steve
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Mon, Dec 18, 2017 at 7:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
and many uses of dicts have "sorted" order as completely irrelevant, and sorting them arbitrarily is not necessarily pretty (you can't provide a sort key can you? -- so yes, it's arbitrary) I'm not necessarily saying we should break things, but I won't agree that pprint sorting dicts is the "right" interface for what is actually an order-preserving mapping. I would think it was only the right choice in the first place in order (get it?) to get a consistent representation, not because sorting was a good thing per se. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Dec 18, 2017 at 08:28:52PM -0800, Chris Barker wrote:
I completely agree. We might argue that it was a mistake to sort dicts in the first place, or at least a mistake to *always* sort them without allowing the caller to provide a sort key. But what's done is done: the fact that dicts are sorted by pprint is not merely an implementation detail, but a documented behaviour.
If sorting dicts was the "right" behaviour in Python 3.4, it remains the right behaviour -- at least for use-cases that don't care about insertion order. Anyone using pprint on dicts *now* doesn't care about insertion order. If they did, they would be using OrderedDict. That will change in the future, but even in the future there are lots of use-cases for dicts where insertion order might as well be random. The order that some dict happen to be constructed may not be "pretty" or significant or even consistent from one dict to the next. (If your key/values pairs are coming in from an external source, they might not always come in the same order.) I'm not denying that sometimes it would be nice to see dicts in insertion order. Right now, those use-cases are handled by OrderedDict but in the future many of them will be taken over by regular dicts. So we have a conflict: - for some use-cases, insertion order is the "right" way for pprint to display the dict; - but for others, sorting by keys is the "pretty" way for pprint to display the dict; - and there's no way for pprint to know which is which just by inspecting the dict. How to break this tie? Backwards compatibility trumps all. If we want to change the default behaviour of pprint, we need to go through a deprecation period. Or add a flag sorted=True, and let the caller decide.
*shrug* That's arguable. As you said yourself, dicts were sorted by key to give a "pretty" representation. I'm not so sure that consistency is the justification. What does that even mean? If you print the same dict twice, with no modifications, it will print the same whether you sort first or not. If you print two different dicts, who is to say that they were constructed in the same order? But the point is moot: whatever the justification, the fact that pprint sorts dicts by key is the defined behaviour, and even if it was a mistake to guarantee it, we can't just change it without a deprecation period. -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Tue, Dec 19, 2017 at 6:09 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Personally, I think it's a good default behaviour. Frequently you have a dictionary that, in normal code, you look up by key rather than iterating over; but for debugging, you print out everything. (Example: HTTP request/response headers. In your code, you'll query the headers dict for "content-type", but it's nice to be able to say "show me every header".) Insertion order is meaningless, and it's nice to be able to read them in some kind of sane order. Simply sorting the keys in the default way is good enough for a LOT of use-cases.
Agreed, except for the last bit. I'd be inclined to kill two birds with one stone: add a flag sort_key=DEFAULT or sort_key=IDENTITY, which will sort the keys by themselves; you can provide a key function to change the way they're sorted, or you can pass sort_key=None to get them in insertion order.
In old versions of Python, this code would always produce the same result: d = {} d["qwer"] = 1 d["asdf"] = 2 d["zxcv"] = 3 print(d) import pprint; pprint.pprint(d) Then along came hash randomization, and the output would change - but pprint would still be consistent and tidy. Now we have insertion order retained, and both the repr and the pprint are consistent. Both of them are sane. They're just different sameness.
This is really the clincher. But IMO the current behaviour isn't *just* for backcompat; it's good, useful behaviour as it is. I wouldn't want to see it changed even _with_ deprecation. ChrisA
data:image/s3,"s3://crabby-images/832a7/832a7d28e16a261c5f64f5c6fc6585753582feae" alt=""
On 18Dec2017 2309, Steven D'Aprano wrote:
I have never needed OrderedDict before, and dict now also being ordered doesn't mean that when I reach for it I'm doing it because I need an ordered dict - I probably just need a regular dict. *Nothing* about dict should change for me between versions. Adding an option to pprint to explicitly control sorting without changing the default is fine. Please stop assuming that everyone wants an OrderedDict when they say dict. It's an invalid assumption. Cheers, Steve
data:image/s3,"s3://crabby-images/e7510/e7510abb361d7860f4e4cc2642124de4d110d36f" alt=""
On Mon, Dec 18, 2017 at 11:38 PM, Steve Dower <steve.dower@python.org> wrote:
Can we all take a deep breath and lay off the hyperbole? The only point under discussion in this subthread is whether pprint -- our module for producing nicely-formatted-reprs -- should continue to sort keys, or should continue to provide an accurate repr. There are reasonable arguments for both positions, but no-one's suggesting anything in the same solar system as "all users of dict have to be updated". Am I missing some underlying nerve that this is hitting for some reason? -n -- Nathaniel J. Smith -- https://vorpus.org
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On 19 December 2017 at 08:27, Nathaniel Smith <njs@pobox.com> wrote:
IMO, the key thing is that people appear to be talking as if changing documented behaviour without deprecation is acceptable. Or even if they are OK with a deprecation period, they are still talking about changing documented behaviour for a trivial reason (as Steve Dower said, most people will still use dict for its dict behaviour, not for its orderedness). People (including me) are getting irritated because backward compatibility, which is normally a key principle, is being treated so lightly here. (It happened in another thread as well - changing the output of csv.DictReader from OrderedDict to dict - again suggesting that breaking a documented behaviour was OK, just because dicts are now ordered). Paul
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Sun, Nov 05, 2017 at 09:01:40PM +0200, Serhiy Storchaka wrote:
Do you suggest to make dictionary displays producing OrderedDict instead of dict?
No, this is essentially a language spec doc issue that would guarantee the ordering properties of the current dict implementation. Stefan Krah
data:image/s3,"s3://crabby-images/1726b/1726ba2b828487263f4207552ae0786d5e9f1d07" alt=""
On Sun, Nov 05, 2017 at 09:35:38PM +0200, Serhiy Storchaka wrote:
Yes, for my use case that would be sufficient and that's what I had in mind initially. A luxury syntax addition like {a = 10, b = {c = "foo"}} that is read as an OrderedDict (where the keys a, b, c are implicitly strings) would of course also be sufficient for my use case. But I suspect many users who have other use cases find it tantalizing not to be able to use the properties of the current regular dict. Stefan Krah
data:image/s3,"s3://crabby-images/1383b/1383b5c083aa3fd057a17c816aec322848ea5a95" alt=""
Hi, I have to admit sometimes I don't understand why small things produce so much mail traffic. :-) If I use a mapping like dict most of the time I don't care if it is ordered. If I need an ordering I use OrderedDict. In a library if I need a dict to be ordered most of the time there is a parameter allowing me to do this or to specify the used class as dict. If this is not the case it can be fixed. In rare cases a workaround is needed. As of now I have dict and OrderedDict, it is clear and explicit. No need to change. Yes it is useful for debugging to have things ordered. Yes in other places the implicit ordering is also fine. For pypy and CPython good and take it, be happy. Also it is fine to teach people that dict (Mapping) is not ordered but CPython has an implementation detail and it is ordered. But if you want the guarantee use OrderedDict. To be really practical even if this is guaranteed in 3.9 I cannot rely on it because of Python 2.7, 3.5, 3.6, ... compatibility. If this versions are out of order in 10 years, even then if I want to produce a small library running on another implementation I have to care, because of the list of differences to the CPython implementation or because the project is not yet up to date with the implementation (Jython). To be save then I will still use OrderedDict guaranteeing me this what I want. Finally even when dict literals will be guaranteed to be ordered it is good to teach and use OrderedDict because it is explicit. For implementations (algorithm) I cannot foresee the future so I cannot tell if it will be a burden or not. Finally someone have to decide it. As long as OrderedDict is available for me to specify it explicit it will be fine. ;-) Regards, Wolfgang On 04.11.2017 18:30, Stefan Krah wrote:
data:image/s3,"s3://crabby-images/9dd94/9dd94ad1163d17ff290b45ac3a45dd8a266b2174" alt=""
On 11/07/2017 09:00 AM, Wolfgang wrote:
I feel a bit out of place on this list since I'm a lurker and not a core developer, but I just wanted to add my agreement with this as well. So maybe take my opinion with a grain of salt... I agree with Wolfgang. I just don't understand why this change is needed. We have dict and we have OrderedDict. Why does dict need to provide the extra ordering constraints? I've read many posts in this discussion and find none of them convincing. Python already guarantees things like ordering of keyword arguments. I've seen some people point out confusion of newcomers (e.g. they are surprised when order is not surprised), but that just seems to me natural confusion that comes about when learning. I would argue that a better solution to that problem is exactly the go solution: i.e. purposely perturbing the ordering in a way that shows up immediately so that users realize the problems in their thinking earlier. The dict provides a mapping from key to value. I personally think that that is mentally much simpler object than a mapping from key to value with certain ordering guarantees. If I want to extra guarantees I import OrderedDict and read what the guarantees are. This seems totally fine to me. I don't really see any advantages to this change but a lack of implementation flexibility and a more complicated core object in Python. Cheers, Thomas
data:image/s3,"s3://crabby-images/ef9a3/ef9a3cb1fb9fd7a4920ec3c178eaddbb9c521a58" alt=""
On 11/07/2017 09:00 AM, Wolfgang wrote: [...]
I don't think that is fine. When I explained this in 3.5, dicts rearranging themselves seemed quite weird to the newcomers. This year, I'm not looking forward to saying that dicts behave "intuitively", but you shouldn't rely on that, because they're theoretically allowed to rearrange themselves. The concept of "implementation detail" and language spec vs. multiple interpreter implementations isn't easy to explain to someone in a "basic coding literacy" course. Today I can still show an example on Python 3.5. But most people I teach today won't run their code on 3.5, or on MicroPython or Brython, and quite soon they'll forget that there's no dict ordering guarantee. Also: I happen to read python-dev and the language docs. I suspect not all teachers do, and when they see that dict order randomization was "fixed", they might just remove the explanation from the lesson and teach something practical instead.
data:image/s3,"s3://crabby-images/81bb5/81bb5c99794aa6104ce457e1b76cfbdd07cddf90" alt=""
Hello, I agree with Wolfgang here. From what I gathered of the discussion, the argument started from « It would be nice if dict litterals returned ordered dicts instead of an unordered ones », which is mostly a good thing, as it allows e.g. `OrderedDict({'spam': 'ham', 'sausages': 'eggs'})` instead of having to rely on lists of couples to create an OrderedDict. It is not of utmost utility, but it would be nice to have and not dissimilar to what we already have with kwargs being ordered dicts, also a matter of slightly better usability. Possibly, in order to avoid implying that all dicts guarantee ordering at all time, a syntax such as `o{}` might be used, mostly to help newcomers. So far, so good. Where it started to go all haywire is when it became conflated that with the fact that CPython and Pypy dicts are actually ordered (up to a point at least) and it suddenly became « Let's guarantee the ordering of all dicts » which apparently is an issue for at least one implementation of Python, and still have to be implemented in several others (said implementation would be trivial, or so it is said, but it still has to be written, along with appropriate tests, regression checks…). So far, the arguments I have seen for that are 1. It is useful in context where one has to guarantee the ordering of some mapping (such as in json) 2. It is useful in contexts where ordering is facultative, but nice to have (debugging has been mentionned) 3. It is already this way in CPython, so people are going to use that anyway I take issue with all of those arguments. 1. If the ordered should be guaranteed, why would it be so hard to use OrderedDict ? - Just write `d = OrderedDict(((key, val) for key, value in …))` instead of `{key: value for key, value in …}`. It is not that hard and at least it is explicit that the order is important. And if it is really so hard, we could have dict comprehensions be ordered too in addition to litterals, it still doesn't mean that dicts should all be ordered - It is even easier if you fill your dict value-per-value, just initialise it as `d = OrderedDict` instead of `d = {}` and voilà ! 2. I would like to see some examples of cases where this is really much more convenient than any other soution, but even then I suspect that these cases are not sufficently relevant to wed all Python backends to ordered dicts forever. 3. This is just a pure fallacy. The language has a documented API that says that if order of insertion is important, you should explicitely use an OrderedDict. If people stray away from it and use implementation details such as the ordering of dict in CPython, they are on their own and shouldn't expect it to be portable to any other version. Again, it's not as if OrderedDict did not exist or was much more inconvenient to use than dict. Also, since the proposed implementation does not keep ordering on deletion, those relying implicitely on the ordering of dicts without reading the docs might get bitten by it later in much more subtle ways. Note that I don't sugest mandatory shuffling of dicts to advertise their non-guaranteed ordering, either. Just that reading the docs (or having your instructor tell you that dict does not guarantee the order) is the reponsibility of the end user. To sum it up - Ordered dict litterals are a nice idea, but probably not that important. If it happens, it would be nice if it could be extended to dict comprehensions, though. - Guaranteeing the ordering of all `dicts` does not make a lot of sense - Changing the API to guarantee the order of dicts **is** an API change, which still means work Am I missing something ? Cheers, E On 7 November 2017 at 10:51, Petr Viktorin <encukou@gmail.com> wrote:
data:image/s3,"s3://crabby-images/50535/5053512c679a1bec3b1143c853c1feacdabaee83" alt=""
On Nov 7, 2017, at 01:51, Petr Viktorin <encukou@gmail.com> wrote:
Perhaps, but IME, it’s not hard to teach someone that in a code review. Today, if I see someone submit a change that includes an implicit assumption about ordering, I’ll call that out. I can say “you can’t rely on dicts being ordered, so if that’s what you want, use an OrderedDict or sort your test data”. That’s usually a localized observation, meaning, I can look at the diff and see that they are assuming dict iteration ordering. What happens when built-in dict’s implementation behavior becomes a language guarantee? Now the review is much more difficult because I probably won’t be able to tell just from a diff whether the ordering guarantees are preserved. Do they delete a key somewhere? Who knows? I’m not even sure that would be statically determinable since I’d have to trace the use of that dictionary at run time to see if some “del d[key]” is deleting the key in the dict under review or not. I can probably only tell that at run time. So how to I accurately review that code? Is the order presumption valid or invalid? Cheers, -Barry
participants (40)
-
Antoine Pitrou
-
Barry Warsaw
-
Brett Cannon
-
Chris Angelico
-
Chris Barker
-
Chris Barker - NOAA Federal
-
Chris Jerdonek
-
David Mertz
-
Eric Fahlgren
-
Eric Snow
-
Eric V. Smith
-
Evpok Padding
-
Glenn Linderman
-
Guido van Rossum
-
INADA Naoki
-
Jan Claeys
-
Jim Baker
-
Joao S. O. Bueno
-
Mark Lawrence
-
Nathaniel Smith
-
Nick Coghlan
-
Paul G
-
Paul Ganssle
-
Paul Moore
-
Paul Sokolovsky
-
Peter Ludemann
-
Petr Viktorin
-
Raymond Hettinger
-
Serhiy Storchaka
-
Stefan Krah
-
Stephen J. Turnbull
-
Steve Dower
-
Steve Holden
-
Steven D'Aprano
-
Sven R. Kunze
-
Thomas Nyberg
-
Tim Delaney
-
Tim Peters
-
Victor Stinner
-
Wolfgang