Dict literal use for custom dict classes

I really like the OrderedDict class. But there is one thing that has always bothered me about it. Quite often I want to initialize a small ordered dict. When the keys are all strings this is pretty easy, since you can just use the keyword arguments. But when some, or all of the keys are other things this is an issue. In that case there are two options (as far as I know). If you want an ordered dict of this form for instance: {1: 'a', 4: int, 2: (3, 3)}, you would either have to use: OrderedDict([(1, 'a'), (4, int), (2, (3, 3))]) or you could use: d = OrderedDict() d[1] = 'a' d[4] = int d[2] = (3, 3) In my opinion both are quite verbose and the first is pretty unreadable because of all the nested tuples. That is why I have two suggestions for language additions that fix that. The first one is the normal dict literal syntax available to custom dict classes like this: OrderedDict{1: 'a', 4: int, 2: (3, 3)} This looks much cleaner in my opinion. As far as I can tell it could simply be implemented as if the either of the two above options was used. This would make it available to all custom dict types that implement the two options above. A second very similar option, which might be cleaner and more useful, is to make this syntax available (only) after initialization. So it could be used like this: d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)} d{3: 4, 'a': 'c'} *>>> *OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'} This would allow arguments to the __init__ method as well. And this way it could simply be a shorthand for setting multiple attributes. It might even be used to change multiple values in a list if that is a feature that is wanted. Lastly I think either of the two sugested options could be used to allow dict comprehensions for custom dict types. But this might require a bit more work (although not much I think). I'm interested to hear what you guys think. Jelte PS. I read part of this thread https://mail.python.org/pipermail/python-ideas/2009-June/thread.html#4916, but that seemed more about making OrderedDict itself a literal. Which is not what I'm suggesting here.

On Dec 12, 2015, at 18:13, Jelte Fennema <me@jeltef.nl> wrote:
I really like the OrderedDict class. But there is one thing that has always bothered me about it. Quite often I want to initialize a small ordered dict. When the keys are all strings this is pretty easy, since you can just use the keyword arguments.
I don't think this will work. Python uses a dict to pass in kwargs, so you've already lost ordering at that point, if I'm right.
My first preference would be for the proposals for dict to become ordered by default to come to fruition. If that turns out to be unacceptable (even though, as I understand it, PyPy has already been able to do this), then this looks pretty good to me. It's and alternate call operator that uses dict-like syntax to pass in a tuple of pairs.
IMO, this is strictly worse than the other option. It's confusing that the call operator wouldn't happen before this dict-like call. Either would be unneeded if dict was ordered by default, and it's far and way the best option, IMO.

= 3.4 though and is for fun only so I would not recommend using this in a
One of my favorite things about python is that it is always very clear when and in what order my code is executed. There is very little syntactic sugar that obscures the execution order; however, using an unordered collection (dict) to discuss the order that data will be added to an order-aware collection seems very confusing. We should first look at what the language already provides us for defining this. One thing that might make the association lists more readable without changing the language would be to visually break up the pairs over multiple lines. This could change the `OrderedDict` construction to look like: OrderedDict([ (k0, v0), (k1, v1), (kn, vn), ]) This makes the k: v mapping much more clear. Another option, which we have used in the library 'datashape', is to make the class itself subscriptable: R['a':int32, 'b':float64, 'c': string] or if we were to write it like the above example: R[ 'a':int32, 'b':float64, 'c':string, ] This might look like custom syntax; however, this is just using `__getitem__`, `tuple` literals, and `slice` literals in an interesting way to look sort of like a dictionary. One nice property of this syntax is that because readers know that they are creating a tuple, the fact that this is an order-preserving operation is very clear. This code is semantically equivalent to normal numpy code like: `my_array[idx_0_start:idx_0_end, idx_1_start:idx_1_end, idx_2_start:idx_2_end]` Here `R` is a an alias for the class `Record` where `R is Record`. This class has a metaclass that adds a `__getitem__`. This getitem looks for either a single slice or a tuple of slices and then checks the values to make sure they are all valid inputs. This is used to internally construct an `OrderedDict` We hadn't considered the comprehension case; however, you could dispatch the `__getitem__` on a generator that yields tuples to simulate the comprehension. This could look like: R[(k, v) for k, v in other_seq] where inside our `__getitem__` we would add a case like: if isinstance(key, types.GeneratorType): mapping = OrderedDict(key) If you would like to see like to see a code example of the implementation for the `Record` type it is available under the BSD license here: https://github.com/blaze/datashape/blob/master/datashape/coretypes.py#L968 There is another library that I have worked on that adds the ability to overload all of the literals, including dictionaries. This requires CPython production setting. I am merely mentioning this to show another possible syntax for this. @ordereddict_literals def f(): return {'a': 1, 'b': 2, 'c': 3}
f() OrderedDict([('a', 1), ('b', 2), ('c', 3)])
This code is available under the GPLv2 license here: https://github.com/llllllllll/codetransformer/blob/master/codetransformer/tr... On Sat, Dec 12, 2015 at 7:13 PM, Jelte Fennema <me@jeltef.nl> wrote:

On Sat, Dec 12, 2015 at 4:53 PM, Joseph Jevnik <joejev@gmail.com> wrote:
Good thing you recommended against its use :-), because we can't use GPL contributions for Python. Quoting the Python license: "All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source." (https://docs.python.org/3/license.html . -- --Guido van Rossum (python.org/~guido)

On Dec 12, 2015, at 08:25 PM, Joseph Jevnik wrote:
I realize that the python license is incompatible with the GPL.
Not technically correct, and not what Guido is saying. The current PSF license is indeed compatible with the GPL: https://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses it's just that the PSF cannot accept contributions licensed under the terms of the GPL. https://www.python.org/psf/contrib/ Cheers, -Barry

On 13 December 2015 at 00:53, Joseph Jevnik <joejev@gmail.com> wrote:
This is (IMO) readable, but a bit heavy on punctuation. The OP suggested OrderedDict{1: 'a', 4: int, 2: (3, 3)} as a syntax - while it's a bit too special case on its own, one possibility would be to have callable{k1: v1, k2: v2, ...} be syntactic sugar for callable([(k1, k1), (k2, v2), ...]) Then the syntax would work with any function or constructor that took "list of key/value pairs" as an argument. Points against this suggestion, however: 1. It's not clear to me if this would be parseable within the constraints of the Python language parser. 2. It is *only* syntax sugar, and as such adds no extra expressiveness to the language. 3. It's still pretty specialised - while the "list of key/value pairs" pattern is not uncommon, it's not exactly common, either... Paul

On 2015-12-15 11:02, Paul Moore wrote:
Very interesting... I've faced the issue several times over the years when I've wanted to unpack values into a function call in an ordered manner (but it hasn't been available). Perhaps:: callable{**odict} In fact with callables I'd even go so far as wish that ordered unpacking was the default somehow, though I guess that probably isn't possible due to history. So, would be happy to have a way to do it. The syntax looks slightly odd but I could get used to it. I found this on the subject, don't know its status: http://legacy.python.org/dev/peps/pep-0468/ -- -Mike

On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller <python-ideas@mgmiller.net> wrote:
That's not so clear, actually! It turns out that PyPy was able to make their regular 'dict' implementation ordered, while at the same time making it faster and more memory-efficient compared to their previous (CPython-like) implementation: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.h... So in PyPy all these issues are automatically solved for free. The $1e6-question these other proposals have to answer is, why not do what PyPy did? Maybe there is a good reason not to, but it seems like it'll be difficult to get consensus on moving forward on any of these other more complicated proposals until someone has first made a serious attempt at porting PyPy's dict to CPython and is able to clearly describe why it didn't work. (3.5 does have a faster C implementation of OrderedDict, thanks to tireless efforts by Eric Snow -- https://bugs.python.org/issue16991 -- but this implementation uses a very different and less cache-efficient strategy than PyPy.) -n -- Nathaniel J. Smith -- http://vorpus.org

For Jython, ordered dict semantics for the dict type *could* possibly work. Currently, dict objects are backed by java.util.concurrent.ConcurrentHashMap, to get correct semantics with respect to possible del when iterating over the dict; and to provide volatile memory semantics, to match CPython's memory model, informal as it may be. Using CHM also likely helps with Jython's threading story. Note that we can not simply use a java.util.LinkedHashMap that has been wrapped with java.util.collections.synchronizedMap; at the very least we would get this behavior: The iterators returned by the iterator method of the collections returned
(http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html) But there is an alternative: Caffeine is an interesting project that has matured significantly since when I took a look at it last year ( https://github.com/ben-manes/caffeine). Caffeine implements a generally useful concurrent linked hash map which provides the necessary weak iteration semantics we need for Python compatibility; and it looks like Caffeine may have performance comparable to, or better than CHM (but not clear if that extends to map construction, currently a pretty heavy cost for Jython). Caffeine also builds on the implementation experience of Google Guava, which Jython currently uses extensively for internal runtime caches. So it's certainly worth exploring if this possible change for Python gets further interest - we will want to benchmark and really understand because dict/__dict__ support is one of the most critical aspects of good Python performance. - Jim On Wed, Dec 16, 2015 at 4:17 PM, Nathaniel Smith <njs@pobox.com> wrote:

On Dec 16, 2015, at 15:17, Nathaniel Smith <njs@pobox.com> wrote:
You don't even need that; a dict that's ordered as long as you never delete from it or expand it after initial creation is good enough, and that may be simpler. (For example, Raymond Hettinger's two-table prototype guarantees this much, even though it isn't order-preserving on deletes, and it should be faster and more compact than the current design, although I don't know if anyone's proven that part, and it's the "dead-simple once you see it" kind of brilliant rather than the deep-magic kind.) The bigger problem is that any other Python implementation that uses some native (Java, .NET, JS, ...) structure to back its Python dicts will obviously need to either change to a different one or use a two-table structure like Raymond's--which may be a pessimization rather than an optimization and/or may break whatever thread-safety guarantees they were relying on. So, you'd need to make sure there's a good answer for every major implementation out there before changing the language to require them all to do it. Of course we'd also need to require that **kw unpacking happens in iteration order, and **kw collecting collects the keywords in the order passed, but both of those should be easy. (They may still be changes--an implementation might, say, reverse the order for a slight simplification or something--but they should be trivial compare to finding an order-preserving-at-least-until-mutation structure.) But assuming those are all reasonable, it seems like the easiest solution to the problem.

On Wed, Dec 16, 2015 at 9:12 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
IIUC, the PyPy dict is exactly a fully-worked-out version of Raymond Hettinger's two-table design, and they claim that it is in fact faster and more compact than the current design, so I suppose one could argue that someone has indeed proven that part :-). -n -- Nathaniel J. Smith -- http://vorpus.org

On Wednesday, December 16, 2015 9:36 PM, Nathaniel Smith <njs@pobox.com> wrote:
According the blog post, some cases are slower, "for example when repeatedly adding and removing keys in equal number". For PyPy, that's obviously fine. But PyPy generally isn't worried about small performance regressions between PyPy versions for relatively uncommon edge cases, especially if the new code is still much faster than CPython, and when it enables "various optimization possibilities which we're going to explore in the near future", and so on. I don't know if the same applies for CPython. So it may be better to stick with Raymond's original design, which should be faster than 3.5 in all cases, not just most, and require less and simpler code, and still provide the guarantee we actually need here (which will hopefully be an easier requirement on the other implementations as well). As an aside, IIRC, the rejected "blist" proposal from a few years ago sped up every benchmark, and also provided all the guarantees we need here. (I think it used a flat a-list for tiny dicts, a variant B-tree for medium-sized dicts and all non-tiny literals, and a hash when you resize a dict beyond a certain cutoff, or something like that.) It's obviously more complicated than the Raymond design or the PyPy design, and was presumably rejected for a good reason, but it's more evidence that the requirement may not be too strenuous.

On Wed, Dec 16, 2015 at 3:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
If I recall correctly, the PyPy implementation is either based on a proposal by Raymond Hettinger [1] or on the same concept. Note that I mention the approach as an alternative in PEP 468. -eric [1] original: https://mail.python.org/pipermail/python-dev/2012-December/123028.html revisited: https://mail.python.org/pipermail/python-dev/2013-May/126327.html

On Sun, Dec 13, 2015 at 01:13:49AM +0100, Jelte Fennema wrote:
You have a rather strict view of "unreadable" :-) Some alternatives if you dislike the look of the above: # Option 3: d = OrderedDict() for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): d[key] = value # Option 4: d = OrderedDict([(1, 'a'), (4, int)]) # The pretty values. d[2] = (3, 3) # The ugly nested tuple at the end. So there's no shortage of work-arounds for the lack of nice syntax for creating ordered dicts. And besides, why single out OrderedDict? There are surely people out there using ordered mappings like RedBlackTree that have to deal with this same issue. Perhaps what we need is to stop focusing on a specific dictionary type, and think about the lowest common denominator for any mapping. And that, I believe, is a list of (key, value) tuples. See below.
I don't understand what that syntax is supposed to do. Obviously it creates an OrderedDict, but you haven't explained the details. Is the prefix "OrderedDict" hard-coded in the parser/lexer, like the b prefix for byte-strings and r prefix for raw strings? In that case, I think that's pretty verbose, and would prefer to see something shorter: o{1: 'a', 4: int, 2: (3, 3)} perhaps. If OrderedDict is important enough to get its own syntax, it's important enough to get its own *short* syntax. That was my preferred solution, but it no longer is. Or is the prefix "OrderedDict" somehow looked-up at run-time? So we could write something like: spam = random.choice(list_of_callables) result = spam{1: 'a', 4: int, 2: (3, 3)} and spam would be called, whatever it happens to be, with a single list argument: [(1, 'a'), (4, int), (2, (3, 3))] What puts me off this solution is that it is syntactic sugar for not one but two distinct operations: - sugar for creating a list of tuples; - and sugar for a function call. But if we had the first, we don't need the second, and we don't need to treat OrderedDict as a special case. We could use any mapping: MyDict(sugar) OrderedDict(sugar) BinaryTree(sugar) and functions that aren't mappings at all, but expect lists of (a, b) tuples: covariance(sugar)
What does that actually do, in detail? Does it call d.__getitem__(key, value) repeatedly? So I could do something like this: L = [None]*10 L{1: 'a', 3: 'b', 5: 'c', 7: 'd', 9: 'e'} assert L == [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] If we had nice syntax for creating ordered dict literals, would we want this feature? I don't think so. It must be pretty rare to want something like that (at least, I can't remember the last time I did) and when we do, we can often do it with slicing: py> L = [None]*10 py> L[1::2] = 'abcde' py> L [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e']
This would allow arguments to the __init__ method as well.
How? You said that this option was only available after initialization.
And this way it could simply be a shorthand for setting multiple attributes.
How does the reader (or the interpreter) tell when d{key: value} means "call __setitem__" and when it means "call __setattr__"?
I think that there is a kernel of a good idea in this. Let's go back to the idea of syntactic sugar for a list of tuples. The user can then call the function or class of their choice, they aren't limited to just one mapping type. I'm going to suggest [key:value] as syntax. Now your original example becomes: d = OrderedDict([1: 'a', 4: int, 2: (3, 3)]) which breaks up the triple ))) at the end, so hopefully you will not think its ugly. Also, we're not limited to just calling the constructor, it could be any method: d.update([2: None, 1: 'b', 5: 99.9]) or anywhere at all: x = [2: None, 1: 'b', 5: 99.9, 1: 'a', 4: int, 2: (3, 3)] + items # shorter and less error-prone than: x = ( [(2, None), (1, 'b'), (5, 99.9), (1, 'a'), (4, int), (2, (3, 3))] + values ) There could be a comprehension form: [key: value for x in seq if condition] similar to the dict comprehension form. -- Steve

On Dec 12, 2015, at 19:24, Steven D'Aprano <steve@pearwood.info> wrote:
This does seem to be the obvious syntax: if [1, 2, 3] is a list and {1, 2, 3} is a set, and {1: 2, 3: 4} is a dict, then [1: 2, 3: 4] should be something that bears the same relationship to dict as list does to set: an a-list. (And we don't even have the {} ambiguity problem with [], because an a-list is the same type as a list, and no pairs is the same value as no elements.) And I think there's some precedent here. IIRC, in YAML, {1:2, 3:4} is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think it's ambiguous between an ordered dict and a list of pairs, and you can resolve that by declaring !odict or !seq, or you can just leave it up to the implementation to pick one if you don't care... but let's pretend it wasn't ambiguous; either one covers the use case (and Python only has the latter option anyway, unless OrderedDict becomes a builtin). And it's definitely readable in your examples.

On Sun, Dec 13, 2015 at 12:15 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
And I think there's some precedent here. IIRC, in YAML, {1:2, 3:4} is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think it's ambiguous between an ordered dict and a list of pairs, and you can resolve that by declaring !odict or !seq, or you can just leave it up to the implementation to pick one if you don't care... but let's pretend it wasn't ambiguous; either one covers the use case (and Python only has the latter option anyway, unless OrderedDict becomes a builtin).
For YAML, I read it as a list of dicts. My Python's yaml module (pyyaml?) agrees.
However, YAML's website (page: http://www.yaml.org/refcard.html) lists the !!omap type cast as using this syntax: '!!omap': [ one: 1, two: 2 ] I tried using !!seq. Not sure if I'm doing it right.
(Huh. PyYAML module thinks that an omap should be a list of pairs. It might eventually change to OrderedDict, though.)

On Tuesday, December 15, 2015 4:24 AM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
According to the YAML 1.1 (and 1.2, since AFAICR they never created the separate repo for 1.2) type-repo omap spec (http://yaml.org/type/omap.html):
Most programming languages do not have a built-in native data type for supporting ordered maps. Such data types are usually provided by libraries. If no such data type is available, an application may resort to loading an “!!omap” into a native array of hash tables containing one key each.
The “!!omap” tag may be given explicitly. Alternatively, the application may choose to implicitly type a sequence of single-key mappings to ordered maps. In this case, an explicit “!seq” transfer must be given to sequences of single-key mappings that do not represent ordered maps.
So, if you don't specify either "!!omap" or "!seq", it's up to your implementation whether you've designated an ordered dict or a list of dicts. I was wrong on some of the details--it's "!!omap" rather than "!odict", and it's a list of one-element dicts rather than a list of pairs, and it sounds like even if you _do_ explicitly specify one type the implementation is allowed to give you the other... But still, as I said, YAML is precedent for Python interpreting ['one': 1, 'two': 2] as OrderedDict([('one', 1), ('two', 2)]). Of course it's also precedent for Python interpreting that as a list of dicts [{'one': 1}, {'two': 2}], and I don't think anyone wants that... I suppose you can find precedent for almost any idea , no matter how silly, as long as it's implementable. :)
import yaml
Ha, so I was wrong about YAML allowing that, but PyYAML does it anyway?

I think you are right in suggesting that the whole problem goes away when there is a nice way to specify list of tuples. Since I indeed cannot think of a moment where my previous syntax cannot be replaced by this one. Another option would be that this syntax would not represent a list of tuples but an OrderedDict. I think they both have their advantages, the list of tuples would allow list operations such as `+ items` as you suggested and usage of the same keys multiple times. But OrderedDict would allow simple indexing directly. But it does not really matter that much since both could easily be used to generate the other. OrderedDict(['1':'2']) and list(['1':'2'].items()) respectively. I think the main case for the list of tuples is actually that you can make any OrderedDict from a list of tuples, but not the other way around, since duplicate keys would be removed. Which is why I like your idea for a shorthand for a list of tuples better, since it covers more uses. One important thing to note is the discussion I already mentioned in my first email. Especially this message where guide votes -100 for your syntax for OrderedDict creation: https://mail.python.org/pipermail/python-ideas/2009-June/004924.html I'm not sure why he disliked that syntax and if he still does. Or if his thoughts are different when it would represent a list of tuples instead of an OrderedDict. On 13 December 2015 at 04:24, Steven D'Aprano <steve@pearwood.info> wrote:

On Sun, Dec 13, 2015 at 5:00 AM, Jelte Fennema <me@jeltef.nl> wrote:
I also wonder why he doesn't like it. I wouldn't like it if it represented a list of tuples. What we have is: - [a, b, c] -> list = [ordered, mutable, collection] - {a, b, c} -> set = [unordered, mutable, collection, uniquekeys] - {a:x, b:y, c:z} -> dict = [unordered, mutable, mapping, uniquekeys] - (a, b, c) -> tuple = [ordered, immutable, collection] It seems to me that the pattern would extend to: - [a:x, b:y, c:z] -> [ordered, mutable, mapping] - (a:x, b:y, c:z) -> [ordered, immutable, mapping] The first one is ALMOST OrderedDict, except that it has unique keys. The second is ALMOST namedtuple, except that it: - doesn't allow duplicate keys - doesn't allow indexing by keys (though this can change) - doesn't allow arbitrary key type (and we don't want ints allowed as keys) - needs a name and a type If we add the rule that a mapping literal's type should have unique keys, then [a:x] -> OrderedDict fits the pattern. But [a:x] => [(a,x)] doesn't.

After thinking some more, I think you are right in saying that it would make more sense to let it represent an OrderedDict directly. Mostly because the mutability suggested by the square brackets. And also a bit because I'm not sure when a mapping that maps multiple values to the same key is actually useful. Secondly, I think your idea for namedtuple literals is great. This would be really useful in the namedtuple use case where you want to return multiple values from a function, but you want to be clear in what these values actually are. I think this would need to generate some kind of anonymous named tuple class though, since it would make no sense to have to create a new class when using a literal like this. I would really like to hear Guido's response to these ideas. Since he disliked the idea so much in the past and I can't find a reference to his reasoning. Jelte PS. Seeing as we're cleary drifting from the original topic of this thread, would it be a good idea if a new one would be created with these new ideas in mind? On 15 December 2015 at 12:49, Franklin? Lee <leewangzhong+python@gmail.com> wrote:

On Wed, Dec 16, 2015 at 12:08 AM, Jelte Fennema <me@jeltef.nl> wrote:
Be careful of this trap, though:
Dictionary-like things iterate over their keys; tuple-like things iterate over their values. (And a list of pairs would effectively iterate over items().) Having extremely similar syntax for creating them might well lead to a lot of confusion on that point. ChrisA

I see your point, but it would almost never make sense to list the attributes of a namedtuple. As this would be the only one that could cause confusion (since OrderedDict would do the same as dict), I doubt it would actually cause that much confusion. Jelt On 15 December 2015 at 14:19, Chris Angelico <rosuav@gmail.com> wrote:

On Tue, Dec 15, 2015 at 8:08 AM, Jelte Fennema <me@jeltef.nl> wrote:
Whoa whoa whoa. I wasn't suggesting a namedtuple literal. Let's not bring down the wrath of the gods. (Also, it's come up before (https://mail.python.org/pipermail/python-ideas/2014-April/027434.html) and I was part of the discussion (https://mail.python.org/pipermail/python-ideas/2014-April/027602.html). So it's not my idea.) (And it shouldn't be namedtuple, exactly, since namedtuple is a metaclass which generates classes with names. Attrtuple? For performance, it would map [keylist] => attrtuple[keylist]. Earlier discussion here: https://mail.python.org/pipermail/python-ideas/2013-June/021277.html)

One more point I don't think anyone's brought up yet with the [k: v] syntax: Just as there's no empty set literal because it would be ambiguous with the empty dict, there would be no empty OrderedDict literal because it would be ambiguous with the empty list. The fact that the exact same ambiguity is resolved in opposite directions (which only makes sense if you know the history--dict preceded set in the language, but list preceded OrderedDict) makes it doubly irregular. Of course we could introduce [:] as an empty OrderedDict literal, and {:} as an empty dict. But unless you actually wanted to deprecate {} for empty dict and eventually make it mean empty set (which I doubt anyone would argue for), that just gives us two ways to do it instead of solving the problem. Meanwhile: On Tuesday, December 15, 2015 5:09 AM, Jelte Fennema <me@jeltef.nl> wrote:
After thinking some more, I think you are right in saying that it would make more sense to let it represent an OrderedDict directly. Mostly because the mutability suggested by the square brackets. And also a bit because I'm not sure when a mapping that maps multiple values to the same key is actually useful.
Well, multidicts are actually useful pretty often--but in Python, they're usually spelled defaultdict(set) or defaultdict(list). After all, you need some syntax to look up (and modify and delete) values. In a dict that directly has multiple values per key, there's no way to specify which one you want, but in a dict that explicitly stores those multiple values as a set or list, it's just d[key] to get or delete that set or list, and d[key].add to add a value, and so on. I think Franklin's point was that a list of pairs is _most often_ used as a mapping initializer, but can mean other things as well, some of which might have a need for duplicate keys. For example, a stats package might take a mapping or iterable-of-pairs (the same type the dict constructor takes) for a collection of timestamped data points, and it's perfectly reasonable for two measurements to have the same timestamp in some datasets, but not in others. If the syntax defines an OrderedDict, it can't be used for the first kind of dataset. As for your mutability point: there's no reason it couldn't be a list of 2-lists instead of a list of 2-tuples. Sure, that will take a little more space in most implementations, but that rarely matters for literals--an object that's big enough in memory that you start to worry about compactness is probably way too big to put in a source file. (And if it _does_ matter, there's no reason CPython couldn't have special code that initializes the lists constructed by [:] literals with capacity 2 instead of the normal minimum capacity, on the expectation that you're not likely to go appending to all of the elements of a list constructed that way, and if you really want to, it's probably clearer to write it with [[]] syntax.)
Secondly, I think your idea for namedtuple literals is great. This would be really useful in the namedtuple use case where you want to return multiple values from a function, but you want to be clear in what these values actually are. I think this would need to generate some kind of anonymous named tuple class though, since it would make no sense to have to create a new class when using a literal like this.
First, why would it make no sense to create a new class? This isn't a prototype language; if the attributes are part of the object's type, then that type has to exist, and be accessible as a class. (You could cache the types, so that any two object literals with the same attributes have the same type, but that doesn't really change anything.) If you really want to avoid generating a new type, the type has to have normal-Python dynamic-per-object attributes, like SimpleNamespace, not class-specified attributes. I don't think that's an argument for (:) literals creating new classes, so much as an argument against them producing anything remotely like a namedtuple. None of the existing collection literals produce anything with attributes; namedtuple values can't be looked up by key (as in a mapping), only by attribute (as in SimpleNamespace) or index (as in a sequence); namedtuples don't iterate their keys (like a mapping) but their values (like a sequence)... So that's a pretty bad analogy with dict and OrderedDict in almost every way. Also, the syntax looks enough like general object literals in other languages like JavaScript that it will probably mislead people. (Or, worse, they'd actually be right--you could use (:) and lambda together to create some very unpythonic code, and people coming from JS would be very tempted to do so. The fact that you have to explicitly call type to do that today is enough to prevent that from being an attractive nuisance.) If you look at all the other collection literals (including the proposed [:] for OrderedDict) and try to guess what (:) does by analogy, the obvious answer would be a FrozenOrderedDict. Since frozen dicts in general aren't even useful enough to be in the stdlib, much less as builtins, much less ordered ones, I can't imagine they need literals. But a literal that looks like it should mean that, and instead means something completely different, is at best a new and clunky thing that has to be memorized separately from the rest of the syntax.

On Tue, Dec 15, 2015 at 1:36 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
One more point I don't think anyone's brought up yet with the [k: v] syntax: Just as there's no empty set literal because it would be ambiguous with the empty dict, there would be no empty OrderedDict literal because it would be ambiguous with the empty list. The fact that the exact same ambiguity is resolved in opposite directions (which only makes sense if you know the history--dict preceded set in the language, but list preceded OrderedDict) makes it doubly irregular.
I regularly write {} for an empty set and have to fix it later. It looks like math! On the other hand, I don't think anyone will learn OrderedDict before becoming VERY familiar with lists. If you try to treat a list as an empty OrderedDict, at least you will fail as soon as you use it. `empty_list_that_i_think_is_an_ordered_dict[0] = 5` will raise an error.
I think Franklin's point was that a list of pairs is _most often_ used as a mapping initializer, but can mean other things as well, some of which might have a need for duplicate keys. For example, a stats package might take a mapping or iterable-of-pairs (the same type the dict constructor takes) for a collection of timestamped data points, and it's perfectly reasonable for two measurements to have the same timestamp in some datasets, but not in others. If the syntax defines an OrderedDict, it can't be used for the first kind of dataset.
No, when I thought about it while writing the email, I was okay with this syntax NOT having multiple values per key, because I don't think it's a very basic data structure to think about (there's no builtin analogous to it), and the indexing syntax wouldn't be like dict's (indexing gets you a collection of values). If you really wanted an ordered multidict, you could write `[k1: [1,2,3], k2: [4,5], k3: [6]]`, or use set displays or tuple displays instead of list displays. In fact, that just shows how `d = [k1: x, k2: y, k1: z]` as an OrderedMultiDict would be ambiguous: Is d[k1] a list, a set, or a tuple? Does k1 come before or after k2?

I'm still against it. Despite its popularity in certain circles and use cases, OrderedDict is way less important than dict. Having too many variants on these display notations is confusing for many users. On Tue, Dec 15, 2015 at 3:49 AM, Franklin? Lee < leewangzhong+python@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 15.12.2015 12:49, Franklin? Lee wrote:
How would the parser be able to detect these ? Since requests to be able to access the order of values, parameters and definitions in source code come up rather often, perhaps it'd better to provide Python application with a standard access mechanism to this order rather than trying to push use of OrderedDict and the like into the runtime, causing unnecessary performance overhead. The parser does have access to this information in the AST and some of it is partially copied into code object attributes, but there's no general purpose access to the information. Based on the source code order, you could do lots of things, e.g. avoid hacks to map class attributes to column definitions for ORMs, make it possible to write OrderedDict(a=x, b=y) and have the literal order preserved, have NamedTuple(a=x, b=y) work without additional tricks, etc. What's important here is that the runtime performance would not change. The code objects would gain some additional tuples, which store the order of the literals used in their AST, so only the memory consumption would increase. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 16 2015)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On Wed, Dec 16, 2015 at 2:50 AM, M.-A. Lemburg <mal@egenix.com> wrote:
+1 from me. That's essentially the goal of PEP 468. [1] While the proposed solution focuses on OrderedDict, note the various alternatives at the bottom of the PEP. Also note that OrderedDict now has a C implementation that doesn't suffer from the same performance penalty. [2] -eric [1] http://legacy.python.org/dev/peps/pep-0468/ [2] OrderedDict is actually faster for iteration, the same speed for other non-mutation operations, not much slower for most mutation operations, and 4x slower in the worst case. That is a drastic improvement over the pure Python OrderedDict.

On Dec 17, 2015, at 10:37, Eric Snow <ericsnowcurrently@gmail.com> wrote:
It seems like whatever the resolution of this discussion (unless it's "people mostly agree that X would be acceptable, doable, and probably desirable, but nobody's going to do it") you may want to rewrite and repush PEP 468. After all, if Python 3.6 changes to make all dicts ordered, or make dict literals preserve literal order at least until first mutation (or anything in between--as Bruce Leban pointed out, there are at least three sensible things in between rather than just the one I suggested, and of courseall of them are good enough here), or to stash AST links on code objects, etc., you should be able to show how PEP 468 only adds a trivial requirement to any implementation of Python 3.6, and then the concerns come down to "maybe on some implementations it would be slightly faster to construct the kwdict in reverse, but now it will have to not do that". Conversely if someone convinces everyone that there is no solution that works for dict literals, you'd probably need to explain why that same problem doesn't apply to kwargs to keep the PEP alive.
IIRC, the one concern of Guido's that you couldn't answer was that if someone keeps the kwdict and adds to it, he could end up wasting a lot of space, not just time. If OrderedDict is still 150-200% bigger than dict, as in the pure Python version, that's still a problem.

On Thu, Dec 17, 2015 at 11:07 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
It seems like whatever the resolution of this discussion (unless it's "people mostly agree that X would be acceptable, doable, and probably desirable, but nobody's going to do it") you may want to rewrite and repush PEP 468.
Agreed. It's just a simple matter of finding time. <wink>
IIRC, the one concern of Guido's that you couldn't answer was that if someone keeps the kwdict and adds to it, he could end up wasting a lot of space, not just time. If OrderedDict is still 150-200% bigger than dict, as in the pure Python version, that's still a problem.
Yeah, he said something like that. However, with the C implementation the memory usage is less. Compared to dict: * basically 8 extra pointers on the object [1] * an instance __dict__ * an array of pointers equal in length to the underlying dict's hash table * basically 4 pointers per item So, per the __sizeof__ method, an empty C OrderedDict uses 800 bytes in contrast to 280 bytes for dict. Each added item uses an additional 24 bytes. With 1000 items usage is 122720 bytes (compared to 49240 bytes for dict). With the pure Python OrderedDict, empty is 824 bytes, each item uses 152 bytes, and with 1000 items usage is 250744 bytes. -eric [1] https://hg.python.org/cpython/file/default/Objects/odictobject.c#l481

I care about readability but I find: d = OrderedDict() for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): d[key] = value quite readable. Laura

This is indeed another option, but it decouples the keys from the values. Which makes it harder to see with a quick look what key will have what value, you will have to count the keys. This feels to me like a serious drawback of this method since knowing what value a key will have is pretty important when talking about initializing dictionaries. On 13 December 2015 at 22:10, Cody Piersall <cody.piersall@gmail.com> wrote:

Hello, Currently expression (a=1, b=2) is a syntax error. If it's defined to mean (('a',1), ('b',2)) it can be used when making OrderedDict or anything that requires named ordered args e.g. OrderedDict((a=1, b=2)) another variant with more changes in VM is OrderedDict(**(a=1, b=2)) Niki

Getting back to the original issue of being able to initialize an OrderedDict with a literal syntax, one proposal is that all dicts are ordered. There are actually several intermediate steps between dicts are unordered and ordered and here are some: (1) Dict literals preserve order of the original keys. When a literal is iterated, its keys are returned in declaration order. If the dict is changed in any way, then no guarantee is made about iteration order. (2) Dict literals preserve order of the original keys. This order is preserved even if values are changed as long as no keys are added or deleted. If a key is added or removed, then no guarantee is made about iteration order. (3) Dict literals preserve order of the original keys. This order is preserved for all keys originally added but new keys may be returned in any order, even interspersed between original keys. The order of the new keys is not stable in that it may change anytime the dict is changed in any way. (4) Dict literals are really ordered dicts. For the case of initialization, all of the above make OrderedDict{k1: v1, k2: v2, ...}) work. If that use case is the primary motivator here, then just implementing option (1) seems sufficient. --- Bruce Check out my puzzle book and get it free here: http://J.mp/ingToConclusionsFree (available on iOS)

On Dec 12, 2015, at 18:13, Jelte Fennema <me@jeltef.nl> wrote:
I really like the OrderedDict class. But there is one thing that has always bothered me about it. Quite often I want to initialize a small ordered dict. When the keys are all strings this is pretty easy, since you can just use the keyword arguments.
I don't think this will work. Python uses a dict to pass in kwargs, so you've already lost ordering at that point, if I'm right.
My first preference would be for the proposals for dict to become ordered by default to come to fruition. If that turns out to be unacceptable (even though, as I understand it, PyPy has already been able to do this), then this looks pretty good to me. It's and alternate call operator that uses dict-like syntax to pass in a tuple of pairs.
IMO, this is strictly worse than the other option. It's confusing that the call operator wouldn't happen before this dict-like call. Either would be unneeded if dict was ordered by default, and it's far and way the best option, IMO.

= 3.4 though and is for fun only so I would not recommend using this in a
One of my favorite things about python is that it is always very clear when and in what order my code is executed. There is very little syntactic sugar that obscures the execution order; however, using an unordered collection (dict) to discuss the order that data will be added to an order-aware collection seems very confusing. We should first look at what the language already provides us for defining this. One thing that might make the association lists more readable without changing the language would be to visually break up the pairs over multiple lines. This could change the `OrderedDict` construction to look like: OrderedDict([ (k0, v0), (k1, v1), (kn, vn), ]) This makes the k: v mapping much more clear. Another option, which we have used in the library 'datashape', is to make the class itself subscriptable: R['a':int32, 'b':float64, 'c': string] or if we were to write it like the above example: R[ 'a':int32, 'b':float64, 'c':string, ] This might look like custom syntax; however, this is just using `__getitem__`, `tuple` literals, and `slice` literals in an interesting way to look sort of like a dictionary. One nice property of this syntax is that because readers know that they are creating a tuple, the fact that this is an order-preserving operation is very clear. This code is semantically equivalent to normal numpy code like: `my_array[idx_0_start:idx_0_end, idx_1_start:idx_1_end, idx_2_start:idx_2_end]` Here `R` is a an alias for the class `Record` where `R is Record`. This class has a metaclass that adds a `__getitem__`. This getitem looks for either a single slice or a tuple of slices and then checks the values to make sure they are all valid inputs. This is used to internally construct an `OrderedDict` We hadn't considered the comprehension case; however, you could dispatch the `__getitem__` on a generator that yields tuples to simulate the comprehension. This could look like: R[(k, v) for k, v in other_seq] where inside our `__getitem__` we would add a case like: if isinstance(key, types.GeneratorType): mapping = OrderedDict(key) If you would like to see like to see a code example of the implementation for the `Record` type it is available under the BSD license here: https://github.com/blaze/datashape/blob/master/datashape/coretypes.py#L968 There is another library that I have worked on that adds the ability to overload all of the literals, including dictionaries. This requires CPython production setting. I am merely mentioning this to show another possible syntax for this. @ordereddict_literals def f(): return {'a': 1, 'b': 2, 'c': 3}
f() OrderedDict([('a', 1), ('b', 2), ('c', 3)])
This code is available under the GPLv2 license here: https://github.com/llllllllll/codetransformer/blob/master/codetransformer/tr... On Sat, Dec 12, 2015 at 7:13 PM, Jelte Fennema <me@jeltef.nl> wrote:

On Sat, Dec 12, 2015 at 4:53 PM, Joseph Jevnik <joejev@gmail.com> wrote:
Good thing you recommended against its use :-), because we can't use GPL contributions for Python. Quoting the Python license: "All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source." (https://docs.python.org/3/license.html . -- --Guido van Rossum (python.org/~guido)

On Dec 12, 2015, at 08:25 PM, Joseph Jevnik wrote:
I realize that the python license is incompatible with the GPL.
Not technically correct, and not what Guido is saying. The current PSF license is indeed compatible with the GPL: https://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses it's just that the PSF cannot accept contributions licensed under the terms of the GPL. https://www.python.org/psf/contrib/ Cheers, -Barry

On 13 December 2015 at 00:53, Joseph Jevnik <joejev@gmail.com> wrote:
This is (IMO) readable, but a bit heavy on punctuation. The OP suggested OrderedDict{1: 'a', 4: int, 2: (3, 3)} as a syntax - while it's a bit too special case on its own, one possibility would be to have callable{k1: v1, k2: v2, ...} be syntactic sugar for callable([(k1, k1), (k2, v2), ...]) Then the syntax would work with any function or constructor that took "list of key/value pairs" as an argument. Points against this suggestion, however: 1. It's not clear to me if this would be parseable within the constraints of the Python language parser. 2. It is *only* syntax sugar, and as such adds no extra expressiveness to the language. 3. It's still pretty specialised - while the "list of key/value pairs" pattern is not uncommon, it's not exactly common, either... Paul

On 2015-12-15 11:02, Paul Moore wrote:
Very interesting... I've faced the issue several times over the years when I've wanted to unpack values into a function call in an ordered manner (but it hasn't been available). Perhaps:: callable{**odict} In fact with callables I'd even go so far as wish that ordered unpacking was the default somehow, though I guess that probably isn't possible due to history. So, would be happy to have a way to do it. The syntax looks slightly odd but I could get used to it. I found this on the subject, don't know its status: http://legacy.python.org/dev/peps/pep-0468/ -- -Mike

On Wed, Dec 16, 2015 at 10:47 AM, Mike Miller <python-ideas@mgmiller.net> wrote:
That's not so clear, actually! It turns out that PyPy was able to make their regular 'dict' implementation ordered, while at the same time making it faster and more memory-efficient compared to their previous (CPython-like) implementation: http://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.h... So in PyPy all these issues are automatically solved for free. The $1e6-question these other proposals have to answer is, why not do what PyPy did? Maybe there is a good reason not to, but it seems like it'll be difficult to get consensus on moving forward on any of these other more complicated proposals until someone has first made a serious attempt at porting PyPy's dict to CPython and is able to clearly describe why it didn't work. (3.5 does have a faster C implementation of OrderedDict, thanks to tireless efforts by Eric Snow -- https://bugs.python.org/issue16991 -- but this implementation uses a very different and less cache-efficient strategy than PyPy.) -n -- Nathaniel J. Smith -- http://vorpus.org

For Jython, ordered dict semantics for the dict type *could* possibly work. Currently, dict objects are backed by java.util.concurrent.ConcurrentHashMap, to get correct semantics with respect to possible del when iterating over the dict; and to provide volatile memory semantics, to match CPython's memory model, informal as it may be. Using CHM also likely helps with Jython's threading story. Note that we can not simply use a java.util.LinkedHashMap that has been wrapped with java.util.collections.synchronizedMap; at the very least we would get this behavior: The iterators returned by the iterator method of the collections returned
(http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html) But there is an alternative: Caffeine is an interesting project that has matured significantly since when I took a look at it last year ( https://github.com/ben-manes/caffeine). Caffeine implements a generally useful concurrent linked hash map which provides the necessary weak iteration semantics we need for Python compatibility; and it looks like Caffeine may have performance comparable to, or better than CHM (but not clear if that extends to map construction, currently a pretty heavy cost for Jython). Caffeine also builds on the implementation experience of Google Guava, which Jython currently uses extensively for internal runtime caches. So it's certainly worth exploring if this possible change for Python gets further interest - we will want to benchmark and really understand because dict/__dict__ support is one of the most critical aspects of good Python performance. - Jim On Wed, Dec 16, 2015 at 4:17 PM, Nathaniel Smith <njs@pobox.com> wrote:

On Dec 16, 2015, at 15:17, Nathaniel Smith <njs@pobox.com> wrote:
You don't even need that; a dict that's ordered as long as you never delete from it or expand it after initial creation is good enough, and that may be simpler. (For example, Raymond Hettinger's two-table prototype guarantees this much, even though it isn't order-preserving on deletes, and it should be faster and more compact than the current design, although I don't know if anyone's proven that part, and it's the "dead-simple once you see it" kind of brilliant rather than the deep-magic kind.) The bigger problem is that any other Python implementation that uses some native (Java, .NET, JS, ...) structure to back its Python dicts will obviously need to either change to a different one or use a two-table structure like Raymond's--which may be a pessimization rather than an optimization and/or may break whatever thread-safety guarantees they were relying on. So, you'd need to make sure there's a good answer for every major implementation out there before changing the language to require them all to do it. Of course we'd also need to require that **kw unpacking happens in iteration order, and **kw collecting collects the keywords in the order passed, but both of those should be easy. (They may still be changes--an implementation might, say, reverse the order for a slight simplification or something--but they should be trivial compare to finding an order-preserving-at-least-until-mutation structure.) But assuming those are all reasonable, it seems like the easiest solution to the problem.

On Wed, Dec 16, 2015 at 9:12 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
IIUC, the PyPy dict is exactly a fully-worked-out version of Raymond Hettinger's two-table design, and they claim that it is in fact faster and more compact than the current design, so I suppose one could argue that someone has indeed proven that part :-). -n -- Nathaniel J. Smith -- http://vorpus.org

On Wednesday, December 16, 2015 9:36 PM, Nathaniel Smith <njs@pobox.com> wrote:
According the blog post, some cases are slower, "for example when repeatedly adding and removing keys in equal number". For PyPy, that's obviously fine. But PyPy generally isn't worried about small performance regressions between PyPy versions for relatively uncommon edge cases, especially if the new code is still much faster than CPython, and when it enables "various optimization possibilities which we're going to explore in the near future", and so on. I don't know if the same applies for CPython. So it may be better to stick with Raymond's original design, which should be faster than 3.5 in all cases, not just most, and require less and simpler code, and still provide the guarantee we actually need here (which will hopefully be an easier requirement on the other implementations as well). As an aside, IIRC, the rejected "blist" proposal from a few years ago sped up every benchmark, and also provided all the guarantees we need here. (I think it used a flat a-list for tiny dicts, a variant B-tree for medium-sized dicts and all non-tiny literals, and a hash when you resize a dict beyond a certain cutoff, or something like that.) It's obviously more complicated than the Raymond design or the PyPy design, and was presumably rejected for a good reason, but it's more evidence that the requirement may not be too strenuous.

On Wed, Dec 16, 2015 at 3:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
If I recall correctly, the PyPy implementation is either based on a proposal by Raymond Hettinger [1] or on the same concept. Note that I mention the approach as an alternative in PEP 468. -eric [1] original: https://mail.python.org/pipermail/python-dev/2012-December/123028.html revisited: https://mail.python.org/pipermail/python-dev/2013-May/126327.html

On Sun, Dec 13, 2015 at 01:13:49AM +0100, Jelte Fennema wrote:
You have a rather strict view of "unreadable" :-) Some alternatives if you dislike the look of the above: # Option 3: d = OrderedDict() for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): d[key] = value # Option 4: d = OrderedDict([(1, 'a'), (4, int)]) # The pretty values. d[2] = (3, 3) # The ugly nested tuple at the end. So there's no shortage of work-arounds for the lack of nice syntax for creating ordered dicts. And besides, why single out OrderedDict? There are surely people out there using ordered mappings like RedBlackTree that have to deal with this same issue. Perhaps what we need is to stop focusing on a specific dictionary type, and think about the lowest common denominator for any mapping. And that, I believe, is a list of (key, value) tuples. See below.
I don't understand what that syntax is supposed to do. Obviously it creates an OrderedDict, but you haven't explained the details. Is the prefix "OrderedDict" hard-coded in the parser/lexer, like the b prefix for byte-strings and r prefix for raw strings? In that case, I think that's pretty verbose, and would prefer to see something shorter: o{1: 'a', 4: int, 2: (3, 3)} perhaps. If OrderedDict is important enough to get its own syntax, it's important enough to get its own *short* syntax. That was my preferred solution, but it no longer is. Or is the prefix "OrderedDict" somehow looked-up at run-time? So we could write something like: spam = random.choice(list_of_callables) result = spam{1: 'a', 4: int, 2: (3, 3)} and spam would be called, whatever it happens to be, with a single list argument: [(1, 'a'), (4, int), (2, (3, 3))] What puts me off this solution is that it is syntactic sugar for not one but two distinct operations: - sugar for creating a list of tuples; - and sugar for a function call. But if we had the first, we don't need the second, and we don't need to treat OrderedDict as a special case. We could use any mapping: MyDict(sugar) OrderedDict(sugar) BinaryTree(sugar) and functions that aren't mappings at all, but expect lists of (a, b) tuples: covariance(sugar)
What does that actually do, in detail? Does it call d.__getitem__(key, value) repeatedly? So I could do something like this: L = [None]*10 L{1: 'a', 3: 'b', 5: 'c', 7: 'd', 9: 'e'} assert L == [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e'] If we had nice syntax for creating ordered dict literals, would we want this feature? I don't think so. It must be pretty rare to want something like that (at least, I can't remember the last time I did) and when we do, we can often do it with slicing: py> L = [None]*10 py> L[1::2] = 'abcde' py> L [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e']
This would allow arguments to the __init__ method as well.
How? You said that this option was only available after initialization.
And this way it could simply be a shorthand for setting multiple attributes.
How does the reader (or the interpreter) tell when d{key: value} means "call __setitem__" and when it means "call __setattr__"?
I think that there is a kernel of a good idea in this. Let's go back to the idea of syntactic sugar for a list of tuples. The user can then call the function or class of their choice, they aren't limited to just one mapping type. I'm going to suggest [key:value] as syntax. Now your original example becomes: d = OrderedDict([1: 'a', 4: int, 2: (3, 3)]) which breaks up the triple ))) at the end, so hopefully you will not think its ugly. Also, we're not limited to just calling the constructor, it could be any method: d.update([2: None, 1: 'b', 5: 99.9]) or anywhere at all: x = [2: None, 1: 'b', 5: 99.9, 1: 'a', 4: int, 2: (3, 3)] + items # shorter and less error-prone than: x = ( [(2, None), (1, 'b'), (5, 99.9), (1, 'a'), (4, int), (2, (3, 3))] + values ) There could be a comprehension form: [key: value for x in seq if condition] similar to the dict comprehension form. -- Steve

On Dec 12, 2015, at 19:24, Steven D'Aprano <steve@pearwood.info> wrote:
This does seem to be the obvious syntax: if [1, 2, 3] is a list and {1, 2, 3} is a set, and {1: 2, 3: 4} is a dict, then [1: 2, 3: 4] should be something that bears the same relationship to dict as list does to set: an a-list. (And we don't even have the {} ambiguity problem with [], because an a-list is the same type as a list, and no pairs is the same value as no elements.) And I think there's some precedent here. IIRC, in YAML, {1:2, 3:4} is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think it's ambiguous between an ordered dict and a list of pairs, and you can resolve that by declaring !odict or !seq, or you can just leave it up to the implementation to pick one if you don't care... but let's pretend it wasn't ambiguous; either one covers the use case (and Python only has the latter option anyway, unless OrderedDict becomes a builtin). And it's definitely readable in your examples.

On Sun, Dec 13, 2015 at 12:15 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
And I think there's some precedent here. IIRC, in YAML, {1:2, 3:4} is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think it's ambiguous between an ordered dict and a list of pairs, and you can resolve that by declaring !odict or !seq, or you can just leave it up to the implementation to pick one if you don't care... but let's pretend it wasn't ambiguous; either one covers the use case (and Python only has the latter option anyway, unless OrderedDict becomes a builtin).
For YAML, I read it as a list of dicts. My Python's yaml module (pyyaml?) agrees.
However, YAML's website (page: http://www.yaml.org/refcard.html) lists the !!omap type cast as using this syntax: '!!omap': [ one: 1, two: 2 ] I tried using !!seq. Not sure if I'm doing it right.
(Huh. PyYAML module thinks that an omap should be a list of pairs. It might eventually change to OrderedDict, though.)

On Tuesday, December 15, 2015 4:24 AM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
According to the YAML 1.1 (and 1.2, since AFAICR they never created the separate repo for 1.2) type-repo omap spec (http://yaml.org/type/omap.html):
Most programming languages do not have a built-in native data type for supporting ordered maps. Such data types are usually provided by libraries. If no such data type is available, an application may resort to loading an “!!omap” into a native array of hash tables containing one key each.
The “!!omap” tag may be given explicitly. Alternatively, the application may choose to implicitly type a sequence of single-key mappings to ordered maps. In this case, an explicit “!seq” transfer must be given to sequences of single-key mappings that do not represent ordered maps.
So, if you don't specify either "!!omap" or "!seq", it's up to your implementation whether you've designated an ordered dict or a list of dicts. I was wrong on some of the details--it's "!!omap" rather than "!odict", and it's a list of one-element dicts rather than a list of pairs, and it sounds like even if you _do_ explicitly specify one type the implementation is allowed to give you the other... But still, as I said, YAML is precedent for Python interpreting ['one': 1, 'two': 2] as OrderedDict([('one', 1), ('two', 2)]). Of course it's also precedent for Python interpreting that as a list of dicts [{'one': 1}, {'two': 2}], and I don't think anyone wants that... I suppose you can find precedent for almost any idea , no matter how silly, as long as it's implementable. :)
import yaml
Ha, so I was wrong about YAML allowing that, but PyYAML does it anyway?

I think you are right in suggesting that the whole problem goes away when there is a nice way to specify list of tuples. Since I indeed cannot think of a moment where my previous syntax cannot be replaced by this one. Another option would be that this syntax would not represent a list of tuples but an OrderedDict. I think they both have their advantages, the list of tuples would allow list operations such as `+ items` as you suggested and usage of the same keys multiple times. But OrderedDict would allow simple indexing directly. But it does not really matter that much since both could easily be used to generate the other. OrderedDict(['1':'2']) and list(['1':'2'].items()) respectively. I think the main case for the list of tuples is actually that you can make any OrderedDict from a list of tuples, but not the other way around, since duplicate keys would be removed. Which is why I like your idea for a shorthand for a list of tuples better, since it covers more uses. One important thing to note is the discussion I already mentioned in my first email. Especially this message where guide votes -100 for your syntax for OrderedDict creation: https://mail.python.org/pipermail/python-ideas/2009-June/004924.html I'm not sure why he disliked that syntax and if he still does. Or if his thoughts are different when it would represent a list of tuples instead of an OrderedDict. On 13 December 2015 at 04:24, Steven D'Aprano <steve@pearwood.info> wrote:

On Sun, Dec 13, 2015 at 5:00 AM, Jelte Fennema <me@jeltef.nl> wrote:
I also wonder why he doesn't like it. I wouldn't like it if it represented a list of tuples. What we have is: - [a, b, c] -> list = [ordered, mutable, collection] - {a, b, c} -> set = [unordered, mutable, collection, uniquekeys] - {a:x, b:y, c:z} -> dict = [unordered, mutable, mapping, uniquekeys] - (a, b, c) -> tuple = [ordered, immutable, collection] It seems to me that the pattern would extend to: - [a:x, b:y, c:z] -> [ordered, mutable, mapping] - (a:x, b:y, c:z) -> [ordered, immutable, mapping] The first one is ALMOST OrderedDict, except that it has unique keys. The second is ALMOST namedtuple, except that it: - doesn't allow duplicate keys - doesn't allow indexing by keys (though this can change) - doesn't allow arbitrary key type (and we don't want ints allowed as keys) - needs a name and a type If we add the rule that a mapping literal's type should have unique keys, then [a:x] -> OrderedDict fits the pattern. But [a:x] => [(a,x)] doesn't.

After thinking some more, I think you are right in saying that it would make more sense to let it represent an OrderedDict directly. Mostly because the mutability suggested by the square brackets. And also a bit because I'm not sure when a mapping that maps multiple values to the same key is actually useful. Secondly, I think your idea for namedtuple literals is great. This would be really useful in the namedtuple use case where you want to return multiple values from a function, but you want to be clear in what these values actually are. I think this would need to generate some kind of anonymous named tuple class though, since it would make no sense to have to create a new class when using a literal like this. I would really like to hear Guido's response to these ideas. Since he disliked the idea so much in the past and I can't find a reference to his reasoning. Jelte PS. Seeing as we're cleary drifting from the original topic of this thread, would it be a good idea if a new one would be created with these new ideas in mind? On 15 December 2015 at 12:49, Franklin? Lee <leewangzhong+python@gmail.com> wrote:

On Wed, Dec 16, 2015 at 12:08 AM, Jelte Fennema <me@jeltef.nl> wrote:
Be careful of this trap, though:
Dictionary-like things iterate over their keys; tuple-like things iterate over their values. (And a list of pairs would effectively iterate over items().) Having extremely similar syntax for creating them might well lead to a lot of confusion on that point. ChrisA

I see your point, but it would almost never make sense to list the attributes of a namedtuple. As this would be the only one that could cause confusion (since OrderedDict would do the same as dict), I doubt it would actually cause that much confusion. Jelt On 15 December 2015 at 14:19, Chris Angelico <rosuav@gmail.com> wrote:

On Tue, Dec 15, 2015 at 8:08 AM, Jelte Fennema <me@jeltef.nl> wrote:
Whoa whoa whoa. I wasn't suggesting a namedtuple literal. Let's not bring down the wrath of the gods. (Also, it's come up before (https://mail.python.org/pipermail/python-ideas/2014-April/027434.html) and I was part of the discussion (https://mail.python.org/pipermail/python-ideas/2014-April/027602.html). So it's not my idea.) (And it shouldn't be namedtuple, exactly, since namedtuple is a metaclass which generates classes with names. Attrtuple? For performance, it would map [keylist] => attrtuple[keylist]. Earlier discussion here: https://mail.python.org/pipermail/python-ideas/2013-June/021277.html)

One more point I don't think anyone's brought up yet with the [k: v] syntax: Just as there's no empty set literal because it would be ambiguous with the empty dict, there would be no empty OrderedDict literal because it would be ambiguous with the empty list. The fact that the exact same ambiguity is resolved in opposite directions (which only makes sense if you know the history--dict preceded set in the language, but list preceded OrderedDict) makes it doubly irregular. Of course we could introduce [:] as an empty OrderedDict literal, and {:} as an empty dict. But unless you actually wanted to deprecate {} for empty dict and eventually make it mean empty set (which I doubt anyone would argue for), that just gives us two ways to do it instead of solving the problem. Meanwhile: On Tuesday, December 15, 2015 5:09 AM, Jelte Fennema <me@jeltef.nl> wrote:
After thinking some more, I think you are right in saying that it would make more sense to let it represent an OrderedDict directly. Mostly because the mutability suggested by the square brackets. And also a bit because I'm not sure when a mapping that maps multiple values to the same key is actually useful.
Well, multidicts are actually useful pretty often--but in Python, they're usually spelled defaultdict(set) or defaultdict(list). After all, you need some syntax to look up (and modify and delete) values. In a dict that directly has multiple values per key, there's no way to specify which one you want, but in a dict that explicitly stores those multiple values as a set or list, it's just d[key] to get or delete that set or list, and d[key].add to add a value, and so on. I think Franklin's point was that a list of pairs is _most often_ used as a mapping initializer, but can mean other things as well, some of which might have a need for duplicate keys. For example, a stats package might take a mapping or iterable-of-pairs (the same type the dict constructor takes) for a collection of timestamped data points, and it's perfectly reasonable for two measurements to have the same timestamp in some datasets, but not in others. If the syntax defines an OrderedDict, it can't be used for the first kind of dataset. As for your mutability point: there's no reason it couldn't be a list of 2-lists instead of a list of 2-tuples. Sure, that will take a little more space in most implementations, but that rarely matters for literals--an object that's big enough in memory that you start to worry about compactness is probably way too big to put in a source file. (And if it _does_ matter, there's no reason CPython couldn't have special code that initializes the lists constructed by [:] literals with capacity 2 instead of the normal minimum capacity, on the expectation that you're not likely to go appending to all of the elements of a list constructed that way, and if you really want to, it's probably clearer to write it with [[]] syntax.)
Secondly, I think your idea for namedtuple literals is great. This would be really useful in the namedtuple use case where you want to return multiple values from a function, but you want to be clear in what these values actually are. I think this would need to generate some kind of anonymous named tuple class though, since it would make no sense to have to create a new class when using a literal like this.
First, why would it make no sense to create a new class? This isn't a prototype language; if the attributes are part of the object's type, then that type has to exist, and be accessible as a class. (You could cache the types, so that any two object literals with the same attributes have the same type, but that doesn't really change anything.) If you really want to avoid generating a new type, the type has to have normal-Python dynamic-per-object attributes, like SimpleNamespace, not class-specified attributes. I don't think that's an argument for (:) literals creating new classes, so much as an argument against them producing anything remotely like a namedtuple. None of the existing collection literals produce anything with attributes; namedtuple values can't be looked up by key (as in a mapping), only by attribute (as in SimpleNamespace) or index (as in a sequence); namedtuples don't iterate their keys (like a mapping) but their values (like a sequence)... So that's a pretty bad analogy with dict and OrderedDict in almost every way. Also, the syntax looks enough like general object literals in other languages like JavaScript that it will probably mislead people. (Or, worse, they'd actually be right--you could use (:) and lambda together to create some very unpythonic code, and people coming from JS would be very tempted to do so. The fact that you have to explicitly call type to do that today is enough to prevent that from being an attractive nuisance.) If you look at all the other collection literals (including the proposed [:] for OrderedDict) and try to guess what (:) does by analogy, the obvious answer would be a FrozenOrderedDict. Since frozen dicts in general aren't even useful enough to be in the stdlib, much less as builtins, much less ordered ones, I can't imagine they need literals. But a literal that looks like it should mean that, and instead means something completely different, is at best a new and clunky thing that has to be memorized separately from the rest of the syntax.

On Tue, Dec 15, 2015 at 1:36 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
One more point I don't think anyone's brought up yet with the [k: v] syntax: Just as there's no empty set literal because it would be ambiguous with the empty dict, there would be no empty OrderedDict literal because it would be ambiguous with the empty list. The fact that the exact same ambiguity is resolved in opposite directions (which only makes sense if you know the history--dict preceded set in the language, but list preceded OrderedDict) makes it doubly irregular.
I regularly write {} for an empty set and have to fix it later. It looks like math! On the other hand, I don't think anyone will learn OrderedDict before becoming VERY familiar with lists. If you try to treat a list as an empty OrderedDict, at least you will fail as soon as you use it. `empty_list_that_i_think_is_an_ordered_dict[0] = 5` will raise an error.
I think Franklin's point was that a list of pairs is _most often_ used as a mapping initializer, but can mean other things as well, some of which might have a need for duplicate keys. For example, a stats package might take a mapping or iterable-of-pairs (the same type the dict constructor takes) for a collection of timestamped data points, and it's perfectly reasonable for two measurements to have the same timestamp in some datasets, but not in others. If the syntax defines an OrderedDict, it can't be used for the first kind of dataset.
No, when I thought about it while writing the email, I was okay with this syntax NOT having multiple values per key, because I don't think it's a very basic data structure to think about (there's no builtin analogous to it), and the indexing syntax wouldn't be like dict's (indexing gets you a collection of values). If you really wanted an ordered multidict, you could write `[k1: [1,2,3], k2: [4,5], k3: [6]]`, or use set displays or tuple displays instead of list displays. In fact, that just shows how `d = [k1: x, k2: y, k1: z]` as an OrderedMultiDict would be ambiguous: Is d[k1] a list, a set, or a tuple? Does k1 come before or after k2?

I'm still against it. Despite its popularity in certain circles and use cases, OrderedDict is way less important than dict. Having too many variants on these display notations is confusing for many users. On Tue, Dec 15, 2015 at 3:49 AM, Franklin? Lee < leewangzhong+python@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 15.12.2015 12:49, Franklin? Lee wrote:
How would the parser be able to detect these ? Since requests to be able to access the order of values, parameters and definitions in source code come up rather often, perhaps it'd better to provide Python application with a standard access mechanism to this order rather than trying to push use of OrderedDict and the like into the runtime, causing unnecessary performance overhead. The parser does have access to this information in the AST and some of it is partially copied into code object attributes, but there's no general purpose access to the information. Based on the source code order, you could do lots of things, e.g. avoid hacks to map class attributes to column definitions for ORMs, make it possible to write OrderedDict(a=x, b=y) and have the literal order preserved, have NamedTuple(a=x, b=y) work without additional tricks, etc. What's important here is that the runtime performance would not change. The code objects would gain some additional tuples, which store the order of the literals used in their AST, so only the memory consumption would increase. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 16 2015)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On Wed, Dec 16, 2015 at 2:50 AM, M.-A. Lemburg <mal@egenix.com> wrote:
+1 from me. That's essentially the goal of PEP 468. [1] While the proposed solution focuses on OrderedDict, note the various alternatives at the bottom of the PEP. Also note that OrderedDict now has a C implementation that doesn't suffer from the same performance penalty. [2] -eric [1] http://legacy.python.org/dev/peps/pep-0468/ [2] OrderedDict is actually faster for iteration, the same speed for other non-mutation operations, not much slower for most mutation operations, and 4x slower in the worst case. That is a drastic improvement over the pure Python OrderedDict.

On Dec 17, 2015, at 10:37, Eric Snow <ericsnowcurrently@gmail.com> wrote:
It seems like whatever the resolution of this discussion (unless it's "people mostly agree that X would be acceptable, doable, and probably desirable, but nobody's going to do it") you may want to rewrite and repush PEP 468. After all, if Python 3.6 changes to make all dicts ordered, or make dict literals preserve literal order at least until first mutation (or anything in between--as Bruce Leban pointed out, there are at least three sensible things in between rather than just the one I suggested, and of courseall of them are good enough here), or to stash AST links on code objects, etc., you should be able to show how PEP 468 only adds a trivial requirement to any implementation of Python 3.6, and then the concerns come down to "maybe on some implementations it would be slightly faster to construct the kwdict in reverse, but now it will have to not do that". Conversely if someone convinces everyone that there is no solution that works for dict literals, you'd probably need to explain why that same problem doesn't apply to kwargs to keep the PEP alive.
IIRC, the one concern of Guido's that you couldn't answer was that if someone keeps the kwdict and adds to it, he could end up wasting a lot of space, not just time. If OrderedDict is still 150-200% bigger than dict, as in the pure Python version, that's still a problem.

On Thu, Dec 17, 2015 at 11:07 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
It seems like whatever the resolution of this discussion (unless it's "people mostly agree that X would be acceptable, doable, and probably desirable, but nobody's going to do it") you may want to rewrite and repush PEP 468.
Agreed. It's just a simple matter of finding time. <wink>
IIRC, the one concern of Guido's that you couldn't answer was that if someone keeps the kwdict and adds to it, he could end up wasting a lot of space, not just time. If OrderedDict is still 150-200% bigger than dict, as in the pure Python version, that's still a problem.
Yeah, he said something like that. However, with the C implementation the memory usage is less. Compared to dict: * basically 8 extra pointers on the object [1] * an instance __dict__ * an array of pointers equal in length to the underlying dict's hash table * basically 4 pointers per item So, per the __sizeof__ method, an empty C OrderedDict uses 800 bytes in contrast to 280 bytes for dict. Each added item uses an additional 24 bytes. With 1000 items usage is 122720 bytes (compared to 49240 bytes for dict). With the pure Python OrderedDict, empty is 824 bytes, each item uses 152 bytes, and with 1000 items usage is 250744 bytes. -eric [1] https://hg.python.org/cpython/file/default/Objects/odictobject.c#l481

I care about readability but I find: d = OrderedDict() for key, value in zip([1, 4, 2], ['a', int, (3, 3)]): d[key] = value quite readable. Laura

This is indeed another option, but it decouples the keys from the values. Which makes it harder to see with a quick look what key will have what value, you will have to count the keys. This feels to me like a serious drawback of this method since knowing what value a key will have is pretty important when talking about initializing dictionaries. On 13 December 2015 at 22:10, Cody Piersall <cody.piersall@gmail.com> wrote:

Hello, Currently expression (a=1, b=2) is a syntax error. If it's defined to mean (('a',1), ('b',2)) it can be used when making OrderedDict or anything that requires named ordered args e.g. OrderedDict((a=1, b=2)) another variant with more changes in VM is OrderedDict(**(a=1, b=2)) Niki

Getting back to the original issue of being able to initialize an OrderedDict with a literal syntax, one proposal is that all dicts are ordered. There are actually several intermediate steps between dicts are unordered and ordered and here are some: (1) Dict literals preserve order of the original keys. When a literal is iterated, its keys are returned in declaration order. If the dict is changed in any way, then no guarantee is made about iteration order. (2) Dict literals preserve order of the original keys. This order is preserved even if values are changed as long as no keys are added or deleted. If a key is added or removed, then no guarantee is made about iteration order. (3) Dict literals preserve order of the original keys. This order is preserved for all keys originally added but new keys may be returned in any order, even interspersed between original keys. The order of the new keys is not stable in that it may change anytime the dict is changed in any way. (4) Dict literals are really ordered dicts. For the case of initialization, all of the above make OrderedDict{k1: v1, k2: v2, ...}) work. If that use case is the primary motivator here, then just implementing option (1) seems sufficient. --- Bruce Check out my puzzle book and get it free here: http://J.mp/ingToConclusionsFree (available on iOS)
participants (19)
-
Andrew Barnert
-
Barry Warsaw
-
Bruce Leban
-
Chris Angelico
-
Cody Piersall
-
Eric Snow
-
Franklin? Lee
-
Guido van Rossum
-
Jelte Fennema
-
Jim Baker
-
Joseph Jevnik
-
Laura Creighton
-
M.-A. Lemburg
-
Mike Miller
-
Nathaniel Smith
-
Niki Spahiev
-
Paul Moore
-
Ryan Hiebert
-
Steven D'Aprano