[Python-ideas] Dict literal use for custom dict classes

Sun Dec 13 00:15:58 EST 2015

On Dec 12, 2015, at 19:24, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> On Sun, Dec 13, 2015 at 01:13:49AM +0100, Jelte Fennema wrote:
>> I really like the OrderedDict class. But there is one thing that has always
>> bothered me about it. Quite often I want to initialize a small ordered
>> dict. When the keys are all strings this is pretty easy, since you can just
>> use the keyword arguments. But when some or all of the keys are other
>> things this is an issue. In that case there are two options (as far as I
>> know). If you want an ordered dict of this form for instance: {1: 'a', 4:
>> int, 2: (3, 3)}, you would either have to use:
>> OrderedDict([(1, 'a'), (4, int), (2, (3, 3))])
>> 
>> or you could use:
>> d = OrderedDict()
>> d[1] = 'a'
>> d[4] = int
>> d[2] = (3, 3)
>> 
>> In my opinion both are quite verbose and the first is pretty unreadable
>> because of all the nested tuples.
> 
> You have a rather strict view of "unreadable" :-)
> 
> Some alternatives if you dislike the look of the above:
> 
> # Option 3:
> d = OrderedDict()
> for key, value in zip([1, 4, 2], ['a', int, (3, 3)]):
>    d[key] = value
> 
> # Option 4:
> d = OrderedDict([(1, 'a'), (4, int)])  # The pretty values.
> d[2] = (3, 3)  # The ugly nested tuple at the end.
> 
> So there's no shortage of work-arounds for the lack of nice syntax for 
> creating ordered dicts.
> 
> And besides, why single out OrderedDict? There are surely people out 
> there using ordered mappings like RedBlackTree that have to deal with 
> this same issue. Perhaps what we need is to stop focusing on a specific 
> dictionary type, and think about the lowest common denominator for any 
> mapping. And that, I believe, is a list of (key, value) tuples. See 
> below.
> 
> 
> 
>> That is why I have two suggestions for
>> language additions that fix that.
>> The first one is the normal dict literal syntax available to custom dict
>> classes like this:
>> OrderedDict{1: 'a', 4: int, 2: (3, 3)}
> 
> I don't understand what that syntax is supposed to do.
> 
> Obviously it creates an OrderedDict, but you haven't explained the 
> details. Is the prefix "OrderedDict" hard-coded in the parser/lexer, 
> like the b prefix for byte-strings and r prefix for raw strings? In that 
> case, I think that's pretty verbose, and would prefer to see something 
> shorter:
> 
> o{1: 'a', 4: int, 2: (3, 3)}
> 
> perhaps. If OrderedDict is important enough to get its own syntax, it's 
> important enough to get its own *short* syntax. That was my preferred 
> solution, but it no longer is.
> 
> Or is the prefix "OrderedDict" somehow looked-up at run-time? So we 
> could write something like:
> 
> spam = random.choice(list_of_callables)
> result = spam{1: 'a', 4: int, 2: (3, 3)}
> 
> and spam would be called, whatever it happens to be, with a single 
> list argument:
> 
> [(1, 'a'), (4, int), (2, (3, 3))]
> 
> 
> What puts me off this solution is that it is syntactic sugar for not one 
> but two distinct operations:
> 
> - sugar for creating a list of tuples;
> - and sugar for a function call.
> 
> But if we had the first, we don't need the second, and we don't need to 
> treat OrderedDict as a special case. We could use any mapping:
> 
> MyDict(sugar)
> OrderedDict(sugar)
> BinaryTree(sugar)
> 
> and functions that aren't mappings at all, but expect lists of (a, b) 
> tuples:
> 
> covariance(sugar)
> 
> 
>> This looks much cleaner in my opinion. As far as I can tell it could simply
>> be implemented as if either of the two above options was used. This
>> would make it available to all custom dict types that implement the two
>> options above.
>> 
>> A second very similar option, which might be cleaner and more useful, is to
>> make this syntax available (only) after initialization. So it could be used
>> like this:
>> d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)}
>> d{3: 4, 'a': 'c'}
>> >>> OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'}
> 
> What does that actually do, in detail? Does it call d.__setitem__(key, 
> value) repeatedly? So I could do something like this:
> 
> L = [None]*10
> L{1: 'a', 3: 'b', 5: 'c', 7: 'd', 9: 'e'}
> assert L == [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e']
> 
> If we had nice syntax for creating ordered dict literals, would we want 
> this feature? I don't think so. It must be pretty rare to want something 
> like that (at least, I can't remember the last time I did) and when we 
> do, we can often do it with slicing:
> 
> py> L = [None]*10
> py> L[1::2] = 'abcde'
> py> L
> [None, 'a', None, 'b', None, 'c', None, 'd', None, 'e']
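
For comparison, here is a minimal sketch of what the proposed d{...} form would do if it simply called __setitem__ once per pair, written with today's syntax. The helper name set_items is hypothetical, not anything from the proposal or the stdlib:

```python
def set_items(target, pairs):
    """Hypothetical helper: assign every (key, value) pair into target."""
    for key, value in pairs:
        target[key] = value  # same as target.__setitem__(key, value)
    return target

# Works on a list (keys are indexes) just as it would on a mapping.
L = set_items([None] * 10, [(1, 'a'), (3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')])
```

Under that reading nothing restricts it to dict-like classes, which is presumably why the list example above would work.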
> 
> 
> 
>> This would allow arguments to the __init__ method as well.
> 
> How? You said that this option was only available after
> initialization.
> 
> 
>> And this way it could simply be a shorthand for setting multiple attributes.
> 
> How does the reader (or the interpreter) tell when 
> 
> d{key: value}
> 
> means "call __setitem__" and when it means "call __setattr__"?
> 
> 
> 
>> It might even
>> be used to change multiple values in a list if that is a feature that is
>> wanted.
>> 
>> Lastly I think either of the two suggested options could be used to allow
>> dict comprehensions for custom dict types. But this might require a bit
>> more work (although not much I think).
>> 
>> I'm interested to hear what you guys think.
> 
> I think that there is a kernel of a good idea in this. Let's go back to 
> the idea of syntactic sugar for a list of tuples. The user can then call 
> the function or class of their choice, they aren't limited to just one 
> mapping type.
> 
> I'm going to suggest [key:value] as syntax

This does seem to be the obvious syntax: if [1, 2, 3] is a list and {1, 2, 3} is a set, and {1: 2, 3: 4} is a dict, then [1: 2, 3: 4] should be something that bears the same relationship to dict as list does to set: an a-list. (And we don't even have the {} ambiguity problem with [], because an a-list is the same type as a list, and no pairs is the same value as no elements.)
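
To make the a-list point concrete, here is a rough sketch using only what exists today. The display [1: 'a', 4: int, 2: (3, 3)] would presumably evaluate to the ordinary list spelled out below (that desugaring is my assumption, not something specified in the thread):

```python
from collections import OrderedDict

# What [1: 'a', 4: int, 2: (3, 3)] would presumably build: a plain list of pairs.
pairs = [(1, 'a'), (4, int), (2, (3, 3))]

od = OrderedDict(pairs)  # preserves the order of the pairs
plain = dict(pairs)      # any mapping constructor accepts the same list

# An empty a-list is just the empty list, so [] needs no special case.
empty_od = OrderedDict([])
```

Nothing here is specific to OrderedDict; any callable that takes an iterable of (key, value) pairs gets the benefit.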

And I think there's some precedent here. IIRC, in YAML, {1:2, 3:4} is unordered dict a la JSON (and Python), but [1:2, 3:4] is... actually, I think it's ambiguous between an ordered dict and a list of pairs, and you can resolve that by declaring !odict or !seq, or you can just leave it up to the implementation to pick one if you don't care... but let's pretend it wasn't ambiguous; either one covers the use case (and Python only has the latter option anyway, unless OrderedDict becomes a builtin).

And it's definitely readable in your examples.

> . Now your original example 
> becomes:
> 
> d = OrderedDict([1: 'a', 4: int, 2: (3, 3)])
> 
> which breaks up the triple ))) at the end, so hopefully you will not 
> think it's ugly. Also, we're not limited to just calling the constructor, 
> it could be any method:
> 
> d.update([2: None, 1: 'b', 5: 99.9])
> 
> or anywhere at all:
> 
> x = [2: None, 1: 'b', 5: 99.9, 1: 'a', 4: int, 2: (3, 3)] + items
> 
> # shorter and less error-prone than:
> x = (
>    [(2, None), (1, 'b'), (5, 99.9), (1, 'a'), (4, int), (2, (3, 3))]
>    + items
>    )
> 
> 
> There could be a comprehension form:
> 
> [key: value for x in seq if condition]
> 
> similar to the dict comprehension form.
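
For reference, a sketch of what that comprehension would presumably be sugar for in current Python: a list comprehension producing 2-tuples. The names seq and pairs and the filter condition are made up for illustration:

```python
from collections import OrderedDict

seq = [3, 1, 2, 4]

# Presumed equivalent of [x: x * x for x in seq if x != 1]
pairs = [(x, x * x) for x in seq if x != 1]

# The result is an ordinary list, so it can be handed to any mapping type.
squares = OrderedDict(pairs)
```

The only thing the new form would save is the parentheses around each pair.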
> 
> 
> 
> 
> -- 
> Steve