[Python-ideas] Dict literal use for custom dict classes

Joseph Jevnik joejev at gmail.com
Sat Dec 12 19:53:20 EST 2015


One of my favorite things about Python is that it is always very clear when
and in what order my code is executed. There is very little syntactic sugar
that obscures the execution order; however, using the syntax of an unordered
collection (dict) to describe the order in which data will be added to an
order-aware collection seems very confusing. We should first look at what
the language already provides for defining this.

One thing that might make the association lists more readable without
changing the language would be to visually break up the pairs over multiple
lines. This could change the `OrderedDict` construction to look like:

OrderedDict([
    (k0, v0),
    (k1, v1),
    (kn, vn),
])

This makes the k: v mapping much clearer.

Another option, which we have used in the library 'datashape', is to make
the class itself subscriptable:

R['a':int32, 'b':float64, 'c':string]

or if we were to write it like the above example:

R[
    'a':int32,
    'b':float64,
    'c':string,
]

This might look like custom syntax; however, this is just using
`__getitem__`, `tuple` literals, and `slice` literals in an interesting way
to look sort of like a dictionary. One nice property of this syntax is that
because readers know that they are creating a tuple, the fact that this is
an order-preserving operation is very clear. This code is semantically
equivalent to normal numpy code like: `my_array[idx_0_start:idx_0_end,
idx_1_start:idx_1_end, idx_2_start:idx_2_end]`
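
To make the mechanism concrete, here is a minimal sketch of what
`__getitem__` receives for this kind of subscript. `ShowKey` is a
hypothetical helper written only for this illustration; it is not part of
datashape or numpy.

```python
class ShowKey:
    """Hypothetical helper: returns whatever key the subscript passes in."""

    def __getitem__(self, key):
        return key


show = ShowKey()

# A single `start:stop` pair arrives as one slice object.
single = show['a':1]

# Comma-separated pairs arrive as a tuple of slice objects, in source order.
several = show['a':1, 'b':2]

print(single)   # slice('a', 1, None)
print(several)  # (slice('a', 1, None), slice('b', 2, None))
```

The key of each pair ends up in `slice.start` and the value in
`slice.stop`, which is all a class needs to rebuild an ordered mapping.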

Here `R` is an alias for the class `Record`, so `R is Record`. This
class has a metaclass that adds a `__getitem__`. This `__getitem__` looks
for either a single slice or a tuple of slices and then checks the values to
make sure they are all valid inputs. These are used internally to construct
an `OrderedDict`. We hadn't considered the comprehension case; however, you
could dispatch the `__getitem__` on a generator that yields tuples to
simulate the comprehension. This could look like:

R[((k, v) for k, v in other_seq)]

where inside our `__getitem__` we would add a case like:

if isinstance(key, types.GeneratorType):
    mapping = OrderedDict(key)
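
Putting the pieces together, a rough sketch of such a metaclass might look
like the following. This is not the datashape implementation: the names
`RecordMeta` and `fields`, and the validation rule (reject anything that is
not a plain `key:value` slice), are assumptions made for the example.

```python
import types
from collections import OrderedDict


class RecordMeta(type):
    """Hypothetical metaclass that makes the class itself subscriptable."""

    def __getitem__(cls, key):
        if isinstance(key, types.GeneratorType):
            # R[((k, v) for k, v in seq)]: consume the pairs in order.
            mapping = OrderedDict(key)
        else:
            # R['a':1] passes one slice; R['a':1, 'b':2] passes a tuple.
            slices = key if isinstance(key, tuple) else (key,)
            mapping = OrderedDict()
            for item in slices:
                # Assumed validation: only plain `key:value` pairs allowed.
                if not isinstance(item, slice) or item.step is not None:
                    raise TypeError('expected key:value, got %r' % (item,))
                mapping[item.start] = item.stop
        return cls(mapping)


class Record(metaclass=RecordMeta):
    def __init__(self, fields):
        self.fields = fields


R = Record  # short alias, so `R is Record`

record = R['a':int, 'b':float, 'c':str]
from_pairs = R[((k, k * 2) for k in range(3))]
```

Because the subscript is evaluated left to right into a tuple, the
insertion order of `record.fields` matches the order written in the source.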

If you would like to see a code example of the implementation for the
`Record` type, it is available under the BSD license here:
https://github.com/blaze/datashape/blob/master/datashape/coretypes.py#L968

There is another library that I have worked on that adds the ability to
overload all of the literals, including dictionaries. It requires CPython
>= 3.4, though, and is for fun only, so I would not recommend using it in a
production setting. I am merely mentioning it to show another possible
syntax for this.

@ordereddict_literals
def f():
    return {'a': 1, 'b': 2, 'c': 3}

>>> f()
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

This code is available under the GPLv2 license here:
https://github.com/llllllllll/codetransformer/blob/master/codetransformer/transformers/literals.py#L15


On Sat, Dec 12, 2015 at 7:13 PM, Jelte Fennema <me at jeltef.nl> wrote:

> I really like the OrderedDict class. But there is one thing that has
> always bothered me about it. Quite often I want to initialize a small
> ordered dict. When the keys are all strings this is pretty easy, since you
> can just use the keyword arguments. But when some or all of the keys are
> other things this is an issue. In that case there are two options (as far
> as I know). If you want an ordered dict of this form for instance: {1: 'a',
> 4: int, 2: (3, 3)}, you would either have to use:
> OrderedDict([(1, 'a'), (4, int), (2, (3, 3))])
>
> or you could use:
> d = OrderedDict()
> d[1] = 'a'
> d[4] = int
> d[2] = (3, 3)
>
> In my opinion both are quite verbose and the first is pretty unreadable
> because of all the nested tuples. That is why I have two suggestions for
> language additions that fix that.
> The first one is the normal dict literal syntax available to custom dict
> classes like this:
> OrderedDict{1: 'a', 4: int, 2: (3, 3)}
>
> This looks much cleaner in my opinion. As far as I can tell it could
> simply be implemented as if either of the two above options was used.
> This would make it available to all custom dict types that implement the
> two options above.
>
> A second very similar option, which might be cleaner and more useful, is
> to make this syntax available (only) after initialization. So it could be
> used like this:
> d = OrderedDict(){1: 'a', 4: int, 2: (3, 3)}
> d{3: 4, 'a': 'c'}
> >>> OrderedDict(){1: 'a', 4: int, 2: (3, 3), 3: 4, 'a': 'c'}
>
> This would allow arguments to the __init__ method as well. And this way
> it could simply be a shorthand for setting multiple attributes. It might
> even be used to change multiple values in a list if that is a feature that
> is wanted.
>
> Lastly I think either of the two suggested options could be used to allow
> dict comprehensions for custom dict types. But this might require a bit
> more work (although not much I think).
>
> I'm interested to hear what you guys think.
>
> Jelte
>
> PS. I read part of this thread
> https://mail.python.org/pipermail/python-ideas/2009-June/thread.html#4916,
> but that seemed more about making OrderedDict itself a literal. Which is
> not what I'm suggesting here.