[Python-ideas] Add dict.append and dict.extend

Ben Rudiak-Gould benrudiak at gmail.com
Tue Jun 5 02:27:35 EDT 2018


On Mon, Jun 4, 2018 at 4:02 PM, Yuval Greenfield <ubershmekel at gmail.com> wrote:
> The proposed meanings surprised me too. My initial instinct for
> `dict.append` was that it would always succeed, much like `list.append`
> always succeeds.

Many of the methods named `append` in the standard library fail if
adding the item would violate a constraint of the data structure.
`list.append` is an exception because it stores uninterpreted object
references, but, e.g., `bytearray().append(-1)` raises ValueError.
Also, `dict.__setitem__` and `dict.update` fail if a key is
unhashable, which is another dict-specific constraint.
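A quick illustration of those constraint checks:

```python
# list.append stores any object reference, so it always succeeds:
items = []
items.append(-1)

# bytearray.append enforces a range constraint (each item must be
# in 0..255), so an out-of-range value is rejected:
try:
    bytearray().append(-1)
except ValueError:
    print("bytearray rejected -1")

# dict.__setitem__ enforces hashability of keys, another
# dict-specific constraint:
d = {}
try:
    d[["not", "hashable"]] = "x"
except TypeError:
    print("dict rejected an unhashable key")
```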

Regardless, I'm not too attached to those names. I want the underlying
functionality, and the names made sense to me. I'd be okay with
`unique_setitem` and `unique_update`.


On Mon, Jun 4, 2018 at 5:25 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Mon, Jun 04, 2018 at 02:22:29PM -0700, Ben Rudiak-Gould wrote:
>> Very often I expect that the key I'm adding to a dict isn't already in
>> it.
>
> Can you give some examples of when you want to do that? I'm having
> difficulty in thinking of any.

One example (or family of examples) is any situation where you would
have a UNIQUE constraint on an indexed column in a database. If the
values in a column should always be distinct, like the usernames in a
table of user accounts, you can declare that column UNIQUE (or PRIMARY
KEY) and any attempt to add a record with a duplicate username will
fail.

People often use Python dicts to look up objects by some property of
the object, which is similar to indexing a database column. When the
values aren't necessarily unique (like a zip code), you have to use
something like defaultdict(list) for the index, because Python doesn't
have a dictionary that supports duplicate keys like C++'s
std::multimap. When the values should be unique (like a username), the
best data type for the index is dict, but there's no method on dicts
that has the desired behavior of refusing to add a record with a
duplicate key. I think this is a frequent enough use case to deserve
standard library support.
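As a sketch of that use case, here's a hypothetical `unique_setitem`
helper (the name follows the alternative suggested above) next to the
`defaultdict(list)` pattern for the non-unique case; the user-table
names are made up for illustration:

```python
from collections import defaultdict

def unique_setitem(d, key, value):
    """Insert key: value, refusing to overwrite an existing key."""
    if key in d:
        raise KeyError(key)
    d[key] = value

# Unique property (like a username): a plain dict plus the helper
# behaves like a UNIQUE index.
users_by_name = {}
unique_setitem(users_by_name, "alice", {"id": 1})
try:
    unique_setitem(users_by_name, "alice", {"id": 2})
except KeyError:
    print("duplicate username rejected")

# Non-unique property (like a zip code): defaultdict(list) stands in
# for a multimap.
users_by_zip = defaultdict(list)
users_by_zip["02139"].append({"id": 1})
users_by_zip["02139"].append({"id": 2})  # duplicates are fine here
```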

Of course you can implement the same functionality in other ways, but
that's as true of databases as it is of Python. If SQL didn't have
UNIQUE, every client of the database would have its own code for
checking and enforcing the constraint. They'd all have different
names, and slightly different implementations. The uniqueness property
that they're supposed to guarantee would probably be documented only
in comments if at all. Some implementations would probably have bugs.
You can't offload all of your programming needs onto the database
developer, but I think UNIQUE is a useful enough feature to merit
inclusion in SQL. And that's my argument for Python as well.

Another example is keyword arguments. f(**{'a': 1}, **{'a': 2}) could
mean f(a=1) or f(a=2), but actually it's an error. I think that was a
good design decision: it's consistent with Python's general philosophy
of raising exceptions when things look dodgy, which makes it much
easier to find bugs. Compare this to JavaScript, where if you pass
four arguments to a function that expected three, the fourth is just
discarded. And if the actual incorrect argument was the first, not
the fourth, then all of the arguments are bound to the wrong
variables; if an argument that was supposed to be a number gets a
value of some other type as a consequence, and the function tries to
add 1 to it, it still won't fail but will produce some silly result
like "[object Object]1". That result then propagates through more of
the code, until finally you get a wrong answer or a failure in code
that's unrelated to the actually erroneous code.
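Concretely, the duplicate-keyword case from above:

```python
def f(a):
    return a

# Merging two dicts that share a key via ** unpacking raises a
# TypeError rather than silently keeping one value or the other:
try:
    f(**{'a': 1}, **{'a': 2})
except TypeError:
    print("duplicate keyword argument rejected")
```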

I'm thankful that Python doesn't do that, and I wish it didn't do it
even more than it already doesn't. Methods that raise an exception on
duplicated keys, instead of silently discarding the old or new value,
are an example of the sort of fail-safe operations that I'd like to
see more of.

For overridable options with defaults, `__setitem__` and `update` do
the right thing - I certainly don't think they're useless.


> I'm sorry, I can't visualise how it would take you up to five lines to
> check and update a key. It shouldn't take more than two:
>
> if key not in d:
>     d[key] = value
>
> Can you give a real-life example of the five line version?

The three lines I was thinking of were something like

    if k in d:
        raise KeyError(k)
    d[k] = ...

The five lines were something like

    d = get_mapping()
    k = get_key()
    if k in d:
        raise KeyError(k)
    d[k] = ...

as a checked version of

    get_mapping()[get_key()] = ...

(or in general, any situation where you can't or don't want to
duplicate the expressions that produce the mapping and the key).


> I don't see any connection between "append" and "fail if the key already
> exists". That's not what it means with lists.

If Python had built-in dictionaries with no unique-key constraint, and
you started with

    multidict({'a': 1, 'b': 2})

and appended 'a': 3 to that, you'd get

    multidict({'a': 1, 'b': 2, 'a': 3})

just as if you'd appended ('a', 3) to [('a', 1), ('b', 2)], except
that this "list" is indexed on the first half of each element.

If you try to append 'a': 3 to the actual Python dict {'a': 1, 'b':
2}, it should fail because {'a': 1, 'b': 2, 'a': 3} violates the
unique-key constraint of that data structure. The failure isn't the
point, as such. It just means the method can't do what it's meant to
do, which is add something to the dict while leaving everything that's
already there alone.
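The intended semantics could be sketched as a dict subclass. This is
illustrative only, not the actual proposed implementation, and the
class name is made up:

```python
class UniqueDict(dict):
    """Sketch: a dict whose append/extend refuse duplicate keys."""

    def append(self, key, value):
        # Add key: value while leaving everything already present
        # alone; fail instead of silently replacing an existing value.
        if key in self:
            raise KeyError(key)
        self[key] = value

    def extend(self, other):
        # Add every item from `other`, failing on any duplicate key.
        for key, value in other.items():
            self.append(key, value)

d = UniqueDict({'a': 1, 'b': 2})
try:
    d.append('a', 3)
except KeyError:
    print("duplicate key rejected")
assert d == {'a': 1, 'b': 2}  # existing entries were left untouched
```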

-- Ben

