
I'd like to propose adding `append` and `extend` methods to dicts which behave like `__setitem__` and `update` respectively, except that they raise an exception (KeyError?) instead of overwriting preexisting entries.

Very often I expect that the key I'm adding to a dict isn't already in it. If I want to verify that, I have to expand my single-line assignment statement to 3-5 lines (depending on whether the dict and key are expressions that I now need to assign to local variables). If I don't verify it, I may overwrite a dict entry and produce silently wrong output.

The names `append` and `extend` make sense now that dicts are defined to preserve insertion order: they try to append the new entries, and if that can't be done because it would duplicate a key, they raise an exception. In case of error, `extend` should probably leave successfully appended entries in the dict, since that's consistent with `list.extend` and `dict.update`.

The same methods would also be useful on sets. Unfortunately, the names make less sense there.

-- Ben
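A minimal sketch of the proposed semantics, written as free functions rather than methods (the names and the choice of KeyError follow the proposal; the implementation details are only illustrative):

```python
def append(d, key, value):
    # Insert (key, value) only if key is absent; never overwrite.
    if key in d:
        raise KeyError(key)
    d[key] = value

def extend(d, other):
    # Append each entry in order; on a duplicate key, raise but
    # keep the entries that were already appended successfully.
    for key, value in other.items():
        if key in d:
            raise KeyError(key)
        d[key] = value

d = {'a': 1}
append(d, 'b', 2)            # d is now {'a': 1, 'b': 2}
try:
    extend(d, {'c': 3, 'a': 99})
except KeyError:
    pass                     # 'c' was appended; 'a' was rejected
```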

Semantically, I'm not sure `append` and `extend` would be universally understood to mean "don't overwrite". This can be accomplished with a custom subclass for your use case:

```
import collections

class OverwriteGuardedDict(collections.UserDict):
    def append(self, key, value):
        if key in self.data:
            raise KeyError(key)
        self.data[key] = value

    def extend(self, other):
        overlap = self.data.keys() & other.keys()
        if overlap:
            raise KeyError(','.join(overlap))
        self.data.update(other)
```

On Mon, Jun 4, 2018 at 2:24 PM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:

On Mon, Jun 4, 2018 at 3:58 PM George Leslie-Waksman <waksman@gmail.com> wrote:
Semantically, I'm not sure append and extend would be universally understood to mean don't overwrite.
The proposed meanings surprised me too. My initial instinct for `dict.append` was that it would always succeed, much like `list.append` always succeeds.

I don't think I'd ever guess the intended semantics from the names in a million years. They seem like horribly misnamed methods, made worse by the false suggestion of similarity to list operations. In 20 years of using Python, moreover, I don't think I've ever wanted the described behavior under any spelling. On Mon, Jun 4, 2018, 6:58 PM George Leslie-Waksman <waksman@gmail.com> wrote:

On Mon, Jun 04, 2018 at 02:22:29PM -0700, Ben Rudiak-Gould wrote:
Can you give some examples of when you want to do that? I'm having difficulty in thinking of any. The only example I thought of is when you have a "preferences" or "settings" dict, and you want to add default settings but only if the user hasn't provided them. But the way to do that is to go in the opposite direction: start with the defaults, and unconditionally add the user settings, overriding the defaults.

```
# wrong way (yes, I've actually done this :-( )
settings = get_user_prefs()
for key, value in get_default_prefs().items():
    if key not in settings:
        settings[key] = value

# right way
settings = get_default_prefs()
settings.update(get_user_prefs())
```

So I'm afraid I don't see the value of this.
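The "right way" above can be run end to end; the two pref functions are stand-ins invented for this example:

```python
def get_default_prefs():
    # stand-in: defaults for this example only
    return {'theme': 'light', 'pagesize': 50}

def get_user_prefs():
    # stand-in: the user overrode only the theme
    return {'theme': 'dark'}

settings = get_default_prefs()
settings.update(get_user_prefs())   # user settings win over defaults
# settings == {'theme': 'dark', 'pagesize': 50}
```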
I'm sorry, I can't visualise how it would take you up to five lines to check and update a key. It shouldn't take more than two:

```
if key not in d:
    d[key] = value
```

Can you give a real-life example of the five-line version?
I don't see any connection between "append" and "fail if the key already exists". That's not what it means with lists. -- Steve

d.setdefault(key, value)
I thought the OP wanted an error if the key already existed. This is close, as it won't change the dict if the key is already there, but it will add it if it's not. @OP Maybe post those five lines so we know exactly what you want; maybe there is already a good solution. I know I spent years thinking "there should be an easy way to do this" before I found setdefault(). -CHB
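For comparison, here is standard `setdefault` behavior on a duplicate key: it quietly keeps the existing value and returns it, rather than raising, which is exactly the part the OP wants changed.

```python
d = {'user': 'alice'}

# Missing key: setdefault inserts the value.
d.setdefault('role', 'admin')

# Existing key: setdefault leaves the dict alone and returns
# the stored value -- no exception is raised.
existing = d.setdefault('user', 'bob')
# existing == 'alice', and d['user'] is still 'alice'
```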

On Mon, Jun 4, 2018 at 4:02 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
Many of the methods named `append` in the standard library fail if adding the item would violate a constraint of the data structure. `list.append` is an exception because it stores uninterpreted object references, but, e.g., `bytearray().append(-1)` raises ValueError. Also, `dict.__setitem__` and `dict.update` fail if a key is unhashable, which is another dict-specific constraint. Regardless, I'm not too attached to those names. I want the underlying functionality, and the names made sense to me. I'd be okay with `unique_setitem` and `unique_update`. On Mon, Jun 4, 2018 at 5:25 PM, Steven D'Aprano <steve@pearwood.info> wrote:
One example (or family of examples) is any situation where you would have a UNIQUE constraint on an indexed column in a database. If the values in a column should always be distinct, like the usernames in a table of user accounts, you can declare that column UNIQUE (or PRIMARY KEY), and any attempt to add a record with a duplicate username will fail.

People often use Python dicts to look up objects by some property of the object, which is similar to indexing a database column. When the values aren't necessarily unique (like a zip code), you have to use something like `defaultdict(list)` for the index, because Python doesn't have a dictionary that supports duplicate keys like C++'s `std::multimap`. When the values should be unique (like a username), the best data type for the index is dict, but there's no method on dicts that has the desired behavior of refusing to add a record with a duplicate key. I think this is a frequent enough use case to deserve standard library support.

Of course you can implement the same functionality in other ways, but that's as true of databases as it is of Python. If SQL didn't have UNIQUE, every client of the database would have its own code for checking and enforcing the constraint. They'd all have different names and slightly different implementations. The uniqueness property that they're supposed to guarantee would probably be documented only in comments, if at all. Some implementations would probably have bugs. You can't offload all of your programming needs onto the database developer, but I think UNIQUE is a useful enough feature to merit inclusion in SQL. And that's my argument for Python as well.

Another example is keyword arguments. `f(**{'a': 1}, **{'a': 2})` could mean `f(a=1)` or `f(a=2)`, but actually it's an error. I think that was a good design decision: it's consistent with Python's general philosophy of raising exceptions when things look dodgy, which makes it much easier to find bugs.
Compare this to JavaScript, where if you pass four arguments to a function that expected three, the fourth is just discarded. If the actual incorrect argument was the first, not the fourth, then all of the arguments will be bound to the wrong variables. If an argument that was supposed to be a number gets a value of some other type as a consequence, and the function tries to add 1 to it, it still won't fail; it will produce some silly result like "[object Object]1", which will then propagate through more of the code, until finally you get a wrong answer or a failure in code that's unrelated to the actually erroneous code. I'm thankful that Python doesn't do that, and I wish it didn't do it even more than it already doesn't.

Methods that raise an exception on duplicated keys, instead of silently discarding the old or new value, are an example of the sort of fail-safe operations that I'd like to see more of. For overridable options with defaults, `__setitem__` and `update` do the right thing; I certainly don't think they're useless.
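The keyword-argument behavior described above is easy to check: each `**` mapping alone would be a valid call, but together they duplicate the keyword, and Python raises rather than guessing which value wins.

```python
def f(a=None):
    return a

try:
    f(**{'a': 1}, **{'a': 2})   # duplicate keyword argument 'a'
    duplicate_allowed = True
except TypeError:
    duplicate_allowed = False   # Python rejects the ambiguous call
```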
The three lines I was thinking of were something like:

```
if k in d:
    raise KeyError(k)
d[k] = ...
```

The five lines were something like:

```
d = get_mapping()
k = get_key()
if k in d:
    raise KeyError(k)
d[k] = ...
```

as a checked version of `get_mapping()[get_key()] = ...` (or in general, any situation where you can't or don't want to duplicate the expressions that produce the mapping and the key).
I don't see any connection between "append" and "fail if the key already exists". That's not what it means with lists.
If Python had built-in dictionaries with no unique-key constraint, and you started with `multidict({'a': 1, 'b': 2})` and appended `'a': 3` to that, you'd get `multidict({'a': 1, 'b': 2, 'a': 3})`, just as if you'd appended `('a', 3)` to `[('a', 1), ('b', 2)]`, except that this "list" is indexed on the first half of each element. If you try to append `'a': 3` to the actual Python dict `{'a': 1, 'b': 2}`, it should fail because `{'a': 1, 'b': 2, 'a': 3}` violates the unique-key constraint of that data structure. The failure isn't the point, as such. It just means the method can't do what it's meant to do, which is add something to the dict while leaving everything that's already there alone. -- Ben
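The database half of the UNIQUE analogy from earlier in the thread can be demonstrated with `sqlite3` from the standard library (the schema and values are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)')
conn.execute("INSERT INTO users VALUES ('alice', 'a@example.com')")

duplicate_rejected = False
try:
    # A second 'alice' violates the unique-key constraint: sqlite
    # raises IntegrityError instead of silently overwriting the row.
    conn.execute("INSERT INTO users VALUES ('alice', 'b@example.com')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

This is the behavior the proposal wants from dicts: refuse the duplicate, leave the existing entry untouched.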

On Monday, June 4, 2018 at 11:29:15 PM UTC-7, Ben Rudiak-Gould wrote:
This might do the trick for you:

```
class InsertOnlyDict(dict):
    ''' Supports item inserts, but not updates. '''
    def __init__(self, *args, **kwds):
        self.update(*args, **kwds)

    def __setitem__(self, key, value):
        if key in self:
            raise KeyError(f'Duplicate key, {key!r}')
        super().__setitem__(key, value)

    def update(self, *args, **kwds):
        for k, v in dict(*args, **kwds).items():
            self[k] = v
```

If you're using a dict-like as an interface to a database table with a unique key constraint, I think your database will appropriately raise IntegrityError when you accidentally try to update instead of insert.

participants (8)
- Ben Rudiak-Gould
- Chris Barker - NOAA Federal
- David Mertz
- George Leslie-Waksman
- Michael Selik
- MRAB
- Steven D'Aprano
- Yuval Greenfield