
I'd like to propose adding `append` and `extend` methods to dicts which behave like `__setitem__` and `update` respectively, except that they raise an exception (KeyError?) instead of overwriting preexisting entries.

Very often I expect that the key I'm adding to a dict isn't already in it. If I want to verify that, I have to expand my single-line assignment statement to 3-5 lines (depending on whether the dict and key are expressions that I now need to assign to local variables). If I don't verify it, I may overwrite a dict entry and produce silently wrong output.

The names `append` and `extend` make sense now that dicts are defined to preserve insertion order: they try to append the new entries, and if that can't be done because it would duplicate a key, they raise an exception. In case of error, `extend` should probably leave successfully appended entries in the dict, since that's consistent with `list.extend` and `dict.update`.

The same methods would also be useful on sets. Unfortunately, the names make less sense there.

-- Ben
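A minimal sketch of the proposed semantics, written as free functions rather than methods (the names and the choice of KeyError follow the proposal; the implementation details are only illustrative):

```python
def append(d, key, value):
    # Insert (key, value) only if key is absent; never overwrite.
    if key in d:
        raise KeyError(key)
    d[key] = value

def extend(d, other):
    # Append each entry in order; on a duplicate key, raise but
    # keep the entries that were already appended successfully.
    for key, value in other.items():
        if key in d:
            raise KeyError(key)
        d[key] = value

d = {'a': 1}
append(d, 'b', 2)            # d is now {'a': 1, 'b': 2}
try:
    extend(d, {'c': 3, 'a': 99})
except KeyError:
    pass                     # 'c' was appended; 'a' was rejected
```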

Semantically, I'm not sure `append` and `extend` would be universally understood to mean "don't overwrite". This can be accomplished with a custom subclass for your use case:

```
import collections

class OverwriteGuardedDict(collections.UserDict):
    def append(self, key, value):
        if key in self.data:
            raise KeyError(key)
        self.data[key] = value

    def extend(self, other):
        overlap = self.data.keys() & other.keys()
        if overlap:
            raise KeyError(','.join(overlap))
        self.data.update(other)
```

On Mon, Jun 4, 2018 at 2:24 PM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:

On Mon, Jun 4, 2018 at 3:58 PM George Leslie-Waksman <waksman@gmail.com> wrote:
Semantically, I'm not sure append and extend would be universally understood to mean don't overwrite.
The proposed meanings surprised me too. My initial instinct for `dict.append` was that it would always succeed, much like `list.append` always succeeds.

I don't think I'd ever guess the intended semantics from the names in a million years. They seem like horribly misnamed methods, made worse by the false suggestion of similarity to list operations. In 20 years of using Python, moreover, I don't think I've ever wanted the described behavior under any spelling. On Mon, Jun 4, 2018, 6:58 PM George Leslie-Waksman <waksman@gmail.com> wrote:

On Mon, Jun 04, 2018 at 02:22:29PM -0700, Ben Rudiak-Gould wrote:
Can you give some examples of when you want to do that? I'm having difficulty in thinking of any. The only example I thought of is when you have a "preferences" or "settings" dict, and you want to add default settings but only if the user hasn't provided them. But the way to do that is to go in the opposite direction: start with the defaults, and unconditionally add the user settings, overriding the defaults.

```
# wrong way (yes, I've actually done this :-( )
settings = get_user_prefs()
for key, value in get_default_prefs().items():
    if key not in settings:
        settings[key] = value

# right way
settings = get_default_prefs()
settings.update(get_user_prefs())
```

So I'm afraid I don't see the value of this.
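The "right way" above can be run end to end; the two pref functions are stand-ins invented for this example:

```python
def get_default_prefs():
    # stand-in: defaults for this example only
    return {'theme': 'light', 'pagesize': 50}

def get_user_prefs():
    # stand-in: the user overrode only the theme
    return {'theme': 'dark'}

settings = get_default_prefs()
settings.update(get_user_prefs())   # user settings win over defaults
# settings == {'theme': 'dark', 'pagesize': 50}
```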
I'm sorry, I can't visualise how it would take you up to five lines to check and update a key. It shouldn't take more than two:

```
if key not in d:
    d[key] = value
```

Can you give a real-life example of the five-line version?
I don't see any connection between "append" and "fail if the key already exists". That's not what it means with lists. -- Steve

d.setdefault(key, value)
I thought the OP wanted an error if the key already existed. This is close, as it won't change the dict if the key is already there, but it will add it if it's not. @OP Maybe post those five lines so we know exactly what you want; maybe there is already a good solution. I know I spent years thinking "there should be an easy way to do this" before I found setdefault(). -CHB
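For comparison, here is standard `setdefault` behavior on a duplicate key: it quietly keeps the existing value and returns it, rather than raising, which is exactly the part the OP wants changed.

```python
d = {'user': 'alice'}

# Missing key: setdefault inserts the value.
d.setdefault('role', 'admin')

# Existing key: setdefault leaves the dict alone and returns
# the stored value -- no exception is raised.
existing = d.setdefault('user', 'bob')
# existing == 'alice', and d['user'] is still 'alice'
```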

On Mon, Jun 4, 2018 at 4:02 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
Many of the methods named `append` in the standard library fail if adding the item would violate a constraint of the data structure. `list.append` is an exception because it stores uninterpreted object references, but, e.g., `bytearray().append(-1)` raises ValueError. Also, `dict.__setitem__` and `dict.update` fail if a key is unhashable, which is another dict-specific constraint. Regardless, I'm not too attached to those names. I want the underlying functionality, and the names made sense to me. I'd be okay with `unique_setitem` and `unique_update`. On Mon, Jun 4, 2018 at 5:25 PM, Steven D'Aprano <steve@pearwood.info> wrote:
One example (or family of examples) is any situation where you would have a UNIQUE constraint on an indexed column in a database. If the values in a column should always be distinct, like the usernames in a table of user accounts, you can declare that column UNIQUE (or PRIMARY KEY), and any attempt to add a record with a duplicate username will fail.

People often use Python dicts to look up objects by some property of the object, which is similar to indexing a database column. When the values aren't necessarily unique (like a zip code), you have to use something like `defaultdict(list)` for the index, because Python doesn't have a dictionary that supports duplicate keys like C++'s `std::multimap`. When the values should be unique (like a username), the best data type for the index is dict, but there's no method on dicts that has the desired behavior of refusing to add a record with a duplicate key. I think this is a frequent enough use case to deserve standard library support.

Of course you can implement the same functionality in other ways, but that's as true of databases as it is of Python. If SQL didn't have UNIQUE, every client of the database would have its own code for checking and enforcing the constraint. They'd all have different names and slightly different implementations. The uniqueness property that they're supposed to guarantee would probably be documented only in comments, if at all. Some implementations would probably have bugs. You can't offload all of your programming needs onto the database developer, but I think UNIQUE is a useful enough feature to merit inclusion in SQL. And that's my argument for Python as well.

Another example is keyword arguments. `f(**{'a': 1}, **{'a': 2})` could mean `f(a=1)` or `f(a=2)`, but actually it's an error. I think that was a good design decision: it's consistent with Python's general philosophy of raising exceptions when things look dodgy, which makes it much easier to find bugs.
Compare this to JavaScript, where if you pass four arguments to a function that expected three, the fourth is just discarded. If the actual incorrect argument was the first, not the fourth, then all of the arguments will be bound to the wrong variables. If an argument that was supposed to be a number gets a value of some other type as a consequence, and the function tries to add 1 to it, it still won't fail; it will produce some silly result like "[object Object]1", which will then propagate through more of the code, until finally you get a wrong answer or a failure in code that's unrelated to the actually erroneous code. I'm thankful that Python doesn't do that, and I wish it didn't do it even more than it already doesn't.

Methods that raise an exception on duplicated keys, instead of silently discarding the old or new value, are an example of the sort of fail-safe operations that I'd like to see more of. For overridable options with defaults, `__setitem__` and `update` do the right thing; I certainly don't think they're useless.
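The keyword-argument behavior described above is easy to check: each `**` mapping alone would be a valid call, but together they duplicate the keyword, and Python raises rather than guessing which value wins.

```python
def f(a=None):
    return a

try:
    f(**{'a': 1}, **{'a': 2})   # duplicate keyword argument 'a'
    duplicate_allowed = True
except TypeError:
    duplicate_allowed = False   # Python rejects the ambiguous call
```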
The three lines I was thinking of were something like:

```
if k in d:
    raise KeyError(k)
d[k] = ...
```

The five lines were something like:

```
d = get_mapping()
k = get_key()
if k in d:
    raise KeyError(k)
d[k] = ...
```

as a checked version of `get_mapping()[get_key()] = ...` (or in general, any situation where you can't or don't want to duplicate the expressions that produce the mapping and the key).
I don't see any connection between "append" and "fail if the key already exists". That's not what it means with lists.
If Python had built-in dictionaries with no unique-key constraint, and you started with `multidict({'a': 1, 'b': 2})` and appended `'a': 3` to that, you'd get `multidict({'a': 1, 'b': 2, 'a': 3})`, just as if you'd appended `('a', 3)` to `[('a', 1), ('b', 2)]`, except that this "list" is indexed on the first half of each element. If you try to append `'a': 3` to the actual Python dict `{'a': 1, 'b': 2}`, it should fail because `{'a': 1, 'b': 2, 'a': 3}` violates the unique-key constraint of that data structure. The failure isn't the point, as such. It just means the method can't do what it's meant to do, which is add something to the dict while leaving everything that's already there alone. -- Ben
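The database half of the UNIQUE analogy from earlier in the thread can be demonstrated with `sqlite3` from the standard library (the schema and values are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)')
conn.execute("INSERT INTO users VALUES ('alice', 'a@example.com')")

duplicate_rejected = False
try:
    # A second 'alice' violates the unique-key constraint: sqlite
    # raises IntegrityError instead of silently overwriting the row.
    conn.execute("INSERT INTO users VALUES ('alice', 'b@example.com')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

This is the behavior the proposal wants from dicts: refuse the duplicate, leave the existing entry untouched.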

On Monday, June 4, 2018 at 11:29:15 PM UTC-7, Ben Rudiak-Gould wrote:
This might do the trick for you:

```
class InsertOnlyDict(dict):
    ''' Supports item inserts, but not updates. '''
    def __init__(self, *args, **kwds):
        self.update(*args, **kwds)

    def __setitem__(self, key, value):
        if key in self:
            raise KeyError(f'Duplicate key, {key!r}')
        super().__setitem__(key, value)

    def update(self, *args, **kwds):
        for k, v in dict(*args, **kwds).items():
            self[k] = v
```

If you're using a dict-like as an interface to a database table with a unique key constraint, I think your database will appropriately raise IntegrityError when you accidentally try to update instead of insert.

participants (8)
- Ben Rudiak-Gould
- Chris Barker - NOAA Federal
- David Mertz
- George Leslie-Waksman
- Michael Selik
- MRAB
- Steven D'Aprano
- Yuval Greenfield