Mailman 3 PEP 584: Add Union Operators To dict - Python-Dev

PEP 584: Add Union Operators To dict

Brandt Bucher

Feb. 4, 2020

4:48 p.m.

Steven D'Aprano and I have pushed a third draft of PEP 584:

https://www.python.org/dev/peps/pep-0584/

The accompanying reference implementation is on GitHub:

https://github.com/brandtbucher/cpython/tree/addiction

For those who have been following the discussions over the past year on python-ideas, this new draft does not contain much additional content; the most notable difference is that the choice of operator has been changed from + to |. The rest of the revisions are mostly reformatting and reorganizing the information to bring the document in line with PEP standards.

Please let us know what you think – we'd love to hear any *new* feedback that hasn't yet been addressed in the PEP or the related discussions it links to!

Thanks!

Brandt


Guido van Rossum

February 2020

5:28 p.m.

Thanks Brandt (and Steven of course)! If there are no objections by next week I'll recommend this to the Steering Council for acceptance.

In the meantime, I am wondering about the reference implementation -- is it suitable to submit as a PR? Or is it a toy written in pure Python? (I found it a little tricky to follow your branch.)

_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TTIKCDIP... Code of Conduct: http://python.org/psf/codeofconduct/

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Brandt Bucher

5:34 p.m.

It already has a PR open against master, with all tests passing: https://github.com/python/cpython/pull/12088 Sorry about the messy history - this proposal has changed significantly several times over the past year (at least as far as the implementation is concerned). At one point, both operators were implemented for comparison with each other!

Musbur

4:27 a.m.

This is a great PEP. Just recently I needed this and was surprised that nothing of the sort had been implemented yet (I looked for quite some time).

I have one suggestion: Wouldn't it be useful for these operators to also accept sets (functionally acting like a dict with None for all values)? This would make it very elegant to 'normalize' dicts by pruning (dict & set) or padding (set | dict) dictionaries. I would find this useful for efficient data sanitation purposes, as when processing input from web forms.

Rhodri James

6:04 a.m.

On 05/02/2020 12:27, Musbur wrote:

> I have one suggestion: Wouldn't it be useful for these operators to also accept sets (functionally acting like a dict with None for all values)? This would make it very elegant to 'normalize' dicts by pruning (dict & set) or padding (set | dict) dictionaries. I would find this useful for efficient data sanitation purposes, as when processing input from web forms.

Why None? Why not 0, or False, or 42? This sort of thing belongs more in a function or method, IMHO.

On the original PEP, I'm resigned to it being the union operator and +1 on the whole thing. Good work, guys!

-- Rhodri James *-* Kynesim Ltd

Brandt Bucher

8:42 a.m.

> > I have one suggestion: Wouldn't it be useful for these operators to also accept sets (functionally acting like a dict with None for all values)?

> Why None? Why not 0, or False, or 42? This sort of thing belongs more in a function or method, IMHO.

Well, in their defense, None is the null object in Python, so it would be a natural choice. With that said, I am strongly against this for several reasons:

- The proposal in its current form is very easy to wrap your head around: "|" takes dicts, "|=" takes anything dict.update does.
- Python doesn't allow sets to be dictionaries anywhere else. It's much more natural to use a dict like a set than to use a set like a dict. Hence, this PEP!
- A consistent argument that we've gotten since the very beginning is something to the effect of "I don't know what types are being used in the expression 'a | b'." While we argue that this isn't the issue in practice that people make it out to be, accepting sets here would definitely muddy the waters.

> > This would make it very elegant to 'normalize' dicts by pruning (dict & set) or padding (set | dict) dictionaries.

Well, if the PEP lands, the padding part is easy:

    padded = dict.fromkeys(expected) | given

Or even better, just do the dict.fromkeys construction at module level once, and do:

    padded = EXPECTED | given

Or, implement the behavior in a subclass if you truly need it to be a set! We're intentionally very subclass-friendly (unlike list, for example).
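The padding idiom above can be tried as follows. This is a minimal sketch, assuming Python 3.9 or later (where the `|` operator for dicts was eventually released); the field names and the `pad` helper are hypothetical and only illustrate the pattern.

```python
# Built once at module level; every value defaults to None.
EXPECTED = dict.fromkeys(["name", "email", "age"])


def pad(given):
    """Return a new dict containing every expected key; submitted values win."""
    return EXPECTED | given


padded = pad({"name": "Ada", "age": 36})
print(padded)  # {'name': 'Ada', 'email': None, 'age': 36}
```

The left operand fixes the key order and the right operand overrides values, so `EXPECTED` itself is never modified.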

Chris Angelico

8:54 a.m.

On Thu, Feb 6, 2020 at 3:47 AM Brandt Bucher <brandtbucher@gmail.com> wrote:

> > > I have one suggestion: Wouldn't it be useful for these operators to also accept sets (functionally acting like a dict with None for all values)?

> > Why None? Why not 0, or False, or 42? This sort of thing belongs more in a function or method, IMHO.

> Well, in their defense, None is the null object in Python, so it would be a natural choice. With that said, I am strongly against this for several reasons:
>
> - The proposal in its current form is very easy to wrap your head around: "|" takes dicts, "|=" takes anything dict.update does.
> - Python doesn't allow sets to be dictionaries anywhere else. It's much more natural to use a dict like a set than to use a set like a dict. Hence, this PEP!
> - A consistent argument that we've gotten since the very beginning is something to the effect of "I don't know what types are being used in the expression 'a | b'." While we argue that this isn't the issue in practice that people make it out to be, accepting sets here would definitely muddy the waters.

- and you can use dict.fromkeys to "upgrade" a set to a dict anyway.

If this goes through as is, and then in a few years' time we start seeing lots of people begging to be able to skip the explicit fromkeys and accept sets directly, it can be addressed then.

Personally, I think sets make good sense on the right hand side of an intersection (as a means of saying "whitelist the keys to this set"; values would be irrelevant there anyway), but not with union/update.

ChrisA
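PEP 584 does not propose an intersection operator, but the "whitelist the keys" use of a set can already be written with a comprehension. The following is only a sketch; the `prune` helper and the form fields are hypothetical names, not part of any proposal in this thread.

```python
def prune(given, allowed):
    """Keep only the entries of `given` whose key is in the set `allowed`."""
    return {key: given[key] for key in given if key in allowed}


form = {"name": "Ada", "is_admin": "1", "age": "36"}
clean = prune(form, {"name", "age", "email"})
print(clean)  # {'name': 'Ada', 'age': '36'}
```

Iterating over `given` rather than over the set keeps the original key order, and the set's values never come into play, which is the point ChrisA makes about intersection.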

Brett Cannon

3:21 p.m.

I agree that if we want to go down the road of creating a copy to allow for subclasses then we should define a dunder method for such a use, even if it's redundant in the face of dict.copy().

Paul G

3:38 p.m.

It looks to me like dict.__copy__ is not implemented, does anyone know why it's not basically an alias for dict.copy? If it's just random happenstance, presumably we could move dict.copy to __copy__ and then have dict.copy as an alias or thin wrapper. It might be desirable anyway for copy.copy to have a "fast path".

Paul Ganssle

12:12 p.m.

Hi Brandt, very nice PEP. I have two questions here. First:

> - The proposal in its current form is very easy to wrap your head around: "|" takes dicts, "|=" takes anything dict.update does.

I see this asymmetry between the | and |= mentioned a few times in the PEP, but I don't see any rationale other than "the authors have decided". I am not saying that this is the wrong decision, but the reasoning behind this choice is not obvious, and I think it might be a good idea to include the rationale in the PEP. I'd say the asymmetry between list's `__add__` and `__iadd__` semantics is actually fairly confusing for anyone who hasn't encountered it before:

    >>> a = []
    >>> a = a + "one"
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-5-f2c4cda7ee5b> in <module>
    ----> 1 a = a + "one"

    TypeError: can only concatenate list (not "str") to list
    >>> a += "one"
    >>> a
    ['o', 'n', 'e']

I think most people would be surprised at the difference in semantics here, and this is also an example of a situation where it's not obvious that the "call `list.extend`" behavior is the right thing to do. It would be nice to see why you rejected:

1. Giving |= `.update` semantics rather than the semantics you chose for |.
2. Giving `|` the same semantics as `|=`.

Second question: The specification mentions "Dict union will return a new dict containing the left operand merged with the right operand, which must be a dict (or an instance of a dict subclass)." Can you clarify if it is part of the spec that it will always return a `dict` even if one or both of the operands is a dict subclass? You mentioned in another post that this is deliberately intended to be subclass-friendly, but if it always returns `dict`, then I would expect this:

    >>> class MyDict(dict):
    ...     pass
    ...
    >>> MyDict({1: 2}) | MyDict({3: 4})
    {1: 2, 3: 4}

I realize that there's a lot of precedent for this with other builtin types (int subclasses reverting to int with +, etc), though generally the justifications for this are two-fold:

1. For symmetrical operations like addition it's not obvious which operand's type should prevail, particularly if the two are both different dict subclasses. I think in this case | is already asymmetrical and it would be relatively natural to think of the right thing to do as "make a copy of the LHS then update it with the RHS" (thus retaining the subtype of the LHS).
2. Subclasses may override the constructor in unpredictable ways, so we don't know how to construct arbitrary subtypes. I /think/ this objection could be satisfied by using logic equivalent to `copy.copy` for the LHS when it is a dict subclass, and then using the mapping protocol on the RHS.

Unless there is compelling reason to do otherwise, I am in favor of trying to retain subclass identity after operations, but either way it would be good to be explicit about it in the specification and maybe include a bit of the rationale one way or the other.

Best, Paul
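Both points in this message can be checked side by side. The sketch below assumes Python 3.9 or later, where PEP 584 was eventually released; the variable names are illustrative, and the final check reflects how released CPython behaves rather than anything settled at this point in the thread.

```python
# list: `+` insists on another list, `+=` accepts any iterable (like list.extend).
a = []
try:
    a = a + "one"
except TypeError as exc:
    list_error = str(exc)
a += "one"  # a is now ['o', 'n', 'e']

# dict: `|` insists on another dict, `|=` accepts whatever dict.update accepts.
d = {"x": 1}
try:
    d | [("y", 2)]
except TypeError as exc:
    dict_error = str(exc)
d |= [("y", 2)]  # d is now {'x': 1, 'y': 2}


# Subclass question: with a plain subclass that overrides nothing,
# the released operator returns a built-in dict.
class MyDict(dict):
    pass


merged = MyDict({1: 2}) | MyDict({3: 4})
print(type(merged).__name__, merged)
```

So the list precedent carries over unchanged: the binary operator is strict about its right operand while the in-place form is as permissive as the corresponding method.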

Brandt Bucher

12:58 p.m.

> I see this asymmetry between the | and |= mentioned a few times in the PEP, but I don't see any rationale other than "the authors have decided".

I agree that this probably deserves to be addressed. I can't speak for Steven, but I know that my motivation here is to restrict `|` in order to avoid confusing casting. It suddenly becomes a very complicated problem with strange edge-cases as soon as reflected operations get involved. In contrast, `|=` appears in dramatically simpler contexts, so we don't need to worry about resolving all of the combinations of possible LHS and RHS types (or whether the user can keep up with what we're doing under-the-hood). I have no idea if this is what motivated the decision in `list`, but I think it's a good one.

> The specification mentions "Dict union will return a new dict containing the left operand merged with the right operand, which must be a dict (or an instance of a dict subclass)." Can you clarify if it is part of the spec that it will always return a dict even if one or both of the operands is a dict subclass?

See my recent post on this here. I believe that we should call an overridden `lhs.copy()`, but others disagreed during code review. For what it's worth, it's probably not good for us to use "dict" and "`dict`" (with monospace/backticks) to mean different things in the PEP... ;)

> Unless there is compelling reason to do otherwise, I am in favor of trying to retain subclass identity after operation.

I'll count you as a vote for `new = lhs.copy(); dict.update(new, rhs)`, then. ;)

Serhiy Storchaka

9:19 a.m.

05.02.20 14:27, Musbur wrote:

> I have one suggestion: Wouldn't it be useful for these operators to also accept sets (functionally acting like a dict with None for all values)? This would make it very elegant to 'normalize' dicts by pruning (dict & set) or padding (set | dict) dictionaries. I would find this useful for efficient data sanitation purposes, as when processing input from web forms.

    d = {}
    d |= {(1, 2)}

What is d now? {1: 2} or {(1, 2): None}

Musbur

12:37 a.m.

Depends on what d is.

    >>> type({})
    <class 'dict'>

So the result is {(1, 2): None}, but the ambiguity comes from the definition of {}, not from my proposal.

Chris Angelico

12:51 a.m.

On Thu, Feb 6, 2020 at 7:42 PM Musbur <musbur@posteo.org> wrote:

> Depends on what d is.
>
>     >>> type({})
>     <class 'dict'>
>
> So the result is {(1, 2): None}, but the ambiguity comes from the definition of {}, not from my proposal.

Actually, what Serhiy hinted at was a consequence (and, I would say, a rather weird corner case) of the current definition of |= as being "equivalent to update()". Since update() will accept a number of things, including an iterable of pairs, it's actually possible to use a set of two-element tuples as a representation of keys and values. It seems pretty unlikely that anyone would use a *set* for this (as opposed to, say, a list or generator), but it does mean that your proposal would conflict with that. Which in turn means you're actually asking for |= to special-case sets; not impossible, by any means, but it's a definite change and not simply a logical extension of current behaviour.

I don't think supporting sets in |= is worth this confusion, especially since there's no easy way to override the choice of value. It'd be just as easy to do thing|=dict.fromkeys(s) and avoid the whole issue.

However, IMO the &= operator would be better able to accept a set, and this is something that wouldn't conflict (and thus can be safely added later).

ChrisA
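The corner case described above can be reproduced directly. This is a minimal sketch, assuming Python 3.9 or later for the `|=` operator (on older versions `d.update(...)` shows the same behaviour); the variable names are illustrative only.

```python
# A set holding one 2-tuple is treated as an iterable of (key, value) pairs,
# exactly as dict.update() would treat it.
d = {}
d |= {(1, 2)}
print(d)  # {1: 2}

# To use the set's elements as keys instead, convert explicitly with fromkeys.
e = {}
e |= dict.fromkeys({(1, 2)})
print(e)  # {(1, 2): None}
```

The explicit `dict.fromkeys` call is what removes the ambiguity: the reader can see which of the two interpretations is intended.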

Brandt Bucher

11:38 a.m.

One issue that's come up during PR review with Guido and Serhiy is, when evaluating `a | b`, whether or not the default implementation should call an overridden `copy()` method on `a` in order to create an instance of the correct subclass (rather than a plain-ol' `dict`). For example, `defaultdict` works correctly without any additional modification:

```
>>> from collections import defaultdict
>>> i = defaultdict(int, {1: 1, 2: 2})
>>> s = defaultdict(str, {2: "2", 3: "3"})
>>> i | s
defaultdict(<class 'int'>, {1: 1, 2: '2', 3: '3'})
>>> s | i
defaultdict(<class 'str'>, {2: 2, 3: '3', 1: 1})
```


So this has immediate benefits for both usability and maintenance: subclasses only need to override `copy()` to get working `__or__`/`__ror__` behavior. While this isn't what `list` and `set` do, for example, I argue that `dict` is more often subclassed, and we shouldn't blindly follow the precedent of their behavior when designing a new API for `dict`.

We decided to bring the discussion here to get input from a larger audience. I'm currently +0 on calling `copy()`, but I know Steven feels a bit more strongly about this than I do:

> I think the standard handling of subclasses in Python builtins is wrong, and I don't wish to emulate that wrong behaviour without a really good reason. Or at least a better reason than "other methods break subclassing unless explicitly overloaded, so this should do so too". Or at least not without a fight :-)

The more detailed thread of discussion starts at https://github.com/python/cpython/pull/12088#issuecomment-582609024 (note that we are no longer considering calling an overridden `update` method).

Serhiy Storchaka

1:23 p.m.

06.02.20 21:38, Brandt Bucher wrote:

> One issue that's come up during PR review with Guido and Serhiy is, when evaluating `a | b`, whether or not the default implementation should call an overridden `copy()` method on `a` in order to create an instance of the correct subclass (rather than a plain-ol' `dict`).

It would create an exception to two rules:

1. Operators on subclasses of builtin classes do not depend on overridden methods of arguments (except the corresponding dunder method). `list.__add__` and `set.__or__` do not call copy() and extend()/update(). You should override the corresponding dunder method to change the behavior of the operator.
2. Operators do not depend on non-dunder methods.

This looks to me like a direct violation of the principle "Special cases aren't special enough to break the rules."

> > I think the standard handling of subclasses in Python builtins is wrong, and I don't wish to emulate that wrong behaviour without a really good reason. Or at least a better reason than "other methods break subclassing unless explicitly overloaded, so this should do so too". Or at least not without a fight :-)

We can discuss and change the standard handling of subclasses in Python builtins. But if it is changed, it should be changed for all builtin classes, without exceptions and special cases. This could resolve the problem with rule 1. But there is still a problem with rule 2.
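Rule 1 can be observed with `list`. The sketch below uses hypothetical class names (`TrackedList`, `AddingList`) and records whether the overridden non-dunder methods are ever invoked by the `+` operator.

```python
calls = []


class TrackedList(list):
    """Records calls to the overridden non-dunder methods."""

    def copy(self):
        calls.append("copy")
        return TrackedList(self)

    def extend(self, items):
        calls.append("extend")
        super().extend(items)


result = TrackedList([1]) + TrackedList([2])
print(type(result).__name__, calls)  # list []


class AddingList(TrackedList):
    """Overriding the dunder itself is what changes the operator's result."""

    def __add__(self, other):
        return type(self)(list.__add__(self, other))


kept = AddingList([1]) + [2]
print(type(kept).__name__)  # AddingList
```

Neither `copy()` nor `extend()` is consulted by `list.__add__`, and the result falls back to a plain `list` until `__add__` itself is overridden, which is the existing behaviour Serhiy refers to.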

Brandt Bucher

1:44 p.m.

> It would create an exception of two rules:

I don't think these are "rules", I think they're just "the way things are". If I'm subclassing `dict`, and I see in the docs something to the effect of:

> By default, `dict` subclasses will return `dict` objects for `|` operations. To force the creation of a new instance of the subclass, users can override the `copy` method. In that case, the return value from this method will be used instead.

Then my life suddenly becomes a lot better, because chances are I've already thought to override `copy`. And if I want the "legacy" behavior, it's as simple as not bothering with `copy` (or, if I need to, overriding the `__or__` trio)... but I'm sure this is the less common case.

If we're quoting the Zen, then let's not elevate past design patterns to "rules". Besides, practicality beats purity. ;)

Paul Ganssle

1:56 p.m.

On 2/6/20 4:23 PM, Serhiy Storchaka wrote:

...

It would create an exception of two rules:

1. Operators on subclasses of builtin classes do not depend on overridden methods of arguments (except the corresponding dunder method). `list.__add__` and `set.__or__` do not call copy() and extend()/update(). You should override the corresponding dunder method to change the behavior of the operator.

2. Operators do not depend on non-dunder methods.

This looks to me as a direct violation of the principle "Special cases aren't special enough to break the rules."

I may not fully understand the implications of #1, but I would think you could implement the semantics Brandt wants using only dunder methods and copy.copy (which itself dispatches to one of a number of dunder methods - __copy__, __reduce__, __setstate__, depending on which ones are defined - we could presumably avoid the `copy` import by partially porting that logic into `__or__`):

    def __or__(self, other):
        new_value = copy.copy(self)
        # Mappings expose keys() rather than a `__keys__` dunder.
        for key in other.keys():
            new_value.__setitem__(key, other.__getitem__(key))
        return new_value

Obviously the actual implementation would be in C and handle more edge cases and whatnot, but I think the desired semantics can all be achieved using only magic methods on the objects themselves (though we'd probably want to bypass all that stuff in favor of a "fast path" in the case of `dict` | `dict`).

Best, Paul
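The idea above can be tried in pure Python. This is only a sketch under stated assumptions: `CopyUnionDict` and `Settings` are hypothetical names, the `isinstance` check mirrors the PEP's "right operand must be a dict" rule, and nothing here reflects the C implementation.

```python
import copy


class CopyUnionDict(dict):
    """Hypothetical dict whose `|` copies the left operand via copy.copy()."""

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # copy.copy() rebuilds an instance of type(self), even though the
        # subclass defines neither __copy__ nor copy().
        new_value = copy.copy(self)
        for key in other:
            new_value[key] = other[key]
        return new_value


class Settings(CopyUnionDict):
    pass


base = Settings(debug=False, port=80)
merged = base | {"port": 8080}
print(type(merged).__name__, merged)  # Settings {'debug': False, 'port': 8080}
```

Because `copy.copy` falls back to the pickle-style reduce protocol, the subclass is preserved without any extra method on `Settings`, at the cost of a slower path than a direct dict copy.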

Brandt Bucher

2:03 p.m.

...

but I think the desired semantics can all be achieved using only magic methods on the objects themselves

Hm. So, just to clarify, you're suggesting we use `__copy__`, if it exists? Interesting...

Brandt Bucher

2:25 p.m.

Sorry Paul, I sent my reply too soon.

I see what you're saying, and I'm pretty firmly -1 on reinventing (or importing) copy.copy. We already have an API for copying a dict (dict.copy).

I still fail to see the problem with using a method that doesn't start and end with underscores, other than that we "haven't done it".

Brandt

Chris Angelico

2:36 p.m.

On Fri, Feb 7, 2020 at 9:30 AM Brandt Bucher <brandtbucher@gmail.com> wrote:

...

Sorry Paul, I sent my reply too soon.

I see what you're saying, and I'm pretty firmly -1 on reinventing (or importing) copy.copy. We already have an API for copying a dict (dict.copy).

I still fail to see problem with using a method that doesn't start and end with underscores, other than that we "haven't done it".

Before Python 3.0, iterators had a next() method, and that was explicitly and consciously changed to __next__(). The arguments there seem relevant here too.

https://www.python.org/dev/peps/pep-3114/

ChrisA

Brandt Bucher

9:54 p.m.

...

...
We already have an API for copying a dict (dict.copy). I still fail to see problem with using a method that doesn't start and end with underscores, other than that we "haven't done it".

Before Python 3.0, iterators had a next() method, and that was explicitly and consciously changed to __next__(). The arguments there seem relevant here too.

Thanks for bringing this up. PEP 3114 does a great job of explaining how and why Python uses dunders for *language-level* constructs. I would probably be opposed to language features or built-in functions doing magical things with the `copy` method of dicts; however, in this case, it's being used by the dict itself! As PEP 3114 says:

...

In Python, double underscores before and after a name are used to distinguish names that belong to the language itself... Not all things that are called "protocols" are made of methods with double-underscore names... even though the read method is part of the file protocol, it does not have double underscores because there is no language construct that implicitly invokes x.read().

The language itself doesn't call `copy`, but `dict` should feel free to. Especially if it makes life easier for subclasses (which PEP 584 actually encourages users to make in the "Rejected Semantics" section: https://www.python.org/dev/peps/pep-0584/#rejected-semantics)!

Paul G

3 p.m.

I don't have a terribly strong opinion about whether or not it is acceptable to use dict.copy, my point was that the desired semantics can be achieved using only dunder methods if desired, and I think at this point getting the semantics right probably matters more than the implementation details. If we all agree on the semantics and we're just trying to decide how to get there, then I suppose I don't have a dog in the fight.

I will note that it doesn't seem to be true that operators never call standard methods. Looks like date.__add__ calls date.toordinal and date.fromordinal (at least in the pure Python implementation), and datetime calls those plus tzinfo.utcoffset. Not sure if the rule Serhiy is citing is only intended to apply to builtins, though.

Serhiy Storchaka

11:54 p.m.

07.02.20 01:00, Paul G wrote:

...

I will note that it doesn't seem to be true that operators never call standard methods. Looks like date.__add__ calls date.toordinal and date.fromordinal (at least in the pure Python implementation), and datetime calls those plus tzinfo.utcoffset. Not sure if the rule Serhiy is citing is only intended to apply to builtins, though.

It is an implementation detail. The C implementation does not call date.toordinal and date.fromordinal.

Steven D'Aprano

9:18 p.m.

Thank you to everyone for your patience while I dragged my feet responding to this. It's not because of a lack of interest, just a lack of uninterrupted time to focus on this :-)

I believe that the final sticking points are the behaviour with subclasses and whether or not the union operator ought to call "copy" on the left hand operator, or directly on dict. That is, the difference is between *roughly* these two:

    new = self.copy()  # Preserves subclasses.
    new = dict.copy(self)  # Always a builtin dict

(But we should use a __copy__ dunder rather than copy.)

TL;DR: My *strong* preference is for the union operator to call the left hand operand's copy method, in order to preserve the subclass of the LH operand, but if that's a sticking point for the proposal I'll accept the alternative, non-preserving, behaviour.

On Thu, Feb 06, 2020 at 07:38:03PM -0000, Brandt Bucher wrote:

...

One issue that's come up during PR review with Guido and Serhiy is, when evaluating `a | b`, whether or not the default implementation should call an overridden `copy()` method on `a` in order to create an instance of the correct subclass (rather than a plain-ol' `dict`). For example, `defaultdict` works correctly without any additional modification:

My opinion is that Python built-in types make subclassing unnecessarily(?) awkward to use, and I would like to see that change.

For example, consider subclassing float. If you want your subclass to be actually usable, you have to write a whole bunch of boilerplate, otherwise the first time you perform arithmetic on your subclass, it will be converted to a regular old float.

    class MyFloat(float):
        # This is the only thing I actually want to change.
        def mymethod(self):
            pass

        # But I need to override every operator too.
        def __add__(self, other):
            tmp = super().__add__(other)
            if tmp is NotImplemented:
                return tmp
            return MyFloat(tmp)

        # and so on for all the other operators

This is painful and adds a great amount of friction to subclassing.

A more pertinent example, from dict itself:

    py> class MyDict(dict):
    ...     pass
    ...
    py> d = MyDict()
    py> type(d.copy())
    <class 'dict'>

So unless my subclass overrides the copy method, copy doesn't actually make a copy, it coerces the copy into the superclass.

My preference is to avoid inflicting that pain onto subclasses.

Serhiy commented: "We can discuss and change the standard handling of subclasses in Python builtins. But if it be changed, it should be changed for all builtin classes, without exceptions and special cases."

Changing all builtins is a big, backwards-incompatible change. It probably should have been done in 3.0. If it is done, it would surely need to be done using a `__future__` import. So in practice, it will probably never be done.

My personal opinion is that I would rather have the inconsistency between dict union operator and the rest of the builtins. Better to have one class do the Right Thing (in my opinion) than none of them. But I accept that others may disagree, and if this issue is a sticking point, I'll back down gracefully and go with the non-preserving behaviour.

Of course, if the status quo where builtin methods and operators return the builtin class unless explicitly overridden in the subclass is an intentional design choice for a good reason, that may cast a different light on this issue.

If we do decide to delegate to the operands for making a copy, we should follow Brett's comment: "I agree that if we want to go down the road of creating a copy to allow for subclasses then we should define a dunder method for such a use, even if it's redundant in the face of dict.copy()."

In other words:

* dict gains a `__copy__` dunder
* `dict.copy` becomes an alias to `dict.__copy__`
* dict union operator should call the `__copy__` dunder of the left hand operand (rather than `copy` itself).

I believe that will do the right thing in all cases. Have I missed anything?

[Brandt]

...

So this has immediate benefits for both usability and maintenance: subclasses only need to override `copy()` to get working `__or__`/`__ror__` behavior.

Indeed, except it should be the dunder `__copy__`. -- Steven
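The three bullet points in Steven's message can be emulated in pure Python to see what they would buy a subclass. This is only a sketch: `CopyingDict` and `MyDict` are hypothetical names, the `isinstance` check is an assumption about how non-dict operands would be rejected, and none of this is taken from the PEP 584 reference implementation.

```python
class CopyingDict(dict):
    """Hypothetical stand-in for a dict that follows the proposed protocol."""

    def __copy__(self):
        # Build the copy through the operand's own class, so subclasses survive.
        return type(self)(self)

    # `copy` becomes an alias for the dunder, as in the second bullet point.
    copy = __copy__

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # Delegate to the left-hand operand's __copy__, then merge into it.
        new = self.__copy__()
        new.update(other)
        return new


class MyDict(CopyingDict):
    """A subclass that overrides nothing at all."""


merged = MyDict(a=1) | {"b": 2}
copied = MyDict(a=1).copy()
```

Under these assumptions both `merged` and `copied` come back as `MyDict` instances without the subclass writing any boilerplate, which is the behaviour Steven is arguing for.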

Brandt Bucher

8:42 a.m.

After a few days of thinking and experimenting, I’ve been convinced that `copy` (and also `__copy__`) is not the right protocol for what we want to do here. I believe that 584 can likely continue without subclass-preserving behavior, and that better behavior could perhaps be added to *all* built-in types later, since it’s outside the scope of this PEP.

...

My opinion is that Python built-in types make subclassing unnecessarily(?) awkward to use, and I would like to see that change.

Yes! But, on further reflection, I don’t think this is the correct way of approaching it.

...

For example, consider subclassing float. If you want your subclass to be actually usable, you have to write a whole bunch of boilerplate, otherwise the first time you perform arithmetic on your subclass, it will be converted to a regular old float… This is painful and adds a great amount of friction to subclassing.

`float` is a *perfect* example of the problems with the way things are currently, so let’s focus on this.

Currently, subclassing `float` requires ~30 overridden methods of repetitive (but non-trivial) boilerplate to get everything working right. However, calling the `float` equivalent of `dict.copy()` on the LHS before proceeding with the default implementation wouldn’t help us, because floats (like many built-in types) are immutable. So a new, plain, built-in `float` would still be returned by the default machinery. It doesn’t know how to construct a new, different instance of our subclass, and it can’t change one it’s already built.

This leads me to believe that we’re approaching the problem wrong. Rather than making a copy and working on it, I think the problem would be better served by a protocol that runs the default implementation, *then* calls some dunder hook on the subclass to build a new instance.

Let’s call this method `__build__`. I’m not sure what its arguments would look like, but it would probably need at least `self`, and an instance of the built-in base class (in this case a `float`), and return a new instance of the subclass based on the two. It would likely also need to work with `cls` instead of `self` for `classmethod` constructors like `dict.fromkeys`, or have a second hook for that case.

By subclassing `float` and defining `__build__` as something like this:

```
class MyFloat(float):
    ...
    def __build__(self, result):
        return MyFloat(result, some_state=self.some_state)
    ...
```

I could now trust the built-in `float` machinery to try calling `lhs.__build__(result)` on the result that *would* have been returned *before* returning it. This is a simple example, but a protocol like this would work for mutables as well.
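Since the built-in `float` machinery does not call any such hook today, the idea can only be emulated. The sketch below does that in pure Python: an intermediate class wraps a handful of `float` operators so that each one runs the default implementation first and then hands the result to `__build__`. The names `BuildableFloat`, `Tagged`, `_post_hook` and the `tag` attribute are all hypothetical, and only a few operators are wrapped to keep it short.

```python
_OPERATORS = ("__add__", "__radd__", "__sub__", "__rsub__",
              "__mul__", "__rmul__", "__neg__", "__abs__")


def _post_hook(name):
    base_method = getattr(float, name)

    def method(self, *args):
        # Run the default float implementation first...
        result = base_method(self, *args)
        if result is NotImplemented:
            return result
        # ...then let the subclass decide what to return.
        return self.__build__(result)

    method.__name__ = name
    return method


class BuildableFloat(float):
    """Stands in for a float whose operators honour the hypothetical hook."""

    def __build__(self, result):
        # Default hook: keep today's behaviour and return the plain float.
        return result


for _name in _OPERATORS:
    setattr(BuildableFloat, _name, _post_hook(_name))


class Tagged(BuildableFloat):
    def __new__(cls, value, tag=""):
        self = super().__new__(cls, value)
        self.tag = tag
        return self

    def __build__(self, result):
        # Carry the extra state over to the new instance.
        return type(self)(result, tag=self.tag)


total = Tagged(1.5, tag="metres") + 2
negated = -Tagged(1.5, tag="metres")
```

Here `total` and `negated` are `Tagged` instances that keep their `tag`, with a single method written in the subclass. The per-operator wrapping is exactly the cost that a built-in version of the hook would move into the interpreter.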

...

A more pertinent example, from dict itself:

If `dict` *were* to grow more operators, they would likely be `^`, `&`, and `-`. You can consider the case of subclassing `set` or `frozenset`, since they currently have those. Calling `lhs.copy()` first before updating is fine for additive operations like `|`, but for subtractive operations like the others, this can be very bad for performance, especially if we’re now *required* to call it. Again, doing things the default way, and *then* constructing the appropriate subclass in an agreed-upon way seems like the path to take here.
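To make the performance concern concrete, here is a sketch of two ways a hypothetical `dict` intersection could be written. Both helper functions are illustrative names, not proposed APIs: the first copies the whole left operand and then deletes from it, the second computes only the surviving items and builds the result once at the end.

```python
def intersect_copy_first(lhs, rhs):
    # Copies every item of lhs, even those about to be discarded.
    result = lhs.copy()
    for key in lhs:
        if key not in rhs:
            del result[key]
    return result


def intersect_then_build(lhs, rhs, build=dict):
    # Computes the surviving items first, then constructs the result once.
    # `build` plays the role of the agreed-upon subclass constructor.
    return build({key: value for key, value in lhs.items() if key in rhs})


big = {n: str(n) for n in range(1000)}
small = {0: None, 1: None}
first = intersect_copy_first(big, small)
second = intersect_then_build(big, small)
```

Both produce the same two-item mapping, but the copy-first version duplicates all 1000 entries only to throw 998 of them away.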

...

Changing all builtins is a big, backwards-incompatible change.

If implemented right, a system like the one described above (`__build__`) wouldn’t be backward-incompatible, as long as nobody was already using the name. Just food for thought. I think this is a much bigger issue than PEP 584, but I'm convinced that the consistent status quo should prevail until a suitable solution for all types can be worked out (if ever).

Andrew Barnert

2:05 p.m.

Brandt Bucher wrote:

...

This leads me to believe that we’re approaching the problem wrong. Rather than making a copy and working on it, I think the problem would be better served by a protocol that runs the default implementation, then calls some dunder hook on the subclass to build a new instance.

Let’s call this method `__build__`. I’m not sure what its arguments would look like, but it would probably need at least `self`, and an instance of the built-in base class (in this case a `float`), and return a new instance of the subclass based on the two. It would likely also need to work with `cls` instead of `self` for `classmethod` constructors like `dict.fromkeys`, or have a second hook for that case.

You can call `self.fromkeys`, and it works just like calling `type(self).fromkeys`. The only real advantage of having a second hook is that it would simplify the most trivial cases, which are very common.

In particular, probably 90% of subclasses of builtins are like Steven's `MyFloat` example: all you really want to do is call your constructor in place of the super's constructor, and if you have to call it with the result of your super's constructor instead, that's fine because `MyFloat(x)` on a `float` or `MyFloat` is equivalent to `x` anyway. So you could just write `__build_cls__ = __new__` and you're done. With only an instance-method version, you'd have to write `def __build__(self, other): return type(self)(other)`. Which isn't _terrible_ or anything, but as boilerplate that has to be added (probably without being understood) to hundreds of classes, it's not exactly ideal.

If there were a way to actually get your constructor called on the `__new__` arguments directly, without constructing the superclass instance first, that would be even better. Besides being more efficient (and that "more efficient" could actually be a big deal, because we're talking about every call to every operator dunder and many other methods on builtins needing to check this in addition to whatever else they do…), it would allow a trivial implementation on types that share their super's constructor signature but can't guarantee that `MyType(x) == x`.

Even for cases like `defaultdict`, if you could supply a constructor, you'd be fine: `partial(type(self), self.default_factory)` can be used with the arguments to a `dict` construction call just as easily as it can be used with a `dict` itself.

But I'm not sure there is such a way. (Maybe the pickle/copy protocol can help here? Not sure without thinking it through more…)
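A few of the points in this message can be checked directly in today's Python. The snippet below is only a sketch: `MyDict`, `MyFloat` and the variable names are hypothetical, and `__build__` is called by hand here because nothing in CPython calls it.

```python
from collections import defaultdict
from functools import partial


class MyDict(dict):
    """Hypothetical subclass with no overrides."""


# fromkeys is a classmethod, so calling it on an instance still uses the subclass.
from_instance = MyDict().fromkeys("ab", 0)


class MyFloat(float):
    # The one-line instance hook from the message.
    def __build__(self, other):
        return type(self)(other)


rebuilt = MyFloat(1.5).__build__(1.5 + 2.0)

# For defaultdict, a partial over the class and the factory acts as the
# "supplied constructor": it accepts anything a dict construction call accepts.
counts = defaultdict(list, a=[1])
make_like_counts = partial(type(counts), counts.default_factory)
merged = make_like_counts({**counts, "b": [2]})
```

The `partial` is built from `type(counts)` and not from the instance itself, because it is the class that has to be called to construct a new mapping.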

...

If implemented right, a system like the one described above (`__build__`) wouldn’t be backward-incompatible, as long as nobody was already using the name.

Assuming the builtins don't grow `__build__` methods that use `cls` or `type(self)` (which is what you'd ideally want, but then you get the same massive backward-incompatibility problem we were trying to avoid…), it seems like we're adding possibly significant cost to everything (maybe not significant for `dict.__or__`, but maybe so for `int.__add__`) for a benefit that almost no code actually uses.

Maybe the long-term benefit of everyone being able to drop those `MyFloat(…)` calls all over once they can require 3.10+ is worth the immediate and permanent cost to performance and implementation complexity, but I'm not sure. (If there were an opt-in way to replace the super's construction call instead of post-hooking it, the cost might be reduced enough to change that calculation. But again, I'm not sure if there is such a way.)

Guido van Rossum

2:12 p.m.

So can we just finish PEP 584 without the .copy() call? Surely the more general solution will not be ready for Python 3.9, while PEP 584 is nearly done. (I'm also skeptical about a general solution, but I'd rather stay out of that discussion for a while longer, and maybe you all come up with something good.)

On Sun, Feb 16, 2020 at 8:46 AM Brandt Bucher <brandtbucher@gmail.com> wrote:

...
