Moving PEP 584 forward (dict + and += operators)
A month ago I promised the Steering Council I'd review PEP 584 and promptly got distracted by other things. But here it is, at last.

Some procedural things first: you (the PEP authors) should apply to the Steering Council for a formal review. They most likely will assign one of the SC members as a BDFL-Delegate who will guide the PEP to a resolution (either acceptance or rejection, after any number of revisions). Don't wait until you think the PEP is perfect! It doesn't have to be for a BDFL-Delegate to be assigned.

Note that the SC elections have just started, and in a few weeks we'll have a new SC. I won't be in it (I withdrew from the ballot). I don't know how this will affect the chances of this PEP. (Also note that after an election, a SC can overturn decisions by the previous SC, so I don't mean this as a reason to hurry -- just something to consider.)

Now to the content of PEP 584.

## Should the operators be + and +=, or | and |= ?

I haven't read _all_ the posts on this topic, but I've seen quite a few, and I still strongly favor | and |= . My reason for this is mainly that the operation really does do a set union **on the keys**.

Also, + is already overloaded enough (numbers, strings, sequences) and we don't need yet another, different meaning. The | operator should be known to most users from sets (which are introduced before dicts in the official Python tutorial).

I don't particularly care about learnability or discoverability in this case -- I don't think beginners should be taught these operators as a major tool in their toolbox (let them use .update()), and for anyone wondering how to do something with a dict without access to the official docs, there's always help(dict). The natural way of discovering these would be through tutorials, reading of the library docs, or by reading other people's code, not by trying random key combinations in the REPL.

## Does it matter that the operation is lossy, and not commutative in the values?

Centithreads have been wasted on this topic, and I really don't care. I **certainly** don't think that introducing a new operator or using a different operator (<<, really?) is needed because of the operator's asymmetry. A tutorial can show this in two or three lines, and anybody who remembers these operators exist but has forgotten how they handle conflicting values (and doesn't remember they correspond to .update()) can type one line in the REPL to figure it out:

>>> {"a": 1} | {"a": 2}
{'a': 2}

## What about the Counter class?

The Counter class is an over-engineered contraption that has way too many APIs to remember, and is quite inconsistent in what it tries to be. (When does it delete zeros? Why does it sometimes discard negative values?) We should not use it for guidance nor should we feel constrained by its design.

## What about performance?

Performance is not the only objective when using Python. Switching to in-place operators (here |=) is a generally useful and well-known technique (it also applies to string and list concatenation, for example).

## Other objections

The PEP does a good job of addressing pretty much everything that's been brought up against it (in fact, perhaps it goes a little too far in humoring obvious strawmen).

## Open questions

There's only one left in the PEP: "Should these operators be part of the ABC Mapping API?"

No. This would require every Mapping implementation to add these. While you might think that this could be addressed by adding concrete implementations in the Mapping ABC, that doesn't work for |, since there's no API for creating new instances. It could be done for |=, by adding this to MutableMapping:

    def __ior__(self, other):
        self.update(other)
        return self  # __ior__ must return self so that `m |= other` rebinds m to the updated mapping

but without the ability to define __or__ in Mapping this seems inconsistent. Mapping implementations are of course free to define __or__.

All in all I would recommend to the SC to go forward with this proposal, targeting Python 3.9, assuming the operators are changed to | and |=, and the PEP is brought more in line with the PEP editing guidelines from PEP 1 and PEP 12. (In particular, there are some suggested section headings there that ought to be followed more closely. I'd be glad to give the authors more guidance in this matter if they request it.)

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
Thanks for the detailed feedback and helpful suggestions, Guido. Steven, I’m currently on my honeymoon through the 16th, and won’t be working on this (or anything). Feel free to address these points while I’m out. When I’m back I’ll be happy to touch base and make a PR, update the implementation, and anything else you might need. Brandt
I don't particularly care about learnability or discoverability in this case -- I don't think beginners should be taught these operators as a major tool in their toolbox (let them use .update()), and for anyone wondering how to do something with a dict without access to the official docs, there's always help(dict).
I initially had some concern about the |= operator not being entirely clear to beginners, but that's a good point: it's not really made for beginners or intended to be an "essential-to-know" feature of Python. That said, I think the dict documentation [1] could benefit from a brief example demonstrating equivalent code if PEP 584 is implemented: one version using dict.update() and the other using the operator.
Performance is not the only objective when using Python. Switching to inplace operators (here |=) is a generally useful and well-known technique (it also applies to string and list concatenation, for example).
Also, I think it's worth noting that we could optimize the actual behavior of the in-place operator under the hood, similar to what is done with consecutive string concatenations [2]. Of course, the optimization wouldn't be the same as it is for strings (since strings are immutable and dicts are mutable), but it still works as an example. IMO, that's a significant benefit of high-level languages such as Python: we can simplify the syntax while optimizing the exact behavior in the internals.

[1]: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
[2]: Paul Ganssle's recent blog post explains the string concat optimization quite well, in the "Optimizing string concatenation" section: https://blog.ganssle.io/articles/2019/11/string-concat.html
On Mon, Dec 2, 2019 at 8:39 PM Kyle Stanley <aeros167@gmail.com> wrote:
Also, I think it's worth noting that we can optimize the actual behavior of the in-place operator under the hood, similar to what is done with consecutive string concatenations [2]. Of course, the optimization wouldn't be the same as it is with strings (since strings are immutable and dicts are mutable), but it still works as an example. IMO, that's a significant benefit of high-level languages such as Python: we can simplify the syntax while optimizing the exact behavior in the internals.
Actually there's no need to optimize the |= operator -- for strings we have to optimize += *because* strings are immutable, but for dicts we would define |= as essentially an alias for .update(), just like the relationship between += and .extend() for lists, and then no unnecessary objects would be created. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
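To make the parallel concrete, here is a minimal sketch (variable names are just for illustration) of what "essentially an alias for .update()" means: the left-hand dict is mutated in place, and no intermediate object is created. Under the semantics proposed in the PEP, d1 |= d2 would behave the same way.

    d1 = {"a": 1, "b": 2}
    d2 = {"b": 3, "c": 4}
    alias = d1                # a second reference to the same dict object
    d1.update(d2)             # today's spelling; the PEP's d1 |= d2 would do the same
    assert alias is d1        # still the very same object -- mutated in place
    assert d1 == {"a": 1, "b": 3, "c": 4}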
03.12.19 07:04, Guido van Rossum wrote:
Actually there's no need to optimize the |= operator -- for strings we have to optimize += *because* strings are immutable, but for dicts we would define |= as essentially an alias for .update(), just like the relationship between += and .extend() for lists, and then no unnecessary objects would be created.
One more question: should |= accept only dicts on the right side, or arbitrary mappings with a keys() method, or even iterables of pairs, as dict.update() does? And the same question for |. Should `{} | Mapping()` and `{} | []` work?
On Tue, Dec 3, 2019 at 2:10 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
03.12.19 07:04, Guido van Rossum wrote:
Actually there's no need to optimize the |= operator -- for strings we have to optimize += *because* strings are immutable, but for dicts we would define |= as essentially an alias for .update(), just like the relationship between += and .extend() for lists, and then no unnecessary objects would be created.
Yet one question: should |= accept only dicts at right side, or arbitrary mappings with the keys() method, or even iterables of pairs as dict.update()?
IMO it should follow the example of sets, and accept Mappings but not the other thing. (If you have the other thing, use update().)
And the same question for |. Should `{} | Mapping()` and `{} | []` work?
Ditto -- {} | Mapping() should work, but {} | [] should not. Steven, please take note -- these kinds of things should be spelled out in the PEP (apologies if they are already in there). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
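As a rough pure-Python sketch of that dispatch rule (the class name is hypothetical, and the real change would of course live in C): __or__ can simply return NotImplemented for non-mapping operands, so that Python raises the usual TypeError.

    from collections.abc import Mapping

    class SketchDict(dict):
        def __or__(self, other):
            if not isinstance(other, Mapping):
                return NotImplemented       # so SketchDict() | [] raises TypeError
            merged = dict(self)
            merged.update(other)
            return merged

    print(SketchDict(a=1) | {"b": 2})       # {'a': 1, 'b': 2}
    # SketchDict(a=1) | []                  # TypeError: unsupported operand type(s) for |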
04.12.19 20:18, Guido van Rossum wrote:
On Tue, Dec 3, 2019 at 2:10 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
And the same question for |. Should `{} | Mapping()` and `{} | []` work?
Ditto -- {} | Mapping() should work, but {} | [] should not.
set() | Set() falls back to Set.__ror__. collections.abc.Set.__ror__ is defined, but you suggested not defining it for collections.abc.Mapping.
Oh, that’s fair. I don’t think we should update Mapping or MutableMapping. People who want this for their own class can override __or__ and __ror__.
-- --Guido (mobile)
Actually there's no need to optimize the |= operator -- for strings we have to optimize += *because* strings are immutable, but for dicts we would define |= as essentially an alias for .update(), just like the relationship between += and .extend() for lists, and then no unnecessary objects would be created.
Yeah that's why I noted that any form of optimization for the |= operator on dicts would not be the same as += is for strings. I wasn't actually sure of what form any potential optimization would take for the |= operator though. What exactly was the performance question/point in reference to? The question seemed to imply that there would be some minor performance detriment from using |=, but it's not clear to me as to when that would be a factor.
## What about performance?
Performance is not the only objective when using Python. Switching to inplace operators (here |=) is a generally useful and well-known technique (it also applies to string and list concatenation, for example).
Also with lists, I recall that using the += operator is *very slightly* faster than list.extend() in most situations:
ls_plus_eq = """\
for i in range(1_000):
    ls += [x for x in range(10)]
"""

ls_extend = """\
for i in range(1_000):
    ls.extend([x for x in range(10)])
"""

>>> timeit.timeit(ls_plus_eq, setup="ls = []", number=10_000)
6.563132778996078
>>> timeit.timeit(ls_extend, setup="ls = []", number=10_000)
6.695127692000824
>>> timeit.timeit("ls+=other", setup="ls = []; other=[i for i in range(100_000)]", number=10_000)
4.400735091003298
>>> timeit.timeit("ls.extend(other)", setup="ls = []; other=[i for i in range(100_000)]", number=10_000)
4.574331789997814
>>> timeit.timeit("ls+=other", setup="ls = []; other=[i for i in range(100)]", number=10_000_000)
3.5332175369985634
>>> timeit.timeit("ls.extend(other)", setup="ls = []; other=[i for i in range(100)]", number=10_000_000)
3.7756526679950184
(Python 3.8) It seems to be a difference of only ~2-4% in most cases (~6-7% with the last set), but I find it interesting that += is barely faster. Of course, some of the above examples are fairly unrealistic, for most practical use cases they're essentially the same. I tend to prefer ls.extend() most of the time myself (the behavior is a bit more obvious). I'm mostly just curious if the difference between |= and dict.update() would end up being similar as far as performance goes, with |= having a negligible advantage over dict.update() in most situations.
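For the dict case, the analogous measurement is easy to run once the operator exists; here is a sketch of how one might compare the two spellings (no numbers claimed here, and the sizes are arbitrary):

    import timeit

    setup = "d = {}; other = {i: i for i in range(100)}"
    print(timeit.timeit("d.update(other)", setup=setup, number=1_000_000))
    print(timeit.timeit("d |= other", setup=setup, number=1_000_000))    # needs the proposed operator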
I think the small time difference you noticed is only due to method lookup.
-- Antoine Rozo
I think the small time difference you noticed is only due to method lookup.
I believe it's a bit more than just the Python method lookup, but that makes some difference. Within the C-API, list += other uses list_inplace_concat [1], whereas list.extend uses _PyList_Extend [2]. They both call list_extend [3], but they're not exactly equivalent. With the in-place operation, the intermediate results can be discarded right away. Also, at the bytecode level, += can use the STORE_FAST instruction to directly push the result from the TOS to a local var, whereas list.extend uses POP_TOP. But I'll leave it at that; I don't want to focus much on performance, especially not when comparing list += other vs ls.extend(other). As mentioned elsewhere in the topic, performance isn't the goal of this PEP. I was mostly just curious if dict |= other would likely be very slightly faster than dict.update(other), and wanted some elaboration on where performance might be a mild concern.

[1]: https://github.com/python/cpython/blob/894331838b256412c95d54051ec46a1cb96f5...
[2]: https://github.com/python/cpython/blob/894331838b256412c95d54051ec46a1cb96f5...
[3]: https://github.com/python/cpython/blob/894331838b256412c95d54051ec46a1cb96f5...
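One way to see the bytecode difference mentioned above is with the dis module; the exact opcode names vary between Python versions, but the shape of the difference (an in-place add followed by STORE_FAST versus a method call whose return value is discarded with POP_TOP) is visible either way. A minimal sketch:

    import dis

    def with_operator(ls, other):
        ls += other         # in-place add; the result is stored back with STORE_FAST

    def with_method(ls, other):
        ls.extend(other)    # method call; the None return value is discarded with POP_TOP

    dis.dis(with_operator)
    dis.dis(with_method)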
With regard to | vs +, I honestly think the former would be *easier* for beginners than the latter, because:

- Having a behavior that is unrelated to other uses of + would arguably be harder to learn, or at least easier to misunderstand when you first see it.
- In the CS class world, many students will be covering sets anyway.
- | having related behavior across sets and dicts, which are themselves related data structures, would make it easier to remember.
In general, I am against this proposal. It makes the language more complex without adding any benefit. There are already many ways of merging dicts, including the expression form. It conflicts with Counter. It can make code more error-prone, because + or | for dicts will no longer fail immediately. But if this proposal is accepted, I will try to make it as consistent and harmless to the language as possible.

02.12.19 21:54, Guido van Rossum wrote:
## Should the operators be + and +=, or | and |= ?
I argued for | and |= as the lesser evil. But there may be a problem. A dict and its keys view are interchangeable in the context of some set operations: "in" checks for existence of a key, and iterating yields keys. Currently both `dictkeys | dict` and `dict | dictkeys` return the same thing: the set containing the union of the keys.
>>> {1: 2}.keys() | {3}
{1, 3}
>>> {3} | {1: 2}.keys()
{1, 3}
What will they return if | is implemented for dicts? It should be mentioned in the PEP. It should be tested with a preliminary implementation to see what behavior is possible and most natural.
## Does it matter that the operation is lossy, and not commutative in the values?
I never understood the argument about non-commutativity. I do not believe anybody proposed it seriously. It looks like a straw man.
## What about performance?
It should be mentioned in the PEP that `dict1 | dict2 | dict3` is less efficient than `{**dict1, **dict2, **dict3}`.
## Other objections
The principal question about the result type was not mentioned above. `dict | dict` should return an exact dict for dict subclasses, for the same reasons as not including __or__ in the Mapping API. We cannot guarantee the signature and the behavior of the constructor, and therefore we have no way to create a copy as an instance of a general dict subclass. This is why `dict.copy()` returns an exact dict. This is why `list + list`, `tuple + tuple`, `str + str`, `set | set`, `frozenset | frozenset`, etc, etc return an instance of the base class. This is why all binary operators for numbers (like `int + int`, `float * float`) return an instance of the base class. Making `dict | dict` return an instance of a dict subclass would be an exception to the rule.
On Tue, Dec 3, 2019 at 2:00 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
[SNIP]
## What about performance?
It should be mentioned in the PEP that `dict1 | dict2 | dict3` is less efficient than `{**dict1, **dict2, **dict3}`.
... in CPython, but is it guaranteed to be faster in e.g. PyPy? We should be very careful about making any performance promises/points unless we know it is fairly universal that the design will explicitly make something faster or slower relative to another operation. To me this PEP is entirely a question of whether the operators will increase developer productivity and not some way to do dict merging faster, and so performance questions should stay out of it unless it's somehow slower than dict.update().
03.12.19 20:51, Brett Cannon wrote:
On Tue, Dec 3, 2019 at 2:00 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
[SNIP] > ## What about performance?
It should be mentioned in the PEP that `dict1 | dict2 | dict3` is less efficient than `{**dict1, **dict2, **dict3}`.
... in CPython, but is it guaranteed to be faster in e.g. PyPy? We should be very careful about making any performance promises/points unless we know it is fairly universal that the design will explicitly make something faster or slower relative to another operation.
Does PyPy have optimizations for `list1 + list2 + list3` and `set1 | set2 | set3`? If not, it is unlikely to have an optimization for `dict1 | dict2 | dict3`. In any case, `{**dict1, **dict2, **dict3}` is optimal independently of the implementation.
To me this PEP is entirely a question of whether the operators will increase developer productivity and not some way to do dict merging faster, and so performance questions should stay out of it unless it's somehow slower than dict.update().
The PEP should contain all objections and pitfalls. I am not saying that this is an argument against | for dicts, but this detail should be mentioned. BTW, the PEP contains a wrong statement about `{**dict1, **dict2}` -- it works for more than just string keys. After removing the false statement and adding the performance note, the alternative will look much better.
On Dec 3, 2019, at 02:00, Serhiy Storchaka <storchaka@gmail.com> wrote:
I argued for | and |= as lesser evil. But there may be a problem. Dict and dict keys view are interchangeable in the context of some set operations: "in" checks for existence of the key and iterating yields keys. Currently both `dictkeys | dict` and `dict | dictkeys` return the same, the set containing the union of keys.
This is all just because a dict is an iterable of, and container of, its keys. It’s not a set of them the way its keys view is, but it doesn’t have to be. And you don’t need to do anything special to preserve that, just make sure dict.__or__ and __ror__ don’t try to handle sets (or arbitrary iterables), only mappings (or only dicts), and the set or view implementation will still work.
>>> {1: 2}.keys() | {3}
{1, 3}
>>> {3} | {1: 2}.keys()
{1, 3}
What it will return if implement | for dicts? It should be mentioned in the PEP. It should be tested with a preliminary implementation what behavior is possible and more natural.
What is there to document or test here? There’s no dicts involved in either operator, only a set and a key view, both of which are set types and implement set union. I think the question you wanted to ask is | between a dict and a set or view, not between two sets or views. But as I said above, there’s an obvious right thing to do there, and the obvious implementation does that. Of course it’s still worth writing the tests, as well as the other usual __rspam__ tests that go with every operator on the builtins.
## Other objections
The principal question about the result type was not mentioned above. `dict | dict` should return an exact dict for dict subclasses for the same reasons as not including __or__ in the Mapping API. We cannot guarantee the signature and the behavior of the constructor and therefore we have no way to create a copy as an instance of general dict subclass. This is why `dict.copy()` returns an exact dict. This is why `list + list`, `tuple + tuple`, `str + str`, `set | set`, `frozenset | frozenset`, etc, etc return an instance of the base class. This is why all binary operators for numbers (like `int + int`, `float * float`) return an instance of the base class.
Also because if you want MyInt + int and int + MyInt to return a MyInt, you can do that trivially (as a subclass your __ror__ always gets precedence over the base __or__), and if you want MyDict.copy() to return a MyDict it’s even easier (just override copy), so there’s no reason for the base class to even try to do that for you. So I agree, dict.__or__ should return a dict; anyone who wants MyDict to return a MyDict can just override, as with every other operator on the builtins. More generally, I think the design should follow all the other operators on builtins on all such questions. For example, should dict.__or__ and __ior__ handle all the same values as update, or some more restricted set of types? The same as set.__or__ and __ior__ vs. union, and list.__add__ and __iadd__ vs extend, and so on. If they’re all consistent, and there’s no compelling reason to add an inconsistency here, don’t.
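A hypothetical subclass makes the point concrete: if dict's own operator returns a plain dict, anyone who wants the subclass type back can override the relevant methods themselves (this sketch does not rely on dict defining | at all):

    class MyDict(dict):
        def __or__(self, other):
            result = MyDict(self)
            result.update(other)
            return result

        def __ror__(self, other):
            result = MyDict(other)
            result.update(self)
            return result

        def __ior__(self, other):
            self.update(other)
            return self         # returning self keeps the augmented assignment in place

    m = MyDict(a=1) | {"b": 2}
    print(type(m).__name__, m)  # MyDict {'a': 1, 'b': 2}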
03.12.19 21:04, Andrew Barnert via Python-ideas wrote:
On Dec 3, 2019, at 02:00, Serhiy Storchaka <storchaka@gmail.com> wrote:
What it will return if implement | for dicts? It should be mentioned in the PEP. It should be tested with a preliminary implementation what behavior is possible and more natural.
What is there to document or test here? There’s no dicts involved in either operator, only a set and a key view, both of which are set types and implement set union.
Oh, sorry, it was a wrong example. Here is the right one:
>>> {(1, 2): 3}.keys() | {4: 5}
{(1, 2), 4}
>>> {4: 5} | {(1, 2): 3}.keys()
{(1, 2), 4}
How will the results change after implementing PEP 584? This all should be considered in the PEP. Note that dictkeys.__or__ does not return NotImplemented (there is an issue for this), so the PEP may require wider changes than just adding __or__, __ror__ and __ior__ to dict.
I think the set operation of dict_keys accepts any iterable by accident. There is an issue for it: https://bugs.python.org/issue38538
-- Inada Naoki <songofacandy@gmail.com>
On Tue, Dec 3, 2019 at 2:02 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
I general, I am against this proposition. It makes the language more complex without adding any benefit. There are already many ways of merging dicts, including the expression form. It conflicts with Counter. It can make the code more errorprone because + or | for dicts will no longer fail immediately.
But if this proposition be accepted, I will try to make it as consistent and harmless to the language as possible.
02.12.19 21:54, Guido van Rossum wrote:
## Should the operators be + and +=, or | and |= ?
I argued for | and |= as lesser evil. But there may be a problem. Dict and dict keys view are interchangeable in the context of some set operations: "in" checks for existence of the key and iterating yields keys. Currently both `dictkeys | dict` and `dict | dictkeys` return the same, the set containing the union of keys.
Hm, I didn't know this. But it shouldn't change. This is a special case in keys(), it seems -- set | dict and dict | set fail. The special case can continue to return the same outcome.
>>> {1: 2}.keys() | {3}
{1, 3}
>>> {3} | {1: 2}.keys()
{1, 3}
Your example uses sets -- did you mean {3: 4} instead of {3}?
What it will return if implement | for dicts? It should be mentioned in the PEP. It should be tested with a preliminary implementation what behavior is possible and more natural.
Again, Steven, please take note.
## What about performance?
It should be mentioned in the PEP that `dict1 | dict2 | dict3` is less efficient than `{**dict1, **dict2, **dict3}`.
Ditto. (I think this is a much less common use case -- more common is a variable number of dicts, which I would solve with a loop over update() or over |=.)
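For completeness, the loop in question is short; here is a sketch for merging a variable number of dicts (the helper name is made up for illustration):

    def merge_all(dicts):
        merged = {}
        for d in dicts:
            merged.update(d)    # or: merged |= d, with the proposed operator
        return merged

    print(merge_all([{"a": 1}, {"b": 2}, {"a": 3}]))   # {'a': 3, 'b': 2}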
## Other objections
The principal question about the result type was not mentioned above. `dict | dict` should return an exact dict for dict subclasses for the same reasons as not including __or__ in the Mapping API. We cannot guarantee the signature and the behavior of the constructor and therefore we have no way to create a copy as an instance of general dict subclass. This is why `dict.copy()` returns an exact dict. This is why `list + list`, `tuple + tuple`, `str + str`, `set | set`, `frozenset | frozenset`, etc, etc return an instance of the base class. This is why all binary operators for numbers (like `int + int`, `float * float`) return an instance of the base class. Making `dict | dict` returning an instance of a dict subclass will be an exception to the rule.
Also an important detail, and again I hope that Steven adds this to the PEP. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
Abdur-Rahmaan Janhangeer
http://www.pythonmembers.club | https://github.com/Abdur-rahmaanJ
Mauritius

On Mon, 2 Dec 2019, 23:55 Guido van Rossum, <guido@python.org> wrote:
Also, + is already overloaded enough (numbers, strings, sequences) and we don't need yet another, different meaning. The | operator should be known to most users from sets (which are introduced before dicts in the official Python tutorial).
I don't particularly care about learnability or discoverability in this case -- I don't think beginners should be taught these operators as a major tool in their toolbox (let them use .update()), and for anyone wondering how to do something with a dict without access to the official docs, there's always help(dict). The natural way of discovering these would be through tutorials, reading of the library docs, or by reading other people's code, not by trying random key combinations in the REPL.
My feeling about that is to favour the + operator. Technically | is better, but mimicking lists, + might sound better. It follows the general trend in Py.

<<I don't think beginners should be taught these operators as a major tool in their toolbox>>

True, it fits more in a "Python Tricks" session, but depending on the audience level it is sometimes tempting to just mention it, since it's one line away.

<<The natural way of discovering these would be through tutorials, ..., not by trying random key combinations in the REPL.>>

*smile*
Thanks Guido.

I have some notes in progress for a third revision of the PEP, but I don't think there's anything critical in them, and certainly nothing that need delay asking for a BDFL-Delegate.

Brandt, if you are reading this[1], do you have any objection to me applying to the Steering Council for a BDFL-delegate, or would you prefer me to wait?

[1] And if you are, get off the computer and go enjoy your honeymoon!!!

--
Steven
On Tue, Dec 3, 2019 at 4:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
I have some notes in progress for a third revision of the PEP, but I don't think there's anything critical in them, and certainly nothing that need delay asking for a BDFL-Delegate.
Brandt, if you are reading this[1], do you have any objection to me applying to the Steering Council for a BDFL-delegate, or would you prefer me to wait?
Please go ahead and send an email to steering-council@python.org. I jumped the gun a bit and discussed this proposal with the SC when we had some extra time in our meeting. The general sentiment is neutral to positive, provided the operators chosen are | and |=. The sentiment for + and += was distinctly negative. There were also some mumblings about the text of the PEP, but nothing that couldn't be fixed by some editing. I think Nick is going to send you a PR or at least a detailed critique. There was also a question about OrderedDict. I haven't thought much about it; I suppose someone should think about what's the right thing to do here. Maybe ditto for some other common subclasses of dict, like defaultdict. Good luck with the PEP! -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
If I can spend my two cents, I think the fact that most of you prefer | is because that is already how sets work. And IMHO it's a bit illogical, since sets also support -. So I do not understand why | was chosen instead of +.

Furthermore, sets support the < operator, which gives you the false hope that sets can be sorted. But they can't. So I don't think sets are a good example. On the contrary, it feels so **natural** to see dict1 + dict2.

Furthermore, the problem is: what is better for generic functions? Maybe I need a generic function that, in its code, also does a sum of its input objects. Maybe I want to support dicts too. This way, writing such a function is much harder.
On Dec 25, 2019, at 14:57, python-ideas--- via Python-ideas <python-ideas@python.org> wrote:

First, as a side note, you seem to have configured your python-ideas-posting address with the name “python-ideas” rather than with a name that can be used to distinguish you from other people. This will make conversations confusing.
If I can spend my two cents, I think the fact the most of you prefer | is because is already how sets works. And IMHO it's a bit illogical, since sets also support -. So I do not understand why | was chosen instead of +.
First, what does it matter that sets support -? You could just as well argue that + for list and str is illogical because int supports - and they don’t. Or even that + is illogical even for ints because arrays support @ and they don’t. Just because one analogy holds between two types doesn’t mean every analogy you can imagine between those types does. Also, the PEP explicitly says that it’s not ruling out adding all of the set operators, just deferring it to a separate PEP to be written (and accepted or rejected) in the future. Given that, if set - (and & and ^) makes anything illogical, it’s +, not |. (Although really, I think “illogical” is a strange claim to make for any option here. It’s logical to spell the union of two dicts the same way you spell the union of two sets; it’s also logical to spell the concatenation of two dicts the same way you spell the concatenation of two lists. The question is which one is a more useful analogy, or which one is less potentially confusing, not which one you can come up with a convoluted way of declaring illogical if you really try.)
Furthermore, sets supports < operator, that gives you the false hope that sets can be sorted. But it's not.
Sure they can:

>>> a = [{1,2}, {1}]
>>> sorted(a)
[{1}, {1, 2}]

Of course you have to be careful because it’s only a partial order, and sorting sets that aren’t comparable is usually meaningless. But there’s nothing about < that demands it be a total order -- otherwise, you couldn’t even use it with float.
So I don't think sets are *not* a good example. On the contrary, I feel so **natural** to see dict1 + dict2.
Furthermore, the problem is: what is better for generic functions? Maybe I need a generic function that, in its code, do also a sum of input objects. Maybe I want to support also dict. In this way writing such function is much more hard.
What kind of code needs to “sum” generic things that might be dicts and might be lists, when they mean such different things? And why doesn’t this code also need to sum sets? Or other types like tries or something? What’s special and common to numbers, timediffs, sequences, and dicts, but not sets, tries, and datetimes?
Andrew Barnert wrote:
> On Dec 25, 2019, at 14:57, python-ideas--- via Python-ideas python-ideas@python.org wrote:
> > If I can spend my two cents, I think the fact the most of you prefer | is because is already how sets works. And IMHO it's a bit illogical, since sets also support -. So I do not understand why | was chosen instead of +.
> First, what does it matter that sets support -? You could just as well argue that + for list and str is illogical because int supports - and they don’t.

Subtracting two lists or two strings makes no sense, so the comparison is unfair. On the contrary, on sets you can apply union *and* difference. And since union seems to be the exact contrary of difference, it's illogical that | is used instead of +.

That said, the set API is consolidated at this point. My only hope is that Python does not make the same errors with `dict` or any other type.

> (Although really, I think “illogical” is a strange claim to make for any option here. It’s logical to spell the union of two dicts the same way you spell the union of two sets

See above...

> Of course you have to be careful because it’s only a partial order, and sorting sets that aren’t comparable is usually meaningless

Indeed, what a coder really needs is an isstrictsubset() method, not <. set1 < set2 makes sense, but sorted(sets) does not. So it would have been better to have set1.isstrictsubset(set2) and **no** <. But, as I said, that ship has sailed for sets.

> What kind of code needs to “sum” generic things that might be dicts and might be lists, when they mean such different things?

Right now I can't think of any practical example. Anyway, generally speaking, Python is full of functions that can be applied to completely different objects, just because the API is identical. If it quacks...

> And why doesn’t this code also need to sum sets?

Who said it does not need to? It will simply be more convoluted. So I hope, again, this does not happen to `dict` too.

> What’s special and common to numbers, timediffs, sequences, and dicts, but not sets, tries, and datetimes?

Well, because summing 2 datetimes makes no sense? About tries, I don't remember tries in the stdlib. If so, it's OT.

> Also, the PEP explicitly says that it’s not ruling out adding all of the set operators, just deferring it to a separate PEP to be written (and accepted or rejected) in the future.

.......I'm not asking to support other operators, and I don't really know why you think so.

> you seem to have configured your python-ideas-posting address with the name “python-ideas” rather than with a name that can be used to distinguish you from other people. This will make conversations confusing.

............it's python-ideas@***marco.sulla****.e4ward.com ............................
On Thu, Dec 26, 2019 at 1:12 PM python-ideas--- via Python-ideas <python-ideas@python.org> wrote:
> Andrew Barnert wrote:
> > On Dec 25, 2019, at 14:57, python-ideas--- via Python-ideas python-ideas@python.org wrote:
> > > If I can spend my two cents, I think the fact the most of you prefer | is because is already how sets works. And IMHO it's a bit illogical, since sets also support -. So I do not understand why | was chosen instead of +.
> > First, what does it matter that sets support -? You could just as well argue that + for list and str is illogical because int supports - and they don’t.
>
> Subtracting two lists or two strings has no sense, so the comparison is unfair.

Except that it DOES make sense in some contexts. People are far too quick to say that something "makes no sense", implying that there is no sensible way to interpret it. I've seen plenty of people complain that you can't add two strings together (ie that concatenation is fundamentally different from addition), that you can't multiply a list by an integer, that you can't multiply a string by an integer, that you can't divide a string by a string, etc, etc, etc. Okay, so Python only supports the last one for the specific case of paths, but that's actually only *one* logically defensible interpretation of division (another being "split on this substring", and I'm sure there are others). I'm not asking for Python to support all of those operations, but I do ask people to be a little more respectful to the notion that these operations are meaningful.

> On the contrary, on sets you can apply union *and* difference. And since union seems the exact contrary of difference, it's illogical that | is used instead of +.
>
> That said, the set API at this point is consolidated. My only hope is Python does not make the same errors with `dict` or any other type.
>
> > (Although really, I think “illogical” is a strange claim to make for any option here. It’s logical to spell the union of two dicts the same way you spell the union of two sets
>
> See above...

And see above :)

> > Of course you have to be careful because it’s only a partial order, and sorting sets that aren’t comparable is usually meaningless
>
> Indeed, what a coder really need is a isstrictsubset() method, not <. Since set1 < set2 has sense, but sorted(sets) have not. So it was better to have set1.isstrictsubset(set2) and **no** <. But, as I said, the ship was sailed for sets.

Mathematically, the subset relationship is a perfectly reasonable ordering, so it makes perfect sense that the "<" operator be used for this meaning.

> > you seem to have configured your python-ideas-posting address with the name “python-ideas” rather than with a name that can be used to distinguish you from other people. This will make conversations confusing.
>
> ............it's python-ideas@***marco.sulla****.e4ward.com ............................

My guess is that you don't have a real name set, so your client has autogenerated a "real name" from the part of your email address before the at sign (the "mailbox" portion). If you set an actual name, it'll come through more readably to everyone else.

ChrisA
David Mertz wrote:
Set union, however, has a great deal in common with bitwise-or
On the contrary, in mathematics there's the concept of the direct sum of sets, and the categorical sum, aka the disjoint union. They are not union operations, but they are similar. They were not named "direct or" and "categorical or" :D It's more logical for a mathematician to think of the union as a sort of particular sum, rather than as an "or". Furthermore, bitwise-or is an operation born for sequences. Indeed, bytes are sequences of bits, each in a precise position. Sets are not sequences, so bitwise-or for sets is completely illogical.
If we had a multi-set type (collections.Counter is close, but not exact)
It's not "not exact", it's completely unrelated. Counter is a subclass of `dict` and has a beautiful update method. And, if God wants, also a + instead of a pipe...

Chris Angelico wrote:
Subtracting two lists or two strings has no sense, so the comparison is unfair. Except that it DOES make sense in some contexts.
Source, please.
People are far too quick to say that something "makes no sense", implying that there is no sensible way to interpret it.
"There's preferably only one obvious way to do it", no?
I've seen plenty of people complain that you can't add two strings together (ie that concatenation is fundamentally different from addition), that you can't multiply a list by an integer, that you can't multiply a string by an integer, that you can't divide a string by a string, etc, etc, etc.
Well, they are wrong :D They want funny and lazy operators? Suggest numpy to them. numpy turns Python into the "language" Matlab. And I've seen numpy coders use ~ on plain integers too... Really funny results. Usually they are scientists, and programming is boring for them.
that's actually only one logically defensible interpretation of division (another being "split on this substring", and I'm sure there are others)
No. There is the split function. It's more likely that you forgot to convert the strings, which you probably got from a file, into numbers.
I do ask people to be a little more respectful to the notion that these operations are meaningful.
.....respectful? I presented my opinion. You are free to agree or not. I haven't sworn or spat on the floor :D
Mathematically, the subset relationship is a perfectly reasonable ordering, so it makes perfect sense that the "<" operator be used for this meaning.
Mathematically, the operator is ⊂. "<" operator is used for comparison, and it's vital for sorting. And sorting sets makes no sense. You have not ⊂ on your keyboard? Well.... sorry, you have to use a function :D I'm not against operator overloading, on the contrary. But "<" was clearly chosen only because is graphically similar to ⊂, without thinking about the consequences.
My guess is that you don't have a real name set, so your client has autogenerated a "real name" from the part of your email address before the at sign (the "mailbox" portion). If you set an actual name, it'll come through more readably to everyone else.
More simply, I'm posting from the mail.python.org site, and I had not set a name and surname in my profile.
On Thu, Dec 26, 2019 at 11:15 PM Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
Chris Angelico wrote:
Subtracting two lists or two strings has no sense, so the comparison is unfair. Except that it DOES make sense in some contexts.
Source, please.
```
Pike v8.1 release 13 running Hilfe v3.5 (Incremental Pike Frontend)
"abc, def, ghi" - ", ";
(1) Result: "abcdefghi"
({1, 4, 2, 8, 5, 7}) - ({2, 4});
(2) Result: ({ /* 4 elements */ 1, 8, 5, 7 })
```
In the case of arrays/lists, subtraction is defined as "remove each of these elements, as many times as it comes up" (kinda like set difference, but retaining order). In the case of strings, it's "remove this substring any time it occurs". I'm not saying that Python needs these features, but they DO make sense in the contexts that make use of them. They are not nonsensical operations.
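In Python terms, those semantics might look roughly like the following sketch (list_sub and str_sub are made-up names here, and Pike's real operators may differ in detail):

```
def list_sub(a, b):
    # Remove every occurrence of each element of b, keeping the order of
    # the survivors -- roughly "set difference, but retaining order".
    drop = set(b)
    return [item for item in a if item not in drop]

def str_sub(s, sub):
    # Remove the substring wherever it occurs.
    return s.replace(sub, "")

print(str_sub("abc, def, ghi", ", "))        # 'abcdefghi'
print(list_sub([1, 4, 2, 8, 5, 7], [2, 4]))  # [1, 8, 5, 7]
```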
People are far too quick to say that something "makes no sense", implying that there is no sensible way to interpret it.
"There's preferably only one obvious way to do it", no?
That's an argument for not incorporating them into Python. It's not stating that they "make no sense".
I've seen plenty of people complain that you can't add two strings together (ie that concatenation is fundamentally different from addition), that you can't multiply a list by an integer, that you can't multiply a string by an integer, that you can't divide a string by a string, etc, etc, etc.
Well, they are wrong :D They want funny and lazy operators? Suggest numpy to them. numpy turns Python into the "language" Matlab. And I've seen numpy coders use ~ on plain integers too... Really funny results. Usually they are scientists, and programming is boring for them.
This is the Blub Paradox in action. You understand adding strings together and therefore it makes sense; but subtracting strings makes no sense, because it doesn't fit in your mind. In reality, both operations are equally plausible.
that's actually only one logically defensible interpretation of division (another being "split on this substring", and I'm sure there are others)
No. There is the split function. It's more likely that you forgot to convert the strings, which you probably got from a file, into numbers.
Again, that's a justification for not including it in Python, but it does NOT mean that the operation is nonsensical.
I do ask people to be a little more respectful to the notion that these operations are meaningful.
.....respectful? I presented my opinion. You are free to agree or not. I haven't sworn or spat on the floor :D
When your opinion is that an operation "has no sense", that's not respectful.
Mathematically, the subset relationship is a perfectly reasonable ordering, so it makes perfect sense that the "<" operator be used for this meaning.
Mathematically, the operator is ⊂. "<" operator is used for comparison, and it's vital for sorting. And sorting sets makes no sense.
Once again, you assert this. Do you have proof that it absolutely makes NO SENSE in any context, or just that you don't see value in it? If you're just presenting your opinion, then present it as an opinion, not as a scoffing dismissal. ChrisA
Chris Angelico wrote:
Mathematically, the operator is ⊂. "<" operator is used for comparison, and it's vital for sorting. And sorting sets makes no sense. Once again, you assert this. Do you have proof that it absolutely makes NO SENSE in any context, or just that you don't see value in it? If you're just presenting your opinion, then present it as an opinion, not as a scoffing dismissal.
Angelico, I can change my mind if you can give me at least one practical example where `sorted(sets)` is really useful.
I'm not saying that Python needs these features
Thanks to God :D
When your opinion is that an operation "has no sense", that's not respectful.
For the operation? Seriously, if you think that the operation has sense, it **has no sense** :D that you feel offended. Simply try to prove the contrary to me. I can change my mind; I'm not a member of Daesh.
On Dec 26, 2019, at 04:15, Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
Mathematically,
Whenever someone tries to argue that “Mathematically, this doesn’t make sense” it ends up isomorphic to an argument that they really would have enjoyed one more semester of math classes as an undergrad but for whatever reason didn’t take it.
the operator is ⊂. "<" operator is used for comparison, and it's vital for sorting.
Yes. It’s the defining operation for the partial order in a poset (partially ordered set). And when studying posets generically, you always spell the operation <. And sorting is a concept on posets, not on integers or any other specific poset. (Sometimes you spell it with the bent curvy < instead of the normal one—but in that case, it’s the bent curvy one that’s vital for sorting.) And containment—the poset formed by a set of sets and the subset relation—is usually the very next example they teach you after the less than operation on integers. In fact, this isn’t just _a_ partial order, but the canonical one: everything else is defined by composing containment orders, taking sub-posets of them, or finding isomorphisms to them (or doing the same with posets recursively defined that way). This works because every partial order is isomorphic to a containment order; that’s one of the first proofs you do, and it’s dead simple. (It’s similar to using permutations in group theory.)
And sorting sets makes no sense.
Well, then all of order theory is wrong. Worse, what do you think does make sense to sort? Natural numbers? Not if any of the usual constructions of the naturals actually works. Briefly: Define zero as {}, succ(n) as n U {n}, and N as the infinite set generated by zero and succ(n) for any other member. Define n<m as subset, and you’ve got the poset of naturals (N,<). It’s trivial to prove that this behaves exactly the way you expect it to. More generally, everything is a set, so any operation that doesn’t make sense on sets just doesn’t make sense on anything. (Set theory isn’t the only possible foundation for math, but it’s the most common one, because most mathematicians don’t care or ever need to think about foundations; as long as there’s one thing that works that they vaguely remember from undergraduate study, that’s good enough for their work, and for everything else up to getting really high with a philosopher buddy.)
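That construction can be checked directly from Python with frozensets (a rough sketch; the names are only for illustration, and plain sets won't work here because they aren't hashable):

```
def succ(n):
    # succ(n) = n U {n}, the von Neumann successor
    return n | frozenset({n})

zero = frozenset()
one = succ(zero)
two = succ(one)
three = succ(two)

# "n < m as subset" reproduces the expected ordering of the naturals.
assert zero < one < two < three
assert not (two < two)
assert len(three) == 3  # each natural n contains exactly n elements
```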
You have not ⊂ on your keyboard? Well.... sorry, you have to use a function :D
Why? I don’t have the raised x character on my keyboard either, and I’m quite happy to spell multiplication with * instead of with a named function or method. And likewise / for division, & for and, and so on.
I'm not against operator overloading, on the contrary. But "<" was clearly chosen only because is graphically similar to ⊂, without thinking about the consequences.
You’ve got it backward. Historically, the subset symbol is a C squashed to look graphically similar to a < (or actually a reversed version of a reversed C squashed to look graphically similar to >), and Russell, who chose that out of the many different popular 19th century spellings, certainly was thinking about the consequences. It’s true that in Python—unlike in set theory—we don’t usually deal with sets of sets (in fact, we can’t, unless we use frozenset), but if you want to abandon the “mathematically” and talk about whether sorting makes sense for Python sets (after all, practicality beats purity), the fact that sorted works on sets and on iterables of sets means that yes, it does. Python’s set.__lt__ makes a perfectly good partial order.
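As a quick illustration of that last point (nothing here is specific to the PEP; the values are arbitrary):

```
a, b, c = {1}, {1, 2}, {1, 2, 3}

assert a < b < c          # a proper-subset chain, totally ordered
assert not ({1} < {2})    # incomparable sets: both strict comparisons are False
assert not ({2} < {1})

# For a totally ordered family, sorted() gives the expected result.
print(sorted([c, a, b]))  # [{1}, {1, 2}, {1, 2, 3}]
```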
Andrew Barnert wrote:
the operator is ⊂. "<" operator is used for comparison, and it's vital for sorting. Yes. It’s the defining operation for the partial order in a poset (partially ordered set). And when studying posets generically, you always spell the operation <.
Nope. Usually, you define the operation <=. And posets require that

set1 <= set1 == True

and this is true. Unluckily, sort operations in Python require and use **only** `__lt__()`. And

set1 < set1 == False

So, **in Python**, sorting a list of sets **has no mathematical sense**, because **you can sort them in any way**, partial or total. They are like NaN. And if you try to sort some sets in the REPL, as I did, you realize it very fast.
"<" was clearly chosen only because is graphically similar to ⊂, without thinking about the consequences. You’ve got it backward. Historically [...]
Hey, I'm Roman. Historically, I should be a citizen of the capital of the whole of Europe, part of the Middle East and North Africa. Furthermore, I'm a descendant of Lucius Cornelius Sulla, the first dictator of Rome. So I should be the Emperor of the EU, at minimum :D

And now for something completely different. CDD, come dovevasi dimostrare (that is, QED): sorted(sets) has no sense. I do not want to spit on the work of the people that created a **wonderful** language. But, IMHO, the set API has some problems. So, I repeat, I **kindly** hope that set will not be taken as an example for future APIs. I **strongly** hope that the good, old plus operator will be chosen because, as I wrote, **mathematically** it has more sense, for sets, dicts and whatever object needs to be merged into another one. Peace and love.
On Dec 26, 2019, at 11:32, Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
Andrew Barnert wrote:
the operator is ⊂. "<" operator is used for comparison, and it's vital for sorting. Yes. It’s the defining operation for the partial order in a poset (partially ordered set). And when studying posets generically, you always spell the operation <.
Nope.
Usually, you define the operation <=.
I didn’t want to get into that, because I assumed you weren’t going to argue that <= makes sense for sets but < doesn’t (especially given that a<b iff a<=b and not a==b for sets, just like for all of the numeric types), or that Python’s sort isn’t actually a sort and therefore < isn’t meaningful for any type and therefore not for sets, because both are silly. But it seems like you want to argue both of them, so… I’m not sure what to say to that, except that neither of those helps make your case that set shouldn’t spell subset the same way that int spells less than. Let’s go through it:
And posets require that
set1 <= set1 == True
and this is true. Unluckily, sort operations in Python require and use **only** `__lt__()`. And
set1 < set1 == False
Sure, just like with int.
So, **in Python**, sorting a list of sets **has no mathematical sense**,
So sorting a list of int has no mathematical sense? Surely you can’t mean that. But if you replace the word “set” with “int” in your argument, it’s exactly as valid.
because **you can sort them in any way**, partial or total.
I’m not sure what this means. You can’t choose whether to sort them partially or totally; you either have a totally ordered collection of sets or you don’t. If it’s totally ordered, sorted gives you a meaningful sort (and one that preserves input order if there are duplicates), just as with totally ordered values of any other type. And if you have a collection of sets that isn’t totally ordered, it behaves in a well defined but usually not particularly useful way, just as with any other type—as you clearly realize:
They are like NaN.
Exactly. So are you arguing that int < is fine but float < is mathematically nonsense? If not, then why is the fact that set < is like float < an argument that set comparison shouldn’t be spelled <? Also, I’m not sure why you think this would be different if Python sorted were defined on <= instead of <. This would only affect types for which a<=b iff a<b or a==b is false, and that doesn’t include set any more than it includes int and float.
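The parallel is easy to see at the REPL (a small illustration):

```
nan = float("nan")

# An incomparable pair of sets behaves under < just like a float compared
# against NaN: every strict comparison comes back False.
print(nan < 1.0, 1.0 < nan)    # False False
print({1} < {2}, {2} < {1})    # False False
```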
"<" was clearly chosen only because is graphically similar to ⊂, without thinking about the consequences. You’ve got it backward. Historically [...]
Hey, I'm Roman. Historically, I should be a citizen of the capital of the whole of Europe, part of the Middle East and North Africa.
Ok. And if someone were trying to claim that as a Roman you’re not European, because the EU just invented a definition for convenience without thinking about the fact that it makes no sense, this history would be a good counter argument. It doesn’t just counter the claim that the definition is a spurious and thoughtless recent invention, it also makes it blatantly obvious that the definition makes sense. Just like history is a good counter argument to your claim that Python invented < for sets for convenience without thinking about the fact that it makes no sense. It doesn’t just counter the claim that Python’s definition is a spurious and thoughtless recent invention, it also makes it blatantly obvious that the definition makes sense.
Andrew Barnert wrote:
I didn’t want to get into that, because I assumed you weren’t going to argue that <= makes sense for sets but < doesn’t
So you're talking about **strict** partial ordering. I could spend thousands of words, but I think Python can speak for me:

```
(venv) marco@buzz:~/sources/tests/more_itertools$ python3.9
Python 3.9.0a0 (heads/master-dirty:d8ca2354ed, Oct 30 2019, 20:25:01)
[GCC 9.2.1 20190909] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> a = [set(), {1}, {2}, {3}, {1, 4}, {1, 4}, {3, 7}, {3, 7, 10}, {10}]
>>> random.shuffle(a)
>>> sorted(a)
[set(), {3}, {10, 3, 7}, {2}, {1, 4}, {10}, {3, 7}, {1}, {1, 4}]
>>> random.shuffle(a)
>>> sorted(a)
[set(), {3, 7}, {1}, {10}, {3}, {10, 3, 7}, {1, 4}, {1, 4}, {2}]
>>> random.shuffle(a)
>>> sorted(a)
[set(), {3, 7}, {1}, {10}, {3}, {1, 4}, {1, 4}, {10, 3, 7}, {2}]
>>> random.shuffle(a)
>>> sorted(a)
[set(), {1}, {10}, {3}, {2}, {3, 7}, {10, 3, 7}, {1, 4}, {1, 4}]
>>> random.shuffle(a)
>>> sorted(a)
[set(), {2}, {3, 7}, {1}, {1, 4}, {10}, {3}, {10, 3, 7}, {1, 4}]
```
Notice the positions of the two {1, 4} sets......
SOooooo.... sorted(sets) does **not** sort at all. Total, partial, so and so, nothing.
Can we please stop this OT now and return to the thread?
Really, the point about partial order was EXACTLY the thread. If you want to say that floating point numbers are not ordered for exactly the same reason, and in exactly the same way, as sets... well, I guess you can die on that hill. Since NaN is an IEEE-854 value, everything you mention is precisely identical of floats. Is your argument that we need to stop using the '<' operator for floats also?!
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
David Mertz wrote:
NaN is an IEEE-854 value, everything you mention is precisely identical of floats. Is your argument that we need to stop using the '<' operator for floats also?!
Nope, but I simply ignored in this case IEEE 754-2019 (that supersedes IEEE 854) and I raised an exception if any number tried to do any comparison with a (s)NaN. And the same for non-comparable sets.

This is because NaN, IMHO, it's not the right name for the entity. A number can't be Not a Number. It's an Undefined Number. Think about 0/0. Suppose there exists an x such that 0/0 = x. If so, 0*x = 0. But this holds for **every** x. So x is a number, but it's undefined. It can be 1, 100, pi/4, 10.5!, -e. So you can't say whether it's greater or smaller than or equal to any other number, NaN included of course! So you **can't** say 5 < NaN is false. It could be true or false; it's like the spin of an electron. No, it's worse: since you'll never know if NaN is greater or lesser than 5, you can't measure it in any way.

But IEEE 754-2019 is written mainly for C and Assembly. And you can't return nothing in a comparison between two floats. So they take the most practical solution: return false. But Python and high-level languages have another option: raise an exception. And this is IMHO the sanest solution, because you **can't** compare NaNs. Think about a list with a NaN inside. Ok, now the sorting algorithm simply doesn't move the NaN. But what if we change the algorithm and also move NaNs? Where do we put them? At the beginning of the list? At the end? Even if the list is composed only of NaNs, every position is **wrong**, because you can't know if the NaN is greater than, equal to or lesser than any other number. So an exception should be raised. And the same for sets.

This obviously does not apply to total ordering. Even if I think it mathematically has no sense, IEEE 754 clearly defined a total ordering, also for NaNs. So a total_ordering() key function, passed to sort functions, should order as IEEE 754 declares, without raising exceptions. There's a bug opened somewhere about this key function.

This is what I've done. Unluckily, the API now works this way. It's not a big problem anyway. You have only to pass as key to the sort function:

```
import math

def sortNans(x):
    try:
        if math.isnan(x):
            return float("+inf")
    except TypeError:
        pass
    return x
```

and do

sorted(iterable_with_nans, key=sortNans)

This key does not distinguish between NaN and -NaN... but who cares! Since NaN is undefined, it could also be negative, so its sign is meaningless (except for total ordering). And the result is wrong... but I repeat, who cares? :D
On Dec 26, 2019, at 14:46, Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
David Mertz wrote:
NaN is an IEEE-854 value, everything you mention is precisely identical of floats. Is your argument that we need to stop using the '<' operator for floats also?!
Nope, but I simply ignored in this case IEEE 754-2019 (that supersedes IEEE 854)
Not that this is relevant, but if you’re going to be a pedant, the floats in whatever Python you’re using right now are probably 854/754-1985 doubles, not 754-2019 binary64s.
This is because NaN, IMHO, it's not the right name for the entity. A number can't be Not a Number. It's an Undefined Number.
OK, so you really are arguing against IEEE float here. As David said, you can die on that hill if you want to. I think everyone agrees that it’s not an ideal design (except maybe fictional mathematicians as imagined by electrical engineers), but everyone has agreed to use it anyway. So if your argument is that IEEE float is wrong, and set is wrong for the same reason, and therefore dict shouldn’t be like set, that’s fine, but don’t expect many people to be sold.
But IEEE 754-2019 is written mainly for C and Assembly. And you can't return nothing in a comparison between two floats. So they take the most practical solution: return false.
They defined a whole complicated system of signaling vs. quiet nan values and signaling (on sNaN) vs. non-signaling operations and multiple forms of exception handling from traps to checkable flags to substitution of default values. You can use much of that in C (and all of it in assembly) if you want to, even if people usually don’t. Also, who says you can’t return nothing in a comparison between two floats? You can write a named function that has any API you want (say, it returns -1, 0, 1, or NaN, or even an enum of less, greater, equal, left nan, right nan, both nan, other incomparable) instead of a Boolean function that has to return true or false. But the fact that C (and countless other languages, including Python) also provides the operators and people use them extensively implies that many people don’t think the operators are useless, even if you do.
This obviously does not apply to total ordering. Even if I think it mathematically has no sense, IEEE 754 clearly defined a total ordering, also for NaNs. So a total_ordering() key function, passed to sort functions, should order as IEEE 754 declares, without raising exceptions. There's a bug opened somewhere about this key function.
This is what I've done. Unluckily, the API now works this way. It's not a big problem anyway. You have only to pass as key to the sort function:
```
def sortNans(x):
    try:
        if math.isnan(x):
            return float("+inf")
    except TypeError:
        pass
    return x
```
But this isn’t the same thing that IEEE totalOrder defines. The biggest difference is that it treats nan the same as inf, instead of larger—since the original argument revolved around putting nans at the end so that, e.g., programmers can lop the nans off, this seems pretty significant. It also doesn’t distinguish -0 and 0, different representations of other equal doubles, negative NaNs, sNaNs, and NaNs with different payloads—all of which are increasingly unlikely to be relevant, especially compared to nan vs. inf, but if you’re going to complain that Python doesn’t do IEEE totalOrder you shouldn’t offer a fix that still doesn’t.
and do
sorted(iterable_with_nans, key=sortNans)
This key does not distinguish between NaN and -NaN... but who cares!
Presumably whoever is asking for IEEE totalOrder instead of IEEE < cares, or why would they be asking for it? Of course if you want something different from IEEE comparison and also different from IEEE totalOrder for some use case, that’s fine—you just showed how trivial it is to do whatever you want, with the current API. I don’t see how that demonstrates that the API is “unlucky”; it seems like the exact opposite. If you want to move NaNs to the end, or treat them the same as inf, or raise, or anything else you can think of, it’s easy. I can’t see how any one-size-fits-all API could make more than one of those things easy.
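For instance, "NaNs after everything else, including inf" falls out of a two-element key; a sketch, with the same caveats about -0, payloads and NaN signs noted above:

```
import math

def nans_last(x):
    # Non-NaNs sort among themselves as usual; NaNs all go after them.
    return (math.isnan(x), x)

data = [3.0, float("nan"), float("inf"), -1.0, float("nan")]
print(sorted(data, key=nans_last))  # [-1.0, 3.0, inf, nan, nan]
```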
On Dec 26, 2019, at 10:53, Andrew Barnert <abarnert@yahoo.com> wrote:
You’ve got it backward. Historically, the subset symbol is a C squashed to look graphically similar to a < (or actually a reversed version of a reversed C squashed to look graphically similar to >), and Russell, who chose that out of the many different popular 19th century spellings, certainly was thinking about the consequences.
Actually, according to a few well-cited internet sites (like http://jeff560.tripod.com/set.html) I’m wrong about this. It doesn’t change the fact that the symbol was invented to intentionally look like < well over a century before Python, but for the sake of accuracy: Gergonne was using C (from the French or Latin word for containment) for superset as early as 1817. But most people didn’t follow him, and in fact < and > was the most popular spelling for most of the 19th century. But later, along with a bunch of other alternatives, Schröder replaced the > with a rotated U (from untergeordnet) squashed to look more like a > (and likewise for übergeordnet and <), while Peano revived Gergonne’s C and flipped it and squashed it to look like a > (and later added the re-flipped version). Meanwhile, post-Klein group theorists were coming up with different “weakened” versions of the < symbol to distinguish “subset that may or may not form a group” from “subgroup” (which is still spelled < today). Russell and Whitehead presumably noticed the happy accident that the three happen to be essentially the same, but since they borrowed more symbols from Schröder it’s probably the sideways U that’s the direct ancestor of our modern symbol.
Well, Barnert, maybe you didn't understand my irony, so I'll speak more seriously. This is extremely interesting, but **completely** OT. We are discussing the operator that could potentially merge two `dict`s in the future. I think this is OT for the mailing list too... but I think you could at least split this discussion into a separate thread. Can you, please? Thanks in advance.
On 2019-12-26 02:09, python-ideas--- via Python-ideas wrote:
Andrew Barnert wrote: [snip]
you seem to have configured your python-ideas-posting address with the name “python-ideas” rather than with a name that can be used to distinguish you from other people. This will make conversations confusing.
............it's python-ideas@***marco.sulla****.e4ward.com ............................
The headers say:

From: python-ideas--- via Python-ideas <python-ideas@python.org>
Reply-To: python-ideas@marco.sulla.e4ward.com

It's the From address that's the issue.
On 12/25/19 10:06 PM, MRAB wrote:
On 2019-12-26 02:09, python-ideas--- via Python-ideas wrote:
Andrew Barnert wrote: [snip]
you seem to have configured your python-ideas-posting address with the name “python-ideas” rather than with a name that can be used to distinguish you from other people. This will make conversations confusing.
............it's python-ideas@***marco.sulla****.e4ward.com ............................
The headers say:
From: python-ideas--- via Python-ideas <python-ideas@python.org>
Reply-To: python-ideas@marco.sulla.e4ward.com
It's the From address that's the issue.
It may be the list doing DMARC protective munging, which moves the list address to the From: header and the From: address to the Reply-To: header. His domain is only at level quarantine, which doesn't cause as much of an issue but still causes some problems, so maybe the list is set to apply DMARC mitigation to domains with a DMARC policy of quarantine. The fact that his domain has a DMARC policy of quarantine says it really shouldn't be used on a mailing list, but that aspect is commonly ignored by the service providers.

--
Richard Damon
On Wed, Dec 25, 2019, 9:11 PM python-ideas--- via Python-ideas < python-ideas@python.org> wrote:
On the contrary, on sets you can apply union *and* difference. And since union seems the exact contrary of difference, it's illogical that | is used instead of +.
Set union is self-evidently NOT the inverse operation of set difference. It's not merely not an obvious analogy, it's simply wrong. If we had a multi-set type (collections.Counter is close, but not exact), then there would be something close to an inverse of set difference. But that's not the type Python provides (nor should it as a built-in). Set union, however, has a great deal in common with bitwise-or, so using the same symbol is intuitive.
On Wed, Dec 25, 2019, at 21:09, python-ideas--- via Python-ideas wrote:
On the contrary, on sets you can apply union *and* difference. And since union seems the exact contrary of difference, it's illogical that | is used instead of +.
But sets also support symmetric difference ^, and intersection &. All the bitwise operators mean the same thing that they do for an integer imagined as a set of bit values. The use of - for difference is the odd one out, and it's only this way because for bit notation it's spelled &~ and there's no ~ operator to make an "anti-set".
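The correspondence is easy to check (a small illustration; the particular bit patterns are arbitrary):

```
# An int viewed as a set of bit positions: the operators line up exactly.
a_bits, b_bits = 0b0110, 0b0011   # bits {1, 2} and bits {0, 1}
a_set,  b_set  = {1, 2}, {0, 1}

assert a_bits | b_bits == 0b0111 and a_set | b_set == {0, 1, 2}   # union
assert a_bits & b_bits == 0b0010 and a_set & b_set == {1}         # intersection
assert a_bits ^ b_bits == 0b0101 and a_set ^ b_set == {0, 2}      # symmetric difference
assert a_bits & ~b_bits == 0b0100 and a_set - b_set == {2}        # difference: &~ for bits, - for sets
```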
participants (18)
- Abdur-Rahmaan Janhangeer
- Andrew Barnert
- Antoine Rozo
- Brandt Bucher
- Brett Cannon
- Chris Angelico
- David Mertz
- Guido van Rossum
- Inada Naoki
- Kyle Stanley
- Marco Sulla
- MRAB
- python-ideas@marco.sulla.e4ward.com
- Random832
- Richard Damon
- Ryan
- Serhiy Storchaka
- Steven D'Aprano