Cartesian Product on `__mul__`

I think it looks very fine when you type {1, 2, 3} * {"a", "b", "c"} and get set(itertools.product({1, 2, 3}, {"a", "b", "c"})). So I am proposing that set multiplication be implemented as the Cartesian product.
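For reference, a minimal sketch of what works today versus what the proposal asks for. The variable names `numbers`, `letters` and `pairs` are illustrative only. Built-in sets do not define `*`, so the first expression raises TypeError; the second is the existing spelling that the proposal would abbreviate.

```python
from itertools import product

numbers = {1, 2, 3}
letters = {"a", "b", "c"}

# Built-in sets do not implement __mul__, so this raises TypeError today.
try:
    numbers * letters
except TypeError as exc:
    print("not supported:", exc)

# The current spelling: materialise the Cartesian product as a set of 2-tuples.
pairs = set(product(numbers, letters))

print(len(pairs))         # one pair per (number, letter) combination
print((1, "a") in pairs)
```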

On Jul 25, 2019, at 09:46, Batuhan Taskaya <isidentical@gmail.com> wrote:
I think it looks very fine when you type {1, 2, 3} * {"a", "b", "c"} and get set(itertools.product({1, 2, 3}, {"a", "b", "c"})). So I am proposing that set multiplication be implemented as the Cartesian product.
I think it might make more sense to reopen the discussion of using @ for cartesian product for all containers, for a few reasons.
* The * operator means repeat on lists, tuples, and strings. There are already some operator differences between sets and other containers, but it’s generally a matter of the types supporting different operators (like sets having | and &, and not having +) rather than giving different meanings to the same operators. Plus, tuples are frequently used for small things that are conceptually sets or set-like; this is even enshrined in APIs like str.endswith and isinstance.
* (Ordered) Cartesian product for lists, tuples, iterators, etc. makes just as much sense even when they’re not being used as pseudo-sets, ordered sets, or multisets, and might even be more common in code in practice (which is presumably why there’s a function for that in the stdlib). So, why only allow it for sets?
* Cartesian product feels more like matrix multiplication than like scalar or elementwise multiplication or repetition, doesn’t it?
* Adding @ to the builtin containers was already raised during the @ discussion and tabled for the future. I believe this is because the NumPy community wanted the proposal to be as minimal as possible so they could be sure of getting it, and get it ASAP, not because anyone had or anticipated objections beyond “is it common enough of a need to be worth it?”
I don’t think this discussion would need to include other ideas deferred at the same time, like @ for composition on functions. (Although @ on iterators might be worth bringing up, if only to reject it. We don’t allow + between arbitrary iterables defaulting to chain but meaning type(lhs)(chain(…)) on some types, so why do the equivalent with @?)
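A small sketch of the points in this message, under stated assumptions: the first assertions show existing behaviour (`*` as repetition, tuples used as set-like arguments), while `ProductList` is a hypothetical subclass invented here for illustration. Built-in lists do not define `@`.

```python
from itertools import product

# Existing behaviour: `*` on sequences means repetition.
assert [1, 2] * 2 == [1, 2, 1, 2]
assert "ab" * 2 == "abab"

# Existing behaviour: tuples used as small set-like arguments.
assert "photo.png".endswith((".png", ".jpg"))
assert isinstance(3, (int, float))


class ProductList(list):
    """Hypothetical list subclass; `@` is NOT defined on built-in lists."""

    def __matmul__(self, other):
        # Ordered Cartesian product: the left operand varies slowest.
        return ProductList(product(self, other))


ordered = ProductList([1, 2]) @ ["a", "b"]
print(ordered)
```

Because the operands are ordered, the result is ordered too, which is the sense in which the product applies to lists and tuples as well as sets.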

On 25/07/2019 19.41, Andrew Barnert via Python-ideas wrote:
Adding @ to sets for Cartesian products *might* be reasonable, but giving @ a meaning other than matrix multiplication for ordered collections (lists, tuples) sounds like a terrible idea that will only cause confusion.

Batuhan Taskaya wrote:
I'm not sure this would be used frequently enough to justify making it a built-in operation on sets. I can't think of any situation where I've wanted to materialise a Cartesian product as an actual set object. Usually I'm iterating over it and doing something else, and we already have more flexible ways of doing that -- list comprehensions, nested loops, etc.
Also, this would really only work sensibly for Cartesian products of two sets, not three or more. Writing s1 * s2 * s3 wouldn't give you a set of 3-tuples (a, b, c), but a set of 2-tuples ((a, b), c). Mathematicians usually gloss over this distinction, but in programming it becomes important.
-- Greg
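The nesting issue can be seen with a short sketch. The helper `pair_product` is a hypothetical stand-in for the proposed binary `*`; it is not part of any library. Chaining it yields 2-tuples whose first element is itself a tuple, whereas `itertools.product` with three arguments yields flat 3-tuples.

```python
from itertools import product


def pair_product(left, right):
    """Hypothetical stand-in for the proposed `left * right` on sets."""
    return {(a, b) for a in left for b in right}


s1, s2, s3 = {1, 2}, {"a", "b"}, {"x"}

# What s1 * s2 * s3 would evaluate to: (s1 * s2) * s3.
nested = pair_product(pair_product(s1, s2), s3)

# What itertools.product gives for three inputs.
flat = set(product(s1, s2, s3))

print(((1, "a"), "x") in nested)  # elements are 2-tuples ((a, b), c)
print((1, "a", "x") in flat)      # elements are 3-tuples (a, b, c)
print(nested == flat)             # the two sets are not equal
```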

On Jul 25, 2019, at 14:57, Chris Angelico <rosuav@gmail.com> wrote:
The usual set-theoretic definition of tuples is just recursively as ordered pairs: () is 0, (a) is a, (a, b) is <a, b>, (a, b, c) is <<a, b>, c>, etc. So, you don’t have to gloss over anything; s1 * s2 * s3 gives you elements ((a, b), c), but those are identical to elements (a, b, c). The reason it’s important in Python is that Python tuples aren’t mathematical tuples (except maybe for len==2). But then Python sets aren’t infinite, don’t contain other sets, etc., so…
Both could be solved if the product of a set and a set was a "Cartesian product" object that just retains an iterator on each.
But then, when you actually want the nested product, would you have to write something like set(a*b)*c? More realistically, you’d probably be taking the product of two things that you thought were independent objects, when it turns out that one is actually a product object wrapping two sets and you unexpectedly got a flattened product, and now have to debug why each game[1] is a player rather than a game number because each game is a (player, player, number) rather than a (dontcare, number)… Even more realistically, both desired possibilities would probably come up so rarely that nobody would have a good intuition for what to expect, so a magic collapsing product object is probably a waste of effort. Because really, when you need N-way products, you almost always need either iterators or numpy in the first place. But if the OP has some good examples, maybe we can extrapolate from those instead of guessing what might be useful if examples existed. :)
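A rough sketch of the "Cartesian product object" idea and the surprise described above. The class `LazyProduct` and the sample data are hypothetical; as a simplifying assumption it keeps references to the factor iterables rather than iterators, so it can be iterated more than once.

```python
from itertools import product


class LazyProduct:
    """Hypothetical lazy Cartesian product that flattens when chained."""

    def __init__(self, *factors):
        self.factors = factors

    def __mul__(self, other):
        # Multiplying by another product merges the factor lists (flattening).
        if isinstance(other, LazyProduct):
            return LazyProduct(*self.factors, *other.factors)
        return LazyProduct(*self.factors, other)

    def __iter__(self):
        return product(*self.factors)


players = {"ann", "bob"}
numbers = {1, 2}

matches = LazyProduct(players, players)  # yields (player, player)

# Flattened: each game is (player, player, number), so game[1] is a player.
games = list(matches * numbers)

# Nested on purpose: materialise first, in the spirit of set(a*b)*c.
nested_games = list(LazyProduct(set(matches), numbers))

print(games[0])
print(nested_games[0])
```

Whether `matches` is a plain set or a `LazyProduct` changes the shape of every element downstream, which is the debugging hazard raised in the message.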

Andrew Barnert via Python-ideas wrote:
But that makes the cartesian product operator non-associative, since s1 * (s2 * s3) would be a set of <a, <b, c>>. I still maintain that *in practice* mathematicians ignore the issue and consider that ((a, b), c), (a, (b, c)) and (a, b, c) are just different ways of writing essentially the same thing. A programming language *could* be designed with a tuple type for which that is literally true, but Python wasn't designed that way.
And I don't think it's a problem at all. You can easily write a comprehension that gives you whatever kind of cartesian product you want.
-- Greg
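To illustrate the comprehension point, a minimal sketch with made-up sets: each shape of product is one comprehension away, and the three results are distinct Python objects even though they correspond one-to-one.

```python
s1, s2, s3 = {1, 2}, {"a"}, {"x", "y"}

# (s1 * s2) * s3 under a binary product: elements ((a, b), c).
left_nested = {((a, b), c) for a in s1 for b in s2 for c in s3}

# s1 * (s2 * s3): elements (a, (b, c)).
right_nested = {(a, (b, c)) for a in s1 for b in s2 for c in s3}

# The flat 3-tuple form: elements (a, b, c).
flat = {(a, b, c) for a in s1 for b in s2 for c in s3}

# In Python the groupings differ, so a binary product is not associative.
print(left_nested == right_nested)

# The "same thing" view: flattening either nesting recovers the flat set.
print({(a, b, c) for ((a, b), c) in left_nested} == flat)
print({(a, b, c) for (a, (b, c)) in right_nested} == flat)
```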

IMO if you need the concrete Cartesian product instantiated you're probably doing something wrong, or you're addicted to a certain kind of programming competition with highly mathematical puzzles. itertools.product() is good enough for the occasional legitimate use case (I think I recall encountering one in the past decade or so).
Batuhan, if you still want to continue to debate this, please show some real use cases of programs where itertools.product() makes it hard for the human reader to understand the code. Examples like {1, 2, 3} * {"a", "b", "c"} do *not* count.
--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Nope, operators are useful, as you've said, and that is why I am proposing this. neopythonic.blogspot.com/2019/03/why-operators-are-useful.html
please show some real use cases of programs where itertools.product() makes it hard for the human reader to understand the code.
It's clearer and more understandable than itertools.product for people from outside of a programming background. The simplicity is useful when we are working with multiple expressions at the same time: (cset(a & b) * cset(a | b)). It also has some benefits for external libraries like ORMs. I spend most of my time in the REPL debugging my models, and when I query something it returns a set of values. If I want to get every pairing of values from two different query sets, I need to import itertools and call itertools.product every time. But I am giving up the idea because it is not getting enough attention from the community.
On Sat, Jul 27, 2019 at 12:06 AM Guido van Rossum <guido@python.org> wrote:
IMO if you need the concrete Cartesian product instantiated you're probably doing something wrong, or you're addicted to a certain kind of programming competition with highly mathematical puzzles. itertools.product() is good enough for the occasional legitimate use case (I think I recall encountering one in the past decade or so).
Batuhan, if you still want to continue to debate this, please show some real use cases of programs where itertools.product() makes it hard for the human reader to understand the code. Examples like {1, 2, 3} * {"a", "b", "c"} do *not* count.

On 07/27/2019 03:17 AM, Batuhan Taskaya wrote:
It's clearer and more understandable than itertools.product for people from outside of a programming background. The simplicity is useful when we are working with multiple expressions at the same time: (cset(a & b) * cset(a | b)).
So build that functionality into cset. I believe that's what NumPy does.
-- ~Ethan~
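One way to read this suggestion, as a sketch only: the name `cset` comes from the thread, but the implementation below is a guess and not code anyone posted. It is a set subclass that defines `*` as the Cartesian product and re-wraps `&` and `|`, since the inherited set operators return plain `set` objects.

```python
from itertools import product


class cset(set):
    """Hypothetical set subclass with `*` as Cartesian product."""

    def __mul__(self, other):
        # Accept any iterable on the right-hand side.
        return cset(product(self, other))

    def __and__(self, other):
        result = super().__and__(other)
        # Pass NotImplemented through so unsupported operands still fail normally.
        return result if result is NotImplemented else cset(result)

    def __or__(self, other):
        result = super().__or__(other)
        return result if result is NotImplemented else cset(result)


a = cset({1, 2, 3})
b = cset({2, 3, 4})

# The expression from the thread, without touching the built-in set type.
result = (a & b) * (a | b)

print(type(result).__name__)
print(len(result))
```

Keeping the operator on a user-defined type leaves the built-in `set` unchanged while still giving REPL users the short spelling.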

Participants (7): Andrew Barnert, Batuhan Taskaya, Chris Angelico, Ethan Furman, Greg Ewing, Guido van Rossum, Thomas Jollans