On Tue, Jun 27, 2017 at 5:35 PM, Nathaniel Smith <njs@pobox.com> wrote:

On Jun 26, 2017 6:56 PM, "Charles R Harris" <charlesr.harris@gmail.com> wrote:

On 27 Jun 2017, 9:25 AM +1000, Nathaniel Smith <njs@pobox.com>, wrote:
I guess my preference would be:
1) deprecate +
2) move binary - back to deprecated-but-not-an-error
3) fix np.diff to use logical_xor when the inputs are boolean, since
that seems to be what people expect
4) keep unary - as an error

And if we want to be less aggressive, then a reasonable alternative would be:
1) deprecate +
2) un-deprecate binary -
3) keep unary - as an error
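
As a rough illustration of item 3 in the first list (a sketch only, not the actual np.diff patch): for boolean input, a first difference built on logical_xor marks the positions where neighbouring entries differ. The name bool_diff is hypothetical and exists only for this example.

```python
import numpy as np

def bool_diff(a):
    """Hypothetical helper: first difference of a 1-D boolean array via xor."""
    a = np.asarray(a, dtype=bool)
    # True wherever two neighbouring entries differ.
    return np.logical_xor(a[1:], a[:-1])

flags = np.array([False, False, True, True, False])
changes = bool_diff(flags)

# Cross-check against ordinary integer differencing: a nonzero
# difference means the two neighbours were not equal.
reference = np.diff(flags.astype(int)) != 0

print(changes)
print(np.array_equal(changes, reference))
```

The integer cross-check is there only to show that xor agrees with "did the value change?", which seems to be what people expect from diff on booleans.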

Using '+' for 'or' and '*' for 'and' is pretty common, and the variation of '+' for 'xor' was common back in the day because 'and' and 'xor' make boolean algebra a ring, which appealed to mathematicians as opposed to everyone else ;)

'+' for 'xor' and '*' for 'and' is perfectly natural; that's just + and * in Z/2. It's not only a ring, it's a field! '+' for 'or' is much weirder; why would you use '+' for an operation that's not even invertible? I guess it's a semi-ring. But we have the '|' character right there, and there's no expectation that every weird mathematical notation will be matched in numpy. The most notable example is that '*' doesn't mean matrix multiplication.
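
To make the Z/2 point concrete, here is a small sketch (the variable names are just for illustration). It checks that addition and multiplication mod 2 on the 0/1 encodings coincide with '^' and '&', that xor can be undone while 'or' cannot, and that numpy's boolean '+' behaves as 'or'.

```python
import numpy as np

# All four combinations of two boolean inputs.
a = np.array([False, False, True, True])
b = np.array([False, True, False, True])

# Arithmetic in Z/2 on the 0/1 integer encodings.
z2_sum = (a.astype(int) + b.astype(int)) % 2
z2_prod = (a.astype(int) * b.astype(int)) % 2

# '+' in Z/2 is xor and '*' in Z/2 is and.
xor_matches = np.array_equal(z2_sum.astype(bool), a ^ b)
and_matches = np.array_equal(z2_prod.astype(bool), a & b)

# xor is invertible: applying b a second time recovers a.
xor_roundtrip = (a ^ b) ^ b

# 'or' is not invertible: where a is True the result is True
# regardless of b, so b cannot be recovered from a | b.
or_result = a | b

# numpy's '+' on boolean arrays acts as 'or', not as Z/2 addition.
plus_is_or = np.array_equal(a + b, a | b)

print(xor_matches, and_matches, plus_is_or)
```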

You can see the same progression in measure theory, where intersection and xor (symmetric difference) were eventually replaced with union and complement. Using '-' for xor is something I hadn't seen outside of numpy, but I suspect it must be standard somewhere. I would leave '*' and '+' alone, as the breakage and inconvenience from removing them would be significant.

'*' doesn't bother me, because it really does have only one sensible behavior; even built-in bool() effectively uses 'and' for '*'.

But, now I remember... The major issue here is that some people want dot(a, b) on Boolean matrices to use these semantics, right? Because in this particular case it leads to some useful connections to the matrix representation for logical relations [1]. So it's sort of similar to the diff() case. For the basic operation, using '|' or '^' is fine, but there are these derived operations like 'dot' and 'diff' where people have different expectations.
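
A minimal sketch of that connection, using a made-up relation on {0, 1, 2}: with boolean inputs, the matrix product combines entries with 'and' and accumulates with 'or', so R @ R is the composition of the relation with itself (pairs linked in exactly two steps).

```python
import numpy as np

# Hypothetical relation on {0, 1, 2}: 0 -> 1 and 1 -> 2.
R = np.array([[False, True,  False],
              [False, False, True],
              [False, False, False]])

# Boolean matrix product: entry (i, j) is
# OR over k of (R[i, k] AND R[k, j]).
two_step = R @ R

print(two_step.dtype)
print(two_step)
```

Only the (0, 2) entry comes out True, since 0 reaches 2 via 1 and no other two-step path exists.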

I guess Juan's example of 'sum' is relevant here too. It's pretty weird that if 'a' and 'b' are one-dimensional boolean arrays, 'a @ b' and 'sum(a * b)' give totally different results (a boolean 'any' versus an integer count).
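
For reference, a small sketch of the mismatch with made-up inputs, using np.sum for the sum:

```python
import numpy as np

a = np.array([True, True, False])
b = np.array([True, True, True])

# Boolean inner product: stays boolean, i.e. any(a & b).
dot_result = a @ b

# Elementwise 'and', then a sum that promotes booleans to
# integers and counts the True entries.
sum_result = np.sum(a * b)

print(dot_result, sum_result)
```

Here two positions are True in both arrays, so the sum counts 2 while the inner product only reports that at least one such position exists.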

So that's the fundamental problem: there are a ton of possible conventions that are each appealing in one narrow context, and they all contradict each other, so trying to shove them all into numpy simultaneously is messy.

I'm glad we at least seem to have succeeded in getting rid of unary '-', that one was particularly indefensible in the context of everything else :-). For the rest, I'm really not sure whether it's better to deprecate everything and tell people to use specialized tools for specialized purposes (e.g. add a 'logical_dot'), or to special case the high-level operations people want (make 'dot' and 'diff' continue to work, but deprecate + and -), or just leave the whole incoherent mish-mash alone.
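
To give a feel for the "specialized tools" option, here is one way a 'logical_dot' might look. This is purely a hypothetical sketch for 2-D inputs, not an existing numpy function; it spells the operation out with '&' and np.any so it does not depend on what '+' or 'dot' mean for booleans.

```python
import numpy as np

def logical_dot(a, b):
    """Hypothetical helper: boolean matrix product of two 2-D arrays."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    # out[i, j] = any over k of (a[i, k] and b[k, j])
    return np.any(a[:, :, None] & b[None, :, :], axis=1)

A = np.array([[True, False],
              [False, False]])
B = np.array([[False, True],
              [True, True]])
C = logical_dot(A, B)

# Cross-check: an integer matrix product is nonzero exactly where
# at least one (and, and) pair is True.
reference = (A.astype(int) @ B.astype(int)) > 0

print(C)
```

The broadcasting version builds an intermediate of shape (n, k, m), so it is meant to show the semantics rather than to be efficient.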

-n

[1] https://en.wikipedia.org/wiki/Logical_matrix

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion