On Mon, Jun 26, 2017 at 6:14 PM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
OMG deprecating + would be a nightmare. I can’t even begin to count the number of times I’ve used e.g. np.sum(arr == num)… Originally with a dtype cast but generally I’ve removed it because it worked.

… But I just saw the behaviour of `sum` is different from that of adding arrays together (where it indeed means `or`), which I agree is confusing. As long as the sum and mean behaviours are unchanged, I won’t raise too much of a fuss. =P

Generally, although one might expect xor, what *I* would expect is for the behaviour to match the Python bool type, which is not the case right now. So my vote would be to modify ***in NumPy 2.0*** the behaviour of + and - to match Python’s built-in bool (i.e. upcasting to int).
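
For concreteness, here is a small sketch of the three behaviours being compared: `np.sum` on a boolean array, `+` between two boolean arrays, and `+` on Python's built-in bool. The array contents and variable names are made up for illustration.

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 2])     # illustrative data, not from the thread
mask = arr == 2                     # boolean array

# np.sum accumulates booleans as integers, so it counts the True entries.
count = int(np.sum(mask))

# + between two boolean arrays stays boolean and acts as logical "or".
elementwise = mask + mask

# Python's built-in bool upcasts to int under +.
python_sum = True + True

print(count, elementwise.dtype, python_sum)
```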

And, in general, I’m in favour of something as foundational as NumPy, in version 1.x, to follow semantic versioning and not break APIs until 2.x.

Juan.

On 27 Jun 2017, 9:25 AM +1000, Nathaniel Smith <njs@pobox.com>, wrote:
On Sun, Jun 25, 2017 at 9:45 AM, Stefan van der Walt
<stefanv@berkeley.edu> wrote:
Hi Chuck

On Sun, Jun 25, 2017, at 09:32, Charles R Harris wrote:

The boolean binary '-' operator was deprecated back in NumPy 1.9 and changed
to an error in 1.13. This caused a number of failures in downstream
projects. The choices now are to continue the deprecation for another couple
of releases, or simply give up on the change. For booleans, `a - b` was
implemented as `a xor b`, which leads to the somewhat unexpected identity `a
- b == b - a`, but it is a handy operator that allows simplification of some
functions, `numpy.diff` among them. At this point I'm inclined to give up
on the deprecation and retain the old behavior. It is a bit impure but
perhaps we can consider it a feature rather than a bug.


What was the original motivation behind the deprecation? `xor` seems like
exactly what one would expect when subtracting boolean arrays.

But, in principle, I'm not against the deprecation (we've had to fix a few
problems that arose in skimage, but nothing big).

I believe that this happened as part of a review of the whole
arithmetic system for np.bool_. Traditionally, we have + is "or",
binary - is "xor", and unary - is "not".

Here are some identities you might expect, if 'a' and 'b' are np.bool_ objects:

a - b = a + (-b)
a + b - b = a
bool(a + b) = bool(a) + bool(b)
bool(a - b) = bool(a) - bool(b)
bool(-a) = -bool(a)

But in fact none of these identities hold. Furthermore, the np.bool_
arithmetic operations are all confusing synonyms for operations that
could be written more clearly using the proper boolean operators |, ^,
~, so they violate TOOWTDI. So I think the general idea was to
deprecate all of this nonsense.
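
A quick way to check the claim is to enumerate both truth values. The sketch below spells the traditional meanings out with the explicit operators (`|` for +, `^` for binary -, `~` for unary -) so that it does not depend on whether a given NumPy release still accepts `-` on booleans. The helper name `holds_everywhere` is hypothetical.

```python
import itertools
import numpy as np

def holds_everywhere(check):
    """Return True if check(a, b) is true for every pair of np.bool_ values."""
    values = [np.bool_(False), np.bool_(True)]
    return all(bool(check(a, b)) for a, b in itertools.product(values, repeat=2))

# Left-hand sides use the traditional np.bool_ meanings:
#   a + b -> a | b,   a - b -> a ^ b,   -a -> ~a
# Right-hand sides with bool(...) use Python's built-in bool arithmetic.
results = {
    "a - b == a + (-b)":
        holds_everywhere(lambda a, b: (a ^ b) == (a | ~b)),
    "a + b - b == a":
        holds_everywhere(lambda a, b: ((a | b) ^ b) == a),
    "bool(a + b) == bool(a) + bool(b)":
        holds_everywhere(lambda a, b: bool(a | b) == bool(a) + bool(b)),
    "bool(a - b) == bool(a) - bool(b)":
        holds_everywhere(lambda a, b: bool(a ^ b) == bool(a) - bool(b)),
    "bool(-a) == -bool(a)":
        holds_everywhere(lambda a, b: bool(~a) == -bool(a)),
}

for identity, ok in results.items():
    print(f"{identity}: {'holds' if ok else 'fails'}")
```

Each identity has at least one counterexample among the four (a, b) pairs, so every entry is reported as failing.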

It looks like what actually happened is that binary - and unary - got
deprecated a while back and are now raising errors in 1.13.0, but +
did not. This is sort of unfortunate, because binary - is the only one
of these that's somewhat defensible (it doesn't match the builtin bool
type, but it does at least correspond to subtraction in Z/2, so
identities like 'a - (b - b) = a' do hold).


That's because xor corresponds to addition in Z/2 and every element is its own additive inverse.
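
To make the Z/2 point concrete, here is a minimal sketch in plain Python that uses the integers 0 and 1 as stand-ins for False and True. It checks that xor agrees with both addition and subtraction mod 2, which is also why `a - b == b - a` under this reading.

```python
import itertools

pairs = list(itertools.product((0, 1), repeat=2))

# xor agrees with addition mod 2 ...
xor_is_addition = all((a + b) % 2 == (a ^ b) for a, b in pairs)

# ... and with subtraction mod 2, because each element is its own inverse.
xor_is_subtraction = all((a - b) % 2 == (a ^ b) for a, b in pairs)

# Consequently subtraction is symmetric in Z/2.
subtraction_is_symmetric = all((a - b) % 2 == (b - a) % 2 for a, b in pairs)

print(xor_is_addition, xor_is_subtraction, subtraction_is_symmetric)
```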
 
I guess my preference would be:
1) deprecate +
2) move binary - back to deprecated-but-not-an-error
3) fix np.diff to use logical_xor when the inputs are boolean, since
that seems to be what people expect
4) keep unary - as an error
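
As a rough sketch of what item 3 could look like (not the actual `np.diff` implementation; the function name `diff_bool_aware` is hypothetical and it only handles a first-order difference along the last axis):

```python
import numpy as np

def diff_bool_aware(a):
    """First-order difference along the last axis.

    Hypothetical helper: uses logical_xor for boolean input so that no
    boolean subtraction is needed, and ordinary subtraction otherwise.
    """
    a = np.asarray(a)
    later, earlier = a[..., 1:], a[..., :-1]
    if a.dtype == np.bool_:
        return np.logical_xor(later, earlier)
    return later - earlier

# Boolean input: True marks a change between neighbouring elements.
print(diff_bool_aware(np.array([True, True, False, True])))
# Numeric input: ordinary differences.
print(diff_bool_aware(np.array([1, 4, 9])))
```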

And if we want to be less aggressive, then a reasonable alternative would be:
1) deprecate +
2) un-deprecate binary -
3) keep unary - as an error


Using '+' for 'or' and '*' for 'and' is pretty common, and the variation of '+' for 'xor' was common back in the day because 'and' and 'xor' make boolean algebra a ring, which appealed to mathematicians as opposed to everyone else ;) You can see the same progression in measure theory, where eventually intersection and xor (symmetric difference) were replaced with union and complement. Using '-' for xor is something I hadn't seen outside of numpy, but I suspect it must be standard somewhere. I would leave '*' and '+' alone, as the breakage and inconvenience from removing them would be significant.
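
The ring remark can be checked by brute force. The sketch below uses plain Python bools, with `^` as the ring addition and `&` as the ring multiplication, and also shows why 'or' cannot play the role of addition: True has no additive inverse under it.

```python
import itertools

values = (False, True)
triples = list(itertools.product(values, repeat=3))

# 'and' distributes over 'xor', as multiplication must over addition.
distributive = all((a & (b ^ c)) == ((a & b) ^ (a & c)) for a, b, c in triples)

# Under 'xor' every element is its own additive inverse (False is the zero).
xor_has_inverses = all((a ^ a) is False for a in values)

# Under 'or' there is no x with True | x == False, so True has no inverse.
or_has_inverses = all(any((a | x) is False for x in values) for a in values)

print(distributive, xor_has_inverses, or_has_inverses)
```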

Chuck