[Numpy-discussion] Deprecate boolean math operators?

Fri Dec 6 09:32:16 EST 2013

On Fri, Dec 6, 2013 at 4:39 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Thu, 2013-12-05 at 23:02 -0500, josef.pktd at gmail.com wrote:
>> On Thu, Dec 5, 2013 at 10:56 PM, Alexander Belopolsky <ndarray at mac.com> wrote:
>> > On Thu, Dec 5, 2013 at 5:37 PM, Sebastian Berg <sebastian at sipsolutions.net>
>> > wrote:
>> >> there was a discussion that for numpy booleans math operators +,-,* (and
>> >> the unary -), while defined, are not very helpful.
>> >
>> > It has been suggested at the Github that there is an area where it is useful
>> > to have linear algebra operations like matrix multiplication to be defined
>> > over a semiring:
>> >
>> > http://en.wikipedia.org/wiki/Logical_matrix
>> >
>> > This still does not justify having unary or binary -, so I suggest that we
>> > first discuss deprecation of those.
>>
>> Does it make sense to only remove - and maybe / ?
>>
>> would python sum still work?   (I almost never use it.)
>>
>> >>> sum(mask)
>> 2
>> >>> sum(mask.tolist())
>> 2
>>
>> is accumulate the same as sum and would keep working?
>>
>> >>> np.add.accumulate(mask)
>> array([0, 0, 0, 1, 2])
>>
>>
>> In operation with other dtypes, do they still dominate so these work?
>>
>
> Hey,

In statistics and econometrics (and economic theory) we just use an
indicator function 1_{x=5} which has largely the same properties as a
numpy bool array, at least in my code.

Some of the common operations are *, dot and kron.

So far this has worked quite well as intuition, plus numpy casting rules.

dot is the main surprise, because I thought that it would upcast. (I
always think of dot as part of np.linalg.)
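A minimal sketch of that surprise, with two made-up indicator matrices
(the names `a` and `b` are only for illustration): dot on bool inputs
stays bool, so it answers "is there any match", while the cast version
counts the matches.

```python
import numpy as np

# Two small indicator (bool) matrices; the values are made up.
a = np.array([[True, False], [True, True]])
b = np.array([[True, True], [False, True]])

# dot on two bool arrays keeps the bool dtype (an "or of ands").
bool_dot = np.dot(a, b)

# Casting first gives the usual integer matrix product (counts).
int_dot = np.dot(a.astype(int), b.astype(int))

print(bool_dot.dtype, int_dot.dtype)
print(int_dot)
```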

>
> of course the other types will always dominate interpreting bools as 0
> and 1. This would only affect operations with only booleans.

My guess is that this would then leave 90% of our (statsmodels)
possible usage alone.
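A small sketch of the mixed-dtype case, reusing the shape of the `x` and
`mask` examples quoted in this thread (the exact values are assumed
here): as soon as a non-bool operand is involved, the bools are treated
as 0 and 1 and the other dtype determines the result.

```python
import numpy as np

mask = np.array([False, False, False, True, True])
x = np.arange(5)

# Mixed operations: the bool array behaves like an array of 0s and 1s.
times = x * mask        # integer result
minus = mask - 5        # integer result
power = x ** mask       # integer result
scaled = mask * 0.5     # float result

print(times, minus, power, scaled)
```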

There is still the case that with * we can calculate the intersection.

> There is a good point that * is well defined however you define it,
> though. (Btw. / is not defined for bools, `np.bool_(True)/np.bool_(True)`
> will upcast to int8 to do the operation)
>
> However, while well defined, + is not defined like it is for python
> bools (which are just ints) so that is the reason to consider
> deprecation there (if we allow upcast to int8 -- or maybe the default
> int -- in the future, in-place += and -= operations would not behave
> differently, since they just cast back...).

Actually, I used + once:

The calculation in terms of indicator functions is

1_{A} + 1_{B} - 1_{A & B}

The last part avoids double counting, which is not necessary if numpy
casts back to bool.
Nothing that couldn't be replaced by logical operators, but the
(linear) algebra is not "logical".
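A sketch of the two ways to get the union indicator, with made-up
arrays `a` and `b` standing in for 1_{A} and 1_{B}. It assumes the
behaviour discussed in this thread, namely that bool + bool stays bool.

```python
import numpy as np

a = np.array([True, True, False, False])   # stands in for 1_{A}
b = np.array([True, False, True, False])   # stands in for 1_{B}

# With bool arrays, + stays bool, so no double-counting correction is needed.
union_bool = a + b

# With integer indicators the correction term is required:
# 1_{A} + 1_{B} - 1_{A & B}
ai, bi = a.astype(int), b.astype(int)
union_int = ai + bi - ai * bi

print(union_bool)
print(union_int)
```

The product `a * b` plays the role of the intersection 1_{A & B} in
both versions.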

In this case I did care about memory because the arrays are (nobs,
nobs) (nobs is the number of observations, shape[0]), which can be
large, and I have a sparse version also. In most other cases we
already use astype(int) very often, because eventually we still have
to cast and memory won't be a big problem.

The mental model is set membership and set operations with indicator
functions, not "logical". I don't remember running into problems
with this so far, and I have happily ignored logical_xxx when I do
linear algebra instead of just working with masks of booleans.

Nevertheless: If I'm forced to, then I will get used to logical_xxx. (*)
And the above bool addition hasn't made it into statsmodels yet. I
used a simpler version because I initially thought it was too cute.
(And I was using an older numpy that couldn't do broadcasted dot.)

(*) How do you search the documentation for `&` or `|`? I cannot
find what the other symbols are, if there are any.

>
> I suppose python sum works because it first tries using the C-Api number
> protocol, which also means it is not affected. If you were to write a
> sum which just uses the `+` operator, it would be affected, but that
> would seem good to me.
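A minimal sketch of the difference, using a small made-up mask. The
built-in sum starts from the integer 0, so the running total is an
integer, while a reduction that only ever applies `+` to the bool
elements stays bool (assuming bool + bool keeps returning bool).

```python
import operator
from functools import reduce

import numpy as np

mask = np.array([False, False, False, True, True])

# Built-in sum: starts with the int 0, so it ends up counting True values.
builtin_total = sum(mask)

# A "sum" built only from +: every step is bool + bool, which stays bool.
plus_only = reduce(operator.add, mask)

print(builtin_total, plus_only)
```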

based on the ticket example, I'm not sure whether `+` should upcast or not.

>>> mm.dtype
dtype('bool')
>>> mm.sum(0)
array([48, 45, 56, 47])

>>> mm.sum(0, bool)
array([ True,  True,  True,  True], dtype=bool)
I would just use any

But what happens with a logical cumsum?

>>> mm[:5].cumsum(0, bool)
array([[False,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

same as mm[:5].astype(int).cumsum(0, bool)  without casting
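A sketch of what the bool-dtype reductions correspond to, on randomly
generated stand-in data (the real `mm` from the ticket is not shown
here, so its shape and contents are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
mm = rng.random((100, 4)) < 0.5     # hypothetical bool data, shape assumed

counts = mm.sum(0)                           # default: counts the True values
any_like = mm.sum(0, dtype=bool)             # bool accumulator
cum_bool = mm[:5].cumsum(0, dtype=bool)      # running bool accumulator

# The bool-dtype versions agree with the logical reductions.
print((any_like == mm.any(0)).all())
print((cum_bool == np.logical_or.accumulate(mm[:5], axis=0)).all())
```

So sum with a bool dtype is `any`, and the bool cumsum is a running
"any so far" down each column.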

Josef

>
> - Sebastian
>
>
>> >>> x / mask
>> array([0, 0, 0, 3, 4])
>> >>> x * 1. / mask
>> array([ nan,  inf,  inf,   3.,   4.])
>> >>> x**mask
>> array([1, 1, 1, 3, 4])
>> >>> mask - 5
>> array([-5, -5, -5, -4, -4])
>>
>> Josef
>>
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion