[Numpy-discussion] Deprecate boolean math operators?

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Dec 6 11:12:11 EST 2013


On Fri, Dec 6, 2013 at 9:32 AM,  <josef.pktd at gmail.com> wrote:
> On Fri, Dec 6, 2013 at 4:39 AM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
>> On Thu, 2013-12-05 at 23:02 -0500, josef.pktd at gmail.com wrote:
>>> On Thu, Dec 5, 2013 at 10:56 PM, Alexander Belopolsky <ndarray at mac.com> wrote:
>>> > On Thu, Dec 5, 2013 at 5:37 PM, Sebastian Berg <sebastian at sipsolutions.net>
>>> > wrote:
>>> >> there was a discussion that for numpy booleans math operators +,-,* (and
>>> >> the unary -), while defined, are not very helpful.
>>> >
>>> > It has been suggested at the Github that there is an area where it is useful
>>> > to have linear algebra operations like matrix multiplication to be defined
>>> > over a semiring:
>>> >
>>> > http://en.wikipedia.org/wiki/Logical_matrix
>>> >
>>> > This still does not justify having unary or binary -, so I suggest that we
>>> > first discuss deprecation of those.
>>>
>>> Does it make sense to only remove - and maybe / ?
>>>
>>> would python sum still work?   (I almost never use it.)
>>>
>>> >>> sum(mask)
>>> 2
>>> >>> sum(mask.tolist())
>>> 2
>>>
>>> is accumulate the same as sum and would keep working?
>>>
>>> >>> np.add.accumulate(mask)
>>> array([0, 0, 0, 1, 2])
>>>
>>>
>>> In operation with other dtypes, do they still dominate so these work?
>>>
>>
>> Hey,
>
>
> In statistics and econometrics (and economic theory) we just use an
> indicator function 1_{x=5} which has largely the same properties as a
> numpy bool array, at least in my code.
>
> some of the common operations are *, dot and kron.
>
> So far this has worked quite well as intuition, plus numpy casting rules.
>
> dot is the main surprise, because I thought that it would upcast. (I
> always think of dot as a np.linalg.)
>
>
>>
>> of course the other types will always dominate interpreting bools as 0
>> and 1. This would only affect operations with only booleans.
>
> My guess is that this would then leave 90% of our (statsmodels)
> possible usage alone.
>
> There is still the case that with * we can calculate the intersection.
>
>
>> There is a
>> good point that * is well defined however you define it, though. (Btw. /
>> is not defined for bools, `np.bool_(True)/np.bool_(True)` will upcast to
>> int8 to do the operation)
>>
>> However, while well defined, + is not defined like it is for python
>> bools (which are just ints) so that is the reason to consider
>> deprecation there (if we allow upcast to int8 -- or maybe the default
>> int -- in the future, in-place += and -= operations would not behave
>> differently, since they just cast back...).
>
> Actually, I used + once:
>
> The calculation in terms of indicator functions is
>
> 1_{A} + 1_{B} - 1_{A & B}
>
> The last part avoids double counting, which is not necessary if numpy
> casts back to bool.
> Nothing that couldn't be replaced by logical operators, but the
> (linear) algebra is not "logical".
>
> In this case I did care about memory because the arrays are (nobs,
> nobs) (nobs is the number of observations shape[0]) which can be
> large, and I also have a sparse version. In most other cases we
> already use astype(int) very often, because eventually we still have
> to cast and memory won't be a big problem.
>
> The mental model is set membership and set operations with indicator
> functions, not "logical", and I don't remember running into problems
> with this so far, and happily ignored logical_xxx when I do linear
> algebra instead of just working with masks of booleans.

http://en.wikipedia.org/wiki/Indicator_function
with the added advantage that we also have the version where +
is constrained to (0, 1).
However, `-` doesn't work properly, because a negative result casts
to True instead of False:
>>> np.bool_(-5)
True
The exception is the case `1 - mask`.
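
A minimal sketch of the indicator identity 1_{A} + 1_{B} - 1_{A & B} from the quoted text, assuming two made-up masks `a` and `b` (names and values are only for illustration, not from the thread). It compares the integer version, which needs the correction term, with bool `+`, which stays in (0, 1):

```python
import numpy as np

# Hypothetical indicator arrays for two sets A and B (illustrative values).
a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# Integer arithmetic: subtract the intersection to avoid double counting.
union_int = a.astype(int) + b.astype(int) - (a & b).astype(int)

# Bool arithmetic: + on two bool arrays stays bool, so it is already the union.
union_bool = a + b

# Any nonzero value casts to True, so a negative intermediate result
# does not map back to False.
minus_five = bool(np.bool_(-5))

print(union_int, union_bool, minus_five)
```

Both versions mark the same elements as members; only the dtype differs.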

We really have two kinds of addition:

bool sum: for indicating set membership
counting sum: for counting number of elements.
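
The two kinds of addition can be sketched with a made-up mask `m` (the name and values are assumptions for illustration). Passing `dtype=bool` keeps the reduction in booleans, while the default for a bool array counts in integers:

```python
import numpy as np

# Hypothetical mask (illustrative values only).
m = np.array([False, False, False, True, True])

# bool sum: the result stays bool, "is any element a member?"
bool_total = np.add.reduce(m, dtype=bool)
bool_running = np.add.accumulate(m, dtype=bool)

# counting sum: the result is an integer count of members
count_total = m.sum()
count_running = np.add.accumulate(m, dtype=int)

print(bool_total, count_total)
print(bool_running, count_running)
```

For the bool sum, `m.any()` expresses the same thing without relying on `+`.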

From my viewpoint:

I would keep + and * since they work well (bool + and count +).
Minus - is partially broken, and `/` looks useless.

This upcasts to int anyway:
>>> 1 - m1
array([1, 1, 0, 0, 0])

and I never thought of doing this
>>> True - m1
array([ True,  True, False, False, False], dtype=bool)

(Python's set defines minus but raises an error on plus.)
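
A small sketch of how the complement can be written without relying on `-` between booleans, assuming an `m1` chosen to match the `1 - m1` output shown above (the exact values are my reconstruction, not stated in the thread). The last part checks the set remark in plain Python:

```python
import numpy as np

# Assumed mask, chosen so that 1 - m1 gives [1, 1, 0, 0, 0] as in the output above.
m1 = np.array([False, False, True, True, True])

# Integer complement: cast explicitly, then subtract.
complement_int = 1 - m1.astype(int)

# Boolean complement: logical negation instead of `True - m1`.
complement_bool = ~m1

# Python sets: minus is set difference, plus is not defined.
difference = {1, 2, 3} - {3}
try:
    {1, 2} + {3}
except TypeError:
    plus_raises = True
else:
    plus_raises = False

print(complement_int, complement_bool, difference, plus_raises)
```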

Josef

>
> Nevertheless: If I'm forced to, then I will get used to logical_xxx. (*)
> And the above bool addition hasn't made it into statsmodels yet. I
> used a simpler version because I thought initially it's too cute. (And
> I was using an older numpy that couldn't do broadcasted dot.)
>
> (*) how do you search in the documentation of `&` or `|`, I cannot
> find what the other symbols are, if there are any.
>
>>
>> I suppose python sum works because it first tries using the C-Api number
>> protocol, which also means it is not affected. If you were to write a
>> sum which just uses the `+` operator, it would be affected, but that
>> would seem good to me.
>
> based on the ticket example, I'm not sure whether `+` should upcast or not.
>
> >>> mm.dtype
> dtype('bool')
> >>> mm.sum(0)
> array([48, 45, 56, 47])
>
> >>> mm.sum(0, bool)
> array([ True,  True,  True,  True], dtype=bool)
> I would just use any
>
> but what happens with logical cumsum
>
> >>> mm[:5].cumsum(0, bool)
> array([[False,  True,  True,  True],
>        [ True,  True,  True,  True],
>        [ True,  True,  True,  True],
>        [ True,  True,  True,  True],
>        [ True,  True,  True,  True]], dtype=bool)
>
> same as mm[:5].astype(int).cumsum(0, bool)  without casting
>
> Josef
>
>
>>
>> - Sebastian
>>
>>
>>> >>> x / mask
>>> array([0, 0, 0, 3, 4])
>>> >>> x * 1. / mask
>>> array([ nan,  inf,  inf,   3.,   4.])
>>> >>> x**mask
>>> array([1, 1, 1, 3, 4])
>>> >>> mask - 5
>>> array([-5, -5, -5, -4, -4])
>>>
>>> Josef
>>>
>>> >
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion at scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >
>>>
>>
>>


