[Numpy-discussion] Type of 1st argument in Numexpr where()

Wed Dec 20 16:29:57 EST 2006

Ivan Vilata i Balaguer wrote:
> Tim Hochberg (on 2006-12-20 at 09:20:01 -0700) wrote:
>
>   
>> Actually, this is on purpose. Numpy.where (and most other switching 
>> constructs in Python) will switch on almost anything. In particular, any 
>> number that is nonzero is considered True, zero is considered False. By 
>> changing the signature, you're restricting where to only accepting 
>> booleans. Since booleans and ints can be freely cast to doubles in 
>> numexpr, always using float for the condition saves us a couple of opcodes.
>> [...]
>>     
>
> Yes, I understand the reasons you give here.  Now that you've brought
> the topic up, I'm curious about what "always using float for the
> condition saves us a couple of opcodes" means.  Could you explain this?
> Just out of curiosity. :)
>   
Let's look at something simpler than where, which is a confusing 
function. How about *sin*? Also, let's pretend complex numbers don't 
exist to make things simpler still. There is only a single *sin* 
function defined in the numexpr interpreter, and it operates on floats. 
This works because the numexpr compiler is smart enough to insert cast 
opcodes that convert boolean or integer values to floats before 
operating on them with the *sin* opcode, which strictly works on floats 
(remember we are pretending complex numbers don't exist).
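
To make the cast insertion concrete, here is a toy sketch in plain Python. 
It is not numexpr's actual compiler or interpreter: the opcode names and 
the compile_sin and run helpers are made up for illustration, and the 
"arrays" are just Python lists.

```python
import math

# Hypothetical kinds and opcode names; not numexpr's real internals.
KINDS = ("bool", "int", "float")

def compile_sin(arg_kind):
    """Return a toy opcode list for sin(x), given the kind of x."""
    if arg_kind not in KINDS:
        raise TypeError("unsupported kind: %r" % (arg_kind,))
    program = []
    if arg_kind != "float":
        # The only sin opcode takes floats, so a cast is emitted first.
        program.append("cast_float_from_" + arg_kind)
    program.append("sin_float")
    return program

def run(program, values):
    """Interpret the toy opcodes over a list of Python values."""
    for op in program:
        if op.startswith("cast_float_from_"):
            values = [float(v) for v in values]
        elif op == "sin_float":
            values = [math.sin(v) for v in values]
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return values
```

In this model compile_sin("int") yields a cast followed by the single 
sin opcode, while compile_sin("float") yields the sin opcode alone.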

The situation with the first argument to where is analogous. Booleans 
and ints are automagically promoted to floats. Since the opcode is 
designed to work on floats everything works great. And, we only need a 
single opcode to treat bools, ints and floats. That is where "saving a 
couple of opcodes" comes in. However:

   1. Booleans are probably more common than floats as the argument to
      where. At present floats are the most efficient case; other cases
      incur some extra overhead due to casting.
   2. It doesn't work for complex values.

Problem #2 is easily fixable, should we so desire, simply by adding 
another opcode. Problem #1 is not so easy.
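
Both problems can be seen in a small plain-Python model of the current 
scheme. This is only a sketch under my own assumptions: where_float and 
where_any are hypothetical names, lists stand in for arrays, and the 
"passes" count is a crude stand-in for the casting overhead.

```python
def where_float(cond, a, b):
    """Toy single 'where' opcode: cond holds floats, nonzero picks from a."""
    return [x if c != 0.0 else y for c, x, y in zip(cond, a, b)]

def where_any(cond, a, b):
    """Toy front end: promote the condition to float, then use the one opcode.

    Returns the result and the number of passes made over the data.
    """
    passes = 1
    if not all(isinstance(c, float) for c in cond):
        # Extra pass that the float case never pays for.
        cond = [float(c) for c in cond]
        passes += 1
    return where_float(cond, a, b), passes
```

A float condition takes one pass here, while a bool or int condition 
takes two. A complex condition fails in the cast, since float() raises 
TypeError for complex values.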

It would be possible to adapt your original idea. We could do the following:

   1. Add a function boolean() to the numexpr namespace. This would cast
      its argument to an array of bools.
   2. Tweak the compiler (actually, probably where_func in
      expressions.py) to compile where(x,a,b) as where(boolean(x),a,b)
   3. Change where to take bools as the first argument.
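
As a rough illustration of step 2, the rewrite could look something like 
the following. This is a sketch only: expression nodes are modelled as 
plain tuples, and the body of where_func is hypothetical and does not 
reflect the real function in expressions.py.

```python
def boolean(node):
    """Toy node for the proposed boolean() cast (step 1)."""
    return ("boolean", node)

def where_func(cond, a, b):
    """Build a toy 'where' node whose condition is always a boolean node."""
    already_bool = isinstance(cond, tuple) and cond[0] == "boolean"
    if not already_bool:
        # Wrap the condition so the where opcode only ever sees bools.
        cond = boolean(cond)
    return ("where", cond, a, b)
```

With this, where_func("x", "a", "b") builds the same node as writing 
where(boolean(x), a, b) by hand, and an explicit boolean() is not 
wrapped twice.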

Or, maybe it would be cleaner to instead change the casting rules so 
that casting to bool happens automagically. Having cycles in the casting 
rules frightens me a bit, but it could probably be made to work.

So, in summary, I think that the general idea you proposed could be made 
to work with some more effort. Conceptually, it's cleaner and it could 
be made more efficient for the common case. On the downside, this would 
require three new opcodes, as opposed to a single new opcode to do the 
simple-minded fix. So, I'm still a bit up in the air as to whether it's 
a good idea.

-tim