[Numpy-discussion] Back to numexpr

Tue Jun 13 13:08:38 EDT 2006

On Tue, Jun 13, 2006 at 09:56:37AM -0700, Tim Hochberg wrote:
> 
> I've finally got around to looking at numexpr again. Specifically, I'm 
> looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing 
> the two versions. Let me go through his list of enhancements and comment 
> (my comments are dedented):
> 
>    - Addition of a boolean type. This allows better array copying times
>    for large arrays (lightweight computations are typically bounded by
>    memory bandwidth).
> 
> Adding this to numexpr looks like a no-brainer. The behaviour of booleans 
> is different from that of integers, so in addition to being more memory 
> efficient, this enables boolean &, |, ~, etc. to work properly.
> 
>    - Enhanced performance for strided and unaligned data, especially for
>    lightweight computations (e.g. 'a>10'). With this and the addition of
>    the boolean type, we can get up to 2x better times than previous
>    versions. Also, most of the supported computations go faster than
>    with numpy or numarray, even the simplest ones.
> 
> Francesc, if you're out there, can you briefly describe what this 
> support consists of? It's been long enough since I was messing with this 
> that it's going to take me a while to untangle NumExpr_run, where I 
> expect it's lurking, so any hints would be appreciated.
> 
>    - Addition of ~, & and | operators (a la numarray.where)
> 
> Sounds good.

All the above is checked in already :-)
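
To illustrate the point quoted above about booleans behaving differently from integers, here is a small sketch in plain numpy (not numexpr). The array names are made up for the example; the only claim is how numpy's own ~ and & act on the two dtypes.

```python
import numpy as np

a = np.array([5, 12, 20, 3])
mask = a > 10                     # boolean array, one byte per element
as_int = mask.astype(np.int64)    # the same flags stored as 0/1 integers

# On booleans, ~ is logical negation.
not_bool = ~mask
# On integers, ~ is the bitwise complement: ~0 == -1 and ~1 == -2,
# so the result is no longer a usable 0/1 flag.
not_int = ~as_int

# Boolean operators compose the way a where()-style condition expects.
combined = (a > 4) & ~(a > 15)

print(not_bool, not_int, combined)
```

The integer version also takes eight bytes per element here instead of one, which is the memory side of the same argument.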

>    - Support for both numpy and numarray (use the flag --force-numarray
>    in setup.py).
> 
> At first glance this looks like it doesn't make things too messy, so I'm 
> in favor of incorporating this.

... although I had ripped this all out. I'd rather have a numpy-compatible
numarray layer (at the C level, this means defining macros like PyArray_DATA)
than different code for each.

>    - Added a new benchmark for testing boolean expressions and
>    strided/unaligned arrays: boolean_timing.py
> 
> Benchmarks are always good.

Haven't checked that in yet.
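
For anyone who wants inputs of that kind in the meantime, the following is one way to build strided and unaligned arrays in plain numpy. It is only a sketch under my own assumptions (names and sizes are arbitrary) and is not the contents of boolean_timing.py.

```python
import numpy as np

n = 1000

# Strided: take every second element of a contiguous array.
base = np.arange(2 * n, dtype=np.float64)
strided = base[::2]

# Unaligned: reinterpret a byte buffer starting at offset 1 as float64,
# so the data pointer is not a multiple of the item size.
raw = np.zeros(n * 8 + 1, dtype=np.uint8)
unaligned = raw[1:].view(np.float64)
unaligned[:] = strided            # same values, different memory layout

# A lightweight boolean expression of the kind the benchmark targets.
mask_strided = strided > 10
mask_unaligned = unaligned > 10

print(strided.flags.c_contiguous, unaligned.flags.aligned)
```

Timing the same expression over a contiguous copy, the strided view, and the unaligned view would then show how much each layout costs.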

> 
>    Things that I want to address in the future:
> 
>    - Add tests on strided and unaligned data (currently only tested
>    manually)
> 
> Yep! Tests are good.
> 
>    - Add types for int16, int64 (in 32-bit platforms), float32,
>      complex64 (simple prec.)
> 
> I have some specific ideas about how this should be accomplished. 
> Basically, I don't think we want to support every type in the same way, 
> since this is going to make the case statement blow up to an enormous 
> size. This may slow things down and at a minimum it will make things 
> less comprehensible.

I've been thinking about how to generate the virtual machine programmatically;
specifically, I've been looking at vmgen from gforth again. I've also got other
half-formed ideas (a separate scalar machine for reductions?) that I'm
working on.

But yes, the # of types does make things harder to redo :-)

> My thinking is that we only add casts for the extra 
> types and do the computations at high precision. Thus adding two int16 
> numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF, and then 
> an OP_CAST_fF.  The details are left as an exercise to the reader ;-). 
> So, adding int16, float32, complex64 should only require the addition of 
> 6 casting opcodes plus appropriate modifications to the compiler.

My thinking too.
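
The cast-then-compute scheme can be sketched as a toy compiler in Python. Everything here is hypothetical: the opcode strings, the register names, and in particular the choice of which wide type each narrow type maps to are assumptions for illustration, not numexpr's actual opcodes.

```python
# Assumed mapping: each narrow type is computed in one wide type that the
# virtual machine already supports.
WIDE = {"int16": "int32", "float32": "float64", "complex64": "complex128"}


def cast_opcodes():
    """List the extra opcodes needed: one up-cast and one down-cast per type."""
    ops = []
    for narrow, wide in WIDE.items():
        ops.append(f"cast_{wide}_from_{narrow}")
        ops.append(f"cast_{narrow}_from_{wide}")
    return ops


def compile_add(dtype, dest="r0", a="a", b="b"):
    """Emit a toy program for dest = a + b with operands of the given dtype."""
    if dtype not in WIDE:
        # Natively supported type: a single add opcode is enough.
        return [(f"add_{dtype}", dest, a, b)]
    wide = WIDE[dtype]
    return [
        (f"cast_{wide}_from_{dtype}", "t0", a),      # widen first operand
        (f"cast_{wide}_from_{dtype}", "t1", b),      # widen second operand
        (f"add_{wide}", "t2", "t0", "t1"),           # compute at high precision
        (f"cast_{dtype}_from_{wide}", dest, "t2"),   # narrow the result
    ]


for instruction in compile_add("int16"):
    print(instruction)
```

Three narrow types times two directions gives the six casting opcodes mentioned above, and the arithmetic opcodes themselves stay untouched.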

> For large arrays, this should have most of the benefits of giving each 
> type its own opcode, since the memory bandwidth is still small, while 
> keeping the interpreter relatively simple.
> 
> Unfortunately, int64 doesn't fit under this scheme; is it used enough to 
> matter? I hate to pile a whole pile of new opcodes on for something that's 
> rarely used.

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca