[Numpy-discussion] Back to numexpr

Tue Jun 13 14:46:15 EDT 2006

Francesc Altet wrote:

>Ei, numexpr seems to be back, wow! :-D
>
>On Tuesday 13 June 2006 18:56, Tim Hochberg wrote:
>  
>
>>I've finally got around to looking at numexpr again. Specifically, I'm
>>looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing
>>the two versions. Let me go through his list of enhancements and comment
>>(my comments are dedented):
>>    
>>
>
>Well, as David already said, he committed most of my additions some days 
>ago :-)
>
>  
>
>>    - Enhanced performance for strided and unaligned data, especially for
>>    lightweight computations (e.g. 'a>10'). With this and the addition of
>>    the boolean type, we can get up to 2x better times than previous
>>    versions. Also, most of the supported computations go faster than
>>    with numpy or numarray, even the simplest one.
>>
>>Francesc, if you're out there, can you briefly describe what this
>>support consists of? It's been long enough since I was messing with this
>>that it's going to take me a while to untangle NumExpr_run, where I
>>expect it's lurking, so any hints would be appreciated.
>>    
>>
>
>This is easy. When dealing with strided or unaligned vectors, instead of 
>copying them completely to well-behaved arrays, they are copied only when the 
>virtual machine needs the appropriate blocks. With this, there is no need to 
>write the well-behaved array back into main memory, which can be a 
>significant bottleneck, especially when dealing with large arrays. This allows 
>better use of the processor caches because data is fetched and used only when 
>the VM needs it. Also, I see that David has added support for byteswapped 
>arrays, which is great! 
>  
>
I'm looking at this now. I imagine it will become clear eventually. I've 
clearly forgotten some stuff over the last few months. Sigh.
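
As a rough illustration of the block-wise copying Francesc describes, here is a minimal pure-Python sketch. It is not the code in NumExpr_run: the names strided_blocks and greater_than, the tiny BLOCK_SIZE, and the use of plain lists to stand in for memory are all assumptions made for the example. It also assumes a positive stride.

```python
# Hypothetical block size; a real VM would pick one that fits in cache.
BLOCK_SIZE = 4


def strided_blocks(buffer, offset, stride, length, block_size=BLOCK_SIZE):
    """Yield contiguous copies of a strided view, one block at a time.

    The view covers buffer[offset + i * stride] for i in range(length).
    Only the block currently needed is gathered into a temporary, so the
    whole strided array is never copied in one go.
    """
    for start in range(0, length, block_size):
        stop = min(start + block_size, length)
        first = offset + start * stride
        last = offset + stop * stride
        yield buffer[first:last:stride]


def greater_than(buffer, offset, stride, length, threshold):
    """Evaluate 'a > threshold' over a strided view, block by block."""
    out = []
    for block in strided_blocks(buffer, offset, stride, length):
        out.extend(x > threshold for x in block)
    return out


data = list(range(20))
# View of elements 1, 4, 7, 10, 13, 16.
result = greater_than(data, offset=1, stride=3, length=6, threshold=10)
```

Each temporary block is consumed right after it is gathered, which is the point of the scheme: the contiguous copy stays small and is never written back as a full array.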

First I need to get it to compile here. It seems that a few GCCisms have 
crept back in.

[SNIP]

>>rarely used.
>>    
>>
>
>Uh, I'm afraid so. In PyTables, int64, while being a bit bizarre for 
>some users (especially on 32-bit platforms), is a type with the same standing 
>as the others and we would like to support it in numexpr. In fact, 
>Ivan Vilata has already implemented this support in our local copy of numexpr, 
>so perhaps (I say perhaps because we are in the middle of a big project now 
>and are a bit short of time) we can provide the patch against the 
>latest version of David for your consideration. With this we can solve the 
>problem with int64 support on 32-bit platforms (although admittedly the VM 
>gets a bit more complicated, I really think that this is worth the effort)
>  
>
In addition to complexity, I worry that we'll overflow the code cache at 
some point and slow everything down. To be honest I have no idea at what 
point that is likely to happen, but I know they worry about it with the 
Python interpreter mainloop. Also, it becomes much, much slower to 
compile past a certain number of case statements under VC7, not sure 
why. That's mostly my problem though.

One idea that might be worth trying for int64 is to special case them 
using functions. That is, using OP_FUNC_LL and OP_FUNC_LLL and some 
casting opcodes. This could support int64 with relatively few new 
opcodes. There's obviously some extra overhead introduced here by the 
function call. How much this matters is probably a function of how well 
the compiler / hardware supports int64 to begin with.
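
To make the function-opcode idea concrete, here is a toy interpreter sketch in Python. Only the opcode names OP_FUNC_LL and OP_FUNC_LLL come from the paragraph above; the opcode values, the function tables, the instruction layout, and the wrap_int64 helper are hypothetical and only illustrate how two opcodes plus a function index could cover many int64 operations.

```python
import operator

# Opcode names are from the message; the numeric values are made up.
OP_FUNC_LL = 0   # long <- func(long)
OP_FUNC_LLL = 1  # long <- func(long, long)

# Hypothetical function tables, indexed by the func_id in each instruction.
FUNCS_LL = [operator.neg, abs]
FUNCS_LLL = [operator.add, operator.sub, operator.mul]


def wrap_int64(value):
    """Wrap a Python int into the signed 64-bit range."""
    return (value + 2**63) % 2**64 - 2**63


def run(program, registers):
    """Run instructions of the form (opcode, func_id, dest, *sources)."""
    for opcode, func_id, dest, *sources in program:
        if opcode == OP_FUNC_LL:
            func = FUNCS_LL[func_id]
            src = registers[sources[0]]
            registers[dest] = [wrap_int64(func(x)) for x in src]
        elif opcode == OP_FUNC_LLL:
            func = FUNCS_LLL[func_id]
            a, b = registers[sources[0]], registers[sources[1]]
            registers[dest] = [wrap_int64(func(x, y)) for x, y in zip(a, b)]
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return registers


# Compute -(r0 * r1) using only the two function opcodes.
regs = {0: [1, 2, 3], 1: [10, 20, 30], 2: None}
program = [
    (OP_FUNC_LLL, 2, 2, 0, 1),  # r2 = r0 * r1
    (OP_FUNC_LL, 0, 2, 2),      # r2 = -r2
]
run(program, regs)
```

Adding another int64 operation in this scheme means adding an entry to a function table rather than a new case in the main loop, at the cost of an indirect call per block.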

That brings up another point. We probably don't want to have casting 
opcodes from/to everything. Given that there are 8 types on the table 
now, if we support every casting opcode we're going to have 56(?) 
opcodes just for casting. I imagine what we'll have to do is write a 
cast from int16 to float as OP_CAST_Ii; OP_CAST_FI; trading an extra 
step in these cases for keeping the number of casting opcodes under 
control. Once again, int64 is problematic since you lose precision 
casting to int. I guess in this case you could get by with being able to 
cast back and forth to float and int. No need to cast directly to 
booleans, etc., as two-stage casting should suffice for this.
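
The arithmetic and the two-stage idea can be sketched as follows. With n types, a direct cast between every ordered pair needs n * (n - 1) opcodes, so 8 types give 56. The type codes and the DIRECT_CASTS table below are assumptions chosen for illustration (b = bool, i = int16, I = int, L = int64, F = float), and the opcode naming OP_CAST_<dest><src> is inferred from the OP_CAST_Ii; OP_CAST_FI example above.

```python
from collections import deque


def full_cast_count(n_types):
    """Number of opcodes if every ordered pair of distinct types gets one."""
    return n_types * (n_types - 1)


# Hypothetical set of direct casts, as (source, destination) pairs.
# int64 (L) only casts to and from int (I) and float (F).
DIRECT_CASTS = {
    ("b", "I"), ("I", "b"),
    ("i", "I"), ("I", "i"),
    ("I", "F"), ("F", "I"),
    ("L", "I"), ("I", "L"),
    ("L", "F"), ("F", "L"),
}


def cast_sequence(src, dest):
    """Return the shortest chain of cast opcodes from src to dest (BFS)."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        current, path = queue.popleft()
        if current == dest:
            return path
        for a, b in sorted(DIRECT_CASTS):
            if a == current and b not in seen:
                seen.add(b)
                queue.append((b, path + [f"OP_CAST_{b}{a}"]))
    raise ValueError(f"no cast path from {src} to {dest}")


all_pairs = full_cast_count(8)
int16_to_float = cast_sequence("i", "F")
int64_to_bool = cast_sequence("L", "b")
```

In this made-up table, 10 direct casts cover all 20 ordered pairs of the 5 types shown, with at most one intermediate step through int.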

-tim