[Numpy-discussion] Back to numexpr
Tim Hochberg
tim.hochberg at cox.net
Tue Jun 13 15:49:45 EDT 2006
Francesc Altet wrote:
>On Tuesday 13 June 2006 20:46, Tim Hochberg wrote:
>
>
>>>Uh, I'm afraid so. In PyTables, int64, while being a bit bizarre for
>>>some users (especially on 32-bit platforms), is a type with the same rights
>>>as the others and we would like to support it in numexpr. In
>>>fact, Ivan Vilata has already implemented this support in our local copy
>>>of numexpr, so perhaps (I say perhaps because we are in the middle of a
>>>big project now and are a bit short on time) we can provide
>>>a patch against David's latest version for your consideration.
>>>With this we can solve the problem of int64 support on 32-bit platforms
>>>(although admittedly the VM gets a bit more complicated, I really think
>>>that this is worth the effort).
>>>
>>>
>>In addition to complexity, I worry that we'll overflow the code cache at
>>some point and slow everything down. To be honest I have no idea at what
>>point that is likely to happen, but I know they worry about it with the
>>Python interpreter mainloop.
>>
>>
>
>That's true. I didn't think about this :-/
>
>
>
>>Also, it becomes much, much slower to
>>compile past a certain number of case statements under VC7, not sure
>>why. That's mostly my problem though.
>>
>>
>
>No, this is a general problem (I'd say much more so in GCC, because the optimizer
>runs so slooooow). However, this should only affect us poor developers, not
>users, and besides, we should find a solution for int64 on 32-bit platforms.
>
>
Yeah. This is just me whining. Under VC7, there is a very sudden change
when adding more cases where compile times go from seconds to minutes. I
think we're already past that now anyway, so slowing that down more
isn't going to hurt me. Overflowing the cache is the real thing I worry
about.
>>One idea that might be worth trying for int64 is to special case them
>>using functions. That is using OP_FUNC_LL and OP_FUNC_LLL and some
>>casting opcodes. This could support int64 with relatively few new
>>opcodes. There's obviously some extra overhead introduced here by the
>>function call. How much this matters is probably a function of how well
>>the compiler / hardware supports int64 to begin with.
>>
>>
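The function-based idea above can be sketched as a toy interpreter in Python. Everything here is hypothetical: it assumes OP_FUNC_LLL denotes a single opcode that calls a binary int64 function looked up in a table, so that each new int64 operation adds a table entry rather than a new case in the VM switch. The names FUNCS_LLL, wrap_int64 and run are invented for illustration and are not numexpr's actual code.

```python
import operator

# Hypothetical function table: one entry per int64 operation.
# Adding an operation grows this table, not the set of opcodes.
FUNCS_LLL = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def wrap_int64(value):
    """Wrap a Python int into the signed 64-bit range (two's complement)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def run(program, registers):
    """Execute (opcode, dest, func_name, a, b) instructions over a register dict."""
    for opcode, dest, func_name, a, b in program:
        if opcode != "OP_FUNC_LLL":
            raise ValueError("unknown opcode: %s" % opcode)
        func = FUNCS_LLL[func_name]  # the extra indirection is the overhead mentioned above
        registers[dest] = wrap_int64(func(registers[a], registers[b]))
    return registers


# (x + y) * z, expressed with the one generic opcode only.
program = [
    ("OP_FUNC_LLL", "t0", "add", "x", "y"),
    ("OP_FUNC_LLL", "out", "mul", "t0", "z"),
]
result = run(program, {"x": 2, "y": 3, "z": 4})
print(result["out"])  # 20
```

The trade-off is the one described in the text: a single dispatch case keeps the main loop small, at the price of a table lookup and a call per operation.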
>
>Mmm, in my experience int64 operations are reasonably well supported by modern
>32-bit processors (IIRC they normally take twice as long as int32 ops).
>
>The problem with using a long to represent ints in numexpr is that it is
>represented differently on 32-bit and 64-bit platforms, and that
>could be a headache in the long term (int64 support on 32-bit platforms is only
>one issue, but there will be more). IMHO, it is much better to assign the
>role of ints in numexpr to a single datatype, and this should be int64, for
>the sake of wide int64 support, but also for future (and present!) 64-bit
>processors. The problem would be that operations with 32-bit ints on 32-bit
>processors can be slowed down by a factor of 2 (or more, because there is now a
>cast), but in exchange, we have fully portable code and int64 support.
>
>
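The platform dependence of a C long can be checked from Python with the standard struct module. This is only a small sketch: "l" in native mode reports the size of the platform's C long, while "=q" uses the standard size of a 64-bit integer. No output is shown because the first number depends on the machine and compiler.

```python
import struct

# Size of the platform's native C long: varies between platforms and ABIs.
long_bytes = struct.calcsize("l")

# Standard-size 64-bit integer: 8 bytes everywhere.
int64_bytes = struct.calcsize("=q")

print("C long: %d bytes, int64: %d bytes" % (long_bytes, int64_bytes))
print("long is 64-bit here:", long_bytes == int64_bytes)
```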
This certainly makes things simpler. I think that this would be fine
with me since I mostly use float and complex, so the speed issue
wouldn't hit me much. But that's 'cause I'm selfish that way ;-)
>In case we consider going this way, we have two options here: keep the VM
>simple and advertise that int32 arithmetic in numexpr on 32-bit platforms
>will be sub-optimal, or, as we have already done, add the proper machinery to
>support both integer types separately (at the expense of making the VM more
>complex). Or perhaps David can come up with a better solution (vmgen from
>gforth? no idea what this is, but the name sounds sexy;-)
>
>
Yeah!
>>That brings up another point. We probably don't want to have casting
>>opcodes from/to everything. Given that there are 8 types on the table
>>now, if we support every casting opcode we're going to have 56(?)
>>opcodes just for casting. I imagine what we'll have to do is write a
>>cast from int16 to float as OP_CAST_Ii; OP_CAST_FI; trading an extra
>>step in these cases for keeping the number of casting opcodes under
>>control. Once again, int64 is problematic since you lose precision
>>casting to int. I guess in this case you could get by with being able to
>>cast back and forth to float and int. No need to cast directly to
>>booleans, etc as two stage casting should suffice for this.
>>
>>
>
>Well, we already thought about this. Not only can't you safely cast an int64
>to an int32 without losing precision, but what is worse, you can't even
>cast it to any other commonly available datatype (casting to a float64 will
>also lose precision). And, although you can afford to lose precision when
>dealing with floating-point data in some scenarios (but certainly not in a
>general-purpose library like numexpr tries to be), losing 'precision' in
>ints is by no means acceptable. So, to my mind, the only solution
>is to completely avoid casting int64 to any other type.
>
>
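Both kinds of loss described above can be reproduced with plain Python integers and floats (Python's float is an IEEE 754 double, i.e. float64). The helper to_int32 is a hypothetical stand-in for a C-style truncating cast, written out by hand for illustration.

```python
def to_int32(value):
    """Keep the low 32 bits and reinterpret them as a signed 32-bit int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# int64 -> float64: a double has a 53-bit significand, so 2**53 + 1
# is not representable and the round trip changes the value.
big = 2**53 + 1
round_trip = int(float(big))
print(big, round_trip, big == round_trip)
# 9007199254740993 9007199254740992 False

# int64 -> int32: anything outside the 32-bit range is mangled.
print(to_int32(2**31))  # -2147483648
```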
I forgot that the various OP_CAST_xy opcodes only do safe casting. That
makes the number of potential casts much less, so I guess this is not as
big a deal as I thought. I'm still not sure, for instance, if we need
boolean to int16, int32, int64, float32, float64, complex64 and
complex128. It wouldn't kill us, but it's probably overkill.
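The opcode counts can be checked with a few lines of Python. This sketch assumes the eight types are the ones listed above and that the list order is a rough widening order; the second figure is therefore only an upper bound on the safe casts, since some of those pairs (int64 to float64, for example) still lose precision.

```python
from itertools import combinations, permutations

# The eight types on the table, in an assumed widening order.
TYPES = ["bool", "int16", "int32", "int64",
         "float32", "float64", "complex64", "complex128"]

# A casting opcode for every ordered (source, destination) pair.
every_cast = list(permutations(TYPES, 2))

# Only casts that go "up" the list: an upper bound on the safe casts.
upward_casts = list(combinations(TYPES, 2))

print(len(every_cast))    # 56
print(len(upward_casts))  # 28
```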
-tim