[Numpy-discussion] On Numexpr and uint64 type

Mon Mar 10 14:50:55 EDT 2008

On Monday 10 March 2008, Charles R Harris wrote:
> On Mon, Mar 10, 2008 at 11:08 AM, Francesc Altet <faltet at carabos.com> wrote:
> > Hi,
> >
> > In order to allow in-kernel queries in PyTables (www.pytables.org)
> > to work with unsigned 64-bit integers, we would like to see uint64
> > support in Numexpr (http://code.google.com/p/numexpr/).
> >
> > To do this, we first have to decide how uint64 interacts with other
> > types.  For example, what should be the outcome of:
> >
> > numpy.array([1], 'int64') / numpy.array([2], 'uint64')
> >
> > Basically, there are a couple of possibilities:
> >
> > 1) To follow the behaviour of NumPy and upcast both operands to
> > float64 and do the operation.  That is:
> >
> > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64')
> > Out[21]: array([ 0.5])
> >
> > 2) Implement support for uint64 as a non-upcastable type, so that
> > one cannot mix uint64 operands with other types.  That is:
> >
> > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64')
> > Out[21]: TypeError: unsupported operand type(s) for /: 'int64' and 'uint64'
> >
> > Solution 1) is appealing because it is how NumPy works, but I don't
> > personally like the upcasting to float64.  First, because it
> > transparently converts numbers, potentially losing the least
> > significant digits.  Second, because an operation between integers
> > gives a float as a result, which differs from typical
> > programming languages.
>
> I don't like the up(down)casting either. I suspect the original
> justification was preserving precision, but it doesn't do that.
> Addition of signed and unsigned numbers is the same in modular
> arithmetic, so simply treating everything as uint64 would, IMHO, be
> the best option there and for multiplication. Not everything has a
> modular inverse, but truncation is the C solution in that case. The
> question seems to be whether to return a signed or unsigned integer.
> Hmm. I would go for unsigned, which could be converted to signed by
> casting. The sign of the remainder might be a problem, though, which
> would give unusual truncation behavior.
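To make the precision argument concrete, here is a minimal sketch of option 1's float64 upcast. It assumes that Python's `float` models NumPy's float64; on CPython both are IEEE 754 doubles on all common platforms. The helper `through_float64` is a hypothetical name used only for illustration. Because a double has a 53-bit significand, integers above 2**53 cannot all be represented exactly.

```python
# Model of option 1: convert an integer (e.g. a uint64 value) to a
# float64 and back. Python's float is assumed to be an IEEE 754 double.
def through_float64(x: int) -> int:
    return int(float(x))

# Integers up to 2**53 survive the round trip exactly...
assert through_float64(2**53) == 2**53

# ...but 2**53 + 1 is the first that does not: the 53-bit significand
# has no room for the lowest bit, so it rounds to 2**53.
print(through_float64(2**53 + 1))   # 9007199254740992

# The largest uint64 even rounds up to 2**64, outside uint64's range.
print(through_float64(2**64 - 1))   # 18446744073709551616
```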

Mmm, yes.  We had already considered converting all operands to uint64
first, with a uint64 result as well, but realized that we could run
into some difficulties with boolean comparisons in Numexpr.
For example, if a is an int64 and b is a uint64, and we want to
compute "a + b", we could have:

In [44]: a = numpy.array([-4], 'int64')

In [45]: b = numpy.array([2], 'uint64')

In [46]: c = a.astype('uint64') + b.astype('uint64')

In [47]: c
Out[47]: array([18446744073709551614], dtype=uint64)

In [48]: c.astype('int64')
Out[48]: array([-2], dtype=int64)   # in case we want signed integers
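The session above is ordinary two's-complement wraparound. The following minimal sketch models 64-bit values as Python ints masked to their low 64 bits; the helpers `as_uint64` and `as_int64` are hypothetical and are not part of NumPy or Numexpr. It reproduces the same numbers. It also illustrates Charles's caveat: addition and multiplication commute with the reinterpretation, but division and remainder do not.

```python
MASK64 = (1 << 64) - 1

def as_uint64(x: int) -> int:
    """Keep the low 64 bits, like astype('uint64') on an int64 value."""
    return x & MASK64

def as_int64(x: int) -> int:
    """Reinterpret the low 64 bits as a two's-complement int64."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

a, b = -4, 2

# Addition and multiplication wrap modulo 2**64; casting back to int64
# recovers exactly the signed result.
c = as_uint64(as_uint64(a) + as_uint64(b))
print(c)             # 18446744073709551614
print(as_int64(c))   # -2
assert as_int64(as_uint64(a) * as_uint64(b)) == a * b   # -8

# Division and remainder do not commute with the reinterpretation:
# C-style signed division of -7 by 2 gives quotient -3, remainder -1,
# but the uint64 computation gives something else entirely.
x, y = -7, 2
uq, ur = divmod(as_uint64(x), as_uint64(y))
print(as_int64(uq), as_int64(ur))   # 9223372036854775804 1
```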

The difficulty we observed is that the expression 'a + b < 0' (i.e.
checking the sign) could surprise the inexperienced user: it would
evaluate to false because the outcome of a + b is unsigned.
Having said that, this approach is completely consistent and, if
properly documented, could be a nice way to implement uint64 in
Numexpr.
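Here is a minimal sketch of the 'a + b < 0' pitfall. It again models 64-bit values as masked Python ints, and the helper names are hypothetical rather than Numexpr API. Under an "everything is uint64" rule the sum is unsigned, so comparing it with 0 can never be true. An explicit cast back to int64, or a test of bit 63, gives the answer a user probably expects.

```python
MASK64 = (1 << 64) - 1

def as_uint64(x: int) -> int:
    # Low 64 bits, viewed as unsigned.
    return x & MASK64

def as_int64(x: int) -> int:
    # Low 64 bits, viewed as two's-complement signed.
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

a, b = -4, 2
s = as_uint64(as_uint64(a) + as_uint64(b))   # 'a + b' under the all-uint64 rule

# Evaluated on the unsigned result, 'a + b < 0' is always false.
print(s < 0)             # False
# Casting back to int64 first restores the expected answer.
print(as_int64(s) < 0)   # True
# Equivalent test on the raw bits: bit 63 is the int64 sign bit.
print(bool(s >> 63))     # True
```

With this rule, users who want signed semantics would have to ask for them explicitly with a cast. That is the part that would need documenting.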

Do D. Cooke or T. Hochberg have anything to say in that regard?

Thanks,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"