Should non-ufunc numpy functions behave like ufuncs regarding casting to the output argument?
Hi,

I am trying to add support for an out argument to a C function using the numpy API (still the clip function). I was wondering about the expected behaviour when out does not have the "expected" type. For example, using the clip function again (but the question is not specific to this function):

In [1]: import numpy
In [2]: a = numpy.linspace(0, 10, 101)
In [3]: b = numpy.zeros(a.shape, dtype = numpy.int32)
In [4]: print a.dtype
float64
In [5]: a.clip(0.1, 0.5, b)

Should this be equivalent to b = a.clip(0.1, 0.5); b = b.astype(numpy.int32) (i.e., the casting is done at the end, similar to a ufunc)?

cheers,

David
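The "cast at the end" reading of the question can be tried out with a minimal sketch like the one below. It assumes a current NumPy imported as `np`; the name `tmp` is illustrative and not from the original post. The sketch fills `b` in place with `b[...] = ...` rather than rebinding the name `b`, so that the caller's buffer really receives the result.

```python
import numpy as np

a = np.linspace(0, 10, 101)              # float64 input
b = np.zeros(a.shape, dtype=np.int32)    # caller-supplied output buffer

# "Cast at the end": compute in the input's type, then convert.
tmp = a.clip(0.1, 0.5)                   # full-size float64 temporary
b[...] = tmp.astype(np.int32)            # fill b in place; float -> int truncates

print(tmp.dtype, b.dtype)
```

Note that this approach allocates a temporary as large as the input, which is the cost discussed in the replies.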
On 15/01/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Should this be equivalent to b = a.clip(0.1, 0.5); b = b.astype(numpy.int32) (i.e., the casting is done at the end, similar to a ufunc)?
Since the point of output arguments is to avoid allocating new storage, I'm not sure whether to say yes or no here... but if you're given an output array to store the answer in, you're more or less forced to convert it to that type (and layout, and whatnot) for storage - imagine, for example, it is every third element of a bigger array (and remember you do not have access to the name "b"). For example:

a = numpy.arange(10)
b = numpy.zeros(40).astype(numpy.uint8)
a.clip(0.1,5.2,b[::4])

You can't do anything about the data type of b, so you don't really have any choice but to convert to uint8; one hopes that this will not require the allocation of an extra array of floats.

A. M. Archibald
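The strided-output situation can be reproduced with a short sketch, assuming a current NumPy imported as `np`. The names `view` and `clipped` are illustrative, and the explicit conversion of `a` to float64 is an added assumption so that the clip is clearly computed in floating point.

```python
import numpy as np

a = np.arange(10)
b = np.zeros(40, dtype=np.uint8)
view = b[::4]                      # every fourth element of b; no copy is made

# Clip in floating point, then store through the view. Assignment into an
# existing array converts the values to that array's dtype (uint8 here).
clipped = np.clip(a.astype(np.float64), 0.1, 5.2)
view[...] = clipped

print(np.shares_memory(view, b))
print(b[:12])
```

The function only ever sees `view`, so the dtype and layout of the underlying array are fixed from its point of view; the only freedom left is how the conversion is carried out.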
On 1/15/07, A. M. Archibald <peridot.faceted@gmail.com> wrote:
Since the point of output arguments is to avoid allocating new storage,
If we take that seriously, then an error should be raised on a shape or type mismatch.

Chuck
On 15/01/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
Since the point of output arguments is to avoid allocating new storage,
If we take that seriously, then an error should be raised on a shape, or type mismatch.
In fact:

In [10]: a = zeros(3)
In [11]: b = zeros(4,dtype=uint8)
In [12]: add(a,a,b)
---------------------------------------------------------------------------
exceptions.ValueError    Traceback (most recent call last)
/home/peridot/<ipython console>
ValueError: invalid return array shape
In [13]: add(a,a,b[:3])
Out[13]: array([0, 0, 0], dtype=uint8)
In [14]: add(b,b,b.reshape((2,2)))
---------------------------------------------------------------------------
exceptions.ValueError    Traceback (most recent call last)
/home/peridot/<ipython console>
ValueError: invalid return array shape

We do raise an error on shape mismatch; type mismatches are used to forcibly cast the result - which is useful! So I'd say we take it seriously. It's one of the ways to make your code run faster.

A. M. Archibald
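The session above reflects NumPy as it behaved in 2007. To my knowledge, recent NumPy releases still reject a mismatched output shape, but they apply the `casting="same_kind"` rule to ufunc outputs by default, so a float64 result is no longer silently forced into a uint8 array. The sketch below illustrates this under that assumption; the flag names `shape_error` and `type_error` are illustrative.

```python
import numpy as np

a = np.full(3, 1.7)

# Shape mismatch: the output cannot hold the broadcast result.
try:
    np.add(a, a, out=np.zeros(4))
    shape_error = False
except ValueError:
    shape_error = True

# Type mismatch: float64 -> uint8 is not a "same_kind" cast, so the
# default casting rule rejects it.
b = np.zeros(4, dtype=np.uint8)
try:
    np.add(a, a, out=b[:3])
    type_error = False
except TypeError:
    type_error = True

# Requesting the cast explicitly restores the forced conversion.
np.add(a, a, out=b[:3], casting="unsafe")

print(shape_error, type_error, b)
```

With `casting="unsafe"` the ufunc behaves as described in the session: the result is computed and then converted into the output's type.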
Charles R Harris wrote:
Since the point of output arguments is to avoid allocating new storage,
If we take that seriously, then an error should be raised on a shape, or type mismatch.
For the shape, there is no problem, I guess. A different shape is an error (unless I want to support broadcasting...). The problem is really when out has the same shape as the input but a different type.

At first, I wanted to throw an error (for example, when clipping an array gives a float and out is integer), but that would be incompatible with the current clip behaviour.

Concerning the point of avoiding allocating new storage, I am a bit suspicious: if the types do not match and the casting is done at the end, then all internal computation will be done in whatever type is chosen by the function (I am using PyArray_CommonType for that), and the cast is done at the end, meaning new storage.

Actually, I find it more logical to throw an error if the point is to avoid new storage, as giving an out buffer of mismatched type would make the function create an internal buffer.

David
David Cournapeau wrote:
Actually, I find it more logical to throw an error if the point is to avoid new storage, as giving an out buffer of mismatched type would make the function create an internal buffer.

Sorry, this last sentence is incomplete and does not really make sense. I meant:

Actually, I find it more logical to throw an error if the point is to avoid new storage, as giving an out buffer of mismatched type would make the function create a new array when using the numpy C API function PyArray_ConvertToCommonType.

David
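The storage concern can be made concrete with a rough Python-level sketch. It uses `np.result_type` as a stand-in for the C-level common-type step; that substitution is an assumption for illustration, not a claim that the two behave identically. The names `compute_dtype`, `needs_temporary` and `temporary_bytes` are hypothetical.

```python
import numpy as np

a = np.linspace(0, 10, 101)
out = np.zeros(a.shape, dtype=np.int32)

# Type in which the clip would naturally be computed
# (input dtype combined with floating-point bounds).
compute_dtype = np.result_type(a.dtype, np.float64)

# If it differs from out's dtype, a cast-at-the-end approach needs a
# temporary of the full input size in compute_dtype.
needs_temporary = compute_dtype != out.dtype
temporary_bytes = a.size * compute_dtype.itemsize if needs_temporary else 0

print(compute_dtype, needs_temporary, temporary_bytes, out.nbytes)
```

In this example the temporary would be twice the size of the output buffer, which is why a mismatched out argument undermines the goal of avoiding new storage.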
On Jan 15, 2007, at 10:41 PM, David Cournapeau wrote:
Concerning the point of avoiding allocating new storage, I am a bit suspicious: if the types do not match, and the casting is done at the end, then it means all internal computation will be done in whatever type is chosen by the function (I am using PyArray_CommonType for that), and the cast done at the end, meaning new storage.
Presumably you should do what ufuncs do: divide the computation up into blocks when the array is big. If a cast is required then you do the computation for each block, allocating new storage for that block. Then you do the cast for the block and copy it to the output array. That way you only have to allocate enough new storage for a single block, which is (potentially) much smaller than the whole array.

Rick
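The blockwise scheme described above can be sketched in Python as follows. This is only an illustration of the idea, not how numpy's ufunc machinery is implemented: `clip_blockwise` and its `blocksize` parameter are hypothetical, and the sketch is restricted to one-dimensional arrays to keep the slicing simple.

```python
import numpy as np

def clip_blockwise(a, lo, hi, out, blocksize=1024):
    """Hypothetical helper: clip 1-D `a` into `out`, one block at a time."""
    if a.ndim != 1 or a.shape != out.shape:
        raise ValueError("a and out must be 1-D arrays of the same shape")
    for start in range(0, a.shape[0], blocksize):
        stop = start + blocksize
        # Small temporary in the computation type, at most blocksize elements.
        block = np.clip(a[start:stop], lo, hi)
        # Slice assignment casts the block into out's dtype and copies it.
        out[start:stop] = block
    return out

a = np.linspace(0, 10, 101)
b = np.zeros(a.shape, dtype=np.int32)
clip_blockwise(a, 2.5, 7.5, b, blocksize=16)

# Reference result using a single full-size temporary.
expected = np.clip(a, 2.5, 7.5).astype(np.int32)
print(np.array_equal(b, expected))
```

Because the output is written through slices, the same helper also works when `out` is a strided view, and the extra storage is bounded by `blocksize` elements rather than the size of the whole array.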
On 1/16/07, Rick White <rlw@stsci.edu> wrote:
Presumably you should do what ufuncs do: divide the computation up into blocks when the array is big. If a cast is required then you do the computation for each block, allocating new storage for that block. Then you do the cast for the block and copy it to the output array.
Well, at that point, why not make clip a ufunc, then... I was already wondering about that, but I don't know much about ufuncs on the C side.

David
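One way to see how close clip already is to ufunc behaviour is to express it through the existing ufuncs `np.maximum` and `np.minimum`, which accept an `out` argument. The sketch below is an illustration under that assumption, not a statement about how clip is implemented; it uses a float64 output so that no cast is involved.

```python
import numpy as np

a = np.linspace(0, 10, 101)
out = np.empty_like(a)

np.maximum(a, 0.1, out=out)     # apply the lower bound
np.minimum(out, 0.5, out=out)   # apply the upper bound, in place

print(np.array_equal(out, np.clip(a, 0.1, 0.5)))
```

With an output of a different type, the intermediate result of the first step would itself be cast, so the two-step form is not guaranteed to match "cast at the end" in general.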