PyArray_SETITEM with object arrays in Cython

Hello,

I am writing some Cython code and have noted that the buffer interface offers very little speedup for PyObject arrays. In trying to rewrite the same code using the C API in Cython, I find I can't get PyArray_SETITEM to work, in a call like:

    PyArray_SETITEM(result, <void *> iterresult.dataptr, obj)

where result is an ndarray of dtype object, and obj is a PyObject*.

Does anyone with experience of this have pointers to offer (no pun intended!)?

Thanks,
Wes

Hi Wes,

I do not profess to be an expert, but I have been offloading a fair number of loops from Python code to C and have achieved significant improvements. Most have been of the following form (which I have found to be the fastest):

    size = *incomingArrayObj->dimensions;
    r_dptr = PyArray_DATA(resultArray);
    while (size--) {
        *r_dptr = result;
        r_dptr++;
    }

For multidimensional arrays, r_dptr could be incremented by dims rather than by one, where:

    dims = PyArray_DIM(incomingArrayObj, 1);

I have not, however, actually used PyArray_SETITEM, so I cannot comment on the issue you are having.

Hanni

2009/2/11 Wes McKinney <wesmckinn@gmail.com>:
> In trying to rewrite the same code using the C API in Cython, I find I can't get PyArray_SETITEM to work, in a call like:
> PyArray_SETITEM(result, <void *> iterresult.dataptr, obj)
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion

Wes McKinney wrote:
> I am writing some Cython code and have noted that the buffer interface offers very little speedup for PyObject arrays. In trying to rewrite the same code using the C API in Cython, I find I can't get PyArray_SETITEM to work, in a call like:
> PyArray_SETITEM(result, <void *> iterresult.dataptr, obj)
> where result is an ndarray of dtype object, and obj is a PyObject*.

Interesting. Whatever you end up doing, I'll make sure to integrate whatever works faster into Cython.

I do doubt your results a bit, though. The buffer interface in Cython increfs/decrefs the objects, but otherwise it should be completely raw access, so using SETITEM shouldn't be faster beyond saving one INCREF/DECREF per object (i.e. the buffer interface should still be way faster than using Python).

Could you perhaps post your Cython code?

Dag Sverre
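
The reference counting mentioned above can be observed from plain Python. The following is only a minimal sketch, assuming CPython: a one-slot list stands in for an object array, and the helper name refcount_delta_on_store is hypothetical. It shows that storing an object in a container slot costs one extra reference, which is the per-object INCREF the buffer interface has to pay on assignment.

```python
import sys


def refcount_delta_on_store():
    """Return how much an object's reference count grows when it is stored in a slot."""
    item = object()                 # a fresh object that nothing else refers to
    slots = [None]                  # one-slot list standing in for an object array
    before = sys.getrefcount(item)
    slots[0] = item                 # the container now holds its own reference
    after = sys.getrefcount(item)
    return after - before


if __name__ == "__main__":
    print(refcount_delta_on_store())
```

Overwriting the slot later releases that reference again, which corresponds to the matching DECREF.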

I actually got it to work. The function prototype in the pxi file was wrong; it needed to be:

    int PyArray_SETITEM(object obj, void* itemptr, object item)

This still doesn't explain why the buffer interface was slow.

The general problem here is an indexed array (by dates or strings, for example) that you want to conform to a new index. The arrays contain floats most of the time but occasionally PyObjects. For some reason the access and assignment is slow (this function can be faster by a factor of 50 with C API macros, so clearly something is awry). Let me know if you see anything obviously wrong with this:

    def reindexObject(ndarray[object, ndim=1] index,
                      ndarray[object, ndim=1] arr,
                      dict idxMap):
        '''
        Using the provided new index, a given array, and a mapping of
        index-value correspondences in the value array, return a new ndarray
        conforming to the new index.
        '''
        cdef object idx, value
        cdef int length = index.shape[0]
        cdef ndarray[object, ndim = 1] result = np.empty(length, dtype=object)
        cdef int i = 0
        for i from 0 <= i < length:
            idx = index[i]
            if not PyDict_Contains(idxMap, idx):
                result[i] = None
                continue
            value = arr[idxMap[idx]]
            result[i] = value
        return result

On Wed, Feb 11, 2009 at 3:25 PM, Dag Sverre Seljebotn <dagss@student.matnat.uio.no> wrote:
> Could you perhaps post your Cython code?
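
For readers following the thread, the intended behaviour of reindexObject can be stated in plain Python. This is a hedged reference sketch, not the code from the thread: it uses lists instead of ndarrays, and the name reindex_reference and the sample data are made up for illustration.

```python
def reindex_reference(index, arr, idx_map):
    """Pure-Python statement of what reindexObject is meant to compute.

    index   -- sequence of labels making up the new index
    arr     -- sequence of values laid out according to the old index
    idx_map -- dict mapping each old label to its integer position in arr
    """
    result = [None] * len(index)
    for i, label in enumerate(index):
        pos = idx_map.get(label)    # None when the label is absent from the old index
        if pos is not None:
            result[i] = arr[pos]
    return result


if __name__ == "__main__":
    old_values = [1.5, "text", 3.0]
    old_positions = {"a": 0, "b": 1, "c": 2}
    print(reindex_reference(["c", "x", "a"], old_values, old_positions))
```

Labels missing from the old index come back as None, matching the `result[i] = None` branch in the Cython version.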

Wes McKinney wrote:
> The general problem here is an indexed array (by dates or strings, for example), that you want to conform to a new index. The arrays most of the time contain floats but occasionally PyObjects. For some reason the access and assignment is slow (this function can be faster by a factor of 50 with C API macros, so clearly something is awry)-- let me know if you see anything obviously wrong with this

Thanks for the example; I don't have time this week, but I recorded it on the Cython trac and will hopefully get back to it soon.

http://trac.cython.org/cython_trac/ticket/209

Dag Sverre

Wes McKinney wrote:
> This still doesn't explain why the buffer interface was slow.

I finally remembered to look at this; there seems to be a problem in your code:

> def reindexObject(ndarray[object, ndim=1] index,
>                   ndarray[object, ndim=1] arr,
>                   dict idxMap):
>     '''
>     Using the provided new index, a given array, and a mapping of
>     index-value correspondences in the value array, return a new ndarray
>     conforming to the new index.
>     '''
>     cdef object idx, value
>     cdef int length = index.shape[0]
>     cdef ndarray[object, ndim = 1] result = np.empty(length, dtype=object)
>     cdef int i = 0
>     for i from 0 <= i < length:
>         idx = index[i]
>         if not PyDict_Contains(idxMap, idx):
>             result[i] = None
>             continue
>         value = arr[idxMap[idx]]
>         result[i] = value
>     return result

The problem is with arr[idxMap[idx]]. The result of idxMap[idx] is a Python object, which leads to inefficient indexing. Use arr[<int>idxMap[idx]] instead.

Dag Sverre
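
One way to picture this advice, sketched in plain Python rather than Cython: keep the dict lookup (which yields a Python object) separate from the array access (which wants a plain integer). The two helper names below are hypothetical, and using -1 to mark a missing label is an assumption of this sketch, not something from the thread.

```python
def build_indexer(index, idx_map):
    """Resolve each label to an integer position once; -1 marks a missing label."""
    return [int(idx_map.get(label, -1)) for label in index]


def take_with_indexer(arr, indexer):
    """Gather values by integer position, leaving None where the position is -1."""
    return [arr[pos] if pos >= 0 else None for pos in indexer]


if __name__ == "__main__":
    positions = build_indexer(["c", "x", "a"], {"a": 0, "b": 1, "c": 2})
    print(take_with_indexer([1.5, "text", 3.0], positions))
```

In Cython, the `<int>` cast plays the role of the `int(...)` conversion here: it hands the typed buffer a C integer, so the access does not fall back to generic Python indexing.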
participants (3)
- Dag Sverre Seljebotn
- Hanni Ali
- Wes McKinney