How to copy data from a C array to a numpy array efficiently?
Hi,

I am developing a Python wrapper of the NASA CDF C library in Cython. I have a working version now, but it is slower than its counterpart in IDL. For loading the same file, mine takes about 400 ms, whereas the IDL version takes about 290 ms.

The main overhead in my code is caused by a for-loop of element-by-element copying. Here is the relevant code in Cython:

#-------------------------------- code -----------------------------------------------------
#-- double
realData = numpy.zeros(lenData, np_dtype)
dblEntry = <double *>malloc(lenData * sizeof(double))
status = CDFlib(SELECT_, zVAR_RECCOUNT_, numRecs, NULL_)
status = CDFlib(GET_, zVAR_HYPERDATA_, dblEntry, NULL_)
for ii in range(lenData):
    realData[ii] = dblEntry[ii]
realData.shape = np_shape
free(dblEntry)
#--------------------------------- end of code -------------------------------------------

The time-consuming part is the for-loop. If I change range(lenData) to range(lenData/2), the CPU time drops from 400 ms to 230 ms for the case mentioned above.

Because the element-by-element copying loop seems pretty naive to me, I am wondering if there is a better way to copy data from the C array, dblEntry, to the numpy array, realData.

I tried the numpy C API function PyArray_NewFromDescr with the flag NPY_ENSURECOPY, but had no luck. On the one hand, the flag didn't seem to work as I expected: I got memory deallocation failure error messages when I quit ipython, where I tested my code, which I don't get with the naive for-loop. On the other hand, I can't figure out how to use PyArray_NewFromDescr correctly, because the loaded data were not correct.

Anyway, here is how I used PyArray_NewFromDescr:

#----------------------------------------- code ------------------------------------------
cdef np.npy_intp dims[1]
dims[0] = lenData
realData = PyArray_NewFromDescr(numpy.ndarray, numpy.dtype(np_dtype), 1, dims, NULL, <void *>dblEntry, NPY_CARRAY|NPY_ENSURECOPY, None)
free(dblEntry)
#-------------------------------------- end of code --------------------------------------

BTW, it compiles successfully with Cython, in case you are wondering whether the code has all the necessary pieces.

Thank you very much for reading. :-)

Cheers,
Jianbao
On 10/07/2012 08:41 AM, Jianbao Tao wrote:
> Hi,
>
> I am developing a Python wrapper of the NASA CDF C library in Cython. I have got a working version of it now, but it is slower than the counterpart in IDL. For loading the same file, mine takes about 400 ms, whereas the IDL version takes about 290 ms.
>
> The main overhead in my code is caused by a for-loop of element-by-element copying. Here is the relevant code in cython:
>
> #-------------------------------- code -----------------------------------------------------
> #-- double
> realData = numpy.zeros(lenData, np_dtype)
> dblEntry = <double *>malloc(lenData * sizeof(double))
> status = CDFlib(SELECT_, zVAR_RECCOUNT_, numRecs, NULL_)
> status = CDFlib(GET_, zVAR_HYPERDATA_, dblEntry, NULL_)
> for ii in range(lenData):
>     realData[ii] = dblEntry[ii]
> realData.shape = np_shape
> free(dblEntry)
You don't say what np_dtype is here (or the Cython variable declaration for it).

Assuming it is np.double and realData is declared as "cdef np.ndarray[double] realData", what you should do is simply pass the buffer of realData to the CDFlib function:

    status = CDFlib(GET_, ..., &realData[0], NULL)

Then there's no need for copying.

This is really what you should do anyway. If the dtype is different, leave the conversion to the "astype" function (but then comparisons with IDL should take the dtype conversion into account).

Dag Sverre
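The idea of letting the library write straight into the destination array can be tried out without Cython. The following is only a rough pure-Python sketch using the standard library: `read_hyperdata` is a hypothetical stand-in for the CDFlib call (it is not part of the CDF library), and an `array.array` of doubles stands in for the numpy array, on the assumption that both expose their memory in the same way through the buffer protocol.

```python
import ctypes
from array import array


def read_hyperdata(dest, count):
    """Hypothetical stand-in for a C reader such as CDFlib(GET_, ...).

    Writes `count` doubles straight into the memory behind `dest`.
    """
    for i in range(count):
        dest[i] = 0.5 * i


len_data = 6
itemsize = ctypes.sizeof(ctypes.c_double)

# Destination owned by Python; a numpy array plays this role in the thread.
real_data = array("d", bytes(len_data * itemsize))

# A ctypes view that shares real_data's memory (no copy is made).
c_view = (ctypes.c_double * len_data).from_buffer(real_data)

# The "library" fills the destination directly, so no second buffer
# and no copy loop are needed.
read_hyperdata(c_view, len_data)

del c_view  # drop the view once the data has been written
print(real_data.tolist())
```

Because the view and the destination share one block of memory, there is nothing to malloc, copy, or free afterwards.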
On 10/07/2012 08:48 AM, Dag Sverre Seljebotn wrote:
> You don't say what np_dtype is here (or the Cython variable declaration for it).
>
> Assuming it is np.double and "cdef np.ndarray[double] realData", what you should do is simply pass the buffer of realData to the CDFlib function:
>
>     status = CDFlib(GET_, ..., &realData[0], NULL)
>
> Then there's no need for copying.
>
> This is really what you should do anyway, then if the dtype is different leave it to the "astype" function (but then comparisons with IDL should take into account the dtype conversion).
To really answer your question (though in this case you should use a different approach), what you should use to copy data efficiently is the C memcpy function.

Dag Sverre
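As an illustration of a single bulk copy replacing the element-by-element loop, here is a small sketch in plain Python. It relies on some assumptions: `ctypes.memmove` is used because it is the copy routine that ctypes exposes (for non-overlapping buffers it behaves like memcpy), a ctypes array named `dbl_entry` stands in for the malloc'ed `dblEntry` buffer, and an `array.array` stands in for the numpy array.

```python
import ctypes
from array import array

n = 5

# Stand-in for the C buffer `dblEntry` after the library has filled it.
dbl_entry = (ctypes.c_double * n)(1.0, 2.0, 3.0, 4.0, 5.0)

# Zero-filled destination; a numpy array plays this role in the thread.
real_data = array("d", bytes(n * ctypes.sizeof(ctypes.c_double)))
dest = (ctypes.c_double * n).from_buffer(real_data)

# One bulk copy of n * sizeof(double) bytes instead of n separate assignments.
ctypes.memmove(dest, dbl_entry, ctypes.sizeof(dbl_entry))
del dest

# The destination holds its own copy, so later changes to the source
# (or freeing it, in the C case) do not affect it.
dbl_entry[0] = -1.0
print(real_data.tolist())
```

In Cython the equivalent would be a call to memcpy from `libc.string` with the address of the array's first element as the destination, but the direct-buffer approach suggested earlier in the thread avoids the copy altogether.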
On 10/07/2012 12:41 AM, Jianbao Tao wrote:
> Hi,
>
> I am developing a Python wrapper of the NASA CDF C library in Cython. I have got a working version of it now, but it is slower than the counterpart in IDL. For loading the same file, mine takes about 400 ms, whereas the IDL version takes about 290 ms.
Does spacepy.pycdf not work well for you? (Not Cython, of course...)

As Dag pointed out, passing a pointer to the numpy array data in to the CDF library works very nicely, although you may need some contortions to handle column-major zVars. (Making the numpy array Fortran order doesn't quite do it since the record dimension doesn't move around.)

--
Jonathan Niehof
ISR-3 Space Data Systems
Los Alamos National Laboratory
MS-D466
Los Alamos, NM 87545

Phone: 505-667-9595
email: jniehof@lanl.gov

Correspondence / Technical data or Software Publicly Available
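The remark about column-major zVars can be made concrete with a little index arithmetic. This sketch assumes a layout in which records are stored one after another (the record index varies slowest) while the first dimension varies fastest inside each record; that reading of the comment above is an assumption, and the function name and shapes are invented for illustration.

```python
def column_major_records_to_nested(flat, n_rec, d0, d1):
    """Return data[rec][i][j] from a flat buffer of column-major records.

    Assumed layout: offset = rec * (d0 * d1) + j * d0 + i, i.e. records are
    contiguous and, within a record, the first dimension varies fastest.
    """
    rec_size = d0 * d1
    return [
        [[flat[r * rec_size + j * d0 + i] for j in range(d1)] for i in range(d0)]
        for r in range(n_rec)
    ]


# Two records of a 2 x 3 variable; each value encodes 100*rec + 10*i + j.
flat = [0, 10, 1, 11, 2, 12,
        100, 110, 101, 111, 102, 112]

data = column_major_records_to_nested(flat, n_rec=2, d0=2, d1=3)
print(data[1][0][2])
```

Under the same assumption, a numpy array filled directly by the library could be viewed without copying as `buf.reshape(n_rec, d1, d0).transpose(0, 2, 1)`. Declaring the whole array Fortran-ordered would instead make the record index the fastest-varying one, which does not match this layout.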
participants (3)
- Dag Sverre Seljebotn
- Jianbao Tao
- Jonathan T. Niehof