How to copy data from a C array to a numpy array efficiently?
On 10/07/2012 08:41 AM, Jianbao Tao wrote:
Hi,
I am developing a Python wrapper of the NASA CDF C library in Cython. I have got a working version of it now, but it is slower than the counterpart in IDL. For loading the same file, mine takes about 400 ms, whereas the IDL version takes about 290 ms.
The main overhead in my code is caused by a for-loop of element-by-element copying. Here is the relevant code in Cython:

#-------------------------------- code -----------------------------------------------------
#-- double
realData = numpy.zeros(lenData, np_dtype)
dblEntry = <double *>malloc(lenData * sizeof(double))
status = CDFlib(SELECT_, zVAR_RECCOUNT_, numRecs, NULL_)
status = CDFlib(GET_, zVAR_HYPERDATA_, dblEntry, NULL_)
for ii in range(lenData):
    realData[ii] = dblEntry[ii]
realData.shape = np_shape
free(dblEntry)
You don't say what np_dtype is here (or the Cython variable declaration for it).

Assuming it is np.double and "cdef np.ndarray[double] realData", what you should do is simply pass the buffer of realData to the CDFlib function:

    status = CDFlib(GET_, ..., &realData[0], NULL)

Then there's no need for copying.

This is really what you should do anyway; if the dtype is different, leave the conversion to the "astype" function (but then comparisons with IDL should take the dtype conversion into account).

Dag Sverre
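The idea of letting the library write straight into the array's memory can be sketched in plain Python with ctypes. This is only an illustration under stated assumptions: `fake_get_hyperdata` is a hypothetical stand-in for the CDFlib GET_ call (it is not part of the CDF library), and the dtype is assumed to be float64 so that it matches a C double.

```python
import ctypes
import numpy as np


def fake_get_hyperdata(dest, count):
    """Hypothetical stand-in for a C routine that fills a caller-supplied
    double buffer. It writes through the raw pointer, as a C library would."""
    for i in range(count):
        dest[i] = 0.5 * i


len_data = 6
np_shape = (2, 3)

# Allocate the result once; np.empty skips the zero-fill that numpy.zeros does.
real_data = np.empty(len_data, dtype=np.float64)

# Pointer to the array's own memory (the analogue of &realData[0] in Cython).
dest_ptr = real_data.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

# The "library" fills the NumPy buffer in place: no temporary, no copy loop.
fake_get_hyperdata(dest_ptr, len_data)
real_data.shape = np_shape
```

The array must be C-contiguous and of exactly the element type the library writes; otherwise the pointer does not describe the memory the library expects.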
On 10/07/2012 08:48 AM, Dag Sverre Seljebotn wrote:
To really answer your question (though in this case you should use a different approach), what you should use to copy data efficiently is the C memcpy function.

Dag Sverre
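For the case where a separate C buffer already exists, a block copy replaces the element-by-element loop. The sketch below uses `ctypes.memmove`, which behaves like memcpy when the buffers do not overlap; the `src` array is an assumed stand-in for the malloc'ed dblEntry buffer. In Cython itself the corresponding call would presumably be memcpy cimported from libc.string, applied to &realData[0].

```python
import ctypes
import numpy as np

len_data = 5

# C-side buffer standing in for the malloc'ed dblEntry (illustrative values).
src = (ctypes.c_double * len_data)(1.0, 2.0, 3.0, 4.0, 5.0)

# Destination array with the same element type and length as the source.
real_data = np.empty(len_data, dtype=np.float64)

n_bytes = len_data * ctypes.sizeof(ctypes.c_double)
if real_data.nbytes != n_bytes:
    raise ValueError("destination size does not match the source buffer")

# One block copy of n_bytes bytes instead of a per-element loop.
ctypes.memmove(real_data.ctypes.data, src, n_bytes)
```

The size check matters because a block copy has no bounds checking: copying more bytes than the destination holds corrupts memory.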
On 10/07/2012 12:41 AM, Jianbao Tao wrote:
Hi,
I am developing a Python wrapper of the NASA CDF C library in Cython. I have got a working version of it now, but it is slower than the counterpart in IDL. For loading the same file, mine takes about 400 ms, whereas the IDL version takes about 290 ms.
Does spacepy.pycdf not work well for you? (Not Cython, of course...)

As Dag pointed out, passing a pointer to the numpy array data in to the CDF library works very nicely, although you may need some contortions to handle column-major zVars. (Making the numpy array Fortran order doesn't quite do it since the record dimension doesn't move around.)

--
Jonathan Niehof
ISR-3 Space Data Systems
Los Alamos National Laboratory
MS-D466
Los Alamos, NM 87545
Phone: 505-667-9595
email: jniehof@lanl.gov
Correspondence / Technical data or Software Publicly Available
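One way to picture the column-major issue is the NumPy sketch below. It assumes a particular layout: records are stored one after another, and only the dimensions within each record are column-major. The helper `records_from_column_major` is hypothetical and is not how spacepy.pycdf is implemented.

```python
import numpy as np


def records_from_column_major(flat, n_rec, dims):
    """View a flat buffer of n_rec records, each stored column-major with
    shape dims, as an array of shape (n_rec, *dims). No data is copied."""
    flat = np.asarray(flat)
    # Within a record the first dimension varies fastest, so read each
    # record in C order with its dimensions reversed ...
    reversed_view = flat.reshape((n_rec,) + tuple(dims)[::-1])
    # ... then reverse the non-record axes, leaving axis 0 (records) in place.
    axes = (0,) + tuple(range(len(dims), 0, -1))
    return reversed_view.transpose(axes)


# Build a buffer in the assumed layout from a known array.
expected = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
flat = np.concatenate([rec.ravel(order="F") for rec in expected])

result = records_from_column_major(flat, 2, (3, 4))

# Treating the whole buffer as Fortran order also moves the record axis,
# which scrambles the data.
naive = flat.reshape((2, 3, 4), order="F")
```

Here `result` matches `expected`, while `naive` does not, which is the record-dimension problem described above.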
participants (3)
- Dag Sverre Seljebotn
- Jianbao Tao
- Jonathan T. Niehof