Passing numpy arrays to matlab

Thu Nov 9 01:32:04 EST 2006

Josh Marshall wrote:
>
>
> I don't see how you are going to get around doing the copies. Matlab 
> is in a separate process from the Python interpreter, and there is no 
> shared memory. In what way do you want these proxy classes to "look 
> like numpy arrays"?
I am not talking about the copy in the matlab <-> python interaction. 
That is done through a pipe, handled by the OS; I don't know the details, 
but I know that communication through a pipe is quite fast under linux 
(see below), and it is not the bottleneck.

>
> Note that mlabwrap creates proxy arrays, and only copies the data if 
> you actually request it to. (AFAIRemember) Otherwise you aren't losing 
> any speed, because there aren't going to be any copies.
There may be no copy for returned data you don't need, but that's not 
the case I am talking about. For all other cases, I don't think this is 
what's happening: if you take a look at mlabwrap, in the C mlabraw 
module, the function mlabraw_put always calls numeric2mx for arrays, 
which itself always calls makeMxFromNumeric, which makes a copy. Same in 
the other direction once you call mlabwrap_get. I am doing the same in 
my module, because that's the simplest thing to do.

The problem is that when you use the function engPutVariable of the 
matlab engine API, you need to give it a pointer to an mxArray 
structure, which is the C representation of a matlab array. You cannot 
say "build an mxArray from existing data" (this is one of the 
brain-damaged things in the matlab C API I mentioned in another mail), 
so you have to copy: this is the copy I am talking about, and it is an 
expensive one. In the best case (real numpy arrays with fortran 
storage), you can do a memcpy, but in most cases you need to do 
something which takes strides into account (because complex matlab 
arrays are actually not fortran, or because by default most numpy 
arrays use C storage, and this makes a difference for rank >= 2). That 
implies non-contiguous memory access, which is *really* expensive 
(around 2 cycles/byte at best, on my dual Xeon 3.2 GHz).
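
A minimal numpy-only sketch of the layout problem described above, with 
no matlab involved. The helper names to_column_major and split_complex 
are hypothetical, and the split into separate real and imaginary buffers 
is only one reading of "complex matlab arrays are actually not fortran":

```python
import numpy as np

def to_column_major(a):
    """Return (out, copied): `a` in Fortran (column-major) layout."""
    out = np.asfortranarray(a)  # returns `a` itself if already Fortran-contiguous
    return out, not np.shares_memory(out, a)

def split_complex(z):
    """Return separate column-major real and imaginary buffers (illustrative)."""
    # z.real and z.imag are strided views into interleaved storage,
    # so producing contiguous buffers forces a strided copy of each part.
    return np.asfortranarray(z.real), np.asfortranarray(z.imag)

c_arr = np.arange(12.0).reshape(3, 4)     # numpy default: C (row-major) storage
f_arr, copied = to_column_major(c_arr)    # rank 2, C storage: strided copy needed
_, copied_again = to_column_major(f_arr)  # already column-major: no copy
```

Only the first call has to walk the source with non-unit strides; the 
second is the "best case" where a plain memcpy would be enough.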

Basically, if you want to do something like calling the resample 
function of matlab on a numpy array and using the result later in 
numpy, here is what's happening right now:

    1 copy the numpy (or numarray in the case of mlabwrap, but this 
      should not matter, I guess) data into an mxArray
    2 send the mxArray to the matlab engine: done through a pipe (does 
      this imply a copy? At least it is a contiguous array copy)
    3 do the computation in matlab
    4 send the result back to python as an mxArray
    5 copy the data of the mxArray into a numpy array

A quick profiling shows that if you don't do any processing in matlab, 
and just send an array and get it back, steps 1 and 5 take roughly 
80-90% of the time in my implementation. (It is faster than mlabwrap, 
but I think that is just caused by the much fancier API of mlabwrap; 
the core mechanism to pass arrays should be roughly the same, as 
mlabwrap uses the C function makeMxFromNumeric, and I am using a similar 
function myself through ctypes.) The remaining 10-20% is spent on 
communication through the pipe. I believe that most typical use cases 
involve steps 1 and 5.

Step 5 should be avoidable in many cases if I knew how to build a proxy 
class around the mxArray so that the proxy behaves as a numpy array, 
with the buffer owned by the mxArray; but I don't know how to do that 
(in particular, how to handle the destruction of data, as the proxy 
should destroy the mxArray once the proxy object is garbage collected). 
Step 1 would be easy if the C matlab API were sane, which is not the 
case; it provides functions which are impossible to use correctly 
(mxSetPr and mxSetData).
>
> What could be possible to do is add an array interface to the mlabwrap 
> proxy classes so they can be used as numpy arrays when required for 
> passing to numpy functions (or PIL, etc). Thus we only copy when we 
> want to use numpy functions. Then we could define the operators on the 
> proxy class to perform their operations on the other side of the bridge.
Yes, that's what I want to do, and in theory this should be possible 
without a copy; my initial question at the beginning of the thread was 
how to build a numpy proxy class from an existing buffer of data, with 
the proxy becoming the owner of the data (ie it should do all the 
deallocation, including here cleaning up the mxArray structures).
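
One possible shape for such a proxy, sketched with numpy's 
__array_interface__ protocol. This is an assumption-laden illustration, 
not mlabwrap code: ForeignBuffer and on_free are hypothetical names, the 
ctypes array stands in for the memory an mxArray would own, and on_free 
stands in for whatever call would destroy the mxArray. The idea relies on 
numpy keeping a reference to the exporting object for as long as the 
array built from it is alive:

```python
import ctypes
import numpy as np

class ForeignBuffer:
    """Hypothetical owner of a column-major double buffer (mxArray stand-in)."""

    def __init__(self, shape, on_free):
        self._shape = tuple(int(d) for d in shape)
        n = 1
        for d in self._shape:
            n *= d
        # Stand-in allocation; a real wrapper would hold the mxArray and
        # take the data pointer from it instead.
        self._storage = (ctypes.c_double * n)()
        self._on_free = on_free

    @property
    def __array_interface__(self):
        # Column-major (Fortran) strides: the first index varies fastest.
        strides, step = [], ctypes.sizeof(ctypes.c_double)
        for d in self._shape:
            strides.append(step)
            step *= d
        return {
            "version": 3,
            "shape": self._shape,
            "typestr": np.dtype(np.float64).str,
            "data": (ctypes.addressof(self._storage), False),  # False: writable
            "strides": tuple(strides),
        }

    def __del__(self):
        # Runs once nothing references the owner any more, including
        # numpy arrays created from it.
        self._on_free()

freed = []
owner = ForeignBuffer((2, 3), on_free=lambda: freed.append("freed"))
view = np.asarray(owner)  # no copy: `view` points at the owner's buffer
view[1, 2] = 42.0         # writes straight into the foreign memory
del owner                 # buffer survives while `view` is alive
```

With this arrangement the cleanup callback fires only after the last 
numpy array referring to the buffer has been collected, which is the 
ownership behaviour asked about above.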

cheers,

David
