[Numpy-discussion] Numpy arrays vs typed memoryviews

Sturla Molden sturla.molden at gmail.com
Sat Jan 25 12:34:32 EST 2014


I think I have said this before, but it's worth repeating:

Pickle (including cPickle) is a slow hog! It might not look like the
overhead you are seeing, but you may simply not have noticed it yet.

I saw this some years ago when I worked on shared memory arrays for NumPy
(cf. my account on GitHub). Shared memory really did not help to speed up
the IPC, because the entire overhead was dominated by pickle. (Shared
memory is a fine way of saving RAM, though.)
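
A rough way to check this on your own machine is to time a pickle round
trip against a plain copy of the same bytes. The sketch below is only an
illustration: array.array from the standard library stands in for a NumPy
array (both export the buffer protocol), and the helper names
pickle_roundtrip, raw_copy and best_time are made up for this example.
The ratio you see will depend on the Python version, pickle protocol and
data type.

```python
import pickle
import time
from array import array


def pickle_roundtrip(obj):
    # Serialize and deserialize, as a pickle-based queue would do.
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def raw_copy(obj):
    # One pass over the underlying bytes, roughly the cost of a memcpy.
    return bytearray(memoryview(obj).cast("B"))


def best_time(func, obj, repeats=5):
    # Best wall-clock time of several runs, in seconds.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(obj)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    data = array("d", range(1_000_000))
    print("pickle round trip:", best_time(pickle_roundtrip, data))
    print("raw byte copy:    ", best_time(raw_copy, data))
```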

multiprocessing.Queue will use pickle for serialization, and is therefore
not the right tool for numerical parallel computing with Cython or NumPy.

In order to use multiprocessing efficiently with NumPy, we need a new Queue
type that knows about NumPy arrays (and/or Cython memoryviews) and treats
them as special cases. Getting rid of pickle altogether is the important
part, not facilitating its use even further. It is easy to make a Queue
type for Cython or NumPy arrays using a duplex pipe and a couple of mutexes.
Or you can use shared memory as a ring buffer, with atomic compare-and-swap
on the first bytes serving as spinlocks. It is not difficult to get the
overhead of queuing arrays down to little more than a memcpy.
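
A minimal sketch of the pipe-and-mutexes variant, under some assumptions:
the class name BufferQueue and its put/get methods are hypothetical, a
one-way pipe is used instead of a duplex one since data only flows in one
direction, and only C-contiguous buffers with a native single-character
format (such as 'd') are handled. The payload goes through send_bytes and
recv_bytes_into, so it is never pickled; a small struct-packed header
carries the shape and format.

```python
import math
import struct
from array import array
from multiprocessing import Lock, Pipe


class BufferQueue:
    """Hypothetical queue that ships raw array buffers over a pipe."""

    def __init__(self):
        self._reader, self._writer = Pipe(duplex=False)
        self._rlock = Lock()  # serializes consumers
        self._wlock = Lock()  # serializes producers

    def put(self, obj):
        view = memoryview(obj)
        if not view.c_contiguous:
            raise ValueError("only C-contiguous buffers are supported")
        fmt = view.format.encode("ascii")
        # Header layout: ndim, then the shape, then the format string.
        header = struct.pack("<I%dq" % view.ndim, view.ndim, *view.shape) + fmt
        with self._wlock:
            self._writer.send_bytes(header)
            self._writer.send_bytes(view.cast("B"))  # raw bytes, no pickle

    def get(self):
        with self._rlock:
            header = self._reader.recv_bytes()
            (ndim,) = struct.unpack_from("<I", header)
            shape = struct.unpack_from("<%dq" % ndim, header, 4)
            fmt = header[4 + 8 * ndim:].decode("ascii")
            # Allocate the destination and receive the payload into it.
            buf = bytearray(struct.calcsize(fmt) * math.prod(shape))
            self._reader.recv_bytes_into(buf)
        return memoryview(buf).cast(fmt, shape)


if __name__ == "__main__":
    q = BufferQueue()
    src = memoryview(array("d", range(6))).cast("B").cast("d", (2, 3))
    q.put(src)
    print(q.get().tolist())
```

With NumPy, put() should accept a C-contiguous array of a simple native
dtype such as float64 directly, and np.asarray() on the result of get()
wraps the received buffer without another copy. Keep the payload small if
you try this in a single process, since a pipe write blocks once the OS
buffer is full and nobody is reading.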

I've been wanting to do this for a while, so maybe it is time to start a
new toy project :) 

Sturla


Neal Hughes <hughes.neal at gmail.com> wrote:
> I like Cython a lot. My only complaint is that I have to keep switching
> between the NumPy array support and typed memoryviews. Both have their
> advantages, but neither can do everything I need.
> 
> Memoryviews have the clean syntax and seem to work better in cdef classes
> and in inline functions.
> 
> But memoryviews can't be pickled and so can't be passed between
> processes. Also, there seems to be a high overhead in converting between
> memoryviews and Python NumPy arrays. Where this overhead is a problem,
> or where I need to use Python's multiprocessing module, I tend to switch
> to NumPy arrays.
> 
> If memoryviews could be converted to Python objects quickly, and pickled,
> I would have no need for the old NumPy array support.
> 
> Wondering if these problems will ever be addressed, or if I am missing 
> something completely.
> 



