
On 24/06/15 22:50, M.-A. Lemburg wrote:
> The tricky part is managing pointers in those data structures, e.g. a container type for other Python objects will have to store all referenced objects in the shared memory segment as well.
> If a container type for Python objects contains some unknown object type, we would have to use pickle as a fallback.
> For NumPy arrays using simple types this is a lot easier, since you don't have to deal with pointers to other objects.
The objects we deal with in scientific computing are usually arrays with a rather regular structure, not deeply nested Python objects. Even a more complex object like scipy.spatial.cKDTree is just a collection of a few contiguous arrays under the hood. So for the most part we could squash the pickle overhead by specializing a queue that has knowledge of a small set of array-like Python types.
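A minimal sketch of such a fast path (the function names here are hypothetical, not an existing multiprocessing API): for a contiguous NumPy array of a simple dtype, a specialized queue only needs to ship the dtype string, the shape, and the raw buffer, and can rebuild the array on the receiving side without pickling the object itself.

```python
import numpy as np

def pack_array(arr):
    # Fast path for a known type: metadata plus raw bytes, no pickle.
    # Assumes a C-contiguous array with a simple (non-object) dtype.
    return str(arr.dtype), arr.shape, arr.tobytes()

def unpack_array(dtype, shape, raw):
    # Rebuild the array from the raw buffer on the receiving side.
    return np.frombuffer(raw, dtype=dtype).reshape(shape)

a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = unpack_array(*pack_array(a))
assert np.array_equal(a, b)
```

An unknown object type would fall through to the pickle path, exactly as described above.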
> When saying "passing a lot of binary data over a pipe", do you mean the meta-data?
No, I mean the buffer pointed to by PyArray_DATA(obj) when using the NumPy C API. We have to send a lot of raw bytes over an IPC mechanism before the pickle overhead even registers by comparison.

Sturla
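A quick size check illustrates the point (a sketch; exact overheads vary with the pickle protocol and NumPy version): for a large array, the pickled stream is essentially the raw buffer plus a small constant header, so the dominant IPC cost is moving the data itself, not the pickle metadata.

```python
import pickle
import numpy as np

# 8 MB of raw float64 data -- the buffer that PyArray_DATA points to.
a = np.zeros(1_000_000, dtype=np.float64)

pickled = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
overhead = len(pickled) - a.nbytes

# The pickle metadata is a small constant; the payload dominates.
assert 0 < overhead < 4096
```

Shared memory avoids pushing those megabytes through the pipe at all, which is where the real savings come from.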