Multiprocessing, shared memory vs. pickled copies
philip at semanchuk.com
Tue Apr 5 01:34:29 CEST 2011
On Apr 4, 2011, at 4:20 PM, John Ladasky wrote:
> I have been playing with multiprocessing for a while now, and I have
> some familiarity with Pool. Apparently, arguments passed to a Pool
> subprocess must be able to be pickled.
multiprocessing's use of pickle is not limited to Pool. For instance, objects put into a multiprocessing.Queue are also pickled, as are the args to a multiprocessing.Process. So if you're going to use multiprocessing, you're going to use pickle, and you need pickleable objects.
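To make that concrete, here's a minimal sketch (the dict contents are just made-up example data) showing that anything you'd hand to Pool.map, Queue.put, or Process(args=...) has to survive a pickle round trip, and that some things (like a lambda) don't:

```python
import pickle

# Arguments handed to Pool.map, Queue.put, or Process(args=...) all
# travel through pickle, so they must be pickleable.
args = {"weights": [0.1, 0.5], "epochs": 10}
restored = pickle.loads(pickle.dumps(args))
print(restored == args)  # True

# Not everything qualifies: a lambda, for instance, can't be pickled.
try:
    pickle.dumps(lambda x: x * x)
    print("pickled")
except Exception:
    print("lambda is not pickleable")
```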
> Pickling is still a pretty
> vague process to me, but I can see that you have to write custom
> __reduce__ and __setstate__ methods for your objects.
Well, that's only if one's objects don't support pickle by default. A lot of classes do without any need for custom __reduce__ and __setstate__ methods. Since you're apparently not too familiar with pickle, I don't want you to get the false impression that it's a lot of trouble. I've used pickle a number of times and never had to write custom methods for it.
> Now, I don't know that I actually HAVE to pass my neural network and
> input data as copies -- they're both READ-ONLY objects for the
> duration of an evaluate function (which can go on for quite a while).
> So, I have also started to investigate shared-memory approaches. I
> don't know how a shared-memory object is referenced by a subprocess
> yet, but presumably you pass a reference to the object, rather than
> the whole object. Also, it appears that subprocesses also acquire a
> temporary lock over a shared memory object, and thus one process may
> well spend time waiting for another (individual CPU caches may
> sidestep this problem?) Anyway, an implementation of a shared-memory
> ndarray is here:
There's no standard shared memory implementation for Python. The mmap module is as close as you get. I wrote & support the posix_ipc and sysv_ipc modules which give you IPC primitives (shared memory and semaphores) in Python. They work well (IMHO) but they're *nix-only and much lower level than multiprocessing. If multiprocessing is like a kitchen well stocked with appliances, posix_ipc (and sysv_ipc) is like a box of sharp knives.
Note that mmap and my IPC modules don't expose Python objects. They expose raw bytes in memory. You're still going to have to jump through some hoops (...like pickle) to turn your Python objects into a bytestream and vice versa.
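Here's a rough sketch of what those hoops look like with the stdlib's mmap (an anonymous map standing in for a real shared region; the length-prefix convention is just one way to do it):

```python
import mmap
import pickle

# Anonymous memory map: raw bytes, no Python objects. A real
# parent/child setup would share this region across fork().
buf = mmap.mmap(-1, 4096)

data = {"weights": [0.1, 0.5, -0.3]}
blob = pickle.dumps(data)

# Write a 4-byte length prefix, then the pickled bytes...
buf.seek(0)
buf.write(len(blob).to_bytes(4, "big") + blob)

# ...and read them back, as the other process would.
buf.seek(0)
size = int.from_bytes(buf.read(4), "big")
print(pickle.loads(buf.read(size)))
buf.close()
```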
What might be easier than fooling around with boxes of sharp knives is to convert your ndarray objects to Python lists. Lists are pickle-friendly and easy to turn back into ndarray objects once they've crossed the pickle boundary.
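Something like this, assuming NumPy is installed (the array contents are arbitrary example values):

```python
import pickle
import numpy as np  # assumes NumPy is available

arr = np.arange(6, dtype=float).reshape(2, 3)

# ndarray -> plain list crosses the pickle boundary easily...
as_list = arr.tolist()
# ...and comes back out the other side as an ndarray again.
restored = np.array(pickle.loads(pickle.dumps(as_list)))

print(restored.shape)  # (2, 3)
```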
> When should one pickle and copy? When to implement an object in
> shared memory? Why is pickling apparently such a non-trivial process
> anyway? And, given that multi-core CPU's are apparently here to stay,
> should it be so difficult to make use of them?
My answers to these questions:
2) In Python, almost never unless you're using a nice wrapper like shmarray.py
3) I don't think it's non-trivial =)
4) No, definitely not. Python will only get better at working with multiple cores/CPUs, but there's plenty of room for improvement on the status quo.
Hope this helps