Re: [Python-Dev] Forking and Multithreading - enemy brothers
Pascal Chambon writes:
I don't really get it there... it seems to me that multiprocessing only requires picklability for the objects it needs to transfer, i.e. those given as arguments to the called function, and those put into multiprocessing queues/pipes. Global program data needn't be picklable - on Windows it gets wholly recreated by the child process, from Python bytecode.
So if you're having pickle errors, it must be because "object_from_module_xyz" itself is *not* picklable, maybe because it contains references to unpicklable objects. In such a case, properly implementing the pickle magic methods inside the object should do it, shouldn't it?
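[Editorial note: a minimal sketch of the "pickle magic methods" approach Pascal mentions, using __getstate__/__setstate__. The Wrapper class and its members are hypothetical stand-ins, not anything from the thread:]

```python
import pickle

class Wrapper:
    """Hypothetical wrapper holding one unpicklable member."""

    def __init__(self, config):
        self.config = config
        self._handle = lambda x: x  # stand-in for an unpicklable C++ object

    def __getstate__(self):
        # Serialize only the picklable instance data.
        state = self.__dict__.copy()
        del state["_handle"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate the unpicklable part from the surviving data.
        self._handle = lambda x: x

w = pickle.loads(pickle.dumps(Wrapper({"mode": "fast"})))
```

The round-trip succeeds even though `_handle` itself could never be pickled, because it is dropped on the way out and rebuilt on the way in.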
I'm also a long-time lurker (and in financial software, coincidentally). Pascal is correct here. We use a number of C++ libraries wrapped via Boost.Python to do various calculations. The typical function calls return wrapped C++ types. Boost.Python types are not, unfortunately, pickleable. For a number of technical reasons, and also unfortunately, we often have to load these libraries in their own process, but we want to hide this from our users. We accomplish this by pickling the instance data but importing the types fresh when we unpickle, all implemented in the magic pickle methods. We would lose any information that was dynamically added to the type in the remote process, but we simply don't do that. We very often have many unpickleable objects imported somewhere when we spin off our processes using the multiprocessing library, and this does not cause any problems.

Jesse Noller <jnoller <at> gmail.com> writes:
We already have an implementation that spawns a subprocess and then pushes the required state to the child. The fundamental need for things to be pickleable *all the time* kinda makes it annoying to work with.
This requirement puts a fairly large additional strain on working with unwieldy, wrapped C++ libraries in a multiprocessing environment. I'm not very knowledgeable on the internals of the system, but would it be possible to have some kind of fallback system whereby if an object fails to pickle we instead send information about how to import it? This has all kinds of limitations - it only works for importable things (i.e. not instances), it can potentially lose information dynamically added to the object, etc., but I thought I would throw the idea out there so someone knowledgeable can decide if it has any merit. Ben
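[Editorial note: Ben's "send information about how to import it" fallback can be sketched with `__reduce__` returning an import recipe instead of serialized state. All names here (`_reimport`, `LibrarySingleton`, `SINGLETON`) are hypothetical illustrations, not code from the thread:]

```python
import importlib
import pickle

def _reimport(module_name, name):
    # Runs on the unpickling side: rebuild the object by importing it.
    return getattr(importlib.import_module(module_name), name)

class LibrarySingleton:
    """Hypothetical module-level wrapper around unpicklable state."""

    def __init__(self):
        self._callback = lambda x: x + 1  # lambdas cannot be pickled

    def __reduce__(self):
        # Fallback: instead of serializing state, send "how to import it".
        return _reimport, (self.__module__, "SINGLETON")

SINGLETON = LibrarySingleton()

# Round-trips to the importable object itself, not a copy of its state.
restored = pickle.loads(pickle.dumps(SINGLETON))
```

As Ben notes, this only works for importable, module-level objects, and anything dynamically attached on the remote side is lost.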
On 04:58 pm, jaedan31@gmail.com wrote:
It's already possible to define pickling for arbitrary objects. You should be able to do this for the kinds of importable objects you're talking about, and perhaps even for some of the actual instances (though that depends on how introspectable they are from Python, and whether the results of this introspection can be used to re-instantiate the object somewhere else). Take a look at the copy_reg module. Jean-Paul
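[Editorial note: a short sketch of the copy_reg approach Jean-Paul points to (spelled `copyreg` in Python 3), assuming a third-party type whose constructor arguments can be recovered by introspection. `ForeignPoint` and `reduce_point` are hypothetical names:]

```python
import copyreg
import pickle

class ForeignPoint:
    """Stand-in for an extension type we cannot edit (hypothetical)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

def reduce_point(p):
    # Tell pickle how to rebuild the object on the other side:
    # call ForeignPoint(p.x, p.y) when unpickling.
    return ForeignPoint, (p.x, p.y)

# Register the reducer without touching the type itself.
copyreg.pickle(ForeignPoint, reduce_point)

q = pickle.loads(pickle.dumps(ForeignPoint(1, 2)))
```

The point of `copyreg` is that the pickling recipe lives outside the class, which is exactly what you need for types you don't control.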
Hello,

Some update about the spawnl() thingy: I've adapted the win32 code to have a new Unix Popen object, which works with a spawn() semantic. It's quite straightforward, and the multiprocessing call of a Python function works OK.

But I've run into some trouble: synchronization primitives. A Win32 semaphore can be "teleported" to another process via the DuplicateHandle() call. But Unix named semaphores don't work that way - instead, they must be opened with the same name by each spawned subprocess. The problem here is that the current semaphore C code is optimized to forbid semaphore sharing (other than via fork): use of (O_EXCL|O_CREAT) on opening, immediate unlinking of new semaphores...

So if we want to benefit from sync primitives with this spawn() implementation, we need a working named semaphore implementation too. What's best in your opinion? Editing the current multiprocessing semaphore's behaviour to allow (with specific options, attributes and methods) its use in this case? Or adding a new NamedSemaphore type like this one? http://semanchuk.com/philip/posix_ipc/

Regards,
Pascal
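[Editorial note: for context, the behaviour Pascal is proposing is what the stdlib's "spawn" start method eventually provides in Python 3.4+, where multiprocessing arranges for its semaphores to be shared with a freshly spawned child. A minimal sketch of that usage, with hypothetical function names (`use_semaphore`, `demo`):]

```python
import multiprocessing as mp

def use_semaphore(sem, q):
    # Child process: acquire the semaphore created in the parent.
    with sem:
        q.put("acquired")

def demo():
    # "spawn" re-imports the child from scratch, like the Windows
    # behaviour discussed in this thread, rather than forking.
    ctx = mp.get_context("spawn")
    sem = ctx.Semaphore(1)
    q = ctx.Queue()
    p = ctx.Process(target=use_semaphore, args=(sem, q))
    p.start()
    result = q.get(timeout=30)
    p.join()
    return result

if __name__ == "__main__":
    print(demo())
```

Note the `if __name__ == "__main__"` guard, which is mandatory under spawn because the child re-imports the main module.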
participants (3)
- Ben Walker
- exarkun@twistedmatrix.com
- Pascal Chambon