[Twisted-Python] Question about processes in python
Hello everyone, I'd like to run a python function in a new process, like python 2.6's multiprocessing does. I wanted to use ProcessProtocol, but it seems to only work for executing executables and interacting with them using stdin/stdout/other fds. Is there a way of running a python piece of code in a new process? Thank you, Gabriel
Hi,
Well, the only way to run Python code is to run the Python interpreter in a process and feed it the python code you want to run (using the -c option). Cheers, Reza -- Reza Lotun mobile: +44 (0)7521 310 763 email: rlotun@gmail.com work: reza@tweetdeck.com twitter: @rlotun
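For readers following along, the "run the interpreter with -c" approach could look roughly like the sketch below. It assumes Twisted is installed; the names CHILD_CODE, child_argv, run_in_child and Collect are made up for illustration, while reactor.spawnProcess and ProcessProtocol are the Twisted pieces already mentioned in this thread.

```python
import os
import sys

# Hypothetical snippet to run in the child; any self-contained source works.
CHILD_CODE = "print(sum(range(10)))"


def child_argv(code):
    """Build the argv for a fresh interpreter that runs `code` via -c."""
    return [sys.executable, "-c", code]


def run_in_child(code):
    """Spawn the child under the reactor; return a Deferred firing with its stdout."""
    # Twisted is imported here so child_argv() stays usable without it.
    from twisted.internet import defer, protocol, reactor

    class Collect(protocol.ProcessProtocol):
        """Illustrative protocol: gather stdout, report it when the child ends."""

        def __init__(self):
            self.done = defer.Deferred()
            self.chunks = []

        def outReceived(self, data):
            self.chunks.append(data)

        def processEnded(self, reason):
            # Fires whatever the exit status was; a real protocol would
            # inspect `reason` and errback on failure.
            self.done.callback(b"".join(self.chunks))

    proto = Collect()
    argv = child_argv(code)
    reactor.spawnProcess(proto, argv[0], argv, env=os.environ)
    return proto.done
```

Inside a running reactor, run_in_child(CHILD_CODE).addCallback(print) would print whatever the child wrote to stdout. The child receives only source text, so it cannot see objects that live in the parent.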
Hi Reza, yes, I'd thought of that, but I can't get the function's code. I tried the inspect module, but it only works if the code is written to disk; I'd rather not have to first write the code to disk just to be able to turn it into a string and feed it to python -c. Any ideas on how I could do this? Thanks, Gabriel
You don't have to write it to disk - you can use cStringIO. -- Reza Lotun
Hello, 2010/4/12 Reza Lotun <rlotun@gmail.com>:
You don't have to write it to disk - you can use cStringIO.
That's interesting. :) You would write the string to a buffer in memory, and then... how do you pass it to the child process? As far as I know, the child could not get a pointer to the same location in memory. You would still need a socket, or something like that. Am I wrong? alex
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
-- Alexandre Quessy http://alexandre.quessy.net/
On 01:24 pm, alexandre@quessy.net wrote:
That's interesting. :) You would write the string to a buffer in memory, and then... how do you pass it to the child process? As far as I know, the child could not get a pointer to the same location in memory. You would still need a socket, or something like that. Am I wrong?
Sticking with "python -c" is probably easiest. Jean-Paul
You don't have to write it to disk - you can use cStringIO.
Sorry, please disregard this - I was thinking about something else. The way I've done Python process spawning before was to have my code predefined in a string, and then launched via a ProcessProtocol. You can do tricks whereby you have your code actually defined in a string, then you generate your Python source to run tests/etc. on *that*, and then you send your code-string to python -c when you actually want to run it in a separate process. I remember playing around with inspect before, but I gave up at some point and took the path of least resistance. Cheers, Reza
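A minimal sketch of the "code predefined in a string" trick described above, using only the standard library. WORKER_SOURCE, load_source and argv_for_child are hypothetical names chosen for this illustration; the resulting argv is what one would hand to reactor.spawnProcess.

```python
import sys

# Hypothetical worker source, kept as a string so it can be shipped to a child.
WORKER_SOURCE = """
def work(n):
    return n * 2
"""


def load_source(source):
    """Execute the source in a fresh namespace so it can be tested in-process."""
    namespace = {}
    exec(source, namespace)
    return namespace


def argv_for_child(source, call):
    """Build an argv that defines the code in a child interpreter, then runs `call`."""
    return [sys.executable, "-c", source + "\n" + call]
```

Tests can call load_source(WORKER_SOURCE)["work"] directly, while argv_for_child(WORKER_SOURCE, "print(work(21))") produces the command line for the separate process. The same string serves both purposes, so there is no need to recover source from a live function with inspect.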
Hi Gabriel, You might want to try Python's multiprocessing code, it works very well. I've had pretty good luck using it with Twisted inside the processes too. The multiprocessing library handles the work of setting up the send and receive file descriptors. -J
Hi Jason, yes, I was going to use that, but I read in several threads that it didn't work correctly with Twisted. Gabriel
On 2010.04.12 16:23:54 +0200, Gabriel Rossetti wrote:
yes, I was going to use that but I read in several threads that this didn't work correctly in Twisted
IMO Jason probably has a lurking bug that he just hasn't noticed so far. Same thing with using subprocess instead of the Twisted equivalent inside a Twisted reactor. IMX it works 99% of the time. Murphy's Law says it'll pass cursory testing and then fail in production. -- David Ripton dripton@ripton.net
Haven't had any issues yet. Twisted imports occur inside the process function. The app was originally written as a purely blocking multiprocessing app and rewritten to use Twisted inside the sub-processes. It's passed all automated and hand tests without an issue. Is there a reason importing Twisted inside sub-process should not work? -J
I will say importing your reactor (from twisted.internet import reactor) must be done inside the sub-process or you won't get what you want. -J
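Jason's point about importing the reactor only inside the sub-process could be sketched as follows. This is an illustration under assumptions, not his actual code: engine, make_engines and run_engines are invented names, Twisted is assumed to be installed in the children, and the "engine" here does nothing but start and stop its reactor.

```python
import multiprocessing


def engine(name):
    """Body of one child process; runs its own reactor until stopped."""
    # Import the reactor here, inside the child, so the parent never
    # installs a reactor that the child would then inherit.
    from twisted.internet import reactor

    def announce():
        print("engine %s: reactor running" % name)
        # A real engine would start consuming work here; this one just stops.
        reactor.callLater(0, reactor.stop)

    reactor.callWhenRunning(announce)
    reactor.run()


def make_engines(names):
    """Create (but do not start) one Process per engine name."""
    return [
        multiprocessing.Process(target=engine, args=(name,), name=name)
        for name in names
    ]


def run_engines(names):
    """Start every engine, then wait for all of them to finish."""
    engines = make_engines(names)
    for proc in engines:
        proc.start()
    for proc in engines:
        proc.join()
    return engines
```

Calling run_engines(["a", "b"]) from under an `if __name__ == "__main__":` guard would start two children, each with its own reactor. Note that this only uses multiprocessing as a launcher; it does not address the concerns raised elsewhere in the thread about mixing multiprocessing's own pipes with a running reactor.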
On 04/12/2010 04:39 PM, Jason J. W. Williams wrote:
Haven't had any issues yet. Twisted imports occur inside the process function. The app was originally written as a purely blocking multiprocessing app and rewritten to use Twisted inside the sub-processes. It's passed all automated and hand tests without an issue. Is there a reason importing Twisted inside sub-process should not work?
When I last looked at it, multiprocessing did awful things like fork'ing and not re-execing the interpreter in the child process, which seemed like an absolute disaster waiting to happen for many types of objects which the child process inherits. Does it still do that? I guess what you're doing will work though. In that setup, where the multiprocessing code is the absolute first thing you call, you're essentially using it as a helper to fork off the child process and set up the communication pipes (see the example I just posted for a more explicit version). There's the issue, of course, that any multiprocessing code which writes to the pipes (or whatever) used for sending results will block, and so block the reactor.
Yeah...our code doesn't actually use the pipes. They get fed over AMQP (txAMQP). We're essentially using multiprocessing as a sub-proc supervisor to spin-up "processing" engines that consume the AMQP queues they're configured with. -J
On 2010.04.12 09:39:21 -0600, Jason J. W. Williams wrote:
Haven't had any issues yet. Twisted imports occur inside the process function. The app was originally written as a purely blocking multiprocessing app and rewritten to use Twisted inside the sub-processes. It's passed all automated and hand tests without an issue. Is there a reason importing Twisted inside sub-process should not work?
Here's JP's canonical answer: http://stackoverflow.com/questions/1948641/twisted-threading-with-subprocess... I've seen this problem in real code. We had a PyGTK + Twisted program that erroneously used subprocess in one place. 2% of the time, it caused an exception. 98% of the time, it worked fine. Classic race condition. Could be you have a similar bug but it never actually manifests on your combination of code, OS, and hardware. Hard to say. -- David Ripton dripton@ripton.net
The comment about passing installSignalHandlers=False to reactor.run() is good to know. Are the signal handlers Twisted installs used by Twisted for anything besides reactor.spawnProcess() or other tasks related to sub-process management? -J
On Apr 12, 2010, at 12:06 PM, David Ripton wrote:
Here's JP's canonical answer:
http://stackoverflow.com/questions/1948641/twisted-threading-with-subprocess...
I've seen this problem in real code. We had a PyGTK + Twisted program that erroneously used subprocess in one place. 2% of the time, it caused an exception. 98% of the time, it worked fine. Classic race condition. Could be you have a similar bug but it never actually manifests on your combination of code, OS, and hardware. Hard to say.
I've noted this in a comment on the stackoverflow answer, but it bears repeating:
This was a long-standing bug in Twisted which has since been fixed on trunk, although the fix hasn't made it into a release yet: <http://twistedmatrix.com/trac/ticket/733>. Starting with the next release (Twisted 10.1, we hope), you should be able to do this without getting this particular type of error.
Still, I wouldn't recommend using the subprocess module in a Twisted application, and multiprocessing even less. 'subprocess' uses select(), which means that if you are running processes in a server handling a large number of connections with a reactor that you've selected for that job, you will occasionally notice that '.communicate()' will blow up because your file descriptors are too big to fit into a select() call. Its handling of EINTR and EAGAIN is less consistent than spawnProcess's, and you can't independently handle subprocess termination and subprocess-closing-its-output, so certain types of bugs are much trickier to track down... and I'm sure there are other issues, but I haven't had an opportunity to thoroughly audit it.
Multiprocessing has its own subtly *differently* wonky implementation of subprocess spawning (it uses os.fork(), not the subprocess module), it uses pickle to move objects between processes, and it seems to spawn threads internally, which means it depends on correct thread/process (and hey, maybe thread/pickle too) interaction.
Both of these will work reasonably well for small applications; multiprocessing is *great* if you have a simple, straightforward little multithreaded thing that you want to make multiprocess in order to take advantage of multiple cores. But by the time you've already taken the trouble to learn how to use Twisted, IMHO spawnProcess is a lot more powerful and a lot less trouble than either of these solutions. Even more so if you use a higher-level abstraction over it, like ampoule: <https://launchpad.net/ampoule>.
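To make the select() remark concrete: the sketch below probes whether a given descriptor number is one select() can handle at all. It assumes a POSIX build of CPython, where select.select() rejects descriptor numbers at or above FD_SETSIZE (commonly 1024) with ValueError before making the system call. fits_in_select is a name invented here, and the sketch shows only the limit itself, not what any particular version of subprocess does.

```python
import select


def fits_in_select(fd):
    """Return True if select.select() accepts this descriptor number."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # Raised for descriptor numbers that do not fit in an fd_set.
        return False
    except OSError:
        # The number is in range, but the descriptor is closed or unusable.
        return True
    return True
```

A busy server that already holds more than FD_SETSIZE open descriptors will hand out numbers in the rejected range, which is the situation described above for code that relies on select().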
That all makes sense to me if you're using it to coordinate between the processes using multiprocessing's mechanisms (which I assume most do). We just aren't. Once the processes are spawned they do their own thing without any knowledge of each other. Is there something I should be expecting to bite me in the rear by folding Twisted into this architecture? -J
On Mon, 2010-04-12 at 14:23 -0400, Glyph Lefkowitz wrote:
Still, I wouldn't recommend using the subprocess module in a Twisted application, and multiprocessing even less. 'subprocess' uses select(), which means that if you are running processes in a server handling a large number of connections with a reactor that you've selected for that job, you will occasionally notice that '.communicate()' will blow up because your file descriptors are too big to fit into a select() call.
I'm pretty sure Twisted's process.py uses select() in some cases... possibly on import though, which would be less of an issue, I forget.
Itamar Turner-Trauring wrote:
I'm pretty sure Twisted's process.py uses select() in some cases... possibly on import though, which would be less of an issue, I forget.
Yeah, I don't think it's much of an issue. It does do it on import, and then later as a fallback if that import-time check found “broken versions of linux” where “write-only pipe are detected as readable.” This is apparently Linux < 2.6.11. I suppose if someone does find it's an issue for them, it wouldn't be too hard to fix, although perhaps they'd find it easier to simply upgrade their OS... -Andrew.
On 13 April 2010 03:39, Jason J. W. Williams <jasonjwwilliams@gmail.com> wrote:
Haven't had any issues yet. Twisted imports occur inside the process function. The app was originally written as a purely blocking multiprocessing app and rewritten to use Twisted inside the sub-processes. It's passed all automated and hand tests without an issue. Is there a reason importing Twisted inside sub-process should not work?
If you just use Twisted in the subprocesses, that sounds like it should be fine to me. If you want a more twisty solution, there's ampoule: https://launchpad.net/ampoule Cheers, mwh
On 04/12/2010 01:09 PM, Gabriel Rossetti wrote:
Hello everyone,
I'd like to run a python function in a new process, like python 2.6's multiprocessing does. I wanted to use ProcessProtocol, but it seems to only work for executing executables and interacting with them using stdin/stdout/other fds. Is there a way of running a python piece of code in a new process?
I frequently use a model like this:

#!/usr/bin/python
import os
import sys

from twisted.internet import defer, protocol, reactor


# function to run in a child process
def work():
    while True:
        cmd = sys.stdin.readline()
        if not cmd:
            break
        # do something
        sys.stdout.write('result\n')
        sys.stdout.flush()


class MyProcessProtocol(protocol.ProcessProtocol):
    def __init__(self):
        self.buffer = b''
        self.queue = []

    def outReceived(self, data):
        # the child's stdout arrives in arbitrary chunks; split it into lines
        self.buffer += data
        while b'\n' in self.buffer:
            reply, self.buffer = self.buffer.split(b'\n', 1)
            d, c = self.queue.pop(0)
            d.callback(reply)
            if self.queue:
                # send the next queued command
                self.transport.write(self.queue[0][1] + b'\n')

    def cmd(self, c):
        d = defer.Deferred()
        if self.queue:
            # just enqueue it
            self.queue.append((d, c))
        else:
            # hang onto the deferred and also send the cmd
            self.queue.append((d, c))
            self.transport.write(c + b'\n')
        return d


# placeholder callbacks; replace with whatever should handle the reply
def result(reply):
    print(reply)


def failed(failure):
    failure.printTraceback()


def parent():
    proto = MyProcessProtocol()
    reactor.spawnProcess(
        proto,
        sys.executable,
        (sys.executable, os.path.abspath(__file__), 'WORKER'),
    )
    # write some commands to the child
    d = proto.cmd(b'something')
    d.addCallback(result).addErrback(failed)


def twisted_main():
    reactor.callWhenRunning(parent)
    reactor.run()


def main():
    args = sys.argv[1:]
    if args and args[0] == 'WORKER':
        # we're a child process
        work()
    else:
        twisted_main()


if __name__ == '__main__':
    main()

It's pretty simple; you need to be sure that the module does all the imports it needs in the child process. You can extend the above model to something more complex than simple line-based results, using e.g. JSON or python pickle, to handle exceptions, and so forth.
IMHO Twisted REALLY REALLY needs something built-in to do this. I'm aware of Ampoule; it didn't work for me when I tried it (which was a while ago) and having to define AMP commands to use it was a bit of a turn off for me.
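As one possible take on the "extend it with JSON" idea mentioned above, the framing could be handled by a small helper like this. It is a standard-library sketch, not part of the posted program; JSONLineBuffer and encode_message are hypothetical names, and it assumes one JSON document per line in UTF-8.

```python
import json


class JSONLineBuffer(object):
    """Accumulate raw bytes and decode one JSON message per complete line."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add a chunk of bytes; return the list of messages completed by it."""
        self._buffer += data
        messages = []
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            if line:
                messages.append(json.loads(line.decode("utf-8")))
        return messages


def encode_message(obj):
    """Serialize one message; json.dumps escapes newlines, so one line each."""
    return json.dumps(obj).encode("utf-8") + b"\n"
```

A protocol's outReceived could pass each chunk to feed() and fire one Deferred per returned message, and the worker could report failures as ordinary messages, for example {"error": "..."}, rather than relying on a bare result line.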
participants (11)
- Alexandre Quessy
- Andrew Bennetts
- David Ripton
- exarkun@twistedmatrix.com
- Gabriel Rossetti
- Glyph Lefkowitz
- Itamar Turner-Trauring
- Jason J. W. Williams
- Michael Hudson-Doyle
- Phil Mayers
- Reza Lotun