[Twisted-Python] Re: Twisted and the Posh Module
On Monday 14 March 2005 10:13 am, twisted-python-request@twistedmatrix.com wrote:
Has anyone tried playing with Twisted and the posh module? I need to do some CPU-intensive work inside the reactor, and that work unfortunately holds on to the GIL. I was thinking of just writing a deferToPosh() type of method in the spirit of deferToThread(), but forking on each operation would be pretty expensive, which brings up the question of having process pools.
Has anyone done anything like this? I don't really need any interaction between the worker processes and Twisted, so in theory I could spawn off some worker processes early and just communicate queues of tasks to complete, independent of Twisted and the reactor.
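For illustration, here is a minimal sketch of the pre-forked pool the question describes: workers are forked once up front, and tasks travel over pipes as length-prefixed pickles, so no per-operation fork is needed. The ProcessPool name and framing protocol are invented for this sketch, and shutdown handling is omitted.

    import os
    import pickle
    import struct

    def _send(fd, obj):
        # Frame each pickled object with a 4-byte length prefix.
        data = pickle.dumps(obj)
        os.write(fd, struct.pack("!I", len(data)) + data)

    def _recv(fd):
        # Read back one length-prefixed pickle.
        (size,) = struct.unpack("!I", os.read(fd, 4))
        chunks = []
        while size:
            chunk = os.read(fd, size)
            chunks.append(chunk)
            size -= len(chunk)
        return pickle.loads(b"".join(chunks))

    class ProcessPool:
        def __init__(self, func, size=2):
            self.workers = []
            for _ in range(size):
                task_r, task_w = os.pipe()
                result_r, result_w = os.pipe()
                if os.fork() == 0:
                    # Child: serve tasks forever (no shutdown handling here).
                    os.close(task_w)
                    os.close(result_r)
                    while True:
                        args = _recv(task_r)
                        _send(result_w, func(*args))
                os.close(task_r)
                os.close(result_w)
                self.workers.append((task_w, result_r))

        def run(self, n, *args):
            # Blocking: hand args to worker n and wait for its result.
            task_w, result_r = self.workers[n]
            _send(task_w, args)
            return _recv(result_r)

From Twisted, the blocking run() could be wrapped with twisted.internet.threads.deferToThread(pool.run, n, ...) so the reactor keeps spinning while the worker computes; the GIL is released while the waiting thread sits in os.read().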
I think having some sort of process pool management in Twisted is a great idea, especially with multi-core CPUs emerging on the scene. I have access to a dual-core Pentium Prescott CPU and it would be great to keep both cores humming on a certain CPU-intensive project I'm considering.

However, I'm not sure whether the best way to go about it would be the posh module, pypar (see http://datamining.anu.edu.au/~ole/pypar/), or just using Perspective Broker as an underlying message-passing mechanism over UNIX sockets and/or TCP. One thought might be to have a single master process start up and act as a PB server and process pool manager. Subsidiary processes could then make authenticated PB connections to the server to "volunteer" for work in the process pool.

Note that pypar lets you easily find out how many CPUs you have under kernel control, with pypar.size(). Thus, the main process could start the process pool by spawning a subsidiary "volunteer" process for each CPU core present.

--
Ed Suominen
Registered Patent Agent
Open-Source Software Author
(yes, both...)
Web Site: http://www.eepatents.com
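A minimal, unauthenticated sketch of the master/"volunteer" arrangement described above (a real version would use twisted.cred for the authenticated connections Ed mentions). The two halves are separate programs shown in one listing; PoolMaster, Volunteer, and cpu_bound are hypothetical names.

    # --- master process ---
    from twisted.internet import reactor
    from twisted.spread import pb

    class PoolMaster(pb.Root):
        def __init__(self):
            self.idle = []    # RemoteReferences to volunteer workers
            self.tasks = []   # queued argument tuples

        def remote_volunteer(self, worker):
            # The volunteer's Referenceable arrives here as a
            # pb.RemoteReference; we can call back into it later.
            self.idle.append(worker)
            self.dispatch()

        def addTask(self, *args):
            self.tasks.append(args)
            self.dispatch()

        def dispatch(self):
            while self.idle and self.tasks:
                worker = self.idle.pop()
                args = self.tasks.pop(0)
                worker.callRemote("run", *args).addCallback(
                    self.taskDone, worker)

        def taskDone(self, result, worker):
            self.idle.append(worker)   # the worker is free again
            self.dispatch()
            return result

    reactor.listenTCP(8789, pb.PBServerFactory(PoolMaster()))
    reactor.run()

    # --- volunteer process (run one per CPU core, e.g. per pypar.size()) ---
    class Volunteer(pb.Referenceable):
        def remote_run(self, *args):
            return cpu_bound(*args)    # hypothetical CPU-bound function

    factory = pb.PBClientFactory()
    reactor.connectTCP("localhost", 8789, factory)
    factory.getRootObject().addCallback(
        lambda root: root.callRemote("volunteer", Volunteer()))
    reactor.run()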
On Mon, 14 Mar 2005 10:59:51 -0800, Ed Suominen <general@eepatents.com> wrote:
> I think having some sort of process pool management in Twisted is a great idea, especially with multi-core CPUs emerging on the scene. [...]
Quotient currently uses Twisted's spawnProcess() to start up a worker process and communicates with it using PB over stdin/stdout. The child process performs fulltext indexing and searching on behalf of the parent process.

The code is currently about half transformed into a general process-pool service. Unfortunately I have not had time to work on this in several weeks, and probably will not have time to finish it for at least several more. If anyone is interested in picking up development and finishing it, I'd be more than happy to accept patches :) Once this is done, it should go into Twisted, since it is clearly a feature quite a few Twisted users want.

Most of the code is currently in a module on a branch in the Quotient repository, in particular the ProcessController and ServiceController classes:

http://divmod.org/cvs/branches/exarkun/runnerup-2357/atop/runnerup.py?rev=7515&view=markup

Jp
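For a flavor of the spawnProcess()-plus-stdio pattern Jp describes (this is not Quotient's actual code, which speaks PB; here a simpler length-prefixed pickle protocol stands in, and worker.py is a hypothetical child script that speaks the same framing on its stdout):

    import pickle
    import struct
    import sys
    from twisted.internet import protocol, reactor

    class WorkerProtocol(protocol.ProcessProtocol):
        """Parent side: pickled tasks go down the child's stdin,
        pickled results come back up its stdout."""

        def __init__(self):
            self.buf = b""

        def connectionMade(self):
            self.sendTask(("index", "some document text"))  # hypothetical task

        def sendTask(self, task):
            data = pickle.dumps(task)
            # transport.write() feeds the child's stdin
            self.transport.write(struct.pack("!I", len(data)) + data)

        def outReceived(self, data):
            # Reassemble length-prefixed pickles from the child's stdout.
            self.buf += data
            while len(self.buf) >= 4:
                (size,) = struct.unpack("!I", self.buf[:4])
                if len(self.buf) < 4 + size:
                    break
                result = pickle.loads(self.buf[4:4 + size])
                self.buf = self.buf[4 + size:]
                print("result from worker:", result)

    reactor.spawnProcess(WorkerProtocol(), sys.executable,
                         [sys.executable, "worker.py"])  # hypothetical script
    reactor.run()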
Ed Suominen wrote:
> However, I'm not sure the best way to go about it would be with the posh module, pypar, or just using Perspective Broker as an underlying message-passing mechanism [...]
> Note that pypar lets you easily find out how many CPUs you have under kernel control [...]
My goal is somewhat similar, in that I have multiple CPUs that aren't being used very much. The other problem is that certain libraries I use (PIL) grab the GIL for long-running operations like image modifications. I can thread them off, but PIL still acquires the GIL liberally and makes the server unresponsive for the duration of the operation.

I too had looked at using Perspective Broker to communicate with separate "worker" processes, and the only reason I'm not excited about that option is that the traffic between the master and worker processes involves a lot of large binary strings. Shared memory seemed more efficient than PB for transferring those strings.

So for that reason, finding out how many CPUs I have isn't that important, because I'll still want more worker processes than I have CPUs.
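A minimal sketch of the shared-memory idea, assuming a POSIX system: an anonymous mmap created before fork() is visible to both processes, so the large binary strings never cross a socket. The buffer size and the upper() "transform" are illustrative stand-ins.

    import mmap
    import os

    BUFSIZE = 1024 * 1024
    # An anonymous shared mapping created before fork() is shared with
    # the child, so the payload is modified in place, never copied.
    shared = mmap.mmap(-1, BUFSIZE)

    shared.seek(0)
    shared.write(b"raw image bytes here")   # parent fills the buffer

    pid = os.fork()
    if pid == 0:
        # Child: transform the data in place.  upper() stands in for a
        # real PIL operation on the pixel data.
        shared.seek(0)
        pixels = shared.read(20)
        shared.seek(0)
        shared.write(pixels.upper())
        os._exit(0)

    os.waitpid(pid, 0)      # demo only; a real server would watch a pipe
    shared.seek(0)
    print(shared.read(20))  # b'RAW IMAGE BYTES HERE'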
On Mar 14, 2005, at 3:24 PM, Ken Kinder wrote:
> [...] Shared memory seemed more efficient than PB for transferring those strings.
> So for that reason, finding out how many CPUs I have isn't that important, because I'll still want more worker processes than I have CPUs.
Ideally we'd have a message-passing system that could use multiple backends (i.e. shared memory, mmap, or sockets). Using sockets is probably the better solution for now -- you're likely to do a lot of copying anyway, because it's Python and PIL :)

With sockets, you can scale right up to multiple computers; with shared memory, you're stuck on a single box. The API that POSH exposes (proxied non-blocking objects) can't scale well to multiple machines, whereas a socket-based API could later be scaled down to use an efficient shared-memory implementation.

-bob
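A hypothetical sketch of the backend-neutral channel Bob describes: callers see only send()/recv(), so a shared-memory implementation could later replace the socket one without touching caller code. The class names and framing are invented here.

    import pickle
    import socket
    import struct

    class SocketChannel:
        """Socket backend: works across machines, at the cost of copying."""

        def __init__(self, sock):
            self.sock = sock

        def send(self, obj):
            data = pickle.dumps(obj)
            self.sock.sendall(struct.pack("!I", len(data)) + data)

        def recv(self):
            (size,) = struct.unpack("!I", self._read(4))
            return pickle.loads(self._read(size))

        def _read(self, n):
            chunks = []
            while n:
                chunk = self.sock.recv(n)
                if not chunk:
                    raise EOFError("peer closed the channel")
                chunks.append(chunk)
                n -= len(chunk)
            return b"".join(chunks)

    # A SharedMemoryChannel with the same send()/recv() signature could
    # be swapped in later for workers on the same box.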
Bob Ippolito wrote:
> With sockets, you can scale right up to multiple computers; with shared memory, you're stuck on a single box. [...]
I'm not trying to distribute work among multiple boxes with this -- I already do that. :) The goal is to free up each node's GIL in an existing clustered application and make better use of multiple CPUs.

-Ken
participants (4)
- Bob Ippolito
- Ed Suominen
- Jp Calderone
- Ken Kinder