How does Twisted Web Multiprocess work?
Hi,

I am exploring Twisted Web for my RESTful application. My application is stateless and involves storing and retrieving objects based on Object-ID. This application will run on a beefy (multicore, lots of memory) machine. However, not all of the APIs that the application calls on the underlying storage are async, so I cannot fully utilize Deferreds. This means there will be some blocking calls, and hence my primary interest is to use Twisted Web in multiprocessing mode.

I came across http://stackoverflow.com/questions/10077745/twistedweb-on-multicore-multipro...

However, I am not sure if it is the "correct" way of doing things. Hence I had some questions around it:

1. Is there an interface (similar to deferToThread) which allows me to execute a blocking call in a separate process?
2. Does the reactor synchronize access of all processes to the shared listen socket?
3. Is there sample code I can refer to where the application spawns subprocesses to handle HTTP requests?

Thanks in advance!
On 10 June 2015 at 03:01, Sagar Dixit <sagar.dixit@gmail.com> wrote:
1. Is there an interface (similar to deferToThread) which allows me to execute a blocking call in a separate process?
2. Does the reactor synchronize access of all processes to the shared listen socket?
3. Is there sample code I can refer to where the application spawns subprocesses to handle HTTP requests?
We've used SO_REUSEPORT (which is briefly mentioned on the SO page you linked to) in order to run multiple processes. On a recentish Linux the following approach has worked well for us:

    import socket

    from twisted.internet import reactor
    from twisted.internet.endpoints import AdoptedStreamServerEndpoint

    # Make a socket with SO_REUSEPORT set so that we can run multiple web
    # applications. This is easier to do from outside of Twisted as there's
    # not yet official support for setting socket options.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # The following might not work on older kernels, or SO_REUSEPORT
    # might not be available if Python was compiled with older headers.
    # In the former case you're out of luck, but you can define the
    # constant yourself if it's missing from Python's socket module.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    # Listen on all interfaces on port 1234.
    s.bind(('0.0.0.0', 1234))
    # Use a backlog of 50, which seems to be fairly common.
    s.listen(50)
    # Adopt this socket into something more Twisty.
    endpoint = AdoptedStreamServerEndpoint(reactor, s.fileno(), s.family)
    # Prevent garbage collection. This is something we discovered we
    # needed, and is perhaps something that AdoptedStreamServerEndpoint
    # ought to do itself.
    endpoint.socket = s
    # Create a service with the endpoint. (OurService stands in for our
    # own service class, which is not shown here.)
    service = OurService(endpoint)

(Derived from src/maasserver/eventloop.py in MAAS.)

We then use Upstart (Ubuntu 14.10 and before) or systemd (Ubuntu 15.04) to run multiple processes. For us this means 4, but you can have as many or as few as you need, for example 1 process per CPU core. I like this approach because we can use the system's facilities for process supervision instead of cobbling together our own.

One caveat is that the processes are running concurrently. If you're modifying shared resources you'll want to consider explicit locking where before you might have been safe with Twisted's single-threadedness.

Gavin.
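The code comments above mention defining the constant yourself when Python's socket module lacks it. A minimal standard-library sketch of that idea follows. The fallback value 15 is an assumption that holds for Linux on x86 and most other architectures (a few, such as MIPS and SPARC, use a different number), and `make_reuseport_listener` is a hypothetical helper name, not something from MAAS or Twisted.

```python
import socket

# Assumption: 15 is SO_REUSEPORT on Linux for x86 and most architectures.
# Use the real constant whenever Python's socket module provides it.
SO_REUSEPORT = getattr(socket, "SO_REUSEPORT", 15)


def make_reuseport_listener(host, port, backlog=50):
    """Return a listening TCP socket with SO_REUSEPORT set (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Raises OSError on kernels that do not support the option.
    s.setsockopt(socket.SOL_SOCKET, SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s


if __name__ == "__main__":
    # Two independent listeners on the same address and port.
    first = make_reuseport_listener("127.0.0.1", 0)
    port = first.getsockname()[1]
    second = make_reuseport_listener("127.0.0.1", port)
    print("both listening on port", port)
    first.close()
    second.close()
```

Without SO_REUSEPORT on both sockets, the second bind would normally fail with "Address already in use".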
Thanks Gavin,

Good to know how things are done at your end. However, why is Twisted not using SO_REUSEPORT internally? Is there any plan to integrate it into Twisted so that applications need not worry about it?

The problem with the code mentioned here: http://stackoverflow.com/questions/10077745/twistedweb-on-multicore-multipro... is that the requests are not equally distributed across all processes. This imbalance leads to underutilization of CPU cores. This issue is also discussed at https://lwn.net/Articles/542629/, which notes that it is not an issue with SO_REUSEPORT.

Does it mean that Twisted is using a single listening socket and all processes accept() on that socket?

Thanks

On Wed, Jun 10, 2015 at 2:23 AM, Gavin Panella <gavin@gromper.net> wrote:
_______________________________________________ Twisted-web mailing list Twisted-web@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
-- ssdixit
On 10 June 2015 at 23:39, Sagar Dixit <sagar.dixit@gmail.com> wrote:
Thanks Gavin
Good to know how things are done at your end. thanks! However, why is Twisted not internally using SO_REUSEPORT ? Is there any plan to integrate it inside twisted so that application need not worry about it ?
I suspect it's only because it's fairly new in Linux and support hasn't been written/merged into Twisted. Someone else might have detail to add to that; in this regard I'm only a user of Twisted (for now... maybe I'll get an itch to address this in Twisted itself if someone else hasn't already done so).
The problem with code mentioned here: http://stackoverflow.com/questions/10077745/twistedweb-on-multicore-multipro... is that the requests are not equally distributed across all processes. This imbalance leads to underutilization of CPU cores. This issue is also discussed on https://lwn.net/Articles/542629/ and that this is not an issue with SO_REUSEPORT
That's interesting. We used SO_REUSEPORT for convenience but it's good to know that it's balancing nicely too :)
Does it mean that Twisted is using single listening socket and all processes accept() on that socket ?
Each process gets its own socket, but the kernel allows all of them to bind to the same local address:port. If you need more processes you can start them without ceremony and it'll all Just Work. Likewise when killing processes.

Done right, you can have seamless upgrades of your service by restarting one process at a time. No need for special support from a supervisor.

Gavin.
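The per-process sockets described above can be tried out with the standard library alone. The following is a rough sketch, not the MAAS setup: it assumes a Unix-like host with SO_REUSEPORT support, uses `multiprocessing` in place of Upstart or systemd, and all names (`worker`, `pick_free_port`, `run_demo`) are hypothetical. Each worker replies with its PID, which is the same check Sagar describes for spotting imbalance.

```python
import collections
import multiprocessing
import os
import socket

HOST = "127.0.0.1"


def worker(port, ready):
    """Bind this process's own SO_REUSEPORT socket and answer with our PID."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((HOST, port))
    s.listen(50)
    ready.release()  # tell the parent this worker is accepting
    while True:
        conn, _ = s.accept()
        with conn:
            conn.sendall(str(os.getpid()).encode())


def pick_free_port():
    """Ask the kernel for an unused port (small race: it is released again)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((HOST, 0))
        return probe.getsockname()[1]


def run_demo(n_workers=4, n_requests=40):
    """Start workers, send requests, return (worker PIDs, answering PIDs)."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like host
    port = pick_free_port()
    ready = ctx.Semaphore(0)
    procs = [ctx.Process(target=worker, args=(port, ready), daemon=True)
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    answered = []
    try:
        for _ in procs:
            if not ready.acquire(timeout=5):
                raise RuntimeError("a worker failed to start listening")
        for _ in range(n_requests):
            with socket.create_connection((HOST, port), timeout=5) as c:
                answered.append(int(c.recv(64)))
    finally:
        for p in procs:
            p.terminate()
            p.join()
    return [p.pid for p in procs], answered


if __name__ == "__main__":
    _, answered = run_demo()
    # Shows how many requests each worker PID handled.
    print(collections.Counter(answered))
```

How evenly the counts come out depends on the kernel's connection hashing, so the sketch makes no claim about the exact distribution.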
On 10/06/15 23:39, Sagar Dixit wrote:
Thanks Gavin
Good to know how things are done at your end. thanks! However, why is Twisted not internally using SO_REUSEPORT ? Is there any plan to integrate it inside twisted so that application need not worry about it ?
The application will always need to "worry about it", even if that just means making a single API call to re-start itself inside a process pool. Twisted can't just fork and re-exec app code that isn't expecting it.
There is a Twisted developer guide for communicating with child processes: http://twistedmatrix.com/documents/current/core/howto/process.html

I'm not sure I understand why having multiple processes listen on the same socket is desirable in your case. From reading the articles you linked to, it seems like it is only useful in the case where a forward proxy becomes a bottleneck. Is that the case with your application?

Thanks,
Carl Waldbieser
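On the first question in the thread, nobody names a process-based equivalent of deferToThread. As one possible direction outside of Twisted, the standard library's `concurrent.futures.ProcessPoolExecutor` can run blocking calls in separate processes. This is only a sketch under assumptions: `blocking_fetch` is a hypothetical stand-in for the blocking storage call, `fetch_many` is an illustrative helper, and the "fork" start method assumes a Unix-like host.

```python
import multiprocessing
import os
import time
from concurrent.futures import ProcessPoolExecutor


def blocking_fetch(object_id):
    """Hypothetical stand-in for a blocking storage call."""
    time.sleep(0.05)
    return object_id, os.getpid()


def fetch_many(object_ids, max_workers=4):
    """Run blocking_fetch for each ID in a pool of worker processes."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like host
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        futures = [pool.submit(blocking_fetch, oid) for oid in object_ids]
        # result() blocks the caller until each worker has finished.
        return [f.result() for f in futures]


if __name__ == "__main__":
    for object_id, pid in fetch_many(["a", "b", "c", "d"]):
        print(object_id, "handled by process", pid)
```

Inside a Twisted application, calling `result()` directly would block the reactor. One option might be to pass `future.result` to `deferToThread` so that a thread does the waiting, but that combination is untested here.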
Hi Carl,

The only motivation to use multiple processes is to be able to parallelize blocking calls (since event-driven handling does not help with them).

In my case a forward proxy is not a bottleneck. However, when my code spawns multiple processes as per http://stackoverflow.com/questions/10077745/twistedweb-on-multicore-multipro... and I run multiple concurrent clients, I see that the server's processes are not efficiently utilized. Specifically, even when there are outstanding requests, one of the processes accepts far more requests while the other processes sit idle. I was able to confirm this by printing the process IDs of the processes that handle each request.

This sort of imbalance could lead to underutilization of CPU, as also discussed here: https://lwn.net/Articles/542629/

Hence I was curious to know how Twisted multiprocessing works. It seems SO_REUSEPORT is not used within Twisted and, as per Gavin's reply, can be used "outside" of Twisted.

On Wed, Jun 10, 2015 at 5:42 PM, Carl Waldbieser <cwaldbieser@gmail.com> wrote:
-- ssdixit
participants (4)
- Carl Waldbieser
- Gavin Panella
- Phil Mayers
- Sagar Dixit