First of all, I would like to thank the developers of Twisted for making such a great platform for network applications. I just love it. From a theoretical point of view, though, I would like to clear up some doubts.

1) What is the underlying technology? I believe that Twisted is a TCP server that uses select()-based calls to handle multiple requests. Is that true, or is there something else I am not aware of?

2) If Twisted is a single-process network application framework (without threads or forks), I believe it cannot take advantage of multiple processors on one machine. For example, if the OS schedules the Twisted process on one processor, it runs only there and gets no benefit from the others. I am confused about this. Can anyone shed some light on it?

3) Suppose I have one Twisted reactor-based process running. Can I use deferToThread as a way of using multiple processors?

4) One technique for achieving parallelism is to run multiple Twisted reactors on different processors and use a scheduler that takes each web request and forwards it to one of the Twisted reactor instances. This is close to load balancing, since the same architecture also lets us configure the scheduler to send requests to different computers.

Many of these questions might sound weird, so please feel free to comment.

Thanks,
Arun
On Jan 6, 2010, at 9:04 PM, arun chhetri wrote:
First of all, I would like to thank the developers of Twisted for making such a great platform for network applications. I just love it...
Great! Glad to hear you're enjoying it!
1) What is the underlying technology? I believe that Twisted is a TCP server that uses select()-based calls to handle multiple requests. Is that true, or is there something else I am not aware of?
It's true that Twisted uses select() (or something like it) to handle multiple connections. But Twisted can handle UDP, UNIX sockets, serial ports, and several other kinds of resources in addition to TCP. If you are using Twisted with a GUI library (such as GTK+, Qt, Cocoa, or wxWidgets), you can handle GUI events in the same event loop.
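To make the select() idea concrete, here is a minimal sketch of a single-threaded loop that serves several connections at once. It is not Twisted's reactor, only the same pattern written with the standard library; the connections are simulated with socketpair(), and the function name run_echo_loop is made up for illustration.

```python
import select
import socket


def run_echo_loop(pairs, expected_messages):
    """Serve several sockets from one thread using select()."""
    servers = [server for server, _client in pairs]
    handled = 0
    while handled < expected_messages:
        # Block until at least one socket has data waiting (1 s timeout).
        readable, _, _ = select.select(servers, [], [], 1.0)
        if not readable:
            break  # timed out with nothing to read
        for sock in readable:
            data = sock.recv(1024)
            sock.sendall(data.upper())  # the "handler" for this event
            handled += 1
    return handled


# Three independent connections, simulated with socketpair() so that the
# sketch needs no network access (an assumption made for illustration).
pairs = [socket.socketpair() for _ in range(3)]
for i, (_server, client) in enumerate(pairs):
    client.sendall(f"hello {i}".encode())

handled = run_echo_loop(pairs, expected_messages=3)
replies = [client.recv(1024).decode() for _server, client in pairs]

for server, client in pairs:
    server.close()
    client.close()
```

One thread handles all three connections because it only ever touches a socket that select() has reported as ready.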
2) If Twisted is a single-process network application framework (without threads or forks), I believe it cannot take advantage of multiple processors on one machine. For example, if the OS schedules the Twisted process on one processor, it runs only there and gets no benefit from the others. I am confused about this. Can anyone shed some light on it?
One of the other types of connection that Twisted can handle is a pipe connection to a subprocess. If you want to take advantage of multiple processors, you can use Twisted's process-spawning APIs, either directly: http://twistedmatrix.com/documents/9.0.0/api/twisted.internet.interfaces.IRe... or via a convenience process-pool API, such as Ampoule: https://launchpad.net/ampoule
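As a rough illustration of pushing CPU-bound work into subprocesses and reading the results back over pipes, here is a sketch that uses the standard library's subprocess module rather than Twisted's own process-spawning API or Ampoule. The worker script and the function name sum_squares_in_subprocesses are hypothetical. In a Twisted program the reactor would watch the pipes instead of blocking on them as this sketch does.

```python
import subprocess
import sys

# Code run by each child interpreter: take a number from the command line,
# do some CPU-bound work, and write the result to stdout (a pipe).
WORKER_SOURCE = """
import sys
n = int(sys.argv[1])
print(sum(i * i for i in range(n)))
"""


def sum_squares_in_subprocesses(inputs):
    # Start every child before reading any output, so the children can
    # run at the same time on separate processors.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for n in inputs
    ]
    results = []
    for child in children:
        out, _ = child.communicate()  # read the pipe and wait for exit
        results.append(int(out))
    return results


results = sum_squares_in_subprocesses([10, 100, 1000])
```

Each child is a separate interpreter with its own global interpreter lock, which is why this approach can use several processors where threads usually cannot.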
3) Suppose I have one Twisted reactor-based process running. Can I use deferToThread as a way of using multiple processors?
No, not really. Threads in Python are bound by the global interpreter lock and are therefore not that useful for using multiple processors.
On Wednesday, Jan 6, 2010, at 22:02 -0500, Glyph Lefkowitz wrote:
3) Suppose I have one Twisted reactor-based process running. Can I use deferToThread as a way of using multiple processors?
No, not really. Threads in Python are bound by the global interpreter lock and are therefore not that useful for using multiple processors.
While this is generally true, in specific cases threads can speed things up by using multiple cores. One example would be a program that does image manipulation via PIL. PIL releases the GIL before it starts to munch on the image, hence you can achieve more than 100% user time :)

One script that I use to scale images manages over 300% user time in ideal settings. The trick is to use a couple more threads than I have CPU cores, so that the extra threads can tell the Linux kernel which files the process is interested in. That way the CPU usually has enough work queued up, and since most of the time is spent in C code inside PIL, the GIL does not limit CPU utilization.

But generally speaking, as Glyph has mentioned, Python can usually use only one CPU core per process. The reason is that you need to hold a single global lock not only to run code in the virtual machine but to touch the runtime in any way. So a C function can do something like this:

* extract input arguments from PyObjects.
* release GIL
* munch munch
* reacquire GIL
* create PyObjects for return state.

Notice that only code that in no way touches the Python runtime can run without the GIL being held.

Andreas
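The release-GIL-then-munch pattern can be tried without PIL. The sketch below substitutes hashlib, whose documentation states that it releases the GIL while hashing buffers larger than 2047 bytes; the buffers are made-up data standing in for image files, and the function name digest is only illustrative. Whether the threads actually overlap depends on the machine, so the sketch checks only that the threaded results match the sequential ones.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(chunk):
    # hashlib releases the GIL while hashing a large buffer, so several
    # of these calls can run on different cores at the same time.
    return hashlib.sha256(chunk).hexdigest()


# Four 8 MiB buffers of made-up data standing in for image files.
chunks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

# Hash all buffers from a small thread pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, chunks))

# The same work done one buffer at a time, for comparison.
sequential = [digest(chunk) for chunk in chunks]
```

Running this under a tool such as time should show user time above 100% on a multi-core machine, though that is an expectation rather than something the sketch verifies.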
_______________________________________________ Twisted-web mailing list Twisted-web@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
On Wednesday, Jan 6, 2010, at 20:04 -0600, arun chhetri wrote:
First of all, I would like to thank the developers of Twisted for making such a great platform for network applications. I just love it. From a theoretical point of view, though, I would like to clear up some doubts.
1) What is the underlying technology? I believe that Twisted is a TCP server that uses select()-based calls to handle multiple requests. Is that true, or is there something else I am not aware of?
Basically yes. It uses select() or a more or less equivalent technique to run an event-handling loop, which Twisted calls a reactor ;)
2) If Twisted is a single-process network application framework (without threads or forks), I believe it cannot take advantage of multiple processors on one machine. For example, if the OS schedules the Twisted process on one processor, it runs only there and gets no benefit from the others. I am confused about this. Can anyone shed some light on it?
Correct. If you want to use multiple processors, you should consider running multiple processes, e.g. behind a load balancer.
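The simplest scheduling policy for such a setup is round-robin: hand each incoming request to the next worker process in turn. Here is a minimal sketch of just that policy. The class name RoundRobinBalancer and the backend addresses are hypothetical, and a real balancer would also forward the request and deal with failed workers.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend addresses in a fixed rotating order."""

    def __init__(self, backends):
        self._backends = cycle(backends)

    def pick(self):
        # Return the backend that should receive the next request.
        return next(self._backends)


# Hypothetical addresses of four worker processes, e.g. one per core.
balancer = RoundRobinBalancer([("127.0.0.1", 8001 + i) for i in range(4)])

# Ports chosen for six consecutive requests.
assignments = [balancer.pick()[1] for _ in range(6)]
```

After the fourth request the rotation wraps around to the first worker again.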
3) Suppose I have one Twisted reactor-based process running. Can I use deferToThread as a way of using multiple processors?
Yes and no. Yes, it runs in a thread. Yes, in some cases you can use many cores this way. But no, Python with its GIL is not the ideal language for programs that rely on threads for performance.
4) One technique for achieving parallelism is to run multiple Twisted reactors on different processors and use a scheduler that takes each web request and forwards it to one of the Twisted reactor instances. This is close to load balancing, since the same architecture also lets us configure the scheduler to send requests to different computers.
Exactly. Or you can split the work asymmetrically. I once had a server that did the work (data lookups) but used multiple backends for different languages (one for French, a couple of lookup processes for English).

One thing to consider is that Twisted, when used correctly, is rather efficient. Two experiences from my work come to mind:

-) I once wrote a web crawler in Twisted without any consideration for how many connections it created. The target website was not a first-tier operation (like Google, Yahoo, MSN, ...) but still a commercial offering, and it was DoSed in seconds. Twisted has no problem opening 20000 HTTP connections (if you give it the file descriptors to manage that), but typical sites don't take that very well. Lesson: always consider your server and client capabilities, and rate limit as needed.

-) Some years ago I also built a delivery network that patched files on the fly (by substituting a couple of bytes, nothing elaborate, but it did complicate reading and sending the data). Even early versions of that HTTP server (it was twisted.web2 back then) were capable of filling a Gigabit pipe without issues.

So the single-threaded nature of Twisted (and in some ways of Python) is not that big an issue. Other moving parts will probably limit you first, e.g. SQL databases are notorious for being a choke point when scaling.

Andreas
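The rate-limiting lesson from the crawler story can be sketched with a semaphore that caps how many requests are in flight at once. This sketch uses the standard library's asyncio rather than Twisted, simulates each request with a short sleep instead of real HTTP, and uses placeholder URLs; the names crawl and fetch are made up for illustration.

```python
import asyncio


async def crawl(urls, max_concurrent):
    """Fetch every URL, with at most max_concurrent requests in flight."""
    limit = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fetch(url):
        nonlocal in_flight, peak
        async with limit:  # wait here once the cap has been reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # placeholder for a real HTTP request
            in_flight -= 1
            return url

    fetched = await asyncio.gather(*(fetch(u) for u in urls))
    return fetched, peak


# Placeholder URLs; nothing is actually contacted.
urls = [f"http://example.invalid/page/{i}" for i in range(20)]
fetched, peak = asyncio.run(crawl(urls, max_concurrent=5))
```

All twenty tasks are created immediately, but the semaphore keeps the number of simultaneous requests at or below the chosen cap, which is what protects the remote site.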
Many of these questions might sound weird, so please feel free to comment. Thanks, Arun
participants (3)
- Andreas Kostyrka
- arun chhetri
- Glyph Lefkowitz