[Twisted-Python] epoll and other questions

Is there any plan to use epoll instead of poll to make Twisted scalable with a hundred thousand simultaneous sockets connected? I'm evaluating whether to use Twisted or C/C++ for my cpushare.com server side. I expect the network to be the main bottleneck in the short term, so I feel safe spending extra cycles in userspace (at least for now), but even in the short term it should handle at least a hundred thousand TCP connections, so I'm really worried about poll. I think poll would be the biggest showstopper, which is why I'm asking about making Twisted use epoll. I assume my application would require no change, so I can start developing with current Twisted, test it with poll, and later fix the internals when the slowdown becomes noticeable. Right? I understand there's no limitation on the number of sockets simultaneously open; I just need to use ulimit to raise the limit on fds (a small sketch of raising the limit from inside the process is at the end of this mail).

A slightly separate issue: I assume it's best not to do any blocking I/O in the main network server handling the 100k connections, and instead to create a secondary internal server communicating over TCP/IP (the loopback device) with the primary server to do the real blocking I/O. Is this correct? Asynchronous I/O would be ideal for the I/O itself, but I think using a second process will be a lot simpler in practice, since I don't need bulk I/O performance (I only need to avoid blocking). I only want to keep the network pipeline full even while some disk read is happening. Threading (or shared memory with MAP_SHARED in tmpfs) would be best, but it seems Twisted is not mature enough for threading and shared-memory communication using futexes, right? If I wrote it in C I could probably get various performance bits faster, but I doubt the time spent on those bits would pay off significantly. Opinions?

Another thing I plan to do is ship the public key (matching the private key stored only on the server) in the client source tarball; that way, as long as people downloaded the right tarball, they will be able to connect securely to the server, since they can check the signature. Is there any example of this idea (public key stored in a file in the client package) available somewhere?

Thank you!

PS. If somebody has some spare time and wants to have a look at the Python _client_ code downloadable at www.cpushare.com/download.php and tell me whether it's decent code and whether I interfaced correctly between Twisted and PyQt, that's welcome. I'm at the point where I have to open up the connections from the Qt event handlers etc. I still have some unsolved issues on the Qt side (like error windows instead of using sys.stderr, and disabling one side of the tab when the other is running), but those are low priority at the moment; the GUI design was straightforward, and the real big issues will start now in the network protocol implementation with Twisted.
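(For the fd limit: besides ulimit -n in the shell, the limit can also be raised from inside the process with Python's resource module. A minimal sketch, with 200000 as an arbitrary target:)

import resource

# Raise the soft limit on open file descriptors as far as the hard
# limit allows (200000 is just an example target).
wanted = 200000
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if hard != resource.RLIM_INFINITY:
    wanted = min(wanted, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
print resource.getrlimit(resource.RLIMIT_NOFILE)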

Andrea Arcangeli wrote:
> Is there any plan to use epoll instead of poll to make Twisted scalable with a hundred thousand simultaneous sockets connected?
There has been some work (I personally wrote a partial epoll python library at the time epoll was very new). I think the progress stopped then because of epoll API instability; now that epoll is no longer a moving target, someone should get back on the case.
Yes. All the different reactors implement the same interface. Also, notice that the default reactor most likely uses select, not poll:

$ python -c 'from twisted.internet import reactor; print reactor'
<twisted.internet.default.SelectReactor instance at 0x401fb16c>

$ python -c 'from twisted.internet import pollreactor; \
    pollreactor.install(); from twisted.internet import reactor; \
    print reactor'
<twisted.internet.pollreactor.PollReactor instance at 0x401f33ec>
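In a real program that just means calling install() before anything imports the reactor. A minimal sketch (the echo protocol and the port number are only placeholders):

from twisted.internet import pollreactor
pollreactor.install()   # must happen before the reactor is imported

from twisted.internet import reactor, protocol

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        self.transport.write(data)

factory = protocol.ServerFactory()
factory.protocol = Echo
reactor.listenTCP(8007, factory)   # 8007 is an arbitrary example port
reactor.run()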
My gut feeling is you'll either hit an OS limit or sys.maxint, and the latter is pretty huge. Haven't looked at the details.
Well, there's nothing Twisted- or even Python-specific in that. The solution probably depends heavily on your dataset size, access patterns, and available RAM. Some people advocate heavy RAM caching. Sendfile might be the solution, but I don't think there's any integration of sendfile with Python, far less with Twisted.

Your plan to isolate disk I/O in separate process(es) sounds quite sane. Your master process could receive the file data from the I/O workers in blocks via a shared mmap, to avoid passing it through a socket (even if the socket is a local TCP connection or a UNIX domain socket). I don't know if that optimization is worth it; I would delay writing any extra code until the problem actually shows up.

Note that Python threading is very likely _not_ what you want; the threads synchronize at the interpreter level quite a lot.

Sadly, not even http://www.kegel.com/c10k.html (which is normally _the_ resource for things like this) says much about disk I/O.
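As a very rough sketch of the separate-I/O-process idea (the line-based request protocol, the crude length framing, and the port number are made up purely for illustration):

# disk_worker.py -- runs as a separate process and does the blocking
# reads, so the main process serving the 100k connections never waits
# on disk.
from twisted.internet import reactor, protocol
from twisted.protocols import basic

class DiskWorker(basic.LineReceiver):
    def lineReceived(self, path):
        # This open()/read() may block on disk, but only in this process.
        try:
            data = open(path, 'rb').read()
        except IOError:
            data = ''
        self.sendLine(str(len(data)))   # crude length prefix
        self.transport.write(data)

factory = protocol.ServerFactory()
factory.protocol = DiskWorker
reactor.listenTCP(8123, factory, interface='127.0.0.1')  # loopback only
reactor.run()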
SFS (secure file system) does something like that. The info page included has this:

    SFS clients require no configuration. Simply run the program
    `sfscd', and a directory `/sfs' should appear on your system. To
    test your client, access our SFS test server. Type the following
    commands:

        % cd /sfs/@sfs.fs.net,uzwadtctbjb3dg596waiyru8cx5kb4an
        % cat CONGRATULATIONS
        You have set up a working SFS client.
        %

    Note that the `/sfs/@sfs.fs.net,...' directory does not need to
    exist before you run the `cd' command. SFS transparently mounts
    new servers as you access them.

The part after the comma is a hash of the public key the server at sfs.fs.net must present, in order to be accepted.

On Wed, Oct 06, 2004 at 11:02:27PM +0300, Tommi Virtanen wrote:
> then because of epoll API instability; now that epoll is no longer a moving target, someone should get back on the case.
Agreed ;). With a truly huge number of sockets open, the time wasted in poll would at some point exceed the time spent in the Python interpreter (compared to a C implementation). It would be interesting to measure that break-even point, i.e. where the poll cost becomes higher than the interpreter overhead.
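A crude way to measure that break-even point would be to time poll() over a big set of idle descriptors, something like this (socketpair() is Unix-only, the numbers are arbitrary, and the fd limit has to be raised first):

import select, socket, time

N = 10000                      # idle connections to simulate (2*N fds)
pairs = [socket.socketpair() for _ in range(N)]
p = select.poll()
for a, b in pairs:
    p.register(a.fileno(), select.POLLIN)

t0 = time.time()
for _ in range(100):           # 100 poll() calls over N idle fds
    p.poll(0)
print 'avg poll() cost: %.3f ms' % ((time.time() - t0) / 100 * 1000)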
good point. I'll use pollreactor for now. Apparently, I still have to use the normal "select" reactor for interfacing with pyqt, but that's ok since I don't (yet) need scalability on the client side...
> My gut feeling is you'll either hit an OS limit or sys.maxint, and the latter is pretty huge. Haven't looked at the details.
ok fine ;).
Yes, heavy RAM caching is fine for reads, but writes may still require O_SYNC.
> Sendfile might be the solution, but I don't think there's any integration of sendfile with Python, far less with Twisted.
sendfile is synchronous too, so I don't think it'd solve the problem. Plus, sendfile only works from the filesystem to the network, while for me it's almost the other way around, and I have to parse the data anyway. (I'm even thinking of using pickled objects as the storage for each user, but I'm a bit worried about versioning and pickle/unpickle performance: if I upgrade the user class, unpickling everything could break, because I lack an on-disk format separate from the in-memory format.)
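One way I could keep the pickles upgradable would be to store an explicit format version and migrate old state in __setstate__; a hypothetical sketch (the User class and its fields are made-up names):

import cPickle as pickle

class User(object):
    FORMAT_VERSION = 2

    def __init__(self, name, credits=0):
        self.name = name
        self.credits = credits

    def __getstate__(self):
        state = self.__dict__.copy()
        state['_version'] = self.FORMAT_VERSION
        return state

    def __setstate__(self, state):
        version = state.pop('_version', 1)
        if version < 2:
            # 'credits' was added in version 2; fill in a default when
            # loading old on-disk pickles
            state.setdefault('credits', 0)
        self.__dict__.update(state)

# usage: blob = pickle.dumps(user, 2); user = pickle.loads(blob)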
So you're saying I could already use a shared mmap. But how would I serialize access then? I'd need a pthread_mutex for that. Otherwise, if I have to serialize through a pipe, I may as well send the data through the pipe too (it's not going to be high-bandwidth communication where an additional memcpy matters; I'd prefer shared memory only for low latency and full-userspace locking for the data producer).
Agreed ;)
> Note that Python threading is very likely _not_ what you want; the threads synchronize at the interpreter level quite a lot.
Agreed, it doesn't really scale. This is also why I doubt serializing through shared memory would work well, unless I write a module from scratch for pthread_mutex/futex-driven locking.
I found the sfscd program, but it's not a Python program and it seems a bit different from what I wanted to do. My objective was to create a private/public key pair and use an SSL library to load that file automatically and use it as the public/private key. My point is that if Twisted supports the native SSH protocol with id_rsa* keys, then it should be trivial to implement my file-based public/private key too. I was just trying to reuse whatever is available right now, be it SSH/SSL/an ssh tunnel/whatever, as the encrypted transport. So if you have a suggestion for which encrypted transport to use, that's welcome. Thank you very much for the help!
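What I have in mind looks roughly like the following, assuming pyOpenSSL underneath Twisted's SSL support (server_cert.pem, the hostname, and clientFactory are hypothetical placeholders):

from OpenSSL import SSL
from twisted.internet import ssl, reactor

class PinnedContextFactory(ssl.ClientContextFactory):
    def getContext(self):
        ctx = SSL.Context(SSL.SSLv23_METHOD)
        # Trust only the certificate shipped in the client tarball.
        ctx.load_verify_locations('server_cert.pem')
        ctx.set_verify(SSL.VERIFY_PEER | SSL.VERIFY_FAIL_IF_NO_PEER_CERT,
                       self._verify)
        return ctx

    def _verify(self, connection, x509, errnum, errdepth, ok):
        # ok is nonzero only when the presented certificate chains to
        # the bundled one; refuse the connection otherwise.
        return bool(ok)

# reactor.connectSSL('server.cpushare.com', 443, clientFactory,
#                    PinnedContextFactory())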

On Thu, 2004-10-07 at 17:53, Andrea Arcangeli wrote:
Actually there's a QT reactor that uses QT's event loop (and by extension whatever QT uses internally, be it select() or poll()).
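Roughly like this, assuming PyQt 3 (the exact install() arguments have varied between Twisted versions, so treat it as a sketch):

import sys
import qt                          # PyQt 3-era module name
from twisted.internet import qtreactor

app = qt.QApplication(sys.argv)
qtreactor.install(app)             # hand the Qt application to the reactor

from twisted.internet import reactor
# ... build widgets, open connections from Qt event handlers here ...
reactor.run()                      # drives the Qt event loop via the reactor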
The SSH code uses pycrypto. Twisted's SSL layer uses PyOpenSSL, but PyOpenSSL doesn't expose the OpenSSL encryption APIs. There are Python wrappers for a number of other crypto libraries as well.

On Wed, 2004-10-06 at 15:24, Andrea Arcangeli wrote:
> Is there any plan to use epoll instead of poll to make Twisted scalable with a hundred thousand simultaneous sockets connected?
It wouldn't be hard to do; mainly, someone needs to wrap epoll for Python.
Correct.

-- 
Itamar Shtull-Trauring
http://itamarst.org
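
For reference, Twisted did eventually grow an epoll-based reactor, and epoll itself later landed in the standard library as select.epoll (Python 2.6). The interface such a wrapper has to expose looks roughly like this naive echo loop (the port and buffer sizes are arbitrary):

import select, socket

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8010))        # arbitrary example port
server.listen(128)
server.setblocking(0)

ep = select.epoll()
ep.register(server.fileno(), select.EPOLLIN)
conns = {}

while True:
    for fd, events in ep.poll(1):        # 1 second timeout
        if fd == server.fileno():
            conn, addr = server.accept()
            conn.setblocking(0)
            conns[conn.fileno()] = conn
            ep.register(conn.fileno(), select.EPOLLIN)
        elif events & select.EPOLLIN:
            data = conns[fd].recv(4096)
            if data:
                conns[fd].send(data)     # naive echo
            else:
                ep.unregister(fd)
                conns[fd].close()
                del conns[fd]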

participants (3)
- Andrea Arcangeli
- Itamar Shtull-Trauring
- Tommi Virtanen