Den 3. nov. 2012 kl. 11:14 skrev Antoine Pitrou firstname.lastname@example.org:
True, but is the time latency O(n) or O(log n)?
Right, that's the difference. However, I think here we are concerned about CPU load on the server, not individual latency (as long as it is acceptable, e.g. lower than 5 ms).
Ok, I can do som tests on Windows :-)
Also, from what I read, the complexity of select.poll is O(n) with respect to file handles, so this should not be any worse (O(log n) katency wait, O(n) polling) I think.
epoll and kqueue are better than O(number of objects) though.
I know, they claim the wait to be about O(1). I guess that magic happens in the kernel.
With IOCP on Windows there is a thread-pool that continuously polls the i/o tasks for completion. So I think IOCPs might approach O(n) at some point.
I assume as long as we are staying in user-space, there will always be a O(n) overhead somewhere. To avoid it one would need the kernel to trigger callbacks from hardware interrupts, which presumably is what epoll and kqueue do. But at least on Windows, anything except "one-thread-per-client" involves O(n) polling by user-space threads (even IOCPs and RegisterWaitForSingleObject do that). The kernel only schedules threads, it does not trigger i/o callbacks from hardware.
But who in their right mind use Windows for these kind of servers anyway?
Another interesting strategy for high-performance on Windows 64: Just use blocking i/o and one thread per client. The stack-space limitation is a 32-bit problem, and Windows 64 has no problem scheduling an insane number of threads. Even desktop computers today can have 16 GB of RAM, so there is virtually no limitation on the number of i/o threads Windows 64 can multiplex.
That's still a huge waste of RAM, isn't it?
That is depending on perspective :-)If threads are a simpler design pattern than IOCPs, the latter is a huge waste of work hours. Which is cheaper today? Think to some extent IOCPs solves a problem related to 32-bit address spaces or limited RAM. But if RAM is cheaper than programming effort, just go ahead and waste as much as you need :-)
Also, those who need these kind of servers can certainly afford to buy enough RAM.
Also, by relying on preemptive threading you have to use Python's synchronization primitives (locks, etc.), and I don't know how these would scale.
But would it scale with Python threads and the GIL as well? You would be better to answer that.
I haven't done any tests with a large number of threads, but the GIL certainly has a (per-thread as well as per-context switch) overhead.
That is the thing, plain Windows threads and Python threads in huge numbers might not behave similarly. It would be interesting to test.