[Twisted-Python] Persistence in browsers?
Hello,

I'm not sure if this is the right list, or whether twisted-web would be more appropriate. I may also be asking a question that has already been answered many times, but I was unable to find references.

Short version: I have 10.000 - 100.000 web browsers that are connected to my site, and I need to inform them __real-time__ (a max of 3-5 seconds delay) of an event that happened on the server. Is twisted the right way to go, given the fact that it promises asynchronous event handling ?

Long version: I have an information flux on a web page that must change, as stated before, on some specific event that happens on a server. I have thought of two ways of doing this:

1. The "ask every 5 seconds" approach

Pretty obvious: the browser connects every 5 seconds and requests the page again. However, with 10.000 clients the server soon dies, and the 5-second limit is still not respected (because response times get incredibly long when Apache is submerged in requests).

2. The "ask and wait for answer" approach

The basic idea is the following:
- the browser connects to the web page
- a JavaScript snippet in the page reconnects in the background (using the JavaScript XMLHttpRequest object) to a special script on the server
- the server keeps the connection open (by sending spaces, literally, once every 10-15 seconds, and sleeping in between so as not to put too much stress on the server either). When an event happens, the server sends all the needed data to the client, which redisplays it (through JavaScript).

Of course, there is the problem of Apache and its 5-minute script running limit (I have implemented this in PHP), but the JavaScript code is smart enough to handle this: when a connection fails, it reconnects and all goes well.

This was a little better than the first approach, at least in response times, which are now consistent with the requirements.
However, a new problem arises: Apache cannot handle a very large number of open connections at the same time (every web browser holds at least one open connection in this case). By my calculations (it's pretty hard to compute exactly, as I know of no JavaScript-enabled crawler that I can use programmatically), the server will be completely trashed at around 300 connections.

The problem gets even more complicated with today's browsers: they have a limit of 2 concurrent connections to the same site (I don't know if you noticed, but you cannot download 3 files concurrently from the same site), and the XMLHttpRequest connections count toward this limit. So if a client uses two of my information fluxes, he will be unable to visit the site at the same time. I don't know if Twisted solves this last problem. If not, I'll try to find a workaround (messing around with DNS seems like a good idea at this point).

The question is whether Twisted can solve my problem of informing all my clients of the event (the event will not happen concurrently for all the clients, so there is no problem with server load; however, all the clients will be listening concurrently for their specific event).

Thanks for any answer, or for any directions/pointers you can give me. I might be totally wrong in my approach, so I'm really open to all suggestions (except buying lots of servers to make this work, of course).

Tiberiu DONDERA
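For a rough sense of scale, the two approaches described above can be compared with a little arithmetic. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the figures plugged in are the ones quoted in the message (10.000 clients, a 5-second poll, Apache's 5-minute script limit, a keep-alive write every 10 seconds).

```python
def polling_request_rate(clients, poll_interval_s):
    """Full page requests per second when every client re-requests
    the page once per poll interval (approach 1)."""
    return clients / poll_interval_s


def long_poll_profile(clients, max_connection_s, keepalive_interval_s):
    """Steady-state cost of approach 2: every client holds one open
    connection, reconnects when the server-side limit expires, and
    receives a tiny keep-alive write in between."""
    return {
        "open_connections": clients,
        "reconnects_per_s": clients / max_connection_s,
        "keepalive_writes_per_s": clients / keepalive_interval_s,
    }


if __name__ == "__main__":
    print(polling_request_rate(10_000, 5))
    print(long_poll_profile(10_000, max_connection_s=300, keepalive_interval_s=10))
```

Under these assumptions, polling costs about 2000 full requests per second, while the waiting approach trades that for roughly 33 reconnects per second plus 10.000 connections that have to stay open, which is exactly where a process-per-connection server runs into trouble.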
Tibi Dondera wrote:
Short version: I have 10.000 - 100.000 web browsers that are connected to my site, and I need to inform them __real-time__ (a max of 3-5 seconds delay) of an event that happened on the server. Is twisted the right way to go, given the fact that it promises asynchronous event handling ?
Wow. In short, "maybe". That is a lot of clients. I believe that Twisted is probably up to the task, but regardless of which solution you choose, you are going to have to do a lot of tuning.

Have you done any prototyping yet? If you do, I definitely recommend inspecting the different reactors for performance differences with your application.

If you can use JavaScript, you might want to use a JS sockets library rather than XMLHttpRequest, just so that you don't end up using up one of your persistent connections. Or perhaps the notification site should be notify.yourdomain.com rather than yourdomain.com, so that the persistent connection doesn't count towards your "site"...
Hello,

I was unable to locate any JavaScript library that uses sockets. The closest I found was some Mozilla technologies that turn the browser into a kind of JS server, but I need a cross-browser, industry-standard solution (read: one that works with users' current software, i.e. 95% IE).

Also, I still did not understand from your answer whether Twisted _does_ support this technique, on a large scale, of answering an HTTP client and then putting it to sleep until something happens.

Thanks for your help, and any further info.
_________________
Tiberiu DONDERA

-----Original Message-----
From: twisted-python-bounces@twistedmatrix.com [mailto:twisted-python-bounces@twistedmatrix.com] On Behalf Of Glyph Lefkowitz
Sent: Saturday, 05 February 2005 03:47
To: Twisted general discussion
Subject: Re: [Twisted-Python] Persistence in browsers?

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
On Mon, 2005-02-07 at 14:11 +0100, Tibi Dondera wrote:
Also, I still did not understand from your answer if twisted _does_ support this technique, of answering a HTTP client, then putting it to sleep until something happens, on a large scale.
It does. In fact Donovan is giving a talk about it at the upcoming PyCon. For web questions you probably want the twisted-web mailing list.
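To make the "answer, then put the client to sleep until something happens" idea concrete, here is a minimal sketch of the pattern. Note that it uses the standard library's asyncio rather than Twisted, purely so that it runs without extra dependencies; it is not Twisted's API. The EventHub class, its method names and the "flux-1" key are all hypothetical. Each parked client costs one pending future, not a thread or a process.

```python
import asyncio


class EventHub:
    """Parks waiters per event key and wakes them when that event fires.
    (Hypothetical helper; asyncio stands in for Twisted here.)"""

    def __init__(self):
        self._waiters = {}  # event key -> set of pending futures

    async def wait(self, key, timeout):
        """Sleep until `key` is published; return None on timeout so the
        caller can send an empty reply and let the browser reconnect."""
        fut = asyncio.get_running_loop().create_future()
        self._waiters.setdefault(key, set()).add(fut)
        try:
            return await asyncio.wait_for(fut, timeout)
        except asyncio.TimeoutError:
            return None
        finally:
            # Clean up if we timed out or were cancelled before publish().
            waiters = self._waiters.get(key)
            if waiters is not None:
                waiters.discard(fut)
                if not waiters:
                    del self._waiters[key]

    def publish(self, key, payload):
        """Wake every client parked on `key`; return how many were waiting."""
        waiters = self._waiters.pop(key, set())
        for fut in waiters:
            if not fut.done():
                fut.set_result(payload)
        return len(waiters)


async def demo(clients=1000):
    hub = EventHub()
    parked = [asyncio.ensure_future(hub.wait("flux-1", timeout=5.0))
              for _ in range(clients)]
    await asyncio.sleep(0)  # give every waiter a chance to register
    woken = hub.publish("flux-1", "new data")
    results = await asyncio.gather(*parked)
    return woken, results


if __name__ == "__main__":
    woken, results = asyncio.run(demo())
    print(woken, results[0])
```

In a real server the wait() call would sit inside the HTTP request handler, and the timeout would play the role of the periodic reconnect described earlier in the thread.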
"Tibi Dondera" <incoming@pronet-romania.com> writes:
Short version: I have 10.000 - 100.000 web browsers that are connected to my site, and I need to inform them __real-time__ (a max of 3-5 seconds delay) of an event that happened on the server. Is twisted the right way to go, given the fact that it promises asynchronous event handling ?
Independent of twisted versus other methods (which some other notes have addressed), I think such a load is going to require that you consider distributing your notification system. For example, notifying 100000 clients in 5 seconds is 20000 packets per second being generated, which is a non-trivial load on both a machine (not quite sure you can even hit this from application space on typical machines) and its corresponding bandwidth (say, 64-byte packets including protocol overhead would saturate a 10Mbit stream).

At the least you might try tiering your notification system - the code running on your page would connect to one of a pool of servers (ideally geographically distributed over the network, or else backbone latencies and bandwidths might become an issue), and those servers would maintain a single connection back to a central server. This could be tiered multiple times for more efficiency or scalability.

Then, your question really becomes one of what sort of fan-out factor you need to optimize notifications. For example, the central server(s) would notice the change, but only have to notify the next tier (perhaps 10s of machines). Those 10s of machines would either each notify their own next tier, or directly notify a bunch of leaf machines.

With this sort of structure, the number of individual clients that any given "node" in the system has to support is far smaller, and the system can grow incrementally by first permitting the ratio of clients to leaf nodes to increase, and then adding leaf nodes (or another tier) when needed to suddenly bring that ratio back down.

-- David
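The figures above can be checked, and the tiering idea explored, with a short calculation. This is only an illustrative sketch: the function names and the fan-out values tried are assumptions, not something proposed in the thread.

```python
import math


def notification_load(clients, window_s, packet_bytes):
    """Packets per second and Mbit/s needed to send one packet to every
    client within the given time window."""
    packets_per_s = clients / window_s
    mbit_per_s = packets_per_s * packet_bytes * 8 / 1e6
    return packets_per_s, mbit_per_s


def tier_sizes(clients, fan_out):
    """Number of nodes needed at each tier, leaf tier first, when every
    node serves at most `fan_out` children, until one root remains."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    sizes = []
    remaining = clients
    while remaining > 1:
        remaining = math.ceil(remaining / fan_out)
        sizes.append(remaining)
    return sizes


if __name__ == "__main__":
    # 100000 clients, 5 second window, 64-byte packets (figures from the post)
    print(notification_load(100_000, 5, 64))
    # hypothetical fan-out factors, to see how many machines each implies
    for fan_out in (100, 1000, 10_000):
        print(fan_out, tier_sizes(100_000, fan_out))
```

With 64-byte packets the first function gives 20000 packets per second and about 10.24 Mbit/s, matching the estimate in the post. The second shows the trade-off behind the fan-out factor: a larger fan-out means fewer machines but more connections held per node.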
Thank you for your answers to my questions. I've been away for a while and was unable to respond.

I will have no problem with the load on the server, because each client will actually be informed of a different event (or very few clients will follow the same events), and not all events happen at the same time. The requirement remains, however, that the system support lots of clients and that, for every client, the response time not exceed 5 seconds.

I will dig deeper into the twisted-web architecture, to see how "sending to sleep" HTTP clients is supported.

Thanks again.
_________________
Tiberiu DONDERA

-----Original Message-----
From: twisted-python-bounces@twistedmatrix.com [mailto:twisted-python-bounces@twistedmatrix.com] On Behalf Of David Bolen
Sent: Monday, 07 February 2005 18:31
To: twisted-python@twistedmatrix.com
Subject: [Twisted-Python] Re: Persistence in browsers?
participants (4)
- David Bolen
- Glyph Lefkowitz
- Itamar Shtull-Trauring
- Tibi Dondera