Hello,
I'm not sure whether this is the right list, or whether twisted-web would be
more appropriate. I may also be asking a question that has already been
answered many times, but I was unable to find any references.
Short version:
I have 10,000 to 100,000 web browsers connected to my site, and I need to
inform them in __real-time__ (with a delay of at most 3-5 seconds) of an event
that happened on the server. Is Twisted the right way to go, given that it
promises asynchronous event handling?
Long version:
I have an information flux on a web page that must change, as stated above,
when a specific event happens on the server.
I have thought of two ways of doing this:
1. The "ask every 5 seconds" approach
This one is pretty obvious: the browser connects every 5 seconds and requests
the page again. However, with 10,000 clients the server soon dies, and the
5-second limit is still not respected (because response times become extremely
long when Apache is swamped with requests).
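A rough back-of-the-envelope calculation shows why this approach hurts. The
sketch below assumes that each poll is a single request and that the clients'
polls are spread evenly over the interval; the function name is only for
illustration.

```python
def polling_requests_per_second(clients: int, interval_seconds: float) -> float:
    """Average request rate when every client polls once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return clients / interval_seconds


if __name__ == "__main__":
    # The two ends of the client range mentioned above, polling every 5 seconds.
    for clients in (10_000, 100_000):
        rate = polling_requests_per_second(clients, 5)
        print(f"{clients} clients polling every 5 s: about {rate:.0f} requests/s")
```

Nearly all of these requests return nothing new, since the event is rare
compared to the polling rate.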
2. The "ask and wait for an answer" approach
The basic idea is the following:
- the browser connects to the web page
- a JavaScript snippet in the page reconnects in the background (using the
JavaScript XMLHttpRequest object) to a special script on the server
- the server keeps the connection open (by literally sending a space once
every 10-15 seconds and sleeping in between, so as not to put too much stress
on the server either). When an event happens, the server sends all the needed
data to the client, which redisplays it (through JavaScript).
Of course, there is the problem of Apache and its 5-minute limit on script
running time (I have implemented this in PHP), but the JavaScript code is
smart enough to handle this: when a connection fails, it reconnects and all
goes well.
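To make the "ask and wait" idea concrete, here is a minimal sketch of the
server-side pattern: each client parks on its own pending result until either
its event fires or a keep-alive timeout expires. It is not my PHP script and
not Twisted; it uses Python's standard asyncio purely as a stand-in for an
event-driven server, and EventHub, wait_for_event, publish and demo are
hypothetical names. No real HTTP is involved.

```python
import asyncio


class EventHub:
    """Keeps one pending future per waiting client (all names hypothetical)."""

    def __init__(self) -> None:
        self._waiters: dict[str, asyncio.Future] = {}

    async def wait_for_event(self, client_id: str, timeout: float):
        """Park the caller until its event fires; return None on timeout."""
        future = asyncio.get_running_loop().create_future()
        self._waiters[client_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            # A real server would send a keep-alive here and wait again.
            return None
        finally:
            # Remove our entry only if a newer waiter has not replaced it.
            if self._waiters.get(client_id) is future:
                del self._waiters[client_id]

    def publish(self, client_id: str, data: str) -> bool:
        """Wake the waiting client, if any; report whether one was waiting."""
        future = self._waiters.get(client_id)
        if future is None or future.done():
            return False
        future.set_result(data)
        return True


async def demo(n_clients: int = 1000) -> str:
    """Park n_clients waiters in one process, then notify exactly one."""
    hub = EventHub()
    tasks = [
        asyncio.create_task(hub.wait_for_event(f"client-{i}", 5.0))
        for i in range(n_clients)
    ]
    await asyncio.sleep(0)  # let every waiter register its future
    hub.publish("client-42", "new data")
    result = await asyncio.wait_for(tasks[42], 1.0)
    for task in tasks:
        task.cancel()  # drop the remaining waiters so the demo can exit
    await asyncio.gather(*tasks, return_exceptions=True)
    return result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is that a waiting client costs one small object in a
single process rather than one sleeping script per connection; how far this
scales on real sockets is something I have not measured.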
This was a little better than the first approach, at least in terms of
response times, which are now consistent with the requirements. However, a new
problem arises: Apache cannot handle a very large number of open connections
at the same time (every web browser holds at least one open connection in this
case). By my estimate (it is hard to measure exactly, as I know of no
JavaScript-enabled crawler that I can drive programmatically), the server will
be completely overwhelmed at around 300 connections.
The problem gets even more complicated with today's browsers: they limit
themselves to 2 concurrent connections to the same site (I don't know if you
have noticed, but you cannot download 3 files concurrently from the same
site), and the XMLHttpRequest connections count toward this limit. So if a
client uses two of my information fluxes, he will be unable to browse the site
at the same time.
I don't know whether Twisted solves this last problem. If not, I'll try to
find a workaround (messing around with the DNS seems like a good idea at this
point).
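One way the DNS idea could look, as a sketch only: since the browser counts
connections per host name, each information flux could be served from its own
host name. This assumes a wildcard DNS record pointing every such name at the
same server; feed_hostname and example.com are placeholders, and whether the
browser's same-origin rules allow the background request to another host name
is a separate question that this does not address.

```python
def feed_hostname(feed_index: int, base_domain: str = "example.com") -> str:
    """Return a distinct host name for each flux (hypothetical naming scheme)."""
    if feed_index < 0:
        raise ValueError("feed_index must be non-negative")
    return f"feed{feed_index}.{base_domain}"


if __name__ == "__main__":
    # Two fluxes on one page would then wait on two different host names.
    for index in range(2):
        print(feed_hostname(index))
```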
The question is whether Twisted can solve my problem of informing all my
clients of the event. (The event will not happen concurrently for all the
clients, so server load at notification time is not a problem; however, all
the clients will be listening concurrently, each for its own specific event.)
Thanks for any answer, or for any directions or pointers you can give me. I
might be totally wrong in my approach, so I'm really open to all suggestions
(except buying lots of servers to make this work, of course).
Tiberiu DONDERA