On 10:15 am, spongelavapaul@googlemail.com wrote:
I've hit a problem as my app has got bigger (about 30-40 widgets now, all chattering roughly once every 2 seconds) where the reliable message delivery mechanism is spiralling out of control. It seems that the constant back and forth means that large 'baskets' of messages are resent. The more this happens, the busier everything gets until the browser becomes unresponsive.
This is unfortunate, but I'm sure it's fixable. At least, partially. Client-server communication, especially in JavaScript, isn't free.
There's a fix for it: [Divmod-dev] athena duplicate messages issue but I'm slightly concerned about the potential for lost messages - and also confused about how this could happen. Given that HTTP is a reliable connection-oriented transport, where is the gap that messages can fall through?
HTTP is neither reliable nor connection-oriented :). TCP is reliable and connection-oriented, but HTTP builds on top of it to produce something which is neither. "Reliable" in this case doesn't mean that the transport is perfect and will deliver everything, but that if you send messages "1, 2, 3", you will get messages "1, 2, 3" in that order or you will get nothing at all. (Of course you may also get just "1", or "1, 2", but you will never get "3, 1, 2".) Even if HTTP had a way to initiate the delivery of a message over a channel that was already busy receiving the response to another message (it doesn't), we'd have to contend with the browser APIs for issuing HTTP requests, which leave out significant portions of the actual protocol. For example, browser JavaScript can't count on more than two concurrent requests to the same host, since the HTTP/1.1 spec says a client shouldn't hold more than two connections to a server.
So, what is happening here is that Nevow attempts to implement a protocol on top of HTTP requests, treating them as individual, unreliable messages which may be eaten by beasts like transparent proxies and browser runtime bugs, and to present to your application a stream of messages which are always in order and never dropped. This is, as it happens, *exactly* what Orbited does, and Nevow could potentially be implemented on top of Orbited. However, Nevow's implementation has a bug: it overzealously re-delivers messages, frequently when re-delivery is not required. This is rarely a problem beyond the noise that it generates in your log files, until your message queue starts to back up; then it creates the performance problems you've noticed.
So, my suggestion to you would be to read through the relevant JavaScript code for delivering "baskets" to the server, try to figure out exactly what is happening, and write a patch to correct this behavior. It's not trivial, but it's not rocket science either.
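To make the "in order, never dropped" guarantee concrete, here is a minimal sketch of the general technique: sequence numbers plus cumulative acknowledgements. It is not Athena's actual code (that lives in JavaScript and in Nevow), and the names `Sender`, `Receiver`, `basket` and `acknowledge` are made up for illustration. The point is that re-delivering a basket is harmless to the application, because the receiver drops anything it has already seen and refuses to skip ahead.

```python
class Sender:
    """Keeps every message until the peer acknowledges it (hypothetical names)."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = []  # (seq, payload) pairs not yet acknowledged

    def queue(self, payload):
        self.unacked.append((self.next_seq, payload))
        self.next_seq += 1

    def basket(self):
        # Everything still unacknowledged travels in each request, so a
        # lost request is repaired by the next one.
        return list(self.unacked)

    def acknowledge(self, ack):
        # Cumulative ack: the peer has delivered every seq <= ack.
        self.unacked = [(s, p) for (s, p) in self.unacked if s > ack]


class Receiver:
    """Delivers payloads strictly in sequence order, exactly once."""

    def __init__(self):
        self.last_seq = -1
        self.delivered = []

    def receive(self, basket):
        for seq, payload in sorted(basket, key=lambda m: m[0]):
            if seq == self.last_seq + 1:
                self.delivered.append(payload)
                self.last_seq = seq
            # seq <= last_seq is a duplicate and is ignored;
            # seq > last_seq + 1 is a gap, so we wait for a resend.
        return self.last_seq  # sent back to the sender as the ack
```

The cost of this scheme is exactly the one being discussed: if acknowledgements are slow to arrive, or requests get interrupted before the ack comes back, the baskets keep growing and the same messages are sent again and again.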
If I recall correctly, the problem is that the client will overzealously interrupt its own connection to the server where it is sending a basket of collected messages, in order to free up the HTTP connection to send a *new* message which it has generated. It would be better if the client would allow for a brief (and actually "brief" probably needs to be pretty long, in the wild) grace period to allow the HTTP request to be fully received and responded to before piling on more work. Part of the problem here, of course, is that the crappy JavaScript browser HTTP API won't let us tell how much of our request has been uploaded or process the response as it arrives. So we have to guess what a reasonable timeout would be, rather than have the algorithm operate on actual data. In other words, you're right: the messages are not actually disappearing into a black hole :). As far as what you should do: I think you should try to write a patch. It's not trivial, but it's not rocket science either: it's just computer science. Hopefully my description of the problem is accurate enough to get you started; I'm sure that if you ask for help on this list or on IRC as you're working on it, you will find no shortage of it. Lots of people have reported this problem over the years but nobody has (as far as I can tell from searching right now) thought to even report the bug as a ticket on divmod.org, let alone contribute a fix for it.
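The grace period suggested above could look roughly like the following. This is only a sketch of the idea, written in Python for brevity even though the real fix belongs in the client-side JavaScript; `OutputChannel`, `send`, `tick` and the 5-second default are all assumptions of mine, not anything taken from Athena. New messages generated while a request is in flight are held back rather than interrupting it, and the channel only gives up waiting once the grace period has elapsed.

```python
class OutputChannel:
    """Batches outgoing messages instead of interrupting an in-flight request."""

    def __init__(self, send, grace_period=5.0):
        self.send = send                  # hypothetical: starts one HTTP request with a basket
        self.grace_period = grace_period  # seconds; a guess, since upload progress is not visible
        self.pending = []                 # messages generated but not yet sent
        self.in_flight_since = None       # start time of the current request, if any

    def message(self, payload, now):
        self.pending.append(payload)
        self._maybe_send(now)

    def response_received(self, now):
        # The request completed normally, so the connection is free again.
        self.in_flight_since = None
        self._maybe_send(now)

    def tick(self, now):
        # Called periodically so that an expired grace period is noticed.
        self._maybe_send(now)

    def _maybe_send(self, now):
        if not self.pending:
            return
        busy = self.in_flight_since is not None
        if busy and now - self.in_flight_since < self.grace_period:
            return  # let the current request finish before piling on more work
        # Either the connection is idle or we have waited long enough.
        # Re-delivery of an abandoned basket is left to the ack logic.
        basket, self.pending = self.pending, []
        self.in_flight_since = now
        self.send(basket)
```

With a channel like this, three messages generated within two seconds of each other result in two requests (the first message, then the other two together once the response arrives) rather than three requests where each one aborts the last.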
I think I can cope with lost messages in most cases, so would it be useful to add a kind of 'sendRemote' that was like 'callRemote' but didn't care about a response? Or maybe this already exists and I've missed it?
Could you cope with these messages arriving arbitrarily out of order? I am willing to bet not; it would just make your application extremely difficult to test, and it would start spewing exceptions when it started to get more heavily loaded, rather than making the browser unresponsive.
P.S. this app is likely to get more noisy - is it likely that I'll have to abandon Athena for Orbited or similar? I mean, are there architectural differences that will prevent Athena scaling?