
Chaz. wrote:
Now let me address the issue of TCP. It is a pretty heavy protocol to use. It takes a lot of resources on the sender and target and can take some time to establish a connection. Opening 1000 or more sockets consumes a lot of resources in the underlying OS and in the Twisted client!
People keep trying to help you, and you keep repeating yourself. From what I can gather: You *need* a relatively lightweight group communication method. My advice would be to investigate a message bus - see recent posts on this mailing list. "Spread" at www.spread.org and ActiveMQ (via the simple text-over-tcp-based STOMP protocol). Reports are that both can (under the right conditions) execute many thousands of group messages per second. Failing that, Glyph has hinted at another approach. You could elect a small number (~1%) of your nodes as "proxies" so that as well as being clients, they act as intermediaries for messages. This is a simple form of overlay network, which you also stated you didn't want to use - lord knows why. People use these techniques for a reason - they work. You *want* (have decided you want) a reliable multicast protocol over which you'll layer a simple RPC protocol. RMT (reliable multicast transport) is as yet an unsolved problem. It is VERY VERY hard. None exist for Twisted, to the best of my knowledge. I would be willing to bet money that, for "thousands" of nodes, the overhead of implementing such a protocol (in Python, one presumes) would exceed the overhead of just using TCP. If you had said "hundreds of thousands" of nodes, well, that would be different. If you want to knock an RMT up based on the assumption you won't drop packets, then be my guest, but I would suggest that if you *really* believe multicast is that reliable, then your experience of IP multicast networks has been a lot more rosy than mine, and I run a very large one. Typing "reliable multicast" into Google would be a good start - there are some good RFCs produced by the RMT IETF working group.
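To make the "proxies" idea a bit more concrete, here is a rough sketch of how nodes might be split between proxies and plain clients. The function name, the `one_in` parameter and the "first few nodes become proxies" election are all made up for illustration; a real system would elect proxies on load, topology or similar.

```python
def assign_proxies(nodes, one_in=100):
    """Pick about one node in `one_in` as a proxy and spread the rest among them.

    Hypothetical helper: the election rule (take the first few nodes)
    is a placeholder, not a recommendation.
    """
    nodes = list(nodes)
    if not nodes:
        return {}
    n_proxies = max(1, len(nodes) // one_in)
    proxies = nodes[:n_proxies]
    fanout = {proxy: [] for proxy in proxies}
    # Round-robin the remaining nodes across the proxies.
    for i, node in enumerate(nodes[n_proxies:]):
        fanout[proxies[i % n_proxies]].append(node)
    return fanout
```

With 2000 nodes the sender then talks to 20 proxies, and each proxy relays to 99 clients, instead of the sender holding 2000 connections itself.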
If I use TCP and stick to the serial, synchronized semantics of RPC, doing one call at a time, I have only a few ways to solve the problem. Do one call at a time, repeat N times, and that could take quite a while. I could do M spawnProcesses and have each do N/M RPC calls. Or I could use M threads and do it that way. Granted, with M sockets open at a time, it is still possible for this to take quite a while to execute. Performance would be terrible (and yes, I want an approach that has good to very good performance. After all, who would want poor to terrible performance?).
Knuth and his comments on early optimisation apply here. Have you tried it? You might be surprised. I have some Twisted code that does SNMP to over a thousand devices. This is, obviously, unicast UDP. The throughput is very high. A simple ACK-based sequence-numbered UDP unicast will very likely scale to thousands of nodes.
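As a rough illustration of what "ACK-based sequence-numbered UDP unicast" could look like on the sending side, here is a minimal sketch of the bookkeeping only. The wire format (one type byte plus a 32-bit sequence number), the class name and the timeout and retry defaults are my assumptions, not anything from an existing protocol or from my SNMP code; the actual socket I/O is left to the caller.

```python
import struct

# Assumed wire format: 1-byte message type, 4-byte sequence number, then payload.
HEADER = struct.Struct("!BI")
DATA, ACK = 0, 1

def pack(kind, seq, payload=b""):
    return HEADER.pack(kind, seq) + payload

def unpack(datagram):
    kind, seq = HEADER.unpack_from(datagram)
    return kind, seq, datagram[HEADER.size:]

class AckTracker:
    """Sender-side record of datagrams sent but not yet acknowledged."""

    def __init__(self, timeout=1.0, max_tries=3):
        self.timeout = timeout
        self.max_tries = max_tries
        self.next_seq = 0
        self.pending = {}  # (addr, seq) -> [datagram, sent_at, tries]

    def send(self, addr, payload, now):
        """Register a new datagram for addr and return the bytes to transmit."""
        seq = self.next_seq
        self.next_seq += 1
        datagram = pack(DATA, seq, payload)
        self.pending[(addr, seq)] = [datagram, now, 1]
        return datagram

    def ack_received(self, addr, datagram):
        """Forget the matching pending datagram; duplicate ACKs are ignored."""
        kind, seq, _ = unpack(datagram)
        if kind == ACK:
            self.pending.pop((addr, seq), None)

    def due(self, now):
        """Return (resend, failed) for entries whose timeout has expired."""
        resend, failed = [], []
        for key, entry in list(self.pending.items()):
            datagram, sent_at, tries = entry
            if now - sent_at < self.timeout:
                continue
            if tries >= self.max_tries:
                failed.append(key)      # give up on this (addr, seq)
                del self.pending[key]
            else:
                entry[1], entry[2] = now, tries + 1
                resend.append((key[0], datagram))
        return resend, failed
```

In Twisted this would presumably sit behind a `DatagramProtocol`, with `transport.write(datagram, addr)` doing the sends and a periodic timer calling `due()`. One dictionary entry per outstanding request is a lot cheaper than one TCP connection per node.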
So I divided the problem down to two parts. One, can I reduce the amount of traffic on the invoking side of the RPC request? Two, how do I deal with the response? Obviously I have to deal with the issue of failure, since RPC semantics require EXACTLY-ONCE.
How many calls per second are you doing, and approximately what volume of data will each call exchange? You seem inflexible about aspects of the design. If it were me, I'd abandon RPC semantics. Smarter people than anyone here have argued convincingly against making a remote procedure call look anything like a local one, and once you abandon *that*, RPCs look like message exchanges.
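To show what I mean by RPCs looking like message exchanges, here is a small sketch: a request is just a message with an id, and replies are matched back to it as they trickle in. The field names (`id`, `method`, `args`, `result`) and the `GroupCall` class are invented for the example and follow no particular standard.

```python
import json
import uuid

def make_request(method, args):
    """Build a request message; the field names are illustrative only."""
    return {"id": uuid.uuid4().hex, "method": method, "args": args}

class GroupCall:
    """Tracks which nodes have answered one request sent to a group."""

    def __init__(self, request, nodes):
        self.request = request
        self.waiting = set(nodes)
        self.results = {}

    def reply_received(self, node, raw):
        """Record a reply; return False for stale, duplicate or unrelated ones."""
        reply = json.loads(raw)
        if reply.get("id") != self.request["id"] or node not in self.waiting:
            return False
        self.waiting.discard(node)
        self.results[node] = reply.get("result")
        return True

    def missing(self):
        """Nodes that have not answered yet, e.g. to retry after a timeout."""
        return sorted(self.waiting)
```

Nothing here blocks waiting for a return value. The caller sends the request however it likes, feeds replies in as they arrive, and asks `missing()` when it decides it has waited long enough.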
That gets me to the multicast or broadcast scheme. In one call I could get the N processors to start working. Now I just have to solve the other half of the problem: how to get the answers returned without swamping the network or how to detect when I didn't get an answer from a processor at all.
That leads me to the observation that on an uncongested ethernet I almost always have a successful transmission. This means I have to deal
Successful transmission is really the easy bit for multicast. There is IGMP snooping, IGMP querier misbehaviour, loss of forwarding on an upstream IGP flap, flooding issues due to global MSDP issues, and so forth.
with that issue and a few others. Why do I care? Because I believe I can accomplish what I need - get great performance most of the time, and only in a few instances have to do the operation over again.
This is a tough problem to solve. I am not sure of the outcome but I am sure that I need to start somewhere. What I know is that it is partly transport and partly marshalling. The semantics of the call have to stay fixed: EXACTLY-ONCE.
If you MUST have EXACTLY-ONCE group communication semantics, you should use a message bus.