Re: [Twisted-Python] Multicast XMLRPC

Chaz. wrote:
radix@twistedmatrix.com wrote:
On 03:49 pm, eprparadocs@gmail.com wrote:
I have a XMLPRC client that works well under TCP and I am now trying to get it to work under Multicast conditions.
It is unlikely that you will be able to get the XMLRPC protocol to work over multicast, given that multicast is an unreliable transport, like UDP. HTTP and XMLRPC don't know how to deal with that.
Here is the "TCP" code I used:
class StorageService(internet.TCPServer):
    def __init__(self, tcpPort, configInfo):
        r = StoragePeer(configInfo)
        xmlrpc.addIntrospection(r)
        internet.TCPServer.__init__(self, tcpPort, server.Site(r))
Subclassing the things in twisted.application.internet is not really how those classes are meant to be used. Why did you do this instead of just instantiating a TCPServer with the appropriate port and factory?
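For reference, the non-subclassing arrangement being suggested here would look roughly like the sketch below; StoragePeer is the resource from the snippet above, and the function name is just illustrative:

    from twisted.application import internet
    from twisted.web import server, xmlrpc

    def makeStorageService(tcpPort, configInfo):
        # Build the XML-RPC resource exactly as before...
        r = StoragePeer(configInfo)
        xmlrpc.addIntrospection(r)
        # ...but hand a Site factory to a stock TCPServer instead of
        # subclassing TCPServer itself.
        return internet.TCPServer(tcpPort, server.Site(r))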
I changed the call to TCPServer to:
class StorageService(internet.TCPServer):
    def __init__(self, tcpPort, configInfo):
        r = StoragePeer(configInfo)
        xmlrpc.addIntrospection(r)
        internet.MulticastServer.__init__(self, tcpPort, server.Site(r))
I thought this would work, but it doesn't. What I get returned is the following error message:
Failed to load application: unbound method __init__() must be called with MulticastServer instance as first argument (got StorageService instance instead)
This is a trivial error in your Python. You can't call methods of classes directly unless the first argument is an instance of that class.
I know how much work I will need to do to get any UDP-like protocol to work with EXACTLY-ONCE semantics.
It seems my one problem was that in the definition of my class I wrote:

    class StorageService(internet.TCPServer)

when I should have used:

    class StorageService(internet.MulticastServer)

That solved my immediate problem, though I did find out that XMLRPC does in fact assume that you have a connection-oriented protocol underneath it. Now I will just have to fix that problem.

Also for those of you that said you can't do:

    internet.TCPServer.__init__(self,...)

I would suggest you are wrong. In fact that is exactly how subclassing works in Python. But that is for another time.

Once again thanks!

Chaz.

On Thu, 24 Aug 2006 11:54:38 -0500, Chaz. <eprparadocs@gmail.com> wrote:
Chaz. wrote:
Also for those of you that said you can't do:
internet.TCPServer.__init__(self,...)
I would suggest you are wrong. In fact that is exactly how subclassing works in Python. But that is for another time.
Sorry, but you didn't read JP's response carefully. What JP said was that you cannot "call an unbound method from one class with a self argument which is an instance of an unrelated class". That's exactly what you were doing in your first post; you called TCPServer.__init__ with a "self" argument that was actually an instance of a totally unrelated class, and that is why the call failed.
Once again thanks!
Have a good one, L. Daniel Burr

On Thu, 24 Aug 2006 12:54:38 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Chaz. wrote:
It seems my one problem was that in the definition of my class...
Those classes are not really meant to be used by subclassing.
That solved my immediate problem, though I did find out that XMLRPC does in fact assume that you have a connection-oriented protocol underneath it. Now I will just have to fix that problem.
How do you propose to "fix" that property of XMLRPC? It's not really a "problem", in that it's *defined* to use not only a connection, but an HTTP connection at that. From the XML-RPC specification: "An XML-RPC message is an HTTP-POST request." There is Jabber-RPC, which indicates how you might make an XMLRPC-*like* protocol over some other transport, but in your case that still requires a reliable multicast message delivery layer (a monumental task by itself). What is the application you are writing this for?
Also for those of you that said you can't do:
internet.TCPServer.__init__(self,...)
I would suggest you are wrong. In fact that is exactly how subclassing works in Python. But that is for another time.
You misunderstood. I assure you they were correct, but that isn't what they said. Simplified, here is what you did:
>>> class A:
...     def __init__(self):
...         print self
...
>>> class B:
...     def __init__(self):
...         A.__init__(self)
...
>>> B()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in __init__
TypeError: unbound method __init__() must be called with A instance as first argument (got B instance instead)
This is, in fact, illegal, and that is why you got the exception that you did. This is all moot, however, since you shouldn't use TCPServer with subclassing :).
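For contrast, the legal version of the same example simply makes B a real subclass of A (Python 2 syntax, as above):

    class A:
        def __init__(self):
            print self

    class B(A):                  # B now actually subclasses A...
        def __init__(self):
            A.__init__(self)     # ...so passing a B instance as `self` is legal

    B()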

glyph@divmod.com wrote:
On Thu, 24 Aug 2006 12:54:38 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Chaz. wrote:
It seems my one problem was that in the definition of my class...
Those classes are not really meant to be used by subclassing.
That solved my immediate problem, though I did find out that XMLRPC does in fact assume that you have a connection-oriented protocol underneath it. Now I will just have to fix that problem.
How do you propose to "fix" that property of XMLRPC? It's not really a "problem", in that it's *defined* to use not only a connection, but an HTTP connection at that. From the XML-RPC specification: "An XML-RPC message is an HTTP-POST request."
There is Jabber-RPC, which indicates how you might make an XMLRPC-*like* protocol over some other transport, but in your case that still requires a reliable multicast message delivery layer (a monumental task by itself).
What is the application you are writing this for?
Also for those of you that said you can't do:
internet.TCPServer.__init__(self,...)
I would suggest you are wrong. In fact that is exactly how subclassing works in Python. But that is for another time.
You misunderstood. I assure you they were correct, but that isn't what they said. Simplified, here is what you did:
>>> class A:
...     def __init__(self):
...         print self
...
>>> class B:
...     def __init__(self):
...         A.__init__(self)
...
>>> B()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in __init__
TypeError: unbound method __init__() must be called with A instance as first argument (got B instance instead)
This is, in fact, illegal, and that is why you got the exception that you did.
This is all moot, however, since you shouldn't use TCPServer with subclassing :).
First, I found the mistake I made. It was the standard cut-and-paste problem: I cut code from one place, pasted it in another, and forgot one piece altogether.

I have a thousand or more processors running in an application and need to send out a request to each and every one of them. Obviously it is impossible to send it out sequentially and it doesn't need to. The model of communication I am working from is that most communications are delivered and only once in a while do they fail (this is pretty true of an un-congested ethernet-like network).

So my algorithm is as follows (and if someone sees something wrong please let me know...a thousand pairs of eyes and brains is better than one):

1. Send out the RPC request via Multicast.
2. Set some timeout (based on the request type).
3. Since I know the servers in the collection, when I don't hear from one within the time out, resend the RPC request.

There is a problem with this approach - that is the semantics of the call. Current RPC calls are EXACTLY ONCE semantics - it either works or doesn't. In this there is an uncertainty - it might have worked and I don't know about it (the response was lost for instance) or the server itself failed and restarted. This means I can't ever be guaranteed the normal call semantics. It means it switches to AT LEAST ONCE semantics. This means I have to be careful in how I code the server side.

Does anyone see another approach?

Chaz
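A rough sketch of that send/timeout/resend loop in Twisted is below. The group address, port, timeout, and the idea of passing pre-marshalled request bytes are placeholders, not anything specified in the thread:

    from twisted.internet import reactor
    from twisted.internet.protocol import DatagramProtocol

    MCAST_GROUP, PORT = "228.0.0.5", 8005   # hypothetical group and port
    RETRY_TIMEOUT = 5                       # seconds; would depend on request type

    class MulticastCall(DatagramProtocol):
        def __init__(self, servers, request):
            self.pending = set(servers)     # hosts we still expect to hear from
            self.request = request          # pre-marshalled request bytes

        def startProtocol(self):
            # First round: one multicast datagram reaches every server
            # (the servers, not this caller, join MCAST_GROUP).
            self.transport.write(self.request, (MCAST_GROUP, PORT))
            reactor.callLater(RETRY_TIMEOUT, self.retryMissing)

        def retryMissing(self):
            if not self.pending:
                return
            # Resend directly (unicast) to each server that has not answered.
            for host in self.pending:
                self.transport.write(self.request, (host, PORT))
            reactor.callLater(RETRY_TIMEOUT, self.retryMissing)

        def datagramReceived(self, data, addr):
            host, port = addr
            self.pending.discard(host)      # any reply counts as an answer

    # e.g.: reactor.listenMulticast(PORT, MulticastCall(serverList, requestBytes))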

On Thu, 2006-08-24 at 14:13 -0400, Chaz. wrote:
There is a problem with this approach - that is the semantics of the call. Current RPC calls are EXACTLY ONCE semantics - it either works or doesn't. In this there is an uncertainty - it might have worked and I don't know about it (the response was lost for instance) or the server itself failed and restarted. This means I can't ever be guaranteed the normal call semantics. It means it switches to AT LEAST ONCE semantics. This means I have to be careful in how I code the server side.
If your protocol has a unique message identifier you can make sure duplicate commands are not rerun.
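A minimal sketch of that idea on the receiving side; the msgid field and handle() are placeholder names:

    class AtMostOnceDispatcher:
        """Remember message ids already handled so duplicates are not rerun."""

        def __init__(self):
            self.seen = {}   # msgid -> cached result

        def dispatch(self, msgid, command):
            if msgid in self.seen:
                # Duplicate delivery: return the cached result instead of
                # running the command a second time.
                return self.seen[msgid]
            result = self.handle(command)
            self.seen[msgid] = result
            return result

        def handle(self, command):
            raise NotImplementedError("application-specific command handling")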

Itamar Shtull-Trauring wrote:
On Thu, 2006-08-24 at 14:13 -0400, Chaz. wrote:
There is a problem with this approach - that is the semantics of the call. Current RPC calls are EXACTLY ONCE semantics - it either works or doesn't. In this there is an uncertainty - it might have worked and I don't know about it (the response was lost for instance) or the server itself failed and restarted. This means I can't ever be guaranteed the normal call semantics. It means it switches to AT LEAST ONCE semantics. This means I have to be careful in how I code the server side.
If your protocol has a unique message identifier you can make sure duplicate commands are not rerun.
I thought about doing that but ruled it out. So long as the server runs I can count on the UID being stepped correctly. If the server goes down and comes back up, it might be reset or set incorrectly. So I can't guarantee it. Can you see another way around the problem? Chaz

At 2006-08-24 02:13 PM -0400, you wrote:
I have a thousand or more processors running in an application and need to send out a request to each and every one of them.
This is an incomplete statement of the problem. To get assistance you need to more fully state the problem you're trying to solve.

What are you trying to accomplish?

1. Are you...
   a. sending data *to* each processor from a central server (e.g., configuration data)?
   b. retrieving data *from* each processor for display/processing at a central server (e.g., status information)?
   c. both?
2. How often do you need to send/receive the data?
3. How much latency is acceptable?
4. How much data loss is acceptable?

Your desire to use multicast suggests that you're doing 1a. If 1c, you need to specify 2-4 separately for both 1a and 1b.

Without this (and probably other) information, any solution suggested (or adopted) is, at best, a hammer looking for a nail.
Obviously it is impossible to send it out sequentially and it doesn't need to.
Very little is actually impossible.
The model of communication I am working from is that most communications are delivered and only once in a while do they fail (this is pretty true of an un-congested ethernet-like network).
So my algorithm is as follows (and if someone sees something wrong please let me know...a thousand pairs of eyes and brains is better than one). Send out the RPC request via Multicast. Set some timeout (based on the request type). Since I know the servers in the collection, when I don't hear from one within the time out, resend the RPC request.
Hammer. Look at the whole toolbox. Don't settle too quickly on one tool; you might need a screwdriver, instead.

For example, each processor might periodically call the central server to send its status information. Or it might call the central server only when it has new data to send, and otherwise make a simpler, low-overhead "heartbeat" call for monitoring.

HTH.

- Sam

__________________________________________________________
Spinward Stars, LLC                 Samuel Reynolds
Software Consulting and Development 303-805-1446
http://SpinwardStars.com/           sam@SpinwardStars.com
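A periodic status call of the kind Sam describes is only a few lines with twisted.internet.task; the URL, method name, and node id below are made up for illustration:

    from twisted.internet import reactor, task
    from twisted.web.xmlrpc import Proxy

    # Hypothetical central-server URL and method name.
    proxy = Proxy("http://central.example.com:8080/RPC2")

    def heartbeat():
        # Each processor phones home on its own schedule; the central
        # server only needs to note "this node was alive at time T".
        return proxy.callRemote("heartbeat", "node-42")

    task.LoopingCall(heartbeat).start(30.0)   # every 30 seconds
    reactor.run()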

Samuel Reynolds wrote:
At 2006-08-24 02:13 PM -0400, you wrote:
I have a thousand or more processors running in an application and need to send out a request to each and every one of them.
This is an incomplete statement of the problem. To get assistance you need to more fully state the problem you're trying to solve.
What are you trying to accomplish?

1. Are you...
   a. sending data *to* each processor from a central server (e.g., configuration data)?
   b. retrieving data *from* each processor for display/processing at a central server (e.g., status information)?
   c. both?
2. How often do you need to send/receive the data?
3. How much latency is acceptable?
4. How much data loss is acceptable?
Your desire to use multicast suggests that you're doing 1a. If 1c, you need to specify 2-4 separately for both 1a and 1b.
Without this (and probably other) information, any solution suggested (or adopted) is, at best, a hammer looking for a nail.
Obviously it is impossible to send it out sequentially and it doesn't need to.
Very little is actually impossible.
The model of communication I am working from is that most communications are delivered and only once in a while do they fail (this is pretty true of an un-congested ethernet-like network).
So my algorithm is as follows (and if someone sees something wrong please let me know...a thousand pairs of eyes and brains is better than one). Send out the RPC request via Multicast. Set some timeout (based on the request type). Since I know the servers in the collection, when I don't hear from one within the time out, resend the RPC request.
Hammer. Look at the whole toolbox. Don't settle too quickly on one tool; you might need a screwdriver, instead.
For example, each processor might periodically call the central server to send its status information. Or it might call the central server only when it has new data to send, and otherwise make a simpler, low-overhead "heartbeat" call for monitoring.
HTH.
- Sam
__________________________________________________________
Spinward Stars, LLC                 Samuel Reynolds
Software Consulting and Development 303-805-1446
http://SpinwardStars.com/           sam@SpinwardStars.com
Since the title of the email is Multicast XML-RPC I would say I was pretty specific. I need to send XML-RPC requests to thousands of machines, something that can't be done very efficiently using a connection-oriented protocol like TCP.

Chaz.

On Fri, 25 Aug 2006 12:05:50 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Samuel Reynolds wrote:
At 2006-08-24 02:13 PM -0400, you wrote:
I have a thousand or more processors running in an application and need to send out a request to each and every one of them.
This is an incomplete statement of the problem. To get assistance you need to more fully state the problem you're trying to solve.
What are you trying to accomplish?

1. Are you...
   a. sending data *to* each processor from a central server (e.g., configuration data)?
   b. retrieving data *from* each processor for display/processing at a central server (e.g., status information)?
   c. both?
2. How often do you need to send/receive the data?
3. How much latency is acceptable?
4. How much data loss is acceptable?
Since the title of the email is Multicast XML-RPC I would say I was pretty specific. I need to send XML-RPC requests to thousands of machines, something that can't be done very efficiently using a connection-oriented protocol like TCP.
Chaz.
Multicast is a choice for a solution to a problem. So is "sending XML-RPC requests to thousands of machines". Neither are themselves descriptions of a problem; rather, they are descriptions of a solution. Sam is asking for more information about the problem you are trying to solve. Jean-Paul

Jean-Paul Calderone wrote:
On Fri, 25 Aug 2006 12:05:50 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Samuel Reynolds wrote:
At 2006-08-24 02:13 PM -0400, you wrote:
I have a thousand or more processors running in an application and need to send out a request to each and every one of them.
This is an incomplete statement of the problem. To get assistance you need to more fully state the problem you're trying to solve.
What are you trying to accomplish?

1. Are you...
   a. sending data *to* each processor from a central server (e.g., configuration data)?
   b. retrieving data *from* each processor for display/processing at a central server (e.g., status information)?
   c. both?
2. How often do you need to send/receive the data?
3. How much latency is acceptable?
4. How much data loss is acceptable?
Since the title of the email is Multicast XML-RPC I would say I was pretty specific. I need to send XML-RPC requests to thousands of machines, something that can't be done very efficiently using a connection-oriented protocol like TCP.
Chaz.
Multicast is a choice for a solution to a problem. So is "sending XML-RPC requests to thousands of machines". Neither are themselves descriptions of a problem; rather, they are descriptions of a solution.
Sam is asking for more information about the problem you are trying to solve.
Jean-Paul
I will state what I thought was obvious: I need to make "calls" to thousands of machines to do something. I want to minimize the overhead both of making the call and the machines sending back the responses.

On the invoking side I believe multicast or broadcast is the way to go since I can send out one message and hit them all. On the response side I need a low-overhead protocol. TCP is pretty resource intensive so I need something else. I think a reliable datagram service on top of some underlying transport is the way to go (on top of multicast/broadcast/IP is what I am thinking about).

I think this describes the problem well.

Chaz
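For the responding side of such a scheme, a sketch of a node that joins a multicast group and answers each request with a single unicast datagram might look like this (group address, port, and the echo "marshalling" are placeholders):

    from twisted.internet import reactor
    from twisted.internet.protocol import DatagramProtocol

    MCAST_GROUP, PORT = "228.0.0.5", 8005    # hypothetical group and port

    class GroupResponder(DatagramProtocol):
        def startProtocol(self):
            # Every storage node joins the same group to hear the requests.
            self.transport.joinGroup(MCAST_GROUP)

        def datagramReceived(self, data, addr):
            # `data` would be the marshalled request; the reply goes back as
            # a single unicast datagram to whoever multicast the request.
            self.transport.write(self.process(data), addr)

        def process(self, data):
            return data                      # placeholder: echo the request

    reactor.listenMulticast(PORT, GroupResponder())
    reactor.run()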

On Fri, 25 Aug 2006 13:07:49 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
I will state what I thought was obvious:
Sorry to inconvenience you so.
[snip]
I think this describes the problem well.
I'll leave it to others to decide. For my part, I don't believe I have anything else productive to contribute. Jean-Paul

On Fri, 25 Aug 2006 13:07:49 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
I will state what I thought was obvious: I need to make "calls" to thousands of machines to do something. I want to minimize the overhead both of making the call and the machines sending back the responses.
Maybe you could start off a little bit further back into the problem today. Like, "I got up this morning, and I thought, 'I would like some toast.', but I don't know how to make toast, so I wanted to design a 100,000 node parallel neural network to develop a recipe for toast."

Perhaps then someone on this list could relate their toast development experiences, such as "using a TCP-based tree topology similar to IRC servers has been sufficient in my experience for toast-oriented data exchange although I have been using a parallelized coordinated genetic algorithm rather than a neural network to develop an optimal crunch/warmth experience", or possibly "ToastVortex, my Twisted-basted toast application server is available at http://toastvortex.example.com/" or better yet, "buy a toaster and put some bread in it".
TCP is pretty resource intensive so I need something else. I think a reliable datagram service on top of some underlying transport is the way to go (on top of multicast/broadcast/IP is what I am thinking about).
TCP's "resource" consumption is localized in an a highly optimized environment; in OS kernels, where the TCP stack is tuned constantly by thousands of people, in routing hardware that is specialized to give TCP traffic priority to improve performance, and in the guts of the public internet that runs such hardware and is constantly monitored and tweaked to give TCP even more of a boost. Any custom multicast protocol you develop, while perhaps theoretically better than TCP, is possibly going to get swamped by the marginalia that TCP has spent decades eradicating. In Python, you're going to be doing a lot of additional CPU work. For example, TCP acks often won't even be promoted to userspace, whereas you're going to need to process every unicast acknowledgement to your multicast message separately in userspace. While my toast network deployments are minimal, I *have* written quite a few multi-unicast servers, some of which processed quite a high volume of traffic acceptably, and in at least one case this work was later optimized by another developer who spent months working on a multicast replacement. That replacement which was later abandoned because the deployment burden of a large-scale multicast-capable network was huge. That's to say nothing of the months of additional time required to develop and properly *test* such a beast. You haven't said what resources TCP is consuming which are unacceptble, however, Is it taking too much system time? Too much local bandwidth? Is your ethernet experiencing too many collisions? Are you concerned about the cost of public internet bandwidth overages with your service provider? What's your network topology? It would be hard to list the answers to all of these questions (or even exhaustively ask all the questions one would need to comment usefully) but one might at least make guesses that did not fall too wide of the mark if one knew what the application in question were actually *doing*. In any event, XML-RPC is hardly a protocol which is famous for its low resource consumption on any of these axes, so if you're driven by efficiency concerns, it seems an odd choice to layer on top of a hand-tuned multicast-request/unicast-response protocol.

glyph@divmod.com wrote:
On Fri, 25 Aug 2006 13:07:49 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
I will state what I thought was obvious: I need to make "calls" to thousands of machines to do something. I want to minimize the overhead both of making the call and the machines sending back the responses.
Maybe you could start off a little bit further back into the problem today. Like, "I got up this morning, and I thought, 'I would like some toast.', but I don't know how to make toast, so I wanted to design a 100,000 node parallel neural network to develop a recipe for toast."
Perhaps then someone on this list could relate their toast development experiences, such as "using a TCP-based tree topology similar to IRC servers has been sufficient in my experience for toast-oriented data exchange although I have been using a parallelized coordinated genetic algorithm rather than a neural network to develop an optimal crunch/warmth experience", or possibly "ToastVortex, my Twisted-basted toast application server is available at http://toastvortex.example.com/" or better yet, "buy a toaster and put some bread in it".
TCP is pretty resource intensive so I need something else. I think a reliable datagram service on top of some underlying transport is the way to go (on top of multicast/broadcast/IP is what I am thinking about).
TCP's "resource" consumption is localized in an a highly optimized environment; in OS kernels, where the TCP stack is tuned constantly by thousands of people, in routing hardware that is specialized to give TCP traffic priority to improve performance, and in the guts of the public internet that runs such hardware and is constantly monitored and tweaked to give TCP even more of a boost. Any custom multicast protocol you develop, while perhaps theoretically better than TCP, is possibly going to get swamped by the marginalia that TCP has spent decades eradicating. In Python, you're going to be doing a lot of additional CPU work. For example, TCP acks often won't even be promoted to userspace, whereas you're going to need to process every unicast acknowledgement to your multicast message separately in userspace.
While my toast network deployments are minimal, I *have* written quite a few multi-unicast servers, some of which processed quite a high volume of traffic acceptably, and in at least one case this work was later optimized by another developer who spent months working on a multicast replacement. That replacement was later abandoned because the deployment burden of a large-scale multicast-capable network was huge. That's to say nothing of the months of additional time required to develop and properly *test* such a beast.
You haven't said what resources TCP is consuming which are unacceptable, however. Is it taking too much system time? Too much local bandwidth? Is your ethernet experiencing too many collisions? Are you concerned about the cost of public internet bandwidth overages with your service provider? What's your network topology? It would be hard to list the answers to all of these questions (or even exhaustively ask all the questions one would need to comment usefully) but one might at least make guesses that did not fall too wide of the mark if one knew what the application in question were actually *doing*.
In any event, XML-RPC is hardly a protocol which is famous for its low resource consumption on any of these axes, so if you're driven by efficiency concerns, it seems an odd choice to layer on top of a hand-tuned multicast-request/unicast-response protocol.
Perhaps the simple way to say this is that I need to do group communications that support RPC semantics with minimal overhead.

You ask about the network topology; all I can say is that it supports the normal communication means: unicast, broadcast and maybe multicast. I am being intentionally vague since I don't want to have any specific network architecture. I don't want to use overlay networks, if at all possible. While they are nice, I would prefer something a little more direct (though that might not be possible). The reason? Direct operations are faster.

I have a membership list of the state of all the processors in the system (and I am talking 1000's of processors) without the use of standard heartbeat (in the traditional use of heartbeat I would have N! ping messages!). I figured out that probabilistic polling with gossip was enough.

I don't particularly care if it is PB, XML-RPC or SOAP as the marshalling mechanism. I mention them since they allow me to solve one problem at a time. I would like to build the solution a piece at a time to do some measurements and testing. Today the underlying transport and tomorrow the marshallings.

Now let me address the issue of TCP. It is a pretty heavy protocol to use. It takes a lot of resources on the sender and target and can take some time to establish a connection. Opening a 1000 or more sockets consumes a lot of resources in the underlying OS and in the Twisted client! If I use TCP and stick to the serial, synchronized semantics of RPC, doing one call at a time, I have only a few ways to solve the problem. Do one call at a time, repeat N times, and that could take quite a while. I could do M spawnProcesses and have each do N/M RPC calls. Or I could use M threads and do it that way. Granted I have M sockets open at a time, it is possible for this to take quite a while to execute. Performance would be terrible (and yes I want an approach that has good to very good performance. After all who would want poor to terrible performance?)

So I divided the problem down to two parts. One, can I reduce the amount of traffic on the invoking side of the RPC request? Second, is how to deal with the response. Obviously I have to deal with the issue of failure, since RPC semantics require EXACTLY-ONCE.

That gets me to the multicast or broadcast scheme. In one call I could get the N processors to start working. Now I just have to solve the other half of the problem: how to get the answers returned without swamping the network or how to detect when I didn't get an answer from a processor at all. That leads me to the observation that on an uncongested ethernet I almost always have a successful transmission. This means I have to deal with that issue and a few others.

Why do I care? Because I believe I can accomplish what I need - get great performance most of the time, and only in a few instances have to deal with doing the operation over again.

This is a tough problem to solve. I am not sure of the outcome but I am sure that I need to start somewhere. What I know is that it is partly transport and partly marshalling. The semantics of the call have to stay fixed: EXACTLY-ONCE.

Hope this helps cast the problem... I didn't mean to sound terse before; I just figured everyone had already thought about the problem and knew the issues.

Chaz

On Fri, 25 Aug 2006 14:43:43 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Perhaps the simple way to say this is that I need to do group communications that support RPC semantics with minimal overhead.
I'm still not really clear on what the application is.
You ask about the network topology; all I can say is that it supports the normal communication means: unicast, broadcast and maybe multicast.
Heh. "Normal" communication means? After writing a P2P layer and working on a SIP implementation, I have come to understand that the only "normal" communication available is an *outgoing*, *unencrypted* HTTP request on port 80... ;-) More seriously, if you're writing an application for distributing compute nodes to home computers, multicast is a non-starter. If it's an intranet, then maybe it's feasible. Or, if you're on Internet 2 for some reason. (Is anybody actually on internet 2 these days?) At any rate, producing a functioning multiunicast prototype with, e.g. PB, would be the easiest way to get started if you need to fall back to that sort of topology anyway in the case where a multicast solution doesn't work. Then you can collect data and determine how much bandwidth is going to be saved in a realistic scenario...
I am being intentionally vague since I don't want to have any specific network architecture.
If you want to support arbitrary network architecture, you _definitely_ can't use multicast, at all. Even determining if *unicast* datagrams work on an arbitrary network is a hard problem.
I don't want to use overlay networks, if at all possible. While they are nice, I would prefer something a little more direct (though that might not be possible). The reason? Direct operations are faster.
Sometimes. If your topology involves an extremely well-connected overlay hub peer and a bunch of intermittently or poorly connected edge peers, direct operations can be significantly slower. While I'm not a big fan of IRC's network architecture, the math on what happens if every client is responsible for all of their own messages on a channel of 1000 people is really eye-opening.
I don't particularly care if it is PB, XML-RPC or SOAP as the marshalling mechanism. I mention them since they allow me to solve one problem at a time. I would like to build the solution a piece at a time to do some measurements and testing. Today the underlying transport and tomorrow the marshallings.
It still seems to me like this is backwards. The application can be complete, end-to-end, if you start marshalling data and sending it over a simplistic (but possibly too-expensive) mechanism. Then, you can replace the transport as necessary later. Preserving the semantics of the marshalling between things as radically different as XMLRPC and PB would be very hard; but as you've said, the semantics of your transport must remain identical.
Now let me address the issue of TCP. It is a pretty heavy protocol to use. It takes a lot of resources on the sender and target and can take some time to establish a connection. Opening a 1000 or more sockets consumes a lot of resources in the underlying OS and in the Twisted client!
I still don't know what you mean by "resources", and as compared to what. In my experience all the alternatives to TCP end up consuming an equivalent amount of RAM and CPU time... although in some cases you might save on bandwidth.
If I use TCP and stick to the serial, synchronized semantics of RPC, doing one call at a time, I have only a few ways to solve the problem. Do one call at a time, repeat N times, and that could take quite a while.
I'm not sure what you mean by "at a time". The operations can be quite effectively parallelized, both by TCP and by Twisted talking to the OS: if you keep a list of all your open connections and do the naive thing, i.e., for each heartbeat: for connection in connections: connection.sendPing(timeout=30).addErrback(connection.uhOh) the initial loop will not take very long even with a very large number of connections, and Twisted will send out traffic as network conditions permit. Most importantly, you do not need to wait for any of the calls to complete to issue more calls, regardless of whether they're unicast or multicast. This same API could be refactored internally to group together peers in the same multicast group and coalesce their pings; but you still need to do the same complexity order of work, because you have to track each peer's response individually. Finally, if all you're concerned with is clients dying, you can remove Python from the equation entirely and let the TCP stack do its thing: set SO_KEEPALIVE on all your sockets [in Twisted-ese: self.transport.setTcpKeepAlive(True)] and just wait for connectionLost to be called when a ping fails. No user-space work _at all_, and probably pretty minimal bandwidth usage.
I could do M spawnProcesses and have each do N/M RPC calls.
Yow. That definitely doesn't make sense unless you have a massively SMP box.
Or I could use M threads and do it that way.
... and that would basically _never_ make sense, under any conditions. Python's GIL negates any SMP benefits, Twisted won't send network messages from threads anyway, and it would be substantially more complex.
Granted I have M sockets open at a time, it is possible for this to take quite a while to execute. Performance would be terrible (and yes I want an approach that has good to very good performance. After all who would want poor to terrible performance?)
"performance 'would be' terrible" sounds like premature optimization to me. At least, I have lots of experience with systems where this performance was more than good enough. Huge massively multiplayer games use such systems and manage to deal with tens of thousands of concurrent clients per game node with (relative) ease, over the public internet, with good performance, and without breaking the bank on bandwidth.
So I divided the problem down to two parts. One, can I reduce the amount of traffic on the invoking side of the RPC request? Second, is how to deal with the response. Obviously I have to deal with the issue of failure, since RPC semantics require EXACTLY-ONCE.
If you're concerned about bandwidth *as a resource of its own* then this is perhaps a legitimate concern. But if you're concerned about reducing bandwidth as a means to increase the real-time performance of the system I don't think that it's actually going to save you a lot. You save some bandwidth, but then you move a bunch of request/response tracking out of hardware and into Python, which is unlikely to pay off unless your new algorithm is more efficient by a large margin and N is very big indeed (100,000 is not "big", especially when you can partition it using techniques like overlay networks).
That leads me to the observation that on an uncongested ethernet (...)
"uncongested ethernet" implies something very concrete about your network topology. Certainly it implies that you have enough spare bandwith that you don't need to be compressing every byte. Want to expound? :)
Hope this helps cast the problem...I didn't mean to sound terse before I just figured everyone had already thought about the problem and knew the issues.
I still really don't know what the problem at hand is. I gather it has something to do with sending a lot of traffic to a lot of peers but that is still a description of an implementation technique, not a problem. Are you making toast? Doing distributed testing? Sequencing genomes? Cracking encryption? Writing some kind of monster distributed enterprise calendar server? (I'm still not sure what you meant by "communicating groups", above.) Is the "problem" in this case to develop a generic infrastructure for some wider set of problems, like an open-source implementation of a MapReduce daemon? If so, what are the initial problems it's expected to be applied to? What does all this data, other than heartbeats, that you're slinging around *represent*?

glyph@divmod.com wrote:
On Fri, 25 Aug 2006 14:43:43 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Perhaps the simple way to say this is that I need to do group communications that support RPC semantics with minimal overhead.
I'm still not really clear on what the application is.
The application is a massively scalable data storage system. I plan on releasing it into the open source community within the next month or so. I've been working on it for almost two years now. Twisted and Python have made easier work of it than my first implementation (C/C++).
You ask about the network topology; all I can say is that it supports the normal communication means: unicast, broadcast and maybe multicast.
Heh. "Normal" communication means? After writing a P2P layer and working on a SIP implementation, I have come to understand that the only "normal" communication available is an *outgoing*, *unencrypted* HTTP request on port 80... ;-)
More seriously, if you're writing an application for distributing compute nodes to home computers, multicast is a non-starter. If it's an intranet, then maybe it's feasible. Or, if you're on Internet 2 for some reason. (Is anybody actually on internet 2 these days?)
This is not a "home application" but an enterprise and/or SSP application. Most likely it sits behind a firewall and if remote offices need to access it, they will get access via VPN portals. I guess this is sort of a topology! Doh. All I know is that I have multicast and with some effort broadcast support.
At any rate, producing a functioning multiunicast prototype with, e.g. PB, would be the easiest way to get started if you need to fall back to that sort of topology anyway in the case where a multicast solution doesn't work. Then you can collect data and determine how much bandwidth is going to be saved in a realistic scenario...
So I can use PB with multicast support? How would I deal with all the target machines getting responses back?
I am being intentionally vague since I don't want to have any specific network architecture.
If you want to support arbitrary network architecture, you _definitely_ can't use multicast, at all. Even determining if *unicast* datagrams work on an arbitrary network is a hard problem.
As I said the entire system sits behind a firewall and remote sites will use VPN to get to the system. This gives me multicast and broadcast (so long as everything is on the same subnet, even remote sites).
I don't want to use overlay networks, if at all possible. While they are nice, I would prefer something a little more direct (though that might not be possible). The reason? Direct operations are faster.
Sometimes. If your topology involves an extremely well-connected overlay hub peer and a bunch of intermittently or poorly connected edge peers, direct operations can be significantly slower. While I'm not a big fan of IRC's network architecture, the math on what happens if every client is responsible for all of their own messages on a channel of 1000 people is really eye-opening.
I had thought of the hub-and-spoke model and I designed the system that way, originally. But I have to respond to instantaneous demands which caused me to change the design of the system. Each of the servers can run as both a server (providing a service to a client app) and an end point (providing storage features). So a hub-and-spoke architecture is really out of the picture for me (at least I can't see an easy way).

I could probably do a self-organizing overlay network on top of the machines taking advantage of how they are connected together (the real physical topology) but even that presents me with an issue: I want the system to sort of be self-configuring. As such I don't have a way to auto-detect connection speeds. I had thought of using a clock-synchronization algorithm to figure out bandwidth throttling but I thought it better to leave that to another day (or days). I also don't want the user (or owner of this beast) to have to manually configure that stuff.
I don't particularly care if it is PB, XML-RPC or SOAP as the marshalling mechanism. I mention them since they allow me to solve one problem at a time. I would like to build the solution a piece at a time to do some measurements and testing. Today the underlying transport and tomorrow the marshallings.
It still seems to me like this is backwards.
The application can be complete, end-to-end, if you start marshalling data and sending it over a simplistic (but possibly too-expensive) mechanism. Then, you can replace the transport as necessary later. Preserving the semantics of the marshalling between things as radically different as XMLRPC and PB would be very hard; but as you've said, the semantics of your transport must remain identical.
That is certainly one way. I tend to think all my hard problems are going to be transport issues and work up the stack. I have had a share of algorithm issues too; nothing is quite obvious when you have a 1000 or 10,000 machines to deal with!
Now let me address the issue of TCP. It is a pretty heavy protocol to use. It takes a lot of resources on the sender and target and can take some time to establish a connection. Opening a 1000 or more sockets consumes a lot of resources in the underlying OS and in the Twisted client!
I still don't know what you mean by "resources", and as compared to what. In my experience all the alternatives to TCP end up consuming an equivalent amount of RAM and CPU time... although in some cases you might save on bandwidth.
By resources I mean memory and time. Granted on a 1GB system with 3 GB of virtual, memory isn't a big deal, most of the time. But I have seen memory leaks kill this sucker more times than I care to recall. Once I ran the application for a few days and saw all my swap being used! It was a very subtle memory leak in one of the libraries (in fact one library leak consumed 584M in less than one hour!).
If I use TCP and stick to the serial, synchronized semantics of RPC, doing one call at a time, I have only a few ways to solve the problem. Do one call at a time, repeat N times, and that could take quite a while.
I'm not sure what you mean by "at a time". The operations can be quite effectively parallelized, both by TCP and by Twisted talking to the OS: if you keep a list of all your open connections and do the naive thing, i.e., for each heartbeat:
    for connection in connections:
        connection.sendPing(timeout=30).addErrback(connection.uhOh)
the initial loop will not take very long even with a very large number of connections, and Twisted will send out traffic as network conditions permit.
Most importantly, you do not need to wait for any of the calls to complete to issue more calls, regardless of whether they're unicast or multicast. This same API could be refactored internally to group together peers in the same multicast group and coalesce their pings; but you still need to do the same complexity order of work, because you have to track each peer's response individually.
Yes, I completely forgot that I would see them all in parallel. I tend to overlook Twisted's state machine architecture when I think of solutions. I am getting better but not quite there yet...
Finally, if all you're concerned with is clients dying, you can remove Python from the equation entirely and let the TCP stack do its thing: set SO_KEEPALIVE on all your sockets [in Twisted-ese: self.transport.setTcpKeepAlive(True)] and just wait for connectionLost to be called when a ping fails. No user-space work _at all_, and probably pretty minimal bandwidth usage.
I could do M spawnProcesses and have each do N/M RPC calls.
Yow. That definitely doesn't make sense unless you have a massively SMP box.
Or I could use M threads and do it that way.
... and that would basically _never_ make sense, under any conditions. Python's GIL negates any SMP benefits, Twisted won't send network messages from threads anyway, and it would be substantially more complex.
Yes...see my mea culpa above....it is hard to stop thinking in terms of threads and processes!
Granted I have M sockets open at a time, it is possible for this to take quite a while to execute. Performance would be terrible (and yes I want an approach that has good to very good performance. After all who would want poor to terrible performance?)
"performance 'would be' terrible" sounds like premature optimization to me. At least, I have lots of experience with systems where this performance was more than good enough. Huge massively multiplayer games use such systems and manage to deal with tens of thousands of concurrent clients per game node with (relative) ease, over the public internet, with good performance, and without breaking the bank on bandwidth.
Do the games use TCP or UDP? I would have thought they save state about each of the players in the server and use UDP for message passing. I thought that was part of the reason most game developers were interested in STUN?
So I divided the problem down to two parts. One, can I reduce the amount of traffic on the invoking side of the RPC request? Second, is how to deal with the response. Obviously I have to deal with the issue of failure, since RPC semantics require EXACTLY-ONCE.
If you're concerned about bandwidth *as a resource of its own* then this is perhaps a legitimate concern. But if you're concerned about reducing bandwidth as a means to increase the real-time performance of the system I don't think that it's actually going to save you a lot. You save some bandwidth, but then you move a bunch of request/response tracking out of hardware and into Python, which is unlikely to pay off unless your new algorithm is more efficient by a large margin and N is very big indeed (100,000 is not "big", especially when you can partition it using techniques like overlay networks).
Bandwidth is a very important issue in this system. No one would run this on their network if it could bring down their network (or congest it so badly ...the old packet-storm issue). Minimizing bandwidth usage is only one way to deal with performance. A congested network will drop packets (requiring retransmission, etc), so I try to minimize the impact on the network.
That leads me to the observation that on an uncongested ethernet (...)
"uncongested ethernet" implies something very concrete about your network topology. Certainly it implies that you have enough spare bandwith that you don't need to be compressing every byte. Want to expound? :)
Well, a congested network is about 1/2 the bandwidth; so I can expect about 5Mb/sec on a 10M ethernet, etc. So the idea would be to keep traffic to a minimum. As I mentioned earlier, if I was to do normal heartbeat messages with N machines, I have N! messages moving around. So long as N is small - a few hundred machines (and I have built machines in the telecom world that have had 100 machines), the load on the network is reasonable. But once you have 1000 machines, you have 1,000,000 messages flying around. Through the work I did I got the number down to 2,000! And over 15 seconds, that isn't too bad.
Hope this helps cast the problem...I didn't mean to sound terse before I just figured everyone had already thought about the problem and knew the issues.
I still really don't know what the problem at hand is. I gather it has something to do with sending a lot of traffic to a lot of peers but that is still a description of an implementation technique, not a problem. Are you making toast? Doing distributed testing? Sequencing genomes? Cracking encryption? Writing some kind of monster distributed enterprise calendar server? (I'm still not sure what you meant by "communicating groups", above.) Is the "problem" in this case to develop a generic infrastructure for some wider set of problems, like an open-source implementation of a MapReduce daemon? If so, what are the initial problems it's expected to be applied to? What does all this data, other than heartbeats, that you're slinging around *represent*?

On Fri, 25 Aug 2006 16:33:52 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
glyph@divmod.com wrote:
I'm still not really clear on what the application is.
The application is a massively scalable data storage system.
Okay! Now I know what you're getting at :).
This is not a "home application" but an enterprise and/or SSP application. Most likely it sits behind a firewall and if remote offices need to access it, they will get access via VPN portals.
I guess this is sort of a topology! Doh. All I know is that I have multicast and with some effort broadcast support.
OK, that makes more sense.
At any rate, producing a functioning multiunicast prototype with, e.g. PB, would be the easiest way to get started if you need to fall back to that sort of topology anyway in the case where a multicast solution doesn't work. Then you can collect data and determine how much bandwidth is going to be saved in a realistic scenario...
So I can use PB with multicast support? How would I deal with all the target machines getting responses back?
My point here was really not anything about PB specifically. Using PB with multicast would require some tricks; you'd have to have a different Broker implementation, probably, and a datagram-based API. You could still use the underlying message serialization format though.
I had thought of the hub-and-spoke model and I designed the system that way, originally. But I have to respond to instantaneous demands which caused me to change the design of the system. Each of the servers can run as both a server (providing a service to a client app) and an end point (providing storage features). So a hub-and-spoke architecture is really out of the picture for me (at least I can't see an easy way).
I don't see that it's out of the picture - your network topology allows you to fairly effortlessly connect between machines (no need for NAT traversal or "home servers" or any of that garbage: just give an IP on the intranet) - just include the "hub" and "spoke" code in the same process, and then any process can act as a hub... dynamic load-balancing is never easy, but it is certainly a possibility.
I could probably do a self-organizing overlay network on top of the machines taking advantage of how they are connected together (the real physical topology) but even that presents me with an issue: I want the system to sort of be self-configuring. As such I don't have a way to auto-detect connection speeds.
You can detect connection speeds on the fly; just start doing some work, gather statistics on each connection, and reconfigure if it's not going fast enough. No need for clock synchronization.
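A sketch of the kind of on-the-fly measurement being described; everything here is illustrative rather than an existing Twisted facility:

    import time
    from twisted.internet import protocol

    class MeteredProtocol(protocol.Protocol):
        """Per-connection throughput tracking (sketch; names are made up)."""

        def connectionMade(self):
            self.bytesSeen = 0
            self.started = time.time()

        def dataReceived(self, data):
            self.bytesSeen += len(data)
            # ... real protocol handling of `data` would go here ...

        def currentRate(self):
            # Bytes per second observed so far; something higher up can poll
            # this periodically and reconfigure links that are too slow.
            elapsed = time.time() - self.started
            if elapsed <= 0:
                return 0.0
            return self.bytesSeen / float(elapsed)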
Today the underlying transport and tomorrow the marshallings.
It still seems to me like this is backwards.
The application can be complete, end-to-end, if you start marshalling data and sending it over a simplistic (but possibly too-expensive) mechanism. (...)

That is certainly one way. I tend to think all my hard problems are going to be transport issues and work up the stack. I have had a share of algorithm issues too; nothing is quite obvious when you have a 1000 or 10,000 machines to deal with!
Working up the stack is difficult because you can't measure the working system at any point to decide what you need to optimize. I prefer to work downwards. If your highest level of code can remain unchanged while you refactor the underlying layers, then you can run the same tests for the same high-level code with different underlying layers to get an idea of their relative performance. If you start optimizing at the bottom of the stack before the top is done, then you can easily end up with something which is optimized in the wrong direction, and which requires rewriting when the top layer is done anyway. I guess this doesn't really have much bearing on your other questions though.
Now let me address the issue of TCP. It is a pretty heavy protocol to use.
I still don't know what you mean by "resources", and as compared to what.
By resources I mean memory and time. Granted on a 1GB system with 3 GB of virtual, memory isn't a big deal, most of the time. But I have seen memory leaks kill this sucker more times than I care to recall. Once I ran the application for a few days and saw all my swap being used! It was a very subtle memory leak in one of the libraries (in fact one library leak consumed 584M in less than one hour!).
I notice you don't specifically refer to features of TCP here, but instead of the perils of writing any software at all in C/C++ :). Of course, Python can have memory leaks, but I wouldn't base your architecture around bugs in libraries which will hopefully be unnecessary in the future :).
Yes, I completely forgot that I would see them all in parallel. I tend to overlook Twisted's state machine architecture when I think of solutions. I am getting better but not quite there yet...
It might not solve your problem. But Twisted may be doing quite a lot more work in "parallel" than you're used to. I can't really say, but I'd be curious to hear about it if you measure it.
(Threads are bad) Yes...see my mea culpa above....it is hard to stop thinking in terms of threads and processes!
Yeah, it took me a while to get out of that habit when I started writing Twisted in the first place :). (The thing that preceded it was a blocking, multithreaded abomination.)
Do the games use TCP or UDP? I would have thought they save state about each of the players in the server and use UDP for message passing. I thought that was part of the reason most game developers were interested in STUN?
They ... vary. A general rule of thumb is that they use TCP (or something like it) for control messages and data transfer, and then an *unreliable* most-recent-first UDP protocol for transmitting information about physical position, orientation and movement. Game protocols are incredibly involved because they're typically communicating information about a dozen systems at once. Game performance is different than typical application performance because quite often you only care about the most recent state of something, and you can happily throw away any old messages. The games that are interested in STUN are not MMPs; the reason they are using it is to establish P2P connections so that players don't have to receive their updates from a central server, and you don't need to configure your firewall to play.
Bandwidth is a very important issue in this system. No one would run this on their network if it could bring down their network (or congest it so badly ...the old packet-storm issue).
This is another good reason to use TCP. There are congestion control mechanisms for TCP; you would have to implement something yourself for UDP.
As I mentioned earlier, if I was to do normal heartbeat messages with N machines, I have N! messages moving around. So long as N is small - a few hundred machines (and I have built machines in the telecom world that have had 100 machines), the load on the network is reasonable. But once you have 1000 machines, you have 1,000,000 messages flying around. Through the work I did I got the number down to 2,000! And over 15 seconds, that isn't too bad.
Why N! messages? Using a naive hub-and-spoke model it seems like it would just be 2N. It's only if every node needs to know about every other node that you get up to N!... why would you need that?

Please forgive the top post... I just felt it better this time since the discussion has become one of heartbeat.

Well, if you look at things like the Linux clustering software, the approach they take is brain dead - each machine pings all the others. If you had 2 machines, you would have 2 pings per cycle. With 3 machines, you would have 6, and so on. If you do put an overlay network on top of the physical topology you would definitely have something different. And a hub-and-spoke layout would give you much more than 2N.

What I did was combine two approaches into a single mechanism. I use "gossip" to pass around the state of the system as I know it (actually the changes in state). I use a probabilistic approach to find the machine to poll - I pick one randomly. If that machine doesn't answer I pick P other machines, asking them to poll the original machine and tell me what they found (with the idea that it might be congestion between me and the original machine). I find this approach converges to the true state of the system within a few polling cycles.

Why do I need to know the state of all the machines? Actually the system does "self-repair" and "self-healing". When a node goes down and comes back up, each node will check the information it knows about that node. Some nodes will recognize that the machine that just came back has to hold certain data, and tell it. That's the 5 cent answer.

Chaz.

glyph@divmod.com wrote:
On Fri, 25 Aug 2006 16:33:52 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
glyph@divmod.com wrote:
I'm still not really clear on what the application is.
The application is a massively scalable data storage system.
Okay! Now I know what you're getting at :).
This is not a "home application" but an enterprise and/or SSP application. Most likely it sits behind a firewall and if remote offices need to access it, they will get access via VPN portals.
I guess this is sort of a topology! Doh. All I know is that I have multicast and with some effort broadcast support.
OK, that makes more sense.
At any rate, producing a functioning multiunicast prototype with, e.g. PB, would be the easiest way to get started if you need to fall back to that sort of topology anyway in the case where a multicast solution doesn't work. Then you can collect data and determine how much bandwidth is going to be saved in a realistic scenario...
So I can use PB with multicast support? How would I deal with all the target machines getting responses back?
My point here was really not anything about PB specifically. Using PB with multicast would require some tricks; you'd have to have a different Broker implementation, probably, and a datagram-based API. You could still use the underlying message serialization format though.
I had thought of the hub-and-spoke model and I designed the system that way, originally. But I have to respond to instantaneous demands, which caused me to change the design of the system. Each of the servers can run as both a server (providing a service to a client app) and an end point (providing storage features). So a hub-and-spoke architecture is really out of the picture for me (at least I can't see an easy way).
I don't see that it's out of the picture - your network topology allows you to fairly effortlessly connect between machines (no need for NAT traversal or "home servers" or any of that garbage: just give an IP on the intranet) - just include the "hub" and "spoke" code in the same process, and then any process can act as a hub... dynamic load-balancing is never easy, but it is certainly a possibility.
I could probably do a self-organizing overlay network on top of the machines taking advantage of how they are connected together (the real physical topology) but even that presents me with an issue: I want the system to sort of be self-configuring. As such I don't have a way to auto-detect connection speeds.
You can detect connection speeds on the fly; just start doing some work, gather statistics on each connection, and reconfigure if it's not going fast enough. No need for clock synchronization.
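(As an aside, a minimal sketch of the "measure on the fly" idea: count bytes per connection and expose an observed rate the application can react to. The class and method names here are invented for illustration.)

    import time
    from twisted.internet.protocol import Protocol

    class ThroughputMeter(Protocol):
        """Record bytes received on one connection and report an observed rate."""
        def connectionMade(self):
            self.started = time.time()
            self.received = 0

        def dataReceived(self, data):
            self.received += len(data)

        def observedRate(self):
            """Bytes per second seen on this connection so far."""
            elapsed = max(time.time() - self.started, 1e-6)
            return self.received / elapsed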
Today the underlying transport and tomorrow the marshallings.
It still seems to me like this is backwards.
The application can be complete, end-to-end, if you start marshalling data and sending it over a simplistic (but possibly too-expensive) mechanism. (...) That is certainly one way. I tend to think all my hard problems are going to be transport issues and work up the stack. I have had a share of algorithm issues too; nothing is quite obvious when you have a 1000 or 10,000 machines to deal with!
Working up the stack is difficult because you can't measure the working system at any point to decide what you need to optimize. I prefer to work downwards. If your highest level of code can remain unchanged while you refactor the underlying layers, then you can run the same tests for the same high-level code with different underlying layers to get an idea of their relative performance. If you start optimizing at the bottom of the stack before the top is done, then you can easily end up with something which is optimized in the wrong direction, and which requires rewriting when the top layer is done anyway.
I guess this doesn't really have much bearing on your other questions though.
Now let me address the issue of TCP. It is a pretty heavy protocol to use.
I still don't know what you mean by "resources", and as compared to what.
By resources I mean memory and time. Granted, on a 1GB system with 3GB of virtual, memory isn't a big deal most of the time. But I have seen memory leaks kill this sucker more times than I care to recall. Once I ran the application for a few days and saw all my swap being used! It was a very subtle memory leak in one of the libraries (in fact one library leak consumed 584M in less than one hour!).
I notice you don't specifically refer to features of TCP here, but instead of the perils of writing any software at all in C/C++ :). Of course, Python can have memory leaks, but I wouldn't base your architecture around bugs in libraries which will hopefully be unnecessary in the future :).
Yes, I completely forgot that I would see them all in parallel. I tend to overlook Twisted's state machine architecture when I think of solutions. I am getting better but not quite there yet...
It might not solve your problem. But Twisted may be doing quite a lot more work in "parallel" than you're used to. I can't really say, but I'd be curious to hear about it if you measure it.
(Threads are bad) Yes...see my mea culpa above....it is hard to stop thinking in terms of threads and processes!
Yeah, it took me a while to get out of that habit when I started writing Twisted in the first place :). (The thing that preceded it was a blocking, multithreaded abomination.)
Do the games use TCP or UDP? I would have thought they save state about each of the players in the server and use UDP for message passing. I thought that was part of the reason most game developers were interested in STUN?
They ... vary. A general rule of thumb is that they use TCP (or something like it) for control messages and data transfer, and then an *unreliable* most-recent-first UDP protocol for transmitting information about physical position, orientation and movement. Game protocols are incredibly involved because they're typically communicating information about a dozen systems at once. Game performance is different than typical application performance because quite often you only care about the most recent state of something, and you can happily throw away any old messages.
The games that are interested in STUN are not MMPs; the reason they are using it is to establish P2P connections so that players don't have to receive their updates from a central server, and you don't need to configure your firewall to play.
Bandwidth is a very important issue in this system. No one would run this on their network if it could bring down their network (or congest it so badly ...the old packet-storm issue).
This is another good reason to use TCP. There are congestion control mechanisms for TCP; you would have to implement something yourself for UDP.
As I mentioned earlier, if I were to do normal heartbeat messages with N machines, I have N! messages moving around. So long as N is small - a few hundred machines (and I have built systems in the telecom world that have had 100 machines) - the load on the network is reasonable. But once you have 1000 machines, you have 1,000,000 messages flying around. Through the work I did I got the number down to 2,000! And over 15 seconds, that isn't too bad.
Why N! messages? Using a naive hub-and-spoke model it seems like it would just be 2N. It's only if every node needs to know about every other node that you get up to N!... why would you need that?
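(A rough sketch of the random-poll-plus-indirect-probe heartbeat scheme Chaz describes at the top of this message. The `ping` callable and the peer list are hypothetical stand-ins for whatever transport is actually used; this is illustrative only, not Chaz's implementation.)

    import random

    def check_peer(peers, ping, indirect_probes=3):
        """One round of the probabilistic failure detector: ping one random
        peer directly; if it doesn't answer, ask a few other peers to probe
        it for us before treating it as suspect (the silence may just be
        congestion on our own path to it)."""
        target = random.choice(peers)
        if ping("self", target):
            return target, "alive"
        others = [p for p in peers if p != target]
        helpers = random.sample(others, min(indirect_probes, len(others)))
        if any(ping(helper, target) for helper in helpers):
            return target, "alive"      # reachable via someone else
        return target, "suspect"        # gossip this state change to the group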

Chaz. wrote:
Now let me address the issue of TCP. It is a pretty heavy protocol to use. It takes a lot of resources on the sender and target and can take some time to establish a connection. Opening a 1000 or more sockets consumes a lot of resources in the underlying OS and in the Twisted client!
People keep trying to help you, and you keep repeating yourself. From what I can gather:

You *need* a relatively lightweight group communication method. My advice would be to investigate a message bus - see recent posts on this mailing list. "Spread" at www.spread.org and ActiveMQ (via the simple text-over-tcp-based STOMP protocol). Reports are that both can (under the right conditions) execute many thousands of group messages per second.

Failing that, Glyph has hinted at another approach. You could elect a small number (~1%) of your nodes as "proxies" so that as well as being clients, they act as intermediaries for messages. This is a simple form of overlay network, which you also stated you didn't want to use - lord knows why. People use these techniques for a reason - they work.

You *want* (have decided you want) a reliable multicast protocol over which you'll layer a simple RPC protocol. RMT (reliable multicast transport) is as yet an unsolved problem. It is VERY VERY hard. None exist for Twisted, to the best of my knowledge. I would be willing to bet money that, for "thousands" of nodes, the overhead of implementing such a protocol (in Python, one presumes) would exceed the overhead of just using TCP. If you had said "hundreds of thousands" of nodes, well, that would be different.

If you want to knock an RMT up based on the assumption you won't drop packets, then be my guest, but I would suggest that if you *really* believe multicast is that reliable, then your experience of IP multicast networks has been a lot more rosy than mine, and I run a very large one.

"reliable multicast" into Google would be a good start - there are some good RFCs produced by the rmt IETF working group.
If I use TCP and stick to the serial, synchronized semantics of RPC, doing one call at a time, I have only a few ways to solve the problem. Do one call at a time, repeat N times, and that could take quite a while. I could do M spawnProcesses and have each do N/M RPC calls. Or I could use M threads and do it that way. Granted I have M sockets open at a time, it is possible for this to take quite a while to execute. Performance would be terrible (and yes I want an approach that has good to very good performance. After all who would want poor to terrible performance?)
Knuth and his comments on early optimisation apply here. Have you tried it? You might be surprised. I have some Twisted code that does SNMP to over a thousand devices. This is, obviously, unicast UDP. The throughput is very high. A simple ACK-based sequence-numbered UDP unicast will very likely scale to thousands of nodes.
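(To make the "simple ACK-based sequence-numbered UDP" suggestion concrete, here is a minimal Twisted sketch - not Phil's SNMP code. The wire format `b"seq:payload"`, the class name, and the retry policy are invented for illustration.)

    from twisted.internet import reactor, defer
    from twisted.internet.protocol import DatagramProtocol

    class AckedPoller(DatagramProtocol):
        """Send a sequence-numbered request per target; retry until ACKed."""
        def __init__(self, retries=3, timeout=2.0):
            self.retries = retries
            self.timeout = timeout
            self.pending = {}   # seq -> [addr, deferred, retries_left, delayed_call]
            self.seq = 0

        def poll(self, addr, payload):
            """Fire one request (payload is bytes); returns a Deferred for the reply."""
            self.seq += 1
            d = defer.Deferred()
            self.pending[self.seq] = [addr, d, self.retries, None]
            self._send(self.seq, payload)
            return d

        def _send(self, seq, payload):
            addr = self.pending[seq][0]
            self.transport.write(b"%d:%s" % (seq, payload), addr)
            self.pending[seq][3] = reactor.callLater(
                self.timeout, self._timedOut, seq, payload)

        def _timedOut(self, seq, payload):
            entry = self.pending.get(seq)
            if entry is None:
                return
            if entry[2] == 0:
                del self.pending[seq]
                entry[1].errback(Exception("no ACK from %r" % (entry[0],)))
            else:
                entry[2] -= 1
                self._send(seq, payload)    # retransmit

        def datagramReceived(self, data, addr):
            seq, _, result = data.partition(b":")   # replies echo the sequence number
            entry = self.pending.pop(int(seq), None)
            if entry is not None:
                if entry[3].active():
                    entry[3].cancel()
                entry[1].callback(result)

    # Usage sketch: reactor.listenUDP(0, AckedPoller()), then call poll() per target.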
So I divided the problem down to two parts. One, can I reduce the amount of traffic on the invoking side of the RPC request? Second, is how to deal with the response. Obviously I have to deal with the issue of failure, since RPC semantics require EXACTLY-ONCE.
How many calls per second are you doing, and approximately what volume of data will each call exchange? You seem inflexible about aspects of the design. If it were me, I'd abandon RPC semantics. Smarter people than anyone here have argued convincingly against making a remote procedure call look anything like a local one, and once you abandon *that*, RPCs look like message exchanges.
That gets me to the multicast or broadcast scheme. In one call I could get the N processors to start working. Now I just have to solve the other half of the problem: how to get the answers returned without swamping the network or how to detect when I didn't get an answer from a processor at all.
That leads me to the observation that on an uncongested ethernet I almost always have a successful transmission. This means I have to deal
Successful transmission is really the easy bit for multicast. There is IGMP snooping, IGMP querier misbehaviour, loss of forwarding on an upstream IGP flap, flooding issues due to global MSDP issues, and so forth.
with that issue and a few others. Why do I care? Because I believe I can accomplish what I need - get great performance most of the time, and only in a few instances have to deal with doing the operation over again.
This is a tough problem to solve. I am not sure of the outcome but I am sure that I need to start somewhere. What I know is that it is partly transport and partly marshalling. The semantics of the call have to stay fixed: EXACTLY-ONCE.
If you MUST have EXACTLY-ONCE group communication semantics, you should use a message bus.

Phil Mayers wrote:
Chaz. wrote:
Now let me address the issue of TCP. It is a pretty heavy protocol to use. It takes a lot of resources on the sender and target and can take some time to establish a connection. Opening a 1000 or more sockets consumes a lot of resources in the underlying OS and in the Twisted client!
People keep trying to help you, and you keep repeating yourself. From what I can gather:
You *need* a relatively lightweight group communication method. My advice would be to investigate a message bus - see recent posts on this mailing list. "Spread" at www.spread.org and ActiveMQ (via the simple text-over-tcp-based STOMP protocol). Reports are that both can (under the right conditions) execute many thousands of group messages per second.
I started out using Spread some time ago (more than 2 years ago). The implementation was limited to a hundred or so nodes (that is in the notes on the spread implementation). Secondly it isn't quite so lightweight as you think (I've measured the performance). It is a very nice system but when it gets to 1000s of machines very little work has been done on solving many of the problems. My research on it goes back almost a decade starting out with Horus.
Failing that, Glyph has hinted at another approach. You could elect a small number (~1%) of your nodes as "proxies" so that as well as being clients, they act as intermediaries for messages. This is a simple form of overlay network, which you also stated you didn't want to use - lord knows why. People use these techniques for a reason - they work.
I know about overlay networks, gossip networks, etc. I have used both and would prefer something simpler. That is the reason for my pushing on this group - to see what ideas people might have. I appreciate Glyph's comments and perspectives - very refreshing - in contrast to the many I have gotten.
You *want* (have decided you want) a reliable multicast protocol over which you'll layer a simple RPC protocol. RMT (reliable multicast transport) is as yet an unsolved problem. It is VERY VERY hard. None exist for Twisted, to the best of my knowledge. I would be willing to bet money that, for "thousands" of nodes, the overhead of implementing such a protocol (in Python, one presumes) would exceed the overhead of just using TCP. If you had said "hundreds of thousands" of nodes, well, that would be different.
If you want to knock an RMT up based on the assumption you won't drop packets, then be my guest, but I would suggest that if you *really* believe multicast is that reliable, then your experience of IP multicast networks has been a lot more rosy than mine, and I run a very large one.
"reliable multicast" into google would be a good start - there are some good RFCs produced the the rmt IETF working group.
Actually I am part of the IRTF group on P2P, E2E and SAM. I know the approaches that are being tossed about. I have tried to implement some of them. I just am not of the opinion that smart people can't find solutions to tough problems. Is multicast or broadcast the right way? I don't know, but I do know that without trying we will never know. Having been part of the IETF community for a lot of years (I was part of the group that worked on SNMP v1 and the WinSock standard), I know that when the "pedal meets the metal" sometimes you discover interesting things.
If I use TCP and stick to the serial, synchronized semantics of RPC, doing one call at a time, I have only a few ways to solve the problem. Do one call at a time, repeat N times, and that could take quite a while. I could do M spawnProcesses and have each do N/M RPC calls. Or I could use M threads and do it that way. Granted I have M sockets open at a time, it is possible for this to take quite a while to execute. Performance would be terrible (and yes I want an approach that has good to very good performance. After all who would want poor to terrible performance?)
Knuth and his comments on early optimisation apply here. Have you tried it? You might be surprised.
I am sorry to say I don't know the paper or research you are referring to. Can you point me to some references?
I have some Twisted code that does SNMP to over a thousand devices. This is, obviously, unicast UDP. The throughput is very high. A simple ACK-based sequence-numbered UDP unicast will very likely scale to thousands of nodes.
Thanks for the information. This is what makes me think that I want something based on UDP and not TCP! And if I can do RMT (or some variant of it) I might be able to get better performance. But, as I said it is the nice thing about not having someone telling me I need to get a product out the door tomorrow! I have time to experiment and learn.
So I divided the problem down to two parts. One, can I reduce the amount of traffic on the invoking side of the RPC request? Second, is how to deal with the response. Obviously I have to deal with the issue of failure, since RPC semantics require EXACTLY-ONCE.
How many calls per second are you doing, and approximately what volume of data will each call exchange?
This is information I can't provide since the system I have been designing has no equivalent in the marketplace today (either commercial or open source). All I know is that the first version of the system I built - using C/C++ and a traditional architecture (a few dozens of machines) - was able to handle 200 transactions/minute (using SOAP). While there were some "short messages" (less than a normal MTU), I had quite a few that topped out at 50K bytes and some up to 100Mbytes. Doing some research I have been told to expect a great many short ones and many very long ones; sort of an inverted bell curve. But there are very few real statistics. As I said I have to put a stake in the ground and build something, so I am guessing where the problems might rest and trying to find some solutions for them. Hence my query.
You seem inflexible about aspects of the design. If it were me, I'd abandon RPC semantics. Smarter people than anyone here have argued convincingly against making a remote procedure call look anything like a local one, and once you abandon *that*, RPCs look like message exchanges.
I agree. I am not sure where the answer lies. I like Twisted because it affords a nice way to experiment with different mechanisms both at the transport and the semantic layer. I am looking for ideas! As I said I have the time and inclination to experiment. What I need are things that aren't obvious (because I haven't heard of them or thought of them).
That gets me to the multicast or broadcast scheme. In one call I could get the N processors to start working. Now I just have to solve the other half of the problem: how to get the answers returned without swamping the network or how to detect when I didn't get an answer from a processor at all.
That leads me to the observation that on an uncongested ethernet I almost always have a successful transmission. This means I have to deal
Successful transmission is really the easy bit for multicast. There is IGMP snooping, IGMP querier misbehaviour, loss of forwarding on an upstream IGP flap, flooding issues due to global MSDP issues, and so forth.
I agree about the successful transmission. You've lost me on the IGMP part. Can you elaborate as to your thoughts?
with that issue and a few others. Why do I care? Because I believe I can accomplish what I need - get great performance most of the time, and only in a few instances have to deal with doing the operation over again.
This is a tough problem to solve. I am not sure of the outcome but I am sure that I need to start somewhere. What I know is that it is partly transport and partly marshalling. The semantics of the call have to stay fixed: EXACTLY-ONCE.
If you MUST have EXACTLY-ONCE group communication semantics, you should use a message bus.
I do know I need EXACTLY-ONCE semantics, but how and where I implement them is the unknown. When you use TCP you assume the network provides the bulk of the solution. I have been thinking that if I use a less reliable network - one with low overhead - I can provide the server part to do the EXACTLY-ONCE piece. As to why I need EXACTLY-ONCE: well, if I have to store something, I know I absolutely need to store it. I can't be in the position that I don't know it has been stored - it must be there.

Thanks for the great remarks....I look forward to reading more.

Chaz
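(A minimal sketch of the standard way to get exactly-once *effects* out of an at-least-once transport, which seems to be what is being described here: the sender retries until acknowledged, and the receiver remembers request IDs it has already applied, so duplicates are re-acknowledged but never re-executed. All names are hypothetical.)

    class ExactlyOnceApplier:
        """Deduplicate retried requests by ID so each write is applied once."""
        def __init__(self, apply_write):
            self.apply_write = apply_write   # e.g. a function that stores the payload
            self.applied = {}                # request_id -> cached acknowledgement

        def handle(self, request_id, payload):
            if request_id in self.applied:
                return self.applied[request_id]   # duplicate: ack again, don't re-apply
            ack = self.apply_write(payload)
            self.applied[request_id] = ack        # in practice this table must be durable
            return ack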

On Sat, 2006-08-26 at 09:08 -0400, Chaz. wrote:
Phil Mayers wrote:
Chaz. wrote:
Now let me address the issue of TCP. It is a pretty heavy protocol to use. It takes a lot of resources on the sender and target and can take some time to establish a connection. Opening a 1000 or more sockets consumes a lot of resources in the underlying OS and in the Twisted client!
Knuth and his comments on early optimisation apply here. Have you tried it? You might be surprised.
I am sorry to say I don't know the paper or research you are referring to. Can you point me to some references?
The full version of the quote is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (http://www.cookcomputing.com/blog/archives/000084.html)

http://en.wikipedia.org/wiki/Optimization_(computer_science)

Very interesting thread.

Regards,
Pablo

Chaz. wrote:
I started out using Spread some time ago (more than 2 years ago). The implementation was limited to a hundred or so nodes (that is in the notes on the spread implementation). Secondly it isn't quite so lightweight as you think (I've measured the performance).
It is a very nice system but when it gets to 1000s of machines very little work has been done on solving many of the problems. My research on it goes back almost a decade starting out with Horus.
I must admit to not having attempted to scale it that far, but I was under the impression that only the more expensive delivery modes were that costly. But by the sounds of it, you don't need me to tell you that.
Actually I am part of the IRTF group on P2P, E2E and SAM. I know the approaches that are being tossed about. I have tried to implement some of them. I just am not of the opinion that smart people can't find solutions to tough problems.
Ok, in which case my apologies. My reading of your posts had led me to believe, incorrectly, you may not be familiar with the various issues. In that case, you can (should) disregard most of it.
Is multicast or broadcast the right way? I don't know, but I do know that without trying we will never know. Having been part of the IETF
It's clearly right for some things - I'm just not sure how much bi-directional distribution would be helped by it, since you've got at some point to get the replies back.
community for a lot of years (I was part of the group that worked on SNMP v1 and the WinSock standard), I know that when the "pedal meets the metal" sometimes you discover interesting things.
I didn't realise winsock went near the IETF. You learn something new every day.
Knuth and his comments on early optimisation apply here. Have you tried it? You might be surprised.
I am sorry to say I don't know the paper or research you are referring to. Can you point me to some references?
Sorry, it's a phrase from Donald Knuth's (excellent) three-volume programming book, "The Art of Computer Programming". Highly recommended.
Thanks for the information. This is what makes me think that I want something based on UDP and not TCP! And if I can do RMT (or some variant of it) I might be able to get better performance. But, as I said it is the nice thing about not having someone telling me I need to get a product out the door tomorrow! I have time to experiment and learn.
When I wrote my reply I hadn't seen your comment on the app being distributed storage.
How many calls per second are you doing, and approximately what volume of data will each call exchange?
This is information I can't provide since the system I have been designing has no equivalent in the marketplace today (either commercial or open source). All I know is that the first version of the system I built - using C/C++ and a traditional architecture (a few dozens of machines) - was able to handle 200 transactions/minute (using SOAP). While there were some "short messages" (less than a normal MTU), I had quite a few that topped out at 50K bytes and some up to 100Mbytes.
Oh. In which case more or less everything I wrote is useless!
Successful transmission is really the easy bit for multicast. There is IGMP snooping, IGMP querier misbehaviour, loss of forwarding on an upstream IGP flap, flooding issues due to global MSDP issues, and so forth.
I agree about the successful transmission. You've lost me on the IGMP part. Can you elaborate as to your thoughts?
Well, my experience of large multicast IPv4 networks is that short interruptions in multicast connectivity are not uncommon. There are a number of reasons for this, which can be broadly broken down into 1st hop and subsequent hop issues.

Basically, in a routed-multicast environment, I've seen the subnet IGMP querier (normally the gateway) get pre-empted by badly configured or plain broken OS stacks (e.g. someone running Linux with the IGMPv3 early patches). I've also seen confusion for highly-available subnets (e.g. VRRPed networks) where the IGMP querier and the multicast designated forwarder are different. This can cause issues with the IGMP snooping on the downstream layer2 switches when the DF is no longer on the path which the layer2 snooping builds.

You also get issues with upstream changes in the unicast routing topology affecting PIM.

Most of these are only issues with routed multicast. Subnet-local is a lot simpler, though you do still need an IGMP querier and switches with IGMP snooping.
I do know I need EXACTLY-ONCE semantics but how and where I implement them is the unknown. When you use TCP you assume the network provides the bulk of the solution. I have been thinking that if I use a less reliable network - one with low overhead - that I can provide the server part to do the EXACTLY-ONCE piece.
As to why I need EXACTLY-ONCE, well if I have to store something I know I absolutely need to store it. I can't be in the position that I don't know it has been stored - it must be there.
Thanks for the great remarks....I look forward to reading more.
This makes a lot more sense now I know it's storage related. You're right, this is a tricky and uncommon problem.

Let me see if I've got this right: You're building some kind of distributed storage service. Clients will access the storage by a "normal" protocol to one of the nodes. Reads from the store are relatively easy, but writes to the store will need to be distributed to all or a subset of the nodes. Obviously you'll have a mix of lots of small writes and some very large writes.

Hmm. Are you envisioning that you might have >1 storage set on the nodes, and using a different multicast group per storage set to build optimal distribution?

You might be able to perform some tricks depending on whether this service provides block- or filesystem-level semantics. If it's the latter, you could import some techniques from the distributed version control arena - broadly speaking, node+version number each file and "broadcast" (in the application sense) just the file + newnode+newversion to the other store nodes, and have them lock the local copy and initiate a pull from the updated node. For block-level storage, that's going to be a lot harder.

For the multicast, something like NORM, which as you probably know is basically forward-error-corrected transmit channel with receiver-triggered re-transmits, would probably work. An implementation would likely be non-trivial, but a fascinating project.
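(The node+version "announce then pull" idea above might look roughly like the following sketch. `multicast_send`, `fetch_from`, `local_versions` and `store` are assumed helpers, so treat this as the shape of the scheme rather than a design.)

    def announce_update(path, version, holder_addr, multicast_send):
        """The node that accepted a write multicasts only the metadata."""
        multicast_send({"path": path, "version": version, "holder": holder_addr})

    def on_announcement(msg, local_versions, fetch_from, store):
        """Other nodes pull the new data over ordinary unicast if their copy is older."""
        if local_versions.get(msg["path"], -1) < msg["version"]:
            data = fetch_from(msg["holder"], msg["path"])   # unicast pull, e.g. TCP
            store(msg["path"], msg["version"], data)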

Phil Mayers wrote:
Chaz. wrote:
I started out using Spread some time ago (more than 2 years ago). The implementation was limited to a hundred or so nodes (that is in the notes on the spread implementation). Secondly it isn't quite so lightweight as you think (I've measured the performance).
It is a very nice system but when it gets to 1000s of machines very little work has been done on solving many of the problems. My research on it goes back almost a decade starting out with Horus.
I must admit to not having attempted to scale it that far, but I was under the impression that only the more expensive delivery modes were that costly. But by the sounds of it, you don't need me to tell you that.
Originally I had started out thinking about work surrounding Horus and researched a lot of the group communication stuff. When I got to Spread I tried it thinking it would solve all my problems. I actually built a system using it only to be sadly disappointed. First I hit the 100+ node limit. Then I got to the static configuration, which I spent time trying to overcome. Finally when I did some measurements I decided that 1000s of machines would require a "hub-and-spoke" like architecture that Glyph suggested. I decided it was much too complicated and backed away.

Since that time I let the pendulum swing to the other extreme - no predefined architecture (or aggregation of machines). I want to examine what happens when I have 1000s of machines without a topology; can I solve the problems. As I said I solved the decentralized membership list issue. Now I am on to the harder problem: can I get RPC-like semantics with reasonable performance over the 1000s of machines? I don't know.
Actually I am part of the IRTF group on P2P, E2E and SAM. I know the approaches that are being tossed about. I have tried to implement some of them. I just am not of the opinion that smart people can't find solutions to tough problems.
Ok, in which case my apologies. My reading of your posts had led me to believe, incorrectly, you may not be familiar with the various issues. In that case, you can (should) disregard most of it.
I think the problem is on my part. I asked what I thought was an obvious question without laying the groundwork as to what I knew or how.
Is multicast or broadcast the right way? I don't know, but I do know that without trying we will never know. Having been part of the IETF
It's clearly right for some things - I'm just not sure how much bi-directional distribution would be helped by it, since you've got at some point to get the replies back.
I think I feel comfortable with using multicast (or broadcast) for the invoking RPC call. What I don't have a clear feeling for is how to correctly handle the response - I know I can't send them all within some small delta without congesting the network. So I am looking at all sorts of techniques (like holding off the responses, randomly...but I don't know how that will impact retries, etc).
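(The "hold off the responses randomly" idea is cheap to express in Twisted: each responder schedules its reply after a random delay so that thousands of answers don't hit the wire in the same instant. The delay bound would have to be tuned to the group size; this is just a sketch.)

    import random
    from twisted.internet import reactor

    def schedule_reply(send_reply, max_delay=0.5):
        """Scatter replies over [0, max_delay) seconds to avoid a response storm."""
        reactor.callLater(random.uniform(0, max_delay), send_reply)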
community for a lot of years (I was part of the group that worked on SNMP v1 and the WinSock standard), I know that when the "pedal meets the metal" sometimes you discover interesting things.
I didn't realise winsock went near the IETF. You learn something new every day.
Me too!
Knuth and his comments on early optimisation apply here. Have you tried it? You might be surprised.
I am sorry to say I don't know the paper or research you are referring to. Can you point me to some references?
Sorry, it's a phrase from Donald Knuth's (excellent) three-volume programming book, "The Art of Computer Programming". Highly recommended.
Ah, ok. Having read them so many years ago I forgot most of it. lol..
Thanks for the information. This is what makes me think that I want something based on UDP and not TCP! And if I can do RMT (or some variant of it) I might be able to get better performance. But, as I said it is the nice thing about not having someone telling me I need to get a product out the door tomorrow! I have time to experiment and learn.
When I wrote my reply I hadn't seen your comment on the app being distributed storage.
How many calls per second are you doing, and approximately what volume of data will each call exchange?
This is information I can't provide since the system I have been designing has no equivalent in the marketplace today (either commercial or open source). All I know is that the first version of the system I built - using C/C++ and a traditional architecture (a few dozens of machines) - was able to handle 200 transactions/minute (using SOAP). While there were some "short messages" (less than a normal MTU), I had quite a few that topped out at 50K bytes and some up to 100Mbytes.
Oh. In which case more or less everything I wrote is useless!
Well I don't think so. Based on your multicast comment I wonder about broadcast...have you ever seen the same thing happen? When you say "short interruptions" are we talking more than seconds? Can you elaborate a bit?
Successful transmission is really the easy bit for multicast. There is IGMP snooping, IGMP querier misbehaviour, loss of forwarding on an upstream IGP flap, flooding issues due to global MSDP issues, and so forth.
I agree about the successful transmission. You've lost me on the IGMP part. Can you elaborate as to your thoughts?
Well, my experience of large multicast IPv4 networks is that short interruptions in multicast connectivity are not uncommon. There are a number of reasons for this, which can be broadly broken down into 1st hop and subsequent hop issues.
Basically, in a routed-multicast environment, I've seen the subnet IGMP querier (normally the gateway) get pre-empted by badly configured or plain broken OS stacks (e.g. someone running Linux with the IGMPv3 early patches). I've also seen confusion for highly-available subnets (e.g. VRRPed networks) where the IGMP querier and the multicast designated forwarder are different. This can cause issues with the IGMP snooping on the downstream layer2 switches when the DF is no longer on the path which the layer2 snooping builds.
You also get issues with upstream changes in the unicast routing topology affecting PIM.
Most of these are only issues with routed multicast. Subnet-local is a lot simpler, though you do still need an IGMP querier and switches with IGMP snooping.
I do know I need EXACTLY-ONCE semantics but how and where I implement them is the unknown. When you use TCP you assume the network provides the bulk of the solution. I have been thinking that if I use a less reliable network - one with low overhead - that I can provide the server part to do the EXACTLY-ONCE piece.
As to why I need EXACTLY-ONCE, well if I have to store something I know I absolutely need to store it. I can't be in the position that I don't know it has been stored - it must be there.
Thanks for the great remarks....I look forward to reading more.
This makes a lot more sense now I know it's storage related.
You're right, this is a tricky and uncommon problem.
Let me see if I've got this right:
You're building some kind of distributed storage service. Clients will access the storage by a "normal" protocol to one of the nodes. Reads from the store are relatively easy, but writes to the store will need to be distributed to all or a subset of the nodes. Obviously you'll have a mix of lots of small writes and some very large writes.
Hmm.
Are you envisioning that you might have >1 storage set on the nodes, and using a different multicast group per storage set to build optimal distribution?
You might be able to perform some tricks depending on whether this service provides block- or filesystem-level semantics. If it's the latter, you could import some techniques from the distributed version control arena - broadly speaking, node+version number each file and "broadcast" (in the application sense) just the file + newnode+newversion to the other store nodes, and have them lock the local copy and initiate a pull from the updated node.
For block-level storage, that's going to be a lot harder.
Definitely! Right now I am dealing at the filesystem level. Doing block level would be incredibly difficult. I am trying to solve the simpler problem first! lol.
For the multicast, something like NORM, which as you probably know is basically forward-error-corrected transmit channel with receiver-triggered re-transmits, would probably work. An implementation would likely be non-trivial, but a fascinating project.
Right now I am trying to find a solution to an interesting problem: how to find a file without knowing exactly where it exists in the network. You have to do this to make the system scale nicely.

Basically each node holds information about the files (aka objects) it stores. I do this so that I don't have a central database anywhere (this allows the system to scale differently. With a central database I would have that set of servers scale differently than the storage nodes).

Now I can build a set of machines that are the distributed database machines - each storing something - and query them for where the file lives; this would narrow the machines I have to directly talk to, but it feels wrong. This is sort of a variation of the hub-and-spoke that Glyph talked about. But having said that I am trying to determine if I can get away from that and just go to a very unstructured environment (without intermediate database nodes).

As I said I have time to experiment before I put the code in the open source community ...

Peace,
Chaz

On Sat, 26 Aug 2006 10:14:48 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Right now I am trying to find a solution to an interesting problem: how to find a file without knowing exactly where it exists in the network. You have to do this to make the system scale nicely.
Basically each node holds information about the files (aka objects) it stores. I do this so that I don't have a central database anywhere (this allows the system to scale differently. With a central database I would have that set of servers scale differently than the storage nodes).
Now I can build a set of machines that are the distributed database machines - each storing something - and query them for where the file lives; this would narrow the machines I have to directly talk to, but it feels wrong. This is sort of a variation of the hub-and-spoke that Glyph talked about. But having said that I am trying to determine if I can get away from that and just go to a very unstructured environment (without intermediate database nodes).
This sounds an awful lot like a distributed hashtable. It does implicitly use an overlay network, but not a hub-and-spoke overlay network. I'm not intimately familiar with the algorithms involved, so rather than try to describe them, I'll just refer you to the relatively nice wikipedia page on the topic: http://en.wikipedia.org/wiki/Distributed_hash_table There is also a project in Python (not Twisted though) which may serve as an example: http://thecircle.org.au/ Are these ideas useful? Have you looked at them before?

glyph@divmod.com wrote:
On Sat, 26 Aug 2006 10:14:48 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
Right now I am trying to find a solution to an interesting problem: how to find a file without knowing exactly where it exists in the network. You have to do this to make the system scale nicely.
Basically each node holds information about the files (aka objects) it stores. I do this so that I don't have a central database anywhere (this allows the system to scale differently. With a central database I would have that set of servers scale differently than the storage nodes).
Now I can build a set of machines that are the distributed database machines - each storing something - and query them for where the file lives; this would narrow the machines I have to directly talk to, but it feels wrong. This is sort of a variation of the hub-and-spoke that Glyph talked about. But having said that I am trying to determine if I can get away from that and just go to a very unstructured environment (without intermediate database nodes).
This sounds an awful lot like a distributed hashtable. It does implicitly use an overlay network, but not a hub-and-spoke overlay network.
I'm not intimately familiar with the algorithms involved, so rather than try to describe them, I'll just refer you to the relatively nice wikipedia page on the topic:
http://en.wikipedia.org/wiki/Distributed_hash_table
There is also a project in Python (not Twisted though) which may serve as an example:
Are these ideas useful? Have you looked at them before?
As I understand DHT, the concept is to create a hash identifier, partition it into "chunks", and use the chunks to locate the file. It is an interesting idea and certainly one approach. I am keeping it in my back pocket.

There are many reasons I don't like this approach. First, with a poorly segmented hash, you can have a few levels of indirection before reaching the file. You can see this in a lot of p2p file sharing systems. I would like to see if I can overcome this performance penalty (another problem is DHT works well in a very sparse environment, so the hash keys have to be pretty big. That means more intermediate nodes).

The second issue is one unique to data storage systems: I need to have multiple copies of the file around. So I had thought if I do a DHT I will just keep copies all along the path. That should solve the problem of access quickly and copies.

The third issue - and this one I had more difficulty grasping - is that once an intermediate node disappears, its contents have to be passed on to someone else. Also the link from the prior node to this one (the one going away) has to be adjusted. What is the problem? It is quite possible that the node would have millions of files on it, hence copying it is impossible. That means I have to keep exact copies at multiple sites, at the same time (definitely smaller than the entire space of all the peers). But the real problem is that in a network of 1000s of machines it is quite possible that the two I am using to store indices on can disappear at the same time (granted small, but still a problem).

So I opted to look at another approach, the one that I started talking about - using broadcast or multicast with some sort of RPC-like mechanism and lightweight protocol applied over a lot of machines. This approach hasn't been well researched, almost being excluded out of hand. I decided it was at least worth investigating. It solves some problems like scalability and easy management. The downside is that I have to worry about building a lightweight protocol and handling RPC-like AT-LEAST-ONCE semantics instead of EXACTLY-ONCE.

Glyph, thanks for the references. I will definitely look up 'thecircle' stuff. That one I didn't know about!

Peace,
Chaz
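(For comparison with the DHT route being weighed up here, a tiny consistent-hashing sketch: every node can compute, with no central index and no intermediate hops, which R nodes should hold a given file. This is illustrative only and is not Chaz's design.)

    import hashlib
    from bisect import bisect

    class ConsistentHashRing:
        """Map file names to the R nodes that should hold their replicas."""
        def __init__(self, nodes, replicas=3):
            self.replicas = replicas
            self.ring = sorted((self._h(n), n) for n in nodes)

        def _h(self, key):
            return int(hashlib.sha1(key.encode()).hexdigest(), 16)

        def locate(self, filename):
            """Return the replica nodes for filename, walking clockwise on the ring."""
            keys = [k for k, _ in self.ring]
            start = bisect(keys, self._h(filename)) % len(self.ring)
            count = min(self.replicas, len(self.ring))
            return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]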

In my code:

    class StorageService(internet.TCPServer):
        def __init__(self, tcpPort, configInfo):
            r = StoragePeer(configInfo)
            xmlrpc.addIntrospection(r)
            internet.TCPServer.__init__(self, tcpPort, server.Site(r))

I specifically used internet.TCPServer as a parent class so that I could do a .setServiceParent() on the object returned. If I rewrite the code as:

    class StorageService:
        def __init__(self, tcpPort, configInfo):
            r = StoragePeer(configInfo)
            xmlrpc.addIntrospection(r)
            self.r = internet.TCPServer(tcpPort, server.Site(r))

        def setServiceParent(self, arg):
            self.r.setServiceParent(arg)

I get an equivalent effect. So my question is why is subclassing internet.TCPServer not a good idea?

Chaz

On Thu, 24 Aug 2006 14:30:48 -0400, "Chaz." <eprparadocs@gmail.com> wrote:
So my question is why is subclassing internet.TCPServer not a good idea?
In the general case, the short version is: "Because we wrote that code, and we say so."

Usually I am not a big fan of the "argument from authority", but in this case, it has a special significance. When a developer on a library you're using says "this is the correct, supported method to use interface XYZ", that means that is the method they are going to be supporting in the future, and unsupported usages may break. It is in your best interest to stick to the supported usage if you ever intend to upgrade that library. In a future version of twisted.application.internet, for example, it may be deemed a good idea to make TCPServer and friends a set of functions rather than classes for some reason. Since you're supposed to be calling them and not subclassing them, that usage will continue to work, but subclassing won't.

Calling is generally better than subclassing anyway. When you subclass, a number of undesirable things happen: in any language with multiple inheritance this is a problem, but in Python especially, you inherit all sorts of things from your superclass other than simple functionality. For one thing, object semantics. Maybe you used to be a classic class, but subclassing turns you into a new-style class before you're ready to make that switch. Maybe your superclass has a bizarre metaclass which performs some mutation you didn't expect. Maybe it has a __new__ which does something weird. Then you inherit state. Your 'self.' namespace becomes polluted with additional variable names that may conflict with your own. These names may change in future releases, so even if they don't conflict now, they may in future releases.

While inheritance can be a useful tool, it is a lot more complex than composition, so you should generally avoid it unless all these ugly side-effects are actually desirable properties in your particular application. In the case of twisted.application.internet, they are not. That's not to say they *never* are: default implementation classes paired with interfaces can insulate your applications from certain varieties of change in libraries, and sometimes all those object-model features described as annoyances above are incredibly useful (most usually in persistence libraries like Axiom or ZODB).
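(To make the composition style concrete: the sketch below gets the same .setServiceParent() behaviour Chaz wanted without subclassing TCPServer, by hanging the TCPServer off a MultiService. StoragePeer and configInfo come from Chaz's earlier code; the function name is invented.)

    from twisted.application import internet, service
    from twisted.web import server, xmlrpc

    def makeStorageService(tcpPort, configInfo):
        """Compose, don't subclass: return a parent service owning the TCPServer."""
        top = service.MultiService()
        r = StoragePeer(configInfo)              # Chaz's XML-RPC resource
        xmlrpc.addIntrospection(r)
        internet.TCPServer(tcpPort, server.Site(r)).setServiceParent(top)
        return top

    # The caller can then do makeStorageService(...).setServiceParent(application).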
participants (8)
- Chaz.
- glyph@divmod.com
- Itamar Shtull-Trauring
- Jean-Paul Calderone
- L. Daniel Burr
- Pablo Marti
- Phil Mayers
- Samuel Reynolds