[Twisted-Python] Supporting a two-part client protocol.
![](https://secure.gravatar.com/avatar/bf006aac4f248c75a22c3446679235d6.jpg?s=120&d=mm&r=g)
Hi there, I'm planning to use Twisted to write a client for the following protocol. Each message is composed of two separate parts:

1. A header, which is a serialized C struct containing multiple fields, among them a `length` field.
2. A Protocol Buffer payload, whose length is specified by the `length` field in the header.

While the initial implementation is focused on TCP, I do hope to support this same protocol over UDP eventually. What's the best way for me to implement such a client with Twisted? Specifically, how should I implement support for sending/receiving messages in the above format to/from a server?

Thanks, Go
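To make the framing concrete, here is a minimal sketch of packing and unpacking such a header with Python's `struct` module. The field layout (`magic`, `msg_type`, `length`), the `MAGIC` value, and the network byte order are all assumptions for illustration; the real C struct may use native byte order and padding, in which case the format string would differ.

```python
import struct

# Hypothetical header layout (an assumption, not the real protocol):
#   uint16 magic, uint16 msg_type, uint32 length, network byte order.
HEADER = struct.Struct("!HHI")
MAGIC = 0xC0DE


def encode_message(msg_type, payload):
    """Prefix an already-serialized payload with the fixed-size header."""
    return HEADER.pack(MAGIC, msg_type, len(payload)) + payload


def decode_header(data):
    """Return (msg_type, length) read from the first HEADER.size bytes."""
    magic, msg_type, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic: %#x" % magic)
    return msg_type, length


# The payload would normally come from a protobuf message's
# SerializeToString(); raw bytes stand in for it here.
wire = encode_message(7, b"\x08\x01")
msg_type, length = decode_header(wire)
payload = wire[HEADER.size:HEADER.size + length]
```

Because the header has a fixed size, the receiver always knows how many bytes to wait for before it can read `length`, and then how many more make up the payload.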
![](https://secure.gravatar.com/avatar/e589db6c27c54b03de756cae2843dba5.jpg?s=120&d=mm&r=g)
On Mon, Feb 3, 2020 at 8:06 PM Go Luhng <goluhng@gmail.com> wrote:
Assuming the header has a fixed length, https://twistedmatrix.com/documents/current/api/twisted.protocols.basic.IntN... and its more-concrete subclasses are a decent source of inspiration. OTOH, that's for stream protocols, so if you want to eventually handle UDP, it's probably nicer to do the full sans-io thing (https://sans-io.readthedocs.io/) and wire it up with a more-basic Twisted protocol. Well, that's probably the better approach in any case.
![](https://secure.gravatar.com/avatar/bf006aac4f248c75a22c3446679235d6.jpg?s=120&d=mm&r=g)
Colin Dunklau wrote:
Assuming the header has a fixed length,
It does. The header is just a serialized C struct, so it's fully-specified for length and offset of each field.
Could you elaborate on this? I'm new to Twisted, and also unfamiliar with sans-io. Specifically, I'm wondering what type of "more-basic" Twisted protocol you mean.
![](https://secure.gravatar.com/avatar/e589db6c27c54b03de756cae2843dba5.jpg?s=120&d=mm&r=g)
On Tue, Feb 4, 2020 at 1:18 AM Go Luhng <goluhng@gmail.com> wrote:
The sans-io pattern is described well at that site, including a link to Cory Benfield's great talk (https://www.youtube.com/watch?v=7cC3_jGwl_U). The idea is to keep your protocol logic strictly separate from anything that does IO (like a Twisted Protocol's `dataReceived` and its transport's `write` method, or an asyncio thing, or blocking socket operations, etc), to make it easier to test and reuse. By "more-basic" I mean twisted.internet.protocol.Protocol and twisted.internet.protocol.DatagramProtocol. If you don't go full sans-io, I'd still strongly recommend splitting up your protocol implementation into distinct pieces. Twisted protocols can become hard to reason about when they become implicit state machines... avoid it by making a separate, explicit state machine and use that in the Protocol, instead of dumping the bits on the protocol instance itself. This way you at least still get the testability.
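A rough sketch of that separation, using only the standard library so it runs without Twisted installed. `MessageDecoder`, `StreamAdapter`, `DatagramAdapter`, and the one-field header are hypothetical names and assumptions. In real code the two adapters would subclass `twisted.internet.protocol.Protocol` and `twisted.internet.protocol.DatagramProtocol`, whose `dataReceived(data)` and `datagramReceived(datagram, addr)` callbacks have the shapes shown here.

```python
import struct

HEADER = struct.Struct("!I")  # hypothetical header: only a uint32 length


class MessageDecoder:
    """Pure protocol logic: bytes in, complete payloads out. No I/O."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add received bytes; return a list of complete payloads."""
        self._buffer += data
        payloads = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the payload has not fully arrived yet
            payloads.append(self._buffer[HEADER.size:end])
            self._buffer = self._buffer[end:]
        return payloads


class StreamAdapter:
    """Shape of a Protocol subclass for TCP (stand-in, not Twisted itself)."""

    def __init__(self, on_message):
        self._decoder = MessageDecoder()
        self._on_message = on_message

    def dataReceived(self, data):
        # TCP delivers arbitrary chunks, so one decoder lives per connection.
        for payload in self._decoder.feed(data):
            self._on_message(payload)


class DatagramAdapter:
    """Shape of a DatagramProtocol subclass for UDP (stand-in)."""

    def __init__(self, on_message):
        self._on_message = on_message

    def datagramReceived(self, datagram, addr):
        # Assumes each datagram carries whole messages, so a fresh decoder
        # per datagram avoids mixing bytes from different peers.
        for payload in MessageDecoder().feed(datagram):
            self._on_message(payload)
```

The decoder never touches a transport, so the same class serves both adapters and can be exercised in tests by calling `feed` directly.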
![](https://secure.gravatar.com/avatar/cf223b7cf77583c0a2665bad01f84f11.jpg?s=120&d=mm&r=g)
On Tuesday, 4 February 2020 07:39:11 GMT Colin Dunklau wrote:
The sans-io approach is worth considering. The advice to use explicit state machines I fully endorse. I'm maintaining some code that uses an implicit state machine and it's a pain to reason about and to keep free of bugs. Barry
![](https://secure.gravatar.com/avatar/bf006aac4f248c75a22c3446679235d6.jpg?s=120&d=mm&r=g)
Thanks Colin and Barry for the reply. I read the sans-io docs and it is an attractive approach. I believe I have a plan going forward, but I'm not sure what you mean by explicit vs implicit state machine, if you care to elaborate.
![](https://secure.gravatar.com/avatar/e589db6c27c54b03de756cae2843dba5.jpg?s=120&d=mm&r=g)
On Tue, Feb 4, 2020 at 6:12 PM Go Luhng <goluhng@gmail.com> wrote:
IntNStringReceiver has a state machine, but it's embedded in the protocol implementation, so it's implicit: https://github.com/twisted/twisted/blob/twisted-19.10.0/src/twisted/protocol... It's not that easy to tell what's going on there, at first glance. The dataReceived method has _most_ of the state machine implementation, but it fiddles with instance attributes, and that length check in sendString could be considered a parser detail, rather than part of the protocol itself. The situation with LineOnlyReceiver is similar: https://github.com/twisted/twisted/blob/twisted-19.10.0/src/twisted/protocol... Now that one is simple enough that it's reasonably clear what's going on... but it's a good candidate for a simple example (analysis first, code after). This is clearly more code, but the benefit from its clearer separation of concerns is a boon... especially given that this is a reeeeal simple example dealing with one of the simplest possible protocols. Your protocol will undoubtedly be much more complex, so the benefit should be a lot clearer. In the original, the parsing details are mixed in with the higher-level semantics of the protocol, especially with respect to the max line length handling. In the "composed" version (admittedly not the best name), the parser is explicit, and entirely divorced from the protocol. It's easier to understand, simpler (even trivial) to test in isolation, and winds up being reusable outside of a Twisted Protocol. Hey, this is starting to sound like that sans-io thingie! To map LineParser's semantics to sans-io terminology, readData is for getting "input", and iterLines (actually the generator iterator it makes) produces "events": a "line event", or a "line too darn long" event (via the exception). 
Link for easier viewing (https://gist.github.com/cdunklau/4f8c72222295680ca20e3d4401f385b1), reproduced here for list archive posterity:

    import collections

    from twisted.internet import protocol


    class LineParser(object):
        def __init__(self, delimiter, max_length):
            self.delimiter = delimiter
            self.max_length = max_length
            self._buffer = b''
            self._lines = collections.deque()

        def readData(self, data):
            lines = (self._buffer + data).split(self.delimiter)
            self._buffer = lines.pop()
            self._lines.extend(lines)

        def iterLines(self):
            while self._lines:
                line = self._lines.popleft()
                if len(line) > self.max_length:
                    raise LineLengthExceeded(line)
                yield line
            if len(self._buffer) > self.max_length:
                raise LineLengthExceeded(self._buffer)


    class LineLengthExceeded(Exception):
        def __init__(self, culprit):
            super().__init__(culprit)
            self.culprit = culprit


    class ComposedLineOnlyReceiver(protocol.Protocol):
        delimiter = b'\r\n'
        MAX_LENGTH = 16384
        _parser = None

        def dataReceived(self, data):
            """
            Translates bytes into lines, and calls lineReceived.
            """
            if self._parser is None:
                self._parser = LineParser(self.delimiter, self.MAX_LENGTH)
            self._parser.readData(data)
            try:
                for line in self._parser.iterLines():
                    if self.transport.disconnecting:
                        # this is necessary because the transport may be told to lose
                        # the connection by a line within a larger packet, and it is
                        # important to disregard all the lines in that packet following
                        # the one that told it to close.
                        return
                    self.lineReceived(line)
            except LineLengthExceeded as e:
                return self.lineLengthExceeded(e.culprit)

        def lineReceived(self, line):
            """
            Override this for when each line is received.

            @param line: The line which was received with the delimiter removed.
            @type line: C{bytes}
            """
            raise NotImplementedError

        def sendLine(self, line):
            return self.transport.writeSequence((line, self.delimiter))

        def lineLengthExceeded(self, line):
            return self.transport.loseConnection()
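As an illustration of the "trivial to test in isolation" point, the parser can be driven with plain bytes and no reactor or transport. The sketch below repeats the `LineParser` and `LineLengthExceeded` logic from above so that it runs standalone; `check_parser` is a hypothetical helper name.

```python
import collections


class LineLengthExceeded(Exception):
    def __init__(self, culprit):
        super().__init__(culprit)
        self.culprit = culprit


class LineParser:
    # Same logic as the LineParser above, repeated so this runs standalone.
    def __init__(self, delimiter, max_length):
        self.delimiter = delimiter
        self.max_length = max_length
        self._buffer = b""
        self._lines = collections.deque()

    def readData(self, data):
        lines = (self._buffer + data).split(self.delimiter)
        self._buffer = lines.pop()
        self._lines.extend(lines)

    def iterLines(self):
        while self._lines:
            line = self._lines.popleft()
            if len(line) > self.max_length:
                raise LineLengthExceeded(line)
            yield line
        if len(self._buffer) > self.max_length:
            raise LineLengthExceeded(self._buffer)


def check_parser():
    parser = LineParser(b"\r\n", 10)

    # A line split across two chunks is only produced once it is complete.
    parser.readData(b"one\r\ntw")
    assert list(parser.iterLines()) == [b"one"]
    parser.readData(b"o\r\n")
    assert list(parser.iterLines()) == [b"two"]

    # An unterminated run longer than max_length raises the "too long" event.
    parser.readData(b"x" * 11)
    try:
        list(parser.iterLines())
    except LineLengthExceeded as e:
        assert e.culprit == b"x" * 11
    else:
        raise AssertionError("expected LineLengthExceeded")


check_parser()
```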
![](https://secure.gravatar.com/avatar/e589db6c27c54b03de756cae2843dba5.jpg?s=120&d=mm&r=g)
On Tue, Feb 4, 2020 at 6:12 PM Go Luhng <goluhng@gmail.com> wrote:
I realize now that in my previous reply I conflated state machine with parser state. Sorry about that! Neither IntNStringReceiver nor LineOnlyReceiver has much in the way of state. LineReceiver does, but it's a simple flag (line or raw mode). conch.ssh's SSHTransportBase has more: https://github.com/twisted/twisted/blob/twisted-19.10.0/src/twisted/conch/ss... This is reasonably explicit. It's still mixed in with the protocol methods, but the states are at least explicitly declared. I wasn't able to find an example in Twisted of an implicit state machine. Maybe someone else has a concrete example somewhere?
![](https://secure.gravatar.com/avatar/cf223b7cf77583c0a2665bad01f84f11.jpg?s=120&d=mm&r=g)
On Wednesday, 5 February 2020 08:48:41 GMT Colin Dunklau wrote:
I wasn't able to find an example in Twisted of an implicit state machine. Maybe someone else has a concrete example somewhere?
There is an example of an explicit state machine in the Twisted code for HTTP chunked transfer encoding. It's in https://github.com/twisted/twisted/blob/trunk/src/twisted/web/http.py#L1779

The problem arises when code assumes that it can react directly to the events from a framework like Twisted. When events happen in an unexpected sequence you can end up with errors. For example, you can get a connectionLost event at any time. If you have a state machine, it is easy to know what the clean actions will be. But if there is no explicit state, the code may not have the information it needs to handle the connectionLost in an appropriate way.

The situation with states and events only gets more complex when there are Deferreds that run after a connectionLost event and assume a connection is still live. If such a Deferred can check an explicit state, it is far easier to make the code work appropriately.

Barry
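A small sketch of the idea of a late callback checking explicit state. `Session`, `ConnState`, and `on_lookup_result` are hypothetical names, and the `write` argument stands in for a transport's `write` method; none of this is taken from Twisted's own code.

```python
import enum


class ConnState(enum.Enum):
    CONNECTING = "connecting"
    CONNECTED = "connected"
    CLOSED = "closed"


class Session:
    """Hypothetical connection wrapper that tracks its state explicitly."""

    def __init__(self, write):
        self._write = write  # stand-in for transport.write
        self.state = ConnState.CONNECTING

    def connection_made(self):
        self.state = ConnState.CONNECTED

    def connection_lost(self):
        # May arrive in any state; the transition is always well defined.
        self.state = ConnState.CLOSED

    def on_lookup_result(self, result):
        """Callback that may fire long after it was scheduled."""
        if self.state is not ConnState.CONNECTED:
            return False  # connection is gone; drop the result
        self._write(result)
        return True
```

With the state held in one place, a callback attached to a Deferred only needs a single check before touching the transport, instead of inferring liveness from scattered attributes.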
![](https://secure.gravatar.com/avatar/bf006aac4f248c75a22c3446679235d6.jpg?s=120&d=mm&r=g)
Thanks for the detailed responses, Colin and Barry. I have a followup question about sans-io. From the document:
This sounds very nice; however, with certain protocols you can only decode serialized events based on context. For example, the parser needs to know if we're at the handshake stage, or the regular communication stage, or the shutdown stage, because different formats of messages are sent by the server at each stage. How does this elegantly fit into the scheme of sans-io?
![](https://secure.gravatar.com/avatar/cf223b7cf77583c0a2665bad01f84f11.jpg?s=120&d=mm&r=g)
On Thursday, 6 February 2020 16:02:40 GMT Go Luhng wrote:
That's not a protocol problem; it's an implementation problem, surely?
The sans-io approach (as I remember it) says don't put IO details into your protocol code; abstract it and have clean API boundaries. (This makes it easy to write unit tests, as you do not need a network stack. Just make the API calls: dataReceived, connectionLost, timeout, etc.)

It's the job of your protocol code to have a state machine and know where it is at any point in time. What you are calling a "stage" sounds like a "state" of the state machine.

The pattern in very crude outline is:

- Define all the events that the protocol must handle.
- Define all the states the protocol needs.
- When each event is received, do the state-specific action and change state.

You can see this in the chunked encoding code, which shows one way to implement the state machine.

Also note that it's usual for there to be more than one state machine in most protocols. Using HTTP as an example: it needs to handle the command, the headers, and then the body. Handling the command and headers means splitting the byte stream into lines. Once you have the headers you can figure out how to process the body.

1. State machine for overall HTTP status, headers, body
2. State machine to split bytes received into header lines
3. State machine for chunked body encoding
4. State machine for Content-Length body encoding

Barry
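The crude outline above might look like the following for the handshake / regular communication / shutdown stages mentioned in the question. This is a sketch only: the `HELLO` and `BYE` message bodies, the event names, and the `ClientMachine` class are invented for illustration and do not describe any real protocol.

```python
import enum


class State(enum.Enum):
    HANDSHAKE = enum.auto()
    RUNNING = enum.auto()
    SHUTDOWN = enum.auto()


class ClientMachine:
    """Sans-io sketch: message bodies in, (event, data) tuples out."""

    def __init__(self):
        self.state = State.HANDSHAKE

    def receive_message(self, body):
        # Dispatch on the current state: the same bytes can mean
        # different things at different stages.
        handler = {
            State.HANDSHAKE: self._in_handshake,
            State.RUNNING: self._in_running,
            State.SHUTDOWN: self._in_shutdown,
        }[self.state]
        return handler(body)

    def _in_handshake(self, body):
        if body != b"HELLO":
            raise ValueError("unexpected handshake message")
        self.state = State.RUNNING
        return ("handshake_done", None)

    def _in_running(self, body):
        if body == b"BYE":
            self.state = State.SHUTDOWN
            return ("closing", None)
        return ("data", body)

    def _in_shutdown(self, body):
        raise ValueError("message received after shutdown")
```

The machine does no I/O, so whatever reads from the socket only has to split the byte stream into message bodies and pass them to `receive_message`; the context needed for decoding lives in `self.state`.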
participants (3)
- Barry Scott
- Colin Dunklau
- Go Luhng