[Twisted-Python] Using StandardIO and pipes.

Hello!

I am very new to Twisted and I have some problems using stdio.StandardIO. Here is the code:

    # item.py file
    from twisted.internet import reactor, stdio
    from twisted.protocols import basic
    from storage import Item

    class EchoItemProtocol(basic.LineReceiver):
        delimiter = '\n'

        def lineReceived(self, item_id):
            # Item.get returns a deferred
            d = Item.get(item_id)
            def writeResponse(item):
                self.transport.write(str(item) + '\n')
            d.addCallback(writeResponse)

    def main():
        stdio.StandardIO(EchoItemProtocol())
        reactor.run()

    if __name__ == '__main__':
        main()

So, when I run the file it works:

    $ ./item.py
    test_item_id
    <storage.Item object at 0x2b15910>

I want to be able to redirect the output of some process into my script. But this doesn't work:

    $ echo test_item_id | ./item.py

It doesn't produce any output. But if I replace `self.transport.write(str(item) + '\n')` with just `print str(item)` it suddenly works:

    $ echo test_item_id | ./item.py
    <storage.Item object at 0x18ac650>

So my conclusions are:

1. When I write to stdin of my script from the keyboard, `self.transport.write()` works as expected.
2. When I use pipes or redirection to my script, `self.transport.write()` doesn't produce any output.
3. If I replace `self.transport.write()` with print statements, it works.

Why does this strange behavior occur?

-- with regards, Maxim

Maxim Lacrima wrote: […]
Buffering, perhaps? As a quick test, try adjusting your callback to call loseConnection on the transport after the write; that should flush the buffers. If buffering turns out to be your problem, and you really do expect your pipes to be at most line-buffered, you'll need to arrange for the buffering on the stdin and stdout file descriptors to be changed. I don't think Twisted has that built in, so you may need to do that before creating the StandardIO object using the regular Python APIs. -Andrew.

On Aug 9, 2012, at 7:39 PM, Andrew Bennetts <andrew@bemusement.org> wrote:
It's not this kind of buffering problem. I was actually able to reproduce it by replacing "Item.get()" with a deferLater. (In the future, please perform these types of replacements yourself before posting your examples; it's much easier to help out with complete, runnable programs.)

The issue is that twisted.internet.stdio considers your protocol to only know about one kind of connection loss: i.e. that your entire transport has gone away. So, when the input is closed, the output file descriptor is immediately closed as well. Unfortunately - and I think this is a bug in Twisted - the input connection being lost causes the output connection to be immediately and somewhat aggressively closed.

The difference between "echo" and typing interactively is that you send "test_item_id\n", but echo sends "test_item_id\n" *EOF*.

Luckily you can work around this, by explicitly asking Twisted for notifications about half of the connection being closed. You do this by implementing IHalfCloseableProtocol.

Although Twisted should be better about how it handles shutting down stdout, this nuance does reveal some logic that your code is missing: once your input is done, you don't have any concept of waiting for your output to be done before shutting everything down. So, if you add both the IHalfCloseableProtocol implements declaration to receive the relevant notifications, and some logic to keep track of when all the output has been written, you can get exactly the behavior that I assume you want.

I've attached an example that does all of these things with a simple timer, since this is something that Twisted probably needs to document a bit better.

Cheers, -glyph

On 8/9/12 10:18 PM, Glyph wrote:
Incidentally, I think this was the problem I ran into in Foolscap, trying to build a tool that lets you run individual shell commands remotely. I was able to stretch the three file descriptors (stdin, stdout, stderr) over the wire using a callRemote() for each invocation of dataReceived(), but had similar issues when closing one side of the connection.

The easiest way to trigger this was to run /usr/bin/sort, which necessarily waits until the input has been closed before it will produce any output. So you run "cat data.txt | flappclient run-command", flappclient reads from StandardIO, sends the data to the far end, notices the EOF on stdin when the data is finished, sends a "stdin is closed" message to the remote side, receives the return data from sort, then tries to write it to the local stdout (which sometimes failed).

I think I saw problems in the opposite situation too: if stdout were closed (because our caller didn't want to hear anything further from us), the stdin would no longer accept data. I don't remember how I triggered this situation.

My workaround is here, in case you find it useful: https://github.com/warner/foolscap/blob/master/foolscap/appserver/client.py#...

It worked for me at the time. (Since then I've seen some trouble when running git-receive-pack over this connection, which may or may not be related, so I won't claim it's foolproof.)

cheers, -Brian

On Aug 10, 2012, at 11:02 AM, Brian Warner <warner@lothar.com> wrote:
Are you saying that you had issues even with implementing IHalfCloseableProtocol and overriding readConnectionLost? Doing that now seems to work fine for me. Perhaps there was a different issue that has since been fixed? I can't find one, though... -glyph

Hi everyone!

Thanks for the help! I used the attached example and after implementing IHalfCloseableProtocol it works! Thanks a lot!

Still, Twisted seems hard to me, probably because I don't quite understand how to use and combine Twisted's interfaces. With your help I can now connect my program to another process using a pipe. But, for example, I want to be able to supply data to my program not only via stdin, but also as command line arguments. I.e. instead of

    $ echo foo | ./item.py

I want to do:

    $ ./item.py foo

My feeling here is that I can reuse the same protocol (EchoItemProtocol), but now instead of StandardIO I need my own factory (or transport, or whatever it is named) class that knows how to read command line args and talk the same protocol. Sure, I can read arguments using sys.argv[1:], but what do I do next to properly pass that data to the protocol? Does my class have to implement ITransport or IProducer etc. to talk EchoItemProtocol similar to StandardIO?

I hope my question is clear. Thanks in advance.

On 10 August 2012 08:18, Glyph <glyph@twistedmatrix.com> wrote:
-- with regards, Maxim

Hi Drew,

I was referring to the example attached by Glyph. His example helped me to properly handle stdin in my code. In addition to stdin I want to handle command line arguments, so I want to be able to do this:

    $ echo foo | ./check.py

and this:

    $ ./check.py foo

I think the `main` function should look something like this:

    def main():
        if sys.stdin.isatty():
            # we are connected to terminal
            args = sys.argv[1:]
            # ....
            # What should I implement to be able to speak EchoItemProtocol???
            # ....
        else:
            # we are connected to stdin
            stdio.StandardIO(EchoItemProtocol())
        reactor.run()

I am new to Twisted, so I don't know the proper term for the class I need to implement (is it a factory, a transport, an endpoint etc.?) to be able to speak EchoItemProtocol. Is this a valid approach at all?

In case you can't find the previously attached example, I have attached it again.

On 14 August 2012 04:09, Drew Smathers <drew.smathers@gmail.com> wrote:
-- with regards, Maxim

On 07:51 am, lacrima.maxim@gmail.com wrote:
Command line arguments aren't really anything like standard input. Command line arguments are available immediately, synchronously, in their entirety. They are tokenized into a list of strings, and there are limits imposed on what bytes can appear in those strings. Standard input can only be read a little at a time, perhaps throughout the duration of the entire process, and attempting to do so may involve blocking or dealing with complicated, platform-specific non-blocking APIs. Standard input can contain any bytes and arrives as a stream, not as a reliably tokenized list of strings. Twisted includes no support for treating stdin and command line arguments in a similar fashion. After you look up the command line arguments from sys.argv, just use the values. There would seem to be little point in trying to shove them through a protocol object. Jean-Paul

participants (6)
- Andrew Bennetts
- Brian Warner
- Drew Smathers
- exarkun@twistedmatrix.com
- Glyph
- Maxim Lacrima