Implementing File Modes
Hello,
Since there was a bit of confusion last time, I'll start by saying I am working on subprocess.Popen for Google Summer of Code. One of the features I am implementing is a class that lets a running process stand in for a file. For example, instead of open("filelist", mode='r') one would call ProcessIOWrapper("ls -l", mode='r'). I am trying to decide whether I should fully implement the mode argument. Right now, it essentially ignores everything in the mode argument except a 'U', which indicates universal newlines. Should I leave that as is, or make it so that things like "r+", "w", and "a" are handled the way they would be for an actual file?
Eric
I would expect "r" to produce a pipe that reads from stdout of the subprocess, and "w" to produce a pipe that writes to stdin of the subprocess. "a" would be the same as "w", and arguably "r+" would be a bidirectional pipe - read from the subprocess stdout and write to its stdin. I'd be OK with "r+" not being implemented (if it's too hard to avoid deadlocks) but "r" and "w" should be present. Paul.
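The mapping described above can be sketched roughly as follows. This is only an illustration written for Python 3, not the wrapper under discussion: the helper name open_process and its return value are assumptions, and "r+" is simply rejected here rather than implemented, since the deadlock question is left open in the thread.

```python
import subprocess


def open_process(args, mode="r"):
    """Hypothetical helper: start `args` and return (process, pipe).

    "r" yields a pipe connected to the child's stdout; "w" and "a" yield
    a pipe connected to the child's stdin. Any other mode is rejected.
    """
    if mode == "r":
        proc = subprocess.Popen(
            args, stdout=subprocess.PIPE, universal_newlines=True
        )
        return proc, proc.stdout
    if mode in ("w", "a"):
        # Appending has no separate meaning for a pipe, so "a" acts like "w".
        proc = subprocess.Popen(
            args, stdin=subprocess.PIPE, universal_newlines=True
        )
        return proc, proc.stdin
    # "r+" would need both pipes and care to avoid deadlocks; not sketched.
    raise ValueError("unsupported mode: %r" % (mode,))
```

The process object is returned alongside the pipe so that the caller can still wait() on the child after closing the stream.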
Paul Moore wrote:
I would expect "r" to produce a pipe that reads from stdout of the subprocess, and "w" to produce a pipe that writes to stdin of the subprocess. "a" would be the same as "w", and arguably "r+" would be a bidirectional pipe - read from the subprocess stdout and write to its stdin.
I'd be OK with "r+" not being implemented (if it's too hard to avoid deadlocks) but "r" and "w" should be present.
What about stderr? You could add "e" if you want to read from it.
On Tue, 28 Jul 2009 03:21:30 am MRAB wrote:
What about stderr? You could add "e" if you want to read from it.
"Read from stderr" is just a read. "Write to stderr" is just a write. The difference between reading stdout and stderr is not that you have different modes, but that you are reading from different files.
--
Steven D'Aprano
Steven D'Aprano wrote:
"Read from stderr" is just a read. "Write to stderr" is just a write. The difference between reading stdout and stderr is not that you have different modes, but that you are reading from different files.
By the same argument, aren't stdin and stdout also different files?
I am implementing the file wrapper using changes to subprocess.Popen that also make it asynchronous and non-blocking, so implementing "r+" should be trivial. How about handling stderr? I have the following ideas: leave out support for reading from stderr, add an optional argument like "outputstderr = False", create another function that toggles / sets whether stderr or stdout is returned, or mix the two outputs.
Thanks for the input,
Eric
2009/7/27 Eric Pruitt <eric.pruitt@gmail.com>:
I am implementing the file wrapper using changes to subprocess.Popen that also make it asynchronous and non-blocking so implementing "r+" should be trivial to do. How about handling stderr? I have the following ideas: leave out support for reading from stderr, make it so that there is an optional additional argument like "outputstderr = False", create another function that toggles / sets whether stderr or stdout is returned or mix the two outputs.
I like MRAB's idea of using a (non-standard) "e" flag to include stderr. So "r" reads from stdout, "re" reads from stdout+stderr. Anything more complicated probably should just use "raw" Popen objects. Don't overcomplicate the interface. Paul.
On Mon, Jul 27, 2009 at 3:04 PM, Paul Moore <p.f.moore@gmail.com> wrote:
I like MRAB's idea of using a (non-standard) "e" flag to include stderr. So "r" reads from stdout, "re" reads from stdout+stderr.
Anything more complicated probably should just use "raw" Popen objects. Don't overcomplicate the interface.
In my opinion, mangling stderr and stdout together is already an overcomplication. It shouldn't be implemented.
It *seems* like a good idea, until you realize that subtle changes to your OS, environment, or buffering behavior may result in arbitrary, unparseable output.
For example, let's say you've got a program whose output is a list of lines, each one containing a number. Sometimes it tries to import gtk, and fails to open its display.
That's fine, and you can still deal with it, as long as the interleaved output looks like this:
100
200
Gtk-WARNING **: cannot open display:
300
400
but of course the output *might* (although unlikely with such small chunks of output) end up looking like this, instead:
100
2Gtk-WAR0NING0 **: can30not 0open display:
400
This is the sort of thing which is much more likely to happen once you start dealing with large volumes of data, where there are more page-boundaries for your buffers to get confused around, and you are playing with buffering options to improve performance. In other words, it's something that fails only at scale or under load, and is therefore extremely difficult to debug.
This option *might* be okay if it were allowed only on subprocesses opened in a *text* mode, and if the buffering logic involved forced stderr and stdout to be line-delimited, and interleave only lines, rather than arbitrary chunks of bytes. Of course then if you use this flag with a program that outputs binary data with no newlines it will buffer forever and crash your program with a MemoryError, but at least that's easy to debug when it happens.
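A rough sketch of the line-delimited variant described above, for Python 3, might look like the following. It is an assumption-laden illustration rather than anything from the proposed wrapper: the name iter_merged_lines is made up, and one reader thread per pipe is just one way to hand over whole lines only. As noted above, a child that never emits a newline would make the reader buffer without bound.

```python
import queue
import subprocess
import threading


def iter_merged_lines(args):
    """Yield (stream_name, line) pairs, mixing whole lines only."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode, as the suggestion requires
    )
    lines = queue.Queue()

    def pump(name, pipe):
        # Iterating a text pipe returns complete lines (or a final
        # unterminated fragment at EOF), never arbitrary byte chunks.
        for line in pipe:
            lines.put((name, line))
        pipe.close()
        lines.put((name, None))  # sentinel: this pipe reached EOF

    pipes = (("stdout", proc.stdout), ("stderr", proc.stderr))
    for name, pipe in pipes:
        threading.Thread(target=pump, args=(name, pipe), daemon=True).start()

    open_pipes = len(pipes)
    while open_pipes:
        name, line = lines.get()
        if line is None:
            open_pipes -= 1
        else:
            yield name, line
    proc.wait()
```

The relative order of stdout and stderr lines still depends on the child's own buffering, so only the order within each stream is reliable.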
On Mon, Jul 27, 2009 at 5:32 PM, Glyph Lefkowitz<glyph@twistedmatrix.com> wrote:
In my opinion, mangling stderr and stdout together is already an overcomplication. It shouldn't be implemented.
It seems like a good idea, until you realize that subtle changes to your OS, environment, or buffering behavior may result in arbitrary, unparseable output.
Agreed. Leave stderr support out of this. People who need stderr should use the full subprocess.Popen interface. Quick hack unixy types will just run their process using a shell (which already seems to be the default based on the "ls -l" example) and add 2>&1.
This functionality is basically the equivalent of adding the | symbol on either or both ends of a filename given to open() in Perl (but without being so gross).
I do wonder why you're trying to make it behave exactly like open() including the mode= syntax. Why not just define several names based on the behavior?
ProcessReadWrapper()
ProcessWriteWrapper()
ProcessReadWriteWrapper()
-gps
My motivation came from an instance when I was using subprocess.Popen for a Linux / Windows cross-platform program. In part of the program, I was reading from and writing to a cron-like object. On Windows, it was a text file, and on Linux it would be the crontab executable. Had I been able to substitute my wrapper for the open() function, it would have been the only change I had to make for cross-platform compatibility: instead of changing numerous lines because Linux would need Popen and Windows would need a regular file open(), I could simply use my wrapper whenever the platform was Linux. Another example would be having an external program decrypt a file that can be either plain text or encrypted, which might go something like this:
    if encryptionEnabled:
        fileobj = subprocess.ProcessIOWrapper("gpg --decrypt supersecret.html.gpg")
    else:
        fileobj = open("notsosecret.html")
From there, the functions would not have to be modified despite using a running process versus a file object.
Hmm... can't you do this?
    if encryptionEnabled:
        p = subprocess.Popen(["gpg", "--decrypt", "supersecret.html.gpg"], stdin = subprocess.PIPE)
        fileobj = p.stdin
    else:
        fileobj = open("notsosecret.html")
I think that works. Is there something this way won't work for? You can also do the same thing to get stdout and stderr file objects. I guess a wrapper would simplify this process.
-Devin
Well, with a few changes to your code, that would indeed work. You are using stdin as your pipe; correct me if I'm wrong, but if you intend to read from it, you need to change it to "stdout = subprocess.PIPE" and change the other lines to match.
My Google Summer of Code modifications to subprocess.Popen provide non-blocking, asynchronous I/O support, and my file wrapper is built upon that augmented functionality. If I remember correctly, when I was working on the program where I first thought a file wrapper for subprocess.Popen would be rather handy, I also ran into blocking I/O.
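For reading a command's output with the plain Popen interface, the stdout-based version of this pattern could be sketched as below (Python 3). The names open_document, decrypt_command and count_lines are hypothetical and exist only for this illustration; any command that writes to standard output can stand in for the gpg call. This sketch uses ordinary blocking pipes and does not model the non-blocking changes mentioned in the thread.

```python
import subprocess


def open_document(path, decrypt_command=None):
    """Return (fileobj, process); process is None for a plain file.

    `decrypt_command` is a hypothetical argument list. When it is given,
    the command's standard output is read instead of the file at `path`.
    """
    if decrypt_command is None:
        return open(path), None
    proc = subprocess.Popen(
        decrypt_command, stdout=subprocess.PIPE, universal_newlines=True
    )
    return proc.stdout, proc


def count_lines(fileobj):
    """Example consumer: works the same for a file or a pipe."""
    return sum(1 for _ in fileobj)
```

The consumer only relies on iterating over lines, which both a regular file object and a process pipe support; operations such as seek() would not carry over to the pipe.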
On Tue, 28 Jul 2009 04:06:45 am Eric Pruitt wrote:
I am implementing the file wrapper using changes to subprocess.Popen that also make it asynchronous and non-blocking so implementing "r+" should be trivial to do. How about handling stderr? I have the following ideas: leave out support for reading from stderr, make it so that there is an optional additional argument like "outputstderr = False", create another function that toggles / sets whether stderr or stdout is returned or mix the two outputs.
Leaving it out is always an option.
As I see it, fundamentally you can either read from stdout and stderr as two different streams, or you can interleave (mix) them. To me, that suggests three functions:
ProcessIOWrapper()  # read from stdout (or write to stdin etc.)
ProcessIOWrapperStdErr()  # read/write from stderr
ProcessIOWrapper2()  # read from mixed stdout and stderr
I don't like a function to toggle between one and the other: that smacks of relying on a global setting in a bad way. I suppose you could add an optional argument to ProcessIOWrapper() to select between stdout, stderr, or both together.
--
Steven D'Aprano
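The optional-argument idea could look something like this sketch for Python 3. The function name read_process_output and the source parameter are assumptions made for illustration, and discarding the unselected stream via subprocess.DEVNULL is a choice of this example, not something proposed in the thread.

```python
import subprocess


def read_process_output(args, source="stdout"):
    """Hypothetical helper: run `args` and return the selected output.

    `source` is "stdout", "stderr", or "both" (stderr merged into stdout).
    """
    if source == "stdout":
        kwargs = dict(stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    elif source == "stderr":
        kwargs = dict(stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    elif source == "both":
        # Same effect as the shell's 2>&1: both streams share one pipe.
        kwargs = dict(stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    else:
        raise ValueError("unknown source: %r" % (source,))

    proc = subprocess.Popen(args, universal_newlines=True, **kwargs)
    pipe = proc.stderr if source == "stderr" else proc.stdout
    with pipe:
        data = pipe.read()
    proc.wait()
    return data
```

Because the selection is passed per call, there is no shared toggle to get out of sync. With "both", the order in which the two streams appear depends on the child's buffering.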
participants (7)
- Devin Cook
- Eric Pruitt
- Glyph Lefkowitz
- Gregory P. Smith
- MRAB
- Paul Moore
- Steven D'Aprano