Replacing the standard IO streams (was Re: changing sys.stdout encoding)

So, after much digging, it appears the *right* way to replace a standard stream in Python 3 after application start is to do the following:

    sys.stdin = open(sys.stdin.fileno(), 'r', <new settings>)
    sys.stdout = open(sys.stdout.fileno(), 'w', <new settings>)
    sys.stderr = open(sys.stderr.fileno(), 'w', <new settings>)

Ditto for the other standard streams. It seems it already *is* as simple as with any other file, we just collectively forgot about:

1. The fact open() accepts file descriptors directly in Python 3
2. The fact that text streams still report the underlying file descriptor correctly

*That* is something we can happily advertise in the standard library docs. If you could check to make sure it works properly for your use case and then file a docs bug at bugs.python.org to get it added to the std streams documentation, that would be very helpful.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
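
A minimal sketch of how this idiom might look in practice at application start; the "--encoding" option and the reopen_std_streams() helper are made up for illustration, not part of any stdlib API:

    # Sketch only: reopen the standard streams with an encoding chosen on
    # the command line, before anything has been read or written.
    import argparse
    import sys

    def reopen_std_streams(encoding):
        # closefd=False keeps the underlying descriptors alive even if the
        # new stream objects are later replaced or garbage collected.
        sys.stdin = open(sys.stdin.fileno(), 'r', encoding=encoding, closefd=False)
        sys.stdout = open(sys.stdout.fileno(), 'w', encoding=encoding, closefd=False)
        sys.stderr = open(sys.stderr.fileno(), 'w', encoding=encoding, closefd=False)

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--encoding', default=None)
        args = parser.parse_args()
        if args.encoding:
            reopen_std_streams(args.encoding)
        print("stdout encoding is now:", sys.stdout.encoding)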

On 9 June 2012 10:55, Nick Coghlan <ncoghlan@gmail.com> wrote:
One minor point - if sys.stdout is redirected, *and* you have already written to sys.stdout, this resets the file pointer. With test.py as

    import sys
    print("Hello!")
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("Hello!")

test.py >a gives one line in a, not two (tested on Windows, Unix may be different). And changing to "a" doesn't resolve this...

Of course, the actual use case is to change the encoding before anything is written - so maybe a small note saying "don't do this" is enough. But it's worth mentioning before we get the bug report saying "Python lost my data" :-)

Paul.

On 09/06/2012 12:00, Paul Moore wrote:
I find that this:

    print("Hello!")
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("Hello!")

prints the string "Hello!\r\r\n", but this:

    print("Hello!")
    sys.stdout.flush()
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("Hello!")

prints the string "Hello!\r\nHello!\r\r\n". I had hoped that the flush would be enough, but apparently not.

On 09/06/2012 21:02, Serhiy Storchaka wrote:
None of these methods are not guaranteed to work if the input or output have occurred before.
That's a double negative so I'm not sure what you meant to say. Can you please rephrase it? I assume that English is not your native language, so I'll let you off :)

--
Cheers.

Mark Lawrence.

On 10.06.12 00:22, Mark Lawrence wrote:
open(sys.stdin.fileno()) is not guaranteed to work if input or output has already occurred, and neither is io.TextIOWrapper(sys.stdin.detach()). sys.stdin's internal buffer can contain characters that have been read but not yet consumed, sys.stdin.buffer's internal buffer can contain bytes that have been read but not yet consumed, and with a multibyte encoding the sys.stdin decoder's internal buffer can contain an incomplete multibyte character.
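
A sketch of how that read-ahead bites (illustrative; the exact amount lost depends on how much the old stream had buffered):

    # Reopening stdin from its file descriptor mid-read skips whatever the
    # old text/buffer layers had already read ahead.
    # Try something like:  printf 'one\ntwo\nthree\n' | python3 demo.py
    import sys

    first = sys.stdin.readline()    # the old stream may buffer far more than one line
    sys.stdin = open(sys.stdin.fileno(), 'r', closefd=False)
    rest = sys.stdin.read()         # data still sitting in the old buffers is gone
    print("first:", repr(first))
    print("rest:", repr(rest))      # often '' - "two" and "three" were already buffered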

On Mon, Jun 11, 2012 at 12:34 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Right, but the point of this discussion is to document the cleanest available way for an application to change these settings at *application start* (e.g. to support an "--encoding" parameter). Yes, there are potential issues if you use any of these mechanisms while there is data in the buffers, but that's a much harder problem and not one we're trying to solve here.

Regardless, the advantage of the "open + fileno" idiom is that it works for *any* level of change. If you want to force your streams to unbuffered binary IO rather than merely changing the encoding:

    sys.stdin = open(sys.stdin.fileno(), 'rb', buffering=0, closefd=False)
    sys.stdout = open(sys.stdout.fileno(), 'wb', buffering=0, closefd=False)
    sys.stderr = open(sys.stderr.fileno(), 'wb', buffering=0, closefd=False)

Keep them as text, but force them to permissive utf-8, no matter how the interpreter originally created them?:

    sys.stdin = open(sys.stdin.fileno(), 'r', encoding="utf-8", errors="surrogateescape", closefd=False)
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding="utf-8", errors="surrogateescape", closefd=False)
    sys.stderr = open(sys.stderr.fileno(), 'w', encoding="utf-8", errors="surrogateescape", closefd=False)

This approach also has the advantage of leaving sys.__std(in/out/err)__ in a somewhat usable state.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
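
As an illustration of what "permissive" buys you (a sketch, not from the thread): with errors="surrogateescape", undecodable input bytes round-trip unchanged instead of raising UnicodeDecodeError:

    # Try:  printf 'caf\351\n' | python3 filter.py   (\351 is 0xE9, invalid as UTF-8)
    import sys

    sys.stdin = open(sys.stdin.fileno(), 'r', encoding="utf-8",
                     errors="surrogateescape", closefd=False)
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding="utf-8",
                      errors="surrogateescape", closefd=False)

    data = sys.stdin.read()    # the stray 0xE9 byte decodes to the surrogate U+DCE9
    sys.stdout.write(data)     # ...and is re-encoded to the original 0xE9 byte on output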

Nick Coghlan writes:
On Mon, Jun 11, 2012 at 12:34 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
+1

The OP's problem is a real one. His use case (the "--encoding" parameter) seems to be the most likely one in production use, so the loss of buffered data issue should rarely come up. Changing encodings on the fly offers plenty of ways to lose data besides incomplete buffers, anyway.

I am a little concerned with MRAB's report that

    import sys
    print("hello")
    sys.stdout.flush()
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("hello")

doesn't work as expected, though. (It does work for me on Mac OS X, both as above -- of course there are no '\r's in the output -- and with 'print("hello", end="\r\n")'.)

On 10 June 2012 19:12, MRAB <python@mrabarnett.plus.com> wrote:
Not here (Win 7 32-bit):

    PS D:\Data> type t.py
    import sys
    print("Hello!")
    sys.stdout.flush()
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')
    print("Hello!")
    PS D:\Data> py -3.2 t.py | od -c
    0000000   H   e   l   l   o   !  \r  \n   H   e   l   l   o   !  \r  \n
    0000020

Paul.

On 10/06/2012 21:07, Paul Moore wrote:
It's at the system command prompt. When I redirect the script's stdout to a file (on the command line using ">output.txt") I get those 15 bytes from Python 3.2. Your output appears to be 32 bytes (the second line starts with "0000020").

On 11 June 2012 07:16, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Explicit end= makes no difference to the behaviour. In fact, a minimal test suggests that universal newline mode is not enabled on Windows in Python 3. That's a regression from 2.x. See below.

    D:\Data>py -3 -c "print('x')" | od -c
    0000000   x  \n
    0000002

    D:\Data>py -2 -c "print('x')" | od -c
    0000000   x  \r  \n
    0000003

    D:\Data>py -3 -V
    Python 3.2.2

    D:\Data>py -2 -V
    Python 2.7.2

Paul.
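
If you want the "\r\n" endings back regardless of how the interpreter wraps the streams, one option (a sketch, and only relevant while the behaviour above is in question) is to request the translation explicitly when reopening:

    # Explicitly translate "\n" to "\r\n" on output instead of relying on
    # the platform default.
    import sys

    sys.stdout = open(sys.stdout.fileno(), 'w', encoding=sys.stdout.encoding,
                      newline='\r\n', closefd=False)
    print("x")    # now ends with b'\r\n' even when stdout is a pipe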

2012/6/11 Paul Moore <p.f.moore@gmail.com>
This is certainly related to http://bugs.python.org/issue11990

--
Amaury Forgeot d'Arc

On Mon, Jun 11, 2012 at 2:43 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Correct, but using detach() leaves sys.__std*__ completely broken (either throwing exceptions or silently failing to emit output). Creating two independent streams that share the underlying file handle is much closer to the 2.x behaviour when replacing sys.std*.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
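
A short sketch of the breakage being described (illustrative; it assumes sys.stdout has not already been replaced): sys.__stdout__ still refers to the old wrapper, and writing to a detached TextIOWrapper raises ValueError:

    import io
    import sys

    sys.stdout = io.TextIOWrapper(sys.stdout.detach(), line_buffering=True)
    print("works fine via the new wrapper")
    try:
        sys.__stdout__.write("this fails\n")
    except ValueError as exc:          # "underlying buffer has been detached"
        print("sys.__stdout__ is broken:", exc)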

Le 10/06/2012 04:26, Nick Coghlan a écrit :
Calling detach() on the standard streams is a bad idea - the interpreter uses the originals internally, and calling detach() breaks them.
Where does it do that? The interpreter certainly shouldn't hardwire the original objects internally.

Moreover, your snippet is wrong because if someone replaces the streams for a second time, garbage collecting the previous streams will close the file descriptors. You should use closefd=False.

Regards

Antoine.
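
A sketch of the failure mode being pointed out (illustrative; the timing relies on CPython's immediate finalization):

    import sys

    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='utf-8')  # closefd defaults to True
    sys.stdout = open(sys.stdout.fileno(), 'w', encoding='ascii')  # drops the last reference
    # The first replacement is finalized right away and closes fd 1, so the
    # second replacement is now writing to a closed descriptor.
    try:
        print("hello")
        sys.stdout.flush()
    except OSError as exc:
        sys.stderr.write("stdout is gone: %s\n" % exc)
    # Passing closefd=False to both open() calls avoids this.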

On Sun, Jun 10, 2012 at 5:17 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
At the very least, sys.__std(in/out/err)__. Doing

    sys.stderr = io.TextIOWrapper(sys.stderr.detach(), line_buffering=True)

also seems to suppress display of exception tracebacks at the interactive prompt (perhaps the default except hook is using a cached reference?). I believe Py_FatalError and other APIs that are used deep in the interpreter won't respect the module level setting.

Basically, it's dangerous to use detach() on a stream where you don't hold the sole reference, and the safest approach with the standard streams is to assume that other code is holding references to them. Detaching the standard streams is just as likely to cause problems as closing them.
True, although that nicety is all the more reason to encapsulate this idiom in a new IOBase.reopen() method:

    def reopen(self, mode=None, buffering=-1, encoding=None,
               errors=None, newline=None, closefd=False):
        if mode is None:
            mode = getattr(self, 'mode', 'r')
        return open(self.fileno(), mode, buffering, encoding,
                    errors, newline, closefd)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
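
IOBase has no such method today; a standalone sketch of the same idea, with the obvious usage:

    # Hypothetical helper (reopen() is a proposal, not an existing API);
    # "stream" is any object with fileno().
    import sys

    def reopen(stream, mode=None, buffering=-1, encoding=None,
               errors=None, newline=None, closefd=False):
        if mode is None:
            mode = getattr(stream, 'mode', 'r')
        return open(stream.fileno(), mode, buffering, encoding,
                    errors, newline, closefd)

    sys.stdout = reopen(sys.stdout, 'w', encoding='utf-8', errors='backslashreplace')
    print("stdout reopened with encoding", sys.stdout.encoding)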

On 10.06.12 05:26, Nick Coghlan wrote:
Calling detach() on the standard streams is a bad idea - the interpreter uses the originals internally, and calling detach() breaks them.
If the interpreter uses the standard streams internally, it uses the raw C streams (FILE *) stdin/stdout/etc. Calling open(sys.stdin.fileno()) bypasses the internal buffering in sys.stdin, sys.stdin.buffer, the sys.stdin decoder and the raw C stdin (if it is used at a lower level), and can lose or break multibyte characters.

You should set the newline option for sys.std* files. Python 3 does something like this:

    if os.name == "nt":
        # translate "\r\n" to "\n" for sys.stdin on Windows
        newline = None
    else:
        newline = "\n"
    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), newline=newline, <new settings>)
    sys.stdout = io.TextIOWrapper(sys.stdout.detach(), newline="\n", <new settings>)
    sys.stderr = io.TextIOWrapper(sys.stderr.detach(), newline="\n", <new settings>)

--

Lib/test/regrtest.py uses the following code, which is not exactly correct (it creates a new buffered writer instead of reusing the sys.stdout buffered writer):

    def replace_stdout():
        """Set stdout encoder error handler to backslashreplace (as stderr
        error handler) to avoid UnicodeEncodeError when printing a
        traceback"""
        import atexit

        stdout = sys.stdout
        sys.stdout = open(stdout.fileno(), 'w',
                          encoding=stdout.encoding,
                          errors="backslashreplace",
                          closefd=False,
                          newline='\n')

        def restore_stdout():
            sys.stdout.close()
            sys.stdout = stdout
        atexit.register(restore_stdout)

Victor
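
The same replace-and-restore idea can also be written as a context manager rather than an atexit hook; a sketch (the stdout_errors() name is made up for the example):

    import contextlib
    import sys

    @contextlib.contextmanager
    def stdout_errors(errors="backslashreplace"):
        old = sys.stdout
        old.flush()                    # don't interleave previously buffered output
        sys.stdout = open(old.fileno(), 'w',
                          encoding=old.encoding,
                          errors=errors,
                          closefd=False,
                          newline='\n')
        try:
            yield sys.stdout
        finally:
            replacement = sys.stdout
            sys.stdout = old
            replacement.close()        # closefd=False above, so fd 1 stays open

    with stdout_errors():
        print("tracebacks printed in this block won't raise UnicodeEncodeError")
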
participants (9)

- Amaury Forgeot d'Arc
- Antoine Pitrou
- Mark Lawrence
- MRAB
- Nick Coghlan
- Paul Moore
- Serhiy Storchaka
- Stephen J. Turnbull
- Victor Stinner