Paul Moore
On 1 September 2014 21:37, Andrew Barnert wrote:
> This brings up a good point: having a single encoding, errors, and newlines set of parameters for Popen and the convenience functions implies that you want to pass the same ones to all pipes. But how often is that true?
My proposal was purely for encoding, and was prompted by the fact that the Windows default encoding does not support all of Unicode. Setting PYTHONIOENCODING to utf-8 for a Python subprocess allows handling of all of Unicode if you can set the subprocess channels' encoding to utf-8. As PYTHONIOENCODING affects all 3 channels, being able to set a single value for all 3 channels is sufficient for that use case.
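To make that use case concrete, here is a minimal sketch of what has to be done by hand today. The child command and the sample text are made up for illustration; only PYTHONIOENCODING and the standard subprocess calls are real.

```python
import os
import subprocess
import sys

# Sample text with a character that many legacy Windows code pages lack.
text = "snowman: \u2603"

# Copy the environment and force UTF-8 on all three of the child's streams.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# Illustrative child: echo stdin back in upper case.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    env=env,
)

# The pipes carry bytes, so encode and decode explicitly with the same
# encoding that the child was told to use.
out_bytes, _ = proc.communicate(text.encode("utf-8"))
result = out_bytes.decode("utf-8")
print(result)
```

A single encoding value covers this case because the environment variable applies to stdin, stdout and stderr of the child alike.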
Setting newline and the error handler was *not* part of my original proposal, essentially because I know of no way other than PYTHONIOENCODING to force a subprocess to use anything other than the default encoding for the standard IO streams. I have no examples of programs that are defined as using the standard streams for anything other than normal text (NUL-terminated lines, explicitly defined non-default encodings).
The find -print0 example is out of scope, IMO, as newline handling is different from encoding. At some point, it becomes easier to manually wrap the streams rather than having huge numbers of parameters to the Popen constructor.
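For comparison, manually wrapping a stream might look roughly like the sketch below. A small Python child stands in for find -print0 so the example runs anywhere; the file names are invented, and the choice of utf-8 with strict errors is an assumption, not part of any proposal.

```python
import io
import subprocess
import sys

# Stand-in for `find -print0`: a child that writes NUL-terminated names.
child_code = "import sys; sys.stdout.buffer.write(b'a.txt\\0b c.txt\\0')"

with subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
) as proc:
    # Wrap only this one pipe. newline="" turns off newline translation,
    # so the text arrives exactly as the child wrote it.
    reader = io.TextIOWrapper(
        proc.stdout, encoding="utf-8", errors="strict", newline=""
    )
    raw = reader.read()

# TextIOWrapper cannot treat NUL as a line terminator, so split by hand
# and drop the empty piece after the final NUL.
names = [name for name in raw.split("\0") if name]
print(names)
```

Each pipe can get its own encoding, error handler and newline setting this way, without adding parameters to the Popen constructor.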
I'll think some more on this...
PYTHONIOENCODING allows specifying the error handler, e.g., to avoid exceptions while reading a list of files:

    $ ls | PYTHONIOENCODING=:surrogateescape python3 -c 'import sys; print(list(sys.stdin))'

Or the same, but with the TextPopen suggested by Antoine:

    with TextPopen(['ls'], stdout=PIPE, ioencoding=':surrogateescape') as p:
        for filename in p.stdout:
            process(filename)

os.fsencode(filename) would then recover the original bytes. Note: the ioencoding parameter is my interpretation.

-- Akira
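Since TextPopen does not exist, the surrogateescape round trip can be approximated today with io.TextIOWrapper. This is only a sketch: the child below is a stand-in for ls, the file names are invented, and the encoding is fixed to utf-8 so the result does not depend on the platform. On POSIX, os.fsencode does the equivalent with the filesystem encoding.

```python
import io
import subprocess
import sys

# Stand-in for `ls`: the second name contains byte 0xff, which is not
# valid UTF-8.
child_code = (
    "import sys; sys.stdout.buffer.write(b'ok.txt\\nbad-\\xff.txt\\n')"
)

with subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
) as proc:
    # surrogateescape maps each undecodable byte to a lone surrogate
    # instead of raising UnicodeDecodeError.
    reader = io.TextIOWrapper(
        proc.stdout, encoding="utf-8", errors="surrogateescape", newline="\n"
    )
    names = [line.rstrip("\n") for line in reader]

# Encoding with the same handler turns the surrogates back into the
# original bytes.
original = [name.encode("utf-8", "surrogateescape") for name in names]
print(names, original)
```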