[Python-ideas] Subprocess: Add an encoding argument

Akira Li 4kir4.1i at gmail.com
Wed Sep 3 01:43:11 CEST 2014


Paul Moore <p.f.moore at gmail.com> writes:

> On 1 September 2014 21:37, Andrew Barnert
> <abarnert at yahoo.com.dmarc.invalid> wrote:
>> This brings up a good point: having a single encoding, errors, and
>> newlines set of parameters for Popen and the convenience functions
>> implies that you want to pass the same ones to all pipes. But how
>> often is that true?
>
> My proposal was purely for encoding, and was prompted by the fact that
> the Windows default encoding does not support all of Unicode. Setting
> PYTHONIOENCODING to utf-8 for a Python subprocess allows handling of
> all of Unicode if you can set the subprocess channels' encoding to
> utf-8. As PYTHONIOENCODING affects all 3 channels, being able to set a
> single value for all 3 channels is sufficient for that use case.
>
> Setting newline and the error handler were *not* part of my original
> proposal, essentially because I know of no other way to force a
> subprocess to use anything other than the default encoding for the
> standard IO streams. Handling programs that are defined as using the
> standard streams for anything other than normal text (nul-terminated
> lines, explicitly defined non-default encodings) isn't something I
> have any examples of.
>
> The find -print0 example is out of scope, IMO, as newline handling is
> different from encoding. At some point, it becomes easier to manually
> wrap the streams rather than having huge numbers of parameters to the
> Popen constructor.
>
> I'll think some more on this...
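For the use case described above, PYTHONIOENCODING can tell a Python child to use UTF-8 on all three standard streams, and the parent's end of the pipe can be wrapped by hand. This is a minimal sketch that assumes the child is a Python interpreter (`sys.executable`). `run_python_utf8` is a hypothetical helper name, not part of subprocess:

```python
import io
import os
import subprocess
import sys


def run_python_utf8(code):
    """Hypothetical helper: run `code` in a child Python, return stdout as str.

    PYTHONIOENCODING=utf-8 makes the child write UTF-8 on its standard
    streams; the parent then decodes the binary pipe as UTF-8 explicitly.
    """
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, env=env)
    # proc.stdout is a binary stream; wrap it manually instead of relying
    # on the parent's default (e.g. ANSI code page) encoding
    with io.TextIOWrapper(proc.stdout, encoding="utf-8") as out:
        text = out.read()
    proc.wait()
    return text


if __name__ == "__main__":
    # Escapes keep the command line ASCII; the child prints a euro sign and a CJK character
    print(ascii(run_python_utf8("print('\\u20ac \\u4e2d')")))
```

This wraps only stdout. The same `TextIOWrapper` approach applies to stdin and stderr if the child's input or errors also need UTF-8.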

PYTHONIOENCODING also lets you specify the error handler, e.g., to avoid
exceptions while reading a list of filenames that are not valid in the
locale encoding:

  $ ls |
    PYTHONIOENCODING=:surrogateescape python3 -c 'import sys; print(list(sys.stdin))'
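The effect of surrogateescape can be seen on a single byte string. This sketch decodes as UTF-8 explicitly so the result does not depend on the locale. The shell example above keeps the locale's encoding and changes only the error handler. The sample filename is made up for illustration:

```python
raw = b"caf\xe9.txt\n"  # "café.txt" encoded as Latin-1; not valid UTF-8

try:
    raw.decode("utf-8")  # default 'strict' handler raises
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # strict: invalid continuation byte

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'caf\udce9.txt\n'

# Encoding with the same codec and handler restores the original bytes exactly
assert text.encode("utf-8", "surrogateescape") == raw
```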

Or the same, using the TextPopen class suggested by Antoine:

  with TextPopen(['ls'], stdout=PIPE, ioencoding=':surrogateescape') as p:
      for filename in p.stdout:
          process(filename)

os.fsencode(filename) would then recover the original bytes.
Note: the ioencoding parameter is my own interpretation.
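TextPopen is a proposal, not an existing class. The following hypothetical `text_popen` helper approximates it with the existing Popen and io.TextIOWrapper. It parses `ioencoding` using the same "encoding:errors" syntax as PYTHONIOENCODING. The helper name and its behaviour (only stdout is piped and wrapped) are assumptions for illustration. A small Python child that writes raw bytes stands in for `ls`:

```python
import io
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def text_popen(args, ioencoding=":", **kwargs):
    """Hypothetical approximation of the proposed TextPopen.

    `ioencoding` follows the PYTHONIOENCODING syntax "encoding:errors";
    an empty part means "use the default". Only stdout is piped and wrapped.
    """
    encoding, _, errors = ioencoding.partition(":")
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, **kwargs)
    # Replace the binary pipe with a text wrapper; None means locale
    # encoding / 'strict' errors, matching TextIOWrapper's own defaults
    proc.stdout = io.TextIOWrapper(proc.stdout, encoding=encoding or None,
                                   errors=errors or None)
    try:
        yield proc
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    # Stand-in for `ls`: a child that emits one non-UTF-8 filename
    child = "import sys; sys.stdout.buffer.write(b'caf\\xe9.txt\\nok.txt\\n')"
    with text_popen([sys.executable, "-c", child],
                    ioencoding="utf-8:surrogateescape") as p:
        # Lines keep their trailing newline, so strip it before use
        names = [line.rstrip("\n") for line in p.stdout]
    print([ascii(n) for n in names])
    # Re-encoding with the same codec and handler restores the raw bytes
    assert names[0].encode("utf-8", "surrogateescape") == b"caf\xe9.txt"
```

The check uses an explicit `.encode("utf-8", "surrogateescape")`. os.fsencode returns the same bytes only when the filesystem encoding matches the ioencoding and uses surrogateescape, which is typical on POSIX with a UTF-8 locale.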


--
Akira
