[Python-ideas] Subprocess: Add an encoding argument

Andrew Barnert abarnert at yahoo.com
Mon Sep 1 22:37:00 CEST 2014


On Monday, September 1, 2014 1:10 PM, Akira Li <4kir4.1i at gmail.com> wrote:

>Paul Moore <p.f.moore at gmail.com> writes:
>
>> On 1 September 2014 20:14, Akira Li
>> <4kir4.1i at gmail.com> wrote:
>>> Could you provide examples how the final result could look like?
>>
>> Do you mean what I'm proposing?
>>
>> p = Popen(..., encoding='utf-8')
>> p.stdout is now a text stream assuming the data is in UTF8, rather
>> than assuming it's in the default encoding.
>
>What if you want to specify an error handler e.g., to read a file list
>from `find -print0` -like program: you could pass
>errors='surrogateescape', newlines='\0' (issue1152248) to
>TextIOWrapper(p.stdin).

Presumably you meant either passing them to `TextIOWrapper(p.stdout)` for `find -print0`, or passing them to `TextIOWrapper(p.stdin)` for `xargs -0`; `find` doesn't even look at its input.

>Both errors and newlines can be different for stdin/stdout pipes.

This brings up a good point: having a single set of encoding, errors, and newlines parameters for Popen and the convenience functions implies that you want to pass the same values to all pipes. But how often is that true?

In your particular case, for `find -print0`, you want `newlines='\0'` on stdout, but not on stderr.
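
As a rough sketch of that split, without relying on #1152248: keep stdout binary, split on NUL by hand, and decode with surrogateescape, while treating stderr as ordinary newline-terminated text. The child command below is only a stand-in for `find -print0` (a Python one-liner, so the sketch runs anywhere), and `read_nul_separated` is a hypothetical helper name, not anything in subprocess.

```python
import io
import subprocess
import sys

# Stand-in for `find -print0`: writes NUL-terminated names to stdout
# and an ordinary newline-terminated message to stderr.
CHILD = (
    "import sys;"
    "sys.stdout.buffer.write(b'a.txt\\0b\\xff.txt\\0');"
    "sys.stderr.write('warning: something\\n')"
)

def read_nul_separated(cmd):
    """Hypothetical helper: return (names, stderr_text) for a -print0 style command."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    # stdout: NUL is the record separator; undecodable bytes survive as surrogates.
    names = [chunk.decode('utf-8', 'surrogateescape')
             for chunk in out.split(b'\0') if chunk]
    # stderr: plain text with normal newline handling.
    err_text = io.TextIOWrapper(io.BytesIO(err), encoding='utf-8',
                                errors='replace').read()
    return names, err_text

names, err_text = read_nul_separated([sys.executable, '-c', CHILD])
```

The two pipes end up with different separators and different error handlers, which is exactly what a single set of Popen-level parameters could not express.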


For the convenience functions that's probably not an issue, because the only way to read both stdout and stderr is to reroute the latter to the former anyway. But even there, you might not necessarily want input and output to be the same; `xargs -0` is a perfect example of that.

And, even forgetting #1152248, it's not hard to think of cases where you want input and output to be different. For example, I've got an old script that selects and cats a bunch of old Excel-format CSV files (in CP-1252, CRLF) off a file server, based on input data in native text files (which on my machine means UTF-8, LF). Using it with binary pipes is pretty easy, and changing it to explicitly wrap each pipe in the appropriate `TextIOWrapper` would be easy too, but being able to pass a single encoding and newline value to `Popen` would be misleading…
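
Wrapping each pipe separately might look something like the sketch below. The child is a made-up stand-in for that script (it reads UTF-8 names on stdin and writes CP-1252 rows with CRLF on stdout), and the names `child_in` and `child_out` are only illustrative.

```python
import io
import subprocess
import sys

# Stand-in child: reads UTF-8 names on stdin, writes CP-1252 CSV rows with CRLF.
CHILD = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    name = line.decode('utf-8').strip()\n"
    "    sys.stdout.buffer.write((name + ',1\\r\\n').encode('cp1252'))\n"
)

p = subprocess.Popen([sys.executable, '-c', CHILD],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Input side: native text (UTF-8, LF written through untranslated).
child_in = io.TextIOWrapper(p.stdin, encoding='utf-8', newline='\n')
# Output side: Excel-style CSV (CP-1252, CRLF translated to '\n' on read).
child_out = io.TextIOWrapper(p.stdout, encoding='cp1252', newline=None)

child_in.write('caf\u00e9\n')
child_in.close()   # flushes and closes the pipe so the child sees EOF
rows = child_out.read().splitlines()
child_out.close()
p.wait()
```

Writing everything and then reading is only safe here because the data is tiny; with real volumes you would need threads or `communicate()` on the binary pipes to avoid a deadlock.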

But as long as there are enough use cases for wanting to pass the same arguments for all pipes, I think the suggestion is OK. Especially considering that often you only want one pipe in the first place, which counts as a use case for passing the same arguments for all 1 pipe, right?
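
For that one-pipe case, the proposed call could look roughly like this. It assumes a `Popen` that accepts `encoding` and `errors` keywords, which is the proposal under discussion (Python 3.6 and later accept both); the child command is just a stand-in that emits UTF-8.

```python
import subprocess
import sys

# Stand-in child that writes one UTF-8 encoded line to stdout.
CHILD = "import sys; sys.stdout.buffer.write('na\\u00efve\\n'.encode('utf-8'))"

# Assumption: Popen accepts `encoding` and `errors`, as proposed in this
# thread (true for Python 3.6 and later).
p = subprocess.Popen([sys.executable, '-c', CHILD],
                     stdout=subprocess.PIPE,
                     encoding='utf-8', errors='strict')
text = p.stdout.read()   # already str, decoded as UTF-8
p.stdout.close()
p.wait()
```

With only stdout piped, there is no other pipe for the parameters to be wrong about.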

(By the way, thanks for this reminder to finish testing and cleaning up that patch for #1152248…)

