[Python-3000] New io system and binary data

Guido van Rossum guido at python.org
Wed Sep 19 19:19:13 CEST 2007


Changing the mode between text and binary is not feasible (since it
would have to change the class). But it is perfectly acceptable to use
sys.std{in,out}.buffer if you need to write a binary-transparent
filter. Of course, you'll be dealing with bytes at that point, so the
usual cautions apply. I wouldn't do the assignments you propose,
though, since that might surprise other code which expects text files.
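
For illustration, a binary-transparent filter along those lines might
look roughly like the sketch below. The helper name copy_binary and the
64 KiB chunk size are made up for this example and are not part of the
io module; the only assumption about the streams is that they have
read(), write() and flush() methods that deal in bytes, as
sys.stdin.buffer and sys.stdout.buffer do.

```python
import io
import sys


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy bytes from src to dst unchanged; return the byte count.

    Hypothetical helper for illustration. src and dst are binary
    streams, so no decoding or encoding ever takes place.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total


# In-memory demonstration with data that is not valid UTF-8.
payload = b"GIF89a\xff\xfe\x00\x80"
sink = io.BytesIO()
copied = copy_binary(io.BytesIO(payload), sink)
assert sink.getvalue() == payload

# In a real filter script the call would be:
#     copy_binary(sys.stdin.buffer, sys.stdout.buffer)
```

This leaves sys.stdin and sys.stdout themselves untouched, so any other
code that expects text files keeps working.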

--Guido

On 9/19/07, Christian Heimes <lists at cheimes.de> wrote:
> Today I stumbled over another problem that is related to the unicode and
> OS string topic. The new io system, or to be more precise the
> implicit conversion of input and output data to UTF-8, makes it
> impossible to pipe binary data through Python 3.0.
>
> For example, a user wants to write a filter for binary data like images
> in Python. With Python 2.5 the input and output data isn't implicitly
> converted:
>
> # stdredirect.py
> # simple stupid example
> import sys
> sys.stdout.write(sys.stdin.read())
>
> $ chmod 755 stdredirect.py
> $ cat ./Mac/Demo/html.icons/python.gif | python2.5 stdredirect.py >out.gif
> $ diff ./Mac/Demo/html.icons/python.gif out.gif
>
> But Python 3.0 is using TextIOWrapper for stdin, stdout and stderr:
>
> $ cat ./Mac/Demo/html.icons/python.gif | ./python stdredirect.py >out.gif
> Traceback (most recent call last):
>   File "./stdredict.py", line 4, in <module>
>     sys.stdout.write(sys.stdin.read())
>   File "/home/heimes/dev/python/py3k/Lib/io.py", line 1225, in read
>     res += decoder.decode(self.buffer.read(), True)
>   File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13:
> invalid data
>
> An easy workaround for the problem is:
>
> sys.stdout = sys.stdout.buffer
> sys.stdin = sys.stdin.buffer
>
> I recommend that the problem and fix get documented. Maybe stdin,
> stdout and stderr should get a method that disables the implicit
> conversion like setMode("b") / setMode("t").

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

