[Python-3000] New io system and binary data

Christian Heimes lists at cheimes.de
Wed Sep 19 18:42:27 CEST 2007


Today I stumbled over another problem related to the unicode and OS
string topic. The new io system, or more precisely its implicit
conversion of input and output data to and from UTF-8, makes it
impossible to pipe binary data through Python 3.0's standard streams
out of the box.

For example, a user wants to write a filter for binary data such as
images in Python. With Python 2.5 the input and output data isn't
implicitly converted:

# stdredirect.py
# simple stupid example
import sys
sys.stdout.write(sys.stdin.read())

$ chmod 755 stdredirect.py
$ cat ./Mac/Demo/html.icons/python.gif | python2.5 stdredirect.py >out.gif
$ diff ./Mac/Demo/html.icons/python.gif out.gif

But Python 3.0 is using TextIOWrapper for stdin, stdout and stderr:

$ cat ./Mac/Demo/html.icons/python.gif | ./python stdredirect.py >out.gif
Traceback (most recent call last):
  File "./stdredict.py", line 4, in <module>
    sys.stdout.write(sys.stdin.read())
  File "/home/heimes/dev/python/py3k/Lib/io.py", line 1225, in read
    res += decoder.decode(self.buffer.read(), True)
  File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13: invalid data
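
The failure can be reproduced without a pipe. A minimal sketch, assuming
a made-up byte string (the GIF signature followed by a few arbitrary
non-UTF-8 bytes, not the actual contents of python.gif):

```python
# Hypothetical sample data: GIF signature plus bytes that are not valid UTF-8.
data = b"GIF89a\x80\x02\xff\xfe"

try:
    # This is roughly what the text layer does with everything read from stdin.
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    failed_at = exc.start  # offset of the first byte the codec rejected
    print("cannot decode byte at offset", failed_at)
```

The six ASCII bytes of the signature decode fine; the error is raised at
the first byte that does not form a valid UTF-8 sequence.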

An easy workaround is to replace the text streams with their underlying
binary buffers:

sys.stdout = sys.stdout.buffer
sys.stdin = sys.stdin.buffer
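
Put together, a Python 3 version of the filter might look like the
sketch below. It reads and writes through the .buffer attributes
directly instead of rebinding sys.stdin and sys.stdout, so print() and
other text output keep working. The names copy_binary, chunk_size and
main are made up for illustration.

```python
import sys


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy raw bytes from src to dst in chunks; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def main():
    # .buffer is the binary stream underneath the text wrapper,
    # so no decoding or encoding takes place here.
    copy_binary(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

A script used as a filter would simply call main() at the end.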

I recommend that the problem and the fix get documented. Maybe stdin,
stdout and stderr should get a method that disables the implicit
conversion, like setMode("b") / setMode("t").

Christian


