[issue691291] codecs.open(filename, 'U', 'UTF-16') corrupts text

And Clover report at bugs.python.org
Thu Feb 5 02:42:22 CET 2009


And Clover <and at doxdesk.com> added the comment:

> The problem is that codecs.open() forces binary mode on the underlying
> file object, and this defeats the U mode.

Actually, the problem is that it doesn't defeat it!

The function is documented to force binary, but it actually only does
"mode = mode + 'b'", which can leave you with a mode of 'rUb'. This mode
should be invalid, but in practice the 'U' wins out: newline translation
is applied to the raw bytes before they are decoded, which causes the
expected problems for UTF-16 and some East Asian codecs.
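
To illustrate the kind of corruption meant here, a minimal sketch follows.
It only simulates universal-newline translation on raw bytes; the helper
name translate_newlines_bytewise is made up for this example and is not
part of the codecs module or of the file object implementation.

```python
def translate_newlines_bytewise(data):
    """Simulate universal-newline translation applied to raw bytes.

    Hypothetical helper for illustration only: CRLF and lone CR bytes
    both become LF, with no knowledge of the text encoding.
    """
    return data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')


# In UTF-16-LE, CR and LF are each followed by a NUL byte, so the
# byte pair CR LF never appears and the CR is treated as a lone CR.
raw = 'a\r\nb'.encode('utf-16-le')
mangled = translate_newlines_bytewise(raw).decode('utf-16-le')
assert mangled == 'a\n\nb'      # one line break has become two

# Any character whose encoding contains a 0x0D byte is damaged too.
raw_c = '\u010d'.encode('utf-16-le')      # bytes 0D 01
mangled_c = translate_newlines_bytewise(raw_c).decode('utf-16-le')
assert mangled_c == '\u010a'    # 0D 01 became 0A 01, a different letter
```

The same reasoning applies to any multi-byte encoding in which 0x0D can
occur as part of a character other than CR.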

Until text/universal mode is supported at the level of the overlying
decoded stream, I suggest that 'U' be .replace()d out of the mode as
well as 'b' being added, as the documentation implies.
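
A rough sketch of that suggestion, written as a standalone function so it
can be tried in isolation. The name force_binary_mode is hypothetical and
this is not the actual codecs.open() source; defaulting to 'r' when 'U'
was the only mode character is an assumption on my part.

```python
def force_binary_mode(mode):
    """Return a file mode with 'U' stripped out and 'b' added.

    Hypothetical helper sketching the suggested change; it is not the
    code that codecs.open() contains.
    """
    mode = mode.replace('U', '')
    # Assumption: a mode that was just 'U' (or 'U+') means reading.
    if not mode or mode[0] not in 'rwa':
        mode = 'r' + mode
    if 'b' not in mode:
        mode = mode + 'b'
    return mode


assert force_binary_mode('rU') == 'rb'
assert force_binary_mode('U') == 'rb'
assert force_binary_mode('w') == 'wb'
assert force_binary_mode('rb') == 'rb'
```

With this in place, a call such as codecs.open(filename, 'U', 'UTF-16')
would open the underlying file as 'rb', which is what the documentation
already promises.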

----------
nosy: +aclover

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue691291>
_______________________________________

