[Python3] Reading a binary file and writing the bytes verbatim in a utf-8 file

Antoine Pitrou solipsis at pitrou.net
Sat Apr 24 20:16:48 EDT 2010


Hello,

> I have to read the contents of a binary file (a PNG file exactly), and
> dump it into an RTF file.
> 
> The RTF-file has been opened with codecs.open in utf-8 mode.

You should use the built-in open() function. In Python 3 it takes an 
encoding argument itself, so codecs.open() is outdated.

> As I expected, the utf-8 decoder chokes on some combinations of bits;
> how can I tell python to dump the bytes as they are, without
> interpreting them?

Well, the one thing you have to be careful about is to flush text buffers 
before writing binary data. But, for example:

>>> f = open("TEST", "w", encoding='utf8')
>>> f.write("héhé")
4
>>> f.flush()
>>> f.buffer.write(b"\xff\x00")
2
>>> f.close()

gives you:

$ hexdump -C TEST
00000000  68 c3 a9 68 c3 a9 ff 00                           |h..h....|

(utf-8 encoded text and then two raw bytes which are invalid utf-8)

Another possibility is to open the file in binary mode and do the 
encoding yourself when writing text. This might actually be a better 
solution, since I'm not sure RTF uses utf-8 by default.
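A minimal sketch of that second approach, assuming utf-8 really is the 
encoding you want for the text parts. The payload bytes and the temporary 
path are made up for illustration; in your case the bytes would come from 
reading the PNG file in "rb" mode:

```python
import os
import tempfile

# Hypothetical payload standing in for the PNG contents (assumption).
raw = b"\x89PNG\r\n\x1a\n\xff\x00"

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "TEST")

# Binary mode: every write takes bytes, so there is no text buffer to flush.
with open(path, "wb") as f:
    f.write("héhé".encode("utf-8"))  # encode the text yourself
    f.write(raw)                     # raw bytes are written untouched

# Read the file back in binary mode to check what ended up on disk.
with open(path, "rb") as f:
    data = f.read()
```

Since everything goes through a single binary stream, the bytes land in the 
file in the order you write them, and you can pick a different encoding for 
the text just by changing the argument to encode().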

Regards

Antoine.




