[Python3] Reading a binary file and writing the bytes verbatim in a utf-8 file
Antoine Pitrou
solipsis at pitrou.net
Sat Apr 24 20:16:48 EDT 2010
Hello,
> I have to read the contents of a binary file (a PNG file exactly), and
> dump it into an RTF file.
>
> The RTF-file has been opened with codecs.open in utf-8 mode.
You should use the built-in open() function, which takes an encoding
argument in Python 3. codecs.open() is outdated there.
> As I expected, the utf-8 decoder chokes on some combinations of bits;
> how can I tell python to dump the bytes as they are, without
> interpreting them?
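Well, the one thing you have to be careful about is to flush the text
buffer before writing binary data through the underlying f.buffer object;
otherwise the raw bytes can end up in the file ahead of text that is still
buffered. For example: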
>>> f = open("TEST", "w", encoding='utf8')
>>> f.write("héhé")
4
>>> f.flush()
>>> f.buffer.write(b"\xff\x00")
2
>>> f.close()
gives you:
$ hexdump -C TEST
00000000 68 c3 a9 68 c3 a9 ff 00 |h..h....|
(utf-8 encoded text and then two raw bytes which are invalid utf-8)
Another possibility is to open the file in binary mode and do the
encoding yourself when writing text. This might actually be a better
solution, since I'm not sure RTF uses utf-8 by default.
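A minimal sketch of that second approach, for illustration only: the helper
name write_mixed and the temporary file path are hypothetical, and utf-8 is
just a placeholder for whatever encoding the text part actually needs.

```python
import os
import tempfile


def write_mixed(path, text, raw, encoding="utf-8"):
    """Write `text` encoded with `encoding`, then `raw` bytes verbatim.

    The file is opened in binary mode, so there is no text layer to flush
    and no decoder that could reject the raw bytes.
    Returns the total number of bytes written.
    """
    with open(path, "wb") as f:
        written = f.write(text.encode(encoding))  # encode the text ourselves
        written += f.write(raw)                   # raw bytes go in untouched
    return written


# Same content as the interactive session above, written to a temp file.
path = os.path.join(tempfile.mkdtemp(), "TEST")
written = write_mixed(path, "héhé", b"\xff\x00")

with open(path, "rb") as f:
    data = f.read()
print(written, data)
```

Since everything goes through a single binary file object, the order of the
writes is the order of the bytes on disk, and the encoding can be changed per
call if the text turns out not to be utf-8.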
Regards
Antoine.