[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
python at mrabarnett.plus.com
Sat Jan 11 20:22:30 CET 2014
On 2014-01-11 05:36, Steven D'Aprano wrote:
> Latin-1 has the nice property that every byte decodes into the character
> with the same code point, and vice versa. So:
> for i in range(256):
>     assert bytes([i]).decode('latin-1') == chr(i)
>     assert chr(i).encode('latin-1') == bytes([i])
> passes. It seems to me that your problem goes away if you use Unicode
> text with embedded binary data, rather than binary data with embedded
> ASCII text. Then when writing the file to disk, of course you encode it
> to Latin-1, either explicitly:
> pdf = ...  # Unicode string containing the PDF contents
> with open("outfile.pdf", "wb") as f:
>     f.write(pdf.encode("latin-1"))
> or implicitly:
> with open("outfile.pdf", "w", encoding="latin-1") as f:
>     f.write(pdf)
The second example won't work because you're forgetting about the
handling of line endings in text mode.
Suppose you have some binary data, bytes([10]).
You convert it into a Unicode string using Latin-1, giving '\n'.
You write it out to a file opened in text mode.
On Windows, that string '\n' will be written to the file as b'\r\n'.
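The following minimal sketch is not from the thread. It shows the difference on any platform. It assumes that an io.TextIOWrapper over an io.BytesIO behaves like a file opened in text mode. Passing newline="\r\n" stands in for the default (newline=None) on Windows, where os.linesep is "\r\n". The names data and write_text are hypothetical, chosen only for illustration.

```python
import io

# Hypothetical binary payload containing newline bytes (0x0A), like bytes([10]).
data = b"%PDF-1.4\n\x00\xff\x80binary\n"

# Carry the bytes around as text via Latin-1 (a lossless 1:1 mapping).
text = data.decode("latin-1")

def write_text(text, newline):
    """Write `text` through a text-mode layer; return the raw bytes produced.

    Simulates open(path, "w", encoding="latin-1", newline=newline) without
    touching the disk. newline="\r\n" mimics Windows' default translation.
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="latin-1", newline=newline)
    wrapper.write(text)
    wrapper.flush()  # push buffered text down to the BytesIO
    return raw.getvalue()

# Explicit encode, as you would before writing to a file opened with "wb".
explicit = text.encode("latin-1")

# Text mode with Windows-style translation: every "\n" becomes b"\r\n".
translated = write_text(text, newline="\r\n")

# Text mode with newline="" performs no translation on write.
untranslated = write_text(text, newline="")

print(explicit == data)      # True
print(translated == data)    # False
print(untranslated == data)  # True
```

Passing newline="" to open() would avoid the corruption on write. The explicit encode to a file opened in "wb" mode sidesteps the question entirely.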