[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

MRAB python at mrabarnett.plus.com
Sat Jan 11 20:22:30 CET 2014


On 2014-01-11 05:36, Steven D'Aprano wrote:
[snip]
> Latin-1 has the nice property that every byte decodes into the character
> with the same code point, and vice versa. So:
>
> for i in range(256):
>      assert bytes([i]).decode('latin-1') == chr(i)
>      assert chr(i).encode('latin-1') == bytes([i])
>
> passes. It seems to me that your problem goes away if you use Unicode
> text with embedded binary data, rather than binary data with embedded
> ASCII text. Then when writing the file to disk, of course you encode it
> to Latin-1, either explicitly:
>
> pdf = ... # Unicode string containing the PDF contents
> with open("outfile.pdf", "wb") as f:
>      f.write(pdf.encode("latin-1"))
>
> or implicitly:
>
> with open("outfile.pdf", "w", encoding="latin-1") as f:
>      f.write(pdf)
>
[snip]
The second example won't work because you're forgetting about the
handling of line endings in text mode.

Suppose you have some binary data bytes([10]).

You convert it into a Unicode string using Latin-1, giving '\n'.

You write it out to a file opened in text mode.

On Windows, that string '\n' will be written to the file as b'\r\n'.
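
The steps above can be reproduced on any platform with a small sketch. It assumes that passing newline="\r\n" to io.TextIOWrapper is a fair stand-in for the Windows text-mode default; the helper name write_text is made up for this illustration.

```python
import io

def write_text(data: bytes, newline: str) -> bytes:
    """Decode data as Latin-1, write it through a text-mode wrapper,
    and return the bytes that actually reach the underlying stream.
    (Hypothetical helper, for illustration only.)"""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="latin-1", newline=newline)
    text.write(data.decode("latin-1"))
    text.flush()
    return raw.getvalue()

binary = bytes([10])  # a single byte that happens to equal '\n'

# Assumption: newline="\r\n" mimics what text mode does by default on Windows.
windows_like = write_text(binary, "\r\n")

# newline="" turns off newline translation on output.
untranslated = write_text(binary, "")

print(windows_like)   # the data has grown by one byte
print(untranslated)   # the data is unchanged
```

So the implicit version should only round-trip if the file is opened with newline="" (or newline="\n"), which switches off the translation when writing.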


