[Python-Dev] PEP 460 reboot

Donald Stufft donald at stufft.io
Mon Jan 13 01:46:19 CET 2014


On Jan 12, 2014, at 6:55 PM, Guido van Rossum <guido at python.org> wrote:

> The key reason for introducing a separate bytes type in Python 3 is to
> avoid *mixing* bytes and text. This aims to avoid the classic Python 2
> Unicode failure, where str+unicode fails or succeeds based on whether
> str contains non-ASCII characters or not, which means it is easy to
> miss in testing. 

+1

> 
> But this does not mean the bytes type isn't allowed to have a
> noticeable bias in favor of encodings that are ASCII supersets, even
> if not all bytes objects contain such data (e.g. image data,
> compressed data, binary network packets, and so on).

+1

> 
> IMO it's totally fine and consistent if b'%d' % 42 returns b'42' and
> also for b'{}'.format(42) to return b'42'. There are numerous places
> where bytes are already assumed to use an ASCII superset:
> 
> - byte literals: b'abc' (it's a syntax error to have a non-ASCII character here)
> - the upper() and lower() methods modify the ASCII letter positions
> - int(b'42') == 42, float(b'3.14') == 3.14

Completely Agree.
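
A minimal sketch of the three cases in the list above, for anyone who wants to try them at a Python 3 prompt. The variable names are only illustrative.

```python
# 1. A bytes literal may only contain ASCII characters; anything else is
#    rejected when the source is compiled.
try:
    compile("b'\u00e9'", "<example>", "eval")
    literal_rejected = False
except SyntaxError:
    literal_rejected = True

# 2. upper() and lower() only touch the ASCII letter positions; the byte
#    0xE9 (which would be a lowercase letter in Latin-1) is left alone.
shouted = b'abc\xe9'.upper()

# 3. int() and float() parse ASCII digits directly from bytes.
number = int(b'42')
ratio = float(b'3.14')
```
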

> 
> I looked through the example code I recently wrote for asyncio (which
> uses bytes for all data read or written). There are several places
> where I have to make a clumsy detour via text strings because I need
> to include an ASCII-encoded decimal integer (e.g. the Content-Length
> header) or a hex-encoded one (e.g. for Transfer-Encoding: chunked).
> Those detours aren't needed for parsing because int() accepts bytes
> just fine.
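
To make the asymmetry concrete, here is a rough sketch of the kind of detour described above. The helper names are hypothetical and are not taken from the asyncio examples; the point is only that building the header goes through str and encode(), while parsing does not.

```python
def content_length_header(body: bytes) -> bytes:
    # Detour via text: format the integer as str, then encode it to ASCII.
    return b'Content-Length: ' + str(len(body)).encode('ascii') + b'\r\n'


def chunk_size_line(chunk: bytes) -> bytes:
    # Same detour for the hex size used by Transfer-Encoding: chunked.
    return format(len(chunk), 'x').encode('ascii') + b'\r\n'


def parse_content_length(line: bytes) -> int:
    # No detour needed in this direction: int() accepts bytes as-is.
    _name, _sep, value = line.partition(b':')
    return int(value.strip())
```
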
> 
> I also note that the behavior of the re module is perfect: if the
> pattern is bytes, it can only match bytes and the extracted data is
> bytes, and ditto for text -- so it supports both types but doesn't
> allow mixing them. The urllib module does this too -- at considerable
> cost in its implementation, but it's the right thing, because there
> really are good cases to be made for treating URLs as text as well as
> for treating them as bytes (as with filenames, command line arguments,
> and environment variables).
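
A short illustration of the "both types, no mixing" behavior, using re and urllib.parse from the standard library. The sample header and URL are made up for the example.

```python
import re
from urllib.parse import urlsplit

# Bytes pattern on bytes input: the extracted group is bytes.
bytes_match = re.match(rb'Content-Length: (\d+)', b'Content-Length: 42')
bytes_value = bytes_match.group(1)

# Text pattern on text input: the extracted group is str.
text_match = re.match(r'Content-Length: (\d+)', 'Content-Length: 42')
text_value = text_match.group(1)

# Mixing the two is refused outright instead of depending on the data.
try:
    re.match(rb'\d+', '42')
    mixing_refused = False
except TypeError:
    mixing_refused = True

# urllib.parse follows the same rule: bytes in, bytes out.
host = urlsplit(b'http://example.com/path').netloc
```
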
> 
> I'm sad that the json module in Python 3 doesn't support bytes at all,
> but at least it is consistent -- it always produces text in ASCII
> encoding (by default). The same applies to the http module, which IIUC
> adheres to the standard by treating headers as Latin-1.
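
For the json point, a small sketch of what "text in ASCII encoding (by default)" looks like in practice. The sample dictionary is invented for the example.

```python
import json

# With the default ensure_ascii=True, the result is a str whose
# characters are all ASCII; non-ASCII input is written as \uXXXX escapes.
encoded = json.dumps({'name': 'caf\u00e9'})

# Because the text is pure ASCII, turning it into bytes for the wire is
# an explicit but lossless extra step.
wire = encoded.encode('ascii')
```
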
> 
> -- 
> --Guido van Rossum (python.org/~guido)


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
