[Python-Dev] What to do for bytes in 2.6?

glyph at divmod.com
Sun Jan 20 02:54:48 CET 2008


On 19 Jan, 07:32 pm, guido at python.org wrote:
>There is no way to know whether that return value means text or data
>(plenty of apps legitimately read text straight off a socket in 2.x),

IMHO, this is a stretch of the word "legitimately" ;-).  If you're 
reading from a socket, what you're getting are bytes, whether they're 
represented by str() or bytes(); correct code in 2.x must currently do a 
.decode("ascii") or .decode("charmap") to "legitimately" identify the 
result as text of some kind.
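
As a minimal sketch of that point (the literal below only stands in for 
whatever sock.recv() would hand back; no real socket is involved, and the 
variable names are made up for illustration):

```python
# Stand-in for data returned by sock.recv(); on 2.6 this is a str,
# on 3.0 it is a bytes object.  Either way it is data, not text.
raw = b"HELLO world\r\n"

# Naming the encoding is what turns the data into text.
text = raw.decode("ascii")

# Data that is not actually ASCII fails loudly instead of being
# silently treated as text.
try:
    b"caf\xc3\xa9".decode("ascii")
    non_ascii_rejected = False
except UnicodeDecodeError:
    non_ascii_rejected = True
```

The same lines should behave alike on 2.6 and 3.0, since b"" is accepted 
by both.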

Now, ad-hoc code with a fast and loose definition of "text" can still 
read arrays of bytes off a socket without specifying an encoding and get 
away with it, but that's because Python's unicode implementation has 
thus far been very forgiving, not because the data is cleanly text yet. 
Why can't we get that warning in -3 mode for something read from a 
socket, just as we would for a b"" literal?  I've written lots of code that 
aggressively rejects str() instances as text, as well as unicode 
instances as bytes, and that's in code that still supports 2.3 ;).
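
Roughly the kind of check I mean, as a sketch only: require_text and 
require_bytes are hypothetical names, not from any library, and this 
version assumes b"" and u"" literals are both available (so 2.6+ or a 
3.x that accepts u""), which the 2.3-compatible code obviously can't:

```python
# Hypothetical helpers, for illustration only.
text_type = type(u"")    # unicode on 2.x, str on 3.x
bytes_type = type(b"")   # str on 2.x, bytes on 3.x

def require_text(value):
    """Return value unchanged if it is text; otherwise raise TypeError."""
    if not isinstance(value, text_type):
        raise TypeError("expected text, got %s" % type(value).__name__)
    return value

def require_bytes(value):
    """Return value unchanged if it is bytes; otherwise raise TypeError."""
    if not isinstance(value, bytes_type):
        raise TypeError("expected bytes, got %s" % type(value).__name__)
    return value
```

Calling require_text() on something fresh off a socket raises 
immediately, which forces the caller to pick an encoding.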

>Really, the pure aliasing solution is just about optimal in terms of
>bang per buck. :-)

Not that I'm particularly opposed to the aliasing solution, either.  It 
would still allow writing code that was perfectly useful in 2.6 as well 
as 3.0, and it would avoid disturbing code that did checks of type(""). 
It would just remove an opportunity to get one potentially helpful 
warning.

