[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Tue Feb 14 00:17:07 CET 2006

At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
>The conversion from Unicode to bytes is different in this
>respect, since you are converting from a "bigger" type to
>a "smaller" one. Choosing latin-1 as default for this
>conversion would give you all 8 bits, instead of just 7
>bits that ASCII provides.

I was just pointing out that since byte strings are bytes by definition, 
then simply putting those bytes in a bytes() object doesn't alter the 
existing encoding.  So, using latin-1 when converting a string to bytes 
actually seems like the the One Obvious Way to do it.

I'm so accustomed to being wary of encoding issues that the idea doesn't 
*feel* right at first - I keep going, "but you can't know what encoding 
those bytes are".  Then I go, Duh, that's the point.  If you convert 
str->bytes, there's no conversion and no interpretation - neither the str 
nor the bytes object knows its encoding, and that's okay.  So 
str(bytes_object) (in 2.x) should also just turn it back to a normal 
bytestring.

In fact, the 'encoding' argument seems useless in the case of str objects, 
and it seems it should default to latin-1 for unicode objects.  The only 
use I see for having an encoding for a 'str' would be to allow confirming 
that the input string in fact is valid for that encoding.  So, 
"bytes(some_str,'ascii')" would be an assertion that some_str must be valid 
ASCII.

> > So, it sounds like making the encoding default to latin-1 would be a
> > reasonably safe approach in both 2.x and 3.x.
>
>Reasonable for bytes(): yes. In general: no.

Right, I was only talking about bytes().

For 3.0, the type formerly known as "str" won't exist, so only the Unicode 
part will be relevant then.