[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Phillip J. Eby
pje at telecommunity.com
Tue Feb 14 00:17:07 CET 2006
At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
>The conversion from Unicode to bytes is different in this
>respect, since you are converting from a "bigger" type to
>a "smaller" one. Choosing latin-1 as default for this
>conversion would give you all 8 bits, instead of just 7
>bits that ASCII provides.
I was just pointing out that since byte strings are bytes by definition,
simply putting those bytes in a bytes() object doesn't alter the
existing encoding. So, using latin-1 when converting a string to bytes
actually seems like the One Obvious Way to do it.
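To illustrate why latin-1 behaves as "no conversion", here is a minimal sketch. It assumes Python 3 semantics (where str is Unicode and bytes is a built-in type), not the 2.x bytes() proposal under discussion, and the variable names are only illustrative. It shows that latin-1 maps byte value N to code point N, so a round trip leaves every byte unchanged:

```python
# Every possible byte value, 0 through 255.
raw = bytes(range(256))

# Decoding as latin-1 maps byte N to code point N, with no failures.
text = raw.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))

# Encoding back with latin-1 reproduces the original bytes exactly.
round_tripped = text.encode("latin-1")
assert round_tripped == raw
```

ASCII only covers the first 128 of these values, which is the "7 bits versus 8 bits" difference mentioned in the quoted text.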
I'm so accustomed to being wary of encoding issues that the idea doesn't
*feel* right at first - I keep going, "but you can't know what encoding
those bytes are". Then I go, "Duh, that's the point." If you convert
str->bytes, there's no conversion and no interpretation - neither the str
nor the bytes object knows its encoding, and that's okay. So
str(bytes_object) (in 2.x) should also just turn it back to a normal
bytestring.
In fact, the 'encoding' argument seems useless in the case of str objects,
and it seems it should default to latin-1 for unicode objects. The only
use I see for having an encoding for a 'str' would be to allow confirming
that the input string in fact is valid for that encoding. So,
"bytes(some_str,'ascii')" would be an assertion that some_str must be valid
ASCII.
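The "encoding as assertion" idea can be sketched roughly as follows. This again assumes Python 3 semantics, so it is closer to the unicode-object case than to the 2.x str case, and to_bytes is a hypothetical helper rather than any proposed API:

```python
def to_bytes(text, encoding="latin-1"):
    """Hypothetical helper: encode text, raising if it does not fit the encoding."""
    return text.encode(encoding)

# Pure ASCII input passes the 'ascii' assertion.
plain = to_bytes("abc", "ascii")

# With the latin-1 default, code points up to 255 map straight to bytes.
accented = to_bytes("caf\u00e9")

# Asking for 'ascii' on non-ASCII input fails, which is the assertion part.
try:
    to_bytes("caf\u00e9", "ascii")
    ascii_rejected = False
except UnicodeEncodeError:
    ascii_rejected = True
```

Under these assumptions the explicit 'ascii' argument never changes the resulting bytes for valid input; it only decides whether out-of-range input is rejected.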
> > So, it sounds like making the encoding default to latin-1 would be a
> > reasonably safe approach in both 2.x and 3.x.
>
>Reasonable for bytes(): yes. In general: no.
Right, I was only talking about bytes().
For 3.0, the type formerly known as "str" won't exist, so only the Unicode
part will be relevant then.