[Python-Dev] bytes type discussion

Wed Feb 15 09:39:10 CET 2006

On 2/15/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Adam Olsen wrote:
> > (I wonder if maybe they should be an error in 2.x as well.  Source
> > encoding is for unicode literals, not str literals.)
>
> Source encoding applies to the entire source code, including (byte)
> string literals, comments, identifiers, and keywords. IOW, if you
> declare your source encoding is utf-8, the keyword "print" must
> be represented with the bytes that represent the Unicode letters
> for "p","r","i","n", and "t" in UTF-8.

Although it does apply to the entire source file, I think this is more
for convenience (try telling an editor that only a single line is
Shift_JIS!) than to allow 8-bit (or 16-bit?!) str literals.  Indeed,
you could have arbitrary 8-bit str literals long before the source
encoding was added.  Keywords and identifiers continue to be limited
to ASCII characters (even if they make a round trip through other
encodings), and comments continue to be ignored.

Source encoding exists so that you can write u"123" with the encoding
stated once at the top of the file, rather than "123".decode('utf-8')
with the encoding repeated everywhere.
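A minimal sketch of that convenience, run under Python 3 so it executes today. In Python 3, u"..." is still accepted and .decode() lives on bytes rather than str. The Latin-1 file contents and the names SOURCE, declared and manual are illustrative assumptions. The coding line tells the compiler how to decode the whole file once, so the u"..." literal comes out as text that matches an explicit decode of the raw bytes:

```python
# Illustrative only: the bytes of a hypothetical Latin-1 source file whose
# coding declaration states the encoding once, at the top.
SOURCE = (
    b"# -*- coding: latin-1 -*-\n"
    b"declared = u'caf\xe9'\n"  # byte 0xE9 is 'e' with acute accent in Latin-1
)

namespace = {}
# compile() honours the PEP 263 coding declaration when given bytes.
exec(compile(SOURCE, "<latin1-example>", "exec"), namespace)
declared = namespace["declared"]

# Without relying on the declaration, each literal needs its own decode call.
manual = b"caf\xe9".decode("latin-1")

print(declared == manual)  # True
```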

Making 8-bit str literals an error in 2.x would help teach users
that these literals will behave differently in 3.0, where they will
no longer be 8-bit str literals.
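For comparison, Python 3 as released rejects non-ASCII characters typed directly into a bytes literal (the successor to the 2.x str literal), while the same byte written as an escape sequence is still accepted. A small sketch showing this by compiling two one-line sources; the helper name compiles and the sample strings are hypothetical:

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# A non-ASCII character typed directly into a bytes literal is rejected...
raw_literal = "data = b'caf\u00e9'\n"
# ...while spelling the same byte as an escape sequence is accepted.
escaped_literal = "data = b'caf\\xe9'\n"

print(compiles(raw_literal))      # False
print(compiles(escaped_literal))  # True
```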

--
Adam Olsen, aka Rhamphoryncus