[Python-3000] PEP 3112

Mon May 7 19:45:40 CEST 2007

On 5/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I just read PEP 3112, and I believe it contains a
> flaw/underspecification.
>
> It says
>
> # Each shortstringchar or longstringchar must be a character between 1
> # and 127 inclusive, regardless of any encoding declaration [2] in the
> # source file.
>
> What does that mean? In particular, what is "a character between 1 and
> 127"?
>
> Assuming this refers to ordinal values in some encoding: what encoding?
> It's particularly puzzling that it says "regardless of any encoding
> declaration of the source file".
>
> I fear (but hope that I'm wrong) that this was intended to mean "use the
> bytes as they are stored on disk in the source file". If so: is the
> attached file valid Python? In case your editor can't render it: it
> reads
>
> #! -*- coding: iso-2022-jp -*-
> a = b"Питон"
>
> But if you look at the file with a hex editor, you see it contains
> only bytes between 1 and 127.
>
> I would hope that this code is indeed ill-formed (i.e. that
> the byte representation on disk is irrelevant, and only the
> Unicode ordinals of the source characters matter).
>
> If so, can the specification please be updated to clarify that
> 1. in Grammar changes: Each shortstringchar or longstringchar must
>    be a character whose Unicode ordinal value is between 1 and
>    127 inclusive.
> 2. in Semantics: The bytes in the new object are obtained as if
>    encoding a string literal with "iso-8859-1"

Sounds like a good fix to me; I agree that bytes literals, like
Unicode literals, should not vary depending on the source encoding. In
step 2, can't you use "ascii" as the encoding?
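
To make the two readings concrete, here is a minimal sketch, assuming Python 3 string semantics and the standard "iso2022_jp", "ascii" and "iso-8859-1" codecs. The helper names are hypothetical and not part of PEP 3112. It checks the example string once by its on-disk bytes and once by its Unicode ordinals, then compares the two candidate encodings for step 2 on the characters the ordinal rule would allow.

```python
# Illustration only: helper names are hypothetical, not taken from PEP 3112.

def on_disk_is_7bit(text, source_encoding):
    """True if every byte of text, stored in source_encoding, is below 128."""
    return all(byte < 128 for byte in text.encode(source_encoding))


def ordinals_in_range(text):
    """True if every character's Unicode ordinal is between 1 and 127."""
    return all(1 <= ord(ch) <= 127 for ch in text)


cyrillic = "Питон"

# Reading 1: look at the bytes stored in an iso-2022-jp source file.
disk_7bit = on_disk_is_7bit(cyrillic, "iso2022_jp")

# Reading 2: look at the Unicode ordinals of the source characters.
valid_by_ordinal = ordinals_in_range(cyrillic)

# For every character the ordinal rule allows, the two candidate
# encodings for step 2 produce identical bytes.
allowed = "".join(chr(i) for i in range(1, 128))
same_bytes = allowed.encode("ascii") == allowed.encode("iso-8859-1")

print(disk_7bit, valid_by_ordinal, same_bytes)
```

Under the byte-level reading the literal would slip through, while the ordinal rule rejects it. Once the ordinal rule is in place, the choice between "ascii" and "iso-8859-1" does not change the resulting bytes.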

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)