[Python-3000] PEP 3112

Sun May 6 09:47:12 CEST 2007

I just read PEP 3112, and I believe it contains a
flaw/underspecification.

It says

# Each shortstringchar or longstringchar must be a character between 1
# and 127 inclusive, regardless of any encoding declaration [2] in the
# source file.

What does that mean? In particular, what is "a character between 1 and
127"?

Assuming this refers to ordinal values in some encoding: what encoding?
It's particularly puzzling that it says "regardless of any encoding
declaration [2] in the source file".

I fear (but hope I'm wrong) that this is intended to mean "use the
bytes as they are stored on disk in the source file". If so, is the
attached file valid Python? In case your editor can't render it, it
reads:

#! -*- coding: iso-2022-jp -*-
a = b"Питон"

But if you look at the file with a hex editor, you see it contains
only bytes between 1 and 127.
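
To make this concrete, here is a small sketch (not the attached file
itself; the names `line` and `on_disk` are made up for illustration)
that encodes the assignment with Python's iso-2022-jp codec and looks
at the resulting byte values. It assumes the codec can represent the
Cyrillic letters, which it should, since they are part of JIS X 0208.

```python
# Illustrative sketch; `line` and `on_disk` are names chosen for this example.
line = 'a = b"Питон"'

# Bytes as an editor would store them under the iso-2022-jp declaration.
on_disk = line.encode("iso-2022-jp")

# The Cyrillic letters are written as 7-bit byte pairs between escape
# sequences, so every byte on disk falls in the range 1..127.
print(all(1 <= byte <= 127 for byte in on_disk))

# The source characters themselves are not all ASCII.
print(any(ord(ch) > 127 for ch in line))

# Decoding according to the declaration recovers the original characters.
print(on_disk.decode("iso-2022-jp") == line)
```

So a rule phrased in terms of on-disk bytes and a rule phrased in terms
of Unicode ordinals give different answers for this file.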

I would hope that this code is indeed ill-formed (i.e. that
the byte representation on disk is irrelevant, and only the
Unicode ordinals of the source characters matter).

If so, could the specification please be updated to clarify the
following?
1. in Grammar changes: Each shortstringchar or longstringchar must
   be a character whose Unicode ordinal value is between 1 and
   127 inclusive.
2. in Semantics: The bytes in the new object are obtained as if
   encoding a string literal with "iso-8859-1".
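
As a rough sketch of what those two points would mean together (the
function name `bytes_literal_value` is hypothetical and is not part of
the PEP or of any implementation; escape sequences are ignored here):

```python
def bytes_literal_value(chars: str) -> bytes:
    """Hypothetical helper: value of b"..." under the two proposed rules.

    `chars` is the literal's content after the source file has been
    decoded to Unicode according to its encoding declaration.
    """
    for ch in chars:
        # Point 1: only Unicode ordinals 1..127 are allowed.
        if not 1 <= ord(ch) <= 127:
            raise SyntaxError(
                "character U+%04X not allowed in bytes literal" % ord(ch)
            )
    # Point 2: encode as if the text were an iso-8859-1 string literal.
    return chars.encode("iso-8859-1")


print(bytes_literal_value("Python"))

try:
    bytes_literal_value("Питон")
except SyntaxError as exc:
    print("rejected:", exc)
```

Under this reading the on-disk encoding of the source file never enters
the picture; only the decoded characters do.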

Regards,
Martin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: a.py
Type: text/x-python
Size: 55 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070506/c0269ce4/attachment.py